There is an impedance mismatch between the scatterlist API using unsigned
int and our memory/page accounting in unsigned long. That is we may try
to create a scatterlist for a large object that overflows returning a
small table into which we try to fit very many pages. As the object size
is under the control of userspace, we have to be prudent and catch the
conversion errors.
To catch the implicit truncation we check before calling scattterlist
creation Apis. we use overflows_type check and report E2BIG if the
overflows may raise. When caller does not return errno, use WARN_ON to
report a problem.
This is already used in our create ioctls to indicate if the uABI request
is simply too large for the backing store. Failing that type check,
we have a second check at sg_alloc_table time to make sure the values
we are passing into the scatterlist API are not truncated.
v2: Move added i915_utils's macro into drm_util header (Jani N)
v5: Fix macros to be enclosed in parentheses for complex values
Fix too long line warning
v8: Replace safe_conversion() with check_assign() (Kees)
v14: Remove shadowing macros of scatterlist creation api and fix to
explicitly overflow check where the scatterlist creation APIs are
called. (Jani)
v15: Add missing returning of error code when the WARN_ON() has been
detected. (Jani)
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Brian Welty <brian.welty@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Co-developed-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Signed-off-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221228192252.917299-3-gwan-gyeong.mun@intel.com
We need to check that we avoid integer overflows when looking up a page,
and so fix all the instances where we have mistakenly used a plain
integer instead of a more suitable long. Be pedantic and add integer
typechecking to the lookup so that we can be sure that we are safe.
And it also uses pgoff_t as our page lookups must remain compatible with
the page cache, pgoff_t is currently exactly unsigned long.
v2: Move added i915_utils's macro into drm_util header (Jani N)
v3: Make not use the same macro name on a function. (Mauro)
For kernel-doc, macros and functions are handled in the same namespace,
the same macro name on a function prevents ever adding documentation
for it.
v4: Add kernel-doc markups to the kAPI functions and macros (Mauoro)
v5: Fix an alignment to match open parenthesis
v6: Rebase
v10: Use assert_typable instead of exactly_pgoff_t() macro. (Kees)
v11: Change the use of assert_typable to assert_same_typable (G.G)
v12: Change to use static_assert(__castable_to_type(n ,T)) style since
the assert_same_typable() macro has been dropped. (G.G)
v13: Change the use of __castable_to_type() to castable_to_type()
Remove an unnecessary header include line. (G.G)
v16: Fix "ERROR:SPACING" Checkpatch report (G.G)
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Co-developed-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Signed-off-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> (v2)
Reviewed-by: Mauro Carvalho Chehab <mchehab@kernel.org> (v3)
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> (v5)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221228192252.917299-2-gwan-gyeong.mun@intel.com
Sync after v6.2-rc1 landed in drm-next.
We need to get some dependencies in place before we can merge
the fixes series from Gwan-gyeong and Chris.
References: https://lore.kernel.org/all/Y6x5JCDnh2rvh4lA@intel.com/
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
The catch-all evict can fail due to object lock contention, since it
only goes as far as trylocking the object, due to us already holding the
vm->mutex. Doing a full object lock here can deadlock, since the
vm->mutex is always our inner lock. Add another execbuf pass which drops
the vm->mutex and then tries to grab the object will the full lock,
before then retrying the eviction. This should be good enough for now to
fix the immediate regression with userspace seeing -ENOSPC from execbuf
due to contended object locks during GTT eviction.
v2 (Mani)
- Also revamp the docs for the different passes.
Testcase: igt@gem_ppgtt@shrink-vs-evict-*
Fixes: 7e00897be8 ("drm/i915: Add object locking to i915_gem_evict_for_node and i915_gem_evict_something, v2.")
References: https://gitlab.freedesktop.org/drm/intel/-/issues/7627
References: https://gitlab.freedesktop.org/drm/intel/-/issues/7570
References: https://bugzilla.mozilla.org/show_bug.cgi?id=1779558
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Cc: Mani Milani <mani@chromium.org>
Cc: <stable@vger.kernel.org> # v5.18+
Reviewed-by: Mani Milani <mani@chromium.org>
Tested-by: Mani Milani <mani@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20221216113456.414183-1-matthew.auld@intel.com
(cherry picked from commit 801fa7a81f)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
The catch-all evict can fail due to object lock contention, since it
only goes as far as trylocking the object, due to us already holding the
vm->mutex. Doing a full object lock here can deadlock, since the
vm->mutex is always our inner lock. Add another execbuf pass which drops
the vm->mutex and then tries to grab the object will the full lock,
before then retrying the eviction. This should be good enough for now to
fix the immediate regression with userspace seeing -ENOSPC from execbuf
due to contended object locks during GTT eviction.
v2 (Mani)
- Also revamp the docs for the different passes.
Testcase: igt@gem_ppgtt@shrink-vs-evict-*
Fixes: 7e00897be8 ("drm/i915: Add object locking to i915_gem_evict_for_node and i915_gem_evict_something, v2.")
References: https://gitlab.freedesktop.org/drm/intel/-/issues/7627
References: https://gitlab.freedesktop.org/drm/intel/-/issues/7570
References: https://bugzilla.mozilla.org/show_bug.cgi?id=1779558
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Cc: Mani Milani <mani@chromium.org>
Cc: <stable@vger.kernel.org> # v5.18+
Reviewed-by: Mani Milani <mani@chromium.org>
Tested-by: Mani Milani <mani@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20221216113456.414183-1-matthew.auld@intel.com
It seems we can have one or more framebuffers that are still pinned when
suspending lmem, in such a case we end up creating a shmem backup
object, instead of evicting the object directly, but this will skip
copying the CCS aux state, since we don't allocate the extra storage for
the CCS pages as part of the ttm_tt construction. Since we can already
deal with pinned objects just fine, it doesn't seem too nasty to just
extend to support dealing with the CCS aux state, if the object is a
pinned framebuffer. This fixes display corruption (like in gnome-shell)
seen on DG2 when returning from suspend.
Fixes: da0595ae91 ("drm/i915/migrate: Evict and restore the flatccs capable lmem obj")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Nirmoy Das <nirmoy.das@intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Cc: Shuicheng Lin <shuicheng.lin@intel.com>
Cc: <stable@vger.kernel.org> # v5.19+
Tested-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221212171958.82593-2-matthew.auld@intel.com
(cherry picked from commit 95df9cc24b)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
It seems we can have one or more framebuffers that are still pinned when
suspending lmem, in such a case we end up creating a shmem backup
object, instead of evicting the object directly, but this will skip
copying the CCS aux state, since we don't allocate the extra storage for
the CCS pages as part of the ttm_tt construction. Since we can already
deal with pinned objects just fine, it doesn't seem too nasty to just
extend to support dealing with the CCS aux state, if the object is a
pinned framebuffer. This fixes display corruption (like in gnome-shell)
seen on DG2 when returning from suspend.
Fixes: da0595ae91 ("drm/i915/migrate: Evict and restore the flatccs capable lmem obj")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Nirmoy Das <nirmoy.das@intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Cc: Shuicheng Lin <shuicheng.lin@intel.com>
Cc: <stable@vger.kernel.org> # v5.19+
Tested-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221212171958.82593-2-matthew.auld@intel.com
Initial accel subsystem support. There are no drivers yet, just the framework.
New driver:
- ofdrm - replacement for offb
fbdev:
- add support for nomodeset
fourcc:
- add Vivante tiled modifier
core:
- atomic-helpers: CRTC primary plane test fixes, fb access hooks
- connector: TV API consistency, cmdline parser improvements
- send connector hotplug on cleanup
- sort makefile objects
tests:
- sort kunit tests
- improve DP-MST tests
- add kunit helpers to create a device
sched:
- module param for scheduling policy
- refcounting fix
buddy:
- add back random seed log
ttm:
- convert ttm_resource to size_t
- optimize pool allocations
edid:
- HFVSDB parsing support fixes
- logging/debug improvements
- DSC quirks
dma-buf:
- Add unlocked vmap and attachment mapping
- move drivers to common locking convention
- locking improvements
firmware:
- new API for rPI firmware and vc4
xilinx:
- zynqmp: displayport bridge support
- dpsub fix
bridge:
- adv7533: Remove dynamic lane switching
- it6505: Runtime PM support, sync improvements
- ps8640: Handle AUX defer messages
- tc358775: Drop soft-reset over I2C
panel:
- panel-edp: Add INX N116BGE-EA2 C2 and C4 support.
- Jadard JD9365DA-H3
- NewVision NV3051D
amdgpu:
- DCN support on ARM
- DCN 2.1 secure display
- Sienna Cichlid mode2 reset fixes
- new GC 11.x firmware versions
- drop AMD specific DSC workarounds in favour of drm code
- clang warning fixes
- scheduler rework
- SR-IOV fixes
- GPUVM locking fixes
- fix memory leak in CS IOCTL error path
- flexible array updates
- enable new GC/PSP/SMU/NBIO IP
- GFX preemption support for gfx9
amdkfd:
- cache size fixes
- userptr fixes
- enable cooperative launch on gfx 10.3
- enable GC 11.0.4 KFD support
radeon:
- replace kmap with kmap_local_page
- ACPI ref count fix
- HDA audio notifier support
i915:
- DG2 enabled by default
- MTL enablement work
- hotplug refactoring
- VBT improvements
- Display and watermark refactoring
- ADL-P workaround
- temp disable runtime_pm for discrete-
- fix for A380 as a secondary GPU
- Wa_18017747507 for DG2
- CS timestamp support fixes for gen5 and earlier
- never purge busy TTM objects
- use i915_sg_dma_sizes for all backends
- demote GuC kernel contexts to normal priority
- gvt: refactor for new MDEV interface
- enable DC power states on eDP ports
- fix gen 2/3 workarounds
nouveau:
- fix page fault handling
- Ampere acceleration support
- driver stability improvements
- nva3 backlight support
msm:
- MSM_INFO_GET_FLAGS support
- DPU: XR30 and P010 image formats
- Qualcomm SM6115 support
- DSI PHY support for QCM2290
- HDMI: refactored dev init path
- remove exclusive-fence hack
- fix speed-bin detection
- enable clamp to idle on 7c3
- improved hangcheck detection
vmwgfx:
- fb and cursor refactoring
- convert to generic hashtable
- cursor improvements
etnaviv:
- hw workarounds
- softpin MMU fixes
ast:
- atomic gamma LUT support
- convert to SHMEM
lcdif:
- support YUV planes
- Increase DMA burst size
- FIFO threshold tuning
meson:
- fix return type of cvbs mode_valid
mgag200:
- fix PLL setup on some revisions
sun4i:
- A100 and D1 support
udl:
- modesetting improvements
- hot unplug support
vc4:
- support PAL-M
- fix regression preventing 4K @ 60Hz
- fix NULL ptr deref
v3d:
- switch to drm managed resources
renesas:
- RZ/G2L DSI support
- DU Kconfig cleanup
mediatek:
- fixup dpi and hdmi
- MT8188 dpi support
- MT8195 AFBC support
tegra:
- NVDEC hardware on Tegra234 SoC
hdlcd:
- switch to drm managed resources
ingenic:
- fix registration error path
hisilicon:
- convert to drm_mode_init
maildp:
- use managed resources
mtk:
- use drm_mode_init
rockchip:
- use drm_mode_copy
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmOXxI0ACgkQDHTzWXnE
hr4NyBAAojK3N+XJf2b8LWuRKsShCr5FXlteEDxiYGLeB8/g4x3LztSfHgUg0iuS
nP1m7Cx4snXcVNS6iyOsoZVq1EGUAWvv+mPWJe1UywjpyqtciTVQ11GEHRvI/w+V
GRvkhmt/TsoZA0QIlS2MaOmhn9j17QOcuYTUjYdyRL4tsrHWrTASH5W1Jt2xmDyw
5FUJvfukPWm100DVWbh6hWbCKL22bDDF/nj1H+G6hYSyTjVbk7wZ0vy2m6TnIHNF
iyBHBIzFPg3BveiSlKe6aVX7Gq2d8bfqjHsgN5f1qcS4ejWEkHLVxJtBdOB+fOSC
7o8Ms7WHi1AmnkOVCGRIjJ0cJrLZu2HDlyhViguAO1XQ3Jvuo/4WW3mplv+YPOMc
c+P/zuPG42d4lrISuB8wspTdOgxmqpZDkg3HE6n1+jiVR0u4hTTYktoPnLsHX6KG
l/l2B6aVAxE4b6P0q3ofYoAnk5rNsb1YUS+a8kC6f97TQ3gmOsN75iZXD/ASHg2r
ozhh2wcFxIPkJhE7vqLWPIBCWQs93sGyQXoI7Q0TJaIAZTXV0VmO1BIofetpVImE
7FhDC4wvBedXywN8NYUEFbCTOnIcDMteM/i6S1ns78s5UjDa5osPuS5I02VT1lbN
tvnJoHNkhCt13lJz63b0HNFm3cPKoRosCQhJeshyUYaFKs+evL0=
=pABG
-----END PGP SIGNATURE-----
Merge tag 'drm-next-2022-12-13' of git://anongit.freedesktop.org/drm/drm
Pull drm updates from Dave Airlie:
"The biggest highlight is that the accel subsystem framework is merged.
Hopefully for 6.3 we will be able to line up a driver to use it.
In drivers land, i915 enables DG2 support by default now, and nouveau
has a big stability refactoring and initial ampere support, AMD
includes new hw IP support and should build on ARM again. There is
also an ofdrm driver to take over offb on platforms it's used.
Stuff outside my tree, the dma-buf patches hit a few places, the vc4
firmware changes also do, and i915 has some interactions with MEI for
discrete GPUs. I think all of those should have been acked/reviewed by
relevant parties.
New driver:
- ofdrm - replacement for offb
fbdev:
- add support for nomodeset
fourcc:
- add Vivante tiled modifier
core:
- atomic-helpers: CRTC primary plane test fixes, fb access hooks
- connector: TV API consistency, cmdline parser improvements
- send connector hotplug on cleanup
- sort makefile objects
tests:
- sort kunit tests
- improve DP-MST tests
- add kunit helpers to create a device
sched:
- module param for scheduling policy
- refcounting fix
buddy:
- add back random seed log
ttm:
- convert ttm_resource to size_t
- optimize pool allocations
edid:
- HFVSDB parsing support fixes
- logging/debug improvements
- DSC quirks
dma-buf:
- Add unlocked vmap and attachment mapping
- move drivers to common locking convention
- locking improvements
firmware:
- new API for rPI firmware and vc4
xilinx:
- zynqmp: displayport bridge support
- dpsub fix
bridge:
- adv7533: Remove dynamic lane switching
- it6505: Runtime PM support, sync improvements
- ps8640: Handle AUX defer messages
- tc358775: Drop soft-reset over I2C
panel:
- panel-edp: Add INX N116BGE-EA2 C2 and C4 support.
- Jadard JD9365DA-H3
- NewVision NV3051D
amdgpu:
- DCN support on ARM
- DCN 2.1 secure display
- Sienna Cichlid mode2 reset fixes
- new GC 11.x firmware versions
- drop AMD specific DSC workarounds in favour of drm code
- clang warning fixes
- scheduler rework
- SR-IOV fixes
- GPUVM locking fixes
- fix memory leak in CS IOCTL error path
- flexible array updates
- enable new GC/PSP/SMU/NBIO IP
- GFX preemption support for gfx9
amdkfd:
- cache size fixes
- userptr fixes
- enable cooperative launch on gfx 10.3
- enable GC 11.0.4 KFD support
radeon:
- replace kmap with kmap_local_page
- ACPI ref count fix
- HDA audio notifier support
i915:
- DG2 enabled by default
- MTL enablement work
- hotplug refactoring
- VBT improvements
- Display and watermark refactoring
- ADL-P workaround
- temp disable runtime_pm for discrete-
- fix for A380 as a secondary GPU
- Wa_18017747507 for DG2
- CS timestamp support fixes for gen5 and earlier
- never purge busy TTM objects
- use i915_sg_dma_sizes for all backends
- demote GuC kernel contexts to normal priority
- gvt: refactor for new MDEV interface
- enable DC power states on eDP ports
- fix gen 2/3 workarounds
nouveau:
- fix page fault handling
- Ampere acceleration support
- driver stability improvements
- nva3 backlight support
msm:
- MSM_INFO_GET_FLAGS support
- DPU: XR30 and P010 image formats
- Qualcomm SM6115 support
- DSI PHY support for QCM2290
- HDMI: refactored dev init path
- remove exclusive-fence hack
- fix speed-bin detection
- enable clamp to idle on 7c3
- improved hangcheck detection
vmwgfx:
- fb and cursor refactoring
- convert to generic hashtable
- cursor improvements
etnaviv:
- hw workarounds
- softpin MMU fixes
ast:
- atomic gamma LUT support
- convert to SHMEM
lcdif:
- support YUV planes
- Increase DMA burst size
- FIFO threshold tuning
meson:
- fix return type of cvbs mode_valid
mgag200:
- fix PLL setup on some revisions
sun4i:
- A100 and D1 support
udl:
- modesetting improvements
- hot unplug support
vc4:
- support PAL-M
- fix regression preventing 4K @ 60Hz
- fix NULL ptr deref
v3d:
- switch to drm managed resources
renesas:
- RZ/G2L DSI support
- DU Kconfig cleanup
mediatek:
- fixup dpi and hdmi
- MT8188 dpi support
- MT8195 AFBC support
tegra:
- NVDEC hardware on Tegra234 SoC
hdlcd:
- switch to drm managed resources
ingenic:
- fix registration error path
hisilicon:
- convert to drm_mode_init
maildp:
- use managed resources
mtk:
- use drm_mode_init
rockchip:
- use drm_mode_copy"
* tag 'drm-next-2022-12-13' of git://anongit.freedesktop.org/drm/drm: (1397 commits)
drm/amdgpu: fix mmhub register base coding error
drm/amdgpu: add tmz support for GC IP v11.0.4
drm/amdgpu: enable GFX Clock Gating control for GC IP v11.0.4
drm/amdgpu: enable GFX Power Gating for GC IP v11.0.4
drm/amdgpu: enable GFX IP v11.0.4 CG support
drm/amdgpu: Make amdgpu_ring_mux functions as static
drm/amdgpu: generally allow over-commit during BO allocation
drm/amd/display: fix array index out of bound error in DCN32 DML
drm/amd/display: 3.2.215
drm/amd/display: set optimized required for comp buf changes
drm/amd/display: Add debug option to skip PSR CRTC disable
drm/amd/display: correct DML calc error of UrgentLatency
drm/amd/display: correct static_screen_event_mask
drm/amd/display: Ensure commit_streams returns the DC return code
drm/amd/display: read invalid ddc pin status cause engine busy
drm/amd/display: Bypass DET swath fill check for max clocks
drm/amd/display: Disable uclk pstate for subvp pipes
drm/amd/display: Fix DCN2.1 default DSC clocks
drm/amd/display: Enable dp_hdmi21_pcon support
drm/amd/display: prevent seamless boot on displays that don't have the preferred dig
...
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEq5lC5tSkz8NBJiCnSfxwEqXeA64FAmOU+U8ACgkQSfxwEqXe
A67NnQ//Y5DltmvibyPd7r1TFT2gUYv+Rx3sUV9ZE1NYptd/SWhhcL8c5FZ70Fuw
bSKCa1uiWjOxosjXT1kGrWq3de7q7oUpAPSOGxgxzoaNURIt58N/ajItCX/4Au8I
RlGAScHy5e5t41/26a498kB6qJ441fBEqCYKQpPLINMBAhe8TQ+NVp0rlpUwNHFX
WrUGg4oKWxdBIW3HkDirQjJWDkkAiklRTifQh/Al4b6QDbOnRUGGCeckNOhixsvS
waHWTld+Td8jRrA4b82tUb2uVZ2/b8dEvj/A8CuTv4yC0lywoyMgBWmJAGOC+UmT
ZVNdGW02Jc2T+Iap8ZdsEmeLHNqbli4+IcbY5xNlov+tHJ2oz41H9TZoYKbudlr6
/ReAUPSn7i50PhbQlEruj3eg+M2gjOeh8OF8UKwwRK8PghvyWQ1ScW0l3kUhPIhI
PdIG6j4+D2mJc1FIj2rTVB+Bg933x6S+qx4zDxGlNp62AARUFYf6EgyD6aXFQVuX
RxcKb6cjRuFkzFiKc8zkqg5edZH+IJcPNuIBmABqTGBOxbZWURXzIQvK/iULqZa4
CdGAFIs6FuOh8pFHLI3R4YoHBopbHup/xKDEeAO9KZGyeVIuOSERDxxo5f/ITzcq
APvT77DFOEuyvanr8RMqqh0yUjzcddXqw9+ieufsAyDwjD9DTuE=
=QRhK
-----END PGP SIGNATURE-----
Merge tag 'random-6.2-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
Pull random number generator updates from Jason Donenfeld:
- Replace prandom_u32_max() and various open-coded variants of it,
there is now a new family of functions that uses fast rejection
sampling to choose properly uniformly random numbers within an
interval:
get_random_u32_below(ceil) - [0, ceil)
get_random_u32_above(floor) - (floor, U32_MAX]
get_random_u32_inclusive(floor, ceil) - [floor, ceil]
Coccinelle was used to convert all current users of
prandom_u32_max(), as well as many open-coded patterns, resulting in
improvements throughout the tree.
I'll have a "late" 6.1-rc1 pull for you that removes the now unused
prandom_u32_max() function, just in case any other trees add a new
use case of it that needs to converted. According to linux-next,
there may be two trivial cases of prandom_u32_max() reintroductions
that are fixable with a 's/.../.../'. So I'll have for you a final
conversion patch doing that alongside the removal patch during the
second week.
This is a treewide change that touches many files throughout.
- More consistent use of get_random_canary().
- Updates to comments, documentation, tests, headers, and
simplification in configuration.
- The arch_get_random*_early() abstraction was only used by arm64 and
wasn't entirely useful, so this has been replaced by code that works
in all relevant contexts.
- The kernel will use and manage random seeds in non-volatile EFI
variables, refreshing a variable with a fresh seed when the RNG is
initialized. The RNG GUID namespace is then hidden from efivarfs to
prevent accidental leakage.
These changes are split into random.c infrastructure code used in the
EFI subsystem, in this pull request, and related support inside of
EFISTUB, in Ard's EFI tree. These are co-dependent for full
functionality, but the order of merging doesn't matter.
- Part of the infrastructure added for the EFI support is also used for
an improvement to the way vsprintf initializes its siphash key,
replacing an sleep loop wart.
- The hardware RNG framework now always calls its correct random.c
input function, add_hwgenerator_randomness(), rather than sometimes
going through helpers better suited for other cases.
- The add_latent_entropy() function has long been called from the fork
handler, but is a no-op when the latent entropy gcc plugin isn't
used, which is fine for the purposes of latent entropy.
But it was missing out on the cycle counter that was also being mixed
in beside the latent entropy variable. So now, if the latent entropy
gcc plugin isn't enabled, add_latent_entropy() will expand to a call
to add_device_randomness(NULL, 0), which adds a cycle counter,
without the absent latent entropy variable.
- The RNG is now reseeded from a delayed worker, rather than on demand
when used. Always running from a worker allows it to make use of the
CPU RNG on platforms like S390x, whose instructions are too slow to
do so from interrupts. It also has the effect of adding in new inputs
more frequently with more regularity, amounting to a long term
transcript of random values. Plus, it helps a bit with the upcoming
vDSO implementation (which isn't yet ready for 6.2).
- The jitter entropy algorithm now tries to execute on many different
CPUs, round-robining, in hopes of hitting even more memory latencies
and other unpredictable effects. It also will mix in a cycle counter
when the entropy timer fires, in addition to being mixed in from the
main loop, to account more explicitly for fluctuations in that timer
firing. And the state it touches is now kept within the same cache
line, so that it's assured that the different execution contexts will
cause latencies.
* tag 'random-6.2-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: (23 commits)
random: include <linux/once.h> in the right header
random: align entropy_timer_state to cache line
random: mix in cycle counter when jitter timer fires
random: spread out jitter callback to different CPUs
random: remove extraneous period and add a missing one in comments
efi: random: refresh non-volatile random seed when RNG is initialized
vsprintf: initialize siphash key using notifier
random: add back async readiness notifier
random: reseed in delayed work rather than on-demand
random: always mix cycle counter in add_latent_entropy()
hw_random: use add_hwgenerator_randomness() for early entropy
random: modernize documentation comment on get_random_bytes()
random: adjust comment to account for removed function
random: remove early archrandom abstraction
random: use random.trust_{bootloader,cpu} command line option only
stackprotector: actually use get_random_canary()
stackprotector: move get_random_canary() into stackprotector.h
treewide: use get_random_u32_inclusive() when possible
treewide: use get_random_u32_{above,below}() instead of manual loop
treewide: use get_random_u32_below() instead of deprecated function
...
Starting with MTL, there will be two GT-tiles, a render and media
tile. PXP as a service for supporting workloads with protected
contexts and protected buffers can be subscribed by process
workloads on any tile. However, depending on the platform,
only one of the tiles is used for control events pertaining to PXP
operation (such as creating the arbitration session and session
tear-down).
PXP as a global feature is accessible via batch buffer instructions
on any engine/tile and the coherency across tiles is handled implicitly
by the HW. In fact, for the foreseeable future, we are expecting this
single-control-tile for the PXP subsystem.
In MTL, it's the standalone media tile (not the root tile) because
it contains the VDBOX and KCR engine (among the assets PXP relies on
for those events).
Looking at the current code design, each tile is represented by the
intel_gt structure while the intel_pxp structure currently hangs off the
intel_gt structure.
Keeping the intel_pxp structure within the intel_gt structure makes some
internal functionalities more straight forward but adds code complexity to
code readability and maintainibility to many external-to-pxp subsystems
which may need to pick the correct intel_gt structure. An example of this
would be the intel_pxp_is_active or intel_pxp_is_enabled functionality
which should be viewed as a global level inquiry, not a per-gt inquiry.
That said, this series promotes the intel_pxp structure into the
drm_i915_private structure making it a top-level subsystem and the PXP
subsystem will select the control gt internally and keep a pointer to
it for internal reference.
This promotion comes with two noteworthy changes:
1. Exported pxp functions that are called by external subsystems
(such as intel_pxp_enabled/active) will have to check implicitly
if i915->pxp is valid as that structure will not be allocated
for HW that doesn't support PXP.
2. Since GT is now considered a soft-dependency of PXP we are
ensuring that GT init happens before PXP init and vice versa
for fini. This causes a minor ordering change whereby we previously
called intel_pxp_suspend after intel_uc_suspend but now is before
i915_gem_suspend_late but the change is required for correct
dependency flows. Additionally, this re-order change doesn't
have any impact because at that point in either case, the top level
entry to i915 won't observe any PXP events (since the GPU was
quiesced during suspend_prepare). Also, any PXP event doesn't
really matter when we disable the PXP HW (global GT irqs are
already off anyway, so even if there was a bug that generated
spurious events we wouldn't see it and we would just clean it
up on resume which is okay since the default fallback action
for PXP would be to keep the sessions off at this suspend stage).
Changes from prior revs:
v11: - Reformat a comment (Tvrtko).
v10: - Change the code flow for intel_pxp_init to make it more
cleaner and readible with better comments explaining the
difference between full-PXP-feature vs the partial-teelink
inits depending on the platform. Additionally, only do
the pxp allocation when we are certain the subsystem is
needed. (Tvrtko).
v9: - Cosmetic cleanups in supported/enabled/active. (Daniele).
- Add comments for intel_pxp_init and pxp_get_ctrl_gt that
explain the functional flow for when PXP is not supported
but the backend-assets are needed for HuC authentication
(Daniele and Tvrtko).
- Fix two remaining functions that are accessible outside
PXP that need to be checking pxp ptrs before using them:
intel_pxp_irq_handler and intel_pxp_huc_load_and_auth
(Tvrtko and Daniele).
- User helper macro in pxp-debugfs (Tvrtko).
v8: - Remove pxp_to_gt macro (Daniele).
- Fix a bug in pxp_get_ctrl_gt for the case of MTL and we don't
support GSC-FW on it. (Daniele).
- Leave i915->pxp as NULL if we dont support PXP and in line
with that, do additional validity check on i915->pxp for
intel_pxp_is_supported/enabled/active (Daniele).
- Remove unncessary include header from intel_gt_debugfs.c
and check drm_minor i915->drm.primary (Daniele).
- Other cosmetics / minor issues / more comments on suspend
flow order change (Daniele).
v7: - Drop i915_dev_to_pxp and in intel_pxp_init use 'i915->pxp'
through out instead of local variable newpxp. (Rodrigo)
- In the case intel_pxp_fini is called during driver unload but
after i915 loading failed without pxp being allocated, check
i915->pxp before referencing it. (Alan)
v6: - Remove HAS_PXP macro and replace it with intel_pxp_is_supported
because : [1] introduction of 'ctrl_gt' means we correct this
for MTL's upcoming series now. [2] Also, this has little impact
globally as its only used by PXP-internal callers at the moment.
- Change intel_pxp_init/fini to take in i915 as its input to avoid
ptr-to-ptr in init/fini calls.(Jani).
- Remove the backpointer from pxp->i915 since we can use
pxp->ctrl_gt->i915 if we need it. (Rodrigo).
v5: - Switch from series to single patch (Rodrigo).
- change function name from pxp_get_kcr_owner_gt to
pxp_get_ctrl_gt.
- Fix CI BAT failure by removing redundant call to intel_pxp_fini
from driver-remove.
- NOTE: remaining open still persists on using ptr-to-ptr
and back-ptr.
v4: - Instead of maintaining intel_pxp as an intel_gt structure member
and creating a number of convoluted helpers that takes in i915 as
input and redirects to the correct intel_gt or takes any intel_gt
and internally replaces with the correct intel_gt, promote it to
be a top-level i915 structure.
v3: - Rename gt level helper functions to "intel_pxp_is_enabled/
supported/ active_on_gt" (Daniele)
- Upgrade _gt_supports_pxp to replace what was intel_gtpxp_is
supported as the new intel_pxp_is_supported_on_gt to check for
PXP feature support vs the tee support for huc authentication.
Fix pxp-debugfs-registration to use only the former to decide
support. (Daniele)
- Couple minor optimizations.
v2: - Avoid introduction of new device info or gt variables and use
existing checks / macros to differentiate the correct GT->PXP
control ownership (Daniele Ceraolo Spurio)
- Don't reuse the updated global-checkers for per-GT callers (such
as other files within PXP) to avoid unnecessary GT-reparsing,
expose a replacement helper like the prior ones. (Daniele).
v1: - Add one more patch to the series for the intel_pxp suspend/resume
for similar refactoring
References: https://patchwork.freedesktop.org/patch/msgid/20221202011407.4068371-1-alan.previn.teres.alexis@intel.com
Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221208180542.998148-1-alan.previn.teres.alexis@intel.com
Merge and cleanup the two headers into a single description of the
object API. Also move all the documentation to the implementation and
drop unnecessary includes from the header.
No functional change.
v2: minimal checkpatch.pl cleanup
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221125102137.1801-4-christian.koenig@amd.com
VT-d may cause overfetch of the scanout PTE, both before and after the
vma (depending on the scanout orientation). bspec recommends that we
provide a tile-row in either directions, and suggests using 168 PTE,
warning that the accesses will wrap around the ends of the GGTT.
Currently, we fill the entire GGTT with scratch pages when using VT-d to
always ensure there are valid entries around every vma, including
scanout. However, writing every PTE is slow as on recent devices we
perform 8MiB of uncached writes, incurring an extra 100ms during resume.
If instead we focus on only putting guard pages around scanout, we can
avoid touching the whole GGTT. To avoid having to introduce extra nodes
around each scanout vma, we adjust the scanout drm_mm_node to be smaller
than the allocated space, and fixup the extra PTE during dma binding.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Tejas Upadhyay <tejaskumarx.surendrakumar.upadhyay@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221130235805.221010-5-andi.shyti@linux.intel.com
We already wrap i915_vma.node.start for use with the GGTT, as there we
can perform additional sanity checks that the node belongs to the GGTT
and fits within the 32b registers. In the next couple of patches, we
will introduce guard pages around the objects _inside_ the drm_mm_node
allocation. That is we will offset the vma->pages so that the first page
is at drm_mm_node.start + vma->guard (not 0 as is currently the case).
All users must then not use i915_vma.node.start directly, but compute
the guard offset, thus all users are converted to use a
i915_vma_offset() wrapper.
The notable exceptions are the selftests that are testing exact
behaviour of i915_vma_pin/i915_vma_insert.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Tejas Upadhyay <tejaskumarx.surendrakumar.upadhyay@intel.com>
Co-developed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221130235805.221010-3-andi.shyti@linux.intel.com
The coming commit "drm/i915: Introduce guard pages to i915_vma"
from Chris, was originally changing display_alignment to u32
from u64. The reason is that the display GGTT is and will be
limited o 4GB.
Put it in a separate patch and use "max(...)" instead of
"max_t(64, ...)" when asigning the value. We can safely use max
as we know beforehand that the comparison is between two u32
variables.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221130235805.221010-2-andi.shyti@linux.intel.com
- gvt-next stuff mostly with refactor for the new MDEV interface.
i915 Changes:
- PSR fixes and improvements (Jouni)
- DP DSC fixes (Vinod, Jouni)
- More general display cleanups (Jani)
- More display collor management cleanup targetting degamma (Ville)
- remove circ_buf.h includes (Jiri)
- wait power off delay at driver remove to optimize probe (Jani)
- More audio cleanup targeting the ELD precompute readout (Ville)
- Enable DC power states on all eDP ports (Imre)
- RPL-P stepping info (Matt Atwood)
- MTL enabling patches (RK)
- Removal of DG2 force_probe (Matt)
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEbSBwaO7dZQkcLOKj+mJfZA7rE8oFAmN3+28ACgkQ+mJfZA7r
E8rGZQf9HiTNW1yM1/YGq7ezD6/XN2sbCaH/iVNd0+FHxz4HHaTJaEVQ1xEBTdIb
poMQiWl64ig3t2LuDrRGwlsVmFYV3vK5byt758sMpLb/M3cWbbB2swsW4YTbORM6
qcatv9eYyuoiylwj6fVLFPMXpwNTyCdbngbLZt+tTcl1oxaZYp0PVXBkzlKFP02U
KL4p+Dv4OHSjP38beIzUaHz3IJJAik/ttsxa1JhhC8F9UUJs7kuo6GXMY1OzRNpc
jKj3+F0OtDRA2xaw7YpbLI8yt+pWzerqFsnZDAvvx8Js7SQLkyNtjkuHusp8OP77
x99MlyOfbSfBnkbdm4Rhm7IIPTiUnw==
=SHn9
-----END PGP SIGNATURE-----
Merge tag 'drm-intel-next-2022-11-18' of git://anongit.freedesktop.org/drm/drm-intel into drm-next
GVT Changes:
- gvt-next stuff mostly with refactor for the new MDEV interface.
i915 Changes:
- PSR fixes and improvements (Jouni)
- DP DSC fixes (Vinod, Jouni)
- More general display cleanups (Jani)
- More display collor management cleanup targetting degamma (Ville)
- remove circ_buf.h includes (Jiri)
- wait power off delay at driver remove to optimize probe (Jani)
- More audio cleanup targeting the ELD precompute readout (Ville)
- Enable DC power states on all eDP ports (Imre)
- RPL-P stepping info (Matt Atwood)
- MTL enabling patches (RK)
- Removal of DG2 force_probe (Matt)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/Y3f71obyEkImXoUF@intel.com
In i915_gem_madvise_ioctl() we immediately purge the object is not
currently used, like when the mm.pages are NULL. With shmem the pages
might still be hanging around or are perhaps swapped out. Similarly with
ttm we might still have the pages hanging around on the ttm resource,
like with lmem or shmem, but here we need to be extra careful since
async unbinds are possible as well as in-progress kernel moves. In
i915_ttm_purge() we expect the pipeline-gutting to nuke the ttm resource
for us, however if it's busy the memory is only moved to a ghost object,
which then leads to broken behaviour when for example clearing the
i915_tt->filp, since the actual ttm_tt is still alive and populated,
even though it's been moved to the ghost object. When we later destroy
the ghost object we hit the following, since the filp is now NULL:
[ +0.006982] #PF: supervisor read access in kernel mode
[ +0.005149] #PF: error_code(0x0000) - not-present page
[ +0.005147] PGD 11631d067 P4D 11631d067 PUD 115972067 PMD 0
[ +0.005676] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ +0.012962] Workqueue: events ttm_device_delayed_workqueue [ttm]
[ +0.006022] RIP: 0010:i915_ttm_tt_unpopulate+0x3a/0x70 [i915]
[ +0.005879] Code: 89 fb 48 85 f6 74 11 8b 55 4c 48 8b 7d 30 45 31 c0 31 c9 e8 18 6a e5 e0 80 7d 60 00 74 20 48 8b 45 68
8b 55 08 4c 89 e7 5b 5d <48> 8b 40 20 83 e2 01 41 5c 89 d1 48 8b 70
30 e9 42 b2 ff ff 4c 89
[ +0.018782] RSP: 0000:ffffc9000bf6fd70 EFLAGS: 00010202
[ +0.005244] RAX: 0000000000000000 RBX: ffff8883e12ae380 RCX: 0000000000000000
[ +0.007150] RDX: 000000008000000e RSI: ffffffff823559b4 RDI: ffff8883e12ae3c0
[ +0.007142] RBP: ffff888103b65d48 R08: 0000000000000001 R09: 0000000000000001
[ +0.007144] R10: 0000000000000001 R11: ffff88829c2c8040 R12: ffff8883e12ae3c0
[ +0.007148] R13: 0000000000000001 R14: ffff888115184140 R15: ffff888115184248
[ +0.007154] FS: 0000000000000000(0000) GS:ffff88844db00000(0000) knlGS:0000000000000000
[ +0.008108] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ +0.005763] CR2: 0000000000000020 CR3: 000000013fdb4004 CR4: 00000000003706e0
[ +0.007152] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ +0.007145] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ +0.007154] Call Trace:
[ +0.002459] <TASK>
[ +0.002126] ttm_tt_unpopulate.part.0+0x17/0x70 [ttm]
[ +0.005068] ttm_bo_tt_destroy+0x1c/0x50 [ttm]
[ +0.004464] ttm_bo_cleanup_memtype_use+0x25/0x40 [ttm]
[ +0.005244] ttm_bo_cleanup_refs+0x90/0x2c0 [ttm]
[ +0.004721] ttm_bo_delayed_delete+0x235/0x250 [ttm]
[ +0.004981] ttm_device_delayed_workqueue+0x13/0x40 [ttm]
[ +0.005422] process_one_work+0x248/0x560
[ +0.004028] worker_thread+0x4b/0x390
[ +0.003682] ? process_one_work+0x560/0x560
[ +0.004199] kthread+0xeb/0x120
[ +0.003163] ? kthread_complete_and_exit+0x20/0x20
[ +0.004815] ret_from_fork+0x1f/0x30
v2:
- Just use ttm_bo_wait() directly (Niranjana)
- Add testcase reference
Testcase: igt@gem_madvise@dontneed-evict-race
Fixes: 213d509277 ("drm/i915/ttm: Introduce a TTM i915 gem object backend")
Reported-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Cc: Nirmoy Das <nirmoy.das@intel.com>
Cc: <stable@vger.kernel.org> # v5.15+
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Acked-by: Nirmoy Das <Nirmoy.Das@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221115104620.120432-1-matthew.auld@intel.com
(cherry picked from commit 5524b5e52e)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
This is a simple mechanical transformation done by:
@@
expression E;
@@
- prandom_u32_max
+ get_random_u32_below
(E)
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs
Reviewed-by: SeongJae Park <sj@kernel.org> # for damon
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> # for infiniband
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> # for arm
Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
In i915_gem_madvise_ioctl() we immediately purge the object is not
currently used, like when the mm.pages are NULL. With shmem the pages
might still be hanging around or are perhaps swapped out. Similarly with
ttm we might still have the pages hanging around on the ttm resource,
like with lmem or shmem, but here we need to be extra careful since
async unbinds are possible as well as in-progress kernel moves. In
i915_ttm_purge() we expect the pipeline-gutting to nuke the ttm resource
for us, however if it's busy the memory is only moved to a ghost object,
which then leads to broken behaviour when for example clearing the
i915_tt->filp, since the actual ttm_tt is still alive and populated,
even though it's been moved to the ghost object. When we later destroy
the ghost object we hit the following, since the filp is now NULL:
[ +0.006982] #PF: supervisor read access in kernel mode
[ +0.005149] #PF: error_code(0x0000) - not-present page
[ +0.005147] PGD 11631d067 P4D 11631d067 PUD 115972067 PMD 0
[ +0.005676] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ +0.012962] Workqueue: events ttm_device_delayed_workqueue [ttm]
[ +0.006022] RIP: 0010:i915_ttm_tt_unpopulate+0x3a/0x70 [i915]
[ +0.005879] Code: 89 fb 48 85 f6 74 11 8b 55 4c 48 8b 7d 30 45 31 c0 31 c9 e8 18 6a e5 e0 80 7d 60 00 74 20 48 8b 45 68
8b 55 08 4c 89 e7 5b 5d <48> 8b 40 20 83 e2 01 41 5c 89 d1 48 8b 70
30 e9 42 b2 ff ff 4c 89
[ +0.018782] RSP: 0000:ffffc9000bf6fd70 EFLAGS: 00010202
[ +0.005244] RAX: 0000000000000000 RBX: ffff8883e12ae380 RCX: 0000000000000000
[ +0.007150] RDX: 000000008000000e RSI: ffffffff823559b4 RDI: ffff8883e12ae3c0
[ +0.007142] RBP: ffff888103b65d48 R08: 0000000000000001 R09: 0000000000000001
[ +0.007144] R10: 0000000000000001 R11: ffff88829c2c8040 R12: ffff8883e12ae3c0
[ +0.007148] R13: 0000000000000001 R14: ffff888115184140 R15: ffff888115184248
[ +0.007154] FS: 0000000000000000(0000) GS:ffff88844db00000(0000) knlGS:0000000000000000
[ +0.008108] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ +0.005763] CR2: 0000000000000020 CR3: 000000013fdb4004 CR4: 00000000003706e0
[ +0.007152] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ +0.007145] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ +0.007154] Call Trace:
[ +0.002459] <TASK>
[ +0.002126] ttm_tt_unpopulate.part.0+0x17/0x70 [ttm]
[ +0.005068] ttm_bo_tt_destroy+0x1c/0x50 [ttm]
[ +0.004464] ttm_bo_cleanup_memtype_use+0x25/0x40 [ttm]
[ +0.005244] ttm_bo_cleanup_refs+0x90/0x2c0 [ttm]
[ +0.004721] ttm_bo_delayed_delete+0x235/0x250 [ttm]
[ +0.004981] ttm_device_delayed_workqueue+0x13/0x40 [ttm]
[ +0.005422] process_one_work+0x248/0x560
[ +0.004028] worker_thread+0x4b/0x390
[ +0.003682] ? process_one_work+0x560/0x560
[ +0.004199] kthread+0xeb/0x120
[ +0.003163] ? kthread_complete_and_exit+0x20/0x20
[ +0.004815] ret_from_fork+0x1f/0x30
v2:
- Just use ttm_bo_wait() directly (Niranjana)
- Add testcase reference
Testcase: igt@gem_madvise@dontneed-evict-race
Fixes: 213d509277 ("drm/i915/ttm: Introduce a TTM i915 gem object backend")
Reported-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Cc: Nirmoy Das <nirmoy.das@intel.com>
Cc: <stable@vger.kernel.org> # v5.15+
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Acked-by: Nirmoy Das <Nirmoy.Das@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221115104620.120432-1-matthew.auld@intel.com
vm_fault_ttm() should not expect ttm ghost obj so remove that check.
Suggested-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221024144558.27747-1-nirmoy.das@intel.com
All calls to i915_vma_move_to_active are surrounded by vma lock
and/or there are multiple local helpers for it in particular tests.
Let's replace it by common helper.
The patch should not introduce functional changes.
Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221019215906.295296-3-andrzej.hajda@intel.com
Since almost all calls to i915_vma_move_to_active are prepended with
i915_request_await_object, let's call the latter from
_i915_vma_move_to_active by default and add flag allowing bypassing it.
Adjust all callers accordingly.
The patch should not introduce functional changes.
Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221019215906.295296-2-andrzej.hajda@intel.com
In the fault handler, make sure we check if the BO maps lmem after
we schedule the migration, since the current resource might change from
lmem to smem, if the pages are in the non-cpu visible portion of lmem.
This then leads to adding the object to the lmem_userfault_list even
though the current resource is no longer lmem. If we then destroy the
object, the list might still contain a link to the now free object, since
we only remove it if the object is still in lmem.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/7469
Fixes: ad74457a6b ("drm/i915/dgfx: Release mmap on rpm suspend")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Anshuman Gupta <anshuman.gupta@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Cc: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221107165414.56970-1-matthew.auld@intel.com
(cherry picked from commit 625b74460e)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
When userspace mmaps dma-buf's fd, the dma-buf reservation lock must be
held. Add locking sanity checks to the dma-buf mmaping callbacks of DRM
drivers to ensure that the locking assumptions won't regress in future.
Suggested-by: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Acked-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221110201349.351294-3-dmitry.osipenko@collabora.com
Turns out many of the files that need i915_reg.h get it implicitly via
{display/intel_de.h, gt/intel_context.h} -> i915_trace.h -> i915_irq.h
-> i915_reg.h. Since i915_trace.h doesn't actually need i915_irq.h,
makes sense to drop it, but that requires adding quite a few new
includes all over the place.
Prefer including i915_reg.h where needed instead of adding another
implicit include, because eventually we'll want to split up i915_reg.h
and only include the specific registers at each place.
Also some places actually needed i915_irq.h too.
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/6e78a2e0ac1bffaf5af3b5ccc21dff05e6518cef.1668008071.git.jani.nikula@intel.com
Convert some usages of legacy DRM logging macros into versions which tell
us on which device have the events occurred.
v2:
* Don't have struct drm_device as local. (Jani, Ville)
v3:
* Store gt, not i915, in workaround list. (John)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221109104633.2579245-1-tvrtko.ursulin@linux.intel.com
We rely on page_sizes.sg in setup_scratch_page() reporting the correct
value if the underlying sgl is not contiguous, however in
get_pages_internal() we are only looking at the layout of the created
pages when calculating the sg_page_sizes, and not the final sgl, which
could in theory be completely different. In such a situation we might
incorrectly think we have a 64K scratch page, when it is actually only
4K or similar split over multiple non-contiguous entries, which could
lead to broken behaviour when touching the scratch space within the
padding of a 64K GTT page-table. For most of the other backends we
already just call i915_sg_dma_sizes() on the final mapping, so rather
just move that into __i915_gem_object_set_pages() to avoid such issues
coming back to bite us later.
v2: Update missing conversion in gvt
Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221108103238.165447-1-matthew.auld@intel.com
Rather than getting some hard to debug uaf, add some warns to hopefully
catch issues with userfault_count being non-zero when destroying the
object. Also if we somehow add an object to lmem_userfault_list that
somehow doesn't map lmem.
References: https://gitlab.freedesktop.org/drm/intel/-/issues/7469
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Anshuman Gupta <anshuman.gupta@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Cc: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221107165414.56970-2-matthew.auld@intel.com
In the fault handler, make sure we check if the BO maps lmem after
we schedule the migration, since the current resource might change from
lmem to smem, if the pages are in the non-cpu visible portion of lmem.
This then leads to adding the object to the lmem_userfault_list even
though the current resource is no longer lmem. If we then destroy the
object, the list might still contain a link to the now free object, since
we only remove it if the object is still in lmem.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/7469
Fixes: ad74457a6b ("drm/i915/dgfx: Release mmap on rpm suspend")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Anshuman Gupta <anshuman.gupta@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Cc: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221107165414.56970-1-matthew.auld@intel.com
The conversion looks harmless, however the addr value is updated inside
the loop with the previous vm_end, which then incorrectly leads to
for_each_vma_range() iterating over stuff outside the range we care
about. Fix this by storing the end value separately. Also fix the case
where the range doesn't intersect with any vma, or if the vma itself
doesn't extend the entire range, which must mean we have hole at the
end. Both should result in an error, as per the previous behaviour.
v2: Fix the cases where the range is empty, or if there's a hole at
the end of the range
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/7247
Testcase: igt@gem_userptr_blits@probe
Fixes: f683b9d613 ("i915: use the VMA iterator")
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yu Zhao <yuzhao@google.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221028130635.465839-1-matthew.auld@intel.com
(cherry picked from commit 6f7de35b50)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Currently on DG1, which does not have LLC, we hit the below
warning while rebinding an userptr invalidated object.
WARNING: CPU: 4 PID: 13008 at drivers/gpu/drm/i915/gem/i915_gem_pages.c:34 __i915_gem_object_set_pages+0x296/0x2d0 [i915]
...
RIP: 0010:__i915_gem_object_set_pages+0x296/0x2d0 [i915]
...
Call Trace:
<TASK>
i915_gem_userptr_get_pages+0x175/0x1a0 [i915]
____i915_gem_object_get_pages+0x32/0xb0 [i915]
i915_gem_object_userptr_submit_init+0x286/0x470 [i915]
eb_lookup_vmas+0x2ff/0xcf0 [i915]
? __intel_wakeref_get_first+0x55/0xb0 [i915]
i915_gem_do_execbuffer+0x785/0x21d0 [i915]
i915_gem_execbuffer2_ioctl+0xe7/0x3d0 [i915]
We shouldn't be setting the obj->cache_dirty for DGFX,
fix it.
Fixes: d70af57944 ("drm/i915/shmem: ensure flush during swap-in on non-LLC")
Suggested-by: Matthew Auld <matthew.auld@intel.com>
Reported-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Acked-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221102051416.27327-1-niranjana.vishwanathapura@intel.com
(cherry picked from commit 0aeec60c76)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
We need to iterate over the original entries here for the sg_table,
pulling out the struct page for each one, to be remapped. However
currently this incorrectly iterates over the final dma mapped entries,
which is likely just one gigantic sg entry if the iommu is enabled,
leading to us only mapping the first struct page (and any physically
contiguous pages following it), even if there is potentially lots more
data to follow.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/7306
Fixes: 1286ff7397 ("i915: add dmabuf/prime buffer sharing support.")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Michael J. Ruhl <michael.j.ruhl@intel.com>
Cc: <stable@vger.kernel.org> # v3.5+
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221028155029.494736-1-matthew.auld@intel.com
(cherry picked from commit 28d52f99bb)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
The engine busyness stats has a worker function to do things like
64bit extend the 32bit hardware counters. The GuC's reset prepare
function flushes out this worker function to ensure no corruption
happens during the reset. Unforunately, the worker function has an
infinite wait for active resets to finish before doing its work. Thus
a deadlock would occur if the worker function had actually started
just as the reset starts.
The function being used to lock the reset-in-progress mutex is called
intel_gt_reset_trylock(). However, as noted it does not follow
standard 'trylock' conventions and exit if already locked. So rename
the current _trylock function to intel_gt_reset_lock_interruptible(),
which is the behaviour it actually provides. In addition, add a new
implementation of _trylock and call that from the busyness stats
worker instead.
v2: Rename existing trylock to interruptible rather than trying to
preserve the existing (confusing) naming scheme (review comments from
Tvrtko).
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221102192109.2492625-3-John.C.Harrison@Intel.com
The conversion looks harmless, however the addr value is updated inside
the loop with the previous vm_end, which then incorrectly leads to
for_each_vma_range() iterating over stuff outside the range we care
about. Fix this by storing the end value separately. Also fix the case
where the range doesn't intersect with any vma, or if the vma itself
doesn't extend the entire range, which must mean we have hole at the
end. Both should result in an error, as per the previous behaviour.
v2: Fix the cases where the range is empty, or if there's a hole at
the end of the range
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/7247
Testcase: igt@gem_userptr_blits@probe
Fixes: f683b9d613 ("i915: use the VMA iterator")
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yu Zhao <yuzhao@google.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221028130635.465839-1-matthew.auld@intel.com
Driver Changes:
- Fix for #7306: [Arc A380] white flickering when using arc as a
secondary gpu (Matt A)
- Add Wa_18017747507 for DG2 (Wayne)
- Avoid spurious WARN on DG1 due to incorrect cache_dirty flag
(Niranjana, Matt A)
- Corrections to CS timestamp support for Gen5 and earlier (Ville)
- Fix a build error used with clang compiler on hwmon (GG)
- Improvements to LMEM handling with RPM (Anshuman, Matt A)
- Cleanups in dmabuf code (Mike)
- Selftest improvements (Matt A)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/Y2N11wu175p6qeEN@jlahtine-mobl.ger.corp.intel.com
UAPI Changes:
Cross-subsystem Changes:
- dma-buf: locking improvements
- firmware: New API in the RaspberryPi firmware driver used by vc4
Core Changes:
- client: Null pointer dereference fix in drm_client_buffer_delete()
- mm/buddy: Add back random seed log
- ttm: Convert ttm_resource to use size_t for its size, fix for an
undefined behaviour
Driver Changes:
- bridge:
- adv7511: use dev_err_probe
- it6505: Fix return value check of pm_runtime_get_sync
- panel:
- sitronix: Fixes and clean-ups
- lcdif: Increase DMA burst size
- rockchip: runtime_pm improvements
- vc4: Fix for a regression preventing the use of 4k @ 60Hz, and
further HDMI rate constraints check.
- vmwgfx: Cursor improvements
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRcEzekXsqa64kGDp7j7w1vZxhRxQUCY2N8/gAKCRDj7w1vZxhR
xT1TAQDiwdSyL/bzOk/WYggmj5wmFqb2PPFbFRt/DOTLl52ZdwEAmtE9I2WKvY8w
sxZYF9gfFnQC3id7YwNs+CDb1kAangc=
=6GBd
-----END PGP SIGNATURE-----
Merge tag 'drm-misc-next-2022-11-03' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
drm-misc-next for 6.2:
UAPI Changes:
Cross-subsystem Changes:
- dma-buf: locking improvements
- firmware: New API in the RaspberryPi firmware driver used by vc4
Core Changes:
- client: Null pointer dereference fix in drm_client_buffer_delete()
- mm/buddy: Add back random seed log
- ttm: Convert ttm_resource to use size_t for its size, fix for an
undefined behaviour
Driver Changes:
- bridge:
- adv7511: use dev_err_probe
- it6505: Fix return value check of pm_runtime_get_sync
- panel:
- sitronix: Fixes and clean-ups
- lcdif: Increase DMA burst size
- rockchip: runtime_pm improvements
- vc4: Fix for a regression preventing the use of 4k @ 60Hz, and
further HDMI rate constraints check.
- vmwgfx: Cursor improvements
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20221103083437.ksrh3hcdvxaof62l@houat
Currently on DG1, which does not have LLC, we hit the below
warning while rebinding an userptr invalidated object.
WARNING: CPU: 4 PID: 13008 at drivers/gpu/drm/i915/gem/i915_gem_pages.c:34 __i915_gem_object_set_pages+0x296/0x2d0 [i915]
...
RIP: 0010:__i915_gem_object_set_pages+0x296/0x2d0 [i915]
...
Call Trace:
<TASK>
i915_gem_userptr_get_pages+0x175/0x1a0 [i915]
____i915_gem_object_get_pages+0x32/0xb0 [i915]
i915_gem_object_userptr_submit_init+0x286/0x470 [i915]
eb_lookup_vmas+0x2ff/0xcf0 [i915]
? __intel_wakeref_get_first+0x55/0xb0 [i915]
i915_gem_do_execbuffer+0x785/0x21d0 [i915]
i915_gem_execbuffer2_ioctl+0xe7/0x3d0 [i915]
We shouldn't be setting the obj->cache_dirty for DGFX,
fix it.
Fixes: d70af57944 ("drm/i915/shmem: ensure flush during swap-in on non-LLC")
Suggested-by: Matthew Auld <matthew.auld@intel.com>
Reported-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Acked-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221102051416.27327-1-niranjana.vishwanathapura@intel.com
- More VBT specific code clean-up, doc, organization,
and improvements (Ville)
- More MTL enabling work (Matt, RK, Anusha, Jose)
- FBC related clean-ups and improvements (Ville)
- Removing unused sw_fence_await_reservation (Niranjana)
- Big chunch of display house clean-up (Ville)
- Many Watermark fixes and clean-ups (Ville)
- Fix device info for devices without display (Jani)
- Fix TC port PLLs after readout (Ville)
- DPLL ID clean-ups (Ville)
- Prep work for finishing (de)gamma readout (Ville)
- PSR fixes and improvements (Jouni, Jose)
- Reject excessive dotclocks early (Ville)
- DRRS related improvements (Ville)
- Simplify uncore register updates (Andrzej)
- Fix simulated GPU reset wrt. encoder HW readout (Imre)
- Add a ADL-P workaround (Jose)
- Fix clear mask in GEN7_MISCCPCTL update (Andrzej)
- Temporarily disable runtime_pm for discrete (Anshuman)
- Improve fbdev debugs (Nirmoy)
- Fix DP FRL link training status (Ankit)
- Other small display fixes (Ankit, Suraj)
- Allow panel fixed modes to have differing sync
polarities (Ville)
- Clean up crtc state flag checks (Ville)
- Fix race conditions during DKL PHY accesses (Imre)
- Prep-work for cdclock squash and crawl modes (Anusha)
- ELD precompute and readout (Ville)
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEbSBwaO7dZQkcLOKj+mJfZA7rE8oFAmNcHTsACgkQ+mJfZA7r
E8pXrwgAgYH8DQu+/C+gu4cB7wOQC+flrCAinCRaxE88q7iWDoQoY+XFdBLq9vEM
QWNAmnUG53+z1rW1+EAVoIDhRz7yYnsxScLj/FijJl52hNNP3QkZP4iZuAMJyQy2
NYDOQyzvLBGmOIDgz+4YBtqk28eiX8x0+sYSA/JuLU5lL1zXFeXt2pFt2kcAnT/S
Fe1MajA3TxLO9lAhnEzEUD3X/xLz5D/91BQS+7OL8n24Hxb9kBZ0N1UbwDOCDegc
NEGpQmJTJnkGTAEQ8tlqnvMgHUbGmm3OEImamfC2QGvIdolF8zax9baNKoUts7BZ
CaBDK2NK//W+sYl52Tig4pCv36bPdw==
=91XL
-----END PGP SIGNATURE-----
Merge tag 'drm-intel-next-2022-10-28' of git://anongit.freedesktop.org/drm/drm-intel into drm-next
- Hotplug code clean-up and organization (Jani, Gustavo)
- More VBT specific code clean-up, doc, organization,
and improvements (Ville)
- More MTL enabling work (Matt, RK, Anusha, Jose)
- FBC related clean-ups and improvements (Ville)
- Removing unused sw_fence_await_reservation (Niranjana)
- Big chunch of display house clean-up (Ville)
- Many Watermark fixes and clean-ups (Ville)
- Fix device info for devices without display (Jani)
- Fix TC port PLLs after readout (Ville)
- DPLL ID clean-ups (Ville)
- Prep work for finishing (de)gamma readout (Ville)
- PSR fixes and improvements (Jouni, Jose)
- Reject excessive dotclocks early (Ville)
- DRRS related improvements (Ville)
- Simplify uncore register updates (Andrzej)
- Fix simulated GPU reset wrt. encoder HW readout (Imre)
- Add a ADL-P workaround (Jose)
- Fix clear mask in GEN7_MISCCPCTL update (Andrzej)
- Temporarily disable runtime_pm for discrete (Anshuman)
- Improve fbdev debugs (Nirmoy)
- Fix DP FRL link training status (Ankit)
- Other small display fixes (Ankit, Suraj)
- Allow panel fixed modes to have differing sync
polarities (Ville)
- Clean up crtc state flag checks (Ville)
- Fix race conditions during DKL PHY accesses (Imre)
- Prep-work for cdclock squash and crawl modes (Anusha)
- ELD precompute and readout (Ville)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/Y1wd6ZJ8LdJpCfZL@intel.com
Update open coded for loop to use the standard scatterlist
for_each_sg API.
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221028155029.494736-4-matthew.auld@intel.com
Some minor cleanup of some variables for consistency.
Normalize struct sg_table to sgt.
Normalize struct dma_buf_attachment to attach.
checkpatch issues sizeof(), !NULL updates.
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221028155029.494736-3-matthew.auld@intel.com
Using PAGE_SIZE here potentially hides issues so bump that to something
larger. This should also make it possible for iommu to coalesce entries
for us. With that in place verify we can write from the GPU using the
importers sg_table, followed by checking that our writes match when read
from the CPU side.
v2: Switch over to igt_gpu_fill_dw(), which looks to be more widely
supported than the migrate stuff (at least OOTB).
References: https://gitlab.freedesktop.org/drm/intel/-/issues/7306
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Michael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221028155029.494736-2-matthew.auld@intel.com
We need to iterate over the original entries here for the sg_table,
pulling out the struct page for each one, to be remapped. However
currently this incorrectly iterates over the final dma mapped entries,
which is likely just one gigantic sg entry if the iommu is enabled,
leading to us only mapping the first struct page (and any physically
contiguous pages following it), even if there is potentially lots more
data to follow.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/7306
Fixes: 1286ff7397 ("i915: add dmabuf/prime buffer sharing support.")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Michael J. Ruhl <michael.j.ruhl@intel.com>
Cc: <stable@vger.kernel.org> # v3.5+
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221028155029.494736-1-matthew.auld@intel.com
We had already grabbed the rpm wakeref at obj destruction path,
but it also required to grab the wakeref when object moves.
When i915_gem_object_release_mmap_offset() gets called by
i915_ttm_move_notify(), it will release the mmap offset without
grabbing the wakeref. We want to avoid that therefore,
grab the wakeref at i915_ttm_unmap_virtual() accordingly.
While doing that also changed the lmem_userfault_lock from
mutex to spinlock, as spinlock widely used for list.
Also changed if (obj->userfault_count) to
GEM_BUG_ON(!obj->userfault_count).
v2:
- Removed lmem_userfault_{list,lock} from intel_gt. [Matt Auld]
Fixes: ad74457a6b ("drm/i915/dgfx: Release mmap on rpm suspend")
Suggested-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221027092242.1476080-3-anshuman.gupta@intel.com
Runtime pm is not really per GT, therefore it make sense to
move lmem_userfault_list, lmem_userfault_lock and
userfault_wakeref from intel_gt to intel_runtime_pm structure,
which is embedded to i915.
No functional change.
v2:
- Fixes the code comment nit. [Matt Auld]
Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221027092242.1476080-2-anshuman.gupta@intel.com
swiotlb_max_segment used to return either the maximum size that swiotlb
could bounce, or for Xen PV PAGE_SIZE even if swiotlb could bounce buffer
larger mappings. This made i915 on Xen PV work as it bypasses the
coherency aspect of the DMA API and can't cope with bounce buffering
and this avoided bounce buffering for the Xen/PV case.
So instead of adding this hack back, check for Xen/PV directly in i915
for the Xen case and otherwise use the proper DMA API helper to query
the maximum mapping size.
Replace swiotlb_max_segment() calls with dma_max_mapping_size().
In i915_gem_object_get_pages_internal() no longer consider max_segment
only if CONFIG_SWIOTLB is enabled. There can be other (iommu related)
causes of specific max segment sizes.
Fixes: a2daa27c0c ("swiotlb: simplify swiotlb_max_segment")
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
[hch: added the Xen hack, rewrote the changelog]
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221020110308.1582518-1-hch@lst.de
(cherry picked from commit 78a07fe777)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
We know that as long as GEM context create ioctl succeeds, a context was
created. There is no need to write about it, especially when such a message
heavily pollutes dmesg and makes debugging actual errors harder.
Since commit baa89ba3f1 ("drm/i915/gem: initial conversion to new
logging macros using coccinelle"), the logging for creating a new user
context was moved under the driver debug output (for lack of a means for
per-user logs, and a lack of user-focused drm.debug parameter). This
only reveals how obnoxious having that spam be part of the driver debug
logs, so remove it. [ from Chris Wilson ]
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Karolina Drobnik <karolina.drobnik@intel.com>
Cc: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221025091903.986819-1-karolina.drobnik@intel.com
Change ttm_resource structure from num_pages to size_t size in bytes.
v1 -> v2: change PFN_UP(dst_mem->size) to ttm->num_pages
v1 -> v2: change bo->resource->size to bo->base.size at some places
v1 -> v2: remove the local variable
v1 -> v2: cleanup cmp_size_smaller_first()
v2 -> v3: adding missing PFN_UP in ttm_bo_vm_fault_reserved
Signed-off-by: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221027091237.983582-1-Amaranath.Somalapuram@amd.com
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
swiotlb_max_segment used to return either the maximum size that swiotlb
could bounce, or for Xen PV PAGE_SIZE even if swiotlb could bounce buffer
larger mappings. This made i915 on Xen PV work as it bypasses the
coherency aspect of the DMA API and can't cope with bounce buffering
and this avoided bounce buffering for the Xen/PV case.
So instead of adding this hack back, check for Xen/PV directly in i915
for the Xen case and otherwise use the proper DMA API helper to query
the maximum mapping size.
Replace swiotlb_max_segment() calls with dma_max_mapping_size().
In i915_gem_object_get_pages_internal() no longer consider max_segment
only if CONFIG_SWIOTLB is enabled. There can be other (iommu related)
causes of specific max segment sizes.
Fixes: a2daa27c0c ("swiotlb: simplify swiotlb_max_segment")
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
[hch: added the Xen hack, rewrote the changelog]
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221020110308.1582518-1-hch@lst.de
Add a delay, configurable via debugfs (default 34ms), to disable
scheduling of a context after the pin count goes to zero. Disable
scheduling is a costly operation as it requires synchronizing with
the GuC. So the idea is that a delay allows the user to resubmit
something before doing this operation. This delay is only done if
the context isn't closed and less than a given threshold
(default is 3/4) of the guc_ids are in use.
Alan Previn: Matt Brost first introduced this patch back in Oct 2021.
However no real world workload with measured performance impact was
available to prove the intended results. Today, this series is being
republished in response to a real world workload that benefited greatly
from it along with measured performance improvement.
Workload description: 36 containers were created on a DG2 device where
each container was performing a combination of 720p 3d game rendering
and 30fps video encoding. The workload density was configured in a way
that guaranteed each container to ALWAYS be able to render and
encode no less than 30fps with a predefined maximum render + encode
latency time. That means the totality of all 36 containers and their
workloads were not saturating the engines to their max (in order to
maintain just enough headrooom to meet the min fps and max latencies
of incoming container submissions).
Problem statement: It was observed that the CPU core processing the i915
soft IRQ work was experiencing severe load. Using tracelogs and an
instrumentation patch to count specific i915 IRQ events, it was confirmed
that the majority of the CPU cycles were caused by the
gen11_other_irq_handler() -> guc_irq_handler() code path. The vast
majority of the cycles was determined to be processing a specific G2H
IRQ: i.e. INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE. These IRQs are sent
by GuC in response to i915 KMD sending H2G requests:
INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET. Those H2G requests are sent
whenever a context goes idle so that we can unpin the context from GuC.
The high CPU utilization % symptom was limiting density scaling.
Root Cause Analysis: Because the incoming execution buffers were spread
across 36 different containers (each with multiple contexts) but the
system in totality was NOT saturated to the max, it was assumed that each
context was constantly idling between submissions. This was causing
a thrashing of unpinning contexts from GuC at one moment, followed quickly
by repinning them due to incoming workload the very next moment. These
event-pairs were being triggered across multiple contexts per container,
across all containers at the rate of > 30 times per sec per context.
Metrics: When running this workload without this patch, we measured an
average of ~69K INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE events every 10
seconds or ~10 million times over ~25+ mins. With this patch, the count
reduced to ~480 every 10 seconds or about ~28K over ~10 mins. The
improvement observed is ~99% for the average counts per 10 seconds.
Design awareness: Selftest impact.
As temporary WA disable this feature for the selftests. Selftests are
very timing sensitive and any change in timing can cause failure. A
follow up patch will fixup the selftests to understand this delay.
Design awareness: Race between guc_request_alloc and guc_context_close.
If a context close is issued while there is a request submission in
flight and a delayed schedule disable is pending, guc_context_close
and guc_request_alloc will race to cancel the delayed disable.
To close the race, make sure that guc_request_alloc waits for
guc_context_close to finish running before checking any state.
Design awareness: GT Reset event.
If a gt reset is triggered, as preparation steps, add an additional step
to ensure all contexts that have a pending delay-disable-schedule task
be flushed of it. Move them directly into the closed state after cancelling
the worker. This is okay because the existing flow flushes all
yet-to-arrive G2H's dropping them anyway.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221006225121.826257-2-alan.previn.teres.alexis@intel.com
Since a7c01fa93a ("signal: break out of wait loops on kthread_stop()")
kthread_stop() started asserting a pending signal which wreaks havoc with
a few of our selftests. Mainly because they are not fully expecting to
handle signals, but also cutting the intended test runtimes short due
signal_pending() now returning true (via __igt_timeout), which therefore
breaks both the patterns of:
kthread_run()
..sleep for igt_timeout_ms to allow test to exercise stuff..
kthread_stop()
And check for errors recorded in the thread.
And also:
Main thread | Test thread
---------------+------------------------------
kthread_run() |
kthread_stop() | do stuff until __igt_timeout
| -- exits early due signal --
Where this kthread_stop() was assume would have a "join" semantics, which
it would have had if not the new signal assertion issue.
To recap, threads are now likely to catch a previously impossible
ERESTARTSYS or EINTR, marking the test as failed, or have a pointlessly
short run time.
To work around this start using kthread_work(er) API which provides
an explicit way of waiting for threads to exit. And for cases where
parent controls the test duration we add explicit signaling which threads
will now use instead of relying on kthread_should_stop().
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221020130841.3845791-1-tvrtko.ursulin@linux.intel.com
Currently i915_ttm_to_gem() returns NULL for ttm ghost
object which makes it unclear when we should add a NULL
check for a caller of i915_ttm_to_gem() as ttm ghost
objects are expected behaviour for certain cases.
Create a separate function to detect ttm ghost object and
use that in places where we expect a ghost obj from ttm.
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221014131427.21102-1-nirmoy.das@intel.com
Prepare i915 driver to the common dynamic dma-buf locking convention
by starting to use the unlocked versions of dma-buf API functions
and handling cases where importer now holds the reservation lock.
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221017172229.42269-7-dmitry.osipenko@collabora.com
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEq5lC5tSkz8NBJiCnSfxwEqXeA64FAmNHYD0ACgkQSfxwEqXe
A655AA//dJK0PdRghqrKQsl18GOCffV5TUw5i1VbJQbI9d8anfxNjVUQiNGZi4et
qUwZ8OqVXxYx1Z1UDgUE39PjEDSG9/cCvOpMUWqN20/+6955WlNZjwA7Fk6zjvlM
R30fz5CIJns9RFvGT4SwKqbVLXIMvfg/wDENUN+8sxt36+VD2gGol7J2JJdngEhM
lW+zqzi0ABqYy5so4TU2kixpKmpC08rqFvQbD1GPid+50+JsOiIqftDErt9Eg1Mg
MqYivoFCvbAlxxxRh3+UHBd7ZpJLtp1UFEOl2Rf00OXO+ZclLCAQAsTczucIWK9M
8LCZjb7d4lPJv9RpXFAl3R1xvfc+Uy2ga5KeXvufZtc5G3aMUKPuIU7k28ZyblVS
XXsXEYhjTSd0tgi3d0JlValrIreSuj0z2QGT5pVcC9utuAqAqRIlosiPmgPlzXjr
Us4jXaUhOIPKI+Musv/fqrxsTQziT0jgVA3Njlt4cuAGm/EeUbLUkMWwKXjZLTsv
vDsBhEQFmyZqxWu4pYo534VX2mQWTaKRV1SUVVhQEHm57b00EAiZohoOvweB09SR
4KiJapikoopmW4oAUFotUXUL1PM6yi+MXguTuc1SEYuLz/tCFtK8DJVwNpfnWZpE
lZKvXyJnHq2Sgod/hEZq58PMvT6aNzTzSg7YzZy+VabxQGOO5mc=
=M+mV
-----END PGP SIGNATURE-----
Merge tag 'random-6.1-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
Pull more random number generator updates from Jason Donenfeld:
"This time with some large scale treewide cleanups.
The intent of this pull is to clean up the way callers fetch random
integers. The current rules for doing this right are:
- If you want a secure or an insecure random u64, use get_random_u64()
- If you want a secure or an insecure random u32, use get_random_u32()
The old function prandom_u32() has been deprecated for a while
now and is just a wrapper around get_random_u32(). Same for
get_random_int().
- If you want a secure or an insecure random u16, use get_random_u16()
- If you want a secure or an insecure random u8, use get_random_u8()
- If you want secure or insecure random bytes, use get_random_bytes().
The old function prandom_bytes() has been deprecated for a while
now and has long been a wrapper around get_random_bytes()
- If you want a non-uniform random u32, u16, or u8 bounded by a
certain open interval maximum, use prandom_u32_max()
I say "non-uniform", because it doesn't do any rejection sampling
or divisions. Hence, it stays within the prandom_*() namespace, not
the get_random_*() namespace.
I'm currently investigating a "uniform" function for 6.2. We'll see
what comes of that.
By applying these rules uniformly, we get several benefits:
- By using prandom_u32_max() with an upper-bound that the compiler
can prove at compile-time is ≤65536 or ≤256, internally
get_random_u16() or get_random_u8() is used, which wastes fewer
batched random bytes, and hence has higher throughput.
- By using prandom_u32_max() instead of %, when the upper-bound is
not a constant, division is still avoided, because
prandom_u32_max() uses a faster multiplication-based trick instead.
- By using get_random_u16() or get_random_u8() in cases where the
return value is intended to indeed be a u16 or a u8, we waste fewer
batched random bytes, and hence have higher throughput.
This series was originally done by hand while I was on an airplane
without Internet. Later, Kees and I worked on retroactively figuring
out what could be done with Coccinelle and what had to be done
manually, and then we split things up based on that.
So while this touches a lot of files, the actual amount of code that's
hand fiddled is comfortably small"
* tag 'random-6.1-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
prandom: remove unused functions
treewide: use get_random_bytes() when possible
treewide: use get_random_u32() when possible
treewide: use get_random_{u8,u16}() when possible, part 2
treewide: use get_random_{u8,u16}() when possible, part 1
treewide: use prandom_u32_max() when possible, part 2
treewide: use prandom_u32_max() when possible, part 1
It turns out that on production DG2/ATS HW we should have support for
PS64. This feature allows to provide a 64K TLB hint at the PTE level,
which is a lot more flexible than the current method of enabling 64K GTT
pages for the entire page-table, since that leads to all kinds of
annoying restrictions, as documented in:
commit caa574ffc4
Author: Matthew Auld <matthew.auld@intel.com>
Date: Sat Feb 19 00:17:49 2022 +0530
drm/i915/uapi: document behaviour for DG2 64K support
On discrete platforms like DG2, we need to support a minimum page size
of 64K when dealing with device local-memory. This is quite tricky for
various reasons, so try to document the new implicit uapi for this.
With PS64, we can now drop the 2M GTT alignment restriction, and instead
only require 64K or larger when dealing with lmem. We still use the
compact-pt layout when possible, but only when we are certain that this
doesn't interfere with userspace.
Note that this is a change in uAPI behaviour, but hopefully shouldn't be
a concern (IGT is at least able to autodetect the alignment), since we
are only making the GTT alignment constraint less restrictive.
Based on a patch from CQ Tang.
v2: update the comment wrt scratch page
v3: (Nirmoy)
- Fix the selftest to actually use the random size, plus some comment
improvements, also drop the rem stuff.
Reported-by: Michal Mrozek <michal.mrozek@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Yang A Shi <yang.a.shi@intel.com>
Cc: Nirmoy Das <nirmoy.das@intel.com>
Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Acked-by: Michal Mrozek <michal.mrozek@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221004114915.221708-1-matthew.auld@intel.com
Rather than incurring a division or requesting too many random bytes for
the given range, use the prandom_u32_max() function, which only takes
the minimum required bytes from the RNG and avoids divisions. This was
done mechanically with this coccinelle script:
@basic@
expression E;
type T;
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
typedef u64;
@@
(
- ((T)get_random_u32() % (E))
+ prandom_u32_max(E)
|
- ((T)get_random_u32() & ((E) - 1))
+ prandom_u32_max(E * XXX_MAKE_SURE_E_IS_POW2)
|
- ((u64)(E) * get_random_u32() >> 32)
+ prandom_u32_max(E)
|
- ((T)get_random_u32() & ~PAGE_MASK)
+ prandom_u32_max(PAGE_SIZE)
)
@multi_line@
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
identifier RAND;
expression E;
@@
- RAND = get_random_u32();
... when != RAND
- RAND %= (E);
+ RAND = prandom_u32_max(E);
// Find a potential literal
@literal_mask@
expression LITERAL;
type T;
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
position p;
@@
((T)get_random_u32()@p & (LITERAL))
// Add one to the literal.
@script:python add_one@
literal << literal_mask.LITERAL;
RESULT;
@@
value = None
if literal.startswith('0x'):
value = int(literal, 16)
elif literal[0] in '123456789':
value = int(literal, 10)
if value is None:
print("I don't know how to handle %s" % (literal))
cocci.include_match(False)
elif value == 2**32 - 1 or value == 2**31 - 1 or value == 2**24 - 1 or value == 2**16 - 1 or value == 2**8 - 1:
print("Skipping 0x%x for cleanup elsewhere" % (value))
cocci.include_match(False)
elif value & (value + 1) != 0:
print("Skipping 0x%x because it's not a power of two minus one" % (value))
cocci.include_match(False)
elif literal.startswith('0x'):
coccinelle.RESULT = cocci.make_expr("0x%x" % (value + 1))
else:
coccinelle.RESULT = cocci.make_expr("%d" % (value + 1))
// Replace the literal mask with the calculated result.
@plus_one@
expression literal_mask.LITERAL;
position literal_mask.p;
expression add_one.RESULT;
identifier FUNC;
@@
- (FUNC()@p & (LITERAL))
+ prandom_u32_max(RESULT)
@collapse_ret@
type T;
identifier VAR;
expression E;
@@
{
- T VAR;
- VAR = (E);
- return VAR;
+ return E;
}
@drop_var@
type T;
identifier VAR;
@@
{
- T VAR;
... when != VAR
}
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: KP Singh <kpsingh@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz> # for ext4 and sbitmap
Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> # for drbd
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390
Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc
Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
In the next patch we want to move the object (if the current resource is
not compatible), to the mappable part of lmem for some display buffers.
Currently that requires being able to unset the I915_BO_ALLOC_GPU_ONLY
hint.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Jianshui Yu <jianshui.yu@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221004131916.233474-3-matthew.auld@intel.com
(cherry picked from commit 999f456207)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
linux-next for a couple of months without, to my knowledge, any negative
reports (or any positive ones, come to that).
- Also the Maple Tree from Liam R. Howlett. An overlapping range-based
tree for vmas. It it apparently slight more efficient in its own right,
but is mainly targeted at enabling work to reduce mmap_lock contention.
Liam has identified a number of other tree users in the kernel which
could be beneficially onverted to mapletrees.
Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat
(https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com).
This has yet to be addressed due to Liam's unfortunately timed
vacation. He is now back and we'll get this fixed up.
- Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer. It uses
clang-generated instrumentation to detect used-unintialized bugs down to
the single bit level.
KMSAN keeps finding bugs. New ones, as well as the legacy ones.
- Yang Shi adds a userspace mechanism (madvise) to induce a collapse of
memory into THPs.
- Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to support
file/shmem-backed pages.
- userfaultfd updates from Axel Rasmussen
- zsmalloc cleanups from Alexey Romanov
- cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and memory-failure
- Huang Ying adds enhancements to NUMA balancing memory tiering mode's
page promotion, with a new way of detecting hot pages.
- memcg updates from Shakeel Butt: charging optimizations and reduced
memory consumption.
- memcg cleanups from Kairui Song.
- memcg fixes and cleanups from Johannes Weiner.
- Vishal Moola provides more folio conversions
- Zhang Yi removed ll_rw_block() :(
- migration enhancements from Peter Xu
- migration error-path bugfixes from Huang Ying
- Aneesh Kumar added ability for a device driver to alter the memory
tiering promotion paths. For optimizations by PMEM drivers, DRM
drivers, etc.
- vma merging improvements from Jakub Matěn.
- NUMA hinting cleanups from David Hildenbrand.
- xu xin added aditional userspace visibility into KSM merging activity.
- THP & KSM code consolidation from Qi Zheng.
- more folio work from Matthew Wilcox.
- KASAN updates from Andrey Konovalov.
- DAMON cleanups from Kaixu Xia.
- DAMON work from SeongJae Park: fixes, cleanups.
- hugetlb sysfs cleanups from Muchun Song.
- Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY0HaPgAKCRDdBJ7gKXxA
joPjAQDZ5LlRCMWZ1oxLP2NOTp6nm63q9PWcGnmY50FjD/dNlwEAnx7OejCLWGWf
bbTuk6U2+TKgJa4X7+pbbejeoqnt5QU=
=xfWx
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- Yu Zhao's Multi-Gen LRU patches are here. They've been under test in
linux-next for a couple of months without, to my knowledge, any
negative reports (or any positive ones, come to that).
- Also the Maple Tree from Liam Howlett. An overlapping range-based
tree for vmas. It it apparently slightly more efficient in its own
right, but is mainly targeted at enabling work to reduce mmap_lock
contention.
Liam has identified a number of other tree users in the kernel which
could be beneficially onverted to mapletrees.
Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat
at [1]. This has yet to be addressed due to Liam's unfortunately
timed vacation. He is now back and we'll get this fixed up.
- Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer. It uses
clang-generated instrumentation to detect used-unintialized bugs down
to the single bit level.
KMSAN keeps finding bugs. New ones, as well as the legacy ones.
- Yang Shi adds a userspace mechanism (madvise) to induce a collapse of
memory into THPs.
- Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to
support file/shmem-backed pages.
- userfaultfd updates from Axel Rasmussen
- zsmalloc cleanups from Alexey Romanov
- cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and
memory-failure
- Huang Ying adds enhancements to NUMA balancing memory tiering mode's
page promotion, with a new way of detecting hot pages.
- memcg updates from Shakeel Butt: charging optimizations and reduced
memory consumption.
- memcg cleanups from Kairui Song.
- memcg fixes and cleanups from Johannes Weiner.
- Vishal Moola provides more folio conversions
- Zhang Yi removed ll_rw_block() :(
- migration enhancements from Peter Xu
- migration error-path bugfixes from Huang Ying
- Aneesh Kumar added ability for a device driver to alter the memory
tiering promotion paths. For optimizations by PMEM drivers, DRM
drivers, etc.
- vma merging improvements from Jakub Matěn.
- NUMA hinting cleanups from David Hildenbrand.
- xu xin added aditional userspace visibility into KSM merging
activity.
- THP & KSM code consolidation from Qi Zheng.
- more folio work from Matthew Wilcox.
- KASAN updates from Andrey Konovalov.
- DAMON cleanups from Kaixu Xia.
- DAMON work from SeongJae Park: fixes, cleanups.
- hugetlb sysfs cleanups from Muchun Song.
- Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core.
Link: https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com [1]
* tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (555 commits)
hugetlb: allocate vma lock for all sharable vmas
hugetlb: take hugetlb vma_lock when clearing vma_lock->vma pointer
hugetlb: fix vma lock handling during split vma and range unmapping
mglru: mm/vmscan.c: fix imprecise comments
mm/mglru: don't sync disk for each aging cycle
mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol
mm: memcontrol: use do_memsw_account() in a few more places
mm: memcontrol: deprecate swapaccounting=0 mode
mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled
mm/secretmem: remove reduntant return value
mm/hugetlb: add available_huge_pages() func
mm: remove unused inline functions from include/linux/mm_inline.h
selftests/vm: add selftest for MADV_COLLAPSE of uffd-minor memory
selftests/vm: add file/shmem MADV_COLLAPSE selftest for cleared pmd
selftests/vm: add thp collapse shmem testing
selftests/vm: add thp collapse file and tmpfs testing
selftests/vm: modularize thp collapse memory operations
selftests/vm: dedup THP helpers
mm/khugepaged: add tracepoint to hpage_collapse_scan_file()
mm/madvise: add file and shmem support to MADV_COLLAPSE
...
The variable ret is being assigned with a value that is never read
both before and after a while-loop. The variable is being re-assigned
inside the while-loop and afterwards on the call to the function
i915_gem_object_lock_interruptible. Remove the redundants assignments.
Cleans up clang scan-build warnings:
warning: Although the value stored to 'ret' is used in the
enclosing expression, the value is never actually read
from 'ret' [deadcode.DeadStores]
warning: Value stored to 'ret' is never read [deadcode.DeadStores]
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221007194745.2749277-1-colin.i.king@gmail.com
Patch which added graceful exit for non-persistent contexts missed the
fact it is not enough to set the exiting flag on a context and let the
backend handle it from there.
GuC backend cannot handle it because it runs independently in the
firmware and driver might not see the requests ever again. Patch also
missed the fact some usages of intel_context_is_banned in the GuC backend
needed replacing with newly introduced intel_context_is_schedulable.
Fix the first issue by calling into backend revoke when we know this is
the last chance to do it. Fix the second issue by replacing
intel_context_is_banned with intel_context_is_schedulable, which should
always be safe since latter is a superset of the former.
v2:
* Just call ce->ops->revoke unconditionally. (Andrzej)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 45c64ecf97 ("drm/i915: Improve user experience and driver robustness under SIGINT or similar")
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: <stable@vger.kernel.org> # v6.0+
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Acked-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221003121630.694249-1-tvrtko.ursulin@linux.intel.com
(cherry picked from commit 0add082ceb)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Restore the previous behaviour here where we compare the
pci_resource_len() with the actual lmem_size, and not the dsm size,
since dsm here is just some subset snipped off the end of the lmem.
Otherwise we will incorrectly report an io_size > 0 on small-bar
systems.
It doesn't looks like MTL is expecting small-bar with its stolen memory,
based on:
GEM_BUG_ON(pci_resource_len(pdev, GEN12_LMEM_BAR) != SZ_256M)
GEM_BUG_ON((dsm_size + SZ_8M) > lmem_size)
So just move the HAS_BAR2_SMEM_STOLEN() check first, which then ignores
the small bar part, and we can go back to checking lmem_size against the
BAR size.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/7007
Fixes: dbb2ffbfd7 ("drm/i915/mtl: enable local stolen memory")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221005153148.758822-2-matthew.auld@intel.com
The mask was added in commit e5f415bfc5 ("drm/i915: Add missing mask
when reading GEN12_DSMBASE"), but then looks to be dropped in some
unrelated code movement in commit dbb2ffbfd7 ("drm/i915/mtl: enable
local stolen memory") without explanation. Add it back.
Fixes: dbb2ffbfd7 ("drm/i915/mtl: enable local stolen memory")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221005153148.758822-1-matthew.auld@intel.com
core:
- convert selftests to kunit
- managed init for more objects
- move to idr_init_base
- rename fb and gem cma helpers to dma
- hide unregistered connectors from getconnector ioctl
- DSC passthrough aux support
- backlight handling improvements
- add dma_resv_assert_held to vmap/vunmap
edid:
- move luminance calculation to core
fbdev:
- fix aperture helper usage
fourcc:
- add more format helpers
- add DRM_FORMAT_Cxx, DRM_FORMAT_Rxx, DRM_FORMAT_Dxx
- add packed AYUV8888, XYUV8888
- add some kunit tests
ttm:
- allow bos without backing store
- rewrite placement to use intersect/compatible functions
dma-buf:
- docs update
- improve signalling when debugging
udmabuf:
- fix failure path GPF
dp:
- drop dp/mst legacy code
- atomic mst state support
- audio infoframe packing
panel:
- Samsung LTL101AL01
- B120XAN01.0
- R140NWF5 RH
- DMT028VGHMCMI-1A T
- AUO B133UAN02.1
- IVO M133NW4J-R3
- Innolux N120ACA-EA1
amdgpu:
- Gang submit support
- Mode2 reset for RDNA2
- New IP support:
DCN 3.1.4, 3.2
SMU 13.x
NBIO 7.7
GC 11.x
PSP 13.x
SDMA 6.x
GMC 11.x
- DSC passthrough support
- PSP fixes for TA support
- vangogh GFXOFF stats
- clang fixes
- gang submit CS cleanup prep work
- fix VRAM eviction issues
amdkfd:
- GC 10.3 IP ISA fixes
- fix CRIU regression
- CPU fault on COW mapping fixes
i915:
- align fw versioning with kernel practices
- add display substruct to i915 private
- add initial runtime info to driver info
- split out HDCP and backlight registers
- MEI XeHP SDV GSC support
- add per-gt sysfs defaults
- TLB invalidation improvements
- Disable PCI BAR resize on 32-bit
- GuC firmware updates and compat changes
- GuC log timestamp translation
- DG2 preemption workaround changes
- DG2 improved HDMI pixel clocks support
- PCI BAR sanity checks
- Enable DC5 on DG2
- DG2 DMC fw bumped
- ADL-S PCI ID added
- Meteorlake enablement
- Rename ggtt_view to gtt_view
- host RPS fixes
- release mmaps on rpm suspend on discrete
- clocking and dpll refactoring
- VBT definitions and parsing updates
- SKL watermark code extracted to separate file
- allow seamless M/N changes on eDP panels
- BUG_ON removal and cleanups
msm:
- DPU: simplified VBIF configuration
- cleanup CTL interfaces
- DSI: removed unused msm_display_dsc_config struct
- switch regulator calls to new API
- switched to PANEL_BRIDGE for direct attached panels
- DSI_PHY: convert drivers to parent_hws
- DP: cleanup pixel_rate handling
- HDMI: turned hdmi-phy-8996 into OF clk provider
- misc dt-bindings fixes
- choose eDP as primary display if it's available
- support getting interconnects from either the mdss or the mdp5/dpu
device nodes
- gem: Shrinker + LRU re-work:
- adds a shared GEM LRU+shrinker helper and moves msm over to that
- reduces lock contention between retire and submit by avoiding the
need to acquire obj lock in retire path (and instead using resv
seeing obj's busyness in the shrinker
- fix reclaim vs submit issues
- GEM fault injection for triggering userspace error paths
- Map/unmap optimization
- Improved robustness for a6xx GPU recovery
virtio:
- Improve error and edge conditions handling
- Convert to use managed helpers
- stop exposing LINEAR modifier
mgag200:
- split modeset handling per model
udl:
- suspend/disconnect handling improvements
vc4:
- rework HDMI power up
- depend on PM
- better unplugging support
ast:
- resolution handling improvements
ingenic:
- Add JZ4760(B) support
- avoid a modeset when sharpness property is unchanged
- use the new PM ops
it6505:
- power seq and clock updates
ssd130x:
- regmap bulk write
- use atomic helpers instead of simple helpers
via:
- rename via_drv to via_dri1, consolidate all code.
radeon:
- drop DP MST experimental support
- delayed work flush fix
- use time_after
ti-sn65dsi86:
- DP support
mediatek:
- MT8195 DP support
- drop of_gpio header
- remove unneeded result
- small DP code improvements
vkms:
- RGB565, XRGB64 and ARGB64 support
sun4i:
- tv: convert to atomic
rcar-du:
- Synopsys DW HDMI bridge DT bindings update
exynos:
- use drm_display_info.is_hdmi
- correct return of mixer_mode_valid and hdmi_mode_valid
omap:
- refcounting fix
rockchip:
- RK3568 support
- RK3399 gamma support
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmM894sACgkQDHTzWXnE
hr7EYw//WdVe69TNauCAQiOYdmPp1twmr2o5gDOFLoo4IZw5v+qK0HL/nTrDkBq6
xIu1GLTScOh0AItW1rhFmrtKhO/u/QPQ15P6cO7x8AzlUIhVOYqM79+OA0X6zIV8
IZjpc6EEWPSKJTCRud9HdzsV06DIa+QlwShLCaOFxRiGSuUqsxzacIHUqnFekRnV
PBG7RzcmdWwe6Gy/7T2wegsFjw1mh4S4FypEGs53emru3PGvcau5dwXcE5Jro7Br
k4BFFknuXahVJ2ynVfIFn3QUQRMLgAKRWflqxo7McLeKVQEt4gfB6+PaMwGpSiRQ
iC9QPy69TWEx6X015q2DvvlQDewnCbPOlzyoj9O991QDGLPIim8srPblr8DPeeOz
Y7IW1PRVnPdKReMJvTyrIVED/XT9fUoR7N+F9sfPnEee5HsvjXNGumEHbOE8avFf
rB6CFdby+Ecd9cSeINXowFy4ss0d5zCHMiKEVyQWTZOJysp29vLyKezNqU5m37FK
LAQHtsRdn1+V3o22H5y1PJyqssbOMImMV1ffqW/urRLLefPVHIKCKI8Ycgh0qxqc
B+gebHMgF8j6RR0DHAcQby+PIVi/Pn36TAMI3lPsVjFWGS5s5EQwpKlMNj46H0Cr
yE2Vr4w29+Nsv0b4Uz16AZ0mHcauqx4bvWMyT0frJfNcE86x8MI=
=u/MJ
-----END PGP SIGNATURE-----
Merge tag 'drm-next-2022-10-05' of git://anongit.freedesktop.org/drm/drm
Pull drm updates from Dave Airlie:
"Lots of stuff all over, some new AMD IP support and gang submit
support. i915 has further DG2 and Meteorlake pieces, and a bunch of
i915 display refactoring. msm has a shrinker rework. There are also a
bunch of conversions to use kunit.
This has two external pieces, some MEI changes needed for future Intel
discrete GPUs. These should be acked by Greg. There is also a cross
maintainer shared tree with some backlight rework from Hans in here.
Core:
- convert selftests to kunit
- managed init for more objects
- move to idr_init_base
- rename fb and gem cma helpers to dma
- hide unregistered connectors from getconnector ioctl
- DSC passthrough aux support
- backlight handling improvements
- add dma_resv_assert_held to vmap/vunmap
edid:
- move luminance calculation to core
fbdev:
- fix aperture helper usage
fourcc:
- add more format helpers
- add DRM_FORMAT_Cxx, DRM_FORMAT_Rxx, DRM_FORMAT_Dxx
- add packed AYUV8888, XYUV8888
- add some kunit tests
ttm:
- allow bos without backing store
- rewrite placement to use intersect/compatible functions
dma-buf:
- docs update
- improve signalling when debugging
udmabuf:
- fix failure path GPF
dp:
- drop dp/mst legacy code
- atomic mst state support
- audio infoframe packing
panel:
- Samsung LTL101AL01
- B120XAN01.0
- R140NWF5 RH
- DMT028VGHMCMI-1A T
- AUO B133UAN02.1
- IVO M133NW4J-R3
- Innolux N120ACA-EA1
amdgpu:
- Gang submit support
- Mode2 reset for RDNA2
- New IP support:
DCN 3.1.4, 3.2
SMU 13.x
NBIO 7.7
GC 11.x
PSP 13.x
SDMA 6.x
GMC 11.x
- DSC passthrough support
- PSP fixes for TA support
- vangogh GFXOFF stats
- clang fixes
- gang submit CS cleanup prep work
- fix VRAM eviction issues
amdkfd:
- GC 10.3 IP ISA fixes
- fix CRIU regression
- CPU fault on COW mapping fixes
i915:
- align fw versioning with kernel practices
- add display substruct to i915 private
- add initial runtime info to driver info
- split out HDCP and backlight registers
- MEI XeHP SDV GSC support
- add per-gt sysfs defaults
- TLB invalidation improvements
- Disable PCI BAR resize on 32-bit
- GuC firmware updates and compat changes
- GuC log timestamp translation
- DG2 preemption workaround changes
- DG2 improved HDMI pixel clocks support
- PCI BAR sanity checks
- Enable DC5 on DG2
- DG2 DMC fw bumped
- ADL-S PCI ID added
- Meteorlake enablement
- Rename ggtt_view to gtt_view
- host RPS fixes
- release mmaps on rpm suspend on discrete
- clocking and dpll refactoring
- VBT definitions and parsing updates
- SKL watermark code extracted to separate file
- allow seamless M/N changes on eDP panels
- BUG_ON removal and cleanups
msm:
- DPU:
simplified VBIF configuration
cleanup CTL interfaces
- DSI:
removed unused msm_display_dsc_config struct
switch regulator calls to new API
switched to PANEL_BRIDGE for direct attached panels
- DSI_PHY: convert drivers to parent_hws
- DP: cleanup pixel_rate handling
- HDMI: turned hdmi-phy-8996 into OF clk provider
- misc dt-bindings fixes
- choose eDP as primary display if it's available
- support getting interconnects from either the mdss or the mdp5/dpu
device nodes
- gem: Shrinker + LRU re-work:
- adds a shared GEM LRU+shrinker helper and moves msm over to that
- reduce lock contention between retire and submit by avoiding the
need to acquire obj lock in retire path (and instead using resv
seeing obj's busyness in the shrinker
- fix reclaim vs submit issues
- GEM fault injection for triggering userspace error paths
- Map/unmap optimization
- Improved robustness for a6xx GPU recovery
virtio:
- improve error and edge conditions handling
- convert to use managed helpers
- stop exposing LINEAR modifier
mgag200:
- split modeset handling per model
udl:
- suspend/disconnect handling improvements
vc4:
- rework HDMI power up
- depend on PM
- better unplugging support
ast:
- resolution handling improvements
ingenic:
- add JZ4760(B) support
- avoid a modeset when sharpness property is unchanged
- use the new PM ops
it6505:
- power seq and clock updates
ssd130x:
- regmap bulk write
- use atomic helpers instead of simple helpers
via:
- rename via_drv to via_dri1, consolidate all code.
radeon:
- drop DP MST experimental support
- delayed work flush fix
- use time_after
ti-sn65dsi86:
- DP support
mediatek:
- MT8195 DP support
- drop of_gpio header
- remove unneeded result
- small DP code improvements
vkms:
- RGB565, XRGB64 and ARGB64 support
sun4i:
- tv: convert to atomic
rcar-du:
- Synopsys DW HDMI bridge DT bindings update
exynos:
- use drm_display_info.is_hdmi
- correct return of mixer_mode_valid and hdmi_mode_valid
omap:
- refcounting fix
rockchip:
- RK3568 support
- RK3399 gamma support"
* tag 'drm-next-2022-10-05' of git://anongit.freedesktop.org/drm/drm: (1374 commits)
drm/amdkfd: Fix UBSAN shift-out-of-bounds warning
drm/amdkfd: Track unified memory when switching xnack mode
drm/amdgpu: Enable sram on vcn_4_0_2
drm/amdgpu: Enable VCN DPG for GC11_0_1
drm/msm: Fix build break with recent mm tree
drm/panel: simple: Use dev_err_probe() to simplify code
drm/panel: panel-edp: Use dev_err_probe() to simplify code
drm/panel: simple: Add Multi-Inno Technology MI0800FT-9
dt-bindings: display: simple: Add Multi-Inno Technology MI0800FT-9 panel
drm/amdgpu: correct the memcpy size for ip discovery firmware
drm/amdgpu: Skip put_reset_domain if it doesn't exist
drm/amdgpu: remove switch from amdgpu_gmc_noretry_set
drm/amdgpu: Fix mc_umc_status used uninitialized warning
drm/amd/display: Prevent OTG shutdown during PSR SU
drm/amdgpu: add page retirement handling for CPU RAS
drm/amdgpu: use RAS error address convert api in mca notifier
drm/amdgpu: support to convert dedicated umc mca address
drm/amdgpu: export umc error address convert interface
drm/amdgpu: fix sdma v4 init microcode error
drm/amd/display: fix array-bounds error in dc_stream_remove_writeback()
...
On small-bar systems we could be given something non-mappable here,
which leads to nasty oops. Make this nicer by checking if the resource
is mappable or not, and return an error otherwise.
v2: drop GEM_BUG_ON(flags & I915_BO_ALLOC_GPU_ONLY)
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Jianshui Yu <jianshui.yu@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221004131916.233474-5-matthew.auld@intel.com
In the next patch we want to move the object (if the current resource is
not compatible), to the mappable part of lmem for some display buffers.
Currently that requires being able to unset the I915_BO_ALLOC_GPU_ONLY
hint.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Jianshui Yu <jianshui.yu@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221004131916.233474-3-matthew.auld@intel.com
Patch which added graceful exit for non-persistent contexts missed the
fact it is not enough to set the exiting flag on a context and let the
backend handle it from there.
GuC backend cannot handle it because it runs independently in the
firmware and driver might not see the requests ever again. Patch also
missed the fact some usages of intel_context_is_banned in the GuC backend
needed replacing with newly introduced intel_context_is_schedulable.
Fix the first issue by calling into backend revoke when we know this is
the last chance to do it. Fix the second issue by replacing
intel_context_is_banned with intel_context_is_schedulable, which should
always be safe since latter is a superset of the former.
v2:
* Just call ce->ops->revoke unconditionally. (Andrzej)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 45c64ecf97 ("drm/i915: Improve user experience and driver robustness under SIGINT or similar")
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: <stable@vger.kernel.org> # v6.0+
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Acked-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221003121630.694249-1-tvrtko.ursulin@linux.intel.com
It looks like we need this for local-memory, if we want to use ptrace.
Something more is still needed if we want to handle non-mappable memory,
which looks quite annoying.
v2:
- ttm_bo_kmap doesn't seem to work well here, and seems to expect
contiguous resource.
v3(Andi):
- s/PAGE_SIZE/bytes/ when passing in the size of the mapping.
References: https://gitlab.freedesktop.org/drm/intel/-/issues/6989
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Cc: Nirmoy Das <nirmoy.das@intel.com>
Cc: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221003172819.99245-1-matthew.auld@intel.com
Daniele needs 84d4333c1e ("misc/mei: Add NULL check to component match
callback functions") in order to merge the DG2 HuC patches.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
As an integrated GPU, MTL does not have local memory and HAS_LMEM()
returns false. However the platform's stolen memory is presented via
BAR2 (i.e., the BAR we traditionally consider to be the GMADR on IGFX)
and should be managed by the driver the same way that local memory is
on dgpu platforms (which includes setting the "lmem" bit on page table
entries). We use the term "local stolen memory" to refer to this
model.
The major difference from the traditional BAR2 (GMADR) is that
the stolen area is mapped via the BAR2 while in the former BAR2 is an
aperture into the GTT VA through which access are made into stolen area.
BSPEC: 53098, 63830
v2:
1. dropped is_dsm_invalid, updated valid_stolen_size check from Lucas
(Jani, Lucas)
2. drop lmembar_is_igpu_stolen
3. revert to referring GFXMEM_BAR as GEN12_LMEM_BAR (Lucas)
v3:(Jani)
1. rename get_mtl_gms_size to mtl_get_gms_size
2. define register for MMIO address
v4:(Matt)
1. Use REG_FIELD_GET to read GMS value
2. replace the calculations with SZ_256M/SZ_8M
v5: Include more details to commit message on how it is different from
earlier platforms (Anshuman)
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Signed-off-by: CQ Tang <cq.tang@intel.com>
Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
Original-author: CQ Tang
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220929114658.145287-1-aravind.iddamsetty@intel.com
For delayed BO release i915_ttm_delete_mem_notify()
gets called twice, once with proper bo->resource and
another time with NULL. We shouldn't do anything for
the 2nd time as we already cleaned up the obj once.
References: https://gitlab.freedesktop.org/drm/intel/-/issues/6850
Fixes: ad74457a6b ("drm/i915/dgfx: Release mmap on rpm suspend")
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220920170628.3391-1-nirmoy.das@intel.com
(cherry picked from commit fb78189899)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Replace the linked list in probe_range() with the VMA iterator.
Link: https://lkml.kernel.org/r/20220906194824.2110408-65-Liam.Howlett@oracle.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Yu Zhao <yuzhao@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The inline function has no place in i915_drv.h. Move it away, un-inline,
and untangle some header dependencies while at it.
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220914163514.1837467-1-jani.nikula@intel.com
There is no reason to consider the setup of Data Stolen Memory fatal on
dgfx and non-fatal on integrated. Move the debug and error propagation
around so both have the same behavior: non-fatal. Before this change,
loading i915 on a system with TGL + DG2 would result in just TGL
succeeding the initialization (without stolen).
Now loading i915 on the same system with an injected failure in
i915_gem_init_stolen():
$ dmesg | grep stolen
i915 0000:00:02.0: [drm] Injected failure, disabling use of stolen memory
i915 0000:00:02.0: [drm:init_stolen_smem [i915]] Skip stolen region: failed to setup
i915 0000:03:00.0: [drm] Injected failure, disabling use of stolen memory
i915 0000:03:00.0: [drm:init_stolen_lmem [i915]] Skip stolen region: failed to setup
Both GPUs are still available:
$ sudo build/tools/lsgpu
card1 Intel Dg2 (Gen12) drm:/dev/dri/card1
└─renderD129 drm:/dev/dri/renderD129
card0 Intel Tigerlake (Gen12) drm:/dev/dri/card0
└─renderD128 drm:/dev/dri/renderD128
Reviewed-by: Wayne Boyer <wayne.boyer@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220915-stolen-v2-3-20ff797de047@intel.com
Add some helpers: adjust_stolen(), request_smem_stolen_() and
init_reserved_stolen() that are now called by i915_gem_init_stolen() to
initialize each part of the Data Stolen Memory region.
Main goal is to split the reserved part within the stolen, also known as
WOPCM, as its calculation changes often per platform and is a big source
of confusion when handling stolen memory.
Reviewed-by: Wayne Boyer <wayne.boyer@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220915-stolen-v2-2-20ff797de047@intel.com
DSMBASE register is defined so BDSM bitfield contains the bits 63 to 20
of the base address of stolen. For the supported platforms bits 0-19 are
zero but that may not be true in future. Add the missing mask.
v2: Use REG_GENMASK64()
Acked-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
Reviewed-by: Caz Yokoyama <caz@caztech.com>
Reviewed-by: Wayne Boyer <wayne.boyer@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220915-stolen-v2-1-20ff797de047@intel.com
drm/i915 feature pull for v6.1:
Features and functionality:
- Early Meteorlake (MTL) enabling (José, Radhakrishna, Clint, Imre, Vandita, Ville, Jani)
- Support more HDMI pixel clock frequencies on DG2 (Clint)
- Sanity check PCI BARs (Piotr Piórkowski)
- Enable DC5 on DG2 (Anusha)
- DG2 DMC firmware version bump to v2.07 (Madhumitha)
- New ADL-S PCI ID (José)
Refactoring and cleanups:
- Add display sub-struct to struct drm_i915_private (Jani)
- Add initial runtime info to device info (Jani)
- Split out HDCP and backlight registers to separate files (Jani)
Fixes:
- Skip wm/ddb readout for disabled pipes (Ville)
- HDMI port timing quirk for GLK ECS Liva Q2 (Diego Santa Cruz)
- Fix bw init null pointer dereference (Łukasz Bartosik)
- Disable PPS power hook for DP AUX backlight (Jouni)
- Avoid warnings on registering multiple backlight devices (Arun)
- Fix dual-link DSI backlight and CABC ports for display 11+ (Jani)
- Fix Type-C PHY ownership programming in HDMI legacy mode (Imre)
- Fix unclaimed register access while loading PIPEDMC-C/D (Imre)
- Bump up CDCLK for DG2 (Stan)
- Prune modes that require HDMI 2.1 FRL (Ankit)
- Disable FBC when PSR1 is enabled in display 12-13 (Matt)
- Fix TGL+ HDMI transcoder clock and DDI BUF disable order (Imre)
- Disable PSR before disable pipe (José)
- Disable DMC handlers during firmware loading/disabling on display 12+ (Imre)
- Disable clock gating for PIPEDMC-A/B as a workaround (Imre)
Merges:
- Two drm-next backmerges (Rodrigo, Jani)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/87k06rfaku.fsf@intel.com
Release all mmap mapping for all lmem objects which are associated
with userfault such that, while pcie function in D3hot, any access
to memory mappings will raise a userfault.
Runtime resume the dgpu(when gem object lies in lmem).
This will transition the dgpu graphics function to D0
state if it was in D3 in order to access the mmap memory
mappings.
v2:
- Squashes the patches. [Matt Auld]
- Add adequate locking for lmem_userfault_list addition. [Matt Auld]
- Reused obj->userfault_count to avoid double addition. [Matt Auld]
- Added i915_gem_object_lock to check
i915_gem_object_is_lmem. [Matt Auld]
v3:
- Use i915_ttm_cpu_maps_iomem. [Matt Auld]
- Fix 'ret == 0 to ret == VM_FAULT_NOPAGE'. [Matt Auld]
- Reuse obj->userfault_count as a bool 0 or 1. [Matt Auld]
- Delete the mmaped obj from lmem_userfault_list in obj
destruction path. [Matt Auld]
- Get a wakeref for object destruction patch. [Matt Auld]
- Use intel_wakeref_auto to delay runtime PM. [Matt Auld]
v4:
- Avoid using mmo offset to get the vma_node. [Matt Auld]
- Added comment to use the lmem_userfault_lock. [Matt Auld]
- Get lmem_userfault_lock in i915_gem_object_release_mmap_offset.
[Matt Auld]
- Fixed kernel test robot generated warning.
v5:
- Addressed the cosmetics comments. [Andi]
- Changed i915_gem_runtime_pm_object_release_mmap_offset() name to
i915_gem_object_runtime_pm_release_mmap_offset() to be rhythmic.
PCIe Specs 5.3.1.4.1
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6331
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220913152714.16541-3-anshuman.gupta@intel.com
Refactor userfault_wakeref to re-use for discrete lmem mmap mapping
as well, as on discrete GTT mmap are not supported. Moving
userfault_wakeref from ggtt to gt structure.
Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220913152714.16541-2-anshuman.gupta@intel.com
i915_gem_lmem_obj_ops has been removed since
commit 213d509277 ("drm/i915/ttm: Introduce a TTM i915
gem object backend"), so remove it.
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220913024847.552254-7-cuigaosheng1@huawei.com
UAPI Changes:
- Revert "drm/i915/dg2: Add preemption changes for Wa_14015141709"
The intent of Wa_14015141709 was to inform us that userspace can no
longer control object-level preemption as it has on past platforms
(i.e., by twiddling register bit CS_CHICKEN1[0]). The description of
the workaround in the spec wasn't terribly well-written, and when we
requested clarification from the hardware teams we were told that on the
kernel side we should also probably stop setting
FF_SLICE_CS_CHICKEN1[14], which is the register bit that directs the
hardware to honor the settings in per-context register CS_CHICKEN1. It
turns out that this guidance about FF_SLICE_CS_CHICKEN1[14] was a
mistake; even though CS_CHICKEN1[0] is non-operational and useless to
userspace, there are other bits in the register that do still work and
might need to be adjusted by userspace in the future (e.g., to implement
other workarounds that show up). If we don't set
FF_SLICE_CS_CHICKEN1[14] in i915, then those future workarounds would
not take effect.
Even more details at:
https://lists.freedesktop.org/archives/intel-gfx/2022-September/305478.html
Driver Changes:
- Align GuC/HuC firmware versioning scheme to kernel practices (John)
- Fix#6639: h264 hardware video decoding broken in 5.19 on Intel(R)
Celeron(R) N3060 (Nirmoy)
- Meteorlake (MTL) enabling (Matt R)
- GuC SLPC improvements (Vinay, Rodrigo)
- Add thread execution tuning setting for ATS-M (Matt R)
- Don't start PXP without mei_pxp bind (Juston)
- Remove leftover verbose debug logging from GuC error capture (John)
- Abort suspend on low system memory conditions (Nirmoy, Matt A, Chris)
- Add DG2 Wa_16014892111 (Matt R)
- Rename ggtt_view as gtt_view (Niranjana)
- Consider HAS_FLAT_CCS() in needs_ccs_pages (Matt A)
- Don't try to disable host RPS when this was never enabled. (Rodrigo)
- Clear stalled GuC CT request after a reset (Daniele)
- Remove runtime info printing from GuC time stamp logging (Jani)
- Skip Bit12 fw domain reset for gen12+ (Sushma, Radhakrishna)
- Make GuC log sizes runtime configurable (John)
- Selftest improvements (Daniele, Matt B, Andrzej)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/YxshfqUN+vDe92Zn@jlahtine-mobl.ger.corp.intel.com
drm-misc-next for v6.1-rc1:
[airlied - fix sun4i_tv build]
UAPI Changes:
- Hide unregistered connectors from GETCONNECTOR ioctl.
- drm/virtio no longer advertises LINEAR modifier, as it doesn't work.
-
Cross-subsystem Changes:
- Fix GPF in udmabuf failure path.
Core Changes:
- Rework TTM placement to use intersect/compatible functions.
- Drop legacy DP-MST support.
- More DP-MST related fixes, and move all state into atomic.
- Make DRM_MIPI_DBI select DRM_KMS_HELPER.
- Add audio_infoframe packing for DP.
- Add logging when some atomic check functions fail.
- Assorted documentation updates and fixes.
Driver Changes:
- Assorted cleanups and fixes in msm, lcdif, nouveau, virtio,
panel/ilitek, bridge/icn6211, tve200, gma500, bridge/*, panfrost, via,
bochs, qxl, sun4i.
- Add add AUO B133UAN02.1, IVO M133NW4J-R3, Innolux N120ACA-EA1 eDP panels.
- Improve DP-MST modeset state handling in amdgpu, nouveau, i915.
- Drop DP-MST from radeon driver, it was broken and only user of legacy
DP-MST.
- Handle unplugging better in vc4.
- Simplify drm cmdparser tests.
- Add DP support to ti-sn65dsi86.
- Add MT8195 DP support to mediatek.
- Support RGB565, XRGB64, and ARGB64 formats in vkms.
- Convert sun4i tv support to atomic.
- Refactor vc4/vec TV Modesetting, and fix timings.
- Use atomic helpers instead of simple display helpers in ssd130x.
Maintainer changes:
- Add Douglas Anderson as reviewer for panel-edp.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/a489485b-3ebc-c734-0f80-aed963d89efe@linux.intel.com
So far, different views (normal, partial, rotated and remapped)
into the same object are only supported for GGTT mappings.
But with the upcoming VM_BIND feature, PPGTT will also use the
partial view mapping. Hence rename ggtt_view to more generic
gtt_view.
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220901183854.3446-1-niranjana.vishwanathapura@intel.com
Just move the HAS_FLAT_CCS() check into needs_ccs_pages. This also then
fixes i915_ttm_memcpy_allowed() which was incorrectly reporting true on
DG1, even though it doesn't have small-BAR or flat-CCS.
References: https://gitlab.freedesktop.org/drm/intel/-/issues/6605
Fixes: efeb3caf43 ("drm/i915/ttm: disallow CPU fallback mode for ccs pages")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220905105329.41455-1-matthew.auld@intel.com
(cherry picked from commit 873fef8833)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
UAPI Changes:
Cross-subsystem Changes:
- DMA-buf: documentation updates.
- Assorted small fixes to vga16fb
- Fix fbdev drivers to use the aperture helpers.
- Make removal of conflicting drivers work correctly without fbdev enabled.
Core Changes:
- bridge, scheduler, dp-mst: Assorted small fixes.
- Add more format helpers to fourcc, and use it to replace the cpp usage.
- Add DRM_FORMAT_Cxx, DRM_FORMAT_Rxx (single channel), and DRM_FORMAT_Dxx
("darkness", inverted single channel)
- Add packed AYUV8888 and XYUV8888 formats.
- Assorted documentation updates.
- Rename ttm_bo_init to ttm_bo_init_validate.
- Allow TTM bo's to exist without backing store.
- Convert drm selftests to kunit.
- Add managed init functions for (panel) bridge, crtc, encoder and connector.
- Fix endianness handling in various format conversion helpers.
- Make tests pass on big-endian platforms, and add test for rgb888 -> rgb565
- Move DRM_PLANE_HELPER_NO_SCALING to atomic helpers and rename, so
drm_plane_helper is no longer needed in most drivers.
- Use idr_init_base instead of idr_init.
- Rename FB and GEM CMA helpers to DMA helpers.
- Rework XRGB8888 related conversion helpers, and add drm_fb_blit() that
takes a iosys_map. Make drm_fb_memcpy take an iosys_map too.
- Move edid luminance calculation to core, and use it in i915.
Driver Changes:
- bridge/{adv7511,ti-sn65dsi86,parade-ps8640}, panel/{simple,nt35510,tc358767},
nouveau, sun4i, mipi-dsi, mgag200, bochs, arm, komeda, vmwgfx, pl111:
Assorted small fixes and doc updates.
- vc4: Rework hdmi power up, and depend on PM.
- panel/simple: Add Samsung LTL101AL01.
- ingenic: Add JZ4760(B) support, avoid a modeset when sharpness property
is unchanged, and use the new PM ops.
- Revert some amdgpu commits that cause garbaged graphics when starting
X, and reapply them with the real problem fixed.
- Completely rework vc4 init to use managed helpers.
- Rename via_drv to via_dri1, and move all stuff there only used by the
dri1 implementation in preperation for atomic modeset.
- Use regmap bulk write in ssd130x.
- Power sequence and clock updates to it6505.
- Split panel-sitrox-st7701 init sequence and rework mode programming code.
- virtio: Improve error and edge conditions handling, and convert to use managed
helpers.
- Add Samsung LTL101AL01, B120XAN01.0, R140NWF5 RH, DMT028VGHMCMI-1A T, panels.
- Add generic fbdev support to komeda.
- Split mgag200 modeset handling to make it more model-specific.
- Convert simpledrm to use atomic helpers.
- Improve udl suspend/disconnect handling.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEuXvWqAysSYEJGuVH/lWMcqZwE8MFAmMA59MACgkQ/lWMcqZw
E8MBihAAtttSIF8fvRIgnoFeswteB5HlXtrSOVRns+0h7mNzcmUL3yp/jV9betND
NEfBCERRIRk5vqkzm2o9P/rTG+KMYCyyCNdcvvh+rVn+HU2V9MwtJdowZqLu7Rxi
Qn59SaL6fMQanJwF8yHcii1ikqelU8TfsMXcFXt2d/yjn5D/h80tNOULEB74zIDH
Vey/xTawcqEz3Qbj2mL4VihEnzU1cziGTpZvAWU2GTdWLxxN88uA4RD2JDyvGaXG
oO6oilYT9TW+2C+0jfMUysu7RcNYuo4KD8dSVLvec+UKDKdO1pSqn40vgTZp0MBw
V0VGg/Lqg6+u33UQkxeRpdzTboD4cMjoeP0/IJdHQVMYYH449tXDgZEjXqVHdz2c
Ea8JIm36CHjqA0Ua3DT8y1qRtdrjuATJauoB6MXW8RpguqMzVuI4e9Dio77WRNYe
Lg/V5AlQEKcwxeqe6v+8K7aB4lvHZ9g0WSMFHoxYz28d2cFWDVeKgK/GCW3WWMKo
AQigRS84z3QBqK+IeTjbfZREoRFtx7NzQpzU9jQdL06XGWExhGOs8yU2OCoDNN8+
iFMBStEPHLPg27HrLV5JYeMRUBPJMnwQhAEnrSv9KqIfPD3/g2jonoGm3J57p+OI
Ag7pwg88spzP1DC7M6+UvzUnH76Dhf4hdnfQjz10c6BfS1ODPbA=
=kU/l
-----END PGP SIGNATURE-----
Merge tag 'drm-misc-next-2022-08-20-1' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
drm-misc-next for v6.1:
UAPI Changes:
Cross-subsystem Changes:
- DMA-buf: documentation updates.
- Assorted small fixes to vga16fb
- Fix fbdev drivers to use the aperture helpers.
- Make removal of conflicting drivers work correctly without fbdev enabled.
Core Changes:
- bridge, scheduler, dp-mst: Assorted small fixes.
- Add more format helpers to fourcc, and use it to replace the cpp usage.
- Add DRM_FORMAT_Cxx, DRM_FORMAT_Rxx (single channel), and DRM_FORMAT_Dxx
("darkness", inverted single channel)
- Add packed AYUV8888 and XYUV8888 formats.
- Assorted documentation updates.
- Rename ttm_bo_init to ttm_bo_init_validate.
- Allow TTM bo's to exist without backing store.
- Convert drm selftests to kunit.
- Add managed init functions for (panel) bridge, crtc, encoder and connector.
- Fix endianness handling in various format conversion helpers.
- Make tests pass on big-endian platforms, and add test for rgb888 -> rgb565
- Move DRM_PLANE_HELPER_NO_SCALING to atomic helpers and rename, so
drm_plane_helper is no longer needed in most drivers.
- Use idr_init_base instead of idr_init.
- Rename FB and GEM CMA helpers to DMA helpers.
- Rework XRGB8888 related conversion helpers, and add drm_fb_blit() that
takes a iosys_map. Make drm_fb_memcpy take an iosys_map too.
- Move edid luminance calculation to core, and use it in i915.
Driver Changes:
- bridge/{adv7511,ti-sn65dsi86,parade-ps8640}, panel/{simple,nt35510,tc358767},
nouveau, sun4i, mipi-dsi, mgag200, bochs, arm, komeda, vmwgfx, pl111:
Assorted small fixes and doc updates.
- vc4: Rework hdmi power up, and depend on PM.
- panel/simple: Add Samsung LTL101AL01.
- ingenic: Add JZ4760(B) support, avoid a modeset when sharpness property
is unchanged, and use the new PM ops.
- Revert some amdgpu commits that cause garbaged graphics when starting
X, and reapply them with the real problem fixed.
- Completely rework vc4 init to use managed helpers.
- Rename via_drv to via_dri1, and move all stuff there only used by the
dri1 implementation in preperation for atomic modeset.
- Use regmap bulk write in ssd130x.
- Power sequence and clock updates to it6505.
- Split panel-sitrox-st7701 init sequence and rework mode programming code.
- virtio: Improve error and edge conditions handling, and convert to use managed
helpers.
- Add Samsung LTL101AL01, B120XAN01.0, R140NWF5 RH, DMT028VGHMCMI-1A T, panels.
- Add generic fbdev support to komeda.
- Split mgag200 modeset handling to make it more model-specific.
- Convert simpledrm to use atomic helpers.
- Improve udl suspend/disconnect handling.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/f0c71766-61e8-19b7-763a-5fbcdefc633d@linux.intel.com
On system suspend when system memory is low then i915_gem_obj_copy_ttm()
could fail trying to backup a lmem obj. GEM_WARN_ON() is not enough,
suspend shouldn't continue if i915_ttm_backup() throws an error.
v2: Keep the fdo issue till we have a igt test(Matt).
v3: Use %pe(Andrzej)
References: https://gitlab.freedesktop.org/drm/intel/-/issues/6529
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Suggested-by: Chris P Wilson <chris.p.wilson@intel.com>
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220901172217.18392-1-nirmoy.das@intel.com
Sync drm-intel-next with v6.0-rc as well as recent drm-intel-gt-next.
Since drm-next does not have commit f0c70d41e4 ("drm/i915/guc: remove
runtime info printing from time stamp logging") yet, only
drm-intel-gt-next, will need to do that as part of the merge here to
build.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
UAPI Changes:
- Create gt/gtN/.defaults/ for per gt sysfs defaults
Create a gt/gtN/.defaults/ directory (similar to
engine/<engine-name>/.defaults/) to expose default parameter values for
each gt in sysfs. This allows userspace to restore default parameter values
after they have changed.
Driver Changes:
- Support GuC v69 in parallel to v70 (Daniele)
- Improve TLB invalidation to limit performance regression (Chris, Mauro)
- Expose per-gt RPS defaults in sysfs (Ashutosh)
- Suppress OOM warning for shmemfs object allocation failure (Chris, Nirmoy)
- Disable PCI resize on 32-bit machines (Nirmoy)
- Update DG2 to GuC v70.4.1 (John)
- Fix CCS data copying on DG2 during swapping (Matt A)
- Add DG2 performance tuning setting recommended by spec (Matt R)
- Add GuC <-> kernel time stamp translation information to error logs (John)
- Record GuC CTB info in error logs (John)
- Route semaphores to GuC for Gen12+ when enabled (Michal Wi, John)
- Improve resilency to bug #3575: Handle reset timeouts under unrelated kernel hangs (Chris, Ashutosh)
- Avoid system freeze by removing shared locking on freeing objects (Chris, Nirmoy)
- Demote GuC error "No response for request" into debug when expected (Zhanjun)
- Fix GuC capture size warning and bump the size (John)
- Use streaming loads to speed up dumping the GuC log (Chris, John)
- Don't abort on CTB_UNUSED status from GuC (John)
- Don't send spurious policy update for GuC child contexts (Daniele)
- Don't leak the CCS state (Matt A)
- Prefer drm_err over pr_err (John)
- Eliminate unused calc_ctrl_surf_instr_size (Matt A)
- Add dedicated function for non-ctx register tuning settings (Matt R)
- Style and typo fixes, documentation improvements (Jason Wang, Mauro)
- Selftest improvements (Matt B, Rahul, John)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/YwYTCjA/Rhpd1n4A@jlahtine-mobl.ger.corp.intel.com
Implemented a new intersect and compatible callback function
fetching start offset from drm buddy allocator.
v3: move the bits that are specific to buddy_man (Matthew)
v4: consider the block size /range (Matthew)
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220820073304.178444-4-Arunpravin.PaneerSelvam@amd.com
This reverts commit 6a07990384.
Everything in CI using GuC is now timing out[1], and killing the machine
with this change (perhaps a deadlock?). CI was recently on fire due to
some changes coming in from -rc1, so likely the pre-merge CI results for
this series were invalid? For now just revert, unless GuC experts
already have a fix in mind.
[1] https://intel-gfx-ci.01.org/tree/drm-tip/index.html?
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220819123904.913750-1-matthew.auld@intel.com
Add a delay, configurable via debugfs (default 34ms), to disable
scheduling of a context after the pin count goes to zero. Disable
scheduling is a costly operation as it requires synchronizing with
the GuC. So the idea is that a delay allows the user to resubmit
something before doing this operation. This delay is only done if
the context isn't closed and less than a given threshold
(default is 3/4) of the guc_ids are in use.
As temporary WA disable this feature for the selftests. Selftests are
very timing sensitive and any change in timing can cause failure. A
follow up patch will fixup the selftests to understand this delay.
Alan Previn: Matt Brost first introduced this series back in Oct 2021.
However no real world workload with measured performance impact was
available to prove the intended results. Today, this series is being
republished in response to a real world workload that benefited greatly
from it along with measured performance improvement.
Workload description: 36 containers were created on a DG2 device where
each container was performing a combination of 720p 3d game rendering
and 30fps video encoding. The workload density was configured in a way
that guaranteed each container to ALWAYS be able to render and
encode no less than 30fps with a predefined maximum render + encode
latency time. That means the totality of all 36 containers and their
workloads were not saturating the engines to their max (in order to
maintain just enough headrooom to meet the min fps and max latencies
of incoming container submissions).
Problem statement: It was observed that the CPU core processing the i915
soft IRQ work was experiencing severe load. Using tracelogs and an
instrumentation patch to count specific i915 IRQ events, it was confirmed
that the majority of the CPU cycles were caused by the
gen11_other_irq_handler() -> guc_irq_handler() code path. The vast
majority of the cycles was determined to be processing a specific G2H
IRQ: i.e. INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE. These IRQs are sent
by GuC in response to i915 KMD sending H2G requests:
INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET. Those H2G requests are sent
whenever a context goes idle so that we can unpin the context from GuC.
The high CPU utilization % symptom was limiting density scaling.
Root Cause Analysis: Because the incoming execution buffers were spread
across 36 different containers (each with multiple contexts) but the
system in totality was NOT saturated to the max, it was assumed that each
context was constantly idling between submissions. This was causing
a thrashing of unpinning contexts from GuC at one moment, followed quickly
by repinning them due to incoming workload the very next moment. These
event-pairs were being triggered across multiple contexts per container,
across all containers at the rate of > 30 times per sec per context.
Metrics: When running this workload without this patch, we measured an
average of ~69K INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE events every 10
seconds or ~10 million times over ~25+ mins. With this patch, the count
reduced to ~480 every 10 seconds or about ~28K over ~10 mins. The
improvement observed is ~99% for the average counts per 10 seconds.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220817020511.2180747-3-alan.previn.teres.alexis@intel.com
This will help in an upcoming patch where the live selftest wrappers
are extended to do more.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Reviewed-by: John Harrison <john.c.harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220817020511.2180747-2-alan.previn.teres.alexis@intel.com
- don't leak the ccs state (Matt)
- TLB invalidation fixes (Chris)
[now with all fixes of fixes]
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEbSBwaO7dZQkcLOKj+mJfZA7rE8oFAmL1bnMACgkQ+mJfZA7r
E8oQBAf+IHHvIp89jM/lwwnKZ4IwsD+kWGo4TnN57uHwzxMaIXKTlG9Em2HPhSgL
PyvV86Q1SLv+fo59SCSOg2z0+m3gUZHdJ4IAjDkWUbluCoXvB+51NuYKtQjtZE7J
zkR0dLzsT3Jo0gUc5IofkIA7m7U1y9TA3W+eWyJuBz4gIEXttu/YX/ZMJbATL7qT
rDFFCEfTNDb/Zui8tNYwlPaVDWU/JxtU3LFpZgphcp42ancgXkf4J7mMA4Q3nXnz
QBbMlk7urYz0LIj+ICrWutTa4SN7y3y7Z9pknivyxawTIzwk+RXTJ70+kDHe3dhK
3x0x4bk2uFpEo8y0mMzz4Bhn9i1zXA==
=8hPH
-----END PGP SIGNATURE-----
Merge tag 'drm-intel-next-fixes-2022-08-11' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
- disable pci resize on 32-bit systems (Nirmoy)
- don't leak the ccs state (Matt)
- TLB invalidation fixes (Chris)
[now with all fixes of fixes]
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/YvVumNCga+90fYN0@intel.com
For proper operation of i915 we need usable PCI GTTMMADDR BAR 0
(1 for GEN2). In most cases we also need usable PCI GFXMEM BAR 2.
Let's add functions to check if BARs are set, and that it have
a size greater than 0.
In case GTTMMADDR BAR, let's validate at the beginning of i915
initialization.
For other BARs, let's validate before first use.
Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220805155959.1983584-3-piotr.piorkowski@intel.com
At the moment, when we refer to some PCI BAR we use the number of
this BAR in the code. The meaning of BARs between different platforms
may be different. Therefore, in order to organize the code,
let's start using defined names instead of numbers.
v2: Add lost header in cfg_space.c
Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220805155959.1983584-2-piotr.piorkowski@intel.com
The obj->base.resv may be shared across many objects, some of which may
still be live and locked, preventing objects from being freed
indefintely. We could individualise the lock during the free, or rely on
a freed object having no contention and being able to immediately free
the pages it owns.
References: https://gitlab.freedesktop.org/drm/intel/-/issues/6469
Fixes: be7612fd66 ("drm/i915: Require object lock when freeing pages during destruction")
Fixes: 6cb12fbda1 ("drm/i915: Use trylock instead of blocking lock for __i915_gem_free_objects.")
Cc: <stable@vger.kernel.org> # v5.17+
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Nirmoy Das <nirmoy.das@intel.com>
Acked-by: Nirmoy Das <nirmoy.das@intel.com>
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220726144844.18429-1-nirmoy.das@intel.com
(cherry picked from commit 7dd5c56531)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Invalidate TLB in batches, in order to reduce performance regressions.
Currently, every caller performs a full barrier around a TLB
invalidation, ignoring all other invalidations that may have already
removed their PTEs from the cache. As this is a synchronous operation
and can be quite slow, we cause multiple threads to contend on the TLB
invalidate mutex blocking userspace.
We only need to invalidate the TLB once after replacing our PTE to
ensure that there is no possible continued access to the physical
address before releasing our pages. By tracking a seqno for each full
TLB invalidate we can quickly determine if one has been performed since
rewriting the PTE, and only if necessary trigger one for ourselves.
That helps to reduce the performance regression introduced by TLB
invalidate logic.
[mchehab: rebased to not require moving the code to a separate file]
Cc: stable@vger.kernel.org
Fixes: 7938d61591 ("drm/i915: Flush TLBs before releasing backing store")
Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/4e97ef5deb6739cadaaf40aa45620547e9c4ec06.1658924372.git.mchehab@kernel.org
(cherry picked from commit 5d36acb719)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Check if the device is powered down prior to any engine activity,
as, on such cases, all the TLBs were already invalidated, so an
explicit TLB invalidation is not needed, thus reducing the
performance regression impact due to it.
This becomes more significant with GuC, as it can only do so when
the connection to the GuC is awake.
Cc: stable@vger.kernel.org
Fixes: 7938d61591 ("drm/i915: Flush TLBs before releasing backing store")
Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/278a57a672edac75683f0818b292e95da583a5fe.1658924372.git.mchehab@kernel.org
(cherry picked from commit 4bedceaed1)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Including:
- Most intrusive patch is small and changes the default
allocation policy for DMA addresses. Before the change the
allocator tried its best to find an address in the first 4GB.
But that lead to performance problems when that space gets
exhaused, and since most devices are capable of 64-bit DMA
these days, we changed it to search in the full DMA-mask
range from the beginning. This change has the potential to
uncover bugs elsewhere, in the kernel or the hardware. There
is a Kconfig option and a command line option to restore the
old behavior, but none of them is enabled by default.
- Add Robin Murphy as reviewer of IOMMU code and maintainer for
the dma-iommu and iova code
- Chaning IOVA magazine size from 1032 to 1024 bytes to save
memory
- Some core code cleanups and dead-code removal
- Support for ACPI IORT RMR node
- Support for multiple PCI domains in the AMD-Vi driver
- ARM SMMU changes from Will Deacon:
- Add even more Qualcomm device-tree compatible strings
- Support dumping of IMP DEF Qualcomm registers on TLB sync
timeout
- Fix reference count leak on device tree node in Qualcomm
driver
- Intel VT-d driver updates from Lu Baolu:
- Make intel-iommu.h private
- Optimize the use of two locks
- Extend the driver to support large-scale platforms
- Cleanup some dead code
- MediaTek IOMMU refactoring and support for TTBR up to 35bit
- Basic support for Exynos SysMMU v7
- VirtIO IOMMU driver gets a map/unmap_pages() implementation
- Other smaller cleanups and fixes
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmLs3DIACgkQK/BELZcB
GuMizhAAguAnLLOkOLlR9/MhrTZfNXCUX+bfrEIevjFXMw4iPNfCCr4ydQ7EdVK6
ZA/3Z89huYl0d0x/FELolnQi+HOeqYrfTDe4rB7TgNgwZnWa+fdHcyYkgBGyfPaV
ilgjNcx8o//9o4NasyB6kU395jVmFxb735gMTTb+tcO9fr+/qIB6hxrHuCklxrNr
C7wK6kkoDPi5n0QuXCSjXEx2Hk245pAWKPLwqxsUYzHGlLfl7ULOxw65BUBGvn/H
uCsTfJFu7u+ErwQYf0qPuOwRBnRdsx9g5EAnfab8p074SoKWvbNnftIxgIRp8ZEM
YgCbhYa1GOFI4r+XzqRzEbc0/vPSttims4Jqz0KxYs7pr5EoVifrWLJFjJdCdc2h
Tio1gTvOq8HbH63kwYNKJhg4iSC6zVd37ihEhvfFO6LcgFl4iCfd2o9zK7oY40J4
XoOxofVnJ2e3tzdhZ/n5quCXiudHixm6WuVa7QYKscF7Ud0tY1wWKuibdlMQTeNM
68MvtlteKcfs1BrWzZyrFMrFeAfIY8LI82y6jdJuoNMU5LE9+5yelXBdJhnVygZ+
Jglv1TIt6W/z1H5JgXtNVZ1wWgBm7rurOqNyfN8XCd8eP1z321CLfX8ujkhKrIWP
ApG15cwvpnh1JX630+UFiEikTGU0fb2orMdPwYmwuu8DAsoLVHE=
=hI2K
-----END PGP SIGNATURE-----
Merge tag 'iommu-updates-v5.20-or-v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu updates from Joerg Roedel:
- The most intrusive patch is small and changes the default allocation
policy for DMA addresses.
Before the change the allocator tried its best to find an address in
the first 4GB. But that lead to performance problems when that space
gets exhaused, and since most devices are capable of 64-bit DMA these
days, we changed it to search in the full DMA-mask range from the
beginning.
This change has the potential to uncover bugs elsewhere, in the
kernel or the hardware. There is a Kconfig option and a command line
option to restore the old behavior, but none of them is enabled by
default.
- Add Robin Murphy as reviewer of IOMMU code and maintainer for the
dma-iommu and iova code
- Chaning IOVA magazine size from 1032 to 1024 bytes to save memory
- Some core code cleanups and dead-code removal
- Support for ACPI IORT RMR node
- Support for multiple PCI domains in the AMD-Vi driver
- ARM SMMU changes from Will Deacon:
- Add even more Qualcomm device-tree compatible strings
- Support dumping of IMP DEF Qualcomm registers on TLB sync
timeout
- Fix reference count leak on device tree node in Qualcomm driver
- Intel VT-d driver updates from Lu Baolu:
- Make intel-iommu.h private
- Optimize the use of two locks
- Extend the driver to support large-scale platforms
- Cleanup some dead code
- MediaTek IOMMU refactoring and support for TTBR up to 35bit
- Basic support for Exynos SysMMU v7
- VirtIO IOMMU driver gets a map/unmap_pages() implementation
- Other smaller cleanups and fixes
* tag 'iommu-updates-v5.20-or-v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (116 commits)
iommu/amd: Fix compile warning in init code
iommu/amd: Add support for AVIC when SNP is enabled
iommu/amd: Simplify and Consolidate Virtual APIC (AVIC) Enablement
ACPI/IORT: Fix build error implicit-function-declaration
drivers: iommu: fix clang -wformat warning
iommu/arm-smmu: qcom_iommu: Add of_node_put() when breaking out of loop
iommu/arm-smmu-qcom: Add SM6375 SMMU compatible
dt-bindings: arm-smmu: Add compatible for Qualcomm SM6375
MAINTAINERS: Add Robin Murphy as IOMMU SUBSYTEM reviewer
iommu/amd: Do not support IOMMUv2 APIs when SNP is enabled
iommu/amd: Do not support IOMMU_DOMAIN_IDENTITY after SNP is enabled
iommu/amd: Set translation valid bit only when IO page tables are in use
iommu/amd: Introduce function to check and enable SNP
iommu/amd: Globally detect SNP support
iommu/amd: Process all IVHDs before enabling IOMMU features
iommu/amd: Introduce global variable for storing common EFR and EFR2
iommu/amd: Introduce Support for Extended Feature 2 Register
iommu/amd: Change macro for IOMMU control register bit shift to decimal value
iommu/exynos: Enable default VM instance on SysMMU v7
iommu/exynos: Add SysMMU v7 register set
...
Lin, Yang Shi, Anshuman Khandual and Mike Rapoport
- Some kmemleak fixes from Patrick Wang and Waiman Long
- DAMON updates from SeongJae Park
- memcg debug/visibility work from Roman Gushchin
- vmalloc speedup from Uladzislau Rezki
- more folio conversion work from Matthew Wilcox
- enhancements for coherent device memory mapping from Alex Sierra
- addition of shared pages tracking and CoW support for fsdax, from
Shiyang Ruan
- hugetlb optimizations from Mike Kravetz
- Mel Gorman has contributed some pagealloc changes to improve latency
and realtime behaviour.
- mprotect soft-dirty checking has been improved by Peter Xu
- Many other singleton patches all over the place
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYuravgAKCRDdBJ7gKXxA
jpqSAQDrXSdII+ht9kSHlaCVYjqRFQz/rRvURQrWQV74f6aeiAD+NHHeDPwZn11/
SPktqEUrF1pxnGQxqLh1kUFUhsVZQgE=
=w/UH
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
"Most of the MM queue. A few things are still pending.
Liam's maple tree rework didn't make it. This has resulted in a few
other minor patch series being held over for next time.
Multi-gen LRU still isn't merged as we were waiting for mapletree to
stabilize. The current plan is to merge MGLRU into -mm soon and to
later reintroduce mapletree, with a view to hopefully getting both
into 6.1-rc1.
Summary:
- The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe
Lin, Yang Shi, Anshuman Khandual and Mike Rapoport
- Some kmemleak fixes from Patrick Wang and Waiman Long
- DAMON updates from SeongJae Park
- memcg debug/visibility work from Roman Gushchin
- vmalloc speedup from Uladzislau Rezki
- more folio conversion work from Matthew Wilcox
- enhancements for coherent device memory mapping from Alex Sierra
- addition of shared pages tracking and CoW support for fsdax, from
Shiyang Ruan
- hugetlb optimizations from Mike Kravetz
- Mel Gorman has contributed some pagealloc changes to improve
latency and realtime behaviour.
- mprotect soft-dirty checking has been improved by Peter Xu
- Many other singleton patches all over the place"
[ XFS merge from hell as per Darrick Wong in
https://lore.kernel.org/all/YshKnxb4VwXycPO8@magnolia/ ]
* tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (282 commits)
tools/testing/selftests/vm/hmm-tests.c: fix build
mm: Kconfig: fix typo
mm: memory-failure: convert to pr_fmt()
mm: use is_zone_movable_page() helper
hugetlbfs: fix inaccurate comment in hugetlbfs_statfs()
hugetlbfs: cleanup some comments in inode.c
hugetlbfs: remove unneeded header file
hugetlbfs: remove unneeded hugetlbfs_ops forward declaration
hugetlbfs: use helper macro SZ_1{K,M}
mm: cleanup is_highmem()
mm/hmm: add a test for cross device private faults
selftests: add soft-dirty into run_vmtests.sh
selftests: soft-dirty: add test for mprotect
mm/mprotect: fix soft-dirty check in can_change_pte_writable()
mm: memcontrol: fix potential oom_lock recursion deadlock
mm/gup.c: fix formatting in check_and_migrate_movable_page()
xfs: fail dax mount if reflink is enabled on a partition
mm/memcontrol.c: remove the redundant updating of stats_flush_threshold
userfaultfd: don't fail on unrecognized features
hugetlb_cgroup: fix wrong hugetlb cgroup numa stat
...
New driver:
- logicvc
vfio:
- use aperture API
core:
- of: Add data-lane helpers and convert drivers
- connector: Remove deprecated ida_simple_get()
media:
- Add various RGB666 and RGB888 format constants
panel:
- Add HannStar HSD101PWW
- Add ETML0700Y5DHA
dma-buf:
- add sync-file API
- set dma mask for udmabuf devices
fbcon:
- Improve scrolling performance
- Sanitize input
fbdev:
- device unregistering fixes
- vesa: Support COMPILE_TEST
- Disable firmware-device registration when first native driver loads
aperture:
- fix segfault during hot-unplug
- export for use with other subsystems
client:
- use driver validated modes
dp:
- aux: make probing more reliable
- mst: Read extended DPCD capabilities during system resume
- Support waiting for HDP signal
- Port-validation fixes
edid:
- CEA data-block iterators
- struct drm_edid introduction
- implement HF-EEODB extension
gem:
- don't use fb format non-existing planes
probe-helper:
- use 640x480 as displayport fallback
scheduler:
- don't kill jobs in interrupt context
bridge:
- Add support for i.MX8qxp and i.MX8qm
- lots of fixes/cleanups
- Add TI-DLPC3433
- fy07024di26a30d: Optional GPIO reset
- ldb: Add reg and reg-name properties to bindings, Kconfig fixes
- lt9611: Fix display sensing;
- tc358767: DSI/DPI refactoring and DSI-to-eDP support, DSI lane handling
- tc358775: Fix clock settings
- ti-sn65dsi83: Allow GPIO to sleep
- adv7511: I2C fixes
- anx7625: Fix error handling; DPI fixes; Implement HDP timeout via callback
- fsl-ldb: Drop DE flip
- ti-sn65dsi86: Convert to atomic modesetting
amdgpu:
- use atomic fence helpers in DM
- fix VRAM address calculations
- export CRTC bpc via debugfs
- Initial devcoredump support
- Enable high priority gfx queue on asics which support it
- Adjust GART size on newer APUs for S/G display
- Soft reset for GFX 11 / SDMA 6
- Add gfxoff status query for vangogh
- Fix timestamps for cursor only commits
- Adjust GART size on newer APUs for S/G display
- fix buddy memory corruption
amdkfd:
- MMU notifier fixes
- P2P DMA support using dma-buf
- Add available memory IOCTL
- HMM profiler support
- Simplify GPUVM validation
- Unified memory for CWSR save/restore area
i915:
- General driver clean-up
- DG2 enabling (still under force probe)
- DG2 small BAR memory support
- HuC loading support
- DG2 workarounds
- DG2/ATS-M device IDs added
- Ponte Vecchio prep work and new blitter engines
- add Meteorlake support
- Fix sparse warnings
- DMC MMIO range checks
- Audio related fixes
- Runtime PM fixes
- PSR fixes
- Media freq factor and per-gt enhancements
- DSI fixes for ICL+
- Disable DMC flip queue handlers
- ADL_P voltage swing updates
- Use more the VBT for panel information
- Fix on Type-C ports with TBT mode
- Improve fastset and allow seamless M/N changes
- Accept more fixed modes with VRR/DMRRS panels
- Disable connector polling for a headless SKU
- ADL-S display PLL w/a
- Enable THP on Icelake and beyond
- Fix i915_gem_object_ggtt_pin_ww regression on old platforms
- Expose per tile media freq factor in sysfs
- Fix dma_resv fence handling in multi-batch execbuf
- Improve on suspend / resume time with VT-d enabled
- export CRTC bpc settings via debugfs
msm:
- gpu: a619 support
- gpu: Fix for unclocked GMU register access
- gpu: Devcore dump enhancements
- client utilization via fdinfo support
- fix fence rollover issue
- gem: Lockdep false-positive warning fix
- gem: Switch to pfn mappings
- WB support on sc7180
- dp: dropped custom bulk clock implementation
- fix link retraining on resolution change
- hdmi: dropped obsolete GPIO support
tegra:
- context isolation for host1x engines
- tegra234 soc support
mediatek:
- add vdosys0/1 for mt8195
- add MT8195 dp_intf driver
exynos:
- Fix resume function issue of exynos decon driver by calling
clk_disable_unprepare() properly if clk_prepare_enable() failed.
nouveau:
- set of misc fixes/cleanups
- display cleanups
gma500:
- Cleanup connector I2C handling
hyperv:
- Unify VRAM allocation of Gen1 and Gen2
meson:
- Support YUV422 output; Refcount fixes
mgag200:
- Support damage clipping
- Support gamma handling
- Protect concurrent HW access
- Fixes to connector
- Store model-specific limits in device-info structure
- fix PCI register init
panfrost:
- Valhall support
r128:
- Fix bit-shift overflow
rockchip:
- Locking fixes in error path
ssd130x:
- Fix built-in linkage
udl:
- Always advertize VGA connector
ast:
- Support multiple outputs
- fix black screen on resume
sun4i:
- HDMI PHY cleanups
vc4:
- Add support for BCM2711
vkms:
- Allocate output buffer with vmalloc()
mcde:
- Fix ref-count leak
mxsfb/lcdif:
- Support i.MX8MP LCD controller
stm/ltdc:
- Support dynamic Z order
- Support mirroring
ingenic:
- Fix display at maximum resolution
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmLp/7YACgkQDHTzWXnE
hr7NjhAAnefa+72EG42OAqajbwTQMENOtFfqyL3k6ueK2ciYbsj/wklw/xc4Ok3o
DM5kG54t+nA9L1M7UyE7eaO36/XcuvS8Ea0uKKkamWt+3Ux4g1Vo1J37nP5sK5jI
GT/wceKA5sk3nuYly2lBby6mVTGuhAX+3edTAFeOwmd0WvQzzpy4vV+nCAgfshUs
ql4gfQPdQdP+wiovUzCIEu6exCSCAI/Oc944fd3AJi5bZbOPFXRS4rMMOLSrdoXV
9P44EZExPbYrDuVUCx/UaZtN8D9myyyBfZe62CtdgNyTYUHXnHCBYue+7D/s5O+y
GaLWcP128MsqZNmJNhmcWFIlgqowO24YkKUH68JH0UtBLSWich8rfdEsrxIidYED
0ma1jodRapjyZOjrHEJ3N5deKpoflMmqvCMpvIk1Ev6pT8KX9a6u34kLgsOVCV41
2bDEYD+DbRW2FexGR79yB2huXHGSnguco6069ca1oy9RF4q8cX6Pb1w2u42oS7zX
lIgLIashilVR2AYg/qi6IPHavmOQ9ItSXPC+4YasYiMGp/mwePqpmL63b/wkhg0D
nXn6/F8Bm6wle2FFbkLGwo1fF1Hn7RzTHSlqRWDKSEaMLhCus6M09VsobFCB19i0
lO4FNVTL8ZtryR94bgVmgi616w9hOhDhM9A+C0kJ9KBkDnDYUJU=
=HQ9U
-----END PGP SIGNATURE-----
Merge tag 'drm-next-2022-08-03' of git://anongit.freedesktop.org/drm/drm
Pull drm updates from Dave Airlie:
"Highlights:
- New driver for logicvc - which is a display IP core.
- EDID parser rework to add new extensions
- fbcon scrolling improvements
- i915 has some more DG2 work but not enabled by default, but should
have enough features for userspace to work now.
Otherwise it's lots of work all over the place. Detailed summary:
New driver:
- logicvc
vfio:
- use aperture API
core:
- of: Add data-lane helpers and convert drivers
- connector: Remove deprecated ida_simple_get()
media:
- Add various RGB666 and RGB888 format constants
panel:
- Add HannStar HSD101PWW
- Add ETML0700Y5DHA
dma-buf:
- add sync-file API
- set dma mask for udmabuf devices
fbcon:
- Improve scrolling performance
- Sanitize input
fbdev:
- device unregistering fixes
- vesa: Support COMPILE_TEST
- Disable firmware-device registration when first native driver loads
aperture:
- fix segfault during hot-unplug
- export for use with other subsystems
client:
- use driver validated modes
dp:
- aux: make probing more reliable
- mst: Read extended DPCD capabilities during system resume
- Support waiting for HDP signal
- Port-validation fixes
edid:
- CEA data-block iterators
- struct drm_edid introduction
- implement HF-EEODB extension
gem:
- don't use fb format non-existing planes
probe-helper:
- use 640x480 as displayport fallback
scheduler:
- don't kill jobs in interrupt context
bridge:
- Add support for i.MX8qxp and i.MX8qm
- lots of fixes/cleanups
- Add TI-DLPC3433
- fy07024di26a30d: Optional GPIO reset
- ldb: Add reg and reg-name properties to bindings, Kconfig fixes
- lt9611: Fix display sensing;
- tc358767: DSI/DPI refactoring and DSI-to-eDP support, DSI lane handling
- tc358775: Fix clock settings
- ti-sn65dsi83: Allow GPIO to sleep
- adv7511: I2C fixes
- anx7625: Fix error handling; DPI fixes; Implement HDP timeout via callback
- fsl-ldb: Drop DE flip
- ti-sn65dsi86: Convert to atomic modesetting
amdgpu:
- use atomic fence helpers in DM
- fix VRAM address calculations
- export CRTC bpc via debugfs
- Initial devcoredump support
- Enable high priority gfx queue on asics which support it
- Adjust GART size on newer APUs for S/G display
- Soft reset for GFX 11 / SDMA 6
- Add gfxoff status query for vangogh
- Fix timestamps for cursor only commits
- Adjust GART size on newer APUs for S/G display
- fix buddy memory corruption
amdkfd:
- MMU notifier fixes
- P2P DMA support using dma-buf
- Add available memory IOCTL
- HMM profiler support
- Simplify GPUVM validation
- Unified memory for CWSR save/restore area
i915:
- General driver clean-up
- DG2 enabling (still under force probe)
- DG2 small BAR memory support
- HuC loading support
- DG2 workarounds
- DG2/ATS-M device IDs added
- Ponte Vecchio prep work and new blitter engines
- add Meteorlake support
- Fix sparse warnings
- DMC MMIO range checks
- Audio related fixes
- Runtime PM fixes
- PSR fixes
- Media freq factor and per-gt enhancements
- DSI fixes for ICL+
- Disable DMC flip queue handlers
- ADL_P voltage swing updates
- Use more the VBT for panel information
- Fix on Type-C ports with TBT mode
- Improve fastset and allow seamless M/N changes
- Accept more fixed modes with VRR/DMRRS panels
- Disable connector polling for a headless SKU
- ADL-S display PLL w/a
- Enable THP on Icelake and beyond
- Fix i915_gem_object_ggtt_pin_ww regression on old platforms
- Expose per tile media freq factor in sysfs
- Fix dma_resv fence handling in multi-batch execbuf
- Improve on suspend / resume time with VT-d enabled
- export CRTC bpc settings via debugfs
msm:
- gpu: a619 support
- gpu: Fix for unclocked GMU register access
- gpu: Devcore dump enhancements
- client utilization via fdinfo support
- fix fence rollover issue
- gem: Lockdep false-positive warning fix
- gem: Switch to pfn mappings
- WB support on sc7180
- dp: dropped custom bulk clock implementation
- fix link retraining on resolution change
- hdmi: dropped obsolete GPIO support
tegra:
- context isolation for host1x engines
- tegra234 soc support
mediatek:
- add vdosys0/1 for mt8195
- add MT8195 dp_intf driver
exynos:
- Fix resume function issue of exynos decon driver by calling
clk_disable_unprepare() properly if clk_prepare_enable() failed.
nouveau:
- set of misc fixes/cleanups
- display cleanups
gma500:
- Cleanup connector I2C handling
hyperv:
- Unify VRAM allocation of Gen1 and Gen2
meson:
- Support YUV422 output; Refcount fixes
mgag200:
- Support damage clipping
- Support gamma handling
- Protect concurrent HW access
- Fixes to connector
- Store model-specific limits in device-info structure
- fix PCI register init
panfrost:
- Valhall support
r128:
- Fix bit-shift overflow
rockchip:
- Locking fixes in error path
ssd130x:
- Fix built-in linkage
udl:
- Always advertize VGA connector
ast:
- Support multiple outputs
- fix black screen on resume
sun4i:
- HDMI PHY cleanups
vc4:
- Add support for BCM2711
vkms:
- Allocate output buffer with vmalloc()
mcde:
- Fix ref-count leak
mxsfb/lcdif:
- Support i.MX8MP LCD controller
stm/ltdc:
- Support dynamic Z order
- Support mirroring
ingenic:
- Fix display at maximum resolution"
* tag 'drm-next-2022-08-03' of git://anongit.freedesktop.org/drm/drm: (1480 commits)
drm/amd/display: Fix a compilation failure on PowerPC caused by FPU code
drm/amdgpu: enable support for psp 13.0.4 block
drm/amdgpu: add files for PSP 13.0.4
drm/amdgpu: add header files for MP 13.0.4
drm/amdgpu: correct RLC_RLCS_BOOTLOAD_STATUS offset and index
drm/amdgpu: send msg to IMU for the front-door loading
drm/amdkfd: use time_is_before_jiffies(a + b) to replace "jiffies - a > b"
drm/amdgpu: fix hive reference leak when reflecting psp topology info
drm/amd/pm: enable GFX ULV feature support for SMU13.0.0
drm/amd/pm: update driver if header for SMU 13.0.0
drm/amdgpu: move mes self test after drm sched re-started
drm/amdgpu: drop non-necessary call trace dump
drm/amdgpu: enable VCN cg and JPEG cg/pg
drm/amdgpu: vcn_4_0_2 video codec query
drm/amdgpu: add VCN_4_0_2 firmware support
drm/amdgpu: add VCN function in NBIO v7.7
drm/amdgpu: fix a vcn4 boot poll bug in emulation mode
drm/amd/amdgpu: add memory training support for PSP_V13
drm/amdkfd: remove an unnecessary amdgpu_bo_ref
drm/amd/pm: Add get_gfx_off_status interface for yellow carp
...
- Fix an accounting bug that made NR_FILE_DIRTY grow without limit
when running xfstests
- Convert more of mpage to use folios
- Remove add_to_page_cache() and add_to_page_cache_locked()
- Convert find_get_pages_range() to filemap_get_folios()
- Improvements to the read_cache_page() family of functions
- Remove a few unnecessary checks of PageError
- Some straightforward filesystem conversions to use folios
- Split PageMovable users out from address_space_operations into their
own movable_operations
- Convert aops->migratepage to aops->migrate_folio
- Remove nobh support (Christoph Hellwig)
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEejHryeLBw/spnjHrDpNsjXcpgj4FAmLpViQACgkQDpNsjXcp
gj5pBgf/f3+K7Hi3qw7aYQCYJQ7IA/bLyE/DLWI59kuiao6wDSve40B9YH9X++Ha
mRLp55bkQS+bwS2xa4jlqrIDJzAfNoWlXaXZHUXGL1C/52ChTF6jaH2cvO9PVlDS
7fLv1hy2LwiIdzpKJkUW7T+kcQGj3QLKqtQ4x8zD0LGMg055yvt/qndHSUi41nWT
/58+6W8Sk4vvRgkpeChFzF1lGLy00+FGT8y5V2kM9uRliFQ7XPCwqB2a3e5jbW6z
C1NXQmRnopCrnOT1TFIhK3DyX6MDIWV5qcikNAmCKFb9fQFPmjDLPt9iSoMGjw2M
Z+UVhJCaU3ISccd0DG5Ra/vzs9/O9Q==
=DgUi
-----END PGP SIGNATURE-----
Merge tag 'folio-6.0' of git://git.infradead.org/users/willy/pagecache
Pull folio updates from Matthew Wilcox:
- Fix an accounting bug that made NR_FILE_DIRTY grow without limit
when running xfstests
- Convert more of mpage to use folios
- Remove add_to_page_cache() and add_to_page_cache_locked()
- Convert find_get_pages_range() to filemap_get_folios()
- Improvements to the read_cache_page() family of functions
- Remove a few unnecessary checks of PageError
- Some straightforward filesystem conversions to use folios
- Split PageMovable users out from address_space_operations into
their own movable_operations
- Convert aops->migratepage to aops->migrate_folio
- Remove nobh support (Christoph Hellwig)
* tag 'folio-6.0' of git://git.infradead.org/users/willy/pagecache: (78 commits)
fs: remove the NULL get_block case in mpage_writepages
fs: don't call ->writepage from __mpage_writepage
fs: remove the nobh helpers
jfs: stop using the nobh helper
ext2: remove nobh support
ntfs3: refactor ntfs_writepages
mm/folio-compat: Remove migration compatibility functions
fs: Remove aops->migratepage()
secretmem: Convert to migrate_folio
hugetlb: Convert to migrate_folio
aio: Convert to migrate_folio
f2fs: Convert to filemap_migrate_folio()
ubifs: Convert to filemap_migrate_folio()
btrfs: Convert btrfs_migratepage to migrate_folio
mm/migrate: Add filemap_migrate_folio()
mm/migrate: Convert migrate_page() to migrate_folio()
nfs: Convert to migrate_folio
btrfs: Convert btree_migratepage to migrate_folio
mm/migrate: Convert expected_page_refs() to folio_expected_refs()
mm/migrate: Convert buffer_migrate_page() to buffer_migrate_folio()
...
The obj->base.resv may be shared across many objects, some of which may
still be live and locked, preventing objects from being freed
indefintely. We could individualise the lock during the free, or rely on
a freed object having no contention and being able to immediately free
the pages it owns.
References: https://gitlab.freedesktop.org/drm/intel/-/issues/6469
Fixes: be7612fd66 ("drm/i915: Require object lock when freeing pages during destruction")
Fixes: 6cb12fbda1 ("drm/i915: Use trylock instead of blocking lock for __i915_gem_free_objects.")
Cc: <stable@vger.kernel.org> # v5.17+
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Nirmoy Das <nirmoy.das@intel.com>
Acked-by: Nirmoy Das <nirmoy.das@intel.com>
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220726144844.18429-1-nirmoy.das@intel.com
Convert all callers to pass a folio. Most have the folio
already available. Switch all users from aops->migratepage to
aops->migrate_folio. Also turn the documentation into kerneldoc.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: David Sterba <dsterba@suse.com>
Invalidate TLB in batches, in order to reduce performance regressions.
Currently, every caller performs a full barrier around a TLB
invalidation, ignoring all other invalidations that may have already
removed their PTEs from the cache. As this is a synchronous operation
and can be quite slow, we cause multiple threads to contend on the TLB
invalidate mutex blocking userspace.
We only need to invalidate the TLB once after replacing our PTE to
ensure that there is no possible continued access to the physical
address before releasing our pages. By tracking a seqno for each full
TLB invalidate we can quickly determine if one has been performed since
rewriting the PTE, and only if necessary trigger one for ourselves.
That helps to reduce the performance regression introduced by TLB
invalidate logic.
[mchehab: rebased to not require moving the code to a separate file]
Cc: stable@vger.kernel.org
Fixes: 7938d61591 ("drm/i915: Flush TLBs before releasing backing store")
Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/4e97ef5deb6739cadaaf40aa45620547e9c4ec06.1658924372.git.mchehab@kernel.org
Check if the device is powered down prior to any engine activity,
as, on such cases, all the TLBs were already invalidated, so an
explicit TLB invalidation is not needed, thus reducing the
performance regression impact due to it.
This becomes more significant with GuC, as it can only do so when
the connection to the GuC is awake.
Cc: stable@vger.kernel.org
Fixes: 7938d61591 ("drm/i915: Flush TLBs before releasing backing store")
Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/278a57a672edac75683f0818b292e95da583a5fe.1658924372.git.mchehab@kernel.org
We report object allocation failures to userspace with ENOMEM, yet we
still show the memory warning after failing to shrink device allocated
pages. While this warning is similar to other system page allocation
failures, it is superfluous to the ENOMEM provided directly to
userspace.
v2: Add NOWARN in few more places from where we might return
ENOMEM to userspace.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/4936
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Co-developed-by: Nirmoy Das <nirmoy.das@intel.com>
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220727174023.16766-1-nirmoy.das@intel.com
- All related to the Small BAR support: (and all by Matt Auld)
* add probed_cpu_visible_size
* expose the avail memory region tracking
* apply ALLOC_GPU only by default
* add NEEDS_CPU_ACCESS hint
* tweak error capture on recoverable contexts
Driver highlights:
- Add Small BAR support (Matt)
- Add MeteorLake support (RK)
- Add support for LMEM PCIe resizable BAR (Akeem)
Driver important fixes:
- ttm related fixes (Matt Auld)
- Fix a performance regression related to waitboost (Chris)
- Fix GT resets (Chris)
Driver others:
- Adding GuC SLPC selftest (Vinay)
- Fix ADL-N GuC load (Daniele)
- Add platform workaround (Gustavo, Matt Roper)
- DG2 and ATS-M device ID updates (Matt Roper)
- Add VM_BIND doc rfc with uAPI documentation (Niranjana)
- Fix user-after-free in vma destruction (Thomas)
- Async flush of GuC log regions (Alan)
- Fixes in selftests (Chris, Dan, Andrzej)
- Convert to drm_dbg (Umesh)
- Disable OA sseu config param for newer hardware (Umesh)
- Multi-cast register steering changes (Matt Roper)
- Add lmem_bar_size modparam (Priyanka)
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEbSBwaO7dZQkcLOKj+mJfZA7rE8oFAmLPOTUACgkQ+mJfZA7r
E8rnPQf+NcelYE/Zmt5ExAkBoqECHBHvWbAZsJ4Ve7pxFQMoCn1vMWcpzHhizD8m
ga9ZLQ8cURcNst0Zrc73LLNPtZyqGu3tn9dhw+atIkmFjb4nJ40gvSyEwuPVOYiD
Z0JjcmU9cUfPiYTadHoY+yYGPh9LkF61nOxVZdEJ/ykeQZ1ThgDlFRpjfw7pteMp
SET3Iy1pZXGlJMmlOU9fJXivyN3yO6fO8xY2mqhiPbtf/HUapVpovNYleYBtJt3i
zS52DnuGGsC/gY+rhKfaQE/BfsXBhQK5fy39oZLKRz8uL7PXzYVV6bMAwnODqPwq
Q5+4C2J9M8jChfrkRJyD0TK6U80Zlw==
=l7gi
-----END PGP SIGNATURE-----
Merge tag 'drm-intel-gt-next-2022-07-13' of git://anongit.freedesktop.org/drm/drm-intel into drm-next
Driver uAPI changes:
- All related to the Small BAR support: (and all by Matt Auld)
* add probed_cpu_visible_size
* expose the avail memory region tracking
* apply ALLOC_GPU only by default
* add NEEDS_CPU_ACCESS hint
* tweak error capture on recoverable contexts
Driver highlights:
- Add Small BAR support (Matt)
- Add MeteorLake support (RK)
- Add support for LMEM PCIe resizable BAR (Akeem)
Driver important fixes:
- ttm related fixes (Matt Auld)
- Fix a performance regression related to waitboost (Chris)
- Fix GT resets (Chris)
Driver others:
- Adding GuC SLPC selftest (Vinay)
- Fix ADL-N GuC load (Daniele)
- Add platform workaround (Gustavo, Matt Roper)
- DG2 and ATS-M device ID updates (Matt Roper)
- Add VM_BIND doc rfc with uAPI documentation (Niranjana)
- Fix user-after-free in vma destruction (Thomas)
- Async flush of GuC log regions (Alan)
- Fixes in selftests (Chris, Dan, Andrzej)
- Convert to drm_dbg (Umesh)
- Disable OA sseu config param for newer hardware (Umesh)
- Multi-cast register steering changes (Matt Roper)
- Add lmem_bar_size modparam (Priyanka)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/Ys85pcMYLkqF/HtB@intel.com
Since segment_pages is no longer a compile time constant, it looks the
DIV_ROUND_UP(node->size, segment_pages) breaks the 32b build. Simplest
is just to use the ULL variant, but really we should need not need more
than u32 for the page alignment (also we are limited by that due to the
sg->length type), so also make it all u32.
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Fixes: aff1e0b09b ("drm/i915/ttm: fix sg_table construction")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Nirmoy Das <nirmoy.das@linux.intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220712174050.592550-1-matthew.auld@intel.com
(cherry picked from commit 9306b2b2df)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
intel-iommu.h is not needed in drm/i915 anymore. Remove its include.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Link: https://lore.kernel.org/r/20220514014322.2927339-5-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Since segment_pages is no longer a compile time constant, it looks the
DIV_ROUND_UP(node->size, segment_pages) breaks the 32b build. Simplest
is just to use the ULL variant, but really we should need not need more
than u32 for the page alignment (also we are limited by that due to the
sg->length type), so also make it all u32.
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Fixes: bc99f1209f ("drm/i915/ttm: fix sg_table construction")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Nirmoy Das <nirmoy.das@linux.intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220712174050.592550-1-matthew.auld@intel.com
We employ a "waitboost" heuristic to detect when userspace is stalled
waiting for results from earlier execution. Under latency sensitive work
mixed between the gpu/cpu, the GPU is typically under-utilised and so
RPS sees that low utilisation as a reason to downclock the frequency,
causing longer stalls and lower throughput. The user left waiting for
the results is not impressed.
On applying commit 047a1b877e ("dma-buf & drm/amdgpu: remove dma_resv
workaround") it was observed that deinterlacing h264 on Haswell
performance dropped by 2-5x. The reason being that the natural workload
was not intense enough to trigger RPS (using HW evaluation intervals) to
upclock, and so it was depending on waitboosting for the throughput.
Commit 047a1b877e ("dma-buf & drm/amdgpu: remove dma_resv workaround")
changes the composition of dma-resv from keeping a single write fence +
multiple read fences, to a single array of multiple write and read
fences (a maximum of one pair of write/read fences per context). The
iteration order was also changed implicitly from all-read fences then
the single write fence, to a mix of write fences followed by read
fences. It is that ordering change that belied the fragility of
waitboosting.
Currently, a waitboost is inspected at the point of waiting on an
outstanding fence. If the GPU is backlogged such that we haven't yet
stated the request we need to wait on, we force the GPU to upclock until
the completion of that request. By changing the order in which we waited
upon requests, we ended up waiting on those requests in sequence and as
such we saw that each request was already started and so not a suitable
candidate for waitboosting.
Instead of asking whether to boost each fence in turn, we can look at
whether boosting is required for the dma-resv ensemble prior to waiting
on any fence, making the heuristic more robust to the order in which
fences are stored in the dma-resv.
Reported-by: Thomas Voegtle <tv@lio96.de>
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6284
Fixes: 047a1b877e ("dma-buf & drm/amdgpu: remove dma_resv workaround")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Karolina Drobnik <karolina.drobnik@intel.com>
Tested-by: Thomas Voegtle <tv@lio96.de>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/07e05518d9f6620d20cc1101ec1849203fe973f9.1657289332.git.karolina.drobnik@intel.com
(cherry picked from commit 394e2b57a9)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
If we encounter some monster sized local-memory page that exceeds the
maximum sg length (UINT32_MAX), ensure that don't end up with some
misaligned address in the entry that follows, leading to fireworks
later. Also ensure we have some coverage of this in the selftests.
v2(Chris):
- Use round_down consistently to avoid udiv errors
v3(Nirmoy):
- Also update the max_segment in the selftest
Fixes: f701b16d4c ("drm/i915/ttm: add i915_sg_from_buddy_resource")
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6379
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Nirmoy Das <nirmoy.das@linux.intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220711085859.24198-1-matthew.auld@intel.com
(cherry picked from commit bc99f1209f)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
We employ a "waitboost" heuristic to detect when userspace is stalled
waiting for results from earlier execution. Under latency sensitive work
mixed between the gpu/cpu, the GPU is typically under-utilised and so
RPS sees that low utilisation as a reason to downclock the frequency,
causing longer stalls and lower throughput. The user left waiting for
the results is not impressed.
On applying commit 047a1b877e ("dma-buf & drm/amdgpu: remove dma_resv
workaround") it was observed that deinterlacing h264 on Haswell
performance dropped by 2-5x. The reason being that the natural workload
was not intense enough to trigger RPS (using HW evaluation intervals) to
upclock, and so it was depending on waitboosting for the throughput.
Commit 047a1b877e ("dma-buf & drm/amdgpu: remove dma_resv workaround")
changes the composition of dma-resv from keeping a single write fence +
multiple read fences, to a single array of multiple write and read
fences (a maximum of one pair of write/read fences per context). The
iteration order was also changed implicitly from all-read fences then
the single write fence, to a mix of write fences followed by read
fences. It is that ordering change that belied the fragility of
waitboosting.
Currently, a waitboost is inspected at the point of waiting on an
outstanding fence. If the GPU is backlogged such that we haven't yet
stated the request we need to wait on, we force the GPU to upclock until
the completion of that request. By changing the order in which we waited
upon requests, we ended up waiting on those requests in sequence and as
such we saw that each request was already started and so not a suitable
candidate for waitboosting.
Instead of asking whether to boost each fence in turn, we can look at
whether boosting is required for the dma-resv ensemble prior to waiting
on any fence, making the heuristic more robust to the order in which
fences are stored in the dma-resv.
Reported-by: Thomas Voegtle <tv@lio96.de>
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6284
Fixes: 047a1b877e ("dma-buf & drm/amdgpu: remove dma_resv workaround")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Karolina Drobnik <karolina.drobnik@intel.com>
Tested-by: Thomas Voegtle <tv@lio96.de>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/07e05518d9f6620d20cc1101ec1849203fe973f9.1657289332.git.karolina.drobnik@intel.com
If we encounter some monster sized local-memory page that exceeds the
maximum sg length (UINT32_MAX), ensure that don't end up with some
misaligned address in the entry that follows, leading to fireworks
later. Also ensure we have some coverage of this in the selftests.
v2(Chris):
- Use round_down consistently to avoid udiv errors
v3(Nirmoy):
- Also update the max_segment in the selftest
Fixes: f701b16d4c ("drm/i915/ttm: add i915_sg_from_buddy_resource")
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6379
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Nirmoy Das <nirmoy.das@linux.intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220711085859.24198-1-matthew.auld@intel.com
Rename ttm_bo_init to ttm_bo_init_validate since that better matches
what the function is actually doing.
Remove the unused size parameter, move the function's kerneldoc to the
implementation and cleanup the whole error handling.
Signed-off-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220707102453.3633-2-christian.koenig@amd.com
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Currently shrinkers are anonymous objects. For debugging purposes they
can be identified by count/scan function names, but it's not always
useful: e.g. for superblock's shrinkers it's nice to have at least an
idea of to which superblock the shrinker belongs.
This commit adds names to shrinkers. register_shrinker() and
prealloc_shrinker() functions are extended to take a format and arguments
to master a name.
In some cases it's not possible to determine a good name at the time when
a shrinker is allocated. For such cases shrinker_debugfs_rename() is
provided.
The expected format is:
<subsystem>-<shrinker_type>[:<instance>]-<id>
For some shrinkers an instance can be encoded as (MAJOR:MINOR) pair.
After this change the shrinker debugfs directory looks like:
$ cd /sys/kernel/debug/shrinker/
$ ls
dquota-cache-16 sb-devpts-28 sb-proc-47 sb-tmpfs-42
mm-shadow-18 sb-devtmpfs-5 sb-proc-48 sb-tmpfs-43
mm-zspool:zram0-34 sb-hugetlbfs-17 sb-pstore-31 sb-tmpfs-44
rcu-kfree-0 sb-hugetlbfs-33 sb-rootfs-2 sb-tmpfs-49
sb-aio-20 sb-iomem-12 sb-securityfs-6 sb-tracefs-13
sb-anon_inodefs-15 sb-mqueue-21 sb-selinuxfs-22 sb-xfs:vda1-36
sb-bdev-3 sb-nsfs-4 sb-sockfs-8 sb-zsmalloc-19
sb-bpf-32 sb-pipefs-14 sb-sysfs-26 thp-deferred_split-10
sb-btrfs:vda2-24 sb-proc-25 sb-tmpfs-1 thp-zero-9
sb-cgroup2-30 sb-proc-39 sb-tmpfs-27 xfs-buf:vda1-37
sb-configfs-23 sb-proc-41 sb-tmpfs-29 xfs-inodegc:vda1-38
sb-dax-11 sb-proc-45 sb-tmpfs-35
sb-debugfs-7 sb-proc-46 sb-tmpfs-40
[roman.gushchin@linux.dev: fix build warnings]
Link: https://lkml.kernel.org/r/Yr+ZTnLb9lJk6fJO@castle
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lkml.kernel.org/r/20220601032227.4076670-4-roman.gushchin@linux.dev
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Falling back to memcpy/memset shouldn't be allowed if we know we have
CCS state to manage using the blitter. Otherwise we are potentially
leaving the aux CCS state in an unknown state, which smells like an info
leak.
Fixes: 48760ffe92 ("drm/i915/gt: Clear compress metadata for Flat-ccs objects")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Cc: Ramalingam C <ramalingam.c@intel.com>
Reviewed-by: Ramalingam C <ramalingam.c@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220629174350.384910-12-matthew.auld@intel.com
If the move or clear operation somehow fails, and the memory underneath
is not cleared, like when moving to lmem, then we currently fallback to
memcpy or memset. However with small-BAR systems this fallback might no
longer be possible. For now we use the set_wedged sledgehammer if we
ever encounter such a scenario, and mark the object as borked to plug
any holes where access to the memory underneath can happen. Add some
basic selftests to exercise this.
v2:
- In the selftests make sure we grab the runtime pm around the reset.
Also make sure we grab the reset lock before checking if the device
is wedged, since the wedge might still be in-progress and hence the
bit might not be set yet.
- Don't wedge or put the object into an unknown state, if the request
construction fails (or similar). Just returning an error and
skipping the fallback should be safe here.
- Make sure we wedge each gt. (Thomas)
- Peek at the unknown_state in io_reserve, that way we don't have to
export or hand roll the fault_wait_for_idle. (Thomas)
- Add the missing read-side barriers for the unknown_state. (Thomas)
- Some kernel-doc fixes. (Thomas)
v3:
- Tweak the ordering of the set_wedged, also add FIXME.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220629174350.384910-11-matthew.auld@intel.com
We should always be explicit and allocate a fence slot before adding a
new fence.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220629174350.384910-10-matthew.auld@intel.com
It's not supported, and just skips later anyway. With small-BAR things
get more complicated since all of stolen is likely not even CPU
accessible, hence not passing I915_BO_ALLOC_GPU_ONLY just results in the
object create failing.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220629174350.384910-9-matthew.auld@intel.com
A non-recoverable context must be used if the user wants proper error
capture on discrete platforms. In the future the kernel may want to blit
the contents of some objects when later doing the capture stage. Also
extend to newer integrated platforms.
v2(Thomas):
- Also extend to newer integrated platforms, for capture buffer memory
allocation purposes.
v3 (Reported-by: kernel test robot <lkp@intel.com>):
- Fix build on !CONFIG_DRM_I915_CAPTURE_ERROR
Testcase: igt@gem_exec_capture@capture-recoverable
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220629174350.384910-8-matthew.auld@intel.com
If set, force the allocation to be placed in the mappable portion of
I915_MEMORY_CLASS_DEVICE. One big restriction here is that system memory
(i.e I915_MEMORY_CLASS_SYSTEM) must be given as a potential placement for the
object, that way we can always spill the object into system memory if we
can't make space.
Testcase: igt@gem-create@create-ext-cpu-access-sanity-check
Testcase: igt@gem-create@create-ext-cpu-access-big
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220629174350.384910-6-matthew.auld@intel.com
On small BAR configurations, when dealing with I915_MEMORY_CLASS_DEVICE
allocations, we assume that by default, all userspace allocations should
be placed in the non-CPU visible portion. Note that dumb buffers are
not included here, since these are not "GPU accelerated" and likely need
CPU access. We choose to just always set GPU_ONLY, and let the backend
figure out if that should be ignored or not, for example on full BAR
systems.
In a later patch userspace will be able to provide a hint if CPU access
to the buffer is needed.
v2(Thomas)
- Apply GPU_ONLY on all discrete devices, but only if the BO can be
placed in LMEM. Down in the depths this should be turned into a noop,
where required, and as an annotation it still make some sense. If we
apply it regardless of the placements then we end up needing to check
the placements during exec capture. Also it's slightly inconsistent
since the NEEDS_CPU_ACCESS can only be applied on objects that can be
placed in LMEM. The other annoyance would be gem_create_ext vs plain
gem_create, if we were to always apply GPU_ONLY.
Testcase: igt@gem-create@create-ext-cpu-access-sanity-check
Testcase: igt@gem-create@create-ext-cpu-access-big
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220629174350.384910-5-matthew.auld@intel.com
UAPI Changes:
- Expose per tile media freq factor in sysfs (Ashutosh Dixit, Dale B Stimson)
- Document memory residency and Flat-CCS capability of obj (Ramalingam C)
- Disable GETPARAM lookups of I915_PARAM_[SUB]SLICE_MASK on Xe_HP+ (Matt Roper)
Cross-subsystem Changes:
- Rename intel-gtt symbols (Lucas De Marchi)
Core Changes:
Driver Changes:
- Support programming the EU priority in the GuC descriptor (DG2) (Matthew Brost)
- DG2 HuC loading support (Daniele Ceraolo Spurio)
- Fix build error without CONFIG_PM (YueHaibing)
- Enable THP on Icelake and beyond (Tvrtko Ursulin)
- Only setup private tmpfs mount when needed and fix logging (Tvrtko Ursulin)
- Make __guc_reset_context aware of guilty engines (Umesh Nerlige Ramappa)
- DG2 small bar memory probing fixes (Nirmoy Das)
- Remove unnecessary GuC err capture noise (Alan Previn)
- Fix i915_gem_object_ggtt_pin_ww regression on old platforms (Maarten Lankhorst)
- Fix undefined behavior in GuC backend due to shift overflowing the constant (Borislav Petkov)
- New DG2 workarounds (Swathi Dhanavanthri, Anshuman Gupta)
- Report no hwconfig support on ADL-N (Balasubramani Vivekanandan)
- Fix error_state_read ptr + offset use (Alan Previn)
- Expose per tile media freq factor in sysfs (Ashutosh Dixit, Dale B Stimson)
- Fix memory leaks in per-gt sysfs (Ashutosh Dixit)
- Fix dma_resv fence handling in multi-batch execbuf (Nirmoy Das)
- Add extra registers to GPU error dump on Gen11+ (Stuart Summers)
- More PVC+DG2 workarounds (Matt Roper)
- Improve user experience and driver robustness under SIGINT or similar (Tvrtko Ursulin)
- Don't show engine classes not present (Tvrtko Ursulin)
- Improve on suspend / resume time with VT-d enabled (Thomas Hellström)
- Add missing else (katrinzhou)
- Don't leak lmem mapping in vma_evict (Juha-Pekka Heikkila)
- Add smem fallback allocation for dpt (Juha-Pekka Heikkila)
- Tweak the ordering in cpu_write_needs_clflush (Matthew Auld)
- Do not access rq->engine without a reference (Niranjana Vishwanathapura)
- Revert "drm/i915: Hold reference to intel_context over life of i915_request" (Niranjana Vishwanathapura)
- Don't update engine busyness stats too frequently (Alan Previn)
- Add additional steps for Wa_22011802037 for execlist backend (Umesh Nerlige Ramappa)
- Fix a lockdep warning at error capture (Nirmoy Das)
- Ponte Vecchio prep work and new blitter engines (Matt Roper, John Harrison, Lucas De Marchi)
- Read correct RP_STATE_CAP register (PVC) (Matt Roper)
- Define MOCS table for PVC (Ayaz A Siddiqui)
- Driver refactor and support Ponte Vecchio forcewake handling (Matt Roper)
- Remove additional 3D flags from PIPE_CONTROL (Ponte Vecchio) (Stuart Summers)
- XEHPSDV and PVC do not use HuC (Daniele Ceraolo Spurio)
- Extract stepping information from PCI revid (Ponte Vecchio) (Matt Roper)
- Add initial PVC workarounds (Stuart Summers)
- SSEU handling driver refactor and Ponte Vecchio support (Matt Roper)
- GuC depriv applies to PVC (Matt Roper)
- Add register steering (Ponte Vecchio) (Matt Roper)
- Add recommended MMIO setting (Ponte Vecchio) (Matt Roper)
- Move multicast register handling to a dedicated file (Matt Roper)
- Cleanup interface for MCR operations (Matt Roper)
- Extend i915_vma_pin_iomap() (CQ Tang)
- Re-do the intel-gtt split (Lucas De Marchi)
- Correct duplicated/misplaced GT register definitions (Matt Roper)
- Prefer "XEHP_" prefix for registers (Matt Roper)
- Don't use DRM_DEBUG_WARN_ON for unexpected l3bank/mslice config (Tvrtko Ursulin)
- Don't use DRM_DEBUG_WARN_ON for ring unexpectedly not idle (Tvrtko Ursulin)
- Make drop_pages() return bool (Lucas De Marchi)
- Fix CFI violation with show_dynamic_id() (Nathan Chancellor)
- Use i915_probe_error instead of drm_error in GuC code (Vinay Belgaumkar)
- Fix use of static in macro mismatch (Andi Shyti)
- Update tiled blits selftest (Bommu Krishnaiah)
- Future-proof platform checks (Matt Roper)
- Only include what's needed (Jani Nikula)
- remove accidental static from a local variable (Jani Nikula)
- Add global forcewake request to drpc (Vinay Belgaumkar)
- Fix spelling typo in comment (pengfuyuan)
- Increase timeout for live_parallel_switch selftest (Akeem G Abodunrin)
- Use non-blocking H2G for waitboost (Vinay Belgaumkar)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/YrwtLM081SQUG1Dc@tursulin-desk
For imported dma-buf objects we leave the object as cache_coherent = 0
across all platforms, which is reasonable given that have no clue what
the memory underneath is, and its not like the driver can ever manually
clflush the pages anyway (like with i915_gem_clflush_object) for such
objects. However on discrete we choose to treat cache_dirty = true as a
programmer error, leading to a warning. The simplest fix looks to be to
just change the ordering in cpu_write_needs_clflush to prevent ever
setting cache_dirty for dma-buf objects on discrete.
Fixes: d028a7690d ("drm/i915/dmabuf: Fix prime_mmap to work when using LMEM")
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/5266
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Reviewed-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220622155919.355081-1-matthew.auld@intel.com
(cherry picked from commit 563aaf4a92)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
We've been introducing new registers with a mix of "XEHP_"
(architecture) and "XEHPSDV_" (platform) prefixes. For consistency,
let's settle on "XEHP_" as the preferred form.
XEHPSDV_RP_STATE_CAP stays with its current name since that's truly a
platform-specific register and not something that applies to the Xe_HP
architecture as a whole.
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Caz Yokoyama <caz@caztech.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220624210328.308630-2-matthew.d.roper@intel.com
XEHPSDV_FLAT_CCS_BASE_ADDR, GEN8_L3_LRA_1_GPGPU, and MMCD_MISC_CTRL were
duplicated between i915_reg.h and intel_gt_regs.h. These are all GT
registers, so we should drop the copy from i915_reg.h.
XEHPSDV_TILE0_ADDR_RANGE was defined in i915_reg.h, but really belongs
in intel_gt_regs.h. Move it.
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220624210328.308630-1-matthew.d.roper@intel.com
For imported dma-buf objects we leave the object as cache_coherent = 0
across all platforms, which is reasonable given that have no clue what
the memory underneath is, and its not like the driver can ever manually
clflush the pages anyway (like with i915_gem_clflush_object) for such
objects. However on discrete we choose to treat cache_dirty = true as a
programmer error, leading to a warning. The simplest fix looks to be to
just change the ordering in cpu_write_needs_clflush to prevent ever
setting cache_dirty for dma-buf objects on discrete.
Fixes: d028a7690d ("drm/i915/dmabuf: Fix prime_mmap to work when using LMEM")
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/5266
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Reviewed-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220622155919.355081-1-matthew.auld@intel.com
Let's replace the assortment of intel_gt_* and intel_uncore_* functions
that operate on MCR registers with a cleaner set of interfaces:
* intel_gt_mcr_read -- unicast read from specific instance
* intel_gt_mcr_read_any[_fw] -- unicast read from any non-terminated
instance
* intel_gt_mcr_unicast_write -- unicast write to specific instance
* intel_gt_mcr_multicast_write[_fw] -- multicast write to all instances
We'll also replace the historic "slice" and "subslice" terminology with
"group" and "instance" to match the documentation for more recent
platforms; these days MCR steering applies to more types of replication
than just slice/subslice.
v2:
- Reference the new kerneldoc from i915.rst. (Jani)
- Tweak the wording of the documentation for a couple functions to
clarify the difference between "_fw" and non-"_fw" forms.
v3:
- s/read/write/ to fix copy-paste mistake in a couple comments.
(Harish)
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Acked-by: Jani Nikula <jani.nikula@linux.intel.com>
Reviewed-by: Harish Chegondi <harish.chegondi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220615001019.1821989-3-matthew.d.roper@intel.com
Handling of multicast/replicated registers is spread across intel_gt.c
and intel_uncore.c today. As multicast handling and the related
steering logic gets more complicated with the addition of new platforms
and new rules it makes sense to centralize it all in one place.
For now the existing functions have been moved to the new .c/.h as-is.
Function renames and updates to operate in a more consistent manner will
be done in subsequent patches.
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Acked-by: Jani Nikula <jani.nikula@linux.intel.com>
Reviewed-by: Harish Chegondi <harish.chegondi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220615001019.1821989-2-matthew.d.roper@intel.com
We have long standing customer complaints that pressing Ctrl-C (or to the
effect of) causes engine resets with otherwise well behaving programs.
Not only is logging engine resets during normal operation not desirable
since it creates support incidents, but more fundamentally we should avoid
going the engine reset path when we can since any engine reset introduces
a chance of harming an innocent context.
Reason for this undesirable behaviour is that the driver currently does
not distinguish between banned contexts and non-persistent contexts which
have been closed.
To fix this we add the distinction between the two reasons for revoking
contexts, which then allows the strict timeout only be applied to banned,
while innocent contexts (well behaving) can preempt cleanly and exit
without triggering the engine reset path.
Note that the added context exiting category applies both to closed non-
persistent context, and any exiting context when hangcheck has been
disabled by the user.
At the same time we rename the backend operation from 'ban' to 'revoke'
which more accurately describes the actual semantics. (There is no ban at
the backend level since banning is a concept driven by the scheduling
frontend. Backends are simply able to revoke a running context so that
is the more appropriate name chosen.)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220527072452.2225610-1-tvrtko.ursulin@linux.intel.com
_i915_vma_move_to_active() can receive > 1 fences for
multiple batch buffers submission. Because dma_resv_add_fence()
can only accept one fence at a time, change _i915_vma_move_to_active()
to be aware of multiple fences so that it can add individual
fences to the dma resv object.
v6: fix multi-line comment.
v5: remove double fence reservation for batch VMAs.
v4: Reserve fences for composite_fence on multi-batch contexts and
also reserve fence slots to composite_fence for each VMAs.
v3: dma_resv_reserve_fences is not cumulative so pass num_fences.
v2: make sure to reserve enough fence slots before adding.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/5614
Fixes: 544460c338 ("drm/i915: Multi-BB execbuf")
Cc: <stable@vger.kernel.org> # v5.16+
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220525095955.15371-1-nirmoy.das@intel.com
(cherry picked from commit 420a07b841)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
As with EU masks, it's easier to store subslice/DSS masks internally in
a format that's more natural for the driver to work with, and then only
covert into the u8[] uapi form when the query ioctl is invoked. Since
the hardware design changed significantly with Xe_HP, we'll use a union
to choose between the old "hsw-style" subslice masks or the newer xehp
mask. HSW-style masks will be stored in an array of u8's, indexed by
slice (there's never more than 6 subslices per slice on older
platforms). For Xe_HP and beyond where slices no longer exist, we only
need a single bitmask. However we already know that this mask is
eventually going to grow too large for a simple u64 to hold, so we'll
represent it in a manner that can be operated on by the utilities in
linux/bitmap.h.
v2:
- Fix typo: BIT(s) -> BIT(ss) in gen9_sseu_device_status()
v3:
- Eliminate sseu->ss_stride and just calculate the stride while
specifically handling uapi. (Tvrtko)
- Use BITMAP_BITS() macro to refer to size of masks rather than
passing I915_MAX_SS_FUSE_BITS directly. (Tvrtko)
- Report compute/geometry DSS masks separately when dumping Xe_HP SSEU
info. (Tvrtko)
- Restore dropped range checks to intel_sseu_has_subslice(). (Tvrtko)
v4:
- Make the bitmap size macro check the size of the .xehp field rather
than the containing union. (Tvrtko)
- Don't add GEM_BUG_ON() intel_sseu_has_subslice()'s check for whether
slice or subslice ID exceed sseu->max_[sub]slices; various loops
in the driver are expected to exceed these, so we should just
silently return 'false.'
v5:
- Move XEHP_BITMAP_BITS() to the header so that we can also replace a
usage of I915_MAX_SS_FUSE_BITS in one of the inline functions.
(Bala)
- Change the local variable in intel_slicemask_from_xehp_dssmask() from
u16 to 'unsigned long' to make it a bit more future-proof.
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220601150725.521468-6-matthew.d.roper@intel.com
_i915_vma_move_to_active() can receive > 1 fences for
multiple batch buffers submission. Because dma_resv_add_fence()
can only accept one fence at a time, change _i915_vma_move_to_active()
to be aware of multiple fences so that it can add individual
fences to the dma resv object.
v6: fix multi-line comment.
v5: remove double fence reservation for batch VMAs.
v4: Reserve fences for composite_fence on multi-batch contexts and
also reserve fence slots to composite_fence for each VMAs.
v3: dma_resv_reserve_fences is not cumulative so pass num_fences.
v2: make sure to reserve enough fence slots before adding.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/5614
Fixes: 544460c338 ("drm/i915: Multi-BB execbuf")
Cc: <stable@vger.kernel.org> # v5.16+
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220525095955.15371-1-nirmoy.das@intel.com
- Appoint myself page cache maintainer
- Fix how scsicam uses the page cache
- Use the memalloc_nofs_save() API to replace AOP_FLAG_NOFS
- Remove the AOP flags entirely
- Remove pagecache_write_begin() and pagecache_write_end()
- Documentation updates
- Convert several address_space operations to use folios:
- is_dirty_writeback
- readpage becomes read_folio
- releasepage becomes release_folio
- freepage becomes free_folio
- Change filler_t to require a struct file pointer be the first argument
like ->read_folio
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEejHryeLBw/spnjHrDpNsjXcpgj4FAmKNMDUACgkQDpNsjXcp
gj4/mwf/bpHhXH4ZoNIvtUpTF6rZbqeffmc0VrbxCZDZ6igRnRPglxZ9H9v6L53O
7B0FBQIfxgNKHZpdqGdOkv8cjg/GMe/HJUbEy5wOakYPo4L9fZpHbDZ9HM2Eankj
xBqLIBgBJ7doKr+Y62DAN19TVD8jfRfVtli5mqXJoNKf65J7BkxljoTH1L3EXD9d
nhLAgyQjR67JQrT/39KMW+17GqLhGefLQ4YnAMONtB6TVwX/lZmigKpzVaCi4r26
bnk5vaR/3PdjtNxIoYvxdc71y2Eg05n2jEq9Wcy1AaDv/5vbyZUlZ2aBSaIVbtKX
WfrhN9O3L0bU5qS7p9PoyfLc9wpq8A==
=djLv
-----END PGP SIGNATURE-----
Merge tag 'folio-5.19' of git://git.infradead.org/users/willy/pagecache
Pull page cache updates from Matthew Wilcox:
- Appoint myself page cache maintainer
- Fix how scsicam uses the page cache
- Use the memalloc_nofs_save() API to replace AOP_FLAG_NOFS
- Remove the AOP flags entirely
- Remove pagecache_write_begin() and pagecache_write_end()
- Documentation updates
- Convert several address_space operations to use folios:
- is_dirty_writeback
- readpage becomes read_folio
- releasepage becomes release_folio
- freepage becomes free_folio
- Change filler_t to require a struct file pointer be the first
argument like ->read_folio
* tag 'folio-5.19' of git://git.infradead.org/users/willy/pagecache: (107 commits)
nilfs2: Fix some kernel-doc comments
Appoint myself page cache maintainer
fs: Remove aops->freepage
secretmem: Convert to free_folio
nfs: Convert to free_folio
orangefs: Convert to free_folio
fs: Add free_folio address space operation
fs: Convert drop_buffers() to use a folio
fs: Change try_to_free_buffers() to take a folio
jbd2: Convert release_buffer_page() to use a folio
jbd2: Convert jbd2_journal_try_to_free_buffers to take a folio
reiserfs: Convert release_buffer_page() to use a folio
fs: Remove last vestiges of releasepage
ubifs: Convert to release_folio
reiserfs: Convert to release_folio
orangefs: Convert to release_folio
ocfs2: Convert to release_folio
nilfs2: Remove comment about releasepage
nfs: Convert to release_folio
jfs: Convert to release_folio
...
Update the selftest to include Tile 4 mode and switch to Tile 4 on
platforms that supports Tile 4 but no Tile Y and vice versa.
Also switch to XY_FAST_COPY_BLT on platforms that supports it.
v4: update commit message to reflect the code changes properly.
v3: add a function to find X-tile availability for a platform.
v2: disable Tile X for iGPU in fastblit and
fix checkpath --strict warnings.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/5879
Signed-off-by: Bommu Krishnaiah <krishnaiah.bommu@intel.com>
Co-developed-by: Nirmoy Das <nirmoy.das@intel.com>
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220516082015.32020-1-nirmoy.das@intel.com
When removing short term pins, I've changed the the batch buffer
pinning for relocation to use __i915_vma_pin, because
i915_gem_object_ggtt_pin_ww was destroying the old vma. This
caused regressions, because the functions are not identical.
Fix the regressions by calling i915_gem_object_ggtt_pin_ww() again
on ggtt-only platforms, but only if the batch can be pinned without
being moved.
Fixes: b5cfe6f7a6 ("drm/i915: Remove short-term pins from execbuf, v6.")
Cc: Matthew Auld <matthew.auld@intel.com>
Reported-by: Mateusz Jończyk <mat.jonczyk@o2.pl>
Tested-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Acked-by: Matthew Auld <matthew.auld@intel.com>
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/5806
Link: https://patchwork.freedesktop.org/patch/msgid/20220511115219.46507-1-maarten.lankhorst@linux.intel.com
(cherry picked from commit 451374eef6)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
When removing short term pins, I've changed the the batch buffer
pinning for relocation to use __i915_vma_pin, because
i915_gem_object_ggtt_pin_ww was destroying the old vma. This
caused regressions, because the functions are not identical.
Fix the regressions by calling i915_gem_object_ggtt_pin_ww() again
on ggtt-only platforms, but only if the batch can be pinned without
being moved.
Fixes: b5cfe6f7a6 ("drm/i915: Remove short-term pins from execbuf, v6.")
Cc: Matthew Auld <matthew.auld@intel.com>
Reported-by: Mateusz Jończyk <mat.jonczyk@o2.pl>
Tested-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Acked-by: Matthew Auld <matthew.auld@intel.com>
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/5806
Link: https://patchwork.freedesktop.org/patch/msgid/20220511115219.46507-1-maarten.lankhorst@linux.intel.com
Commit e4e8062530 ("drm/i915: Change shrink ordering to use locking
around unbinding.") changed the return type to int without changing the
return values or their meaning to "0 is success". Move it back to
boolean.
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220503061556.513175-1-lucas.demarchi@intel.com
If i915 does not want to use huge pages there is a) no point in setting up
the private mount and b) should former fail, it is misleading to log THP
support is disabled in the caller, which does not even know if callee
tried to enable it.
Fix both by restructuring the flow in i915_gemfs_init and at the same time
note the failure to set it up in all cases.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Eero Tamminen <eero.t.tamminen@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220429100414.647857-2-tvrtko.ursulin@linux.intel.com
We have a statement from HW designers that the GPU read regression when
using 2M pages was fixed from Icelake onwards, which was also confirmed
by bencharking Eero did last year:
"""
When IOMMU is disabled, enabling THP causes following perf changes on
TGL-H (GT1):
10-15% SynMark Batch[0-3]
5-10% MemBW GPU texture, SynMark ShMapVsm
3-5% SynMark TerrainFly* + Geom* + Fill* + CSCloth + Batch4
1-3% GpuTest Triangle, SynMark TexMem* + DeferredAA + Batch[5-7]
+ few others
-7% MemBW GPU blend
In the above 3D benchmark names, * means all the variants of tests with
the same prefix. For example "SynMark TexMem*", means both TexMem128 &
TexMem512 tests in the synthetic (Intel internal) SynMark test suite.
In the (public, but proprietary) GfxBench & GLB(enchmark) test suites,
there are both onscreen and offscreen variants of each test. Unless
explicitly stated otherwise, numbers are for both variants.
All tests are run with FullHD monitor. All tests are fullscreen except
for GLB and GpuTest ones, which are run in 1/2 screen window (GpuTest
triangle is run both in fullscreen and 1/2 screen window).
"""
Since the only regression is MemBW GPU blend, against many more gains,
it sounds it is time to enable THP on Gen11+.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
References: https://gitlab.freedesktop.org/drm/intel/-/issues/430
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Eero Tamminen <eero.t.tamminen@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220429100414.647857-1-tvrtko.ursulin@linux.intel.com
pagecache_write_begin() and pagecache_write_end() are now trivial
wrappers, so call the aops directly.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
In order to get the GSC Support merged on drm-intel-gt-next
in a clean fashion we needed this ATS-M patch to avoid
conflict in i915_pci.c:
commit 412c942bdf ("drm/i915/ats-m: add ATS-M platform info")
--
Fixing a silent conflict on drivers/gpu/drm/i915/gt/intel_gt_gmch.c:
- if (!intel_vtd_active(i915))
+ if (!i915_vtd_active(i915))
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
drm/drm-next has a build fix for the NewVision NV3052C panel
(drivers/gpu/drm/panel/panel-newvision-nv3052c.c), which needs to be
merged back to drm-misc-next, as it was failing to build there.
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
On Xe-HP and later devices, dedicated compression control state (CCS)
stored in local memory is used for each surface, to support the
3D and media compression formats.
The memory required for the CCS of the entire local memory is 1/256 of
the local memory size. So before the kernel boot, the required memory
is reserved for the CCS data and a secure register will be programmed
with the CCS base address
So when an object is allocated in local memory, dont need to explicitly
allocate the space for ccs data. But when the obj is evicted into the
smem, to hold the compression related data along with the obj extra space
is needed in smem. i.e obj_size + (obj_size/256).
Hence when a smem pages are allocated for an obj with lmem placement
possibility we create with the extra pages required for the ccs data for
the obj size.
v2:
Used imperative wording [Thomas]
v3:
Inflate the pages only when obj's placement is lmem only
v4:
GEM_BUG_ON if the ttm->num_pages > obj page size [Thomas]
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
cc: Christian Koenig <christian.koenig@amd.com>
cc: Hellstrom Thomas <thomas.hellstrom@intel.com>
Reviewed-by: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220405150840.29351-9-ramalingam.c@intel.com
Sync up with v5.18-rc1, in particular to get 5e3094cfd9
("drm/i915/xehpsdv: Add has_flat_ccs to device info").
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
The intent of the version check in the mmap ioctl was to maintain
support for existing platforms (i.e., ADL/RPL and earlier), but drop
support on all future igpu platforms. As we've seen on the dgpu side,
the hardware teams are using a more fine-grained numbering system for IP
version numbers these days, so it's possible the version number
associated with our next igpu could be some form of "12.xx" rather than
13 or higher. Comparing against the full ver.release number will ensure
the intent of the check is maintained no matter what numbering the
hardware teams settle on.
Fixes: d3f3baa356 ("drm/i915: Reinstate the mmap ioctl for some platforms")
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220407161839.1073443-1-matthew.d.roper@intel.com
(cherry picked from commit 8e7e5c077c)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
The intent of the version check in the mmap ioctl was to maintain
support for existing platforms (i.e., ADL/RPL and earlier), but drop
support on all future igpu platforms. As we've seen on the dgpu side,
the hardware teams are using a more fine-grained numbering system for IP
version numbers these days, so it's possible the version number
associated with our next igpu could be some form of "12.xx" rather than
13 or higher. Comparing against the full ver.release number will ensure
the intent of the check is maintained no matter what numbering the
hardware teams settle on.
Fixes: d3f3baa356 ("drm/i915: Reinstate the mmap ioctl for some platforms")
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220407161839.1073443-1-matthew.d.roper@intel.com
All of CI is just failing with the following, which prevents loading of
the module:
i915 0000:03:00.0: [drm] *ERROR* Scratch setup failed
Best guess is that this comes from the pin_map() for the scratch page,
which does an i915_gem_object_wait_moving_fence() somewhere. It looks
like this now calls into dma_resv_wait_timeout() which can return the
remaining timeout, leading to the caller thinking this is an error.
v2(Lucas): handle ret == 0
Fixes: 1d7f5e6c52 ("drm/i915: drop bo->moving dependency")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Christian König <christian.koenig@amd.com> #v1
Link: https://patchwork.freedesktop.org/patch/msgid/20220408084205.1353427-1-matthew.auld@intel.com
Signed-off-by: Christian König <christian.koenig@amd.com>
Add an usage for submissions independent of implicit sync but still
interesting for memory management.
v2: cleanup the kerneldoc a bit
v3: separate amdgpu changes from this
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20220407085946.744568-10-christian.koenig@amd.com
Add an usage for kernel submissions. Waiting for those are mandatory for
dynamic DMA-bufs.
As a precaution this patch also changes all occurrences where fences are
added as part of memory management in TTM, VMWGFX and i915 to use the
new value because it now becomes possible for drivers to ignore fences
with the WRITE usage.
v2: use "must" in documentation, fix whitespaces
v3: separate out some driver changes and better document why some
changes should still be part of this patch.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20220407085946.744568-5-christian.koenig@amd.com
Instead of distingting between shared and exclusive fences specify
the fence usage while adding fences.
Rework all drivers to use this interface instead and deprecate the old one.
v2: some kerneldoc comments suggested by Daniel
v3: fix a missing case in radeon
v4: rebase on nouveau changes, fix lockdep and temporary disable warning
v5: more documentation updates
v6: separate internal dma_resv changes from this patch, avoids to
disable warning temporary, rebase on upstream changes
v7: fix missed case in lima driver, minimize changes to i915_gem_busy_ioctl
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20220407085946.744568-3-christian.koenig@amd.com
This change adds the dma_resv_usage enum and allows us to specify why a
dma_resv object is queried for its containing fences.
Additional to that a dma_resv_usage_rw() helper function is added to aid
retrieving the fences for a read or write userspace submission.
This is then deployed to the different query functions of the dma_resv
object and all of their users. When the write paratermer was previously
true we now use DMA_RESV_USAGE_WRITE and DMA_RESV_USAGE_READ otherwise.
v2: add KERNEL/OTHER in separate patch
v3: some kerneldoc suggestions by Daniel
v4: some more kerneldoc suggestions by Daniel, fix missing cases lost in
the rebase pointed out by Bas.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20220407085946.744568-2-christian.koenig@amd.com
Audit all the users of dma_resv_add_excl_fence() and make sure they
reserve a shared slot also when only trying to add an exclusive fence.
This is the next step towards handling the exclusive fence like a
shared one.
v2: fix missed case in amdgpu
v3: and two more radeon, rename function
v4: add one more case to TTM, fix i915 after rebase
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20220406075132.3263-2-christian.koenig@amd.com
Mixup in rebasing and patchwork re-runs made me push the wrong version of
the patch. Or I even forgot to send out the fixed version. Fix it up.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 49bd54b390 ("drm/i915: Track all user contexts per client")
Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20220405155345.3292769-1-tvrtko.ursulin@linux.intel.com
We soon want to start answering questions like how much GPU time is the
context belonging to a client which exited still using.
To enable this we start tracking all context belonging to a client on a
separate list.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220401142205.3123159-5-tvrtko.ursulin@linux.intel.com
As contexts are abandoned we want to remember how much GPU time they used
(per class) so later we can used it for smarter purposes.
As GEM contexts are closed we want to have the DRM client remember how
much GPU time they used (per class) so later we can used it for smarter
purposes.
v2:
* Size past runtimes array by uabi class, not internal.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com> # v1
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> # v1
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220401142205.3123159-4-tvrtko.ursulin@linux.intel.com
Make GEM contexts keep a reference to i915_drm_client for the whole of
of their lifetime which will come handy in following patches.
v2: Don't bother supporting selftests contexts from debugfs. (Chris)
v3 (Lucas): Finish constructing ctx before adding it to the list
v4 (Ram): Rebase.
v5: Trivial rebase for proto ctx changes.
v6: Rebase after clients no longer track name and pid.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> # v5
Reviewed-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com> # v5
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220401142205.3123159-3-tvrtko.ursulin@linux.intel.com
Add a parameter called "extra_pages" for ttm_tt_init, to indicate that
driver needs extra pages in ttm_tt.
v2:
Used imperative wording [Thomas and Christian]
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
cc: Christian Koenig <christian.koenig@amd.com>
cc: Hellstrom Thomas <thomas.hellstrom@intel.com>
Reviewed-by: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Reviewed-by: Christian Konig <christian.koenig@amd.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220401123751.27771-8-ramalingam.c@intel.com
Continuation of the effort to declutter i915_drv.h.
Also, component specific helpers which consult the iommu/virtualization
helpers moved to respective component source/header files as appropriate.
v2:
* s/dev_priv/i915/ in intel_scanout_needs_vtd_wa. (Lucas)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220329090204.2324499-1-tvrtko.ursulin@linux.intel.com
[tursulin: fixup conflict in i915_drv.h]
Instead of providing the bulk move structure for each LRU update set
this as property of the BO. This should avoid costly bulk move rebuilds
with some games under RADV.
v2: some name polishing, add a few more kerneldoc words.
v3: add some lockdep
v4: fix bugs, handle pin/unpin as well
v5: improve kerneldoc
Signed-off-by: Christian König <christian.koenig@amd.com>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20220321132601.2161-5-christian.koenig@amd.com
We only need this when allocating device local-memory, where this
influences the drm_buddy. Currently there is some funny behaviour where
an "in limbo" system memory object is lacking the relevant placement
flags etc. before we first allocate the ttm_tt, leading to ttm
performing a move when not needed, since the current placement is seen
as not compatible.
Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Fixes: 2ed38cec56 ("drm/i915: opportunistically apply ALLOC_CONTIGIOUS")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Nirmoy Das <nirmoy.das@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Acked-by: Nirmoy Das <nirmoy.das@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220324172143.377104-1-matthew.auld@intel.com
This way we finally fix the problem that new resource are
not immediately evict-able after allocation.
That has caused numerous problems including OOM on GDS handling
and not being able to use TTM as general resource manager.
v2: stop assuming in ttm_resource_fini that res->bo is still valid.
v3: cleanup kerneldoc, add more lockdep annotation
v4: consistently use res->num_pages
Signed-off-by: Christian König <christian.koenig@amd.com>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20220321132601.2161-1-christian.koenig@amd.com
Use drm_clflush_virt_range instead of clflushopt and remove the memory
barrier, since drm_clflush_virt_range takes care of that.
v2(Michael Cheng): Use sizeof(*addr) instead of sizeof(addr) to get the
actual size of the page. Thanks to Matt Roper for
pointing this out.
Signed-off-by: Michael Cheng <michael.cheng@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220321223819.72833-5-michael.cheng@intel.com
With the upcoming multitile support each tile will have its own
local memory. Mark the current LMEM with the suffix '0' to
emphasise that it belongs to the root tile.
Suggested-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220318233938.149744-2-andi.shyti@linux.intel.com
Remove the uapi/drm/i915_drm.h include from drm/i915_drm.h, and stop
being a proxy for uapi/drm/i915_drm.h. Include uapi/drm/i915_drm.h and
drm/i915_drm.h only where needed.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220311100639.114685-2-jani.nikula@intel.com
Move i915_gem_object_needs_bit17_swizzle() to i915_gem_tiling.[ch] as a
i915_gem_object function related to tiling. Also un-inline while at it;
does not seem like this is a function needed in hot paths.
v2: i915_gem_tiling.[ch] instead of intel_ggtt_fencing.[ch] (Chris)
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220316095018.137998-1-jani.nikula@intel.com
For the ttm backend we can use existing placements fpfn and lpfn to
force the allocator to place the object at the requested offset,
potentially evicting stuff if the spot is currently occupied.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220315181425.576828-5-matthew.auld@intel.com
Add a generic interface for allocating an object at some specific
offset, and convert stolen over. Later we will want to hook this up to
different backends.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220315181425.576828-4-matthew.auld@intel.com
On client platforms with reduced LMEM BAR, we should be able to continue
with driver load with reduced io_size. Instead of using the BAR size to
determine the how large stolen should be, we should instead use the
ADDR_RANGE register to figure this out(at least on platforms like DG2).
For simplicity we don't attempt to support partially mappable stolen.
v2: rearrange the io_mapping_init_wc slightly, since the stolen setup
might result in reduced io_size.
Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Co-developed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220315181425.576828-2-matthew.auld@intel.com
It looks like this code was accidentally dropped at some point(in a
slightly different form), so add it back. The gist is that if we know
the allocation will be one single chunk, then we can just annotate the
BO with I915_BO_ALLOC_CONTIGUOUS, even if the user doesn't bother. In
the future this should allow us to avoid using vmap for such objects,
in some upcoming patches.
v2(Thomas):
- Tweak the commit message to mention the future motivation
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220202173154.3758970-1-matthew.auld@intel.com
vms are not getting properly closed. Rather than fixing that,
Remove the vm open count and instead rely on the vm refcount.
The vm open count existed solely to break the strong references the
vmas had on the vms. Now instead make those references weak and
ensure vmas are destroyed when the vm is destroyed.
Unfortunately if the vm destructor and the object destructor both
wants to destroy a vma, that may lead to a race in that the vm
destructor just unbinds the vma and leaves the actual vma destruction
to the object destructor. However in order for the object destructor
to ensure the vma is unbound it needs to grab the vm mutex. In order
to keep the vm mutex alive until the object destructor is done with
it, somewhat hackishly grab a vm_resv refcount that is released late
in the vma destruction process, when the vm mutex is no longer needed.
v2: Address review-comments from Niranjana
- Clarify that the struct i915_address_space::skip_pte_rewrite is a hack
and should ideally be replaced in an upcoming patch.
- Remove an unneeded continue in clear_vm_list and update comment.
v3:
- Documentation update
- Commit message formatting
Co-developed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220304082641.308069-2-thomas.hellstrom@linux.intel.com
The current implementation of i915 prime mmap only works when initializing
drm_i915_gem_object with shmem_region.
When using LMEM, drm_i915_gem_object is initialized with ttm_system_region.
In order to make prime mmap work even this case, when using LMEM
(when using ttm in i915), dma_buf_ops.mmap callback function calls
drm_gem_prime_mmap(). drm_gem_prime_mmap() of drm core calls internally
i915_gem_mmap() so that prime mmap can perform normally.
The fake offset is processed inside drm_gem_prime_mmap().
Testcase: igt/prime_mmap
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220225131316.1433515-3-gwan-gyeong.mun@intel.com
The dma_buf_ops.unmap_dma_buf callback used in i915,
i915_gem_unmap_dma_buf(), has the same code as drm_gem_unmap_dma_buf().
In order to eliminate defining and using duplicate function, it updates
the dma_buf_ops.unmap_dma_buf callback to use drm_gem_unmap_dma_buf().
Signed-off-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220225131316.1433515-2-gwan-gyeong.mun@intel.com
Include linux/highmem.h and linux/swap.h explicitly where needed so we
can drop the linux/i2c.h include from i915_drv.h where it pulled in the
dependencies implicitly.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220303181931.1661767-5-jani.nikula@intel.com
Remove the local yesno() implementation and adopt the str_yes_no() from
linux/string_helpers.h.
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220225234631.3725943-1-lucas.demarchi@intel.com
A different emit breadcrumbs ring programming is required for compute /
render and we don't have UMD user so just reject parallel submission for
these engine classes.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220301231549.1817978-11-matthew.d.roper@intel.com
The compute engine handles the same commands the render engine can
(except 3D pipeline), so it makes sense that CCS is more similar to RCS
than non-render engines.
The CCS context state (lrc) is also similar to the render one, so reuse
it. Note that the compute engine has its own CTX_R_PWR_CLK_STATE
register.
In order to avoid having multiple RCS && CCS checks, add the following
engine flag:
- I915_ENGINE_HAS_RCS_REG_STATE - use the render (larger) reg state ctx.
BSpec: 46260
Original-author: Michel Thierry
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220301231549.1817978-6-matthew.d.roper@intel.com
Exercise each of the migration scenarios, verifying that the final
placement and buffer contents match our expectations.
v2(Thomas): Replace for_i915_gem_ww() block with simpler object_lock()
v3:
- For testing purposes allow forcing the io_size such that we can
exercise the allocation + migration path on devices that don't have the
small BAR limit.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220228123607.580432-4-matthew.auld@intel.com
If we have to contend with non-mappable LMEM, then we need to ensure the
object fits within the mappable portion, like in the selftests, where we
later try to CPU access the pages. However if it can't then we need to
gracefully handle this, without throwing an error.
Also it looks like TTM will return -ENOMEM, in ttm_bo_mem_space() after
exhausting all possible placements.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Acked-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220228123607.580432-3-matthew.auld@intel.com
The end goal is to have userspace tell the kernel what buffers will
require CPU access, however if we ever reach the CPU fault handler, and
the current resource is not mappable, then we should attempt to migrate
the buffer to the mappable portion of LMEM, or even system memory, if the
allowable placements permit it.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220228123607.580432-2-matthew.auld@intel.com
If we need to make room for some mappable object, then we should
only victimize objects that have one or pages that occupy the visible
portion of LMEM. Let's also create a new priority hint for objects that
are placed in mappable memory, where we know that CPU access was
requested, that way we hopefully victimize these last.
v2(Thomas): s/TTM_PL_PRIV/I915_PL_LMEM0/
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220228123607.580432-1-matthew.auld@intel.com
It's unclear what reference the initial vma kref reference refers to.
A vma can have multiple weak references, the object vma list,
the vm's bound list and the GT's closed_list, and the initial vma
reference can be put from lookups of all these lists.
With the current implementation this means
that any holder of yet another vma refcount (currently only
i915_gem_object_unbind()) needs to be holding two of either
*) An object refcount,
*) A vm open count
*) A vma open count
in order for us to not risk leaking a reference by having the
initial vma reference being put twice.
Address this by re-introducing i915_vma_destroy() which removes all
weak references of the vma and *then* puts the initial vma refcount.
This makes a strong vma reference hold on to the vma unconditionally.
Perhaps a better name would be i915_vma_revoke() or i915_vma_zombify(),
since other callers may still hold a refcount, but with the prospect of
being able to replace the vma refcount with the object lock in the near
future, let's stick with i915_vma_destroy().
Finally this commit fixes a race in that previously i915_vma_release() and
now i915_vma_destroy() could destroy a vma without taking the vm->mutex
after an advisory check that the vma mm_node was not allocated.
This would race with the ungrab_vma() function creating a trace similar
to the below one. This was fixed in one of the __i915_vma_put() callsites
in
commit bc1922e5d3 ("drm/i915: Fix a race between vma / object destruction and unbinding")
but although not seemingly triggered by CI, that
is not sufficient. This patch is needed to fix that properly.
[823.012188] Console: switching to colour dummy device 80x25
[823.012422] [IGT] gem_ppgtt: executing
[823.016667] [IGT] gem_ppgtt: starting subtest blt-vs-render-ctx0
[852.436465] stack segment: 0000 [#1] PREEMPT SMP NOPTI
[852.436480] CPU: 0 PID: 3200 Comm: gem_ppgtt Not tainted 5.16.0-CI-CI_DRM_11115+ #1
[852.436489] Hardware name: Intel Corporation Alder Lake Client Platform/AlderLake-P DDR5 RVP, BIOS ADLPFWI1.R00.2422.A00.2110131104 10/13/2021
[852.436499] RIP: 0010:ungrab_vma+0x9/0x80 [i915]
[852.436711] Code: ef e8 4b 85 cf e0 e8 36 a3 d6 e0 8b 83 f8 9c 00 00 85 c0 75 e1 5b 5d 41 5c 41 5d c3 e9 d6 fd 14 00 55 53 48 8b af c0 00 00 00 <8b> 45 00 85 c0 75 03 5b 5d c3 48 8b 85 a0 02 00 00 48 89 fb 48 8b
[852.436727] RSP: 0018:ffffc90006db7880 EFLAGS: 00010246
[852.436734] RAX: 0000000000000000 RBX: ffffc90006db7598 RCX: 0000000000000000
[852.436742] RDX: ffff88815349e898 RSI: ffff88815349e858 RDI: ffff88810a284140
[852.436748] RBP: 6b6b6b6b6b6b6b6b R08: ffff88815349e898 R09: ffff88815349e8e8
[852.436754] R10: 0000000000000001 R11: 0000000051ef1141 R12: ffff88810a284140
[852.436762] R13: 0000000000000000 R14: ffff88815349e868 R15: ffff88810a284458
[852.436770] FS: 00007f5c04b04e40(0000) GS:ffff88849f000000(0000) knlGS:0000000000000000
[852.436781] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[852.436788] CR2: 00007f5c04b38fe0 CR3: 000000010a6e8001 CR4: 0000000000770ef0
[852.436797] PKRU: 55555554
[852.436801] Call Trace:
[852.436806] <TASK>
[852.436811] i915_gem_evict_for_node+0x33c/0x3c0 [i915]
[852.437014] i915_gem_gtt_reserve+0x106/0x130 [i915]
[852.437211] i915_vma_pin_ww+0x8f4/0xb60 [i915]
[852.437412] eb_validate_vmas+0x688/0x860 [i915]
[852.437596] i915_gem_do_execbuffer+0xc0e/0x25b0 [i915]
[852.437770] ? deactivate_slab+0x5f2/0x7d0
[852.437778] ? _raw_spin_unlock_irqrestore+0x50/0x60
[852.437789] ? i915_gem_execbuffer2_ioctl+0xc6/0x2c0 [i915]
[852.437944] ? init_object+0x49/0x80
[852.437950] ? __lock_acquire+0x5e6/0x2580
[852.437963] i915_gem_execbuffer2_ioctl+0x116/0x2c0 [i915]
[852.438129] ? i915_gem_do_execbuffer+0x25b0/0x25b0 [i915]
[852.438300] drm_ioctl_kernel+0xac/0x140
[852.438310] drm_ioctl+0x201/0x3d0
[852.438316] ? i915_gem_do_execbuffer+0x25b0/0x25b0 [i915]
[852.438490] __x64_sys_ioctl+0x6a/0xa0
[852.438498] do_syscall_64+0x37/0xb0
[852.438507] entry_SYSCALL_64_after_hwframe+0x44/0xae
[852.438515] RIP: 0033:0x7f5c0415b317
[852.438523] Code: b3 66 90 48 8b 05 71 4b 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 41 4b 2d 00 f7 d8 64 89 01 48
[852.438542] RSP: 002b:00007ffd765039a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[852.438553] RAX: ffffffffffffffda RBX: 000055e4d7829dd0 RCX: 00007f5c0415b317
[852.438562] RDX: 00007ffd76503a00 RSI: 00000000c0406469 RDI: 0000000000000017
[852.438571] RBP: 00007ffd76503a00 R08: 0000000000000000 R09: 0000000000000081
[852.438579] R10: 00000000ffffff7f R11: 0000000000000246 R12: 00000000c0406469
[852.438587] R13: 0000000000000017 R14: 00007ffd76503a00 R15: 0000000000000000
[852.438598] </TASK>
[852.438602] Modules linked in: snd_hda_codec_hdmi i915 mei_hdcp x86_pkg_temp_thermal snd_hda_intel snd_intel_dspcfg drm_buddy coretemp crct10dif_pclmul crc32_pclmul snd_hda_codec ttm ghash_clmulni_intel snd_hwdep snd_hda_core e1000e drm_dp_helper ptp snd_pcm mei_me drm_kms_helper pps_core mei syscopyarea sysfillrect sysimgblt fb_sys_fops prime_numbers intel_lpss_pci smsc75xx usbnet mii
[852.440310] ---[ end trace e52cdd2fe4fd911c ]---
v2: Fix typos in the commit message.
Fixes: 7e00897be8 ("drm/i915: Add object locking to i915_gem_evict_for_node and i915_gem_evict_something, v2.")
Fixes: bc1922e5d3 ("drm/i915: Fix a race between vma / object destruction and unbinding")
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220222133209.587978-1-thomas.hellstrom@linux.intel.com
If the user doesn't require CPU access for the buffer, then
ALLOC_GPU_ONLY should be used, in order to prioritise allocating in the
non-mappable portion of LMEM, on devices with small BAR.
v2(Thomas):
- The BO_ALLOC_TOPDOWN naming here is poor, since this is pure lies on
systems that don't even have small BAR. A better name is GPU_ONLY,
which is accurate regardless of the configuration.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Acked-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220225145502.331818-3-matthew.auld@intel.com
On devices with non-mappable LMEM ensure we always allocate the pages
within the mappable portion. For now we assume that all LMEM buffers
will require CPU access, which is also inline with pretty much all
current kernel internal users. In the next patch we will introduce a new
flag to override this behaviour.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Acked-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220225145502.331818-2-matthew.auld@intel.com
With small LMEM-BAR we need to be able to differentiate between the
total size of LMEM, and how much of it is CPU mappable. The end goal is
to be able to utilize the entire range, even if part of is it not CPU
accessible.
v2: also update intelfb_create
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Acked-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220225145502.331818-1-matthew.auld@intel.com
Add check for zero usable stolen memory before calling drm_mm_init
to support configurations where stolen memory exists but is fully
reserved.
Also skip memory test in cases that usable stolen is smaller than
page size(amount mapped and used to test memory).
v2:
- skiping test if available memory is smaller than page size (Lucas)
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Steve Carbonari <steven.carbonari@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220223194946.725328-1-jose.souza@intel.com
UAPI Changes:
- Weak parallel submission support for execlists
Minimal implementation of the parallel submission support for
execlists backend that was previously only implemented for GuC.
Support one sibling non-virtual engine.
Core Changes:
- Two backmerges of drm/drm-next for header file renames/changes and
i915_regs reorganization
Driver Changes:
- Add new DG2 subplatform: DG2-G12 (Matt R)
- Add new DG2 workarounds (Matt R, Ram, Bruce)
- Handle pre-programmed WOPCM registers for DG2+ (Daniele)
- Update guc shim control programming on XeHP SDV+ (Daniele)
- Add RPL-S C0/D0 stepping information (Anusha)
- Improve GuC ADS initialization to work on ARM64 on dGFX (Lucas)
- Fix KMD and GuC race on accessing PMU busyness (Umesh)
- Use PM timestamp instead of RING TIMESTAMP for reference in PMU with GuC (Umesh)
- Report error on invalid reset notification from GuC (John)
- Avoid WARN splat by holding RPM wakelock during PXP unbind (Juston)
- Fixes to parallel submission implementation (Matt B.)
- Improve GuC loading status check/error reports (John)
- Tweak TTM LRU priority hint selection (Matt A.)
- Align the plane_vma to min_page_size of stolen mem (Ram)
- Introduce vma resources and implement async unbinding (Thomas)
- Use struct vma_resource instead of struct vma_snapshot (Thomas)
- Return some TTM accel move errors instead of trying memcpy move (Thomas)
- Fix a race between vma / object destruction and unbinding (Thomas)
- Remove short-term pins from execbuf (Maarten)
- Update to GuC version 69.0.3 (John, Michal Wa.)
- Improvements to GT reset paths in GuC backend (Matt B.)
- Use shrinker_release_pages instead of writeback in shmem object hooks (Matt A., Tvrtko)
- Use trylock instead of blocking lock when freeing GEM objects (Maarten)
- Allocate intel_engine_coredump_alloc with ALLOW_FAIL (Matt B.)
- Fixes to object unmapping and purging (Matt A)
- Check for wedged device in GuC backend (John)
- Avoid lockdep splat by locking dpt_obj around set_cache_level (Maarten)
- Allow dead vm to unbind vma's without lock (Maarten)
- s/engine->i915/i915/ for DG2 engine workarounds (Matt R)
- Use to_gt() helper for GGTT accesses (Michal Wi.)
- Selftest improvements (Matt B., Thomas, Ram)
- Coding style and compiler warning fixes (Matt B., Jasmine, Andi, Colin, Gustavo, Dan)
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/Yg4i2aCZvvee5Eai@jlahtine-mobl.ger.corp.intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[Fixed conflicts while applying, using the fixups/drm-intel-gt-next.patch
from drm-rerere's 1f2b1742abdd ("2022y-02m-23d-16h-07m-57s UTC: drm-tip
rerere cache update")]
discrete cards optimise 64K GTT pages for local-memory, since everything
should be allocated at 64K granularity. We say goodbye to sparse
entries, and instead get a compact 256B page-table for 64K pages,
which should be more cache friendly. 4K pages for local-memory
are no longer supported by the HW.
v4: don't return uninitialized err in igt_ppgtt_compact
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220218184752.7524-8-ramalingam.c@intel.com
For local-memory objects we need to align the GTT addresses
to 64K, both for the ppgtt and ggtt.
We need to support vm->min_alignment > 4K, depending
on the vm itself and the type of object we are inserting.
With this in mind update the GTT selftests to take this
into account.
For compact-pt we further align and pad lmem object GTT addresses
to 2MB to ensure PDEs contain consistent page sizes as
required by the HW.
v3:
* use needs_compact_pt flag to discriminate between
64K and 64K with compact-pt
* add i915_vm_obj_min_alignment
* use i915_vm_obj_min_alignment to round up vma reservation
if compact-pt instead of hard coding
v5:
* fix i915_vm_obj_min_alignment for internal objects which
have no memory region
v6:
* tiled_blits_create correctly pick largest required alignment
v8:
* i915_vm_min_alignment protect against array overflow for mock region
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220218184752.7524-7-ramalingam.c@intel.com
Registers that exist within the MCH BAR and are mirrored into the GPU's
MMIO space are a good candidate to separate out into their own header.
For reference, the mirror of the MCH BAR starts at the following
locations in the graphics MMIO space (the end of the MCHBAR range
differs slightly on each platform):
* Pre-gen6: 0x10000
* Gen6-Gen11 + RKL: 0x140000
v2:
- Create separate patch to swtich a few register definitions to be
relative to the MCHBAR mirror base.
- Drop upper bound of MCHBAR mirror from commit message; there are too
many different combinations between various platforms to list out,
and the documentation is spotty for the older pre-gen6 platforms
anyway.
Bspec: 134, 51771
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220215061342.2055952-2-matthew.d.roper@intel.com
Include drm_fourcc.h, drm_plane.h, and drm_color_mgmt.h where needed, so
we can drop the includes for drm_atomic.h and drm_fourcc.h from
i915_drv.h, reducing the build dependencies.
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/b03711b2286396b2e9d5822f6adef4e7a6dc0f7b.1644507885.git.jani.nikula@intel.com
For some reason we are selecting PRIO_HAS_PAGES when we don't have
mm.pages, and vice versa.
v2(Thomas):
- Add missing fixes tag
Fixes: 213d509277 ("drm/i915/ttm: Introduce a TTM i915 gem object backend")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220209111652.468762-1-matthew.auld@intel.com
Rename struct dma_buf_map to struct iosys_map and corresponding APIs.
Over time dma-buf-map grew up to more functionality than the one used by
dma-buf: in fact it's just a shim layer to abstract system memory, that
can be accessed via regular load and store, from IO memory that needs to
be acessed via arch helpers.
The idea is to extend this API so it can fulfill other needs, internal
to a single driver. Example: in the i915 driver it's desired to share
the implementation for integrated graphics, which uses mostly system
memory, with discrete graphics, which may need to access IO memory.
The conversion was mostly done with the following semantic patch:
@r1@
@@
- struct dma_buf_map
+ struct iosys_map
@r2@
@@
(
- DMA_BUF_MAP_INIT_VADDR
+ IOSYS_MAP_INIT_VADDR
|
- dma_buf_map_set_vaddr
+ iosys_map_set_vaddr
|
- dma_buf_map_set_vaddr_iomem
+ iosys_map_set_vaddr_iomem
|
- dma_buf_map_is_equal
+ iosys_map_is_equal
|
- dma_buf_map_is_null
+ iosys_map_is_null
|
- dma_buf_map_is_set
+ iosys_map_is_set
|
- dma_buf_map_clear
+ iosys_map_clear
|
- dma_buf_map_memcpy_to
+ iosys_map_memcpy_to
|
- dma_buf_map_incr
+ iosys_map_incr
)
@@
@@
- #include <linux/dma-buf-map.h>
+ #include <linux/iosys-map.h>
Then some files had their includes adjusted and some comments were
update to remove mentions to dma-buf-map.
Since this is not specific to dma-buf anymore, move the documentation to
the "Bus-Independent Device Accesses" section.
v2:
- Squash patches
v3:
- Fix wrong removal of dma-buf.h from MAINTAINERS
- Move documentation from dma-buf.rst to device-io.rst
v4:
- Change documentation title and level
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Sumit Semwal <sumit.semwal@linaro.org>
Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20220204170541.829227-1-lucas.demarchi@intel.com
Backmerge to bring in 5.17-rc2 to introduce a common baseline
to merge i915_regs changes from drm-intel-next.
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Several of our i915 header files, have been including i915_reg.h. This
means that any change to i915_reg.h will trigger a full rebuild of
pretty much every file of the driver, even those that don't have any
kind of register access. Let's delete the i915_reg.h include from all
headers and add an explicit include from the .c files that truly
need the register definitions; those that need a definition of
i915_reg_t for a function definition can get it from i915_reg_defs.h
instead.
We also remove two non-register #define's (VLV_DISPLAY_BASE and
GEN12_SFC_DONE_MAX) into i915_reg_defs.h to allow us to drop the
i915_reg.h include from a couple of headers.
There's probably a lot more header dependency optimization possible, but
the changes here roughly cut the number of files compiled after 'touch
i915_reg.h' in half --- a good first step.
Cc: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220127234334.4016964-7-matthew.d.roper@intel.com
This is a huge, chaotic mass of registers copied over as-is without any
real cleanup. We'll come back and organize these better, align on
consistent coding style, remove dead code, etc. in separate patches
later that will be easier to review.
v2:
- Add missing include in intel_pxp_irq.c
v3:
- Correct a few indentation errors (Lucas)
- Minor conflict resolution
Cc: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220127234334.4016964-6-matthew.d.roper@intel.com
At the moment we only use R_PWR_CLK_STATE in the context of the RCS
engine, but upcoming support for compute engines will start using
instances relative to the CCS engine base offsets. Let's parameterize
the register and move it to the engine reg header.
Cc: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220127234334.4016964-4-matthew.d.roper@intel.com
The i915_ttm_accel_move() function may return error codes that should
be propagated further up the stack rather than consumed assuming that
the accel move failed and could be replaced with a memcpy move.
For -EINTR, -ERESTARTSYS and -EAGAIN, just propagate those codes, rather
than retrying with a memcpy move.
Fixes: 2b0a750caf ("drm/i915/ttm: Failsafe migration blits")
Cc: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220201070340.16457-1-thomas.hellstrom@linux.intel.com
Catch-up with 5.17-rc2 and trying to align with drm-intel-gt-next
for a possible topic branch for merging the split of i915_regs...
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
The vma destruction code was using an unlocked advisory check for
drm_mm_node_allocated() to avoid racing with eviction code unbinding
the vma.
This is very fragile and prohibits the dereference of non-refcounted
pointers of dying vmas after a call to __i915_vma_unbind(). It also
prohibits the dereference of vma->obj of refcounted pointers of
dying vmas after a call to __i915_vma_unbind(), since even if a
refcount is held on the vma, that won't guarantee that its backing
object doesn't get destroyed.
So introduce an unbind under the vm mutex at object destroy time,
removing all weak references of the vma and its object from the
object vma list and from the vm bound list.
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220127115622.302970-1-thomas.hellstrom@linux.intel.com
We need to flush TLBs before releasing backing store otherwise userspace
is able to encounter stale entries if a) it is not declaring access to
certain buffers and b) it races with the backing store release from a
such undeclared execution already executing on the GPU in parallel.
The approach taken is to mark any buffer objects which were ever bound
to the GPU and to trigger a serialized TLB flush when their backing
store is released.
Alternatively the flushing could be done on VMA unbind, at which point
we would be able to ascertain whether there is potential a parallel GPU
execution (which could race), but essentially it boils down to paying
the cost of TLB flushes potentially needlessly at VMA unbind time (when
the backing store is not known to be going away so not needed for
safety), versus potentially needlessly at backing store relase time
(since we at that point cannot tell whether there is anything executing
on the GPU which uses that object).
Thereforce simplicity of implementation has been chosen for now with
scope to benchmark and refine later as required.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reported-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Dave Airlie <airlied@redhat.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Don't use the interruptable version of the timeline mutex lock in the
error path of eb_pin_timeline as the cleanup must always happen.
v2:
(John Harrison)
- Don't check for interrupt during mutex lock
v3:
(Tvrtko)
- A comment explaining why lock helper isn't used
Fixes: 544460c338 ("drm/i915: Multi-BB execbuf")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220111163929.14017-1-matthew.brost@intel.com
Add a flag PIN_VALIDATE, to indicate we don't need to pin and only
protected by the object lock.
This removes the need to unpin, which is done by just releasing the
lock.
eb_reserve is slightly reworked for readability, but the same steps
are still done:
- First pass pins with NONBLOCK.
- Second pass unbinds all objects first, then pins.
- Third pass is only called when not all objects are softpinned, and
unbinds all objects, then calls i915_gem_evict_vm(), then pins.
Changes since v1:
- Split out eb_reserve() into separate functions for readability.
Changes since v2:
- Make batch buffer mappable on platforms where only GGTT is available,
to prevent moving the batch buffer during relocations.
Changes since v3:
- Preserve current behavior for batch buffer, instead be cautious when
calling i915_gem_object_ggtt_pin_ww, and re-use the current batch vma
if it's inside ggtt and map-and-fenceable.
- Remove impossible condition check from eb_reserve. (Matt)
Changes since v5:
- Do not even temporarily pin, just call i915_gem_evict_vm() and mark
all vma's as unpinned.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220114132320.109030-7-maarten.lankhorst@linux.intel.com
We want to remove more members of i915_vma, which requires the locking to
be held more often.
Start requiring gem object lock for i915_vma_unbind, as it's one of the
callers that may unpin pages.
Some special care is needed when evicting, because the last reference to
the object may be held by the VMA, so after __i915_vma_unbind, vma may be
garbage, and we need to cache vma->obj before unlocking.
Changes since v1:
- Make trylock failing a WARN. (Matt)
- Remove double i915_vma_wait_for_bind() (Matt)
- Move atomic_set to right before mutex_unlock(), to make it more clear
they belong together. (Matt)
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220114132320.109030-5-maarten.lankhorst@linux.intel.com
i915_gem_evict_vm will need to be able to evict objects that are
locked by the current ctx. By testing if the current context already
locked the object, we can do this correctly. This allows us to
evict the entire vm even if we already hold some objects' locks.
Previously, this was spread over several commits, but it makes
more sense to commit the changes to i915_gem_evict_vm separately
from the changes to i915_gem_evict_something() and
i915_gem_evict_for_node().
Changes since v1:
- Handle evicting dead objects better.
Changes since v2:
- Use for_i915_gem_ww in igt_evict_vm. (Thomas)
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
[mlankhorst: Fix up doc warning.]
Link: https://patchwork.freedesktop.org/patch/msgid/20220117075604.131477-1-maarten.lankhorst@linux.intel.com
Now that we cannot unbind kill the currently locked object directly
because we're removing short term pinning, we may have to unbind the
object from gtt manually, using a i915_gem_evict_vm() call.
Changes since v1:
- Remove -ENOSPC warning, can still happen with concurrent mmaps
where we can't unbind the other mmap because of the lock held.
This fixes the gem_mmap_gtt@cpuset tests.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220114132320.109030-2-maarten.lankhorst@linux.intel.com
Maarten needs backmerge to account for header file renames/changes which
landed via drm-intel-next and are interfering with his pinning work.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
drivers fixes:
- i915 fixes for ttm backend + one pm wakelock fix
- amdgpu fixes, fairly big pile of small things all over. Note this
doesn't yet containe the fixed version of the otg sync patch that
blew up
- small driver fixes: meson, sun4i, vga16fb probe fix
drm core fixes:
- cma-buf heap locking
- ttm compilation
- self refresh helper state check
- wrong error message in atomic helpers
- mipi-dbi buffer mapping
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEb4nG6jLu8Y5XI+PfTA9ye/CYqnEFAmHhrxUACgkQTA9ye/CY
qnEowxAAgBPwGEobRGMbR3Me98vEKvcWqSxBe/k1VC4LhO5DrvbG5iW9cuxCJZM2
wlGlGAtU7C7pcCP5Xp1UlMqZ5a0rSVhqMPPkMKO9+7033ofSlAQatnMI1EENH6Hn
BkhXwTyuOBSN6zqskg8FKqzF+VPTt5ZV2U5qJzQweP/wFtZPAKI4tWE4oKiHactH
fJHnAi7T6ytF6a7J21BsSEluk4z7BjmcmFF0tW6iuq7Y6TXDFXFq9QFDR041b2rI
GYDUXl2mebp/L+2M3sPYuMiIiyJ8enh7crNIdmi+EstmzRADa7RMjnY3j2tJg/7M
pqnuJZAVcpkCurb7NMr3ycmrxnhfUsZfbuXvm+k5yJYfQCaGNiKy1ObsFWH9zBDz
XMuxcE+csSaX/7rjoyXrL2ZTRPXnVwJNJ8x1CuKn3giLxMSqnPnDMjyHmNLB8qa1
R0wbPQbdx5+jWgs/ngUGFNo4vFBnNmqQP4G3LaWJ/Ku5cSrEM+Jt9GJOw5jjVgun
gaOlKlUpMBlKmXOwkvhRW2AwHRcL7lrBuIw0oFOThdMzkSNlKSNBMpAkgvD/9C7g
IDtJgA7a3MqzLjhPLmhUB3rFyXz5dg5rNZhoH8z4DFiJVpTKYYA5/UoU/RTXZGqW
yFdrBkFmNJeKTZZAe+e/4OP/dm1vKVEKK6Ko/M9CrELzBKSussg=
=h74X
-----END PGP SIGNATURE-----
Merge tag 'drm-next-2022-01-14' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Daniel Vetter:
"drivers fixes:
- i915 fixes for ttm backend + one pm wakelock fix
- amdgpu fixes, fairly big pile of small things all over. Note this
doesn't yet containe the fixed version of the otg sync patch that
blew up
- small driver fixes: meson, sun4i, vga16fb probe fix
drm core fixes:
- cma-buf heap locking
- ttm compilation
- self refresh helper state check
- wrong error message in atomic helpers
- mipi-dbi buffer mapping"
* tag 'drm-next-2022-01-14' of git://anongit.freedesktop.org/drm/drm: (49 commits)
drm/mipi-dbi: Fix source-buffer address in mipi_dbi_buf_copy
drm: fix error found in some cases after the patch d1af5cd86997
drm/ttm: fix compilation on ARCH=um
dma-buf: cma_heap: Fix mutex locking section
video: vga16fb: Only probe for EGA and VGA 16 color graphic cards
drm/amdkfd: Fix ASIC name typos
drm/amdkfd: Fix DQM asserts on Hawaii
drm/amdgpu: Use correct VIEWPORT_DIMENSION for DCN2
drm/amd/pm: only send GmiPwrDnControl msg on master die (v3)
drm/amdgpu: use spin_lock_irqsave to avoid deadlock by local interrupt
drm/amdgpu: not return error on the init_apu_flags
drm/amdkfd: Use prange->update_list head for remove_list
drm/amdkfd: Use prange->list head for insert_list
drm/amdkfd: make SPDX License expression more sound
drm/amdkfd: Check for null pointer after calling kmemdup
drm/amd/display: invalid parameter check in dmub_hpd_callback
Revert "drm/amdgpu: Don't inherit GEM object VMAs in child process"
drm/amd/display: reset dcn31 SMU mailbox on failures
drm/amdkfd: use default_groups in kobj_type
drm/amdgpu: use default_groups in kobj_type
...
There is always a struct vma_resource guaranteed to be alive when we
access a corresponding struct vma_snapshot.
So ditch the latter and instead of allocating vma_snapshots, reference
the already existning vma_resource.
This requires a couple of extra members in struct vma_resource but that's
a small price to pay for the simplification.
v2:
- Fix a missing include and declaration (kernel test robot <lkp@intel.com>)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220110172219.107131-7-thomas.hellstrom@linux.intel.com
Add a selftest to exercise asynchronous migration and -unbining.
Extend the gem_migrate selftest to perform the migrations while
depending on a spinner and a bound vma set up on the migrated
buffer object.
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220110172219.107131-6-thomas.hellstrom@linux.intel.com
Implement async (non-blocking) unbinding by not syncing the vma before
calling unbind on the vma_resource.
Add the resulting unbind fence to the object's dma_resv from where it is
picked up by the ttm migration code.
Ideally these unbind fences should be coalesced with the migration blit
fence to avoid stalling the migration blit waiting for unbind, as they
can certainly go on in parallel, but since we don't yet have a
reasonable data structure to use to coalesce fences and attach the
resulting fence to a timeline, we defer that for now.
Note that with async unbinding, even while the unbind waits for the
preceding bind to complete before unbinding, the vma itself might have been
destroyed in the process, clearing the vma pages. Therefore we can
only allow async unbinding if we have a refcounted sg-list and keep a
refcount on that for the vma resource pages to stay intact until
binding occurs. If this condition is not met, a request for an async
unbind is diverted to a sync unbind.
v2:
- Use a separate kmem_cache for vma resources for now to isolate their
memory allocation and aid debugging.
- Move the check for vm closed to the actual unbinding thread. Regardless
of whether the vm is closed, we need the unbind fence to properly wait
for capture.
- Clear vma_res::vm on unbind and update its documentation.
v4:
- Take cache coloring into account when searching for vma resources
pending unbind. (Matthew Auld)
v5:
- Fix timeout and error check in i915_vma_resource_bind_dep_await().
- Avoid taking a reference on the object for async binding if
async unbind capable.
- Fix braces around a single-line if statement.
v6:
- Fix up the cache coloring adjustment. (Kernel test robot <lkp@intel.com>)
- Don't allow async unbinding if the vma_res pages are not the same as
the object pages. (Matthew Auld)
v7:
- s/unsigned long/u64/ in a number of places (Matthew Auld)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220110172219.107131-5-thomas.hellstrom@linux.intel.com
When introducing asynchronous unbinding, the vma itself may no longer
be alive when the actual binding or unbinding takes place.
Update the gtt i915_vma_ops accordingly to take a struct i915_vma_resource
instead of a struct i915_vma for the bind_vma() and unbind_vma() ops.
Similarly change the insert_entries() op for struct i915_address_space.
Replace a couple of i915_vma_snapshot members with their newly introduced
i915_vma_resource counterparts, since they have the same lifetime.
Also make sure to avoid changing the struct i915_vma_flags (in particular
the bind flags) async. That should now only be done sync under the
vm mutex.
v2:
- Update the vma_res::bound_flags when binding to the aliased ggtt
v6:
- Remove I915_VMA_ALLOC_BIT (Matthew Auld)
- Change some members of struct i915_vma_resource from unsigned long to u64
(Matthew Auld)
v7:
- Fix vma resource size parameters to be u64 rather than unsigned long
(Matthew Auld)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220110172219.107131-3-thomas.hellstrom@linux.intel.com
Introduce vma resources, sort of similar to TTM resources, needed for
asynchronous bind management. Initially we will use them to hold
completion of unbinding when we capture data from a vma, but they will
be used extensively in upcoming patches for asynchronous vma unbinding.
v6:
- Some documentation updates
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220110172219.107131-2-thomas.hellstrom@linux.intel.com
core:
- add privacy screen support
- move nomodeset option into drm subsystem
- clean up nomodeset handling in drivers
- make drm_irq.c legacy
- fix stack_depot name conflicts
- remove DMA_BUF_SET_NAME ioctl restrictions
- sysfs: send hotplug event
- replace several DRM_* logging macros with drm_*
- move hashtable to legacy code
- add error return from gem_create_object
- cma-helper: improve interfaces, drop CONFIG_DRM_KMS_CMA_HELPER
- kernel.h related include cleanups
- support XRGB2101010 source buffers
ttm:
- don't include drm hashtable
- stop pruning fences after wait
- documentation updates
dma-buf:
- add dma_resv selftest
- add debugfs helpers
- remove dma_resv_get_excl_unlocked
- documentation
- make fences mandatory in dma_resv_add_excl_fence
dp:
- add link training delay helpers
gem:
- link shmem/cma helpers into separate modules
- use dma_resv iteratior
- import dma-buf namespace into gem helper modules
scheduler:
- fence grab fix
- lockdep fixes
bridge:
- switch to managed MIPI DSI helpers
- register and attach during probe fixes
- convert to YAML in several places.
panel:
- add bunch of new panesl
simpledrm:
- support FB_DAMAGE_CLIPS
- support virtual screen sizes
- add Apple M1 support
amdgpu:
- enable seamless boot for DCN 3.01
- runtime PM fixes
- use drm_kms_helper_connector_hotplug_event
- get all fences at once
- use generic drm fb helpers
- PSR/DPCD/LTTPR/DSC/PM/RAS/OLED/SRIOV fixes
- add smart trace buffer (STB) for supported GPUs
- display debugfs entries
- new SMU debug option
- Documentation update
amdkfd:
- IP discovery enumeration refactor
- interface between driver fixes
- SVM fixes
- kfd uapi header to define some sysfs bitfields.
i915:
- support VESA panel backlights
- enable ADL-P by default
- add eDP privacy screen support
- add Raptor Lake S (RPL-S) support
- DG2 page table support
- lots of GuC/HuC fw refactoring
- refactored i915->gt interfaces
- CD clock squashing support
- enable 10-bit gamma support
- update ADL-P DMC fw to v2.14
- enable runtime PM autosuspend by default
- ADL-P DSI support
- per-lane DP drive settings for ICL+
- add support for pipe C/D DMC firmware
- Atomic gamma LUT updates
- remove CCS FB stride restrictions on ADL-P
- VRR platform support for display 11
- add support for display audio codec keepalive
- lots of display refactoring
- fix runtime PM handling during PXP suspend
- improved eviction performance with async TTM moves
- async VMA unbinding improvements
- VMA locking refactoring
- improved error capture robustness
- use per device iommu checks
- drop bits stealing from i915_sw_fence function ptr
- remove dma_resv_prune
- add IC cache invalidation on DG2
nouveau:
- crc fixes
- validate LUTs in atomic check
- set HDMI AVI RGB quant to full
tegra:
- buffer objects reworks for dma-buf compat
- NVDEC driver uAPI support
- power management improvements
etnaviv:
- IOMMU enabled system support
- fix > 4GB command buffer mapping
- close a DoS vector
- fix spurious GPU resets
ast:
- fix i2c initialization
rcar-du:
- DSI output support
exynos:
- replace legacy gpio interface
- implement generic GEM object mmap
msm:
- dpu plane state cleanup in prep for multirect
- dpu debugfs cleanups
- dp support for sc7280
- a506 support
- removal of struct_mutex
- remove old eDP sub-driver
anx7625:
- support MIPI DSI input
- support HDMI audio
- fix reading EDID
lvds:
- fix bridge DT bindings
megachips:
- probe both bridges before registering
dw-hdmi:
- allow interlace on bridge
ps8640:
- enable runtime PM
- support aux-bus
tx358768:
- enable reference clock
- add pulse mode support
ti-sn65dsi86:
- use regmap bulk write
- add PWM support
etnaviv:
- get all fences at once
gma500:
- gem object cleanups
kmb:
- enable fb console
radeon:
- use dma_resv_wait_timeout
rockchip:
- add DSP hold timeout
- suspend/resume fixes
- PLL clock fixes
- implement mmap in GEM object functions
- use generic fbdev emulation
sun4i:
- use CMA helpers without vmap support
vc4:
- fix HDMI-CEC hang with display is off
- power on HDMI controller while disabling
- support 4K@60Hz modes
- support 10-bit YUV 4:2:0 output
vmwgfx:
- fix leak on probe errors
- fail probing on broken hosts
- new placement for MOB page tables
- hide internal BOs from userspace
- implement GEM support
- implement GL 4.3 support
virtio:
- overflow fixes
xen:
- implement mmap as GEM object function
omapdrm:
- fix scatterlist export
- support virtual planes
mediatek:
- MT8192 support
- CMDQ refinement
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmHX1vMACgkQDHTzWXnE
hr6rAw/9ES5RO5N3Ku9foFk1CI9bqy1Kh663KLkkEc+rDdhKpiZbBnAsrKkZ9sGu
fNuHmWNN5nWXtDSOqHWuslt3F7Gh+qEBQtlkqC9mZsBm3bWB0aJK6E4QaJxfeSaK
ta6AmyGx8DaV+C69i86dnemQurYSDVjROd7LDPKnCU0Fye/JxiXSXQmXksKMFVxd
x5vmO9yfeDSg3EF+u1yB6nJNUYZBV0vhrAfjPqxPCRBXuQc7akuaglE/SFwlGnEk
vn0GjVHEQcRTqYKrHr64xvQxIoKXcJP0pkDUyT7KYCsyj8GJkvxkb7/ls5pp5DvL
SwyNg3J3vwUVP6w6GEvzf3ffG720qqUZvCbvLmE+A/t2DhGILiAm+HXSo43PTOW8
uagT7Gxma8dy8EovjSxioS9HPX8Gcu+S+XYavgOsevOZ7oeEt4f4TLW7LXsw9d6y
75FrMhiUpreab5hAh8Le0swuLYZHjdnJRdjSTqZJ/T6VdTdVftLT6IfwvSDx5CHy
cWuufgcAjd7xVTXFquHWYXWLTQkiSMGf1M02jx9IWolTd4Cm41LNBhqMEDHZLHJD
7ngGgoaREVDQ+MqjG90yfIwJFIpJPI3YOaHLi/Kznga+iDzFY6cyOQWW2vX7ZdY5
7+LJWsgGT8Feb7/bzD5hX1mYqJLxh1pWUIaqIKMl+7LJL7gTVU8=
=MByd
-----END PGP SIGNATURE-----
Merge tag 'drm-next-2022-01-07' of git://anongit.freedesktop.org/drm/drm
Pull drm updates from Dave Airlie:
"Highlights are support for privacy screens found in new laptops, a
bunch of nomodeset refactoring, and i915 enables ADL-P systems by
default, while starting to add RPL-S support.
vmwgfx adds GEM and support for OpenGL 4.3 features in userspace.
Lots of internal refactorings around dma reservations, and lots of
driver refactoring as well.
Summary:
core:
- add privacy screen support
- move nomodeset option into drm subsystem
- clean up nomodeset handling in drivers
- make drm_irq.c legacy
- fix stack_depot name conflicts
- remove DMA_BUF_SET_NAME ioctl restrictions
- sysfs: send hotplug event
- replace several DRM_* logging macros with drm_*
- move hashtable to legacy code
- add error return from gem_create_object
- cma-helper: improve interfaces, drop CONFIG_DRM_KMS_CMA_HELPER
- kernel.h related include cleanups
- support XRGB2101010 source buffers
ttm:
- don't include drm hashtable
- stop pruning fences after wait
- documentation updates
dma-buf:
- add dma_resv selftest
- add debugfs helpers
- remove dma_resv_get_excl_unlocked
- documentation
- make fences mandatory in dma_resv_add_excl_fence
dp:
- add link training delay helpers
gem:
- link shmem/cma helpers into separate modules
- use dma_resv iteratior
- import dma-buf namespace into gem helper modules
scheduler:
- fence grab fix
- lockdep fixes
bridge:
- switch to managed MIPI DSI helpers
- register and attach during probe fixes
- convert to YAML in several places.
panel:
- add bunch of new panesl
simpledrm:
- support FB_DAMAGE_CLIPS
- support virtual screen sizes
- add Apple M1 support
amdgpu:
- enable seamless boot for DCN 3.01
- runtime PM fixes
- use drm_kms_helper_connector_hotplug_event
- get all fences at once
- use generic drm fb helpers
- PSR/DPCD/LTTPR/DSC/PM/RAS/OLED/SRIOV fixes
- add smart trace buffer (STB) for supported GPUs
- display debugfs entries
- new SMU debug option
- Documentation update
amdkfd:
- IP discovery enumeration refactor
- interface between driver fixes
- SVM fixes
- kfd uapi header to define some sysfs bitfields.
i915:
- support VESA panel backlights
- enable ADL-P by default
- add eDP privacy screen support
- add Raptor Lake S (RPL-S) support
- DG2 page table support
- lots of GuC/HuC fw refactoring
- refactored i915->gt interfaces
- CD clock squashing support
- enable 10-bit gamma support
- update ADL-P DMC fw to v2.14
- enable runtime PM autosuspend by default
- ADL-P DSI support
- per-lane DP drive settings for ICL+
- add support for pipe C/D DMC firmware
- Atomic gamma LUT updates
- remove CCS FB stride restrictions on ADL-P
- VRR platform support for display 11
- add support for display audio codec keepalive
- lots of display refactoring
- fix runtime PM handling during PXP suspend
- improved eviction performance with async TTM moves
- async VMA unbinding improvements
- VMA locking refactoring
- improved error capture robustness
- use per device iommu checks
- drop bits stealing from i915_sw_fence function ptr
- remove dma_resv_prune
- add IC cache invalidation on DG2
nouveau:
- crc fixes
- validate LUTs in atomic check
- set HDMI AVI RGB quant to full
tegra:
- buffer objects reworks for dma-buf compat
- NVDEC driver uAPI support
- power management improvements
etnaviv:
- IOMMU enabled system support
- fix > 4GB command buffer mapping
- close a DoS vector
- fix spurious GPU resets
ast:
- fix i2c initialization
rcar-du:
- DSI output support
exynos:
- replace legacy gpio interface
- implement generic GEM object mmap
msm:
- dpu plane state cleanup in prep for multirect
- dpu debugfs cleanups
- dp support for sc7280
- a506 support
- removal of struct_mutex
- remove old eDP sub-driver
anx7625:
- support MIPI DSI input
- support HDMI audio
- fix reading EDID
lvds:
- fix bridge DT bindings
megachips:
- probe both bridges before registering
dw-hdmi:
- allow interlace on bridge
ps8640:
- enable runtime PM
- support aux-bus
tx358768:
- enable reference clock
- add pulse mode support
ti-sn65dsi86:
- use regmap bulk write
- add PWM support
etnaviv:
- get all fences at once
gma500:
- gem object cleanups
kmb:
- enable fb console
radeon:
- use dma_resv_wait_timeout
rockchip:
- add DSP hold timeout
- suspend/resume fixes
- PLL clock fixes
- implement mmap in GEM object functions
- use generic fbdev emulation
sun4i:
- use CMA helpers without vmap support
vc4:
- fix HDMI-CEC hang with display is off
- power on HDMI controller while disabling
- support 4K@60Hz modes
- support 10-bit YUV 4:2:0 output
vmwgfx:
- fix leak on probe errors
- fail probing on broken hosts
- new placement for MOB page tables
- hide internal BOs from userspace
- implement GEM support
- implement GL 4.3 support
virtio:
- overflow fixes
xen:
- implement mmap as GEM object function
omapdrm:
- fix scatterlist export
- support virtual planes
mediatek:
- MT8192 support
- CMDQ refinement"
* tag 'drm-next-2022-01-07' of git://anongit.freedesktop.org/drm/drm: (1241 commits)
drm/amdgpu: no DC support for headless chips
drm/amd/display: fix dereference before NULL check
drm/amdgpu: always reset the asic in suspend (v2)
drm/amdgpu: put SMU into proper state on runpm suspending for BOCO capable platform
drm/amd/display: Fix the uninitialized variable in enable_stream_features()
drm/amdgpu: fix runpm documentation
amdgpu/pm: Make sysfs pm attributes as read-only for VFs
drm/amdgpu: save error count in RAS poison handler
drm/amdgpu: drop redundant semicolon
drm/amd/display: get and restore link res map
drm/amd/display: support dynamic HPO DP link encoder allocation
drm/amd/display: access hpo dp link encoder only through link resource
drm/amd/display: populate link res in both detection and validation
drm/amd/display: define link res and make it accessible to all link interfaces
drm/amd/display: 3.2.167
drm/amd/display: [FW Promotion] Release 0.0.98
drm/amd/display: Undo ODM combine
drm/amd/display: Add reg defs for DCN303
drm/amd/display: Changed pipe split policy to allow for multi-display pipe split
drm/amd/display: Set optimize_pwr_state for DCN31
...
Purging can happen during swapping out, or directly invoked with the
madvise ioctl. In such cases this doesn't involve a ttm move, which
skips umapping the object.
v2(Thomas):
- add ttm_truncate helper, and just call into i915_ttm_move_notify() to
handle the unmapping step
Fixes: 213d509277 ("drm/i915/ttm: Introduce a TTM i915 gem object backend")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220106174910.280616-4-matthew.auld@intel.com
(cherry picked from commit ab4911b7d4)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Ensure we call ttm_bo_unmap_virtual when releasing the pages.
Importantly this should now handle the ttm swapping case, and all other
places that already call into i915_ttm_move_notify().
v2: fix up the selftest
Fixes: cf3e3e86d7 ("drm/i915: Use ttm mmap handling for ttm bo's.")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220106174910.280616-3-matthew.auld@intel.com
(cherry picked from commit 903e038727)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Don't attempt to fault and re-populate purged objects. By some fluke
this passes the dontneed-after-mmap IGT, but for the wrong reasons.
Fixes: cf3e3e86d7 ("drm/i915: Use ttm mmap handling for ttm bo's.")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220106174910.280616-2-matthew.auld@intel.com
(cherry picked from commit f3cb4a2de5)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Purging can happen during swapping out, or directly invoked with the
madvise ioctl. In such cases this doesn't involve a ttm move, which
skips umapping the object.
v2(Thomas):
- add ttm_truncate helper, and just call into i915_ttm_move_notify() to
handle the unmapping step
Fixes: 213d509277 ("drm/i915/ttm: Introduce a TTM i915 gem object backend")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220106174910.280616-4-matthew.auld@intel.com
Ensure we call ttm_bo_unmap_virtual when releasing the pages.
Importantly this should now handle the ttm swapping case, and all other
places that already call into i915_ttm_move_notify().
v2: fix up the selftest
Fixes: cf3e3e86d7 ("drm/i915: Use ttm mmap handling for ttm bo's.")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220106174910.280616-3-matthew.auld@intel.com
Don't attempt to fault and re-populate purged objects. By some fluke
this passes the dontneed-after-mmap IGT, but for the wrong reasons.
Fixes: cf3e3e86d7 ("drm/i915: Use ttm mmap handling for ttm bo's.")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220106174910.280616-2-matthew.auld@intel.com
Add some proper flags for the different modes, and shorten the name to
something more snappy.
Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211215110746.865-2-matthew.auld@intel.com
Ditch the writeback hook and drop i915_gem_object_writeback(). We
already support the shrinker_release_pages hook which can just call
shmem_writeback directly.
Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211215110746.865-1-matthew.auld@intel.com
GGTT is currently available both through i915->ggtt and gt->ggtt, and we
eventually want to get rid of the i915->ggtt one.
Use to_gt() for all i915->ggtt accesses to help with the future
refactoring.
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Sujaritha Sundaresan <sujaritha.sundaresan@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211219212500.61432-4-andi.shyti@linux.intel.com
Using local pointer ttm as argument in __i915_ttm_move instead of bo->ttm,
as local pointer was previously assigned to bo->ttm in function.
This will make code a bit more readable.
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Jasmine Newsome <jasmine.newsome@intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220104203642.231878-1-jasmine.newsome@intel.com
Increment composite fence seqno on each fence creation.
Fixes: 544460c338 ("drm/i915: Multi-BB execbuf")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211214195913.35735-1-matthew.brost@intel.com
(cherry picked from commit 62eeb9ae13)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
'prev_engine' was declared inside the output loop and checked in the
inner after at least 1 pass of either loop. The variable should be
declared outside both loops as it needs to be persistent across the
entire loop structure.
Fixes: e5e32171a2 ("drm/i915/guc: Connect UAPI to GuC multi-lrc interface")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211219001909.24348-1-matthew.brost@intel.com
(cherry picked from commit cbffbac9c1)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
'prev_engine' was declared inside the output loop and checked in the
inner after at least 1 pass of either loop. The variable should be
declared outside both loops as it needs to be persistent across the
entire loop structure.
Fixes: e5e32171a2 ("drm/i915/guc: Connect UAPI to GuC multi-lrc interface")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211219001909.24348-1-matthew.brost@intel.com
Driver Changes:
- Added bits of DG2 support around page table handling (Stuart Summers, Matthew Auld)
- Fixed wakeref leak in PMU busyness during reset in GuC mode (Umesh Nerlige Ramappa)
- Fixed debugfs access crash if GuC failed to load (John Harrison)
- Bring back GuC error log to error capture, undoing accidental earlier breakage (Thomas Hellström)
- Fixed memory leak in error capture caused by earlier refactoring (Thomas Hellström)
- Exclude reserved stolen from driver use (Chris Wilson)
- Add memory region sanity checking and optional full test (Chris Wilson)
- Fixed buffer size truncation in TTM shmemfs backend (Robert Beckett)
- Use correct lock and don't overwrite internal data structures when stealing GuC context ids (Matthew Brost)
- Don't hog IRQs when destroying GuC contexts (John Harrison)
- Make GuC to Host communication more robust (Matthew Brost)
- Continuation of locking refactoring around VMA and backing store handling (Maarten Lankhorst)
- Improve performance of reading GuC log from debugfs (John Harrison)
- Log when GuC fails to reset an engine (John Harrison)
- Speed up GuC/HuC firmware loading by requesting RP0 (Vinay Belgaumkar)
- Further work on asynchronous VMA unbinding (Thomas Hellström, Christian König)
- Refactor GuC/HuC firmware handling to prepare for future platforms (John Harrison)
- Prepare for future different GuC/HuC firmware signing key sizes (Daniele Ceraolo Spurio, Michal Wajdeczko)
- Add noreclaim annotations (Matthew Auld)
- Remove racey GEM_BUG_ON between GPU reset and GuC communication handling (Matthew Brost)
- Refactor i915->gt with to_gt(i915) to prepare for future platforms (Michał Winiarski, Andi Shyti)
- Increase GuC log size for CONFIG_DEBUG_GEM (John Harrison)
- Fixed engine busyness in selftests when in GuC mode (Umesh Nerlige Ramappa)
- Make engine parking work with PREEMPT_RT (Sebastian Andrzej Siewior)
- Replace X86_FEATURE_PAT with pat_enabled() (Lucas De Marchi)
- Selftest for stealing of guc ids (Matthew Brost)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/YcRvKO5cyPvIxVCi@tursulin-mobl2
A weak implementation of parallel submission (multi-bb execbuf IOCTL) for
execlists. Doing as little as possible to support this interface for
execlists - basically just passing submit fences between each request
generated and virtual engines are not allowed. This is on par with what
is there for the existing (hopefully soon deprecated) bonding interface.
We perma-pin these execlists contexts to align with GuC implementation.
v2:
(John Harrison)
- Drop siblings array as num_siblings must be 1
v3:
(John Harrison)
- Drop single submission
v4:
(John Harrison)
- Actually drop single submission
- Use IS_ERR check on return value from intel_context_create
- Set last request to NULL on unpin
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211222223532.28698-1-matthew.brost@intel.com
Convert free_work into delayed_work, similar to ttm to allow converting the
blocking lock in __i915_gem_free_objects to a trylock.
Unlike ttm, the object should already be idle, as it's kept alive
by a reference through struct i915_vma->active, which is dropped
after all vma's are idle.
Because of this, we can use a no wait by default, or when the lock
is contested, we use ttm's 10 ms.
The trylock should only fail when the object is sharing it's resv with
other objects, and typically objects are not kept locked for a long
time, so we can safely retry on failure.
Fixes: be7612fd66 ("drm/i915: Require object lock when freeing pages during destruction")
Testcase: igt/gem_exec_alignment/pi*
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211222155622.2960379-1-maarten.lankhorst@linux.intel.com
Protect updates of struct i915_vma flags and async binding / unbinding
with the vm::mutex. This means that i915_vma_bind() needs to assert
vm::mutex held. In order to make that possible drop the caching of
kmap_atomic() maps around i915_vma_bind().
An alternative would be to use kmap_local() but since we block cpu
unplugging during sleeps inside kmap_local() sections this may have
unwanted side-effects. Particularly since we might wait for gpu while
holding the vm mutex.
This change may theoretically increase execbuf cpu-usage on snb, but
at least on non-highmem systems that increase should be very small.
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211221200050.436316-5-thomas.hellstrom@linux.intel.com
First of all as discussed multiple times now kernel copies *must* always
wait for all fences in a BO before actually doing the copy. This is
mandatory.
Additional to that drop the handling when there can't be a shared slot
allocated on the source BO and just properly return an error code.
Otherwise this code path would only be tested under out of memory
conditions.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211221200050.436316-3-thomas.hellstrom@linux.intel.com
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Since the gt migration code was using only a single fence for
dependencies, these were collected in a dma_fence_array. However, it
turns out that it's illegal to use some dma_fences in a dma_fence_array,
in particular other dma_fence_arrays and dma_fence_chains, and this
causes trouble for us moving forward.
Have the gt migration code instead take a const struct i915_deps for
dependencies. This means we can skip the dma_fence_array creation
and instead pass the struct i915_deps instead to circumvent the
problem.
v2:
- Make the prev_deps() function static. (kernel test robot <lkp@intel.com>)
- Update the struct i915_deps kerneldoc.
v4:
- Rebase.
Fixes: 5652df829b ("drm/i915/ttm: Update i915_gem_obj_copy_ttm() to be asynchronous")
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211221200050.436316-2-thomas.hellstrom@linux.intel.com
In the next commits, we may not evict when refcount = 0.
igt_vm_isolation() continuously tries to pin/unpin at same address,
but also calls put() on the object, which means the object may not
be unpinned in time.
Instead of this, re-use the same object over and over, so they can
be unbound as required.
Changes since v1:
- Fix cleaning up obj_b on failure. (Matt)
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211216142749.1966107-7-maarten.lankhorst@linux.intel.com
Call drop_pages with the gem object lock held, instead of the other
way around. This will allow us to drop the vma bindings with the
gem object lock held.
We plan to require the object lock for unpinning in the future,
and this is an easy target.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211216142749.1966107-3-maarten.lankhorst@linux.intel.com
Use to_gt() helper consistently throughout the codebase.
Pure mechanical s/i915->gt/to_gt(i915). No functional changes.
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211214193346.21231-6-andi.shyti@linux.intel.com
drm/i915 feature pull #2 for v5.17:
Features and functionality:
- Add eDP privacy screen support (Hans)
- Add Raptor Lake S (RPL-S) support (Anusha)
- Add CD clock squashing support (Mika)
- Properly support ADL-P without force probe (Clint)
- Enable pipe color support (10 bit gamma) for display 13 platforms (Uma)
- Update ADL-P DMC firmware to v2.14 (Madhumitha)
Refactoring and cleanups:
- More FBC refactoring preparing for multiple FBC instances (Ville)
- Plane register cleanups (Ville)
- Header refactoring and include cleanups (Jani)
- Crtc helper and vblank wait function cleanups (Jani, Ville)
- Move pipe/transcoder/abox masks under intel_device_info.display (Ville)
Fixes:
- Add a delay to let eDP source OUI write take effect (Lyude)
- Use div32 version of MPLLB word clock for UHBR on SNPS PHY (Jani)
- Fix DMC firmware loader overflow check (Harshit Mogalapalli)
- Fully disable FBC on FIFO underruns (Ville)
- Disable FBC with double wide pipe as mutually exclusive (Ville)
- DG2 workarounds (Matt)
- Non-x86 build fixes (Siva)
- Fix HDR plane max width for NV12 (Vidya)
- Disable IRQ for selftest timestamp calculation (Anshuman)
- ADL-P VBT DDC pin mapping fix (Tejas)
Merges:
- Backmerge drm-next for privacy screen plumbing (Jani)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/87ee6f5h9u.fsf@intel.com
PAT can be disabled on boot with "nopat" in the command line. Replace
one x86-ism with another, which is slightly more correct to prepare for
supporting other architectures.
Cc: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211202003048.1015511-1-lucas.demarchi@intel.com
ttm->num_pages is uint32_t which was causing very large buffers to
only populate a truncated size.
This fixes gem_create@create-clear igt test on large memory systems.
Fixes: 7ae034590c ("drm/i915/ttm: add tt shmem backend")
Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211210195005.2582884-1-bob.beckett@collabora.com
Remove the portion of stolen memory reserved for private use from driver
access.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
cc: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211208153404.27546-2-ramalingam.c@intel.com
Conditionally allocate LMEM with 64K granularity, since 4K page support
for LMEM will be dropped on some platforms when using the PPGTT.
v2:
updated commit msg [Thomas]
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211208154854.28037-1-ramalingam.c@intel.com
Only hw that supports mappable aperture would hit this path
vm_fault_gtt/vm_fault_tmm, So we never hit this function
remap_io_mapping() in discrete, So skip this code for non-x86
architectures.
v2: use IS_ENABLED () instead of #if defined
v3: move function prototypes from i915_drv.h to i915_mm.h
v4: added kernel error message in stub function
v5: fixed compilation warnings
v6: checkpatch style
Signed-off-by: Siva Mullati <siva.mullati@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211208041215.763098-1-siva.mullati@intel.com
Originally "out_fence" was set using out_fence = sync_file_create() but
which returns NULL, but now it is set with out_fence = eb_requests_create()
which returns error pointers. The error path needs to be modified to
avoid an Oops in the "goto err_request;" path.
Fixes: 544460c338 ("drm/i915: Multi-BB execbuf")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211202044831.29583-1-matthew.brost@intel.com
(cherry picked from commit 8722ded49c)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Follow up on below commit, to increase the timeout further on new
platforms, to accomodate the additional time required for the completion
of guc submissions for numerous requests created in loop.
commit 5e076529e2
Author: Matthew Brost <matthew.brost@intel.com>
Date: Mon Jul 26 20:17:03 2021 -0700
drm/i915/selftests: Increase timeout in i915_gem_contexts selftests
Signed-off-by: Bruce Chang <yu.bruce.chang@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211207003845.12419-1-yu.bruce.chang@intel.com
Originally "out_fence" was set using out_fence = sync_file_create() but
which returns NULL, but now it is set with out_fence = eb_requests_create()
which returns error pointers. The error path needs to be modified to
avoid an Oops in the "goto err_request;" path.
Fixes: 544460c338 ("drm/i915: Multi-BB execbuf")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211202044831.29583-1-matthew.brost@intel.com
drm/i915 feature pull for v5.17:
Features and functionality:
- Implement per-lane DP drive settings for ICL+ (Ville)
- Enable runtime pm autosuspend by default (Tilak Tangudu)
- ADL-P DSI support (Vandita)
- Add support for pipe C and D DMC firmware (Anusha)
- Implement (near)atomic gamma LUT updates via vblank workers (Ville)
- Split plane updates to noarm+arm phases (Ville)
- Remove the CCS FB stride restrictions on ADL-P (Imre)
- Add PSR selective fetch support for biplanar formats (Jouni)
- Add support for display audio codec keepalive (Kai)
- VRR platform support for display 11 (Manasi)
Refactoring and cleanups:
- FBC refactoring and cleanups preparing for multiple FBC instances (Ville)
- PCH modeset refactoring, move to its own file (Ville)
- Refactor and simplify handling of modifiers (Imre)
- PXP cleanups (Ville)
- Display header and include refactoring (Jani)
- Some register macro cleanups (Ville)
- Refactor DP HDMI DFP limit code (Ville)
Fixes:
- Disable DSB usage for now due to incorrect gamma LUT updates (Ville)
- Check async flip state of every crtc and plane only once (José)
- Fix DPT FB suspend/resume (Imre)
- Fix black screen on reboot due to disabled DP++ TMDS output buffers (Ville)
- Don't request GMBUS to generate irqs when called while irqs are off (Ville)
- Fix type1 DVI DP dual mode adapter heuristics for modern platforms (Ville)
- Fix fix integer overflow in 128b/132b data rate calculation (Jani)
- Fix bigjoiner state readout (Ville)
- Build fix for non-x86 (Siva)
- PSR fixes (José, Jouni, Ville)
- Disable ADL-P underrun recovery (José)
- Fix DP link parameter usage before valid DPCD (Imre)
- VRR vblank and frame counter fixes (Ville)
- Fix fastsets on TypeC ports following a non-blocking modeset (Imre)
- Compiler warning fixes (Nathan Chancellor)
- Fix DSI HS mode commands (William Tseng)
- Error return fixes (Dan Carpenter)
- Update memory bandwidth calculations (Radhakrishna)
- Implement WM0 cursor WA for DG2 (Stan)
- Fix DSI Double pixelclock on read-back for dual-link panels (Hans de Goede)
- HDMI 2.1 PCON FRL configuration fixes (Ankit)
Merges:
- DP link training delay helpers, via topic branch (Jani)
- Backmerge drm-next (Jani)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/87v909it0t.fsf@intel.com
With asynchronous migrations, the vma state may be several migrations
ahead of the state that matches the request we're capturing.
Address that by introducing an i915_vma_snapshot structure that
can be used to snapshot relevant state at request submission.
In order to make sure we access the correct memory, the snapshots take
references on relevant sg-tables and memory regions.
Also move the capture list allocation out of the fence signaling
critical path and use the CONFIG_DRM_I915_CAPTURE_ERROR define to
avoid compiling in members and functions used for error capture
when they're not used.
Finally, Introduce lockdep annotation.
v4:
- Break out the capture allocation mode change to a separate patch.
v5:
- Fix compilation error in the !CONFIG_DRM_I915_CAPTURE_ERROR case
(kernel test robot)
v6:
- Use #if IS_ENABLED() instead of #ifdef to match driver style.
- Move yet another change of allocation mode to the separate patch.
- Commit message rework due to patch reordering.
v7:
- Adjust for removal of region refcounting.
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Ramalingam C <ramalingam.c@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211129202245.472043-1-thomas.hellstrom@linux.intel.com
With both integrated and discrete Intel GPUs in a system, the current
global check of intel_iommu_gfx_mapped, as done from intel_vtd_active()
may not be completely accurate.
In this patch we add i915 parameter to intel_vtd_active() in order to
prepare it for multiple GPUs and we also change the check away from Intel
specific intel_iommu_gfx_mapped (global exported by the Intel IOMMU
driver) to probing the presence of IOMMU on a specific device using
device_iommu_mapped().
This will return true both for IOMMU pass-through and address translation
modes which matches the current behaviour. If in the future we wanted to
distinguish between these two modes we could either use
iommu_get_domain_for_dev() and check for __IOMMU_DOMAIN_PAGING bit
indicating address translation, or ask for a new API to be exported from
the IOMMU core code.
v2:
* Check for dmar translation specifically, not just iommu domain. (Baolu)
v3:
* Go back to plain "any domain" check for now, rewrite commit message.
v4:
* Use device_iommu_mapped. (Robin, Baolu)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211126141424.493753-1-tvrtko.ursulin@linux.intel.com
Rather than stealing bits from i915_sw_fence function pointer use
separate fields for function pointer and flags. If using two different
fields, the 4 byte alignment for the i915_sw_fence function pointer can
also be dropped.
v2:
(CI)
- Set new function field rather than flags in __i915_sw_fence_init
v3:
(Tvrtko)
- Remove BUG_ON(!fence->flags) in reinit as that will now blow up
- Only define fence->flags if CONFIG_DRM_I915_SW_FENCE_CHECK_DAG is
defined
v4:
- Rebase, resend for CI
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211116194929.10211-1-matthew.brost@intel.com
The signaled bit is already used for quick testing if a fence is signaled.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Acked-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/460722/
Signed-off-by: Christian König <christian.koenig@amd.com>
vfs_kernel_mount() modifies the passed in mount options, leaving us with
"huge", instead of "huge=within_size". Normally this shouldn't matter
with the usual module load/unload flow, however with the core_hotunplug
IGT we are hitting the following, when re-probing the memory regions:
i915 0000:00:02.0: [drm] Transparent Hugepage mode 'huge'
tmpfs: Bad value for 'huge'
[drm] Unable to create a private tmpfs mount, hugepage support will be disabled(-22).
References: https://gitlab.freedesktop.org/drm/intel/-/issues/4651
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211126110843.2028582-1-matthew.auld@intel.com
The signaled bit is already used for quick testing if a fence is signaled.
On top of that, it's a terrible abuse of dma-fence api, and in the common
case where the object is already locked by the caller, the trylock will fail.
If it were useful, the core dma-api would have exposed the same functionality.
The fact that i915 has a dma_resv_utils.c file should be a warning that the
functionality either belongs in core, or is not very useful at all.
In this case the latter.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
[mlankhorst: Improve commit message]
Acked-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211021103605.735002-3-maarten.lankhorst@linux.intel.com
Reviewed-by: Matthew Auld <matthew.auld@intel.com> #irc
Maarten requested a backmerge due his work depending on subtle semantic
changes introduced by:
7e2e69ed46 ("drm/i915: Fix i915_request fence wait semantics")
2cbb8d4d67 ("drm/i915: use new iterator in i915_gem_object_wait_reservation")
Both should probably have been merged to drm-intel-gt-next anyway.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Don't wait sync while migrating, but rather make the GPU blit await the
dependencies and add a moving fence to the object.
This also enables asynchronous VRAM management in that on eviction,
rather than waiting for the moving fence to expire before freeing VRAM,
it is freed immediately and the fence is stored with the VRAM manager and
handed out to newly allocated objects to await before clears and swapins,
or for kernel objects before setting up gpu vmas or mapping.
To collect dependencies before migrating, add a set of utilities that
coalesce these to a single dma_fence.
What is still missing for fully asynchronous operation is asynchronous vma
unbinding, which is still to be implemented.
This commit substantially reduces execution time in the gem_lmem_swapping
test.
v2:
- Make a couple of functions static.
v4:
- Fix some style issues (Matthew Auld)
- Audit and add more checks for ghost objects (Matthew Auld)
- Add more documentation for the i915_deps utility (Mattew Auld)
- Simplify the i915_deps_sync() function
v6:
- Re-check for fence signaled before returning -EBUSY (Matthew Auld)
- Use dma_resv_iter_is_exclusive() (Matthew Auld)
- Await all dma-resv fences before a migration blit (Matthew Auld)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211122214554.371864-6-thomas.hellstrom@linux.intel.com
With async migration, the shrinker may end up wanting to release the
pages of an object while the migration blit is still running, since
the GT migration code doesn't set up VMAs and the shrinker is thus
oblivious to the fact that the GPU is still using the pages.
Add waiting for gpu in the shrinker_release_pages() op and an
argument to that function indicating whether the shrinker expects it
to not wait for gpu. In the latter case the shrinker_release_pages()
op will return -EBUSY if the object is not idle.
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211122214554.371864-5-thomas.hellstrom@linux.intel.com
There is an interesting refcounting loop:
struct intel_memory_region has a struct ttm_resource_manager,
ttm_resource_manager->move may hold a reference to i915_request,
i915_request may hold a reference to intel_context,
intel_context may hold a reference to drm_i915_gem_object,
drm_i915_gem_object may hold a reference to intel_memory_region.
Break this loop by dropping region reference counting.
In addition, Have regions with a manager moving fence make sure
that all region objects are released before freeing the region.
v6:
- Fix a code comment.
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211122214554.371864-4-thomas.hellstrom@linux.intel.com
For now, we will only allow async migration when TTM is used,
so the paths we care about are related to TTM.
The mmap path is handled by having the fence in ttm_bo->moving,
when pinning, the binding only becomes available after the moving
fence is signaled, and pinning a cpu map will only work after
the moving fence signals.
This should close all holes where userspace can read a buffer
before it's fully migrated.
v2:
- Fix a couple of SPARSE warnings
v3:
- Fix a NULL pointer dereference
v4:
- Ditch the moving fence waiting for i915_vma_pin_iomap() and
replace with a verification that the vma is already bound.
(Matthew Auld)
- Squash with a previous patch introducing moving fence waiting and
accessing interfaces (Matthew Auld)
- Rename to indicated that we also add support for sync waiting.
v5:
- Fix check for NULL and unreferencing i915_vma_verify_bind_complete()
(Matthew Auld)
- Fix compilation failure if !CONFIG_DRM_I915_DEBUG_GEM
- Fix include ordering. (Matthew Auld)
v7:
- Fix yet another compilation failure with clang if
!CONFIG_DRM_I915_DEBUG_GEM
Co-developed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211122214554.371864-2-thomas.hellstrom@linux.intel.com
This reverts commit 777226dac0 ("drm/i915/dmabuf: fix broken build").
Approach taken in the patch was rejected by Linus and the upstream tree
now already contains the required include directive via 304ac8032d
("Merge tag 'drm-next-2021-11-12' of git://anongit.freedesktop.org/drm/drm").
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 777226dac0 ("drm/i915/dmabuf: fix broken build")
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Acked-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211122135758.85444-1-tvrtko.ursulin@linux.intel.com
[tursulin: fixup commit message sha format]
drm-intel-gt-next fails to build with:
drivers/gpu/drm/i915/gem/i915_gem_ttm.c: In function ‘vm_fault_ttm’:
drivers/gpu/drm/i915/gem/i915_gem_ttm.c:862:23: error: too many arguments to function ‘ttm_bo_vm_fault_reserved’
862 | ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma->vm_page_prot,
| ^~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211123125814.1703220-1-matthew.auld@intel.com
Correct kernel-doc warnings in i915_drm_object.c:
i915_gem_object.c:103: warning: expecting prototype for i915_gem_object_fini(). Prototype was for __i915_gem_object_fini() instead
i915_gem_object.c:110: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
* Mark up the object's coherency levels for a given cache_level
i915_gem_object.c:110: warning: missing initial short description on line:
* Mark up the object's coherency levels for a given cache_level
i915_gem_object.c:457: warning: No description found for return value of 'i915_gem_object_read_from_page'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: intel-gfx@lists.freedesktop.org
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211123050928.20434-1-rdunlap@infradead.org
This function returns a bool type so returning -EBUSY is equivalent to
returning true. It should return false instead.
Fixes: 7ae034590c ("drm/i915/ttm: add tt shmem backend")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211122061438.GA2492@kili
We currently have to special case vma->obj being NULL because
of gen6 ppgtt and mock_engine. Fix gen6 ppgtt, so we may soon
be able to remove a few checks. As the object only exists as
a fake object pointing to ggtt, we have no backing storage,
so no real object is created. It just has to look real enough.
Also kill pin_mutex, it's not compatible with ww locking,
and we can use the vm lock instead.
v2:
- Drop IS_SHRINKABLE and shorten overly long line
v3:
- Checkpatch fix for alignment
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211117142024.1043017-2-matthew.auld@intel.com
Simplifying the code a bit.
Signed-off-by: Christian König <christian.koenig@amd.com>
[mlankhorst: Handle timeout = 0 correctly, use new i915_request_wait_timeout.]
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Acked-by: Daniel Vetter <daniel@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20211116102431.198905-7-christian.koenig@amd.com
bridge:
- HPD improvments for lt9611uxc
- eDP aux-bus support for ps8640
- LVDS data-mapping selection support
ttm:
- remove huge page functionality (needs reworking)
- fix a race condition during BO eviction
panels:
- add some new panels
fbdev:
- fix double-free
- remove unused scrolling acceleration
- CONFIG_FB dep improvements
locking:
- improve contended locking logging
- naming collision fix
dma-buf:
- add dma_resv_for_each_fence iterator
- fix fence refcounting bug
- name locking fixesA
prime:
- fix object references during mmap
nouveau:
- various code style changes
- refcount fix
- device removal fixes
- protect client list with a mutex
- fix CE0 address calculation
i915:
- DP rates related fixes
- Revert disabling dual eDP that was causing state readout problems
- put the cdclk vtables in const data
- Fix DVO port type for older platforms
- Fix blankscreen by turning DP++ TMDS output buffers on encoder->shutdown
- CCS FBs related fixes
- Fix recursive lock in GuC submission
- Revert guc_id from i915_request tracepoint
- Build fix around dmabuf
amdgpu:
- GPU reset fix
- Aldebaran fix
- Yellow Carp fixes
- DCN2.1 DMCUB fix
- IOMMU regression fix for Picasso
- DSC display fixes
- BPC display calculation fixes
- Other misc display fixes
- Don't allow partial copy from user for DC debugfs
- SRIOV fixes
- GFX9 CSB pin count fix
- Various IP version check fixes
- DP 2.0 fixes
- Limit DCN1 MPO fix to DCN1
amdkfd:
- SVM fixes
- Fix gfx version for renoir
- Reset fixes
udl:
- timeout fix
imx:
- circular locking fix
virtio:
- NULL ptr deref fix
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmGN3YwACgkQDHTzWXnE
hr6aZQ/+Pobf1VE7V3wPUcopxccJYmgBvG/uY8EDyjA8qaxHs2pQqGN2IooOGxr6
F8G1N94Hem/PCDn3T8JI2Tqw5z4sy4UwLahEWISurFCen1IMAfA7hYfutp9X3O7X
8h7b+PgkvVruEAHF7z0kqnWGPHmcro29cIHNkXVRjnJuz+Gmn1XRfo6Jj65n6D7u
NfMeU4/lWRR3767oJQzTqyAYtGxsKaZT3/tBD5WggZBzEKC7hqhAl8EUoOLWwojo
fDqwiEpLXpraPRIQH8trkXVHhzPeLAmG916WwS8JG3CEk9mUQ+I7Jshhd8cw+bsQ
XPuk3OBfU9mtuiGgNzrLP3xXJZs/QN3EkpKZWLefTnJY+C4BgiP2RifTnghmwV31
6/7Pr83CX/cn3BRd7r0xaeBZYvVYBZmwoZcsZFJBM8SVjd/ofKUfAmCzZZKheio2
5qa6bj9DQoyjEoFAULh23plcX6hvATGP7wzfRTnJ9AlAJ0KyEjVJ3r0qE6jHMDc/
uzcTAnKIWCxt9kSgE5qwLQtxLBaBpr/iOniZbCqGkPjiZeMzqP/ug1AKVP7kk39x
FxZVT8ZOKk8Xt4iLZx8jmHi2KKheXYZi9LqieoTrJd44qMXDOmR9DCtQX9FZuWJS
EJAlMj6sCowAZdODPZMVpoMc3Gti9nZ2Fpu7mLrRcMk1gKfjKwo=
=qMNk
-----END PGP SIGNATURE-----
Merge tag 'drm-next-2021-11-12' of git://anongit.freedesktop.org/drm/drm
Pull more drm updates from Dave Airlie:
"I missed a drm-misc-next pull for the main pull last week. It wasn't
that major and isn't the bulk of this at all. This has a bunch of
fixes all over, a lot for amdgpu and i915.
bridge:
- HPD improvments for lt9611uxc
- eDP aux-bus support for ps8640
- LVDS data-mapping selection support
ttm:
- remove huge page functionality (needs reworking)
- fix a race condition during BO eviction
panels:
- add some new panels
fbdev:
- fix double-free
- remove unused scrolling acceleration
- CONFIG_FB dep improvements
locking:
- improve contended locking logging
- naming collision fix
dma-buf:
- add dma_resv_for_each_fence iterator
- fix fence refcounting bug
- name locking fixesA
prime:
- fix object references during mmap
nouveau:
- various code style changes
- refcount fix
- device removal fixes
- protect client list with a mutex
- fix CE0 address calculation
i915:
- DP rates related fixes
- Revert disabling dual eDP that was causing state readout problems
- put the cdclk vtables in const data
- Fix DVO port type for older platforms
- Fix blankscreen by turning DP++ TMDS output buffers on encoder->shutdown
- CCS FBs related fixes
- Fix recursive lock in GuC submission
- Revert guc_id from i915_request tracepoint
- Build fix around dmabuf
amdgpu:
- GPU reset fix
- Aldebaran fix
- Yellow Carp fixes
- DCN2.1 DMCUB fix
- IOMMU regression fix for Picasso
- DSC display fixes
- BPC display calculation fixes
- Other misc display fixes
- Don't allow partial copy from user for DC debugfs
- SRIOV fixes
- GFX9 CSB pin count fix
- Various IP version check fixes
- DP 2.0 fixes
- Limit DCN1 MPO fix to DCN1
amdkfd:
- SVM fixes
- Fix gfx version for renoir
- Reset fixes
udl:
- timeout fix
imx:
- circular locking fix
virtio:
- NULL ptr deref fix"
* tag 'drm-next-2021-11-12' of git://anongit.freedesktop.org/drm/drm: (126 commits)
drm/ttm: Double check mem_type of BO while eviction
drm/amdgpu: add missed support for UVD IP_VERSION(3, 0, 64)
drm/amdgpu: drop jpeg IP initialization in SRIOV case
drm/amd/display: reject both non-zero src_x and src_y only for DCN1x
drm/amd/display: Add callbacks for DMUB HPD IRQ notifications
drm/amd/display: Don't lock connection_mutex for DMUB HPD
drm/amd/display: Add comment where CONFIG_DRM_AMD_DC_DCN macro ends
drm/amdkfd: Fix retry fault drain race conditions
drm/amdkfd: lower the VAs base offset to 8KB
drm/amd/display: fix exit from amdgpu_dm_atomic_check() abruptly
drm/amd/amdgpu: fix the kfd pre_reset sequence in sriov
drm/amdgpu: fix uvd crash on Polaris12 during driver unloading
drm/i915/adlp/fb: Prevent the mapping of redundant trailing padding NULL pages
drm/i915/fb: Fix rounding error in subsampled plane size calculation
drm/i915/hdmi: Turn DP++ TMDS output buffers back on in encoder->shutdown()
drm/locking: fix __stack_depot_* name conflict
drm/virtio: Fix NULL dereference error in virtio_gpu_poll
drm/amdgpu: fix SI handling in amdgpu_device_asic_has_dc_support()
drm/amdgpu: Fix dangling kfd_bo pointer for shared BOs
drm/amd/amdkfd: Don't sent command to HWS on kfd reset
...
There's a small window of opportunity during which the adjust_lru()
function can be called with a GEM refcount of zero from the TTM
eviction code. This results in a kernel BUG().
Ensure that we don't attempt to modify the GEM shrinker lists unless
we have a GEM refcount.
Fixes: ebd4a8ec77 ("drm/i915/ttm: move shrinker management into adjust_lru")
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211110085527.1033475-1-thomas.hellstrom@linux.intel.com
In the next commit, we don't evict when refcount = 0, so we need to
call drain freed objects, because we want to pin new bo's in the same
place, causing a test failure.
Furthermore, since each subtest is separated, it's a lot better to use
i915_live_selftests, so each subtest starts with a clean slate, and a
clean address space.
v2(Reported-by: kernel test robot <lkp@intel.com>):
- Make hugepage_ctx static.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211028125855.3281674-9-matthew.auld@intel.com
If the initial fill blit or copy blit of an object fails, the old
content of the data might be exposed and read as soon as either CPU- or
GPU PTEs are set up to point at the pages.
Intercept the blit fence with an async callback that checks the
blit fence for errors and if there are errors performs an async cpu blit
instead. If there is a failure to allocate the async dma_fence_work,
allocate it on the stack and sync wait for the blit to complete.
Add selftests that simulate gpu blit failures and failure to allocate
the async dma_fence_work.
A previous version of this pach used dma_fence_work, now that's
opencoded which adds more code but might lower the latency
somewhat in the common non-error case.
v3:
- Style fixes (Matthew Auld)
v4:
- Use "#if IS_ENABLED()" instead of #ifdef (Matthew Auld)
v5:
- Fix an issue where we, if the dependency was already signaled, might
end up waiting for a memcpy fence that would never signal.
v6:
- Add a missing i915_ttm_memcpy_release() (Matthew Auld)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211104110718.688420-3-thomas.hellstrom@linux.intel.com
We are about to introduce failsafe- and asynchronous migration and
ttm moves.
This will add complexity and code to the TTM move code so it makes sense
to split it out to a separate file to make the i915 TTM code easer to
digest.
Split the i915 TTM move code out and since we will have to change the name
of the gpu_binds_iomem() and cpu_maps_iomem() functions anyway,
we alter the name of gpu_binds_iomem() to i915_ttm_gtt_binds_lmem() which
is more reflecting what it is used for.
With this we also add some more documentation. Otherwise there should be
no functional change.
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211104110718.688420-2-thomas.hellstrom@linux.intel.com
Here is the big set of char and misc and other tiny driver subsystem
updates for 5.16-rc1.
Loads of things in here, all of which have been in linux-next for a
while with no reported problems (except for one called out below.)
Included are:
- habanana labs driver updates, including dma_buf usage,
reviewed and acked by the dma_buf maintainers
- iio driver update (going through this tree not staging as they
really do not belong going through that tree anymore)
- counter driver updates
- hwmon driver updates that the counter drivers needed, acked by
the hwmon maintainer
- xillybus driver updates
- binder driver updates
- extcon driver updates
- dma_buf module namespaces added (will cause a build error in
arm64 for allmodconfig, but that change is on its way through
the drm tree)
- lkdtm driver updates
- pvpanic driver updates
- phy driver updates
- virt acrn and nitr_enclaves driver updates
- smaller char and misc driver updates
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYYPX2A8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ymUUgCbB4EKysgLuXYdjUalZDx+vvZO4k0AniS14O4k
F+2dVSZ5WX6wumUzCaA6
=bXQM
-----END PGP SIGNATURE-----
Merge tag 'char-misc-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver updates from Greg KH:
"Here is the big set of char and misc and other tiny driver subsystem
updates for 5.16-rc1.
Loads of things in here, all of which have been in linux-next for a
while with no reported problems (except for one called out below.)
Included are:
- habanana labs driver updates, including dma_buf usage, reviewed and
acked by the dma_buf maintainers
- iio driver update (going through this tree not staging as they
really do not belong going through that tree anymore)
- counter driver updates
- hwmon driver updates that the counter drivers needed, acked by the
hwmon maintainer
- xillybus driver updates
- binder driver updates
- extcon driver updates
- dma_buf module namespaces added (will cause a build error in arm64
for allmodconfig, but that change is on its way through the drm
tree)
- lkdtm driver updates
- pvpanic driver updates
- phy driver updates
- virt acrn and nitr_enclaves driver updates
- smaller char and misc driver updates"
* tag 'char-misc-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (386 commits)
comedi: dt9812: fix DMA buffers on stack
comedi: ni_usb6501: fix NULL-deref in command paths
arm64: errata: Enable TRBE workaround for write to out-of-range address
arm64: errata: Enable workaround for TRBE overwrite in FILL mode
coresight: trbe: Work around write to out of range
coresight: trbe: Make sure we have enough space
coresight: trbe: Add a helper to determine the minimum buffer size
coresight: trbe: Workaround TRBE errata overwrite in FILL mode
coresight: trbe: Add infrastructure for Errata handling
coresight: trbe: Allow driver to choose a different alignment
coresight: trbe: Decouple buffer base from the hardware base
coresight: trbe: Add a helper to pad a given buffer area
coresight: trbe: Add a helper to calculate the trace generated
coresight: trbe: Defer the probe on offline CPUs
coresight: trbe: Fix incorrect access of the sink specific data
coresight: etm4x: Add ETM PID for Kryo-5XX
coresight: trbe: Prohibit trace before disabling TRBE
coresight: trbe: End the AUX handle on truncation
coresight: trbe: Do not truncate buffer on IRQ
coresight: trbe: Fix handling of spurious interrupts
...
We were overzealous here; even though discrete is non-LLC, it should
still be always coherent.
v2(Thomas & Daniel)
- Be extra cautious and limit to DG1
- Add some more commentary
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211029122137.3484203-1-matthew.auld@intel.com
Should not be needed. Even with non-coherent display, we should be using
device local-memory there, and not system memory.
v2: also add a warning in i915_gem_clflush_object
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> #v1
Link: https://patchwork.freedesktop.org/patch/msgid/20211027161813.3094681-4-matthew.auld@intel.com
Move it next to its partner in crime; gpu_write_needs_clflush. For
better readability lets keep gpu vs cpu at least in the same file.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211027161813.3094681-3-matthew.auld@intel.com
We seem to have an unfortunate issue where we arrive from:
i915_gem_object_flush_if_display+0x86/0xd0 [i915]
intel_user_framebuffer_dirty+0x1a/0x50 [i915]
drm_mode_dirtyfb_ioctl+0xfb/0x1b0
which can be before the pages are populated(and pinned for display), and
so i915_gem_object_has_struct_page() might still return true, as per the
ttm backend. We could re-order the later get_pages() call here, but
since on discrete everything should already be coherent, with the
exception of the display engine, and even there display surfaces must be
allocated in device local-memory anyway, so there should in theory be no
conceivable reason to ever call i915_gem_clflush_object() on discrete.
References: https://gitlab.freedesktop.org/drm/intel/-/issues/4320
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211027161813.3094681-2-matthew.auld@intel.com
In theory if clflush_work_create() somehow fails here, and we don't yet
have mm.pages populated then we end up resetting cache_dirty, which is
likely wrong, since that will potentially skip the flush-on-acquire, if
it was needed.
It looks like intel_user_framebuffer_dirty() can arrive here before the
pages are populated.
v2(Thomas):
- Move setting cache_dirty out of the async portion, also add a
comment for why that should still be safe.
v3:
- Add Thomas' irc r-b
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211027161813.3094681-1-matthew.auld@intel.com
As we start to introduce asynchronous failsafe object migration,
where we update the object state and then submit asynchronous
commands we need to record what memory resources are actually used
by various part of the command stream. Initially for three purposes:
1) Error capture.
2) Asynchronous migration error recovery.
3) Asynchronous vma bind.
At the time where these happens, the object state may have been updated
to be several migrations ahead and object sg-tables discarded.
In order to make it possible to keep sg-tables with memory resource
information for these operations, introduce refcounted sg-tables that
aren't freed until the last user is done with them.
The alternative would be to reference information sitting on the
corresponding ttm_resources which typically have the same lifetime as
these refcountes sg_tables, but that leads to other awkward constructs:
Due to the design direction chosen for ttm resource managers that would
lead to diamond-style inheritance, the LMEM resources may sometimes be
prematurely freed, and finally the subclassed struct ttm_resource would
have to bleed into the asynchronous vma bind code.
v3:
- Address a number of style issues (Matthew Auld)
v4:
- Dont check for st->sgl being NULL in i915_ttm_tt__shmem_unpopulate(),
that should never happen. (Matthew Auld)
v5:
- Fix a Potential double-free (Matthew Auld)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211101122444.114607-1-thomas.hellstrom@linux.intel.com