Commit Graph

3762 Commits

Author SHA1 Message Date
Christoph Manszewski
111fb43a55
drm/xe: Fix vm_bind_ioctl double free bug
If the argument check during an array bind fails, the bind_ops are freed
twice as seen below. Fix this by setting bind_ops to NULL after freeing.

==================================================================
BUG: KASAN: double-free in xe_vm_bind_ioctl+0x1b2/0x21f0 [xe]
Free of addr ffff88813bb9b800 by task xe_vm/14198

CPU: 5 UID: 0 PID: 14198 Comm: xe_vm Not tainted 6.16.0-xe-eudebug-cmanszew+ #520 PREEMPT(full)
Hardware name: Intel Corporation Alder Lake Client Platform/AlderLake-P DDR5 RVP, BIOS ADLPFWI1.R00.2411.A02.2110081023 10/08/2021
Call Trace:
 <TASK>
 dump_stack_lvl+0x82/0xd0
 print_report+0xcb/0x610
 ? __virt_addr_valid+0x19a/0x300
 ? xe_vm_bind_ioctl+0x1b2/0x21f0 [xe]
 kasan_report_invalid_free+0xc8/0xf0
 ? xe_vm_bind_ioctl+0x1b2/0x21f0 [xe]
 ? xe_vm_bind_ioctl+0x1b2/0x21f0 [xe]
 check_slab_allocation+0x102/0x130
 kfree+0x10d/0x440
 ? should_fail_ex+0x57/0x2f0
 ? xe_vm_bind_ioctl+0x1b2/0x21f0 [xe]
 xe_vm_bind_ioctl+0x1b2/0x21f0 [xe]
 ? __pfx_xe_vm_bind_ioctl+0x10/0x10 [xe]
 ? __lock_acquire+0xab9/0x27f0
 ? lock_acquire+0x165/0x300
 ? drm_dev_enter+0x53/0xe0 [drm]
 ? find_held_lock+0x2b/0x80
 ? drm_dev_exit+0x30/0x50 [drm]
 ? drm_ioctl_kernel+0x128/0x1c0 [drm]
 drm_ioctl_kernel+0x128/0x1c0 [drm]
 ? __pfx_xe_vm_bind_ioctl+0x10/0x10 [xe]
 ? find_held_lock+0x2b/0x80
 ? __pfx_drm_ioctl_kernel+0x10/0x10 [drm]
 ? should_fail_ex+0x57/0x2f0
 ? __pfx_xe_vm_bind_ioctl+0x10/0x10 [xe]
 drm_ioctl+0x352/0x620 [drm]
 ? __pfx_drm_ioctl+0x10/0x10 [drm]
 ? __pfx_rpm_resume+0x10/0x10
 ? do_raw_spin_lock+0x11a/0x1b0
 ? find_held_lock+0x2b/0x80
 ? __pm_runtime_resume+0x61/0xc0
 ? rcu_is_watching+0x20/0x50
 ? trace_irq_enable.constprop.0+0xac/0xe0
 xe_drm_ioctl+0x91/0xc0 [xe]
 __x64_sys_ioctl+0xb2/0x100
 ? rcu_is_watching+0x20/0x50
 do_syscall_64+0x68/0x2e0
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7fa9acb24ded

Fixes: b43e864af0 ("drm/xe/uapi: Add DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR")
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Christoph Manszewski <christoph.manszewski@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250813101231.196632-2-christoph.manszewski@intel.com
(cherry picked from commit a01b704527c28a2fd43a17a85f8996b75ec8492a)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-08-21 15:06:58 -04:00
Piotr Piórkowski
8a30114073
drm/xe: Move ASID allocation and user PT BO tracking into xe_vm_create
Currently, ASID assignment for user VMs and page-table BO accounting for
client memory tracking are performed in xe_vm_create_ioctl.
To consolidate VM object initialization, move this logic to
xe_vm_create.

v2:
 - removed unnecessary duplicate BO tracking code
 - using the local variable xef to verify whether the VM is being created
   by userspace

Fixes: 658a1c8e0a ("drm/xe: Assign ioctl xe file handler to vm in xe_vm_create")
Suggested-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20250811104358.2064150-3-piotr.piorkowski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
(cherry picked from commit 30e0c3f43a414616e0b6ca76cf7f7b2cd387e1d4)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[Rodrigo: Added fixes tag]
2025-08-21 15:06:58 -04:00
Piotr Piórkowski
658a1c8e0a
drm/xe: Assign ioctl xe file handler to vm in xe_vm_create
In several code paths, such as xe_pt_create(), the vm->xef field is used
to determine whether a VM originates from userspace or the kernel.

Previously, this handler was only assigned in xe_vm_create_ioctl(),
after the VM was created by xe_vm_create(). However, xe_vm_create()
triggers page table creation, and that function assumes vm->xef should
be already set. This could lead to incorrect origin detection.

To fix this problem and ensure consistency in the initialization of
the VM object, let's move the assignment of this handler to
xe_vm_create.

v2:
 - take reference to the xe file object only when xef is not NULL
 - release the reference to the xe file object on the error path (Matthew)

Fixes: 7f387e6012 ("drm/xe: add XE_BO_FLAG_PINNED_LATE_RESTORE")
Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20250811104358.2064150-2-piotr.piorkowski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
(cherry picked from commit 9337166fa1d80f7bb7c7d3a8f901f21c348c0f2a)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-08-19 10:15:08 -04:00
Michał Winiarski
94eae6ee4c
drm/xe/pf: Set VF LMEM BAR size
LMEM is partitioned between multiple VFs and we expect that the more
VFs we have, the less LMEM is assigned to each VF.
This means that we can achieve full LMEM BAR access without the need to
attempt full VF LMEM BAR resize via pci_resize_resource().

Always try to set the largest possible BAR size that allows to fit the
number of enabled VFs and inform the user in case the resize attempt is
not successful.

Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/r/20250527120637.665506-7-michal.winiarski@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 32a4d1b98e6663101fd0abfaf151c48feea7abb1)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-08-14 10:30:53 -04:00
Karthik Poosa
55d49f0616
drm/xe/hwmon: Add SW clamp for power limits writes
Clamp writes to power limits powerX_crit/currX_crit, powerX_cap,
powerX_max, to the maximum supported by the pcode mailbox
when sysfs-provided values exceed this limit.
Although the pcode already performs clamping, values beyond the pcode
mailbox's supported range get truncated, leading to incorrect
critical power settings.
This patch ensures proper clamping to prevent such truncation.

v2:
 - Address below review comments. (Riana)
 - Split comments into multiple sentences.
 - Use local variables for readability.
 - Add a debug log.
 - Use u64 instead of unsigned long.

v3:
 - Change drm_dbg logs to drm_info. (Badal)

v4:
 - Rephrase the drm_info log. (Rodrigo, Riana)
 - Rename variable max_mbx_power_limit to max_supp_power_limit, as
   limit is same for platforms with and without mailbox power limit
   support.

Signed-off-by: Karthik Poosa <karthik.poosa@intel.com>
Fixes: 92d44a422d ("drm/xe/hwmon: Expose card reactive critical power")
Fixes: fb1b70607f ("drm/xe/hwmon: Expose power attributes")
Reviewed-by: Riana Tauro <riana.tauro@intel.com>
Link: https://lore.kernel.org/r/20250808185310.3466529-1-karthik.poosa@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit d301eb950da59f962bafe874cf5eb6d61a85b2c2)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-08-12 12:52:26 -04:00
Thomas Hellström
2dd7a47669
drm/xe: Defer buffer object shrinker write-backs and GPU waits
When the xe buffer-object shrinker allows GPU waits and write-back,
(typically from kswapd), perform multiple passes, skipping
subsequent passes if the shrinker number of scanned objects target
is reached.

1) Without GPU waits and write-back
2) Without write-back
3) With both GPU-waits and write-back

This is to avoid stalls and costly write- and readbacks unless they
are really necessary.

v2:
- Don't test for scan completion twice. (Stuart Summers)
- Update tags.

Reported-by: melvyn <melvyn2@dnsense.pub>
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5557
Cc: Summers Stuart <stuart.summers@intel.com>
Fixes: 00c8efc318 ("drm/xe: Add a shrinker for xe bos")
Cc: <stable@vger.kernel.org> # v6.15+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://lore.kernel.org/r/20250805074842.11359-1-thomas.hellstrom@linux.intel.com
(cherry picked from commit 80944d334182ce5eb27d00e2bf20a88bfc32dea1)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-08-12 12:52:26 -04:00
Matthew Auld
145832fbdd
drm/xe/migrate: prevent potential UAF
If we hit the error path, the previous fence (if there is one) has
already been put() prior to this, so doing a fence_wait could lead to
UAF. Tweak the flow to do to the put() until after we do the wait.

Fixes: 270172f64b ("drm/xe: Update xe_ttm_access_memory to use GPU for non-visible access")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Maciej Patelczyk <maciej.patelczyk@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://lore.kernel.org/r/20250731093807.207572-8-matthew.auld@intel.com
(cherry picked from commit 9b7ca35ed28fe5fad86e9d9c24ebd1271e4c9c3e)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-08-12 12:52:26 -04:00
Matthew Auld
4126cb327a
drm/xe/migrate: don't overflow max copy size
With non-page aligned copy, we need to use 4 byte aligned pitch, however
the size itself might still be close to our maximum of ~8M, and so the
dimensions of the copy can easily exceed the S16_MAX limit of the copy
command leading to the following assert:

xe 0000:03:00.0: [drm] Assertion `size / pitch <= ((s16)(((u16)~0U) >> 1))` failed!
platform: BATTLEMAGE subplatform: 1
graphics: Xe2_HPG 20.01 step A0
media: Xe2_HPM 13.01 step A1
tile: 0 VRAM 10.0 GiB
GT: 0 type 1

WARNING: CPU: 23 PID: 10605 at drivers/gpu/drm/xe/xe_migrate.c:673 emit_copy+0x4b5/0x4e0 [xe]

To fix this account for the pitch when calculating the number of current
bytes to copy.

Fixes: 270172f64b ("drm/xe: Update xe_ttm_access_memory to use GPU for non-visible access")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Maciej Patelczyk <maciej.patelczyk@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://lore.kernel.org/r/20250731093807.207572-7-matthew.auld@intel.com
(cherry picked from commit 8c2d61e0e916e077fda7e7b8e67f25ffe0f361fc)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-08-12 12:52:26 -04:00
Matthew Auld
9d7a1cbebb
drm/xe/migrate: prevent infinite recursion
If the buf + offset is not aligned to XE_CAHELINE_BYTES we fallback to
using a bounce buffer. However the bounce buffer here is allocated on
the stack, and the only alignment requirement here is that it's
naturally aligned to u8, and not XE_CACHELINE_BYTES. If the bounce
buffer is also misaligned we then recurse back into the function again,
however the new bounce buffer might also not be aligned, and might never
be until we eventually blow through the stack, as we keep recursing.

Instead of using the stack use kmalloc, which should respect the
power-of-two alignment request here. Fixes a kernel panic when
triggering this path through eudebug.

v2 (Stuart):
 - Add build bug check for power-of-two restriction
 - s/EINVAL/ENOMEM/

Fixes: 270172f64b ("drm/xe: Update xe_ttm_access_memory to use GPU for non-visible access")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Maciej Patelczyk <maciej.patelczyk@intel.com>
Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://lore.kernel.org/r/20250731093807.207572-6-matthew.auld@intel.com
(cherry picked from commit 38b34e928a08ba594c4bbf7118aa3aadacd62fff)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-08-12 12:52:26 -04:00
Linus Torvalds
ffe8ac927d drm fixes for 6.17-rc1
i915:
 - DP LPFS fixes
 
 xe:
 - SRIOV: PF fixes and removal of need of module param
 - Fix driver unbind around Devcoredump
 - Mark xe driver as BROKEN if kernel page size is not 4kB
 
 amdgpu:
 - GC 9.5.0 fixes
 - SMU fix
 - DCE 6 DC fixes
 - mmhub client ID fixes
 - VRR fix
 - Backlight fix
 - UserQ fix
 - Legacy reset fix
 - Misc fixes
 
 amdkfd:
 - CRIU fix
 - Debugfs fix
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmiVQYoACgkQDHTzWXnE
 hr4OVg/+OT59WwFrqodc79YRnGkbMllYfhhsC+hgczRWS1WV6GCp5yYSZVqWjs5I
 H44mM/rUEywXGf5m6lNx9GCCLF+ASgTAosVY/+cSDDpdB+Chl5CSYMbwmmlaS86a
 yzOxbNz7LC+gxrD9ZSmFxaWOop789ISjRhiafAWeKcJeVIo/tuW7rX2wJsN+LdNo
 ioUnNR4UHPKhdk7SQhm6ho3izL02e3H72fK/jcUj2FMehqJRLW6ilNTSntJ+bwp4
 WmtSHSHTfeFximemIiU1l11cilwpAkWqmDjR9ZBW/F6psBU3FX3eOP4MaKL1Io2X
 0gKXcnrz8XSLlZ46rKwHLj3IB3cNuGd2jO/rTcHDTQWGhrjPgn5fZnorOsfMRlTU
 ySaEeVsgCupQyiB1hOYFMECse+tqzV++05Bu/BL3NpYn6GMikghPkianAKlwpwxS
 lVDeOTmC9WX/jD9FeW8a69nsGZJoT3oSp9Ro2xjqlo1bwm1FVmUlxKgXSRIdHofg
 oZ8PGGDQGM5hf69Y/iLNFSPC+Z22n8cDrA0in92Gek965/h2HHkttQVWEMsaVc/x
 gkxb6rWv6ZMeLMrRY4ZQFd8JaOFfEzrc7aWemknEmF0HIdyJJUu2zWiUwm2MgpI/
 tshUMJdgoJaPoTinHFJeID8HrdzgFy9owHnKUne9nNNkPFjskV4=
 =rAxo
 -----END PGP SIGNATURE-----

Merge tag 'drm-next-2025-08-08' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
 "This is the fixes that built up in the merge window, mostly amdgpu and
  xe with one i915 display fix, seems like things are pretty good for
  rc1.

  i915:
   - DP LPFS fixes

  xe:
   - SRIOV: PF fixes and removal of need of module param
   - Fix driver unbind around Devcoredump
   - Mark xe driver as BROKEN if kernel page size is not 4kB

  amdgpu:
   - GC 9.5.0 fixes
   - SMU fix
   - DCE 6 DC fixes
   - mmhub client ID fixes
   - VRR fix
   - Backlight fix
   - UserQ fix
   - Legacy reset fix
   - Misc fixes

  amdkfd:
   - CRIU fix
   - Debugfs fix"

* tag 'drm-next-2025-08-08' of https://gitlab.freedesktop.org/drm/kernel: (28 commits)
  drm/amdgpu: add missing vram lost check for LEGACY RESET
  drm/amdgpu/discovery: fix fw based ip discovery
  drm/amdkfd: Destroy KFD debugfs after destroy KFD wq
  amdgpu/amdgpu_discovery: increase timeout limit for IFWI init
  drm/amdgpu: Update SDMA firmware version check for user queue support
  drm/amdgpu: Add NULL check for asic_funcs
  drm/amd/display: Revert "drm/amd/display: Fix AMDGPU_MAX_BL_LEVEL value"
  drm/amd/display: fix a Null pointer dereference vulnerability
  drm/amd/display: Add primary plane to commits for correct VRR handling
  drm/amdgpu: update mmhub 3.3 client id mappings
  drm/amdgpu: update mmhub 3.0.1 client id mappings
  drm/amdgpu: Retain job->vm in amdgpu_job_prepare_job
  drm/amd/display: Fix DCE 6.0 and 6.4 PLL programming.
  drm/amd/display: Don't overwrite dce60_clk_mgr
  drm/amdkfd: Fix checkpoint-restore on multi-xcc
  drm/amd: Restore cached manual clock settings during resume
  drm/amd: Restore cached power limit during resume
  drm/amdgpu: Update external revid for GC v9.5.0
  drm/amdgpu: Update supported modes for GC v9.5.0
  Mark xe driver as BROKEN if kernel page size is not 4kB
  ...
2025-08-08 06:48:14 +03:00
Simon Richter
022906afdf
Mark xe driver as BROKEN if kernel page size is not 4kB
This driver, for the time being, assumes that the kernel page size is 4kB,
so it fails on loong64 and aarch64 with 16kB pages, and ppc64el with 64kB
pages.

Signed-off-by: Simon Richter <Simon.Richter@hogyros.de>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Fixes: dd08ebf6c3 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: stable@vger.kernel.org # v6.8+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://lore.kernel.org/r/20250802024152.3021-1-Simon.Richter@hogyros.de
(cherry picked from commit 0521a868222ffe636bf202b6e9d29292c1e19c62)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-08-04 11:59:11 -04:00
Michal Wajdeczko
cb7a3f949a
drm/xe/pf: Make sure PF is ready to configure VFs
The PF driver might be resumed just to configure VFs, but since
it is doing some asynchronous GuC reconfigurations after fresh
reset, we should wait until all pending works are completed.

This is especially important in case of LMEM provisioning, since
we also need to update the LMTT and send invalidation requests
to all GuCs, which are expected to be already in the VGT mode.

Fixes: 68ae022278 ("drm/xe/pf: Force GuC virtualization mode")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Cc: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://lore.kernel.org/r/20250801142822.180530-3-michal.wajdeczko@intel.com
(cherry picked from commit c6c86441c465ea440dfb5039f1c26e629a6fd64c)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-08-04 11:59:06 -04:00
Michal Wajdeczko
c286ce6b01
drm/xe/pf: Disable PF restart worker on device removal
We can't let restart worker run once device is removed, since other
data that it might want to access could be already released.
Explicitly disable worker as part of device cleanup action.

Fixes: a4d1c5d0b9 ("drm/xe/pf: Move VFs reprovisioning to worker")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Cc: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://lore.kernel.org/r/20250801142822.180530-2-michal.wajdeczko@intel.com
(cherry picked from commit a424353937c24554bb242a6582ed8f018b4a411c)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-08-04 11:59:01 -04:00
Balasubramani Vivekanandan
465f1dba74
drm/xe/devcoredump: Defer devcoredump initialization during probe
Doing devcoredump initializing before GT though look harmless, it leads
to problem during driver unbind. Because of this order, GT/Engine
release functions will be called before xe devcoredump release function
(xe_driver_devcoredump_fini) leading to the following kernel crash[1]
because the devcoredump functions might still use GT/Engine
datastructures after those are freed.

The following crash is observed while running the IGT
xe_wedged@wedged-at-any-timeout. The test forces a wedged state by
submitting a workload which hangs. Then does a unbind/rebind of the
driver to recover from the wedged state.
The hanged workload leads to a devcoredump. The following crash is
noticed when the devcoredump capture races with the driver unbind.
During driver unbind, the release function hw_engine_fini() will be
called which assigns NULL to hwe->gt. But the same data structure is
accessed during the coredump capture in the function
xe_engine_snapshot_print by reading snapshot->hwe->gt.

With this patch, we make sure the devcoredump is stopped before
deinitializing the core driver functions.

[1]:
BUG: kernel NULL pointer dereference, address: 0000000000000000
Workqueue: events_unbound xe_devcoredump_deferred_snap_work [xe]
RIP: 0010:xe_engine_snapshot_print+0x47/0x420 [xe]
Call Trace:
 <TASK>
 ? drm_printf+0x64/0x90
 __xe_devcoredump_read+0x23f/0x2d0 [xe]
 ? __pfx___drm_printfn_coredump+0x10/0x10
 ? __pfx___drm_puts_coredump+0x10/0x10
 xe_devcoredump_deferred_snap_work+0x17a/0x190 [xe]
 process_one_work+0x22e/0x6f0
 worker_thread+0x1e8/0x3d0
 ? __pfx_worker_thread+0x10/0x10
 kthread+0x11f/0x250
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x47/0x70
 ? __pfx_kthread+0x10/0x10
 ret_from_fork_asm+0x1a/0x30

v2: Detailed commit description (Rodrigo)
v3: FIXME added (Rodrigo, Stuart)

Fixes: 4209d635a8 ("drm/xe: Remove devcoredump during driver release")
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://lore.kernel.org/r/20250731061300.14320-1-balasubramani.vivekanandan@intel.com
Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Link: https://lore.kernel.org/r/20250801052356.21885-1-balasubramani.vivekanandan@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit 1fdc4c381ff765479d76ccf3134717c430c871b8)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-08-04 11:58:56 -04:00
Michal Wajdeczko
df9bdd4381
drm/xe/pf: Enable SR-IOV PF mode by default
We already claim official support for SR-IOV PF/VF modes on PTL
and BMG platforms, but by default we start the Xe driver on those
platforms in non-virtualized mode (native) since we still have
max_vfs modparam set to disable creation of the VFs.

It's time to let the Xe driver support SR-IOV PF mode by default.
We were already testing this on our CI, which was relying on the
patch that was enabling it for CONFIG_DRM_XE_DEBUG used by our CI.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250722182618.30811-3-michal.wajdeczko@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit a2b461bd6f3b36bded0a74178dec0e58e4714d3d)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-08-04 11:56:18 -04:00
Linus Torvalds
e991acf1bc Significant patch series in this pull request:
- The 2 patch series "squashfs: Remove page->mapping references" from
   Matthew Wilcox gets us closer to being able to remove page->mapping.
 
 - The 5 patch series "relayfs: misc changes" from Jason Xing does some
   maintenance and minor feature addition work in relayfs.
 
 - The 5 patch series "kdump: crashkernel reservation from CMA" from Jiri
   Bohac switches us from static preallocation of the kdump crashkernel's
   working memory over to dynamic allocation.  So the difficulty of
   a-priori estimation of the second kernel's needs is removed and the
   first kernel obtains extra memory.
 
 - The 5 patch series "generalize panic_print's dump function to be used
   by other kernel parts" from Feng Tang implements some consolidation and
   rationalizatio of the various ways in which a faiing kernel splats
   information at the operator.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaI+82gAKCRDdBJ7gKXxA
 jj4JAP9xb+w9DrBY6sa+7KTPIb+aTqQ7Zw3o9O2m+riKQJv6jAEA6aEwRnDA0451
 fDT5IqVlCWGvnVikdZHSnvhdD7TGsQ0=
 =rT71
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2025-08-03-12-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:
 "Significant patch series in this pull request:

   - "squashfs: Remove page->mapping references" (Matthew Wilcox) gets
     us closer to being able to remove page->mapping

   - "relayfs: misc changes" (Jason Xing) does some maintenance and
     minor feature addition work in relayfs

   - "kdump: crashkernel reservation from CMA" (Jiri Bohac) switches
     us from static preallocation of the kdump crashkernel's working
     memory over to dynamic allocation. So the difficulty of a-priori
     estimation of the second kernel's needs is removed and the first
     kernel obtains extra memory

   - "generalize panic_print's dump function to be used by other
     kernel parts" (Feng Tang) implements some consolidation and
     rationalization of the various ways in which a failing kernel
     splats information at the operator

* tag 'mm-nonmm-stable-2025-08-03-12-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (80 commits)
  tools/getdelays: add backward compatibility for taskstats version
  kho: add test for kexec handover
  delaytop: enhance error logging and add PSI feature description
  samples: Kconfig: fix spelling mistake "instancess" -> "instances"
  fat: fix too many log in fat_chain_add()
  scripts/spelling.txt: add notifer||notifier to spelling.txt
  xen/xenbus: fix typo "notifer"
  net: mvneta: fix typo "notifer"
  drm/xe: fix typo "notifer"
  cxl: mce: fix typo "notifer"
  KVM: x86: fix typo "notifer"
  MAINTAINERS: add maintainers for delaytop
  ucount: use atomic_long_try_cmpxchg() in atomic_long_inc_below()
  ucount: fix atomic_long_inc_below() argument type
  kexec: enable CMA based contiguous allocation
  stackdepot: make max number of pools boot-time configurable
  lib/xxhash: remove unused functions
  init/Kconfig: restore CONFIG_BROKEN help text
  lib/raid6: update recov_rvv.c zero page usage
  docs: update docs after introducing delaytop
  ...
2025-08-03 16:23:09 -07:00
WangYuli
26197b0fd2 drm/xe: fix typo "notifer"
There is a spelling mistake of 'notifer' in the comment which
should be 'notifier'.

Link: https://lkml.kernel.org/r/94190C5F54A19F3E+20250722073431.21983-3-wangyuli@uniontech.com
Signed-off-by: WangYuli <wangyuli@uniontech.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-02 12:01:39 -07:00
Linus Torvalds
89748acdf2 drm fixes for 6.17-rc1
amdgpu:
 - DSC divide by 0 fix
 - clang fix
 - DC debugfs fix
 - Userq fixes
 - Avoid extra evict-restore with KFD
 - Backlight fix
 - Documentation fix
 - RAS fix
 - Add new kicker handling
 - DSC fix for DCN 3.1.4
 - PSR fix
 - Atomic fix
 - DC reset fixes
 - DCN 3.0.1 fix
 - MMHUB client mapping fix
 
 xe:
 - Fix BMG probe on unsupported mailbox command
 - Fix OA static checker warning about null gt
 - Fix a NULL vs IS_ERR() bug in xe_i2c_register_adapter
 - Fix missing unwind goto in GuC/HuC
 - Don't register I2C devices if VF
 - Clear whole GuC g2h_fence during initialization
 - Avoid call kfree for drmm_kzalloc
 - Fix pci_dev reference leak on configfs
 - SRIOV: Disable CSC support on VF
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmiL9c8ACgkQDHTzWXnE
 hr5V3RAAji19x0wr2RsNTZRLP5m9+8vAkevIcUv5Y3ir8dUTXONN00ynE7/J/GHt
 X2MI2A+KrS0jvWlNw1vKCRNY/r6kiPd+HpPFwpN7y98Hn9o8H3byKHCvCcV6z469
 XJlXRO7yLe71Bn+iVkgQRSHnsAJVvvtoV0eeJR39wW0pvscWlV/wnTVx9XExNODJ
 lQFehlcbPQB10VRQrRBcWuPtqTX9+sIefEYERKDSXgOKcngPXTwx4QDnbBkuGy5e
 mJ0Xz9SjB554IL2pWo+gOT8FYZLw7lypVAhdCJjR9o4WzqjlxLWK2h3vcLRvo6Pq
 PyzvH0MXIEwfTT6HExXZXYtzt3ovUJ6ZaIJ6Y5Y6z8PcNzeO0+qWyQBwuKSXsN1S
 OOgHYhDnbjz8bS7J9/FPr6yE8RAl391ZWQldG3qXNp8OoXVoQv+TRcX80A9HVwai
 y4f5Ob0bRSKsnJIddsldDZz3usJ7K3xhRjcw2F03mkQKNLcmAQ1YwE0ak573N7gG
 wUj85wa9xREfWiA0ZuhXlV4BVv8a6U8JdolYsJ+HIl0l7xsFKkwLT6adzVSovE2b
 u52fV8MTjJMV17tEF7pzu0Hv3Uar0lecXMKx4G1gYOzBhZCuzeq5+SPepIAc05Ad
 QUR2LwvH0cnd5/GFPkXT8+GsDnjIJKUjEEsJUxoQS966IVTzI4A=
 =AgPB
 -----END PGP SIGNATURE-----

Merge tag 'drm-next-2025-08-01' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
 "Just a bunch of amdgpu and xe fixes.

  amdgpu:
   - DSC divide by 0 fix
   - clang fix
   - DC debugfs fix
   - Userq fixes
   - Avoid extra evict-restore with KFD
   - Backlight fix
   - Documentation fix
   - RAS fix
   - Add new kicker handling
   - DSC fix for DCN 3.1.4
   - PSR fix
   - Atomic fix
   - DC reset fixes
   - DCN 3.0.1 fix
   - MMHUB client mapping fix

  xe:
   - Fix BMG probe on unsupported mailbox command
   - Fix OA static checker warning about null gt
   - Fix a NULL vs IS_ERR() bug in xe_i2c_register_adapter
   - Fix missing unwind goto in GuC/HuC
   - Don't register I2C devices if VF
   - Clear whole GuC g2h_fence during initialization
   - Avoid call kfree for drmm_kzalloc
   - Fix pci_dev reference leak on configfs
   - SRIOV: Disable CSC support on VF

* tag 'drm-next-2025-08-01' of https://gitlab.freedesktop.org/drm/kernel: (24 commits)
  drm/xe/vf: Disable CSC support on VF
  drm/amdgpu: update mmhub 4.1.0 client id mappings
  drm/amd/display: Allow DCN301 to clear update flags
  drm/amd/display: Pass up errors for reset GPU that fails to init HW
  drm/amd/display: Only finalize atomic_obj if it was initialized
  drm/amd/display: Avoid configuring PSR granularity if PSR-SU not supported
  drm/amd/display: Disable dsc_power_gate for dcn314 by default
  drm/amdgpu: add kicker fws loading for gfx12/smu14/psp14
  drm/amd/amdgpu: fix missing lock for cper.ring->rptr/wptr access
  drm/amd/display: Fix misuse of /** to /* in 'dce_i2c_hw.c'
  drm/amd/display: fix initial backlight brightness calculation
  drm/amdgpu: Avoid extra evict-restore process.
  drm/amdgpu: track whether a queue is a kernel queue in amdgpu_mqd_prop
  drm/amdgpu: check if hubbub is NULL in debugfs/amdgpu_dm_capabilities
  drm/amdgpu: Initialize data to NULL in imu_v12_0_program_rlc_ram()
  drm/amd/display: Fix divide by zero when calculating min ODM factor
  drm/xe/configfs: Fix pci_dev reference leak
  drm/xe/hw_engine_group: Avoid call kfree() for drmm_kzalloc()
  drm/xe/guc: Clear whole g2h_fence during initialization
  drm/xe/vf: Don't register I2C devices if VF
  ...
2025-07-31 21:47:36 -07:00
Linus Torvalds
260f6f4fda drm for 6.17-rc1
non-drm:
 rust:
 - make ETIMEDOUT available
 - add size constants up to SZ_2G
 - add DMA coherent allocation bindings
 mtd:
 - driver for Intel GPU non-volatile storage
 i2c
 - designware quirk for Intel xe
 
 core:
 - atomic helpers: tune enable/disable sequences
 - add task info to wedge API
 - refactor EDID quirks
 - connector: move HDR sink to drm_display_info
 - fourcc: half-float and 32-bit float formats
 - mode_config: pass format info to simplify
 
 dma-buf:
 - heaps: Give CMA heap a stable name
 
 ci:
 - add device tree validation and kunit
 
 displayport:
 - change AUX DPCD access probe address
 - add quirk for DPCD probe
 - add panel replay definitions
 - backlight control helpers
 
 fbdev:
 - make CONFIG_FIRMWARE_EDID available on all arches
 
 fence:
 - fix UAF issues
 
 format-helper:
 - improve tests
 
 gpusvm:
 - introduce devmem only flag for allocation
 - add timeslicing support to GPU SVM
 
 ttm:
 - improve eviction
 
 sched:
 - tracing improvements
 - kunit improvements
 - memory leak fixes
 - reset handling improvements
 
 color mgmt:
 - add hardware gamma LUT handling helpers
 
 bridge:
 - add destroy hook
 - switch to reference counted drm_bridge allocations
 - tc358767: convert to devm_drm_bridge_alloc
 - improve CEC handling
 
 panel:
 - switch to reference counter drm_panel allocations
 - fwnode panel lookup
 - Huiling hl055fhv028c support
 - Raspberry Pi 7" 720x1280 support
 - edp: KDC KD116N3730A05, N160JCE-ELL CMN, N116BCJ-EAK
 - simple: AUO P238HAN01
 - st7701: Winstar wf40eswaa6mnn0
 - visionox: rm69299-shift
 - Renesas R61307, Renesas R69328 support
 - DJN HX83112B
 
 hdmi:
 - add CEC handling
 - YUV420 output support
 
 xe:
 - WildCat Lake support
 - Enable PanthorLake by default
 - mark BMG as SRIOV capable
 - update firmware recommendations
 - Expose media OA units
 - aux-bux support for non-volatile memory
 - MTD intel-dg driver for non-volatile memory
 - Expose fan control and voltage regulator in sysfs
 - restructure migration for multi-device
 - Restore GuC submit UAF fix
 - make GEM shrinker drm managed
 - SRIOV VF Post-migration recovery of GGTT nodes
 - W/A additions/reworks
 - Prefetch support for svm ranges
 - Don't allocate managed BO for each policy change
 - HWMON fixes for BMG
 - Create LRC BO without VM
 - PCI ID updates
 - make SLPC debugfs files optional
 - rework eviction rejection of bound external BOs
 - consolidate PAT programming logic for pre/post Xe2
 - init changes for flicker-free boot
 - Enable GuC Dynamic Inhibit Context switch
 
 i915:
 - drm_panic support for i915/xe
 - initial flip queue off by default for LNL/PNL
 - Wildcat Lake Display support
 - Support for DSC fractional link bpp
 - Support for simultaneous Panel Replay and Adaptive sync
 - Support for PTL+ double buffer LUT
 - initial PIPEDMC event handling
 - drm_panel_follower support
 - DPLL interface renames
 - allocate struct intel_display dynamically
 - flip queue preperation
 - abstract DRAM detection better
 - avoid GuC scheduling stalls
 - remove DG1 force probe requirement
 - fix MEI interrupt handler on RT kernels
 - use backlight control helpers for eDP
 - more shared display code refactoring
 
 amdgpu:
 - add userq slot to INFO ioctl
 - SR-IOV hibernation support
 - Suspend improvements
 - Backlight improvements
 - Use scaling for non-native eDP modes
 - cleaner shader updates for GC 9.x
 - Remove fence slab
 - SDMA fw checks for userq support
 - RAS updates
 - DMCUB updates
 - DP tunneling fixes
 - Display idle D3 support
 - Per queue reset improvements
 - initial smartmux support
 
 amdkfd:
 - enable KFD on loongarch
 - mtype fix for ext coherent system memory
 
 radeon:
 - CS validation additional GL extensions
 - drop console lock during suspend/resume
 - bump driver version
 
 msm:
 - VM BIND support
 - CI: infrastructure updates
 - UBWC single source of truth
 - decouple GPU and KMS support
 - DP: rework I/O accessors
 - DPU: SM8750 support
 - DSI: SM8750 support
 - GPU: X1-45 support and speedbin support for X1-85
 - MDSS: SM8750 support
 
 nova:
 - register! macro improvements
 - DMA object abstraction
 - VBIOS parser + fwsec lookup
 - sysmem flush page support
 - falcon: generic falcon boot code and HAL
 - FWSEC-FRTS: fb setup and load/execute
 
 ivpu:
 - Add Wildcat Lake support
 - Add turbo flag
 
 ast:
 - improve hardware generations implementation
 
 imx:
 - IMX8qxq Display Controller support
 
 lima:
 - Rockchip RK3528 GPU support
 
 nouveau:
 - fence handling cleanup
 
 panfrost:
 - MT8370 support
 - bo labeling
 - 64-bit register access
 
 qaic:
 - add RAS support
 
 rockchip:
 - convert inno_hdmi to a bridge
 
 rz-du:
 - add RZ/V2H(P) support
 - MIPI-DSI DCS support
 
 sitronix:
 - ST7567 support
 
 sun4i:
 - add H616 support
 
 tidss:
 - add TI AM62L support
 - AM65x OLDI bridge support
 
 bochs:
 - drm panic support
 
 vkms:
 - YUV and R* format support
 - use faux device
 
 vmwgfx:
 - fence improvements
 
 hyperv:
 - move out of simple
 - add drm_panic support
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmiJM/0ACgkQDHTzWXnE
 hr6MpA/+JJKGdSdrE95QkaMcOZh/3e3areGXZ0V/RrrJXdB4/DoAfQSHhF0H7m7y
 MhBGVLGNMXq7KHrz28p1MjLHrE1mwmvJ6hZ4J076ed4u9naoCD0m6k5w5wiue+KL
 HyPR54ADxN0BYmgV0l/B0wj42KsHyTO4x4hdqPJu02V9Dtmx6FCh2ujkOF3p9nbK
 GMwWDttl4KEKljD0IvQ9YIYJ66crYGx/XmZi7JoWRrS104K/h1u8qZuXBp5jVKTy
 OZRAVyLdmJqdTOLH7l599MBBcEd/bNV37/LVwF4T5iFunEKOAiyN0QY0OR+IeRVh
 ZfOv2/gp4UNyIfyahQ7LKLgEilNPGHoPitvDJPvBZxW2UjwXVNvA1QfdK5DAlVRS
 D5NoFRjlFFCz8/c2hQwlKJ9o7eVgH3/pK0mwR7SPGQTuqzLFCrAfCuzUvg/gV++6
 JFqmGKMHeCoxO2o4GMrwjFttStP41usxtV/D+grcbPteNO9UyKJS4C38n4eamJXM
 a9Sy9APuAb6F0w5+yMItEF7TQifgmhIbm5AZHlxE1KoDQV6TdiIf1Gou5LeDGoL6
 OACbXHJPL52tUnfCRpbfI4tE/IVyYsfL01JnvZ5cZZWItXfcIz76ykJri+E0G60g
 yRl/zkimHKO4B0l/HSzal5xROXr+3VzeWehEiz/ot1VriP5OesA=
 =n9MO
 -----END PGP SIGNATURE-----

Merge tag 'drm-next-2025-07-30' of https://gitlab.freedesktop.org/drm/kernel

Pull drm updates from Dave Airlie:
 "Highlights:

   - Intel xe enable Panthor Lake, started adding WildCat Lake

   - amdgpu has a bunch of reset improvments along with the usual IP
     updates

   - msm got VM_BIND support which is important for vulkan sparse memory

   - more drm_panic users

   - gpusvm common code to handle a bunch of core SVM work outside
     drivers.

  Detail summary:

  Changes outside drm subdirectory:
   - 'shrink_shmem_memory()' for better shmem/hibernate interaction
   - Rust support infrastructure:
      - make ETIMEDOUT available
      - add size constants up to SZ_2G
      - add DMA coherent allocation bindings
   - mtd driver for Intel GPU non-volatile storage
   - i2c designware quirk for Intel xe

  core:
   - atomic helpers: tune enable/disable sequences
   - add task info to wedge API
   - refactor EDID quirks
   - connector: move HDR sink to drm_display_info
   - fourcc: half-float and 32-bit float formats
   - mode_config: pass format info to simplify

  dma-buf:
   - heaps: Give CMA heap a stable name

  ci:
   - add device tree validation and kunit

  displayport:
   - change AUX DPCD access probe address
   - add quirk for DPCD probe
   - add panel replay definitions
   - backlight control helpers

  fbdev:
   - make CONFIG_FIRMWARE_EDID available on all arches

  fence:
   - fix UAF issues

  format-helper:
   - improve tests

  gpusvm:
   - introduce devmem only flag for allocation
   - add timeslicing support to GPU SVM

  ttm:
   - improve eviction

  sched:
   - tracing improvements
   - kunit improvements
   - memory leak fixes
   - reset handling improvements

  color mgmt:
   - add hardware gamma LUT handling helpers

  bridge:
   - add destroy hook
   - switch to reference counted drm_bridge allocations
   - tc358767: convert to devm_drm_bridge_alloc
   - improve CEC handling

  panel:
   - switch to reference counter drm_panel allocations
   - fwnode panel lookup
   - Huiling hl055fhv028c support
   - Raspberry Pi 7" 720x1280 support
   - edp: KDC KD116N3730A05, N160JCE-ELL CMN, N116BCJ-EAK
   - simple: AUO P238HAN01
   - st7701: Winstar wf40eswaa6mnn0
   - visionox: rm69299-shift
   - Renesas R61307, Renesas R69328 support
   - DJN HX83112B

  hdmi:
   - add CEC handling
   - YUV420 output support

  xe:
   - WildCat Lake support
   - Enable PanthorLake by default
   - mark BMG as SRIOV capable
   - update firmware recommendations
   - Expose media OA units
   - aux-bux support for non-volatile memory
   - MTD intel-dg driver for non-volatile memory
   - Expose fan control and voltage regulator in sysfs
   - restructure migration for multi-device
   - Restore GuC submit UAF fix
   - make GEM shrinker drm managed
   - SRIOV VF Post-migration recovery of GGTT nodes
   - W/A additions/reworks
   - Prefetch support for svm ranges
   - Don't allocate managed BO for each policy change
   - HWMON fixes for BMG
   - Create LRC BO without VM
   - PCI ID updates
   - make SLPC debugfs files optional
   - rework eviction rejection of bound external BOs
   - consolidate PAT programming logic for pre/post Xe2
   - init changes for flicker-free boot
   - Enable GuC Dynamic Inhibit Context switch

  i915:
   - drm_panic support for i915/xe
   - initial flip queue off by default for LNL/PNL
   - Wildcat Lake Display support
   - Support for DSC fractional link bpp
   - Support for simultaneous Panel Replay and Adaptive sync
   - Support for PTL+ double buffer LUT
   - initial PIPEDMC event handling
   - drm_panel_follower support
   - DPLL interface renames
   - allocate struct intel_display dynamically
   - flip queue preperation
   - abstract DRAM detection better
   - avoid GuC scheduling stalls
   - remove DG1 force probe requirement
   - fix MEI interrupt handler on RT kernels
   - use backlight control helpers for eDP
   - more shared display code refactoring

  amdgpu:
   - add userq slot to INFO ioctl
   - SR-IOV hibernation support
   - Suspend improvements
   - Backlight improvements
   - Use scaling for non-native eDP modes
   - cleaner shader updates for GC 9.x
   - Remove fence slab
   - SDMA fw checks for userq support
   - RAS updates
   - DMCUB updates
   - DP tunneling fixes
   - Display idle D3 support
   - Per queue reset improvements
   - initial smartmux support

  amdkfd:
   - enable KFD on loongarch
   - mtype fix for ext coherent system memory

  radeon:
   - CS validation additional GL extensions
   - drop console lock during suspend/resume
   - bump driver version

  msm:
   - VM BIND support
   - CI: infrastructure updates
   - UBWC single source of truth
   - decouple GPU and KMS support
   - DP: rework I/O accessors
   - DPU: SM8750 support
   - DSI: SM8750 support
   - GPU: X1-45 support and speedbin support for X1-85
   - MDSS: SM8750 support

  nova:
   - register! macro improvements
   - DMA object abstraction
   - VBIOS parser + fwsec lookup
   - sysmem flush page support
   - falcon: generic falcon boot code and HAL
   - FWSEC-FRTS: fb setup and load/execute

  ivpu:
   - Add Wildcat Lake support
   - Add turbo flag

  ast:
   - improve hardware generations implementation

  imx:
   - IMX8qxq Display Controller support

  lima:
   - Rockchip RK3528 GPU support

  nouveau:
   - fence handling cleanup

  panfrost:
   - MT8370 support
   - bo labeling
   - 64-bit register access

  qaic:
   - add RAS support

  rockchip:
   - convert inno_hdmi to a bridge

  rz-du:
   - add RZ/V2H(P) support
   - MIPI-DSI DCS support

  sitronix:
   - ST7567 support

  sun4i:
   - add H616 support

  tidss:
   - add TI AM62L support
   - AM65x OLDI bridge support

  bochs:
   - drm panic support

  vkms:
   - YUV and R* format support
   - use faux device

  vmwgfx:
   - fence improvements

  hyperv:
   - move out of simple
   - add drm_panic support"

* tag 'drm-next-2025-07-30' of https://gitlab.freedesktop.org/drm/kernel: (1479 commits)
  drm/tidss: oldi: convert to devm_drm_bridge_alloc() API
  drm/tidss: encoder: convert to devm_drm_bridge_alloc()
  drm/amdgpu: move reset support type checks into the caller
  drm/amdgpu/sdma7: re-emit unprocessed state on ring reset
  drm/amdgpu/sdma6: re-emit unprocessed state on ring reset
  drm/amdgpu/sdma5.2: re-emit unprocessed state on ring reset
  drm/amdgpu/sdma5: re-emit unprocessed state on ring reset
  drm/amdgpu/gfx12: re-emit unprocessed state on ring reset
  drm/amdgpu/gfx11: re-emit unprocessed state on ring reset
  drm/amdgpu/gfx10: re-emit unprocessed state on ring reset
  drm/amdgpu/gfx9.4.3: re-emit unprocessed state on kcq reset
  drm/amdgpu/gfx9: re-emit unprocessed state on kcq reset
  drm/amdgpu: Add WARN_ON to the resource clear function
  drm/amd/pm: Use cached metrics data on SMUv13.0.6
  drm/amd/pm: Use cached data for min/max clocks
  gpu: nova-core: fix bounds check in PmuLookupTableEntry::new
  drm/amdgpu: Replace HQD terminology with slots naming
  drm/amdgpu: Add user queue instance count in HW IP info
  drm/amd/amdgpu: Add helper functions for isp buffers
  drm/amd/amdgpu: Initialize swnode for ISP MFD device
  ...
2025-07-30 19:26:49 -07:00
Lukasz Laguna
f62408efc8
drm/xe/vf: Disable CSC support on VF
CSC is not accessible by VF drivers, so disable its support flag on VF
to prevent further initialization attempts.

Fixes: e02cea83d3 ("drm/xe/gsc: add Battlemage support")
Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com>
Cc: Alexander Usyskin <alexander.usyskin@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://lore.kernel.org/r/20250729123437.5933-1-lukasz.laguna@intel.com
(cherry picked from commit 552dbba1caaf0cb40ce961806d757615e26ec668)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-07-30 15:09:27 -04:00
Linus Torvalds
9669b2499e platform-drivers-x86 for v6.17-1
Highlights
 
  - alienware: Add more precise labels to fans
 
  - amd/hsmp: Improve misleading probe errors (make the legacy driver
              aware when HSMP is supported through the ACPI driver)
 
  - amd/pmc: Add Lenovo Yoga 6 13ALCL6 to pmc quirk list
 
  - drm/xe: Correct (D)VSEC information to support PMT crashlog feature
 
  - fujitsu: Clamp charge threshold instead of returning an error
 
  - ideapad: Expore change types
 
  - intel/pmt:
 
    - Add PMT Discovery driver
 
    - Add API to retrieve telemetry regions by feature
 
    - Fix crashlog NULL access
 
    - Support Battlemage GPU (BMG) crashlog
 
  - intel/vsec:
 
    - Add Discovery feature
 
    - Add feature dependency support using device links
 
  - lenovo:
 
    - Move lenovo drivers under drivers/platform/x86/lenovo/
 
    - Add WMI drivers for Lenovo Gaming series
 
    - Improve DMI handling
 
  - oxpec:
 
    - Add support for OneXPlayer X1 Mini Pro (Strix Point variant)
 
    - Fix EC registers for G1 AMD
 
  - samsung-laptop: Expose change types
 
  - wmi: Fix WMI device naming issue (same GUID corner cases)
 
  - x86-android-tables: Add ovc-capacity-table to generic battery nodes
 
  - Miscellaneous cleanups / refactoring / improvements
 
 The following is an automated shortlog grouped by driver:
 
 Add Lenovo Capability Data 01 WMI Driver:
  - Add Lenovo Capability Data 01 WMI Driver
 
 Add Lenovo Gamezone WMI Driver:
  - Add Lenovo Gamezone WMI Driver
 
 Add Lenovo Other Mode WMI Driver:
  - Add Lenovo Other Mode WMI Driver
 
 Add lenovo-wmi-* driver Documentation:
  - Add lenovo-wmi-* driver Documentation
 
 Add Lenovo WMI Events Driver:
  - Add Lenovo WMI Events Driver
 
 Add lenovo-wmi-helpers:
  - Add lenovo-wmi-helpers
 
 alienware-wmi-wmax:
  -  Add appropriate labels to fans
 
 amd/hsmp:
  -  Enhance the print messages to prevent confusion
  -  Use IS_ENABLED() instead of IS_REACHABLE()
 
 amd: pmc:
  -  Add Lenovo Yoga 6 13ALC6 to pmc quirk list
 
 arm64: lenovo-yoga-c630:
  -  use the auxiliary device creation helper
 
 dell_rbu:
  -  Remove unused struct
 
 dell-uart-backlight:
  -  Use blacklight power constant
 
 docs:
  -  Add ABI documentation for intel_pmt feature directories
 
 Documentation: ABI:
  -  Update WMI device paths in ABI docs
 
 drm/xe:
  -  Correct BMG VSEC header sizing
  -  Correct the rev value for the DVSEC entries
 
 fujitsu:
  -  clamp charge_control_end_threshold values to 50
  -  use unsigned int for kstrtounit
 
 ideapad:
  -  Expose charge_types
 
 intel/pmt:
  -  Add PMT Discovery driver
  -  add register access helpers
  -  correct types
  -  decouple sysfs and namespace
 
 intel/pmt/discovery:
  -  fix format string warning
  -  Fix size_t specifiers for 32-bit
  -  Get telemetry attributes
 
 intel/pmt:
  -  fix a crashlog NULL pointer access
  -  fix build dependency for kunit test
  -  KUNIT test for PMT Enhanced Discovery API
  -  mutex clean up
  -  refactor base parameter
  -  re-order trigger logic
  -  support BMG crashlog
 
 intel/pmt/telemetry:
  -  Add API to retrieve telemetry regions by feature
 
 intel/pmt:
  -  use a version struct
  -  use guard(mutex)
  -  white space cleanup
 
 intel_telemetry:
  -  Remove unused telemetry_*_events()
  -  Remove unused telemetry_[gs]et_sampling_period()
  -  Remove unused telemetry_raw_read_events()
 
 intel/tpmi:
  -  Get OOBMSM CPU mapping from TPMI
  -  Relocate platform info to intel_vsec.h
 
 intel/vsec:
  -  Add device links to enforce dependencies
  -  Add new Discovery feature
  -  Add private data for per-device data
  -  Create wrapper to walk PCI config space
  -  Set OOBMSM to CPU mapping
  -  Skip absent features during initialization
  -  Skip driverless features
 
 lenovo:
  -  gamezone needs "other mode"
 
 lenovo-yoga-tab2-pro-1380-fastcharger:
  -  Use devm_pinctrl_register_mappings()
 
 MAINTAINERS:
  -  Add link to documentation of Intel PMT ABI
 
 Move Lenovo files into lenovo subdir:
  - Move Lenovo files into lenovo subdir
 
 oxpec:
  -  Add support for OneXPlayer X1 Mini Pro (Strix Point)
  -  Fix turbo register for G1 AMD
 
 samsung-laptop:
  -  Expose charge_types
 
 silicom:
  -  remove unnecessary GPIO line direction check
 
 thinklmi:
  -  improved DMI handling
 
 thinkpad_acpi:
  -  Handle KCOV __init vs inline mismatches
 
 wmi:
  -  Fix WMI device naming issue
 
 x86-android-tablets:
  -  Add generic_lipo_4v2_battery info
  -  Add ovc-capacity-table info
 
 Merges:
  -  Merge branch 'fixes' into 'for-next'
  -  Merge branch 'fixes' into for-next
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSCSUwRdwTNL2MhaBlZrE9hU+XOMQUCaIdbygAKCRBZrE9hU+XO
 MbqTAQCqqczU2YXRnq7TIvw/yl40+scIKMXobjX0EEpmgqhlHwEAkWwjQG0ytS2j
 hzES5gog1xT6A4TIjVr0Up5MUj3crwU=
 =+TRi
 -----END PGP SIGNATURE-----

Merge tag 'platform-drivers-x86-v6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform drivers from Ilpo Järvinen:

 - alienware: Add more precise labels to fans

 - amd/hsmp: Improve misleading probe errors (make the legacy driver
   aware when HSMP is supported through the ACPI driver)

 - amd/pmc: Add Lenovo Yoga 6 13ALCL6 to pmc quirk list

 - drm/xe: Correct (D)VSEC information to support PMT crashlog feature

 - fujitsu: Clamp charge threshold instead of returning an error

 - ideapad: Expore change types

 - intel/pmt:
     - Add PMT Discovery driver
     - Add API to retrieve telemetry regions by feature
     - Fix crashlog NULL access
     - Support Battlemage GPU (BMG) crashlog

 - intel/vsec:
     - Add Discovery feature
     - Add feature dependency support using device links

 - lenovo:
     - Move lenovo drivers under drivers/platform/x86/lenovo/
     - Add WMI drivers for Lenovo Gaming series
     - Improve DMI handling

 - oxpec:
     - Add support for OneXPlayer X1 Mini Pro (Strix Point variant)
     - Fix EC registers for G1 AMD

 - samsung-laptop: Expose change types

 - wmi: Fix WMI device naming issue (same GUID corner cases)

 - x86-android-tables: Add ovc-capacity-table to generic battery nodes

 - Miscellaneous cleanups / refactoring / improvements

* tag 'platform-drivers-x86-v6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (63 commits)
  platform/x86: oxpec: Add support for OneXPlayer X1 Mini Pro (Strix Point)
  platform/x86: oxpec: Fix turbo register for G1 AMD
  platform/x86/intel/pmt: support BMG crashlog
  platform/x86/intel/pmt: use a version struct
  platform/x86/intel/pmt: refactor base parameter
  platform/x86/intel/pmt: add register access helpers
  platform/x86/intel/pmt: decouple sysfs and namespace
  platform/x86/intel/pmt: correct types
  platform/x86/intel/pmt: re-order trigger logic
  platform/x86/intel/pmt: use guard(mutex)
  platform/x86/intel/pmt: mutex clean up
  platform/x86/intel/pmt: white space cleanup
  drm/xe: Correct BMG VSEC header sizing
  drm/xe: Correct the rev value for the DVSEC entries
  platform/x86/intel/pmt: fix a crashlog NULL pointer access
  platform/x86: samsung-laptop: Expose charge_types
  platform/x86/amd: pmc: Add Lenovo Yoga 6 13ALC6 to pmc quirk list
  platform/x86: dell-uart-backlight: Use blacklight power constant
  platform/x86/intel/pmt: fix build dependency for kunit test
  platform/x86: lenovo: gamezone needs "other mode"
  ...
2025-07-28 23:21:28 -07:00
Michal Wajdeczko
942ac8da63
drm/xe/configfs: Fix pci_dev reference leak
We are using pci_get_domain_bus_and_slot() function to verify if
the given config directory name matches any existing PCI device,
but we missed to call matching pci_dev_put() to release reference.

While around, also change error code in case of no device match,
to make it more specific than generic formatting error.

Fixes: 16280ded45 ("drm/xe: Add configfs to enable survivability mode")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://lore.kernel.org/r/20250722141059.30707-2-michal.wajdeczko@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 0bdd05c2a82bbf2419415d012fd4f5faeca7f1af)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-07-28 10:22:48 -04:00
Shuicheng Lin
4846856c3a
drm/xe/hw_engine_group: Avoid call kfree() for drmm_kzalloc()
Memory allocated with drmm_kzalloc() should not be freed using
kfree(), as it is managed by the DRM subsystem. The memory will
be automatically freed when the associated drm_device is released.
These 3 group pointers are allocated using drmm_kzalloc() in
hw_engine_group_alloc(), so they don't require manual deallocation.

Fixes: 6797906074 ("drm/xe/hw_engine_group: Fix potential leak")
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://lore.kernel.org/r/20250724193854.1124510-2-shuicheng.lin@intel.com
(cherry picked from commit f98de826b418885a21ece67f0f5b921ae759b7bf)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-07-28 10:22:43 -04:00
Michal Wajdeczko
a2e1407eb8
drm/xe/guc: Clear whole g2h_fence during initialization
The struct g2h_fence must be explicitly initializated using the
g2h_fence_init() function to avoid trash values in its members,
but we missed to update this helper function with the new member.

To fix that and avoid any future mistakes, memset the whole struct
first, then update remaining non-zero members.

Fixes: 94de94d24e ("drm/xe/guc: Cancel ongoing H2G requests when stopping CT")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Lukasz Laguna <lukasz.laguna@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250723175639.206875-1-michal.wajdeczko@intel.com
(cherry picked from commit 159afd92bae8153bdd8d8b34aea0d463fe19c978)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-07-28 10:22:38 -04:00
Lukasz Laguna
cccb918e02
drm/xe/vf: Don't register I2C devices if VF
VF drivers can't access I2C devices, so skip their registration when
running as VF.

Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com>
Fixes: f0e53aadd7 ("drm/xe: Support for I2C attached MCUs")
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250717155420.25298-1-lukasz.laguna@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit 9a220e065914b67b55d3d0ab91c3e215742fdd73)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-07-28 10:22:33 -04:00
Zhanjun Dong
dc94168eaa
drm/xe/uc: Fix missing unwind goto
Fix missing unwind goto on error handling.

Fixes: b2c4ac219f ("drm/xe/uc: Disable GuC communication on hardware initialization error")
Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://lore.kernel.org/r/20250721214520.954014-1-zhanjun.dong@intel.com
(cherry picked from commit 176f44a5ec0b074aaf44852db77d0c183c36696d)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-07-28 10:22:28 -04:00
Dan Carpenter
2bd986021c
drm/xe: Fix a NULL vs IS_ERR() bug in xe_i2c_register_adapter()
The fwnode_create_software_node() function returns error pointers.  It
never returns NULL.  Update the checks to match.

Fixes: f0e53aadd7 ("drm/xe: Support for I2C attached MCUs")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/65825d00-81ab-4665-af51-4fff6786a250@sabinyo.mountain
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit 2f264d58cc805a3cefc6b98097f90fbc388136ef)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-07-28 10:22:22 -04:00
Ashutosh Dixit
6aaceed7fe
drm/xe/oa: Fix static checker warning about null gt
There is a static checker warning that gt returned by xe_device_get_gt can
be NULL and that is being dereferenced. Use xe_root_mmio_gt instead, which
is equivalent and cannot return a NULL gt 0.

Fixes: 10d42ef34b ("drm/xe/oa: Assign hwe for OAM_SAG")
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://lore.kernel.org/r/20250715181422.2807624-1-ashutosh.dixit@intel.com
(cherry picked from commit 308dc9b27874d0e8a0258869b9e681b0fdd2e579)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-07-28 10:22:17 -04:00
Raag Jadav
d9e9aa3e97
drm/xe: Don't fail probe on unsupported mailbox command
If the device is running older pcode firmware, it is possible that newer
mailbox commands are not supported by it. The sysfs attributes aren't
useful in that case, but we shouldn't fail driver probe because of it.
As of now, it is unknown if we can distinguish unsupported commands before
attempting them. But until we figure out a way to do that, fix the
regressions.

v2: Add debug message (Lucas)

Fixes: cdc36b66cd ("drm/xe: Expose fan control and voltage regulator version")
Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Tested-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://lore.kernel.org/r/20250714215503.2897748-1-raag.jadav@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit ed5461daa150b037e36b8202381da1ef85d6b16b)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-07-28 10:22:12 -04:00
Lucas De Marchi
99e91521ce drm/xe: Fix build without debugfs
When CONFIG_DEBUG_FS is off, drivers/gpu/drm/xe/xe_gt_debugfs.o
is not built and build fails on some setups with:

	ld: drivers/gpu/drm/xe/xe_gt.o: in function `xe_fault_inject_gt_reset':
	drivers/gpu/drm/xe/xe_gt.h:27:(.text+0x1659): undefined reference to `gt_reset_failure'
	ld: drivers/gpu/drm/xe/xe_gt.h:27:(.text+0x1c16): undefined reference to `gt_reset_failure'
	collect2: error: ld returned 1 exit status

Do not use the gt_reset_failure attribute if debugfs is not enabled.

Fixes: 8f3013e0b2 ("drm/xe: Introduce fault injection for gt reset")
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://lore.kernel.org/r/20250722-xe-fix-build-fault-v1-1-157384d50987@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 4d3bbe9dd28c0a4ca119e4b8823c5f5e9cb3ff90)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2025-07-24 21:06:52 +02:00
Michael J. Ruhl
5b27388171
drm/xe: Correct BMG VSEC header sizing
The intel_vsec_header information for the crashlog feature is
incorrect.

Update the VSEC header with correct sizing and count.

Since the crashlog entries are "merged" (num_entries = 2), the
separate capabilities entries must be merged as well.

Fixes: 0c45e76fcc ("drm/xe/vsec: Support BMG devices")
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: David E. Box <david.e.box@linux.intel.com>
Link: https://lore.kernel.org/r/20250713172943.7335-4-michael.j.ruhl@intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-07-22 17:38:49 +03:00
Michael J. Ruhl
0ba9e9cf76
drm/xe: Correct the rev value for the DVSEC entries
By definition, the Designated Vendor Specific Extended Capability
(DVSEC) revision should be 1.

Add the rev value to be correct.

Fixes: 0c45e76fcc ("drm/xe/vsec: Support BMG devices")
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: David E. Box <david.e.box@linux.intel.com>
Link: https://lore.kernel.org/r/20250713172943.7335-3-michael.j.ruhl@intel.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-07-22 17:38:47 +03:00
Dave Airlie
be3cd668ff drm-misc-next for 6.17:
UAPI Changes:
 
 Cross-subsystem Changes:
 
 Core Changes:
 
 - mode_config: Change fb_create prototype to pass the drm_format_info
   and avoid redundant lookups in drivers
 - sched: kunit improvements, memory leak fixes, reset handling
   improvements
 - tests: kunit EDID update
 
 Driver Changes:
 
 - amdgpu: Hibernation fixes, structure lifetime fixes
 - nouveau: sched improvements
 - sitronix: Add Sitronix ST7567 Support
 
 - bridge:
   - Make connector available to bridge detect hook
 
 - panel:
   - More refcounting changes
   - New panels: BOE NE14QDM
 -----BEGIN PGP SIGNATURE-----
 
 iJUEABMJAB0WIQTkHFbLp4ejekA/qfgnX84Zoj2+dgUCaHisvgAKCRAnX84Zoj2+
 dihcAX49H551lQ42amN11jN4pR2tpKLjRMCSsNnXzkokJYx8adEaGTcZq+0Oi9rZ
 muGQKJABf2SSb/bem7qEu0JVL3/Pz8DOLbKIE4ltPMlHfYF5NzJhy3AKIMIjMLH7
 XqSDXOMgjA==
 =CfQG
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-next-2025-07-17' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next

drm-misc-next for 6.17:

UAPI Changes:

Cross-subsystem Changes:

Core Changes:

- mode_config: Change fb_create prototype to pass the drm_format_info
  and avoid redundant lookups in drivers
- sched: kunit improvements, memory leak fixes, reset handling
  improvements
- tests: kunit EDID update

Driver Changes:

- amdgpu: Hibernation fixes, structure lifetime fixes
- nouveau: sched improvements
- sitronix: Add Sitronix ST7567 Support

- bridge:
  - Make connector available to bridge detect hook

- panel:
  - More refcounting changes
  - New panels: BOE NE14QDM

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maxime Ripard <mripard@redhat.com>
Link: https://lore.kernel.org/r/20250717-efficient-kudu-of-fantasy-ff95e0@houat
2025-07-21 09:16:51 +10:00
Dave Airlie
af42cf30ea Driver Changes:
- Create and use XE_DEVICE_WA infrastructure (Atwood)
  - SRIOV: Mark BMG as SR-IOV capable (Michal)
  - Dont skip TLB invalidations on VF (Tejas)
  - Fix migration copy direction in access_memory (Auld)
  - General code clean-up (Lucas, Brost, Dr. David, Xin)
  - More missing XeLP workarounds (Tvrtko)
  - SRIOV: Relax VF/PF version negotiation (Michal)
  - SRIOV: LMTT invalidation (Michal)
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEbSBwaO7dZQkcLOKj+mJfZA7rE8oFAmh2my8ACgkQ+mJfZA7r
 E8qFgQf/WY/Chfnvp1XF9QtrmYaFnFPet8PZEYBYBKo/k1cWQSXgHP7w+7AujEKh
 LM6x7t277h0xQ0AYiOQgsMiFRopf/G9/VJB+438+L0s0GjIFSdBpmQ31oWz9vR0C
 RZJWvZvopU4isoL9yjgXrkRBH+3FBGyZnGISIIi9t/omWNQhdzSuWhj8+Bwwc4bV
 nZsevszkCAfw+kr0R3LF5YM31+NFQiYMaZvFIKCodwye6nW6JDOXC20/jzR7laTm
 yoKqFKCR4gDKQXVEG4yso9DC+NdcUHXSYL6vzAut04rbU0rE9wcDXx/pl05El6Fl
 wmkwYrlIZdkD5BIEsRw3i+XRdZubYQ==
 =VElm
 -----END PGP SIGNATURE-----

Merge tag 'drm-xe-next-2025-07-15' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-next

Driver Changes:
 - Create and use XE_DEVICE_WA infrastructure (Atwood)
 - SRIOV: Mark BMG as SR-IOV capable (Michal)
 - Dont skip TLB invalidations on VF (Tejas)
 - Fix migration copy direction in access_memory (Auld)
 - General code clean-up (Lucas, Brost, Dr. David, Xin)
 - More missing XeLP workarounds (Tvrtko)
 - SRIOV: Relax VF/PF version negotiation (Michal)
 - SRIOV: LMTT invalidation (Michal)

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/aHacDvF9IaVHI61C@intel.com
2025-07-18 19:48:20 +10:00
Michal Wajdeczko
5c244eeca5 drm/xe/pf: Resend PF provisioning after GT reset
If we reload the GuC due to suspend/resume or GT reset then we
have to resend not only any VFs provisioning data, but also PF
configuration, like scheduling parameters (EQ, PT), as otherwise
GuC will continue to use default values.

Fixes: 411220808c ("drm/xe/pf: Restart VFs provisioning after GT reset")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20250711193316.1920-3-michal.wajdeczko@intel.com
(cherry picked from commit 1c38dd6afa)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-17 09:51:51 -03:00
Michal Wajdeczko
81dccec448 drm/xe/pf: Prepare to stop SR-IOV support prior GT reset
As part of the resume or GT reset, the PF driver schedules work
which is then used to complete restarting of the SR-IOV support,
including resending to the GuC configurations of provisioned VFs.

However, in case of short delay between those two actions, which
could be seen by triggering a GT reset on the suspened device:

 $ echo 1 > /sys/kernel/debug/dri/0000:00:02.0/gt0/force_reset

this PF worker might be still busy, which lead to errors due to
just stopped or disabled GuC CTB communication:

 [ ] xe 0000:00:02.0: [drm:xe_gt_resume [xe]] GT0: resumed
 [ ] xe 0000:00:02.0: [drm] GT0: trying reset from force_reset_show [xe]
 [ ] xe 0000:00:02.0: [drm] GT0: reset queued
 [ ] xe 0000:00:02.0: [drm] GT0: reset started
 [ ] xe 0000:00:02.0: [drm:guc_ct_change_state [xe]] GT0: GuC CT communication channel stopped
 [ ] xe 0000:00:02.0: [drm:guc_ct_send_recv [xe]] GT0: H2G request 0x5503 canceled!
 [ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push VF1 12 config KLVs (-ECANCELED)
 [ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push VF1 configuration (-ECANCELED)
 [ ] xe 0000:00:02.0: [drm:guc_ct_change_state [xe]] GT0: GuC CT communication channel disabled
 [ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push VF2 12 config KLVs (-ENODEV)
 [ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push VF2 configuration (-ENODEV)
 [ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push 2 of 2 VFs configurations
 [ ] xe 0000:00:02.0: [drm:pf_worker_restart_func [xe]] GT0: PF: restart completed

While this VFs reprovisioning will be successful during next spin
of the worker, to avoid those errors, make sure to cancel restart
worker if we are about to trigger next reset.

Fixes: 411220808c ("drm/xe/pf: Restart VFs provisioning after GT reset")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20250711193316.1920-2-michal.wajdeczko@intel.com
(cherry picked from commit 9f50b729dd)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-17 09:51:51 -03:00
Lucas De Marchi
057a7d66f9 drm/xe/migrate: Fix alignment check
The check would fail if the address is unaligned, but not when
accounting the offset. Instead of `buf | offset` it should have
been `buf + offset`. To make it more readable and also drop the
uintptr_t, just use the IS_ALIGNED() macro.

Fixes: 270172f64b ("drm/xe: Update xe_ttm_access_memory to use GPU for non-visible access")
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20250710-migrate-aligned-v1-1-44003ef3c078@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 81e139db69)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-17 09:51:51 -03:00
Matthew Brost
3155ac8925 drm/xe: Move page fault init after topology init
We need the topology to determine GT page fault queue size, move page
fault init after topology init.

Cc: stable@vger.kernel.org
Fixes: 3338e4f90c ("drm/xe: Use topology to determine page fault queue size")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://lore.kernel.org/r/20250710191208.1040215-1-matthew.brost@intel.com
(cherry picked from commit beb72acb5b)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-17 09:51:27 -03:00
Balasubramani Vivekanandan
2a58b21ade drm/xe/mocs: Initialize MOCS index early
MOCS uc_index is used even before it is initialized in the following
callstack
    guc_prepare_xfer()
    __xe_guc_upload()
    xe_guc_min_load_for_hwconfig()
    xe_uc_init_hwconfig()
    xe_gt_init_hwconfig()

Do MOCS index initialization earlier in the device probe.

Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Reviewed-by: Ravi Kumar Vodapalli <ravi.kumar.vodapalli@intel.com>
Link: https://lore.kernel.org/r/20250520142445.2792824-1-balasubramani.vivekanandan@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
(cherry picked from commit 241cc827c0)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-17 09:51:12 -03:00
Matthew Auld
1381c231c1 drm/xe/migrate: fix copy direction in access_memory
After we do the modification on the host side, ensure we write the
result back to VRAM and not the other way around, otherwise the
modification will be lost if treated like a read.

Fixes: 270172f64b ("drm/xe: Update xe_ttm_access_memory to use GPU for non-visible access")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250710134128.800756-2-matthew.auld@intel.com
(cherry picked from commit c12fe703ca)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-17 09:45:19 -03:00
Tejas Upadhyay
fd25fa90ed drm/xe: Dont skip TLB invalidations on VF
Skipping TLB invalidations on VF causing unrecoverable
faults. Probable reason for skipping TLB invalidations
on SRIOV could be lack of support for instruction
MI_FLUSH_DW_STORE_INDEX. Add back TLB flush with some
additional handling.

Helps in resolving,
[  704.913454] xe 0000:00:02.1: [drm:pf_queue_work_func [xe]]
                ASID: 0
                VFID: 0
                PDATA: 0x0d92
                Faulted Address: 0x0000000002fa0000
                FaultType: 0
                AccessType: 1
                FaultLevel: 0
                EngineClass: 3 bcs
                EngineInstance: 8
[  704.913551] xe 0000:00:02.1: [drm:pf_queue_work_func [xe]] Fault response: Unsuccessful -22

V2:
 - Use Xmas tree (MichalW)

Suggested-by: Matthew Brost <matthew.brost@intel.com>
Fixes: 97515d0b3e ("drm/xe/vf: Don't emit access to Global HWSP if VF")
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250710045945.1023840-1-tejas.upadhyay@intel.com
Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
(cherry picked from commit b528e896fa)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-17 09:45:19 -03:00
Ville Syrjälä
800df9e50c drm/i915: Pass along the format info from .fb_create() to drm_helper_mode_fill_fb_struct()
Plumb the format info from .fb_create() all the way to
drm_helper_mode_fill_fb_struct() to avoid the redundant
lookup.

For the fbdev case a manual drm_get_format_info() lookup
is needed.

Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250701090722.13645-14-ville.syrjala@linux.intel.com
2025-07-16 20:09:08 +03:00
Maíra Canal
53dcd0eaa2
drm/xe: Use DRM_GPU_SCHED_STAT_NO_HANG to skip the reset
Xe can skip the reset if TDR has fired before the free job worker and can
also re-arm the timeout timer in some scenarios. Instead of manipulating
scheduler's internals, inform the scheduler that the job did not actually
timeout and no reset was performed through the new status code
DRM_GPU_SCHED_STAT_NO_HANG.

Note that, in the first case, there is no need to restart submission if it
hasn't been stopped.

Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250714-sched-skip-reset-v6-7-5c5ba4f55039@igalia.com
Signed-off-by: Maíra Canal <mcanal@igalia.com>
2025-07-15 08:27:08 -03:00
Maíra Canal
0a5dc1b67e
drm/sched: Rename DRM_GPU_SCHED_STAT_NOMINAL to DRM_GPU_SCHED_STAT_RESET
Among the scheduler's statuses, the only one that indicates an error is
DRM_GPU_SCHED_STAT_ENODEV. Any status other than DRM_GPU_SCHED_STAT_ENODEV
signifies that the operation succeeded and the GPU is in a nominal state.

However, to provide more information about the GPU's status, it is needed
to convey more information than just "OK".

Therefore, rename DRM_GPU_SCHED_STAT_NOMINAL to
DRM_GPU_SCHED_STAT_RESET, which better communicates the meaning of this
status. The status DRM_GPU_SCHED_STAT_RESET indicates that the GPU has
hung, but it has been successfully reset and is now in a nominal state
again.

Reviewed-by: Philipp Stanner <phasta@kernel.org>
Link: https://lore.kernel.org/r/20250714-sched-skip-reset-v6-1-5c5ba4f55039@igalia.com
Signed-off-by: Maíra Canal <mcanal@igalia.com>
2025-07-15 08:27:00 -03:00
Michal Wajdeczko
a816487681 drm/xe/pf: Invalidate LMTT after completing changes
Once we finish populating all leaf pages in the VF's LMTT we should
make sure that hardware will not access any stale data. Explicitly
force LMTT invalidation (as it was already planned in the past).

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Piotr Piórkowski <piotr.piorkowski@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20250711193316.1920-7-michal.wajdeczko@intel.com
2025-07-15 13:05:22 +02:00
Michal Wajdeczko
e497957fee drm/xe/pf: Invalidate LMTT during LMEM unprovisioning
Invalidate LMTT immediately after removing VF's LMTT page tables
and clearing root PTE in the LMTT PD to avoid any invalid access
by the hardware (and VF) due to stale data.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Piotr Piórkowski <piotr.piorkowski@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250711193316.1920-6-michal.wajdeczko@intel.com
2025-07-15 13:05:20 +02:00
Michal Wajdeczko
68ae022278 drm/xe/pf: Force GuC virtualization mode
By default the GuC starts in the 'native' mode and enables the VGT
mode (aka 'virtualization' mode) only after it receives at least one
set of VF configuration data. While this happens naturally while PF
begins VFs provisioning, we might need this sooner as some actions,
like TLB_INVALIDATION_ALL(0x7002), is supported by the GuC only in
the VGT mode.

And this becomes a real problem if we would want to use above action
to invalidate the LMTT early during VFs auto-provisioning, before VFs
are enabled, as such H2G would be rejected:

 [ ] xe 0000:4d:00.0: [drm] *ERROR* GT0: FAST_REQ H2G fence 0x804e failed! e=0x30, h=0
 [ ] xe 0000:4d:00.0: [drm] *ERROR* GT0: Fence 0x804e was used by action 0x7002 sent at:
      h2g_write+0x33e/0x870 [xe]
      __guc_ct_send_locked+0x1e1/0x1110 [xe]
      guc_ct_send_locked+0x9f/0x740 [xe]
      xe_guc_ct_send_locked+0x19/0x60 [xe]
      send_tlb_invalidation+0xc2/0x470 [xe]
      xe_gt_tlb_invalidation_all_async+0x45/0xa0 [xe]
      xe_gt_tlb_invalidation_all+0x4b/0xa0 [xe]
      lmtt_invalidate_hw+0x64/0x1a0 [xe]
      xe_lmtt_invalidate_hw+0x5c/0x340 [xe]
      pf_update_vf_lmtt+0x398/0xae0 [xe]
      pf_provision_vf_lmem+0x350/0xa60 [xe]
      xe_gt_sriov_pf_config_bulk_set_lmem+0xe2/0x410 [xe]
      xe_gt_sriov_pf_config_set_fair_lmem+0x1c6/0x620 [xe]
      xe_gt_sriov_pf_config_set_fair+0xd5/0x3f0 [xe]
      xe_pci_sriov_configure+0x360/0x1200 [xe]
      sriov_numvfs_store+0xbc/0x1d0
      dev_attr_store+0x17/0x40
      sysfs_kf_write+0x4a/0x80
      kernfs_fop_write_iter+0x166/0x220
      vfs_write+0x2ba/0x580
      ksys_write+0x77/0x100
      __x64_sys_write+0x19/0x30
      x64_sys_call+0x2bf/0x2660
      do_syscall_64+0x93/0x7a0
      entry_SYSCALL_64_after_hwframe+0x76/0x7e
 [ ] xe 0000:4d:00.0: [drm] *ERROR* GT0: CT dequeue failed: -71
 [ ] xe 0000:4d:00.0: [drm] GT0: trying reset from receive_g2h [xe]

This could be mitigated by pushing earlier a PF self-configuration
with some hard-coded values that cover unlimited access to the GGTT,
use of all GuC contexts and doorbells.  This step is sufficient for
the GuC to switch into the VGT mode.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20250711193316.1920-5-michal.wajdeczko@intel.com
2025-07-15 13:05:19 +02:00
Michal Wajdeczko
92ba2032a1 drm/xe/pf: Move GGTT config KLVs encoding to helper
In upcoming patch we will want to encode GGTT config KLVs based
on raw numbers, without relying on the allocated GGTT node.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20250711193316.1920-4-michal.wajdeczko@intel.com
2025-07-15 13:05:18 +02:00
Michal Wajdeczko
1c38dd6afa drm/xe/pf: Resend PF provisioning after GT reset
If we reload the GuC due to suspend/resume or GT reset then we
have to resend not only any VFs provisioning data, but also PF
configuration, like scheduling parameters (EQ, PT), as otherwise
GuC will continue to use default values.

Fixes: 411220808c ("drm/xe/pf: Restart VFs provisioning after GT reset")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20250711193316.1920-3-michal.wajdeczko@intel.com
2025-07-15 13:05:17 +02:00
Michal Wajdeczko
9f50b729dd drm/xe/pf: Prepare to stop SR-IOV support prior GT reset
As part of the resume or GT reset, the PF driver schedules work
which is then used to complete restarting of the SR-IOV support,
including resending to the GuC configurations of provisioned VFs.

However, in case of short delay between those two actions, which
could be seen by triggering a GT reset on the suspened device:

 $ echo 1 > /sys/kernel/debug/dri/0000:00:02.0/gt0/force_reset

this PF worker might be still busy, which lead to errors due to
just stopped or disabled GuC CTB communication:

 [ ] xe 0000:00:02.0: [drm:xe_gt_resume [xe]] GT0: resumed
 [ ] xe 0000:00:02.0: [drm] GT0: trying reset from force_reset_show [xe]
 [ ] xe 0000:00:02.0: [drm] GT0: reset queued
 [ ] xe 0000:00:02.0: [drm] GT0: reset started
 [ ] xe 0000:00:02.0: [drm:guc_ct_change_state [xe]] GT0: GuC CT communication channel stopped
 [ ] xe 0000:00:02.0: [drm:guc_ct_send_recv [xe]] GT0: H2G request 0x5503 canceled!
 [ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push VF1 12 config KLVs (-ECANCELED)
 [ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push VF1 configuration (-ECANCELED)
 [ ] xe 0000:00:02.0: [drm:guc_ct_change_state [xe]] GT0: GuC CT communication channel disabled
 [ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push VF2 12 config KLVs (-ENODEV)
 [ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push VF2 configuration (-ENODEV)
 [ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push 2 of 2 VFs configurations
 [ ] xe 0000:00:02.0: [drm:pf_worker_restart_func [xe]] GT0: PF: restart completed

While this VFs reprovisioning will be successful during next spin
of the worker, to avoid those errors, make sure to cancel restart
worker if we are about to trigger next reset.

Fixes: 411220808c ("drm/xe/pf: Restart VFs provisioning after GT reset")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20250711193316.1920-2-michal.wajdeczko@intel.com
2025-07-15 13:05:15 +02:00