The PF driver might be resumed just to configure VFs, but since
it is doing some asynchronous GuC reconfigurations after fresh
reset, we should wait until all pending works are completed.
This is especially important in case of LMEM provisioning, since
we also need to update the LMTT and send invalidation requests
to all GuCs, which are expected to be already in the VGT mode.
Fixes: 68ae022278 ("drm/xe/pf: Force GuC virtualization mode")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Cc: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://lore.kernel.org/r/20250801142822.180530-3-michal.wajdeczko@intel.com
(cherry picked from commit c6c86441c465ea440dfb5039f1c26e629a6fd64c)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
As part of the resume or GT reset, the PF driver schedules work
which is then used to complete restarting of the SR-IOV support,
including resending to the GuC configurations of provisioned VFs.
However, in case of short delay between those two actions, which
could be seen by triggering a GT reset on the suspened device:
$ echo 1 > /sys/kernel/debug/dri/0000:00:02.0/gt0/force_reset
this PF worker might be still busy, which lead to errors due to
just stopped or disabled GuC CTB communication:
[ ] xe 0000:00:02.0: [drm:xe_gt_resume [xe]] GT0: resumed
[ ] xe 0000:00:02.0: [drm] GT0: trying reset from force_reset_show [xe]
[ ] xe 0000:00:02.0: [drm] GT0: reset queued
[ ] xe 0000:00:02.0: [drm] GT0: reset started
[ ] xe 0000:00:02.0: [drm:guc_ct_change_state [xe]] GT0: GuC CT communication channel stopped
[ ] xe 0000:00:02.0: [drm:guc_ct_send_recv [xe]] GT0: H2G request 0x5503 canceled!
[ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push VF1 12 config KLVs (-ECANCELED)
[ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push VF1 configuration (-ECANCELED)
[ ] xe 0000:00:02.0: [drm:guc_ct_change_state [xe]] GT0: GuC CT communication channel disabled
[ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push VF2 12 config KLVs (-ENODEV)
[ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push VF2 configuration (-ENODEV)
[ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push 2 of 2 VFs configurations
[ ] xe 0000:00:02.0: [drm:pf_worker_restart_func [xe]] GT0: PF: restart completed
While this VFs reprovisioning will be successful during next spin
of the worker, to avoid those errors, make sure to cancel restart
worker if we are about to trigger next reset.
Fixes: 411220808c ("drm/xe/pf: Restart VFs provisioning after GT reset")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20250711193316.1920-2-michal.wajdeczko@intel.com
The migration support only needs to be initialized once, but it
was incorrectly called from the xe_gt_sriov_pf_init_hw(), which
is part of the reset flow and may be called multiple times.
Fixes: d86e3737c7 ("drm/xe/pf: Add functions to save and restore VF GuC state")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250120232443.544-1-michal.wajdeczko@intel.com
Some VF accessible registers (like GuC scratch registers) must be
explicitly reset during the FLR. While this is today done by the GuC
firmware, according to the design, this should be responsibility of
the PF driver, as future platforms may require more registers to be
reset. Likewise GuC, the PF can access VFs registers by adding some
platform specific offset to the original register address.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240902192953.1792-1-michal.wajdeczko@intel.com
Any prior configurations pushed to the GuC are lost when the GT
is reset. Push again all non-empty VF configurations to the GuC
as part of the GuC reset procedure.
This will also help restore early manual provisioning, when the
PF was in the meantime suspended and then resumed.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240701102738.934-3-michal.wajdeczko@intel.com
On older platforms (12.00) the PF driver must explicitly unblock
VF's modifications to the GGTT. On newer platforms this capability
is enabled by default.
Bspec: 49908, 53204
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240425143927.2265-1-michal.wajdeczko@intel.com
The PF driver must maintain additional GT level data per each VF.
This additional per-VF data will be added in upcoming patches and
will include: provisioning configuration (like GGTT space or LMEM
allocation sizes or scheduling parameters), monitoring thresholds
and counters, and more.
As number of supported VFs varies across platforms use flexible
array where first entry will contain metadata for the PF itself
(if such configuration parameter is applicable for the PF) and
all remaining entries will contain data for potential VFs.
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240415173937.1287-6-michal.wajdeczko@intel.com