
20 Commits

Author SHA1 Message Date
Karol Wachowski
a47e36dc5d accel/ivpu: Trigger device recovery on engine reset/resume failure
Trigger full device recovery when the driver fails to restore device state
via engine reset and resume operations. This is necessary because, even if
submissions from a faulty context are blocked, the NPU may still process
previously submitted faulty jobs if the engine reset fails to abort them.
Such jobs can continue to generate faults and occupy device resources.
When engine reset is ineffective, the only way to recover is to perform
a full device recovery.

Fixes: dad945c27a ("accel/ivpu: Add handling of VPU_JSM_STATUS_MVNCI_CONTEXT_VIOLATION_HW")
Cc: stable@vger.kernel.org # v6.15+
Signed-off-by: Karol Wachowski <karol.wachowski@intel.com>
Reviewed-by: Lizhi Hou <lizhi.hou@amd.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://lore.kernel.org/r/20250528154253.500556-1-jacek.lawrynowicz@linux.intel.com
2025-06-05 14:36:56 +02:00
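The escalation described in a47e36dc5d can be sketched in C as follows. This is a minimal illustration, not the ivpu code. Every sketch_* name is hypothetical, and the engine reset and resume results are simulated by struct fields so the control flow can be exercised without hardware.

```c
#include <errno.h>

/* Hypothetical device model: reset/resume outcomes are simulated by fields. */
struct sketch_dev {
	int reset_ret;		/* 0 or negative errno returned by engine reset */
	int resume_ret;		/* 0 or negative errno returned by engine resume */
	int resume_calls;	/* how many times resume was attempted */
	int recoveries;		/* how many full device recoveries were triggered */
};

static int sketch_engine_reset(struct sketch_dev *dev)
{
	return dev->reset_ret;
}

static int sketch_engine_resume(struct sketch_dev *dev)
{
	dev->resume_calls++;
	return dev->resume_ret;
}

/* Hypothetical: stands in for a full NPU reset and firmware reboot. */
static void sketch_device_recovery(struct sketch_dev *dev)
{
	dev->recoveries++;
}

/*
 * Restore device state after a faulty context was detected. If the engine
 * cannot be reset and resumed, previously submitted faulty jobs may still be
 * executing, so escalate to a full device recovery.
 */
static int sketch_handle_engine_fault(struct sketch_dev *dev)
{
	int ret = sketch_engine_reset(dev);

	if (!ret)
		ret = sketch_engine_resume(dev);

	if (ret)
		sketch_device_recovery(dev);

	return ret;
}
```

The design choice is that a failure in either half of the reset/resume pair escalates, because in both cases the engine may still be running the faulty jobs.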
Karol Wachowski
320323d2e5 accel/ivpu: Add debugfs interface for setting HWS priority bands
Add a debugfs interface to modify the following priority band properties:
 * grace period
 * process grace period
 * process quantum

This allows for the adjustment of hardware scheduling algorithm parameters
for each existing priority band, facilitating validation and fine-tuning.

Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Signed-off-by: Karol Wachowski <karol.wachowski@intel.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250204084622.2422544-4-jacek.lawrynowicz@linux.intel.com
2025-02-10 10:45:41 +01:00
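A possible shape for the parsing step of such a debugfs write handler is sketched below. It assumes the input is four unsigned integers, "band grace_period process_grace_period process_quantum", and that there are four priority bands. The input format, the band count, and all sketch_* names are assumptions for illustration; the real debugfs file may differ. It is written as plain userspace C so it can run standalone.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

#define SKETCH_BAND_COUNT 4	/* assumed number of HWS priority bands */

struct sketch_band {
	uint32_t grace_period;
	uint32_t process_grace_period;
	uint32_t process_quantum;
};

/*
 * Parse "band grace_period process_grace_period process_quantum" (an assumed
 * input format) and store the values for the selected band.
 * Returns 0 on success or -EINVAL on malformed input or a bad band index.
 */
static int sketch_set_band(struct sketch_band bands[SKETCH_BAND_COUNT],
			   const char *input)
{
	unsigned int band, grace, proc_grace, quantum;

	if (sscanf(input, "%u %u %u %u", &band, &grace, &proc_grace,
		   &quantum) != 4)
		return -EINVAL;
	if (band >= SKETCH_BAND_COUNT)
		return -EINVAL;

	bands[band].grace_period = grace;
	bands[band].process_grace_period = proc_grace;
	bands[band].process_quantum = quantum;
	return 0;
}
```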
Andrzej Kacprowski
94b2a2c0e7 accel/ivpu: Remove copy engine support
Copy engine was deprecated by the FW and is no longer supported.
Compute engine includes all copy engine functionality and should be used
instead.

This change does not affect user space as the copy engine was never
used outside of a couple of tests.

Signed-off-by: Andrzej Kacprowski <Andrzej.Kacprowski@intel.com>
Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241017145817.121590-4-jacek.lawrynowicz@linux.intel.com
2024-10-30 10:22:05 +01:00
Jacek Lawrynowicz
3e521803e5 accel/ivpu: Remove HWS_EXTRA_EVENTS from test modes
IVPU_TEST_MODE_HWS_EXTRA_EVENTS was never used and can be safely removed.

Reviewed-by: Karol Wachowski <karol.wachowski@intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240930195322.461209-30-jacek.lawrynowicz@linux.intel.com
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
2024-10-11 12:44:39 +02:00
Jacek Lawrynowicz
ed3fb318fd accel/ivpu: Fix ivpu_jsm_dyndbg_control()
Use the correct channel for the dyndbg JSM message.

Reviewed-by: Karol Wachowski <karol.wachowski@intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240930195322.461209-29-jacek.lawrynowicz@linux.intel.com
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
2024-10-11 12:44:39 +02:00
Karol Wachowski
5eaa497411 accel/ivpu: Prevent recovery invocation during probe and resume
Refactor IPC send and receive functions to allow correct
handling of operations that should not trigger a recovery process.

Expose ivpu_send_receive_internal(), which is now utilized by the D0i3
entry, DCT initialization, and HWS initialization functions.
These functions have been modified to return error codes gracefully,
rather than initiating recovery.

The updated functions are invoked within ivpu_probe() and ivpu_resume(),
ensuring that any errors encountered during these stages result in a proper
teardown or shutdown sequence. The previous approach of triggering recovery
within these functions could lead to a race condition, potentially causing
undefined behavior and kernel crashes due to null pointer dereferences.

Fixes: 45e45362e0 ("accel/ivpu: Introduce ivpu_ipc_send_receive_active()")
Signed-off-by: Karol Wachowski <karol.wachowski@intel.com>
Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240930195322.461209-23-jacek.lawrynowicz@linux.intel.com
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
2024-10-11 12:44:39 +02:00
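The split described in 5eaa497411, between an error-returning internal path and a recovery-triggering runtime path, might look roughly like this. ivpu_send_receive_internal() is named in the commit; the sketch_* functions below are hypothetical stand-ins, and the IPC result is simulated by a struct field.

```c
#include <errno.h>

/* Hypothetical device model: the IPC outcome is simulated by a field. */
struct sketch_dev {
	int ipc_ret;		/* result the firmware round trip would produce */
	int recoveries;		/* full recoveries triggered */
};

/*
 * Low-level send/receive: reports errors to the caller and never triggers
 * recovery. Safe for probe and resume paths, which unwind on their own.
 */
static int sketch_send_receive_internal(struct sketch_dev *dev)
{
	return dev->ipc_ret;
}

/*
 * Runtime wrapper: on a fully initialized device, a failed exchange is
 * treated as a hang and escalates to recovery.
 */
static int sketch_send_receive(struct sketch_dev *dev)
{
	int ret = sketch_send_receive_internal(dev);

	if (ret)
		dev->recoveries++;
	return ret;
}

/* Probe-time user: propagate the error so probe can tear down cleanly. */
static int sketch_probe_hws_init(struct sketch_dev *dev)
{
	return sketch_send_receive_internal(dev);
}
```

Keeping recovery out of the internal helper means a failure during probe or resume results in an ordinary error return instead of racing with teardown.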
Tomasz Rusinowicz
5e162f872d accel/ivpu: Add FW state dump on TDR
Send a JSM state dump message at the beginning of the TDR handler. This
allows the FW to collect debug info in the FW log before the NPU state is
lost, making it possible to analyze the cause of a TDR.

Wait for a predefined timeout (10 ms) so the FW has a chance to write debug
logs. We cannot wait for a JSM response at this point because IRQs are
already disabled before the TDR handler is invoked.

Signed-off-by: Tomasz Rusinowicz <tomasz.rusinowicz@intel.com>
Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240930195322.461209-9-jacek.lawrynowicz@linux.intel.com
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
2024-10-11 12:44:38 +02:00
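The ordering in the TDR handler from 5e162f872d can be illustrated with a small model that records each step. The sketch_* names are hypothetical, and the sleep is only simulated; a real kernel driver would use a kernel delay primitive instead.

```c
#include <stddef.h>

#define SKETCH_STATE_DUMP_TIMEOUT_MS 10	/* the 10 ms from the commit message */

enum sketch_event { SKETCH_EV_STATE_DUMP, SKETCH_EV_SLEEP, SKETCH_EV_RECOVERY };

/* Hypothetical device model that records the order of TDR actions. */
struct sketch_dev {
	enum sketch_event log[8];
	size_t nlog;
	unsigned int slept_ms;
};

static void sketch_log(struct sketch_dev *dev, enum sketch_event ev)
{
	if (dev->nlog < sizeof(dev->log) / sizeof(dev->log[0]))
		dev->log[dev->nlog++] = ev;
}

/* Send-only: IRQs are already disabled, so no response can be received. */
static void sketch_send_state_dump(struct sketch_dev *dev)
{
	sketch_log(dev, SKETCH_EV_STATE_DUMP);
}

/* Stand-in for a kernel sleep; here it only records the requested time. */
static void sketch_sleep_ms(struct sketch_dev *dev, unsigned int ms)
{
	dev->slept_ms += ms;
	sketch_log(dev, SKETCH_EV_SLEEP);
}

static void sketch_tdr_handler(struct sketch_dev *dev)
{
	/* Ask the FW to dump its state into the FW log before the NPU is reset. */
	sketch_send_state_dump(dev);
	/* Give the FW a fixed window to write the log instead of awaiting a reply. */
	sketch_sleep_ms(dev, SKETCH_STATE_DUMP_TIMEOUT_MS);
	/* Only now tear down and recover the device. */
	sketch_log(dev, SKETCH_EV_RECOVERY);
}
```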
Andrzej Kacprowski
a4293cc753 accel/ivpu: Update VPU FW API headers
This commit bumps:
  - Boot API from 3.24.0 to 3.26.3
  - JSM API from 3.16.0 to 3.25.0

Signed-off-by: Andrzej Kacprowski <Andrzej.Kacprowski@intel.com>
Co-developed-by: Tomasz Rusinowicz <tomasz.rusinowicz@intel.com>
Signed-off-by: Tomasz Rusinowicz <tomasz.rusinowicz@intel.com>
Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240930195322.461209-2-jacek.lawrynowicz@linux.intel.com
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
2024-10-11 12:35:55 +02:00
Jacek Lawrynowicz
a0a306f9f6 accel/ivpu: Remove duplicated debug messages
Remove duplicated debug messages from ivpu_jsm_(un)register_db().
Debug messages are already printed one level higher.

Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Reviewed-by: Wachowski, Karol <karol.wachowski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240611120433.1012423-15-jacek.lawrynowicz@linux.intel.com
2024-06-14 09:15:13 +02:00
Jacek Lawrynowicz
a19bffb10c accel/ivpu: Implement DCT handling
When the host system is under heavy load and the NPU is already running
at the lowest frequency, PUNIT may request Duty Cycle Throttling (DCT).
This further reduces NPU power usage.

PUNIT requests DCT mode using the Survivability IRQ and a mailbox register.
The driver then issues a JSM message to the FW that enables
DCT mode. If the NPU resets while in DCT mode, the driver requests
DCT mode during FW boot.

Also add a debugfs "dct" file that allows setting an arbitrary DCT
percentage; it is used by driver tests.

Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Reviewed-by: Wachowski, Karol <karol.wachowski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240611120433.1012423-7-jacek.lawrynowicz@linux.intel.com
2024-06-14 09:13:32 +02:00
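One way to turn a DCT percentage into concrete timing is sketched below, assuming the percentage expresses the active share of a fixed throttling period. The 5000 us period, the decision to reject 0%, and the sketch_* name are assumptions made for illustration, not values taken from the driver.

```c
#include <errno.h>
#include <stdint.h>

#define SKETCH_DCT_PERIOD_US 5000u	/* assumed throttling period */

/*
 * Split one throttling period into active and inactive time for a given
 * active percentage. 100% means no throttling; 0% is rejected here.
 */
static int sketch_dct_split(unsigned int active_percent,
			    uint32_t *active_us, uint32_t *inactive_us)
{
	if (active_percent == 0 || active_percent > 100)
		return -EINVAL;

	*active_us = (SKETCH_DCT_PERIOD_US * active_percent) / 100;
	*inactive_us = SKETCH_DCT_PERIOD_US - *active_us;
	return 0;
}
```

For example, 30% yields 1500 us active and 3500 us inactive per period.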
Maciej Falkowski
b7ed87ffc7 accel/ivpu: Abort jobs of faulty context
Abort all jobs that belong to contexts generating MMU faults in order
to avoid flooding the host with MMU IRQs.

Jobs are cancelled with:
  - SSID_RELEASE command when OS scheduling is enabled
  - DESTROY_CMDQ command when HW scheduling is enabled

Signed-off-by: Maciej Falkowski <maciej.falkowski@intel.com>
Co-developed-by: Wachowski, Karol <karol.wachowski@intel.com>
Signed-off-by: Wachowski, Karol <karol.wachowski@intel.com>
Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240611120433.1012423-3-jacek.lawrynowicz@linux.intel.com
2024-06-14 09:12:11 +02:00
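The choice of abort command in b7ed87ffc7 depends only on the scheduling mode. The hypothetical sketch below builds the corresponding request; the enum values and struct layout are invented and do not match the real JSM message definitions.

```c
#include <stdint.h>

enum sketch_sched_mode { SKETCH_SCHED_OS, SKETCH_SCHED_HW };

/* Hypothetical command identifiers, not the real JSM message values. */
enum sketch_abort_cmd { SKETCH_CMD_SSID_RELEASE, SKETCH_CMD_DESTROY_CMDQ };

struct sketch_abort_req {
	enum sketch_abort_cmd cmd;
	uint32_t ssid;		/* context whose jobs are aborted */
	uint32_t cmdq_id;	/* only meaningful for DESTROY_CMDQ */
};

/*
 * Build the request that cancels jobs of a faulty context: with OS
 * scheduling the SSID is released; with HW scheduling the given command
 * queue is destroyed.
 */
static struct sketch_abort_req sketch_build_abort(enum sketch_sched_mode mode,
						  uint32_t ssid, uint32_t cmdq_id)
{
	struct sketch_abort_req req = { .ssid = ssid, .cmdq_id = cmdq_id };

	req.cmd = (mode == SKETCH_SCHED_HW) ? SKETCH_CMD_DESTROY_CMDQ
					    : SKETCH_CMD_SSID_RELEASE;
	return req;
}
```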
Tomasz Rusinowicz
cdfad4db77 accel/ivpu: Add NPU profiling support
Implement time-based Metric Streamer profiling UAPI.

This is a generic mechanism allowing user mode tools to sample
NPU metrics. These metrics are defined by the FW and transparent to
the driver.

User space can detect this feature by querying the
DRM_IVPU_CAP_METRIC_STREAMER driver capability.

Signed-off-by: Tomasz Rusinowicz <tomasz.rusinowicz@intel.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240513120431.3187212-9-jacek.lawrynowicz@linux.intel.com
2024-05-15 07:42:23 +02:00
Wachowski, Karol
cf40fbaf70 accel/ivpu: Add HWS JSM messages
Add JSM messages that will be used to implement the hardware scheduler.
Most of these messages are used to create and manage HWS specific
command queues.

Signed-off-by: Wachowski, Karol <karol.wachowski@intel.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240513120431.3187212-6-jacek.lawrynowicz@linux.intel.com
2024-05-15 07:42:15 +02:00
Andrzej Kacprowski
3198a62eb8 accel/ivpu: Add support for delayed D0i3 entry message
Currently the VPU firmware prepares for D0i3 every time the VPU
is entering the D0i2 Idle state. This is not optimal, as we do not
necessarily enter D0i3 every time we enter D0i2 Idle, and this
preparation is quite costly.

This optimization moves D0i3 preparation to a dedicated
message sent from the host driver only when the driver is about
to enter D0i3 - this reduces power consumption and latency for
certain workloads, for example audio workloads that submit
inference every 10 ms.

The VPU needs a non-zero amount of time to enter the IDLE state after
responding to the D0i3 entry message. If the driver does not wait for the
VPU to enter the IDLE state, warm boot may fail.

Signed-off-by: Andrzej Kacprowski <andrzej.kacprowski@linux.intel.com>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231028133415.1169975-12-stanislaw.gruszka@linux.intel.com
2023-10-30 11:06:13 +01:00
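The delayed D0i3 entry sequence from 3198a62eb8 (send the dedicated message, then wait for IDLE before continuing) can be sketched as a bounded poll. All sketch_* names are hypothetical, and the NPU's idle state is simulated by a poll counter.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device model: the NPU reports idle after idle_after_polls polls. */
struct sketch_dev {
	unsigned int idle_after_polls;
	unsigned int polls;
	bool d0i3_msg_sent;
};

static void sketch_send_d0i3_entry(struct sketch_dev *dev)
{
	dev->d0i3_msg_sent = true;	/* FW prepares for D0i3 only now */
}

static bool sketch_is_idle(struct sketch_dev *dev)
{
	return ++dev->polls >= dev->idle_after_polls;
}

/*
 * Enter D0i3: send the dedicated preparation message, then wait (bounded)
 * until the NPU reports IDLE, since skipping this wait could break the
 * next warm boot.
 */
static int sketch_enter_d0i3(struct sketch_dev *dev, unsigned int max_polls)
{
	sketch_send_d0i3_entry(dev);

	for (unsigned int i = 0; i < max_polls; i++) {
		if (sketch_is_idle(dev))
			return 0;
		/* A real driver would sleep or delay between polls here. */
	}
	return -ETIMEDOUT;
}
```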
Krystian Pradzynski
8c63b47412 accel/ivpu: Update FW API
Bump boot API to 4.20
Bump JSM API to 3.15

Signed-off-by: Krystian Pradzynski <krystian.pradzynski@linux.intel.com>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231028133415.1169975-2-stanislaw.gruszka@linux.intel.com
2023-10-30 11:06:07 +01:00
Krystian Pradzynski
a3cd664e7f accel/ivpu: Print IPC type string instead of number
Introduce the ivpu_jsm_msg_type_to_str() helper to print the type of an IPC
message. This makes reading logs and debugging IPC issues easier.

Co-developed-by: Maciej Falkowski <maciej.falkowski@intel.com>
Signed-off-by: Maciej Falkowski <maciej.falkowski@intel.com>
Signed-off-by: Krystian Pradzynski <krystian.pradzynski@linux.intel.com>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231020104501.697763-5-stanislaw.gruszka@linux.intel.com
2023-10-23 09:00:28 +02:00
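A common C idiom for a helper like ivpu_jsm_msg_type_to_str() is a switch whose case labels are generated by a stringification macro, so each enumerator's name is returned verbatim. The sketch below assumes that idiom; the message types are a hypothetical stand-in for those defined in vpu_jsm_api.h.

```c
/* Hypothetical message types; the real ones live in vpu_jsm_api.h. */
enum sketch_msg_type {
	SKETCH_MSG_ENGINE_RESET,
	SKETCH_MSG_REGISTER_DB,
	SKETCH_MSG_UNREGISTER_DB,
};

/* Expands to a case label that returns the enumerator's own name. */
#define SKETCH_CASE_TO_STR(x) case x: return #x

static const char *sketch_msg_type_to_str(enum sketch_msg_type type)
{
	switch (type) {
	SKETCH_CASE_TO_STR(SKETCH_MSG_ENGINE_RESET);
	SKETCH_CASE_TO_STR(SKETCH_MSG_REGISTER_DB);
	SKETCH_CASE_TO_STR(SKETCH_MSG_UNREGISTER_DB);
	}
	return "Unknown JSM message type";
}
```

Because the strings come from the enumerator names, a new message type needs only one extra case line to be printed by name.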
Krystian Pradzynski
276e4834b7 accel/ivpu: Use ratelimited warn and err in IPC/JSM
In test corner cases, the IPC and JSM functions often flood dmesg with
warning and error messages, causing earlier dmesg history to be lost.
Switch these warnings and errors in IPC and JSM to their ratelimited
variants to suppress dmesg spam in failing test scenarios.

Signed-off-by: Krystian Pradzynski <krystian.pradzynski@linux.intel.com>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231020104501.697763-2-stanislaw.gruszka@linux.intel.com
2023-10-23 08:55:11 +02:00
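The idea behind the ratelimited variants can be illustrated with a simplified limiter in the spirit of the kernel's ratelimit state (an interval and a burst). This is a userspace sketch, not the kernel implementation; time is passed in explicitly and the sketch_* names are hypothetical.

```c
#include <stdbool.h>

/*
 * Simplified rate limiter: at most `burst` messages per `interval` time
 * units; the rest are suppressed and counted.
 */
struct sketch_ratelimit {
	unsigned long interval;
	unsigned int burst;
	unsigned long begin;	/* start of the current window */
	unsigned int printed;	/* messages allowed in this window */
	unsigned int missed;	/* messages suppressed in this window */
};

static bool sketch_ratelimit_ok(struct sketch_ratelimit *rs, unsigned long now)
{
	if (now - rs->begin >= rs->interval) {
		/* New window: reset the counters. */
		rs->begin = now;
		rs->printed = 0;
		rs->missed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	rs->missed++;
	return false;
}
```

A caller would print a warning only when sketch_ratelimit_ok() returns true, so a failing test loop emits at most `burst` lines per interval.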
Justin Stitt
4b2fd81f2a accel/ivpu: refactor deprecated strncpy
`strncpy` is deprecated for use on NUL-terminated destination strings [1].

A suitable replacement is `strscpy` [2] due to the fact that it
guarantees NUL-termination on its destination buffer argument which is
_not_ the case for `strncpy`!

Also remove extraneous if-statement as it can never be entered. The
return value from `strncpy` is its first argument. In this case,
`...dyndbg_cmd` is an array:
| 	char dyndbg_cmd[VPU_DYNDBG_CMD_MAX_LEN];
             ^^^^^^^^^^
This can never be NULL which means `strncpy`'s return value cannot be
NULL here. Just use `strscpy` which is more robust and results in
simpler and less ambiguous code.

Moreover, remove needless `... - 1` as `strscpy`'s implementation
ensures NUL-termination and we do not need to carefully dance around
ending boundaries with a "- 1" anymore.

Fixes: 5d7422cfb4 ("accel/ivpu: Add IPC driver and JSM messages")
Link: www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1]
Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2]
Link: https://github.com/KSPP/linux/issues/90
Cc: linux-hardening@vger.kernel.org
Signed-off-by: Justin Stitt <justinstitt@google.com>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230824-strncpy-drivers-accel-ivpu-ivpu_jsm_msg-c-v1-1-12d9b52d2dff@google.com
2023-08-25 11:09:40 +02:00
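The semantics this commit relies on can be illustrated with a userspace approximation of strscpy(). sketch_strscpy is a hypothetical reimplementation for illustration, not the kernel function: it copies at most size - 1 characters, always NUL-terminates when size is non-zero, and returns -E2BIG on truncation, which is why passing the full buffer size without "- 1" is safe.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Copy src into dest (capacity size), always NUL-terminating when size > 0.
 * Returns the number of characters copied, or -E2BIG if src was truncated.
 */
static long sketch_strscpy(char *dest, const char *src, size_t size)
{
	size_t i;

	if (size == 0)
		return -E2BIG;

	for (i = 0; i < size - 1 && src[i] != '\0'; i++)
		dest[i] = src[i];
	dest[i] = '\0';

	return src[i] == '\0' ? (long)i : -E2BIG;
}
```

With this contract, a call such as sketch_strscpy(buf, src, sizeof(buf)) never leaves buf unterminated, which strncpy() does when src fills the whole buffer.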
Andrzej Kacprowski
dffaa98c8b accel/ivpu: Send VPU_JSM_MSG_CONTEXT_DELETE when deleting context
The VPU_JSM_MSG_CONTEXT_DELETE message removes all resources associated
with the SSID, including any blobs created by the user space
application.

The command can also remove doorbell registrations, but since this
does not work in the HW scheduling case, we do not depend on this
capability and unregister the doorbells explicitly.

Fixes: cd7272215c ("accel/ivpu: Add command buffer submission logic")
Signed-off-by: Andrzej Kacprowski <andrzej.kacprowski@linux.intel.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230202092114.2637452-3-stanislaw.gruszka@linux.intel.com
(cherry picked from commit 38257f514d)
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
2023-02-06 09:27:26 +01:00
Jacek Lawrynowicz
5d7422cfb4 accel/ivpu: Add IPC driver and JSM messages
The IPC driver is used to send and receive messages to/from firmware
running on the VPU.

The only supported IPC message format is Job Submission Model (JSM),
defined in the vpu_jsm_api.h header.

Co-developed-by: Andrzej Kacprowski <andrzej.kacprowski@linux.intel.com>
Signed-off-by: Andrzej Kacprowski <andrzej.kacprowski@linux.intel.com>
Co-developed-by: Krystian Pradzynski <krystian.pradzynski@linux.intel.com>
Signed-off-by: Krystian Pradzynski <krystian.pradzynski@linux.intel.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20230117092723.60441-5-jacek.lawrynowicz@linux.intel.com
2023-01-19 11:11:45 +01:00