Commit Graph

27 Commits

Author SHA1 Message Date
Andrzej Kacprowski
4720e0ad30 accel/ivpu: Add missing locks around mmu queues
Multiple threads were accessing mmu cmd queue simultaneously
causing sporadic failures in ivpu_mmu_cmdq_sync() function.
Protect critical code with mmu mutex.

Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Signed-off-by: Andrzej Kacprowski <Andrzej.Kacprowski@intel.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250204084622.2422544-2-jacek.lawrynowicz@linux.intel.com
2025-02-10 10:45:40 +01:00
Karol Wachowski
353b8f4839 accel/ivpu: Fix missing MMU events from reserved SSID
Generate recovery when fault from reserved context is detected.
Add Abort (A) bit to reserved (1) SSID to ensure NPU also receives a fault.

There is no way to create a file_priv with reserved SSID
but it is still possible to receive MMU faults from that SSID
as it is a default NPU HW setting. Such situation will occur if
FW freed context related resources but still performed access to DRAM.

Signed-off-by: Karol Wachowski <karol.wachowski@intel.com>
Signed-off-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250107173238.381120-9-maciej.falkowski@linux.intel.com
2025-01-09 09:35:44 +01:00
Karol Wachowski
4480912f3f accel/ivpu: Move parts of MMU event IRQ handling to thread handler
To prevent looping infinitely in MMU event handler we stop
generating new events by removing 'R' (record) bit from context
descriptor, but to ensure this change has effect KMD has to perform
configuration invalidation followed by sync command.

Because of that move parts of the interrupt handler that can take longer
to a thread not to block in interrupt handler for too long.
This includes:
 * disabling event queue for the time KMD updates MMU event queue consumer
   to ensure proper synchronization between MMU and KMD

 * removal of 'R' (record) bit from context descriptor to ensure no more
   faults are recorded until that context is destroyed

Signed-off-by: Karol Wachowski <karol.wachowski@intel.com>
Signed-off-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250107173238.381120-8-maciej.falkowski@linux.intel.com
2025-01-09 09:35:44 +01:00
Karol Wachowski
0240fa18d2 accel/ivpu: Dump only first MMU fault from single context
Stop dumping consecutive faults from an already faulty context immediately,
instead of waiting for the context abort thread handler (IRQ handler bottom
half) to abort currently executing jobs.

Remove 'R' (record events) bit from context descriptor of a faulty
context to prevent future faults generation.

This change speeds up the IRQ handler by eliminating the need to print the
fault content repeatedly. Additionally, it prevents flooding dmesg with
errors, which was occurring due to the delay in the bottom half of the
handler stopping fault-generating jobs.

Signed-off-by: Karol Wachowski <karol.wachowski@intel.com>
Signed-off-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250107173238.381120-7-maciej.falkowski@linux.intel.com
2025-01-09 09:35:44 +01:00
Karol Wachowski
5bbccadaf3 accel/ivpu: Abort all jobs after command queue unregister
With hardware scheduler it is not expected to receive JOB_DONE
notifications from NPU FW for the jobs aborted due to command queue destroy
JSM command.

Remove jobs submitted to unregistered command queue from submitted_jobs_xa
to avoid triggering a TDR in such case.

Add explicit submitted_jobs_lock that protects access to list of submitted
jobs which is now used to find jobs to abort.

Move context abort procedure to separate work queue not to slow down
handling of IPCs or DCT requests in case where job abort takes longer,
especially when destruction of the last job of a specific context results
in context release.

Signed-off-by: Karol Wachowski <karol.wachowski@intel.com>
Signed-off-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250107173238.381120-4-maciej.falkowski@linux.intel.com
2025-01-09 09:35:44 +01:00
Karol Wachowski
add38f8211 accel/ivpu: Clear CDTAB entry in case of failure
Don't leave a context descriptor in case CFGI_ALL flush fails.
Mark it as invalid (by clearing valid bit) so nothing is left in
partially-initialized state.

Signed-off-by: Karol Wachowski <karol.wachowski@intel.com>
Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241017145817.121590-5-jacek.lawrynowicz@linux.intel.com
2024-10-30 10:22:06 +01:00
Karol Wachowski
a74f4d9913 accel/ivpu: Defer MMU root page table allocation
Defer root page table allocation and unify context init/fini functions.
Move allocation of the root page table from the file_priv_open function to
perform a lazy allocation approach during ivpu_bo_pin().

By doing so, we avoid the overhead of allocating page tables for simple
operations like GET_PARAM that do not require them.
Additionally, the MMU context descriptor table initialization has been
moved to the ivpu_mmu_context_map_page function.

This change streamlines the process and ensures that the descriptor table
is only initialized when it is actually needed.
Refactor init/fini functions to remove redundant code and make the context
management more straightforward.

Overall, these changes lead to a reduction in the time taken by the file
descriptor open operation, as the costly root page table allocation is now
avoided for operations that do not require it.

Signed-off-by: Karol Wachowski <karol.wachowski@intel.com>
Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241017145817.121590-3-jacek.lawrynowicz@linux.intel.com
2024-10-30 10:22:04 +01:00
Maciej Falkowski
b7ed87ffc7 accel/ivpu: Abort jobs of faulty context
Abort all jobs that belong to contexts generating MMU faults in order
to avoid flooding host with MMU IRQs.

Jobs are cancelled with:
  - SSID_RELEASE command when OS scheduling is enabled
  - DESTROY_CMDQ command when HW scheduling is enabled

Signed-off-by: Maciej Falkowski <maciej.falkowski@intel.com>
Co-developed-by: Wachowski, Karol <karol.wachowski@intel.com>
Signed-off-by: Wachowski, Karol <karol.wachowski@intel.com>
Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240611120433.1012423-3-jacek.lawrynowicz@linux.intel.com
2024-06-14 09:12:11 +02:00
Wachowski, Karol
2c3801b174 accel/ivpu: Add force snoop module parameter
Add module parameter that enforces snooping for all NPU accesses,
both through MMU PTEs mappings and through TCU page table walk
override register bits for MMU page walks / configuration access.

Signed-off-by: Wachowski, Karol <karol.wachowski@intel.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240513120431.3187212-10-jacek.lawrynowicz@linux.intel.com
2024-05-15 07:42:26 +02:00
Wachowski, Karol
3556f92261 accel/ivpu: Improve clarity of MMU error messages
This patch improves readability and clarity of MMU error messages.
Previously, the error strings were somewhat confusing and could lead to
ambiguous interpretations, making it difficult to diagnose issues.

Signed-off-by: Wachowski, Karol <karol.wachowski@intel.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240402104929.941186-6-jacek.lawrynowicz@linux.intel.com
2024-04-08 10:54:21 +02:00
Wachowski, Karol
b039f1c4d3 accel/ivpu: Correct MMU queue size checking functions
Do not use kernel CIRC_SPACE and CIRC_CNT that
incorrectly return space of a queue when wrap bit was set.
Use correct implementation that compares producer, consumer and
wrap bit values.

Without this fix it was possible to lose events in case when event
queue was full.

Signed-off-by: Wachowski, Karol <karol.wachowski@intel.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240126122804.2169129-3-jacek.lawrynowicz@linux.intel.com
2024-02-06 13:36:33 +01:00
Wachowski, Karol
c9da9a1f17 accel/ivpu: Force snooping for MMU writes
Set AW_SNOOP_OVERRIDE bit in VPU_37/40XX_HOST_IF_TCU_PTW_OVERRIDES
to force snooping for MMU write accesses (setting event queue events).

MMU event queue buffer is the only buffer written by MMU and
mapped as write-back which break cache coherency. Force write
transactions to be snooped solving the problem.

Signed-off-by: Wachowski, Karol <karol.wachowski@intel.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240126122804.2169129-2-jacek.lawrynowicz@linux.intel.com
2024-02-06 13:36:32 +01:00
Jacek Lawrynowicz
27d19268cf accel/ivpu: Improve recovery and reset support
- Synchronize job submission with reset/recovery using reset_lock
  - Always print recovery reason and call diagnose_failure()
  - Don't allow for autosupend during recovery
  - Prevent immediate autosuspend after reset/recovery
  - Prevent force_recovery for issuing TDR when device is suspended
  - Reset VPU instead triggering recovery after changing debugfs params

Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Reviewed-by: Wachowski, Karol <karol.wachowski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240122120945.1150728-4-jacek.lawrynowicz@linux.intel.com
2024-01-25 10:17:37 +01:00
Wachowski, Karol
929acfb9c5 accel/ivpu: Call diagnose failure in ivpu_mmu_cmdq_sync()
Check for possible failure reasons in the buttress.
Some errors (like external abort) should have corresponding buttress errors
registers set indicating the real reason of failure.

Signed-off-by: Wachowski, Karol <karol.wachowski@intel.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240115134434.493839-3-jacek.lawrynowicz@linux.intel.com
2024-01-22 10:27:43 +01:00
Wachowski, Karol
30cf36bb04 accel/ivpu: Dump MMU events in case of VPU boot timeout
Add ivpu_mmu_evtq_dump() function that dumps existing MMU events from
MMU event queue. Call this function if VPU boot failed.

Previously MMU events were only checked in interrupt handler, but if VPU
failed to boot due to MMU faults, those faults were missed because of
interrupts not yet being enabled. This will allow checking potential
fault reason of VPU not booting.

Signed-off-by: Wachowski, Karol <karol.wachowski@intel.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240115134434.493839-2-jacek.lawrynowicz@linux.intel.com
2024-01-22 10:27:37 +01:00
Jacek Lawrynowicz
48aea7f2a2 accel/ivpu: Fix locking in ivpu_bo_remove_all_bos_from_context()
ivpu_bo_remove_all_bos_from_context() could race with ivpu_bo_free()
when prime buffer was closed after vpu device was closed.

Move the bo_list from context to vdev and use a dedicated lock to
sync it. This list is not modified when BO is added/removed from
a context.

Also rename ivpu_bo_free_vpu_addr() to ivpu_bo_unbind() because this
function does more then just free vpu_addr.

Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231031073156.1301669-3-stanislaw.gruszka@linux.intel.com
2023-11-08 16:27:35 +01:00
Jacek Lawrynowicz
0c287c27fb accel/ivpu: Simplify MMU SYNC command
CMD_SYNC does not need any args as we poll for completion anyway.

Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231028155936.1183342-8-stanislaw.gruszka@linux.intel.com
2023-10-31 16:45:50 +01:00
Karol Wachowski
e013aa9ab0 accel/ivpu: Print CMDQ errors after consumer timeout
Add checking of error reason bits in IVPU_MMU_CMDQ_CONS
register when waiting for consumer timeout occurred.

Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231028155936.1183342-6-stanislaw.gruszka@linux.intel.com
2023-10-31 16:45:07 +01:00
Krystian Pradzynski
74ce0f3873 accel/ivpu: Fix verbose version of REG_POLL macros
Remove two out of four _POLL macros. For two remaining _POLL
macros add message about polling register start and finish.

Additionally avoid inconsequence when using REGV_WR/RD macros in
MMU code - passing raw register offset instead of register name.

Signed-off-by: Krystian Pradzynski <krystian.pradzynski@linux.intel.com>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231020104501.697763-3-stanislaw.gruszka@linux.intel.com
2023-10-23 08:57:25 +02:00
Jacek Lawrynowicz
beaf3ebf29 accel/ivpu: Move MMU register definitions to ivpu_mmu.c
MMU registers are not platform specific so they should be defined
separate to platform regs.

Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230901094957.168898-12-stanislaw.gruszka@linux.intel.com
2023-09-04 11:01:53 +02:00
Jacek Lawrynowicz
51d66a7b7d accel/ivpu: Use generation based function and registers names
Given that VPU generation can be used by multiple platforms, driver should
use VPU IP generation names instead of a platform.

Change naming for functions and registries.

Use 37XX format, where:
  3 - major VPU IP generation version
  7 - minor VPU IP generation version
  XX - postfix indicating this is an architecture and not marketing name

Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230731161258.2987564-3-stanislaw.gruszka@linux.intel.com
2023-08-09 13:42:16 +02:00
Jacek Lawrynowicz
864a00b8f0 accel/ivpu: Rename sources to use generation based names
Given that VPU generation can be used by multiple platforms, driver should
use VPU IP generation in names instead of a platform.

Change naming for sources files.

Use 37XX format, where:
  3 - major VPU IP generation version
  7 - minor VPU IP generation version
  XX - postfix indicating this is an architecture and not marketing name

Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230731161258.2987564-2-stanislaw.gruszka@linux.intel.com
2023-08-09 13:39:52 +02:00
Thomas Zimmermann
de8a334f21 Merge drm/drm-next into drm-misc-next
Backmerging into drm-misc-next to get commit 2c1c7ba457
("drm/amdgpu: support partition drm devices"), which is required to fix
commit 0adec22702 ("drm: Remove struct drm_driver.gem_prime_mmap").

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2023-06-19 16:33:14 +02:00
Stanislaw Gruszka
b563e47957 accel/ivpu: Do not use mutex_lock_interruptible
If we get signal when waiting for the mmu->lock we do not invalidate
current MMU configuration that might result in undefined behavior.

Additionally there is little or no benefit on break waiting for
ipc->lock. In current code base, we keep this lock for short periods.

Fixes: 263b2ba5fc ("accel/ivpu: Add Intel VPU MMU support")
Reviewed-by: Krystian Pradzynski <krystian.pradzynski@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230525103818.877590-2-stanislaw.gruszka@linux.intel.com
2023-06-08 08:15:46 +02:00
Karol Wachowski
a2fd4a6fae accel/ivpu: Add MMU support for 4 level page mappings
Program additional fourth level required for mappings with VA above 38bits.

Co-developed-by: Raymond Tan <raymond.tan@intel.com>
Signed-off-by: Raymond Tan <raymond.tan@intel.com>
Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230518131605.650622-3-stanislaw.gruszka@linux.intel.com
2023-06-08 07:53:33 +02:00
Jacek Lawrynowicz
852be13f3b accel/ivpu: Add PM support
- Implement cold and warm firmware boot flows
  - Add hang recovery support
  - Add runtime power management support

Co-developed-by: Krystian Pradzynski <krystian.pradzynski@linux.intel.com>
Signed-off-by: Krystian Pradzynski <krystian.pradzynski@linux.intel.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20230117092723.60441-8-jacek.lawrynowicz@linux.intel.com
2023-01-19 11:12:08 +01:00
Jacek Lawrynowicz
263b2ba5fc accel/ivpu: Add Intel VPU MMU support
VPU Memory Management Unit is based on ARM MMU-600.
It allows the creation of multiple virtual address spaces for
the device and map noncontinuous host memory (there is no dedicated
memory on the VPU).

Address space is implemented as a struct ivpu_mmu_context, it has an ID,
drm_mm allocator for VPU addresses and struct ivpu_mmu_pgtable that
holds actual 3-level, 4KB page table.
Context with ID 0 (global context) is created upon driver initialization
and it's mainly used for mapping memory required to execute
the firmware.
Contexts with non-zero IDs are user contexts allocated each time
the devices is open()-ed and they map command buffers and other
workload-related memory.
Workloads executing in a given contexts have access only
to the memory mapped in this context.

This patch is has two main files:
  - ivpu_mmu_context.c handles MMU page tables and memory mapping
  - ivpu_mmu.c implements a driver that programs the MMU device

Co-developed-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Co-developed-by: Krystian Pradzynski <krystian.pradzynski@linux.intel.com>
Signed-off-by: Krystian Pradzynski <krystian.pradzynski@linux.intel.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20230117092723.60441-3-jacek.lawrynowicz@linux.intel.com
2023-01-19 11:07:22 +01:00