Repeated loading and unloading of a device specific QAT driver, for
example qat_4xxx, in a tight loop can lead to a crash due to a
use-after-free scenario. This occurs when a power management (PM)
interrupt triggers just before the device-specific driver (e.g.,
qat_4xxx.ko) is unloaded, while the core driver (intel_qat.ko) remains
loaded.
Since the driver uses a shared workqueue (`qat_misc_wq`) across all
devices and owned by intel_qat.ko, a deferred routine from the
device-specific driver may still be pending in the queue. If this
routine executes after the driver is unloaded, it can dereference freed
memory, resulting in a page fault and kernel crash like the following:
BUG: unable to handle page fault for address: ffa000002e50a01c
#PF: supervisor read access in kernel mode
RIP: 0010:pm_bh_handler+0x1d2/0x250 [intel_qat]
Call Trace:
pm_bh_handler+0x1d2/0x250 [intel_qat]
process_one_work+0x171/0x340
worker_thread+0x277/0x3a0
kthread+0xf0/0x120
ret_from_fork+0x2d/0x50
To prevent this, flush the misc workqueue during device shutdown to
ensure that all pending work items are completed before the driver is
unloaded.
Note: This approach may slightly increase shutdown latency if the
workqueue contains jobs from other devices, but it ensures correctness
and stability.
Fixes: e5745f3411 ("crypto: qat - enable power management for QAT GEN4")
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Ahsan Atta <ahsan.atta@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Remove extra description from comments as it is not required.
This is to fix the following warning when compiling the QAT driver
using the clang compiler with CC=clang W=2:
drivers/crypto/intel/qat/qat_common/adf_dev_mgr.c:65: warning: contents before sections
drivers/crypto/intel/qat/qat_common/adf_isr.c:380: warning: contents before sections
drivers/crypto/intel/qat/qat_common/adf_vf_isr.c:298: warning: contents before sections
Signed-off-by: Adam Guerin <adam.guerin@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Notify a fatal error condition and optionally reset the device in
the following cases:
* if the device reports an uncorrectable fatal error through an
interrupt
* if the heartbeat feature detects that the device is not
responding
This patch is based on earlier work done by Shashank Gupta.
Signed-off-by: Mun Chun Yep <mun.chun.yep@intel.com>
Reviewed-by: Ahsan Atta <ahsan.atta@intel.com>
Reviewed-by: Markas Rapoportas <markas.rapoportas@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
As noted in the "Deprecated Interfaces, Language Features, Attributes,
and Conventions" documentation [1], size calculations (especially
multiplication) should not be performed in memory allocator (or similar)
function arguments due to the risk of them overflowing. This could lead
to values wrapping around and a smaller allocation being made than the
caller was expecting. Using those allocations could lead to linear
overflows of heap memory and other misbehaviors.
So, use the purpose specific kcalloc_node() function instead of the
argument count * size in the kzalloc_node() function.
Link: https://www.kernel.org/doc/html/next/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments [1]
Link: https://github.com/KSPP/linux/issues/162
Signed-off-by: Erick Archer <erick.archer@gmx.com>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Acked-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add infrastructure for enabling, disabling and reporting errors in the QAT
driver. This adds a new structure, adf_ras_ops, to adf_hw_device_data that
contains the following methods:
- enable_ras_errors(): allows to enable RAS errors at device
initialization.
- disable_ras_errors(): allows to disable RAS errors at device shutdown.
- handle_interrupt(): allows to detect if there is an error and report if
a reset is required. This is executed immediately after the error is
reported, in the context of an ISR.
An initial, empty, implementation of the methods above is provided
for QAT GEN4.
Signed-off-by: Shashank Gupta <shashank.gupta@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Tero Kristo <tero.kristo@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The power management feature in QAT 4xxx devices can disable clock
sources used to implement timers. Because of that, the firmware needs to
get an external reliable source of time.
Add a kernel delayed work that periodically sends an event to the
firmware. This is triggered every 200ms. At each execution, the driver
sends a sync request to the firmware reporting the current timestamp
counter value.
This is a pre-requisite for enabling the heartbeat, telemetry and
rate limiting features.
Signed-off-by: Damian Muszynski <damian.muszynski@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
With the growing number of Intel crypto drivers, it makes sense to
group them all into a single drivers/crypto/intel/ directory.
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>