When a remote device sends a completion event to the host, it contains a
pointer to the consumed TRE. The host uses this pointer to process all of
the TREs between it and the host's local copy of the ring's read pointer.
This works when processing completion for chained transactions, but can
lead to nasty results if the device sends an event for a single-element
transaction with a read pointer that is multiple elements ahead of the
host's read pointer.
For instance, if the host accesses an event ring while the device is
updating it, the pointer inside of the event might still point to an old
TRE. If the host uses the channel's xfer_cb() to directly free the buffer
pointed to by the TRE, the buffer will be double-freed.
This behavior was observed on an ep that used upstream EP stack without
'commit 6f18d174b7 ("bus: mhi: ep: Update read pointer only after buffer
is written")'. Where the device updated the events ring pointer before
updating the event contents, so it left a window where the host was able to
access the stale data the event pointed to, before the device had the
chance to update them. The usual pattern was that the host received an
event pointing to a TRE that is not immediately after the last processed
one, so it got treated as if it was a chained transaction, processing all
of the TREs in between the two read pointers.
This commit aims to harden the host by ensuring transactions where the
event points to a TRE that isn't local_rp + 1 are chained.
Fixes: 1d3173a3ba ("bus: mhi: core: Add support for processing events from client device")
Signed-off-by: Youssef Samir <quic_yabdulra@quicinc.com>
[mani: added stable tag and reworded commit message]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20250714163039.3438985-1-quic_yabdulra@quicinc.com
A client driver may use mhi_unprepare_from_transfer() to quiesce
incoming data during the client driver's tear down. The client driver
might also be processing data at the same time, resulting in a call to
mhi_queue_buf() which will invoke mhi_gen_tre(). If mhi_gen_tre() runs
after mhi_unprepare_from_transfer() has torn down the channel, a panic
will occur due to an invalid dereference leading to a page fault.
This occurs because mhi_gen_tre() does not verify the channel state
after locking it. Fix this by having mhi_gen_tre() confirm the channel
state is valid, or return error to avoid accessing deinitialized data.
Cc: stable@vger.kernel.org # 6.8
Fixes: b89b6a863d ("bus: mhi: host: Add spinlock to protect WP access when queueing TREs")
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com>
Reviewed-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>
Reviewed-by: Youssef Samir <quic_yabdulra@quicinc.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Troy Hanson <quic_thanson@quicinc.com>
Link: https://lore.kernel.org/r/20250306172913.856982-1-jeff.hugo@oss.qualcomm.com
[mani: added stable tag]
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
mhi_device_get() and mhi_queue_dma() haven't been used since 2020's
commit 189ff97cca ("bus: mhi: core: Add support for data transfer")
added them.
Remove them.
Note that mhi_queue_dma_sync() is used and has been left.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20250127215859.261105-1-linux@treblig.org
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Some controllers may want to access a specific doorbell register. Hence add
a new API that reads the CHDBOFF register and returns the offset of the
doorbell registers from MMIO base, so that the controller can calculate the
address of the specific doorbell register by adding the register offset
with doorbell offset and MMIO base address.
Signed-off-by: Qiang Yu <quic_qianyu@quicinc.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Link: https://lore.kernel.org/r/1713928915-18229-3-git-send-email-quic_qianyu@quicinc.com
[mani: reworded commit message and Kdoc]
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
User space tools can't map strings if we use directly, as the string
address is internal to kernel.
So add trace point strings for the user space tools to map strings
properly.
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Krishna chaitanya chundru <quic_krichai@quicinc.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20240218-ftrace_string-v1-1-27da85c1f844@quicinc.com
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
This change adds ftrace support for following functions which
helps in debugging the issues when there is Channel state & MHI
state change and also when we receive data and control events:
1. mhi_intvec_mhi_states
2. mhi_process_data_event_ring
3. mhi_process_ctrl_ev_ring
4. mhi_gen_tre
5. mhi_update_channel_state
6. mhi_tryset_pm_state
7. mhi_pm_st_worker
Change the implementation of the arrays which has enum to strings mapping
to make it consistent in both trace header file and other files.
Where ever the trace events are added, debug messages are removed.
Signed-off-by: Krishna chaitanya chundru <quic_krichai@quicinc.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20240206-ftrace_support-v11-1-3f71dc187544@quicinc.com
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Ensure read and write locks for the channel are not taken in succession by
dropping the read lock from parse_xfer_event() such that a callback given
to client can potentially queue buffers and acquire the write lock in that
process. Any queueing of buffers should be done without channel read lock
acquired as it can result in multiple locks and a soft lockup.
Cc: <stable@vger.kernel.org> # 5.7
Fixes: 1d3173a3ba ("bus: mhi: core: Add support for processing events from client device")
Signed-off-by: Qiang Yu <quic_qianyu@quicinc.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Tested-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/1702276972-41296-3-git-send-email-quic_qianyu@quicinc.com
[mani: added fixes tag and cc'ed stable]
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Protect WP accesses such that multiple threads queueing buffers for
incoming data do not race.
Meanwhile, if CONFIG_TRACE_IRQFLAGS is enabled, irq will be enabled once
__local_bh_enable_ip is called as part of write_unlock_bh. Hence, let's
take irqsave lock after TRE is generated to avoid running write_unlock_bh
when irqsave lock is held.
Cc: stable@vger.kernel.org
Fixes: 189ff97cca ("bus: mhi: core: Add support for data transfer")
Signed-off-by: Bhaumik Bhatt <bbhatt@codeaurora.org>
Signed-off-by: Qiang Yu <quic_qianyu@quicinc.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Tested-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/1702276972-41296-2-git-send-email-quic_qianyu@quicinc.com
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Though we do check the event ring read pointer by "is_valid_ring_ptr"
to make sure it is in the buffer range, but there is another risk the
pointer may be not aligned. Since we are expecting event ring elements
are 128 bits(struct mhi_ring_element) aligned, an unaligned read pointer
could lead to multiple issues like DoS or ring buffer memory corruption.
So add a alignment check for event ring read pointer.
Fixes: ec32332df7 ("bus: mhi: core: Sanity check values from remote device before use")
cc: stable@vger.kernel.org
Signed-off-by: Krishna chaitanya chundru <quic_krichai@quicinc.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20231031-alignment_check-v2-1-1441db7c5efd@quicinc.com
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Some devices(eg. SDX75) take longer than expected (default, 8 seconds) to
set ready after reboot. Hence add optional ready timeout parameter and pass
the appropriate timeout value to mhi_poll_reg_field() to wait enough for
device ready as part of power up sequence.
Signed-off-by: Qiang Yu <quic_qianyu@quicinc.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/1699344890-87076-2-git-send-email-quic_qianyu@quicinc.com
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Clang warns about a parameter that is decremented but never evaluated here:
bus/mhi/host/main.c:803:13: error: parameter 'event_quota' set but not used [-Werror,-Wunused-but-set-parameter]
u32 event_quota)
Remove the access to the variable to avoid that warning.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Link: https://lore.kernel.org/r/20230811134547.3231160-1-arnd@kernel.org
[mani: minor spelling fix to commit message]
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
If we detect a system error via intvec, we only process the syserr if the
current ee is different than the last observed ee. The reason for this
check is to prevent bhie from running multiple times, but with the single
queue handling syserr, that is not possible.
The check can cause an issue with device recovery. If PBL loads a bad SBL
via BHI, but that SBL hangs before notifying the host of an ee change,
then issuing soc_reset to crash the device and retry (after supplying a
fixed SBL) will not recover the device as the host will observe a PBL->PBL
transition and not process the syserr. The device will be stuck until
either the driver is reloaded, or the host is rebooted. Instead, remove
the check so that we can attempt to recover the device.
Fixes: ef2126c4e2 ("bus: mhi: core: Process execution environment changes serially")
Cc: stable@vger.kernel.org
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://lore.kernel.org/r/1681142292-27571-2-git-send-email-quic_jhugo@quicinc.com
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Currently, mhi_process_data_event_ring()/mhi_process_ctrl_ev_ring() APIs
are ringing DB even if there are no ring elements to process. This could
cause the device to process the DB event in the absence of ring elements.
So to avoid this unnecessary device processing, let's ring event DB only
if there are any ring elements to process.
Signed-off-by: Vivek Pernamitta <quic_vpernami@quicinc.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/1680601458-9105-1-git-send-email-quic_vpernami@quicinc.com
[mani: massaged the commit message a bit]
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
mhi_poll() API is not used within the MHI stack and also not by any client
drivers in mainline. So let's remove it until any consumer is available.
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
The irq handler for a shared IRQ ought to be prepared for running
even now it's being freed. So let's check the pointer used by
mhi_irq_handler to avoid null pointer access since it is probably
released before freeing IRQ.
Fixes: 1227d2a20c ("bus: mhi: host: Move IRQ allocation to controller registration phase")
Signed-off-by: Qiang Yu <quic_qianyu@quicinc.com>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Tested-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/1658459838-30802-1-git-send-email-quic_qianyu@quicinc.com
[mani: added fixes tag]
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Helper API to write register fields relies on successful reads
of the register/address prior to the write. Bail out if a failure
is seen when reading the register before the actual write is
performed.
Signed-off-by: Bhaumik Bhatt <bbhatt@codeaurora.org>
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Hemant Kumar <hemantk@codeaurora.org>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/1650304226-11080-2-git-send-email-quic_jhugo@quicinc.com
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
mhi_recycle_ev_ring() computes the shared write pointer for the ring
(ctxt_wp) using a read/modify/write pattern where the ctxt_wp value in the
shared memory is read, incremented, and written back. There are no checks
on the read value, it is assumed that it is kept in sync with the locally
cached value. Per the MHI spec, this is correct. The device should only
read ctxt_wp, never write it.
However, there are devices in the wild that violate the spec, and can
update the ctxt_wp in a specific scenario. This can cause corruption, and
violate the above assumption that the ctxt_wp is in sync with the cached
value.
This can occur when the device has loaded firmware from the host, and is
transitioning from the SBL EE to the AMSS EE. As part of shutting down
SBL, the SBL flushes it's local MHI context to the shared memory since
the local context will not persist across an EE change. In the case of
the event ring, SBL will flush its entire context, not just the parts that
it is allowed to update. This means SBL will write to ctxt_wp, and
possibly corrupt it.
An example:
Host Device
---- ---
Update ctxt_wp to 0x1f0
SBL observes 0x1f0
Update ctxt_wp to 0x0
Starts transition to AMSS EE
Context flush, writes 0x1f0 to ctxt_wp
Update ctxt_wp to 0x200
Update ctxt_wp to 0x210
AMSS observes 0x210
0x210 exceeds ring size
AMSS signals syserr
The reason the ctxt_wp goes off the end of the ring is that the rollover
check is only performed on the cached wp, which is out of sync with
ctxt_wp.
Since the host is the authority of the value of ctxt_wp per the MHI spec,
we can fix this issue by not reading ctxt_wp from the shared memory, and
instead compute it based on the cached value. If SBL corrupts ctxt_wp,
the host won't observe it, and will correct the value at some point later.
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Hemant Kumar <quic_hemantk@quicinc.com>
Reviewed-by: Bhaumik Bhatt <quic_bbhatt@quicinc.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/1649868113-18826-1-git-send-email-quic_jhugo@quicinc.com
[mani: used the quicinc domain for Hemant and Bhaumik]
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
mhi_state_str[] array could be used by MHI endpoint stack also. So let's
make the array as "static inline function" and move it inside the
"common.h" header so that the endpoint stack could also make use of it.
Reviewed-by: Hemant Kumar <hemantk@codeaurora.org>
Reviewed-by: Alex Elder <elder@linaro.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20220301160308.107452-11-manivannan.sadhasivam@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Structure "struct mhi_tre" is representing a generic MHI ring element and
not specifically a Transfer Ring Element (TRE). Fix the naming.
Reviewed-by: Alex Elder <elder@linaro.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20220301160308.107452-9-manivannan.sadhasivam@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Functions like mhi_read_reg_field(), mhi_poll_reg_field() and
mhi_write_reg_field() could be modified to not depend on the shift value
passed as an argument. Instead, the bitfield operation could be used to
extract the shift value from the mask itself.
This eliminates the need to define _SHIFT (and _SHFT) macros and
simplifies the code a bit. For shift values those cannot be determined
during build time, "__ffs()" helper is used find the shift value during
runtime.
While at it, let's also get rid of 32-bit masks like CHDBOFF_CHDBOFF_MASK
by doing the full 32-bit register read.
Suggested-by: Alex Elder <elder@linaro.org>
Reviewed-by: Alex Elder <elder@linaro.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20220301160308.107452-6-manivannan.sadhasivam@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In preparation of the endpoint MHI support, let's move the host MHI code
to its own "host" directory and adjust the toplevel MHI Kconfig & Makefile.
While at it, let's also move the "pci_generic" driver to "host" directory
as it is a host MHI controller driver.
Reviewed-by: Hemant Kumar <hemantk@codeaurora.org>
Reviewed-by: Alex Elder <elder@linaro.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20220301160308.107452-5-manivannan.sadhasivam@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>