linux-loongson

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson synced 2025-09-05 11:53:41 +00:00

Author	SHA1	Message	Date
Jeffrey Hugo	4d92e7c5cc	bus: mhi: host: Fix conflict between power_up and SYSERR When mhi_async_power_up() enables IRQs, it is possible that we could receive a SYSERR notification from the device if the firmware has crashed for some reason. Then the SYSERR notification queues a work item that cannot execute until the pm_mutex is released by mhi_async_power_up(). So the SYSERR work item will be pending. If mhi_async_power_up() detects the SYSERR, it will handle it. If the device is in PBL, then the PBL state transition event will be queued, resulting in a work item after the pending SYSERR work item. Once mhi_async_power_up() releases the pm_mutex, the SYSERR work item can run. It will blindly attempt to reset the MHI state machine, which is the recovery action for SYSERR. PBL/SBL are not interrupt driven and will ignore the MHI Reset unless SYSERR is actively advertised. This will cause the SYSERR work item to timeout waiting for reset to be cleared, and will leave the host state in SYSERR processing. The PBL transition work item will then run, and immediately fail because SYSERR processing is not a valid state for PBL transition. This leaves the device uninitialized. This issue has a fairly unique signature in the kernel log: mhi mhi3: Requested to power ON Qualcomm Cloud AI 100 0000:36:00.0: Fatal error received from device. Attempting to recover mhi mhi3: Power on setup success mhi mhi3: Device failed to exit MHI Reset state mhi mhi3: Device MHI is not in valid state We cannot remove the SYSERR handling from mhi_async_power_up() because the device may be in the SYSERR state, but we missed the notification as the irq was fired before irqs were enabled. We also can't queue the SYSERR work item from mhi_async_power_up() if SYSERR is detected because that may result in a duplicate work item, and cause the same issue since the duplicate item will blindly issue MHI reset even if SYSERR is no longer active. Instead, add a check in the SYSERR work item to make sure that MHI reset is only issued if the device is in SYSERR state for PBL or SBL EEs. Fixes: `a6e2e3522f` ("bus: mhi: core: Add support for PM state transitions") Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Signed-off-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Reviewed-by: Troy Hanson <quic_thanson@quicinc.com> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> cc: stable@vger.kernel.org Link: https://patch.msgid.link/20250328163526.3365497-1-jeff.hugo@oss.qualcomm.com	2025-05-14 11:30:46 +01:00
Dr. David Alan Gilbert	c8006fbd0f	bus: mhi: host: Remove unused functions mhi_device_get() and mhi_queue_dma() haven't been used since 2020's commit `189ff97cca` ("bus: mhi: core: Add support for data transfer") added them. Remove them. Note that mhi_queue_dma_sync() is used and has been left. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Link: https://lore.kernel.org/r/20250127215859.261105-1-linux@treblig.org Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>	2025-02-07 23:15:01 +05:30
Baochen Qiang	813e0ae613	bus: mhi: host: Add mhi_power_down_keep_dev() API to support system suspend/hibernation Currently, ath11k fails to resume from system suspend/hibernation on some the x86 host machines with below error message: ``` ath11k_pci 0000:06:00.0: timeout while waiting for restart complete ``` This happens because, ath11k powers down the MHI stack during suspend and that leads to destruction of the struct device associated with the MHI channels. And during resume, ath11k calls calling mhi_sync_power_up() to power up the MHI subsystem and that eventually calls the driver framework's device_add() API from mhi_create_devices(). But the PM framework blocks the struct device creation during device_add() and this leads to probe deferral as below: ``` mhi mhi0_IPCR: Driver qcom_mhi_qrtr force probe deferral ``` The reason for deferring device creation during resume is explained in dpm_prepare(): /* * It is unsafe if probing of devices will happen during suspend or * hibernation and system behavior will be unpredictable in this * case. So, let's prohibit device's probing here and defer their * probes instead. The normal behavior will be restored in * dpm_complete(). */ Due to the device probe deferral, qcom_mhi_qrtr_probe() API is not getting called during resume and thus MHI channels are not prepared. So this blocks the QMI messages from being transferred between ath11k and firmware, resulting in a firmware initialization failure. After consulting with Rafael, it was decided to not destroy the struct device for the MHI channels during system suspend/hibernation because the device is bound to appear again during resume. So to achieve this, a new API called mhi_power_down_keep_dev() is introduced for MHI controllers to keep the struct device when required. This API is similar to the existing mhi_power_down() API, except that it keeps the struct device associated with MHI channels instead of destroying them. Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.30 Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Reviewed-by: Jeff Johnson <quic_jjohnson@quicinc.com> Link: https://lore.kernel.org/r/20240305021320.3367-2-quic_bqiang@quicinc.com [mani: reworded the commit message and subject] Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>	2024-04-01 16:09:09 +05:30
Krishna chaitanya chundru	ceeb64f41f	bus: mhi: host: Add tracing support This change adds ftrace support for following functions which helps in debugging the issues when there is Channel state & MHI state change and also when we receive data and control events: 1. mhi_intvec_mhi_states 2. mhi_process_data_event_ring 3. mhi_process_ctrl_ev_ring 4. mhi_gen_tre 5. mhi_update_channel_state 6. mhi_tryset_pm_state 7. mhi_pm_st_worker Change the implementation of the arrays which has enum to strings mapping to make it consistent in both trace header file and other files. Where ever the trace events are added, debug messages are removed. Signed-off-by: Krishna chaitanya chundru <quic_krichai@quicinc.com> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20240206-ftrace_support-v11-1-3f71dc187544@quicinc.com Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>	2024-02-06 11:54:44 +05:30
Jeffrey Hugo	bce3f77068	bus: mhi: host: Add MHI_PM_SYS_ERR_FAIL state When processing a SYSERR, if the device does not respond to the MHI_RESET from the host, the host will be stuck in a difficult to recover state. The host will remain in MHI_PM_SYS_ERR_PROCESS and not clean up the host channels. Clients will not be notified of the SYSERR via the destruction of their channel devices, which means clients may think that the device is still up. Subsequent SYSERR events such as a device fatal error will not be processed as the state machine cannot transition from PROCESS back to DETECT. The only way to recover from this is to unload the mhi module (wipe the state machine state) or for the mhi controller to initiate SHUTDOWN. This issue was discovered by stress testing soc_reset events on AIC100 via the sysfs node. soc_reset is processed entirely in hardware. When the register write hits the endpoint hardware, it causes the soc to reset without firmware involvement. In stress testing, there is a rare race where soc_reset N will cause the soc to reset and PBL to signal SYSERR (fatal error). If soc_reset N+1 is triggered before PBL can process the MHI_RESET from the host, then the soc will reset again, and re-run PBL from the beginning. This will cause PBL to lose all state. PBL will be waiting for the host to respond to the new syserr, but host will be stuck expecting the previous MHI_RESET to be processed. Additionally, the AMSS EE firmware (QSM) was hacked to synthetically reproduce the issue by simulating a FW hang after the QSM issued a SYSERR. In this case, soc_reset would not recover the device. For this failure case, to recover the device, we need a state similar to PROCESS, but can transition to DETECT. There is not a viable existing state to use. POR has the needed transitions, but assumes the device is in a good state and could allow the host to attempt to use the device. Allowing PROCESS to transition to DETECT invites the possibility of parallel SYSERR processing which could get the host and device out of sync. Thus, invent a new state - MHI_PM_SYS_ERR_FAIL This essentially a holding state. It allows us to clean up the host elements that are based on the old state of the device (channels), but does not allow us to directly advance back to an operational state. It does allow the detection and processing of another SYSERR which may recover the device, or allows the controller to do a clean shutdown. Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Link: https://lore.kernel.org/r/20240112180800.536733-1-quic_jhugo@quicinc.com Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>	2024-01-30 23:52:40 +05:30
Qiang Yu	6ab3d50b10	bus: mhi: host: Add a separate timeout parameter for waiting ready Some devices(eg. SDX75) take longer than expected (default, 8 seconds) to set ready after reboot. Hence add optional ready timeout parameter and pass the appropriate timeout value to mhi_poll_reg_field() to wait enough for device ready as part of power up sequence. Signed-off-by: Qiang Yu <quic_qianyu@quicinc.com> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Link: https://lore.kernel.org/r/1699344890-87076-2-git-send-email-quic_qianyu@quicinc.com Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>	2023-12-14 10:57:34 +05:30
Qiang Yu	cabce92dd8	bus: mhi: host: Skip MHI reset if device is in RDDM In RDDM EE, device can not process MHI reset issued by host. In case of MHI power off, host is issuing MHI reset and polls for it to get cleared until it times out. Since this timeout can not be avoided in case of RDDM, skip the MHI reset in this scenarios. Cc: <stable@vger.kernel.org> Fixes: `a6e2e3522f` ("bus: mhi: core: Add support for PM state transitions") Signed-off-by: Qiang Yu <quic_qianyu@quicinc.com> Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Reviewed-by: Manivannan Sadhasivam <mani@kernel.org> Link: https://lore.kernel.org/r/1684390959-17836-1-git-send-email-quic_qianyu@quicinc.com Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>	2023-07-12 17:49:38 +05:30
Qiang Yu	869a99907f	bus: mhi: host: Fix race between channel preparation and M0 event There is a race condition where mhi_prepare_channel() updates the read and write pointers as the base address and in parallel, if an M0 transition occurs, the tasklet goes ahead and rings doorbells for all channels with a delta in TRE rings assuming they are already enabled. This causes a null pointer access. Fix it by adding a channel enabled check before ringing channel doorbells. Cc: stable@vger.kernel.org # 5.19 Fixes: `a6e2e3522f` "bus: mhi: core: Add support for PM state transitions" Signed-off-by: Qiang Yu <quic_qianyu@quicinc.com> Reviewed-by: Manivannan Sadhasivam <mani@kernel.org> Link: https://lore.kernel.org/r/1665889532-13634-1-git-send-email-quic_qianyu@quicinc.com [mani: CCed stable list] Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>	2022-10-28 22:59:10 +05:30
Qiang Yu	1227d2a20c	bus: mhi: host: Move IRQ allocation to controller registration phase During runtime, the MHI endpoint may be powered up/down several times. So instead of allocating and destroying the IRQs all the time, let's just enable/disable IRQs during power up/down. The IRQs will be allocated during mhi_register_controller() and freed during mhi_unregister_controller(). This works well for things like PCI hotplug also as once the PCI device gets removed, the controller will get unregistered. And once it comes back, it will get registered back and even if the IRQ configuration changes (MSI), that will get accounted. Signed-off-by: Qiang Yu <quic_qianyu@quicinc.com> Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Reviewed-by: Manivannan Sadhasivam <mani@kernel.org> Link: https://lore.kernel.org/r/1655952183-66792-1-git-send-email-quic_qianyu@quicinc.com Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>	2022-06-24 12:54:19 +05:30
Bhaumik Bhatt	0bca889fd6	bus: mhi: host: Bail on writing register fields if read fails Helper API to write register fields relies on successful reads of the register/address prior to the write. Bail out if a failure is seen when reading the register before the actual write is performed. Signed-off-by: Bhaumik Bhatt <bbhatt@codeaurora.org> Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Reviewed-by: Hemant Kumar <hemantk@codeaurora.org> Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Link: https://lore.kernel.org/r/1650304226-11080-2-git-send-email-quic_jhugo@quicinc.com Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>	2022-04-23 18:57:32 +05:30
Jeffrey Hugo	36e5505dfb	bus: mhi: host: Wait for ready state after reset After the device has signaled the end of reset by clearing the reset bit, it will automatically reinit MHI and the internal device structures. Once That is done, the device will signal it has entered the ready state. Signaling the ready state involves sending an interrupt (MSI) to the host which might cause IOMMU faults if it occurs at the wrong time. If the controller is being powered down, and possibly removed, then the reset flow would only wait for the end of reset. At which point, the host and device would start a race. The host may complete its reset work, and remove the interrupt handler, which would cause the interrupt to be disabled in the IOMMU. If that occurs before the device signals the ready state, then the IOMMU will fault since it blocked an interrupt. While harmless, the fault would appear like a serious issue has occurred so let's silence it by making sure the device hits the ready state before the host completes its reset processing. Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Reviewed-by: Hemant Kumar <quic_hemantk@quicinc.com> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Link: https://lore.kernel.org/r/1650302562-30964-1-git-send-email-quic_jhugo@quicinc.com Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>	2022-04-23 18:57:32 +05:30
Manivannan Sadhasivam	3a1b8e281a	bus: mhi: Make mhi_state_str[] array static inline and move to common.h mhi_state_str[] array could be used by MHI endpoint stack also. So let's make the array as "static inline function" and move it inside the "common.h" header so that the endpoint stack could also make use of it. Reviewed-by: Hemant Kumar <hemantk@codeaurora.org> Reviewed-by: Alex Elder <elder@linaro.org> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Link: https://lore.kernel.org/r/20220301160308.107452-11-manivannan.sadhasivam@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-03-18 14:02:55 +01:00
Manivannan Sadhasivam	d28cab4d4a	bus: mhi: Use bitfield operations for register read and write Functions like mhi_read_reg_field(), mhi_poll_reg_field() and mhi_write_reg_field() could be modified to not depend on the shift value passed as an argument. Instead, the bitfield operation could be used to extract the shift value from the mask itself. This eliminates the need to define _SHIFT (and _SHFT) macros and simplifies the code a bit. For shift values those cannot be determined during build time, "__ffs()" helper is used find the shift value during runtime. While at it, let's also get rid of 32-bit masks like CHDBOFF_CHDBOFF_MASK by doing the full 32-bit register read. Suggested-by: Alex Elder <elder@linaro.org> Reviewed-by: Alex Elder <elder@linaro.org> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Link: https://lore.kernel.org/r/20220301160308.107452-6-manivannan.sadhasivam@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-03-18 14:02:54 +01:00
Manivannan Sadhasivam	a0f5a63066	bus: mhi: Move host MHI code to "host" directory In preparation of the endpoint MHI support, let's move the host MHI code to its own "host" directory and adjust the toplevel MHI Kconfig & Makefile. While at it, let's also move the "pci_generic" driver to "host" directory as it is a host MHI controller driver. Reviewed-by: Hemant Kumar <hemantk@codeaurora.org> Reviewed-by: Alex Elder <elder@linaro.org> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Link: https://lore.kernel.org/r/20220301160308.107452-5-manivannan.sadhasivam@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-03-18 14:02:54 +01:00

14 Commits