Some MSI controllers can change address/data pair during the execution of
irq_chip::irq_set_affinity() callback. Since the current PCI Endpoint
framework cannot support mutable MSI controllers, call
irq_domain_is_msi_immutable() API to check if the controller is immutable
or not.
Also ensure that the MSI domain is a parent MSI domain so that it can
allocate address/data pairs.
Signed-off-by: Frank Li <Frank.Li@nxp.com>
[mani: reworded error message and commit message]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Niklas Cassel <cassel@kernel.org>
Link: https://patch.msgid.link/20250710-ep-msi-v21-4-57683fc7fb25@nxp.com
Implement the doorbell feature by mapping the EP's MSI interrupt controller
message address to a dedicated BAR.
The EPF driver should pass the actual message data to be written to the
message address by the host through implementation-specific logic.
Signed-off-by: Frank Li <Frank.Li@nxp.com>
[mani: minor code cleanups and reworded commit message]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
[bhelgaas: fix kernel-doc]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Niklas Cassel <cassel@kernel.org>
Link: https://patch.msgid.link/20250710-ep-msi-v21-3-57683fc7fb25@nxp.com
Add driver support for DesignWare based PCIe controller in SG2044 SoC. The
driver currently supports the Root Complex mode.
Signed-off-by: Inochi Amaoto <inochiama@gmail.com>
[mani: renamed the driver to 'pcie-sophgo.c' and Kconfig fix]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
[bhelgaas: whitespace]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250504004420.202685-3-inochiama@gmail.com
Convert lock/unlock pairs to lock guard and tidy up the code.
Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
[bhelgaas: rebase on dev_fwnode() conversion]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/836cca37449c70922a2bea1fb13f37940a7a7132.1750858083.git.namcao@linutronix.de
Switch to msi_create_parent_irq_domain() from pci_msi_create_irq_domain()
which was using legacy MSI domain setup.
Signed-off-by: Nam Cao <namcao@linutronix.de>
[mani: reworded commit message]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
[bhelgaas: rebase on dev_fwnode() conversion, drop fwnode local var]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/1279fe6500a1d8135d8f5feb2f055df008746c88.1750858083.git.namcao@linutronix.de
Switch to msi_create_parent_irq_domain() from pci_msi_create_irq_domain()
which was using legacy MSI domain setup.
Signed-off-by: Nam Cao <namcao@linutronix.de>
[mani: reworded commit message]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
[bhelgaas: rebase on dev_fwnode() conversion, drop fwnode local var]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/b1353c797ce53714c22823de3bd2ae3d09fcd84f.1750858083.git.namcao@linutronix.de
Switch to msi_create_parent_irq_domain() from pci_msi_create_irq_domain()
which was using legacy MSI domain setup.
Signed-off-by: Nam Cao <namcao@linutronix.de>
[mani: reworded commit message]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
[bhelgaas: rebase on dev_fwnode() conversion, drop fwnode local var]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/5ac6e216bf2eaa438c8854baf2ff3e5cf0b2284f.1750858083.git.namcao@linutronix.de
Switch to msi_create_parent_irq_domain() from pci_msi_create_irq_domain()
which was using legacy MSI domain setup.
Signed-off-by: Nam Cao <namcao@linutronix.de>
[mani: reworded commit message]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
[bhelgaas: rebase on dev_fwnode() conversion]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/b4620dc1808f217a69d0ae50700ffa12ffd657eb.1750858083.git.namcao@linutronix.de
Switch to msi_create_parent_irq_domain() from pci_msi_create_irq_domain()
which was using legacy MSI domain setup.
Signed-off-by: Nam Cao <namcao@linutronix.de>
[mani: reworded commit message]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
[bhelgaas: rebase on dev_fwnode() conversion, drop fwnode local var]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/ab4005db0a829549be1f348f6c27be50a2118b5e.1750858083.git.namcao@linutronix.de
Switch to msi_create_parent_irq_domain() from pci_msi_create_irq_domain()
which was using legacy MSI domain setup.
Signed-off-by: Nam Cao <namcao@linutronix.de>
[mani: reworded commit message]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
[bhelgaas: rebase on dev_fwnode() conversion, drop fwnode local var]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/76f6e6ce6021607cd0fdfd79fef7d2eb69d9f361.1750858083.git.namcao@linutronix.de
Switch to msi_create_parent_irq_domain() from pci_msi_create_irq_domain()
which was using legacy MSI domain setup.
Signed-off-by: Nam Cao <namcao@linutronix.de>
[mani: reworded commit message & fixed merge conflict]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
[bhelgaas: rebase on dev_fwnode() conversion]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/bfbd2e375269071b69e1aa85e629ee4b7c99518f.1750858083.git.namcao@linutronix.de
Switch to msi_create_parent_irq_domain() from pci_msi_create_irq_domain()
which was using legacy MSI domain setup.
Signed-off-by: Nam Cao <namcao@linutronix.de>
[mani: reworded commit message & squashed the kdoc cleanup patch]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
[bhelgaas: rebase on dev_fwnode() conversion]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/53946d74caf1fd134a1820eac82c3cf64d48779f.1750858083.git.namcao@linutronix.de
Switch to msi_create_parent_irq_domain() from pci_msi_create_irq_domain()
which was using legacy MSI domain setup.
Signed-off-by: Nam Cao <namcao@linutronix.de>
[mani: reworded commit message]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
[bhelgaas: rebase on dev_fwnode() conversion, drop fwnode local var]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/fa72703e06c2ee2c7554082c7152913eb0dd294f.1750858083.git.namcao@linutronix.de
Switch to msi_create_parent_irq_domain() from pci_msi_create_irq_domain()
which was using legacy MSI domain setup.
Signed-off-by: Nam Cao <namcao@linutronix.de>
[mani: reworded commit message]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
[bhelgaas: rebase on dev_fwnode() conversion]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/0a88da04bb82bd588828a7889e9d58c515ea5dbb.1750858083.git.namcao@linutronix.de
Switch to msi_create_parent_irq_domain() from pci_msi_create_irq_domain()
which was using legacy MSI domain setup.
Signed-off-by: Nam Cao <namcao@linutronix.de>
[mani: reworded commit message]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
[bhelgaas: rebase on dev_fwnode() conversion]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/68b2f9387bbe4f08bcd428bfab83ad1219fb8d80.1750858083.git.namcao@linutronix.de
Switch to msi_create_parent_irq_domain() from pci_msi_create_irq_domain()
which was using legacy MSI domain setup.
Signed-off-by: Nam Cao <namcao@linutronix.de>
[mani: reworded commit message]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
[bhelgaas: rebase on dev_fwnode() conversion, drop fwnode local var]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/af46c15c47a7716f7e0c50d0f7391509c95b49c2.1750858083.git.namcao@linutronix.de
Switch to msi_create_parent_irq_domain() from pci_msi_create_irq_domain()
which was using legacy MSI domain setup.
Signed-off-by: Nam Cao <namcao@linutronix.de>
[mani: reworded commit message]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
[bhelgaas: rebase on dev_fwnode() conversion]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/04d4a96046490e50139826c16423954e033cdf89.1750858083.git.namcao@linutronix.de
All irq_domain functions now accept fwnode instead of of_node. But many
PCI controllers still extract dev to of_node and then of_node to fwnode.
Instead, clean this up and simply use the dev_fwnode() helper to extract
fwnode directly from dev. Internally, it still does dev => of_node =>
fwnode steps, but it's now hidden from the users.
In the case of altera, this also removes an unused 'node' variable that is
only used when CONFIG_OF is enabled:
drivers/pci/controller/pcie-altera.c: In function 'altera_pcie_init_irq_domain':
drivers/pci/controller/pcie-altera.c:855:29: error: unused variable 'node' [-Werror=unused-variable]
855 | struct device_node *node = dev->of_node;
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de> # altera
[bhelgaas: squash together, rebase to precede msi-parent]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250521163329.2137973-1-arnd@kernel.org
Link: https://patch.msgid.link/20250611104348.192092-16-jirislaby@kernel.org
Link: https://patch.msgid.link/20250723065907.1841758-1-jirislaby@kernel.org
According to Documentation/PCI/endpoint/pci-endpoint-cfs.rst, the Endpoint
controller (EPC) should only start the link when userspace writes '1' to
the '/sys/kernel/config/pci_ep/controllers/<EPC>/start' attribute, which
ultimately results in calling imx_pcie_start_link() via
pci_epc_start_store().
To align with the documented behavior, do not start the link automatically
when adding the EP controller.
Fixes: 75c2f26da0 ("PCI: imx6: Add i.MX PCIe EP mode support")
Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
[mani: reworded commit subject and description]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20250709033722.2924372-3-hongxing.zhu@nxp.com
apps_reset corresponds to LTSSM_EN in i.MX7, i.MX8MQ, i.MX8MM and i.MX8MP
platforms. Since assertion/de-assertion of apps_reset is done in
imx_pcie_ltssm_enable() and imx_pcie_ltssm_disable(), remove it from
imx_pcie_assert_core_reset() and imx_pcie_deassert_core_reset().
This also fixes a failure in enumerating the PI7C9X2G608GP (hotplug) chip
reliably on i.MX8MM, as reported by Tim.
It should be noted that only i.MX7D, i.MX8MQ, i.MX8MM, and i.MX8MP
platforms have the apps_reset logic, so this change doesn't have any effect
on other platforms.
Fixes: ef61c7d8d0 ("PCI: imx6: Deassert apps_reset in imx_pcie_deassert_core_reset()")
Reported-by: Tim Harvey <tharvey@gateworks.com>
Closes: https://lore.kernel.org/all/CAJ+vNU3ohR2YKTwC4xoYrc1z-neDoH2TTZcMHDy+poj9=jSy+w@mail.gmail.com/
Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
[mani: reworded commit subject and description]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Tim Harvey <tharvey@gateworks.com> # imx8mp-venice-gw74xx (i.MX8MP + hotplug capable switch)
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20250709033722.2924372-2-hongxing.zhu@nxp.com
Replace devm_add_action() with devm_add_action_or_reset() to avoid
explicitly dropping the 'port->clk' reference in error path.
Signed-off-by: Salah Triki <salah.triki@gmail.com>
[mani: reworded commit subject and description]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://patch.msgid.link/aHsgYALHfQbrgq0t@pc
Query support for Immediate Readiness irrespective of whether or not the
device supports PM capabilities, as nothing in the PCIe spec suggests that
Immediate Readiness is in any way dependent on PM functionality.
Fixes: d6112f8def ("PCI: Add support for Immediate Readiness")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: David Matlack <dmatlack@google.com>
Cc: Vipin Sharma <vipinsh@google.com>
Cc: Aaron Lewis <aaronlewis@google.com>
Link: https://patch.msgid.link/20250722155926.352248-1-seanjc@google.com
Another utterly pointless aspect of the xgene-msi driver is that
it is built around CPU hotplug. Which is quite amusing since this
is one of the few arm64 platforms that, by construction, cannot
do CPU hotplug in a supported way (no EL3, no PSCI, no luck).
Drop the CPU hotplug nonsense and just setup the IRQs and handlers
in a less overdesigned way, grouping things more logically in the
process.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20250708173404.1278635-13-maz@kernel.org
Now that we have made the dependency between the PCI driver and
the MSI driver explicit, there is no need to use subsys_initcall()
as a probing hook, and we can rely on builtin_platform_driver()
instead.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20250708173404.1278635-12-maz@kernel.org
Since changing the affinity of an MSI really is about changing
the target address and that it isn't possible to mask an individual
MSI, it is completely possible for an interrupt to race with itself,
usually resulting in a lost interrupt.
Paper over the design blunder by informing the core code of this
sad state of affairs.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20250708173404.1278635-11-maz@kernel.org
Plugging a device that doesn't use managed affinity on an XGene-1
machine results in messages such as:
genirq: irq_chip PCI-MSIX-0000:01:00.0 did not update eff. affinity mask of irq 39
As it turns out, the driver was never updated to populate the effective
affinity on irq_set_affinity() call, and the core code is prickly about
that.
But upon further investigation, it appears that the driver keeps repainting
the hwirq field of the irq_data structure as a way to track the affinity
of the MSI, something that is very much frowned upon as it breaks the
fundamentals of an IRQ domain (an array indexed by hwirq).
Fixing this results more or less in a rewrite of the driver:
- Define how a hwirq and a CPU affinity map onto the MSI termination
registers
- Allocate a single entry in the bitmap per MSI instead of *8*
- Correctly track CPU affinity
- Fix the documentation so that it actually means something (to me)
- Use standard bitmap iterators
- and plenty of other cleanups
With this, the driver behaves correctly on my vintage Mustang board.
Signed-off-by: Marc Zyngier <maz@kernel.org>
[lpieralisi: replaced open coded GENMASK(6, 4) with MSInRx_HWIRQ_MASK]
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20250708173404.1278635-10-maz@kernel.org
The xgene-msi driver uses an odd construct in the form of an
intermediate tracking structure, evidently designed to deal with
multiple instances of the MSI widget. However, the existing HW
only has one set, and it is obvious that there won't be new HW
coming down that particular line.
Simplify the driver by using a bit of pointer arithmetic instead,
directly tracking the interrupt and avoiding extra memory allocation.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20250708173404.1278635-9-maz@kernel.org
Since the MSI driver is probed as a platform device, there is no
reason to not use device-managed allocations. That's including
the top-level bookkeeping structure, which is better dynamically
allocated than being static.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20250708173404.1278635-8-maz@kernel.org
The xgene_msi structure remembers both the of_node of the device
and the number of CPUs. All of which are perfectly useless.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20250708173404.1278635-7-maz@kernel.org
The way the per-CPU interrupts are dealt with in the XGene MSI
driver isn't great:
- the affinity is set after the interrupt is enabled
- nothing prevents userspace from moving the interrupt around
- the affinity setting code pointlessly allocates memory
- the driver checks for conditions that cannot possibly happen
Address all of this in one go, resulting in slightly simpler setup
code.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20250708173404.1278635-6-maz@kernel.org
XGENE_PCIE_IP_VER_UNKN is only refered to when probing for the
original XGene PCIe implementation, and get immediately overridden
if the device has the "apm,xgene-pcie" compatible string.
Given that the only way to get there is by finding this very string in
the DT, it is obvious that we will always overwrite the version with
XGENE_PCIE_IP_VER_1.
Drop the whole thing.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20250708173404.1278635-5-maz@kernel.org
pci-xgene.c only gets compiled if CONFIG_PCI_XGENE is selected.
It is therefore pointless to check for CONFIG_PCI_XGENE inside
the driver.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20250708173404.1278635-4-maz@kernel.org
As a preparatory work to make the XGene MSI driver probe less of
a sorry hack, make the PCI driver check for the availability of
the MSI parent domain, and defer the probing otherwise.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20250708173404.1278635-3-maz@kernel.org
If devicetree describes power supplies related to a PCI device, we
unnecessarily created a pwrctrl device even if CONFIG_PCI_PWRCTL was not
enabled.
We only need pci_pwrctrl_create_device() when CONFIG_PCI_PWRCTRL is
enabled. Compile it out when CONFIG_PCI_PWRCTRL is not enabled.
When pci_pwrctrl_create_device() creates and returns a pwrctrl device,
pci_scan_device() doesn't enumerate the PCI device. It assumes the pwrctrl
core will rescan the bus after turning on the power. However, if
CONFIG_PCI_PWRCTRL is not enabled, the rescan never happens, which breaks
PCI enumeration on any system that describes power supplies in devicetree
but does not use pwrctrl.
Jim reported that some brcmstb platforms break this way. The brcmstb
driver is still broken if CONFIG_PCI_PWRCTRL is enabled, but this commit at
least allows brcmstb to work when it's NOT enabled.
Fixes: 957f40d039 ("PCI/pwrctrl: Move creation of pwrctrl devices to pci_scan_device()")
Reported-by: Jim Quinlan <james.quinlan@broadcom.com>
Link: https://lore.kernel.org/r/CA+-6iNwgaByXEYD3j=-+H_PKAxXRU78svPMRHDKKci8AGXAUPg@mail.gmail.com
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Cc: stable@vger.kernel.org # v6.15
Link: https://patch.msgid.link/20250701064731.52901-1-manivannan.sadhasivam@linaro.org
Add LUT entry for MSI/IOMMU in Endpoint mode by calling
imx_pcie_add_lut_by_rid() helper function. Since only one physical function
is supported in the Endpoint mode for now, '0' is passed as the Device ID.
This sets up a single LUT entry required for MSI/IOMMU.
The Endpoint function can operate without LUT configuration if neither
IOMMU nor MSI is used by the platform. This LUT configuration is used for
the EP doorbell feature by allowing the Root Complex to trigger the
doorbell on the Endpoint with the help of the Endpoint MSI controller.
Signed-off-by: Frank Li <Frank.Li@nxp.com>
[mani: reworded the comments & commit message and dropped tested-by tag]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://patch.msgid.link/20250710-ep-msi-v21-2-57683fc7fb25@nxp.com
Add helper function imx_pcie_add_lut_by_rid(), which will be used by the
upcoming LUT configuration for MSI/IOMMU in the Endpoint mode. No
functional change.
Signed-off-by: Frank Li <Frank.Li@nxp.com>
[mani: reworded commit message and dropped tested-by tag]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://patch.msgid.link/20250710-ep-msi-v21-1-57683fc7fb25@nxp.com
__iomem attribute is supposed to be used only with variables holding the
MMIO pointer. But here, 'mw_addr' variable is just holding a 'void *'
returned by pci_epf_alloc_space(). So annotating it with __iomem is clearly
wrong. Hence, drop the attribute.
This also fixes the below sparse warning:
drivers/pci/endpoint/functions/pci-epf-vntb.c:524:17: warning: incorrect type in assignment (different address spaces)
drivers/pci/endpoint/functions/pci-epf-vntb.c:524:17: expected void [noderef] __iomem *mw_addr
drivers/pci/endpoint/functions/pci-epf-vntb.c:524:17: got void *
drivers/pci/endpoint/functions/pci-epf-vntb.c:530:21: warning: incorrect type in assignment (different address spaces)
drivers/pci/endpoint/functions/pci-epf-vntb.c:530:21: expected unsigned int [usertype] *epf_db
drivers/pci/endpoint/functions/pci-epf-vntb.c:530:21: got void [noderef] __iomem *mw_addr
drivers/pci/endpoint/functions/pci-epf-vntb.c:542:38: warning: incorrect type in argument 2 (different address spaces)
drivers/pci/endpoint/functions/pci-epf-vntb.c:542:38: expected void *addr
drivers/pci/endpoint/functions/pci-epf-vntb.c:542:38: got void [noderef] __iomem *mw_addr
Fixes: e35f56bb03 ("PCI: endpoint: Support NTB transfer between RC and EP")
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20250709125022.22524-1-mani@kernel.org
-----BEGIN PGP SIGNATURE-----
iQFHBAABCgAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmh6uxMTHHdlaS5saXVA
a2VybmVsLm9yZwAKCRB2FHBfkEGgXsDqB/9EZrNSmMhoctjhBwrEjilRPaBHaD0f
7AGbdvs78sZDC2FMQeeqVeUDgv7nhZoT9B/cjSNkAXF8LdtEXOpiYY0sqG+fO8c9
2DHBLBBGbFg+/SI+2SLzgrg8ixplnEF2HPR3pIgpmvqOKJcBd6+LnlyXs3xuBhc/
XCerEwjHDpC27YQf3ZFvgomH1rw4w0yENZaz/ztpca41GScmbBq6K9c0SAzherBJ
IDcbSUHNkT9LoFdXG9gawwX++XrBsfsKwlfAX6lYuzj+rdY5uxJaqMBg5sIA5Huh
aRLB9gXHQVABTqp73kG76ZtBdmMsd2S1JF0YdRxdsL91MQ1C0n7AW9iK
=0Qh6
-----END PGP SIGNATURE-----
Merge tag 'hyperv-fixes-signed-20250718' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv fixes from Wei Liu:
- Select use CONFIG_SYSFB only if EFI is enabled (Michael Kelley)
- An assorted set of fixes to remove warnings for missing export.h
header inclusion (Naman Jain)
- An assorted set of fixes for when Linux run as the root partition
for Microsoft Hypervisor (Mukesh Rathor, Nuno Das Neves, Stanislav
Kinsburskii)
- Fix the check for HYPERVISOR_CALLBACK_VECTOR (Naman Jain)
- Fix fcopy tool to handle irregularities with size of ring buffer
(Naman Jain)
- Fix incorrect file path conversion in fcopy tool (Yasumasa Suenaga)
* tag 'hyperv-fixes-signed-20250718' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
tools/hv: fcopy: Fix irregularities with size of ring buffer
PCI: hv: Use the correct hypercall for unmasking interrupts on nested
x86/hyperv: Expose hv_map_msi_interrupt()
Drivers: hv: Use nested hypercall for post message and signal event
x86/hyperv: Clean up hv_map/unmap_interrupt() return values
x86/hyperv: Fix usage of cpu_online_mask to get valid cpu
PCI: hv: Don't load the driver for baremetal root partition
net: mana: Fix warnings for missing export.h header inclusion
PCI: hv: Fix warnings for missing export.h header inclusion
clocksource: hyper-v: Fix warnings for missing export.h header inclusion
x86/hyperv: Fix warnings for missing export.h header inclusion
Drivers: hv: Fix warnings for missing export.h header inclusion
Drivers: hv: Fix the check for HYPERVISOR_CALLBACK_VECTOR
tools/hv: fcopy: Fix incorrect file path conversion
Drivers: hv: Select CONFIG_SYSFB only if EFI is enabled
The delay that we are waiting on in brcm_pcie_start_link() is
PCIE_T_RRS_READY_MS, use it.
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
[mani: Removed the redundant comment]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://patch.msgid.link/20250624231923.990361-3-florian.fainelli@broadcom.com
Since it's not currently safe to take device_lock() in the IOMMU probe
path, that can race against really_probe() setting dev->driver before
attempting to bind. The race itself isn't so bad, since we're only
concerned with dereferencing dev->driver itself anyway, but sadly my
attempt to implement the check with minimal churn leads to a kind of
Time-of-Check to Time-of-Use (TOCTOU) issue, where dev->driver becomes
valid after to_pci_driver(NULL) is already computed, and thus the check
fails to work as intended.
Will and I both hit this with the platform bus, but the pattern here is
the same, so fix it for correctness too.
Fixes: bcb81ac6ae ("iommu: Get DT/ACPI parsing into the proper probe path")
Reported-by: Will McVicker <willmcvicker@google.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Will McVicker <willmcvicker@google.com>
Link: https://patch.msgid.link/20250425133929.646493-4-robin.murphy@arm.com
The DT binding has moved the PHY, PERST# properties to Root Port node from
the Host Bridge node. So add support for parsing the new binding. The new
binding uses 'reset-gpios' property for PERST#, hence parse the same
property in the driver instead of the legacy 'perst-gpios'.
To maintain DT backwards compatibility, fallback to the legacy method of
parsing the host bridge node if the properties are not present in the Root
Port node.
Signed-off-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>
[mani: refactored the root port parsing code, fixed a bug & commit message rewording]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250702-perst-v5-2-920b3d1f6ee1@qti.qualcomm.com
For IMX8MM_EP and IMX8MP_EP, add fixed 256-byte BAR 4 and reserved BAR 5
in imx8m_pcie_epc_features.
Fixes: 75c2f26da0 ("PCI: imx6: Add i.MX PCIe EP mode support")
Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
[bhelgaas: add details in subject]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20250708091003.2582846-3-hongxing.zhu@nxp.com
The PCI core has historically not allowed built-in drivers to opt in to
async initial probing: Drivers may set "PROBE_PREFER_ASYNCHRONOUS", but
initial probing always happens synchronously. That's because the PCI core
uses device_attach() instead of device_initial_probe().
Should a driver return -EPROBE_DEFER on initial probe, reprobing later on
does honor the PROBE_PREFER_ASYNCHRONOUS setting. Modular drivers are
also allowed to probe asynchronously, which is inconsistent.
The choice of device_attach() is likely not deliberate: It was introduced
in 2013 with commit 58d9a38f6f ("PCI: Skip attaching driver in
device_add()"), but asynchronous probing was added two years later with
commit 765230b5f0 ("driver-core: add asynchronous probing support for
drivers").
According to the kernel-doc of "enum probe_type", "the end goal is to
switch the kernel to use asynchronous probing by default". To this end,
use device_initial_probe() to allow asynchronous initial probing. The
function returns void, making the return value check unnecessary.
Initial PCI probing often takes on the order of seconds even on laptops,
so this may speed up booting significantly.
A small number of PCI drivers already opt in to asynchronous probing.
Their maintainers (who are all cc'ed) should watch out for issues, now
that asynchronous probing is not just allowed for deferred and modular
probing, but also initial probing:
hl_pci_driver drivers/accel/habanalabs/common/habanalabs_drv.c
cxl_pci_driver drivers/cxl/pci.c
quicki2c_driver drivers/hid/intel-thc-hid/intel-quicki2c/pci-quicki2c.c
quickspi_driver drivers/hid/intel-thc-hid/intel-quickspi/pci-quickspi.c
i801_driver drivers/i2c/busses/i2c-i801.c
mei_me_driver drivers/misc/mei/pci-me.c
mei_vsc_drv drivers/misc/mei/platform-vsc.c
sdhci_driver drivers/mmc/host/sdhci-pci-core.c
nvme_driver drivers/nvme/host/pci.c
ehci_pci_driver drivers/usb/host/ehci-pci.c
hvfb_pci_stub_driver drivers/video/fbdev/hyperv_fb.c
All other driver maintainers may test asynchronous probing by specifying
the command line parameter "driver_async_probe=drv_name1,drv_name2,...",
and on success setting "probe_type = PROBE_PREFER_ASYNCHRONOUS" in the
pci_driver struct.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
[bhelgaas: updated commit log per https://lore.kernel.org/r/aHYUh7WoDlhHckxd@wunner.de]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/53abe6f5ac7c631f95f5d061aa748b192eda0379.1751614426.git.lukas@wunner.de
Running as nested root on MSHV imposes a different requirement
for the pci-hyperv controller.
In this setup, the interrupt will first come to the L1 (nested) hypervisor,
which will deliver it to the appropriate root CPU. Instead of issuing the
RETARGET hypercall, issue the MAP_DEVICE_INTERRUPT hypercall to L1 to
complete the setup.
Rename hv_arch_irq_unmask() to hv_irq_retarget_interrupt().
Co-developed-by: Jinank Jain <jinankjain@linux.microsoft.com>
Signed-off-by: Jinank Jain <jinankjain@linux.microsoft.com>
Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
Reviewed-by: Roman Kisel <romank@linux.microsoft.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/1752261532-7225-4-git-send-email-nunodasneves@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <1752261532-7225-4-git-send-email-nunodasneves@linux.microsoft.com>
Drivers could leverage the fact that the VF BAR MMIO reservation is created
for total number of VFs supported by the device by resizing the BAR to
larger size when smaller number of VFs is enabled.
Add pci_iov_vf_bar_set_size() to control the size and a
pci_iov_vf_bar_get_sizes() helper to get the VF BAR sizes that will allow
up to num_vfs to be successfully enabled with the current underlying
reservation size.
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://patch.msgid.link/20250702093522.518099-6-michal.winiarski@intel.com
When the resource representing a VF MMIO BAR reservation is created, its
size is always large enough to accommodate the BAR of all SR-IOV Virtual
Functions that can potentially be created (total VFs). If for whatever
reason it's not possible to accommodate all VFs, the resource is not
assigned and no VFs can be created.
An upcoming change will allow VF BAR size to be modified by drivers at a
later point in time, which means that the check for resource assignment is
no longer sufficient.
Add an additional check that verifies that the VF BAR for all enabled VFs
fits within the underlying reservation resource.
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://patch.msgid.link/20250702093522.518099-5-michal.winiarski@intel.com
Similar to regular resizable BARs, VF BARs can also be resized.
The capability layout is the same as PCI_EXT_CAP_ID_REBAR, which means we
can reuse most of the implementation, the only difference being resource
size calculation (which is multiplied by total VFs) and memory decoding
(which is controlled by a separate VF MSE field in SR-IOV cap).
Extend the pci_resize_resource() function to accept IOV resources.
See PCIe r6.2, sec 7.8.7.
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://patch.msgid.link/20250702093522.518099-4-michal.winiarski@intel.com
There are multiple places where conversions between IOV resources and
corresponding VF BAR numbers are done.
Extract the logic to pci_resource_num_from_vf_bar() and
pci_resource_num_to_vf_bar() helpers.
Suggested-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Acked-by: Christian König <christian.koenig@amd.com>
Link: https://patch.msgid.link/20250702093522.518099-3-michal.winiarski@intel.com
Similar to regular resizable BARs, VF BARs can also be resized, e.g. by the
system firmware or the PCI subsystem itself.
The capability layout is the same as PCI_EXT_CAP_ID_REBAR.
Add the capability ID and restore it as a part of IOV state.
See PCIe r6.2, sec 7.8.7.
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://patch.msgid.link/20250702093522.518099-2-michal.winiarski@intel.com
- Fix a randconfig build failure in armada-370-xp irqchip
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmhzhl8ACgkQEsHwGGHe
VUozSw/9ENX6y5jnVhEGyhGHK0jaNN6FaXT6kqIXgchspzyKd9wX+d7nzk2cBCZp
YgfjpXRPP3ltEQqEvnjTjm1yKJnvmqUJgfE/r3FdjL8HlgodS1Uthyt19XLU9kqL
Dh8tn46Z6s7/0cE81YRBE8ALw6WEAuYJ7k48MYyDNfJ2GMwCQkcm1ZwkB7GvJ9Ni
hcz8aToudDydqqqfaqTMmE5lASrKKCsqqead4FGhbxTlSJt2U8syRkZwSF4OhHB3
9UxUIQo9ftkqXzUQiUbrHdfzQhqXQ7WfU+N6atbIeiACkh3kVRpLSwIAVktg+k9j
BMWjBRJBiCw8pEV3q9Wk+cI+UpbuBGiZz0QNOcBdWO7RxrSibgphaTGMoj4bBCWU
1WqCdoDvw06vAcuSe4K7wc8aERW8GkjgCTGQFsOWmQmG2StpptmfGmc52TxLOjtf
0bPGjb1NLQu8+nGJpFEjtm2Zah+Eb6aCNIvDihirWNcTw8mnOmdJgwWCc4E1WgJN
m8XcKjKzG3s5b0suQdubBViFGdbeondijiaRB6nCfJg8HYpk41Q6mm5gYmlkhYYQ
R918Io8pyJqvsBfu9LXuQmEIXpWejF/rxEkflI6bYilUbgUgqfEG3koQ1F6Y5a72
vO0aYMLfDRcorh5FkLCO6gElPZaa4KBpK14vPAuVgWtKIlXSLlM=
=znXh
-----END PGP SIGNATURE-----
Merge tag 'irq_urgent_for_v6.16_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Borislav Petkov:
- Fix a case of recursive locking in the MSI code
- Fix a randconfig build failure in armada-370-xp irqchip
* tag 'irq_urgent_for_v6.16_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/irq-msi-lib: Fix build with PCI disabled
PCI/MSI: Prevent recursive locking in pci_msix_write_tph_tag()
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmhxaygUHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vx0ORAAn2zwwXjlSDU1V72rHX/P8aBDsmRs
R5jBD5aL+xEkoBnCguawkNl89YpKqtRo6G9yMfpSUziLvlue9F1vI82hleeHwtaZ
zcM3YdCDPz/xukQXTt2K6BniBqPNrbU0U1AXlFkjL2Gpmtp/pmr2nGn85X0OP2t8
jFGNrVFsogwPGkX7lC4LvPJozX6/PV2l4J9qF/NsuUQ8DKpYPuBjKli3C41TR13o
XA5YLIsUInvLcdfyccgmx9skCO5ZRph6WWkGF8Cvi/S67KdaYFlsGl8NTYd8KTgk
shGJjtc9Fu3w0mQMXDyMxk4EsVPT+7LZx7aJXADixa6sbOZvnRYBGJmtAqCHCgfT
6eGIzh2DIxoaEM+0F1W8ux4nmmozRTFlZPUXJBcDXuoxwicwUkZsDWuDU0WCzWia
br/uJq3queO6uemSYwMtf/UcfKLJMF4P6YG/PgRCjsD0/J1+RiclATmhaZU96H3P
qeOJ4yKhvyLQsbWW6uDxUFmv+WcHSXbOcQb1ywlLMmR8sWEf2U3F+Db0276QSFDz
Pypip9mfv0uWdf5xSuKC4FeOyOYAzopqwDgZJGK76RwB3jUnfJ6E8/yfVyhguz65
YX+NEWJ3udxK7EJXhN5Hs98xIn4xVPRjy4KtMtaXv1EjgnuxUA0/GweiMLmIbc7h
2+A3Ex1grahZuc4=
=qMbU
-----END PGP SIGNATURE-----
Merge tag 'pci-v6.16-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull PCI fixes from Bjorn Helgaas:
- Track apple Root Ports explicitly and look up the driver data from
the struct device instead of using dev->driver_data, which is used by
pci_host_common_init() for the generic host bridge pointer (Marc
Zyngier)
- Set dev->driver_data before pci_host_common_init() calls
gen_pci_init() because some drivers need it to set up ECAM mappings;
this fixes a regression on MicroChip MPFS Icicle (Geert Uytterhoeven)
- Revert the now-unnecessary use of ECAM pci_config_window.priv to
store a copy of dev->driver_data (Marc Zyngier)
* tag 'pci-v6.16-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
Revert "PCI: ecam: Allow cfg->priv to be pre-populated from the root port device"
PCI: host-generic: Set driver_data before calling gen_pci_init()
PCI: apple: Add tracking of probed root ports
Move away from the legacy MSI domain setup, switch to use
msi_create_parent_irq_domain().
While doing the conversion, I noticed that hv_compose_msi_msg() is doing
more than it is supposed to (composing message). This function also
allocates and populates struct tran_int_desc, which should be done in
hv_pcie_domain_alloc() instead. It works, but it is not the correct design.
However, I have no hardware to test such change, therefore I leave a TODO
note.
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Nam Cao <namcao@linutronix.de>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
pci_msix_write_tph_tag() takes the per device MSI descriptor mutex and then
invokes msi_domain_get_virq(), which takes the same mutex again. That
obviously results in a system hang which is exposed by a softlockup or
lockdep warning.
Move the lock guard after the invocation of msi_domain_get_virq() to fix
this.
[ tglx: Massage changelog by adding a proper explanation and removing the
not really useful stacktrace ]
Fixes: d5124a9957 ("PCI/MSI: Provide a sane mechanism for TPH")
Reported-by: Jorge Lopez <jorge.jo.lopez@oracle.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jorge Lopez <jorge.jo.lopez@oracle.com>
Link: https://lore.kernel.org/all/20250708222530.1041477-1-himanshu.madhani@oracle.com
The root partition only uses VMBus when running nested.
When running on baremetal the Hyper-V PCI driver is not needed,
so do not initialize it.
Signed-off-by: Mukesh Rathor <mrathor@linux.microsoft.com>
Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
Reviewed-by: Roman Kisel <romank@linux.microsoft.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/1751582677-30930-2-git-send-email-nunodasneves@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <1751582677-30930-2-git-send-email-nunodasneves@linux.microsoft.com>
Fix below warning in Hyper-V PCI driver that comes when kernel is
compiled with W=1 option. Include export.h in driver files to fix it.
* warning: EXPORT_SYMBOL() is used, but #include <linux/export.h>
is missing
Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Link: https://lore.kernel.org/r/20250611100459.92900-6-namjain@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <20250611100459.92900-6-namjain@linux.microsoft.com>
The current BAR configuration for the PCI vNTB endpoint function allocates
BARs in order, which lacks flexibility and does not account for
platform-specific quirks. This is problematic on Renesas platforms, where
BAR_4 is a fixed 256B region that ends up being used for MW1, despite being
better suited for doorbells.
Add new configfs attributes to allow users to specify arbitrary BAR
assignments. If no configuration is provided, the driver retains its
original behavior of sequential BAR allocation, preserving compatibility
with existing userspace setups.
This enables use cases such as assigning BAR_2 for MW1 and using the
limited BAR_4 for doorbells on Renesas platforms.
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
[mani: adjusted the indent of EPF_NTB_BAR_W, fixed kdoc & squashed bar fix]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20250603-pci-vntb-bar-mapping-v2-3-fc685a22ad28@baylibre.com
IRQchip drivers need a PCI/MSI function to map a RID to a MSI
controller deviceID namespace and at the same time retrieve the
struct device_node pointer of the MSI controller the RID is mapped
to.
Add pci_msi_map_rid_ctlr_node() to achieve this purpose.
Cc Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250703-gicv5-host-v7-25-12e71f1b3528@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Rockchip controllers can support up to 5.0 GT/s link speed. But the driver
doesn't set the Target Link Speed currently. This may cause failure in
retraining the link to 5.0 GT/s if supported by the endpoint. So set the
Target Link Speed to 5.0 GT/s in the Link Control and Status Register 2.
Fixes: e77f847df5 ("PCI: rockchip: Add Rockchip PCIe controller support")
Signed-off-by: Geraldo Nascimento <geraldogabriel@gmail.com>
[mani: fixed whitespace warning, commit message rewording, added fixes tag]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Robin Murphy <robin.murphy@arm.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/0afa6bc47b7f50e2e81b0b47d51c66feb0fb565f.1751322015.git.geraldogabriel@gmail.com
Current code uses custom-defined register offsets and bitfields for the
standard PCIe registers. This creates duplication as the PCI header already
defines them. So, switch to using the standard PCIe definitions and drop
the custom ones.
Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Geraldo Nascimento <geraldogabriel@gmail.com>
[mani: commit message rewording]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
[bhelgaas: include bitfield.h]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/e81700ef4b49f584bc8834bfb07b6d8995fc1f42.1751322015.git.geraldogabriel@gmail.com
The PCI bus type does not expect its runtime PM suspend callback
function, pci_pm_runtime_suspend(), to be invoked at all during system-
wide suspend and resume, and it does not expect its runtime resume
callback function, pci_pm_runtime_resume(), to be invoked at any point
when runtime PM is disabled for the given device during system-wide
suspend and resume, so make it express that expectation by setting
power.strict_midlayer for all PCI devices in pci_pm_prepare() and
clear it in pci_pm_complete().
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://patch.msgid.link/1925097.atdPhlSkOF@rjwysocki.net
Rename gen_pci_init() API as pci_host_common_ecam_create() and export it to
allow the PCIe controller drivers to create and configure the ECAM region.
Note that this API should only used by the drivers managing the drvdata on
their own. Rest should continue using pci_host_common_init() API.
Signed-off-by: Mayank Rana <mayank.rana@oss.qualcomm.com>
[mani: commit message rewording]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://patch.msgid.link/20250616224259.3549811-3-mayank.rana@oss.qualcomm.com
Export dw_pcie_msi_host_init(), dw_pcie_msi_init(), and dw_pcie_free_msi()
APIs to allow them to be reused by the upcoming DWC based ECAM driver
implementation. Also, move MSI IRQ related initialization code to
dw_pcie_msi_init(), as this code must be executed before dw_pcie_msi_init()
API can be used with ECAM driver.
Signed-off-by: Mayank Rana <mayank.rana@oss.qualcomm.com>
[mani: commit message rewording]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://patch.msgid.link/20250616224259.3549811-2-mayank.rana@oss.qualcomm.com
This reverts commit 4900454b4f.
Now that nobody relies of cfg->priv containing anything useful before the
.init() callback is used, restore the previous behaviour.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20250625111806.4153773-4-maz@kernel.org
The apple driver relies on being able to directly find the matching root
port structure from the platform device that represents this port.
A previous hack stashed a pointer to the root port structure in the config
window private pointer, but that ended up relying on assumptions that break
other drivers.
Instead, bite the bullet and track the association as part of the driver
itself as a list of probed root ports.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20250625111806.4153773-3-maz@kernel.org
When a PCIe device detects an error, it logs the error locally and issues
an error Message routed to the Root Complex (PCIe r6.0, sec 6.2.5). If the
Root Port or RCEC supports AER and Linux has enabled the AER interrupt,
aer_isr() traverses the relevant devices and adds those with AER errors
logged to the aer_err_info.dev[] array for error logging and recovery.
If aer_isr() finds more than AER_MAX_MULTI_ERR_DEVICES devices with AER
errors logged, it silently ignores them, and those extra devices are not
included in the recovery flow.
Emit an error message if we find more than AER_MAX_MULTI_ERR_DEVICES
devices with AER errors logged.
Testing details at link below.
Signed-off-by: Akshay Jindal <akshayaj.lkd@gmail.com>
[bhelgaas: commit log, join error message]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250619185041.73240-1-akshayaj.lkd@gmail.com
An endpoint driver configfs attributes group is added to the
epf_group list of struct pci_epf_driver by pci_epf_add_cfs() but an
added group is not removed from this list when the attribute group is
unregistered with pci_ep_cfs_remove_epf_group().
Add the missing list_del() call in pci_ep_cfs_remove_epf_group()
to correctly remove the attribute group from the driver list.
With this change, once the loop over all attribute groups in
pci_epf_remove_cfs() completes, the driver epf_group list should be
empty. Add a WARN_ON() to make sure of that.
Fixes: ef1433f717 ("PCI: endpoint: Create configfs entry for each pci_epf_device_id table entry")
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20250624114544.342159-3-dlemoal@kernel.org
Doing a list_del() on the epf_group field of struct pci_epf_driver in
pci_epf_remove_cfs() is not correct as this field is a list head, not
a list entry. This list_del() call triggers a KASAN warning when an
endpoint function driver which has a configfs attribute group is torn
down:
==================================================================
BUG: KASAN: slab-use-after-free in pci_epf_remove_cfs+0x17c/0x198
Write of size 8 at addr ffff00010f4a0d80 by task rmmod/319
CPU: 3 UID: 0 PID: 319 Comm: rmmod Not tainted 6.16.0-rc2 #1 NONE
Hardware name: Radxa ROCK 5B (DT)
Call trace:
show_stack+0x2c/0x84 (C)
dump_stack_lvl+0x70/0x98
print_report+0x17c/0x538
kasan_report+0xb8/0x190
__asan_report_store8_noabort+0x20/0x2c
pci_epf_remove_cfs+0x17c/0x198
pci_epf_unregister_driver+0x18/0x30
nvmet_pci_epf_cleanup_module+0x24/0x30 [nvmet_pci_epf]
__arm64_sys_delete_module+0x264/0x424
invoke_syscall+0x70/0x260
el0_svc_common.constprop.0+0xac/0x230
do_el0_svc+0x40/0x58
el0_svc+0x48/0xdc
el0t_64_sync_handler+0x10c/0x138
el0t_64_sync+0x198/0x19c
...
Remove this incorrect list_del() call from pci_epf_remove_cfs().
Fixes: ef1433f717 ("PCI: endpoint: Create configfs entry for each pci_epf_device_id table entry")
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20250624114544.342159-2-dlemoal@kernel.org
Move the LINK_WAIT_SLEEP_MS and LINK_WAIT_MAX_RETRIES macros to pci.h.
Prefix the macros with PCIE_ in order to avoid redefining these for
drivers that already have macros named like this.
No functional changes.
Suggested-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://patch.msgid.link/20250625102347.1205584-15-cassel@kernel.org
As per PCIe r6.0, sec 6.6.1, a Downstream Port that supports Link speeds
greater than 5.0 GT/s, software must wait a minimum of 100 ms after Link
training completes before sending a Configuration Request.
Add this delay in dw_pcie_wait_for_link(), after the link is reported as
up. The delay will only be performed in the success case where the link
came up.
DWC glue drivers that have a link up IRQ (drivers that set
use_linkup_irq = true) do not call dw_pcie_wait_for_link(), instead they
perform this delay in their threaded link up IRQ handler.
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Wilfred Mallawa <wilfred.mallawa@wdc.com>
Link: https://patch.msgid.link/20250625102347.1205584-14-cassel@kernel.org
Per PCIe r6.0, sec 6.6.1, software must generally wait a minimum of
100ms (PCIE_RESET_CONFIG_WAIT_MS) after Link training completes before
sending a Configuration Request.
Prior to 36971d6c5a ("PCI: qcom: Don't wait for link if we can detect
Link Up"), qcom used dw_pcie_wait_for_link(), which waited between 0
and 90ms after the link came up before we enumerate the bus, and this
was apparently enough for most devices.
After 36971d6c5a, qcom_pcie_global_irq_thread() started enumeration
immediately when handling the link-up IRQ, and devices (e.g., Laszlo
Fiat's PLEXTOR PX-256M8PeGN NVMe SSD) may not be ready to handle config
requests yet.
Delay PCIE_RESET_CONFIG_WAIT_MS after the link-up IRQ before starting
enumeration.
Fixes: 82a823833f ("PCI: qcom: Add Qualcomm PCIe controller driver")
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Wilfred Mallawa <wilfred.mallawa@wdc.com>
Link: https://patch.msgid.link/20250625102347.1205584-13-cassel@kernel.org
Per PCIe r6.0, sec 6.6.1, software must generally wait a minimum of
100ms (PCIE_RESET_CONFIG_WAIT_MS) after Link training completes before
sending a Configuration Request.
Prior to ec9fd499b9 ("PCI: dw-rockchip: Don't wait for link since
we can detect Link Up"), dw-rockchip used dw_pcie_wait_for_link(),
which waited between 0 and 90ms after the link came up before we
enumerate the bus, and this was apparently enough for most devices.
After ec9fd499b9, rockchip_pcie_rc_sys_irq_thread() started
enumeration immediately when handling the link-up IRQ, and devices
(e.g., Laszlo Fiat's PLEXTOR PX-256M8PeGN NVMe SSD) may not be ready
to handle config requests yet.
Delay PCIE_RESET_CONFIG_WAIT_MS after the link-up IRQ before starting
enumeration.
Fixes: 0e898eb8df ("PCI: rockchip-dwc: Add Rockchip RK356X host controller driver")
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Wilfred Mallawa <wilfred.mallawa@wdc.com>
Cc: Laszlo Fiat <laszlo.fiat@proton.me>
Link: https://patch.msgid.link/20250625102347.1205584-12-cassel@kernel.org
Macro PCIE_RESET_CONFIG_DEVICE_WAIT_MS was added to pci.h in commit
d5ceb9496c ("PCI: Add PCIE_RESET_CONFIG_DEVICE_WAIT_MS waiting time
value").
Later, in commit 70a7bfb1e5 ("PCI: rockchip-host: Wait 100ms after reset
before starting configuration"), PCIE_T_RRS_READY_MS was added to pci.h.
These macros are duplicates, and represent the exact same delay in the
PCIe specification.
Since the comment above PCIE_RESET_CONFIG_WAIT_MS is strictly more correct
than the comment above PCIE_T_RRS_READY_MS, change rockchip-host to use
PCIE_RESET_CONFIG_WAIT_MS, and remove PCIE_T_RRS_READY_MS, as
rockchip-host is the only user of this macro.
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Wilfred Mallawa <wilfred.mallawa@wdc.com>
Link: https://patch.msgid.link/20250625102347.1205584-11-cassel@kernel.org
In a89c82249c ("PCI: Work around PCIe link training failures"), if the
speed limit is set to 2.5 GT/s and the retraining is successful, an attempt
will be made to lift the speed limit. One condition for lifting the speed
limit is to check whether the link speed field of the Link Control 2
register is PCI_EXP_LNKCTL2_TLS_2_5GT.
However, since de9a6c8d5d ("PCI/bwctrl: Add pcie_set_target_speed() to
set PCIe Link Speed"), the `lnkctl2` local variable does not undergo any
changes during the speed limit setting and retraining process. As a result,
the code intended to lift the speed limit is not executed.
To address this issue, adjust the position of the Link Control 2 register
read operation in the code and place it before its use.
Fixes: de9a6c8d5d ("PCI/bwctrl: Add pcie_set_target_speed() to set PCIe Link Speed")
Suggested-by: Maciej W. Rozycki <macro@orcam.me.uk>
Suggested-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Jiwei Sun <sunjw10@lenovo.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250123055155.22648-3-sjiwei@163.com
When pcie_failed_link_retrain() fails to retrain, it tries to revert to the
previous link speed. However it calculates that speed from the Link
Control 2 register without masking out non-speed bits first.
PCIE_LNKCTL2_TLS2SPEED() converts such incorrect values to
PCI_SPEED_UNKNOWN (0xff), which in turn causes a WARN splat in
pcie_set_target_speed():
pci 0000:00:01.1: [1022:14ed] type 01 class 0x060400 PCIe Root Port
pci 0000:00:01.1: broken device, retraining non-functional downstream link at 2.5GT/s
pci 0000:00:01.1: retraining failed
WARNING: CPU: 1 PID: 1 at drivers/pci/pcie/bwctrl.c:168 pcie_set_target_speed
RDX: 0000000000000001 RSI: 00000000000000ff RDI: ffff9acd82efa000
pcie_failed_link_retrain
pci_device_add
pci_scan_single_device
Mask out the non-speed bits in PCIE_LNKCTL2_TLS2SPEED() and
PCIE_LNKCAP_SLS2SPEED() so they don't incorrectly return PCI_SPEED_UNKNOWN.
Fixes: de9a6c8d5d ("PCI/bwctrl: Add pcie_set_target_speed() to set PCIe Link Speed")
Reported-by: Andrew <andreasx0@protonmail.com>
Closes: https://lore.kernel.org/r/7iNzXbCGpf8yUMJZBQjLdbjPcXrEJqBxy5-bHfppz0ek-h4_-G93b1KUrm106r2VNF2FV_sSq0nENv4RsRIUGnlYZMlQr2ZD2NyB5sdj5aU=@protonmail.com/
Suggested-by: Maciej W. Rozycki <macro@orcam.me.uk>
Suggested-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Jiwei Sun <sunjw10@lenovo.com>
[bhelgaas: commit log, add details from https://lore.kernel.org/r/1c92ef6bcb314ee6977839b46b393282e4f52e74.1750684771.git.lukas@wunner.de]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Cc: stable@vger.kernel.org # v6.13+
Link: https://patch.msgid.link/20250123055155.22648-2-sjiwei@163.com
This reverts commit 631b2af2f3 ("PCI/ACPI: Fix allocated memory release
on error in pci_acpi_scan_root()").
The reverted patch causes the 'ri->cfg' and 'root_ops' resources to be
released multiple times.
When acpi_pci_root_create() fails, these resources have already been
released internally by the __acpi_pci_root_release_info() function.
Releasing them again in pci_acpi_scan_root() leads to incorrect behavior
and potential memory issues.
We plan to resolve the issue using a more appropriate fix.
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/all/aEmdnuw715btq7Q5@stanley.mountain/
Signed-off-by: Zhe Qiao <qiaozhe@iscas.ac.cn>
Acked-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://patch.msgid.link/20250619072608.2075475-1-qiaozhe@iscas.ac.cn
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The config file related to the memory windows start the numbering of
the MW from 1. The other NTB function does the same, yet the enumeration
defining the BARs of the vNTB function starts numbering the MW from 0.
Both numbering should be fine, but mixing the two is a bit confusing. The
configfs file being the interface with userspace, keep that stable and
consistently start the numbering of the MW from 1.
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
[mani: commit message rewording]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20250603-pci-vntb-bar-mapping-v2-2-fc685a22ad28@baylibre.com
According the function documentation of epf_ntb_init_epc_bar(), the
function should return an error code on error. However, it returns -1 when
no BAR is available i.e., when pci_epc_get_next_free_bar() fails.
Return -ENOENT instead.
Fixes: e35f56bb03 ("PCI: endpoint: Support NTB transfer between RC and EP")
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
[mani: changed err code to -ENOENT]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20250603-pci-vntb-bar-mapping-v2-1-fc685a22ad28@baylibre.com
By default, the driver relies on the default hardware defined value for the
Max Link Width (MLW) capability. But if the "num-lanes" DT property is
present, assume that the chip's default capability information is incorrect
or undesired, and use the specified value instead.
Signed-off-by: Jim Quinlan <james.quinlan@broadcom.com>
[mani: reworded the description and comments]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20250530224035.41886-3-james.quinlan@broadcom.com
We need the driver-core fixes that are in 6.16-rc3 into here as well
to build on top of.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
RK3588 TRM, section "11.6.1.3.3 Hot Reset and Link-Down Reset" states that:
If you want to delay link re-establishment (after reset) so that you can
reprogram some registers through DBI, you must set app_ltssm_enable =0
immediately after core_rst_n as shown in above. This can be achieved by
enable the app_dly2_en, and end-up the delay by assert app_dly2_done.
I.e. setting app_dly2_en will automatically deassert app_ltssm_enable on
a hot reset, and setting app_dly2_done will re-assert app_ltssm_enable,
re-enabling link training.
When receiving a hot reset/link-down IRQ when running in EP mode, we will
call dw_pcie_ep_linkdown(), which may update registers through DBI. Unless
link training is inhibited, these register updates race with the link
training.
To avoid the race, set PCIE_LTSSM_APP_DLY2_EN so the controller never
automatically trains the link after a link-down or hot reset interrupt.
That way any DBI updates done in the dw_pcie_ep_linkdown() path will happen
while the link is still down. Then allow link training by setting
PCIE_LTSSM_APP_DLY2_DONE
Co-developed-by: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Wilfred Mallawa <wilfred.mallawa@wdc.com>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250613101908.2182053-2-cassel@kernel.org