Commit Graph

11835 Commits

Author SHA1 Message Date
Hector Martin
de9637c9f7 PCI: apple: Drop poll for CORE_RC_PHYIF_STAT_REFCLK
This is checking a core refclk in per-port setup which doesn't make a
lot of sense, and the bootloader needs to have gone through this anyway.

It doesn't work on T602x, so just drop it across the board.

Signed-off-by: Hector Martin <marcan@marcan.st>
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Tested-by: Janne Grunau <j@jannau.net>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Link: https://patch.msgid.link/20250401091713.2765724-11-maz@kernel.org
2025-04-23 12:52:49 +05:30
Hector Martin
80b31fbbca PCI: apple: Move port PHY registers to their own reg items
T602x PCIe cores move these registers around. Instead of hardcoding in
another offset, let's move them into their own reg entries. This matches
what Apple does on macOS device trees too.

Maintains backwards compatibility with old DTs by using the old offsets.

Note that we open code devm_platform_ioremap_resource_byname() to avoid
error messages on older platforms with missing resources in the pcie
node. ("pcie-apple 590000000.pcie: invalid resource (null)" on probe)

Co-developed-by: Janne Grunau <j@jannau.net>
Signed-off-by: Janne Grunau <j@jannau.net>
Signed-off-by: Hector Martin <marcan@marcan.st>
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Tested-by: Janne Grunau <j@jannau.net>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Link: https://patch.msgid.link/20250401091713.2765724-10-maz@kernel.org
2025-04-23 12:52:49 +05:30
Hector Martin
7fa9fbf391 PCI: apple: Fix missing OF node reference in apple_pcie_setup_port
In the success path, we hang onto a reference to the node, so make sure
to grab one. The caller iterator puts our borrowed reference when we
return.

Signed-off-by: Hector Martin <marcan@marcan.st>
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Tested-by: Janne Grunau <j@jannau.net>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Link: https://patch.msgid.link/20250401091713.2765724-9-maz@kernel.org
2025-04-23 12:52:48 +05:30
Marc Zyngier
0411c90eee PCI: apple: Move away from INTMSK{SET,CLR} for INTx and private interrupts
T602x seems to have dropped the rather useful SET/CLR accessors
to the masking register.

Instead, let's use the mask register directly, and wrap it with
a brand new spinlock. No, this isn't moving in the right direction.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Tested-by: Janne Grunau <j@jannau.net>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Link: https://patch.msgid.link/20250401091713.2765724-8-maz@kernel.org
2025-04-23 12:52:48 +05:30
Marc Zyngier
ed982862ce PCI: apple: Dynamically allocate RID-to_SID bitmap
As we move towards supporting SoCs with varying RID-to-SID mapping
capabilities, turn the static SID tracking bitmap into a dynamically
allocated one. The current allocation size is still the same, but
that's about to change.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Tested-by: Janne Grunau <j@jannau.net>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Link: https://patch.msgid.link/20250401091713.2765724-7-maz@kernel.org
2025-04-23 12:52:48 +05:30
Marc Zyngier
4785591f96 PCI: apple: Move over to standalone probing
Now that we have the required infrastructure, split the Apple PCIe
setup into two categories:

- stuff that has to do with PCI setup stays in the .init() callback

- stuff that is just driver gunk (such as MSI setup) goes into a
  probe routine, which will eventually call into the host-common
  code

The result is a far more logical setup process.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Tested-by: Janne Grunau <j@jannau.net>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Link: https://patch.msgid.link/20250401091713.2765724-6-maz@kernel.org
2025-04-23 12:52:48 +05:30
Marc Zyngier
4900454b4f PCI: ecam: Allow cfg->priv to be pre-populated from the root port device
In order to decouple ecam config space creation from probing via
pci_host_common_probe(), allow the private pointer to be populated
via the device drvdata pointer.

Crucially, this is set before calling ops->init(), allowing that
particular callback to have access to probe data.

This should have no impact on existing code which ignores the
current value of cfg->priv.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Tested-by: Janne Grunau <j@jannau.net>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Link: https://patch.msgid.link/20250401091713.2765724-5-maz@kernel.org
2025-04-23 12:52:47 +05:30
Marc Zyngier
afc0a570bb PCI: host-generic: Extract an ECAM bridge creation helper from pci_host_common_probe()
pci_host_common_probe() is an extremely useful helper, as it
abstracts away most of the gunk that a "mostly-ECAM-compliant"
device driver needs.

However, it is structured as a probe function, meaning that a lot
of the driver-specific setup has to happen in a .init() callback,
after the bridge and config space have been instantiated.

This is a bit awkward, and results in a number of convolutions
that could be avoided if the host-common code was more like
a library.

Introduce a pci_host_common_init() helper that does exactly that,
taking the platform device and a struct pci_ecam_op as parameters.

This can then be called from the probe routine, and a lot of the
code that isn't relevant to PCI setup moved away from the .init()
callback. This also removes the dependency on the device match
data, which is an oddity.

Signed-off-by: Marc Zyngier <maz@kernel.org>
[mani: fixed spelling mistakes]
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Tested-by: Janne Grunau <j@jannau.net>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Link: https://patch.msgid.link/20250401091713.2765724-4-maz@kernel.org
2025-04-23 12:52:11 +05:30
Nitheesh Sekar
3e5127469a PCI: qcom: Add support for IPQ5018
Add IPQ5018 platform with is based on Qcom IP rev. 2.9.0 and Synopsys IP
rev. 5.00a.

The platform itself has two PCIe Gen2 controllers: one single-lane and
one dual-lane. So add the IPQ5018 compatible and re-use 2_9_0 ops.

Signed-off-by: Nitheesh Sekar <quic_nsekar@quicinc.com>
Signed-off-by: Sricharan R <quic_srichara@quicinc.com>
Signed-off-by: George Moussalem <george.moussalem@outlook.com>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://patch.msgid.link/20250326-ipq5018-pcie-v7-4-e1828fef06c9@outlook.com
2025-04-23 12:51:20 +05:30
Krishna Chaitanya Chundru
09483959e3 PCI: dwc: Add support for configuring lane equalization presets
PCIe equalization presets are predefined settings used to optimize signal
integrity by compensating for signal loss and distortion in high-speed data
transmission.

Based upon the number of lanes and the data rate supported, write the
preset data read from the device tree in to the lane equalization control
registers. These preset values will be used by the controller during the
LTSSM lane equalization procedure.

Signed-off-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>
[mani: reworded the commit message and comments in the driver]
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://patch.msgid.link/20250328-preset_v6-v9-5-22cfa0490518@oss.qualcomm.com
2025-04-23 12:50:54 +05:30
Hans Zhang
8805f32a96 PCI: cadence: Fix runtime atomic count underflow
If the call to pci_host_probe() in cdns_pcie_host_setup() fails, PM
runtime count is decremented in the error path using pm_runtime_put_sync().
But the runtime count is not incremented by this driver, but only by the
callers (cdns_plat_pcie_probe/j721e_pcie_probe). And the callers also
decrement the runtime PM count in their error path. So this leads to the
below warning from the PM core:

	"runtime PM usage count underflow!"

So fix it by getting rid of pm_runtime_put_sync() in the error path and
directly return the errno.

Fixes: 49e427e6bd ("Merge branch 'pci/host-probe-refactor'")
Signed-off-by: Hans Zhang <18255117159@163.com>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://patch.msgid.link/20250419133058.162048-1-18255117159@163.com
2025-04-23 12:41:32 +05:30
Niklas Cassel
a7d824b2df PCI: rockchip-ep: Mark RK3399 as intx_capable
RK3399 can raise INTx interrupts, as can be seen by
rockchip_pcie_ep_send_intx_irq().

This is also in line with the register description of
PCIE_CLIENT_LEGACY_INT_CTRL, section "17.6.3 PCIe Client Detail Register
Description" of the RK3399 TRM.

Thus, mark RK3399 as intx_capable.

Signed-off-by: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://patch.msgid.link/20250416142749.336298-2-cassel@kernel.org
2025-04-20 09:26:36 +05:30
Janne Grunau
751bec089c PCI: apple: Set only available ports up
Iterating over disabled ports results in of_irq_parse_raw() parsing
the wrong "interrupt-map" entries, as it takes the status of the node
into account.

This became apparent after disabling unused PCIe ports in the Apple
Silicon device trees instead of deleting them.

Switching from for_each_child_of_node_scoped() to
for_each_available_child_of_node_scoped() solves this issue.

Fixes: 1e33888fbe ("PCI: apple: Add initial hardware bring-up")
Fixes: a0189fdfb7 ("arm64: dts: apple: t8103: Disable unused PCIe ports")
Signed-off-by: Janne Grunau <j@jannau.net>
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Tested-by: Janne Grunau <j@jannau.net>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/asahi/20230214-apple_dts_pcie_disable_unused-v1-0-5ea0d3ddcde3@jannau.net/
Link: https://lore.kernel.org/asahi/1ea2107a-bb86-8c22-0bbc-82c453ab08ce@linaro.org/
Link: https://patch.msgid.link/20250401091713.2765724-2-maz@kernel.org
2025-04-19 20:23:51 +05:30
Rob Herring (Arm)
5da3d94a23 PCI: mvebu: Use for_each_of_range() iterator for parsing "ranges"
The mvebu "ranges" is a bit unusual with its own encoding of addresses,
but it's still just normal "ranges" as far as parsing is concerned.
Convert mvebu_get_tgt_attr() to use the for_each_of_range() iterator
instead of open coding the parsing.

Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://patch.msgid.link/20241107153255.2740610-1-robh@kernel.org
2025-04-19 19:43:41 +05:30
Krishna Chaitanya Chundru
f9eb654fb1 PCI: dwc: Update pci->num_lanes to maximum supported link width
If the num-lanes property is not present in the devicetree, update
pci->num_lanes with the hardware supported maximum link width using
the newly introduced dw_pcie_link_get_max_link_width() API.

The API is used to get the Maximum Link Width (MLW) of the controller.

Signed-off-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>
[mani: reworded commit message a bit]
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://patch.msgid.link/20250328-preset_v6-v9-3-22cfa0490518@oss.qualcomm.com
2025-04-19 19:42:38 +05:30
Krishna Chaitanya Chundru
57a4591df7 PCI: of: Add of_pci_get_equalization_presets() API
PCIe equalization presets are predefined settings used to optimize
signal integrity by compensating for signal loss and distortion in
high-speed data transmission.

As per PCIe spec 6.0.1 revision section 8.3.3.3 & 4.2.4 for data rates
of 8.0 GT/s, 16.0 GT/s, 32.0 GT/s, and 64.0 GT/s, there is a way to
configure lane equalization presets for each lane to enhance the PCIe
link reliability. Each preset value represents a different combination
of pre-shoot and de-emphasis values. For each data rate, different
registers are defined: for 8.0 GT/s, registers are defined in section
7.7.3.4; for 16.0 GT/s, in section 7.7.5.9, etc. The 8.0 GT/s rate has
an extra receiver preset hint, requiring 16 bits per lane, while the
remaining data rates use 8 bits per lane.

Based on the number of lanes and the supported data rate,
of_pci_get_equalization_presets() reads the device tree property and
stores in the presets structure.

Signed-off-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://patch.msgid.link/20250328-preset_v6-v9-2-22cfa0490518@oss.qualcomm.com
2025-04-19 19:42:33 +05:30
Jerome Brunet
b584ab12d5 PCI: rcar-gen4: set ep BAR4 fixed size
On rcar-gen4, the ep BAR4 has a fixed size of 256B. Document this
constraint in the epc features of the platform.

Fixes: e311b3834d ("PCI: rcar-gen4: Add endpoint mode support")
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://patch.msgid.link/20250328-rcar-gen4-bar4-v1-1-10bb6ce9ee7f@baylibre.com
2025-04-19 19:41:31 +05:30
Jensen Huang
c7540e5423 PCI: rockchip: Fix order of rockchip_pci_core_rsts
The order of rockchip_pci_core_rsts introduced in the offending commit
followed the previous comment that warned not to reorder them. But the
commit failed to take into account that reset_control_bulk_deassert()
deasserts the resets in reverse order. So this leads to the link getting
downgraded to 2.5 GT/s.

Hence, restore the deassert order and also add back the comments for
rockchip_pci_core_rsts.

Tested on NanoPC-T4 with Samsung 970 Pro.

Fixes: 18715931a5 ("PCI: rockchip: Simplify reset control handling by using reset_control_bulk*() function")
Signed-off-by: Jensen Huang <jensenhuang@friendlyarm.com>
[mani: reworded the commit message and the comment above rockchip_pci_core_rsts]
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Acked-by: Shawn Lin <shawn.lin@rock-chips.com>
Link: https://patch.msgid.link/20250328105822.3946767-1-jensenhuang@friendlyarm.com
2025-04-19 19:17:02 +05:30
Linus Torvalds
b0c3bc35a5 Miscellaneous fixes:
- Fix BCM2712 irqchip driver Kconfig dependencies
    required on the Raspberry PI5
 
  - Fix spurious interrupts on RZ/G3E SMARC EVK systems
 
  - Fix crash regression on Sun/NIU hardware
 
  - Apply MSI driver quirk for Sun Neptune chips
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmgCtVYRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1gU2Q//eEhRXZbzzDq8M9e70ll4qBSoIdSNJLK1
 Neyf+mGuGJ8qsFPnoTjj21t9/zn/AFROPBdXUwLslOKgnXAiD9Zm0OO60Xrujrhb
 ezaqMp3VCotoDchotbKWhWPD26cXfwjs44QbqOxS17NYyv5TVGVXfbo3sY9x5m/S
 1eSFgWcYxjp4v++sbWJpeEOo9nm++ozhY+vlFX8dhzaW/uOiEVRG1sW8scs//uEe
 I/V8wTh0ZYoGBsf7WgBpFXdEr/obZd7Iy/GFMnghBY1zHZcWX5xfGfHGMaMq8/cU
 SNrTqInaWSgIxsQfiagbZsyGwM35SmNC1yQ2MH4nDayRrBa67bjbYawNK5lXrXbB
 me6o4RIWnz3Xcy0RZZPyPyXzSa5z8Ut5jbCbcdct9o0FmzJyvKKDBcljw22AJjls
 h3hWBCIxINB2+KwfAOcKLLolOQ/LdY8tHgpv/8OoELIOlbmvdkBtYC2cIQ58OHZm
 M8IGnJyT1OgZukBBNCClSBZbuzrtR63sS7t3oep+fNpK1cFpzYRjB1YFGruGe2m1
 BT6vjwmuDBCNv8LQTkj07xGAwmKgJ0baqhdqK5fNyTD7iIMTPtVd8CI/EelGygVO
 rUEGS5cP7EMRZEvpyefqCbv+96RHbtGooEPR+/l2qJgQcKoZfv8wuuUpK+7u/Z8H
 b+jUwtdLZ9M=
 =HJUJ
 -----END PGP SIGNATURE-----

Merge tag 'irq-urgent-2025-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull misc irq fixes from Ingo Molnar:

 - Fix BCM2712 irqchip driver Kconfig dependencies required on the
   Raspberry PI5

 - Fix spurious interrupts on RZ/G3E SMARC EVK systems

 - Fix crash regression on Sun/NIU hardware

 - Apply MSI driver quirk for Sun Neptune chips

* tag 'irq-urgent-2025-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/irq-bcm2712-mip: Enable driver when ARCH_BCM2835 is enabled
  irqchip/renesas-rzv2h: Prevent TINT spurious interrupt
  net/niu: Niu requires MSIX ENTRY_DATA fields touch before entry reads
  PCI/MSI: Add an option to write MSIX ENTRY_DATA before any reads
2025-04-18 13:28:41 -07:00
Ilpo Järvinen
a34d74877c PCI: Restore assigned resources fully after release
PCI resource fitting code in __assign_resources_sorted() runs in multiple
steps. A resource that was successfully assigned may have to be released
before the next step attempts assignment again. The assign+release cycle is
destructive to a start-aligned struct resource (bridge window or IOV
resource) because the start field is overwritten with the real address when
the resource got assigned.

One symptom:

  pci 0002:00:00.0: bridge window [mem size 0x00100000]: can't assign; bogus alignment

Properly restore the resource after releasing it. The start, end, and flags
fields must be stored into the related struct pci_dev_resource in order to
be able to restore the resource to its original state.

Fixes: 96336ec702 ("PCI: Perform reset_resource() and build fail list in sync")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Closes: https://lore.kernel.org/r/01eb7d40-f5b5-4ec5-b390-a5c042c30aff@roeck-us.net/
Reported-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
Closes: https://lore.kernel.org/r/3578030.5fSG56mABF@workhorse
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Ondrej Jirman <megi@xff.cz>
Link: https://patch.msgid.link/20250403093137.1481-1-ilpo.jarvinen@linux.intel.com
2025-04-18 08:23:22 -05:00
Linus Torvalds
fc96b232f8 pci-v6.15-fixes-2
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmgBgqMUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vxXkRAAgicW9BMwem/xsX2IcJKDJ1lFrXq0
 DCeUyDMh3U3EnbkFGtG+lhkJnWVOcyH8jlBbRd44dsFE95TbL4dvd0l1ZR8TUY9u
 HwSR80CsejO3ZrXMygKQkLgHOBdbvw3knVSHBH+nmQGM9PbYxG1OtCLJMyrrmtPV
 8e2lM6CxvqY2NsajOA4oHk9TW35FyKicmHB6iQV8iANj8/MO6ZfbAwqL9EdD/qRy
 wZvycNfQRjD2qbHDqi1S8Mh3JTPCp9F1Ld7KhrkRCzIW8AUBDpMlWur0aKKWHK7F
 jPSuppO+qNN5HbRUuQWgbh8AxSRVzvMaDo3VCi21G/tgOOWpPggJL5qqQ8yS4GPk
 ZcIP7d3/B7rSkUqe1dNHlzW4GSRf1XSIVZSUIzjbZEraHkbgsIsoeZM3iD8WsZ+F
 EilDFIeFEswK4DFMthVtWDsWy8EBEcO6aX6qDT8pJBbJNj2GPuL5ZeCuKYWwOHhh
 Pkqf5kAIj06QCx6AghmK9RQk7lt6uAhnuP9dZq+/L+DYRFej3caOjQPFH7n6jdEY
 e42B+Ceb4eANCnaKnY0ZdlsnC//VRsSFWJAiGZIZX1P3jnvlyin7YFugp739U/La
 fUJUT8MhnSibZsSF1bPSwqTDl0D9tstnPVXnio+umfe9YATf21m3W/lLrRYu4Emy
 6ZNJIszfQrY/cqw=
 =ebpM
 -----END PGP SIGNATURE-----

Merge tag 'pci-v6.15-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull pci fix from Bjorn Helgaas:

 - Revert a reset patch that broke VFIO passthrough because devices
   ended up with no available reset mechanisms (Alex Williamson)

* tag 'pci-v6.15-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
  Revert "PCI: Avoid reset when disabled via sysfs"
2025-04-17 16:00:31 -07:00
Huacai Chen
1f3303aa92 PCI: Add ACS quirk for Loongson PCIe
Loongson PCIe Root Ports don't advertise an ACS capability, but they do not
allow peer-to-peer transactions between Root Ports. Add an ACS quirk so
each Root Port can be in a separate IOMMU group.

Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20250403040756.720409-1-chenhuacai@loongson.cn
2025-04-17 15:26:46 -05:00
Wilfred Mallawa
d24eba726a PCI: Print the actual delay time in pci_bridge_wait_for_secondary_bus()
Print the delay amount that pcie_wait_for_link_delay() is invoked with
instead of the hardcoded 1000ms value in the debug info print.

Fixes: 7b3ba09feb ("PCI/PM: Shorten pci_bridge_wait_for_secondary_bus() wait time for slow links")
Signed-off-by: Wilfred Mallawa <wilfred.mallawa@wdc.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://patch.msgid.link/20250414001505.21243-2-wilfred.opensource@gmail.com
2025-04-17 15:24:20 -05:00
Lukas Wunner
d46b3918fa PCI: hotplug: Drop superfluous #include directives
In February 2003, historic commit

  https://git.kernel.org/tglx/history/c/280c1c9a0ea4
  ("[PATCH] PCI Hotplug: Replace pcihpfs with sysfs.")

removed all invocations of __get_free_page() and free_page() from the PCI
hotplug core without also removing the #include <linux/pagemap.h>
directive.

It removed all invocations of kern_mount(), mntget() and mntput()
without also removing the #include <linux/mount.h> directive.

It removed all invocations of lookup_hash()
without also removing the #include <linux/namei.h> directive.

It removed all invocations of copy_to_user() and copy_from_user()
without also removing the #include <linux/uaccess.h> directive.

These #include directives are still unnecessary today, so drop them.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/c19e25bf2cefecc14e0822c6a9bb3a7f546258bc.1744640331.git.lukas@wunner.de
2025-04-17 14:36:29 -05:00
Ilpo Järvinen
5fe8d08139 PCI: Use PCI_STD_NUM_BARS instead of 6
pci_read_bases() is given literal 6 that means PCI_STD_NUM_BARS.  Replace
the literal with the define to annotate the code better.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Link: https://patch.msgid.link/20250416100239.6958-1-ilpo.jarvinen@linux.intel.com
2025-04-16 13:21:28 -05:00
Alex Williamson
bc0b828ef6 Revert "PCI: Avoid reset when disabled via sysfs"
This reverts commit 479380efe1.

The reset_method attribute on a PCI device is only intended to manage the
availability of function scoped resets for a device.  It was never intended
to restrict resets targeting the bus or slot.

In introducing a restriction that each device must support function level
reset by testing pci_reset_supported(), we essentially create a catch-22,
that a device must have a function scope reset in order to support bus/slot
reset, when we use bus/slot reset to effect a reset of a device that does
not support a function scoped reset, especially multi-function devices.

This breaks the majority of uses cases where vfio-pci uses bus/slot resets
to manage multifunction devices that do not support function scoped resets.

Fixes: 479380efe1 ("PCI: Avoid reset when disabled via sysfs")
Reported-by: Cal Peake <cp@absolutedigital.net>
Closes: https://lore.kernel.org/all/808e1111-27b7-f35b-6d5c-5b275e73677b@absolutedigital.net
Reported-by: Athul Krishna <athul.krishna.kr@protonmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220010
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20250414211828.3530741-1-alex.williamson@redhat.com
2025-04-15 17:27:40 -05:00
Lukas Wunner
2af781a9ed PCI: pciehp: Ignore Link Down/Up caused by Secondary Bus Reset
When a Secondary Bus Reset is issued at a hotplug port, it causes a Data
Link Layer State Changed event as a side effect.  On hotplug ports using
in-band presence detect, it additionally causes a Presence Detect Changed
event.

These spurious events should not result in teardown and re-enumeration of
the device in the slot.  Hence commit 2e35afaefe ("PCI: pciehp: Add
reset_slot() method") masked the Presence Detect Changed Enable bit in the
Slot Control register during a Secondary Bus Reset.  Commit 06a8d89af5
("PCI: pciehp: Disable link notification across slot reset") additionally
masked the Data Link Layer State Changed Enable bit.

However masking those bits only disables interrupt generation (PCIe r6.2
sec 6.7.3.1).  The events are still visible in the Slot Status register
and picked up by the IRQ handler if it runs during a Secondary Bus Reset.
This can happen if the interrupt is shared or if an unmasked hotplug event
occurs, e.g. Attention Button Pressed or Power Fault Detected.

The likelihood of this happening used to be small, so it wasn't much of a
problem in practice.  That has changed with the recent introduction of
bandwidth control in v6.13-rc1 with commit 665745f274 ("PCI/bwctrl:
Re-add BW notification portdrv as PCIe BW controller"):

Bandwidth control shares the interrupt with PCIe hotplug.  A Secondary Bus
Reset causes a Link Bandwidth Notification, so the hotplug IRQ handler
runs, picks up the masked events and tears down the device in the slot.

As a result, Joel reports VFIO passthrough failure of a GPU, which Ilpo
root-caused to the incorrect handling of masked hotplug events.

Clearly, a more reliable way is needed to ignore spurious hotplug events.

For Downstream Port Containment, a new ignore mechanism was introduced by
commit a97396c6eb ("PCI: pciehp: Ignore Link Down/Up caused by DPC").
It has been working reliably for the past four years.

Adapt it for Secondary Bus Resets.

Introduce two helpers to annotate code sections which cause spurious link
changes:  pci_hp_ignore_link_change() and pci_hp_unignore_link_change()
Use those helpers in lieu of masking interrupts in the Slot Control
register.

Introduce a helper to check whether such a code section is executing
concurrently and if so, await it:  pci_hp_spurious_link_change()
Invoke the helper in the hotplug IRQ thread pciehp_ist().  Re-use the
IRQ thread's existing code which ignores DPC-induced link changes unless
the link is unexpectedly down after reset recovery or the device was
replaced during the bus reset.

That code block in pciehp_ist() was previously only executed if a Data
Link Layer State Changed event has occurred.  Additionally execute it for
Presence Detect Changed events.  That's necessary for compatibility with
PCIe r1.0 hotplug ports because Data Link Layer State Changed didn't exist
before PCIe r1.1.  DPC was added with PCIe r3.1 and thus DPC-capable
hotplug ports always support Data Link Layer State Changed events.
But the same cannot be assumed for Secondary Bus Reset, which already
existed in PCIe r1.0.

Secondary Bus Reset is only one of many causes of spurious link changes.
Others include runtime suspend to D3cold, firmware updates or FPGA
reconfiguration.  The new pci_hp_{,un}ignore_link_change() helpers may be
used by all kinds of drivers to annotate such code sections, hence their
declarations are publicly visible in <linux/pci.h>.  A case in point is
the Mellanox Ethernet driver which disables a firmware reset feature if
the Ethernet card is attached to a hotplug port, see commit 3d7a3f2612
("net/mlx5: Nack sync reset request when HotPlug is enabled").  Going
forward, PCIe hotplug will be able to cope gracefully with all such use
cases once the code sections are properly annotated.

The new helpers internally use two bits in struct pci_dev's priv_flags as
well as a wait_queue.  This mirrors what was done for DPC by commit
a97396c6eb ("PCI: pciehp: Ignore Link Down/Up caused by DPC").  That may
be insufficient if spurious link changes are caused by multiple sources
simultaneously.  An example might be a Secondary Bus Reset issued by AER
during FPGA reconfiguration.  If this turns out to happen in real life,
support for it can easily be added by replacing the PCI_LINK_CHANGING flag
with an atomic_t counter incremented by pci_hp_ignore_link_change() and
decremented by pci_hp_unignore_link_change().  Instead of awaiting a zero
PCI_LINK_CHANGING flag, the pci_hp_spurious_link_change() helper would
then simply await a zero counter.

Fixes: 665745f274 ("PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller")
Reported-by: Joel Mathew Thomas <proxy0@tutamail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219765
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Joel Mathew Thomas <proxy0@tutamail.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://patch.msgid.link/d04deaf49d634a2edf42bf3c06ed81b4ca54d17b.1744298239.git.lukas@wunner.de
2025-04-15 16:03:44 -05:00
Lukas Wunner
c3be50f754 PCI: pciehp: Ignore Presence Detect Changed caused by DPC
Commit a97396c6eb ("PCI: pciehp: Ignore Link Down/Up caused by DPC")
amended PCIe hotplug to not bring down the slot upon Data Link Layer State
Changed events caused by Downstream Port Containment.

However Keith reports off-list that if the slot uses in-band presence
detect (i.e. Presence Detect State is derived from Data Link Layer Link
Active), DPC also causes a spurious Presence Detect Changed event.

This needs to be ignored as well.

Unfortunately there's no register indicating that in-band presence detect
is used.  PCIe r5.0 sec 7.5.3.10 introduced the In-Band PD Disable bit in
the Slot Control Register.  The PCIe hotplug driver sets this bit on
ports supporting it.  But older ports may still use in-band presence
detect.

If in-band presence detect can be disabled, Presence Detect Changed events
occurring during DPC must not be ignored because they signal device
replacement.  On all other ports, device replacement cannot be detected
reliably because the Presence Detect Changed event could be a side effect
of DPC.  On those (older) ports, perform a best-effort device replacement
check by comparing the Vendor ID, Device ID and other data in Config Space
with the values cached in struct pci_dev.  Use the existing helper
pciehp_device_replaced() to accomplish this.  It is currently #ifdef'ed to
CONFIG_PM_SLEEP in pciehp_core.c, so move it to pciehp_hpc.c where most
other functions accessing config space reside.

Reported-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://patch.msgid.link/fa264ff71952915c4e35a53c89eb0cde8455a5c5.1744298239.git.lukas@wunner.de
2025-04-15 15:59:15 -05:00
Jonathan Currier
cf761e3dac PCI/MSI: Add an option to write MSIX ENTRY_DATA before any reads
Commit 7d5ec3d361 ("PCI/MSI: Mask all unused MSI-X entries") introduced a
readl() from ENTRY_VECTOR_CTRL before the writel() to ENTRY_DATA.

This is correct, however some hardware, like the Sun Neptune chips, the NIU
module, will cause an error and/or fatal trap if any MSIX table entry is
read before the corresponding ENTRY_DATA field is written to.

Add an optional early writel() in msix_prepare_msi_desc().

Fixes: 7d5ec3d361 ("PCI/MSI: Mask all unused MSI-X entries")
Signed-off-by: Jonathan Currier <dullfire@yahoo.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20241117234843.19236-2-dullfire@yahoo.com
2025-04-15 08:32:18 +02:00
Zhangfei Gao
c8ba3f8aff PCI: Run quirk_huawei_pcie_sva() before arm_smmu_probe_device()
quirk_huawei_pcie_sva() sets properties needed by arm_smmu_probe_device(),
but bcb81ac6ae ("iommu: Get DT/ACPI parsing into the proper probe path")
changed the iommu_probe_device() flow so arm_smmu_probe_device() is now
invoked before the quirk, leading to failures like this:

  reg-dummy reg-dummy: late IOMMU probe at driver bind, something fishy here!
  WARNING: CPU: 0 PID: 1 at drivers/iommu/iommu.c:449 __iommu_probe_device+0x140/0x570
  RIP: 0010:__iommu_probe_device+0x140/0x570

The SR-IOV enumeration ordering changes like this:

  pci_iov_add_virtfn
    pci_device_add
      pci_fixup_device(pci_fixup_header)      <--
      device_add
        bus_notify
          iommu_bus_notifier
  +         iommu_probe_device
  +           arm_smmu_probe_device
    pci_bus_add_device
      pci_fixup_device(pci_fixup_final)       <--
      device_attach
        driver_probe_device
          really_probe
            pci_dma_configure
              acpi_dma_configure_id
  -             iommu_probe_device
  -               arm_smmu_probe_device

The non-SR-IOV case is similar in that pci_device_add() is called from
pci_scan_single_device() in the generic enumeration path and
pci_bus_add_device() is called later, after all host bridges have been
enumerated.

Declare quirk_huawei_pcie_sva() as a header fixup to ensure that it happens
before arm_smmu_probe_device().

Fixes: bcb81ac6ae ("iommu: Get DT/ACPI parsing into the proper probe path")
Reported-by: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com>
Closes: https://lore.kernel.org/all/SJ1PR11MB61295DE21A1184AEE0786E25B9D22@SJ1PR11MB6129.namprd11.prod.outlook.com/
Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
[bhelgaas: commit log, add failure info and reporter]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250317011352.5806-1-zhangfei.gao@linaro.org
2025-04-11 12:53:21 -05:00
Heiner Kallweit
74a70e80da PCI: Remove pci_fixup_cardbus()
Since 1c7f4fe86f ("powerpc/pci: Remove pcibios_setup_bus_devices()")
there's no architecture left setting pci_fixup_cardbus. Therefore remove
support from PCI core.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/8de7da4c-2b16-4ee1-8c42-0d04f3c821c6@gmail.com
2025-04-10 09:33:52 -05:00
Philipp Stanner
855c634930 PCI: Remove pcim_iounmap_regions()
All users of the deprecated function pcim_iounmap_regions() have been
ported by now. Remove it.

Signed-off-by: Philipp Stanner <pstanner@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Zijun Hu <quic_zijuhu@quicinc.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://patch.msgid.link/20250327110707.20025-4-phasta@kernel.org
2025-04-09 14:22:44 -05:00
Thomas Gleixner
71296eae58 PCI/TPH: Replace the broken MSI-X control word update
The driver walks the MSI descriptors to test whether a descriptor exists
for a given index. That's just abuse of the MSI internals.

The same test can be done with a single function call by looking up whether
there is a Linux interrupt number assigned at the index.

What's worse is that the function is completely unserialized against
modifications of the MSI-X control by operations issued from the interrupt
core. It also brings the PCI/MSI-X internal cached control word out of
sync.

Remove the trainwreck and invoke the function provided by the PCI/MSI core
to update it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/all/20250319105506.744271447@linutronix.de
2025-04-09 20:47:30 +02:00
Thomas Gleixner
d5124a9957 PCI/MSI: Provide a sane mechanism for TPH
The PCI/TPH driver fiddles with the MSI-X control word of an active
interrupt completely unserialized against concurrent operations issued
from the interrupt core. It also brings the PCI/MSI-X internal cached
control word out of sync.

Provide a function, which has the required serialization and keeps the
control word cache in sync.

Unfortunately this requires to look up and lock the interrupt descriptor,
which should be only done in the interrupt core code. But confining this
particular oddity in the PCI/MSI core is the lesser of all evil. A
interrupt core implementation would require a larger pile of infrastructure
and indirections for dubious value.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/all/20250319105506.683663807@linutronix.de
2025-04-09 20:47:30 +02:00
Thomas Gleixner
6552e90e2a PCI: hv: Switch MSI descriptor locking to guard()
Convert the code to use the new guard(msi_descs_lock).

No functional change intended.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Acked-by: Wei Liu <wei.liu@kernel.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/all/20250319105506.624838146@linutronix.de
2025-04-09 20:47:30 +02:00
Thomas Gleixner
891146645e PCI/MSI: Switch msix_capability_init() to guard(msi_desc_lock)
Split the lock protected functionality of msix_capability_init() out into a
helper function and use guard(msi_desc_lock) to replace the lock/unlock
pair.

Simplify the error path in the helper function by utilizing a custom
cleanup to get rid of the remaining gotos.

No functional change intended.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/all/20250319105506.564105011@linutronix.de
2025-04-09 20:47:30 +02:00
Thomas Gleixner
f11cc2af8f PCI/MSI: Switch msi_capability_init() to guard(msi_desc_lock)
Split the lock protected functionality of msi_capability_init() out into a
helper function and use guard(msi_desc_lock) to replace the lock/unlock
pair.

No functional change intended.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/all/20250319105506.504992208@linutronix.de
2025-04-09 20:47:30 +02:00
Thomas Gleixner
5c0ba4f9d2 PCI/MSI: Use __free() for affinity masks
Let cleanup handle the freeing of the affinity mask. That prepares for
switching the MSI descriptor locking to a guard().

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/all/20250319105506.444764312@linutronix.de
2025-04-09 20:47:30 +02:00
Thomas Gleixner
b0c44a5ec3 PCI/MSI: Set pci_dev:: Msi_enabled late
The comment claiming that pci_dev::msi_enabled has to be set across setup
is a leftover from ancient code versions. Nothing in the setup code
requires the flag to be set anymore.

Set it in the success path and remove the extra goto label.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/all/20250319105506.383222333@linutronix.de
2025-04-09 20:47:29 +02:00
Thomas Gleixner
497f68cff6 PCI/MSI: Use guard(msi_desc_lock) where applicable
Convert the trivial cases of msi_desc_lock/unlock() pairs.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/all/20250319105506.322536126@linutronix.de
2025-04-09 20:47:29 +02:00
Jiri Slaby (SUSE)
fdc348121f irqdomain: pci: Switch to of_fwnode_handle()
of_node_to_fwnode() is irqdomain's reimplementation of the "officially"
defined of_fwnode_handle(). The former is in the process of being removed,
so use the latter instead.

Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250319092951.37667-8-jirislaby@kernel.org
2025-04-07 12:15:14 -05:00
Thomas Gleixner
8fa7292fee treewide: Switch/rename to timer_delete[_sync]()
timer_delete[_sync]() replaces del_timer[_sync](). Convert the whole tree
over and remove the historical wrapper inlines.

Conversion was done with coccinelle plus manual fixups where necessary.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-04-05 10:30:12 +02:00
Linus Torvalds
48552153cf iommufd 6.15 merge window pull
Two significant new items:
 
 - Allow reporting IOMMU HW events to userspace when the events are clearly
   linked to a device. This is linked to the VIOMMU object and is intended to
   be used by a VMM to forward HW events to the virtual machine as part of
   emulating a vIOMMU. ARM SMMUv3 is the first driver to use this
   mechanism. Like the existing fault events the data is delivered through
   a simple FD returning event records on read().
 
 - PASID support in VFIO. "Process Address Space ID" is a PCI feature that
   allows the device to tag all PCI DMA operations with an ID. The IOMMU
   will then use the ID to select a unique translation for those DMAs. This
   is part of Intel's vIOMMU support as VT-D HW requires the hypervisor to
   manage each PASID entry. The support is generic so any VFIO user could
   attach any translation to a PASID, and the support should work on ARM
   SMMUv3 as well. AMD requires additional driver work.
 
 Some minor updates, along with fixes:
 
 - Prevent using nested parents with fault's, no driver support today
 
 - Put a single "cookie_type" value in the iommu_domain to indicate what
   owns the various opaque owner fields
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZ+q6NgAKCRCFwuHvBreF
 YZ3zAQDbl4/Z0O+CLN2AXq4Zeiyq1HTSoF94hzqmm7lQ17zTIwD8CCdyLXHvupaq
 tkBIv5IovpaxlrSk6M0kh2K8vPCk9Qk=
 =CIM3
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd

Pull iommufd updates from Jason Gunthorpe:
 "Two significant new items:

   - Allow reporting IOMMU HW events to userspace when the events are
     clearly linked to a device.

     This is linked to the VIOMMU object and is intended to be used by a
     VMM to forward HW events to the virtual machine as part of
     emulating a vIOMMU. ARM SMMUv3 is the first driver to use this
     mechanism. Like the existing fault events the data is delivered
     through a simple FD returning event records on read().

   - PASID support in VFIO.

     The "Process Address Space ID" is a PCI feature that allows the
     device to tag all PCI DMA operations with an ID. The IOMMU will
     then use the ID to select a unique translation for those DMAs. This
     is part of Intel's vIOMMU support as VT-D HW requires the
     hypervisor to manage each PASID entry.

     The support is generic so any VFIO user could attach any
     translation to a PASID, and the support should work on ARM SMMUv3
     as well. AMD requires additional driver work.

  Some minor updates, along with fixes:

   - Prevent using nested parents with fault's, no driver support today

   - Put a single "cookie_type" value in the iommu_domain to indicate
     what owns the various opaque owner fields"

* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: (49 commits)
  iommufd: Test attach before detaching pasid
  iommufd: Fix iommu_vevent_header tables markup
  iommu: Convert unreachable() to BUG()
  iommufd: Balance veventq->num_events inc/dec
  iommufd: Initialize the flags of vevent in iommufd_viommu_report_event()
  iommufd/selftest: Add coverage for reporting max_pasid_log2 via IOMMU_HW_INFO
  iommufd: Extend IOMMU_GET_HW_INFO to report PASID capability
  vfio: VFIO_DEVICE_[AT|DE]TACH_IOMMUFD_PT support pasid
  vfio-iommufd: Support pasid [at|de]tach for physical VFIO devices
  ida: Add ida_find_first_range()
  iommufd/selftest: Add coverage for iommufd pasid attach/detach
  iommufd/selftest: Add test ops to test pasid attach/detach
  iommufd/selftest: Add a helper to get test device
  iommufd/selftest: Add set_dev_pasid in mock iommu
  iommufd: Allow allocating PASID-compatible domain
  iommu/vt-d: Add IOMMU_HWPT_ALLOC_PASID support
  iommufd: Enforce PASID-compatible domain for RID
  iommufd: Support pasid attach/replace
  iommufd: Enforce PASID-compatible domain in PASID path
  iommufd/device: Add pasid_attach array to track per-PASID attach
  ...
2025-04-01 18:03:46 -07:00
Linus Torvalds
eb0ece1602 - The 6 patch series "Enable strict percpu address space checks" from
Uros Bizjak uses x86 named address space qualifiers to provide
   compile-time checking of percpu area accesses.
 
   This has caused a small amount of fallout - two or three issues were
   reported.  In all cases the calling code was founf to be incorrect.
 
 - The 4 patch series "Some cleanup for memcg" from Chen Ridong
   implements some relatively monir cleanups for the memcontrol code.
 
 - The 17 patch series "mm: fixes for device-exclusive entries (hmm)"
   from David Hildenbrand fixes a boatload of issues which David found then
   using device-exclusive PTE entries when THP is enabled.  More work is
   needed, but this makes thins better - our own HMM selftests now succeed.
 
 - The 2 patch series "mm: zswap: remove z3fold and zbud" from Yosry
   Ahmed remove the z3fold and zbud implementations.  They have been
   deprecated for half a year and nobody has complained.
 
 - The 5 patch series "mm: further simplify VMA merge operation" from
   Lorenzo Stoakes implements numerous simplifications in this area.  No
   runtime effects are anticipated.
 
 - The 4 patch series "mm/madvise: remove redundant mmap_lock operations
   from process_madvise()" from SeongJae Park rationalizes the locking in
   the madvise() implementation.  Performance gains of 20-25% were observed
   in one MADV_DONTNEED microbenchmark.
 
 - The 12 patch series "Tiny cleanup and improvements about SWAP code"
   from Baoquan He contains a number of touchups to issues which Baoquan
   noticed when working on the swap code.
 
 - The 2 patch series "mm: kmemleak: Usability improvements" from Catalin
   Marinas implements a couple of improvements to the kmemleak user-visible
   output.
 
 - The 2 patch series "mm/damon/paddr: fix large folios access and
   schemes handling" from Usama Arif provides a couple of fixes for DAMON's
   handling of large folios.
 
 - The 3 patch series "mm/damon/core: fix wrong and/or useless
   damos_walk() behaviors" from SeongJae Park fixes a few issues with the
   accuracy of kdamond's walking of DAMON regions.
 
 - The 3 patch series "expose mapping wrprotect, fix fb_defio use" from
   Lorenzo Stoakes changes the interaction between framebuffer deferred-io
   and core MM.  No functional changes are anticipated - this is
   preparatory work for the future removal of page structure fields.
 
 - The 4 patch series "mm/damon: add support for hugepage_size DAMOS
   filter" from Usama Arif adds a DAMOS filter which permits the filtering
   by huge page sizes.
 
 - The 4 patch series "mm: permit guard regions for file-backed/shmem
   mappings" from Lorenzo Stoakes extends the guard region feature from its
   present "anon mappings only" state.  The feature now covers shmem and
   file-backed mappings.
 
 - The 4 patch series "mm: batched unmap lazyfree large folios during
   reclamation" from Barry Song cleans up and speeds up the unmapping for
   pte-mapped large folios.
 
 - The 18 patch series "reimplement per-vma lock as a refcount" from
   Suren Baghdasaryan puts the vm_lock back into the vma.  Our reasons for
   pulling it out were largely bogus and that change made the code more
   messy.  This patchset provides small (0-10%) improvements on one
   microbenchmark.
 
 - The 5 patch series "Docs/mm/damon: misc DAMOS filters documentation
   fixes and improves" from SeongJae Park does some maintenance work on the
   DAMON docs.
 
 - The 27 patch series "hugetlb/CMA improvements for large systems" from
   Frank van der Linden addresses a pile of issues which have been observed
   when using CMA on large machines.
 
 - The 2 patch series "mm/damon: introduce DAMOS filter type for unmapped
   pages" from SeongJae Park enables users of DMAON/DAMOS to filter my the
   page's mapped/unmapped status.
 
 - The 19 patch series "zsmalloc/zram: there be preemption" from Sergey
   Senozhatsky teaches zram to run its compression and decompression
   operations preemptibly.
 
 - The 12 patch series "selftests/mm: Some cleanups from trying to run
   them" from Brendan Jackman fixes a pile of unrelated issues which
   Brendan encountered while runnimg our selftests.
 
 - The 2 patch series "fs/proc/task_mmu: add guard region bit to pagemap"
   from Lorenzo Stoakes permits userspace to use /proc/pid/pagemap to
   determine whether a particular page is a guard page.
 
 - The 7 patch series "mm, swap: remove swap slot cache" from Kairui Song
   removes the swap slot cache from the allocation path - it simply wasn't
   being effective.
 
 - The 5 patch series "mm: cleanups for device-exclusive entries (hmm)"
   from David Hildenbrand implements a number of unrelated cleanups in this
   code.
 
 - The 5 patch series "mm: Rework generic PTDUMP configs" from Anshuman
   Khandual implements a number of preparatoty cleanups to the
   GENERIC_PTDUMP Kconfig logic.
 
 - The 8 patch series "mm/damon: auto-tune aggregation interval" from
   SeongJae Park implements a feedback-driven automatic tuning feature for
   DAMON's aggregation interval tuning.
 
 - The 5 patch series "Fix lazy mmu mode" from Ryan Roberts fixes some
   issues in powerpc, sparc and x86 lazy MMU implementations.  Ryan did
   this in preparation for implementing lazy mmu mode for arm64 to optimize
   vmalloc.
 
 - The 2 patch series "mm/page_alloc: Some clarifications for migratetype
   fallback" from Brendan Jackman reworks some commentary to make the code
   easier to follow.
 
 - The 3 patch series "page_counter cleanup and size reduction" from
   Shakeel Butt cleans up the page_counter code and fixes a size increase
   which we accidentally added late last year.
 
 - The 3 patch series "Add a command line option that enables control of
   how many threads should be used to allocate huge pages" from Thomas
   Prescher does that.  It allows the careful operator to significantly
   reduce boot time by tuning the parallalization of huge page
   initialization.
 
 - The 3 patch series "Fix calculations in trace_balance_dirty_pages()
   for cgwb" from Tang Yizhou fixes the tracing output from the dirty page
   balancing code.
 
 - The 9 patch series "mm/damon: make allow filters after reject filters
   useful and intuitive" from SeongJae Park improves the handling of allow
   and reject filters.  Behaviour is made more consistent and the
   documention is updated accordingly.
 
 - The 5 patch series "Switch zswap to object read/write APIs" from Yosry
   Ahmed updates zswap to the new object read/write APIs and thus permits
   the removal of some legacy code from zpool and zsmalloc.
 
 - The 6 patch series "Some trivial cleanups for shmem" from Baolin Wang
   does as it claims.
 
 - The 20 patch series "fs/dax: Fix ZONE_DEVICE page reference counts"
   from Alistair Popple regularizes the weird ZONE_DEVICE page refcount
   handling in DAX, permittig the removal of a number of special-case
   checks.
 
 - The 4 patch series "refactor mremap and fix bug" from Lorenzo Stoakes
   is a preparatoty refactoring and cleanup of the mremap() code.
 
 - The 20 patch series "mm: MM owner tracking for large folios (!hugetlb)
   + CONFIG_NO_PAGE_MAPCOUNT" from David Hildenbrand reworks the manner in
   which we determine whether a large folio is known to be mapped
   exclusively into a single MM.
 
 - The 8 patch series "mm/damon: add sysfs dirs for managing DAMOS
   filters based on handling layers" from SeongJae Park adds a couple of
   new sysfs directories to ease the management of DAMON/DAMOS filters.
 
 - The 13 patch series "arch, mm: reduce code duplication in mem_init()"
   from Mike Rapoport consolidates many per-arch implementations of
   mem_init() into code generic code, where that is practical.
 
 - The 13 patch series "mm/damon/sysfs: commit parameters online via
   damon_call()" from SeongJae Park continues the cleaning up of sysfs
   access to DAMON internal data.
 
 - The 3 patch series "mm: page_ext: Introduce new iteration API" from
   Luiz Capitulino reworks the page_ext initialization to fix a boot-time
   crash which was observed with an unusual combination of compile and
   cmdline options.
 
 - The 8 patch series "Buddy allocator like (or non-uniform) folio split"
   from Zi Yan reworks the code to split a folio into smaller folios.  The
   main benefit is lessened memory consumption: fewer post-split folios are
   generated.
 
 - The 2 patch series "Minimize xa_node allocation during xarry split"
   from Zi Yan reduces the number of xarray xa_nodes which are generated
   during an xarray split.
 
 - The 2 patch series "drivers/base/memory: Two cleanups" from Gavin Shan
   performs some maintenance work on the drivers/base/memory code.
 
 - The 3 patch series "Add tracepoints for lowmem reserves, watermarks
   and totalreserve_pages" from Martin Liu adds some more tracepoints to
   the page allocator code.
 
 - The 4 patch series "mm/madvise: cleanup requests validations and
   classifications" from SeongJae Park cleans up some warts which SeongJae
   observed during his earlier madvise work.
 
 - The 3 patch series "mm/hwpoison: Fix regressions in memory failure
   handling" from Shuai Xue addresses two quite serious regressions which
   Shuai has observed in the memory-failure implementation.
 
 - The 5 patch series "mm: reliable huge page allocator" from Johannes
   Weiner makes huge page allocations cheaper and more reliable by reducing
   fragmentation.
 
 - The 5 patch series "Minor memcg cleanups & prep for memdescs" from
   Matthew Wilcox is preparatory work for the future implementation of
   memdescs.
 
 - The 4 patch series "track memory used by balloon drivers" from Nico
   Pache introduces a way to track memory used by our various balloon
   drivers.
 
 - The 2 patch series "mm/damon: introduce DAMOS filter type for active
   pages" from Nhat Pham permits users to filter for active/inactive pages,
   separately for file and anon pages.
 
 - The 2 patch series "Adding Proactive Memory Reclaim Statistics" from
   Hao Jia separates the proactive reclaim statistics from the direct
   reclaim statistics.
 
 - The 2 patch series "mm/vmscan: don't try to reclaim hwpoison folio"
   from Jinjiang Tu fixes our handling of hwpoisoned pages within the
   reclaim code.
 -----BEGIN PGP SIGNATURE-----
 
 iHQEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ+nZaAAKCRDdBJ7gKXxA
 jsOWAPiP4r7CJHMZRK4eyJOkvS1a1r+TsIarrFZtjwvf/GIfAQCEG+JDxVfUaUSF
 Ee93qSSLR1BkNdDw+931Pu0mXfbnBw==
 =Pn2K
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2025-03-30-16-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - The series "Enable strict percpu address space checks" from Uros
   Bizjak uses x86 named address space qualifiers to provide
   compile-time checking of percpu area accesses.

   This has caused a small amount of fallout - two or three issues were
   reported. In all cases the calling code was found to be incorrect.

 - The series "Some cleanup for memcg" from Chen Ridong implements some
   relatively monir cleanups for the memcontrol code.

 - The series "mm: fixes for device-exclusive entries (hmm)" from David
   Hildenbrand fixes a boatload of issues which David found then using
   device-exclusive PTE entries when THP is enabled. More work is
   needed, but this makes thins better - our own HMM selftests now
   succeed.

 - The series "mm: zswap: remove z3fold and zbud" from Yosry Ahmed
   remove the z3fold and zbud implementations. They have been deprecated
   for half a year and nobody has complained.

 - The series "mm: further simplify VMA merge operation" from Lorenzo
   Stoakes implements numerous simplifications in this area. No runtime
   effects are anticipated.

 - The series "mm/madvise: remove redundant mmap_lock operations from
   process_madvise()" from SeongJae Park rationalizes the locking in the
   madvise() implementation. Performance gains of 20-25% were observed
   in one MADV_DONTNEED microbenchmark.

 - The series "Tiny cleanup and improvements about SWAP code" from
   Baoquan He contains a number of touchups to issues which Baoquan
   noticed when working on the swap code.

 - The series "mm: kmemleak: Usability improvements" from Catalin
   Marinas implements a couple of improvements to the kmemleak
   user-visible output.

 - The series "mm/damon/paddr: fix large folios access and schemes
   handling" from Usama Arif provides a couple of fixes for DAMON's
   handling of large folios.

 - The series "mm/damon/core: fix wrong and/or useless damos_walk()
   behaviors" from SeongJae Park fixes a few issues with the accuracy of
   kdamond's walking of DAMON regions.

 - The series "expose mapping wrprotect, fix fb_defio use" from Lorenzo
   Stoakes changes the interaction between framebuffer deferred-io and
   core MM. No functional changes are anticipated - this is preparatory
   work for the future removal of page structure fields.

 - The series "mm/damon: add support for hugepage_size DAMOS filter"
   from Usama Arif adds a DAMOS filter which permits the filtering by
   huge page sizes.

 - The series "mm: permit guard regions for file-backed/shmem mappings"
   from Lorenzo Stoakes extends the guard region feature from its
   present "anon mappings only" state. The feature now covers shmem and
   file-backed mappings.

 - The series "mm: batched unmap lazyfree large folios during
   reclamation" from Barry Song cleans up and speeds up the unmapping
   for pte-mapped large folios.

 - The series "reimplement per-vma lock as a refcount" from Suren
   Baghdasaryan puts the vm_lock back into the vma. Our reasons for
   pulling it out were largely bogus and that change made the code more
   messy. This patchset provides small (0-10%) improvements on one
   microbenchmark.

 - The series "Docs/mm/damon: misc DAMOS filters documentation fixes and
   improves" from SeongJae Park does some maintenance work on the DAMON
   docs.

 - The series "hugetlb/CMA improvements for large systems" from Frank
   van der Linden addresses a pile of issues which have been observed
   when using CMA on large machines.

 - The series "mm/damon: introduce DAMOS filter type for unmapped pages"
   from SeongJae Park enables users of DMAON/DAMOS to filter my the
   page's mapped/unmapped status.

 - The series "zsmalloc/zram: there be preemption" from Sergey
   Senozhatsky teaches zram to run its compression and decompression
   operations preemptibly.

 - The series "selftests/mm: Some cleanups from trying to run them" from
   Brendan Jackman fixes a pile of unrelated issues which Brendan
   encountered while runnimg our selftests.

 - The series "fs/proc/task_mmu: add guard region bit to pagemap" from
   Lorenzo Stoakes permits userspace to use /proc/pid/pagemap to
   determine whether a particular page is a guard page.

 - The series "mm, swap: remove swap slot cache" from Kairui Song
   removes the swap slot cache from the allocation path - it simply
   wasn't being effective.

 - The series "mm: cleanups for device-exclusive entries (hmm)" from
   David Hildenbrand implements a number of unrelated cleanups in this
   code.

 - The series "mm: Rework generic PTDUMP configs" from Anshuman Khandual
   implements a number of preparatoty cleanups to the GENERIC_PTDUMP
   Kconfig logic.

 - The series "mm/damon: auto-tune aggregation interval" from SeongJae
   Park implements a feedback-driven automatic tuning feature for
   DAMON's aggregation interval tuning.

 - The series "Fix lazy mmu mode" from Ryan Roberts fixes some issues in
   powerpc, sparc and x86 lazy MMU implementations. Ryan did this in
   preparation for implementing lazy mmu mode for arm64 to optimize
   vmalloc.

 - The series "mm/page_alloc: Some clarifications for migratetype
   fallback" from Brendan Jackman reworks some commentary to make the
   code easier to follow.

 - The series "page_counter cleanup and size reduction" from Shakeel
   Butt cleans up the page_counter code and fixes a size increase which
   we accidentally added late last year.

 - The series "Add a command line option that enables control of how
   many threads should be used to allocate huge pages" from Thomas
   Prescher does that. It allows the careful operator to significantly
   reduce boot time by tuning the parallalization of huge page
   initialization.

 - The series "Fix calculations in trace_balance_dirty_pages() for cgwb"
   from Tang Yizhou fixes the tracing output from the dirty page
   balancing code.

 - The series "mm/damon: make allow filters after reject filters useful
   and intuitive" from SeongJae Park improves the handling of allow and
   reject filters. Behaviour is made more consistent and the documention
   is updated accordingly.

 - The series "Switch zswap to object read/write APIs" from Yosry Ahmed
   updates zswap to the new object read/write APIs and thus permits the
   removal of some legacy code from zpool and zsmalloc.

 - The series "Some trivial cleanups for shmem" from Baolin Wang does as
   it claims.

 - The series "fs/dax: Fix ZONE_DEVICE page reference counts" from
   Alistair Popple regularizes the weird ZONE_DEVICE page refcount
   handling in DAX, permittig the removal of a number of special-case
   checks.

 - The series "refactor mremap and fix bug" from Lorenzo Stoakes is a
   preparatoty refactoring and cleanup of the mremap() code.

 - The series "mm: MM owner tracking for large folios (!hugetlb) +
   CONFIG_NO_PAGE_MAPCOUNT" from David Hildenbrand reworks the manner in
   which we determine whether a large folio is known to be mapped
   exclusively into a single MM.

 - The series "mm/damon: add sysfs dirs for managing DAMOS filters based
   on handling layers" from SeongJae Park adds a couple of new sysfs
   directories to ease the management of DAMON/DAMOS filters.

 - The series "arch, mm: reduce code duplication in mem_init()" from
   Mike Rapoport consolidates many per-arch implementations of
   mem_init() into code generic code, where that is practical.

 - The series "mm/damon/sysfs: commit parameters online via
   damon_call()" from SeongJae Park continues the cleaning up of sysfs
   access to DAMON internal data.

 - The series "mm: page_ext: Introduce new iteration API" from Luiz
   Capitulino reworks the page_ext initialization to fix a boot-time
   crash which was observed with an unusual combination of compile and
   cmdline options.

 - The series "Buddy allocator like (or non-uniform) folio split" from
   Zi Yan reworks the code to split a folio into smaller folios. The
   main benefit is lessened memory consumption: fewer post-split folios
   are generated.

 - The series "Minimize xa_node allocation during xarry split" from Zi
   Yan reduces the number of xarray xa_nodes which are generated during
   an xarray split.

 - The series "drivers/base/memory: Two cleanups" from Gavin Shan
   performs some maintenance work on the drivers/base/memory code.

 - The series "Add tracepoints for lowmem reserves, watermarks and
   totalreserve_pages" from Martin Liu adds some more tracepoints to the
   page allocator code.

 - The series "mm/madvise: cleanup requests validations and
   classifications" from SeongJae Park cleans up some warts which
   SeongJae observed during his earlier madvise work.

 - The series "mm/hwpoison: Fix regressions in memory failure handling"
   from Shuai Xue addresses two quite serious regressions which Shuai
   has observed in the memory-failure implementation.

 - The series "mm: reliable huge page allocator" from Johannes Weiner
   makes huge page allocations cheaper and more reliable by reducing
   fragmentation.

 - The series "Minor memcg cleanups & prep for memdescs" from Matthew
   Wilcox is preparatory work for the future implementation of memdescs.

 - The series "track memory used by balloon drivers" from Nico Pache
   introduces a way to track memory used by our various balloon drivers.

 - The series "mm/damon: introduce DAMOS filter type for active pages"
   from Nhat Pham permits users to filter for active/inactive pages,
   separately for file and anon pages.

 - The series "Adding Proactive Memory Reclaim Statistics" from Hao Jia
   separates the proactive reclaim statistics from the direct reclaim
   statistics.

 - The series "mm/vmscan: don't try to reclaim hwpoison folio" from
   Jinjiang Tu fixes our handling of hwpoisoned pages within the reclaim
   code.

* tag 'mm-stable-2025-03-30-16-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (431 commits)
  mm/page_alloc: remove unnecessary __maybe_unused in order_to_pindex()
  x86/mm: restore early initialization of high_memory for 32-bits
  mm/vmscan: don't try to reclaim hwpoison folio
  mm/hwpoison: introduce folio_contain_hwpoisoned_page() helper
  cgroup: docs: add pswpin and pswpout items in cgroup v2 doc
  mm: vmscan: split proactive reclaim statistics from direct reclaim statistics
  selftests/mm: speed up split_huge_page_test
  selftests/mm: uffd-unit-tests support for hugepages > 2M
  docs/mm/damon/design: document active DAMOS filter type
  mm/damon: implement a new DAMOS filter type for active pages
  fs/dax: don't disassociate zero page entries
  MM documentation: add "Unaccepted" meminfo entry
  selftests/mm: add commentary about 9pfs bugs
  fork: use __vmalloc_node() for stack allocation
  docs/mm: Physical Memory: Populate the "Zones" section
  xen: balloon: update the NR_BALLOON_PAGES state
  hv_balloon: update the NR_BALLOON_PAGES state
  balloon_compaction: update the NR_BALLOON_PAGES state
  meminfo: add a per node counter for balloon drivers
  mm: remove references to folio in __memcg_kmem_uncharge_page()
  ...
2025-04-01 09:29:18 -07:00
Linus Torvalds
7d06015d93 pci-v6.15-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmfmt1AUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vxqNw//cC1BlVe4UUVR5r9nfpoFeGAZeDJz
 32naGvZKjwL0tR6dStO/BEZx4QrBp+smVfJfuxtQxfzLHLgMigM2jVhfa7XUmaun
 7yZlJZu4Jmydc57sPf56CFOYOFP6zyPzSaE8u1Eb4IIqpvuoYpvDayDt6PSsLmFS
 PDzqmicT3nuNbbcfE4rYLyL6JsXooKCR1h+NNcDjy7Run9DvQbE6N2PPvXCu6O97
 aC3+kYUydEpgn9DfjBDghe+GBQCkBPldwnWqXBxKDFmYj5bKFujNccS9/IDSEuuX
 oWntDRAXgLWg048sBC1AuJQajF3UaqffRGJkzUBaZWbU/jB9t5N/Z3GpYlXzizRx
 CAqnt/ciGUKVbaESKwoeeIqgK+wG1bnrmoEaJHXFGqjr6sjm2A2T5EzyBMJ1hFwE
 wUq6SDnkp5igG7rWtsBPo/lGa5h/pNlaXng11570ikD2ZfHVfRgwy2MpXYxChrkt
 X2q/lRYU7yyfNJQ8O5LQJ6bYztatjxT0TxXNFv+cxfVrdI7vnQMuaeBE352jn+Lo
 aZ8fTJDFbRXtrZolcoetZjBdHwPIS42wYQtvxo/ylUl64xKEzZEzN09XODjwn74v
 nuOzDtbM0TAjZWxi6bwRFPnemTEAQxPuv1i4VdPQ7yblvC0at2agNBsz6Fdq60GS
 s80YMJVUU0UcVoc=
 =gM1i
 -----END PGP SIGNATURE-----

Merge tag 'pci-v6.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull pci updates from Bjorn Helgaas:
 "Enumeration:

   - Enable Configuration RRS SV, which makes device readiness visible,
     early instead of during child bus scanning (Bjorn Helgaas)

   - Log debug messages about reset methods being used (Bjorn Helgaas)

   - Avoid reset when it has been disabled via sysfs (Nishanth
     Aravamudan)

   - Add common pci-ep-bus.yaml schema for exporting several peripherals
     of a single PCI function via devicetree (Andrea della Porta)

   - Create DT nodes for PCI host bridges to enable loading device tree
     overlays to create platform devices for PCI devices that have
     several features that require multiple drivers (Herve Codina)

  Resource management:

   - Enlarge devres table[] to accommodate bridge windows, ROM, IOV
     BARs, etc., and validate BAR index in devres interfaces (Philipp
     Stanner)

   - Fix typo that repeatedly distributed resources to a bridge instead
     of iterating over subordinate bridges, which resulted in too little
     space to assign some BARs (Kai-Heng Feng)

   - Relax bridge window tail sizing for optional resources, e.g., IOV
     BARs, to avoid failures when removing and re-adding devices (Ilpo
     Järvinen)

   - Allow drivers to enable devices even if we haven't assigned
     optional IOV resources to them (Ilpo Järvinen)

   - Rework handling of optional resources (IOV BARs, ROMs) to reduce
     failures if we can't allocate them (Ilpo Järvinen)

   - Fix a NULL dereference in the SR-IOV VF creation error path (Shay
     Drory)

   - Fix s390 mmio_read/write syscalls, which didn't cause page faults
     in some cases, which broke vfio-pci lazy mapping on first access
     (Niklas Schnelle)

   - Add pdev->non_mappable_bars to replace CONFIG_VFIO_PCI_MMAP, which
     was disabled only for s390 (Niklas Schnelle)

   - Support mmap of PCI resources on s390 except for ISM devices
     (Niklas Schnelle)

  ASPM:

   - Delay pcie_link_state deallocation to avoid dangling pointers that
     cause invalid references during hot-unplug (Daniel Stodden)

  Power management:

   - Allow PCI bridges to go to D3Hot when suspending on all non-x86
     systems (Manivannan Sadhasivam)

  Power control:

   - Create pwrctrl devices in pci_scan_device() to make it more
     symmetric with pci_pwrctrl_unregister() and make pwrctrl devices
     for PCI bridges possible (Manivannan Sadhasivam)

   - Unregister pwrctrl devices in pci_destroy_dev() so DOE, ASPM, etc.
     can still access devices after pci_stop_dev() (Manivannan
     Sadhasivam)

   - If there's a pwrctrl device for a PCI device, skip scanning it
     because the pwrctrl core will rescan the bus after the device is
     powered on (Manivannan Sadhasivam)

   - Add a pwrctrl driver for PCI slots based on voltage regulators
     described via devicetree (Manivannan Sadhasivam)

  Bandwidth control:

   - Add set_pcie_speed.sh to TEST_PROGS to fix issue when executing the
     set_pcie_cooling_state.sh test case (Yi Lai)

   - Avoid a NULL pointer dereference when we run out of bus numbers to
     assign for a bridge secondary bus (Lukas Wunner)

  Hotplug:

   - Drop superfluous pci_hotplug_slot_list, try_module_get() calls, and
     NULL pointer checks (Lukas Wunner)

   - Drop shpchp module init/exit logging, replace shpchp dbg() with
     ctrl_dbg(), and remove unused dbg(), err(), info(), warn() wrappers
     (Ilpo Järvinen)

   - Drop 'shpchp_debug' module parameter in favor of standard dynamic
     debugging (Ilpo Järvinen)

   - Drop unused cpcihp .get_power(), .set_power() function pointers
     (Guilherme Giacomo Simoes)

   - Disable hotplug interrupts in portdrv only when pciehp is not
     enabled to avoid issuing two hotplug commands too close together
     (Feng Tang)

   - Skip pciehp 'device replaced' check if the device has been removed
     to address a deadlock when resuming after a device was removed
     during system sleep (Lukas Wunner)

   - Don't enable pciehp hotplug interupt when resuming in poll mode
     (Ilpo Järvinen)

  Virtualization:

   - Fix bugs in 'pci=config_acs=' kernel command line parameter (Tushar
     Dave)

  DOE:

   - Expose supported DOE features via sysfs (Alistair Francis)

   - Allow DOE support to be enabled even if CXL isn't enabled (Alistair
     Francis)

  Endpoint framework:

   - Convert PCI device data so pci-epf-test works correctly on
     big-endian endpoint systems (Niklas Cassel)

   - Add BAR_RESIZABLE type to endpoint framework and add DWC core
     support for EPF drivers to set BAR_RESIZABLE type and size (Niklas
     Cassel)

   - Fix pci-epf-test double free that causes an oops if the host
     reboots and PERST# deassertion restarts endpoint BAR allocation
     (Christian Bruel)

   - Fix endpoint BAR testing so tests can skip disabled BARs instead of
     reporting them as failures (Niklas Cassel)

   - Widen endpoint test BAR size variable to accommodate BARs larger
     than INT_MAX (Niklas Cassel)

   - Remove unused tools 'pci' build target left over after moving tests
     to tools/testing/selftests/pci_endpoint (Jianfeng Liu)

  Altera PCIe controller driver:

   - Add DT binding and driver support for Agilex family (P-Tile,
     F-Tile, R-Tile) (Matthew Gerlach and D M, Sharath Kumar)

  AMD MDB PCIe controller driver:

   - Add DT binding and driver for AMD MDB (Multimedia DMA Bridge)
     (Thippeswamy Havalige)

  Broadcom STB PCIe controller driver:

   - Add BCM2712 MSI-X DT binding and interrupt controller drivers and
     add softdep on irq_bcm2712_mip driver to ensure that it is loaded
     first (Stanimir Varbanov)

   - Expand inbound window map to 64GB so it can accommodate BCM2712
     (Stanimir Varbanov)

   - Add BCM2712 support and DT updates (Stanimir Varbanov)

   - Apply link speed restriction before bringing link up, not after
     (Jim Quinlan)

   - Update Max Link Speed in Link Capabilities via the internal
     writable register, not the read-only config register (Jim Quinlan)

   - Handle regulator_bulk_get() error to avoid panic when we call
     regulator_bulk_free() later (Jim Quinlan)

   - Disable regulators only when removing the bus immediately below a
     Root Port because we don't support regulators deeper in the
     hierarchy (Jim Quinlan)

   - Make const read-only arrays static (Colin Ian King)

  Cadence PCIe endpoint driver:

   - Correct MSG TLP generation so endpoints can generate INTx messages
     (Hans Zhang)

  Freescale i.MX6 PCIe controller driver:

   - Identify the second controller on i.MX8MQ based on devicetree
     'linux,pci-domain' instead of DBI 'reg' address (Richard Zhu)

   - Remove imx_pcie_cpu_addr_fixup() since dwc core can now derive the
     ATU input address (using parent_bus_offset) from devicetree (Frank
     Li)

  Freescale Layerscape PCIe controller driver:

   - Drop deprecated 'num-ib-windows' and 'num-ob-windows' and
     unnecessary 'status' from example (Krzysztof Kozlowski)

   - Correct the syscon_regmap_lookup_by_phandle_args("fsl,pcie-scfg")
     arg_count to fix probe failure on LS1043A (Ioana Ciornei)

  HiSilicon STB PCIe controller driver:

   - Call phy_exit() to clean up if histb_pcie_probe() fails (Christophe
     JAILLET)

  Intel Gateway PCIe controller driver:

   - Remove intel_pcie_cpu_addr() since dwc core can now derive the ATU
     input address (using parent_bus_offset) from devicetree (Frank Li)

  Intel VMD host bridge driver:

   - Convert vmd_dev.cfg_lock from spinlock_t to raw_spinlock_t so
     pci_ops.read() will never sleep, even on PREEMPT_RT where
     spinlock_t becomes a sleepable lock, to avoid calling a sleeping
     function from invalid context (Ryo Takakura)

  MediaTek PCIe Gen3 controller driver:

   - Remove leftover mac_reset assert for Airoha EN7581 SoC (Lorenzo
     Bianconi)

   - Add EN7581 PBUS controller 'mediatek,pbus-csr' DT property and
     program host bridge memory aperture to this syscon node (Lorenzo
     Bianconi)

  Qualcomm PCIe controller driver:

   - Add qcom,pcie-ipq5332 binding (Varadarajan Narayanan)

   - Add qcom i.MX8QM and i.MX8QXP/DXP optional DMA interrupt (Alexander
     Stein)

   - Add optional dma-coherent DT property for Qualcomm SA8775P (Dmitry
     Baryshkov)

   - Make DT iommu property required for SA8775P and prohibited for
     SDX55 (Dmitry Baryshkov)

   - Add DT IOMMU and DMA-related properties for Qualcomm SM8450 (Dmitry
     Baryshkov)

   - Add endpoint DT properties for SAR2130P and enable endpoint mode in
     driver (Dmitry Baryshkov)

   - Describe endpoint BAR0 and BAR2 as 64-bit only and BAR1 and BAR3 as
     RESERVED (Manivannan Sadhasivam)

  Rockchip DesignWare PCIe controller driver:

   - Describe rk3568 and rk3588 BARs as Resizable, not Fixed (Niklas
     Cassel)

  Synopsys DesignWare PCIe controller driver:

   - Add debugfs-based Silicon Debug, Error Injection, Statistical
     Counter support for DWC (Shradha Todi)

   - Add debugfs property to expose LTSSM status of DWC PCIe link (Hans
     Zhang)

   - Add Rockchip support for DWC debugfs features (Niklas Cassel)

   - Add dw_pcie_parent_bus_offset() to look up the parent bus address
     of a specified 'reg' property and return the offset from the CPU
     physical address (Frank Li)

   - Use dw_pcie_parent_bus_offset() to derive CPU -> ATU addr offset
     via 'reg[config]' for host controllers and 'reg[addr_space]' for
     endpoint controllers (Frank Li)

   - Apply struct dw_pcie.parent_bus_offset in ATU users to remove use
     of .cpu_addr_fixup() when programming ATU (Frank Li)

  TI J721E PCIe driver:

   - Correct the 'link down' interrupt bit for J784S4 (Siddharth
     Vadapalli)

  TI Keystone PCIe controller driver:

   - Describe AM65x BARs 2 and 5 as Resizable (not Fixed) and reduce
     alignment requirement from 1MB to 64KB (Niklas Cassel)

  Xilinx Versal CPM PCIe controller driver:

   - Free IRQ domain in probe error path to avoid leaking it
     (Thippeswamy Havalige)

   - Add DT .compatible "xlnx,versal-cpm5nc-host" and driver support for
     Versal Net CPM5NC Root Port controller (Thippeswamy Havalige)

   - Add driver support for CPM5_HOST1 (Thippeswamy Havalige)

  Miscellaneous:

   - Convert fsl,mpc83xx-pcie binding to YAML (J. Neuschäfer)

   - Use for_each_available_child_of_node_scoped() to simplify apple,
     kirin, mediatek, mt7621, tegra drivers (Zhang Zekun)"

* tag 'pci-v6.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (197 commits)
  PCI: layerscape: Fix arg_count to syscon_regmap_lookup_by_phandle_args()
  PCI: j721e: Fix the value of .linkdown_irq_regfield for J784S4
  misc: pci_endpoint_test: Add support for PCITEST_IRQ_TYPE_AUTO
  PCI: endpoint: pci-epf-test: Expose supported IRQ types in CAPS register
  PCI: dw-rockchip: Endpoint mode cannot raise INTx interrupts
  PCI: endpoint: Add intx_capable to epc_features struct
  dt-bindings: PCI: Add common schema for devices accessible through PCI BARs
  PCI: intel-gw: Remove intel_pcie_cpu_addr()
  PCI: imx6: Remove imx_pcie_cpu_addr_fixup()
  PCI: dwc: Use parent_bus_offset to remove need for .cpu_addr_fixup()
  PCI: dwc: ep: Ensure proper iteration over outbound map windows
  PCI: dwc: ep: Use devicetree 'reg[addr_space]' to derive CPU -> ATU addr offset
  PCI: dwc: ep: Consolidate devicetree handling in dw_pcie_ep_get_resources()
  PCI: dwc: ep: Call epc_create() early in dw_pcie_ep_init()
  PCI: dwc: Use devicetree 'reg[config]' to derive CPU -> ATU addr offset
  PCI: dwc: Add dw_pcie_parent_bus_offset() checking and debug
  PCI: dwc: Add dw_pcie_parent_bus_offset()
  PCI/bwctrl: Fix NULL pointer dereference on bus number exhaustion
  PCI: xilinx-cpm: Add cpm_csr register mapping for CPM5_HOST1 variant
  PCI: brcmstb: Make const read-only arrays static
  ...
2025-03-28 19:36:53 -07:00
Linus Torvalds
112e43e9fd Revert "Merge tag 'irq-msi-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip"
This reverts commit 36f5f026df, reversing
changes made to 43a7eec035.

Thomas says:
 "I just noticed that for some incomprehensible reason, probably sheer
  incompetemce when trying to utilize b4, I managed to merge an outdated
  _and_ buggy version of that series.

  Can you please revert that merge completely?"

Done.

Requested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-03-28 11:22:54 -07:00
Yi Liu
803f97298e iommufd: Extend IOMMU_GET_HW_INFO to report PASID capability
PASID usage requires PASID support in both device and IOMMU. Since the
iommu drivers always enable the PASID capability for the device if it
is supported, this extends the IOMMU_GET_HW_INFO to report the PASID
capability to userspace. Also, enhances the selftest accordingly.

Link: https://patch.msgid.link/r/20250321180143.8468-5-yi.l.liu@intel.com
Cc: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org> #aarch64 platform
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-28 10:07:23 -03:00
Bjorn Helgaas
dea140198b Merge branch 'pci/misc'
- Remove unused tools 'pci' build target left over after moving tests to
  tools/testing/selftests/pci_endpoint (Jianfeng Liu)

- Fix typos and whitespace errors (Bjorn Helgaas)

* pci/misc:
  PCI: Fix typos
  tools/Makefile: Remove pci target

# Conflicts:
#	drivers/pci/endpoint/functions/pci-epf-test.c
2025-03-27 13:15:05 -05:00
Bjorn Helgaas
63c83f1fff Merge branch 'pci/controller/dwc-cpu-addr-fixup'
- Ioremap() msg_res region using res->start (the CPU address), not the ATU
  'cpu_addr', which will be replaced with the ATU input address (which may
  not be the CPU address) (Frank Li)

- Rename struct dw_pcie_ob_atu_cfg.cpu_addr to 'parent_bus_addr' (Frank Li)

- Call devm_pci_alloc_host_bridge() early in dw_pcie_host_init() to keep
  devicetree-related code together (Frank Li)

- Consolidate devicetree handling in dw_pcie_host_get_resources() (Bjorn
  Helgaas)

- Add dw_pcie_parent_bus_offset() to look up the parent bus address of a
  specified 'reg' property and return the offset from the CPU physical
  address (Frank Li)

- Add cross-checking with .cpu_addr_fixup() and debug logging to
  dw_pcie_parent_bus_offset() (Frank Li)

- Use devicetree 'reg[config]' via dw_pcie_parent_bus_offset() to derive
  CPU -> ATU addr offset for host controller (Frank Li)

- Call epc_create() early in dw_pcie_ep_init() to keep devicetree-related
  code together (Bjorn Helgaas)

- Consolidate devicetree handling in dw_pcie_ep_get_resources() (Bjorn
  Helgaas)

- Use devicetree 'reg[addr_space]' via dw_pcie_parent_bus_offset() to
  derive CPU -> ATU addr offset for endpoint controller (Frank Li)

- Update dw_pcie_find_index() to remove assumption that ATU input address
  is non-zero (Frank Li)

- Apply struct dw_pcie.parent_bus_offset in ATU users to remove use of
  .cpu_addr_fixup() when programming ATU (Frank Li)

- Remove imx_pcie_cpu_addr_fixup() since dwc core can now derive the ATU
  input address (using parent_bus_offset) from devicetree (Frank Li)

- Remove intel_pcie_cpu_addr() since dwc core can now derive the ATU input
  address (using parent_bus_offset) from devicetree (Frank Li)

* pci/controller/dwc-cpu-addr-fixup:
  PCI: intel-gw: Remove intel_pcie_cpu_addr()
  PCI: imx6: Remove imx_pcie_cpu_addr_fixup()
  PCI: dwc: Use parent_bus_offset to remove need for .cpu_addr_fixup()
  PCI: dwc: ep: Ensure proper iteration over outbound map windows
  PCI: dwc: ep: Use devicetree 'reg[addr_space]' to derive CPU -> ATU addr offset
  PCI: dwc: ep: Consolidate devicetree handling in dw_pcie_ep_get_resources()
  PCI: dwc: ep: Call epc_create() early in dw_pcie_ep_init()
  PCI: dwc: Use devicetree 'reg[config]' to derive CPU -> ATU addr offset
  PCI: dwc: Add dw_pcie_parent_bus_offset() checking and debug
  PCI: dwc: Add dw_pcie_parent_bus_offset()
  PCI: dwc: Consolidate devicetree handling in dw_pcie_host_get_resources()
  PCI: dwc: Call devm_pci_alloc_host_bridge() early in dw_pcie_host_init()
  PCI: dwc: Rename cpu_addr to parent_bus_addr for ATU configuration
  PCI: dwc: Use resource start as ioremap() input in dw_pcie_pme_turn_off()

# Conflicts:
#	drivers/pci/controller/dwc/pcie-designware.c
#	drivers/pci/controller/dwc/pcie-designware.h
2025-03-27 13:14:59 -05:00
Bjorn Helgaas
79e08f8d4e Merge branch 'pci/controller/xilinx-cpm'
- Free IRQ domain in probe error path to avoid leaking it (Thippeswamy
  Havalige)

- Add DT .compatible "xlnx,versal-cpm5nc-host" and driver support for
  Versal Net CPM5NC Root Port controller (Thippeswamy Havalige)

- Add driver support for CPM5_HOST1 (Thippeswamy Havalige)

* pci/controller/xilinx-cpm:
  PCI: xilinx-cpm: Add cpm_csr register mapping for CPM5_HOST1 variant
  PCI: xilinx-cpm: Add support for Versal Net CPM5NC Root Port controller
  dt-bindings: PCI: xilinx-cpm: Add compatible string for CPM5NC Versal Net host
  PCI: xilinx-cpm: Fix IRQ domain leak in error path of probe
2025-03-27 13:14:51 -05:00
Bjorn Helgaas
a80b04dffe Merge branch 'pci/controller/vmd'
- Convert vmd_dev.cfg_lock from spinlock_t to raw_spinlock_t so
  pci_ops.read() will never sleep, even on PREEMPT_RT where spinlock_t
  becomes a sleepable lock (Ryo Takakura)

* pci/controller/vmd:
  PCI: vmd: Make vmd_dev::cfg_lock a raw_spinlock_t type
2025-03-27 13:14:51 -05:00
Bjorn Helgaas
6547faa1bc Merge branch 'pci/controller/qcom'
- Describe endpoint BAR0 and BAR2 as 64-bit only and BAR1 and BAR3 as
  RESERVED (Manivannan Sadhasivam)

- Add optional dma-coherent DT property for Qualcomm SA8775P (Dmitry
  Baryshkov)

- Make DT iommu property required for SA8775P and prohibited for SDX55
  (Dmitry Baryshkov)

- Add DT iommu and DMA-related properties for Qualcomm SM8450 (Dmitry
  Baryshkov)

- Consolidate DMA vs non-DMA cases in DT (Dmitry Baryshkov)

- Add endpoint DT properties for SAR2130P and enable endpoint mode in
  driver (Dmitry Baryshkov)

* pci/controller/qcom:
  PCI: qcom-ep: Enable EP mode support for SAR2130P
  dt-bindings: PCI: qcom-ep: Add SAR2130P compatible
  dt-bindings: PCI: qcom-ep: Consolidate DMA vs non-DMA cases
  dt-bindings: PCI: qcom-ep: Enable DMA for SM8450
  dt-bindings: PCI: qcom-ep: Describe optional IOMMU
  dt-bindings: PCI: qcom-ep: Describe optional dma-coherent property
  PCI: qcom-ep: Mark BAR0/BAR2 as 64bit BARs and BAR1/BAR3 as RESERVED
2025-03-27 13:14:51 -05:00
Bjorn Helgaas
d7f6f07ece Merge branch 'pci/controller/mediatek'
- Remove leftover mac_reset assert for Airoha EN7581 SoC (Lorenzo Bianconi)

- Add EN7581 PBUS controller 'mediatek,pbus-csr' DT property and program
  host bridge memory aperture to this syscon node (Lorenzo Bianconi)

* pci/controller/mediatek:
  PCI: mediatek-gen3: Fix inconsistent indentation
  PCI: mediatek-gen3: Configure PBUS_CSR registers for EN7581 SoC
  dt-bindings: PCI: mediatek-gen3: Add mediatek,pbus-csr phandle array property
  PCI: mediatek-gen3: Remove leftover mac_reset assert for Airoha EN7581 SoC
2025-03-27 13:14:50 -05:00
Bjorn Helgaas
5edeea2d7b Merge branch 'pci/controller/layerscape'
- Correct the syscon_regmap_lookup_by_phandle_args("fsl,pcie-scfg")
  arg_count to fix probe failure on LS1043A (Ioana Ciornei)

* pci/controller/layerscape:
  PCI: layerscape: Fix arg_count to syscon_regmap_lookup_by_phandle_args()
2025-03-27 13:14:50 -05:00
Bjorn Helgaas
f2d4def0e9 Merge branch 'pci/controller/j721e'
- Correct the 'link down' interrupt bit for J784S4 (Siddharth Vadapalli)

* pci/controller/j721e:
  PCI: j721e: Fix the value of .linkdown_irq_regfield for J784S4
2025-03-27 13:14:50 -05:00
Bjorn Helgaas
ad49cd490e Merge branch 'pci/controller/imx6'
- Identify the second controller on i.MX8MQ based on devicetree
  'linux,pci-domain' instead of DBI 'reg' address (Richard Zhu)

- Use devm_clk_bulk_get_all() to fetch clocks to simplify the code (Richard
  Zhu)

* pci/controller/imx6:
  PCI: imx6: Use devm_clk_bulk_get_all() to fetch clocks
  PCI: imx6: Identify controller via 'linux,pci-domain', not address
2025-03-27 13:14:49 -05:00
Bjorn Helgaas
8c6dadf8af Merge branch 'pci/controller/hyperv'
- Correct comment to say that invalidations from a PF driver are delivered
  to the VF endpoint driver, not by the VF driver (Easwar Hariharan)

* pci/controller/hyperv:
  PCI: hv: Correct a comment
2025-03-27 13:14:49 -05:00
Bjorn Helgaas
58746a573a Merge branch 'pci/controller/histb'
- Call phy_exit() to clean up if histb_pcie_probe() fails (Christophe
  JAILLET)

* pci/controller/histb:
  PCI: histb: Fix an error handling path in histb_pcie_probe()
2025-03-27 13:14:49 -05:00
Bjorn Helgaas
ba4751ae1a Merge branch 'pci/controller/dwc'
- Move struct dwc_pcie_vsec_id to include/linux/pcie-dwc.h, where it can be
  shared by debugfs, perf, sysfs, etc (Manivannan Sadhasivam)

- Add dw_pcie_find_vsec_capability() to locate Vendor Specific Extended
  Capabilities (Shradha Todi)

- Add debugfs-based Silicon Debug, Error Injection, Statistical Counter
  support for DWC (Shradha Todi)

- Add debugfs property to expose LTSSM status of DWC PCIe link (Hans Zhang)

- Add Rockchip Vendor ID and Vendor Specific ID of RAS DES Capability so
  the DWC debugfs features work for Rockchip as well (Niklas Cassel)

* pci/controller/dwc:
  PCI: dw-rockchip: Hide broken ATS capability for RK3588 running in EP mode
  PCI: dwc: ep: Add dw_pcie_ep_hide_ext_capability()
  PCI: dwc: ep: Return -ENOMEM for allocation failures
  PCI: dwc: Add Rockchip to the RAS DES allowed vendor list
  PCI: Add Rockchip Vendor ID
  PCI: dwc: Add debugfs property to provide LTSSM status of the PCIe link
  PCI: dwc: Add debugfs based Statistical Counter support for DWC
  PCI: dwc: Add debugfs based Error Injection support for DWC
  PCI: dwc: Add debugfs based Silicon Debug support for DWC
  PCI: dwc: Add helper to find the Vendor Specific Extended Capability (VSEC)
  perf/dwc_pcie: Move common DWC struct definitions to 'pcie-dwc.h'
2025-03-27 13:14:49 -05:00
Bjorn Helgaas
479e4a014b Merge branch 'pci/controller/cadence'
- Correct MSG TLP generation so endpoint can generate INTx messages (Hans
  Zhang)

* pci/controller/cadence:
  PCI: cadence-ep: Fix the driver to send MSG TLP for INTx without data payload
2025-03-27 13:14:48 -05:00
Bjorn Helgaas
b79789646e Merge branch 'pci/controller/brcmstb'
- Add missing of_node refcount release after of_parse_phandle() (Stanimir
  Varbanov)

- Add BCM2712 MSI-X DT binding and interrupt controller drivers (Stanimir
  Varbanov)

- Add brcmstb softdep on irq_bcm2712_mip MIP MSI-X interrupt controller
  driver to ensure that it is loaded first (Stanimir Varbanov)

- Add struct brcm_pcie pointer to pcie_cfg_data so we can reference the
  pcie_cfg_data directly instead of copying it to brcm_pcie (Stanimir
  Varbanov)

- Expand inbound window map to 64GB so it can accommodate BCM2712 (Stanimir
  Varbanov)

- Add BCM2712 support and DT updates (Stanimir Varbanov)

- Apply link speed restriction before bringing link up, not after (Jim
  Quinlan)

- Update Max Link Speed in Link Capabilities via the internal writable
  register, not the read-only config register (Jim Quinlan)

- Handle regulator_bulk_get() error to avoid panic when we call
  regulator_bulk_free() later (Jim Quinlan)

- Disable regulators only when removing the bus immediately below a Root
  Port because we don't support regulators deeper in the hierarchy (Jim
  Quinlan)

- Consistently use config access index/data register offsets from the
  SoC-specific pcie_offsets[] table (Jim Quinlan)

- Update MDIO register fields that reduced CMD from 12 bits to 1 and
  widened PORT from 4 bits to 5 and split it into two parts (Jim Quinlan)

- Make const read-only arrays static (Colin Ian King)

* pci/controller/brcmstb:
  PCI: brcmstb: Make const read-only arrays static
  PCI: brcmstb: Make irq_domain_set_info() parameter cast explicit
  PCI: brcmstb: Make two changes in MDIO register fields
  PCI: brcmstb: Use same constant table for config space access
  PCI: brcmstb: Fix potential premature regulator disabling
  PCI: brcmstb: Fix error path after a call to regulator_bulk_get()
  PCI: brcmstb: Do not assume that register field starts at LSB
  PCI: brcmstb: Use internal register to change link capability
  PCI: brcmstb: Set generation limit before PCIe link up
  PCI: brcmstb: Add BCM2712 support
  PCI: brcmstb: Expand inbound window size up to 64GB
  PCI: brcmstb: Reuse pcie_cfg_data structure
  PCI: brcmstb: Add a softdep to MIP MSI-X driver
  irqchip: Add Broadcom BCM2712 MSI-X interrupt controller
  dt-bindings: PCI: brcmstb: Update bindings for PCIe on BCM2712
  dt-bindings: interrupt-controller: Add BCM2712 MSI-X bindings
  PCI: brcmstb: Fix missing of_node_put() in brcm_pcie_probe()
2025-03-27 13:14:48 -05:00
Bjorn Helgaas
c51638f15e Merge branch 'pci/controller/amd-mdb'
- Add DT binding and driver for AMD MDB (Multimedia DMA Bridge)
  (Thippeswamy Havalige)

* pci/controller/amd-mdb:
  PCI: amd-mdb: Add AMD MDB Root Port driver
  dt-bindings: PCI: amd-mdb: Add AMD Versal2 MDB PCIe Root Port Bridge
  dt-bindings: PCI: dwc: Add AMD Versal2 MDB SLCR support
2025-03-27 13:14:48 -05:00
Bjorn Helgaas
17dbd3f621 Merge branch 'pci/controller/altera'
- Add DT binding for Agilex family (P-Tile, F-Tile, R-Tile) (Matthew
  Gerlach)

- Add driver support for Agilex family (P-Tile, F-Tile, R-Tile) (D M,
  Sharath Kumar)

* pci/controller/altera:
  PCI: altera: Add Agilex support
  dt-bindings: PCI: altera: Add binding for Agilex
2025-03-27 13:14:47 -05:00
Bjorn Helgaas
8085db1d07 Merge branch 'pci/scoped-cleanup'
- Use for_each_available_child_of_node_scoped() to simplify apple, kirin,
  mediatek, mt7621, tegra drivers (Zhang Zekun)

* pci/scoped-cleanup:
  PCI: tegra: Use helper function for_each_child_of_node_scoped()
  PCI: apple: Use helper function for_each_child_of_node_scoped()
  PCI: mt7621: Use helper function for_each_available_child_of_node_scoped()
  PCI: mediatek: Use helper function for_each_available_child_of_node_scoped()
  PCI: kirin: Tidy up _probe() related function with dev_err_probe()
  PCI: kirin: Use helper function for_each_available_child_of_node_scoped()
2025-03-27 13:14:47 -05:00
Bjorn Helgaas
f775c8a4bb Merge branch 'pci/epf-mhi'
- Update SA8775P device ID (Mrinmay Sarkar)

* pci/epf-mhi:
  PCI: epf-mhi: Update device ID for SA8775P
2025-03-27 13:14:47 -05:00
Bjorn Helgaas
cc28c0e5e7 Merge branch 'pci/endpoint-test'
- Fix endpoint BAR testing so the test can skip disabled BARs instead of
  reporting them as failures (Niklas Cassel)

- Verify that pci_endpoint interrupt tests set the correct IRQ type
  (Kunihiko Hayashi)

- Fix interpretation of pci_endpoint_test_bars_read_bar() error returns
  (Niklas Cassel)

- Fix potential string truncation in pci_endpoint_test_probe() (Niklas
  Cassel)

- Increase endpoint test BAR size variable to accommodate BARs larger than
  INT_MAX (Niklas Cassel)

- Release IRQs to avoid leak in pci_endpoint interrupt tests (Kunihiko
  Hayashi)

- Log the correct IRQ type when pci_endpoint IRQ request test fails
  (Kunihiko Hayashi)

- Remove pci_endpoint_test irq_type and no_msi globals; instead use
  test->irq_type (Kunihiko Hayashi)

- Remove unnecessary use of managed IRQ functions in pci_endpoint_test
  (Kunihiko Hayashi)

- Add and use IRQ_TYPE_* defines in pci_endpoint_test (Niklas Cassel)

- Add struct pci_epc_features.intx_capable and note that RK3568 and RK3588
  can't raise INTx interrupts (Niklas Cassel)

- Expose supported IRQ types in CAPS so pci_endpoint_test can set
  appropriate type (Niklas Cassel)

- Add PCITEST_IRQ_TYPE_AUTO to pci_endpoint_test for cases where the IRQ
  type doesn't matter (Niklas Cassel)

* pci/endpoint-test:
  misc: pci_endpoint_test: Add support for PCITEST_IRQ_TYPE_AUTO
  PCI: endpoint: pci-epf-test: Expose supported IRQ types in CAPS register
  PCI: dw-rockchip: Endpoint mode cannot raise INTx interrupts
  PCI: endpoint: Add intx_capable to epc_features struct
  selftests: pci_endpoint: Use IRQ_TYPE_* defines from UAPI header
  misc: pci_endpoint_test: Use IRQ_TYPE_* defines from UAPI header
  PCI: endpoint: pcitest: Add IRQ_TYPE_* defines to UAPI header
  misc: pci_endpoint_test: Do not use managed IRQ functions
  misc: pci_endpoint_test: Remove global 'irq_type' and 'no_msi'
  misc: pci_endpoint_test: Fix 'irq_type' to convey the correct type
  misc: pci_endpoint_test: Fix displaying 'irq_type' after 'request_irq' error
  misc: pci_endpoint_test: Avoid issue of interrupts remaining after request_irq error
  misc: pci_endpoint_test: Handle BAR sizes larger than INT_MAX
  misc: pci_endpoint_test: Give disabled BARs a distinct error code
  misc: pci_endpoint_test: Fix potential truncation in pci_endpoint_test_probe()
  misc: pci_endpoint_test: Fix pci_endpoint_test_bars_read_bar() error handling
  selftests: pci_endpoint: Add GET_IRQTYPE checks to each interrupt test
  selftests: pci_endpoint: Skip disabled BARs
2025-03-27 13:14:46 -05:00
Bjorn Helgaas
a113afb84a Merge branch 'pci/endpoint'
- Convert PCI device data so pci-epf-test works correctly on big-endian
  endpoint systems (Niklas Cassel)

- Add BAR_RESIZABLE type to endpoint framework (Niklas Cassel)

- Add pci_epc_bar_size_to_rebar_cap() to convert a size to the Resizable
  BAR Capability so endpoint drivers can configure what the Capability
  register advertises (Niklas Cassel)

- Add DWC core support for EPF drivers to set BAR_RESIZABLE type and size
  via dw_pcie_ep_set_bar() (Niklas Cassel)

- Describe TI AM65x (keystone) BARs 2 and 5 as Resizable, not Fixed
  (Niklas Cassel)

- Reduce TI AM65x (keystone) BAR alignment requirement from 1MB to 64KB
  (Niklas Cassel)

- Describe Rockchip rk3568 and rk3588 BARs as Resizable, not Fixed (Niklas
  Cassel)

- Drop unused devm_pci_epc_destroy() (Zijun Hu)

- Fix pci-epf-test double free that causes an oops if the host reboots and
  PERST# deassertion restarts endpoint BAR allocation (Christian Bruel)

- Drop dw_pcie_ep_find_ext_capability() and use
  dw_pcie_find_ext_capability() instead (Niklas Cassel)

* pci/endpoint:
  PCI: dwc: ep: Remove superfluous function dw_pcie_ep_find_ext_capability()
  PCI: endpoint: pci-epf-test: Fix double free that causes kernel to oops
  PCI: endpoint: Remove unused devm_pci_epc_destroy()
  PCI: dw-rockchip: Describe Resizable BARs as Resizable BARs
  PCI: keystone: Specify correct alignment requirement
  PCI: keystone: Describe Resizable BARs as Resizable BARs
  PCI: dwc: ep: Allow EPF drivers to configure the size of Resizable BARs
  PCI: dwc: ep: Move dw_pcie_ep_find_ext_capability()
  PCI: endpoint: Add pci_epc_bar_size_to_rebar_cap()
  PCI: endpoint: Allow EPF drivers to configure the size of Resizable BARs
  PCI: endpoint: pci-epf-test: Handle endianness properly
2025-03-27 13:14:46 -05:00
Bjorn Helgaas
a1aed6b34f Merge branch 'pci/devtree-create'
- Add device_add_of_node() to set dev->of_node and dev->fwnode only if they
  haven't been set already (Herve Codina)

- Allow of_pci_set_address() to set the DT address property for root bus
  nodes, where there is no PCI bridge to supply the PCI bus/device/function
  part of the property (Herve Codina)

- Create DT nodes for PCI host bridges to enable loading device tree
  overlays to create platform devices for PCI devices that have several
  features that require multiple drivers (Herve Codina)

* pci/devtree-create:
  PCI: of: Create device tree PCI host bridge node
  PCI: of_property: Constify parameter in of_pci_get_addr_flags()
  PCI: of_property: Add support for NULL pdev in of_pci_set_address()
  PCI: of: Use device_{add,remove}_of_node() to attach of_node to existing device
  driver core: Introduce device_{add,remove}_of_node()
2025-03-27 13:14:45 -05:00
Bjorn Helgaas
38d42a6612 Merge branch 'pci/resource'
- Use pci_resource_n() to simplify BAR/window resource lookup (Ilpo
  Järvinen)

- Fix typo that repeatedly distributed resources to a bridge instead of
  iterating over subordinate bridges, which resulted in too little space to
  assign some BARs (Kai-Heng Feng)

- Relax bridge window tail sizing for optional resources, e.g., IOV BARs,
  to avoid failures when removing and re-adding devices (Ilpo Järvinen)

- Fix a double counting error for I/O resources, as we previously did for
  memory resources (Ilpo Järvinen)

- Use resource_set_{range,size}() helpers in more places (Ilpo Järvinen)

- Add pci_resource_is_iov() to identify IOV resources (Ilpo Järvinen)

- Add pci_resource_num() to look up the BAR number from the resource
  pointer (Ilpo Järvinen)

- Add restore_dev_resource() to simplify code that resources saved device
  resources (Ilpo Järvinen)

- Allow drivers to enable devices even if we haven't assigned optional IOV
  resources to them (Ilpo Järvinen)

- Improve debug output during resource reallocation (Ilpo Järvinen)

- Rework handling of optional resources (IOV BARs, ROMs) to reduce failures
  if we can't allocate them (Ilpo Järvinen)

- Move declarations of pci_rescan_bus_bridge_resize(),
  pci_reassign_bridge_resources(), and CardBus-related sizes from
  include/linux/pci.h to drivers/pci/pci.h since they're not used outside
  the PCI core (Ilpo Järvinen)

- Make pci_setup_bridge() static (Ilpo Järvinen)

- Fix a NULL dereference in the SR-IOV VF creation error path (Shay Drory)

- Fix s390 mmio_read/write syscalls, which didn't cause page faults in some
  cases, which broke vfio-pci lazy mapping on first access (Niklas
  Schnelle)

- Add pdev->non_mappable_bars to replace CONFIG_VFIO_PCI_MMAP, which was
  disabled only for s390 (Niklas Schnelle)

- Support mmap of PCI resources on s390 except for ISM devices (Niklas
  Schnelle)

* pci/resource:
  s390/pci: Support mmap() of PCI resources except for ISM devices
  s390/pci: Introduce pdev->non_mappable_bars and replace VFIO_PCI_MMAP
  s390/pci: Fix s390_mmio_read/write syscall page fault handling
  PCI: Fix NULL dereference in SR-IOV VF creation error path
  PCI: Move cardbus IO size declarations into pci/pci.h
  PCI: Make pci_setup_bridge() static
  PCI: Move resource reassignment func declarations into pci/pci.h
  PCI: Move pci_rescan_bus_bridge_resize() declaration to pci/pci.h
  PCI: Fix BAR resizing when VF BARs are assigned
  PCI: Do not claim to release resource falsely
  PCI: Increase Resizable BAR support from 512 GB to 128 TB
  PCI: Rework optional resource handling
  PCI: Perform reset_resource() and build fail list in sync
  PCI: Use res->parent to check if resource is assigned
  PCI: Add debug print when releasing resources before retry
  PCI: Indicate optional resource assignment failures
  PCI: Always have realloc_head in __assign_resources_sorted()
  PCI: Extend enable to check for any optional resource
  PCI: Add restore_dev_resource()
  PCI: Remove incorrect comment from pci_reassign_resource()
  PCI: Consolidate assignment loop next round preparation
  PCI: Rename retval to ret
  PCI: Use while loop and break instead of gotos
  PCI: Refactor pdev_sort_resources() & __dev_sort_resources()
  PCI: Converge return paths in __assign_resources_sorted()
  PCI: Add dev & res local variables to resource assignment funcs
  PCI: Add pci_resource_num() helper
  PCI: Check resource_size() separately
  PCI: Add pci_resource_is_iov() to identify IOV resources
  PCI: Use resource_set_{range,size}() helpers
  PCI: Use SZ_* instead of literals in setup-bus.c
  PCI: Fix old_size lower bound in calculate_iosize() too
  PCI: Allow relaxed bridge window tail sizing for optional resources
  PCI: Simplify size1 assignment logic
  PCI: Use min_align, not unrelated add_align, for size0
  PCI: Remove add_align overwrite unrelated to size0
  PCI: Use downstream bridges for distributing resources
  PCI: Cleanup dev->resource + resno to use pci_resource_n()
2025-03-27 13:14:45 -05:00
Bjorn Helgaas
a7a8e7996c Merge branch 'pci/reset'
- Log debug messages about reset methods being used (Bjorn Helgaas)

- Avoid reset when it has been disabled via sysfs (Nishanth Aravamudan)

* pci/reset:
  PCI: Avoid reset when disabled via sysfs
  PCI: Log debug messages about reset method
2025-03-27 13:14:45 -05:00
Bjorn Helgaas
55d25a101d Merge branch 'pci/pwrctrl'
- Create pwrctrl devices in pci_scan_device() to make it more symmetric
  with pci_pwrctrl_unregister() and make pwrctrl devices for PCI bridges
  possible (Manivannan Sadhasivam)

- Unregister pwrctrl devices in pci_destroy_dev() so DOE, ASPM, etc. can
  still access devices after pci_stop_dev() (Manivannan Sadhasivam)

- If there's a pwrctrl device for a PCI device, skip scanning it because
  the pwrctrl core will rescan the bus after the device is powered on
  (Manivannan Sadhasivam)

- Add a pwrctrl driver for PCI slots based on voltage regulators described
  via devicetree (Manivannan Sadhasivam)

* pci/pwrctrl:
  PCI/pwrctrl: Add pwrctrl driver for PCI slots
  dt-bindings: vendor-prefixes: Document the 'pciclass' prefix
  PCI/pwrctrl: Skip scanning for the device further if pwrctrl device is created
  PCI/pwrctrl: Move pci_pwrctrl_unregister() to pci_destroy_dev()
  PCI/pwrctrl: Move creation of pwrctrl devices to pci_scan_device()
2025-03-27 13:14:44 -05:00
Bjorn Helgaas
e91c25c6fc Merge branch 'pci/pm'
- Allow PCI bridges to go to D3Hot on all non-x86 systems (Manivannan
  Sadhasivam)

* pci/pm:
  PCI: Allow PCI bridges to go to D3Hot on all non-x86
2025-03-27 13:14:44 -05:00
Bjorn Helgaas
655ea930fe Merge branch 'pci/hotplug'
- Drop shpchp module init/exit logging (Ilpo Järvinen)

- Replace shpchp dbg() with ctrl_dbg() and remove unused dbg(), err(),
  info(), warn() wrappers (Ilpo Järvinen)

- Drop 'shpchp_debug' module parameter in favor of standard dynamic
  debugging (Ilpo Järvinen)

- Drop unused .get_power(), .set_power() function pointers (Guilherme
  Giacomo Simoes)

- Drop superfluous pci_hotplug_slot_list (Lukas Wunner)

- Drop superfluous try_module_get() calls (Lukas Wunner)

- Drop superfluous NULL pointer checks (Lukas Wunner)

- Pass struct hotplug_slot pointers directly to avoid backpointer
  dereferencing in has_*_file() (Lukas Wunner)

- Inline pci_hp_{create,remove}_module_link() to reduce exported symbols
  (Lukas Wunner)

- Disable hotplug interrupts in portdrv only when pciehp is not enabled to
  prevent issuing two hotplug commands too close together (Feng Tang)

- Skip pciehp 'device replaced' check if the device has been removed to
  address a common deadlock when resuming after a device was removed during
  system sleep (Lukas Wunner)

- Don't enable pciehp hotplug interupt when resuming in poll mode (Ilpo
  Järvinen)

* pci/hotplug:
  PCI: pciehp: Don't enable HPIE when resuming in poll mode
  PCI: pciehp: Avoid unnecessary device replacement check
  PCI/portdrv: Only disable pciehp interrupts early when needed
  PCI: hotplug: Inline pci_hp_{create,remove}_module_link()
  PCI: hotplug: Avoid backpointer dereferencing in has_*_file()
  PCI: hotplug: Drop superfluous NULL pointer checks in has_*_file()
  PCI: hotplug: Drop superfluous try_module_get() calls
  PCI: hotplug: Drop superfluous pci_hotplug_slot_list
  PCI: cpcihp: Remove unused .get_power() and .set_power()
  PCI: shpchp: Remove 'shpchp_debug' module parameter
  PCI: shpchp: Remove unused logging wrappers
  PCI: shpchp: Change dbg() -> ctrl_dbg()
  PCI: shpchp: Remove logging from module init/exit functions
2025-03-27 13:14:44 -05:00
Bjorn Helgaas
e9e224dadd Merge branch 'pci/enumeration'
- Enable Configuration RRS SV early instead of during child bus scanning
  (Bjorn Helgaas)

- Cache offset of Resizable BAR capability to avoid redundant searches for
  it (Bjorn Helgaas)

- Fix reference leaks in pci_register_host_bridge() and
  pci_alloc_child_bus() (Ma Ke)

- Drop put_device() in pci_register_host_bridge() left over from converting
  device_register() to device_add() (Dan Carpenter)

* pci/enumeration:
  PCI: Remove stray put_device() in pci_register_host_bridge()
  PCI: Fix reference leak in pci_alloc_child_bus()
  PCI: Fix reference leak in pci_register_host_bridge()
  PCI: Cache offset of Resizable BAR capability
  PCI: Enable Configuration RRS SV early
2025-03-27 13:14:43 -05:00
Bjorn Helgaas
651aa9052c Merge branch 'pci/doe'
- Rename DOE 'protocol' to 'feature' to follow spec terminology (Alistair
  Francis)

- Expose supported DOE features via sysfs (Alistair Francis)

- Allow DOE support to be enabled even if CXL isn't enabled (Alistair
  Francis)

* pci/doe:
  PCI/DOE: Allow enabling DOE without CXL
  PCI/DOE: Expose DOE features via sysfs
  PCI/DOE: Rename Discovery Response Data Object Contents to type
  PCI/DOE: Rename DOE protocol to feature
2025-03-27 13:14:43 -05:00
Bjorn Helgaas
67b9f18202 Merge branch 'pci/devres'
- Enlarge the devres table[] to accommodate bridge windows, ROM, IOV BARs,
  etc (Philipp Stanner)

- Validate BAR index in devres interfaces (Philipp Stanner)

* pci/devres:
  PCI: Check BAR index for validity
  PCI: Fix wrong length of devres array
2025-03-27 13:14:43 -05:00
Bjorn Helgaas
4d1a2a9244 Merge branch 'pci/bwctrl'
- Add set_pcie_speed.sh to TEST_PROGS to fix issue when executing the
  set_pcie_cooling_state.sh test case (Yi Lai)

- Fix the pcie_bwctrl_select_speed() return value in cases where a
  non-compliant device doesn't advertise valid supported speeds (Ilpo
  Järvinen)

- Avoid a NULL pointer dereference when we run out of bus numbers to assign
  for a bridge secondary bus (Lukas Wunner)

* pci/bwctrl:
  PCI/bwctrl: Fix NULL pointer dereference on bus number exhaustion
  PCI/bwctrl: Fix pcie_bwctrl_select_speed() return type
  selftests/pcie_bwctrl: Add 'set_pcie_speed.sh' to TEST_PROGS
2025-03-27 13:14:42 -05:00
Bjorn Helgaas
2cde6eb252 Merge branch 'pci/aspm'
- Delay pcie_link_state deallocation to avoid dangling pointers that cause
  invalid references during hot-unplug (Daniel Stodden)

* pci/aspm:
  PCI/ASPM: Fix link state exit during switch upstream function removal
2025-03-27 13:14:42 -05:00
Bjorn Helgaas
5a04a18b1a Merge branch 'pci/aer'
- Implement local aer_printk() since AER is the only place that prints a
  message with level depending on the error severity (Ilpo Järvinen)

* pci/aer:
  PCI/ERR: Handle TLP Log in Flit mode
  PCI: Track Flit Mode Status & print it with link status
  PCI/AER: Descope pci_printk() to aer_printk()
2025-03-27 13:14:42 -05:00
Ioana Ciornei
4c8c0ffd41 PCI: layerscape: Fix arg_count to syscon_regmap_lookup_by_phandle_args()
The arg_count parameter to syscon_regmap_lookup_by_phandle_args()
represents the number of argument cells following the phandle. In this
case, the number of arguments should be 1 instead of 2 since the dt
property looks like this:

  fsl,pcie-scfg = <&scfg 0>;

Without this fix, layerscape-pcie fails with the following message on
LS1043A:

  OF: /soc/pcie@3500000: phandle scfg@1570000 needs 2, found 1
  layerscape-pcie 3500000.pcie: No syscfg phandle specified
  layerscape-pcie 3500000.pcie: probe with driver layerscape-pcie failed with error -22

Link: https://lore.kernel.org/r/20250327151949.2765193-1-ioana.ciornei@nxp.com
Fixes: 149fc35734 ("PCI: layerscape: Use syscon_regmap_lookup_by_phandle_args")
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Acked-by: Roy Zang <Roy.Zang@nxp.com>
Cc: stable@vger.kernel.org
2025-03-27 13:11:14 -05:00
Linus Torvalds
336b4dae6d IOMMU Updates for Linux v6.15
Including:
 
 	- Core: IOMMUFD dependencies from Jason:
 	  - Change the iommufd fault handle into an always present hwpt handle in
 	    the domain
 	  - Give iommufd its own SW_MSI implementation along with some IRQ layer
 	    rework
 	  - Improvements to the handle attach API
 
 	- Core: Fixes for probe-issues from Robin
 
 	- Intel VT-d changes:
 	  - Checking for SVA support in domain allocation and attach paths
 	  - Move PCI ATS and PRI configuration into probe paths
 	  - Fix a pentential hang on reboot -f
 	  - Miscellaneous cleanups
 
 	- AMD-Vi changes:
 	  - Support for up to 2k IRQs per PCI device function
 	  - Set of smaller fixes
 
 	- ARM-SMMU changes:
 	  - SMMUv2 devicetree binding updates for Qualcomm implementations
 	    (QCS8300 GPU and MSM8937)
 	  - Clean up SMMUv2 runtime PM implementation to help with wider rework of
 	    pm_runtime_put_autosuspend()
 
 	- Rockchip driver changes:
 	  - Driver adjustments for recent DT probing changes
 
 	- S390 IOMMU changes:
 	  - Support for IOMMU passthrough
 
 	- Apple Dart changes:
 	  - Driver adjustments to meet ISP device requirements
 	  - Null-ptr deref fix
 	  - Disable subpage protection for DART 1
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmfkFSkACgkQK/BELZcB
 GuOVUg/9En4QlF0eeg4E1stszGcOOXn9IkKiGP9iqKpL/WqVVitoagn7nwq0rBFF
 PdqqcGbSPNHctiUursB18DyENOAmRUE8mGG29po9rAfq5wxZ2yA6eLmpFAf9QDCW
 vY3nz2oen1lUrzbPwVI1IR67L6BrtqEUz/1syf295RMr2yXkdmBXb12lbL1rdjv0
 7YfWwlDtMOVsewlZmCuDAAfWHtd7cO/ulDcYwq+GOiSXojVmKw8624EFlFbM9E9f
 tkceE4mjGQnN/CpZzUc48f1DgIW82PVVTek95Cznbk9ACDJQ4VYYwDMx2WszfMzu
 D1CXLiQu0vrxQgFdHn02bw3T4yvXQ4HeawIaTF+1J1gEHbrj6MoVQ88zaO3GnKFk
 yxYI5sfOs5iL6zSMkig4OytXuRmRqDVJhgrF1i4XIM3GXrRKf9EiRZ/1eXb+1gab
 3Q5k7+u4+7vWbaKLsxOdIBNzz02hb7LVndPIFlWmSnr1YY0LFBfwmgbfTAsSINZo
 22Ni8aE2C42lYOZxJ67m3khLpqTZaTERtQIab/gUd5FcuEjxKuHBIzj6SZeC/0Q9
 Y6rzklHxJxxtQLrUJp16IzjpiMJbWH0H2jKstRberOpxk3L5aLrJNd8M1B+CokmP
 LZeKgqgxV5F+VlR2Bap5AXL9ZEuqjCVDIEy79j8XzAzxobDVnVM=
 =a8Pd
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux

Pull iommu updates from Joerg Roedel:
 "Core iommufd dependencies from Jason:
   - Change the iommufd fault handle into an always present hwpt handle
     in the domain
   - Give iommufd its own SW_MSI implementation along with some IRQ
     layer rework
   - Improvements to the handle attach API

  Core fixes for probe-issues from Robin

  Intel VT-d changes:
   - Checking for SVA support in domain allocation and attach paths
   - Move PCI ATS and PRI configuration into probe paths
   - Fix a pentential hang on reboot -f
   - Miscellaneous cleanups

  AMD-Vi changes:
   - Support for up to 2k IRQs per PCI device function
   - Set of smaller fixes

  ARM-SMMU changes:
   - SMMUv2 devicetree binding updates for Qualcomm implementations
     (QCS8300 GPU and MSM8937)
   - Clean up SMMUv2 runtime PM implementation to help with wider rework
     of pm_runtime_put_autosuspend()

  Rockchip driver changes:
   - Driver adjustments for recent DT probing changes

  S390 IOMMU changes:
   - Support for IOMMU passthrough

  Apple Dart changes:
   - Driver adjustments to meet ISP device requirements
   - Null-ptr deref fix
   - Disable subpage protection for DART 1"

* tag 'iommu-updates-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: (54 commits)
  iommu/vt-d: Fix possible circular locking dependency
  iommu/vt-d: Don't clobber posted vCPU IRTE when host IRQ affinity changes
  iommu/vt-d: Put IRTE back into posted MSI mode if vCPU posting is disabled
  iommu: apple-dart: fix potential null pointer deref
  iommu/rockchip: Retire global dma_dev workaround
  iommu/rockchip: Register in a sensible order
  iommu/rockchip: Allocate per-device data sensibly
  iommu/mediatek-v1: Support COMPILE_TEST
  iommu/amd: Enable support for up to 2K interrupts per function
  iommu/amd: Rename DTE_INTTABLEN* and MAX_IRQS_PER_TABLE macro
  iommu/amd: Replace slab cache allocator with page allocator
  iommu/amd: Introduce generic function to set multibit feature value
  iommu: Don't warn prematurely about dodgy probes
  iommu/arm-smmu: Set rpm auto_suspend once during probe
  dt-bindings: arm-smmu: Document QCS8300 GPU SMMU
  iommu: Get DT/ACPI parsing into the proper probe path
  iommu: Keep dev->iommu state consistent
  iommu: Resolve ops in iommu_init_device()
  iommu: Handle race with default domain setup
  iommu: Unexport iommu_fwspec_free()
  ...
2025-03-26 20:10:09 -07:00
Linus Torvalds
87dc996d56 A urgent fix for the XEN related PCI/MSI changes:
XEN used a global variable to disable the masking of MSI interrupts as
   XEN handles that on the hypervisor side. This turned out to be a problem
   with VMD as the PCI devices behind a VMD bridge are not always handled
   by the hypervisor and then require masking by guest.
 
   To solve this the global variable was replaced by a interrupt domain
   specific flag, which is set by the generic XEN PCI/MSI domain, but not by
   VMD or any other domain in the system.
 
   So far, so good. But the implementation (and the reviewer) missed the
   fact, that accessing the domain flag cannot be done directly because
   there are at least two situations, where this fails. Legacy architectures
   are not providing interrupt domains at all. The new MSI parent domains do
   not require to have a domain info pointer. Both cases result in a
   unconditional NULL pointer derefence.
 
   The PCI/MSI code already has a function to query the MSI domain specific
   flag in a safe way, which handles all possible cases of PCI/MSI backends.
 
   So the fix it simply to replace the open coded checks by invoking the
   safe helper to query the flag.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmfkUA4THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoVymD/4xvcTnjjBQgesBWtkZ/wyvOx+zAoZD
 s6ikEnYVzG6jjiob9SXrttJS/fZDuFujmntvq1lp+Lw1MbK/k9qyeKpMUkjVw4V5
 gHgPUZUkKapk2Jf3vYy23cb8CuRp9nANi7r//kzefN/pyEmfuqme3AYNJjusyEyn
 QfhLGE046GI2PLLS+4FKUjk+/aPEk/Pywy3moaSpDVd8Kmsus5GTmxt/AWcwGfcw
 00TSRKstpew+RVLBb7NvSar0D43vtvP/8If6qkIpEttVhB6/4zTREB1z6iPjUSf0
 2+90iubCdSm8wO+y0JDmA13cUct2XqW19M75imSLBIsJAlVtgWgOi2IJF787vDPJ
 3bN2HeScCbl8+0e4Uv+NOi2c906k43MtC4XTrIq0nZjAuRyF9FEtdK+IrcywxuKH
 nuaHtlseGWg97kU49yDTi1k+tRvIpmsibKM4pKZoUZKNMfE2soCFjzj40O7hTcKJ
 LaN/fcDdPGnn+GMPywwIWc7e6kOIYnCqakom/x7e+KWpqfujnM3KLiyOy9iys/oL
 xXfKnMbbmL0CSdUrC7EJn20KFRA2ovAJ+ouzJRkFhCdERCVidY29O3VrstacbevS
 zQKb302PbFraGHa/MiuzyF0BuWuM0T0d12+2wEoVJpN/GTAbgO2vVbk2gvHEReXJ
 eMDn02f5lF659w==
 =ZPMY
 -----END PGP SIGNATURE-----

Merge tag 'irq-urgent-2025-03-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull MSI irq fix from Thomas Gleixner:
 "An urgent fix for the XEN related PCI/MSI changes:

  XEN used a global variable to disable the masking of MSI interrupts as
  XEN handles that on the hypervisor side. This turned out to be a
  problem with VMD as the PCI devices behind a VMD bridge are not always
  handled by the hypervisor and then require masking by guest.

  To solve this the global variable was replaced by a interrupt domain
  specific flag, which is set by the generic XEN PCI/MSI domain, but not
  by VMD or any other domain in the system.

  So far, so good. But the implementation (and the reviewer) missed the
  fact, that accessing the domain flag cannot be done directly because
  there are at least two situations, where this fails.

  Legacy architectures are not providing interrupt domains at all. The
  new MSI parent domains do not require to have a domain info pointer.
  Both cases result in a unconditional NULL pointer derefence.

  The PCI/MSI code already has a function to query the MSI domain
  specific flag in a safe way, which handles all possible cases of
  PCI/MSI backends.

  So the fix it simply to replace the open coded checks by invoking the
  safe helper to query the flag"

* tag 'irq-urgent-2025-03-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  PCI/MSI: Handle the NOMASK flag correctly for all PCI/MSI backends
2025-03-26 13:20:22 -07:00
Thomas Gleixner
3ece3e8e59 PCI/MSI: Handle the NOMASK flag correctly for all PCI/MSI backends
The conversion of the XEN specific global variable pci_msi_ignore_mask to a
MSI domain flag, missed the facts that:

    1) Legacy architectures do not provide a interrupt domain
    2) Parent MSI domains do not necessarily have a domain info attached
   
Both cases result in an unconditional NULL pointer dereference. This was
unfortunatly missed in review and testing revealed it late.

Cure this by using the existing pci_msi_domain_supports() helper, which
handles all possible cases correctly.

Fixes: c3164d2e0d ("PCI/MSI: Convert pci_msi_ignore_mask to per MSI domain flag")
Reported-by: Daniel Gomez <da.gomez@kernel.org>
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Borislav Petkov <bp@alien8.de>
Tested-by: Daniel Gomez <da.gomez@kernel.org>
Link: https://lore.kernel.org/all/87iknwyp2o.ffs@tglx
Closes: https://lore.kernel.org/all/qn7fzggcj6qe6r6gdbwcz23pzdz2jx64aldccmsuheabhmjgrt@tawf5nfwuvw7
2025-03-26 15:28:43 +01:00
Siddharth Vadapalli
d66b5b3362
PCI: j721e: Fix the value of .linkdown_irq_regfield for J784S4
Commit e49ad66781 ("PCI: j721e: Add TI J784S4 PCIe configuration")
assigned the value of .linkdown_irq_regfield for the J784S4 SoC as the
"LINK_DOWN" macro corresponding to BIT(1), and as a result, the Link
Down interrupts on J784S4 SoC are missed.

According to the Technical Reference Manual and Register Documentation
for the J784S4 SoC[1], BIT(1) corresponds to "ENABLE_SYS_EN_PCIE_DPA_1",
which is not the correct field for the link-state interrupt. Instead, it
is BIT(10) of the "PCIE_INTD_ENABLE_REG_SYS_2" register that corresponds
to the link-state field named as "ENABLE_SYS_EN_PCIE_LINK_STATE".

Thus, set .linkdown_irq_regfield to the macro "J7200_LINK_DOWN", which
expands to BIT(10) and was first defined for the J7200 SoC. Other SoCs
already reuse this macro since it accurately represents the "link-state"
field in their respective "PCIE_INTD_ENABLE_REG_SYS_2" register.

1: https://www.ti.com/lit/zip/spruj52

Fixes: e49ad66781 ("PCI: j721e: Add TI J784S4 PCIe configuration")
Cc: stable@vger.kernel.org
Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
[kwilczynski: commit log, add a missing .linkdown_irq_regfield member
set to the J7200_LINK_DOWN macro to struct j7200_pcie_ep_data]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Link: https://lore.kernel.org/r/20250305132018.2260771-1-s-vadapalli@ti.com
2025-03-26 07:06:12 +00:00
Niklas Cassel
7c3b54cf64
PCI: endpoint: pci-epf-test: Expose supported IRQ types in CAPS register
Expose the supported IRQ types in the CAPS register.

This way, the host side driver (drivers/misc/pci_endpoint_test.c) can
know which IRQ types that the endpoint supports.

The host side driver will make use of this information in a follow-up
commit.

Signed-off-by: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Link: https://lore.kernel.org/r/20250310111016.859445-15-cassel@kernel.org
2025-03-26 06:11:52 +00:00
Niklas Cassel
e55c67837a
PCI: dw-rockchip: Endpoint mode cannot raise INTx interrupts
Neither RK3568 or RK3588 supports INTx interrupts.

Since epc_features is zero initialized, this is strictly not needed.
However, setting intx_capable explicitly to false makes it more clear
that neither RK3568 or RK3588 supports INTx interrupts.

No functional change.

Signed-off-by: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Link: https://lore.kernel.org/r/20250310111016.859445-14-cassel@kernel.org
2025-03-26 06:11:49 +00:00
Linus Torvalds
7d20aa5c32 Power management updates for 6.15-rc1
- Manage sysfs attributes and boost frequencies efficiently from
    cpufreq core to reduce boilerplate code in drivers (Viresh Kumar).
 
  - Minor cleanups to cpufreq drivers (Aaron Kling, Benjamin Schneider,
    Dhananjay Ugwekar, Imran Shaik, zuoqian).
 
  - Migrate some cpufreq drivers to using for_each_present_cpu() (Jacky
    Bai).
 
  - cpufreq-qcom-hw DT binding fixes (Krzysztof Kozlowski).
 
  - Use str_enable_disable() helper in cpufreq_online() (Lifeng Zheng).
 
  - Optimize the amd-pstate driver to avoid cases where call paths end
    up calling the same writes multiple times and needlessly caching
    variables through code reorganization, locking overhaul and tracing
    adjustments (Mario Limonciello, Dhananjay Ugwekar).
 
  - Make it possible to avoid enabling capacity-aware scheduling (CAS) in
    the intel_pstate driver and relocate a check for out-of-band (OOB)
    platform handling in it to make it detect OOB before checking HWP
    availability (Rafael Wysocki).
 
  - Fix dbs_update() to avoid inadvertent conversions of negative integer
    values to unsigned int which causes CPU frequency selection to be
    inaccurate in some cases when the "conservative" cpufreq governor is
    in use (Jie Zhan).
 
  - Update the handling of the most recent idle intervals in the menu
    cpuidle governor to prevent useful information from being discarded
    by it in some cases and improve the prediction accuracy (Rafael
    Wysocki).
 
  - Make it possible to tell the intel_idle driver to ignore its built-in
    table of idle states for the given processor, clean up the handling
    of auto-demotion disabling on Baytrail and Cherrytrail chips in it,
    and update its MAINTAINERS entry (David Arcari, Artem Bityutskiy,
    Rafael Wysocki).
 
  - Make some cpuidle drivers use for_each_present_cpu() instead of
    for_each_possible_cpu() during initialization to avoid issues
    occurring when nosmp or maxcpus=0 are used (Jacky Bai).
 
  - Clean up the Energy Model handling code somewhat (Rafael Wysocki).
 
  - Use kfree_rcu() to simplify the handling of runtime Energy Model
    updates (Li RongQing).
 
  - Add an entry for the Energy Model framework to MAINTAINERS as
    properly maintained (Lukasz Luba).
 
  - Address RCU-related sparse warnings in the Energy Model code (Rafael
    Wysocki).
 
  - Remove ENERGY_MODEL dependency on SMP and allow it to be selected
    when DEVFREQ is set without CPUFREQ so it can be used on a wider
    range of systems (Jeson Gao).
 
  - Unify error handling during runtime suspend and runtime resume in the
    core to help drivers to implement more consistent runtime PM error
    handling (Rafael Wysocki).
 
  - Drop a redundant check from pm_runtime_force_resume() and rearrange
    documentation related to __pm_runtime_disable() (Rafael Wysocki).
 
  - Rework the handling of the "smart suspend" driver flag in the PM core
    to avoid issues hat may occur when drivers using it depend on some
    other drivers and clean up the related PM core code (Rafael Wysocki,
    Colin Ian King).
 
  - Fix the handling of devices with the power.direct_complete flag set
    if device_suspend() returns an error for at least one device to avoid
    situations in which some of them may not be resumed (Rafael Wysocki).
 
  - Use mutex_trylock() in hibernate_compressor_param_set() to avoid a
    possible deadlock that may occur if the "compressor" hibernation
    module parameter is accessed during the registration of a new
    ieee80211 device (Lizhi Xu).
 
  - Suppress sleeping parent warning in device_pm_add() in the case when
    new children are added under a device with the power.direct_complete
    set after it has been processed by device_resume() (Xu Yang).
 
  - Remove needless return in three void functions related to system
    wakeup (Zijun Hu).
 
  - Replace deprecated kmap_atomic() with kmap_local_page() in the
    hibernation core code (David Reaver).
 
  - Remove unused helper functions related to system sleep (David Alan
    Gilbert).
 
  - Clean up s2idle_enter() so it does not lock and unlock CPU offline
    in vain and update comments in it (Ulf Hansson).
 
  - Clean up broken white space in dpm_wait_for_children() (Geert
    Uytterhoeven).
 
  - Update the cpupower utility to fix lib version-ing in it and memory
    leaks in error legs, remove hard-coded values, and implement CPU
    physical core querying (Thomas Renninger, John B. Wyatt IV, Shuah
    Khan, Yiwei Lin, Zhongqiu Han).
 -----BEGIN PGP SIGNATURE-----
 
 iQFGBAABCAAwFiEEcM8Aw/RY0dgsiRUR7l+9nS/U47UFAmfhhTYSHHJqd0Byand5
 c29ja2kubmV0AAoJEO5fvZ0v1OO16/gIAKuRiG1fFgUcUSXC1iFu42vrB/1i4wpA
 02GICACqM3K6/5jd3ct/WOU28GUgDs+xcmqH7CnMaM6y9nXEWjWarmSfFekAO+0q
 TPtQ7xTy0hBCB3he1P2uLKBJBin4Wn47U9/rvs4J7mQd5zDxTINKIiVoHg2lEE+s
 HAeSoNRb2sp5IZDm9+/LfhHNYRP1mJ97cbZlymqctGB3xgDL7qMLid/1+gFPHAQS
 4/LXj3IgyU8DpA/j5nhtpaAqjN5g2QxIUfQgADRIcESK99Y/7aAMs1/G0WhJKaay
 9yx+4/xmkGvVCZQx1DphksFLISEzltY0SFWLsoppPzBTGVEW2GQQsNI=
 =LqVy
 -----END PGP SIGNATURE-----

Merge tag 'pm-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "These are dominated by cpufreq updates which in turn are dominated by
  updates related to boost support in the core and drivers and
  amd-pstate driver optimizations.

  Apart from the above, there are some cpuidle updates including a
  rework of the most recent idle intervals handling in the venerable
  menu governor that leads to significant improvements in some
  performance benchmarks, as the governor is now more likely to predict
  a shorter idle duration in some cases, and there are updates of the
  core device power management code, mostly related to system suspend
  and resume, that should help to avoid potential issues arising when
  the drivers of devices depending on one another want to use different
  optimizations.

  There is also a usual collection of assorted fixes and cleanups,
  including removal of some unused code.

  Specifics:

   - Manage sysfs attributes and boost frequencies efficiently from
     cpufreq core to reduce boilerplate code in drivers (Viresh Kumar)

   - Minor cleanups to cpufreq drivers (Aaron Kling, Benjamin Schneider,
     Dhananjay Ugwekar, Imran Shaik, zuoqian)

   - Migrate some cpufreq drivers to using for_each_present_cpu() (Jacky
     Bai)

   - cpufreq-qcom-hw DT binding fixes (Krzysztof Kozlowski)

   - Use str_enable_disable() helper in cpufreq_online() (Lifeng Zheng)

   - Optimize the amd-pstate driver to avoid cases where call paths end
     up calling the same writes multiple times and needlessly caching
     variables through code reorganization, locking overhaul and tracing
     adjustments (Mario Limonciello, Dhananjay Ugwekar)

   - Make it possible to avoid enabling capacity-aware scheduling (CAS)
     in the intel_pstate driver and relocate a check for out-of-band
     (OOB) platform handling in it to make it detect OOB before checking
     HWP availability (Rafael Wysocki)

   - Fix dbs_update() to avoid inadvertent conversions of negative
     integer values to unsigned int which causes CPU frequency selection
     to be inaccurate in some cases when the "conservative" cpufreq
     governor is in use (Jie Zhan)

   - Update the handling of the most recent idle intervals in the menu
     cpuidle governor to prevent useful information from being discarded
     by it in some cases and improve the prediction accuracy (Rafael
     Wysocki)

   - Make it possible to tell the intel_idle driver to ignore its
     built-in table of idle states for the given processor, clean up the
     handling of auto-demotion disabling on Baytrail and Cherrytrail
     chips in it, and update its MAINTAINERS entry (David Arcari, Artem
     Bityutskiy, Rafael Wysocki)

   - Make some cpuidle drivers use for_each_present_cpu() instead of
     for_each_possible_cpu() during initialization to avoid issues
     occurring when nosmp or maxcpus=0 are used (Jacky Bai)

   - Clean up the Energy Model handling code somewhat (Rafael Wysocki)

   - Use kfree_rcu() to simplify the handling of runtime Energy Model
     updates (Li RongQing)

   - Add an entry for the Energy Model framework to MAINTAINERS as
     properly maintained (Lukasz Luba)

   - Address RCU-related sparse warnings in the Energy Model code
     (Rafael Wysocki)

   - Remove ENERGY_MODEL dependency on SMP and allow it to be selected
     when DEVFREQ is set without CPUFREQ so it can be used on a wider
     range of systems (Jeson Gao)

   - Unify error handling during runtime suspend and runtime resume in
     the core to help drivers to implement more consistent runtime PM
     error handling (Rafael Wysocki)

   - Drop a redundant check from pm_runtime_force_resume() and rearrange
     documentation related to __pm_runtime_disable() (Rafael Wysocki)

   - Rework the handling of the "smart suspend" driver flag in the PM
     core to avoid issues hat may occur when drivers using it depend on
     some other drivers and clean up the related PM core code (Rafael
     Wysocki, Colin Ian King)

   - Fix the handling of devices with the power.direct_complete flag set
     if device_suspend() returns an error for at least one device to
     avoid situations in which some of them may not be resumed (Rafael
     Wysocki)

   - Use mutex_trylock() in hibernate_compressor_param_set() to avoid a
     possible deadlock that may occur if the "compressor" hibernation
     module parameter is accessed during the registration of a new
     ieee80211 device (Lizhi Xu)

   - Suppress sleeping parent warning in device_pm_add() in the case
     when new children are added under a device with the
     power.direct_complete set after it has been processed by
     device_resume() (Xu Yang)

   - Remove needless return in three void functions related to system
     wakeup (Zijun Hu)

   - Replace deprecated kmap_atomic() with kmap_local_page() in the
     hibernation core code (David Reaver)

   - Remove unused helper functions related to system sleep (David Alan
     Gilbert)

   - Clean up s2idle_enter() so it does not lock and unlock CPU offline
     in vain and update comments in it (Ulf Hansson)

   - Clean up broken white space in dpm_wait_for_children() (Geert
     Uytterhoeven)

   - Update the cpupower utility to fix lib version-ing in it and memory
     leaks in error legs, remove hard-coded values, and implement CPU
     physical core querying (Thomas Renninger, John B. Wyatt IV, Shuah
     Khan, Yiwei Lin, Zhongqiu Han)"

* tag 'pm-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (139 commits)
  PM: sleep: Fix bit masking operation
  dt-bindings: cpufreq: cpufreq-qcom-hw: Narrow properties on SDX75, SA8775p and SM8650
  dt-bindings: cpufreq: cpufreq-qcom-hw: Drop redundant minItems:1
  dt-bindings: cpufreq: cpufreq-qcom-hw: Add missing constraint for interrupt-names
  dt-bindings: cpufreq: cpufreq-qcom-hw: Add QCS8300 compatible
  cpufreq: Init cpufreq only for present CPUs
  PM: sleep: Fix handling devices with direct_complete set on errors
  cpuidle: Init cpuidle only for present CPUs
  PM: clk: Remove unused pm_clk_remove()
  PM: sleep: core: Fix indentation in dpm_wait_for_children()
  PM: s2idle: Extend comment in s2idle_enter()
  PM: s2idle: Drop redundant locks when entering s2idle
  PM: sleep: Remove unused pm_generic_ wrappers
  cpufreq: tegra186: Share policy per cluster
  cpupower: Make lib versioning scheme more obvious and fix version link
  PM: EM: Rework the depends on for CONFIG_ENERGY_MODEL
  PM: EM: Address RCU-related sparse warnings
  cpupower: Implement CPU physical core querying
  pm: cpupower: remove hard-coded topology depth values
  pm: cpupower: Fix cmd_monitor() error legs to free cpu_topology
  ...
2025-03-25 15:00:18 -07:00
Linus Torvalds
dce3ab4c57 xen: branch for v6.15-rc1
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCZ9/gEwAKCRCAXGG7T9hj
 vlxhAQCRzSCNI8wwvENnuc2OnRyWKy8gq7C5WAOIOJdJ3U+scQEAwKGhPJLwE4IS
 /JDh5PRJgZ4rdMYatuDfldEcSAfRRgw=
 =dF6Z
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-6.15-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen updates from Juergen Gross:

 - cleanup: remove an used function

 - add support for a XenServer specific virtual PCI device

 - fix the handling of a sparse Xen hypervisor symbol table

 - avoid warnings when building the kernel with gcc 15

 - fix use of devices behind a VMD bridge when running as a Xen PV dom0

* tag 'for-linus-6.15-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  PCI/MSI: Convert pci_msi_ignore_mask to per MSI domain flag
  PCI: vmd: Disable MSI remapping bypass under Xen
  xen/pci: Do not register devices with segments >= 0x10000
  xen/pciback: Remove unused pcistub_get_pci_dev
  xenfs/xensyms: respect hypervisor's "next" indication
  xen/mcelog: Add __nonstring annotations for unterminated strings
  xen: Add support for XenServer 6.1 platform device
2025-03-25 14:33:32 -07:00
Linus Torvalds
36f5f026df Updates for MSI interrupts
- Switch the MSI descriptor locking to guards
 
   - Replace the broken PCI/TPH implementation, which lacks any form of
     serialization against concurrent modifications with a properly
     serialized mechanism in the PCI/MSI core code.
 
   - Replace the MSI descriptor abuse in the SCSI/UFS Qualcom driver with
     dedicated driver internal storage.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmff5JcTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYocp6D/9IUXX/8HsBVZwCBY1/SYAC8fnTUNGG
 fMxPTX0BSRNOaChvGkyIllEEkNbh+WDMh1qkV94fusJCRvQVaDnwJZ9zoTMJekiM
 ZEBXdqTiDyqUG5LzFeoYQFu/XBth59HMkaDqCxJEy+YbjnTNamX8QS8A/FZlPNH0
 uRs4m7oQxABL6E/JkjOkC0UOqddZ5BDwbLgXoGq+uDSLTicO9yR6EsigVacoreap
 zmkaUPRmL9NM8onrP2/9LduWwgcxn9u2wvd0uOCoIHBUKrlIpHkuNjieQ4R8QCP0
 b7INNTEnwv815O11EoU+AVoL+sTYjUVOAUzfeNejlUspwvjBASm5lUkyB2TS5nq7
 TtCxFyIS7/eBXwrKeV1w9ZcGjcso0DRJPvBQvhhL+q00Mtlrbpy7NUIF5S/8I60q
 QWLN1h1iKISl9aT5p7K1v1lnGyPTb942PNmRc9Cd8Si3QMt6be3RkAJekPWU78Wd
 Q9fXiMS4vyZhtt1S0vKU1L80Ezg3JVL6tWWpsSdZhMBM8wYiw1PI9NYtYEO2lzP/
 hDHOu4EaItPg5A6Z+IDOj13pddtQsnssiWJmcDvarnJKBOfmAbwOcNGpwcwCbm62
 bephtPc4h7En6/d4lsi8yzK6dSpf7yHLxTRrp9lomMaydEjfsNWcAs/nN2Y+wbr4
 UMuZDSrlTGHP0Q==
 =Vl8Q
 -----END PGP SIGNATURE-----

Merge tag 'irq-msi-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull MSI irq updates from Thomas Gleixner:

 - Switch the MSI descriptor locking to guards

 - Replace the broken PCI/TPH implementation, which lacks any form of
   serialization against concurrent modifications with a properly
   serialized mechanism in the PCI/MSI core code

 - Replace the MSI descriptor abuse in the SCSI/UFS Qualcom driver with
   dedicated driver internal storage

* tag 'irq-msi-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq/msi: Rename msi_[un]lock_descs()
  scsi: ufs: qcom: Remove the MSI descriptor abuse
  PCI/TPH: Replace the broken MSI-X control word update
  PCI/MSI: Provide a sane mechanism for TPH
  PCI: hv: Switch MSI descriptor locking to guard()
  PCI/MSI: Switch to MSI descriptor locking to guard()
  NTB/msi: Switch MSI descriptor locking to lock guard()
  soc: ti: ti_sci_inta_msi: Switch MSI descriptor locking to guard()
  genirq/msi: Use lock guards for MSI descriptor locking
  cleanup: Provide retain_ptr()
  genirq/msi: Make a few functions static
2025-03-25 09:15:17 -07:00
Linus Torvalds
e34c38057a [ Merge note: this pull request depends on you having merged
two locking commits in the locking tree,
 	      part of the locking-core-2025-03-22 pull request. ]
 
 x86 CPU features support:
   - Generate the <asm/cpufeaturemasks.h> header based on build config
     (H. Peter Anvin, Xin Li)
   - x86 CPUID parsing updates and fixes (Ahmed S. Darwish)
   - Introduce the 'setcpuid=' boot parameter (Brendan Jackman)
   - Enable modifying CPU bug flags with '{clear,set}puid='
     (Brendan Jackman)
   - Utilize CPU-type for CPU matching (Pawan Gupta)
   - Warn about unmet CPU feature dependencies (Sohil Mehta)
   - Prepare for new Intel Family numbers (Sohil Mehta)
 
 Percpu code:
   - Standardize & reorganize the x86 percpu layout and
     related cleanups (Brian Gerst)
   - Convert the stackprotector canary to a regular percpu
     variable (Brian Gerst)
   - Add a percpu subsection for cache hot data (Brian Gerst)
   - Unify __pcpu_op{1,2}_N() macros to __pcpu_op_N() (Uros Bizjak)
   - Construct __percpu_seg_override from __percpu_seg (Uros Bizjak)
 
 MM:
   - Add support for broadcast TLB invalidation using AMD's INVLPGB instruction
     (Rik van Riel)
   - Rework ROX cache to avoid writable copy (Mike Rapoport)
   - PAT: restore large ROX pages after fragmentation
     (Kirill A. Shutemov, Mike Rapoport)
   - Make memremap(MEMREMAP_WB) map memory as encrypted by default
     (Kirill A. Shutemov)
   - Robustify page table initialization (Kirill A. Shutemov)
   - Fix flush_tlb_range() when used for zapping normal PMDs (Jann Horn)
   - Clear _PAGE_DIRTY for kernel mappings when we clear _PAGE_RW
     (Matthew Wilcox)
 
 KASLR:
   - x86/kaslr: Reduce KASLR entropy on most x86 systems,
     to support PCI BAR space beyond the 10TiB region
     (CONFIG_PCI_P2PDMA=y) (Balbir Singh)
 
 CPU bugs:
   - Implement FineIBT-BHI mitigation (Peter Zijlstra)
   - speculation: Simplify and make CALL_NOSPEC consistent (Pawan Gupta)
   - speculation: Add a conditional CS prefix to CALL_NOSPEC (Pawan Gupta)
   - RFDS: Exclude P-only parts from the RFDS affected list (Pawan Gupta)
 
 System calls:
   - Break up entry/common.c (Brian Gerst)
   - Move sysctls into arch/x86 (Joel Granados)
 
 Intel LAM support updates: (Maciej Wieczor-Retman)
   - selftests/lam: Move cpu_has_la57() to use cpuinfo flag
   - selftests/lam: Skip test if LAM is disabled
   - selftests/lam: Test get_user() LAM pointer handling
 
 AMD SMN access updates:
   - Add SMN offsets to exclusive region access (Mario Limonciello)
   - Add support for debugfs access to SMN registers (Mario Limonciello)
   - Have HSMP use SMN through AMD_NODE (Yazen Ghannam)
 
 Power management updates: (Patryk Wlazlyn)
   - Allow calling mwait_play_dead with an arbitrary hint
   - ACPI/processor_idle: Add FFH state handling
   - intel_idle: Provide the default enter_dead() handler
   - Eliminate mwait_play_dead_cpuid_hint()
 
 Bootup:
 
 Build system:
   - Raise the minimum GCC version to 8.1 (Brian Gerst)
   - Raise the minimum LLVM version to 15.0.0
     (Nathan Chancellor)
 
 Kconfig: (Arnd Bergmann)
   - Add cmpxchg8b support back to Geode CPUs
   - Drop 32-bit "bigsmp" machine support
   - Rework CONFIG_GENERIC_CPU compiler flags
   - Drop configuration options for early 64-bit CPUs
   - Remove CONFIG_HIGHMEM64G support
   - Drop CONFIG_SWIOTLB for PAE
   - Drop support for CONFIG_HIGHPTE
   - Document CONFIG_X86_INTEL_MID as 64-bit-only
   - Remove old STA2x11 support
   - Only allow CONFIG_EISA for 32-bit
 
 Headers:
   - Replace __ASSEMBLY__ with __ASSEMBLER__ in UAPI and non-UAPI headers
     (Thomas Huth)
 
 Assembly code & machine code patching:
   - x86/alternatives: Simplify alternative_call() interface (Josh Poimboeuf)
   - x86/alternatives: Simplify callthunk patching (Peter Zijlstra)
   - KVM: VMX: Use named operands in inline asm (Josh Poimboeuf)
   - x86/hyperv: Use named operands in inline asm (Josh Poimboeuf)
   - x86/traps: Cleanup and robustify decode_bug() (Peter Zijlstra)
   - x86/kexec: Merge x86_32 and x86_64 code using macros from <asm/asm.h>
     (Uros Bizjak)
   - Use named operands in inline asm (Uros Bizjak)
   - Improve performance by using asm_inline() for atomic locking instructions
     (Uros Bizjak)
 
 Earlyprintk:
   - Harden early_serial (Peter Zijlstra)
 
 NMI handler:
   - Add an emergency handler in nmi_desc & use it in nmi_shootdown_cpus()
     (Waiman Long)
 
 Miscellaneous fixes and cleanups:
 
   - by Ahmed S. Darwish, Andy Shevchenko, Ard Biesheuvel,
     Artem Bityutskiy, Borislav Petkov, Brendan Jackman, Brian Gerst,
     Dan Carpenter, Dr. David Alan Gilbert, H. Peter Anvin,
     Ingo Molnar, Josh Poimboeuf, Kevin Brodsky, Mike Rapoport,
     Lukas Bulwahn, Maciej Wieczor-Retman, Max Grobecker,
     Patryk Wlazlyn, Pawan Gupta, Peter Zijlstra,
     Philip Redkin, Qasim Ijaz, Rik van Riel, Thomas Gleixner,
     Thorsten Blum, Tom Lendacky, Tony Luck, Uros Bizjak,
     Vitaly Kuznetsov, Xin Li, liuye.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmfenkQRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1g1FRAAi6OFTSn/5aeLMI0IMNBxJ6ddQiFc3imd
 7+C/vU5nul4CyDs8mKyj/+f/DDrbkG9lKz3VG631Yl237lXHjD8XWcVMeC/1z/q0
 3zInDIloE9/nBHRPkF6F7fARBLBZ0LFgaBsGrCo7mwpGybiQdqGcqcxllvTbtXaw
 OHta4q6ok+lBDNlfc0v6H4cRnzhmmlKu6Ng0j6UI3V7uFhi3vtxas32ltDQtzorq
 2+jbV6/+kbrrv+xPC+jlzOFhTEKRupNPQXmvyQteoQg6G3kqAKMDvBthGXd1rHuX
 Qa+BoDIifE/2NiVeRwNrhoqYH/pHCzUzDREW5IW8+ca+4XNKuzAC6EuC8CeCzyK1
 q8ZjZjooQW4zEeVFeJYllHONzJYfxfSH5CLsnbcuhq99yfGlrQhF1qL72/Omn1w/
 DfPJM8Zt5zyKvLqUg3Md+fkVCO2wyDNhB61QPzRgHF+yD+rvuDpoqvUWir+w7cSn
 fwEDVZGXlFx6dumtSrqRaTd1nvFt80s8yP2ll09DMvGQ8D/yruS7hndGAmmJVCSW
 NAfd8pSjq5v2+ux2UR92/Cc3VF3SjaUqHBOp/Nq9rESya18ZVa3cJpHhVYYtPIVf
 THW0h07RIkGVKs1uq+5ekLCr/8uAZg58UPIqmhTuW0ttymRHCNfohR45FQZzy+0M
 tJj1oc2TIZw=
 =Dcb3
 -----END PGP SIGNATURE-----

Merge tag 'x86-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull core x86 updates from Ingo Molnar:
 "x86 CPU features support:
   - Generate the <asm/cpufeaturemasks.h> header based on build config
     (H. Peter Anvin, Xin Li)
   - x86 CPUID parsing updates and fixes (Ahmed S. Darwish)
   - Introduce the 'setcpuid=' boot parameter (Brendan Jackman)
   - Enable modifying CPU bug flags with '{clear,set}puid=' (Brendan
     Jackman)
   - Utilize CPU-type for CPU matching (Pawan Gupta)
   - Warn about unmet CPU feature dependencies (Sohil Mehta)
   - Prepare for new Intel Family numbers (Sohil Mehta)

  Percpu code:
   - Standardize & reorganize the x86 percpu layout and related cleanups
     (Brian Gerst)
   - Convert the stackprotector canary to a regular percpu variable
     (Brian Gerst)
   - Add a percpu subsection for cache hot data (Brian Gerst)
   - Unify __pcpu_op{1,2}_N() macros to __pcpu_op_N() (Uros Bizjak)
   - Construct __percpu_seg_override from __percpu_seg (Uros Bizjak)

  MM:
   - Add support for broadcast TLB invalidation using AMD's INVLPGB
     instruction (Rik van Riel)
   - Rework ROX cache to avoid writable copy (Mike Rapoport)
   - PAT: restore large ROX pages after fragmentation (Kirill A.
     Shutemov, Mike Rapoport)
   - Make memremap(MEMREMAP_WB) map memory as encrypted by default
     (Kirill A. Shutemov)
   - Robustify page table initialization (Kirill A. Shutemov)
   - Fix flush_tlb_range() when used for zapping normal PMDs (Jann Horn)
   - Clear _PAGE_DIRTY for kernel mappings when we clear _PAGE_RW
     (Matthew Wilcox)

  KASLR:
   - x86/kaslr: Reduce KASLR entropy on most x86 systems, to support PCI
     BAR space beyond the 10TiB region (CONFIG_PCI_P2PDMA=y) (Balbir
     Singh)

  CPU bugs:
   - Implement FineIBT-BHI mitigation (Peter Zijlstra)
   - speculation: Simplify and make CALL_NOSPEC consistent (Pawan Gupta)
   - speculation: Add a conditional CS prefix to CALL_NOSPEC (Pawan
     Gupta)
   - RFDS: Exclude P-only parts from the RFDS affected list (Pawan
     Gupta)

  System calls:
   - Break up entry/common.c (Brian Gerst)
   - Move sysctls into arch/x86 (Joel Granados)

  Intel LAM support updates: (Maciej Wieczor-Retman)
   - selftests/lam: Move cpu_has_la57() to use cpuinfo flag
   - selftests/lam: Skip test if LAM is disabled
   - selftests/lam: Test get_user() LAM pointer handling

  AMD SMN access updates:
   - Add SMN offsets to exclusive region access (Mario Limonciello)
   - Add support for debugfs access to SMN registers (Mario Limonciello)
   - Have HSMP use SMN through AMD_NODE (Yazen Ghannam)

  Power management updates: (Patryk Wlazlyn)
   - Allow calling mwait_play_dead with an arbitrary hint
   - ACPI/processor_idle: Add FFH state handling
   - intel_idle: Provide the default enter_dead() handler
   - Eliminate mwait_play_dead_cpuid_hint()

  Build system:
   - Raise the minimum GCC version to 8.1 (Brian Gerst)
   - Raise the minimum LLVM version to 15.0.0 (Nathan Chancellor)

  Kconfig: (Arnd Bergmann)
   - Add cmpxchg8b support back to Geode CPUs
   - Drop 32-bit "bigsmp" machine support
   - Rework CONFIG_GENERIC_CPU compiler flags
   - Drop configuration options for early 64-bit CPUs
   - Remove CONFIG_HIGHMEM64G support
   - Drop CONFIG_SWIOTLB for PAE
   - Drop support for CONFIG_HIGHPTE
   - Document CONFIG_X86_INTEL_MID as 64-bit-only
   - Remove old STA2x11 support
   - Only allow CONFIG_EISA for 32-bit

  Headers:
   - Replace __ASSEMBLY__ with __ASSEMBLER__ in UAPI and non-UAPI
     headers (Thomas Huth)

  Assembly code & machine code patching:
   - x86/alternatives: Simplify alternative_call() interface (Josh
     Poimboeuf)
   - x86/alternatives: Simplify callthunk patching (Peter Zijlstra)
   - KVM: VMX: Use named operands in inline asm (Josh Poimboeuf)
   - x86/hyperv: Use named operands in inline asm (Josh Poimboeuf)
   - x86/traps: Cleanup and robustify decode_bug() (Peter Zijlstra)
   - x86/kexec: Merge x86_32 and x86_64 code using macros from
     <asm/asm.h> (Uros Bizjak)
   - Use named operands in inline asm (Uros Bizjak)
   - Improve performance by using asm_inline() for atomic locking
     instructions (Uros Bizjak)

  Earlyprintk:
   - Harden early_serial (Peter Zijlstra)

  NMI handler:
   - Add an emergency handler in nmi_desc & use it in
     nmi_shootdown_cpus() (Waiman Long)

  Miscellaneous fixes and cleanups:
   - by Ahmed S. Darwish, Andy Shevchenko, Ard Biesheuvel, Artem
     Bityutskiy, Borislav Petkov, Brendan Jackman, Brian Gerst, Dan
     Carpenter, Dr. David Alan Gilbert, H. Peter Anvin, Ingo Molnar,
     Josh Poimboeuf, Kevin Brodsky, Mike Rapoport, Lukas Bulwahn, Maciej
     Wieczor-Retman, Max Grobecker, Patryk Wlazlyn, Pawan Gupta, Peter
     Zijlstra, Philip Redkin, Qasim Ijaz, Rik van Riel, Thomas Gleixner,
     Thorsten Blum, Tom Lendacky, Tony Luck, Uros Bizjak, Vitaly
     Kuznetsov, Xin Li, liuye"

* tag 'x86-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (211 commits)
  zstd: Increase DYNAMIC_BMI2 GCC version cutoff from 4.8 to 11.0 to work around compiler segfault
  x86/asm: Make asm export of __ref_stack_chk_guard unconditional
  x86/mm: Only do broadcast flush from reclaim if pages were unmapped
  perf/x86/intel, x86/cpu: Replace Pentium 4 model checks with VFM ones
  perf/x86/intel, x86/cpu: Simplify Intel PMU initialization
  x86/headers: Replace __ASSEMBLY__ with __ASSEMBLER__ in non-UAPI headers
  x86/headers: Replace __ASSEMBLY__ with __ASSEMBLER__ in UAPI headers
  x86/locking/atomic: Improve performance by using asm_inline() for atomic locking instructions
  x86/asm: Use asm_inline() instead of asm() in clwb()
  x86/asm: Use CLFLUSHOPT and CLWB mnemonics in <asm/special_insns.h>
  x86/hweight: Use asm_inline() instead of asm()
  x86/hweight: Use ASM_CALL_CONSTRAINT in inline asm()
  x86/hweight: Use named operands in inline asm()
  x86/stackprotector/64: Only export __ref_stack_chk_guard on CONFIG_SMP
  x86/head/64: Avoid Clang < 17 stack protector in startup code
  x86/kexec: Merge x86_32 and x86_64 code using macros from <asm/asm.h>
  x86/runtime-const: Add the RUNTIME_CONST_PTR assembly macro
  x86/cpu/intel: Limit the non-architectural constant_tsc model checks
  x86/mm/pat: Replace Intel x86_model checks with VFM ones
  x86/cpu/intel: Fix fast string initialization for extended Families
  ...
2025-03-24 22:06:11 -07:00
Frank Li
07ae413e16 PCI: intel-gw: Remove intel_pcie_cpu_addr()
Remove intel_pcie_cpu_addr(), the .cpu_addr_fixup() method, because the dwc
core driver already handles address translation based on the devicetree
description.

[bhelgaas: this does require a minor dts change, but maintainer
Lei Chuan Hua <lchuanhua@maxlinear.com> confirms that the driver is only
used internally to Maxlinear and internal users will update dts:
https://lore.kernel.org/r/BY3PR19MB507667CE7531D863E1E5F8AEBDD82@BY3PR19MB5076.namprd19.prod.outlook.com]

Link: https://lore.kernel.org/r/20250305-intel-v1-1-40db3a685490@nxp.com
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-24 14:58:35 -05:00
Frank Li
b9812179f6 PCI: imx6: Remove imx_pcie_cpu_addr_fixup()
Remove imx_pcie_cpu_addr_fixup, the .cpu_addr_fixup() method, because the
dwc core driver already handles address translation based on the devicetree
description.

Link: https://lore.kernel.org/r/20250315201548.858189-14-helgaas@kernel.org
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Acked-by: Richard Zhu <hongxing.zhu@nxp.com>
2025-03-24 14:58:34 -05:00
Frank Li
befc86a0b3 PCI: dwc: Use parent_bus_offset to remove need for .cpu_addr_fixup()
We know the parent_bus_offset, either computed from a DT reg property (the
offset is the CPU physical addr - the 'config'/'addr_space' address on the
parent bus) or from a .cpu_addr_fixup() (which may have used a host bridge
window offset).

Apply that parent_bus_offset instead of calling .cpu_addr_fixup() when
programming the ATU.

This assumes all intermediate addresses are at the same offset from the CPU
physical addresses.

[bhelgaas: commit log]
Link: https://lore.kernel.org/r/20250315201548.858189-13-helgaas@kernel.org
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-24 14:58:34 -05:00
Frank Li
f3e1dccba0 PCI: dwc: ep: Ensure proper iteration over outbound map windows
Most systems' PCIe outbound map windows have non-zero physical addresses,
but the possibility of encountering zero increased after following commit
("PCI: dwc: Use parent_bus_offset").

'ep->outbound_addr[n]', representing 'parent_bus_address', might be 0 on
some hardware, which trims high address bits through bus fabric before
sending to the PCIe controller.

Replace the iteration logic with 'for_each_set_bit()' to ensure only
allocated map windows are iterated when determining the ATU index from a
given address.

Link: https://lore.kernel.org/r/20250315201548.858189-12-helgaas@kernel.org
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-24 14:58:34 -05:00
Frank Li
f28b3c9c42 PCI: dwc: ep: Use devicetree 'reg[addr_space]' to derive CPU -> ATU addr offset
Endpoint
  ┌───────────────────────────────────────────────┐
  │                             pcie-ep@5f010000  │
  │                             ┌────────────────┐│
  │                             │   Endpoint     ││
  │                             │   PCIe         ││
  │                             │   Controller   ││
  │           bus@5f000000      │             ┌────────►
  │           ┌──────────┐      │             │  ││dynamically
  │           │          │ Outbound Transfer  │  ││allocated
  │┌─────┐    │  Bus     ┼─────►│ ATU  ───────┘  ││PCI Addr
  ││     │    │  Fabric  │Bus   │                ││
  ││ CPU ├───►│          │Addr  │                ││
  ││     │CPU │          │0x8000_0000            ││
  │└─────┘Addr└──────────┘      │                ││
  │       0x7000_0000           └────────────────┘│
  └───────────────────────────────────────────────┘

  bus@5f000000 {
          compatible = "simple-bus";
          ranges = <0x80000000 0x0 0x70000000 0x10000000>;

          pcie-ep@5f010000 {
                  reg = <0x80000000 0x10000000>;
                  reg-names ="addr_space";
                  ...
          };
          ...
  };

In the diagram above, CPU writes data to outbound window address
0x7000_0000, and the bus fabric maps it to 0x8000_0000. The ATU uses
bus address 0x8000_0000 as input address and maps to some PCI address
dynamically allocated by a PCI device driver on the host side.

The pcie-ep@5f010000 'reg[addr_space]' is the parent bus address, which is
the input of PCIe controller, including the ATU.

Set parent_bus_offset, the offset from the CPU address to the PCIe
controller input address using dw_pcie_init_parent_bus_offset().  The
parent_bus_offset is not used yet, so no functional change intended.

[bhelgaas: commit log]
Link: https://lore.kernel.org/r/20250315201548.858189-11-helgaas@kernel.org
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-24 14:58:34 -05:00
Bjorn Helgaas
d7ae671eba PCI: dwc: ep: Consolidate devicetree handling in dw_pcie_ep_get_resources()
Consolidate devicetree resource handling in dw_pcie_ep_get_resources().
No functional change intended.

Link: https://lore.kernel.org/r/20250315201548.858189-10-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
2025-03-24 14:58:34 -05:00
Bjorn Helgaas
92eb132ad1 PCI: dwc: ep: Call epc_create() early in dw_pcie_ep_init()
Move devm_pci_epc_create() to the beginning of dw_pcie_ep_init().

devm_pci_epc_create() is generic code that doesn't depend on any DWC
resource, so moving it earlier keeps all the subsequent devicetree-related
code together.

Link: https://lore.kernel.org/r/20250315201548.858189-9-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
2025-03-24 14:58:34 -05:00
Frank Li
7db02f725d PCI: dwc: Use devicetree 'reg[config]' to derive CPU -> ATU addr offset
The 'ranges' property of a PCI controller's parent can indicate address
translation information. Most system use 1:1 map between CPU physical and
PCI controller input addresses.

But some hardware, like i.MX8QXP, doesn't use 1:1 map.  See below diagram:

              ┌─────────┐                    ┌────────────┐
   ┌─────┐    │         │ IA: 0x8ff8_0000    │            │
   │ CPU ├───►│   ┌────►├─────────────────┐  │ PCI        │
   └─────┘    │   │     │ IA: 0x8ff0_0000 │  │            │
    CPU Addr  │   │  ┌─►├─────────────┐   │  │ Controller │
  0x7ff8_0000─┼───┘  │  │             │   │  │            │
              │      │  │             │   │  │            │   PCI Addr
  0x7ff0_0000─┼──────┘  │             │   └──► IOSpace   ─┼────────────►
              │         │             │      │            │    0
  0x7000_0000─┼────────►├─────────┐   │      │            │
              └─────────┘         │   └──────► CfgSpace  ─┼────────────►
               Bus Fabric         │          │            │    0
                                  │          │            │
                                  └──────────► MemSpace  ─┼────────────►
                          IA: 0x8000_0000    │            │  0x8000_0000
                                             └────────────┘

  bus@5f000000 {
          compatible = "simple-bus";
          #address-cells = <1>;
          #size-cells = <1>;
          ranges = <0x80000000 0x0 0x70000000 0x10000000>;

          pcie@5f010000 {
                  compatible = "fsl,imx8q-pcie";
                  reg = <0x5f010000 0x10000>, <0x8ff00000 0x80000>;
                  reg-names = "dbi", "config";
                  ...
          };
  };

Intermediate address (IA) here means the PCIe controller input address.
The pcie@5f010000 'reg[config]' address is the parent bus (PCIe controller
input) address of CfgSpace.

The ATU in MemSpace is not explicitly described via devicetree, so we
assume the offset from CPU address to intermediate MemSpace address is the
same as that for CfgSpace.

We could use bus@5f000000 'ranges' for the same purpose.

Set parent_bus_offset using dw_pcie_init_parent_bus_offset().  The
parent_bus_offset is not used yet, so no functional change intended.

Link: https://lore.kernel.org/r/20250315201548.858189-8-helgaas@kernel.org
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-24 14:58:34 -05:00
Frank Li
3b69e1d381 PCI: dwc: Add dw_pcie_parent_bus_offset() checking and debug
dw_pcie_parent_bus_offset() looks up the parent bus address of a PCI
controller 'reg' property in devicetree.  If implemented, .cpu_addr_fixup()
is a hard-coded way to get the parent bus address corresponding to a CPU
physical address.

Add debug code to compare the address from .cpu_addr_fixup() with the
address from devicetree.  If they match, warn that .cpu_addr_fixup() is
redundant and should be removed; if they differ, warn that something is
wrong with the devicetree.

If .cpu_addr_fixup() is not implemented, the parent bus address should be
identical to the CPU physical address because we previously ignored the
parent bus address from devicetree.  If the devicetree has a different
parent bus address, warn about it being broken.

[bhelgaas: split debug to separate patch for easier future revert, commit
log]

Link: https://lore.kernel.org/r/20250315201548.858189-7-helgaas@kernel.org
Signed-off-by: Frank Li <Frank.Li@nxp.com>
[bhelgaas: squash Ioana Ciornei <ioana.ciornei@nxp.com> fix for NULL
pointer deref when driver doesn't supply dw_pcie_ops, e.g., layerscape-pcie
https://lore.kernel.org/r/20250319134339.3114817-1-ioana.ciornei@nxp.com]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-24 14:58:01 -05:00
Frank Li
9de3f3cd47 PCI: dwc: Add dw_pcie_parent_bus_offset()
Return the offset from CPU physical address to the parent bus address of
the specified element of the devicetree 'reg' property.

[bhelgaas: cpu_phy_addr -> cpu_phys_addr, return offset, split
.cpu_addr_fixup() checking and debug to separate patch]

Link: https://lore.kernel.org/r/20250315201548.858189-6-helgaas@kernel.org
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-24 13:13:02 -05:00
Lukas Wunner
667f053b05
PCI/bwctrl: Fix NULL pointer dereference on bus number exhaustion
When BIOS neglects to assign bus numbers to PCI bridges, the kernel
attempts to correct that during PCI device enumeration.  If it runs out
of bus numbers, no pci_bus is allocated and the "subordinate" pointer in
the bridge's pci_dev remains NULL.

The PCIe bandwidth controller erroneously does not check for a NULL
subordinate pointer and dereferences it on probe.

Bandwidth control of unusable devices below the bridge is of questionable
utility, so simply error out instead.  This mirrors what PCIe hotplug does
since commit 62e4492c30 ("PCI: Prevent NULL dereference during pciehp
probe").

The PCI core emits a message with KERN_INFO severity if it has run out of
bus numbers.  PCIe hotplug emits an additional message with KERN_ERR
severity to inform the user that hotplug functionality is disabled at the
bridge.  A similar message for bandwidth control does not seem merited,
given that its only purpose so far is to expose an up-to-date link speed
in sysfs and throttle the link speed on certain laptops with limited
Thermal Design Power.  So error out silently.

User-visible messages:

  pci 0000:16:02.0: bridge configuration invalid ([bus 00-00]), reconfiguring
  [...]
  pci_bus 0000:45: busn_res: [bus 45-74] end is updated to 74
  pci 0000:16:02.0: devices behind bridge are unusable because [bus 45-74] cannot be assigned for them
  [...]
  pcieport 0000:16:02.0: pciehp: Hotplug bridge without secondary bus, ignoring
  [...]
  BUG: kernel NULL pointer dereference
  RIP: pcie_update_link_speed
  pcie_bwnotif_enable
  pcie_bwnotif_probe
  pcie_port_probe_service
  really_probe

Fixes: 665745f274 ("PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller")
Reported-by: Wouter Bijlsma <wouter@wouterbijlsma.nl>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219906
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Tested-by: Wouter Bijlsma <wouter@wouterbijlsma.nl>
Cc: stable@vger.kernel.org # v6.13+
Link: https://lore.kernel.org/r/3b6c8d973aedc48860640a9d75d20528336f1f3c.1742669372.git.lukas@wunner.de
2025-03-23 13:37:24 +00:00
Thippeswamy Havalige
9e141923cf
PCI: xilinx-cpm: Add cpm_csr register mapping for CPM5_HOST1 variant
Update the CPM5 check to include CPM5_HOST1 variant. Previously, only
CPM5 was considered when mapping the "cpm_csr" register.

With this change, CPM5_HOST1 is also supported, ensuring proper
resource mapping for this variant.

Signed-off-by: Thippeswamy Havalige <thippeswamy.havalige@amd.com>
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Link: https://lore.kernel.org/r/20250317124136.1317723-1-thippeswamy.havalige@amd.com
2025-03-23 12:40:22 +00:00
Colin Ian King
2d72d81cac
PCI: brcmstb: Make const read-only arrays static
Don't populate the const read-only arrays "data" and "regs" on the
stack at run time, instead make them static.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
[kwilczynski: commit log, wrap overly long line to 80 columns]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Link: https://lore.kernel.org/r/20250317143456.477901-1-colin.i.king@gmail.com
2025-03-23 12:34:38 +00:00
Thippeswamy Havalige
5f3de23d85
PCI: amd-mdb: Add AMD MDB Root Port driver
Add support for AMD MDB (Multimedia DMA Bridge) IP core as Root Port.

The Versal2 devices include MDB Module. The integrated block for MDB
along with the integrated bridge can function as PCIe Root Port
controller at Gen5 32-GT/s operation per lane.

Bridge supports error and INTx interrupts and are handled using platform
specific interrupt line in Versal2.

Signed-off-by: Thippeswamy Havalige <thippeswamy.havalige@amd.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20250228093351.923615-4-thippeswamy.havalige@amd.com
[bhelgaas: only present on ARM64-based SoCs; squash Kconfig dependency on
ARM64 from Geert Uytterhoeven <geert+renesas@glider.be>:
https://lore.kernel.org/r/eaef1dea7edcf146aa377d5e5c5c85a76ff56bae.1742306383.git.geert+renesas@glider.be]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
[kwilczynski: commit log, code comments and error messages clean-up,
drop redundant "depends on PCI" from Kconfig, expose the error code
as part of error messages where appropriatie, change "depends on"
expression to match existing style from other drivers]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-03-23 05:50:59 +00:00
Alistair Francis
6fc6ded50f PCI/DOE: Allow enabling DOE without CXL
PCIe devices (not CXL) can support DOE as well, so allow DOE to be enabled
even if CXL isn't.

Link: https://lore.kernel.org/r/20250306075211.1855177-4-alistair@alistair23.me
Signed-off-by: Alistair Francis <alistair@alistair23.me>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2025-03-21 16:36:34 -05:00
Alistair Francis
2311ab1820 PCI/DOE: Expose DOE features via sysfs
PCIe r6.0 added support for Data Object Exchange (DOE).  When DOE is
supported, the DOE Discovery Feature must be implemented per PCIe r6.1, sec
6.30.1.1. DOE allows a requester to obtain information about the other DOE
features supported by the device.

The kernel already queries the DOE features supported and caches the
values.  Expose the values in sysfs to allow user space to determine which
DOE features are supported by the PCIe device.

By exposing the information to userspace, tools like lspci can relay the
information to users. By listing all of the supported features we can allow
userspace to parse the list, which might include vendor specific features
as well as yet to be supported features.

As the DOE Discovery feature must always be supported we treat it as a
special named attribute case. This allows the usual PCI attribute_group
handling to correctly create the doe_features directory when registering
pci_doe_sysfs_group (otherwise it doesn't and sysfs_add_file_to_group()
will seg fault).

After this patch is supported you can see something like this when
attaching a DOE device:

  $ ls /sys/devices/pci0000:00/0000:00:02.0//doe*
  0001:01        0001:02        doe_discovery

Link: https://lore.kernel.org/r/20250306075211.1855177-3-alistair@alistair23.me
Signed-off-by: Alistair Francis <alistair@alistair23.me>
[bhelgaas: drop pci_doe_sysfs_init() stub return, make
DEVICE_ATTR_RO(doe_discovery) static]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2025-03-21 16:36:01 -05:00
Niklas Schnelle
888bd8322d s390/pci: Introduce pdev->non_mappable_bars and replace VFIO_PCI_MMAP
The ability to map PCI resources to user-space is controlled by global
defines. For vfio there is VFIO_PCI_MMAP which is only disabled on s390 and
controls mapping of PCI resources using vfio-pci with a fallback option via
the pread()/pwrite() interface.

For the PCI core there is ARCH_GENERIC_PCI_MMAP_RESOURCE which enables a
generic implementation for mapping PCI resources plus the newer sysfs
interface. Then there is HAVE_PCI_MMAP which can be used with custom
definitions of pci_mmap_resource_range() and the historical /proc/bus/pci
interface. Both mechanisms are all or nothing.

For s390 mapping PCI resources is possible and useful for testing and
certain applications such as QEMU's vfio-pci based user-space NVMe driver.
For certain devices, however access to PCI resources via mappings to
user-space is not possible and these must be excluded from the general PCI
resource mapping mechanisms.

Introduce pdev->non_mappable_bars to indicate that a PCI device's BARs can
not be accessed via mappings to user-space. In the future this enables
per-device restrictions of PCI resource mapping.

For now, set this flag for all PCI devices on s390 in line with the
existing, general disable of PCI resource mapping. As s390 is the only user
of the VFI_PCI_MMAP Kconfig options this can already be replaced with a
check of this new flag. Also add similar checks in the other code protected
by HAVE_PCI_MMAP respectively ARCH_GENERIC_PCI_MMAP in preparation for
enabling these for supported devices.

Link: https://lore.kernel.org/lkml/20250212132808.08dcf03c.alex.williamson@redhat.com/
Link: https://lore.kernel.org/r/20250226-vfio_pci_mmap-v7-2-c5c0f1d26efd@linux.ibm.com
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-21 14:54:16 -05:00
Shay Drory
04d50d953a PCI: Fix NULL dereference in SR-IOV VF creation error path
Clean up when virtfn setup fails to prevent NULL pointer dereference
during device removal. The kernel oops below occurred due to incorrect
error handling flow when pci_setup_device() fails.

Add pci_iov_scan_device(), which handles virtfn allocation and setup and
cleans up if pci_setup_device() fails, so pci_iov_add_virtfn() doesn't need
to call pci_stop_and_remove_bus_device().  This prevents accessing
partially initialized virtfn devices during removal.

  BUG: kernel NULL pointer dereference, address: 00000000000000d0
  RIP: 0010:device_del+0x3d/0x3d0
  Call Trace:
   pci_remove_bus_device+0x7c/0x100
   pci_iov_add_virtfn+0xfa/0x200
   sriov_enable+0x208/0x420
   mlx5_core_sriov_configure+0x6a/0x160 [mlx5_core]
   sriov_numvfs_store+0xae/0x1a0

Link: https://lore.kernel.org/r/20250310084524.599225-1-shayd@nvidia.com
Fixes: e3f30d563a ("PCI: Make pci_destroy_dev() concurrent safe")
Signed-off-by: Shay Drory <shayd@nvidia.com>
[bhelgaas: commit log, return ERR_PTR(-ENOMEM) directly]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Keith Busch <kbusch@kernel.org>
2025-03-21 14:54:16 -05:00
Ilpo Järvinen
026e4bffb0 PCI/bwctrl: Fix pcie_bwctrl_select_speed() return type
pcie_bwctrl_select_speed() should take __fls() of the speed bit, not return
it as a raw value. Instead of directly returning 2.5GT/s speed bit, simply
assign the fallback speed (2.5GT/s) into supported_speeds variable to share
the normal return path that calls pcie_supported_speeds2target_speed() to
calculate __fls().

This code path is not very likely to execute because
pcie_get_supported_speeds() should provide valid ->supported_speeds but a
spec violating device could fail to synthesize any speed in
pcie_get_supported_speeds(). It could also happen in case the
supported_speeds intersection is empty (also a violation of the current
PCIe specs).

Link: https://lore.kernel.org/r/20250321163103.5145-1-ilpo.jarvinen@linux.intel.com
Fixes: de9a6c8d5d ("PCI/bwctrl: Add pcie_set_target_speed() to set PCIe Link Speed")
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-21 12:11:38 -05:00
Ilpo Järvinen
527664f738 PCI: pciehp: Don't enable HPIE when resuming in poll mode
PCIe hotplug can operate in poll mode without interrupt handlers using a
polling kthread only.  eb34da60ed ("PCI: pciehp: Disable hotplug
interrupt during suspend") failed to consider that and enables HPIE
(Hot-Plug Interrupt Enable) unconditionally when resuming the Port.

Only set HPIE if non-poll mode is in use. This makes
pcie_enable_interrupt() match how pcie_enable_notification() already
handles HPIE.

Link: https://lore.kernel.org/r/20250321162114.3939-1-ilpo.jarvinen@linux.intel.com
Fixes: eb34da60ed ("PCI: pciehp: Disable hotplug interrupt during suspend")
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
2025-03-21 11:33:49 -05:00
Roger Pau Monne
c3164d2e0d PCI/MSI: Convert pci_msi_ignore_mask to per MSI domain flag
Setting pci_msi_ignore_mask inhibits the toggling of the mask bit for both
MSI and MSI-X entries globally, regardless of the IRQ chip they are using.
Only Xen sets the pci_msi_ignore_mask when routing physical interrupts over
event channels, to prevent PCI code from attempting to toggle the maskbit,
as it's Xen that controls the bit.

However, the pci_msi_ignore_mask being global will affect devices that use
MSI interrupts but are not routing those interrupts over event channels
(not using the Xen pIRQ chip).  One example is devices behind a VMD PCI
bridge.  In that scenario the VMD bridge configures MSI(-X) using the
normal IRQ chip (the pIRQ one in the Xen case), and devices behind the
bridge configure the MSI entries using indexes into the VMD bridge MSI
table.  The VMD bridge then demultiplexes such interrupts and delivers to
the destination device(s).  Having pci_msi_ignore_mask set in that scenario
prevents (un)masking of MSI entries for devices behind the VMD bridge.

Move the signaling of no entry masking into the MSI domain flags, as that
allows setting it on a per-domain basis.  Set it for the Xen MSI domain
that uses the pIRQ chip, while leaving it unset for the rest of the
cases.

Remove pci_msi_ignore_mask at once, since it was only used by Xen code, and
with Xen dropping usage the variable is unneeded.

This fixes using devices behind a VMD bridge on Xen PV hardware domains.

Albeit Devices behind a VMD bridge are not known to Xen, that doesn't mean
Linux cannot use them.  By inhibiting the usage of
VMD_FEAT_CAN_BYPASS_MSI_REMAP and the removal of the pci_msi_ignore_mask
bodge devices behind a VMD bridge do work fine when use from a Linux Xen
hardware domain.  That's the whole point of the series.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Juergen Gross <jgross@suse.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Message-ID: <20250219092059.90850-4-roger.pau@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2025-03-21 08:57:57 +01:00
Roger Pau Monne
6c4d5aadf5 PCI: vmd: Disable MSI remapping bypass under Xen
MSI remapping bypass (directly configuring MSI entries for devices on the
VMD bus) won't work under Xen, as Xen is not aware of devices in such bus,
and hence cannot configure the entries using the pIRQ interface in the PV
case, and in the PVH case traps won't be setup for MSI entries for such
devices.

Until Xen is aware of devices in the VMD bus prevent the
VMD_FEAT_CAN_BYPASS_MSI_REMAP capability from being used when running as
any kind of Xen guest.

The MSI remapping bypass is an optional feature of VMD bridges, and hence
when running under Xen it will be masked and devices will be forced to
redirect its interrupts from the VMD bridge.  That mode of operation must
always be supported by VMD bridges and works when Xen is not aware of
devices behind the VMD bridge.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Message-ID: <20250219092059.90850-3-roger.pau@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2025-03-21 08:57:52 +01:00
Ilpo Järvinen
cc7a371b0b PCI: Move cardbus IO size declarations into pci/pci.h
For some reason, cardbus related io/mem size declarations are in
linux/pci.h, whereas non-cardbus sizes are already in pci/pci.h.
Move all them into one place in pci/pci.h.

Link: https://lore.kernel.org/r/20250311174701.3586-4-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-20 16:44:26 -05:00
Ilpo Järvinen
2f255e299c PCI: Make pci_setup_bridge() static
pci_setup_bridge() is only used within setup-bus.c. Therefore, make it a
static function.

Link: https://lore.kernel.org/r/20250311174701.3586-3-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-20 16:44:26 -05:00
Ilpo Järvinen
7d4bcc0f26 PCI: Move resource reassignment func declarations into pci/pci.h
Neither pci_reassign_bridge_resources() nor pci_reassign_resource() is used
outside of the PCI subsystem. They seem to be naturally static functions
but since resource fitting/assignment is split between setup-bus.c and
setup-res.c, they fall into different sides of the divide and need to be
declared.

Move the declarations of pci_reassign_bridge_resources() and
pci_reassign_resource() into pci/pci.h to keep them internal to PCI
subsystem.

Link: https://lore.kernel.org/r/20250311174701.3586-2-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-20 16:44:25 -05:00
Ilpo Järvinen
95c4e6d42c PCI: Move pci_rescan_bus_bridge_resize() declaration to pci/pci.h
pci_rescan_bus_bridge_resize() is only used by code inside PCI subsystem.
The comment also falsely advertises it to be for hotplug drivers, yet the
only caller is from sysfs store function. Move the function declaration
into pci/pci.h.

Link: https://lore.kernel.org/r/20250311174701.3586-1-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-20 16:44:25 -05:00
Ilpo Järvinen
9ec19bfa78 PCI: Fix BAR resizing when VF BARs are assigned
__resource_resize_store() attempts to release all resources of the device
before attempting the resize. The loop, however, only covers standard BARs
(< PCI_STD_NUM_BARS). If a device has VF BARs that are assigned,
pci_reassign_bridge_resources() finds the bridge window still has some
assigned child resources and returns -NOENT which makes
pci_resize_resource() to detect an error and abort the resize.

Change the release loop to cover all resources up to VF BARs which allows
the resize operation to release the bridge windows and attempt to assigned
them again with the different size.

If SR-IOV is enabled, disallow resize as it requires releasing also IOV
resources.

Link: https://lore.kernel.org/r/20250320142837.8027-1-ilpo.jarvinen@linux.intel.com
Fixes: 91fa127794 ("PCI: Expose PCIe Resizable BAR support via sysfs")
Reported-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
2025-03-20 16:44:11 -05:00
Manivannan Sadhasivam
a5fb3ff632 PCI: Allow PCI bridges to go to D3Hot on all non-x86
Currently, pci_bridge_d3_possible() encodes a variety of decision factors
when deciding whether a given bridge can be put into D3. A particular one
of note is for "recent enough PCIe ports." Per Rafael [0]:

  "There were hardware issues related to PM on x86 platforms predating
   the introduction of Connected Standby in Windows.  For instance,
   programming a port into D3hot by writing to its PMCSR might cause the
   PCIe link behind it to go down and the only way to revive it was to
   power cycle the Root Complex.  And similar."

Thus, this function contains a DMI-based check for post-2015 BIOS.

The above factors (Windows, x86) don't really apply to non-x86 systems, and
also, many such systems don't have BIOS or DMI. However, we'd like to be
able to suspend bridges on non-x86 systems too.

Restrict the "recent enough" check to x86. If we find further
incompatibilities, it probably makes sense to expand on the deny-list
approach (i.e., bridge_d3_blacklist or similar).

Link: https://lore.kernel.org/r/20250320110604.v6.1.Id0a0e78ab0421b6bce51c4b0b87e6aebdfc69ec7@changeid
Link: https://lore.kernel.org/linux-pci/CAJZ5v0j_6jeMAQ7eFkZBe5Yi+USGzysxAgfemYh=-zq4h5W+Qg@mail.gmail.com/ [0]
Link: https://lore.kernel.org/linux-pci/20240227225442.GA249898@bhelgaas/ [1]
Link: https://lore.kernel.org/linux-pci/20240828210705.GA37859@bhelgaas/ [2]
[Brian: rewrite to !X86 based on Rafael's suggestions]
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-20 16:33:33 -05:00
Ryo Takakura
18056a4866 PCI: vmd: Make vmd_dev::cfg_lock a raw_spinlock_t type
The access to the PCI config space via pci_ops::read and pci_ops::write is
a low-level hardware access. The functions can be accessed with disabled
interrupts even on PREEMPT_RT. The pci_lock is a raw_spinlock_t for this
purpose.

A spinlock_t becomes a sleeping lock on PREEMPT_RT, so it cannot be
acquired with disabled interrupts. The vmd_dev::cfg_lock is accessed in
the same context as the pci_lock.

Make vmd_dev::cfg_lock a raw_spinlock_t type so it can be used with
interrupts disabled.

This was reported as:

  BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
  Call Trace:
   rt_spin_lock+0x4e/0x130
   vmd_pci_read+0x8d/0x100 [vmd]
   pci_user_read_config_byte+0x6f/0xe0
   pci_read_config+0xfe/0x290
   sysfs_kf_bin_read+0x68/0x90

Signed-off-by: Ryo Takakura <ryotkkr98@gmail.com>
Tested-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Acked-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
[bigeasy: reword commit message]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Link: https://lore.kernel.org/r/20250218080830.ufw3IgyX@linutronix.de
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
[bhelgaas: add back report info from
https://lore.kernel.org/lkml/20241218115951.83062-1-ryotkkr98@gmail.com/]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-18 20:05:06 -05:00
Alistair Popple
82ba975e4c mm: allow compound zone device pages
Zone device pages are used to represent various type of device memory
managed by device drivers.  Currently compound zone device pages are not
supported.  This is because MEMORY_DEVICE_FS_DAX pages are the only user
of higher order zone device pages and have their own page reference
counting.

A future change will unify FS DAX reference counting with normal page
reference counting rules and remove the special FS DAX reference counting.
Supporting that requires compound zone device pages.

Supporting compound zone device pages requires compound_head() to
distinguish between head and tail pages whilst still preserving the
special struct page fields that are specific to zone device pages.

A tail page is distinguished by having bit zero being set in
page->compound_head, with the remaining bits pointing to the head page. 
For zone device pages page->compound_head is shared with page->pgmap.

The page->pgmap field must be common to all pages within a folio, even if
the folio spans memory sections.  Therefore pgmap is the same for both
head and tail pages and can be moved into the folio and we can use the
standard scheme to find compound_head from a tail page.

Link: https://lkml.kernel.org/r/67055d772e6102accf85161d0b57b0b3944292bf.1740713401.git-series.apopple@nvidia.com
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Signed-off-by: Balbir Singh <balbirs@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: David Hildenbrand <david@redhat.com>
Tested-by: Alison Schofield <alison.schofield@intel.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Asahi Lina <lina@asahilina.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chunyan Zhang <zhang.lyra@gmail.com>
Cc: "Darrick J. Wong" <djwong@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: linmiaohe <linmiaohe@huawei.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: Michael "Camp Drill Sergeant" Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Ted Ts'o <tytso@mit.edu>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17 22:06:39 -07:00
Alistair Popple
b7e2823787 mm/mm_init: move p2pdma page refcount initialisation to p2pdma
Currently ZONE_DEVICE page reference counts are initialised by core memory
management code in __init_zone_device_page() as part of the memremap()
call which driver modules make to obtain ZONE_DEVICE pages.  This
initialises page refcounts to 1 before returning them to the driver.

This was presumably done because it drivers had a reference of sorts on
the page.  It also ensured the page could always be mapped with
vm_insert_page() for example and would never get freed (ie.  have a zero
refcount), freeing drivers of manipulating page reference counts.

However it complicates figuring out whether or not a page is free from the
mm perspective because it is no longer possible to just look at the
refcount.  Instead the page type must be known and if GUP is used a
secondary pgmap reference is also sometimes needed.

To simplify this it is desirable to remove the page reference count for
the driver, so core mm can just use the refcount without having to account
for page type or do other types of tracking.  This is possible because
drivers can always assume the page is valid as core kernel will never
offline or remove the struct page.

This means it is now up to drivers to initialise the page refcount as
required.  P2PDMA uses vm_insert_page() to map the page, and that requires
a non-zero reference count when initialising the page so set that when the
page is first mapped.

Link: https://lkml.kernel.org/r/6aedb0ac2886dcc4503cb705273db5b3863a0b66.1740713401.git-series.apopple@nvidia.com
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: David Hildenbrand <david@redhat.com>
Tested-by: Alison Schofield <alison.schofield@intel.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Asahi Lina <lina@asahilina.net>
Cc: Balbir Singh <balbirs@nvidia.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chunyan Zhang <zhang.lyra@gmail.com>
Cc: "Darrick J. Wong" <djwong@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: linmiaohe <linmiaohe@huawei.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: Michael "Camp Drill Sergeant" Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Ted Ts'o <tytso@mit.edu>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17 22:06:38 -07:00
Bjorn Helgaas
2ce107e064 PCI: dwc: Consolidate devicetree handling in dw_pcie_host_get_resources()
Consolidate devicetree resource handling in dw_pcie_host_get_resources().
No functional change intended.

Link: https://lore.kernel.org/r/20250315201548.858189-5-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
2025-03-17 14:18:53 -05:00
Frank Li
84f37c43d5 PCI: dwc: Call devm_pci_alloc_host_bridge() early in dw_pcie_host_init()
Move devm_pci_alloc_host_bridge() to the beginning of dw_pcie_host_init().

devm_pci_alloc_host_bridge() is generic code that doesn't depend on any DWC
resource, so moving it earlier keeps all the subsequent devicetree-related
code together.

[bhelgaas: reorder earlier in series]

Link: https://lore.kernel.org/r/20250315201548.858189-4-helgaas@kernel.org
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-17 14:18:53 -05:00
Frank Li
513ef9c496 PCI: dwc: Rename cpu_addr to parent_bus_addr for ATU configuration
Rename 'cpu_addr' to 'parent_bus_addr' in the DesignWare ATU configuration.

The ATU translates parent bus addresses to PCI addresses, which are often
the same as CPU addresses but can differ in systems where the bus fabric
translates addresses before passing them to the PCIe controller. This
renaming clarifies the purpose and avoids confusion.

Link: https://lore.kernel.org/r/20250315201548.858189-3-helgaas@kernel.org
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-17 14:18:53 -05:00
Frank Li
8f4a489b37 PCI: dwc: Use resource start as ioremap() input in dw_pcie_pme_turn_off()
The msg_res region translates writes into PCIe Message TLPs. Previously we
mapped this region using atu.cpu_addr, the input address programmed into
the ATU.

"cpu_addr" is a misnomer because when a bus fabric translates addresses
between the CPU and the ATU, the ATU input address is different from the
CPU address.  A future patch will rename "cpu_addr" and correct the value
to be the ATU input address instead of the CPU physical address.

Map the msg_res region before writing to it using the msg_res resource
start, a CPU physical address.

Link: https://lore.kernel.org/r/20250315201548.858189-2-helgaas@kernel.org
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-17 14:18:14 -05:00
Christophe JAILLET
b36fb50701
PCI: histb: Fix an error handling path in histb_pcie_probe()
If an error occurs after a successful phy_init() call, then phy_exit()
should be called.

Add the missing call, as already done in the remove function.

Fixes: bbd11bddb3 ("PCI: hisi: Add HiSilicon STB SoC PCIe controller driver")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
[kwilczynski: remove unnecessary hipcie->phy NULL check from
histb_pcie_probe() and squash a patch that removes similar NULL
check for hipcie-phy from histb_pcie_remove() from
https://lore.kernel.org/linux-pci/c369b5d25e17a44984ae5a889ccc28a59a0737f7.1742058005.git.christophe.jaillet@wanadoo.fr]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Link: https://lore.kernel.org/r/8301fc15cdea5d2dac21f57613e8e6922fb1ad95.1740854531.git.christophe.jaillet@wanadoo.fr
2025-03-16 11:49:19 +00:00
Richard Zhu
f6a1fdfc78 PCI: imx6: Use devm_clk_bulk_get_all() to fetch clocks
Use devm_clk_bulk_get_all() helper to simplify clock handle code.

No functional changes intended.

Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
[kwilczynski: commit log, refactor to use dev_err_probe()]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20250226025628.1681206-1-hongxing.zhu@nxp.com
2025-03-15 13:13:09 -05:00
Richard Zhu
81d1d214e1 PCI: imx6: Identify controller via 'linux,pci-domain', not address
Instead of testing the controller register address to distinguish
controller 1 from controller 0 on i.MX8MQ platforms, use the PCI domain
number, which comes from the devicetree 'linux,pci-domain' property.

All relevant devicetrees should already supply 'linux,pci-domain', which
was added by c0b70f05c8 ("arm64: dts: imx8mq: use_dt_domains for pci
node").

Instead of being set directly in imx_pcie_probe(), pci->dbi_base will be
set by the DWC core in dw_pcie_get_resources().

No functional changes intended.

Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://lore.kernel.org/r/20250226024256.1678103-3-hongxing.zhu@nxp.com
2025-03-15 13:12:00 -05:00
Niklas Cassel
1f5a69f1b3
PCI: dw-rockchip: Hide broken ATS capability for RK3588 running in EP mode
When running the RK3588 in Endpoint mode, with an Intel host with IOMMU
enabled, the host side prints:

  DMAR: VT-d detected Invalidation Time-out Error: SID 0

When running the RK3588 in Endpoint mode, with an AMD host with IOMMU
enabled, the host side prints:

  iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=63:00.0 address=0x42b5b01a0]

Rockchip has confirmed that the ATS support for RK3588 only works when
running the PCIe controller in Root Complex (RC) mode, see:

  https://lore.kernel.org/linux-pci/93cdce39-1ae6-4939-a3fc-db10be7564e5@rock-chips.com

Usually, to handle these issues, we add a quirk for the PCI vendor and
device ID in drivers/pci/quirks.c with quirk_no_ats(). That is because
we cannot usually modify the capabilities on the EP side. In this case,
we can modify the capabilities on the EP side.

Thus, hide the broken ATS capability on RK3588 when running in EP mode.

That way, we don't need any quirk on the host side, and we see no errors
on the host side, and we can run pci_endpoint_test successfully, with
the IOMMU enabled on the host side.

Acked-by: Shawn Lin <shawn.lin@rock-chips.com>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
[kwilczynski: commit log, tidy up code comments and error message]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20250310094826.842681-6-cassel@kernel.org
2025-03-14 16:13:19 +00:00
Niklas Cassel
e3d6957f17
PCI: dwc: ep: Add dw_pcie_ep_hide_ext_capability()
Add dw_pcie_ep_hide_ext_capability() which can be used by an endpoint
controller driver to hide a capability.

This can be useful to hide a capability that is buggy, such that the
host side does not try to enable the buggy capability.

Suggested-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Link: https://lore.kernel.org/r/20250310094826.842681-5-cassel@kernel.org
2025-03-14 16:13:19 +00:00
Dan Carpenter
8189aa56db
PCI: dwc: ep: Return -ENOMEM for allocation failures
If the bitmap or memory allocations fail, then dw_pcie_ep_init_registers()
will incorrectly return a success.

Return -ENOMEM instead.

Fixes: 869bc52534 ("PCI: dwc: ep: Fix DBI access failure for drivers requiring refclk from host")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Krzysztof Wilczyński <kw@linux.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/36dcb6fc-f292-4dd5-bd45-a8c6f9dc3df7@stanley.mountain
2025-03-14 16:13:03 +00:00
Philipp Stanner
b1a7f99967 PCI: Check BAR index for validity
Many functions in PCI use accessor macros such as pci_resource_len(),
which take a BAR index. That index, however, is never checked for
validity, potentially resulting in undefined behavior by overflowing the
array pci_dev.resource in the macro pci_resource_n().

Since many users of those macros directly assign the accessed value to
an unsigned integer, the macros cannot be changed easily anymore to
return -EINVAL for invalid indexes. Consequently, the problem has to be
mitigated in higher layers.

Add pci_bar_index_valid(). Use it where appropriate.

Link: https://lore.kernel.org/r/20250312080634.13731-4-phasta@kernel.org
Closes: https://lore.kernel.org/all/adb53b1f-29e1-3d14-0e61-351fd2d3ff0d@linux.intel.com/
Reported-by: Bingbu Cao <bingbu.cao@linux.intel.com>
Signed-off-by: Philipp Stanner <phasta@kernel.org>
[kwilczynski: correct if-statement condition the pci_bar_index_is_valid()
helper function uses, tidy up code comments]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
[bhelgaas: fix typo]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-14 10:35:12 -05:00
Lukas Wunner
e3260237aa PCI: pciehp: Avoid unnecessary device replacement check
Hot-removal of nested PCI hotplug ports suffers from a long-standing race
condition which can lead to a deadlock:  A parent hotplug port acquires
pci_lock_rescan_remove(), then waits for pciehp to unbind from a child
hotplug port.  Meanwhile that child hotplug port tries to acquire
pci_lock_rescan_remove() as well in order to remove its own children.

The deadlock only occurs if the parent acquires pci_lock_rescan_remove()
first, not if the child happens to acquire it first.

Several workarounds to avoid the issue have been proposed and discarded
over the years, e.g.:

https://lore.kernel.org/r/4c882e25194ba8282b78fe963fec8faae7cf23eb.1529173804.git.lukas@wunner.de/

A proper fix is being worked on, but needs more time as it is nontrivial
and necessarily intrusive.

Recent commit 9d573d1954 ("PCI: pciehp: Detect device replacement during
system sleep") provokes more frequent occurrence of the deadlock when
removing more than one Thunderbolt device during system sleep.  The commit
sought to detect device replacement, but also triggered on device removal.
Differentiating reliably between replacement and removal is impossible
because pci_get_dsn() returns 0 both if the device was removed, as well as
if it was replaced with one lacking a Device Serial Number.

Avoid the more frequent occurrence of the deadlock by checking whether the
hotplug port itself was hot-removed.  If so, there's no sense in checking
whether its child device was replaced.

This works because the ->resume_noirq() callback is invoked in top-down
order for the entire hierarchy:  A parent hotplug port detecting device
replacement (or removal) marks all children as removed using
pci_dev_set_disconnected() and a child hotplug port can then reliably
detect being removed.

Link: https://lore.kernel.org/r/02f166e24c87d6cde4085865cce9adfdfd969688.1741674172.git.lukas@wunner.de
Fixes: 9d573d1954 ("PCI: pciehp: Detect device replacement during system sleep")
Reported-by: Kenneth Crudup <kenny@panix.com>
Closes: https://lore.kernel.org/r/83d9302a-f743-43e4-9de2-2dd66d91ab5b@panix.com/
Reported-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>
Closes: https://lore.kernel.org/r/20240926125909.2362244-1-acelan.kao@canonical.com/
Tested-by: Kenneth Crudup <kenny@panix.com>
Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Cc: stable@vger.kernel.org # v6.11+
2025-03-14 10:17:17 -05:00
Philipp Stanner
f09d3937d4
PCI: Fix wrong length of devres array
The array for the iomapping cookie addresses has a length of
PCI_STD_NUM_BARS. This constant, however, only describes standard BARs;
while PCI can allow for additional, special BARs.

The total number of PCI resources is described by constant
PCI_NUM_RESOURCES, which is also used in, e.g., pci_select_bars().

Thus, the devres array has so far been too small.

Change the length of the devres array to PCI_NUM_RESOURCES.

Link: https://lore.kernel.org/r/20250312080634.13731-3-phasta@kernel.org
Fixes: bbaff68bf4 ("PCI: Add managed partial-BAR request and map infrastructure")
Signed-off-by: Philipp Stanner <phasta@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Cc: stable@vger.kernel.org	# v6.11+
2025-03-14 09:22:24 +00:00
Thomas Gleixner
79273d0a40 PCI/TPH: Replace the broken MSI-X control word update
The driver walks the MSI descriptors to test whether a descriptor exists
for a given index. That's just abuse of the MSI internals.

The same test can be done with a single function call by looking up whether
there is a Linux interrupt number assigned at the index.

What's worse is that the function is completely unserialized against
modifications of the MSI-X control by operations issued from the interrupt
core. It also brings the PCI/MSI-X internal cached control word out of
sync.

Remove the trainwreck and invoke the function provided by the PCI/MSI core
to update it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/all/20250313130321.898592817@linutronix.de
2025-03-13 18:58:00 +01:00
Thomas Gleixner
b9db8df433 PCI/MSI: Provide a sane mechanism for TPH
The PCI/TPH driver fiddles with the MSI-X control word of an active
interrupt completely unserialized against concurrent operations issued
from the interrupt core. It also brings the PCI/MSI-X internal cached
control word out of sync.

Provide a function, which has the required serialization and keeps the
control word cache in sync.

Unfortunately this requires to look up and lock the interrupt descriptor,
which should be only done in the interrupt core code. But confining this
particular oddity in the PCI/MSI core is the lesser of all evil. A
interrupt core implementation would require a larger pile of infrastructure
and indirections for dubious value.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/all/20250313130321.822790423@linutronix.de
2025-03-13 18:58:00 +01:00
Thomas Gleixner
50410bad27 PCI: hv: Switch MSI descriptor locking to guard()
Convert the code to use the new guard(msi_descs_lock).

No functional change intended.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Acked-by: Wei Liu <wei.liu@kernel.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/all/20250313130321.758905320@linutronix.de
2025-03-13 18:58:00 +01:00
Thomas Gleixner
1bc7e262a2 PCI/MSI: Switch to MSI descriptor locking to guard()
Convert the code to use the new guard(msi_descs_lock).

No functional change intended.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/all/20250313130321.695027112@linutronix.de
2025-03-13 18:58:00 +01:00
Thippeswamy Havalige
ad3b7174d4
PCI: xilinx-cpm: Add support for Versal Net CPM5NC Root Port controller
The Versal Net ACAP (Adaptive Compute Acceleration Platform) devices
incorporate the Coherency and PCIe Gen5 Module, specifically the
Next-Generation Compact Module (CPM5NC).

The integrated CPM5NC block, along with the built-in bridge, can function
as a PCIe Root Port and supports the PCIe Gen5 protocol with data transfer
rates of up to 32 GT/s, and is capable of supporting up to a x16 lane-width
configuration.

Bridge errors are managed using a specific interrupt line designed for
CPM5N. The INTx interrupt support is not available.

Currently in this commit platform specific bridge errors support is not
added.

Signed-off-by: Thippeswamy Havalige <thippeswamy.havalige@amd.com>
[kwilczynski: commit log, squashed patch to fix an if-statement condition
to ensure that xilinx_cpm_pcie_init_port() does not run on the CPM5NC_HOST
variant from https://lore.kernel.org/linux-pci/20250311072402.1049990-1-thippeswamy.havalige@amd.com]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20250224155025.782179-4-thippeswamy.havalige@amd.com
2025-03-11 14:34:48 +00:00
Thippeswamy Havalige
57b0302240
PCI: xilinx-cpm: Fix IRQ domain leak in error path of probe
The IRQ domain allocated for the PCIe controller is not freed if
resource_list_first_type() returns NULL, leading to a resource leak.

This fix ensures properly cleaning up the allocated IRQ domain in
the error path.

Fixes: 49e427e6bd ("Merge branch 'pci/host-probe-refactor'")
Signed-off-by: Thippeswamy Havalige <thippeswamy.havalige@amd.com>
[kwilczynski: added missing Fixes: tag, refactored to use one of the goto labels]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Link: https://lore.kernel.org/r/20250224155025.782179-2-thippeswamy.havalige@amd.com
2025-03-11 14:34:36 +00:00
Robin Murphy
bcb81ac6ae iommu: Get DT/ACPI parsing into the proper probe path
In hindsight, there were some crucial subtleties overlooked when moving
{of,acpi}_dma_configure() to driver probe time to allow waiting for
IOMMU drivers with -EPROBE_DEFER, and these have become an
ever-increasing source of problems. The IOMMU API has some fundamental
assumptions that iommu_probe_device() is called for every device added
to the system, in the order in which they are added. Calling it in a
random order or not at all dependent on driver binding leads to
malformed groups, a potential lack of isolation for devices with no
driver, and all manner of unexpected concurrency and race conditions.
We've attempted to mitigate the latter with point-fix bodges like
iommu_probe_device_lock, but it's a losing battle and the time has come
to bite the bullet and address the true source of the problem instead.

The crux of the matter is that the firmware parsing actually serves two
distinct purposes; one is identifying the IOMMU instance associated with
a device so we can check its availability, the second is actually
telling that instance about the relevant firmware-provided data for the
device. However the latter also depends on the former, and at the time
there was no good place to defer and retry that separately from the
availability check we also wanted for client driver probe.

Nowadays, though, we have a proper notion of multiple IOMMU instances in
the core API itself, and each one gets a chance to probe its own devices
upon registration, so we can finally make that work as intended for
DT/IORT/VIOT platforms too. All we need is for iommu_probe_device() to
be able to run the iommu_fwspec machinery currently buried deep in the
wrong end of {of,acpi}_dma_configure(). Luckily it turns out to be
surprisingly straightforward to bootstrap this transformation by pretty
much just calling the same path twice. At client driver probe time,
dev->driver is obviously set; conversely at device_add(), or a
subsequent bus_iommu_probe(), any device waiting for an IOMMU really
should *not* have a driver already, so we can use that as a condition to
disambiguate the two cases, and avoid recursing back into the IOMMU core
at the wrong times.

Obviously this isn't the nicest thing, but for now it gives us a
functional baseline to then unpick the layers in between without many
more awkward cross-subsystem patches. There are some minor side-effects
like dma_range_map potentially being created earlier, and some debug
prints being repeated, but these aren't significantly detrimental. Let's
make things work first, then deal with making them nice.

With the basic flow finally in the right order again, the next step is
probably turning the bus->dma_configure paths inside-out, since all we
really need from bus code is its notion of which device and input ID(s)
to parse the common firmware properties with...

Acked-by: Bjorn Helgaas <bhelgaas@google.com> # pci-driver.c
Acked-by: Rob Herring (Arm) <robh@kernel.org> # of/device.c
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/e3b191e6fd6ca9a1e84c5e5e40044faf97abb874.1740753261.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-03-11 14:05:43 +01:00
Dan Carpenter
6e8d06e509 PCI: Remove stray put_device() in pci_register_host_bridge()
This put_device() was accidentally left over from when we changed the code
from using device_register() to calling device_add().  Delete it.

Link: https://lore.kernel.org/r/55b24870-89fb-4c91-b85d-744e35db53c2@stanley.mountain
Fixes: 9885440b16 ("PCI: Fix pci_host_bridge struct device release/free handling")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-10 13:41:48 -05:00
Ma Ke
1f2768b6a3 PCI: Fix reference leak in pci_alloc_child_bus()
If device_register(&child->dev) fails, call put_device() to explicitly
release child->dev, per the comment at device_register().

Found by code review.

Link: https://lore.kernel.org/r/20250202062357.872971-1-make24@iscas.ac.cn
Fixes: 4f535093cf ("PCI: Put pci_dev in device tree as early as possible")
Signed-off-by: Ma Ke <make24@iscas.ac.cn>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Cc: stable@vger.kernel.org
2025-03-10 13:41:48 -05:00
Ma Ke
804443c1f2 PCI: Fix reference leak in pci_register_host_bridge()
If device_register() fails, call put_device() to give up the reference to
avoid a memory leak, per the comment at device_register().

Found by code review.

Link: https://lore.kernel.org/r/20250225021440.3130264-1-make24@iscas.ac.cn
Fixes: 37d6a0a6f4 ("PCI: Add pci_register_host_bridge() interface")
Signed-off-by: Ma Ke <make24@iscas.ac.cn>
[bhelgaas: squash Dan Carpenter's double free fix from
https://lore.kernel.org/r/db806a6c-a91b-4e5a-a84b-6b7e01bdac85@stanley.mountain]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org
2025-03-10 13:41:48 -05:00
Bjorn Helgaas
a7eb9124d9 PCI: Cache offset of Resizable BAR capability
Previously most resizable BAR interfaces (pci_rebar_get_possible_sizes(),
pci_rebar_set_size(), etc) as well as pci_restore_state() searched config
space for a Resizable BAR capability.  Most devices don't have such a
capability, so this is wasted effort, especially for pci_restore_state().

Search for a Resizable BAR capability once at enumeration-time and cache
the offset so we don't have to search every time we need it.  No functional
change intended.

Link: https://lore.kernel.org/r/20250215000301.175097-3-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-03-10 13:41:47 -05:00
Bjorn Helgaas
3f8c4959fc PCI: Enable Configuration RRS SV early
Following a reset, a Function may respond to Config Requests with Request
Retry Status (RRS) Completion Status to indicate that it is temporarily
unable to process the Request, but will be able to process the Request in
the future (PCIe r6.0, sec 2.3.1).

If the Configuration RRS Software Visibility feature is enabled and a Root
Complex receives RRS for a config read of the Vendor ID, the Root Complex
completes the Request to the host by returning PCI_VENDOR_ID_PCI_SIG,
0x0001 (sec 2.3.2).

The Config RRS SV feature applies only to Root Ports and is not directly
related to pci_scan_bridge_extend().  Move the RRS SV enable to
set_pcie_port_type() where we handle other PCIe-specific configuration.

Link: https://lore.kernel.org/r/20250303210217.199504-1-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-10 13:41:39 -05:00
Bjorn Helgaas
f4e026f454 PCI: Fix typos
Fix typos and whitespace errors.

Link: https://lore.kernel.org/r/20250307231715.438518-1-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-08 15:08:45 -06:00
Niklas Cassel
a60a708420
PCI: dwc: ep: Remove superfluous function dw_pcie_ep_find_ext_capability()
Remove the superfluous function dw_pcie_ep_find_ext_capability(),
as it is virtually identical to dw_pcie_find_ext_capability().

Signed-off-by: Niklas Cassel <cassel@kernel.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20250221202646.395252-3-cassel@kernel.org
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-03-08 14:47:35 +00:00
Christian Bruel
934e9d137d
PCI: endpoint: pci-epf-test: Fix double free that causes kernel to oops
Fix a kernel oops found while testing the stm32_pcie Endpoint driver
with handling of PERST# deassertion:

During EP initialization, pci_epf_test_alloc_space() allocates all BARs,
which are further freed if epc_set_bar() fails (for instance, due to no
free inbound window).

However, when pci_epc_set_bar() fails, the error path:

  pci_epc_set_bar() ->
    pci_epf_free_space()

does not clear the previous assignment to epf_test->reg[bar].

Then, if the host reboots, the PERST# deassertion restarts the BAR
allocation sequence with the same allocation failure (no free inbound
window), creating a double free situation since epf_test->reg[bar] was
deallocated and is still non-NULL.

Thus, make sure that pci_epf_alloc_space() and pci_epf_free_space()
invocations are symmetric, and as such, set epf_test->reg[bar] to NULL
when memory is freed.

Reviewed-by: Niklas Cassel <cassel@kernel.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Christian Bruel <christian.bruel@foss.st.com>
Link: https://lore.kernel.org/r/20250124123043.96112-1-christian.bruel@foss.st.com
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-03-08 14:47:35 +00:00
Zijun Hu
22a01177c3
PCI: endpoint: Remove unused devm_pci_epc_destroy()
The static function devm_pci_epc_match() is only invoked within the
devm_pci_epc_destroy(). However, since it was initially introduced,
this new API has had no callers.

Thus, remove both the unused API and the static function.

Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com>
Link: https://lore.kernel.org/r/20250217-remove_api-v2-1-b169c9117045@quicinc.com
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-03-08 14:47:31 +00:00
Niklas Cassel
aba2b17810
PCI: dw-rockchip: Describe Resizable BARs as Resizable BARs
Looking at section "11.4.4.29 USP_PCIE_RESBAR Registers Summary" in the
Technical Reference Manual (TRM) for RK3588, we can see that none of the
BARs are Fixed BARs, but actually Resizable BARs.

I couldn't find any reference in the TRM for RK3568, but looking at the
downstream PCIe endpoint driver, both RK3568 and RK3588 are treated as
the same, so the BARs on RK3568 must also be Resizable BARs.

Now when we actually have support for Resizable BARs, let's configure
these BARs as such.

Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Link: https://lore.kernel.org/r/20250131182949.465530-16-cassel@kernel.org
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-03-08 14:47:31 +00:00
Niklas Cassel
a2fa5f9614
PCI: keystone: Specify correct alignment requirement
The support for a specific iATU alignment was added in
commit 2a9a801620 ("PCI: endpoint: Add support to specify alignment for
buffers allocated to BARs").

This commit specifically mentions both that the alignment by each DWC
based EP driver should match CX_ATU_MIN_REGION_SIZE, and that AM65x
specifically has a 64 KB alignment.

This also matches the CX_ATU_MIN_REGION_SIZE value specified in the
section "12.2.2.4.7 PCIe Subsystem Address Translation" of the Technical
Reference Manual (TRM) for AM65x:

  https://www.ti.com/lit/ug/spruid7e/spruid7e.pdf

This higher value, 1 MB, was obviously an ugly hack used to be able to
handle Resizable BARs which have a minimum size of 1 MB.

Now when we actually have support for Resizable BARs, let's configure the
iATU alignment requirement to the actual requirement.
(BARs described as Resizable will still get aligned to 1 MB.)

Cc: stable+noautosel@kernel.org # Depends on PCI endpoint Resizable BARs series
Fixes: 23284ad677 ("PCI: keystone: Add support for PCIe EP in AM654x Platforms")
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20250131182949.465530-15-cassel@kernel.org
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-03-08 14:47:25 +00:00
Niklas Cassel
6a6b66f7e6
PCI: keystone: Describe Resizable BARs as Resizable BARs
Looking at section "12.2.2.4.15 PCIe Subsystem BAR Configuration" in the
following Technical Reference Manual (TRM) for AM65x:

  https://www.ti.com/lit/ug/spruid7e/spruid7e.pdf

We can see that BAR2 and BAR5 are not Fixed BARs, but actually Resizable
BARs.

Now when we actually have support for Resizable BARs, let's configure
these BARs as such.

Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Link: https://lore.kernel.org/r/20250131182949.465530-14-cassel@kernel.org
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-03-08 14:43:11 +00:00
Niklas Cassel
3a3d4cabe6
PCI: dwc: ep: Allow EPF drivers to configure the size of Resizable BARs
The DWC databook specifies three different BARn_SIZING_SCHEME_N as:

  - Fixed Mask (0)
  - Programmable Mask (1)
  - Resizable BAR (2)

Each of these sizing schemes have different instructions for how to
initialize the BAR.

The DWC driver currently does not support resizable BARs.

Instead, in order to somewhat support resizable BARs, the DWC EP driver
currently has an ugly hack that force sets a resizable BAR to 1 MB, if
such a BAR is detected.

Additionally, this hack only works if the DWC glue driver also has lied
in their EPC features, and claimed that the resizable BAR is a 1 MB fixed
size BAR.

This is unintuitive (as you somehow need to know that you need to lie in
your EPC features), but other than that it is overly restrictive, since a
resizable BAR is capable of supporting sizes different than 1 MB.

Add proper support for resizable BARs in the DWC EP driver.

Note that the pci_epc_set_bar() API takes a struct pci_epf_bar which tells
the EPC driver how it wants to configure the BAR.

struct pci_epf_bar only has a single size struct member.

This means that an EPC driver will only be able to set a single supported
size. This is perfectly fine, as we do not need the complexity of allowing
a host to change the size of the BAR. If someone ever wants to support
resizing a resizable BAR, the pci_epc_set_bar() API can be extended in the
future.

With these changes, we allow an EPF driver to configure the size of
Resizable BARs, rather than forcing them to a 1 MB size.

This means that an EPC driver does not need to lie in EPC features, and an
EPF driver will be able to set an arbitrary size (not be forced to a 1 MB
size), just like BAR_PROGRAMMABLE.

Signed-off-by: Niklas Cassel <cassel@kernel.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20250131182949.465530-13-cassel@kernel.org
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-03-08 14:43:08 +00:00
Niklas Cassel
30a172db9f
PCI: dwc: ep: Move dw_pcie_ep_find_ext_capability()
Move dw_pcie_ep_find_ext_capability() so that it is located next to
dw_pcie_ep_find_capability().

Additionally, a follow-up commit requires this to be defined earlier
in order to avoid a forward declaration.

Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Link: https://lore.kernel.org/r/20250131182949.465530-12-cassel@kernel.org
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-03-08 14:43:07 +00:00
Niklas Cassel
4eb208424c
PCI: endpoint: Add pci_epc_bar_size_to_rebar_cap()
Add a helper function to convert a size to the representation used by the
Resizable BAR Capability Register.

Signed-off-by: Niklas Cassel <cassel@kernel.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20250131182949.465530-11-cassel@kernel.org
[mani: squashed the change that added PCIe spec reference to comments
from https://lore.kernel.org/linux-pci/20250219171454.2903059-2-cassel@kernel.org]
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-03-08 14:43:05 +00:00
Niklas Cassel
52132f3a63
PCI: endpoint: Allow EPF drivers to configure the size of Resizable BARs
A resizable BAR is different from a normal BAR in a few ways:

  - The minimum size of a resizable BAR is 1 MB.
  - Each BAR that is resizable has a Capability and Control register in
    the Resizable BAR Capability structure.

These registers contain the supported sizes and the currently selected
size of a resizable BAR.

The supported sizes is a bitmap of the supported sizes. The selected size
is a single value that is equal to one of the supported sizes.

A resizable BAR thus has to be configured differently than a
BAR_PROGRAMMABLE BAR, which usually sets the BAR size/mask in a vendor
specific way.

The PCI endpoint framework currently does not support resizable BARs.

Add a BAR type BAR_RESIZABLE, so that an EPC driver can support resizable
BARs properly.

Note that the pci_epc_set_bar() API takes a struct pci_epf_bar which tells
the EPC driver how it wants to configure the BAR.

struct pci_epf_bar only has a single size struct member.

This means that an EPC driver will only be able to set a single supported
size. This is perfectly fine, as we do not need the complexity of allowing
a host to change the size of the BAR. If someone ever wants to support
resizing a resizable BAR, the pci_epc_set_bar() API can be extended in the
future.

With these changes, we allow an EPF driver to configure the size of
Resizable BARs, rather than forcing them to a 1 MB size.

Signed-off-by: Niklas Cassel <cassel@kernel.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20250131182949.465530-10-cassel@kernel.org
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-03-08 14:43:02 +00:00
Niklas Cassel
3c936e0ec0
PCI: endpoint: pci-epf-test: Handle endianness properly
The struct pci_epf_test_reg is the actual data in pci-epf-test's test_reg
BAR (usually BAR0), which the host uses to send commands (etc.), and which
pci-epf-test uses to send back status codes.

pci-epf-test currently reads and writes this data without any endianness
conversion functions, which means that pci-epf-test is completely broken
on big-endian endpoint systems.

PCI devices are inherently little-endian, and the data stored in the PCI
BARs should be in little-endian.

Use endianness conversion functions when reading and writing data to
struct pci_epf_test_reg so that pci-epf-test will behave correctly on
big-endian endpoint systems.

Fixes: 349e7a85b2 ("PCI: endpoint: functions: Add an EP function to test PCI")
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Link: https://lore.kernel.org/r/20250127161242.104651-2-cassel@kernel.org
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-03-08 14:42:38 +00:00
Ilpo Järvinen
e4cb29386f PCI: Do not claim to release resource falsely
pci_release_resource() will print "... releasing" regardless of the
resource being assigned or not. Move the print after the res->parent check
to avoid claiming the kernel would be releasing an unassigned resource.

Likely, none of the current callers pass a resource that is unassigned so
this change is mostly to correct the non-sensical order than to remove
errorneous printouts.

Link: https://lore.kernel.org/r/20250307140922.5776-1-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-07 11:26:19 -06:00
Zhiyuan Dai
5af473941b PCI: Increase Resizable BAR support from 512 GB to 128 TB
Per PCIe r6.0, sec 7.8.6.2, devices can advertise Resizable BAR sizes up to
128 TB in the Resizable BAR Capability register.  Larger sizes can be
advertised via the Capability register, but that requires an API change.

Update pci_rebar_get_possible_sizes() and pbus_size_mem() to increase the
sizes we currently support from 512 GB to 128 TB.

Link: https://lore.kernel.org/r/20250307053535.44918-1-daizhiyuan@phytium.com.cn
Signed-off-by: Zhiyuan Dai <daizhiyuan@phytium.com.cn>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-07 11:23:34 -06:00
Alistair Francis
f810d17762 PCI/DOE: Rename Discovery Response Data Object Contents to type
PCIe r6.1, sec 6.30.1.1, describes a "Vendor ID", a "Data Object Type" and
"Next Index" as the fields in the DOE Discovery Response Data Object. The
DOE driver currently uses both the terms 'type' and 'prot' for the second
element.

Rename all uses of the DOE Discovery Response Data Object to use 'type' as
the second element of the object header, instead of type/prot as it
currently is.

Link: https://lore.kernel.org/r/20250306075211.1855177-2-alistair@alistair23.me
Signed-off-by: Alistair Francis <alistair@alistair23.me>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2025-03-06 12:48:16 -06:00
Alistair Francis
b4db6be0ce PCI/DOE: Rename DOE protocol to feature
DOE r1.1 replaced all occurrences of "protocol" with the term "feature" or
"Data Object Type".  PCIe r6.1 incorporated that change.

Rename the existing terms protocol with feature.

Link: https://lore.kernel.org/r/20250306075211.1855177-1-alistair@alistair23.me
Signed-off-by: Alistair Francis <alistair@alistair23.me>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
2025-03-06 12:47:57 -06:00
D M, Sharath Kumar
60f2ee5f14
PCI: altera: Add Agilex support
Add PCIe Root Port controller support for the Agilex family of chips.

The Agilex PCIe Hard IP has three variants that are mostly software
compatible, except for a couple register offsets. The P-Tile variant
supports Gen3/Gen4 1x16. The F-Tile variant supports Gen3/Gen4 4x4,
4x8, and 4x16. The R-Tile variant improves on the F-Tile variant by
adding Gen5 support.

To simplify the implementation of pci_ops read/write functions,
ep_{read/write}_cfg() callbacks were added to struct altera_pci_ops
to easily distinguish between hardware variants.

Signed-off-by: D M, Sharath Kumar <sharath.kumar.d.m@intel.com>
Signed-off-by: Matthew Gerlach <matthew.gerlach@linux.intel.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20250221170452.875419-3-matthew.gerlach@linux.intel.com
[kwilczynski: tidy code comments]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-03-06 09:52:48 +00:00
Zhang Zekun
bffc72387a
PCI: tegra: Use helper function for_each_child_of_node_scoped()
The for_each_available_child_of_node_scoped() helper provides
a scope-based clean-up functionality to put the device_node
automatically, and as such, there is no need to call of_node_put()
directly.

Thus, use this helper to simplify the code.

Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20240831040413.126417-7-zhangzekun11@huawei.com
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-03-06 09:31:45 +00:00
Zhang Zekun
f60b4e06a9
PCI: apple: Use helper function for_each_child_of_node_scoped()
The for_each_available_child_of_node_scoped() helper provides
a scope-based clean-up functionality to put the device_node
automatically, and as such, there is no need to call of_node_put()
directly.

Thus, use this helper to simplify the code.

Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20240831040413.126417-6-zhangzekun11@huawei.com
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-03-06 09:31:45 +00:00
Zhang Zekun
8905f8b6f5
PCI: mt7621: Use helper function for_each_available_child_of_node_scoped()
The for_each_available_child_of_node_scoped() helper provides
a scope-based clean-up functionality to put the device_node
automatically, and as such, there is no need to call of_node_put()
directly.

Thus, use this helper to simplify the code.

Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com>
Reviewed-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20240831040413.126417-5-zhangzekun11@huawei.com
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-03-06 09:31:45 +00:00
Zhang Zekun
a51adf82f8
PCI: mediatek: Use helper function for_each_available_child_of_node_scoped()
The for_each_available_child_of_node_scoped() helper provides
a scope-based clean-up functionality to put the device_node
automatically, and as such, there is no need to call of_node_put()
directly.

Thus, use this helper to simplify the code.

Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20240831040413.126417-4-zhangzekun11@huawei.com
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-03-06 09:31:45 +00:00
Zhang Zekun
d523347854
PCI: kirin: Tidy up _probe() related function with dev_err_probe()
The combination of dev_err() and the returned error code could be
replaced by dev_err_probe() in driver's probe function.

Thus, convert the code to use dev_err_probe() to make code simpler.

Suggested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240831040413.126417-3-zhangzekun11@huawei.com
[kwilczynski: commit log, return -ETIMEDOUT from hi3660_pcie_phy_start()
rather than -EINVAL for when the PIPE clock fails to become stable,
drop redundant dev->of_node NULL check]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-03-06 09:30:15 +00:00
Shawn Lin
20bbb083bb
PCI: Add Rockchip Vendor ID
Move PCI_VENDOR_ID_ROCKCHIP from pci_endpoint_test.c to pci_ids.h and
reuse it in pcie-rockchip-host.c.

Link: https://lore.kernel.org/r/20250218092120.2322784-2-cassel@kernel.org
Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-03-06 08:55:54 +00:00
Hans Zhang
f0f3044d22
PCI: dwc: Add debugfs property to provide LTSSM status of the PCIe link
Add the debugfs property to provide a view of the current link's LTSSM
status from the Root Port device.

Signed-off-by: Hans Zhang <18255117159@163.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Tested-by: Niklas Cassel <cassel@kernel.org>
Tested-by: Hrishikesh Deleep <hrishikesh.d@samsung.com>
Link: https://lore.kernel.org/r/20250223141848.231232-1-18255117159@163.com
[kwilczynski: commit log, refactor dw_ltssm_sts_string() to avoid
compilation errors on platforms that do not set CONFIG_PCIE_DW_HOST]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-03-06 08:55:54 +00:00
Shradha Todi
27491ac2cc
PCI: dwc: Add debugfs based Statistical Counter support for DWC
Add support to provide Statistical Counter interface to userspace.

This set of debug registers are part of the RAS DES feature present in
DesignWare PCIe controllers.

Co-developed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Shradha Todi <shradha.t@samsung.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Tested-by: Hrishikesh Deleep <hrishikesh.d@samsung.com>
Link: https://lore.kernel.org/r/20250221131548.59616-6-shradha.t@samsung.com
[kwilczynski: commit log, tidy up code comments, update documentation,
squashed patch that checks if the event counter is supported from
https://lore.kernel.org/linux-pci/20250225171239.19574-3-manivannan.sadhasivam@linaro.org]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-03-06 08:55:53 +00:00
Shradha Todi
d20ee8e2db
PCI: dwc: Add debugfs based Error Injection support for DWC
Add support to provide Error Injection interface to userspace.

This set of debug registers are part of the RAS DES feature present in
DesignWare PCIe controllers.

Signed-off-by: Shradha Todi <shradha.t@samsung.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Tested-by: Hrishikesh Deleep <hrishikesh.d@samsung.com>
Link: https://lore.kernel.org/r/20250221131548.59616-5-shradha.t@samsung.com
[kwilczynski: commit log, tidy up code comments, update documentation,
change debugfs property name from "duplicate_dllp" to "duplicate_tlp"]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-03-06 08:55:53 +00:00
Shradha Todi
4fbfa17f9a
PCI: dwc: Add debugfs based Silicon Debug support for DWC
Add support to provide Silicon Debug interface to userspace.

This set of debug registers are part of the RAS DES feature present in
DesignWare PCIe controllers.

Co-developed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Shradha Todi <shradha.t@samsung.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Tested-by: Hrishikesh Deleep <hrishikesh.d@samsung.com>
Link: https://lore.kernel.org/r/20250221131548.59616-4-shradha.t@samsung.com
[kwilczynski: commit log, tidy up Kconfig and drop "default y", tidy up
code comments, squashed patch that fixes a NULL pointer dereference when
debugfs is already unavailable during clean-up from
https://lore.kernel.org/linux-pci/20250225171239.19574-2-manivannan.sadhasivam@linaro.org,
refactor dwc_pcie_debugfs_init() to not return errors, squashed patch that
changes how lack of the RAS DES capability is handled from
https://lore.kernel.org/linux-pci/20250304151814.6xu7cbpwpqrvcad5@thinkpad]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-03-06 08:55:47 +00:00
Charles Han
98e87cc501
PCI: mediatek-gen3: Fix inconsistent indentation
Fix the following inconsistent indentation warning:

  drivers/pci/controller/pcie-mediatek-gen3.c:922 mtk_pcie_parse_port() warn: inconsistent indenting

Found using Smatch.  No functional changes intended.

Signed-off-by: Charles Han <hanchunchao@inspur.com>
Link: https://lore.kernel.org/r/20250305070022.4668-1-hanchunchao@inspur.com
[kwilczynski: commit log, refactor if-statement around num_lanes to
make it more readable, wrap overly long lines to fit 80 colums]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-03-05 20:30:51 +00:00
Zhang Zekun
9a0f3c50bd
PCI: kirin: Use helper function for_each_available_child_of_node_scoped()
The for_each_available_child_of_node_scoped() helper provides
a scope-based clean-up functionality to put the device_node
automatically, and as such, there is no need to call of_node_put()
directly.

Thus, use this helper to simplify the code.

Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20240831040413.126417-2-zhangzekun11@huawei.com
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-03-05 10:44:44 +00:00
Nishanth Aravamudan
479380efe1 PCI: Avoid reset when disabled via sysfs
After d88f521da3 ("PCI: Allow userspace to query and set device reset
mechanism"), userspace can disable reset of specific PCI devices by writing
an empty string to the sysfs reset_method file.

However, pci_slot_resettable() does not check pci_reset_supported(), which
means that pci_reset_function() will still reset the device even if
userspace has disabled all the reset methods.

I was able to reproduce this issue with a vfio device passed to a qemu
guest, where I had disabled PCI reset via sysfs.

Add an explicit check of pci_reset_supported() in both
pci_slot_resettable() and pci_bus_resettable() to ensure both the reset
status and reset execution are bypassed if an administrator disables it for
a device.

Link: https://lore.kernel.org/r/20250207205600.1846178-1-naravamudan@nvidia.com
Fixes: d88f521da3 ("PCI: Allow userspace to query and set device reset mechanism")
Signed-off-by: Nishanth Aravamudan <naravamudan@nvidia.com>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Raphael Norwitz <raphael.norwitz@nutanix.com>
Cc: Amey Narkhede <ameynarkhede03@gmail.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Yishai Hadas <yishaih@nvidia.com>
Cc: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Cc: Kevin Tian <kevin.tian@intel.com>
2025-03-04 17:23:21 -06:00
Feng Tang
9d7db4db19 PCI/portdrv: Only disable pciehp interrupts early when needed
Firmware developers reported that Linux issues two PCIe hotplug commands in
very short intervals on an ARM server, which doesn't comply with the PCIe
spec.  According to PCIe r6.1, sec 6.7.3.2, if the Command Completed event
is supported, software must wait for a command to complete or wait at
least 1 second before sending a new command.

In the failure case, the first PCIe hotplug command is from
get_port_device_capability(), which sends a command to disable PCIe hotplug
interrupts without waiting for its completion, and the second command comes
from pcie_enable_notification() of pciehp driver, which enables hotplug
interrupts again.

Fix this by only disabling the hotplug interrupts when the pciehp driver is
not enabled.

Link: https://lore.kernel.org/r/20250303023630.78397-1-feng.tang@linux.alibaba.com
Fixes: 2bd50dd800 ("PCI: PCIe: Disable PCIe port services during port initialization")
Suggested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
2025-03-04 17:17:01 -06:00
Lukas Wunner
cc973ef13f PCI: hotplug: Inline pci_hp_{create,remove}_module_link()
For no apparent reason, the pci_hp_{create,remove}_module_link() helpers
live in slot.c, even though they're only called from two functions in
pci_hotplug_core.c.

Inline the helpers to reduce code size and number of exported symbols.

Link: https://lore.kernel.org/r/c207f03cfe32ae9002d9b453001a1dd63d9ab3fb.1740501868.git.lukas@wunner.de
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-04 17:00:12 -06:00
Lukas Wunner
34bd6141a6 PCI: hotplug: Avoid backpointer dereferencing in has_*_file()
The PCI hotplug core contains five has_*_file() functions to determine
whether a certain sysfs file shall be added (or removed) for a given
hotplug slot.

The functions receive a struct pci_slot pointer which they have to
dereference back to a struct hotplug_slot.

Avoid by passing them a struct hotplug_slot pointer directly.

Link: https://lore.kernel.org/r/5b2f5b4ac45285953d00fd7637732a93fd40d26e.1740501868.git.lukas@wunner.de
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-04 17:00:12 -06:00
Lukas Wunner
62460bcb5a PCI: hotplug: Drop superfluous NULL pointer checks in has_*_file()
The PCI hotplug core contains five has_*_file() functions to determine
whether a certain sysfs file shall be added (or removed) for a given
hotplug slot.

The functions perform NULL pointer checks for the hotplug_slot and its
hotplug_slot_ops.  However the callers already perform these checks:

  pci_hp_register()
    __pci_hp_register()
      __pci_hp_initialize()

  pci_hp_deregister()
    pci_hp_del()

The only way to actually trigger these checks is to call pci_hp_add()
without having called pci_hp_initialize().

Amend pci_hp_add() to catch that and drop the now superfluous NULL
pointer checks in has_*_file().

Drop the same superfluous checks from pci_hp_create_module_link(),
which is (only) called from pci_hp_add().

Link: https://lore.kernel.org/r/37d1928edf8c3201a8b10794f1db3142e16e02b9.1740501868.git.lukas@wunner.de
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-04 17:00:12 -06:00
Lukas Wunner
666550a806 PCI: hotplug: Drop superfluous try_module_get() calls
In December 2002, historic commit

  https://git.kernel.org/tglx/history/c/bec7aa00ffe5
  ("[PATCH] more module warning fixes")

amended the PCI hotplug core to acquire a reference on the hotplug
driver module when a sysfs attribute is accessed.  That was necessary
because back in the day, sysfs code did not take any precautions to
prevent module unloading when an attribute was accessed.

Soon after in July 2003, historic commit

  https://git.kernel.org/tglx/history/c/1cf6d20f6078
  ("[PATCH] SYSFS: add module referencing to sysfs attribute files.")

addressed that deficiency.  But the commit neglected to remove the now
unnecessary reference acquisition from the PCI hotplug core.

The commit acquired a module reference for the entire duration between
open() and close() of a sysfs attribute.  This made it impossible to
unload a module while attributes were kept open by user space.
That's possible today:

When a hotplug driver module is unloaded, it removes sysfs attributes of
all its hotplug slots by calling pci_hp_del().  This will wait for any
concurrent user space operation to finish:

  pci_hp_del()
    fs_remove_slot()
      sysfs_remove_file()
        sysfs_remove_file_ns()
          kernfs_remove_by_name_ns()
            __kernfs_remove()
              kernfs_drain()

A user space operation such as read() briefly acquires a reference on
the attribute with kernfs_get_active().  kernfs_drain() waits until all
such references are released before allowing attribute removal.  Once
the attribute is removed, any subsequent user space operation on a still
open attribute file will return -ENODEV.

Thus, reference acquisition by the PCI hotplug core is still unnecessary
today.  So drop it at long last.

Link: https://lore.kernel.org/r/ed950fa2722967be4491146c7b867c1e7be11d37.1740501868.git.lukas@wunner.de
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-04 17:00:12 -06:00
Lukas Wunner
5c8265fa63 PCI: hotplug: Drop superfluous pci_hotplug_slot_list
The PCI hotplug core keeps a list of all registered slots.  Its sole
purpose is to WARN() on slot removal if another slot is using the same
name.

But this can never happen because already on slot creation, an error is
returned and multiple messages are emitted if a slot's name is
duplicated:

  pci_hp_register()
    __pci_hp_register()
      __pci_hp_initialize()
        pci_create_slot()
          kobject_init_and_add()
            kobject_add_varg()
              kobject_add_internal()
                create_dir()
                  sysfs_create_dir_ns()
                    kernfs_create_dir_ns()
                      sysfs_warn_dup()
                        pr_warn("cannot create duplicate filename ...")
                pr_err("%s failed for %s with -EEXIST, ...");

Drop the superfluous list.

Link: https://lore.kernel.org/r/603735bc50eb370bc7f1c358441ac671360bab25.1740501868.git.lukas@wunner.de
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-04 17:00:12 -06:00
Bjorn Helgaas
800ce277f4 PCI: Log debug messages about reset method
Log pci_dbg() messages about the reset methods we attempt and any errors
(-ENOTTY means "try the next method").

Set CONFIG_DYNAMIC_DEBUG=y and enable by booting with
dyndbg="file drivers/pci/* +p" or enable at runtime:

  # echo "file drivers/pci/* +p" > /sys/kernel/debug/dynamic_debug/control

Link: https://lore.kernel.org/r/20250303204220.197172-1-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
2025-03-04 16:44:43 -06:00
Jim Quinlan
174cfcf13d
PCI: brcmstb: Make irq_domain_set_info() parameter cast explicit
Make the cast to the irq_hw_number_t type for the parameter passed to
irq_domain_set_info() function explicit.

Signed-off-by: Jim Quinlan <james.quinlan@broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20250214173944.47506-9-james.quinlan@broadcom.com
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-03-04 16:03:10 +00:00
Jim Quinlan
a9ec9fb738
PCI: brcmstb: Make two changes in MDIO register fields
The hardware has been updated with two changes to the MDIO packet
format.

The CMD field used to be 12 bits and now is only 1 bit. This change
is backwards compatible because the field's starting bit position is
unchanged, and the only commands we've used have values 0 and 1.

The PORT field's width has been changed from 4 bits to 5 bits. When
written, the new bit is not contiguous with the other four. However,
this change is backwards compatible because the driver never used
anything other than 0 for the port field's value.

Thus, update the existing code to handle new changes to the hardware
in a backwards-compatible manner.

Signed-off-by: Jim Quinlan <james.quinlan@broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20250214173944.47506-8-james.quinlan@broadcom.com
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-03-04 16:03:04 +00:00
Jim Quinlan
42fd45be82
PCI: brcmstb: Use same constant table for config space access
The constants EXT_CFG_DATA and EXT_CFG_INDEX vary by SoC, where one of
the map_bus methods used these constants, and the other used a different
set of constants.

Thankfully, there was no problem because the SoCs that used the latter
map_bus method all had the same register constants.

Thus, remove redundant constants and adjust the code to use the correct
constants accordingly.

While at it, update the value of EXT_CFG_DATA to use the 4k-page based
configuration space access system, which is what the second map_bus
method was already using.

Signed-off-by: Jim Quinlan <james.quinlan@broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://lore.kernel.org/r/20250214173944.47506-7-james.quinlan@broadcom.com
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-03-04 16:01:43 +00:00
Jim Quinlan
b7de1b60ec
PCI: brcmstb: Fix potential premature regulator disabling
The platform supports enabling and disabling regulators only on
ports below the Root Complex.

Thus, we need to verify this both when adding and removing the bus,
otherwise regulators may be disabled prematurely when a bus further
down the topology is removed.

Fixes: 9e6be018b2 ("PCI: brcmstb: Enable child bus device regulators from DT")
Signed-off-by: Jim Quinlan <james.quinlan@broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20250214173944.47506-6-james.quinlan@broadcom.com
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-03-04 16:01:30 +00:00
Jim Quinlan
3651ad5249
PCI: brcmstb: Fix error path after a call to regulator_bulk_get()
If the regulator_bulk_get() returns an error and no regulators
are created, we need to set their number to zero.

If we don't do this and the PCIe link up fails, a call to the
regulator_bulk_free() will result in a kernel panic.

While at it, print the error value, as we cannot return an error
upwards as the kernel will WARN() on an error from add_bus().

Fixes: 9e6be018b2 ("PCI: brcmstb: Enable child bus device regulators from DT")
Signed-off-by: Jim Quinlan <james.quinlan@broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://lore.kernel.org/r/20250214173944.47506-5-james.quinlan@broadcom.com
[kwilczynski: commit log, use comma in the message to match style with
other similar messages]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-03-04 16:00:20 +00:00
Jim Quinlan
b5e441793e
PCI: brcmstb: Do not assume that register field starts at LSB
When setting the LNKCAP and LNKCTL2 register fields, it was assumed
that the field started at the LSB of the register.

Although the masks do indeed start at the LSB, and this will probably
not change, it is prudent to use a method that makes no assumption
about the mask's placement in the register.

Thus, use the u{16,32}p_replace_bits() helpers since they are already
wildly used in this driver.

Signed-off-by: Jim Quinlan <james.quinlan@broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20250214173944.47506-4-james.quinlan@broadcom.com
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-03-04 15:58:17 +00:00
Jim Quinlan
0c97321e11
PCI: brcmstb: Use internal register to change link capability
The driver has been mistakenly writing to a read-only (RO)
configuration space register (PCI_EXP_LNKCAP) to change the
PCIe link capability.

Although harmless in this case, the proper write destination
is an internal register that is reflected by PCI_EXP_LNKCAP.

Thus, fix the brcm_pcie_set_gen() function to correctly update
the link capability.

Fixes: c045213703 ("PCI: brcmstb: Add Broadcom STB PCIe host controller driver")
Signed-off-by: Jim Quinlan <james.quinlan@broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20250214173944.47506-3-james.quinlan@broadcom.com
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-03-04 15:54:46 +00:00
Jim Quinlan
72d36589c6
PCI: brcmstb: Set generation limit before PCIe link up
When the user elects to limit the PCIe generation via the appropriate
devicetree property, apply the settings before the PCIe link up, not
after.

Fixes: c045213703 ("PCI: brcmstb: Add Broadcom STB PCIe host controller driver")
Signed-off-by: Jim Quinlan <james.quinlan@broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20250214173944.47506-2-james.quinlan@broadcom.com
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-03-04 15:54:15 +00:00
Stanimir Varbanov
377bced88c
PCI: brcmstb: Add BCM2712 support
Add a bare minimum amount of changes in order to support PCIe Root
Complex hardware IP found on RPi5. The PCIe controller on BCM2712
is based on BCM7712 and as such it inherits register offsets, PERST#
assertion, bridge_reset ops, and inbound windows count.

Although, the implementation for BCM2712 needs a workaround related to
the control of the bridge_reset where turning off of the Root Port must
not shutdown the bridge_reset and this must be avoided. To implement
this workaround a quirks field is introduced in pcie_cfg_data struct.

The controller also needs adjustment of PHY PLL setup to use a 54MHz
input refclk. The default input reference clock for the PHY PLL is
100Mhz, except for some devices where it is 54Mhz like BCM2712C1 and
BCM2712D0.

To implement those adjustments introduce a new .post_setup op in
pcie_cfg_data and call it at the end of brcm_pcie_setup function.

The BCM2712 .post_setup callback implements the required MDIO writes
that switch the PLL refclk and also change PHY PM clock period.

Without this RPi5 PCIex1 is unable to enumerate endpoint devices on
the expansion connector.

Signed-off-by: Stanimir Varbanov <svarbanov@suse.de>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Jim Quinlan <james.quinlan@broadcom.com>
Tested-by: Ivan T. Ivanov <iivanov@suse.de>
Link: https://lore.kernel.org/r/20250224083559.47645-8-svarbanov@suse.de
[commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-03-04 15:54:08 +00:00
Hans Zhang
3ac47fbf4f
PCI: cadence-ep: Fix the driver to send MSG TLP for INTx without data payload
Per the Cadence's "PCIe Controller IP for AX14" user guide, Version
1.04, Section 9.1.7.1, "AXI Subordinate to PCIe Address Translation
Registers", Table 9.4, the bit 16 of the AXI Subordinate Address
(axi_s_awaddr) when set corresponds to MSG with data, and when not set,
to MSG without data.

However, the driver is currently doing the opposite and due to this,
the INTx is never received on the host.

So, fix the driver to reflect the documentation and also make INTx work.

Fixes: 37dddf14f1 ("PCI: cadence: Add EndPoint Controller driver for Cadence PCIe controller")
Signed-off-by: Hans Zhang <18255117159@163.com>
Signed-off-by: Hans Zhang <hans.zhang@cixtech.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20250214165724.184599-1-18255117159@163.com
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-03-04 11:22:25 +00:00
Shradha Todi
efaf16de43
PCI: dwc: Add helper to find the Vendor Specific Extended Capability (VSEC)
The new dw_pcie_find_vsec_capability() helper will be used within
different DWC APIs to find the VSEC capabilities like PTM, RAS, etc.

Co-developed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Shradha Todi <shradha.t@samsung.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Tested-by: Hrishikesh Deleep <hrishikesh.d@samsung.com>
Link: https://lore.kernel.org/r/20250221131548.59616-3-shradha.t@samsung.com
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-03-03 20:02:58 +00:00
Lorenzo Bianconi
249b782980
PCI: mediatek-gen3: Configure PBUS_CSR registers for EN7581 SoC
Configure PBus base address and address mask to allow the hw
to detect if a given address is accessible on PCIe controller.

Fixes: f6ab898356 ("PCI: mediatek-gen3: Add Airoha EN7581 support")
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/20250225-en7581-pcie-pbus-csr-v4-2-24324382424a@kernel.org
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-03-03 19:11:44 +00:00
Herve Codina
1f34072441 PCI: of: Create device tree PCI host bridge node
PCI devices device tree nodes can be already created. This was introduced
by commit 407d1a5192 ("PCI: Create device tree node for bridge").

In order to have device tree nodes related to PCI devices attached on their
PCI root bus (the PCI bus handled by the PCI host bridge), a PCI root bus
device tree node is needed. This root bus node will be used as the parent
node of the first level devices scanned on the bus. On device tree based
systems, this PCI root bus device tree node is set to the node of the
related PCI host bridge. The PCI host bridge node is available in the
device tree used to describe the hardware passed at boot.

On non device tree based system (such as ACPI), a device tree node for the
PCI host bridge or for the root bus does not exist. Indeed, the PCI host
bridge is not described in a device tree used at boot simply because no
device tree is passed at boot.

The device tree PCI host bridge node creation needs to be done at runtime.
This is done in the same way as for the creation of the PCI device nodes.
I.e. node and properties are created based on computed information done by
the PCI core. Also, as is done on device tree based systems, this PCI host
bridge node is used for the PCI root bus.

With this done, hardware available in a PCI device that doesn't follow the
PCI model consisting in one PCI function handled by one driver can be
described by a device tree overlay loaded by the PCI device driver on non
device tree based systems. Those PCI devices provide a single PCI function
that includes several functionalities that require different drivers. The
device tree overlay describes the internal devices and their relationships.
It allows to load drivers needed by those different devices in order to
have functionalities handled.

Link: https://lore.kernel.org/r/20250224141356.36325-6-herve.codina@bootlin.com
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
2025-02-28 15:13:51 -06:00
Herve Codina
3dc8adeeef PCI: of_property: Constify parameter in of_pci_get_addr_flags()
The res parameter has no reason to be a pointer to an un-const struct
resource. Indeed, struct resource is not supposed to be modified by the
function.

Constify the res parameter.

Link: https://lore.kernel.org/r/20250224141356.36325-5-herve.codina@bootlin.com
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
2025-02-28 15:13:07 -06:00
Herve Codina
c5785a165f PCI: of_property: Add support for NULL pdev in of_pci_set_address()
The pdev (pointer to a struct pci_dev) parameter of of_pci_set_address()
cannot be NULL.

In order to use of_pci_set_address() when creating the PCI root bus node,
it needs to support a NULL pdev parameter. Indeed, in the case of the PCI
root bus node creation, no pdev is available and of_pci_set_address() will
be used with the bridge windows.

Allow to call of_pci_set_address() with a NULL pdev.

Link: https://lore.kernel.org/r/20250224141356.36325-4-herve.codina@bootlin.com
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
2025-02-28 15:13:07 -06:00
Herve Codina
e2267841fe PCI: of: Use device_{add,remove}_of_node() to attach of_node to existing device
The commit 407d1a5192 ("PCI: Create device tree node for bridge")
creates of_node for PCI devices. The newly created of_node is attached
to an existing device. This is done setting directly pdev->dev.of_node
in the code.

Even if pdev->dev.of_node cannot be previously set, this doesn't handle
the fwnode field of the struct device. Indeed, this field needs to be
set if it hasn't already been set.

device_{add,remove}_of_node() have been introduced to handle this case.

Use them instead of the direct setting.

Link: https://lore.kernel.org/r/20250224141356.36325-3-herve.codina@bootlin.com
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
2025-02-28 15:13:07 -06:00
Stanimir Varbanov
25a98c7270
PCI: brcmstb: Expand inbound window size up to 64GB
The BCM2712 memory map can support up to 64GB of system memory, thus
expand the inbound window size in calculation helper function.

The change is safe for the currently supported SoCs that have smaller
inbound window sizes.

Signed-off-by: Stanimir Varbanov <svarbanov@suse.de>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Jim Quinlan <james.quinlan@broadcom.com>
Tested-by: Ivan T. Ivanov <iivanov@suse.de>
Link: https://lore.kernel.org/r/20250224083559.47645-7-svarbanov@suse.de
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-02-28 18:40:08 +00:00
Stanimir Varbanov
10dbedad3c
PCI: brcmstb: Reuse pcie_cfg_data structure
Instead of copying fields from the pcie_cfg_data structure to
brcm_pcie, reference it directly.

Signed-off-by: Stanimir Varbanov <svarbanov@suse.de>
Reviewed-by: Florian Fainelil <florian.fainelli@broadcom.com>
Reviewed-by: Jim Quinlan <james.quinlan@broadcom.com>
Tested-by: Ivan T. Ivanov <iivanov@suse.de>
Link: https://lore.kernel.org/r/20250224083559.47645-6-svarbanov@suse.de
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-02-28 18:40:01 +00:00
Stanimir Varbanov
2294059118
PCI: brcmstb: Add a softdep to MIP MSI-X driver
Then the brcmstb PCIe driver and MIP MSI-X interrupt controller
drivers are built as modules there could be a race in probing.

To avoid this, add a softdep to MIP driver to guarantee that
MIP driver will be load first.

Signed-off-by: Stanimir Varbanov <svarbanov@suse.de>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Tested-by: Ivan T. Ivanov <iivanov@suse.de>
Link: https://lore.kernel.org/r/20250224083559.47645-5-svarbanov@suse.de
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-02-28 18:38:01 +00:00
Yury Norov
7a610694fa PCI: hv: Switch hv_compose_multi_msi_req_get_cpu() to using cpumask_next_wrap()
Calling cpumask_next_wrap_old() with starting CPU == nr_cpu_ids
is effectively the same as request to find first CPU, starting
from a given one and wrapping around if needed.

cpumask_next_wrap() is a proper replacement for that.

Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
2025-02-24 16:37:23 -05:00
Yury Norov
dc5bb9b769 cpumask: deprecate cpumask_next_wrap()
The next patch aligns implementation of cpumask_next_wrap() with the
find_next_bit_wrap(), and it changes function signature.

To make the transition smooth, this patch deprecates current
implementation by adding an _old suffix. The following patches switch
current users to the new implementation one by one.

No functional changes were intended.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
2025-02-24 16:37:22 -05:00
Dmitry Baryshkov
42c812d070
PCI: qcom-ep: Enable EP mode support for SAR2130P
Enable PCIe Endpoint mode support for the Qualcomm SAR2130P platform.

This is needed, as it is not possible to use a compatible fallback on
any other platform since SAR2130P uses slightly different set of clocks.

Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Link: https://lore.kernel.org/r/20250221-sar2130p-pci-v3-6-61a0fdfb75b4@linaro.org
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-02-24 19:05:00 +00:00
Stanimir Varbanov
2df181e1ae
PCI: brcmstb: Fix missing of_node_put() in brcm_pcie_probe()
A call to of_parse_phandle() is incrementing the refcount, and as such,
the of_node_put() must be called when the reference is no longer needed.

Thus, refactor the existing code and add a missing of_node_put() call
following the check to ensure that "msi_np" matches "pcie->np" and after
MSI initialization, but only if the MSI support is enabled system-wide.

Cc: stable@vger.kernel.org # v5.10+
Fixes: 40ca1bf580 ("PCI: brcmstb: Add MSI support")
Signed-off-by: Stanimir Varbanov <svarbanov@suse.de>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20250122222955.1752778-1-svarbanov@suse.de
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-02-24 18:50:54 +00:00
Manivannan Sadhasivam
9d52691f89
PCI: qcom-ep: Mark BAR0/BAR2 as 64bit BARs and BAR1/BAR3 as RESERVED
On all Qcom endpoint SoCs, BAR0/BAR2 are 64bit BARs by default and
software cannot change the type.

So, mark the those BARs as 64bit BARs and also mark the successive
BAR1/BAR3 as RESERVED BARs so that the EPF drivers cannot use them.

Cc: stable+noautosel@kernel.org # depends on patch introducing only_64bit flag
Fixes: f55fee56a6 ("PCI: qcom-ep: Add Qualcomm PCIe Endpoint controller driver")
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20241231130224.38206-3-manivannan.sadhasivam@linaro.org
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-02-24 18:29:05 +00:00
Balbir Singh
7ffb791423 x86/kaslr: Reduce KASLR entropy on most x86 systems
When CONFIG_PCI_P2PDMA=y (which is basically enabled on all
large x86 distros), it maps the PFN's via a ZONE_DEVICE
mapping using devm_memremap_pages(). The mapped virtual
address range corresponds to the pci_resource_start()
of the BAR address and size corresponding to the BAR length.

When KASLR is enabled, the direct map range of the kernel is
reduced to the size of physical memory plus additional padding.
If the BAR address is beyond this limit, PCI peer to peer DMA
mappings fail.

Fix this by not shrinking the size of the direct map when
CONFIG_PCI_P2PDMA=y.

This reduces the total available entropy, but it's better than
the current work around of having to disable KASLR completely.

[ mingo: Clarified the changelog to point out the broad impact ... ]

Signed-off-by: Balbir Singh <balbirs@nvidia.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Kees Cook <kees@kernel.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com> # drivers/pci/Kconfig
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/lkml/20250206023201.1481957-1-balbirs@nvidia.com/
Link: https://lore.kernel.org/r/20250206234234.1912585-1-balbirs@nvidia.com
--
 arch/x86/mm/kaslr.c | 10 ++++++++--
 drivers/pci/Kconfig |  6 ++++++
 2 files changed, 14 insertions(+), 2 deletions(-)
2025-02-22 12:25:57 +01:00
Guilherme Giacomo Simoes
8ff4574cf7
PCI: cpcihp: Remove unused .get_power() and .set_power()
The .get_power() and .set_power() function pointers in struct
cpci_hp_controller_ops were declared but never implemented by any
driver.

Thus, to improve code readability and reduce resource usage,
remove these pointers and the code that has never been used.

Link: https://lore.kernel.org/r/20250217185638.398925-1-trintaeoitogc@gmail.com
Signed-off-by: Guilherme Giacomo Simoes <trintaeoitogc@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-02-22 08:03:02 +00:00
Ilpo Järvinen
7e077e6707 PCI/ERR: Handle TLP Log in Flit mode
Flit mode introduced in PCIe r6.0 alters how the TLP Header Log is
presented through AER and DPC Capability registers. The TLP Prefix Log
Register is not present with Flit mode, and the register becomes an
extension of the TLP Header Log (PCIe r6.1 secs 7.8.4.12 & 7.9.14.13).

Adapt pcie_read_tlp_log() and struct pcie_tlp_log to read and store the
extended TLP Header Log when the Link is in Flit mode. As the Prefix Log
and Extended TLP Header are not present at the same time, a C union can be
used.

Determining whether the error occurred while the Link was in Flit mode is a
bit complicated. In case of AER, the Advanced Error Capabilities and
Control Register directly tells whether the error was logged in Flit mode
or not (PCIe r6.1 sec 7.8.4.7). The DPC Capability (PCIe r6.1 sec 7.9.14),
unfortunately, does not contain the same information.

Unlike AER, the DPC Capability does not provide a way to discern whether
the error was logged in Flit mode (this is confirmed by PCI WG to be an
oversight in the spec). DPC will bring the Link down immediately following
an error, which makes it impossible to acquire the Flit Mode Status
directly from the Link Status 2 register because Flit Mode Status is only
set in certain Link states (PCIe r6.1 sec 7.5.3.20). As a workaround, use
the flit_mode value stored into the struct pci_bus.

Link: https://lore.kernel.org/r/20250207161836.2755-3-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-02-21 17:31:45 -06:00
Ilpo Järvinen
79c731e20d PCI: Track Flit Mode Status & print it with link status
PCIe r6.0 added Flit mode, which mainly alters HW behavior, but there are
some OS visible changes. The OS visible changes include differences in the
layout of some capabilities and interpretation of the TLP headers (in
diagnostics situations).

To be able to determine which mode the PCIe Link is using, store the Flit
Mode Status (PCIe r6.1 sec 7.5.3.20) information in addition to the Link
speed into struct pci_bus in pcie_update_link_speed().

Link: https://lore.kernel.org/r/20250207161836.2755-2-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
[bhelgaas: use unsigned int:1 instead of bool, update flit_mode setting]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-02-21 17:31:35 -06:00
Ilpo Järvinen
fab874e125 PCI/AER: Descope pci_printk() to aer_printk()
include/linux/pci.h provides low-level pci_printk() interface that is
only used by AER because it needs to print the same message with
different levels depending on the error severity. No other PCI code
uses that functionality and calls pci_<level>() logging functions
directly with the appropriate level.

Descope pci_printk() into AER as aer_printk().

Link: https://lore.kernel.org/r/20241216161012.1774-5-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
[bhelgaas: retain pci_printk() for now since shpchp still uses it and I
moved those patches to a different branch]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-02-21 17:17:08 -06:00
Tushar Dave
9cf8a952d5 PCI/ACS: Fix 'pci=config_acs=' parameter
Commit 47c8846a49 ("PCI: Extend ACS configurability") introduced bugs
that fail to configure ACS ctrl to the value specified by the kernel
parameter. Essentially there are two bugs:

1) When ACS is configured for multiple PCI devices using 'config_acs'
   kernel parameter, it results into error "PCI: Can't parse ACS command
   line parameter". This is due to a bug that doesn't preserve the ACS
   mask, but instead overwrites the mask with value 0.

   For example, using 'config_acs' to configure ACS ctrl for multiple BDFs
   fails:

      Kernel command line: pci=config_acs=1111011@0020:02:00.0;101xxxx@0039:00:00.0 "dyndbg=file drivers/pci/pci.c +p"
      PCI: Can't parse ACS command line parameter
      pci 0020:02:00.0: ACS mask  = 0x007f
      pci 0020:02:00.0: ACS flags = 0x007b
      pci 0020:02:00.0: Configured ACS to 0x007b

   After this fix:

      Kernel command line: pci=config_acs=1111011@0020:02:00.0;101xxxx@0039:00:00.0 "dyndbg=file drivers/pci/pci.c +p"
      pci 0020:02:00.0: ACS mask  = 0x007f
      pci 0020:02:00.0: ACS flags = 0x007b
      pci 0020:02:00.0: ACS control = 0x005f
      pci 0020:02:00.0: ACS fw_ctrl = 0x0053
      pci 0020:02:00.0: Configured ACS to 0x007b
      pci 0039:00:00.0: ACS mask  = 0x0070
      pci 0039:00:00.0: ACS flags = 0x0050
      pci 0039:00:00.0: ACS control = 0x001d
      pci 0039:00:00.0: ACS fw_ctrl = 0x0000
      pci 0039:00:00.0: Configured ACS to 0x0050

2) In the bit manipulation logic, we copy the bit from the firmware
   settings when mask bit 0.

   For example, 'disable_acs_redir' fails to clear all three ACS P2P redir
   bits due to the wrong bit fiddling:

      Kernel command line: pci=disable_acs_redir=0020:02:00.0;0030:02:00.0;0039:00:00.0 "dyndbg=file drivers/pci/pci.c +p"
      pci 0020:02:00.0: ACS mask  = 0x002c
      pci 0020:02:00.0: ACS flags = 0xffd3
      pci 0020:02:00.0: Configured ACS to 0xfffb
      pci 0030:02:00.0: ACS mask  = 0x002c
      pci 0030:02:00.0: ACS flags = 0xffd3
      pci 0030:02:00.0: Configured ACS to 0xffdf
      pci 0039:00:00.0: ACS mask  = 0x002c
      pci 0039:00:00.0: ACS flags = 0xffd3
      pci 0039:00:00.0: Configured ACS to 0xffd3

   After this fix:

      Kernel command line: pci=disable_acs_redir=0020:02:00.0;0030:02:00.0;0039:00:00.0 "dyndbg=file drivers/pci/pci.c +p"
      pci 0020:02:00.0: ACS mask  = 0x002c
      pci 0020:02:00.0: ACS flags = 0xffd3
      pci 0020:02:00.0: ACS control = 0x007f
      pci 0020:02:00.0: ACS fw_ctrl = 0x007b
      pci 0020:02:00.0: Configured ACS to 0x0053
      pci 0030:02:00.0: ACS mask  = 0x002c
      pci 0030:02:00.0: ACS flags = 0xffd3
      pci 0030:02:00.0: ACS control = 0x005f
      pci 0030:02:00.0: ACS fw_ctrl = 0x005f
      pci 0030:02:00.0: Configured ACS to 0x0053
      pci 0039:00:00.0: ACS mask  = 0x002c
      pci 0039:00:00.0: ACS flags = 0xffd3
      pci 0039:00:00.0: ACS control = 0x001d
      pci 0039:00:00.0: ACS fw_ctrl = 0x0000
      pci 0039:00:00.0: Configured ACS to 0x0000

Link: https://lore.kernel.org/r/20250207030338.456887-1-tdave@nvidia.com
Fixes: 47c8846a49 ("PCI: Extend ACS configurability")
Signed-off-by: Tushar Dave <tdave@nvidia.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
2025-02-21 16:16:55 -06:00
Mrinmay Sarkar
4f13dd9e2b
PCI: epf-mhi: Update device ID for SA8775P
Update device ID for the Qcom SA8775P SoC.

Signed-off-by: Mrinmay Sarkar <quic_msarkar@quicinc.com>
Link: https://lore.kernel.org/r/20241205065422.2515086-3-quic_msarkar@quicinc.com
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-02-21 05:55:09 +00:00
Lorenzo Bianconi
b6d7bb0d3b
PCI: mediatek-gen3: Remove leftover mac_reset assert for Airoha EN7581 SoC
Remove a leftover assert for mac_reset line in mtk_pcie_en7581_power_up().

This is not harmful since EN7581 does not requires mac_reset and
mac_reset is not defined in EN7581 device tree.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Link: https://lore.kernel.org/r/20250201-pcie-en7581-remove-mac_reset-v2-1-a06786cdc683@kernel.org
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-02-21 05:46:00 +00:00
Manivannan Sadhasivam
75996c92f4
PCI/pwrctrl: Add pwrctrl driver for PCI slots
This driver is used to control the power state of the devices attached to
the PCI slots. Currently, it controls the voltage rails of the PCI slots
defined in the devicetree node of the root port.

The voltage rails for PCI slots are documented in the DT-schema:

  https://github.com/devicetree-org/dt-schema/blob/v2024.11/dtschema/schemas/pci/pci-bus-common.yaml#L153

Since this driver has to work with different kind of slots (PCIe
x1/x4/x8/x16, Mini PCIe, PCI, etc.), the driver is thus using the
of_regulator_bulk_get_all() API to obtain the voltage regulators defined
in the DT node, instead of hardcoding them.

As such, the DT node of the root port should define the relevant supply
properties corresponding to the voltage rails of the PCI slot.

Tested-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20250116-pci-pwrctrl-slot-v3-5-827473c8fbf4@linaro.org
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-02-21 01:03:39 +00:00
Easwar Hariharan
25a3c220a2
PCI: hv: Correct a comment
The VF driver controls an endpoint attached to the PCI HyperV
controller. An invalidation sent by the PF driver in the host
would be delivered *to* the endpoint driver by the controller
driver.

Thus, correct the wording within the comment.

Signed-off-by: Easwar Hariharan <eahariha@linux.microsoft.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Link: https://lore.kernel.org/r/20250207190716.89995-1-eahariha@linux.microsoft.com
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-02-20 14:57:28 +00:00
Manivannan Sadhasivam
2489eeb777
PCI/pwrctrl: Skip scanning for the device further if pwrctrl device is created
The pwrctrl core will rescan the bus once the device is powered on. So
there is no need to continue scanning for the device further.

Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Tested-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20250116-pci-pwrctrl-slot-v3-3-827473c8fbf4@linaro.org
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-02-20 10:59:10 +00:00
Manivannan Sadhasivam
2d923930f2
PCI/pwrctrl: Move pci_pwrctrl_unregister() to pci_destroy_dev()
The PCI core will try to access the devices even after pci_stop_dev()
for things like Data Object Exchange (DOE), ASPM, etc.

So, move pci_pwrctrl_unregister() to the near end of pci_destroy_dev()
to make sure that the devices are powered down only after the PCI core
is done with them.

Suggested-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Tested-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20250116-pci-pwrctrl-slot-v3-2-827473c8fbf4@linaro.org
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-02-20 10:59:10 +00:00
Manivannan Sadhasivam
957f40d039
PCI/pwrctrl: Move creation of pwrctrl devices to pci_scan_device()
Current way of creating pwrctrl devices requires iterating through the
child devicetree nodes of the PCI bridge in pci_pwrctrl_create_devices().
Even though it works, it creates confusion as there is no symmetry between
this and pci_pwrctrl_unregister() function that removes the pwrctrl
devices.

So to make these two functions symmetric, move the creation of pwrctrl
devices to pci_scan_device(). During the scan of each device in a slot,
the devicetree node (if exists) for the PCI device will be checked. If it
has the supplies populated, then the pwrctrl device will be created.

Since the PCI device scan happens so early, there would be no "struct
pci_dev" available for the device. So the host bridge is used as the
parent of all pwrctrl devices.

One nice side effect of this move is that, it is now possible to have
pwrctrl devices for PCI bridges as well (to control the supplies of PCI
slots).

Suggested-by: Lukas Wunner <lukas@wunner.de>
Tested-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20250116-pci-pwrctrl-slot-v3-1-827473c8fbf4@linaro.org
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-02-20 10:59:02 +00:00
Daniel Stodden
cbf937dcad
PCI/ASPM: Fix link state exit during switch upstream function removal
Before 456d8aa37d ("PCI/ASPM: Disable ASPM on MFD function removal to
avoid use-after-free"), we would free the ASPM link only after the last
function on the bus pertaining to the given link was removed.

That was too late. If function 0 is removed before sibling function,
link->downstream would point to free'd memory after.

After above change, we freed the ASPM parent link state upon any function
removal on the bus pertaining to a given link.

That is too early. If the link is to a PCIe switch with MFD on the upstream
port, then removing functions other than 0 first would free a link which
still remains parent_link to the remaining downstream ports.

The resulting GPFs are especially frequent during hot-unplug, because
pciehp removes devices on the link bus in reverse order.

On that switch, function 0 is the virtual P2P bridge to the internal bus.
Free exactly when function 0 is removed -- before the parent link is
obsolete, but after all subordinate links are gone.

Link: https://lore.kernel.org/r/e12898835f25234561c9d7de4435590d957b85d9.1734924854.git.dns@arista.com
Fixes: 456d8aa37d ("PCI/ASPM: Disable ASPM on MFD function removal to avoid use-after-free")
Signed-off-by: Daniel Stodden <dns@arista.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-02-20 10:05:56 +00:00
Ilpo Järvinen
588021b286 PCI: shpchp: Remove 'shpchp_debug' module parameter
The "shpchp_debug" module parameter is used to enable debug logging.  The
generic ability to turn on/off debug prints dynamically covers this use
case already so there is no need for module specific debug handling.  The
ctrl_dbg() wrapper also uses a low-level pci_printk() despite always using
KERN_DEBUG level.

Remove "shpchp_debug" parameter and convert ctrl_dbg() to use pci_dbg().

From now on, shpchp can be debugged using the normal dynamic debugger by
setting CONFIG_DYNAMIC_DEBUG=y and then either adding to kernel cmdline:

  dyndbg="file drivers/pci/hotplug/shpchp* +p"

or using this command on a running kernel:

  echo 'file drivers/pci/hotplug/shpchp* +p' > /sys/kernel/debug/dynamic_debug/control

Link: https://lore.kernel.org/r/20250217095550.2789-3-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-02-19 16:54:42 -06:00
Ilpo Järvinen
b52ce0b6d5 PCI: shpchp: Remove unused logging wrappers
The shpchp hotplug driver defines logging wrapper with generic names which
are just duplicates of existing generic printk() wrappers. They are also
unused so remove them.

Link: https://lore.kernel.org/r/20250217095550.2789-2-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-02-19 16:54:42 -06:00
Ilpo Järvinen
4999822008 PCI: shpchp: Change dbg() -> ctrl_dbg()
Convert the last user of dbg() to use ctrl_dbg().

Link: https://lore.kernel.org/r/20241216161012.1774-3-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-02-19 16:54:37 -06:00
Ilpo Järvinen
7d5f1e615e PCI: shpchp: Remove logging from module init/exit functions
The logging in shpchp module init/exit functions is not very useful.
Remove it.

Link: https://lore.kernel.org/r/20241216161012.1774-2-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-02-19 16:51:44 -06:00
Rafael J. Wysocki
bca84a7b93 PM: sleep: Use DPM_FLAG_SMART_SUSPEND conditionally
A recent discussion has revealed that using DPM_FLAG_SMART_SUSPEND
unconditionally is generally problematic because it may lead to
situations in which the device's runtime PM information is internally
inconsistent or does not reflect its real state [1].

For this reason, change the handling of DPM_FLAG_SMART_SUSPEND so that
it is only taken into account if it is consistently set by the drivers
of all devices having any PM callbacks throughout dependency graphs in
accordance with the following rules:

 - The "smart suspend" feature is only enabled for devices whose drivers
   ask for it (that is, set DPM_FLAG_SMART_SUSPEND) and for devices
   without PM callbacks unless they have never had runtime PM enabled.

 - The "smart suspend" feature is not enabled for a device if it has not
   been enabled for the device's parent unless the parent does not take
   children into account or it has never had runtime PM enabled.

 - The "smart suspend" feature is not enabled for a device if it has not
   been enabled for one of the device's suppliers taking runtime PM into
   account unless that supplier has never had runtime PM enabled.

Namely, introduce a new device PM flag called smart_suspend that is only
set if the above conditions are met and update all DPM_FLAG_SMART_SUSPEND
users to check power.smart_suspend instead of directly checking the
latter.

At the same time, drop the power.set_active flage introduced recently
in commit 3775fc538f ("PM: sleep: core: Synchronize runtime PM status
of parents and children") because it is now sufficient to check
power.smart_suspend along with the dev_pm_skip_resume() return value
to decide whether or not pm_runtime_set_active() needs to be called
for the device.

Link: https://lore.kernel.org/linux-pm/CAPDyKFroyU3YDSfw_Y6k3giVfajg3NQGwNWeteJWqpW29BojhQ@mail.gmail.com/ [1]
Fixes: 7585946243 ("PM: sleep: core: Restrict power.set_active propagation")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com> # drivers/pci
Link: https://patch.msgid.link/1914558.tdWV9SEqCh@rjwysocki.net
2025-02-19 13:22:12 +01:00
Ilpo Järvinen
2499f53484 PCI: Rework optional resource handling
Remove and rescan cycle can result in failure to assign a bridge window if
it becomes larger than before the remove. The bridge window size will
include space for disabled Expansion ROM, which can causes the bridge
window to not fit anymore into the same address space slot on rescan if the
Expansion ROM resource was not assigned before the remove. In addition, the
optional resource handling is not internally consistent.

The resource fitting logic supports three main types of optional resources:

  - IOV BARs
  - Expansion ROMs
  - Bridge window size variation due to optional resources

In addition to the above, resizable BARs beyond their current size will
require handling optional variation in resource sizes within the resource
fitting algorithm (not yet done by the resource fitting code).

There are multiple inconsistencies related to optional resource handling:

a) The allocation failure of disabled expansion ROM requires special case
   inside assign_requested_resources_sorted().

b) The optionality of disabled expansion ROM is not considered during
   bridge window sizing in pbus_size_mem().

c) Setting resource size to zero for optional resource in pbus_size_mem()
   is problematic because it makes also the alignment invalid, which is
   checked by pdev_sort_resources().

   Optional IOV resources have their size set to zero by pbus_size_mem()
   but the information about size is stored externally in struct pci_sriov
   and complex call-chain trickery in pci_resource_alignment() ensures IOV
   resources return a valid alignment despite having zero resource size. A
   solution that is specific to IOV resources makes it hard to use the same
   solution for other types of resources such as expansion ROM.

Simply changing pbus_size_mem() is not sufficient to fully address the main
issue because it would introduce disparity between bridge window sizing and
resource allocation. Due to size-based ordering of the resource list during
assignment loop, an Expansion ROM resource could steal space from some
other resource and make the other resource not fit if the Expansion ROM is
larger than the other resource. Thus, the resource assignment functions
need to be changed as well.

Make optional resource handling more straightforward. Use
pci_resource_is_optional() to determine if a resource is optional in both
bridge window sizing and assignment failure classification to ensure they
always align. Indicate with a parameter to
assign_requested_resources_sorted() whether it should attempt to allocate
optional resources or not.

Always try first to assign all resources (also when realloc_head is not
provided). This is required for calls from
pci_assign_unassigned_root_bus_resources() that provide realloc_head only
with some of its iterations.

Non-bridge-window optional resources in realloc_head now have add_size 0.
This condition has to be detected in reassign_resources_sorted() before
reassigning them (which would fail as there is no size change).  Removing
add_size=0 optional resources entirely from realloc_head might eventually
be doable but further rework in __assign_resources_sorted() is needed first
to support such a change.

Link: https://lore.kernel.org/r/20241216175632.4175-26-ilpo.jarvinen@linux.intel.com
Reported-by: Jia Yao <jia.yao@intel.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219547
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Jia Yao <jia.yao@intel.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
2025-02-18 15:40:54 -06:00
Ilpo Järvinen
96336ec702 PCI: Perform reset_resource() and build fail list in sync
Resetting a resource is problematic as it prevents attempting to allocate
the resource later, unless something in between restores the resource.
Similarly, if fail_head does not contain all resources that were reset,
those resources cannot be restored later.

The entire reset/restore cycle adds complexity and leaving resources in the
reset state causes issues to other code such as for checks done in
pci_enable_resources(). Take a small step towards not resetting resources
by delaying reset until the end of resource assignment and build failure
list (fail_head) in sync with the reset to avoid leaving behind resources
that cannot be restored (for the case where the caller provides fail_head
in the first place to allow restore somewhere in the callchain, as is not
all callers pass non-NULL fail_head).

Leave the Expansion ROM check temporarily in place while building the
failure list until an upcoming change that reworks optional resource
handling.

Ideally, whole resource reset could be removed but doing that in one step
would be non-tractable due to complexity of all related code.

Link: https://lore.kernel.org/r/20241216175632.4175-25-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
2025-02-18 15:40:54 -06:00
Ilpo Järvinen
e89df6d2be PCI: Use res->parent to check if resource is assigned
reassign_resources_sorted() uses resource_size() to select between
pci_assign_resource() and pci_reassign_resource(). Due to twisted way
bridge window sizing in pbus_size_mem() sets resource sizes to 0, it works
to match into IOV resources but that is going to be changed by an upcoming
change.

Replace resource_size() check with res->parent check that is the true
dividing line in between whether assign or reassign function should be used
for the resource.

Link: https://lore.kernel.org/r/20241216175632.4175-24-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
2025-02-18 15:40:54 -06:00
Ilpo Järvinen
8884b5637b PCI: Add debug print when releasing resources before retry
PCI resource fitting is somewhat hard to track because it performs many
actions without logging them. In the case inside
__assign_resources_sorted(), the resources are released before resource
assignment is going to be retried in a different order. That is just one
level of retries the resource fitting performs overall so tracking it
through repeated assignments or failures of a resource gets messy rather
quickly.

Simply announce the release explicitly using pci_dbg() so it is clear what
is going on with each resource.

Link: https://lore.kernel.org/r/20241216175632.4175-23-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
2025-02-18 15:40:54 -06:00
Ilpo Järvinen
07854e08cd PCI: Indicate optional resource assignment failures
Add pci_dbg() to note that an assignment failure was for an optional
resource and reword existing message about resource resize to say the
change was optional.

Link: https://lore.kernel.org/r/20241216175632.4175-22-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
2025-02-18 15:40:54 -06:00
Ilpo Järvinen
b3281eb5de PCI: Always have realloc_head in __assign_resources_sorted()
Add a dummy list to always have a non-NULL realloc head in
__assign_resources_sorted() as it allows only checking list_empty().

In future, it would be good to ensure all callers provide a valid
realloc_head but that is relatively complex to do in practice and not
necessary for the subsequent optional resource handling fix.

Link: https://lore.kernel.org/r/20241216175632.4175-21-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
2025-02-18 15:40:54 -06:00
Ilpo Järvinen
9caf4ea2fd PCI: Extend enable to check for any optional resource
pci_enable_resources() checks if device's io and mem resources are all
assigned and disallows enable if any resource failed to assign (*) but
makes an exception for the case of disabled extension ROM. There are other
optional resources, however.

Add pci_resource_is_optional() and use it instead of
pci_resource_is_disabled_rom() to cover also IOV resources that are also
optional as per pbus_size_mem().

As there will be more users of pci_resource_is_optional() inside
setup-bus.c in changes coming up after this one, the function is placed
there.

(*) In practice, resource fitting code calls reset_resource() for any
resource it fails to assign which clears resource's ->flags causing
pci_enable_resources() to never detect failed resource assignments.
This seems undesirable internal logic inconsistency, effectively
reset_resource() prevents pci_enable_resources() from functioning as
intended. This is one step of many that will be needed towards removing
reset_resource().

Link: https://lore.kernel.org/r/20241216175632.4175-20-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
2025-02-18 15:40:54 -06:00
Ilpo Järvinen
4e362abe48 PCI: Add restore_dev_resource()
Resource fitting needs to restore the saved dev resources in a few places.
Add a restore_dev_resource() helper for that.

Link: https://lore.kernel.org/r/20241216175632.4175-19-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
2025-02-18 15:40:54 -06:00
Ilpo Järvinen
c8098ad8fb PCI: Remove incorrect comment from pci_reassign_resource()
Commit a4ac9fea01 ("PCI : Calculate right add_size") removed including
min_align into new_size in pci_reassign_resource() which is the correct
thing to do. However, it also added a snakeoil comment that the resource
would already be aligned with min_align which is incorrect.

A resource that is assigned earlier is aligned with the old alignment, NOT
with the new requested alignment (min_align) until later deep within the
reassignment callchain. Thus, remove the incorrect comment.

Link: https://lore.kernel.org/r/20241216175632.4175-18-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
Cc: Yinghai Lu <yinghai@kernel.org>
2025-02-18 15:40:54 -06:00
Ilpo Järvinen
ca9097f9ce PCI: Consolidate assignment loop next round preparation
pci_assign_unassigned_root_bus_resources() and
pci_assign_unassigned_bridge_resources() have a loop that may perform
several rounds to assign resources. The code to prepare for the next round
is identical.

Consolidate the code that prepares for the next assignment round into
pci_prepare_next_assign_round().

Link: https://lore.kernel.org/r/20241216175632.4175-17-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
2025-02-18 15:40:54 -06:00
Ilpo Järvinen
54181c1364 PCI: Rename retval to ret
Rename 'retval' to 'ret' in pci_assign_unassigned_bridge_resources().

Link: https://lore.kernel.org/r/20241216175632.4175-16-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
2025-02-18 15:40:53 -06:00
Ilpo Järvinen
acba174d2e PCI: Use while loop and break instead of gotos
pci_assign_unassigned_root_bus_resources() and
pci_assign_unassigned_bridge_resources() contain ad hoc loops using
backwards goto and gotos out of the loop. Replace them with while loops
and break statements.

While reindenting the loop bodies, add braces & remove parenthesis.

Link: https://lore.kernel.org/r/20241216175632.4175-15-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
2025-02-18 15:40:53 -06:00
Ilpo Järvinen
0aa089cdde PCI: Refactor pdev_sort_resources() & __dev_sort_resources()
Reduce level of call nesting by calling pdev_sort_resources() directly
and by moving the tests done inside __dev_sort_resources() into
pdev_resources_assignable() helper.

Link: https://lore.kernel.org/r/20241216175632.4175-14-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
2025-02-18 15:40:53 -06:00
Ilpo Järvinen
22fb2eda54 PCI: Converge return paths in __assign_resources_sorted()
All return paths want to free head list in __assign_resources_sorted(), so
add a label and use goto.

Link: https://lore.kernel.org/r/20241216175632.4175-13-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
2025-02-18 15:40:53 -06:00
Ilpo Järvinen
9b54578bc0 PCI: Add dev & res local variables to resource assignment funcs
Many PCI resource allocation related functions process struct
pci_dev_resource items which hold the struct pci_dev and resource pointers.
Reduce the number of lines that need indirection by adding 'dev' and 'res'
local variable to hold the pointers.

Link: https://lore.kernel.org/r/20241216175632.4175-12-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
2025-02-18 15:40:53 -06:00
Ilpo Järvinen
e4728eed24 PCI: Add pci_resource_num() helper
A few places in PCI code, mainly in setup-bus.c, need to reverse lookup the
index of a resource in pci_dev's resource array. Create pci_resource_num()
helper to avoid repeating the pointer arithmetic trick used to calculate
the index.

Link: https://lore.kernel.org/r/20241216175632.4175-11-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
2025-02-18 15:40:53 -06:00
Ilpo Järvinen
2bd0c72117 PCI: Check resource_size() separately
Instead of chaining logic inside if () condition so that multiple lines are
required, make !resource_size() a separate check and use continue.

Link: https://lore.kernel.org/r/20241216175632.4175-10-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
2025-02-18 15:40:53 -06:00
Michał Winiarski
cbd384389e PCI: Add pci_resource_is_iov() to identify IOV resources
There are multiple places where special handling is required for IOV
resources.

Extract the identification of IOV resources to pci_resource_is_iov() and
drop a few ifdefs.

Link: https://lore.kernel.org/r/20241216175632.4175-9-ilpo.jarvinen@linux.intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
2025-02-18 15:40:52 -06:00
Ilpo Järvinen
ee4621b7e4 PCI: Use resource_set_{range,size}() helpers
A few sites that could use resource_set_range/size() in setup-bus.c were
not picked up earlier due to them no matching the usual pattern.  Convert
them now.

These are more cases similar to 783602c920 ("PCI: Use
resource_set_{range,size}() helpers").

Link: https://lore.kernel.org/r/20241216175632.4175-8-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
[bhelgaas: add 783602c920 history]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
2025-02-18 15:40:52 -06:00
Ilpo Järvinen
8986e7e668 PCI: Use SZ_* instead of literals in setup-bus.c
Convert literals in setup-bus.c to SZ_* defines that make the size more
human readable.

As the code is now self-explanatory, eliminate comments about the size.

Link: https://lore.kernel.org/r/20241216175632.4175-7-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
2025-02-18 15:40:52 -06:00
Ilpo Järvinen
ff61f380de PCI: Fix old_size lower bound in calculate_iosize() too
Commit 903534fa7d ("PCI: Fix resource double counting on remove &
rescan") fixed double counting of mem resources because of old_size being
applied too early.

Fix a similar counting bug on the io resource side.

Link: https://lore.kernel.org/r/20241216175632.4175-6-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
2025-02-18 15:40:52 -06:00
Ilpo Järvinen
67f9085596 PCI: Allow relaxed bridge window tail sizing for optional resources
Commit 566f1dd528 ("PCI: Relax bridge window tail sizing rules")
relaxed the bridge window requirements for non-optional size (size0)
but pbus_size_mem() also handles optional sizes (IOV resources) using
size1. This can manifest, e.g., as a failure to resize a BAR back to
its original size after it was first shrunk when device has a VF BAR
resource because the bridge window (size1) is enlarged beyond what is
strictly required to fit the downstream resources.

Allow using relaxed bridge window tail sizing rules also with the optional
resources (size1) so that the remove/realloc cycle during BAR resize
(smaller and back to the original size) does not fail unexpectedly due to
increase in bridge window size demand.

Also move add_align calculation to more logical place next to size1
assignment as they are strongly related to each other.

Link: https://lore.kernel.org/r/20241216175632.4175-5-ilpo.jarvinen@linux.intel.com
Fixes: 566f1dd528 ("PCI: Relax bridge window tail sizing rules")
Reported-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
2025-02-18 15:40:52 -06:00
Ilpo Järvinen
a55bf64b30 PCI: Simplify size1 assignment logic
In pbus_size_io() and pbus_size_mem(), a complex ?: operation is performed
to set size1.  Decompose this so it's easier to read.

In the case of pbus_size_mem(), simply initializing size1 to zero ensures
the size1 checks work as expected.

Link: https://lore.kernel.org/r/20241216175632.4175-4-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
2025-02-18 15:40:52 -06:00
Ilpo Järvinen
1f82b7e84a PCI: Use min_align, not unrelated add_align, for size0
Commit 566f1dd528 ("PCI: Relax bridge window tail sizing rules")
relaxed bridge window tail alignment rule for the non-optional part
(size0, no add_size/add_align). The required alignment given for
pbus_upstream_space_available(), however, was add_align which relates
only to size1 alignment.

As pbus_upstream_space_available() only selects between normal and relaxed
tail alignment of the bridge window, the different alignment only makes
relaxed tail alignment to be used more often than what was intended, which
should be harmless because relaxed tail alignment itself should work in all
cases.

For consistency, change pbus_upstream_space_available() call to use
min_align which is the alignment that is going to be used for the bridge
window in the case where size0 sized allocation is attempted.

Link: https://lore.kernel.org/r/20241216175632.4175-3-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
2025-02-18 15:40:52 -06:00
Ilpo Järvinen
d06cc1e380 PCI: Remove add_align overwrite unrelated to size0
Commit 566f1dd528 ("PCI: Relax bridge window tail sizing rules")
relaxed bridge window tail alignment rule for the non-optional part
(size0, no add_size/add_align). The change, however, also overwrote
add_align, which is only related to case where optional size1 related
entry is added into realloc head.

Correct this by removing the add_align overwrite.

Link: https://lore.kernel.org/r/20241216175632.4175-2-ilpo.jarvinen@linux.intel.com
Fixes: 566f1dd528 ("PCI: Relax bridge window tail sizing rules")
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
2025-02-18 15:40:52 -06:00
Kai-Heng Feng
1a596ad00f PCI: Use downstream bridges for distributing resources
7180c1d086 ("PCI: Distribute available resources for root buses, too")
breaks BAR assignment on some devices:

  pci 0006:03:00.0: BAR 0 [mem 0x6300c0000000-0x6300c1ffffff 64bit pref]: assigned
  pci 0006:03:00.1: BAR 0 [mem 0x6300c2000000-0x6300c3ffffff 64bit pref]: assigned
  pci 0006:03:00.2: BAR 0 [mem size 0x00800000 64bit pref]: can't assign; no space
  pci 0006:03:00.0: VF BAR 0 [mem size 0x02000000 64bit pref]: can't assign; no space
  pci 0006:03:00.1: VF BAR 0 [mem size 0x02000000 64bit pref]: can't assign; no space

The apertures of domain 0006 before 7180c1d086:

  6300c0000000-63ffffffffff : PCI Bus 0006:00
    6300c0000000-6300c9ffffff : PCI Bus 0006:01
      6300c0000000-6300c9ffffff : PCI Bus 0006:02        # 160MB
        6300c0000000-6300c8ffffff : PCI Bus 0006:03      #   144MB
          6300c0000000-6300c1ffffff : 0006:03:00.0       #     32MB
          6300c2000000-6300c3ffffff : 0006:03:00.1       #     32MB
          6300c4000000-6300c47fffff : 0006:03:00.2       #      8MB
          6300c4800000-6300c67fffff : 0006:03:00.0       #     32MB
          6300c6800000-6300c87fffff : 0006:03:00.1       #     32MB
        6300c9000000-6300c9bfffff : PCI Bus 0006:04      #    12MB
          6300c9000000-6300c9bfffff : PCI Bus 0006:05    #    12MB
            6300c9000000-6300c91fffff : PCI Bus 0006:06  #      2MB
            6300c9200000-6300c93fffff : PCI Bus 0006:07  #      2MB
            6300c9400000-6300c95fffff : PCI Bus 0006:08  #      2MB
            6300c9600000-6300c97fffff : PCI Bus 0006:09  #      2MB

After 7180c1d086:

  6300c0000000-63ffffffffff : PCI Bus 0006:00
    6300c0000000-6300c9ffffff : PCI Bus 0006:01
      6300c0000000-6300c9ffffff : PCI Bus 0006:02        # 160MB
        6300c0000000-6300c43fffff : PCI Bus 0006:03      #    68MB
          6300c0000000-6300c1ffffff : 0006:03:00.0       #      32MB
          6300c2000000-6300c3ffffff : 0006:03:00.1       #      32MB
              --- no space ---      : 0006:03:00.2       #       8MB
              --- no space ---      : 0006:03:00.0       #      32MB
              --- no space ---      : 0006:03:00.1       #      32MB
        6300c4400000-6300c4dfffff : PCI Bus 0006:04      #    10MB
          6300c4400000-6300c4dfffff : PCI Bus 0006:05    #      10MB
            6300c4400000-6300c45fffff : PCI Bus 0006:06  #        2MB
            6300c4600000-6300c47fffff : PCI Bus 0006:07  #        2MB
            6300c4800000-6300c49fffff : PCI Bus 0006:08  #        2MB
            6300c4a00000-6300c4bfffff : PCI Bus 0006:09  #        2MB

We can see that the window to 0006:03 gets shrunken too much and 0006:04
eats away the window for 0006:03:00.2.

The offending commit distributes the upstream bridge's resources multiple
times to every downstream bridge, hence makes the aperture smaller than
desired because calculation of io_per_b, mmio_per_b and mmio_pref_per_b
becomes incorrect.

Instead, distribute downstream bridges' own resources to resolve the issue.

Link: https://lore.kernel.org/r/20241204022457.51322-1-kaihengf@nvidia.com
Fixes: 7180c1d086 ("PCI: Distribute available resources for root buses, too")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219540
Signed-off-by: Kai-Heng Feng <kaihengf@nvidia.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Carol Soto <csoto@nvidia.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Chris Chiu <chris.chiu@canonical.com>
2025-02-18 14:46:55 -06:00
Linus Torvalds
78a632a208 pci-v6.14-fixes-3
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmev0oAUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vxeNBAAunByLPVxbteqIXJ6IZHz1956SDfo
 batHJodeNCxeVYpjLca+1asqZ75hply3pOK+DUKyNUyEqkXfQDvSZCBfo70uA9ur
 eSeaZxBoNS7VnOvmw5w0kEWd2Sx0gEkIPuxegympOfTiWaN8bGoryPHAuGnO0Exz
 YTIv+TtLAag96OFswQGvSZyh8AfCg5QXl6vZ0W7Ex+O24o/LJ9sO9gALkhFXoc2+
 tdKBJSOQ3dfEVa0S6btNxsY5nsy7xp3CxLYqQTvT2kES0xUwK7QrafOdl0kEaVDB
 6JRkFhKaK5R6WDz5d4LnU0mzSrBc17cQD+C6StUd4dzQdzyuiP4X70R+kMiu71ff
 XMSBPSbLcP1iQ50O3IdkbV5OVnn7ap3LPP2v/PYcmoYNc3i81hvZap5RDaEnJ/ej
 xI+bKullZG787jl+qnweNsu9jar9HWCJOGwMCZIKiXoU8fJdsI+b+53FNxFqqWAR
 4DAsu9nGwlkoEFHaDFKeoG/qBrW65UAeIetUoh+dpQ19OaD7vUuv6Uk4aiwbDFx3
 jGNBOS1Cwx3aIhLNrg3wxSmajFHaPlI+yElS+PcP/J9cV1YDXV7U/ZKaYTVwU0TZ
 l3TY72q3/ckP7+9NaxExO6ow59cN1XNncBkAcOEoSCfFIFnSmvxC8v6Vpwc2ZNQN
 o3N9euTROFouZeA=
 =HYty
 -----END PGP SIGNATURE-----

Merge tag 'pci-v6.14-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull pci fixes from Bjorn Helgaas:

 - Update a BUILD_BUG_ON() usage that works on current compilers, but
   breaks compilation on gcc 5.3.1 (Alex Williamson)

 - Avoid use of FLR for Mediatek MT7922 WiFi; the device previously
   worked after a long timeout and fallback to SBR, but after a recent
   RRS change it doesn't work at all after FLR (Bjorn Helgaas)

* tag 'pci-v6.14-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
  PCI: Avoid FLR for Mediatek MT7922 WiFi
  PCI: Fix BUILD_BUG_ON usage for old gcc
2025-02-14 16:49:07 -08:00
Ilpo Järvinen
addb30c5bd PCI: Cleanup dev->resource + resno to use pci_resource_n()
Replace pointer arithmetic in finding the correct resource entry with the
pci_resource_n() helper.

Link: https://lore.kernel.org/r/20250207162301.2842-1-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-02-14 14:39:31 -06:00
Bjorn Helgaas
81f64e925c PCI: Avoid FLR for Mediatek MT7922 WiFi
The Mediatek MT7922 WiFi device advertises FLR support, but it apparently
does not work, and all subsequent config reads return ~0:

  pci 0000:01:00.0: [14c3:0616] type 00 class 0x028000 PCIe Endpoint
  pciback 0000:01:00.0: not ready 65535ms after FLR; giving up

After an FLR, pci_dev_wait() waits for the device to become ready.  Prior
to d591f6804e ("PCI: Wait for device readiness with Configuration RRS"),
it polls PCI_COMMAND until it is something other that PCI_POSSIBLE_ERROR
(~0).  If it times out, pci_dev_wait() returns -ENOTTY and
__pci_reset_function_locked() tries the next available reset method.
Typically this is Secondary Bus Reset, which does work, so the MT7922 is
eventually usable.

After d591f6804e, if Configuration Request Retry Status Software
Visibility (RRS SV) is enabled, pci_dev_wait() polls PCI_VENDOR_ID until it
is something other than the special 0x0001 Vendor ID that indicates a
completion with RRS status.

When RRS SV is enabled, reads of PCI_VENDOR_ID should return either 0x0001,
i.e., the config read was completed with RRS, or a valid Vendor ID.  On the
MT7922, it seems that all config reads after FLR return ~0 indefinitely.
When pci_dev_wait() reads PCI_VENDOR_ID and gets 0xffff, it assumes that's
a valid Vendor ID and the device is now ready, so it returns with success.

After pci_dev_wait() returns success, we restore config space and continue.
Since the MT7922 is not actually ready after the FLR, the restore fails and
the device is unusable.

We considered changing pci_dev_wait() to continue polling if a
PCI_VENDOR_ID read returns either 0x0001 or 0xffff.  This "works" as it did
before d591f6804e, although we have to wait for the timeout and then fall
back to SBR.  But it doesn't work for SR-IOV VFs, which *always* return
0xffff as the Vendor ID.

Mark Mediatek MT7922 WiFi devices to avoid the use of FLR completely.  This
will cause fallback to another reset method, such as SBR.

Link: https://lore.kernel.org/r/20250212193516.88741-1-helgaas@kernel.org
Fixes: d591f6804e ("PCI: Wait for device readiness with Configuration RRS")
Link: https://github.com/QubesOS/qubes-issues/issues/9689#issuecomment-2582927149
Link: https://lore.kernel.org/r/Z4pHll_6GX7OUBzQ@mail-itl
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Cc: stable@vger.kernel.org
2025-02-13 08:36:54 -06:00
Alex Williamson
472ff48e2c PCI: Fix BUILD_BUG_ON usage for old gcc
As reported in the below link, it seems older versions of gcc cannot
determine that the howmany variable is known for all callers.  Include
a test so that newer compilers can enforce this sanity check and older
compilers can still work.  Add __always_inline attribute to give the
compiler an even better chance to know the inputs.

Link: https://lore.kernel.org/r/20250212185337.293023-1-alex.williamson@redhat.com
Fixes: 4453f36086 ("PCI: Batch BAR sizing operations")
Reported-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/all/20250209154512.GA18688@redhat.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Oleg Nesterov <oleg@redhat.com>
Tested-by: Mitchell Augustin <mitchell.augustin@canonical.com>
2025-02-12 16:16:21 -06:00
Robin Murphy
6f64b83d9f PCI/TPH: Restore TPH Requester Enable correctly
When we reenable TPH after changing a Steering Tag value, we need the
actual TPH Requester Enable value, not the ST Mode (which only happens to
work out by chance for non-extended TPH in interrupt vector mode).

Link: https://lore.kernel.org/r/13118098116d7bce07aa20b8c52e28c7d1847246.1738759933.git.robin.murphy@arm.com
Fixes: d2e8a34876 ("PCI/TPH: Add Steering Tag support")
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Wei Huang <wei.huang2@amd.com>
2025-02-06 10:30:11 -06:00
Ilpo Järvinen
7507eb3e7b PCI/ASPM: Fix L1SS saving
Commit 1db806ec06 ("PCI/ASPM: Save parent L1SS config in
pci_save_aspm_l1ss_state()") aimed to perform L1SS config save for both the
Upstream Port and its upstream bridge when handling an Upstream Port, which
matches what the L1SS restore side does. However, parent->state_saved can
be set true at an earlier time when the upstream bridge saved other parts
of its state. Then later when attempting to save the L1SS config while
handling the Upstream Port, parent->state_saved is true in
pci_save_aspm_l1ss_state() resulting in early return and skipping saving
bridge's L1SS config because it is assumed to be already saved. Later on
restore, junk is written into L1SS config which causes issues with some
devices.

Remove parent->state_saved check and unconditionally save L1SS config also
for the upstream bridge from an Upstream Port which ought to be harmless
from correctness point of view. With the Upstream Port check now present,
saving the L1SS config more than once for the bridge is no longer a problem
(unlike when the parent->state_saved check got introduced into the fix
during its development).

Link: https://lore.kernel.org/r/20250131152913.2507-1-ilpo.jarvinen@linux.intel.com
Fixes: 1db806ec06 ("PCI/ASPM: Save parent L1SS config in pci_save_aspm_l1ss_state()")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219731
Reported-by: Niklāvs Koļesņikovs <pinkflames.linux@gmail.com>
Reported by: Rafael J. Wysocki <rafael@kernel.org>
Closes: https://lore.kernel.org/r/CAJZ5v0iKmynOQ5vKSQbg1J_FmavwZE-nRONovOZ0mpMVauheWg@mail.gmail.com
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Closes: https://lore.kernel.org/r/d7246feb-4f3f-4d0c-bb64-89566b170671@molgen.mpg.de
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Niklāvs Koļesņikovs <pinkflames.linux@gmail.com>
Tested-by: Paul Menzel <pmenzel@molgen.mpg.de> # Dell XPS 13 9360
2025-02-06 09:54:38 -06:00
Takashi Iwai
d555ed45a5 PCI: Restore original INTX_DISABLE bit by pcim_intx()
pcim_intx() tries to restore the INTx bit at removal via devres, but there
is a chance that it restores a wrong value.

Because the value to be restored is blindly assumed to be the negative of
the enable argument, when a driver calls pcim_intx() unnecessarily for the
already enabled state, it'll restore to the disabled state in turn.  That
is, the function assumes the case like:

  // INTx == 1
  pcim_intx(pdev, 0); // old INTx value assumed to be 1 -> correct

but it might be like the following, too:

  // INTx == 0
  pcim_intx(pdev, 0); // old INTx value assumed to be 1 -> wrong

Also, when a driver calls pcim_intx() multiple times with different enable
argument values, the last one will win no matter what value it is.  This
can lead to inconsistency, e.g.

  // INTx == 1
  pcim_intx(pdev, 0); // OK
  ...
  pcim_intx(pdev, 1); // now old INTx wrongly assumed to be 0

This patch addresses those inconsistencies by saving the original INTx
state at the first pcim_intx() call.  For that, get_or_create_intx_devres()
is folded into pcim_intx() caller side; it allows us to simply check the
already allocated devres and record the original INTx along with the
devres_alloc() call.

Link: https://lore.kernel.org/r/20241031134300.10296-1-tiwai@suse.de
Fixes: 25216afc9d ("PCI: Add managed pcim_intx()")
Link: https://lore.kernel.org/87v7xk2ps5.wl-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Philipp Stanner <pstanner@redhat.com>
Cc: stable@vger.kernel.org	# v6.11+
2025-01-27 12:55:12 -06:00
Linus Torvalds
647d69605c pci-v6.14-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmeTr8wUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vxrMw//TJXH+U6x5LhYvBPD/KZ20ecGHqaA
 eGXrbHAasYbU1CfW7HM0onR8NffOIGoYvQrtefjQAln0w6rTvyFO0xJKLP15vMfN
 hnj+y1WWtKwAkSpu10Cl9nTj8uYRNNSQeoy5kS+1diwuXdby/DlgQONO2APSe9zd
 KMPXJcqSfDJlM5zHrcqqtlxauO9KHInLCc/iutd85AKjvcjOoNHNeZE0pTC0C3gE
 sXYHDqJiS3zdEG6X6mWFo3OzI/Q/7NGlHJ2j0CQaObsgQ9yA7eWkez25ifwZcugc
 TPtjm8DhaDo9/zx0NV9c2dPauHRC6NYUjAflMPK7Aye/41BE1Ag5Ka+tMDgC2i/N
 TbfBxSeArhjnjY+eZwRhrJNNC58TtHTUs69TO7Dbmuwr7cp99MIEDAYI5V6LFAdk
 plKqn1h8FztW5QKRPCgmzy6KTE+WPytiGAGAQFxzIGYkV/QqyvFaVs8FIyOJUIFM
 aDSa6Xy5WLGxmPZ9hPapzEm4ws/HTRpFjNgi/d4rRG5RWMwAxZZa44s9eldhN1D/
 ZwEmF2rJ+U8S7Q+mXPHDlwcsHe5APCbiaTEp4X+e3LNe0i9oxhhaUWG6LrDDmTlQ
 tU5j5daHiBa0nTDL1lfaayJlYX/oJ+IYQrIYzGbnivZv4ZVdPnuWSsOsMOEiLhEt
 4QqCoanqf0mCn2A=
 =O62O
 -----END PGP SIGNATURE-----

Merge tag 'pci-v6.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull pci updates from Bjorn Helgaas:
 "Enumeration:

   - Batch sizing of multiple BARs while memory decoding is disabled
     instead of disabling/enabling decoding for each BAR individually;
     this optimizes virtualized environments where toggling decoding
     enable is expensive (Alex Williamson)

   - Add host bridge .enable_device() and .disable_device() hooks for
     bridges that need to configure things like Requester ID to StreamID
     mapping when enabling devices (Frank Li)

   - Extend struct pci_ecam_ops with .enable_device() and
     .disable_device() hooks so drivers that use pci_host_common_probe()
     instead of their own .probe() have a way to set the
     .enable_device() callbacks (Marc Zyngier)

   - Drop 'No bus range found' message so we don't complain when DTs
     don't specify the default 'bus-range = <0x00 0xff>' (Bjorn Helgaas)

   - Rename the drivers/pci/of_property.c struct of_pci_range to
     of_pci_range_entry to avoid confusion with the global of_pci_range
     in include/linux/of_address.h (Bjorn Helgaas)

  Driver binding:

   - Update resource request API documentation to encourage callers to
     supply a driver name when requesting resources (Philipp Stanner)

   - Export pci_intx_unmanaged() and pcim_intx() (always managed) so
     callers of pci_intx() (which is sometimes managed) can explicitly
     choose the one they need (Philipp Stanner)

   - Convert drivers from pci_intx() to always-managed pcim_intx() or
     never-managed pci_intx_unmanaged(): amd_sfh, ata (ahci, ata_piix,
     pata_rdc, sata_sil24, sata_sis, sata_uli, sata_vsc), bnx2x, bna,
     ntb, qtnfmac, rtsx, tifm_7xx1, vfio, xen-pciback (Philipp Stanner)

   - Remove pci_intx_unmanaged() since pci_intx() is now always
     unmanaged and pcim_intx() is always managed (Philipp Stanner)

  Error handling:

   - Unexport pcie_read_tlp_log() to encourage drivers to use PCI core
     logging rather than building their own (Ilpo Järvinen)

   - Move TLP Log handling to its own file (Ilpo Järvinen)

   - Store number of supported End-End TLP Prefixes always so we can
     read the correct number of DWORDs from the TLP Prefix Log (Ilpo
     Järvinen)

   - Read TLP Prefixes in addition to the Header Log in
     pcie_read_tlp_log() (Ilpo Järvinen)

   - Add pcie_print_tlp_log() to consolidate printing of TLP Header and
     Prefix Log (Ilpo Järvinen)

   - Quirk the Intel Raptor Lake-P PIO log size to accommodate vendor
     BIOSes that don't configure it correctly (Takashi Iwai)

  ASPM:

   - Save parent L1 PM Substates config so when we restore it along with
     an endpoint's config, the parent info isn't junk (Jian-Hong Pan)

  Power management:

   - Avoid D3 for Root Ports on TUXEDO Sirius Gen1 with old BIOS because
     the system can't wake up from suspend (Werner Sembach)

  Endpoint framework:

   - Destroy the EPC device in devm_pci_epc_destroy(), which previously
     didn't call devres_release() (Zijun Hu)

   - Finish virtual EP removal in pci_epf_remove_vepf(), which
     previously caused a subsequent pci_epf_add_vepf() to fail with
     -EBUSY (Zijun Hu)

   - Write BAR_MASK before iATU registers in pci_epc_set_bar() so we
     don't depend on the BAR_MASK reset value being larger than the
     requested BAR size (Niklas Cassel)

   - Prevent changing BAR size/flags in pci_epc_set_bar() to prevent
     reads from bypassing the iATU if we reduced the BAR size (Niklas
     Cassel)

   - Verify address alignment when programming iATU so we don't attempt
     to write bits that are read-only because of the BAR size, which
     could lead to directing accesses to the wrong address (Niklas
     Cassel)

   - Implement artpec6 pci_epc_features so we can rely on all drivers
     supporting it so we can use it in EPC core code (Niklas Cassel)

   - Check for BARs of fixed size to prevent endpoint drivers from
     trying to change their size (Niklas Cassel)

   - Verify that requested BAR size is a power of two when endpoint
     driver sets the BAR (Niklas Cassel)

  Endpoint framework tests:

   - Clear pci-epf-test dma_chan_rx, not dma_chan_tx, after freeing
     dma_chan_rx (Mohamed Khalfella)

   - Correct the DMA MEMCPY test so it doesn't fail if the Endpoint
     supports both DMA_PRIVATE and DMA_MEMCPY (Manivannan Sadhasivam)

   - Add pci-epf-test and pci_endpoint_test support for capabilities
     (Niklas Cassel)

   - Add Endpoint test for consecutive BARs (Niklas Cassel)

   - Remove redundant comparison from Endpoint BAR test because a > 1MB
     BAR can always be exactly covered by iterating with a 1MB buffer
     (Hans Zhang)

   - Move and convert PCI Endpoint tests from tools/pci to Kselftests
     (Manivannan Sadhasivam)

  Apple PCIe controller driver:

   - Convert StreamID mapping configuration from a bus notifier to the
     .enable_device() and .disable_device() callbacks (Marc Zyngier)

  Freescale i.MX6 PCIe controller driver:

   - Add Requester ID to StreamID mapping configuration when enabling
     devices (Frank Li)

   - Use DWC core suspend/resume functions for imx6 (Frank Li)

   - Add suspend/resume support for i.MX8MQ, i.MX8Q, and i.MX95 (Richard
     Zhu)

   - Add DT compatible string 'fsl,imx8q-pcie-ep' and driver support for
     i.MX8Q series (i.MX8QM, i.MX8QXP, and i.MX8DXL) Endpoints (Frank
     Li)

   - Add DT binding for optional i.MX95 Refclk and driver support to
     enable it if the platform hasn't enabled it (Richard Zhu)

   - Configure PHY based on controller being in Root Complex or Endpoint
     mode (Frank Li)

   - Rely on dbi2 and iATU base addresses from DT via
     dw_pcie_get_resources() instead of hardcoding them (Richard Zhu)

   - Deassert apps_reset in imx_pcie_deassert_core_reset() since it is
     asserted in imx_pcie_assert_core_reset() (Richard Zhu)

   - Add missing reference clock enable or disable logic for IMX6SX,
     IMX7D, IMX8MM (Richard Zhu)

   - Remove redundant imx7d_pcie_init_phy() since
     imx7d_pcie_enable_ref_clk() does the same thing (Richard Zhu)

  Freescale Layerscape PCIe controller driver:

   - Simplify by using syscon_regmap_lookup_by_phandle_args() instead
     of syscon_regmap_lookup_by_phandle() followed by
     of_property_read_u32_array() (Krzysztof Kozlowski)

  Marvell MVEBU PCIe controller driver:

   - Add MODULE_DEVICE_TABLE() to enable module autoloading (Liao Chen)

  MediaTek PCIe Gen3 controller driver:

   - Use clk_bulk_prepare_enable() instead of separate
     clk_bulk_prepare() and clk_bulk_enable() (Lorenzo Bianconi)

   - Rearrange reset assert/deassert so they're both done in the
     *_power_up() callbacks (Lorenzo Bianconi)

   - Document that Airoha EN7581 requires PHY init and power-on before
     PHY reset deassert, unlike other MediaTek Gen3 controllers (Lorenzo
     Bianconi)

   - Move Airoha EN7581 post-reset delay from the en7581 clock .enable()
     method to mtk_pcie_en7581_power_up() (Lorenzo Bianconi)

   - Sleep instead of delay during Airoha EN7581 power-up, since this is
     a non-atomic context (Lorenzo Bianconi)

   - Skip PERST# assertion on Airoha EN7581 during probe and
     suspend/resume to avoid a hardware defect (Lorenzo Bianconi)

   - Enable async probe to reduce system startup time (Douglas Anderson)

  Microchip PolarFlare PCIe controller driver:

   - Set up the inbound address translation based on whether the
     platform allows coherent or non-coherent DMA (Daire McNamara)

   - Update DT binding such that platforms are DMA-coherent by default
     and must specify 'dma-noncoherent' if needed (Conor Dooley)

  Mobiveil PCIe controller driver:

   - Convert mobiveil-pcie.txt to YAML and update 'interrupt-names'
     and 'reg-names' (Frank Li)

  Qualcomm PCIe controller driver:

   - Add DT SM8550 and SM8650 optional 'global' interrupt for link
     events (Neil Armstrong)

   - Add DT 'compatible' strings for IPQ5424 PCIe controller (Manikanta
     Mylavarapu)

   - If 'global' IRQ is supported for detection of Link Up events, tell
     DWC core not to wait for link up (Krishna chaitanya chundru)

  Renesas R-Car PCIe controller driver:

   - Avoid passing stack buffer as resource name (King Dix)

  Rockchip PCIe controller driver:

   - Simplify clock and reset handling by using bulk interfaces (Anand
     Moon)

   - Pass typed rockchip_pcie (not void) pointer to
     rockchip_pcie_disable_clocks() (Anand Moon)

   - Return -ENOMEM, not success, when pci_epc_mem_alloc_addr() fails
     (Dan Carpenter)

  Rockchip DesignWare PCIe controller driver:

   - Use dll_link_up IRQ to detect Link Up and enumerate devices so
     users don't have to manually rescan (Niklas Cassel)

   - Tell DWC core not to wait for link up since the 'sys' interrupt is
     required and detects Link Up events (Niklas Cassel)

  Synopsys DesignWare PCIe controller driver:

   - Don't wait for link up in DWC core if driver can detect Link Up
     event (Krishna chaitanya chundru)

   - Update ICC and OPP votes after Link Up events (Krishna chaitanya
     chundru)

   - Always stop link in dw_pcie_suspend_noirq(), which is required at
     least for i.MX8QM to re-establish link on resume (Richard Zhu)

   - Drop racy and unnecessary LTSSM state check before sending
     PME_TURN_OFF message in dw_pcie_suspend_noirq() (Richard Zhu)

   - Add struct of_pci_range.parent_bus_addr for devices that need their
     immediate parent bus address, not the CPU address, e.g., to program
     an internal Address Translation Unit (iATU) (Frank Li)

  TI DRA7xx PCIe controller driver:

   - Simplify by using syscon_regmap_lookup_by_phandle_args() instead of
     syscon_regmap_lookup_by_phandle() followed by
     of_parse_phandle_with_fixed_args() or of_property_read_u32_index()
     (Krzysztof Kozlowski)

  Xilinx Versal CPM PCIe controller driver:

   - Add DT binding and driver support for Xilinx Versal CPM5
     (Thippeswamy Havalige)

  MicroSemi Switchtec management driver:

   - Add Microchip PCI100X device IDs (Rakesh Babu Saladi)

  Miscellaneous:

   - Move reset related sysfs code from pci.c to pci-sysfs.c where other
     similar code lives (Ilpo Järvinen)

   - Simplify reset_method_store() memory management by using __free()
     instead of explicit kfree() cleanup (Ilpo Järvinen)

   - Constify struct bin_attribute for sysfs, VPD, P2PDMA, and the IBM
     ACPI hotplug driver (Thomas Weißschuh)

   - Remove redundant PCI_VSEC_HDR and PCI_VSEC_HDR_LEN_SHIFT (Dongdong
     Zhang)

   - Correct documentation of the 'config_acs=' kernel parameter
     (Akihiko Odaki)"

* tag 'pci-v6.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (111 commits)
  PCI: Batch BAR sizing operations
  dt-bindings: PCI: microchip,pcie-host: Allow dma-noncoherent
  PCI: microchip: Set inbound address translation for coherent or non-coherent mode
  Documentation: Fix pci=config_acs= example
  PCI: Remove redundant PCI_VSEC_HDR and PCI_VSEC_HDR_LEN_SHIFT
  PCI: Don't include 'pm_wakeup.h' directly
  selftests: pci_endpoint: Migrate to Kselftest framework
  selftests: Move PCI Endpoint tests from tools/pci to Kselftests
  misc: pci_endpoint_test: Fix IOCTL return value
  dt-bindings: PCI: qcom: Document the IPQ5424 PCIe controller
  dt-bindings: PCI: qcom,pcie-sm8550: Document 'global' interrupt
  dt-bindings: PCI: mobiveil: Convert mobiveil-pcie.txt to YAML
  PCI: switchtec: Add Microchip PCI100X device IDs
  misc: pci_endpoint_test: Remove redundant 'remainder' test
  misc: pci_endpoint_test: Add consecutive BAR test
  misc: pci_endpoint_test: Add support for capabilities
  PCI: endpoint: pci-epf-test: Add support for capabilities
  PCI: endpoint: pci-epf-test: Fix check for DMA MEMCPY test
  PCI: endpoint: pci-epf-test: Set dma_chan_rx pointer to NULL on error
  PCI: dwc: Simplify config resource lookup
  ...
2025-01-25 16:03:40 -08:00
Bjorn Helgaas
10ff5bbfd4 Merge branch 'pci/misc'
- Constify struct bin_attribute for sysfs, VPD, P2PDMA, and the IBM ACPI
  hotplug driver (Thomas Weißschuh)

- Update PCI_EXP_LNKCAP_SLS comment (Lukas Wunner)

- Drop superfluous pm_wakeup.h include (Wolfram Sang)

- Remove redundant PCI_VSEC_HDR and PCI_VSEC_HDR_LEN_SHIFT (Dongdong Zhang)

- Correct documentation of the 'config_acs=' kernel parameter (Akihiko
  Odaki)

* pci/misc:
  Documentation: Fix pci=config_acs= example
  PCI: Remove redundant PCI_VSEC_HDR and PCI_VSEC_HDR_LEN_SHIFT
  PCI: Don't include 'pm_wakeup.h' directly
  PCI: Update code comment on PCI_EXP_LNKCAP_SLS for PCIe r3.0
  PCI/ACPI: Constify 'struct bin_attribute'
  PCI/P2PDMA: Constify 'struct bin_attribute'
  PCI/VPD: Constify 'struct bin_attribute'
  PCI/sysfs: Constify 'struct bin_attribute'
2025-01-23 13:05:06 -06:00
Bjorn Helgaas
4d3bf4e845 Merge branch 'pci/controller/xilinx-cpm'
- Add DT binding and driver support for Xilinx Versal CPM5 (Thippeswamy
  Havalige)

* pci/controller/xilinx-cpm:
  PCI: xilinx-cpm: Add support for Versal CPM5 Root Port Controller 1
  dt-bindings: PCI: xilinx-cpm: Add compatible string for CPM5 host1
2025-01-23 13:05:06 -06:00
Bjorn Helgaas
d3f0bec2a4 Merge branch 'pci/controller/rockchip'
- Add struct rockchip_pcie_ep kernel-doc to fix warnings (Damien Le Moal)

- Simplify clock and reset handling by using bulk interfaces (Anand Moon)

- Pass typed rockchip_pcie (not void) pointer to
  rockchip_pcie_disable_clocks() (Anand Moon)

- Return -ENOMEM, not success, when pci_epc_mem_alloc_addr() fails (Dan
  Carpenter)

* pci/controller/rockchip:
  PCI: rockchip-ep: Fix error code in rockchip_pcie_ep_init_ob_mem()
  PCI: rockchip: Refactor rockchip_pcie_disable_clocks() signature
  PCI: rockchip: Simplify reset control handling by using reset_control_bulk*() function
  PCI: rockchip: Simplify clock handling by using clk_bulk*() functions
  PCI: rockchip: Add missing fields descriptions for struct rockchip_pcie_ep
2025-01-23 13:05:05 -06:00
Bjorn Helgaas
a306f01eb5 Merge branch 'pci/controller/rcar-ep'
- Avoid passing stack buffer as resource name (King Dix)

* pci/controller/rcar-ep:
  PCI: rcar-ep: Fix incorrect variable used when calling devm_request_mem_region()
2025-01-23 13:05:05 -06:00
Bjorn Helgaas
1854b2e019 Merge branch 'pci/controller/mvebu'
- Add MODULE_DEVICE_TABLE() to enable module autoloading (Liao Chen)

* pci/controller/mvebu:
  PCI: mvebu: Enable module autoloading
2025-01-23 13:05:05 -06:00
Bjorn Helgaas
8d35c2b0eb Merge branch 'pci/controller/microchip'
- Set up the inbound address translation based on whether the platform
  allows coherent or non-coherent DMA (Daire McNamara)

- Update DT binding such that platforms are DMA-coherent by default and
  must specify 'dma-noncoherent' if needed (Conor Dooley)

* pci/controller/microchip:
  dt-bindings: PCI: microchip,pcie-host: Allow dma-noncoherent
  PCI: microchip: Set inbound address translation for coherent or non-coherent mode
2025-01-23 13:05:04 -06:00
Bjorn Helgaas
1276ad01a2 Merge branch 'pci/controller/mediatek'
- Use clk_bulk_prepare_enable() instead of separate clk_bulk_prepare() and
  clk_bulk_enable() (Lorenzo Bianconi)

- Rearrange reset assert/deassert so they're both done in the *_power_up()
  callbacks (Lorenzo Bianconi)

- Document that Airoha EN7581 requires PHY init and power-on before PHY
  reset deassert, unlike other MediaTek Gen3 controllers (Lorenzo Bianconi)

- Move Airoha EN7581 post-reset delay from the en7581 clock .enable()
  method to mtk_pcie_en7581_power_up() (Lorenzo Bianconi)

- Sleep instead of delay during Airoha EN7581 power-up, since this is a
  non-atomic context (Lorenzo Bianconi)

- Skip PERST# assertion on Airoha EN7581 during probe and suspend/resume to
  avoid a hardware defect (Lorenzo Bianconi)

- Enable async probe to reduce system startup time (Douglas Anderson)

* pci/controller/mediatek:
  PCI: mediatek-gen3: Enable async probe by default
  PCI: mediatek-gen3: Avoid PCIe resetting via PERST# for Airoha EN7581 SoC
  PCI: mediatek-gen3: Rely on msleep() in mtk_pcie_en7581_power_up()
  PCI: mediatek-gen3: Move reset delay in mtk_pcie_en7581_power_up()
  PCI: mediatek-gen3: Add comment about initialization order in mtk_pcie_en7581_power_up()
  PCI: mediatek-gen3: Move reset/assert callbacks in .power_up()
  PCI: mediatek-gen3: Rely on clk_bulk_prepare_enable() in mtk_pcie_en7581_power_up()
2025-01-23 13:05:04 -06:00
Bjorn Helgaas
09b7c1627b Merge branch 'pci/controller/layerscape'
- Simplify by using syscon_regmap_lookup_by_phandle_args() instead of
  syscon_regmap_lookup_by_phandle() followed by
  of_property_read_u32_array() (Krzysztof Kozlowski)

* pci/controller/layerscape:
  PCI: layerscape: Use syscon_regmap_lookup_by_phandle_args
2025-01-23 13:05:03 -06:00
Bjorn Helgaas
5b9c74b635 Merge branch 'pci/controller/imx6'
- Add DT compatible string 'fsl,imx8q-pcie-ep' and driver support for
  i.MX8Q series (i.MX8QM, i.MX8QXP, and i.MX8DXL) Endpoints (Frank Li)

- Add DT binding for optional i.MX95 Refclk and driver support to enable it
  if the platform hasn't enabled it (Richard Zhu)

- Configure PHY based on controller being in Root Complex or Endpoint mode
  (Frank Li)

- Rely on dbi2 and iATU base addresses from DT via dw_pcie_get_resources()
  instead of hardcoding them in imx6 (Richard Zhu)

- Skip controller_id computation for i.MX7D since it only has one
  controller (Richard Zhu)

- Deassert apps_reset in imx_pcie_deassert_core_reset() since it is
  asserted in imx_pcie_assert_core_reset() (Richard Zhu)

- Add missing reference clock enable or disable logic for IMX6SX, IMX7D,
  IMX8MM (Richard Zhu)

- Remove redundant imx7d_pcie_init_phy() since imx7d_pcie_enable_ref_clk()
  does the same thing (Richard Zhu)

* pci/controller/imx6:
  PCI: imx6: Clean up comments and whitespace
  PCI: imx6: Remove surplus imx7d_pcie_init_phy() function
  PCI: imx6: Add missing reference clock disable logic
  PCI: imx6: Deassert apps_reset in imx_pcie_deassert_core_reset()
  PCI: imx6: Skip controller_id generation logic for i.MX7D
  PCI: imx6: Fetch dbi2 and iATU base addesses from DT
  PCI: imx6: Configure PHY based on Root Complex or Endpoint mode
  PCI: imx6: Add Refclk for i.MX95 PCIe
  dt-bindings: PCI: fsl,imx6q-pcie: Add Refclk for i.MX95 RC
  PCI: imx6: Add i.MX8Q PCIe Endpoint (EP) support
  dt-bindings: PCI: fsl,imx6q-pcie-ep: Add compatible string fsl,imx8q-pcie-ep

# Conflicts:
#	drivers/pci/controller/dwc/pci-imx6.c
2025-01-23 13:05:03 -06:00
Bjorn Helgaas
349b434b7a Merge branch 'pci/controller/dwc'
- Fix potential string truncation in dw_pcie_edma_irq_verify() (Niklas
  Cassel)

- Don't wait for link up in DWC core if driver can detect Link Up event
  (Krishna chaitanya chundru)

- If qcom 'global' IRQ is supported for detection of Link Up events, tell
  DWC core not to wait for link up (Krishna chaitanya chundru)

- Update ICC and OPP votes after Link Up events (Krishna chaitanya chundru)

- Use dw-rockchip dll_link_up IRQ to detect Link Up and enumerate devices
  so users don't have to manually rescan (Niklas Cassel)

- In dw-rockchip, the 'sys' interrupt is required and detects Link Up
  events, so tell DWC core not to wait for link up (Niklas Cassel)

- Always stop link in dw_pcie_suspend_noirq(), which is required at least
  for i.MX8QM to re-establish link on resume (Richard Zhu)

- Drop racy and unnecessary LTSSM state check before sending PME_TURN_OFF
  message in dw_pcie_suspend_noirq() (Richard Zhu)

- Add stubs for dw_pcie_suspend_noirq() dw_pcie_resume_noirq() when
  CONFIG_PCIE_DW_HOST is not defined so drivers don't need #ifdefs (Bjorn
  Helgaas)

- Use DWC core suspend/resume functions for imx6 (Frank Li)

- Add imx6 suspend/resume support for i.MX8MQ, i.MX8Q, and i.MX95 (Richard
  Zhu)

- Add struct of_pci_range.parent_bus_addr for devices that need their
  immediate parent bus address, not the CPU address, e.g., to program an
  internal Address Translation Unit (iATU) (Frank Li)

* pci/controller/dwc:
  PCI: dwc: Simplify config resource lookup
  of: address: Add parent_bus_addr to struct of_pci_range
  PCI: imx6: Add i.MX8MQ, i.MX8Q and i.MX95 PM support
  PCI: imx6: Use DWC common suspend resume method
  PCI: dwc: Add dw_pcie_suspend_noirq(), dw_pcie_resume_noirq() stubs for !CONFIG_PCIE_DW_HOST
  PCI: dwc: Remove LTSSM state test in dw_pcie_suspend_noirq()
  PCI: dwc: Always stop link in the dw_pcie_suspend_noirq
  PCI: dw-rockchip: Don't wait for link since we can detect Link Up
  PCI: dw-rockchip: Enumerate endpoints based on dll_link_up IRQ
  PCI: qcom: Update ICC and OPP values after Link Up event
  PCI: qcom: Don't wait for link if we can detect Link Up
  PCI: dwc: Don't wait for link up if driver can detect Link Up event
  PCI: dwc: Fix potential truncation in dw_pcie_edma_irq_verify()

# Conflicts:
#	drivers/pci/controller/dwc/pci-imx6.c
2025-01-23 13:04:59 -06:00
Bjorn Helgaas
8ee6c61633 Merge branch 'pci/controller/dra7xx'
- Simplify by using syscon_regmap_lookup_by_phandle_args() instead of
  syscon_regmap_lookup_by_phandle() followed by
  of_parse_phandle_with_fixed_args() or of_property_read_u32_index()
  (Krzysztof Kozlowski)

* pci/controller/dra7xx:
  PCI: dra7xx: Use syscon_regmap_lookup_by_phandle_args
2025-01-23 13:04:54 -06:00
Bjorn Helgaas
ec3619abbf Merge branch 'pci/controller/iommu-map'
- Add host bridge .enable_device() and .disable_device() hooks for bridges
  that need to configure things like Requester ID to StreamID mapping when
  enabling devices (Frank Li)

- Add imx6 Requester ID to StreamID mapping configuration when enabling
  devices (Frank Li)

- Extend struct pci_ecam_ops with .enable_device() and .disable_device()
  hooks so drivers that use pci_host_common_probe() instead of their own
  .probe() have a way to set the .enable_device() callbacks (Marc Zyngier)

- Convert pcie-apple StreamID mapping configuration from a bus notifier to
  the .enable_device() and .disable_device() callbacks (Marc Zyngier)

* pci/controller/iommu-map:
  PCI: apple: Convert to {en,dis}able_device() callbacks
  PCI: host-generic: Allow {en,dis}able_device() to be provided via pci_ecam_ops
  PCI: imx6: Add IOMMU and ITS MSI support for i.MX95
  PCI: Add enable_device() and disable_device() callbacks for bridges
2025-01-23 13:04:53 -06:00
Bjorn Helgaas
e321b10b83 Merge branch 'pci/endpoint-test'
- Clear pci-epf-test dma_chan_rx, not dma_chan_tx, after freeing
  dma_chan_rx (Mohamed Khalfella)

- Correct the DMA MEMCPY test so it doesn't fail if the Endpoint supports
  both DMA_PRIVATE and DMA_MEMCPY (Manivannan Sadhasivam)

- Add pci-epf-test and pci_endpoint_test support for capabilities (Niklas
  Cassel)

- Add Endpoint test for consecutive BARs (Niklas Cassel)

- Remove redundant comparison from Endpoint BAR test because a > 1MB BAR
  can always be exactly covered by iterating with a 1MB buffer (Hans Zhang)

- Correct the PCI Endpoint test IOCTL return value (Manivannan Sadhasivam)

- Move PCI Endpoint tests from tools/pci to Kselftests (Manivannan
  Sadhasivam)

- Convert PCI Endpoint tests to the Kselftest framework (Manivannan
  Sadhasivam)

* pci/endpoint-test:
  selftests: pci_endpoint: Migrate to Kselftest framework
  selftests: Move PCI Endpoint tests from tools/pci to Kselftests
  misc: pci_endpoint_test: Fix IOCTL return value
  misc: pci_endpoint_test: Remove redundant 'remainder' test
  misc: pci_endpoint_test: Add consecutive BAR test
  misc: pci_endpoint_test: Add support for capabilities
  PCI: endpoint: pci-epf-test: Add support for capabilities
  PCI: endpoint: pci-epf-test: Fix check for DMA MEMCPY test
  PCI: endpoint: pci-epf-test: Set dma_chan_rx pointer to NULL on error
2025-01-23 13:04:53 -06:00
Bjorn Helgaas
74855f6697 Merge branch 'pci/endpoint'
- Destroy the EPC device in devm_pci_epc_destroy(), which previously didn't
  call devres_release() (Zijun Hu)

- Simplify pci_epc_get() with class_find_device_by_name() (Zijun Hu)

- Finish virtual EP removal in pci_epf_remove_vepf(), which previously
  caused a subsequent pci_epf_add_vepf() to fail with -EBUSY (Zijun Hu)

- Write BAR_MASK before iATU registers in pci_epc_set_bar() so we don't
  depend on the BAR_MASK reset value being larger than the requested BAR
  size (Niklas Cassel)

- Prevent changing BAR size/flags in pci_epc_set_bar() to prevent reads
  from bypassing the iATU if we reduced the BAR size (Niklas Cassel)

- Verify address alignment when programming iATU so we don't attempt to
  write bits that are read-only because of the BAR size, which could lead
  to directing accesses to the wrong address (Niklas Cassel)

- Implement artpec6 pci_epc_features so we can rely on all drivers
  supporting it so we can use it in EPC core code (Niklas Cassel)

- Check for BARs of fixed size to prevent endpoint drivers from trying to
  change their size (Niklas Cassel)

- Verify that requested BAR size is a power of two when endpoint driver
  sets the BAR (Niklas Cassel)

* pci/endpoint:
  PCI: endpoint: Verify that requested BAR size is a power of two
  PCI: endpoint: Add size check for fixed size BARs in pci_epc_set_bar()
  PCI: artpec6: Implement dw_pcie_ep operation get_features
  PCI: dwc: ep: Add 'address' alignment to 'size' check in dw_pcie_prog_ep_inbound_atu()
  PCI: dwc: ep: Prevent changing BAR size/flags in pci_epc_set_bar()
  PCI: dwc: ep: Write BAR_MASK before iATU registers in pci_epc_set_bar()
  PCI: endpoint: Finish virtual EP removal in pci_epf_remove_vepf()
  PCI: endpoint: Simplify pci_epc_get()
  PCI: endpoint: Destroy the EPC device in devm_pci_epc_destroy()
  PCI: endpoint: Replace magic number '6' by PCI_STD_NUM_BARS
2025-01-23 13:04:52 -06:00
Bjorn Helgaas
770b18a541 Merge branch 'pci/switchtec'
- Add Microchip PCI100X device IDs (Rakesh Babu Saladi)

* pci/switchtec:
  PCI: switchtec: Add Microchip PCI100X device IDs
2025-01-23 13:04:52 -06:00
Bjorn Helgaas
60b5cd4f82 Merge branch 'pci/pci-sysfs'
- Move reset related sysfs code from pci.c to pci-sysfs.c where other
  similar code lives (Ilpo Järvinen)

- Simplify reset_method_store() memory management by using __free() instead
  of explicit kfree() cleanup (Ilpo Järvinen)

- Drop unnecessary zero initializer (Ilpo Järvinen)

* pci/pci-sysfs:
  PCI/sysfs: Remove unnecessary zero in initializer
  PCI/sysfs: Use __free() in reset_method_store()
  PCI/sysfs: Move reset related sysfs code to correct file
2025-01-23 13:04:51 -06:00
Bjorn Helgaas
ccbd884f9e Merge branch 'pci/of'
- Unexport of_pci_parse_bus_range() since it's only used in of.c (Bjorn
  Helgaas)

- Drop 'No bus range found' message so we don't complain when DTs don't
  specify the default 'bus-range = <0x00 0xff>' (Bjorn Helgaas)

- Simplify devm_of_pci_get_host_bridge_resources() interface by dropping
  parameters that are always the same default values (Bjorn Helgaas)

- Update comment reference to of_pci_get_host_bridge_resources(), which no
  longer exists (Bjorn Helgaas)

- Rename the drivers/pci/of_property.c struct of_pci_range to
  of_pci_range_entry to avoid confusion with the global of_pci_range in
  include/linux/of_address.h (Bjorn Helgaas)

* pci/of:
  PCI: of_property: Rename struct of_pci_range to of_pci_range_entry
  sparc/PCI: Update reference to devm_of_pci_get_host_bridge_resources()
  PCI: of: Simplify devm_of_pci_get_host_bridge_resources() interface
  PCI: of: Drop 'No bus range found' message
  PCI: Unexport of_pci_parse_bus_range()
2025-01-23 13:04:51 -06:00
Bjorn Helgaas
f4a09274c5 Merge branch 'pci/err'
- Unexport pcie_read_tlp_log() to encourage drivers to use PCI core logging
  rather than building their own (Ilpo Järvinen)

- Move TLP Log handling to its own file (Ilpo Järvinen)

- Add #defines for TLP Header/Prefix log sizes (Ilpo Järvinen)

- Store number of supported End-End TLP Prefixes always so we can read the
  correct number of DWORDs from the TLP Prefix Log (Ilpo Järvinen)

- Read TLP Prefixes in addition to the Header Log in pcie_read_tlp_log()
  (Ilpo Järvinen)

- Add pcie_print_tlp_log() to consolidate printing of TLP Header and Prefix
  Log (Ilpo Järvinen)

* pci/err:
  PCI: Add pcie_print_tlp_log() to print TLP Header and Prefix Log
  PCI: Add TLP Prefix reading to pcie_read_tlp_log()
  PCI: Store number of supported End-End TLP Prefixes
  PCI: Use unsigned int i in pcie_read_tlp_log()
  PCI: Use same names in pcie_read_tlp_log() prototype and definition
  PCI: Add defines for TLP Header/Prefix log sizes
  PCI: Move TLP Log handling to its own file
  PCI: Don't expose pcie_read_tlp_log() outside PCI subsystem
2025-01-23 13:04:50 -06:00
Bjorn Helgaas
5a1d568ad1 Merge branch 'pci/enumeration'
- Batch sizing of multiple BARs while memory decoding is disabled instead
  of disabling/enabling decoding for each BAR individually; this optimizes
  virtualized environments where toggling decoding enable is expensive
  (Alex Williamson)

* pci/enumeration:
  PCI: Batch BAR sizing operations
2025-01-23 13:04:50 -06:00
Bjorn Helgaas
06fd49ef47 Merge branch 'pci/dpc'
- Quirk the Intel Raptor Lake-P PIO log size to accommodate vendor BIOSes
  that don't configure it correctly (Takashi Iwai)

* pci/dpc:
  PCI/DPC: Quirk PIO log size for Intel Raptor Lake-P
2025-01-23 13:04:50 -06:00
Bjorn Helgaas
a3eda0e333 Merge branch 'pci/devres'
- Update resource request API documentation to encourage callers to supply
  a driver name when requesting resources (Philipp Stanner)

- Export pci_intx_unmanaged() and pcim_intx() (always managed) so callers
  of pci_intx() (which is sometimes managed) can explicitly choose the one
  they need (Philipp Stanner)

- Convert drivers from pci_intx() to always-managed pcim_intx() or
  never-managed pci_intx_unmanaged(): amd_sfh, ata (ahci, ata_piix,
  pata_rdc, sata_sil24, sata_sis, sata_uli, sata_vsc), bnx2x, bna, ntb,
  qtnfmac, rtsx, tifm_7xx1, vfio, xen-pciback (Philipp Stanner)

- Remove pci_intx_unmanaged() since pci_intx() is now always unmanaged and
  pcim_intx() is always managed (Philipp Stanner)

* pci/devres:
  PCI: Remove devres from pci_intx()
  net/ethernet: Use never-managed version of pci_intx()
  HID: amd_sfh: Use always-managed version of pcim_intx()
  wifi: qtnfmac: use always-managed version of pcim_intx()
  ata: Use always-managed version of pci_intx()
  PCI/MSI: Use never-managed version of pci_intx()
  vfio/pci: Use never-managed version of pci_intx()
  misc: Use never-managed version of pci_intx()
  ntb: Use never-managed version of pci_intx()
  drivers/xen: Use never-managed version of pci_intx()
  PCI: Export pci_intx_unmanaged() and pcim_intx()
  PCI: Encourage resource request API users to supply driver name
2025-01-23 13:04:49 -06:00
Alex Williamson
4453f36086 PCI: Batch BAR sizing operations
Toggling memory enable is free on bare metal, but potentially expensive
in virtualized environments as the device MMIO spaces are added and
removed from the VM address space, including DMA mapping of those spaces
through the IOMMU where peer-to-peer is supported.  Currently memory
decode is disabled around sizing each individual BAR, even for SR-IOV
BARs while VF Enable is cleared.

This can be better optimized for virtual environments by sizing a set
of BARs at once, stashing the resulting mask into an array, while only
toggling memory enable once.  This also naturally improves the SR-IOV
path as the caller becomes responsible for any necessary decode disables
while sizing BARs, therefore SR-IOV BARs are sized relying only on the
VF Enable rather than toggling the PF memory enable in the command
register.

Link: https://lore.kernel.org/r/20250120182202.1878581-1-alex.williamson@redhat.com
Reported-by: Mitchell Augustin <mitchell.augustin@canonical.com>
Link: https://lore.kernel.org/r/CAHTA-uYp07FgM6T1OZQKqAdSA5JrZo0ReNEyZgQZub4mDRrV5w@mail.gmail.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Mitchell Augustin <mitchell.augustin@canonical.com>
Reviewed-by: Mitchell Augustin <mitchell.augustin@canonical.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-01-23 11:05:20 -06:00
Linus Torvalds
641b0c64b8 A pretty quiet cycle this time around. We have a bunch of new Qualcomm clk
drivers, per usual, and then a handful of drivers for other SoCs. Then the
 usual pile of cleanups is fairly small data fixes or converting DT bindings to
 YAML so they can be validated. No changes to the core framework besides an OF
 node refcount bump that never got decremented.
 
 New Drivers:
  - 5L35023 variant of Versa 3 clock generator
  - Various Qualcomm clk controllers: IPQ CMN PLL, SM6115 LPASS, SM750 global,
    tcsr, rpmh, and display. X Plus GPU and global. QCS615 rpmh and MSM8937 and
    MSM8940 RPM.
  - Qualcomm Pongo and Taycan Alpha PLLs
  - Qualcomm IPQ5424 NoC-related interconnect clks
  - Renesas RZ/G3E (R9A09G047) SoC clk driver
  - SAMA7D65 SoC clk driver
  - Samsung Exynos990 SoC clk driver
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCAAvFiEE9L57QeeUxqYDyoaDrQKIl8bklSUFAmeQNfYRHHNib3lkQGtl
 cm5lbC5vcmcACgkQrQKIl8bklSUuwRAAkKea3uRcSkTgHK3Ts0gmf8L2QS+dL47N
 OFmqhhdF0gYU60kzsaU0A6UGvaagq/rkB8nvZJ6G8/wV6T0jXHmxuCmZ7uRaErpt
 4KDjpS9qQ8sl5LXpuxh9LgfxcOOfAueWRpmF/5alHEtAQLXKHKV5CdcyYa71pj40
 +LfjoaW6xaqx+G3lqJhakY77zKiRzxWH86XQS5CHD3DITkv3B5/dV/nQlAb3P083
 7SzHXKbBpWpXH0y0pLTXZDTVCsHl90t1DO7JKt9Y1fOxtpLB/ROfLPOJ4cZyCQGH
 Y28ZWDA9jEEX/cz/R2qPY3mRUPrFp2ArsXsx1rKlPTabp4NZLs3d9tZiMI/irK/W
 GTkRKMUZlDD5w6jSYgmSTbTj2CsTsPXc8EzsNIFudl6WyzyxWHvnpUb+hdrR2B+0
 untNOkwcb8GzgucYrbK5s/Aw03CiyGTYZHGJxsnIr7uSYRxe8mlV/cIbDcn5+WWj
 rrOcPatLEnCeE1Eldm6cOzFsLMbBVP9HeNkms91y2AJDx4mWn8qyY0psX+HaNyBm
 1YZBVmo2PiZ84ZEhiK7WhPPMaDyR2ZSQS0/U5FaB56G9+rtuVYs8Z7KFS3nK27Rh
 oKWcdKDn1wUmtUhVggC+m4PueOH3dlM0ELaRNKzePx9rEimjWhzfy5GlOvPoaBAl
 MKOVgeLYa4c=
 =wK9g
 -----END PGP SIGNATURE-----

Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

Pull clk updates from Stephen Boyd:
 "A pretty quiet cycle this time around. We have a bunch of new Qualcomm
  clk drivers, per usual, and then a handful of drivers for other SoCs.
  Then the usual pile of cleanups is fairly small data fixes or
  converting DT bindings to YAML so they can be validated.

  No changes to the core framework besides an OF node refcount bump that
  never got decremented.

  New Drivers:

   - 5L35023 variant of Versa 3 clock generator

   - Various Qualcomm clk controllers: IPQ CMN PLL, SM6115 LPASS, SM750
     global, tcsr, rpmh, and display. X Plus GPU and global. QCS615 rpmh
     and MSM8937 and MSM8940 RPM.

   - Qualcomm Pongo and Taycan Alpha PLLs

   - Qualcomm IPQ5424 NoC-related interconnect clks

   - Renesas RZ/G3E (R9A09G047) SoC clk driver

   - SAMA7D65 SoC clk driver

   - Samsung Exynos990 SoC clk driver"

* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (159 commits)
  clk: analogbits: Fix incorrect calculation of vco rate delta
  clk: bcm: rpi: Add disp clock
  clk: bcm: rpi: Create helper to retrieve private data
  clk: bcm: rpi: Enable minimize for all firmware clocks
  clk: bcm: rpi: Allow cpufreq driver to also adjust gpu clocks
  clk: bcm: rpi: Add ISP to exported clocks
  clk: stm32f4: support spread spectrum clock generation
  clk: stm32f4: use FIELD helpers to access the PLLCFGR fields
  dt-bindings: clock: st,stm32-rcc: support spread spectrum clocking
  dt-bindings: clock: convert stm32 rcc bindings to json-schema
  clk: Use str_enable_disable-like helpers
  clk: clk-loongson2: Fix the number count of clk provider
  clk: clk-loongson2: Switch to use devm_clk_hw_register_fixed_rate_parent_data()
  clk: starfive: Make _clk_get become a common helper function
  clk: en7523: Add clock for eMMC for EN7581
  dt-bindings: clock: add ID for eMMC for EN7581
  dt-bindings: clock: drop NUM_CLOCKS define for EN7581
  clk: en7523: Rework clock handling for different clock numbers
  clk: thead: Fix cpu2vp_clk for TH1520 AP_SUBSYS clocks
  clk: thead: Add CLK_IGNORE_UNUSED to fix TH1520 boot
  ...
2025-01-22 10:54:18 -08:00
Daire McNamara
1390a33b3d PCI: microchip: Set inbound address translation for coherent or non-coherent mode
On Microchip PolarFire SoC the PCIe Root Port can be behind one of three
general purpose Fabric Interface Controller (FIC) buses that encapsulates
an AXI-S bus. Depending on which FIC(s) the Root Port is connected through
to CPU space, and what address translation is done by that FIC, the Root
Port driver's inbound address translation may vary.

For all current supported designs and all future expected designs, inbound
address translation done by a FIC on PolarFire SoC varies depending on
whether PolarFire SoC is operating in coherent DMA mode or noncoherent DMA
mode.

The setup of the outbound address translation tables in the Root Port
driver only needs to handle these two cases.

Setup the inbound address translation tables to one of two address
translations, depending on whether the Root Port is being used with
coherent DMA or noncoherent DMA.

Link: https://lore.kernel.org/r/20241011140043.1250030-3-daire.mcnamara@microchip.com
Fixes: 6f15a9c9f9 ("PCI: microchip: Add Microchip PolarFire PCIe controller driver")
Signed-off-by: Daire McNamara <daire.mcnamara@microchip.com>
[bhelgaas: adapt for ac7f53b7e7 ("PCI: microchip: Add support for using
either Root Port 1 or 2")]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
2025-01-21 17:34:56 -06:00
Wolfram Sang
816875a468 PCI: Don't include 'pm_wakeup.h' directly
The header clearly states that it does not want to be included directly,
only via 'device.h'. The 'platform_device.h' works equally well.

Thus, remove the direct inclusion.

Link: https://lore.kernel.org/r/20241118072917.3853-12-wsa+renesas@sang-engineering.com
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-01-21 17:29:07 -06:00
Linus Torvalds
4c551165e7 Updates for the interrupt subsystem:
- Consolidation of the machine_kexec_mask_interrupts() by providing a
     generic implementation and replacing the copy & pasta orgy in the
     relevant architectures.
 
   - Prevent unconditional operations on interrupt chips during kexec
     shutdown, which can trigger warnings in certain cases when the
     underlying interrupt has been shut down before.
 
   - Make the enforcement of interrupt handling in interrupt context
     unconditionally available, so that it actually works for non x86
     related interrupt chips. The earlier enablement for ARM GIC chips set
     the required chip flag, but did not notice that the check was hidden
     behind a config switch which is not selected by ARM[64].
 
   - Decrapify the handling of deferred interrupt affinity setting. Some
     interrupt chips require that affinity changes are made from the context
     of handling an interrupt to avoid certain race conditions. For x86 this
     was the default, but with interrupt remapping this requirement was
     lifted and a flag was introduced which tells the core code that
     affinity changes can be done in any context. Unrestricted affinity
     changes are the default for the majority of interrupt chips. RISCV has
     the requirement to add the deferred mode to one of it's interrupt
     controllers, but with the original implementation this would require to
     add the any context flag to all other RISC-V interrupt chips. That's
     backwards, so reverse the logic and require that chips, which need the
     deferred mode have to be marked accordingly. That avoids chasing the
     'sane' chips and marking them.
 
   - Add multi-node support to the Loongarch AVEC interrupt controller
     driver.
 
   - The usual tiny cleanups, fixes and improvements all over the place.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmePkVITHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoRbQD/9bHVph/V9Ekl7JAX3aY4gG4JbRhOc7
 dp1VAcHRhktRfoTztYRbjsbMu2nvZ58GKA8bkOS2jHSF/m3PbkIJfOhwk0YdIAoa
 +kdy5yDgqCGfkqW43DN4Cr+CnzGjWMitw67tFp3fhwehMDpDjdt2L28IjtanSS0f
 hO6FV7o65MWeJwxk4Isb2/nvkO+X23Lrp6RrWS8SXBnF9FFXxiPIg/fiOPTizhCh
 1W/bSGxLLb9WwsVzmlGAKVFlXDij0QGaIUug2fdVZ63OsELXD7tJrLSPG133yk92
 ppIa0s6BT4IBsfM00us4hG15PkLuJmP3yWWcoquG0rP8Wq58VOXiN6+rcJIyvB+5
 mWceTH6IKfZGoRQKwXC7BxeBAIb147reiJtb06meq1/8ADIvzafiNy0c8x9i/UaV
 QiyhPVENjaGCGDomZmJQqN7Yb02Wge1k8InQnodDrHxZNl/bX/B1Z8Bxd0n6hPHg
 NSJXYif2AxgaddpohsdygqRDbT6SNyQdj7YjJFY5qAGJ3yFyJ4JB6WTqkWW4o1vH
 3FVqdAnJmejAmmYSkah0Hkem2T5QASQmTWb93PLxiV6q+d0NM8stWAujjyVdIV/B
 W4Uj9mQ20cz54TjLtxqX+A1k6KcqOWRgh1l2QbUlFsgsOP3V8yz47yqYdR9qMWlO
 9kNEjI3sw+G/IQ==
 =q4rj
 -----END PGP SIGNATURE-----

Merge tag 'irq-core-2025-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull interrupt subsystem updates from Thomas Gleixner:

 - Consolidate the machine_kexec_mask_interrupts() by providing a
   generic implementation and replacing the copy & pasta orgy in the
   relevant architectures.

 - Prevent unconditional operations on interrupt chips during kexec
   shutdown, which can trigger warnings in certain cases when the
   underlying interrupt has been shut down before.

 - Make the enforcement of interrupt handling in interrupt context
   unconditionally available, so that it actually works for non x86
   related interrupt chips. The earlier enablement for ARM GIC chips set
   the required chip flag, but did not notice that the check was hidden
   behind a config switch which is not selected by ARM[64].

 - Decrapify the handling of deferred interrupt affinity setting.

   Some interrupt chips require that affinity changes are made from the
   context of handling an interrupt to avoid certain race conditions.
   For x86 this was the default, but with interrupt remapping this
   requirement was lifted and a flag was introduced which tells the core
   code that affinity changes can be done in any context. Unrestricted
   affinity changes are the default for the majority of interrupt chips.

   RISCV has the requirement to add the deferred mode to one of it's
   interrupt controllers, but with the original implementation this
   would require to add the any context flag to all other RISC-V
   interrupt chips. That's backwards, so reverse the logic and require
   that chips, which need the deferred mode have to be marked
   accordingly. That avoids chasing the 'sane' chips and marking them.

 - Add multi-node support to the Loongarch AVEC interrupt controller
   driver.

 - The usual tiny cleanups, fixes and improvements all over the place.

* tag 'irq-core-2025-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq/generic_chip: Export irq_gc_mask_disable_and_ack_set()
  genirq/timings: Add kernel-doc for a function parameter
  genirq: Remove IRQ_MOVE_PCNTXT and related code
  x86/apic: Convert to IRQCHIP_MOVE_DEFERRED
  genirq: Provide IRQCHIP_MOVE_DEFERRED
  hexagon: Remove GENERIC_PENDING_IRQ leftover
  ARC: Remove GENERIC_PENDING_IRQ
  genirq: Remove handle_enforce_irqctx() wrapper
  genirq: Make handle_enforce_irqctx() unconditionally available
  irqchip/loongarch-avec: Add multi-nodes topology support
  irqchip/ts4800: Replace seq_printf() by seq_puts()
  irqchip/ti-sci-inta : Add module build support
  irqchip/ti-sci-intr: Add module build support
  irqchip/irq-brcmstb-l2: Replace brcmstb_l2_mask_and_ack() by generic function
  irqchip: keystone: Use syscon_regmap_lookup_by_phandle_args
  genirq/kexec: Prevent redundant IRQ masking by checking state before shutdown
  kexec: Consolidate machine_kexec_mask_interrupts() implementation
  genirq: Reuse irq_thread_fn() for forced thread case
  genirq: Move irq_thread_fn() further up in the code
2025-01-21 13:51:07 -08:00
Rakesh Babu Saladi
a3282f84b2 PCI: switchtec: Add Microchip PCI100X device IDs
Add Microchip parts to the Device ID table so the driver supports PCI100x
devices.

Add a new macro to quirk the Microchip Switchtec PCI100x parts to allow DMA
access via NTB to work when the IOMMU is turned on.

PCI100x family has 6 variants; each variant is designed for different
application usages, different port counts and lane counts:

  PCI1001 has 1 x4 upstream port and 3 x4 downstream ports
  PCI1002 has 1 x4 upstream port and 4 x2 downstream ports
  PCI1003 has 2 x4 upstream ports, 2 x2 upstream ports, and 2 x2
    downstream ports
  PCI1004 has 4 x4 upstream ports
  PCI1005 has 1 x4 upstream port and 6 x2 downstream ports
  PCI1006 has 6 x2 upstream ports and 2 x2 downstream ports

[Historical note: these parts use PCI_VENDOR_ID_EFAR (0x1055), from EFAR
Microsystems, which was acquired in 1996 by Standard Microsystems Corp,
which was acquired by Microchip Technology in 2012.  The PCI-SIG confirms
that Vendor ID 0x1055 is assigned to Microchip even though it's not
visible via https://pcisig.com/membership/member-companies]

Link: https://lore.kernel.org/r/20250120095524.243103-1-Saladi.Rakeshbabu@microchip.com
Signed-off-by: Rakesh Babu Saladi <Saladi.Rakeshbabu@microchip.com>
[bhelgaas: Vendor ID history]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-By: Logan Gunthorpe <logang@deltatee.com>
2025-01-21 10:47:28 -06:00
Niklas Cassel
8a02612f85 PCI: endpoint: pci-epf-test: Add support for capabilities
The test BAR is on the EP side is allocated using pci_epf_alloc_space(),
which allocates the backing memory using dma_alloc_coherent(), which will
return zeroed memory regardless of __GFP_ZERO was set or not.

This means that running a new version of pci-endpoint-test.c (host side)
with an old version of pci-epf-test.c (EP side) will not see any
capabilities being set (as intended), so this is backwards compatible.

Additionally, the EP side always allocates at least 128 bytes for the test
BAR (excluding the MSI-X table), this means that adding another register at
offset 0x30 is still within the 128 available bytes.

For now, we only add the CAP_UNALIGNED_ACCESS capability.

Set CAP_UNALIGNED_ACCESS if the EPC driver can handle any address (because
it implements the .align_addr callback).

Link: https://lore.kernel.org/r/20241203063851.695733-5-cassel@kernel.org
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
2025-01-21 09:44:14 -06:00
Manivannan Sadhasivam
235c2b197a PCI: endpoint: pci-epf-test: Fix check for DMA MEMCPY test
Currently, if DMA MEMCPY test is requested by the host, and if the endpoint
DMA controller supports DMA_PRIVATE, the test will fail. This is not
correct since there is no check for DMA_MEMCPY capability and the DMA
controller can support both DMA_PRIVATE and DMA_MEMCPY.

Fix the check and also reword the error message.

Link: https://lore.kernel.org/r/20250116171650.33585-2-manivannan.sadhasivam@linaro.org
Fixes: 8353813c88 ("PCI: endpoint: Enable DMA tests for endpoints with DMA capabilities")
Reported-by: Niklas Cassel <cassel@kernel.org>
Closes: https://lore.kernel.org/linux-pci/Z3QtEihbiKIGogWA@ryzen
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Niklas Cassel <cassel@kernel.org>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
2025-01-21 09:44:13 -06:00
Mohamed Khalfella
b1b1f4b129 PCI: endpoint: pci-epf-test: Set dma_chan_rx pointer to NULL on error
If dma_chan_tx allocation fails, set dma_chan_rx to NULL after it is
freed.

Link: https://lore.kernel.org/r/20241227160841.92382-1-khalfella@gmail.com
Fixes: 8353813c88 ("PCI: endpoint: Enable DMA tests for endpoints with DMA capabilities")
Signed-off-by: Mohamed Khalfella <khalfella@gmail.com>
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
2025-01-21 09:44:13 -06:00
Linus Torvalds
1cbfb828e0 for-6.14/block-20250118
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmeL6hoQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgppw2EADQV8nDgLRggZR+il4U03yKHXcQEdAX1GrB
 Erowx+dasIJuh6kp3n6qRe9QD/pRqt1DKyLvXoWF8Qfuwq85j7oDnDDYxutNYT27
 hDgrLJriJ3VeKYtTu+andHWt8P29b5h57UayInDOUJurEPA6rXyFZ5YVIti8n21K
 uDOrQXiACG3qRWS2+p2f3UNhX0MkFNFdN/lxi13WMIJtRWF5bXAP+JOgIWCID4Ze
 QuSY6rQD4dp4Q6M2erpX6tn0YZb7Hvw3rPjsd91n6jvYfTUVLH375zg8jCBpi6Wi
 Syufbb8xcTtriVPTDRNu0ekjebkc8wD8ax/h86g0z9v3Ua4DlNmsx9eXrtv6r5nu
 YXqDODOad6stI0+owFquW2vas0gHmfNSfyfGdlk2g24PMtP5Yx0V6FIEvwIeqnje
 ghgxQvBuKUsdhqakByfNnc+XvXi3+RUJek8kvMeUSUQWT1IyMQqPOOk0yp9WdyWD
 bY1f2ECP5BR1b37zYOyawewsI5xTupHUswn5a4r4qtGn3O15rGDkX98Nab5aLCnR
 rW/DvX7+wT6gW9EwrRHiwjwfNDZbsJ9Ggu3lMhtUl5GUWdk58yTiVgKaHJLnlX9/
 CKFKfyyIR1Vl8+gYIpemyFhhcoN+dCSf06ISkrg0jeS0/tYwydaAaCBPL5J4kxZA
 h3Rtbh+Pgg==
 =EXYs
 -----END PGP SIGNATURE-----

Merge tag 'for-6.14/block-20250118' of git://git.kernel.dk/linux

Pull block updates from Jens Axboe:

 - NVMe pull requests via Keith:
      - Target support for PCI-Endpoint transport (Damien)
      - TCP IO queue spreading fixes (Sagi, Chaitanya)
      - Target handling for "limited retry" flags (Guixen)
      - Poll type fix (Yongsoo)
      - Xarray storage error handling (Keisuke)
      - Host memory buffer free size fix on error (Francis)

 - MD pull requests via Song:
      - Reintroduce md-linear (Yu Kuai)
      - md-bitmap refactor and fix (Yu Kuai)
      - Replace kmap_atomic with kmap_local_page (David Reaver)

 - Quite a few queue freeze and debugfs deadlock fixes

   Ming introduced lockdep support for this in the 6.13 kernel, and it
   has (unsurprisingly) uncovered quite a few issues

 - Use const attributes for IO schedulers

 - Remove bio ioprio wrappers

 - Fixes for stacked device atomic write support

 - Refactor queue affinity helpers, in preparation for better supporting
   isolated CPUs

 - Cleanups of loop O_DIRECT handling

 - Cleanup of BLK_MQ_F_* flags

 - Add rotational support for null_blk

 - Various fixes and cleanups

* tag 'for-6.14/block-20250118' of git://git.kernel.dk/linux: (106 commits)
  block: Don't trim an atomic write
  block: Add common atomic writes enable flag
  md/md-linear: Fix a NULL vs IS_ERR() bug in linear_add()
  block: limit disk max sectors to (LLONG_MAX >> 9)
  block: Change blk_stack_atomic_writes_limits() unit_min check
  block: Ensure start sector is aligned for stacking atomic writes
  blk-mq: Move more error handling into blk_mq_submit_bio()
  block: Reorder the request allocation code in blk_mq_submit_bio()
  nvme: fix bogus kzalloc() return check in nvme_init_effects_log()
  md/md-bitmap: move bitmap_{start, end}write to md upper layer
  md/raid5: implement pers->bitmap_sector()
  md: add a new callback pers->bitmap_sector()
  md/md-bitmap: remove the last parameter for bimtap_ops->endwrite()
  md/md-bitmap: factor behind write counters out from bitmap_{start/end}write()
  md: Replace deprecated kmap_atomic() with kmap_local_page()
  md: reintroduce md-linear
  partitions: ldm: remove the initial kernel-doc notation
  blk-cgroup: rwstat: fix kernel-doc warnings in header file
  blk-cgroup: fix kernel-doc warnings in header file
  nbd: fix partial sending
  ...
2025-01-20 19:38:46 -08:00
Bjorn Helgaas
1108d677da PCI: dwc: Simplify config resource lookup
If platform_get_resource_byname("config") fails, return error immediately
and unindent the normal path.  No functional change intended.

Link: https://lore.kernel.org/r/20250117235119.712043-1-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
2025-01-20 12:39:38 -06:00
Bjorn Helgaas
b881532991 PCI: imx6: Clean up comments and whitespace
For readability, fix typos and comments that needlessly exceed 80 columns.

Link: https://lore.kernel.org/r/20250118210727.795559-1-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
2025-01-20 12:38:14 -06:00
Bjorn Helgaas
42d9972732 PCI: of_property: Rename struct of_pci_range to of_pci_range_entry
Previously there were two definitions of struct of_pci_range: one in
include/linux/of_address.h and another local to drivers/pci/of_property.c.

Rename the local struct of_pci_range to of_pci_range_entry to avoid
confusion.

Link: https://lore.kernel.org/r/20250117161037.643953-1-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lizhi Hou <lizhi.hou@amd.com>
2025-01-18 15:18:38 -06:00
Richard Zhu
9d6b1bd6b3 PCI: imx6: Add i.MX8MQ, i.MX8Q and i.MX95 PM support
Add i.MX8MQ, i.MX8Q and i.MX95 PCIe suspend/resume support.

Link: https://lore.kernel.org/r/20241126075702.4099164-10-hongxing.zhu@nxp.com
Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
2025-01-18 15:04:23 -06:00
Frank Li
a528d1a725 PCI: imx6: Use DWC common suspend resume method
Call common DWC suspend/resume function. Use DWC common iATU method to
send out PME_TURN_OFF message.

In old DWC implementations, PCIE_ATU_INHIBIT_PAYLOAD in iATU Ctrl2 register
is reserved, so the generic DWC implementation of sending the PME_Turn_Off
message using a dummy MMIO write cannot be used. Use the previous method to
kick off PME_TURN_OFF message for these platforms.

The System Reset Control (SRC) interface is used to toggle 'turnoff_reset'
to send PME_TURN_OFF and since the DWC implementation is used, it is not
needed now.

Replace the imx_pcie_stop_link() and imx_pcie_host_exit() by
dw_pcie_suspend_noirq() in imx_pcie_suspend_noirq().

Since dw_pcie_suspend_noirq() already does these, see below call stack:

  dw_pcie_suspend_noirq()
    dw_pcie_stop_link()
      imx_pcie_stop_link()
    pci->pp.ops->deinit()
      imx_pcie_host_exit()

Replace the imx_pcie_host_init(), dw_pcie_setup_rc() and
imx_pcie_start_link() by dw_pcie_resume_noirq() in imx_pcie_resume_noirq().

Since dw_pcie_resume_noirq() already does these, see below call stack:

  dw_pcie_resume_noirq()
    pci->pp.ops->init()
      imx_pcie_host_init()
    dw_pcie_setup_rc()
    dw_pcie_start_link()
      imx_pcie_start_link(;

Link: https://lore.kernel.org/r/20241126075702.4099164-9-hongxing.zhu@nxp.com
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-01-18 15:04:23 -06:00
Bjorn Helgaas
ec57335b81 PCI: dwc: Add dw_pcie_suspend_noirq(), dw_pcie_resume_noirq() stubs for !CONFIG_PCIE_DW_HOST
Previously pcie-designware.h declared dw_pcie_suspend_noirq() and
dw_pcie_resume_noirq() unconditionally, even though they were only
implemented when CONFIG_PCIE_DW_HOST was defined.

Add no-op stubs for them when CONFIG_PCIE_DW_HOST is not defined so
drivers that support both Root Complex and Endpoint modes don't need

Link: https://lore.kernel.org/r/20250117213810.GA656803@bhelgaas
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-01-18 15:03:40 -06:00
Philipp Stanner
dfa2f4d5f9 PCI: Remove devres from pci_intx()
pci_intx() is a hybrid function which can sometimes be managed through
devres. This hybrid nature is undesirable.

Since all users of pci_intx() have by now been ported either to
always-managed pcim_intx() or never-managed pci_intx_unmanaged(), the
devres functionality can be removed from pci_intx().

Consequently, pci_intx_unmanaged() is now redundant, because pci_intx()
itself is now unmanaged.

Remove the devres functionality from pci_intx(). Have all users of
pci_intx_unmanaged() call pci_intx(). Remove pci_intx_unmanaged().

Link: https://lore.kernel.org/r/20241209130632.132074-13-pstanner@redhat.com
Signed-off-by: Philipp Stanner <pstanner@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
2025-01-18 14:38:49 -06:00
Philipp Stanner
b182cbaaa9 PCI/MSI: Use never-managed version of pci_intx()
pci_intx() is a hybrid function which can sometimes be managed through
devres. To remove this hybrid nature from pci_intx(), it is necessary to
port users to either an always-managed or a never-managed version.

MSI sets up its own separate devres callback implicitly in
pcim_setup_msi_release(). This callback ultimately uses pci_intx(), which
is problematic since the callback runs on driver detach.

That problem has last been described here:
https://lore.kernel.org/all/ee44ea7ac760e73edad3f20b30b4d2fff66c1a85.camel@redhat.com/

Replace the call to pci_intx() with one to the never-managed version
pci_intx_unmanaged().

Link: https://lore.kernel.org/r/20241209130632.132074-9-pstanner@redhat.com
Signed-off-by: Philipp Stanner <pstanner@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2025-01-18 14:38:48 -06:00
Philipp Stanner
f546e8033d PCI: Export pci_intx_unmanaged() and pcim_intx()
pci_intx() is a hybrid function which sometimes performs devres operations,
depending on whether pcim_enable_device() has been used to enable the
pci_dev. This sometimes-managed nature of the function is problematic.
Notably, it causes the function to allocate under some circumstances which
makes it unusable from interrupt context.

Export pcim_intx() (which is always managed) and rename __pcim_intx()
(which is never managed) to pci_intx_unmanaged() and export it as well.

Then all callers of pci_intx() can be ported to the version they need,
depending whether they use pci_enable_device() or pcim_enable_device().

Link: https://lore.kernel.org/r/20241209130632.132074-3-pstanner@redhat.com
Signed-off-by: Philipp Stanner <pstanner@redhat.com>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
2025-01-18 14:38:48 -06:00
Richard Zhu
112aba9a79 PCI: dwc: Remove LTSSM state test in dw_pcie_suspend_noirq()
It's safe to send PME_TURN_OFF message regardless of whether the link is up
or down, so don't test the LTSSM state before sending the PME_TURN_OFF
message.

Only print an error message when the LTSSM is not in DETECT or POLL. There
shouldn't be an error when no Endpoint is connected at all.

Link: https://lore.kernel.org/r/20241210081557.163555-3-hongxing.zhu@nxp.com
Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
2025-01-18 13:38:34 -06:00
Richard Zhu
86a016e278 PCI: dwc: Always stop link in the dw_pcie_suspend_noirq
On the i.MX8QM, PCIe link can't be re-established again in
dw_pcie_resume_noirq(), if the LTSSM_EN bit is not cleared
properly in dw_pcie_suspend_noirq().

So, add dw_pcie_stop_link() to dw_pcie_suspend_noirq() to fix
this issue and to align the suspend/resume functions since there
is dw_pcie_start_link() in dw_pcie_resume_noirq() already.

Fixes: 4774faf854 ("PCI: dwc: Implement generic suspend/resume functionality")
Link: https://lore.kernel.org/r/20241210081557.163555-2-hongxing.zhu@nxp.com
Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
2025-01-18 11:35:26 -06:00
Niklas Cassel
ec9fd499b9 PCI: dw-rockchip: Don't wait for link since we can detect Link Up
The Root Complex specific device tree binding for pcie-dw-rockchip has the
'sys' interrupt marked as required.

The driver requests the 'sys' IRQ unconditionally, and errors out if not
provided.

Thus, we can unconditionally set 'use_linkup_irq', so dw_pcie_host_init()
doesn't wait for the link to come up.

This will skip the wait for link up (since the bus will be enumerated once
the link up IRQ is triggered), which reduces the bootup time.

Link: https://lore.kernel.org/r/20250113-rockchip-no-wait-v1-1-25417f37b92f@kernel.org
Signed-off-by: Niklas Cassel <cassel@kernel.org>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-01-18 11:35:25 -06:00
Niklas Cassel
0e0b45ab5d PCI: dw-rockchip: Enumerate endpoints based on dll_link_up IRQ
Most boards using the pcie-dw-rockchip PCIe controller lack standard
hotplug support.

Thus, when an endpoint is attached to the SoC, users have to rescan the bus
manually to enumerate the device. This can be avoided by using the
'dll_link_up' interrupt in the combined system interrupt 'sys'.

Once the 'dll_link_up' IRQ is received, the bus underneath the host bridge
is scanned to enumerate PCIe endpoint devices.

This implements the same functionality that was implemented in the DWC
based pcie-qcom driver in 4581403f67 ("PCI: qcom: Enumerate endpoints
based on Link up event in 'global_irq' interrupt").

The Root Complex specific device tree binding for pcie-dw-rockchip already
has the 'sys' interrupt marked as required, so there is no need to update
the device tree binding. This also means that we can request the 'sys' IRQ
unconditionally.

Link: https://lore.kernel.org/r/20241127145041.3531400-2-cassel@kernel.org
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
[bhelgaas: commit log, squash Pei Xiao's redundant dev_err() fix from
https://lore.kernel.org/r/327718207d3cd72847c079ff9d56eb246744c182.1736126067.git.xiaopei01@kylinos.cn,
squash Niklas's #define change from https://lore.kernel.org/r/20250103095812.2408364-2-cassel@kernel.org]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
2025-01-18 11:35:25 -06:00
Krishna chaitanya chundru
f0639013d3 PCI: qcom: Update ICC and OPP values after Link Up event
4581403f67 ("PCI: qcom: Enumerate endpoints based on Link up event in
'global_irq' interrupt") added the Link Up-based enumeration support, but
did not update the ICC/OPP vote once link is up. Before that, the update
happened during probe and the endpoints may or may not be enumerated at
that time, so the ICC/OPP vote was not guaranteed to be accurate.

With Link Up-based enumeration support, the driver can request the accurate
vote based on the PCIe link.

Call qcom_pcie_icc_opp_update() in qcom_pcie_global_irq_thread() after
enumerating the endpoints.

Fixes: 4581403f67 ("PCI: qcom: Enumerate endpoints based on Link up event in 'global_irq' interrupt")
Link: https://lore.kernel.org/r/20241123-remove_wait2-v5-3-b5f9e6b794c2@quicinc.com
Signed-off-by: Krishna chaitanya chundru <quic_krichai@quicinc.com>
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
2025-01-18 11:35:18 -06:00
Krishna chaitanya chundru
36971d6c5a PCI: qcom: Don't wait for link if we can detect Link Up
If we have a 'global' IRQ for Link Up events, we need not wait for the
link to be up during PCI initialization, which reduces startup time.

Check for 'global' IRQ, and if present, set 'use_linkup_irq',
so dw_pcie_host_init() doesn't wait for the link to come up.

Link: https://lore.kernel.org/r/20241123-remove_wait2-v5-2-b5f9e6b794c2@quicinc.com
Signed-off-by: Krishna chaitanya chundru <quic_krichai@quicinc.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
2025-01-18 11:35:11 -06:00
Krishna chaitanya chundru
8d3bf19f1b PCI: dwc: Don't wait for link up if driver can detect Link Up event
If the driver can detect the Link Up event and enumerate downstream devices
at that time, we need not wait here.

Skip waiting for link to come up if the driver supports 'use_linkup_irq'.

Link: https://lore.kernel.org/r/20241123-remove_wait2-v5-1-b5f9e6b794c2@quicinc.com
Signed-off-by: Krishna chaitanya chundru <quic_krichai@quicinc.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
[bhelgaas: wrap comment, update commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
2025-01-18 11:35:02 -06:00
Niklas Cassel
574913f9e1 PCI: dwc: Fix potential truncation in dw_pcie_edma_irq_verify()
Increase the size of the string buffer to avoid potential truncation in
dw_pcie_edma_irq_verify().

This fixes the following build warning when compiling with W=1:

  drivers/pci/controller/dwc/pcie-designware.c: In function ‘dw_pcie_edma_detect’:
  drivers/pci/controller/dwc/pcie-designware.c:989:50: warning: ‘%d’ directive output may be truncated writing between 1 and 11 bytes into a region of size 3 [-Wformat-truncation=]
    989 |                 snprintf(name, sizeof(name), "dma%d", pci->edma.nr_irqs);
        |                                                  ^~

Link: https://lore.kernel.org/r/20250104002119.2681246-2-cassel@kernel.org
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
2025-01-18 11:34:48 -06:00
Krzysztof Kozlowski
ad9afd7503 PCI: dra7xx: Use syscon_regmap_lookup_by_phandle_args
Use syscon_regmap_lookup_by_phandle_args() which is a wrapper over
syscon_regmap_lookup_by_phandle() combined with getting the syscon
argument.  Except simpler code this annotates within one line that given
phandle has arguments, so grepping for code would be easier.

There is also no real benefit in printing errors on missing syscon
argument, because this is done just too late: runtime check on
static/build-time data.  Dtschema and Devicetree bindings offer the
static/build-time check for this already.

Link: https://lore.kernel.org/r/20250112-syscon-phandle-args-pci-v1-1-fcb6ebcc0afc@linaro.org
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-01-16 14:38:53 -06:00
Krzysztof Kozlowski
149fc35734 PCI: layerscape: Use syscon_regmap_lookup_by_phandle_args
Use syscon_regmap_lookup_by_phandle_args() which is a wrapper over
syscon_regmap_lookup_by_phandle() combined with getting the syscon
argument.  Except simpler code this annotates within one line that given
phandle has arguments, so grepping for code would be easier.

Link: https://lore.kernel.org/r/20250112-syscon-phandle-args-pci-v1-2-fcb6ebcc0afc@linaro.org
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Acked-by: Roy Zang <Roy.Zang@nxp.com>
2025-01-16 14:38:36 -06:00
Richard Zhu
6dd24b0a85 PCI: imx6: Remove surplus imx7d_pcie_init_phy() function
This function essentially duplicates imx7d_pcie_enable_ref_clk(). So remove
it.

Link: https://lore.kernel.org/r/20241126075702.4099164-8-hongxing.zhu@nxp.com
Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
2025-01-16 14:29:38 -06:00
Richard Zhu
93d883f890 PCI: imx6: Add missing reference clock disable logic
Ensure the *_enable_ref_clk() function is symmetric by addressing missing
disable parts on some platforms.

Fixes: d0a75c791f ("PCI: imx6: Factor out ref clock disable to match enable")
Link: https://lore.kernel.org/r/20241126075702.4099164-7-hongxing.zhu@nxp.com
Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
2025-01-16 14:29:38 -06:00
Richard Zhu
ef61c7d8d0 PCI: imx6: Deassert apps_reset in imx_pcie_deassert_core_reset()
Since the apps_reset is asserted in imx_pcie_assert_core_reset(), it should
be deasserted in imx_pcie_deassert_core_reset().

Fixes: 9b3fe6796d ("PCI: imx6: Add code to support i.MX7D")
Link: https://lore.kernel.org/r/20241126075702.4099164-6-hongxing.zhu@nxp.com
Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
2025-01-16 14:29:38 -06:00
Richard Zhu
f068ffdd03 PCI: imx6: Skip controller_id generation logic for i.MX7D
The i.MX7D only has one PCIe controller, so controller_id should always be
0. The previous code is incorrect although yielding the correct result.

Fix by removing "IMX7D" from the switch case branch.

Fixes: 2d8ed461db ("PCI: imx6: Add support for i.MX8MQ")
Link: https://lore.kernel.org/r/20241126075702.4099164-5-hongxing.zhu@nxp.com
Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
2025-01-16 14:29:29 -06:00
Richard Zhu
7ab93e6a08 PCI: imx6: Fetch dbi2 and iATU base addesses from DT
Since dw_pcie_get_resources() gets the dbi2 and iATU base addresses from
DT, remove the code from the imx6 driver that does the same.

Upstream DTSes have not enabled Endpoint function. So nothing is broken for
old upstream DTBs.

Link: https://lore.kernel.org/r/20241126075702.4099164-4-hongxing.zhu@nxp.com
Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
2025-01-16 14:22:07 -06:00
Frank Li
de22e20589 PCI: imx6: Configure PHY based on Root Complex or Endpoint mode
Pass PHY_MODE_PCIE_EP if the PCI controller operates in Endpoint (EP) mode,
and fix the Root Complex (RC) mode being hardcoded using a drvdata mode
check.

Fixes: 8026f2d8e8 ("PCI: imx6: Call common PHY API to set mode, speed, and submode")
Link: https://lore.kernel.org/r/20241119-pci_fixup_addr-v8-6-c4bfa5193288@nxp.com
Signed-off-by: Frank Li <Frank.Li@nxp.com>
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Richard Zhu <hongxing.zhu@nxp.com>
2025-01-16 14:21:53 -06:00
Richard Zhu
137250911f PCI: imx6: Add Refclk for i.MX95 PCIe
Add "ref" clock to enable Refclk. To avoid breaking DT backwards
compatibility, the i.MX95 "ref" clock is optional. Use
devm_clk_get_optional() to fetch i.MX95 PCIe optional clocks in driver.

If using external clock, "ref" clock should point to external reference.

If using internal clock, CREF_EN in LAST_TO_REG controls reference output,
implemented in drivers/clk/imx/clk-imx95-blk-ctl.c.

Link: https://lore.kernel.org/r/20241126075702.4099164-3-hongxing.zhu@nxp.com
Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
2025-01-16 14:20:37 -06:00
Frank Li
687aedb73a PCI: imx6: Add i.MX8Q PCIe Endpoint (EP) support
Add support for the i.MX8Q series (i.MX8QM, i.MX8QXP, and i.MX8DXL) PCIe
Endpoint (EP). On the i.MX8Q platforms, the PCI bus addresses differ
from the CPU addresses. However, the DesignWare (DWC) driver already
handles this in the common code.

Link: https://lore.kernel.org/r/20241119-pci_fixup_addr-v8-7-c4bfa5193288@nxp.com
Signed-off-by: Frank Li <Frank.Li@nxp.com>
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Richard Zhu <hongxing.zhu@nxp.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
2025-01-16 14:18:14 -06:00
Ilpo Järvinen
f68ea779d9 PCI: Add pcie_print_tlp_log() to print TLP Header and Prefix Log
Add pcie_print_tlp_log() to print TLP Header and Prefix Log.  Print End-End
Prefixes only if they are non-zero.

Consolidate the few places which currently print TLP using custom
formatting.

Link: https://lore.kernel.org/r/20250114170840.1633-9-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2025-01-16 12:05:43 -06:00
Ilpo Järvinen
ad41ddeeac PCI: Add TLP Prefix reading to pcie_read_tlp_log()
pcie_read_tlp_log() handles only 4 Header Log DWORDs but TLP Prefix Log
(PCIe r6.1 secs 7.8.4.12 & 7.9.14.13) may also be present.

Generalize pcie_read_tlp_log() and struct pcie_tlp_log to also handle TLP
Prefix Log. The relevant registers are formatted identically in AER and DPC
Capability, but has these variations:

  a) The offsets of TLP Prefix Log registers vary.

  b) DPC RP PIO TLP Prefix Log register can be < 4 DWORDs.

  c) AER TLP Prefix Log Present (PCIe r6.1 sec 7.8.4.7) can indicate Prefix
     Log is not present.

Therefore callers must pass the offset of the TLP Prefix Log register and
the entire length to pcie_read_tlp_log() to be able to read the correct
number of TLP Prefix DWORDs from the correct offset.

Link: https://lore.kernel.org/r/20250114170840.1633-8-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
[bhelgaas: squash ternary fix from
https://lore.kernel.org/r/20250116172019.88116-1-colin.i.king@gmail.com]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2025-01-16 12:04:38 -06:00
Bjorn Helgaas
bbaa022917 PCI: of: Simplify devm_of_pci_get_host_bridge_resources() interface
Previously pci_parse_request_of_pci_ranges() supplied the default bus range
to devm_of_pci_get_host_bridge_resources(), but that function is static and
has no other callers, so there's no reason to complicate its interface by
passing the default bus range.

Drop the busno and bus_max parameters and use 0x0 and 0xff directly in
devm_of_pci_get_host_bridge_resources().

Link: https://lore.kernel.org/r/20250113231557.441289-4-helgaas@kernel.org
[bhelgaas: dev_warn() for invalid end of bus-range]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-01-15 15:26:51 -06:00
Bjorn Helgaas
adda86bf46 PCI: of: Drop 'No bus range found' message
The typical bus range for a host bridge is [bus 00-ff], and
devm_of_pci_get_host_bridge_resources() defaults to that unless DT contains
a "bus-range" property.

devm_of_pci_get_host_bridge_resources() previously emitted a message when
there was no "bus-range" property, but that seems unnecessary for this
common situation.  Remove the message.

Link: https://lore.kernel.org/r/20250113231557.441289-3-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
2025-01-15 15:24:44 -06:00
Bjorn Helgaas
ddedc08e80 PCI: Unexport of_pci_parse_bus_range()
of_pci_parse_bus_range() is only used in drivers/pci/of.c, so make it
static and unexport it.

Link: https://lore.kernel.org/r/20250113231557.441289-2-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
2025-01-15 15:24:39 -06:00
Marc Zyngier
d9f6642ab7 PCI: apple: Convert to {en,dis}able_device() callbacks
Now that the core host-bridge infrastructure is able to give us a callback
on each device being added or removed, convert the bus-notifier hack to it.

Link: https://lore.kernel.org/r/20241204150145.800408-3-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
2025-01-15 14:52:50 -06:00
Marc Zyngier
477ac7b0a7 PCI: host-generic: Allow {en,dis}able_device() to be provided via pci_ecam_ops
In order to let host controller drivers using the host-generic
infrastructure use the {en,dis}able_device() callbacks that can be used to
configure sideband RID mapping hardware, provide these two callbacks as
part of the pci_ecam_ops structure.

Link: https://lore.kernel.org/r/20241204150145.800408-2-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
2025-01-15 14:52:12 -06:00
Frank Li
ce4c430172 PCI: imx6: Add IOMMU and ITS MSI support for i.MX95
For the i.MX95, the configuration of a LUT is necessary to convert PCIe
Requester IDs (RIDs) to StreamIDs, which are used by both IOMMU and ITS.
This involves checking msi-map and iommu-map device tree properties to
ensure consistent mapping of Requester IDs to the same StreamIDs.

Subsequently, LUT-related registers are configured. If a msi-map isn't
detected, the platform relies on DWC built-in controller for MSIs that
do not need StreamIDs.

Implement PCI bus callback function to handle enable_device() and
disable_device() operations, setting up the LUT whenever a new PCI
device is enabled.

Known limitations:

  - If iommu-map exists in the device tree but the IOMMU controller is
    disabled, StreamIDs are programmed into the LUT. However, if a RID
    is out of range of the iommu-map, enabling the PCI device would
    result in a failure, although the PCI device can work without the
    IOMMU.

  - If msi-map exists in the device tree but the MSI controller is
    disabled, MSIs will not work. The DWC driver skips initializing the
    built-in MSI controller, falling back to legacy PCI INTx only.

Link: https://lore.kernel.org/r/20250114-imx95_lut-v9-2-39f58dbed03a@nxp.com
Signed-off-by: Frank Li <Frank.Li@nxp.com>
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
[bhelgaas: fix uninitialized "sid" in imx_pcie_enable_device()]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Richard Zhu <hongxing.zhu@nxp.com>
2025-01-15 14:47:24 -06:00
Thomas Gleixner
7d04319a05 x86/apic: Convert to IRQCHIP_MOVE_DEFERRED
Instead of marking individual interrupts as safe to be migrated in
arbitrary contexts, mark the interrupt chips, which require the interrupt
to be moved in actual interrupt context, with the new IRQCHIP_MOVE_DEFERRED
flag. This makes more sense because this is a per interrupt chip property
and not restricted to individual interrupts.

That flips the logic from the historical opt-out to a opt-in model. This is
simpler to handle for other architectures, which default to unrestricted
affinity setting. It also allows to cleanup the redundant core logic
significantly.

All interrupt chips, which belong to a top-level domain sitting directly on
top of the x86 vector domain are marked accordingly, unless the related
setup code marks the interrupts with IRQ_MOVE_PCNTXT, i.e. XEN.

No functional change intended.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Acked-by: Wei Liu <wei.liu@kernel.org>
Link: https://lore.kernel.org/all/20241210103335.563277044@linutronix.de
2025-01-15 21:38:53 +01:00
Dan Carpenter
7ca2887600
PCI: rockchip-ep: Fix error code in rockchip_pcie_ep_init_ob_mem()
Return -ENOMEM if pci_epc_mem_alloc_addr() fails.  Don't return success.

Link: https://lore.kernel.org/r/Z014ylYz_xrrgI4W@stanley.mountain
Fixes: 9456480194 ("PCI: rockchip-ep: Refactor rockchip_pcie_ep_probe() memory allocations")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
2025-01-15 18:24:12 +00:00
Anand Moon
500969661b
PCI: rockchip: Refactor rockchip_pcie_disable_clocks() signature
Refactor rockchip_pcie_disable_clocks() to accept a struct rockchip_pcie
pointer instead of a void pointer thus improving type safety and code
readability by explicitly specifying the expected data type.

Link: https://lore.kernel.org/r/20241202151150.7393-4-linux.amoon@gmail.com
Signed-off-by: Anand Moon <linux.amoon@gmail.com>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
2025-01-15 18:24:12 +00:00
Anand Moon
18715931a5
PCI: rockchip: Simplify reset control handling by using reset_control_bulk*() function
Currently, the driver acquires and asserts/deasserts the resets
individually thereby making the driver complex to read.

This can be simplified by using the reset_control_bulk() APIs.

Use devm_reset_control_bulk_get_exclusive() API to acquire all the resets
and use reset_control_bulk_{assert/deassert}() APIs to assert/deassert them
in bulk.

Following the recommendations in 'Rockchip RK3399 TRM v1.3 Part2':

  1. Split the reset controls into two groups as per section '17.5.8.1.1
     PCIe as Root Complex'.

  2. Deassert the 'Pipe, MGMT Sticky, MGMT, Core' resets in groups as per
     section '17.5.8.1.1 PCIe as Root Complex'. This is accomplished using
     the reset_control_bulk APIs.

Link: https://lore.kernel.org/r/20241202151150.7393-3-linux.amoon@gmail.com
Co-developed-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Anand Moon <linux.amoon@gmail.com>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
[kwilczynski: squash error handling fix from https://lore.kernel.org/r/7da6ac56-af55-4436-9597-6af24df8122c@stanley.mountain]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
2025-01-15 18:23:28 +00:00
Frank Li
a3751212a8
PCI: Add enable_device() and disable_device() callbacks for bridges
Some PCI host bridges require special handling when enabling or
disabling PCI devices. For example, the i.MX95 platform has a lookup
table to map Requester IDs to StreamIDs, which the SMMU and MSI
controller use to identify the source of DMA accesses.

Without this mapping, DMA accesses may target unintended memory, which
would corrupt memory or read the wrong data.

Add a host bridge enable_device() hook the imx6 driver can use to
configure the Requester ID to StreamID mapping. The hardware table isn't
big enough to map all possible Requester IDs, so this hook may fail if
no table space is available. In that case, return failure from
pci_enable_device().

It might make more sense to make pci_set_master() decline to enable bus
mastering and return failure, but it currently doesn't have a way to return
failure.

Link: https://lore.kernel.org/r/20250114-imx95_lut-v9-1-39f58dbed03a@nxp.com
Tested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Acked-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
2025-01-15 18:06:22 +00:00
Takashi Iwai
b198499c7d
PCI/DPC: Quirk PIO log size for Intel Raptor Lake-P
Apparently the Raptor Lake-P reference firmware configures the PIO log size
correctly, but some vendor BIOSes, including at least ASUSTeK COMPUTER INC.
Zenbook UX3402VA_UX3402VA, do not.

Apply the quirk for Raptor Lake-P.  This prevents kernel complaints like:

  DPC: RP PIO log size 0 is invalid

and also enables the DPC driver to dump the RP PIO Log registers when DPC
is triggered.

Note that the bug report also mentions 8086:a76e, which has been already
added by 627c6db207 ("PCI/DPC: Quirk PIO log size for Intel Raptor Lake
Root Ports").

Link: https://lore.kernel.org/r/20250102164315.7562-1-tiwai@suse.de
Link: https://bugzilla.suse.com/show_bug.cgi?id=1234623
Signed-off-by: Takashi Iwai <tiwai@suse.de>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-01-15 17:59:55 +00:00
Ilpo Järvinen
2d54d23c60
PCI/sysfs: Remove unnecessary zero in initializer
Providing empty initializer for an array is enough to set its elements
to zero. Thus, remove the redundant 0 from the initializer.

Link: https://lore.kernel.org/r/20241028174046.1736-4-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-01-15 03:54:44 +00:00
Ilpo Järvinen
b6e5c46d83
PCI/sysfs: Use __free() in reset_method_store()
Use __free() from  cleanup.h to handle freeing options in
reset_method_store() as it simplifies the code flow.

Link: https://lore.kernel.org/r/20241028174046.1736-3-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-01-15 03:54:38 +00:00
Ilpo Järvinen
10269d5781
PCI/sysfs: Move reset related sysfs code to correct file
Most PCI sysfs code and structs are in a dedicated file but a few reset
related things remain in pci.c. Move also them to pci-sysfs.c and drop
pci_dev_reset_method_attr_is_visible() as it is 100% duplicate of
pci_dev_reset_attr_is_visible().

Link: https://lore.kernel.org/r/20241028174046.1736-2-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-01-15 03:54:22 +00:00
Ilpo Järvinen
e5321ae10e PCI: Store number of supported End-End TLP Prefixes
eetlp_prefix_path in the struct pci_dev tells if End-End TLP Prefixes
are supported by the path or not, and the value is only calculated if
CONFIG_PCI_PASID is set.

The Max End-End TLP Prefixes field in the Device Capabilities Register 2
also tells how many (1-4) End-End TLP Prefixes are supported (PCIe r6.2 sec
7.5.3.15). The number of supported End-End Prefixes is useful for reading
correct number of DWORDs from TLP Prefix Log register in AER capability
(PCIe r6.2 sec 7.8.4.12).

Replace eetlp_prefix_path with eetlp_prefix_max and determine the number of
supported End-End Prefixes regardless of CONFIG_PCI_PASID so that an
upcoming commit generalizing TLP Prefix Log register reading does not have
to read extra DWORDs for End-End Prefixes that never will be there.

The value stored into eetlp_prefix_max is directly derived from device's
Max End-End TLP Prefixes and does not consider limitations imposed by
bridges or the Root Port beyond supported/not supported flags. This is
intentional for two reasons:

  1) PCIe r6.2 spec sections 2.2.10.4 & 6.2.4.4 indicate that a TLP is
     malformed only if the number of prefixes exceed the number of Max
     End-End TLP Prefixes, which seems to be the case even if the device
     could never receive that many prefixes due to smaller maximum imposed
     by a bridge or the Root Port. If TLP parsing is later added, this
     distinction is significant in interpreting what is logged by the TLP
     Prefix Log registers and the value matching to the Malformed TLP
     threshold is going to be more useful.

  2) TLP Prefix handling happens autonomously on a low layer and the value
     in eetlp_prefix_max is not programmed anywhere by the kernel (i.e.,
     there is no limiter OS can control to prevent sending more than N TLP
     Prefixes).

Link: https://lore.kernel.org/r/20250114170840.1633-7-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
2025-01-14 17:47:39 -06:00
Ilpo Järvinen
27d41818b6 PCI: Use unsigned int i in pcie_read_tlp_log()
Loop variable i counting from 0 upwards does not need to be signed so make
it unsigned int.

Link: https://lore.kernel.org/r/20250114170840.1633-6-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2025-01-14 17:47:34 -06:00
Ilpo Järvinen
dbfd51ba4c PCI: Use same names in pcie_read_tlp_log() prototype and definition
pcie_read_tlp_log()'s prototype and function signature diverged due to
changes made while applying.

Make the parameters of pcie_read_tlp_log() named identically.

Link: https://lore.kernel.org/r/20250114170840.1633-5-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
2025-01-14 17:46:36 -06:00
Ilpo Järvinen
ede5d5dbef PCI: Add defines for TLP Header/Prefix log sizes
Add defines for AER and DPC capabilities TLP Header Logging register sizes
(PCIe r6.2, sec 7.8.4 / 7.9.14) and replace literals with them.

Link: https://lore.kernel.org/r/20250114170840.1633-4-ilpo.jarvinen@linux.intel.com
Suggested-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-01-14 17:39:04 -06:00
Ilpo Järvinen
a71c59269e PCI: Move TLP Log handling to its own file
TLP Log is a PCIe feature and is processed only by AER and DPC.
Configwise, DPC depends AER being enabled. In lack of better place, the TLP
Log handling code was initially placed into pci.c but it can be easily
placed in a separate file.

Move TLP Log handling code to its own file under pcie/ subdirectory and
include it only when AER is enabled.

Link: https://lore.kernel.org/r/20250114170840.1633-3-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
2025-01-14 17:38:18 -06:00
Ilpo Järvinen
013525583f PCI: Don't expose pcie_read_tlp_log() outside PCI subsystem
pcie_read_tlp_log() was exposed by the commit 0a5a46a6a6 ("PCI/AER:
Generalize TLP Header Log reading") with the intent that drivers could use
it, but the PCI maintainer later decided that drivers should be encouraged
to use PCI core diagnostic logging of generic AER registers rather than
building their own.

Drivers that currently implement their own diagnostic logging include ixgbe
(ixgbe_io_error_detected()) and iwlwifi (iwl_trans_pcie_dump_regs()).

Remove the unwanted EXPORT of pcie_read_tlp_log() and remove it from
include/linux/aer.h.

Link: https://lore.kernel.org/r/20250114170840.1633-2-ilpo.jarvinen@linux.intel.com
Link: https://lore.kernel.org/all/20240322193011.GA701027@bhelgaas/
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
2025-01-14 17:24:27 -06:00
Linus Torvalds
7f5b6a8ec1 pci-v6.13-fixes-3
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmeGsZYUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vwu6RAAreLLwBLB0bb2nHoGTL0OjfG+u9PU
 2lDkRD1A6g53YHMRLo67qC6TKD6+9e5N5xCxO+VjLBzE/KbPuD8cIacV+oYCv7r+
 BlIkQUnmmEOIrcO4wc+DuH3cYpRCDGerp8L8TI/+bkuVzGoc0nvSOuvfe++hqbDP
 fkt5jcFUSN5NYmXQmfe5bdh41YC/uHAUw0+PpP+bt1ygdG4jfsMXbUOE7NVUa79Y
 Ovg+SSrUKdnTtKJqjIkpvapZkeXl4Pn4vcxiIo1AX+KfMYpNX0CYm2ByN5UIICa7
 S0KE4Ac4Di7qg9jhFTTp8180sbgIxo6XCbIszRDyDvWJpVWFdZB5L1BNeZHgIDut
 4JsDXCH0u75IdmLLwK2yTWbNaQSYUQcZfuMLybjso7sMUGnubjQT7e4k5qvEsWZj
 pY+uCByowDMh8gkrkdSsGwf8pPX/4ka/bKgNyAstOkkXERHMV5NPWv7Ex0zfwjK6
 9QsJ8Wjfn0d3HqkHd+rD2cQWNYm2a3N5MhGZLK7vUQYulLa6MSwxEEjG4cv+E+aI
 T+MdhtX8mes45xfxzgMMY7JJBMEp3M9TQM6DjgtrWdig7sGDB93oNK2aCeg1zqvB
 Fvsn+izJCs2Ocks6+BWCIEaDYPAeTKnmat/7ZaUfLs/uw8nPS9RGE3/0OeiUJeGK
 EvT1mE6DYhVa1cY=
 =e6ty
 -----END PGP SIGNATURE-----

Merge tag 'pci-v6.13-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull pci fix from Bjorn Helgaas:

 - Prevent bwctrl NULL pointer dereference that caused hangs on shutdown
   on ASUS ROG Strix SCAR 17 G733PYV (Lukas Wunner)

* tag 'pci-v6.13-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
  PCI/bwctrl: Fix NULL pointer deref on unbind and bind
2025-01-14 11:32:14 -08:00
Liao Chen
26cdda5444
PCI: mvebu: Enable module autoloading
Add MODULE_DEVICE_TABLE(), so modules could be properly autoloaded based
on the alias from of_device_id table.

Link: https://lore.kernel.org/r/20240903132311.961646-1-liaochen4@huawei.com
Signed-off-by: Liao Chen <liaochen4@huawei.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
2025-01-14 01:36:39 +00:00
Anand Moon
abdd4c8ea7 PCI: rockchip: Simplify clock handling by using clk_bulk*() functions
Currently, the driver acquires clocks and prepare/enable/disable/unprepare
the clocks individually thereby making the driver complex to read.

The driver can be simplified by using the clk_bulk*() APIs.

Use:

  - devm_clk_bulk_get_all() API to acquire all the clocks
  - clk_bulk_prepare_enable() to prepare/enable clocks
  - clk_bulk_disable_unprepare() APIs to disable/unprepare them in bulk

Link: https://lore.kernel.org/r/20241202151150.7393-2-linux.amoon@gmail.com
Signed-off-by: Anand Moon <linux.amoon@gmail.com>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
[bhelgaas: squash error handling fix from https://lore.kernel.org/r/20250106153041.55267-1-linux.amoon@gmail.com]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
2025-01-13 15:10:38 -06:00
Damien Le Moal
fd46bc0e0b PCI: rockchip: Add missing fields descriptions for struct rockchip_pcie_ep
When compiling the Rockchip endpoint driver with -W=1, the following
warnings can be seen in the output:

  drivers/pci/controller/pcie-rockchip-ep.c:59: warning: Function parameter or struct member 'perst_irq' not described in 'rockchip_pcie_ep'
  drivers/pci/controller/pcie-rockchip-ep.c:59: warning: Function parameter or struct member 'perst_asserted' not described in 'rockchip_pcie_ep'
  drivers/pci/controller/pcie-rockchip-ep.c:59: warning: Function parameter or struct member 'link_up' not described in 'rockchip_pcie_ep'
  drivers/pci/controller/pcie-rockchip-ep.c:59: warning: Function parameter or struct member 'link_training' not described in 'rockchip_pcie_ep'

Fix these warnings by adding the missing field descriptions in
struct rockchip_pcie_ep kernel-doc comment.

Fixes: a7137cbf6b ("PCI: rockchip-ep: Handle PERST# signal in EP mode")
Fixes: bd6e61df4b ("PCI: rockchip-ep: Improve link training")
Link: https://lore.kernel.org/r/20241216133404.540736-1-dlemoal@kernel.org
Reported-by: Bjorn Helgaas <helgaas@kernel.org>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
2025-01-13 14:45:41 -06:00
King Dix
2d2da5a4c1
PCI: rcar-ep: Fix incorrect variable used when calling devm_request_mem_region()
The rcar_pcie_parse_outbound_ranges() uses the devm_request_mem_region()
macro to request a needed resource. A string variable that lives on the
stack is then used to store a dynamically computed resource name, which
is then passed on as one of the macro arguments. This can lead to
undefined behavior.

Depending on the current contents of the memory, the manifestations of
errors may vary. One possible output may be as follows:

  $ cat /proc/iomem
  30000000-37ffffff :
  38000000-3fffffff :

Sometimes, garbage may appear after the colon.

In very rare cases, if no NULL-terminator is found in memory, the system
might crash because the string iterator will overrun which can lead to
access of unmapped memory above the stack.

Thus, fix this by replacing outbound_name with the name of the previously
requested resource. With the changes applied, the output will be as
follows:

  $ cat /proc/iomem
  30000000-37ffffff : memory2
  38000000-3fffffff : memory3

Fixes: 2a6d0d63d9 ("PCI: rcar: Add endpoint mode support")
Link: https://lore.kernel.org/r/tencent_DBDCC19D60F361119E76919ADAB25EC13C06@qq.com
Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: King Dix <kingdix10@qq.com>
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
2025-01-13 07:46:07 +00:00
Douglas Anderson
17bd5e4dc9
PCI: mediatek-gen3: Enable async probe by default
The mediatek-gen3 driver can run its probe routine fairly slow on some
hardware, which adds to the total time it takes for the system start up.

Thus, turn on async mode for the probe to avoid blocking the rest of the
system.

Link: https://lore.kernel.org/r/20241220145205.1.Ibf2563896c3b1fc133bb46d3fc96ad0041763922@changeid
Signed-off-by: Douglas Anderson <dianders@chromium.org>
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
2025-01-13 07:12:44 +00:00
Lorenzo Bianconi
491cb9c508
PCI: mediatek-gen3: Avoid PCIe resetting via PERST# for Airoha EN7581 SoC
Airoha EN7581 has a hw bug asserting/releasing PERST# signal causing
occasional PCIe link down issues. In order to overcome the problem,
PERST# signal is not asserted/released during device probe or
suspend/resume phase and the PCIe block is reset using
en7523_reset_assert() and en7581_pci_enable().

Introduce flags field in the mtk_gen3_pcie_pdata struct in order to
specify per-SoC capabilities.

Link: https://lore.kernel.org/r/20250109-pcie-en7581-rst-fix-v4-1-4a45c89fb143@kernel.org
Tested-by: Hui Ma <hui.ma@airoha.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-01-13 07:09:42 +00:00
Lorenzo Bianconi
c98bee18d0
PCI: mediatek-gen3: Rely on msleep() in mtk_pcie_en7581_power_up()
Since mtk_pcie_en7581_power_up() runs in non-atomic context, rely on
msleep() routine instead of mdelay().

Link: https://lore.kernel.org/r/20250108-pcie-en7581-fixes-v6-5-21ac939a3b9b@kernel.org
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
2025-01-13 07:07:40 +00:00
Lorenzo Bianconi
90d4e466c9
PCI: mediatek-gen3: Move reset delay in mtk_pcie_en7581_power_up()
Airoha EN7581 has a hw bug asserting/releasing PCIE_PE_RSTB signal
causing occasional PCIe link down issues. In order to overcome the
problem, PCIe block is reset using REG_PCI_CONTROL (0x88) and
REG_RESET_CONTROL (0x834) registers available in the clock module
running clk_bulk_prepare_enable() in mtk_pcie_en7581_power_up().

In order to make the code more readable, move the wait for the time
needed to complete the PCIe reset from en7581_pci_enable() to
mtk_pcie_en7581_power_up().

Reduce reset timeout from 250ms to the standard PCIE_T_PVPERL_MS value
(100ms) since it has no impact on the driver behavior.

Link: https://lore.kernel.org/r/20250108-pcie-en7581-fixes-v6-4-21ac939a3b9b@kernel.org
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Acked-by: Stephen Boyd <sboyd@kernel.org>
2025-01-13 07:07:18 +00:00