linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2025-08-26 06:39:13 +00:00

Author	SHA1	Message	Date
Jakub Kicinski	3c836451ca	net: move HDS config from ethtool state Separate the HDS config from the ethtool state struct. The HDS config contains just simple parameters, not state. Having it as a separate struct will make it easier to clone / copy and also long term potentially make it per-queue. Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250119020518.1962249-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-01-20 11:44:57 -08:00
Jakub Kicinski	17656eb5cf	eth: bnxt: fix string truncation warning in FW version W=1 builds with gcc 14.2.1 report: drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c:4193:32: error: ‘%s’ directive output may be truncated writing up to 31 bytes into a region of size 27 [-Werror=format-truncation=] 4193 | "/pkg %s", buf); It's upset that we let buf be full length but then we use 5 characters for "/pkg ". The build is also clean with clang version 19.1.5 now. Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250117183726.1481524-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-01-18 17:32:45 -08:00
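The arithmetic behind the warning in 17656eb5cf can be reproduced outside the kernel. The following is a minimal sketch, not the driver's code: `DST_LEN`, `PKG_PREFIX`, and `format_pkg()` are hypothetical names, and the 32-byte destination is an assumption inferred from the "31 bytes into a region of size 27" message.

```c
#include <stdio.h>
#include <string.h>

#define DST_LEN 32            /* assumed destination size (27 + 5 prefix chars) */
#define PKG_PREFIX "/pkg "    /* the 5 characters named in the warning */

/*
 * Hypothetical helper: format "/pkg <pkg>" into dst.
 * Returns 1 if the whole string fit, 0 if snprintf() had to truncate it.
 */
static int format_pkg(char *dst, size_t dst_len, const char *pkg)
{
	int n = snprintf(dst, dst_len, PKG_PREFIX "%s", pkg);

	/* snprintf() returns the length it wanted to write, excluding the NUL. */
	return n >= 0 && (size_t)n < dst_len;
}
```

With a 32-byte destination, a package string of up to 26 characters fits after the prefix, while a full-length 31-character string is truncated, which is the case gcc flags.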
Jakub Kicinski	2ee738e90e	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.13-rc8). Conflicts: drivers/net/ethernet/realtek/r8169_main.c `1f691a1fc4` ("r8169: remove redundant hwmon support") `152d00a913` ("r8169: simplify setting hwmon attribute visibility") https://lore.kernel.org/20250115122152.760b4e8d@canb.auug.org.au Adjacent changes: drivers/net/ethernet/broadcom/bnxt/bnxt.c `152f4da05a` ("bnxt_en: add support for rx-copybreak ethtool command") `f0aa6a37a3` ("eth: bnxt: always recalculate features after XDP clearing, fix null-deref") drivers/net/ethernet/intel/ice/ice_type.h `50327223a8` ("ice: add lock to protect low latency interface") `dc26548d72` ("ice: Fix quad registers read on E825") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-01-16 10:34:59 -08:00
Taehee Yoo	6b43673a25	bnxt_en: add support for hds-thresh ethtool command The bnxt_en driver has configured the hds_threshold value automatically when TPA is enabled based on the rx-copybreak default value. Now the hds-thresh ethtool command is added, so it adds an implementation of hds-thresh option. Configuration of the hds-thresh is applied only when the tcp-data-split is enabled. The default value of hds-thresh is 256, which is the default value of rx-copybreak, which used to be the hds_thresh value. The maximum hds-thresh is 1023. # Example: # ethtool -G enp14s0f0np0 tcp-data-split on hds-thresh 256 # ethtool -g enp14s0f0np0 Ring parameters for enp14s0f0np0: Pre-set maximums: ... HDS thresh: 1023 Current hardware settings: ... TCP data split: on HDS thresh: 256 Tested-by: Stanislav Fomichev <sdf@fomichev.me> Tested-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Taehee Yoo <ap420073@gmail.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250114142852.3364986-9-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-01-15 14:42:12 -08:00
Taehee Yoo	87c8f8496a	bnxt_en: add support for tcp-data-split ethtool command NICs that use the bnxt_en driver support the tcp-data-split feature under the name HDS (header-data-split). But there is no implementation to enable HDS via ethtool. Only getting the current HDS status is implemented, and HDS is automatically enabled only when either LRO, HW-GRO, or JUMBO is enabled. The hds_threshold follows the rx-copybreak value and was unchangeable. This implements the `ethtool -G <interface name> tcp-data-split <value>` command option. The value can be <on> or <auto>. If the value is <auto> and one of LRO/GRO/JUMBO is enabled, HDS is automatically enabled; if all of LRO/GRO/JUMBO are disabled, HDS is automatically disabled. The HDS feature relies on the aggregation ring, so if HDS is enabled, the bnxt_en driver initializes the aggregation ring. This is why BNXT_FLAG_AGG_RINGS contains the HDS condition. Acked-by: Jakub Kicinski <kuba@kernel.org> Tested-by: Stanislav Fomichev <sdf@fomichev.me> Tested-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Taehee Yoo <ap420073@gmail.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250114142852.3364986-8-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-01-15 14:42:12 -08:00
Taehee Yoo	152f4da05a	bnxt_en: add support for rx-copybreak ethtool command The bnxt_en driver supports rx-copybreak, but it couldn't be set by userspace. Only the default value (256) worked. This patch makes the bnxt_en driver support the following commands: `ethtool --set-tunable <devname> rx-copybreak <value>` and `ethtool --get-tunable <devname> rx-copybreak`. With this patch, hds_threshold is set to the rx-copybreak value. But it will be set by `ethtool -G eth0 hds-thresh N` in the next patch. Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Brett Creeley <brett.creeley@amd.com> Tested-by: Stanislav Fomichev <sdf@fomichev.me> Tested-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Taehee Yoo <ap420073@gmail.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250114142852.3364986-7-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-01-15 14:42:11 -08:00
Russell King (Oracle)	21f56ad1b2	net: bcm: asp2: convert to phylib managed EEE Convert the Broadcom ASP2 driver to use phylib managed EEE support. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Tested-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/E1tXk81-000r4x-TS@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-01-15 13:17:56 -08:00
Russell King (Oracle)	df8017e8a1	net: bcm: asp2: remove tx_lpi_enabled Phylib maintains a copy of tx_lpi_enabled, which will be used to populate the member when phy_ethtool_get_eee(). Therefore, writing to this member before phy_ethtool_get_eee() will have no effect. Remove it. Also remove setting our copy of info->eee.tx_lpi_enabled which becomes write-only. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Tested-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/E1tXk7w-000r4r-Pq@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-01-15 13:17:56 -08:00
Russell King (Oracle)	54033f5512	net: bcm: asp2: fix LPI timer handling Fix the LPI timer handling in Broadcom ASP2 driver after the phylib managed EEE patches were merged. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Tested-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/E1tXk7r-000r4l-Li@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-01-15 13:17:56 -08:00
Kalesh AP	57e6464c22	RDMA/bnxt_re: Pass the context for ulp_irq_stop ulp_irq_stop() can be invoked from a context where FW is healthy or when FW is in a reset state. In the latter case, ULP must stop all interactions with HW/FW and also with application and stack. Added a new parameter to the ulp_irq_stop() function to achieve that. Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Chandramohan Akula <chandramohan.akula@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1736446693-6692-2-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-01-14 06:22:10 -05:00
Kalesh AP	7fea327840	RDMA/bnxt_re: Add Async event handling support Using the option provided by the Ethernet driver, register for FW Async events. During probe, while registering with the Ethernet driver, provide the ulp hook 'ulp_async_notifier' for receiving the firmware events. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/20250107024553.2926983-3-kalesh-anakkur.purayil@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-01-14 04:05:20 -05:00
Michael Chan	184fe6f238	bnxt_en: Add ULP call to notify async events When the driver receives an async event notification from the Firmware, we make the new ulp_async_notifier() call to inform the RDMA driver that a firmware async event has been received. RDMA driver can then take necessary actions based on the event type. In the next patch, we will implement the ulp_async_notifier() callbacks in the RDMA driver. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/20250107024553.2926983-2-kalesh-anakkur.purayil@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-01-14 03:39:46 -05:00
Jakub Kicinski	f0aa6a37a3	eth: bnxt: always recalculate features after XDP clearing, fix null-deref Recalculate features when XDP is detached. Before: # ip li set dev eth0 xdp obj xdp_dummy.bpf.o sec xdp # ip li set dev eth0 xdp off # ethtool -k eth0 | grep gro rx-gro-hw: off [requested on] After: # ip li set dev eth0 xdp obj xdp_dummy.bpf.o sec xdp # ip li set dev eth0 xdp off # ethtool -k eth0 | grep gro rx-gro-hw: on The fact that HW-GRO doesn't get re-enabled automatically is just a minor annoyance. The real issue is that the features will randomly come back during another reconfiguration which just happens to invoke netdev_update_features(). The driver doesn't handle reconfiguring two things at a time very robustly. Starting with commit `98ba1d931f` ("bnxt_en: Fix RSS logic in __bnxt_reserve_rings()") we only reconfigure the RSS hash table if the "effective" number of Rx rings has changed. If HW-GRO is enabled "effective" number of rings is 2x what user sees. So if we are in the bad state, with HW-GRO re-enablement "pending" after XDP off, and we lower the rings by / 2 - the HW-GRO rings doing 2x and the ethtool -L doing / 2 may cancel each other out, and the: if (old_rx_rings != bp->hw_resc.resv_rx_rings && condition in __bnxt_reserve_rings() will be false. The RSS map won't get updated, and we'll crash with: BUG: kernel NULL pointer dereference, address: 0000000000000168 RIP: 0010:__bnxt_hwrm_vnic_set_rss+0x13a/0x1a0 bnxt_hwrm_vnic_rss_cfg_p5+0x47/0x180 __bnxt_setup_vnic_p5+0x58/0x110 bnxt_init_nic+0xb72/0xf50 __bnxt_open_nic+0x40d/0xab0 bnxt_open_nic+0x2b/0x60 ethtool_set_channels+0x18c/0x1d0 As we try to access a freed ring. The issue is present since XDP support was added, really, but prior to commit `98ba1d931f` ("bnxt_en: Fix RSS logic in __bnxt_reserve_rings()") it wasn't causing major issues. 
Fixes: `1054aee823` ("bnxt_en: Use NETIF_F_GRO_HW.") Fixes: `98ba1d931f` ("bnxt_en: Fix RSS logic in __bnxt_reserve_rings()") Reviewed-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Link: https://patch.msgid.link/20250109043057.2888953-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-01-10 18:01:29 -08:00
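The "cancel each other out" case described in f0aa6a37a3 comes down to simple arithmetic. The sketch below models it in plain C under stated assumptions: `effective_rx_rings()` and `rss_update_needed()` are hypothetical helpers, not driver functions, and the 2x factor is taken from the commit message's statement that the effective ring count with HW-GRO is twice what the user sees.

```c
#include <stdbool.h>

/*
 * Hypothetical model: with HW-GRO enabled, the "effective" number of
 * Rx rings is twice the user-visible count.
 */
static unsigned int effective_rx_rings(unsigned int user_rings, bool hw_gro)
{
	return hw_gro ? user_rings * 2 : user_rings;
}

/*
 * Mirrors the shape of the quoted check: the RSS table is only
 * reconfigured when the effective ring count has changed.
 */
static bool rss_update_needed(unsigned int old_resv, unsigned int new_resv)
{
	return old_resv != new_resv;
}
```

Going from 8 user rings with HW-GRO off (effective 8) to 4 user rings with HW-GRO re-enabled in the same reconfiguration (effective 8) leaves the count unchanged, so the check is false and the RSS map stays stale.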
Jakub Kicinski	14ea4cd1b1	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.13-rc7). Conflicts: `a42d71e322` ("net_sched: sch_cake: Add drop reasons") `737d4d91d3` ("sched: sch_cake: add bounds checks to host bulk flow fairness counts") Adjacent changes: drivers/net/ethernet/meta/fbnic/fbnic.h `3a856ab347` ("eth: fbnic: add IRQ reuse support") `95978931d5` ("eth: fbnic: Revert "eth: fbnic: Add hardware monitoring support via HWMON interface"") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-01-09 16:11:47 -08:00
Michael Chan	40452969a5	bnxt_en: Fix DIM shutdown DIM work will call the firmware to adjust the coalescing parameters on the RX rings. We should cancel DIM work before we call the firmware to free the RX rings. Otherwise, FW will reject the call from DIM work if the RX ring has been freed. This will generate an error message like this: bnxt_en 0000:21:00.1 ens2f1np1: hwrm req_type 0x53 seq id 0x6fca error 0x2 and cause unnecessary concern for the user. It is also possible to modify the coalescing parameters of the wrong ring if the ring has been re-allocated. To prevent this, cancel DIM work right before freeing the RX rings. We also have to add a check in NAPI poll to not schedule DIM if the RX rings are shutting down. Check that the VNIC is active before we schedule DIM. The VNIC is always disabled before we free the RX rings. Fixes: `0bc0b97fca` ("bnxt_en: cleanup DIM work on device shutdown") Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250104043849.3482067-3-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-01-06 16:40:26 -08:00
Kalesh AP	c8dafb0e43	bnxt_en: Fix possible memory leak when hwrm_req_replace fails When hwrm_req_replace() fails, the driver is not invoking bnxt_req_drop() which could cause a memory leak. Fixes: `bbf33d1d98` ("bnxt_en: update all firmware calls to use the new APIs") Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250104043849.3482067-2-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-01-06 16:40:26 -08:00
Jakub Kicinski	385f186aba	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.13-rc6). No conflicts. Adjacent changes: include/linux/if_vlan.h `f91a5b8089` ("af_packet: fix vlan_get_protocol_dgram() vs MSG_PEEK") `3f330db306` ("net: reformat kdoc return statements") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-01-03 16:29:29 -08:00
Vitalii Mordan	b255ef45fc	eth: bcmsysport: fix call balance of priv->clk handling routines Check the return value of clk_prepare_enable to ensure that priv->clk has been successfully enabled. If priv->clk was not enabled during bcm_sysport_probe, bcm_sysport_resume, or bcm_sysport_open, it must not be disabled in any subsequent execution paths. Fixes: `31bc72d976` ("net: systemport: fetch and use clock resources") Signed-off-by: Vitalii Mordan <mordan@ispras.ru> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20241227123007.2333397-1-mordan@ispras.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-12-30 17:33:46 -08:00
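The call-balance rule in b255ef45fc can be illustrated with a userspace mock. This is only a sketch: `struct fake_clk` and the `fake_*` functions are hypothetical stand-ins for the kernel clock API and the driver's open/stop paths, not the actual bcmsysport code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a clock with an enable refcount. */
struct fake_clk {
	int enable_count;
	bool fail;          /* force the enable call to fail */
};

static int fake_clk_prepare_enable(struct fake_clk *clk)
{
	if (clk->fail)
		return -1;
	clk->enable_count++;
	return 0;
}

static void fake_clk_disable_unprepare(struct fake_clk *clk)
{
	clk->enable_count--;
}

/* Open path: bail out early if the clock could not be enabled. */
static int fake_open(struct fake_clk *clk)
{
	int ret = fake_clk_prepare_enable(clk);

	if (ret)
		return ret;  /* never enabled, so nothing to disable later */
	return 0;
}

/* Stop path: only valid after a successful fake_open(). */
static void fake_stop(struct fake_clk *clk)
{
	fake_clk_disable_unprepare(clk);
}
```

Checking the return value keeps every disable paired with a successful enable, so the count never drops below zero.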
Michael Chan	bf2afe0f14	bnxt_en: Skip reading PXP registers during ethtool -d if unsupported Newer firmware does not allow reading the PXP registers during ethtool -d, so skip the firmware call in that case. Userspace (bnxt.c) always expects the register block to be populated so zeroes will be returned instead. Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20241217182620.2454075-6-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-12-19 17:30:00 -08:00
Michael Chan	b45a850585	bnxt_en: Skip MAC loopback selftest if it is unsupported by FW Call the new HWRM_PORT_MAC_QCAPS to check if mac loopback is supported. Skip the MAC loopback ethtool self test if it is not supported. Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/20241217182620.2454075-5-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-12-19 17:30:00 -08:00
Michael Chan	36d1e70a90	bnxt_en: Skip PHY loopback ethtool selftest if unsupported by FW Skip PHY loopback selftest if firmware advertises that it is unsupported in the HWRM_PORT_PHY_QCAPS call. Only show PHY loopback test result to be 0 if the test has run and passes. Do the same for external loopback to be consistent. Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20241217182620.2454075-4-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-12-19 17:30:00 -08:00
Michael Chan	fac5472fc8	bnxt_en: Do not allow ethtool -m on an untrusted VF Block all ethtool module operations on an untrusted VF. The firmware won't allow it and will return error. Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20241217182620.2454075-3-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-12-19 17:30:00 -08:00
Hongguang Gao	b1b66ae094	bnxt_en: Use FW defined resource limits for RoCE If FW supports setting resource limits for RoCE, then just use the FW limits instead of using some fixed values in the driver. These limits will be used to allocate context memory for QP, SRQ, AH, and MR resources for RoCE. Reviewed-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Hongguang Gao <hongguang.gao@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20241217182620.2454075-2-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-12-19 17:30:00 -08:00
Joe Hattori	0cb2c504d7	net: ethernet: bgmac-platform: fix an OF node reference leak The OF node obtained by of_parse_phandle() is not freed. Call of_node_put() to balance the refcount. This bug was found by an experimental static analysis tool that I am developing. Fixes: `1676aba5ef` ("net: ethernet: bgmac: device tree phy enablement") Signed-off-by: Joe Hattori <joe@pf.is.s.u-tokyo.ac.jp> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241214014912.2810315-1-joe@pf.is.s.u-tokyo.ac.jp Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-12-17 13:22:05 +01:00
Michael Chan	24c6843b73	bnxt_en: Fix aggregation ID mask to prevent oops on 5760X chips The 5760X (P7) chip's HW GRO/LRO interface is very similar to that of the previous generation (5750X or P5). However, the aggregation ID fields in the completion structures on P7 have been redefined from 16 bits to 12 bits. The freed up 4 bits are redefined for part of the metadata such as the VLAN ID. The aggregation ID mask was not modified when adding support for P7 chips. Including the extra 4 bits for the aggregation ID can potentially cause the driver to store or fetch the packet header of GRO/LRO packets in the wrong TPA buffer. It may hit the BUG() condition in __skb_pull() because the SKB contains no valid packet header: kernel BUG at include/linux/skbuff.h:2766! Oops: invalid opcode: 0000 1 PREEMPT SMP NOPTI CPU: 4 UID: 0 PID: 0 Comm: swapper/4 Kdump: loaded Tainted: G OE 6.12.0-rc2+ #7 Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE Hardware name: Dell Inc. PowerEdge R760/0VRV9X, BIOS 1.0.1 12/27/2022 RIP: 0010:eth_type_trans+0xda/0x140 Code: 80 00 00 00 eb c1 8b 47 70 2b 47 74 48 8b 97 d0 00 00 00 83 f8 01 7e 1b 48 85 d2 74 06 66 83 3a ff 74 09 b8 00 04 00 00 eb a5 <0f> 0b b8 00 01 00 00 eb 9c 48 85 ff 74 eb 31 f6 b9 02 00 00 00 48 RSP: 0018:ff615003803fcc28 EFLAGS: 00010283 RAX: 00000000000022d2 RBX: 0000000000000003 RCX: ff2e8c25da334040 RDX: 0000000000000040 RSI: ff2e8c25c1ce8000 RDI: ff2e8c25869f9000 RBP: ff2e8c258c31c000 R08: ff2e8c25da334000 R09: 0000000000000001 R10: ff2e8c25da3342c0 R11: ff2e8c25c1ce89c0 R12: ff2e8c258e0990b0 R13: ff2e8c25bb120000 R14: ff2e8c25c1ce89c0 R15: ff2e8c25869f9000 FS: 0000000000000000(0000) GS:ff2e8c34be300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055f05317e4c8 CR3: 000000108bac6006 CR4: 0000000000773ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <IRQ> ? die+0x33/0x90 ? 
do_trap+0xd9/0x100 ? eth_type_trans+0xda/0x140 ? do_error_trap+0x65/0x80 ? eth_type_trans+0xda/0x140 ? exc_invalid_op+0x4e/0x70 ? eth_type_trans+0xda/0x140 ? asm_exc_invalid_op+0x16/0x20 ? eth_type_trans+0xda/0x140 bnxt_tpa_end+0x10b/0x6b0 [bnxt_en] ? bnxt_tpa_start+0x195/0x320 [bnxt_en] bnxt_rx_pkt+0x902/0xd90 [bnxt_en] ? __bnxt_tx_int.constprop.0+0x89/0x300 [bnxt_en] ? kmem_cache_free+0x343/0x440 ? __bnxt_tx_int.constprop.0+0x24f/0x300 [bnxt_en] __bnxt_poll_work+0x193/0x370 [bnxt_en] bnxt_poll_p5+0x9a/0x300 [bnxt_en] ? try_to_wake_up+0x209/0x670 __napi_poll+0x29/0x1b0 Fix it by redefining the aggregation ID mask for P5_PLUS chips to be 12 bits. This will work because the maximum aggregation ID is less than 4096 on all P5_PLUS chips. Fixes: `13d2d3d381` ("bnxt_en: Add new P7 hardware interface definitions") Reviewed-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241209015448.1937766-1-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-12-10 18:25:38 -08:00
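The effect of the mask width in 24c6843b73 can be shown with a small sketch. The macro and function names are hypothetical, and placing the repurposed 4 bits in the upper part of the 16-bit field is an assumption made for illustration; the commit message only says the ID shrank from 16 to 12 bits.

```c
#include <stdint.h>

#define AGG_ID_MASK_16 0xffffu   /* old mask: whole 16-bit field */
#define AGG_ID_MASK_12 0x0fffu   /* new mask: low 12 bits only */

/* Extract the aggregation ID from a completion field using the given mask. */
static uint16_t agg_id(uint16_t field, uint16_t mask)
{
	return field & mask;
}
```

For a field value of 0x3005 (assumed metadata 0x3 in the top bits, aggregation ID 5), the 16-bit mask yields 0x3005, which would index far outside the TPA buffers, while the 12-bit mask yields 5.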
Hongguang Gao	fab4b4d2c9	bnxt_en: Fix potential crash when dumping FW log coredump If the FW log context memory is retained after FW reset, the existing code is not handling the condition correctly and zeroes out the data structures. This potentially will cause a division by zero crash when the user runs ethtool -w. The last_type is also not set correctly when the context memory is retained. This will cause errors because the last_type signals to the FW that all context memory types have been configured. Oops: divide error: 0000 1 PREEMPT SMP NOPTI CPU: 53 UID: 0 PID: 7019 Comm: ethtool Kdump: loaded Tainted: G OE 6.12.0-rc7+ #1 Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE Hardware name: Supermicro SYS-621C-TN12R/X13DDW-A, BIOS 1.4 08/10/2023 RIP: 0010:__bnxt_copy_ctx_mem.constprop.0.isra.0+0x86/0x160 [bnxt_en] Code: 0a 31 d2 4c 89 6c 24 10 45 8b a5 fc df ff ff 4c 8b 74 24 20 31 db 66 89 44 24 06 48 63 c5 c1 e5 09 4c 0f af e0 48 8b 44 24 30 <49> f7 f4 4c 89 64 24 08 48 63 c5 4d 89 ec 31 ed 48 89 44 24 18 49 RSP: 0018:ff480591603d78b8 EFLAGS: 00010206 RAX: 0000000000100000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ff23959e46740000 RDI: 0000000000000000 RBP: 0000000000000000 R08: 0000000000100000 R09: ff23959e46740000 R10: ff480591603d7a18 R11: 0000000000000010 R12: 0000000000000000 R13: ff23959e46742008 R14: 0000000000000000 R15: 0000000000000000 FS: 00007f04227c1740(0000) GS:ff2395adbf680000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f04225b33a5 CR3: 000000108b9a4001 CR4: 0000000000773ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> ? die+0x33/0x90 ? do_trap+0xd9/0x100 ? __bnxt_copy_ctx_mem.constprop.0.isra.0+0x86/0x160 [bnxt_en] ? do_error_trap+0x65/0x80 ? __bnxt_copy_ctx_mem.constprop.0.isra.0+0x86/0x160 [bnxt_en] ? exc_divide_error+0x36/0x50 ? 
__bnxt_copy_ctx_mem.constprop.0.isra.0+0x86/0x160 [bnxt_en] ? asm_exc_divide_error+0x16/0x20 ? __bnxt_copy_ctx_mem.constprop.0.isra.0+0x86/0x160 [bnxt_en] ? __bnxt_copy_ctx_mem.constprop.0.isra.0+0xda/0x160 [bnxt_en] bnxt_get_ctx_coredump.constprop.0+0x1ed/0x390 [bnxt_en] ? __memcg_slab_post_alloc_hook+0x21c/0x3c0 ? __bnxt_get_coredump+0x473/0x4b0 [bnxt_en] __bnxt_get_coredump+0x473/0x4b0 [bnxt_en] ? security_file_alloc+0x74/0xe0 ? cred_has_capability.isra.0+0x78/0x120 bnxt_get_coredump_length+0x4b/0xf0 [bnxt_en] bnxt_get_dump_flag+0x40/0x60 [bnxt_en] __dev_ethtool+0x17e4/0x1fc0 ? syscall_exit_to_user_mode+0xc/0x1d0 ? do_syscall_64+0x85/0x150 ? unmap_page_range+0x299/0x4b0 ? vma_interval_tree_remove+0x215/0x2c0 ? __kmalloc_cache_noprof+0x10a/0x300 dev_ethtool+0xa8/0x170 dev_ioctl+0x1b5/0x580 ? sk_ioctl+0x4a/0x110 sock_do_ioctl+0xab/0xf0 sock_ioctl+0x1ca/0x2e0 __x64_sys_ioctl+0x87/0xc0 do_syscall_64+0x79/0x150 Fixes: `24d694aec1` ("bnxt_en: Allocate backing store memory for FW trace logs") Signed-off-by: Hongguang Gao <hongguang.gao@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20241204215918.1692597-3-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-12-06 17:39:13 -08:00
Michael Chan	de37faf41a	bnxt_en: Fix GSO type for HW GRO packets on 5750X chips The existing code is using RSS profile to determine IPV4/IPV6 GSO type on all chips older than 5760X. This won't work on 5750X chips that may be using modified RSS profiles. This commit from 2018 has updated the driver to not use RSS profile for HW GRO packets on newer chips: `50f011b63d` ("bnxt_en: Update RSS setup and GRO-HW logic according to the latest spec.") However, a recent commit to add support for the newest 5760X chip broke the logic. If the GRO packet needs to be re-segmented by the stack, the wrong GSO type will cause the packet to be dropped. Fix it to only use RSS profile to determine GSO type on the oldest 5730X/5740X chips which cannot use the new method and is safe to use the RSS profiles. Also fix the L3/L4 hash type for RX packets by not using the RSS profile for the same reason. Use the ITYPE field in the RX completion to determine L3/L4 hash types correctly. Fixes: `a7445d6980` ("bnxt_en: Add support for new RX and TPA_START completion types for P7") Reviewed-by: Colin Winegarden <colin.winegarden@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20241204215918.1692597-2-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-12-06 17:39:13 -08:00
David Wei	bd649c5cc9	bnxt_en: handle tpa_info in queue API implementation Commit `7ed816be35` ("eth: bnxt: use page pool for head frags") added a page pool for header frags, which may be distinct from the existing pool for the aggregation ring. Prior to this change, frags used in the TPA ring rx_tpa were allocated from system memory e.g. napi_alloc_frag() meaning their lifetimes were not associated with a page pool. They can be returned at any time and so the queue API did not alloc or free rx_tpa. But now frags come from a separate head_pool which may be different to page_pool. Without allocating and freeing rx_tpa, frags allocated from the old head_pool may be returned to a different new head_pool which causes a mismatch between the pp hold/release count. Fix this problem by properly freeing and allocating rx_tpa in the queue API implementation. Signed-off-by: David Wei <dw@davidwei.uk> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20241204041022.56512-4-dw@davidwei.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-12-04 19:23:35 -08:00
David Wei	bf1782d70d	bnxt_en: refactor bnxt_alloc_rx_rings() to call bnxt_alloc_rx_agg_bmap() Refactor bnxt_alloc_rx_rings() to call bnxt_alloc_rx_agg_bmap() for allocating rx_agg_bmap. Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: David Wei <dw@davidwei.uk> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20241204041022.56512-3-dw@davidwei.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-12-04 19:23:35 -08:00
David Wei	5883a3e0ba	bnxt_en: refactor tpa_info alloc/free into helpers Refactor bnxt_rx_ring_info->tpa_info operations into helpers that work on a single tpa_info in prep for queue API using them. There are 2 pairs of operations: * bnxt_alloc_one_tpa_info() * bnxt_free_one_tpa_info() These alloc/free the tpa_info array itself. * bnxt_alloc_one_tpa_info_data() * bnxt_free_one_tpa_info_data() These alloc/free the frags stored in tpa_info array. Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: David Wei <dw@davidwei.uk> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20241204041022.56512-2-dw@davidwei.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-12-04 19:23:35 -08:00
Daniel Xu	be75cda92a	bnxt_en: ethtool: Supply ntuple rss context action Commit `2f4f9fe5bf` ("bnxt_en: Support adding ntuple rules on RSS contexts") added support for redirecting to an RSS context as an ntuple rule action. However, it forgot to update the ETHTOOL_GRXCLSRULE codepath. This caused `ethtool -n` to always report the action as "Action: Direct to queue 0" which is wrong. Fix by teaching bnxt driver to report the RSS context when applicable. Fixes: `2f4f9fe5bf` ("bnxt_en: Support adding ntuple rules on RSS contexts") Reviewed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Link: https://patch.msgid.link/2e884ae39e08dc5123be7c170a6089cefe6a78f7.1732748253.git.dxu@dxuuu.xyz Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-30 14:16:12 -08:00
Linus Torvalds	65ae975e97	Merge tag 'net-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from bluetooth. Current release - regressions: - rtnetlink: fix rtnl_dump_ifinfo() error path - bluetooth: remove the redundant sco_conn_put Previous releases - regressions: - netlink: fix false positive warning in extack during dumps - sched: sch_fq: don't follow the fast path if Tx is behind now - ipv6: delete temporary address if mngtmpaddr is removed or unmanaged - tcp: fix use-after-free of nreq in reqsk_timer_handler(). - bluetooth: fix slab-use-after-free Read in set_powered_sync - l2tp: fix warning in l2tp_exit_net found - eth: - bnxt_en: fix receive ring space parameters when XDP is active - lan78xx: fix double free issue with interrupt buffer allocation - tg3: set coherent DMA mask bits to 31 for BCM57766 chipsets Previous releases - always broken: - ipmr: fix tables suspicious RCU usage - iucv: MSG_PEEK causes memory leak in iucv_sock_destruct() - eth: - octeontx2-af: fix low network performance - stmmac: dwmac-socfpga: set RX watchdog interrupt as broken - rtase: correct the speed for RTL907XD-V1 Misc: - some documentation fixup" * tag 'net-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (49 commits) ipmr: fix build with clang and DEBUG_NET disabled. Documentation: tls_offload: fix typos and grammar Fix spelling mistake ipmr: fix tables suspicious RCU usage ip6mr: fix tables suspicious RCU usage ipmr: add debug check for mr table cleanup selftests: rds: move test.py to TEST_FILES net_sched: sch_fq: don't follow the fast path if Tx is behind now tcp: Fix use-after-free of nreq in reqsk_timer_handler(). 
net: phy: fix phy_ethtool_set_eee() incorrectly enabling LPI net: Comment copy_from_sockptr() explaining its behaviour rxrpc: Improve setsockopt() handling of malformed user input llc: Improve setsockopt() handling of malformed user input Bluetooth: SCO: remove the redundant sco_conn_put Bluetooth: MGMT: Fix possible deadlocks Bluetooth: MGMT: Fix slab-use-after-free Read in set_powered_sync bnxt_en: Unregister PTP during PCI shutdown and suspend bnxt_en: Refactor bnxt_ptp_init() bnxt_en: Fix receive ring space parameters when XDP is active bnxt_en: Fix queue start to update vnic RSS table ...	2024-11-28 10:15:20 -08:00
Michael Chan	3661c05c54	bnxt_en: Unregister PTP during PCI shutdown and suspend If we go through the PCI shutdown or suspend path, we shut down the NIC but PTP remains registered. If the kernel continues to run for a little bit, the periodic PTP .do_aux_work() function may be called and it will read the PHC from the BAR register. Since the device has already been disabled, it will cause a PCIe completion timeout. Fix it by calling bnxt_ptp_clear() in the PCI shutdown/suspend handlers. bnxt_ptp_clear() will unregister from PTP and .do_aux_work() will be canceled. In bnxt_resume(), we need to re-initialize PTP. Fixes: `a521c8a01d` ("bnxt_en: Move bnxt_ptp_init() from bnxt_open() back to bnxt_init_one()") Cc: Richard Cochran <richardcochran@gmail.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-11-26 15:29:31 +01:00
Michael Chan	1e9614cd95	bnxt_en: Refactor bnxt_ptp_init() Instead of passing the 2nd parameter phc_cfg to bnxt_ptp_init(), store it in bp->ptp_cfg so that the caller doesn't need to know what the value should be. In the next patch, we'll need to call bnxt_ptp_init() in bnxt_resume() and this will make it easier. Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-11-26 15:29:31 +01:00
Shravya KN	3051a77a09	bnxt_en: Fix receive ring space parameters when XDP is active The MTU setting at the time an XDP multi-buffer is attached determines whether the aggregation ring will be used and the rx_skb_func handler. This is done in bnxt_set_rx_skb_mode(). If the MTU is later changed, the aggregation ring setting may need to be changed and it may become out-of-sync with the settings initially done in bnxt_set_rx_skb_mode(). This may result in random memory corruption and crashes as the HW may DMA data larger than the allocated buffer size, such as: BUG: kernel NULL pointer dereference, address: 00000000000003c0 PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 17 PID: 0 Comm: swapper/17 Kdump: loaded Tainted: G S OE 6.1.0-226bf9805506 #1 Hardware name: Wiwynn Delta Lake PVT BZA.02601.0150/Delta Lake-Class1, BIOS F0E_3A12 08/26/2021 RIP: 0010:bnxt_rx_pkt+0xe97/0x1ae0 [bnxt_en] Code: 8b 95 70 ff ff ff 4c 8b 9d 48 ff ff ff 66 41 89 87 b4 00 00 00 e9 0b f7 ff ff 0f b7 43 0a 49 8b 95 a8 04 00 00 25 ff 0f 00 00 <0f> b7 14 42 48 c1 e2 06 49 03 95 a0 04 00 00 0f b6 42 33f RSP: 0018:ffffa19f40cc0d18 EFLAGS: 00010202 RAX: 00000000000001e0 RBX: ffff8e2c805c6100 RCX: 00000000000007ff RDX: 0000000000000000 RSI: ffff8e2c271ab990 RDI: ffff8e2c84f12380 RBP: ffffa19f40cc0e48 R08: 000000000001000d R09: 974ea2fcddfa4cbf R10: 0000000000000000 R11: ffffa19f40cc0ff8 R12: ffff8e2c94b58980 R13: ffff8e2c952d6600 R14: 0000000000000016 R15: ffff8e2c271ab990 FS: 0000000000000000(0000) GS:ffff8e3b3f840000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000003c0 CR3: 0000000e8580a004 CR4: 00000000007706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <IRQ> __bnxt_poll_work+0x1c2/0x3e0 [bnxt_en] To address the issue, we now call bnxt_set_rx_skb_mode() within bnxt_change_mtu() to properly set the AGG rings configuration and update 
rx_skb_func based on the new MTU value. Additionally, BNXT_FLAG_NO_AGG_RINGS is cleared at the beginning of bnxt_set_rx_skb_mode() to make sure it gets set or cleared based on the current MTU. Fixes: `08450ea98a` ("bnxt_en: Fix max_mtu setting for multi-buf XDP") Co-developed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Shravya KN <shravya.k-n@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-11-26 15:29:31 +01:00
Somnath Kotur	5ac066b7b0	bnxt_en: Fix queue start to update vnic RSS table HWRM_RING_FREE followed by a HWRM_RING_ALLOC is not guaranteed to have the same FW ring ID as before. So we must reinitialize the RSS table with the correct ring IDs. Otherwise, traffic may not resume properly if the restarted ring ID is stale. Since this feature is only supported on P5_PLUS chips, we call bnxt_vnic_set_rss_p5() to update the HW RSS table. Fixes: `2d694c27d3` ("bnxt_en: implement netdev_queue_mgmt_ops") Cc: David Wei <dw@davidwei.uk> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-11-26 15:29:31 +01:00
Shravya KN	5007991670	bnxt_en: Set backplane link modes correctly for ethtool Use the return value from bnxt_get_media() to determine the port and link modes. bnxt_get_media() returns the proper BNXT_MEDIA_KR when the PHY is backplane. This will correct the ethtool settings for backplane devices. Fixes: `5d4e1bf606` ("bnxt_en: extend media types to supported and autoneg modes") Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Shravya KN <shravya.k-n@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-11-26 15:29:31 +01:00
Saravanan Vajravel	5311598f7f	bnxt_en: Reserve rings after PCIe AER recovery if NIC interface is down After successful PCIe AER recovery, FW will reset all resource reservations. If it is IF_UP, the driver will call bnxt_open() and all resources will be reserved again. If it is IF_DOWN, we should call bnxt_reserve_rings() so that we can reserve resources including RoCE resources to allow RoCE to resume after AER. Without this patch, RoCE fails to resume in this IF_DOWN scenario. Later, if it becomes IF_UP, bnxt_open() will see that resources have been reserved and will not reserve again. Fixes: `fb1e6e562b` ("bnxt_en: Fix AER recovery.") Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-11-26 15:29:31 +01:00
Pavan Chebbi	614f4d166e	tg3: Set coherent DMA mask bits to 31 for BCM57766 chipsets The hardware on Broadcom 1G chipsets has a known limitation where it cannot handle DMA addresses that cross over 4GB. When such an address is encountered, the hardware sets the address overflow error bit in the DMA status register and triggers a reset. However, BCM57766 hardware is setting the overflow bit and triggering a reset in some cases when there is no actual underlying address overflow. The hardware team analyzed the issue and concluded that it is happening when the status block update has an address with higher (b16 to b31) bits as 0xffff following a previous update that had lowest bits as 0xffff. To work around this bug in the BCM57766 hardware, set the coherent DMA mask from the current 64b to 31b. This will ensure that upper bits of the status block DMA address are always at most 0x7fff, thus avoiding the improper overflow check described above. This workaround is intended only for status block and ring memories and has no effect on TX and RX buffers as they do not require coherent memory. Fixes: `72f2afb8a6` ("[TG3]: Add DMA address workaround") Reported-by: Salam Noureddine <noureddine@arista.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Link: https://patch.msgid.link/20241119055741.147144-1-pavan.chebbi@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-24 16:45:48 -08:00
Linus Torvalds	2a163a4cea	RDMA v6.13 merge window pull request Several fixes scattered across the drivers and a few new features: - Minor updates and bug fixes to hfi1, efa, iopob, bnxt, hns - Force disassociate the userspace FD when hns does an async reset - bnxt new features for optimized modify QP to skip certain states, CQ coalescing, better debug dumping - mlx5 new data placement ordering feature - Faster destruction of mlx5 devx HW objects - Improvements to RDMA CM mad handling Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull rdma updates from Jason Gunthorpe: "Several fixes scattered across the drivers and a few new features: - Minor updates and bug fixes to hfi1, efa, iopob, bnxt, hns - Force disassociate the userspace FD when hns does an async reset - bnxt new features for optimized modify QP to skip certain states, CQ coalescing, better debug dumping - mlx5 new data placement ordering feature - Faster destruction of mlx5 devx HW objects - Improvements to RDMA CM mad handling" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (51 commits) RDMA/bnxt_re: Correct the sequence of device suspend RDMA/bnxt_re: Use the default mode of congestion control RDMA/bnxt_re: Support different traffic class IB/cm: Rework sending DREQ when destroying a cm_id IB/cm: Do not hold reference on cm_id unless needed IB/cm: Explicitly mark if a response MAD is a retransmission RDMA/mlx5: Move events notifier registration to be after device registration RDMA/bnxt_re: Cache MSIx info to a local structure RDMA/bnxt_re: Refurbish CQ to NQ hash calculation RDMA/bnxt_re: Refactor NQ allocation RDMA/bnxt_re: Fail probe early when not enough MSI-x vectors are reserved RDMA/hns: Fix different dgids
mapping to the same dip_idx RDMA/bnxt_re: Add set_func_resources support for P5/P7 adapters RDMA/bnxt_re: Enhance RoCE SRIOV resource configuration design bnxt_en: Add support for RoCE sriov configuration RDMA/hns: Fix NULL pointer derefernce in hns_roce_map_mr_sg() RDMA/hns: Fix out-of-order issue of requester when setting FENCE RDMA/nldev: Add IB device and net device rename events RDMA/mlx5: Add implementation for ufile_hw_cleanup device operation RDMA/core: Move ib_uverbs_file struct to uverbs_types.h ...	2024-11-22 20:03:57 -08:00
Shruti Parab	3c2179e663	bnxt_en: Add FW trace coredump segments to the coredump The FW trace coredump segments are very similar to the context memory segments in the previous patch. The main difference is to call HWRM_DBG_LOG_BUFFER_FLUSH to flush the FW data to host memory and to include an additional record in the coredump that contains the head and tail information of the trace data. Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com> Signed-off-by: Shruti Parab <shruti.parab@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20241115151438.550106-12-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-18 19:48:55 -08:00
Michael Chan	bda2e63a50	bnxt_en: Add a new ethtool -W dump flag Add a new ethtool -W dump flag (2) to include driver coredump segments. This patch adds the host backing store context memory pages used by the chip and FW to store various states to the coredump. The pages for each context memory type are dumped into a separate coredump segment. Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Selvin Thyparampil Xavier <selvin.xavier@broadcom.com> Reviewed-by: Shruti Parab <shruti.parab@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com> Reviewed-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20241115151438.550106-11-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-18 19:48:55 -08:00
Shruti Parab	a854a17097	bnxt_en: Add 2 parameters to bnxt_fill_coredump_seg_hdr() Pass the component ID and segment ID to this function to create the coredump segment header. This will be needed in the next patches to create more segments for the coredump. Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com> Signed-off-by: Shruti Parab <shruti.parab@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20241115151438.550106-10-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-18 19:48:55 -08:00
Sreekanth Reddy	23a18b91b6	bnxt_en: Add functions to copy host context memory Host context memory is used by the newer chips to store context information for various L2 and RoCE states and FW logs. This information will be useful for debugging. This patch adds the functions to copy all pages of a context memory type to a contiguous buffer. The next patches will include the context memory dump during ethtool -w coredump. Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com> Co-developed-by: Shruti Parab <shruti.parab@broadcom.com> Signed-off-by: Shruti Parab <shruti.parab@broadcom.com> Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20241115151438.550106-9-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-18 19:48:55 -08:00
Hongguang Gao	de999362ad	bnxt_en: Do not free FW log context memory If FW supports appending new FW logs to an offset in the context memory after FW reset, then do not free this type of context memory during reset. The driver will provide the initial offset to the FW when configuring this type of context memory. This way, we don't lose the older FW logs after reset. Signed-off-by: Hongguang Gao <hongguang.gao@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20241115151438.550106-8-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-18 19:48:54 -08:00
Shruti Parab	84fcd9449f	bnxt_en: Manage the FW trace context memory The FW trace memory pages will be added to the ethtool -w coredump in later patches. In addition to the raw data, the driver has to add a header to provide the head and tail information on each FW trace log segment when creating the coredump. The FW sends an async message to the driver after DMAing a chunk of logs to the context memory to indicate the last offset containing the tail of the logs. The driver needs to keep track of that. Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com> Signed-off-by: Shruti Parab <shruti.parab@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20241115151438.550106-7-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-18 19:48:54 -08:00
Shruti Parab	24d694aec1	bnxt_en: Allocate backing store memory for FW trace logs Allocate the new FW trace log backing store context memory types if they are supported by the FW. FW debug logs are DMA'ed to the host backing store memory when the on-chip buffers are full. If host memory cannot be allocated for these memory types, the driver will not abort. Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com> Signed-off-by: Shruti Parab <shruti.parab@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20241115151438.550106-6-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-18 19:48:54 -08:00
Hongguang Gao	46010d43ab	bnxt_en: Add a 'force' parameter to bnxt_free_ctx_mem() If 'force' is false, it will keep the memory pages and all data structures for the context memory type if the memory is valid. This patch always passes true for the 'force' parameter so there is no change in behavior. Later patches will adjust the 'force' parameter for the FW log context memory types so that the logs will not be reset after FW reset. Signed-off-by: Hongguang Gao <hongguang.gao@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20241115151438.550106-5-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-18 19:48:54 -08:00
Hongguang Gao	968d2cc07c	bnxt_en: Refactor bnxt_free_ctx_mem() Add a new function bnxt_free_one_ctx_mem() to free one context memory type. bnxt_free_ctx_mem() now calls the new function in the loop to free each context memory type. There is no change in behavior. Later patches will further make use of the new function. Signed-off-by: Hongguang Gao <hongguang.gao@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20241115151438.550106-4-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-18 19:48:54 -08:00
Shruti Parab	0b350b4927	bnxt_en: Add mem_valid bit to struct bnxt_ctx_mem_type Add a new bit to struct bnxt_ctx_mem_type to indicate that host memory has been successfully allocated for this context memory type. In the next patches, we'll be adding some additional context memory types for FW debugging/logging. If memory cannot be allocated for any of these new types, we will not abort and the cleared mem_valid bit will indicate to skip configuring the memory type. Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com> Signed-off-by: Shruti Parab <shruti.parab@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20241115151438.550106-3-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-18 19:48:54 -08:00
Michael Chan	ff00bcc9ec	bnxt_en: Update firmware interface spec to 1.10.3.85 The major change is the new firmware command to flush the FW debug logs to the host backing store context memory buffers. Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20241115151438.550106-2-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-18 19:48:54 -08:00
Kees Cook	1cfb5e5788	Revert "net: ethtool: Avoid thousands of -Wflex-array-member-not-at-end warnings" This reverts commit `3bd9b9abdf`. We cannot use the new tagged struct group because it throws C++ errors even under "extern C". Signed-off-by: Kees Cook <kees@kernel.org> Link: https://patch.msgid.link/20241115204308.3821419-1-kees@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-18 18:52:11 -08:00
Vadim Fedorenko	c7a21af711	bnxt_en: optimize gettimex64 The current implementation of gettimex64() makes at least 3 PCIe reads to get the current PHC time. It takes at least 2.2us to get this value back to userspace. At the same time, there is a cached value of the upper bits of the PHC already available for packet timestamps. This patch reuses the cached value to speed up reading of the PHC time. Signed-off-by: Vadim Fedorenko <vadfed@meta.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20241114114820.1411660-1-vadfed@meta.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-15 14:26:05 -08:00
Jakub Kicinski	7ed816be35	eth: bnxt: use page pool for head frags Testing small size RPCs (300B-400B) on a large AMD system suggests that page pool recycling is very useful even for just the head frags. With this patch (and copy break disabled) I see a 30% performance improvement (82Gbps -> 106Gbps). Convert bnxt from normal page frags to page pool frags for head buffers. On systems with small page size we can use the same pool as for TPA pages. On systems with large pages the frag allocation logic of the page pool is already used to split a large page into TPA chunks. TPA chunks are much larger than heads (8k or 64k, AFAICT vs 1kB) and we always allocate the same sized chunks. Mixing allocation of TPA and head pages would lead to sub-optimal memory use. Plus Taehee's work on zero-copy / devmem will need to differentiate between TPA and non-TPA page pool, anyway. Conditionally allocate a new page pool for heads. Link: https://patch.msgid.link/20241109035119.3391864-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-12 18:26:38 -08:00
Bhargava Chenna Marreddy	304cc83807	RDMA/bnxt_re: Enhance RoCE SRIOV resource configuration design Refine RoCE SRIOV resource configuration design, using the INITIALIZE_FW's flag as an indication for the new design to the firmware. RoCE driver does not have to provision resources to VF when firmware advertises support for RoCE resource management by NIC driver. Signed-off-by: Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com> Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com> CC: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1730882676-24434-3-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-11-12 03:04:04 -05:00
Vikas Gupta	53371c5c21	bnxt_en: Add support for RoCE sriov configuration During driver load, PF RDMA driver provisions resources to the RDMA VFs. This logic takes into consideration the total number of VFs supported on the PF while allocating resources. Firmware now advertises a capability where NIC driver can allocate resources for RDMA VFs when the user actually creates a VF. So this resource distribution can be based on the number of active VFs. This patch adds the support to check for the firmware capability and follow the new RDMA VF resource allocation strategy. The current logic in the RDMA driver will be removed for the newer Firmware versions in a subsequent patch in this series. Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1730882676-24434-2-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-11-12 03:04:04 -05:00
Vadim Fedorenko	f0fe51a043	bnxt_en: add unlocked version of bnxt_refclk_read Serialization of PHC read with FW reset mechanism uses ptp_lock which also protects timecounter updates. This means we cannot grab it when called from bnxt_cc_read(). Let's move locking into different function. Fixes: `6c0828d00f` ("bnxt_en: replace PTP spinlock with seqlock") Signed-off-by: Vadim Fedorenko <vadfed@meta.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://patch.msgid.link/20241107214917.2980976-1-vadfed@meta.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-11 17:32:29 -08:00
Mohammad Heib	fcf42409c6	bnxt_en: use irq_update_affinity_hint() irq_set_affinity_hint() is deprecated. Use irq_update_affinity_hint() instead. This removes the side-effect of actually applying the affinity. The driver does not really need to worry about spreading its IRQs across CPUs. The core code already takes care of that. When the driver applies the affinities by itself, it breaks the users' expectations: 1. The user configures irqbalance with IRQBALANCE_BANNED_CPULIST in order to prevent IRQs from being moved to certain CPUs that run a real-time workload. 2. bnxt_en device reopening resets the affinity in bnxt_open(). 3. bnxt_en has no idea about irqbalance's config, so it may move an IRQ to a banned CPU. The real-time workload suffers unacceptable latency. Signed-off-by: Mohammad Heib <mheib@redhat.com> Reviewed-by: Andy Gospodarek <gospo@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Link: https://patch.msgid.link/20241106180811.385175-1-mheib@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-11 15:31:03 -08:00
Rosen Penev	fda960354e	net: broadcom: use ethtool string helpers The latter is the preferred way to copy ethtool strings. Avoids manually incrementing the pointer. Cleans up the code quite well. Signed-off-by: Rosen Penev <rosenp@gmail.com> Tested-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20241104205317.306140-1-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-06 17:51:02 -08:00
Rosen Penev	4069dcb7da	net: bnx2x: use ethtool string helpers The latter is the preferred way to copy ethtool strings. Avoids manually incrementing the pointer. Cleans up the code quite well. Signed-off-by: Rosen Penev <rosenp@gmail.com> Link: https://patch.msgid.link/20241104202326.78418-1-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-06 17:50:37 -08:00
Daniel Xu	5f143efd38	bnxt_en: ethtool: Support unset l4proto on ip4/ip6 ntuple rules Previously, trying to insert an ip4/ip6 ntuple rule with an unset l4proto would get rejected with -EOPNOTSUPP. For example, the following would fail: ethtool -N eth0 flow-type ip6 dst-ip $IP6 context 1 The reason was that all the l4proto validation was being run despite the l4proto mask being set to 0x0. Fix by respecting the mask on l4proto and treating a mask of 0x0 as wildcard l4proto. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/1ac93a2836b25f79e7045f8874d9a17875229ffc.1730778566.git.dxu@dxuuu.xyz Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-06 17:39:59 -08:00
Daniel Xu	050eb2cebb	bnxt_en: ethtool: Remove ip4/ip6 ntuple support for IPPROTO_RAW Commit `9ba0e56199` ("bnxt_en: Enhance ethtool ntuple support for ip flows besides TCP/UDP") added support for ip4/ip6 ntuple rules. However, if you wanted to wildcard over l4proto, you had to provide 0xFF. The choice of 0xFF is non-standard and non-intuitive. Delete support for it in this commit. Next commit we will introduce a cleaner way to wildcard l4proto. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/a5ba0d3bd926d27977c317efa7fdfbc8a704d2b8.1730778566.git.dxu@dxuuu.xyz Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-06 17:39:59 -08:00
Vadim Fedorenko	6c0828d00f	bnxt_en: replace PTP spinlock with seqlock We can see high contention on ptp_lock while doing RX timestamping at high packet rates over several queues. A spinlock is not an efficient way to protect the timecounter for RX timestamps when reads are the most common operations and writes are only occasional. It's better to use a seqlock in such cases. Reviewed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Vadim Fedorenko <vadfed@meta.com> Link: https://patch.msgid.link/20241103215108.557531-2-vadfed@meta.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-05 17:33:26 -08:00
Vadim Fedorenko	bb2ef9b92b	bnxt_en: cache only 24 bits of hw counter This hardware can provide only 48 bits of cycle counter. We can leave only 24 bits in the cache to extend RX timestamps from 32 bits to 48 bits. Lower 8 bits of the cached value will be used to check for roll-over while extending to full 48 bits. This change makes cache writes atomic even on 32 bit platforms and we can simply use READ_ONCE()/WRITE_ONCE() pair and remove spinlock. The configuration structure will be also reduced by 4 bytes. Reviewed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Vadim Fedorenko <vadfed@meta.com> Link: https://patch.msgid.link/20241103215108.557531-1-vadfed@meta.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-05 17:33:26 -08:00
Caleb Sander Mateos	61bf0009a7	dim: pass dim_sample to net_dim() by reference net_dim() is currently passed a struct dim_sample argument by value. struct dim_sample is 24 bytes. Since this is greater than 16 bytes, x86-64 passes it on the stack. All callers have already initialized dim_sample on the stack, so passing it by value requires pushing a duplicated copy to the stack. Either writing to the stack and immediately reading it, or perhaps dereferencing addresses relative to the stack pointer in a chain of push instructions, seems to perform quite poorly. In a heavy TCP workload, mlx5e_handle_rx_dim() consumes 3% of CPU time, 94% of which is attributed to the first push instruction to copy dim_sample on the stack for the call to net_dim(): // Call ktime_get() 0.26 \|4ead2: call 4ead7 <mlx5e_handle_rx_dim+0x47> // Pass the address of struct dim in %rdi \|4ead7: lea 0x3d0(%rbx),%rdi // Set dim_sample.pkt_ctr \|4eade: mov %r13d,0x8(%rsp) // Set dim_sample.byte_ctr \|4eae3: mov %r12d,0xc(%rsp) // Set dim_sample.event_ctr 0.15 \|4eae8: mov %bp,0x10(%rsp) // Duplicate dim_sample on the stack 94.16 \|4eaed: push 0x10(%rsp) 2.79 \|4eaf1: push 0x10(%rsp) 0.07 \|4eaf5: push %rax // Call net_dim() 0.21 \|4eaf6: call 4eafb <mlx5e_handle_rx_dim+0x6b> To allow the caller to reuse the struct dim_sample already on the stack, pass the struct dim_sample by reference to net_dim(). Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Arthur Kiyanovski <akiyano@amazon.com> Reviewed-by: Louis Peens <louis.peens@corigine.com> Link: https://patch.msgid.link/20241031002326.3426181-2-csander@purestorage.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-03 12:36:54 -08:00
Rosen Penev	9b4b2e02c1	net: bnxt: use ethtool string helpers Avoids having to use manual pointer manipulation. Signed-off-by: Rosen Penev <rosenp@gmail.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20241029233229.9385-1-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-03 11:09:25 -08:00
Gustavo A. R. Silva	3bd9b9abdf	net: ethtool: Avoid thousands of -Wflex-array-member-not-at-end warnings -Wflex-array-member-not-at-end was introduced in GCC-14, and we are getting ready to enable it, globally. Change the type of the middle struct member currently causing trouble from `struct ethtool_link_settings` to `struct ethtool_link_settings_hdr`. Additionally, update the type of some variables in various functions that don't access the flexible-array member, changing them to the newly created `struct ethtool_link_settings_hdr`. These changes are needed because the type of the conflicting middle members changed. So, those instances that expect the type to be `struct ethtool_link_settings` should be adjusted to the newly created type `struct ethtool_link_settings_hdr`. Also, adjust variable declarations to follow the reverse xmas tree convention. Fix 3338 of the following -Wflex-array-member-not-at-end warnings: include/linux/ethtool.h:214:38: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://patch.msgid.link/0bc2809fe2a6c11dd4c8a9a10d9bd65cccdb559b.1730238285.git.gustavoars@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-03 11:06:58 -08:00
Florian Fainelli	e69fbd287d	net: systemport: Move IO macros to header file Move the BCM_SYSPORT_IO_MACRO() definition and its use to bcmsysport.h where it is more appropriate and where static inline helpers are acceptable. While at it, make sure that the macro 'offset' argument does not trigger a checkpatch warning due to possible argument re-use. Suggested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241021174935.57658-3-florian.fainelli@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-28 15:54:37 -07:00
Florian Fainelli	890bde75a2	net: systemport: Remove unused txchk accessors Vladimir reported the following warning with clang-16 and W=1: warning: unused function 'txchk_readl' [-Wunused-function] BCM_SYSPORT_IO_MACRO(txchk, SYS_PORT_TXCHK_OFFSET); note: expanded from macro 'BCM_SYSPORT_IO_MACRO' warning: unused function 'txchk_writel' [-Wunused-function] note: expanded from macro 'BCM_SYSPORT_IO_MACRO' warning: unused function 'tbuf_readl' [-Wunused-function] BCM_SYSPORT_IO_MACRO(tbuf, SYS_PORT_TBUF_OFFSET); note: expanded from macro 'BCM_SYSPORT_IO_MACRO' warning: unused function 'tbuf_writel' [-Wunused-function] note: expanded from macro 'BCM_SYSPORT_IO_MACRO' The TXCHK and RBUF blocks are not being accessed, remove the IO macros used to access those blocks. No functional impact. Reported-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241021174935.57658-2-florian.fainelli@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-28 15:54:37 -07:00
Paolo Abeni	03fc07a247	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR. No conflicts and no adjacent changes. Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-10-25 09:08:22 +02:00
Paolo Abeni	91afa49a3e	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.12-rc4). Conflicts: `107a034d5c` ("net/mlx5: qos: Store rate groups in a qos domain") `1da9cfd6c4` ("net/mlx5: Unregister notifier on eswitch init failure") Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-10-21 09:14:18 +02:00
WangYuli	9e2ffec543	eth: Fix typo 'accelaration'. 'exprienced' and 'rewritting' There are some spelling mistakes of 'accelaration', 'exprienced' and 'rewritting' in comments which should be 'acceleration', 'experienced' and 'rewriting'. Suggested-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/all/20241017162846.GA51712@kernel.org/ Signed-off-by: WangYuli <wangyuli@uniontech.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Message-ID: <90D42CB167CA0842+20241018021910.31359-1-wangyuli@uniontech.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch>	2024-10-20 11:06:48 -05:00
Vadim Fedorenko	4ab3e4983b	bnxt_en: replace ptp_lock with irqsave variant In netpoll configuration the completion processing can happen in hard irq context which will break with spin_lock_bh() for fulfilling RX timestamp in case of all packets timestamping. Replace it with spin_lock_irqsave() variant. Fixes: `7f5515d19c` ("bnxt_en: Get the RX packet timestamp") Reviewed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Vadim Fedorenko <vadfed@meta.com> Message-ID: <20241016195234.2622004-1-vadfed@meta.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch>	2024-10-19 16:16:25 -05:00
Andy Shevchenko	abb7c98b99	tg3: Increase buffer size for IRQ label GCC is not happy with the current code, e.g.: .../tg3.c:11313:37: error: ‘-txrx-’ directive output may be truncated writing 6 bytes into a region of size between 1 and 16 [-Werror=format-truncation=] 11313 \| "%s-txrx-%d", tp->dev->name, irq_num); \| ^~~~~~ .../tg3.c:11313:34: note: using the range [-2147483648, 2147483647] for directive argument 11313 \| "%s-txrx-%d", tp->dev->name, irq_num); When `make W=1` is supplied, this prevents kernel building. Fix it by increasing the buffer size for IRQ label and use sizeof() instead of hard coded constants. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Message-ID: <20241016090647.691022-1-andriy.shevchenko@linux.intel.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch>	2024-10-17 21:24:24 -05:00
Wang Hai	fed07d3eb8	net: bcmasp: fix potential memory leak in bcmasp_xmit() The bcmasp_xmit() returns NETDEV_TX_OK without freeing skb in case of mapping fails, add dev_kfree_skb() to fix it. Fixes: `490cb41200` ("net: bcmasp: Add support for ASP2.0 Ethernet controller") Signed-off-by: Wang Hai <wanghai38@huawei.com> Acked-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20241014145901.48940-1-wanghai38@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-15 17:10:27 -07:00
Wang Hai	c401ed1c70	net: systemport: fix potential memory leak in bcm_sysport_xmit() The bcm_sysport_xmit() returns NETDEV_TX_OK without freeing skb in case of dma_map_single() fails, add dev_kfree_skb() to fix it. Fixes: `80105befdb` ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver") Signed-off-by: Wang Hai <wanghai38@huawei.com> Link: https://patch.msgid.link/20241014145115.44977-1-wanghai38@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-15 12:53:52 -07:00
Joe Damato	4193652274	bnxt: Add support for persistent NAPI config Use netif_napi_add_config to assign persistent per-NAPI config when initializing NAPIs. Signed-off-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20241011184527.16393-8-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-14 17:54:29 -07:00
Simon Horman	76d37e4fd6	tg3: Address byte-order miss-matches Address byte-order miss-matches flagged by Sparse. In tg3_load_firmware_cpu() and tg3_get_device_address() this is done using appropriate types to store big endian values. In the cases of tg3_test_nvram(), where buf is an array which contains values of several different types, cast to __le32 before converting values to host byte order. Reported by Sparse as: .../tg3.c:3745:34: warning: cast to restricted __be32 .../tg3.c:13096:21: warning: cast to restricted __le32 .../tg3.c:13096:21: warning: cast from restricted __be32 .../tg3.c:13101:21: warning: cast to restricted __le32 .../tg3.c:13101:21: warning: cast from restricted __be32 .../tg3.c:17070:63: warning: incorrect type in argument 3 (different base types) .../tg3.c:17070:63: expected restricted __be32 [usertype] val .../tg3.c:17070:63: got unsigned int dr.../tg3.c:17071:63: warning: incorrect type in argument 3 (different base types) .../tg3.c:17071:63: expected restricted __be32 [usertype] val .../tg3.c:17071:63: got unsigned int Also, address white-space issues on lines modified for the above. And, for consistency, lines adjacent to them. Compile tested only. No functional change intended. Signed-off-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241009-tg3-sparse-v1-1-6af38a7bf4ff@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-14 17:27:10 -07:00
Justin Chen	c531f2269a	net: bcmasp: enable SW timestamping Add skb_tx_timestamp() call and enable support for SW timestamping. Signed-off-by: Justin Chen <justin.chen@broadcom.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20241010221506.802730-1-justin.chen@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-11 16:03:31 -07:00
Justin Chen	ea22f8eabb	net: broadcom: remove select MII from brcmstb Ethernet drivers The MII driver isn't used by brcmstb Ethernet drivers. Remove it from the BCMASP, GENET, and SYSTEMPORT drivers. Signed-off-by: Justin Chen <justin.chen@broadcom.com> Acked-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20241010191332.1074642-1-justin.chen@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-11 16:00:17 -07:00
Joe Damato	aec5514d73	tg3: Link queues to NAPIs Link queues to NAPIs using the netdev-genl API so this information is queryable. First, test with the default setting on my tg3 NIC at boot with 1 TX queue: $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ --dump queue-get --json='{"ifindex": 2}' [{'id': 0, 'ifindex': 2, 'napi-id': 8194, 'type': 'rx'}, {'id': 1, 'ifindex': 2, 'napi-id': 8195, 'type': 'rx'}, {'id': 2, 'ifindex': 2, 'napi-id': 8196, 'type': 'rx'}, {'id': 3, 'ifindex': 2, 'napi-id': 8197, 'type': 'rx'}, {'id': 0, 'ifindex': 2, 'napi-id': 8193, 'type': 'tx'}] Now, adjust the number of TX queues to be 4 via ethtool: $ sudo ethtool -L eth0 tx 4 $ sudo ethtool -l eth0 \| tail -5 Current hardware settings: RX: 4 TX: 4 Other: n/a Combined: n/a Despite "Combined: n/a" in the ethtool output, /proc/interrupts shows the tg3 has renamed the IRQs to be combined: 343: [...] eth0-0 344: [...] eth0-txrx-1 345: [...] eth0-txrx-2 346: [...] eth0-txrx-3 347: [...] eth0-txrx-4 Now query this via netlink to ensure the queues are linked properly to their NAPIs: $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ --dump queue-get --json='{"ifindex": 2}' [{'id': 0, 'ifindex': 2, 'napi-id': 8960, 'type': 'rx'}, {'id': 1, 'ifindex': 2, 'napi-id': 8961, 'type': 'rx'}, {'id': 2, 'ifindex': 2, 'napi-id': 8962, 'type': 'rx'}, {'id': 3, 'ifindex': 2, 'napi-id': 8963, 'type': 'rx'}, {'id': 0, 'ifindex': 2, 'napi-id': 8960, 'type': 'tx'}, {'id': 1, 'ifindex': 2, 'napi-id': 8961, 'type': 'tx'}, {'id': 2, 'ifindex': 2, 'napi-id': 8962, 'type': 'tx'}, {'id': 3, 'ifindex': 2, 'napi-id': 8963, 'type': 'tx'}] As you can see above, id 0 for both TX and RX share a NAPI, NAPI ID 8960, and so on for each queue index up to 3. 
Signed-off-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20241009175509.31753-3-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-10 18:40:29 -07:00
Joe Damato	25118cce66	tg3: Link IRQs to NAPI instances Link IRQs to NAPI instances with netif_napi_set_irq. This information can be queried with the netdev-genl API. Begin by testing my tg3 device in its default state: 1 TX queue and 4 RX queues. Compare the output of /proc/interrupts for my tg3 device with the output of netdev-genl after applying this patch: $ cat /proc/interrupts \| grep eth0 343: [...] eth0-tx-0 344: [...] eth0-rx-1 345: [...] eth0-rx-2 346: [...] eth0-rx-3 347: [...] eth0-rx-4 As you can see above, tg3 has named the IRQs such that there is a dedicated tx IRQ and 4 dedicated rx IRQs, for a total of 5 IRQs. $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ --dump napi-get --json='{"ifindex": 2}' [{'id': 8197, 'ifindex': 2, 'irq': 347}, {'id': 8196, 'ifindex': 2, 'irq': 346}, {'id': 8195, 'ifindex': 2, 'irq': 345}, {'id': 8194, 'ifindex': 2, 'irq': 344}, {'id': 8193, 'ifindex': 2, 'irq': 343}] Netlink displays the same IRQs as above, noting that each is mapped to a unique NAPI instance. Now, reconfigure the NIC to have 4 TX queues and 4 RX queues: $ sudo ethtool -L eth0 rx 4 tx 4 $ sudo ethtool -l eth0 \| tail -5 Current hardware settings: RX: 4 TX: 4 Other: n/a Combined: n/a Examine /proc/interrupts once again, noting that tg3 will now rename the IRQs to suggest that they are combined tx and rx without allocating additional IRQs, so the total IRQ count in /proc/interrupts is unchanged: 343: [...] eth0-0 344: [...] eth0-txrx-1 345: [...] eth0-txrx-2 346: [...] eth0-txrx-3 347: [...] 
eth0-txrx-4 Check the output from netlink again: $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ --dump napi-get --json='{"ifindex": 2}' [{'id': 8973, 'ifindex': 2, 'irq': 347}, {'id': 8972, 'ifindex': 2, 'irq': 346}, {'id': 8971, 'ifindex': 2, 'irq': 345}, {'id': 8970, 'ifindex': 2, 'irq': 344}, {'id': 8969, 'ifindex': 2, 'irq': 343}] Signed-off-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20241009175509.31753-2-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-10 18:40:29 -07:00
Uwe Kleine-König	e96321fad3	net: ethernet: Switch back to struct platform_driver::remove() After commit `0edb555a65` ("platform: Make platform_driver::remove() return void") .remove() is (again) the right callback to implement for platform drivers. Convert all platform drivers below drivers/net/ethernet to use .remove(), with the eventual goal to drop struct platform_driver::remove_new(). As .remove() and .remove_new() have the same prototypes, conversion is done by just changing the structure member name in the driver initializer. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Link: https://patch.msgid.link/18f7c585a1a8a8ac8b03a2fca7de19bd5c52ac2b.1727949050.git.u.kleine-koenig@baylibre.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 16:39:56 -07:00
Al Viro	5f60d5f6bb	move asm/unaligned.h to linux/unaligned.h asm/unaligned.h is always an include of asm-generic/unaligned.h; might as well move that thing to linux/unaligned.h and include that - there's nothing arch-specific in that header. auto-generated by the following: for i in `git grep -l -w asm/unaligned.h`; do sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i done for i in `git grep -l -w asm-generic/unaligned.h`; do sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i done git mv include/asm-generic/unaligned.h include/linux/unaligned.h git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h	2024-10-02 17:23:23 -04:00
Edwin Peer	f77cdee5db	bnxt_en: resize bnxt_irq name field to fit format string The name field of struct bnxt_irq is written using snprintf in bnxt_setup_msix(). Make the field large enough to fit the maximal formatted string to prevent truncation. Truncated IRQ names are less meaningful to the user. For example, "enp4s0f0np0-TxRx-0" gets truncated to "enp4s0f0np0-TxRx-" with the existing code. Make sure we have space for the extra characters added to the IRQ names: - the characters introduced by the static format string: hyphens - the maximal static substituted ring type string: "TxRx" - the maximum length of an integer formatted as a string, even though reasonable ring numbers would never be as long as this. Signed-off-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240909202737.93852-4-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-09-10 18:42:45 -07:00
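The truncation in the example above can be reproduced in user space. This is a sketch under assumptions: `format_irq_name()` is a hypothetical helper, the `"%s-%s-%d"` format is inferred from the example name, and the 18-byte buffer is chosen only because it yields the truncated string quoted in the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Format "<dev>-<type>-<ring>" into buf.
 * Returns 1 if the whole name fit, 0 if snprintf() had to truncate it. */
static int format_irq_name(char *buf, size_t size, const char *dev,
			   const char *type, int ring)
{
	int n;

	assert(buf != NULL && size > 0);
	/* snprintf() returns the length the full string would have had. */
	n = snprintf(buf, size, "%s-%s-%d", dev, type, ring);
	return n >= 0 && (size_t)n < size;
}
```

With an 18-byte buffer, `format_irq_name(buf, 18, "enp4s0f0np0", "TxRx", 0)` reports truncation and leaves `"enp4s0f0np0-TxRx-"` in `buf`, since the full name needs 18 characters plus the terminating NUL.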
Michael Chan	2d51eb0bd8	bnxt_en: Add MSIX check in bnxt_check_rings() bnxt_check_rings() is called to ensure that we have the hardware ring resources before committing to reinitialize with the new number of rings. MSIX vectors are never checked at this point, because up until recently we must first disable MSIX before we can allocate the new set of MSIX vectors. Now that we support dynamic MSIX allocation, check to make sure we can dynamically allocate the new MSIX vectors as the last step in bnxt_check_rings() if dynamic MSIX is supported. For example, the IOMMU group may limit the number of MSIX vectors for the device. With this patch, the ring change will fail more gracefully when there is not enough MSIX vectors. It is also better to move bnxt_check_rings() to be called as the last step when changing ethtool rings. Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240909202737.93852-3-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-09-10 18:42:45 -07:00
Michael Chan	f775cb1bbf	bnxt_en: Increase the number of MSIX vectors for RoCE device If RoCE is supported on the device, set the number of RoCE MSIX vectors to the number of online CPUs + 1 and capped at these maximums: VF: 2 NPAR: 5 PF: 64 For the PF, the maximum is now increased from the previous value of 9 to get better performance for kernel applications. Remove the unnecessary check for BNXT_FLAG_ROCE_CAP. bnxt_set_dflt_ulp_msix() will only be called if the flag is set. Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240909202737.93852-2-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-09-10 18:42:45 -07:00
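The sizing rule stated in this commit message (online CPUs + 1, capped per function type) amounts to the following arithmetic. The sketch is illustrative only: `enum fn_kind` and `roce_msix_count()` are hypothetical names, not the driver's, and only the caps (2, 5, 64) come from the message.

```c
#include <assert.h>

/* Hypothetical function types, named after the caps in the commit message. */
enum fn_kind { FN_VF, FN_NPAR, FN_PF };

/* Number of RoCE MSIX vectors: online CPUs + 1, capped per function type. */
static int roce_msix_count(int online_cpus, enum fn_kind kind)
{
	int cap, want;

	assert(online_cpus > 0);
	switch (kind) {
	case FN_VF:
		cap = 2;
		break;
	case FN_NPAR:
		cap = 5;
		break;
	default: /* FN_PF */
		cap = 64;
		break;
	}
	want = online_cpus + 1;
	return want < cap ? want : cap;
}
```

For example, a PF on an 8-CPU system would get 9 vectors under this rule, and a PF on a 128-CPU system would be capped at 64.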
Gal Pressman	0644646d91	tg3: Remove setting of RX software timestamp The responsibility for reporting of RX software timestamp has moved to the core layer (see __ethtool_get_ts_info()), remove usage from the device drivers. Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://patch.msgid.link/20240906144632.404651-3-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-09-09 17:44:40 -07:00
Gal Pressman	3fc85527b0	bnxt_en: Remove setting of RX software timestamp The responsibility for reporting of RX software timestamp has moved to the core layer (see __ethtool_get_ts_info()), remove usage from the device drivers. Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://patch.msgid.link/20240906144632.404651-2-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-09-09 17:44:40 -07:00
Gal Pressman	26f74155df	bnx2x: Remove setting of RX software timestamp The responsibility for reporting of RX software timestamp has moved to the core layer (see __ethtool_get_ts_info()), remove usage from the device drivers. Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-09-06 09:34:18 +01:00
Jinjie Ruan	e8ac897445	net: bcmasp: Simplify with scoped for each OF child loop Use scoped for_each_available_child_of_node_scoped() when iterating over device nodes to make code a bit simpler. Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Justin Chen <justin.chen@broadcom.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-09-03 12:54:43 +02:00
Michael Chan	e68256c8a7	bnxt_en: Support dynamic MSIX A range of MSIX vectors are allocated at initialization for the number needed for RoCE and L2. During run-time, if the user increases or decreases the number of L2 rings, all the MSIX vectors have to be freed and a new range has to be allocated. This is not optimal and causes disruptions to RoCE traffic every time there is a change in L2 MSIX. If the system supports dynamic MSIX allocations, use dynamic allocation to add new L2 MSIX vectors or free unneeded L2 MSIX vectors. RoCE traffic is not affected using this scheme. Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/20240828183235.128948-10-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-29 15:33:25 -07:00
Michael Chan	f049d699ae	bnxt_en: Allocate the max bp->irq_tbl size for dynamic msix allocation If dynamic MSIX allocation is supported, additional MSIX can be allocated at run-time without reinitializing the existing MSIX entries. The first step to support this dynamic scheme is to allocate a large enough bp->irq_tbl if dynamic allocation is supported. Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20240828183235.128948-9-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-29 15:33:25 -07:00
Michael Chan	4343838ca5	bnxt_en: Replace deprecated PCI MSIX APIs Use the new pci_alloc_irq_vectors() and pci_free_irq_vectors() to replace the deprecated pci_enable_msix_range() and pci_disable_msix(). Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20240828183235.128948-8-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-29 15:33:25 -07:00
Michael Chan	af756aad3d	bnxt_en: Remove register mapping to support INTX In legacy INTX mode, a register is mapped so that the INTX handler can read it to determine if the NIC is the source of the interrupt. This and all the related macros are no longer needed now that INTX is no longer supported. Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20240828183235.128948-7-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-29 15:33:25 -07:00
Michael Chan	e94d8d97c7	bnxt_en: Remove BNXT_FLAG_USING_MSIX flag Now that we only support MSIX, the BNXT_FLAG_USING_MSIX is always true. Remove it and any if conditions checking for it. Remove the INTX handler and associated logic. Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20240828183235.128948-6-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-29 15:33:24 -07:00
Michael Chan	2a659a4603	bnxt_en: Deprecate support for legacy INTX mode Firmware has deprecated support for legacy INTX in 2022 (since v2.27) and INTX hasn't been tested for many years before that. INTX was only used as a fallback mechanism in case MSIX wasn't available. MSIX is always supported by all firmware. If MSIX capability in PCI config space is not found during probe, abort. Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20240828183235.128948-5-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-29 15:33:24 -07:00
Sreekanth Reddy	26e3846e23	bnxt_en: Support QOS and TPID settings for the SRIOV VLAN With recent changes in the .ndo_set_vf_*() guidelines, resubmitting this patch that was reverted earlier in 2023: `c27153682e` ("Revert "bnxt_en: Support QOS and TPID settings for the SRIOV VLAN") Add these missing settings in the .ndo_set_vf_vlan() method. Older firmware does not support the TPID setting so check for proper support. Remove the unused BNXT_VF_QOS flag. Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20240828183235.128948-4-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-29 15:33:24 -07:00
Vikas Gupta	9e7b880b92	bnxt_en: add support for retrieving crash dump using ethtool Add support for retrieving crash dump using ethtool -w on the supported interface. Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20240828183235.128948-3-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-29 15:33:24 -07:00
Vikas Gupta	c33626d83e	bnxt_en: add support for storing crash dump into host memory Newer firmware supports automatic DMA of crash dump to host memory when it crashes. If the feature is supported, allocate the required memory using the existing context memory infrastructure. Communicate the page table containing the DMA addresses to the firmware. Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20240828183235.128948-2-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-29 15:33:24 -07:00
Jakub Kicinski	761d527d5d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: drivers/net/ethernet/broadcom/bnxt/bnxt.h `c948c0973d` ("bnxt_en: Don't clear ntuple filters and rss contexts during ethtool ops") `f2878cdeb7` ("bnxt_en: Add support to call FW to update a VNIC") Link: https://patch.msgid.link/20240822210125.1542769-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-22 17:06:18 -07:00
Somnath Kotur	8baeef7616	bnxt_en: Fix double DMA unmapping for XDP_REDIRECT Remove the dma_unmap_page_attrs() call in the driver's XDP_REDIRECT code path. This should have been removed when we let the page pool handle the DMA mapping. This bug causes the warning: WARNING: CPU: 7 PID: 59 at drivers/iommu/dma-iommu.c:1198 iommu_dma_unmap_page+0xd5/0x100 CPU: 7 PID: 59 Comm: ksoftirqd/7 Tainted: G W 6.8.0-1010-gcp #11-Ubuntu Hardware name: Dell Inc. PowerEdge R7525/0PYVT1, BIOS 2.15.2 04/02/2024 RIP: 0010:iommu_dma_unmap_page+0xd5/0x100 Code: 89 ee 48 89 df e8 cb f2 69 ff 48 83 c4 08 5b 41 5c 41 5d 41 5e 41 5f 5d 31 c0 31 d2 31 c9 31 f6 31 ff 45 31 c0 e9 ab 17 71 00 <0f> 0b 48 83 c4 08 5b 41 5c 41 5d 41 5e 41 5f 5d 31 c0 31 d2 31 c9 RSP: 0018:ffffab1fc0597a48 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff99ff838280c8 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffffab1fc0597a78 R08: 0000000000000002 R09: ffffab1fc0597c1c R10: ffffab1fc0597cd3 R11: ffff99ffe375acd8 R12: 00000000e65b9000 R13: 0000000000000050 R14: 0000000000001000 R15: 0000000000000002 FS: 0000000000000000(0000) GS:ffff9a06efb80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000565c34c37210 CR3: 00000005c7e3e000 CR4: 0000000000350ef0 ? show_regs+0x6d/0x80 ? __warn+0x89/0x150 ? iommu_dma_unmap_page+0xd5/0x100 ? report_bug+0x16a/0x190 ? handle_bug+0x51/0xa0 ? exc_invalid_op+0x18/0x80 ? iommu_dma_unmap_page+0xd5/0x100 ? iommu_dma_unmap_page+0x35/0x100 dma_unmap_page_attrs+0x55/0x220 ? 
bpf_prog_4d7e87c0d30db711_xdp_dispatcher+0x64/0x9f bnxt_rx_xdp+0x237/0x520 [bnxt_en] bnxt_rx_pkt+0x640/0xdd0 [bnxt_en] __bnxt_poll_work+0x1a1/0x3d0 [bnxt_en] bnxt_poll+0xaa/0x1e0 [bnxt_en] __napi_poll+0x33/0x1e0 net_rx_action+0x18a/0x2f0 Fixes: `578fcfd26e` ("bnxt_en: Let the page pool manage the DMA mapping") Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20240820203415.168178-1-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-21 17:36:56 -07:00
Simon Horman	a99ef548bb	bnx2x: Set ivi->vlan field as an integer In bnx2x_get_vf_config(): * The vlan field of ivi is a 32-bit integer, it is used to store a vlan ID. * The vlan field of bulletin is a 16-bit integer, it is also used to store a vlan ID. In the current code, ivi->vlan is set using memset. But in the case of setting it to the value of bulletin->vlan, this involves reading 32 bits from a 16bit source. This is likely safe, as the following 6 bytes are padding in the same structure, but none the less, it seems undesirable. However, it is entirely unclear to me how this scheme works on big-endian systems. Resolve this by simply assigning integer values to ivi->vlan. Flagged by W=1 builds. f.e. gcc-14 reports: In function 'fortify_memcpy_chk', inlined from 'bnx2x_get_vf_config' at .../bnx2x_sriov.c:2655:4: .../fortify-string.h:580:25: warning: call to '__read_overflow2_field' declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Wattribute-warning] 580 \| __read_overflow2_field(q_size_field, size); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Compile tested only. Signed-off-by: Simon Horman <horms@kernel.org> Reviewed-by: Brett Creeley <brett.creeley@amd.com> Link: https://patch.msgid.link/20240815-bnx2x-int-vlan-v1-1-5940b76e37ad@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-16 18:02:28 -07:00
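The fix described above (assigning the integer instead of copying bytes) can be sketched in plain C. The struct names `bulletin_like` and `ivi_like` and the helper `set_vlan()` are hypothetical; only the field widths (16-bit source followed by 6 bytes of padding, 32-bit destination) are taken from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: a 16-bit VLAN ID source and a 32-bit destination. */
struct bulletin_like {
	uint16_t vlan;
	uint8_t  pad[6];
};

struct ivi_like {
	uint32_t vlan;
};

static void set_vlan(struct ivi_like *ivi, const struct bulletin_like *b)
{
	assert(ivi != NULL && b != NULL);
	/* A byte copy of sizeof(ivi->vlan) bytes starting at &b->vlan would
	 * read past the 2-byte field, and which destination bytes receive
	 * the ID would depend on endianness. Plain assignment widens the
	 * value and is correct on any byte order. */
	ivi->vlan = b->vlan;
}
```

After `set_vlan()`, the destination holds exactly the 16-bit ID zero-extended to 32 bits, regardless of what the padding bytes contain.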
Pavan Chebbi	c948c0973d	bnxt_en: Don't clear ntuple filters and rss contexts during ethtool ops The driver currently blindly deletes its cache of RSS contexts and ntuple filters when the ethtool channel count is changing. It also deletes the ntuple filters cache when the default indirection table is changing. The core will not allow ethtool channels to drop below any that have been configured as ntuple destinations since this commit from 2022: `47f3ecf476` ("ethtool: Fail number of channels change when it conflicts with rxnfc") So there is absolutely no need to delete the ntuple filters and RSS contexts when changing ethtool channels. It is also unnecessary to delete ntuple filters when the default RSS indirection table is changing. Remove bnxt_clear_usr_fltrs() and bnxt_clear_rss_ctxis() from the ethtool ops and change them to static functions. This bug confuses the end user and causes the rss_ctx.py selftest to fail. Fixes: `1018319f94` ("bnxt_en: Invalidate user filters when needed") Reported-by: Jakub Kicinski <kuba@kernel.org> Closes: https://lore.kernel.org/netdev/20240725111912.7bc17cf6@kernel.org/ Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20240814225429.199280-1-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-15 19:12:46 -07:00
Simon Horman	1418e9ab3e	bnxt_en: avoid truncation of per rx run debugfs filename Although it seems unlikely in practice - there would need to be rx ring indexes greater than 10^10 - it is theoretically possible for the filename of per rx ring debugfs files to be truncated. This is because although a 16 byte buffer is provided, the length of the filename is restricted to 10 bytes. Remove this restriction and allow the entire buffer to be used. Also reduce the buffer to 12 bytes, which is sufficient. Given that the range of rx ring indexes is likely much smaller than the maximum range of a 32-bit signed integer, a smaller buffer could be used, with some further changes. But this change seems simple, robust, and has minimal stack overhead. Flagged by gcc-14: .../bnxt_debugfs.c: In function 'bnxt_debug_dev_init': drivers/net/ethernet/broadcom/bnxt/bnxt_debugfs.c:69:30: warning: '%d' directive output may be truncated writing between 1 and 11 bytes into a region of size 10 [-Wformat-truncation=] 69 \| snprintf(qname, 10, "%d", ring_idx); \| ^~ In function 'debugfs_dim_ring_init', inlined from 'bnxt_debug_dev_init' at .../bnxt_debugfs.c:87:4: .../bnxt_debugfs.c:69:29: note: directive argument in the range [-2147483643, 2147483646] 69 \| snprintf(qname, 10, "%d", ring_idx); \| ^~~~ .../bnxt_debugfs.c:69:9: note: 'snprintf' output between 2 and 12 bytes into a destination of size 10 69 \| snprintf(qname, 10, "%d", ring_idx); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Compile tested only Signed-off-by: Simon Horman <horms@kernel.org> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20240813-bnxt-str-v2-2-872050a157e7@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-14 20:36:17 -07:00
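The buffer sizing discussed in 1418e9ab3e can be checked with a small userspace sketch. It uses the C library `snprintf()` rather than the kernel's, and `demo_format_ring()` and `demo_fits()` are hypothetical helpers, not driver code. The sketch assumes a 32-bit `int`, for which the longest `%d` output is 11 characters, so 12 bytes hold it together with the trailing NUL.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* The sketch assumes a 32-bit int, whose longest "%d" form is "-2147483648". */
_Static_assert(INT_MIN == -2147483647 - 1, "sketch assumes 32-bit int");

/*
 * Format a ring index into buf. snprintf() returns the number of
 * characters the full output needs, excluding the trailing NUL,
 * even when the output had to be truncated to fit 'size'.
 */
static int demo_format_ring(char *buf, size_t size, int ring_idx)
{
	assert(buf && size > 0);
	return snprintf(buf, size, "%d", ring_idx);
}

/* The output was not truncated if its length plus the NUL fits in 'size'. */
static int demo_fits(int needed, size_t size)
{
	return needed >= 0 && (size_t)needed < size;
}
```

Formatting `INT_MIN` needs 11 characters, so it is truncated with a size of 10 and fits with a size of 12.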
Simon Horman	ffff7ee843	bnxt_en: Extend maximum length of version string by 1 byte This corrects an out-by-one error in the maximum length of the package version string. The size argument of snprintf includes space for the trailing '\0' byte, so there is no need to allow extra space for it by reducing the value of the size argument by 1. Found by inspection. Compile tested only. Signed-off-by: Simon Horman <horms@kernel.org> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20240813-bnxt-str-v2-1-872050a157e7@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-14 20:36:16 -07:00
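The off-by-one described in ffff7ee843 follows from how the `snprintf()` size argument is counted. A minimal userspace sketch, using the C library `snprintf()` and a hypothetical helper `demo_copy_len()` (not driver code), shows that the size argument already includes room for the trailing NUL, so subtracting 1 from it gives up one usable byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Copy src into a local buffer with the given snprintf() size argument
 * and return how many characters were actually stored.
 */
static size_t demo_copy_len(size_t size_arg, const char *src)
{
	char buf[32];

	assert(src && size_arg > 0 && size_arg <= sizeof(buf));
	/* At most size_arg - 1 characters are written, followed by a NUL. */
	snprintf(buf, size_arg, "%s", src);
	return strlen(buf);
}
```

For an 8-character source string, a size argument of 8 stores 7 characters plus the NUL, while a size argument of 7 stores only 6.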
Jakub Kicinski	ec6e57beaf	ethtool: rss: don't report key if device doesn't support it marvell/otx2 and mvpp2 do not support setting different keys for different RSS contexts. Contexts have separate indirection tables but key is shared with all other contexts. This is likely fine, indirection table is the most important piece. Don't report the key-related parameters from such drivers. This prevents driver-errors, e.g. otx2 always writes the main key, even when user asks to change per-context key. The second reason is that without this change tracking the keys by the core gets complicated. Even if the driver correctly reject setting key with rss_context != 0, change of the main key would have to be reflected in the XArray for all additional contexts. Since the additional contexts don't have their own keys not including the attributes (in Netlink speak) seems intuitive. ethtool CLI seems to deal with it just fine. Having to set the flag in majority of the drivers is a bit tedious but not reporting the key is a safer default. Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-12 14:16:24 +01:00
Jakub Kicinski	fb770fe758	eth: remove .cap_rss_ctx_supported from updated drivers Remove .cap_rss_ctx_supported from drivers which moved to the new API. This makes it easy to grep for drivers which still need to be converted. Reviewed-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-12 14:16:24 +01:00
David Wei	97cbf3d0ac	bnxt_en: only set dev->queue_mgmt_ops if supported by FW The queue API calls bnxt_hwrm_vnic_update() to stop/start the flow of packets, which can only properly flush the pipeline if FW indicates support. Add a macro BNXT_SUPPORTS_QUEUE_API that checks for the required flags and only set queue_mgmt_ops if true. Signed-off-by: David Wei <dw@davidwei.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-11 13:48:02 +01:00
David Wei	b9d2956e86	bnxt_en: stop packet flow during bnxt_queue_stop/start The current implementation when resetting a queue while packets are flowing puts the queue into an inconsistent state. There needs to be some synchronisation with the FW. Add calls to bnxt_hwrm_vnic_update() to set the MRU for both the default and ntuple vnic during queue start/stop. When the MRU is set to 0, flow is stopped. Each Rx queue belongs to either the default or the ntuple vnic. With calling bnxt_hwrm_vnic_update() the calls to napi_enable() and napi_disable() must be removed for reset to work on a queue that has active traffic flowing e.g. iperf3. Co-developed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: David Wei <dw@davidwei.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-11 13:48:02 +01:00
David Wei	d41575f76a	bnxt_en: set vnic->mru in bnxt_hwrm_vnic_cfg() Set the newly added vnic->mru field in bnxt_hwrm_vnic_cfg(). Signed-off-by: David Wei <dw@davidwei.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-11 13:48:02 +01:00
Michael Chan	6e360862c0	bnxt_en: Check the FW's VNIC flush capability Check the HWRM_VNIC_QCAPS FW response for the receive engine flush capability. This capability indicates that we can reliably support RX ring restart when calling HWRM_VNIC_UPDATE with MRU set to 0. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David Wei <dw@davidwei.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-11 13:48:02 +01:00
Michael Chan	f2878cdeb7	bnxt_en: Add support to call FW to update a VNIC Add the function bnxt_hwrm_vnic_update() to call FW to update a VNIC. This call can be used when disabling and enabling a receive ring within a VNIC. The mru which is the maximum receive size of packets received by the VNIC can be updated. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David Wei <dw@davidwei.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-11 13:48:02 +01:00
Michael Chan	fbda8ee64b	bnxt_en: Update firmware interface to 1.10.3.68 The main changes are: 1. HWRM_VNIC_UPDATE used to safely disable and enable an RX ring within the VNIC. 2. New flag in HWRM_VNIC_QCAPS to indicate FW will do the proper flush during HWRM_VNIC_UPDATE. 3. New flag in HWRM_FUNC_QCAPS to indicate that reservations for some resources such as VNIC can be reduced. 4. New backing store memory types not used by the driver yet. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David Wei <dw@davidwei.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-11 13:48:02 +01:00
Simon Horman	a39036847f	bnx2x: Provide declaration of dmae_reg_go_c in header Provide declaration of dmae_reg_go_c in header. This symbol is defined in bnx2x_main.c. And used in that file and bnx2x_stats.c. However, Sparse complains that there is no declaration of the symbol in dmae_reg_go_c nor is the symbol static. .../bnx2x_main.c:291:11: warning: symbol 'dmae_reg_go_c' was not declared. Should it be static? Address this by moving the declaration from bnx2x_stats.c to bnx2x_reg.h. No functional change intended. Compile tested only. Signed-off-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240806-bnx2x-dec-v1-1-ae844ec785e4@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-08 19:43:06 -07:00
Jakub Kicinski	e47fd9beb1	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR. No conflicts or adjacent changes. Link: https://patch.msgid.link/20240808170148.3629934-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-08 14:04:17 -07:00
Edward Cree	b54de55990	net: ethtool: fix off-by-one error in max RSS context IDs Both ethtool_ops.rxfh_max_context_id and the default value used when it's not specified are supposed to be exclusive maxima (the former is documented as such; the latter, U32_MAX, cannot be used as an ID since it equals ETH_RXFH_CONTEXT_ALLOC), but xa_alloc() expects an inclusive maximum. Subtract one from 'limit' to produce an inclusive maximum, and pass that to xa_alloc(). Increase bnxt's max by one to prevent a (very minor) regression, as BNXT_MAX_ETH_RSS_CTX is an inclusive max. This is safe since bnxt is not actually hard-limited; BNXT_MAX_ETH_RSS_CTX is just a leftover from old driver code that managed context IDs itself. Rename rxfh_max_context_id to rxfh_max_num_contexts to make its semantics (hopefully) more obvious. Fixes: `847a8ab186` ("net: ethtool: let the core choose RSS context IDs") Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://patch.msgid.link/5a2d11a599aa5b0cc6141072c01accfb7758650c.1723045898.git.ecree.xilinx@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-08 08:54:33 -07:00
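The exclusive versus inclusive maximum mismatch fixed in b54de55990 can be illustrated with a userspace sketch. `demo_alloc_id()` is a hypothetical stand-in for an allocator that, like xa_alloc(), takes an inclusive upper bound; it is not the kernel API. `demo_inclusive_max()` performs the "subtract one" conversion the commit message describes.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical allocator: find the first free ID in the inclusive range
 * [min, max_inclusive]. 'used' must have at least max_inclusive + 1 entries.
 * Returns 0 and stores the ID on success, -1 if the range is exhausted.
 */
static int demo_alloc_id(const uint8_t *used, uint32_t min,
			 uint32_t max_inclusive, uint32_t *id)
{
	assert(used && id && min <= max_inclusive);
	for (uint32_t i = min; ; i++) {
		if (!used[i]) {
			*id = i;
			return 0;
		}
		if (i == max_inclusive)
			break; /* stop before i can wrap around */
	}
	return -1;
}

/* Convert an exclusive limit (valid IDs are < limit) to an inclusive maximum. */
static uint32_t demo_inclusive_max(uint32_t exclusive_limit)
{
	assert(exclusive_limit > 0);
	return exclusive_limit - 1;
}
```

With an exclusive limit of 3 and IDs 1 and 2 in use, the converted bound correctly reports exhaustion, whereas passing 3 directly as the inclusive bound hands out ID 3, which the exclusive limit was meant to forbid.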
Florian Fainelli	9ee09edc05	net: bcmgenet: Properly overlay PHY and MAC Wake-on-LAN capabilities Some Wake-on-LAN modes such as WAKE_FILTER may only be supported by the MAC, while others might be only supported by the PHY. Make sure that the .get_wol() returns the union of both rather than only that of the PHY if the PHY supports Wake-on-LAN. Fixes: `7e400ff35c` ("net: bcmgenet: Add support for PHY-based Wake-on-LAN") Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20240806175659.3232204-1-florian.fainelli@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-08 08:29:29 -07:00
Michael Chan	da03f5d1b2	bnxt_en : Fix memory out-of-bounds in bnxt_fill_hw_rss_tbl() A recent commit has modified the code in __bnxt_reserve_rings() to set the default RSS indirection table to default only when the number of RX rings is changing. While this works for newer firmware that requires RX ring reservations, it causes a regression on older firmware that does not require RX ring reservations (BNXT_NEW_RM() returns false). With older firmware, RX ring reservations are not required and so hw_resc->resv_rx_rings is not always set to the proper value. The comparison: if (old_rx_rings != bp->hw_resc.resv_rx_rings) in __bnxt_reserve_rings() may be false even when the RX rings are changing. This will cause __bnxt_reserve_rings() to skip setting the default RSS indirection table to default to match the current number of RX rings. This may later cause bnxt_fill_hw_rss_tbl() to use an out-of-range index. We already have bnxt_check_rss_tbl_no_rmgr() to handle exactly this scenario. We just need to move it up in bnxt_need_reserve_rings() to be called unconditionally when using older firmware. Without the fix, if the TX rings are changing, we'll skip the bnxt_check_rss_tbl_no_rmgr() call and __bnxt_reserve_rings() may also skip the bnxt_set_dflt_rss_indir_tbl() call for the reason explained in the last paragraph. 
Without setting the default RSS indirection table to default, it causes the regression: BUG: KASAN: slab-out-of-bounds in __bnxt_hwrm_vnic_set_rss+0xb79/0xe40 Read of size 2 at addr ffff8881c5809618 by task ethtool/31525 Call Trace: __bnxt_hwrm_vnic_set_rss+0xb79/0xe40 bnxt_hwrm_vnic_rss_cfg_p5+0xf7/0x460 __bnxt_setup_vnic_p5+0x12e/0x270 __bnxt_open_nic+0x2262/0x2f30 bnxt_open_nic+0x5d/0xf0 ethnl_set_channels+0x5d4/0xb30 ethnl_default_set_doit+0x2f1/0x620 Reported-by: Breno Leitao <leitao@debian.org> Closes: https://lore.kernel.org/netdev/ZrC6jpghA3PWVWSB@gmail.com/ Fixes: `98ba1d931f` ("bnxt_en: Fix RSS logic in __bnxt_reserve_rings()") Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Tested-by: Breno Leitao <leitao@debian.org> Link: https://patch.msgid.link/20240806053742.140304-1-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-07 20:14:13 -07:00
Jakub Kicinski	5fa35bd39c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR. No conflicts or adjacent changes. Link: https://patch.msgid.link/20240801131917.34494-1-pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-01 10:45:23 -07:00
Allen Pais	8d3beb6bc7	net: cnic: Convert tasklet API to new bottom half workqueue mechanism Migrate tasklet APIs to the new bottom half workqueue mechanism. It replaces all occurrences of tasklet usage with the appropriate workqueue APIs throughout the cnic driver. This transition ensures compatibility with the latest design and enhances performance. Signed-off-by: Allen Pais <allen.lkml@gmail.com> Link: https://patch.msgid.link/20240730183403.4176544-4-allen.lkml@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-31 18:59:46 -07:00
Jakub Kicinski	9dbad38336	eth: bnxt: populate defaults in the RSS context struct As described in the kdoc for .create_rxfh_context we are responsible for populating the defaults. The core will not call .get_rxfh for non-0 context. The problem can be easily observed since Netlink doesn't currently use the cache. Using netlink ethtool: $ ethtool -x eth0 context 1 [...] RSS hash key: 13:60:cd:60:14:d3:55:36:86:df:90:f2:96:14:e2:21:05:57:a8:8f:a5:12:5e:54:62:7f:fd:3c:15:7e:76:05:71:42:a2:9a:73:80:09:9c RSS hash function: toeplitz: on xor: off crc32: off But using IOCTL ethtool shows: $ ./ethtool-old -x eth0 context 1 [...] RSS hash key: 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00 RSS hash function: Operation not supported Fixes: `7964e78846` ("net: ethtool: use the tracking array for get_rxfh on custom RSS contexts") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-07-29 10:59:07 +01:00
Jakub Kicinski	daefd348a5	eth: bnxt: reject unsupported hash functions In commit under Fixes I split the bnxt_set_rxfh_context() function, and attached the appropriate chunks to new ops. I missed that bnxt_set_rxfh_context() gets called after some initial checks in bnxt_set_rxfh(), namely that the hash function is Toeplitz. Fixes: `5c466b4d4e` ("eth: bnxt: move from .set_rxfh to .create_rxfh_context and friends") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-07-29 10:59:07 +01:00
Pavan Chebbi	98ba1d931f	bnxt_en: Fix RSS logic in __bnxt_reserve_rings() In __bnxt_reserve_rings(), the existing code unconditionally sets the default RSS indirection table to default if netif_is_rxfh_configured() returns false. This used to be correct before we added RSS contexts support. For example, if the user is changing the number of ethtool channels, we will enter this path to reserve the new number of rings. We will then set the RSS indirection table to default to cover the new number of rings if netif_is_rxfh_configured() is false. Now, with RSS contexts support, if the user has added or deleted RSS contexts, we may now enter this path to reserve the new number of VNICs. However, netif_is_rxfh_configured() will not return the correct state if we are still in the middle of set_rxfh(). So the existing code may set the indirection table of the default RSS context to default by mistake. Fix it to check if the reservation of the RX rings is changing. Only check netif_is_rxfh_configured() if it is changing. RX rings will not change in the middle of set_rxfh() and this will fix the issue. Fixes: `b3d0083caf` ("bnxt_en: Support RSS contexts in ethtool .{get\|set}_rxfh()") Reported-and-tested-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/20240625010210.2002310-1-kuba@kernel.org Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20240724222106.147744-1-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-25 16:22:35 -07:00
Linus Torvalds	1722389b0d	Merge tag 'net-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bpf and netfilter. A lot of networking people were at a conference last week, busy catching COVID, so relatively short PR. Current release - regressions: - tcp: process the 3rd ACK with sk_socket for TFO and MPTCP Current release - new code bugs: - l2tp: protect session IDR and tunnel session list with one lock, make sure the state is coherent to avoid a warning - eth: bnxt_en: update xdp_rxq_info in queue restart logic - eth: airoha: fix location of the MBI_RX_AGE_SEL_MASK field Previous releases - regressions: - xsk: require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len, the field reuses previously un-validated pad Previous releases - always broken: - tap/tun: drop short frames to prevent crashes later in the stack - eth: ice: add a per-VF limit on number of FDIR filters - af_unix: disable MSG_OOB handling for sockets in sockmap/sockhash" * tag 'net-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (34 commits) tun: add missing verification for short frame tap: add missing verification for short frame mISDN: Fix a use after free in hfcmulti_tx() gve: Fix an edge case for TSO skb validity check bnxt_en: update xdp_rxq_info in queue restart logic tcp: process the 3rd ACK with sk_socket for TFO/MPTCP selftests/bpf: Add XDP_UMEM_TX_METADATA_LEN to XSK TX metadata test xsk: Require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len bpf: Fix a segment issue when downgrading gso_size net: mediatek: Fix potential NULL pointer dereference in dummy net_device handling MAINTAINERS: make Breno the netconsole maintainer MAINTAINERS: Update bonding entry net: nexthop: Initialize all fields in dumped nexthops net: stmmac: Correct byte order of perfect_match selftests: forwarding: skip if kernel not support setting bridge fdb learning limit tipc: Return non-zero value from tipc_udp_addr2str() on error netfilter: nft_set_pipapo_avx2: disable softinterrupts ice: Fix recipe read procedure ice: Add a per-VF limit on number of FDIR filters net: bonding: correctly annotate RCU in bond_should_notify_peers() ...	2024-07-25 13:32:25 -07:00
Linus Torvalds	c2a96b7f18	Merge tag 'driver-core-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the big set of driver core changes for 6.11-rc1. Lots of stuff in here, with not a huge diffstat, but apis are evolving which required lots of files to be touched. Highlights of the changes in here are: - platform remove callback api final fixups (Uwe took many releases to get here, finally!) - Rust bindings for basic firmware apis and initial driver-core interactions. It's not all that useful for a "write a whole driver in rust" type of thing, but the firmware bindings do help out the phy rust drivers, and the driver core bindings give a solid base on which others can start their work. There is still a long way to go here before we have a multitude of rust drivers being added, but it's a great first step. - driver core const api changes. This reached across all bus types, and there are some fix-ups for some not-common bus types that linux-next and 0-day testing shook out. This work is being done to help make the rust bindings more safe, as well as the C code, moving toward the end-goal of allowing us to put driver structures into read-only memory. We aren't there yet, but are getting closer. - minor devres cleanups and fixes found by code inspection - arch_topology minor changes - other minor driver core cleanups All of these have been in linux-next for a very long time with no reported problems" * tag 'driver-core-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (55 commits) ARM: sa1100: make match function take a const pointer sysfs/cpu: Make crash_hotplug attribute world-readable dio: Have dio_bus_match() callback take a const * zorro: make match function take a const pointer driver core: module: make module_[add\|remove]_driver take a const * driver core: make driver_find_device() take a const * driver core: make driver_[create\|remove]_file take a const * firmware_loader: fix soundness issue in `request_internal` firmware_loader: annotate doctests as `no_run` devres: Correct code style for functions that return a pointer type devres: Initialize an uninitialized struct member devres: Fix memory leakage caused by driver API devm_free_percpu() devres: Fix devm_krealloc() wasting memory driver core: platform: Switch to use kmemdup_array() driver core: have match() callback in struct bus_type take a const * MAINTAINERS: add Rust device abstractions to DRIVER CORE device: rust: improve safety comments MAINTAINERS: add Danilo as FIRMWARE LOADER maintainer MAINTAINERS: add Rust FW abstractions to FIRMWARE LOADER firmware: rust: improve safety comments ...	2024-07-25 10:42:22 -07:00
Taehee Yoo	b537633ce5	bnxt_en: update xdp_rxq_info in queue restart logic When the netdev_rx_queue_restart() restarts queues, the bnxt_en driver updates(creates and deletes) a page_pool. But it doesn't update xdp_rxq_info, so the xdp_rxq_info is still connected to an old page_pool. So, bnxt_rx_ring_info->page_pool indicates a new page_pool, but bnxt_rx_ring_info->xdp_rxq is still connected to an old page_pool. An old page_pool is no longer used so it is supposed to be deleted by page_pool_destroy() but it isn't. Because the xdp_rxq_info is holding the reference count for it and the xdp_rxq_info is not updated, an old page_pool will not be deleted in the queue restart logic. Before restarting 1 queue: ./tools/net/ynl/samples/page-pool enp10s0f1np1[6] page pools: 4 (zombies: 0) refs: 8192 bytes: 33554432 (refs: 0 bytes: 0) recycling: 0.0% (alloc: 128:8048 recycle: 0:0) After restarting 1 queue: ./tools/net/ynl/samples/page-pool enp10s0f1np1[6] page pools: 5 (zombies: 0) refs: 10240 bytes: 41943040 (refs: 0 bytes: 0) recycling: 20.0% (alloc: 160:10080 recycle: 1920:128) Before restarting queues, an interface has 4 page_pools. After restarting one queue, an interface has 5 page_pools, but it should be 4, not 5. The reason is that queue restarting logic creates a new page_pool and an old page_pool is not deleted due to the absence of an update of xdp_rxq_info logic. Fixes: `2d694c27d3` ("bnxt_en: implement netdev_queue_mgmt_ops") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Reviewed-by: David Wei <dw@davidwei.uk> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Link: https://patch.msgid.link/20240721053554.1233549-1-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-25 07:42:48 -07:00
Jakub Kicinski	30b3560050	Merge branch 'net-make-timestamping-selectable' First part of "net: Make timestamping selectable" from Kory Maincent. Change the driver-facing type already to lower rebasing pain. Link: https://lore.kernel.org/20240709-feature_ptp_netnext-v17-0-b5317f50df2a@bootlin.com/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-15 08:02:30 -07:00
Kory Maincent	2111375b85	net: Add struct kernel_ethtool_ts_info In preparation for adding a new UAPI for hwtstamp: we will be limited by struct ethtool_ts_info, which is currently passed in a fixed binary format through the ETHTOOL_GET_TS_INFO ethtool ioctl. It would be good if new kernel code already started operating on an extensible kernel variant of that structure, similar in concept to struct kernel_hwtstamp_config vs struct hwtstamp_config. Since struct ethtool_ts_info is in include/uapi/linux/ethtool.h, here we introduce the kernel-only structure in include/linux/ethtool.h. The manual copy is then made in the function called by ETHTOOL_GET_TS_INFO. Acked-by: Shannon Nelson <shannon.nelson@amd.com> Acked-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Link: https://patch.msgid.link/20240709-feature_ptp_netnext-v17-6-b5317f50df2a@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-15 08:02:26 -07:00
Jakub Kicinski	46e457a454	eth: bnxt: use the indir table from ethtool context Instead of allocating a separate indir table in the vnic use the one already present in the RSS context allocated by the core. This saves some LoC and also we won't have to worry about syncing the local version back to the core, once core learns how to dump contexts. Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://patch.msgid.link/20240711220713.283778-12-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-12 22:16:23 -07:00
Jakub Kicinski	73afb518af	eth: bnxt: bump the entry size in indir tables to u32 Ethtool core stores indirection table with u32 entries, "just to be safe". Switch the type in the driver, so that it's easier to swap local tables for the core ones. Memory allocations already use sizeof(*entry), switch the memset()s as well. Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://patch.msgid.link/20240711220713.283778-11-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-12 22:16:23 -07:00
Jakub Kicinski	9c34c6c28c	eth: bnxt: pad out the correct indirection table bnxt allocates tables of max size, and changes the used size based on number of active rings. The unused entries get padded out with zeros. bnxt_modify_rss() seems to always pad out the table of the main / default RSS context, instead of the table of the modified context. I haven't observed any behavior change due to this patch, so I don't think it's a fix. Not entirely sure what role the padding plays, 0 is a valid queue ID. Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://patch.msgid.link/20240711220713.283778-10-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-12 22:16:23 -07:00
Jakub Kicinski	20c8ad72eb	eth: bnxt: use the RSS context XArray instead of the local list Core already maintains all RSS contexts in an XArray, no need to keep a second list in the driver. Remove bnxt_get_max_rss_ctx_ring() completely since core performs the same check already. Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://patch.msgid.link/20240711220713.283778-9-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-12 22:16:22 -07:00
Jakub Kicinski	63d4769cf7	eth: bnxt: use context priv for struct bnxt_rss_ctx Core can allocate space for per-context driver-private data, use it for struct bnxt_rss_ctx. Inline bnxt_alloc_rss_ctx() at this point, most of the init (as in the actions bnxt_del_one_rss_ctx() will undo) is open coded in bnxt_create_rxfh_context(), anyway. Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://patch.msgid.link/20240711220713.283778-8-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-12 22:16:22 -07:00
Jakub Kicinski	bf30162915	eth: bnxt: depend on core cleaning up RSS contexts New RSS context API removes old contexts on netdev unregister. No need to wipe them manually. Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://patch.msgid.link/20240711220713.283778-7-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-12 22:16:22 -07:00
Jakub Kicinski	1a49a23c03	eth: bnxt: remove rss_ctx_bmap Core will allocate IDs for the driver, from the range [1, BNXT_MAX_ETH_RSS_CTX], no need to track the allocations. Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://patch.msgid.link/20240711220713.283778-6-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-12 22:16:22 -07:00
Jakub Kicinski	5c466b4d4e	eth: bnxt: move from .set_rxfh to .create_rxfh_context and friends Use the new ethtool ops for RSS context management. The conversion is pretty straightforward cut / paste of the right chunks of the combined handler. Main change is that we let the core pick the IDs (bitmap will be removed separately for ease of review), so we need to tell the core when we lose a context. Since the new API passes rxfh as const, change bnxt_modify_rss() to also take const. Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://patch.msgid.link/20240711220713.283778-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-12 22:16:22 -07:00
Jakub Kicinski	667ac333db	eth: bnxt: allow deleting RSS contexts when the device is down Contexts get deleted from FW when the device is down, but they are kept in SW and re-added back on open. bnxt_set_rxfh_context() apparently does not want to deal with complexity of dealing with both the device down and device up cases. This is perhaps acceptable for creating new contexts, but not being able to delete contexts makes core-driven cleanups messy. Specifically with the new RSS API core will try to delete contexts automatically after bringing the device down. Support the delete-while-down case. Skip the FW logic and delete just the driver state. Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://patch.msgid.link/20240711220713.283778-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-12 22:16:22 -07:00
Jakub Kicinski	7c8267275d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR. Conflicts: net/sched/act_ct.c `26488172b0` ("net/sched: Fix UAF when resolving a clash") `3abbd7ed8b` ("act_ct: prepare for stolen verdict coming from conntrack and nat engine") No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-11 12:58:13 -07:00
Jakub Kicinski	0d1b7d6c92	bnxt: fix crashes when reducing ring count with active RSS contexts bnxt doesn't check if a ring is used by RSS contexts when reducing ring count. Core performs a similar check for the drivers for the main context, but core doesn't know about additional contexts, so it can't validate them. bnxt_fill_hw_rss_tbl_p5() uses ring id to index bp->rx_ring[], which without the check may end up being out of bounds. BUG: KASAN: slab-out-of-bounds in __bnxt_hwrm_vnic_set_rss+0xb79/0xe40 Read of size 2 at addr ffff8881c5809618 by task ethtool/31525 Call Trace: __bnxt_hwrm_vnic_set_rss+0xb79/0xe40 bnxt_hwrm_vnic_rss_cfg_p5+0xf7/0x460 __bnxt_setup_vnic_p5+0x12e/0x270 __bnxt_open_nic+0x2262/0x2f30 bnxt_open_nic+0x5d/0xf0 ethnl_set_channels+0x5d4/0xb30 ethnl_default_set_doit+0x2f1/0x620 Core does track the additional contexts in net-next, so we can move this validation out of the driver as a follow up there. Fixes: `b3d0083caf` ("bnxt_en: Support RSS contexts in ethtool .{get\|set}_rxfh()") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://patch.msgid.link/20240705020005.681746-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-07-09 11:35:49 +02:00
Dan Carpenter	0c754d9d86	net: bcmasp: Fix error code in probe() Return an error code if bcmasp_interface_create() fails. Don't return success. Fixes: `490cb41200` ("net: bcmasp: Add support for ASP2.0 Ethernet controller") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Reviewed-by: Justin Chen <justin.chen@broadcom.com> Link: https://patch.msgid.link/ZoWKBkHH9D1fqV4r@stanley.mountain Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-05 17:13:57 -07:00
Jakub Kicinski	76ed626479	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/phy/aquantia/aquantia.h `219343755e` ("net: phy: aquantia: add missing include guards") `61578f6793` ("net: phy: aquantia: add support for PHY LEDs") drivers/net/ethernet/wangxun/libwx/wx_hw.c `bd07a98178` ("net: txgbe: remove separate irq request for MSI and INTx") `b501d261a5` ("net: txgbe: add FDIR ATR support") https://lore.kernel.org/all/20240703112936.483c1975@canb.auug.org.au/ include/linux/mlx5/mlx5_ifc.h `048a403648` ("net/mlx5: IFC updates for changing max EQs") `99be56171f` ("net/mlx5e: SHAMPO, Re-enable HW-GRO") https://lore.kernel.org/all/20240701133951.6926b2e3@canb.auug.org.au/ Adjacent changes: drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c `4130c67cd1` ("wifi: iwlwifi: mvm: check vif for NULL/ERR_PTR before dereference") `3f3126515f` ("wifi: iwlwifi: mvm: add mvm-specific guard") include/net/mac80211.h `816c6bec09` ("wifi: mac80211: fix BSS_CHANGED_UNSOL_BCAST_PROBE_RESP") `5a009b42e0` ("wifi: mac80211: track changes in AP's TPE") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-04 14:16:11 -07:00
Pavan Chebbi	5d350dc342	bnxt_en: Fix the resource check condition for RSS contexts While creating a new RSS context, bnxt_rfs_capable() currently makes a strict check to see if the required VNICs are already available. If the current VNICs are not what is required, either too many or not enough, it will call the firmware to reserve the exact number required. There is a bug in the firmware when the driver tries to relinquish some reserved VNICs and RSS contexts. It will cause the default VNIC to lose its RSS configuration and cause receive packets to be placed incorrectly. Workaround this problem by skipping the resource reduction. The driver will not reduce the VNIC and RSS context reservations when a context is deleted. The resources will be available for use when new contexts are created later. Potentially, this workaround can cause us to run out of VNIC and RSS contexts if there are a lot of VF functions creating and deleting RSS contexts. In the future, we will conditionally disable this workaround when the firmware fix is available. Fixes: `438ba39b25` ("bnxt_en: Improve RSS context reservation infrastructure") Reported-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/netdev/20240625010210.2002310-1-kuba@kernel.org/ Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240703180112.78590-1-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-04 07:40:27 -07:00
David Wei	40eca00ae6	bnxt_en: unlink page pool when stopping Rx queue Have bnxt call page_pool_disable_direct_recycling() to unlink the old page pool when resetting a queue prior to destroying it, instead of touching a netdev core struct directly. Signed-off-by: David Wei <dw@davidwei.uk> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-07-02 15:00:11 +02:00
Pavan Chebbi	0603383907	bnxt_en: Remove atomic operations on ptp->tx_avail Now that we require the spinlock to protect ptp->txts_prod, change ptp->tx_avail to non-atomic and protect it under the same spinlock. Add a new helper function bnxt_ptp_get_txts_prod() to decrement ptp->tx_avail under spinlock and return the producer. Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-07-01 11:23:22 +01:00
Pavan Chebbi	8aa2a79e9b	bnxt_en: Increase the max total outstanding PTP TX packets to 4 Start accepting up to 4 TX TS requests on BCM5750X (P5) chips. These PTP TX packets will be queued in the ptp->txts_req[] array waiting for the TX timestamp to complete. The entries in the array will be managed by a producer and consumer index. The producer index is updated under spinlock since multiple TX rings can try to send PTP packets at the same time. Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-07-01 11:23:21 +01:00
Pavan Chebbi	9bf688d40d	bnxt_en: Let bnxt_stamp_tx_skb() return error code Change the function bnxt_stamp_tx_skb() to return 0 for success or -EAGAIN if the timestamp is still pending in firmware. The calling PTP aux worker will reschedule based on the return code. Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-07-01 11:23:21 +01:00
Pavan Chebbi	573f2a4bfc	bnxt_en: Remove an impossible condition check for PTP TX pending SKB In the current 5750X PTP code paths, there is always at most one TX SKB requested for timestamp and we won't accept another one until we have retrieved the timestamp or it has timed out. Remove the unnecessary check in bnxt_get_tx_ts_p5() for a pending SKB and change the function to void. Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-07-01 11:23:21 +01:00
Pavan Chebbi	92595a0c02	bnxt_en: Refactor all PTP TX timestamp fields into a struct On the older 5750X (P5) chips, we currently support only 1 TX PTP packet in-flight waiting for the timestamp. Refactor the datastructures to prepare to support up to 4 TX PTP packets. Combine all fields required for PTP TX timestamp query into one structure. An array of this structure will be added in follow-on patches to support multiple outstanding TX timestamps. Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-07-01 11:23:21 +01:00
Pavan Chebbi	4d588d32b0	bnxt_en: Add BCM5760X specific PHC registers mapping BCM5760X firmware will advertise direct 64-bit PHC registers access for the driver from BAR0. Make the necessary changes in handling HWRM_PORT_MAC_PTP_QCFG's response and PHC register mapping for 5760X chips. Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-07-01 11:23:20 +01:00
Michael Chan	1d294b4f90	bnxt_en: Add TX timestamp completion logic The new BCM5760X chips will return the timestamp of TX packets in a new completion. Add logic in __bnxt_poll_work() to handle this completion type to retrieve the timestamp. This feature eliminates the limit on the number of in-flight PTP TX packets. Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-07-01 11:23:20 +01:00
Michael Chan	ba0155f1e9	bnxt_en: Allow some TX packets to be unprocessed in NAPI The driver's current logic will always free all the TX SKBs up to txr->tx_hw_cons within NAPI. In the next patches, we'll be adding logic to handle TX timestamp completion and we may need to hold some remaining TX SKBs if we don't have the timestamp completions yet. Modify __bnxt_poll_work_done() to clear each event bit separately to allow bnapi->tx_int() to decide whether to clear BNXT_TX_CMP_EVENT or not. bnapi->tx_int() will not clear BNXT_TX_CMP_EVENT if some TX SKBs are held waiting for TX timestamps. Note that legacy chips will never hold any SKBs this way. The SKB is always deferred to the PTP worker slow path to retrieve the timestamp from firmware. On the new P7 chips, the timestamp is returned by the hardware directly and we can retrieve it directly from NAPI. Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-07-01 11:23:20 +01:00
Michael Chan	449da97512	bnxt_en: Add is_ts_pkt field to struct bnxt_sw_tx_bd Remove the unused is_gso field and add the is_ts_pkt field to struct bnxt_sw_tx_bd. This field will mark the TX BD that has requested HW TX timestamp. The field needs to be cleared if the timestamp packet is later aborted. This field will be useful when processing the new TX timestamp completion from the hardware in the next patches. Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-07-01 11:23:20 +01:00
Michael Chan	be6b7ca3c2	bnxt_en: Add new TX timestamp completion definitions The new BCM5760X chips will generate this new TX timestamp completion when a TX packet's timestamp has been taken right before transmission. The driver logic to retrieve the timestamp will be added in the next few patches. Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-07-01 11:23:20 +01:00
Ghadi Elie Rahme	134061163e	bnx2x: Fix multiple UBSAN array-index-out-of-bounds Fix UBSAN warnings that occur when using a system with 32 physical cpu cores or more, or when the user defines a number of Ethernet queues greater than or equal to FP_SB_MAX_E1x using the num_queues module parameter. Currently there is a read/write out of bounds that occurs on the array "struct stats_query_entry query" present inside the "bnx2x_fw_stats_req" struct in "drivers/net/ethernet/broadcom/bnx2x/bnx2x.h". Looking at the definition of the "struct stats_query_entry query" array: struct stats_query_entry query[FP_SB_MAX_E1x+ BNX2X_FIRST_QUEUE_QUERY_IDX]; FP_SB_MAX_E1x is defined as the maximum number of fast path interrupts and has a value of 16, while BNX2X_FIRST_QUEUE_QUERY_IDX has a value of 3 meaning the array has a total size of 19. Since accesses to "struct stats_query_entry query" are offset-ted by BNX2X_FIRST_QUEUE_QUERY_IDX, that means that the total number of Ethernet queues should not exceed FP_SB_MAX_E1x (16). However one of these queues is reserved for FCOE and thus the number of Ethernet queues should be set to [FP_SB_MAX_E1x -1] (15) if FCOE is enabled or [FP_SB_MAX_E1x] (16) if it is not. This is also described in a comment in the source code in drivers/net/ethernet/broadcom/bnx2x/bnx2x.h just above the Macro definition of FP_SB_MAX_E1x. Below is the part of this explanation that is important for this patch /* * The total number of L2 queues, MSIX vectors and HW contexts (CIDs) is * control by the number of fast-path status blocks supported by the * device (HW/FW). Each fast-path status block (FP-SB) aka non-default * status block represents an independent interrupts context that can * serve a regular L2 networking queue. However special L2 queues such * as the FCoE queue do not require a FP-SB and other components like * the CNIC may consume FP-SB reducing the number of possible L2 queues * * If the maximum number of FP-SB available is X then: * a. 
If CNIC is supported it consumes 1 FP-SB thus the max number of * regular L2 queues is Y=X-1 * b. In MF mode the actual number of L2 queues is Y= (X-1/MF_factor) * c. If the FCoE L2 queue is supported the actual number of L2 queues * is Y+1 * d. The number of irqs (MSIX vectors) is either Y+1 (one extra for * slow-path interrupts) or Y+2 if CNIC is supported (one additional * FP interrupt context for the CNIC). * e. The number of HW context (CID count) is always X or X+1 if FCoE * L2 queue is supported. The cid for the FCoE L2 queue is always X. */ However this driver also supports NICs that use the E2 controller which can handle more queues due to having more FP-SB represented by FP_SB_MAX_E2. Looking at the commits when the E2 support was added, it was originally using the E1x parameters: commit `f2e0899f0f` ("bnx2x: Add 57712 support"). Back then FP_SB_MAX_E2 was set to 16 the same as E1x. However the driver was later updated to take full advantage of the E2 instead of having it be limited to the capabilities of the E1x. But as far as we can tell, the array "stats_query_entry query" was still limited to using the FP-SB available to the E1x cards as part of an oversight when the driver was updated to take full advantage of the E2, and now with the driver being aware of the greater queue size supported by E2 NICs, it causes the UBSAN warnings seen in the stack traces below. This patch increases the size of the "stats_query_entry query" array by replacing FP_SB_MAX_E1x with FP_SB_MAX_E2 to be large enough to handle both types of NICs. 
Stack traces: UBSAN: array-index-out-of-bounds in drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.c:1529:11 index 20 is out of range for type 'stats_query_entry [19]' CPU: 12 PID: 858 Comm: systemd-network Not tainted 6.9.0-060900rc7-generic #202405052133 Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 10/21/2019 Call Trace: <TASK> dump_stack_lvl+0x76/0xa0 dump_stack+0x10/0x20 __ubsan_handle_out_of_bounds+0xcb/0x110 bnx2x_prep_fw_stats_req+0x2e1/0x310 [bnx2x] bnx2x_stats_init+0x156/0x320 [bnx2x] bnx2x_post_irq_nic_init+0x81/0x1a0 [bnx2x] bnx2x_nic_load+0x8e8/0x19e0 [bnx2x] bnx2x_open+0x16b/0x290 [bnx2x] __dev_open+0x10e/0x1d0 RIP: 0033:0x736223927a0a Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89 RSP: 002b:00007ffc0bb2ada8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 0000583df50f9c78 RCX: 0000736223927a0a RDX: 0000000000000020 RSI: 0000583df50ee510 RDI: 0000000000000003 RBP: 0000583df50d4940 R08: 00007ffc0bb2adb0 R09: 0000000000000080 R10: 0000000000000000 R11: 0000000000000246 R12: 0000583df5103ae0 R13: 000000000000035a R14: 0000583df50f9c30 R15: 0000583ddddddf00 </TASK> ---[ end trace ]--- ------------[ cut here ]------------ UBSAN: array-index-out-of-bounds in drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.c:1546:11 index 28 is out of range for type 'stats_query_entry [19]' CPU: 12 PID: 858 Comm: systemd-network Not tainted 6.9.0-060900rc7-generic #202405052133 Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 10/21/2019 Call Trace: <TASK> dump_stack_lvl+0x76/0xa0 dump_stack+0x10/0x20 __ubsan_handle_out_of_bounds+0xcb/0x110 bnx2x_prep_fw_stats_req+0x2fd/0x310 [bnx2x] bnx2x_stats_init+0x156/0x320 [bnx2x] bnx2x_post_irq_nic_init+0x81/0x1a0 [bnx2x] bnx2x_nic_load+0x8e8/0x19e0 [bnx2x] bnx2x_open+0x16b/0x290 [bnx2x] __dev_open+0x10e/0x1d0 RIP: 
0033:0x736223927a0a Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89 RSP: 002b:00007ffc0bb2ada8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 0000583df50f9c78 RCX: 0000736223927a0a RDX: 0000000000000020 RSI: 0000583df50ee510 RDI: 0000000000000003 RBP: 0000583df50d4940 R08: 00007ffc0bb2adb0 R09: 0000000000000080 R10: 0000000000000000 R11: 0000000000000246 R12: 0000583df5103ae0 R13: 000000000000035a R14: 0000583df50f9c30 R15: 0000583ddddddf00 </TASK> ---[ end trace ]--- ------------[ cut here ]------------ UBSAN: array-index-out-of-bounds in drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c:1895:8 index 29 is out of range for type 'stats_query_entry [19]' CPU: 13 PID: 163 Comm: kworker/u96:1 Not tainted 6.9.0-060900rc7-generic #202405052133 Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 10/21/2019 Workqueue: bnx2x bnx2x_sp_task [bnx2x] Call Trace: <TASK> dump_stack_lvl+0x76/0xa0 dump_stack+0x10/0x20 __ubsan_handle_out_of_bounds+0xcb/0x110 bnx2x_iov_adjust_stats_req+0x3c4/0x3d0 [bnx2x] bnx2x_storm_stats_post.part.0+0x4a/0x330 [bnx2x] ? bnx2x_hw_stats_post+0x231/0x250 [bnx2x] bnx2x_stats_start+0x44/0x70 [bnx2x] bnx2x_stats_handle+0x149/0x350 [bnx2x] bnx2x_attn_int_asserted+0x998/0x9b0 [bnx2x] bnx2x_sp_task+0x491/0x5c0 [bnx2x] process_one_work+0x18d/0x3f0 </TASK> ---[ end trace ]--- Fixes: `50f0a562f8` ("bnx2x: add fcoe statistics") Signed-off-by: Ghadi Elie Rahme <ghadi.rahme@canonical.com> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20240627111405.1037812-1-ghadi.rahme@canonical.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-28 18:19:05 -07:00
David Wei	2d694c27d3	bnxt_en: implement netdev_queue_mgmt_ops Implement netdev_queue_mgmt_ops for bnxt added in [1]. Two bnxt_rx_ring_info structs are allocated to hold the new/old queue memory. Queue memory is copied from/to the main bp->rx_ring[idx] bnxt_rx_ring_info. Queue memory is pre-allocated in bnxt_queue_mem_alloc() into a clone, and then copied into bp->rx_ring[idx] in bnxt_queue_mem_start(). Similarly, when bp->rx_ring[idx] is stopped its queue memory is copied into a clone, and then freed later in bnxt_queue_mem_free(). I tested this patchset with netdev_rx_queue_restart(), including inducing errors in all places that returns an error code. In all cases, the queue is left in a good working state. Rx queues are created/destroyed using bnxt_hwrm_rx_ring_alloc() and bnxt_hwrm_rx_ring_free(), which issue HWRM_RING_ALLOC and HWRM_RING_FREE commands respectively to the firmware. By the time a HWRM_RING_FREE response is received, there won't be any more completions from that queue. Thanks to Somnath for helping me with this patch. With their permission I've added them as Acked-by. [1]: https://lore.kernel.org/netdev/20240501232549.1327174-2-shailend@google.com/ Acked-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: David Wei <dw@davidwei.uk> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-06-21 10:10:33 +01:00
David Wei	88f56254a2	bnxt_en: split rx ring helpers out from ring helpers To prepare for queue API implementation, split rx ring functions out from ring helpers. These new helpers will be called from queue API implementation. Signed-off-by: David Wei <dw@davidwei.uk> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-06-21 10:10:33 +01:00
Jakub Kicinski	a6ec08beec	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/ethernet/broadcom/bnxt/bnxt.c `1e7962114c` ("bnxt_en: Restore PTP tx_avail count in case of skb_pad() error") `165f87691a` ("bnxt_en: add timestamping statistics support") No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-20 13:49:59 -07:00
Pavan Chebbi	1e7962114c	bnxt_en: Restore PTP tx_avail count in case of skb_pad() error The current code only restores PTP tx_avail count when we get DMA mapping errors. Fix it so that the PTP tx_avail count will be restored for both DMA mapping errors and skb_pad() errors. Otherwise PTP TX timestamp will not be available after a PTP packet hits the skb_pad() error. Fixes: `83bb623c96` ("bnxt_en: Transmit and retrieve packet timestamps") Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240618215313.29631-4-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-20 06:46:15 -07:00
Michael Chan	b7bfcb4c7c	bnxt_en: Set TSO max segs on devices with limits Firmware will now advertise a non-zero TSO max segments if the device has a limit. 0 means no limit. The latest 5760X chip (early revs) has a limit of 2047 that cannot be exceeded. If exceeded, the chip will send out just a small number of segments. Call netif_set_tso_max_segs() if the device has a limit. Fixes: `2012a6abc8` ("bnxt_en: Add 5760X (P7) PCI IDs") Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20240618215313.29631-3-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-20 06:46:08 -07:00
Michael Chan	8ad0440992	bnxt_en: Update firmware interface to 1.10.3.44 The relevant change is the max_tso_segs value returned by firmware in the HWRM_FUNC_QCAPS response. This value will be used in the next patch to cap the TSO segments. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20240618215313.29631-2-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-20 06:46:08 -07:00
Greg Kroah-Hartman	b5dd424181	Linux 6.10-rc4 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmZvTbAeHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGVksIAJEn4a9IVM8FNCJy Dxo0BItD1/qJ5mLDptqUFRKlxInjbojofz5CyoeIeXb0DwRfB16ALXqNXAkd3APi saoOpfjFsg2H2OqL9CHdkzWcJEAq2lDnL0zaOjumeDVu/EyeT+tC4e4hq1e6Bm0E fPC5ms2b+07DF9Rg6/DW8yPbdM5n6Mz1bRd3fQOIgvpM3yGOyGztEBgTRub/ZUgH 5pNJauknFAZgdiWhgNpc+lPWYZbgHKULQPhUBPdVhDIXPtQNUlKgNTQc6+L0Nmbb K1sG1q7FLeMJOTFGQfD4r26X5DNQUi894q/9SX8X7rcrECdJKcw2WjVyB4myADpf ae2gP+A= =XjWP -----END PGP SIGNATURE----- Merge tag 'v6.10-rc4' into driver-core-next We need the driver core and sysfs fixes in here to build on top of. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-06-17 08:33:41 +02:00
Jakub Kicinski	4c7d3d79c7	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR. No conflicts, no adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-13 13:13:46 -07:00
Aleksandr Mishin	a9b9741854	bnxt_en: Adjust logging of firmware messages in case of released token in __hwrm_send() In case of token is released due to token->state == BNXT_HWRM_DEFERRED, released token (set to NULL) is used in log messages. This issue is expected to be prevented by HWRM_ERR_CODE_PF_UNAVAILABLE error code. But this error code is returned by recent firmware. So some firmware may not return it. This may lead to NULL pointer dereference. Adjust this issue by adding token pointer check. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: `8fa4219dba` ("bnxt_en: add dynamic debug support for HWRM messages") Suggested-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20240611082547.12178-1-amishin@t-argos.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-13 08:05:46 -07:00
Michael Chan	7d9df38c9c	bnxt_en: Cap the size of HWRM_PORT_PHY_QCFG forwarded response Firmware interface 1.10.2.118 has increased the size of HWRM_PORT_PHY_QCFG response beyond the maximum size that can be forwarded. When the VF's link state is not the default auto state, the PF will need to forward the response back to the VF to indicate the forced state. This regression may cause the VF to fail to initialize. Fix it by capping the HWRM_PORT_PHY_QCFG response to the maximum 96 bytes. The SPEEDS2_SUPPORTED flag needs to be cleared because the new speeds2 fields are beyond the legacy structure. Also modify bnxt_hwrm_fwd_resp() to print a warning if the message size exceeds 96 bytes to make this failure more obvious. Fixes: `84a911db83` ("bnxt_en: Update firmware interface to 1.10.2.118") Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20240612231736.57823-1-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-13 07:50:16 -07:00
Greg Kroah-Hartman	ff985c7597	auxbus: make to_auxiliary_drv accept and return a constant pointer In the quest to make struct device constant, start by making to_auxiliary_drv() return a constant pointer so that drivers that call this can be fixed up before the driver core changes. As the return type previously was not constant, also fix up all callers that were assuming that the pointer was not going to be a constant one in order to not break the build. Cc: Dave Ertman <david.m.ertman@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Bingbu Cao <bingbu.cao@intel.com> Cc: Tianshu Qiu <tian.shu.qiu@intel.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Michael Chan <michael.chan@broadcom.com> Cc: David S. Miller <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Jesse Brandeburg <jesse.brandeburg@intel.com> Cc: Tony Nguyen <anthony.l.nguyen@intel.com> Cc: Saeed Mahameed <saeedm@nvidia.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: Tariq Toukan <tariqt@nvidia.com> Cc: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Cc: Liam Girdwood <lgirdwood@gmail.com> Cc: Peter Ujfalusi <peter.ujfalusi@linux.intel.com> Cc: Bard Liao <yung-chuan.liao@linux.intel.com> Cc: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Cc: Daniel Baluta <daniel.baluta@nxp.com> Cc: Kai Vehmanen <kai.vehmanen@linux.intel.com> Cc: Jaroslav Kysela <perex@perex.cz> Cc: Takashi Iwai <tiwai@suse.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: linux-media@vger.kernel.org Cc: netdev@vger.kernel.org Cc: intel-wired-lan@lists.osuosl.org Cc: linux-rdma@vger.kernel.org Cc: sound-open-firmware@alsa-project.org Cc: linux-sound@vger.kernel.org Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com> # drivers/media/pci/intel/ipu6 Acked-by: Mark Brown <broonie@kernel.org> Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com> Link: https://lore.kernel.org/r/20240611130103.3262749-7-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-06-13 16:43:26 +02:00
Vadim Fedorenko	c790275b5e	bnxt_en: fix atomic counter for ptp packets atomic_dec_if_positive returns new value regardless if it is updated or not. The commit in fixes changed the behavior of the condition to one that differs from original code. Restore original condition to properly maintain atomic counter. Fixes: `165f87691a` ("bnxt_en: add timestamping statistics support") Reviewed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Vadim Fedorenko <vadfed@meta.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240604091939.785535-1-vadfed@meta.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-05 12:52:42 -07:00
Vadim Fedorenko	165f87691a	bnxt_en: add timestamping statistics support The ethtool_ts_stats structure was introduced earlier this year. Now it's time to support this group of counters in more drivers. This patch adds support to bnxt driver. Signed-off-by: Vadim Fedorenko <vadfed@meta.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20240530204751.99636-1-vadfed@meta.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-01 16:02:51 -07:00
Vadim Fedorenko	38155539a1	bnxt_en: silence clang build warning Clang build brings a warning: ../drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c:133:12: warning: comparison of distinct pointer types ('typeof (tmo_us) ' (aka 'unsigned int ') and 'typeof (65535) ' (aka 'int ')) [-Wcompare-distinct-pointer-types] 133 \| tmo_us = min(tmo_us, BNXT_PTP_QTS_MAX_TMO_US); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fix it by specifying proper type for BNXT_PTP_QTS_MAX_TMO_US. Fixes: `7de3c2218e` ("bnxt_en: Add a timeout parameter to bnxt_hwrm_port_ts_query()") Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20240509151833.12579-1-vadim.fedorenko@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-05-10 18:16:35 -07:00
Eric Dumazet	1eb2cded45	net: annotate writes on dev->mtu from ndo_change_mtu() Simon reported that ndo_change_mtu() methods were never updated to use WRITE_ONCE(dev->mtu, new_mtu) as hinted in commit `501a90c945` ("inet: protect against too small mtu values.") We read dev->mtu without holding RTNL in many places, with READ_ONCE() annotations. It is time to take care of ndo_change_mtu() methods to use corresponding WRITE_ONCE() Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Simon Horman <horms@kernel.org> Closes: https://lore.kernel.org/netdev/20240505144608.GB67882@kernel.org/ Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Shannon Nelson <shannon.nelson@amd.com> Link: https://lore.kernel.org/r/20240506102812.3025432-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-05-07 16:19:14 -07:00
David Wei	5bfadc5737	bnxt: fix bnxt_get_avail_msix() returning negative values Current net-next/main does not boot for older chipsets e.g. Stratus. Sample dmesg: [ 11.368315] bnxt_en 0000:02:00.0 (unnamed net_device) (uninitialized): Able to reserve only 0 out of 9 requested RX rings [ 11.390181] bnxt_en 0000:02:00.0 (unnamed net_device) (uninitialized): Unable to reserve tx rings [ 11.438780] bnxt_en 0000:02:00.0 (unnamed net_device) (uninitialized): 2nd rings reservation failed. [ 11.487559] bnxt_en 0000:02:00.0 (unnamed net_device) (uninitialized): Not enough rings available. [ 11.506012] bnxt_en 0000:02:00.0: probe with driver bnxt_en failed with error -12 This is caused by bnxt_get_avail_msix() returning a negative value for these chipsets not using the new resource manager i.e. !BNXT_NEW_RM. This in turn causes hwr.cp in __bnxt_reserve_rings() to be set to 0. In the current call stack, __bnxt_reserve_rings() is called from bnxt_set_dflt_rings() before bnxt_init_int_mode(). Therefore, bp->total_irqs is always 0 and for !BNXT_NEW_RM bnxt_get_avail_msix() always returns a negative number. Historically, MSIX vectors were requested by the RoCE driver during run-time and bnxt_get_avail_msix() was used for this purpose. Today, RoCE MSIX vectors are statically allocated. bnxt_get_avail_msix() should only be called for the BNXT_NEW_RM() case to reserve the MSIX ahead of time for RoCE use. bnxt_get_avail_msix() is also simplified to handle the BNXT_NEW_RM() case only. Fixes: `d630624ebd` ("bnxt_en: Utilize ulp client resources if RoCE is not registered") Signed-off-by: David Wei <dw@davidwei.uk> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20240502203757.3761827-1-dw@davidwei.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-05-03 16:04:04 -07:00
Jakub Kicinski	e958da0ddb	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR. Conflicts: include/linux/filter.h kernel/bpf/core.c `66e13b615a` ("bpf: verifier: prevent userspace memory access") `d503a04f8b` ("bpf: Add support for certain atomics in bpf_arena to x86 JIT") https://lore.kernel.org/all/20240429114939.210328b0@canb.auug.org.au/ No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-05-02 12:06:25 -07:00
Ajit Khaparde	54d0b84f40	bnxt_en: Add VF PCI ID for 5760X (P7) chips No driver logic changes are required to support the VFs, so just add the VF PCI ID. Reviewed-by: Selvin Thyparampil Xavier <selvin.xavier@broadcom.com> Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240501003056.100607-7-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-05-02 07:27:21 -07:00
Kalesh AP	3c163f35bd	bnxt_en: Optimize recovery path ULP locking in the driver In the error recovery path (AER, firmware recovery, etc), the driver notifies the RoCE driver via ULP_STOP before the reset and via ULP_START after the reset, all under RTNL_LOCK. The RoCE driver can take a long time if there are a lot of QPs to destroy, so it is not ideal to hold the global RTNL lock. Rely on the new en_dev_lock mutex instead for ULP_STOP and ULP_START. For the most part, we move the ULP_STOP call before we take the RTNL lock and move the ULP_START after RTNL unlock. Note that SRIOV re-enablement must be done after ULP_START or RoCE on the VFs will not resume properly after reset. The one scenario in bnxt_hwrm_if_change() where the RTNL lock is already taken in the .ndo_open() context requires the ULP restart to be deferred to the bnxt_sp_task() workqueue. Reviewed-by: Selvin Thyparampil Xavier <selvin.xavier@broadcom.com> Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240501003056.100607-6-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-05-02 07:27:21 -07:00
Kalesh AP	de21ec442d	bnxt_en: Add a mutex to synchronize ULP operations The current scheme relies heavily on the RTNL lock for all ULP operations between the L2 and the RoCE driver. Add a new en_dev_lock mutex so that the asynchronous ULP_STOP and ULP_START operations can be serialized with bnxt_register_dev() and bnxt_unregister_dev() calls without relying on the RTNL lock. The next patch will remove the RTNL lock from the ULP_STOP and ULP_START calls. Reviewed-by: Selvin Thyparampil Xavier <selvin.xavier@broadcom.com> Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240501003056.100607-5-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-05-02 07:27:20 -07:00
Michael Chan	f79d7a9f1c	bnxt_en: Don't call ULP_STOP/ULP_START during L2 reset There is no need to call ULP_STOP and ULP_START before and after the L2 reset in bnxt_reset_task(). This L2 reset is done after detecting TX timeout, RX ring errors, or VF config changes. The L2 reset does not affect RoCE since the firmware is not reset and the backing store is left alone. Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240501003056.100607-4-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-05-02 07:27:20 -07:00
Kalesh AP	895621f1c8	bnxt_en: Don't support offline self test when RoCE driver is loaded Offline self test is a very disruptive operation for RoCE and requires all active QPs to be destroyed. With a large number of QPs, it can take a long time to destroy all the QPs and can timeout. Do not allow ethtool offline self test if the RoCE driver is registered on the device. Reviewed-by: Selvin Thyparampil Xavier <selvin.xavier@broadcom.com> Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240501003056.100607-3-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-05-02 07:27:20 -07:00
Edwin Peer	a75fbb3aa4	bnxt_en: share NQ ring sw_stats memory with subrings On P5_PLUS chips and later, the NQ rings have subrings for RX and TX completions respectively. These subrings are passed to the poll function instead of the base NQ, but each ring carries its own copy of the software ring statistics. For stats to be conveniently accessible in __bnxt_poll_work(), the statistics memory should either be shared between the NQ and its subrings or the subrings need to be included in the ethtool stats aggregation logic. This patch opts for the former, because it's more efficient and less confusing having the software statistics for a ring exist in a single place. Before this patch, the counter will not be displayed if the "wrong" cpr->sw_stats was used to increment a counter. Link: https://lore.kernel.org/netdev/CACKFLikEhVAJA+osD7UjQNotdGte+fth7zOy7yDdLkTyFk9Pyw@mail.gmail.com/ Signed-off-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240501003056.100607-2-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-05-02 07:27:20 -07:00
Doug Berger	0d5e2a8223	net: bcmgenet: synchronize UMAC_CMD access The UMAC_CMD register is written from different execution contexts and has insufficient synchronization protections to prevent possible corruption. Of particular concern are the accesses from the phy_device delayed work context used by the adjust_link call and the BH context that may be used by the ndo_set_rx_mode call. A spinlock is added to the driver to protect contended register accesses (i.e. reg_lock) and it is used to synchronize accesses to UMAC_CMD. Fixes: `1c1008c793` ("net: bcmgenet: add main driver file") Cc: stable@vger.kernel.org Signed-off-by: Doug Berger <opendmb@gmail.com> Acked-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-04-29 06:24:22 +01:00
Doug Berger	2dbe5f1936	net: bcmgenet: synchronize use of bcmgenet_set_rx_mode() The ndo_set_rx_mode function is synchronized with the netif_addr_lock spinlock and BHs disabled. Since this function is also invoked directly from the driver the same synchronization should be applied. Fixes: `72f9634762` ("net: bcmgenet: set Rx mode before starting netif") Cc: stable@vger.kernel.org Signed-off-by: Doug Berger <opendmb@gmail.com> Acked-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-04-29 06:24:22 +01:00
Doug Berger	d85cf67a33	net: bcmgenet: synchronize EXT_RGMII_OOB_CTRL access The EXT_RGMII_OOB_CTRL register can be written from different contexts. It is predominantly written from the adjust_link handler which is synchronized by the phydev->lock, but can also be written from a different context when configuring the mii in bcmgenet_mii_config(). The chances of contention are quite low, but it is conceivable that adjust_link could occur during resume when WoL is enabled so use the phydev->lock synchronizer in bcmgenet_mii_config() to be sure. Fixes: `afe3f907d2` ("net: bcmgenet: power on MII block for all MII modes") Cc: stable@vger.kernel.org Signed-off-by: Doug Berger <opendmb@gmail.com> Acked-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-04-29 06:24:21 +01:00
Jakub Kicinski	2bd87951de	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/ethernet/ti/icssg/icssg_prueth.c net/mac80211/chan.c `89884459a0` ("wifi: mac80211: fix idle calculation with multi-link") `87f5500285` ("wifi: mac80211: simplify ieee80211_assign_link_chanctx()") https://lore.kernel.org/all/20240422105623.7b1fbda2@canb.auug.org.au/ net/unix/garbage.c `1971d13ffa` ("af_unix: Suppress false-positive lockdep splat for spin_lock() in __unix_gc().") `4090fa373f` ("af_unix: Replace garbage collection algorithm.") drivers/net/ethernet/ti/icssg/icssg_prueth.c drivers/net/ethernet/ti/icssg/icssg_common.c `4dcd0e83ea` ("net: ti: icssg-prueth: Fix signedness bug in prueth_init_rx_chns()") `e2dc7bfd67` ("net: ti: icssg-prueth: Move common functions into a separate file") No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-04-25 12:41:37 -07:00
Peter Münster	e3eb7dd47b	net: b44: set pause params only when interface is up b44_free_rings() accesses b44::rx_buffers (and ::tx_buffers) unconditionally, but b44::rx_buffers is only valid when the device is up (they get allocated in b44_open(), and deallocated again in b44_close()), any other time these are just NULL pointers. So if you try to change the pause params while the network interface is disabled/administratively down, everything explodes (which likely netifd tries to do). Link: https://github.com/openwrt/openwrt/issues/13789 Fixes: `1da177e4c3` (Linux-2.6.12-rc2) Cc: stable@vger.kernel.org Reported-by: Peter Münster <pm@a16n.net> Suggested-by: Jonas Gorski <jonas.gorski@gmail.com> Signed-off-by: Vaclav Svoboda <svoboda@neng.cz> Tested-by: Peter Münster <pm@a16n.net> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Peter Münster <pm@a16n.net> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/87y192oolj.fsf@a16n.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-04-25 08:34:18 -07:00
Jakub Kicinski	7301177307	eth: bnxt: fix counting packets discarded due to OOM and netpoll I added OOM and netpoll discard counters, naively assuming that the cpr pointer is pointing to a common completion ring. Turns out that is usually a completion ring but not the completion ring which bnapi->cp_ring points to. bnapi->cp_ring is where the stats are read from, so we end up reporting 0 thru ethtool -S and qstat even though the drop events have happened. Make 100% sure we're recording statistics in the correct structure. Fixes: `907fd4a294` ("bnxt: count discards due to memory allocation errors") Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20240424002148.3937059-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-04-24 20:16:43 -07:00
Jakub Kicinski	21d9f921f8	Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== ice: Support 5 layer Tx scheduler topology Mateusz Polchlopek says: For performance reasons there is a need to have support for selectable Tx scheduler topology. Currently firmware supports only the default 9-layer and 5-layer topology. This patch series enables switch from default to 5-layer topology, if user decides to opt-in. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: ice: Document tx_scheduling_layers parameter ice: Add tx_scheduling_layers devlink param ice: Enable switching default Tx scheduler topology ice: Adjust the VSI/Aggregator layers ice: Support 5 layer topology devlink: extend devlink_param *set pointer ==================== Link: https://lore.kernel.org/r/20240422203913.225151-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-04-24 20:05:31 -07:00
Asbjørn Sloth Tønnesen	3833e4834d	bnxt_en: flower: validate control flags This driver currently doesn't support any control flags. Use flow_rule_match_has_control_flags() to check for control flags, such as can be set through `tc flower ... ip_flags frag`. In case any control flags are masked, flow_rule_match_has_control_flags() sets a NL extended error message, and we return -EOPNOTSUPP. Only compile-tested. Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Tested-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Link: https://lore.kernel.org/r/20240422152626.175569-1-ast@fiberby.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-04-24 19:57:19 -07:00
Mateusz Polchlopek	5625ca5640	devlink: extend devlink_param set pointer Extend devlink_param set function pointer to take extack as a param. Sometimes it is needed to pass information to the end user from set function. It is more proper to use for that netlink instead of passing message to dmesg. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2024-04-22 13:05:19 -07:00
Michael Chan	41e54045b7	bnxt_en: Fix error recovery for 5760X (P7) chips During error recovery, such as AER fatal error slot reset, we call bnxt_try_map_fw_health_reg() to try to get access to the health register to determine the firmware state. Fix bnxt_try_map_fw_health_reg() to recognize the P7 chip correctly and set up the health register. This fixes this type of AER slot reset failure: bnxt_en 0000:04:00.0: AER: PCIe Bus Error: severity=Uncorrectable (Fatal), type=Inaccessible, (Unregistered Agent ID) bnxt_en 0000:04:00.0 enp4s0f0np0: PCI I/O error detected bnxt_en 0000:04:00.0 bnxt_re0: Handle device suspend call bnxt_en 0000:04:00.1 enp4s0f1np1: PCI I/O error detected bnxt_en 0000:04:00.1 bnxt_re1: Handle device suspend call pcieport 0000:00:02.0: AER: Root Port link has been reset (0) bnxt_en 0000:04:00.0 enp4s0f0np0: PCI Slot Reset bnxt_en 0000:04:00.0: enabling device (0000 -> 0002) bnxt_en 0000:04:00.0: Firmware not ready bnxt_en 0000:04:00.1 enp4s0f1np1: PCI Slot Reset bnxt_en 0000:04:00.1: enabling device (0000 -> 0002) bnxt_en 0000:04:00.1: Firmware not ready pcieport 0000:00:02.0: AER: device recovery failed Fixes: `a432a45bdb` ("bnxt_en: Define basic P7 macros") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-04-22 14:13:18 +01:00
Vikas Gupta	a1acdc226b	bnxt_en: Fix the PCI-AER routines We do not support two simultaneous recoveries so check for reset flag, BNXT_STATE_IN_FW_RESET, and do not proceed with AER further. When the pci channel state is pci_channel_io_frozen, the PCIe link can not be trusted so we disable the traffic immediately and stop BAR access by calling bnxt_fw_fatal_close(). BAR access after AER fatal error can cause an NMI. Fixes: `f75d9a0aa9` ("bnxt_en: Re-write PCI BARs after PCI fatal error.") Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-04-22 14:13:18 +01:00
Vikas Gupta	7474b1c82b	bnxt_en: refactor reset close code Introduce bnxt_fw_fatal_close() API which can be used to stop data path and disable device when firmware is in fatal state. Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-04-22 14:13:18 +01:00
Justin Chen	9f898fc2c3	net: bcmasp: fix memory leak when bringing down interface When bringing down the TX rings we flush the rings but forget to reclaim the flushed packets. This leads to a memory leak since we do not free the dma mapped buffers. This also leads to tx control block corruption when bringing down the interface for power management. Fixes: `490cb41200` ("net: bcmasp: Add support for ASP2.0 Ethernet controller") Signed-off-by: Justin Chen <justin.chen@broadcom.com> Acked-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240418180541.2271719-1-justin.chen@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-04-19 20:32:29 -07:00
Jakub Kicinski	94426ed213	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR. Conflicts: net/unix/garbage.c `47d8ac011f` ("af_unix: Fix garbage collector racing against connect()") `4090fa373f` ("af_unix: Replace garbage collection algorithm.") Adjacent changes: drivers/net/ethernet/broadcom/bnxt/bnxt.c `faa12ca245` ("bnxt_en: Reset PTP tx_avail after possible firmware reset") `b3d0083caf` ("bnxt_en: Support RSS contexts in ethtool .{get\|set}_rxfh()") drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c `7ac10c7d72` ("bnxt_en: Fix possible memory leak in bnxt_rdma_aux_device_init()") `194fad5b27` ("bnxt_en: Refactor bnxt_rdma_aux_device_init/uninit functions") drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c `958f56e483` ("net/mlx5e: Un-expose functions in en.h") `49e6c93870` ("net/mlx5e: RSS, Block XOR hash with over 128 channels") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-04-11 14:23:47 -07:00
Michael Chan	008ce0fd39	bnxt_en: Update MODULE_DESCRIPTION Update MODULE_DESCRIPTION to the more generic adapter family name. The old name only includes the first generation of supported adapters. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20240409215431.41424-8-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-04-10 19:55:06 -07:00
Vikas Gupta	d630624ebd	bnxt_en: Utilize ulp client resources if RoCE is not registered If the RoCE driver is not registered for a RoCE capable device, add flexibility to use the RoCE resources (MSIX/NQs) for L2 purposes, such as additional rings configured by the user or for XDP. Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20240409215431.41424-7-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-04-10 19:55:06 -07:00
Vikas Gupta	2e4592dc9b	bnxt_en: Change MSIX/NQs allocation policy The existing scheme sets aside a number of MSIX/NQs for the RoCE driver whether the RoCE driver is registered or not. This scheme is not flexible and limits the resources available for the L2 rings if RoCE is never used. Modify the scheme so that the RoCE MSIX/NQs can be used by the L2 driver if they are not used for RoCE. The MSIX/NQs are now represented by 3 fields. bp->ulp_num_msix_want contains the desired default value, edev->ulp_num_msix_vec contains the available value (but not necessarily in use), and ulp_tbl->msix_requested contains the actual value in use by RoCE. The L2 driver can dip into edev->ulp_num_msix_vec if necessary. We need to add rtnl_lock() back in bnxt_register_dev() and bnxt_unregister_dev() to synchronize the MSIX usage between L2 and RoCE. Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20240409215431.41424-6-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-04-10 19:55:06 -07:00
Vikas Gupta	194fad5b27	bnxt_en: Refactor bnxt_rdma_aux_device_init/uninit functions In its current form, bnxt_rdma_aux_device_init() not only initializes the necessary data structures of the newly created aux device but also adds the aux device into the aux bus subsystem. Refactor the logic into separate functions, first function to initialize the aux device along with the required resources and second, to actually add the device to the aux bus subsystem. This separation helps to create bnxt_en_dev much earlier and save its resources separately. Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20240409215431.41424-5-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-04-10 19:55:05 -07:00
Vikas Gupta	b58f5a9c70	bnxt_en: Remove unneeded MSIX base structure fields and code Ever since commit: `3034322113` ("bnxt_en: Remove runtime interrupt vector allocation") The MSIX base vector is effectively always 0. Remove all unneeded structure fields and code referencing the MSIX base. Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20240409215431.41424-4-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-04-10 19:55:05 -07:00
Kalesh AP	43226dccd1	bnxt_en: Remove a redundant NULL check in bnxt_register_dev() The memory for "edev->ulp_tbl" is allocated inside the bnxt_rdma_aux_device_init() function. If it fails, the driver will not create the auxiliary device for RoCE. Hence the NULL check inside bnxt_register_dev() is unnecessary. Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20240409215431.41424-3-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-04-10 19:55:05 -07:00
Pavan Chebbi	17b0dfa1f3	bnxt_en: Skip ethtool RSS context configuration in ifdown state The current implementation requires the ifstate to be up when configuring the RSS contexts. It will try to fetch the RX ring IDs and will crash if it is in ifdown state. Return error if !netif_running() to prevent the crash. An improved implementation is in the works to allow RSS contexts to be changed while in ifdown state. Fixes: `b3d0083caf` ("bnxt_en: Support RSS contexts in ethtool .{get\|set}_rxfh()") Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20240409215431.41424-2-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-04-10 19:55:05 -07:00
Pavan Chebbi	faa12ca245	bnxt_en: Reset PTP tx_avail after possible firmware reset It is possible that during error recovery and firmware reset, there is a pending TX PTP packet waiting for the timestamp. We need to reset this condition so that after recovery, the tx_avail count for PTP is reset back to the initial value. Otherwise, we may not accept any PTP TX timestamps after recovery. Fixes: `118612d519` ("bnxt_en: Add PTP clock APIs, ioctls, and ethtool methods") Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-04-08 13:55:47 +01:00

4368 Commits