Separate the HDS config from the ethtool state struct.
The HDS config contains just simple parameters, not state.
Having it as a separate struct will make it easier to clone / copy
and also long term potentially make it per-queue.
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250119020518.1962249-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
W=1 builds with gcc 14.2.1 report:
drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c:4193:32: error: ‘%s’ directive output may be truncated writing up to 31 bytes into a region of size 27 [-Werror=format-truncation=]
4193 | "/pkg %s", buf);
It's upset that we let buf be full length but then we use 5
characters for "/pkg ".
The builds is also clear with clang version 19.1.5 now.
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250117183726.1481524-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The bnxt_en driver has configured the hds_threshold value automatically
when TPA is enabled based on the rx-copybreak default value.
Now the hds-thresh ethtool command is added, so it adds an
implementation of hds-thresh option.
Configuration of the hds-thresh is applied only when
the tcp-data-split is enabled. The default value of
hds-thresh is 256, which is the default value of
rx-copybreak, which used to be the hds_thresh value.
The maximum hds-thresh is 1023.
# Example:
# ethtool -G enp14s0f0np0 tcp-data-split on hds-thresh 256
# ethtool -g enp14s0f0np0
Ring parameters for enp14s0f0np0:
Pre-set maximums:
...
HDS thresh: 1023
Current hardware settings:
...
TCP data split: on
HDS thresh: 256
Tested-by: Stanislav Fomichev <sdf@fomichev.me>
Tested-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250114142852.3364986-9-ap420073@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
NICs that uses bnxt_en driver supports tcp-data-split feature by the
name of HDS(header-data-split).
But there is no implementation for the HDS to enable by ethtool.
Only getting the current HDS status is implemented and The HDS is just
automatically enabled only when either LRO, HW-GRO, or JUMBO is enabled.
The hds_threshold follows rx-copybreak value. and it was unchangeable.
This implements `ethtool -G <interface name> tcp-data-split <value>`
command option.
The value can be <on> and <auto>.
The value is <auto> and one of LRO/GRO/JUMBO is enabled, HDS is
automatically enabled and all LRO/GRO/JUMBO are disabled, HDS is
automatically disabled.
HDS feature relies on the aggregation ring.
So, if HDS is enabled, the bnxt_en driver initializes the aggregation ring.
This is the reason why BNXT_FLAG_AGG_RINGS contains HDS condition.
Acked-by: Jakub Kicinski <kuba@kernel.org>
Tested-by: Stanislav Fomichev <sdf@fomichev.me>
Tested-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250114142852.3364986-8-ap420073@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The bnxt_en driver supports rx-copybreak, but it couldn't be set by
userspace. Only the default value(256) has worked.
This patch makes the bnxt_en driver support following command.
`ethtool --set-tunable <devname> rx-copybreak <value> ` and
`ethtool --get-tunable <devname> rx-copybreak`.
By this patch, hds_threshol is set to the rx-copybreak value.
But it will be set by `ethtool -G eth0 hds-thresh N`
in the next patch.
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Tested-by: Stanislav Fomichev <sdf@fomichev.me>
Tested-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250114142852.3364986-7-ap420073@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Convert the Broadcom ASP2 driver to use phylib managed EEE support.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Tested-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/E1tXk81-000r4x-TS@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Phylib maintains a copy of tx_lpi_enabled, which will be used to
populate the member when phy_ethtool_get_eee(). Therefore, writing to
this member before phy_ethtool_get_eee() will have no effect. Remove
it. Also remove setting our copy of info->eee.tx_lpi_enabled which
becomes write-only.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Tested-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/E1tXk7w-000r4r-Pq@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix the LPI timer handling in Broadcom ASP2 driver after the phylib
managed EEE patches were merged.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Tested-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/E1tXk7r-000r4l-Li@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ulp_irq_stop() can be invoked from a context where FW is healthy or
when FW is in a reset state. In the latter case, ULP must stop all
interactions with HW/FW and also with application and stack. Added a
new parameter to the ulp_irq_stop() function to achieve that.
Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Chandramohan Akula <chandramohan.akula@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://patch.msgid.link/1736446693-6692-2-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Using the option provided by Ethernet driver, register for FW Async
event. During probe, while registeriung with Ethernet driver, provide
the ulp hook 'ulp_async_notifier' for receiving the firmware events.
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/20250107024553.2926983-3-kalesh-anakkur.purayil@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
When the driver receives an async event notification from the Firmware,
we make the new ulp_async_notifier() call to inform the RDMA driver that
a firmware async event has been received. RDMA driver can then take
necessary actions based on the event type.
In the next patch, we will implement the ulp_async_notifier() callbacks
in the RDMA driver.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/20250107024553.2926983-2-kalesh-anakkur.purayil@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Recalculate features when XDP is detached.
Before:
# ip li set dev eth0 xdp obj xdp_dummy.bpf.o sec xdp
# ip li set dev eth0 xdp off
# ethtool -k eth0 | grep gro
rx-gro-hw: off [requested on]
After:
# ip li set dev eth0 xdp obj xdp_dummy.bpf.o sec xdp
# ip li set dev eth0 xdp off
# ethtool -k eth0 | grep gro
rx-gro-hw: on
The fact that HW-GRO doesn't get re-enabled automatically is just
a minor annoyance. The real issue is that the features will randomly
come back during another reconfiguration which just happens to invoke
netdev_update_features(). The driver doesn't handle reconfiguring
two things at a time very robustly.
Starting with commit 98ba1d931f ("bnxt_en: Fix RSS logic in
__bnxt_reserve_rings()") we only reconfigure the RSS hash table
if the "effective" number of Rx rings has changed. If HW-GRO is
enabled "effective" number of rings is 2x what user sees.
So if we are in the bad state, with HW-GRO re-enablement "pending"
after XDP off, and we lower the rings by / 2 - the HW-GRO rings
doing 2x and the ethtool -L doing / 2 may cancel each other out,
and the:
if (old_rx_rings != bp->hw_resc.resv_rx_rings &&
condition in __bnxt_reserve_rings() will be false.
The RSS map won't get updated, and we'll crash with:
BUG: kernel NULL pointer dereference, address: 0000000000000168
RIP: 0010:__bnxt_hwrm_vnic_set_rss+0x13a/0x1a0
bnxt_hwrm_vnic_rss_cfg_p5+0x47/0x180
__bnxt_setup_vnic_p5+0x58/0x110
bnxt_init_nic+0xb72/0xf50
__bnxt_open_nic+0x40d/0xab0
bnxt_open_nic+0x2b/0x60
ethtool_set_channels+0x18c/0x1d0
As we try to access a freed ring.
The issue is present since XDP support was added, really, but
prior to commit 98ba1d931f ("bnxt_en: Fix RSS logic in
__bnxt_reserve_rings()") it wasn't causing major issues.
Fixes: 1054aee823 ("bnxt_en: Use NETIF_F_GRO_HW.")
Fixes: 98ba1d931f ("bnxt_en: Fix RSS logic in __bnxt_reserve_rings()")
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Link: https://patch.msgid.link/20250109043057.2888953-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
DIM work will call the firmware to adjust the coalescing parameters on
the RX rings. We should cancel DIM work before we call the firmware
to free the RX rings. Otherwise, FW will reject the call from DIM
work if the RX ring has been freed. This will generate an error
message like this:
bnxt_en 0000:21:00.1 ens2f1np1: hwrm req_type 0x53 seq id 0x6fca error 0x2
and cause unnecessary concern for the user. It is also possible to
modify the coalescing parameters of the wrong ring if the ring has
been re-allocated.
To prevent this, cancel DIM work right before freeing the RX rings.
We also have to add a check in NAPI poll to not schedule DIM if the
RX rings are shutting down. Check that the VNIC is active before we
schedule DIM. The VNIC is always disabled before we free the RX rings.
Fixes: 0bc0b97fca ("bnxt_en: cleanup DIM work on device shutdown")
Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250104043849.3482067-3-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When hwrm_req_replace() fails, the driver is not invoking bnxt_req_drop()
which could cause a memory leak.
Fixes: bbf33d1d98 ("bnxt_en: update all firmware calls to use the new APIs")
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250104043849.3482067-2-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Check the return value of clk_prepare_enable to ensure that priv->clk has
been successfully enabled.
If priv->clk was not enabled during bcm_sysport_probe, bcm_sysport_resume,
or bcm_sysport_open, it must not be disabled in any subsequent execution
paths.
Fixes: 31bc72d976 ("net: systemport: fetch and use clock resources")
Signed-off-by: Vitalii Mordan <mordan@ispras.ru>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20241227123007.2333397-1-mordan@ispras.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Newer firmware does not allow reading the PXP registers during
ethtool -d, so skip the firmware call in that case. Userspace
(bnxt.c) always expects the register block to be populated so
zeroes will be returned instead.
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241217182620.2454075-6-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Call the new HWRM_PORT_MAC_QCAPS to check if mac loopback is
supported. Skip the MAC loopback ethtool self test if it is
not supported.
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/20241217182620.2454075-5-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Skip PHY loopback selftest if firmware advertises that it is unsupported
in the HWRM_PORT_PHY_QCAPS call. Only show PHY loopback test result to
be 0 if the test has run and passes. Do the same for external loopback
to be consistent.
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241217182620.2454075-4-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Block all ethtool module operations on an untrusted VF. The firmware
won't allow it and will return error.
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241217182620.2454075-3-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If FW supports setting resource limits for RoCE, then just use the
FW limits instead of using some fixed values in the driver. These
limits will be used to allocate context memory for QP, SRQ, AH, and
MR resources for RoCE.
Reviewed-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Hongguang Gao <hongguang.gao@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241217182620.2454075-2-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The OF node obtained by of_parse_phandle() is not freed. Call
of_node_put() to balance the refcount.
This bug was found by an experimental static analysis tool that I am
developing.
Fixes: 1676aba5ef ("net: ethernet: bgmac: device tree phy enablement")
Signed-off-by: Joe Hattori <joe@pf.is.s.u-tokyo.ac.jp>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241214014912.2810315-1-joe@pf.is.s.u-tokyo.ac.jp
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The 5760X (P7) chip's HW GRO/LRO interface is very similar to that of
the previous generation (5750X or P5). However, the aggregation ID
fields in the completion structures on P7 have been redefined from
16 bits to 12 bits. The freed up 4 bits are redefined for part of the
metadata such as the VLAN ID. The aggregation ID mask was not modified
when adding support for P7 chips. Including the extra 4 bits for the
aggregation ID can potentially cause the driver to store or fetch the
packet header of GRO/LRO packets in the wrong TPA buffer. It may hit
the BUG() condition in __skb_pull() because the SKB contains no valid
packet header:
kernel BUG at include/linux/skbuff.h:2766!
Oops: invalid opcode: 0000 1 PREEMPT SMP NOPTI
CPU: 4 UID: 0 PID: 0 Comm: swapper/4 Kdump: loaded Tainted: G OE 6.12.0-rc2+ #7
Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: Dell Inc. PowerEdge R760/0VRV9X, BIOS 1.0.1 12/27/2022
RIP: 0010:eth_type_trans+0xda/0x140
Code: 80 00 00 00 eb c1 8b 47 70 2b 47 74 48 8b 97 d0 00 00 00 83 f8 01 7e 1b 48 85 d2 74 06 66 83 3a ff 74 09 b8 00 04 00 00 eb a5 <0f> 0b b8 00 01 00 00 eb 9c 48 85 ff 74 eb 31 f6 b9 02 00 00 00 48
RSP: 0018:ff615003803fcc28 EFLAGS: 00010283
RAX: 00000000000022d2 RBX: 0000000000000003 RCX: ff2e8c25da334040
RDX: 0000000000000040 RSI: ff2e8c25c1ce8000 RDI: ff2e8c25869f9000
RBP: ff2e8c258c31c000 R08: ff2e8c25da334000 R09: 0000000000000001
R10: ff2e8c25da3342c0 R11: ff2e8c25c1ce89c0 R12: ff2e8c258e0990b0
R13: ff2e8c25bb120000 R14: ff2e8c25c1ce89c0 R15: ff2e8c25869f9000
FS: 0000000000000000(0000) GS:ff2e8c34be300000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055f05317e4c8 CR3: 000000108bac6006 CR4: 0000000000773ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<IRQ>
? die+0x33/0x90
? do_trap+0xd9/0x100
? eth_type_trans+0xda/0x140
? do_error_trap+0x65/0x80
? eth_type_trans+0xda/0x140
? exc_invalid_op+0x4e/0x70
? eth_type_trans+0xda/0x140
? asm_exc_invalid_op+0x16/0x20
? eth_type_trans+0xda/0x140
bnxt_tpa_end+0x10b/0x6b0 [bnxt_en]
? bnxt_tpa_start+0x195/0x320 [bnxt_en]
bnxt_rx_pkt+0x902/0xd90 [bnxt_en]
? __bnxt_tx_int.constprop.0+0x89/0x300 [bnxt_en]
? kmem_cache_free+0x343/0x440
? __bnxt_tx_int.constprop.0+0x24f/0x300 [bnxt_en]
__bnxt_poll_work+0x193/0x370 [bnxt_en]
bnxt_poll_p5+0x9a/0x300 [bnxt_en]
? try_to_wake_up+0x209/0x670
__napi_poll+0x29/0x1b0
Fix it by redefining the aggregation ID mask for P5_PLUS chips to be
12 bits. This will work because the maximum aggregation ID is less
than 4096 on all P5_PLUS chips.
Fixes: 13d2d3d381 ("bnxt_en: Add new P7 hardware interface definitions")
Reviewed-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241209015448.1937766-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The existing code is using RSS profile to determine IPV4/IPV6 GSO type
on all chips older than 5760X. This won't work on 5750X chips that may
be using modified RSS profiles. This commit from 2018 has updated the
driver to not use RSS profile for HW GRO packets on newer chips:
50f011b63d ("bnxt_en: Update RSS setup and GRO-HW logic according to the latest spec.")
However, a recent commit to add support for the newest 5760X chip broke
the logic. If the GRO packet needs to be re-segmented by the stack, the
wrong GSO type will cause the packet to be dropped.
Fix it to only use RSS profile to determine GSO type on the oldest
5730X/5740X chips which cannot use the new method and is safe to use the
RSS profiles.
Also fix the L3/L4 hash type for RX packets by not using the RSS
profile for the same reason. Use the ITYPE field in the RX completion
to determine L3/L4 hash types correctly.
Fixes: a7445d6980 ("bnxt_en: Add support for new RX and TPA_START completion types for P7")
Reviewed-by: Colin Winegarden <colin.winegarden@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241204215918.1692597-2-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit 7ed816be35 ("eth: bnxt: use page pool for head frags") added a
page pool for header frags, which may be distinct from the existing pool
for the aggregation ring. Prior to this change, frags used in the TPA
ring rx_tpa were allocated from system memory e.g. napi_alloc_frag()
meaning their lifetimes were not associated with a page pool. They can
be returned at any time and so the queue API did not alloc or free
rx_tpa.
But now frags come from a separate head_pool which may be different to
page_pool. Without allocating and freeing rx_tpa, frags allocated from
the old head_pool may be returned to a different new head_pool which
causes a mismatch between the pp hold/release count.
Fix this problem by properly freeing and allocating rx_tpa in the queue
API implementation.
Signed-off-by: David Wei <dw@davidwei.uk>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241204041022.56512-4-dw@davidwei.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Refactor bnxt_rx_ring_info->tpa_info operations into helpers that work
on a single tpa_info in prep for queue API using them.
There are 2 pairs of operations:
* bnxt_alloc_one_tpa_info()
* bnxt_free_one_tpa_info()
These alloc/free the tpa_info array itself.
* bnxt_alloc_one_tpa_info_data()
* bnxt_free_one_tpa_info_data()
These alloc/free the frags stored in tpa_info array.
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: David Wei <dw@davidwei.uk>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241204041022.56512-2-dw@davidwei.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit 2f4f9fe5bf ("bnxt_en: Support adding ntuple rules on RSS
contexts") added support for redirecting to an RSS context as an ntuple
rule action. However, it forgot to update the ETHTOOL_GRXCLSRULE
codepath. This caused `ethtool -n` to always report the action as
"Action: Direct to queue 0" which is wrong.
Fix by teaching bnxt driver to report the RSS context when applicable.
Fixes: 2f4f9fe5bf ("bnxt_en: Support adding ntuple rules on RSS contexts")
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://patch.msgid.link/2e884ae39e08dc5123be7c170a6089cefe6a78f7.1732748253.git.dxu@dxuuu.xyz
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If we go through the PCI shutdown or suspend path, we shutdown the
NIC but PTP remains registered. If the kernel continues to run for
a little bit, the periodic PTP .do_aux_work() function may be called
and it will read the PHC from the BAR register. Since the device
has already been disabled, it will cause a PCIe completion timeout.
Fix it by calling bnxt_ptp_clear() in the PCI shutdown/suspend
handlers. bnxt_ptp_clear() will unregister from PTP and
.do_aux_work() will be canceled.
In bnxt_resume(), we need to re-initialize PTP.
Fixes: a521c8a01d ("bnxt_en: Move bnxt_ptp_init() from bnxt_open() back to bnxt_init_one()")
Cc: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Instead of passing the 2nd parameter phc_cfg to bnxt_ptp_init().
Store it in bp->ptp_cfg so that the caller doesn't need to know what
the value should be.
In the next patch, we'll need to call bnxt_ptp_init() in bnxt_resume()
and this will make it easier.
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The MTU setting at the time an XDP multi-buffer is attached
determines whether the aggregation ring will be used and the
rx_skb_func handler. This is done in bnxt_set_rx_skb_mode().
If the MTU is later changed, the aggregation ring setting may need
to be changed and it may become out-of-sync with the settings
initially done in bnxt_set_rx_skb_mode(). This may result in
random memory corruption and crashes as the HW may DMA data larger
than the allocated buffer size, such as:
BUG: kernel NULL pointer dereference, address: 00000000000003c0
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 17 PID: 0 Comm: swapper/17 Kdump: loaded Tainted: G S OE 6.1.0-226bf9805506 #1
Hardware name: Wiwynn Delta Lake PVT BZA.02601.0150/Delta Lake-Class1, BIOS F0E_3A12 08/26/2021
RIP: 0010:bnxt_rx_pkt+0xe97/0x1ae0 [bnxt_en]
Code: 8b 95 70 ff ff ff 4c 8b 9d 48 ff ff ff 66 41 89 87 b4 00 00 00 e9 0b f7 ff ff 0f b7 43 0a 49 8b 95 a8 04 00 00 25 ff 0f 00 00 <0f> b7 14 42 48 c1 e2 06 49 03 95 a0 04 00 00 0f b6 42 33f
RSP: 0018:ffffa19f40cc0d18 EFLAGS: 00010202
RAX: 00000000000001e0 RBX: ffff8e2c805c6100 RCX: 00000000000007ff
RDX: 0000000000000000 RSI: ffff8e2c271ab990 RDI: ffff8e2c84f12380
RBP: ffffa19f40cc0e48 R08: 000000000001000d R09: 974ea2fcddfa4cbf
R10: 0000000000000000 R11: ffffa19f40cc0ff8 R12: ffff8e2c94b58980
R13: ffff8e2c952d6600 R14: 0000000000000016 R15: ffff8e2c271ab990
FS: 0000000000000000(0000) GS:ffff8e3b3f840000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000003c0 CR3: 0000000e8580a004 CR4: 00000000007706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<IRQ>
__bnxt_poll_work+0x1c2/0x3e0 [bnxt_en]
To address the issue, we now call bnxt_set_rx_skb_mode() within
bnxt_change_mtu() to properly set the AGG rings configuration and
update rx_skb_func based on the new MTU value.
Additionally, BNXT_FLAG_NO_AGG_RINGS is cleared at the beginning of
bnxt_set_rx_skb_mode() to make sure it gets set or cleared based on
the current MTU.
Fixes: 08450ea98a ("bnxt_en: Fix max_mtu setting for multi-buf XDP")
Co-developed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Shravya KN <shravya.k-n@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
HWRM_RING_FREE followed by a HWRM_RING_ALLOC is not guaranteed to
have the same FW ring ID as before. So we must reinitialize the
RSS table with the correct ring IDs. Otherwise, traffic may not
resume properly if the restarted ring ID is stale. Since this
feature is only supported on P5_PLUS chips, we call
bnxt_vnic_set_rss_p5() to update the HW RSS table.
Fixes: 2d694c27d3 ("bnxt_en: implement netdev_queue_mgmt_ops")
Cc: David Wei <dw@davidwei.uk>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Use the return value from bnxt_get_media() to determine the port and
link modes. bnxt_get_media() returns the proper BNXT_MEDIA_KR when
the PHY is backplane. This will correct the ethtool settings for
backplane devices.
Fixes: 5d4e1bf606 ("bnxt_en: extend media types to supported and autoneg modes")
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Shravya KN <shravya.k-n@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
After successful PCIe AER recovery, FW will reset all resource
reservations. If it is IF_UP, the driver will call bnxt_open() and
all resources will be reserved again. It it is IF_DOWN, we should
call bnxt_reserve_rings() so that we can reserve resources including
RoCE resources to allow RoCE to resume after AER. Without this
patch, RoCE fails to resume in this IF_DOWN scenario.
Later, if it becomes IF_UP, bnxt_open() will see that resources have
been reserved and will not reserve again.
Fixes: fb1e6e562b ("bnxt_en: Fix AER recovery.")
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The hardware on Broadcom 1G chipsets have a known limitation
where they cannot handle DMA addresses that cross over 4GB.
When such an address is encountered, the hardware sets the
address overflow error bit in the DMA status register and
triggers a reset.
However, BCM57766 hardware is setting the overflow bit and
triggering a reset in some cases when there is no actual
underlying address overflow. The hardware team analyzed the
issue and concluded that it is happening when the status
block update has an address with higher (b16 to b31) bits
as 0xffff following a previous update that had lowest bits
as 0xffff.
To work around this bug in the BCM57766 hardware, set the
coherent dma mask from the current 64b to 31b. This will
ensure that upper bits of the status block DMA address are
always at most 0x7fff, thus avoiding the improper overflow
check described above. This work around is intended for only
status block and ring memories and has no effect on TX and
RX buffers as they do not require coherent memory.
Fixes: 72f2afb8a6 ("[TG3]: Add DMA address workaround")
Reported-by: Salam Noureddine <noureddine@arista.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Link: https://patch.msgid.link/20241119055741.147144-1-pavan.chebbi@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Seveal fixes scattered across the drivers and a few new features:
- Minor updates and bug fixes to hfi1, efa, iopob, bnxt, hns
- Force disassociate the userspace FD when hns does an async reset
- bnxt new features for optimized modify QP to skip certain stayes, CQ
coalescing, better debug dumping
- mlx5 new data placement ordering feature
- Faster destruction of mlx5 devx HW objects
- Improvements to RDMA CM mad handling
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZz4ENwAKCRCFwuHvBreF
YQYQAP9R54r5J1Iylg+zqhCc+e/9oveuuZbfLvy/EJiEpmdprQEAgPs1RrB0z7U6
1xrVStUKNPhGd5XeVVZGkIV0zYv6Tw4=
=V5xI
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma updates from Jason Gunthorpe:
"Seveal fixes scattered across the drivers and a few new features:
- Minor updates and bug fixes to hfi1, efa, iopob, bnxt, hns
- Force disassociate the userspace FD when hns does an async reset
- bnxt new features for optimized modify QP to skip certain stayes,
CQ coalescing, better debug dumping
- mlx5 new data placement ordering feature
- Faster destruction of mlx5 devx HW objects
- Improvements to RDMA CM mad handling"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (51 commits)
RDMA/bnxt_re: Correct the sequence of device suspend
RDMA/bnxt_re: Use the default mode of congestion control
RDMA/bnxt_re: Support different traffic class
IB/cm: Rework sending DREQ when destroying a cm_id
IB/cm: Do not hold reference on cm_id unless needed
IB/cm: Explicitly mark if a response MAD is a retransmission
RDMA/mlx5: Move events notifier registration to be after device registration
RDMA/bnxt_re: Cache MSIx info to a local structure
RDMA/bnxt_re: Refurbish CQ to NQ hash calculation
RDMA/bnxt_re: Refactor NQ allocation
RDMA/bnxt_re: Fail probe early when not enough MSI-x vectors are reserved
RDMA/hns: Fix different dgids mapping to the same dip_idx
RDMA/bnxt_re: Add set_func_resources support for P5/P7 adapters
RDMA/bnxt_re: Enhance RoCE SRIOV resource configuration design
bnxt_en: Add support for RoCE sriov configuration
RDMA/hns: Fix NULL pointer derefernce in hns_roce_map_mr_sg()
RDMA/hns: Fix out-of-order issue of requester when setting FENCE
RDMA/nldev: Add IB device and net device rename events
RDMA/mlx5: Add implementation for ufile_hw_cleanup device operation
RDMA/core: Move ib_uverbs_file struct to uverbs_types.h
...
The FW trace coredump segments are very similar to the context
memory segments in the previous patch. The main difference is
to call HWRM_DBG_LOG_BUFFER_FLUSH to flush the FW data to host
memory and to include an additional record in the coredump that
contains the head and tail information of the trace data.
Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com>
Signed-off-by: Shruti Parab <shruti.parab@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241115151438.550106-12-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add a new ethtool -W dump flag (2) to include driver coredump segments.
This patch adds the host backing store context memory pages used by the
chip and FW to store various states to the coredump. The pages for
each context memory type is dumped into a separate coredump segment.
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Reviewed-by: Selvin Thyparampil Xavier <selvin.xavier@broadcom.com>
Reviewed-by: Shruti Parab <shruti.parab@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com>
Reviewed-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241115151438.550106-11-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pass the component ID and segment ID to this function to create
the coredump segment header. This will be needed in the next
patches to create more segments for the coredump.
Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com>
Signed-off-by: Shruti Parab <shruti.parab@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241115151438.550106-10-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Host context memory is used by the newer chips to store context
information for various L2 and RoCE states and FW logs. This
information will be useful for debugging. This patch adds the
functions to copy all pages of a context memory type to a contiguous
buffer. The next patches will include the context memory dump
during ethtool -w coredump.
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com>
Co-developed-by: Shruti Parab <shruti.parab@broadcom.com>
Signed-off-by: Shruti Parab <shruti.parab@broadcom.com>
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241115151438.550106-9-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If FW supports appending new FW logs to an offset in the context
memory after FW reset, then do not free this type of context memory
during reset. The driver will provide the initial offset to the FW
when configuring this type of context memory. This way, we don't lose
the older FW logs after reset.
Signed-off-by: Hongguang Gao <hongguang.gao@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241115151438.550106-8-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The FW trace memory pages will be added to the ethtool -w coredump
in later patches. In addition to the raw data, the driver has to
add a header to provide the head and tail information on each FW
trace log segment when creating the coredump. The FW sends an async
message to the driver after DMAing a chunk of logs to the context
memory to indicate the last offset containing the tail of the logs.
The driver needs to keep track of that.
Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com>
Signed-off-by: Shruti Parab <shruti.parab@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241115151438.550106-7-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Allocate the new FW trace log backing store context memory types
if they are supported by the FW. FW debug logs are DMA'ed to the host
backing store memory when the on-chip buffers are full. If host
memory cannot be allocated for these memory types, the driver
will not abort.
Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com>
Signed-off-by: Shruti Parab <shruti.parab@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241115151438.550106-6-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If 'force' is false, it will keep the memory pages and all data
structures for the context memory type if the memory is valid.
This patch always passes true for the 'force' parameter so there is
no change in behavior. Later patches will adjust the 'force' parameter
for the FW log context memory types so that the logs will not be reset
after FW reset.
Signed-off-by: Hongguang Gao <hongguang.gao@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241115151438.550106-5-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add a new function bnxt_free_one_ctx_mem() to free one context
memory type. bnxt_free_ctx_mem() now calls the new function in
the loop to free each context memory type. There is no change in
behavior. Later patches will further make use of the new function.
Signed-off-by: Hongguang Gao <hongguang.gao@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241115151438.550106-4-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add a new bit to struct bnxt_ctx_mem_type to indicate that host
memory has been successfully allocated for this context memory type.
In the next patches, we'll be adding some additional context memory
types for FW debugging/logging. If memory cannot be allocated for
any of these new types, we will not abort and the cleared mem_valid
bit will indicate to skip configuring the memory type.
Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com>
Signed-off-by: Shruti Parab <shruti.parab@broadcom.com>
Signed-of-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241115151438.550106-3-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The major change is the new firmware command to flush the FW debug
logs to the host backing store context memory buffers.
Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241115151438.550106-2-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Current implementation of gettimex64() makes at least 3 PCIe reads to
get current PHC time. It takes at least 2.2us to get this value back to
userspace. At the same time there is cached value of upper bits of PHC
available for packet timestamps already. This patch reuses cached value
to speed up reading of PHC time.
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20241114114820.1411660-1-vadfed@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Testing small size RPCs (300B-400B) on a large AMD system suggests
that page pool recycling is very useful even for just the head frags.
With this patch (and copy break disabled) I see a 30% performance
improvement (82Gbps -> 106Gbps).
Convert bnxt from normal page frags to page pool frags for head buffers.
On systems with small page size we can use the same pool as for TPA
pages. On systems with large pages the frag allocation logic of the
page pool is already used to split a large page into TPA chunks.
TPA chunks are much larger than heads (8k or 64k, AFAICT vs 1kB)
and we always allocate the same sized chunks. Mixing allocation
of TPA and head pages would lead to sub-optimal memory use.
Plus Taehee's work on zero-copy / devmem will need to differentiate
between TPA and non-TPA page pool, anyway. Conditionally allocate
a new page pool for heads.
Link: https://patch.msgid.link/20241109035119.3391864-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Refine RoCE SRIOV resource configuration design,
using the INITIALIZE_FW's flag as an indication
for the new design to the firmware. RoCE driver does not
have to provision resources to VF when firmware
advertises support for RoCE resource management by NIC driver.
Signed-off-by: Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com>
CC: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://patch.msgid.link/1730882676-24434-3-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
During driver load, PF RDMA driver provisions resources
to the RDMA VFs. This logic takes into consideration of
the total number of VFs supported on the PF while
allocating resources. Firmware now advertises a capability
where NIC driver can allocate resources for RDMA VFs when
the user actually creates a VF. So this resource
distribution can be based on the number of active VFs.
This patch adds the support to check for the firmware
capability and follow the new RDMA VF resource allocation
strategy. The current logic in the RDMA driver will be
removed for the newer Firmware versions in a subsequent
patch in this series.
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://patch.msgid.link/1730882676-24434-2-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Serialization of PHC read with FW reset mechanism uses ptp_lock which
also protects timecounter updates. This means we cannot grab it when
called from bnxt_cc_read(). Let's move locking into different function.
Fixes: 6c0828d00f ("bnxt_en: replace PTP spinlock with seqlock")
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://patch.msgid.link/20241107214917.2980976-1-vadfed@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
irq_set_affinity_hint() is deprecated, Use irq_update_affinity_hint()
instead. This removes the side-effect of actually applying the affinity.
The driver does not really need to worry about spreading its IRQs across
CPUs. The core code already takes care of that. when the driver applies the
affinities by itself, it breaks the users' expectations:
1. The user configures irqbalance with IRQBALANCE_BANNED_CPULIST in
order to prevent IRQs from being moved to certain CPUs that run a
real-time workload.
2. bnxt_en device reopening will resets the affinity
in bnxt_open().
3. bnxt_en has no idea about irqbalance's config, so it may move an IRQ to
a banned CPU. The real-time workload suffers unacceptable latency.
Signed-off-by: Mohammad Heib <mheib@redhat.com>
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Link: https://patch.msgid.link/20241106180811.385175-1-mheib@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The latter is the preferred way to copy ethtool strings.
Avoids manually incrementing the pointer. Cleans up the code quite well.
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Tested-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20241104205317.306140-1-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The latter is the preferred way to copy ethtool strings.
Avoids manually incrementing the pointer. Cleans up the code quite well.
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Link: https://patch.msgid.link/20241104202326.78418-1-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Previously, trying to insert an ip4/ip6 ntuple rule with an unset
l4proto would get rejected with -EOPNOTSUPP. For example, the following
would fail:
ethtool -N eth0 flow-type ip6 dst-ip $IP6 context 1
The reason was that all the l4proto validation was being run despite the
l4proto mask being set to 0x0. Fix by respecting the mask on l4proto and
treating a mask of 0x0 as wildcard l4proto.
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/1ac93a2836b25f79e7045f8874d9a17875229ffc.1730778566.git.dxu@dxuuu.xyz
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit 9ba0e56199 ("bnxt_en: Enhance ethtool ntuple support for ip
flows besides TCP/UDP") added support for ip4/ip6 ntuple rules.
However, if you wanted to wildcard over l4proto, you had to provide
0xFF.
The choice of 0xFF is non-standard and non-intuitive. Delete support for
it in this commit. Next commit we will introduce a cleaner way to
wildcard l4proto.
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/a5ba0d3bd926d27977c317efa7fdfbc8a704d2b8.1730778566.git.dxu@dxuuu.xyz
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We can see high contention on ptp_lock while doing RX timestamping
on high packet rates over several queues. Spinlock is not effecient
to protect timecounter for RX timestamps when reads are the most
usual operations and writes are only occasional. It's better to use
seqlock in such cases.
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Link: https://patch.msgid.link/20241103215108.557531-2-vadfed@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This hardware can provide only 48 bits of cycle counter. We can leave
only 24 bits in the cache to extend RX timestamps from 32 bits to 48
bits. Lower 8 bits of the cached value will be used to check for
roll-over while extending to full 48 bits.
This change makes cache writes atomic even on 32 bit platforms and we
can simply use READ_ONCE()/WRITE_ONCE() pair and remove spinlock. The
configuration structure will be also reduced by 4 bytes.
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Link: https://patch.msgid.link/20241103215108.557531-1-vadfed@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net_dim() is currently passed a struct dim_sample argument by value.
struct dim_sample is 24 bytes. Since this is greater 16 bytes, x86-64
passes it on the stack. All callers have already initialized dim_sample
on the stack, so passing it by value requires pushing a duplicated copy
to the stack. Either witing to the stack and immediately reading it, or
perhaps dereferencing addresses relative to the stack pointer in a chain
of push instructions, seems to perform quite poorly.
In a heavy TCP workload, mlx5e_handle_rx_dim() consumes 3% of CPU time,
94% of which is attributed to the first push instruction to copy
dim_sample on the stack for the call to net_dim():
// Call ktime_get()
0.26 |4ead2: call 4ead7 <mlx5e_handle_rx_dim+0x47>
// Pass the address of struct dim in %rdi
|4ead7: lea 0x3d0(%rbx),%rdi
// Set dim_sample.pkt_ctr
|4eade: mov %r13d,0x8(%rsp)
// Set dim_sample.byte_ctr
|4eae3: mov %r12d,0xc(%rsp)
// Set dim_sample.event_ctr
0.15 |4eae8: mov %bp,0x10(%rsp)
// Duplicate dim_sample on the stack
94.16 |4eaed: push 0x10(%rsp)
2.79 |4eaf1: push 0x10(%rsp)
0.07 |4eaf5: push %rax
// Call net_dim()
0.21 |4eaf6: call 4eafb <mlx5e_handle_rx_dim+0x6b>
To allow the caller to reuse the struct dim_sample already on the stack,
pass the struct dim_sample by reference to net_dim().
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Arthur Kiyanovski <akiyano@amazon.com>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Link: https://patch.msgid.link/20241031002326.3426181-2-csander@purestorage.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Avoids having to use manual pointer manipulation.
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241029233229.9385-1-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-Wflex-array-member-not-at-end was introduced in GCC-14, and we are
getting ready to enable it, globally.
Change the type of the middle struct member currently causing trouble from
`struct ethtool_link_settings` to `struct ethtool_link_settings_hdr`.
Additionally, update the type of some variables in various functions that
don't access the flexible-array member, changing them to the newly created
`struct ethtool_link_settings_hdr`. These changes are needed because the
type of the conflicting middle members changed. So, those instances that
expect the type to be `struct ethtool_link_settings` should be adjusted to
the newly created type `struct ethtool_link_settings_hdr`.
Also, adjust variable declarations to follow the reverse xmas tree
convention.
Fix 3338 of the following -Wflex-array-member-not-at-end warnings:
include/linux/ethtool.h:214:38: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://patch.msgid.link/0bc2809fe2a6c11dd4c8a9a10d9bd65cccdb559b.1730238285.git.gustavoars@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Move the BCM_SYSPORT_IO_MACRO() definition and its use to bcmsysport.h
where it is more appropriate and where static inline helpers are
acceptable. While at it, make sure that the macro 'offset' argument does
not trigger a checkpatch warning due to possible argument re-use.
Suggested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241021174935.57658-3-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir reported the following warning with clang-16 and W=1:
warning: unused function 'txchk_readl' [-Wunused-function]
BCM_SYSPORT_IO_MACRO(txchk, SYS_PORT_TXCHK_OFFSET);
note: expanded from macro 'BCM_SYSPORT_IO_MACRO'
warning: unused function 'txchk_writel' [-Wunused-function]
note: expanded from macro 'BCM_SYSPORT_IO_MACRO'
warning: unused function 'tbuf_readl' [-Wunused-function]
BCM_SYSPORT_IO_MACRO(tbuf, SYS_PORT_TBUF_OFFSET);
note: expanded from macro 'BCM_SYSPORT_IO_MACRO'
warning: unused function 'tbuf_writel' [-Wunused-function]
note: expanded from macro 'BCM_SYSPORT_IO_MACRO'
The TXCHK and RBUF blocks are not being accessed, remove the IO macros
used to access those blocks. No functional impact.
Reported-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241021174935.57658-2-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Cross-merge networking fixes after downstream PR (net-6.12-rc4).
Conflicts:
107a034d5c ("net/mlx5: qos: Store rate groups in a qos domain")
1da9cfd6c4 ("net/mlx5: Unregister notifier on eswitch init failure")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
There are some spelling mistakes of 'accelaration', 'exprienced' and
'rewritting' in comments which should be 'acceleration', 'experienced'
and 'rewriting'.
Suggested-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/all/20241017162846.GA51712@kernel.org/
Signed-off-by: WangYuli <wangyuli@uniontech.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Message-ID: <90D42CB167CA0842+20241018021910.31359-1-wangyuli@uniontech.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
In netpoll configuration the completion processing can happen in hard
irq context which will break with spin_lock_bh() for fullfilling RX
timestamp in case of all packets timestamping. Replace it with
spin_lock_irqsave() variant.
Fixes: 7f5515d19c ("bnxt_en: Get the RX packet timestamp")
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Message-ID: <20241016195234.2622004-1-vadfed@meta.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
GCC is not happy with the current code, e.g.:
.../tg3.c:11313:37: error: ‘-txrx-’ directive output may be truncated writing 6 bytes into a region of size between 1 and 16 [-Werror=format-truncation=]
11313 | "%s-txrx-%d", tp->dev->name, irq_num);
| ^~~~~~
.../tg3.c:11313:34: note: using the range [-2147483648, 2147483647] for directive argument
11313 | "%s-txrx-%d", tp->dev->name, irq_num);
When `make W=1` is supplied, this prevents kernel building. Fix it by
increasing the buffer size for IRQ label and use sizeoF() instead of
hard coded constants.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Message-ID: <20241016090647.691022-1-andriy.shevchenko@linux.intel.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
The bcmasp_xmit() returns NETDEV_TX_OK without freeing skb
in case of mapping fails, add dev_kfree_skb() to fix it.
Fixes: 490cb41200 ("net: bcmasp: Add support for ASP2.0 Ethernet controller")
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Acked-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20241014145901.48940-1-wanghai38@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The bcm_sysport_xmit() returns NETDEV_TX_OK without freeing skb
in case of dma_map_single() fails, add dev_kfree_skb() to fix it.
Fixes: 80105befdb ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver")
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Link: https://patch.msgid.link/20241014145115.44977-1-wanghai38@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use netif_napi_add_config to assign persistent per-NAPI config when
initializing NAPIs.
Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20241011184527.16393-8-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Address byte-order miss-matches flagged by Sparse.
In tg3_load_firmware_cpu() and tg3_get_device_address()
this is done using appropriate types to store big endian values.
In the cases of tg3_test_nvram(), where buf is an array which
contains values of several different types, cast to __le32
before converting values to host byte order.
Reported by Sparse as:
.../tg3.c:3745:34: warning: cast to restricted __be32
.../tg3.c:13096:21: warning: cast to restricted __le32
.../tg3.c:13096:21: warning: cast from restricted __be32
.../tg3.c:13101:21: warning: cast to restricted __le32
.../tg3.c:13101:21: warning: cast from restricted __be32
.../tg3.c:17070:63: warning: incorrect type in argument 3 (different base types)
.../tg3.c:17070:63: expected restricted __be32 [usertype] *val
.../tg3.c:17070:63: got unsigned int *
dr.../tg3.c:17071:63: warning: incorrect type in argument 3 (different base types)
.../tg3.c:17071:63: expected restricted __be32 [usertype] *val
.../tg3.c:17071:63: got unsigned int *
Also, address white-space issues on lines modified for the above.
And, for consistency, lines adjacent to them.
Compile tested only.
No functional change intended.
Signed-off-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241009-tg3-sparse-v1-1-6af38a7bf4ff@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The MII driver isn't used by brcmstb Ethernet drivers. Remove it
from the BCMASP, GENET, and SYSTEMPORT drivers.
Signed-off-by: Justin Chen <justin.chen@broadcom.com>
Acked-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20241010191332.1074642-1-justin.chen@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link queues to NAPIs using the netdev-genl API so this information is
queryable.
First, test with the default setting on my tg3 NIC at boot with 1 TX
queue:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump queue-get --json='{"ifindex": 2}'
[{'id': 0, 'ifindex': 2, 'napi-id': 8194, 'type': 'rx'},
{'id': 1, 'ifindex': 2, 'napi-id': 8195, 'type': 'rx'},
{'id': 2, 'ifindex': 2, 'napi-id': 8196, 'type': 'rx'},
{'id': 3, 'ifindex': 2, 'napi-id': 8197, 'type': 'rx'},
{'id': 0, 'ifindex': 2, 'napi-id': 8193, 'type': 'tx'}]
Now, adjust the number of TX queues to be 4 via ethtool:
$ sudo ethtool -L eth0 tx 4
$ sudo ethtool -l eth0 | tail -5
Current hardware settings:
RX: 4
TX: 4
Other: n/a
Combined: n/a
Despite "Combined: n/a" in the ethtool output, /proc/interrupts shows
the tg3 has renamed the IRQs to be combined:
343: [...] eth0-0
344: [...] eth0-txrx-1
345: [...] eth0-txrx-2
346: [...] eth0-txrx-3
347: [...] eth0-txrx-4
Now query this via netlink to ensure the queues are linked properly to
their NAPIs:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump queue-get --json='{"ifindex": 2}'
[{'id': 0, 'ifindex': 2, 'napi-id': 8960, 'type': 'rx'},
{'id': 1, 'ifindex': 2, 'napi-id': 8961, 'type': 'rx'},
{'id': 2, 'ifindex': 2, 'napi-id': 8962, 'type': 'rx'},
{'id': 3, 'ifindex': 2, 'napi-id': 8963, 'type': 'rx'},
{'id': 0, 'ifindex': 2, 'napi-id': 8960, 'type': 'tx'},
{'id': 1, 'ifindex': 2, 'napi-id': 8961, 'type': 'tx'},
{'id': 2, 'ifindex': 2, 'napi-id': 8962, 'type': 'tx'},
{'id': 3, 'ifindex': 2, 'napi-id': 8963, 'type': 'tx'}]
As you can see above, id 0 for both TX and RX share a NAPI, NAPI ID
8960, and so on for each queue index up to 3.
Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241009175509.31753-3-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link IRQs to NAPI instances with netif_napi_set_irq. This information
can be queried with the netdev-genl API.
Begin by testing my tg3 device in its default state: 1 TX queue and 4 RX
queues.
Compare the output of /proc/interrupts for my tg3 device with the output of
netdev-genl after applying this patch:
$ cat /proc/interrupts | grep eth0
343: [...] eth0-tx-0
344: [...] eth0-rx-1
345: [...] eth0-rx-2
346: [...] eth0-rx-3
347: [...] eth0-rx-4
As you can see above, tg3 has named the IRQs such that there is a
dedicated tx IRQ and 4 dedicated rx IRQs, for a total of 5 IRQs.
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump napi-get --json='{"ifindex": 2}'
[{'id': 8197, 'ifindex': 2, 'irq': 347},
{'id': 8196, 'ifindex': 2, 'irq': 346},
{'id': 8195, 'ifindex': 2, 'irq': 345},
{'id': 8194, 'ifindex': 2, 'irq': 344},
{'id': 8193, 'ifindex': 2, 'irq': 343}]
Netlink displays the same IRQs as above, noting that each is mapped to a
unique NAPI instance.
Now, reconfigure the NIC to have 4 TX queues and 4 RX queues:
$ sudo ethtool -L eth0 rx 4 tx 4
$ sudo ethtool -l eth0 | tail -5
Current hardware settings:
RX: 4
TX: 4
Other: n/a
Combined: n/a
Examine /proc/interrupts once again, noting that tg3 will now rename the
IRQs to suggest that they are combined tx and rx without allocating
additional IRQs, so the total IRQ count in /proc/interrupts is
unchanged:
343: [...] eth0-0
344: [...] eth0-txrx-1
345: [...] eth0-txrx-2
346: [...] eth0-txrx-3
347: [...] eth0-txrx-4
Check the output from netlink again:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump napi-get --json='{"ifindex": 2}'
[{'id': 8973, 'ifindex': 2, 'irq': 347},
{'id': 8972, 'ifindex': 2, 'irq': 346},
{'id': 8971, 'ifindex': 2, 'irq': 345},
{'id': 8970, 'ifindex': 2, 'irq': 344},
{'id': 8969, 'ifindex': 2, 'irq': 343}]
Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241009175509.31753-2-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After commit 0edb555a65 ("platform: Make platform_driver::remove()
return void") .remove() is (again) the right callback to implement for
platform drivers.
Convert all platform drivers below drivers/net/ethernet to use
.remove(), with the eventual goal to drop struct
platform_driver::remove_new(). As .remove() and .remove_new() have the
same prototypes, conversion is done by just changing the structure
member name in the driver initializer.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Link: https://patch.msgid.link/18f7c585a1a8a8ac8b03a2fca7de19bd5c52ac2b.1727949050.git.u.kleine-koenig@baylibre.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
asm/unaligned.h is always an include of asm-generic/unaligned.h;
might as well move that thing to linux/unaligned.h and include
that - there's nothing arch-specific in that header.
auto-generated by the following:
for i in `git grep -l -w asm/unaligned.h`; do
sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i
done
for i in `git grep -l -w asm-generic/unaligned.h`; do
sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i
done
git mv include/asm-generic/unaligned.h include/linux/unaligned.h
git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h
sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild
sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h
The name field of struct bnxt_irq is written using snprintf in
bnxt_setup_msix(). Make the field large enough to fit the maximal
formatted string to prevent truncation. Truncated IRQ names are
less meaningful to the user. For example, "enp4s0f0np0-TxRx-0"
gets truncated to "enp4s0f0np0-TxRx-" with the existing code.
Make sure we have space for the extra characters added to the IRQ
names:
- the characters introduced by the static format string: hyphens
- the maximal static substituted ring type string: "TxRx"
- the maximum length of an integer formatted as a string, even
though reasonable ring numbers would never be as long as this.
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240909202737.93852-4-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
bnxt_check_rings() is called to ensure that we have the hardware ring
resources before committing to reinitialize with the new number of
rings. MSIX vectors are never checked at this point, because up
until recently we must first disable MSIX before we can allocate the
new set of MSIX vectors.
Now that we support dynamic MSIX allocation, check to make sure we
can dynamically allocate the new MSIX vectors as the last step in
bnxt_check_rings() if dynamic MSIX is supported.
For example, the IOMMU group may limit the number of MSIX vectors
for the device. With this patch, the ring change will fail more
gracefully when there is not enough MSIX vectors.
It is also better to move bnxt_check_rings() to be called as the last
step when changing ethtool rings.
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240909202737.93852-3-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If RocE is supported on the device, set the number of RoCE MSIX vectors
to the number of online CPUs + 1 and capped at these maximums:
VF: 2
NPAR: 5
PF: 64
For the PF, the maximum is now increased from the previous value
of 9 to get better performance for kernel applications.
Remove the unnecessary check for BNXT_FLAG_ROCE_CAP.
bnxt_set_dflt_ulp_msix() will only be called if the flag is set.
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240909202737.93852-2-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://patch.msgid.link/20240906144632.404651-3-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://patch.msgid.link/20240906144632.404651-2-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use scoped for_each_available_child_of_node_scoped() when
iterating over device nodes to make code a bit simpler.
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Justin Chen <justin.chen@broadcom.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
A range of MSIX vectors are allocated at initialization for the number
needed for RocE and L2. During run-time, if the user increases or
decreases the number of L2 rings, all the MSIX vectors have to be
freed and a new range has to be allocated. This is not optimal and
causes disruptions to RoCE traffic every time there is a change in L2
MSIX.
If the system supports dynamic MSIX allocations, use dynamic
allocation to add new L2 MSIX vectors or free unneeded L2 MSIX
vectors. RoCE traffic is not affected using this scheme.
Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/20240828183235.128948-10-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If dynamic MSIX allocation is supported, additional MSIX can be
allocated at run-time without reinitializing the existing MSIX entries.
The first step to support this dynamic scheme is to allocate a large
enough bp->irq_tbl if dynamic allocation is supported.
Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20240828183235.128948-9-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use the new pci_alloc_irq_vectors() and pci_free_irq_vectors() to
replace the deprecated pci_enable_msix_range() and pci_disable_msix().
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20240828183235.128948-8-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In legacy INTX mode, a register is mapped so that the INTX handler can
read it to determine if the NIC is the source of the interrupt. This
and all the related macros are no longer needed now that INTX is no
longer supported.
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20240828183235.128948-7-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Now that we only support MSIX, the BNXT_FLAG_USING_MSIX is always true.
Remove it and any if conditions checking for it. Remove the INTX
handler and associated logic.
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20240828183235.128948-6-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Firmware has deprecated support for legacy INTX in 2022 (since v2.27)
and INTX hasn't been tested for many years before that. INTX was
only used as a fallback mechansim in case MSIX wasn't available. MSIX
is always supported by all firmware. If MSIX capability in PCI config
space is not found during probe, abort.
Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20240828183235.128948-5-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
With recent changes in the .ndo_set_vf_*() guidelines, resubmitting
this patch that was reverted eariler in 2023:
c27153682e ("Revert "bnxt_en: Support QOS and TPID settings for the SRIOV VLAN")
Add these missing settings in the .ndo_set_vf_vlan() method.
Older firmware does not support the TPID setting so check for
proper support.
Remove the unused BNXT_VF_QOS flag.
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20240828183235.128948-4-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add support for retrieving crash dump using ethtool -w on the
supported interface.
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20240828183235.128948-3-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Newer firmware supports automatic DMA of crash dump to host memory
when it crashes. If the feature is supported, allocate the required
memory using the existing context memory infrastructure. Communicate
the page table containing the DMA addresses to the firmware.
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20240828183235.128948-2-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Cross-merge networking fixes after downstream PR.
No conflicts.
Adjacent changes:
drivers/net/ethernet/broadcom/bnxt/bnxt.h
c948c0973d ("bnxt_en: Don't clear ntuple filters and rss contexts during ethtool ops")
f2878cdeb7 ("bnxt_en: Add support to call FW to update a VNIC")
Link: https://patch.msgid.link/20240822210125.1542769-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In bnx2x_get_vf_config():
* The vlan field of ivi is a 32-bit integer, it is used to store a vlan ID.
* The vlan field of bulletin is a 16-bit integer, it is also used to store
a vlan ID.
In the current code, ivi->vlan is set using memset. But in the case of
setting it to the value of bulletin->vlan, this involves reading
32 bits from a 16bit source. This is likely safe, as the following
6 bytes are padding in the same structure, but none the less, it seems
undesirable.
However, it is entirely unclear to me how this scheme works on
big-endian systems.
Resolve this by simply assigning integer values to ivi->vlan.
Flagged by W=1 builds.
f.e. gcc-14 reports:
In function 'fortify_memcpy_chk',
inlined from 'bnx2x_get_vf_config' at .../bnx2x_sriov.c:2655:4:
.../fortify-string.h:580:25: warning: call to '__read_overflow2_field' declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Wattribute-warning]
580 | __read_overflow2_field(q_size_field, size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Compile tested only.
Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Link: https://patch.msgid.link/20240815-bnx2x-int-vlan-v1-1-5940b76e37ad@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The driver currently blindly deletes its cache of RSS cotexts and
ntuple filters when the ethtool channel count is changing. It also
deletes the ntuple filters cache when the default indirection table
is changing.
The core will not allow ethtool channels to drop below any that
have been configured as ntuple destinations since this commit from 2022:
47f3ecf476 ("ethtool: Fail number of channels change when it conflicts with rxnfc")
So there is absolutely no need to delete the ntuple filters and
RSS contexts when changing ethtool channels.
It is also unnecessary to delete ntuple filters when the default
RSS indirection table is changing.
Remove bnxt_clear_usr_fltrs() and bnxt_clear_rss_ctxis() from the
ethtool ops and change them to static functions.
This bug will cause confusion to the end user and causes failure when
running the rss_ctx.py selftest.
Fixes: 1018319f94 ("bnxt_en: Invalidate user filters when needed")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netdev/20240725111912.7bc17cf6@kernel.org/
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20240814225429.199280-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Although it seems unlikely in practice - there would need to be
rx ring indexes greater than 10^10 - it is theoretically possible
for the filename of per rx ring debugfs files to be truncated.
This is because although a 16 byte buffer is provided, the length
of the filename is restricted to 10 bytes. Remove this restriction
and allow the entire buffer to be used.
Also reduce the buffer to 12 bytes, which is sufficient.
Given that the range of rx ring indexes likely much smaller than the
maximum range of a 32-bit signed integer, a smaller buffer could be
used, with some further changes. But this change seems simple, robust,
and has minimal stack overhead.
Flagged by gcc-14:
.../bnxt_debugfs.c: In function 'bnxt_debug_dev_init':
drivers/net/ethernet/broadcom/bnxt/bnxt_debugfs.c:69:30: warning: '%d' directive output may be truncated writing between 1 and 11 bytes into a region of size 10 [-Wformat-truncation=]
69 | snprintf(qname, 10, "%d", ring_idx);
| ^~
In function 'debugfs_dim_ring_init',
inlined from 'bnxt_debug_dev_init' at .../bnxt_debugfs.c:87:4:
.../bnxt_debugfs.c:69:29: note: directive argument in the range [-2147483643, 2147483646]
69 | snprintf(qname, 10, "%d", ring_idx);
| ^~~~
.../bnxt_debugfs.c:69:9: note: 'snprintf' output between 2 and 12 bytes into a destination of size 10
69 | snprintf(qname, 10, "%d", ring_idx);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Compile tested only
Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20240813-bnxt-str-v2-2-872050a157e7@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This corrects an out-by-one error in the maximum length of the package
version string. The size argument of snprintf includes space for the
trailing '\0' byte, so there is no need to allow extra space for it by
reducing the value of the size argument by 1.
Found by inspection.
Compile tested only.
Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20240813-bnxt-str-v2-1-872050a157e7@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
marvell/otx2 and mvpp2 do not support setting different
keys for different RSS contexts. Contexts have separate
indirection tables but key is shared with all other contexts.
This is likely fine, indirection table is the most important
piece.
Don't report the key-related parameters from such drivers.
This prevents driver-errors, e.g. otx2 always writes
the main key, even when user asks to change per-context key.
The second reason is that without this change tracking
the keys by the core gets complicated. Even if the driver
correctly reject setting key with rss_context != 0,
change of the main key would have to be reflected in
the XArray for all additional contexts.
Since the additional contexts don't have their own keys
not including the attributes (in Netlink speak) seems
intuitive. ethtool CLI seems to deal with it just fine.
Having to set the flag in majority of the drivers is
a bit tedious but not reporting the key is a safer
default.
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove .cap_rss_ctx_supported from drivers which moved to the new API.
This makes it easy to grep for drivers which still need to be converted.
Reviewed-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The queue API calls bnxt_hwrm_vnic_update() to stop/start the flow of
packets, which can only properly flush the pipeline if FW indicates
support.
Add a macro BNXT_SUPPORTS_QUEUE_API that checks for the required flags
and only set queue_mgmt_ops if true.
Signed-off-by: David Wei <dw@davidwei.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current implementation when resetting a queue while packets are
flowing puts the queue into an inconsistent state.
There needs to be some synchronisation with the FW. Add calls to
bnxt_hwrm_vnic_update() to set the MRU for both the default and ntuple
vnic during queue start/stop. When the MRU is set to 0, flow is stopped.
Each Rx queue belongs to either the default or the ntuple vnic.
With calling bnxt_hwrm_vnic_update() the calls to napi_enable() and
napi_disable() must be removed for reset to work on a queue that has
active traffic flowing e.g. iperf3.
Co-developed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: David Wei <dw@davidwei.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set the newly added vnic->mru field in bnxt_hwrm_vnic_cfg().
Signed-off-by: David Wei <dw@davidwei.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check the HWRM_VNIC_QCAPS FW response for the receive engine flush
capability. This capability indicates that we can reliably support
RX ring restart when calling HWRM_VNIC_UPDATE with MRU set to 0.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David Wei <dw@davidwei.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the function bnxt_hwrm_vnic_update() to call FW to update
a VNIC. This call can be used when disabling and enabling a
receive ring within a VNIC. The mru which is the maximum receive
size of packets received by the VNIC can be updated.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David Wei <dw@davidwei.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
The main changes are:
1. HWRM_VNIC_UPDATE used to safely disable and enable an RX ring within
the VNIC.
2. New flag in HWRM_VNIC_QCAPS to indicate FW will do the proper flush
during HWRM_VNIC_UPDATE.
3. New flag in HWRM_FUNC_QCAPS to indicate that reservations for some
resources such as VNIC can be reduced.
4. New backing store memory types not used by the driver yet.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David Wei <dw@davidwei.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Provide declaration of dmae_reg_go_c in header.
This symbol is defined in bnx2x_main.c.
And used in that file and bnx2x_stats.c.
However, Sparse complains that there is no declaration
of the symbol in dmae_reg_go_c nor is the symbol static.
.../bnx2x_main.c:291:11: warning: symbol 'dmae_reg_go_c' was not declared. Should it be static?
Address this by moving the declaration from bnx2x_stats.c to bnx2x_reg.h.
No functional change intended.
Compile tested only.
Signed-off-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240806-bnx2x-dec-v1-1-ae844ec785e4@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Both ethtool_ops.rxfh_max_context_id and the default value used when
it's not specified are supposed to be exclusive maxima (the former
is documented as such; the latter, U32_MAX, cannot be used as an ID
since it equals ETH_RXFH_CONTEXT_ALLOC), but xa_alloc() expects an
inclusive maximum.
Subtract one from 'limit' to produce an inclusive maximum, and pass
that to xa_alloc().
Increase bnxt's max by one to prevent a (very minor) regression, as
BNXT_MAX_ETH_RSS_CTX is an inclusive max. This is safe since bnxt
is not actually hard-limited; BNXT_MAX_ETH_RSS_CTX is just a
leftover from old driver code that managed context IDs itself.
Rename rxfh_max_context_id to rxfh_max_num_contexts to make its
semantics (hopefully) more obvious.
Fixes: 847a8ab186 ("net: ethtool: let the core choose RSS context IDs")
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://patch.msgid.link/5a2d11a599aa5b0cc6141072c01accfb7758650c.1723045898.git.ecree.xilinx@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Some Wake-on-LAN modes such as WAKE_FILTER may only be supported by the MAC,
while others might be only supported by the PHY. Make sure that the .get_wol()
returns the union of both rather than only that of the PHY if the PHY supports
Wake-on-LAN.
Fixes: 7e400ff35c ("net: bcmgenet: Add support for PHY-based Wake-on-LAN")
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20240806175659.3232204-1-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A recent commit has modified the code in __bnxt_reserve_rings() to
set the default RSS indirection table to default only when the number
of RX rings is changing. While this works for newer firmware that
requires RX ring reservations, it causes the regression on older
firmware not requiring RX ring resrvations (BNXT_NEW_RM() returns
false).
With older firmware, RX ring reservations are not required and so
hw_resc->resv_rx_rings is not always set to the proper value. The
comparison:
if (old_rx_rings != bp->hw_resc.resv_rx_rings)
in __bnxt_reserve_rings() may be false even when the RX rings are
changing. This will cause __bnxt_reserve_rings() to skip setting
the default RSS indirection table to default to match the current
number of RX rings. This may later cause bnxt_fill_hw_rss_tbl() to
use an out-of-range index.
We already have bnxt_check_rss_tbl_no_rmgr() to handle exactly this
scenario. We just need to move it up in bnxt_need_reserve_rings()
to be called unconditionally when using older firmware. Without the
fix, if the TX rings are changing, we'll skip the
bnxt_check_rss_tbl_no_rmgr() call and __bnxt_reserve_rings() may also
skip the bnxt_set_dflt_rss_indir_tbl() call for the reason explained
in the last paragraph. Without setting the default RSS indirection
table to default, it causes the regression:
BUG: KASAN: slab-out-of-bounds in __bnxt_hwrm_vnic_set_rss+0xb79/0xe40
Read of size 2 at addr ffff8881c5809618 by task ethtool/31525
Call Trace:
__bnxt_hwrm_vnic_set_rss+0xb79/0xe40
bnxt_hwrm_vnic_rss_cfg_p5+0xf7/0x460
__bnxt_setup_vnic_p5+0x12e/0x270
__bnxt_open_nic+0x2262/0x2f30
bnxt_open_nic+0x5d/0xf0
ethnl_set_channels+0x5d4/0xb30
ethnl_default_set_doit+0x2f1/0x620
Reported-by: Breno Leitao <leitao@debian.org>
Closes: https://lore.kernel.org/netdev/ZrC6jpghA3PWVWSB@gmail.com/
Fixes: 98ba1d931f ("bnxt_en: Fix RSS logic in __bnxt_reserve_rings()")
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Tested-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20240806053742.140304-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Migrate tasklet APIs to the new bottom half workqueue mechanism. It
replaces all occurrences of tasklet usage with the appropriate workqueue
APIs throughout the cnic driver. This transition ensures compatibility
with the latest design and enhances performance.
Signed-off-by: Allen Pais <allen.lkml@gmail.com>
Link: https://patch.msgid.link/20240730183403.4176544-4-allen.lkml@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As described in the kdoc for .create_rxfh_context we are responsible
for populating the defaults. The core will not call .get_rxfh
for non-0 context.
The problem can be easily observed since Netlink doesn't currently
use the cache. Using netlink ethtool:
$ ethtool -x eth0 context 1
[...]
RSS hash key:
13:60:cd:60:14:d3:55:36:86:df:90:f2:96:14:e2:21:05:57:a8:8f:a5:12:5e:54:62:7f:fd:3c:15:7e:76:05:71:42:a2:9a:73:80:09:9c
RSS hash function:
toeplitz: on
xor: off
crc32: off
But using IOCTL ethtool shows:
$ ./ethtool-old -x eth0 context 1
[...]
RSS hash key:
00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
RSS hash function:
Operation not supported
Fixes: 7964e78846 ("net: ethtool: use the tracking array for get_rxfh on custom RSS contexts")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit under Fixes I split the bnxt_set_rxfh_context() function,
and attached the appropriate chunks to new ops. I missed that
bnxt_set_rxfh_context() gets called after some initial checks
in bnxt_set_rxfh(), namely that the hash function is Toeplitz.
Fixes: 5c466b4d4e ("eth: bnxt: move from .set_rxfh to .create_rxfh_context and friends")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In __bnxt_reserve_rings(), the existing code unconditionally sets the
default RSS indirection table to default if netif_is_rxfh_configured()
returns false. This used to be correct before we added RSS contexts
support. For example, if the user is changing the number of ethtool
channels, we will enter this path to reserve the new number of rings.
We will then set the RSS indirection table to default to cover the new
number of rings if netif_is_rxfh_configured() is false.
Now, with RSS contexts support, if the user has added or deleted RSS
contexts, we may now enter this path to reserve the new number of VNICs.
However, netif_is_rxfh_configured() will not return the correct state if
we are still in the middle of set_rxfh(). So the existing code may
set the indirection table of the default RSS context to default by
mistake.
Fix it to check if the reservation of the RX rings is changing. Only
check netif_is_rxfh_configured() if it is changing. RX rings will not
change in the middle of set_rxfh() and this will fix the issue.
Fixes: b3d0083caf ("bnxt_en: Support RSS contexts in ethtool .{get|set}_rxfh()")
Reported-and-tested-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/20240625010210.2002310-1-kuba@kernel.org
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20240724222106.147744-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
catching COVID, so relatively short PR. Including fixes from bpf
and netfilter.
Current release - regressions:
- tcp: process the 3rd ACK with sk_socket for TFO and MPTCP
Current release - new code bugs:
- l2tp: protect session IDR and tunnel session list with one lock,
make sure the state is coherent to avoid a warning
- eth: bnxt_en: update xdp_rxq_info in queue restart logic
- eth: airoha: fix location of the MBI_RX_AGE_SEL_MASK field
Previous releases - regressions:
- xsk: require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len,
the field reuses previously un-validated pad
Previous releases - always broken:
- tap/tun: drop short frames to prevent crashes later in the stack
- eth: ice: add a per-VF limit on number of FDIR filters
- af_unix: disable MSG_OOB handling for sockets in sockmap/sockhash
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmaibxAACgkQMUZtbf5S
IruuIRAAu96TiN/urPwmKznyb/Sk8x7p8iUzn6OvPS/TUlFUkURQtOh6M9uvbpN4
x/L//EWkMR0hY4SkBegoiXfb1GS0PjBdWTWUiROm5X9nVHqp5KRZAxWXhjFiS1BO
BIYOT+JfCl7mQiPs90Mys/cEtYOggMBsCZQVIGw/iYoJLFREqxFSONwa0dG+tGMX
jn9WNu4yCVDhJ/jtl2MaTsCNtYUaBUgYrKHJBfNGfJ2Lz/7rH9yFui2WSMlmOd/U
QGeCb1DWURlShlCqY37wNinbFsxWkI5JN00ukTtwFAXLIaqc+zgHcIjrDjTJwK43
F4tKbJT3+bmehMU/h3Uo3c7DhXl7n9zDGiDtbCxnkykp0sFGJpjhDrWydo51c+YB
qW5HaNrII2LiDicOVN8L29ylvKp7AEkClxgivEhZVGGk2f/szJRXfp9u3WBn5kAx
3paH55YN0DEsKbYbb1ZENEI1Vnc/4ff4PxZJCUNKwzcS8wCn1awqwcriK9TjS/cp
fjilNFT4J3/uFrodHWTkx0jJT6UJFT0aF03qPLUH/J5kG+EVukOf1jBPInNdf1si
1j47SpblHUe86HiHphFMt32KZ210lJzWxh8uGma57Y2sB9makdLiK4etrFjkiMJJ
Z8A3kGp3KpFjbuK4tHY25rp+5oxLNNOBNpay29lQrWtCL/NDcaQ=
=9OsH
-----END PGP SIGNATURE-----
Merge tag 'net-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bpf and netfilter.
A lot of networking people were at a conference last week, busy
catching COVID, so relatively short PR.
Current release - regressions:
- tcp: process the 3rd ACK with sk_socket for TFO and MPTCP
Current release - new code bugs:
- l2tp: protect session IDR and tunnel session list with one lock,
make sure the state is coherent to avoid a warning
- eth: bnxt_en: update xdp_rxq_info in queue restart logic
- eth: airoha: fix location of the MBI_RX_AGE_SEL_MASK field
Previous releases - regressions:
- xsk: require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len,
the field reuses previously un-validated pad
Previous releases - always broken:
- tap/tun: drop short frames to prevent crashes later in the stack
- eth: ice: add a per-VF limit on number of FDIR filters
- af_unix: disable MSG_OOB handling for sockets in sockmap/sockhash"
* tag 'net-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (34 commits)
tun: add missing verification for short frame
tap: add missing verification for short frame
mISDN: Fix a use after free in hfcmulti_tx()
gve: Fix an edge case for TSO skb validity check
bnxt_en: update xdp_rxq_info in queue restart logic
tcp: process the 3rd ACK with sk_socket for TFO/MPTCP
selftests/bpf: Add XDP_UMEM_TX_METADATA_LEN to XSK TX metadata test
xsk: Require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len
bpf: Fix a segment issue when downgrading gso_size
net: mediatek: Fix potential NULL pointer dereference in dummy net_device handling
MAINTAINERS: make Breno the netconsole maintainer
MAINTAINERS: Update bonding entry
net: nexthop: Initialize all fields in dumped nexthops
net: stmmac: Correct byte order of perfect_match
selftests: forwarding: skip if kernel not support setting bridge fdb learning limit
tipc: Return non-zero value from tipc_udp_addr2str() on error
netfilter: nft_set_pipapo_avx2: disable softinterrupts
ice: Fix recipe read procedure
ice: Add a per-VF limit on number of FDIR filters
net: bonding: correctly annotate RCU in bond_should_notify_peers()
...
Here is the big set of driver core changes for 6.11-rc1.
Lots of stuff in here, with not a huge diffstat, but apis are evolving
which required lots of files to be touched. Highlights of the changes
in here are:
- platform remove callback api final fixups (Uwe took many releases to
get here, finally!)
- Rust bindings for basic firmware apis and initial driver-core
interactions. It's not all that useful for a "write a whole driver
in rust" type of thing, but the firmware bindings do help out the
phy rust drivers, and the driver core bindings give a solid base on
which others can start their work. There is still a long way to go
here before we have a multitude of rust drivers being added, but
it's a great first step.
- driver core const api changes. This reached across all bus types,
and there are some fix-ups for some not-common bus types that
linux-next and 0-day testing shook out. This work is being done to
help make the rust bindings more safe, as well as the C code, moving
toward the end-goal of allowing us to put driver structures into
read-only memory. We aren't there yet, but are getting closer.
- minor devres cleanups and fixes found by code inspection
- arch_topology minor changes
- other minor driver core cleanups
All of these have been in linux-next for a very long time with no
reported problems.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZqH+aQ8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ymoOQCfVBdLcBjEDAGh3L8qHRGMPy4rV2EAoL/r+zKm
cJEYtJpGtWX6aAtugm9E
=ZyJV
-----END PGP SIGNATURE-----
Merge tag 'driver-core-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the big set of driver core changes for 6.11-rc1.
Lots of stuff in here, with not a huge diffstat, but apis are evolving
which required lots of files to be touched. Highlights of the changes
in here are:
- platform remove callback api final fixups (Uwe took many releases
to get here, finally!)
- Rust bindings for basic firmware apis and initial driver-core
interactions.
It's not all that useful for a "write a whole driver in rust" type
of thing, but the firmware bindings do help out the phy rust
drivers, and the driver core bindings give a solid base on which
others can start their work.
There is still a long way to go here before we have a multitude of
rust drivers being added, but it's a great first step.
- driver core const api changes.
This reached across all bus types, and there are some fix-ups for
some not-common bus types that linux-next and 0-day testing shook
out.
This work is being done to help make the rust bindings more safe,
as well as the C code, moving toward the end-goal of allowing us to
put driver structures into read-only memory. We aren't there yet,
but are getting closer.
- minor devres cleanups and fixes found by code inspection
- arch_topology minor changes
- other minor driver core cleanups
All of these have been in linux-next for a very long time with no
reported problems"
* tag 'driver-core-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (55 commits)
ARM: sa1100: make match function take a const pointer
sysfs/cpu: Make crash_hotplug attribute world-readable
dio: Have dio_bus_match() callback take a const *
zorro: make match function take a const pointer
driver core: module: make module_[add|remove]_driver take a const *
driver core: make driver_find_device() take a const *
driver core: make driver_[create|remove]_file take a const *
firmware_loader: fix soundness issue in `request_internal`
firmware_loader: annotate doctests as `no_run`
devres: Correct code style for functions that return a pointer type
devres: Initialize an uninitialized struct member
devres: Fix memory leakage caused by driver API devm_free_percpu()
devres: Fix devm_krealloc() wasting memory
driver core: platform: Switch to use kmemdup_array()
driver core: have match() callback in struct bus_type take a const *
MAINTAINERS: add Rust device abstractions to DRIVER CORE
device: rust: improve safety comments
MAINTAINERS: add Danilo as FIRMWARE LOADER maintainer
MAINTAINERS: add Rust FW abstractions to FIRMWARE LOADER
firmware: rust: improve safety comments
...
When the netdev_rx_queue_restart() restarts queues, the bnxt_en driver
updates(creates and deletes) a page_pool.
But it doesn't update xdp_rxq_info, so the xdp_rxq_info is still
connected to an old page_pool.
So, bnxt_rx_ring_info->page_pool indicates a new page_pool, but
bnxt_rx_ring_info->xdp_rxq is still connected to an old page_pool.
An old page_pool is no longer used so it is supposed to be
deleted by page_pool_destroy() but it isn't.
Because the xdp_rxq_info is holding the reference count for it and the
xdp_rxq_info is not updated, an old page_pool will not be deleted in
the queue restart logic.
Before restarting 1 queue:
./tools/net/ynl/samples/page-pool
enp10s0f1np1[6] page pools: 4 (zombies: 0)
refs: 8192 bytes: 33554432 (refs: 0 bytes: 0)
recycling: 0.0% (alloc: 128:8048 recycle: 0:0)
After restarting 1 queue:
./tools/net/ynl/samples/page-pool
enp10s0f1np1[6] page pools: 5 (zombies: 0)
refs: 10240 bytes: 41943040 (refs: 0 bytes: 0)
recycling: 20.0% (alloc: 160:10080 recycle: 1920:128)
Before restarting queues, an interface has 4 page_pools.
After restarting one queue, an interface has 5 page_pools, but it
should be 4, not 5.
The reason is that queue restarting logic creates a new page_pool and
an old page_pool is not deleted due to the absence of an update of
xdp_rxq_info logic.
Fixes: 2d694c27d3 ("bnxt_en: implement netdev_queue_mgmt_ops")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Reviewed-by: David Wei <dw@davidwei.uk>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Link: https://patch.msgid.link/20240721053554.1233549-1-ap420073@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In prevision to add new UAPI for hwtstamp we will be limited to the struct
ethtool_ts_info that is currently passed in fixed binary format through the
ETHTOOL_GET_TS_INFO ethtool ioctl. It would be good if new kernel code
already started operating on an extensible kernel variant of that
structure, similar in concept to struct kernel_hwtstamp_config vs struct
hwtstamp_config.
Since struct ethtool_ts_info is in include/uapi/linux/ethtool.h, here
we introduce the kernel-only structure in include/linux/ethtool.h.
The manual copy is then made in the function called by ETHTOOL_GET_TS_INFO.
Acked-by: Shannon Nelson <shannon.nelson@amd.com>
Acked-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/20240709-feature_ptp_netnext-v17-6-b5317f50df2a@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Instead of allocating a separate indir table in the vnic use
the one already present in the RSS context allocated by the core.
This saves some LoC and also we won't have to worry about syncing
the local version back to the core, once core learns how to dump
contexts.
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://patch.msgid.link/20240711220713.283778-12-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ethtool core stores indirection table with u32 entries, "just to be safe".
Switch the type in the driver, so that it's easier to swap local tables
for the core ones. Memory allocations already use sizeof(*entry), switch
the memset()s as well.
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://patch.msgid.link/20240711220713.283778-11-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
bnxt allocates tables of max size, and changes the used size
based on number of active rings. The unused entries get padded
out with zeros. bnxt_modify_rss() seems to always pad out
the table of the main / default RSS context, instead of
the table of the modified context.
I haven't observed any behavior change due to this patch,
so I don't think it's a fix. Not entirely sure what role
the padding plays, 0 is a valid queue ID.
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://patch.msgid.link/20240711220713.283778-10-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Core already maintains all RSS contexts in an XArray, no need
to keep a second list in the driver.
Remove bnxt_get_max_rss_ctx_ring() completely since core performs
the same check already.
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://patch.msgid.link/20240711220713.283778-9-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Core can allocate space for per-context driver-private data,
use it for struct bnxt_rss_ctx. Inline bnxt_alloc_rss_ctx()
at this point, most of the init (as in the actions bnxt_del_one_rss_ctx()
will undo) is open coded in bnxt_create_rxfh_context(), anyway.
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://patch.msgid.link/20240711220713.283778-8-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
New RSS context API removes old contexts on netdev unregister.
No need to wipe them manually.
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://patch.msgid.link/20240711220713.283778-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Core will allocate IDs for the driver, from the range
[1, BNXT_MAX_ETH_RSS_CTX], no need to track the allocations.
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://patch.msgid.link/20240711220713.283778-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use the new ethtool ops for RSS context management. The conversion
is pretty straightforward cut / paste of the right chunks of the
combined handler. Main change is that we let the core pick the IDs
(bitmap will be removed separately for ease of review), so we need
to tell the core when we lose a context.
Since the new API passes rxfh as const, change bnxt_modify_rss()
to also take const.
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://patch.msgid.link/20240711220713.283778-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Contexts get deleted from FW when the device is down, but they
are kept in SW and re-added back on open. bnxt_set_rxfh_context()
apparently does not want to deal with complexity of dealing with
both the device down and device up cases. This is perhaps acceptable
for creating new contexts, but not being able to delete contexts
makes core-driven cleanups messy. Specifically with the new RSS
API core will try to delete contexts automatically after bringing
the device down.
Support the delete-while-down case. Skip the FW logic and delete
just the driver state.
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://patch.msgid.link/20240711220713.283778-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Cross-merge networking fixes after downstream PR.
Conflicts:
net/sched/act_ct.c
26488172b0 ("net/sched: Fix UAF when resolving a clash")
3abbd7ed8b ("act_ct: prepare for stolen verdict coming from conntrack and nat engine")
No adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
bnxt doesn't check if a ring is used by RSS contexts when reducing
ring count. Core performs a similar check for the drivers for
the main context, but core doesn't know about additional contexts,
so it can't validate them. bnxt_fill_hw_rss_tbl_p5() uses ring
id to index bp->rx_ring[], which without the check may end up
being out of bounds.
BUG: KASAN: slab-out-of-bounds in __bnxt_hwrm_vnic_set_rss+0xb79/0xe40
Read of size 2 at addr ffff8881c5809618 by task ethtool/31525
Call Trace:
__bnxt_hwrm_vnic_set_rss+0xb79/0xe40
bnxt_hwrm_vnic_rss_cfg_p5+0xf7/0x460
__bnxt_setup_vnic_p5+0x12e/0x270
__bnxt_open_nic+0x2262/0x2f30
bnxt_open_nic+0x5d/0xf0
ethnl_set_channels+0x5d4/0xb30
ethnl_default_set_doit+0x2f1/0x620
Core does track the additional contexts in net-next, so we can
move this validation out of the driver as a follow up there.
Fixes: b3d0083caf ("bnxt_en: Support RSS contexts in ethtool .{get|set}_rxfh()")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://patch.msgid.link/20240705020005.681746-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
While creating a new RSS context, bnxt_rfs_capable() currently
makes a strict check to see if the required VNICs are already
available. If the current VNICs are not what is required,
either too many or not enough, it will call the firmware to
reserve the exact number required.
There is a bug in the firmware when the driver tries to
relinquish some reserved VNICs and RSS contexts. It will
cause the default VNIC to lose its RSS configuration and
cause receive packets to be placed incorrectly.
Workaround this problem by skipping the resource reduction.
The driver will not reduce the VNIC and RSS context reservations
when a context is deleted. The resources will be available for
use when new contexts are created later.
Potentially, this workaround can cause us to run out of VNIC
and RSS contexts if there are a lot of VF functions creating
and deleting RSS contexts. In the future, we will conditionally
disable this workaround when the firmware fix is available.
Fixes: 438ba39b25 ("bnxt_en: Improve RSS context reservation infrastructure")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/netdev/20240625010210.2002310-1-kuba@kernel.org/
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240703180112.78590-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Have bnxt call page_pool_disable_direct_recycling() to unlink the old
page pool when resetting a queue prior to destroying it, instead of
touching a netdev core struct directly.
Signed-off-by: David Wei <dw@davidwei.uk>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Now that we require the spinlock to protect ptp->txts_prod, change
ptp->tx_avail to non-atomic and protect it under the same spinlock.
Add a new helper function bnxt_ptp_get_txts_prod() to decrement
ptp->tx_avail under spinlock and return the producer.
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Start accepting up to 4 TX TS requests on BCM5750X (P5) chips.
These PTP TX packets will be queued in the ptp->txts_req[] array
waiting for the TX timestamp to complete. The entries in the
array will be managed by a producer and consumer index. The
producer index is updated under spinlock since multiple TX rings
can try to send PTP packets at the same time.
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change the function bnxt_stamp_tx_skb() to return 0 for suceess
or -EAGAIN if the timestamp is still pending in firmware. The
calling PTP aux worker will reschedule based on the return code.
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the current 5750X PTP code paths, there is always at most one TX
SKB requested for timestamp and we won't accept another one until we
have retrieved the timestamp or it has timed out. Remove the
unnecessary check in bnxt_get_tx_ts_p5() for a pending SKB and change
the function to void.
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On the older 5750X (P5) chips, we currently support only 1 TX PTP
packet in-flight waiting for the timestamp. Refactor the
datastructures to prepare to support up to 4 TX PTP packets.
Combine all fields required for PTP TX timestamp query into one
structure. An array of this structure will be added in follow-on
patches to support multiple outstanding TX timestamps.
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
BCM5760X firmware will advertise direct 64-bit PHC registers access
for the driver from BAR0.
Make the necessary changes in handling HWRM_PORT_MAC_PTP_QCFG's
response and PHC register mapping for 5760X chips.
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The new BCM5760X chips will return the timestamp of TX packets in a
new completion. Add logic in __bnxt_poll_work() to handle this
completion type to retrieve the timestamp. This feature eliminates
the limit on the number of in-flight PTP TX packets.
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver's current logic will always free all the TX SKBs up to
txr->tx_hw_cons within NAPI. In the next patches, we'll be adding
logic to handle TX timestamp completion and we may need to hold
some remaining TX SKBs if we don't have the timestamp completions
yet.
Modify __bnxt_poll_work_done() to clear each event bit separately to
allow bnapi->tx_int() to decide whether to clear BNXT_TX_CMP_EVENT or
not. bnapi->tx_int() will not clear BNXT_TX_CMP_EVENT if some TX
SKBs are held waiting for TX timestamps. Note that legacy chips will
never hold any SKBs this way. The SKB is always deferred to the PTP
worker slow path to retrieve the timestamp from firmware. On the new
P7 chips, the timestamp is returned by the hardware directly and we
can retrieve it directly from NAPI.
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the unused is_gso field and add the is_ts_pkt field to struct
bnxt_sw_tx_bd. This field will mark the TX BD that has requested
HW TX timestamp. The field needs to be cleared if the timestamp packet
is later aborted. This field will be useful when processing the
new TX timestamp completion from the hardware in the next patches.
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The new BCM5760X chips will generate this new TX timestamp completion
when a TX packet's timestamp has been taken right before transmission.
The driver logic to retrieve the timestamp will be added in the next
few patches.
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix UBSAN warnings that occur when using a system with 32 physical
cpu cores or more, or when the user defines a number of Ethernet
queues greater than or equal to FP_SB_MAX_E1x using the num_queues
module parameter.
Currently there is a read/write out of bounds that occurs on the array
"struct stats_query_entry query" present inside the "bnx2x_fw_stats_req"
struct in "drivers/net/ethernet/broadcom/bnx2x/bnx2x.h".
Looking at the definition of the "struct stats_query_entry query" array:
struct stats_query_entry query[FP_SB_MAX_E1x+
BNX2X_FIRST_QUEUE_QUERY_IDX];
FP_SB_MAX_E1x is defined as the maximum number of fast path interrupts and
has a value of 16, while BNX2X_FIRST_QUEUE_QUERY_IDX has a value of 3
meaning the array has a total size of 19.
Since accesses to "struct stats_query_entry query" are offset-ted by
BNX2X_FIRST_QUEUE_QUERY_IDX, that means that the total number of Ethernet
queues should not exceed FP_SB_MAX_E1x (16). However one of these queues
is reserved for FCOE and thus the number of Ethernet queues should be set
to [FP_SB_MAX_E1x -1] (15) if FCOE is enabled or [FP_SB_MAX_E1x] (16) if
it is not.
This is also described in a comment in the source code in
drivers/net/ethernet/broadcom/bnx2x/bnx2x.h just above the Macro definition
of FP_SB_MAX_E1x. Below is the part of this explanation that it important
for this patch
/*
* The total number of L2 queues, MSIX vectors and HW contexts (CIDs) is
* control by the number of fast-path status blocks supported by the
* device (HW/FW). Each fast-path status block (FP-SB) aka non-default
* status block represents an independent interrupts context that can
* serve a regular L2 networking queue. However special L2 queues such
* as the FCoE queue do not require a FP-SB and other components like
* the CNIC may consume FP-SB reducing the number of possible L2 queues
*
* If the maximum number of FP-SB available is X then:
* a. If CNIC is supported it consumes 1 FP-SB thus the max number of
* regular L2 queues is Y=X-1
* b. In MF mode the actual number of L2 queues is Y= (X-1/MF_factor)
* c. If the FCoE L2 queue is supported the actual number of L2 queues
* is Y+1
* d. The number of irqs (MSIX vectors) is either Y+1 (one extra for
* slow-path interrupts) or Y+2 if CNIC is supported (one additional
* FP interrupt context for the CNIC).
* e. The number of HW context (CID count) is always X or X+1 if FCoE
* L2 queue is supported. The cid for the FCoE L2 queue is always X.
*/
However this driver also supports NICs that use the E2 controller which can
handle more queues due to having more FP-SB represented by FP_SB_MAX_E2.
Looking at the commits when the E2 support was added, it was originally
using the E1x parameters: commit f2e0899f0f ("bnx2x: Add 57712 support").
Back then FP_SB_MAX_E2 was set to 16 the same as E1x. However the driver
was later updated to take full advantage of the E2 instead of having it be
limited to the capabilities of the E1x. But as far as we can tell, the
array "stats_query_entry query" was still limited to using the FP-SB
available to the E1x cards as part of an oversignt when the driver was
updated to take full advantage of the E2, and now with the driver being
aware of the greater queue size supported by E2 NICs, it causes the UBSAN
warnings seen in the stack traces below.
This patch increases the size of the "stats_query_entry query" array by
replacing FP_SB_MAX_E1x with FP_SB_MAX_E2 to be large enough to handle
both types of NICs.
Stack traces:
UBSAN: array-index-out-of-bounds in
drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.c:1529:11
index 20 is out of range for type 'stats_query_entry [19]'
CPU: 12 PID: 858 Comm: systemd-network Not tainted 6.9.0-060900rc7-generic
#202405052133
Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9,
BIOS P89 10/21/2019
Call Trace:
<TASK>
dump_stack_lvl+0x76/0xa0
dump_stack+0x10/0x20
__ubsan_handle_out_of_bounds+0xcb/0x110
bnx2x_prep_fw_stats_req+0x2e1/0x310 [bnx2x]
bnx2x_stats_init+0x156/0x320 [bnx2x]
bnx2x_post_irq_nic_init+0x81/0x1a0 [bnx2x]
bnx2x_nic_load+0x8e8/0x19e0 [bnx2x]
bnx2x_open+0x16b/0x290 [bnx2x]
__dev_open+0x10e/0x1d0
RIP: 0033:0x736223927a0a
Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca
64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00
f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
RSP: 002b:00007ffc0bb2ada8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000583df50f9c78 RCX: 0000736223927a0a
RDX: 0000000000000020 RSI: 0000583df50ee510 RDI: 0000000000000003
RBP: 0000583df50d4940 R08: 00007ffc0bb2adb0 R09: 0000000000000080
R10: 0000000000000000 R11: 0000000000000246 R12: 0000583df5103ae0
R13: 000000000000035a R14: 0000583df50f9c30 R15: 0000583ddddddf00
</TASK>
---[ end trace ]---
------------[ cut here ]------------
UBSAN: array-index-out-of-bounds in
drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.c:1546:11
index 28 is out of range for type 'stats_query_entry [19]'
CPU: 12 PID: 858 Comm: systemd-network Not tainted 6.9.0-060900rc7-generic
#202405052133
Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9,
BIOS P89 10/21/2019
Call Trace:
<TASK>
dump_stack_lvl+0x76/0xa0
dump_stack+0x10/0x20
__ubsan_handle_out_of_bounds+0xcb/0x110
bnx2x_prep_fw_stats_req+0x2fd/0x310 [bnx2x]
bnx2x_stats_init+0x156/0x320 [bnx2x]
bnx2x_post_irq_nic_init+0x81/0x1a0 [bnx2x]
bnx2x_nic_load+0x8e8/0x19e0 [bnx2x]
bnx2x_open+0x16b/0x290 [bnx2x]
__dev_open+0x10e/0x1d0
RIP: 0033:0x736223927a0a
Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca
64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00
f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
RSP: 002b:00007ffc0bb2ada8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000583df50f9c78 RCX: 0000736223927a0a
RDX: 0000000000000020 RSI: 0000583df50ee510 RDI: 0000000000000003
RBP: 0000583df50d4940 R08: 00007ffc0bb2adb0 R09: 0000000000000080
R10: 0000000000000000 R11: 0000000000000246 R12: 0000583df5103ae0
R13: 000000000000035a R14: 0000583df50f9c30 R15: 0000583ddddddf00
</TASK>
---[ end trace ]---
------------[ cut here ]------------
UBSAN: array-index-out-of-bounds in
drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c:1895:8
index 29 is out of range for type 'stats_query_entry [19]'
CPU: 13 PID: 163 Comm: kworker/u96:1 Not tainted 6.9.0-060900rc7-generic
#202405052133
Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9,
BIOS P89 10/21/2019
Workqueue: bnx2x bnx2x_sp_task [bnx2x]
Call Trace:
<TASK>
dump_stack_lvl+0x76/0xa0
dump_stack+0x10/0x20
__ubsan_handle_out_of_bounds+0xcb/0x110
bnx2x_iov_adjust_stats_req+0x3c4/0x3d0 [bnx2x]
bnx2x_storm_stats_post.part.0+0x4a/0x330 [bnx2x]
? bnx2x_hw_stats_post+0x231/0x250 [bnx2x]
bnx2x_stats_start+0x44/0x70 [bnx2x]
bnx2x_stats_handle+0x149/0x350 [bnx2x]
bnx2x_attn_int_asserted+0x998/0x9b0 [bnx2x]
bnx2x_sp_task+0x491/0x5c0 [bnx2x]
process_one_work+0x18d/0x3f0
</TASK>
---[ end trace ]---
Fixes: 50f0a562f8 ("bnx2x: add fcoe statistics")
Signed-off-by: Ghadi Elie Rahme <ghadi.rahme@canonical.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20240627111405.1037812-1-ghadi.rahme@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Implement netdev_queue_mgmt_ops for bnxt added in [1].
Two bnxt_rx_ring_info structs are allocated to hold the new/old queue
memory. Queue memory is copied from/to the main bp->rx_ring[idx]
bnxt_rx_ring_info.
Queue memory is pre-allocated in bnxt_queue_mem_alloc() into a clone,
and then copied into bp->rx_ring[idx] in bnxt_queue_mem_start().
Similarly, when bp->rx_ring[idx] is stopped its queue memory is copied
into a clone, and then freed later in bnxt_queue_mem_free().
I tested this patchset with netdev_rx_queue_restart(), including
inducing errors in all places that returns an error code. In all cases,
the queue is left in a good working state.
Rx queues are created/destroyed using bnxt_hwrm_rx_ring_alloc() and
bnxt_hwrm_rx_ring_free(), which issue HWRM_RING_ALLOC and HWRM_RING_FREE
commands respectively to the firmware. By the time a HWRM_RING_FREE
response is received, there won't be any more completions from that
queue.
Thanks to Somnath for helping me with this patch. With their permission
I've added them as Acked-by.
[1]: https://lore.kernel.org/netdev/20240501232549.1327174-2-shailend@google.com/
Acked-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: David Wei <dw@davidwei.uk>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
To prepare for queue API implementation, split rx ring functions out
from ring helpers. These new helpers will be called from queue API
implementation.
Signed-off-by: David Wei <dw@davidwei.uk>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cross-merge networking fixes after downstream PR.
Conflicts:
drivers/net/ethernet/broadcom/bnxt/bnxt.c
1e7962114c ("bnxt_en: Restore PTP tx_avail count in case of skb_pad() error")
165f87691a ("bnxt_en: add timestamping statistics support")
No adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The current code only restores PTP tx_avail count when we get DMA
mapping errors. Fix it so that the PTP tx_avail count will be
restored for both DMA mapping errors and skb_pad() errors.
Otherwise PTP TX timestamp will not be available after a PTP
packet hits the skb_pad() error.
Fixes: 83bb623c96 ("bnxt_en: Transmit and retrieve packet timestamps")
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240618215313.29631-4-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Firmware will now advertise a non-zero TSO max segments if the
device has a limit. 0 means no limit. The latest 5760X chip
(early revs) has a limit of 2047 that cannot be exceeded. If
exceeded, the chip will send out just a small number of segments.
Call netif_set_tso_max_segs() if the device has a limit.
Fixes: 2012a6abc8 ("bnxt_en: Add 5760X (P7) PCI IDs")
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20240618215313.29631-3-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The relevant change is the max_tso_segs value returned by firmware
in the HWRM_FUNC_QCAPS response. This value will be used in the next
patch to cap the TSO segments.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20240618215313.29631-2-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmZvTbAeHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGVksIAJEn4a9IVM8FNCJy
Dxo0BItD1/qJ5mLDptqUFRKlxInjbojofz5CyoeIeXb0DwRfB16ALXqNXAkd3APi
saoOpfjFsg2H2OqL9CHdkzWcJEAq2lDnL0zaOjumeDVu/EyeT+tC4e4hq1e6Bm0E
fPC5ms2b+07DF9Rg6/DW8yPbdM5n6Mz1bRd3fQOIgvpM3yGOyGztEBgTRub/ZUgH
5pNJauknFAZgdiWhgNpc+lPWYZbgHKULQPhUBPdVhDIXPtQNUlKgNTQc6+L0Nmbb
K1sG1q7FLeMJOTFGQfD4r26X5DNQUi894q/9SX8X7rcrECdJKcw2WjVyB4myADpf
ae2gP+A=
=XjWP
-----END PGP SIGNATURE-----
Merge tag 'v6.10-rc4' into driver-core-next
We need the driver core and sysfs fixes in here to build on top of.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In case of token is released due to token->state == BNXT_HWRM_DEFERRED,
released token (set to NULL) is used in log messages. This issue is
expected to be prevented by HWRM_ERR_CODE_PF_UNAVAILABLE error code. But
this error code is returned by recent firmware. So some firmware may not
return it. This may lead to NULL pointer dereference.
Adjust this issue by adding token pointer check.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 8fa4219dba ("bnxt_en: add dynamic debug support for HWRM messages")
Suggested-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20240611082547.12178-1-amishin@t-argos.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Firmware interface 1.10.2.118 has increased the size of
HWRM_PORT_PHY_QCFG response beyond the maximum size that can be
forwarded. When the VF's link state is not the default auto state,
the PF will need to forward the response back to the VF to indicate
the forced state. This regression may cause the VF to fail to
initialize.
Fix it by capping the HWRM_PORT_PHY_QCFG response to the maximum
96 bytes. The SPEEDS2_SUPPORTED flag needs to be cleared because the
new speeds2 fields are beyond the legacy structure. Also modify
bnxt_hwrm_fwd_resp() to print a warning if the message size exceeds 96
bytes to make this failure more obvious.
Fixes: 84a911db83 ("bnxt_en: Update firmware interface to 1.10.2.118")
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20240612231736.57823-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In the quest to make struct device constant, start by making
to_auxiliary_drv() return a constant pointer so that drivers that call
this can be fixed up before the driver core changes.
As the return type previously was not constant, also fix up all callers
that were assuming that the pointer was not going to be a constant one
in order to not break the build.
Cc: Dave Ertman <david.m.ertman@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Bingbu Cao <bingbu.cao@intel.com>
Cc: Tianshu Qiu <tian.shu.qiu@intel.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Michael Chan <michael.chan@broadcom.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
Cc: Saeed Mahameed <saeedm@nvidia.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Tariq Toukan <tariqt@nvidia.com>
Cc: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Cc: Liam Girdwood <lgirdwood@gmail.com>
Cc: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
Cc: Bard Liao <yung-chuan.liao@linux.intel.com>
Cc: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Cc: Daniel Baluta <daniel.baluta@nxp.com>
Cc: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Cc: Jaroslav Kysela <perex@perex.cz>
Cc: Takashi Iwai <tiwai@suse.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: linux-media@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: intel-wired-lan@lists.osuosl.org
Cc: linux-rdma@vger.kernel.org
Cc: sound-open-firmware@alsa-project.org
Cc: linux-sound@vger.kernel.org
Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com> # drivers/media/pci/intel/ipu6
Acked-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
Link: https://lore.kernel.org/r/20240611130103.3262749-7-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
atomic_dec_if_positive returns new value regardless if it is updated or
not. The commit in fixes changed the behavior of the condition to one
that differs from original code. Restore original condition to properly
maintain atomic counter.
Fixes: 165f87691a ("bnxt_en: add timestamping statistics support")
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240604091939.785535-1-vadfed@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The ethtool_ts_stats structure was introduced earlier this year. Now
it's time to support this group of counters in more drivers.
This patch adds support to bnxt driver.
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20240530204751.99636-1-vadfed@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Simon reported that ndo_change_mtu() methods were never
updated to use WRITE_ONCE(dev->mtu, new_mtu) as hinted
in commit 501a90c945 ("inet: protect against too small
mtu values.")
We read dev->mtu without holding RTNL in many places,
with READ_ONCE() annotations.
It is time to take care of ndo_change_mtu() methods
to use corresponding WRITE_ONCE()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Simon Horman <horms@kernel.org>
Closes: https://lore.kernel.org/netdev/20240505144608.GB67882@kernel.org/
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://lore.kernel.org/r/20240506102812.3025432-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Current net-next/main does not boot for older chipsets e.g. Stratus.
Sample dmesg:
[ 11.368315] bnxt_en 0000:02:00.0 (unnamed net_device) (uninitialized): Able to reserve only 0 out of 9 requested RX rings
[ 11.390181] bnxt_en 0000:02:00.0 (unnamed net_device) (uninitialized): Unable to reserve tx rings
[ 11.438780] bnxt_en 0000:02:00.0 (unnamed net_device) (uninitialized): 2nd rings reservation failed.
[ 11.487559] bnxt_en 0000:02:00.0 (unnamed net_device) (uninitialized): Not enough rings available.
[ 11.506012] bnxt_en 0000:02:00.0: probe with driver bnxt_en failed with error -12
This is caused by bnxt_get_avail_msix() returning a negative value for
these chipsets not using the new resource manager i.e. !BNXT_NEW_RM.
This in turn causes hwr.cp in __bnxt_reserve_rings() to be set to 0.
In the current call stack, __bnxt_reserve_rings() is called from
bnxt_set_dflt_rings() before bnxt_init_int_mode(). Therefore,
bp->total_irqs is always 0 and for !BNXT_NEW_RM bnxt_get_avail_msix()
always returns a negative number.
Historically, MSIX vectors were requested by the RoCE driver during
run-time and bnxt_get_avail_msix() was used for this purpose. Today,
RoCE MSIX vectors are statically allocated. bnxt_get_avail_msix() should
only be called for the BNXT_NEW_RM() case to reserve the MSIX ahead of
time for RoCE use.
bnxt_get_avail_msix() is also be simplified to handle the BNXT_NEW_RM()
case only.
Fixes: d630624ebd ("bnxt_en: Utilize ulp client resources if RoCE is not registered")
Signed-off-by: David Wei <dw@davidwei.uk>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20240502203757.3761827-1-dw@davidwei.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Cross-merge networking fixes after downstream PR.
Conflicts:
include/linux/filter.h
kernel/bpf/core.c
66e13b615a ("bpf: verifier: prevent userspace memory access")
d503a04f8b ("bpf: Add support for certain atomics in bpf_arena to x86 JIT")
https://lore.kernel.org/all/20240429114939.210328b0@canb.auug.org.au/
No adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
No driver logic changes are required to support the VFs, so just add
the VF PCI ID.
Reviewed-by: Selvin Thyparampil Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240501003056.100607-7-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In the error recovery path (AER, firmware recovery, etc), the
driver notifies the RoCE driver via ULP_STOP before the reset
and via ULP_START after the reset, all under RTNL_LOCK. The
RoCE driver can take a long time if there are a lot of QPs to
destroy, so it is not ideal to hold the global RTNL lock.
Rely on the new en_dev_lock mutex instead for ULP_STOP and
ULP_START. For the most part, we move the ULP_STOP call before
we take the RTNL lock and move the ULP_START after RTNL unlock.
Note that SRIOV re-enablement must be done after ULP_START
or RoCE on the VFs will not resume properly after reset.
The one scenario in bnxt_hwrm_if_change() where the RTNL lock
is already taken in the .ndo_open() context requires the ULP
restart to be deferred to the bnxt_sp_task() workqueue.
Reviewed-by: Selvin Thyparampil Xavier <selvin.xavier@broadcom.com>
Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240501003056.100607-6-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The current scheme relies heavily on the RTNL lock for all ULP
operations between the L2 and the RoCE driver. Add a new en_dev_lock
mutex so that the asynchronous ULP_STOP and ULP_START operations
can be serialized with bnxt_register_dev() and bnxt_unregister_dev()
calls without relying on the RTNL lock. The next patch will remove
the RTNL lock from the ULP_STOP and ULP_START calls.
Reviewed-by: Selvin Thyparampil Xavier <selvin.xavier@broadcom.com>
Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240501003056.100607-5-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There is no need to call ULP_STOP and ULP_START before and after the
L2 reset in bnxt_reset_task(). This L2 reset is done after detecting
TX timeout, RX ring errors, or VF config changes. The L2 reset does
not affect RoCE since the firmware is not reset and the backing store
is left alone.
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240501003056.100607-4-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Offline self test is a very disruptive operation for RoCE and requires
all active QPs to be destroyed. With a large number of QPs, it can
take a long time to destroy all the QPs and can timeout. Do not allow
ethtool offline self test if the RoCE driver is registered on the
device.
Reviewed-by: Selvin Thyparampil Xavier <selvin.xavier@broadcom.com>
Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240501003056.100607-3-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
On P5_PLUS chips and later, the NQ rings have subrings for RX and TX
completions respectively. These subrings are passed to the poll
function instead of the base NQ, but each ring carries its own
copy of the software ring statistics.
For stats to be conveniently accessible in __bnxt_poll_work(), the
statistics memory should either be shared between the NQ and its
subrings or the subrings need to be included in the ethtool stats
aggregation logic. This patch opts for the former, because it's more
efficient and less confusing having the software statistics for a
ring exist in a single place.
Before this patch, the counter will not be displayed if the "wrong"
cpr->sw_stats was used to increment a counter.
Link: https://lore.kernel.org/netdev/CACKFLikEhVAJA+osD7UjQNotdGte+fth7zOy7yDdLkTyFk9Pyw@mail.gmail.com/
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240501003056.100607-2-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The UMAC_CMD register is written from different execution
contexts and has insufficient synchronization protections to
prevent possible corruption. Of particular concern are the
acceses from the phy_device delayed work context used by the
adjust_link call and the BH context that may be used by the
ndo_set_rx_mode call.
A spinlock is added to the driver to protect contended register
accesses (i.e. reg_lock) and it is used to synchronize accesses
to UMAC_CMD.
Fixes: 1c1008c793 ("net: bcmgenet: add main driver file")
Cc: stable@vger.kernel.org
Signed-off-by: Doug Berger <opendmb@gmail.com>
Acked-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ndo_set_rx_mode function is synchronized with the
netif_addr_lock spinlock and BHs disabled. Since this
function is also invoked directly from the driver the
same synchronization should be applied.
Fixes: 72f9634762 ("net: bcmgenet: set Rx mode before starting netif")
Cc: stable@vger.kernel.org
Signed-off-by: Doug Berger <opendmb@gmail.com>
Acked-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The EXT_RGMII_OOB_CTRL register can be written from different
contexts. It is predominantly written from the adjust_link
handler which is synchronized by the phydev->lock, but can
also be written from a different context when configuring the
mii in bcmgenet_mii_config().
The chances of contention are quite low, but it is conceivable
that adjust_link could occur during resume when WoL is enabled
so use the phydev->lock synchronizer in bcmgenet_mii_config()
to be sure.
Fixes: afe3f907d2 ("net: bcmgenet: power on MII block for all MII modes")
Cc: stable@vger.kernel.org
Signed-off-by: Doug Berger <opendmb@gmail.com>
Acked-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
b44_free_rings() accesses b44::rx_buffers (and ::tx_buffers)
unconditionally, but b44::rx_buffers is only valid when the
device is up (they get allocated in b44_open(), and deallocated
again in b44_close()), any other time these are just a NULL pointers.
So if you try to change the pause params while the network interface
is disabled/administratively down, everything explodes (which likely
netifd tries to do).
Link: https://github.com/openwrt/openwrt/issues/13789
Fixes: 1da177e4c3 (Linux-2.6.12-rc2)
Cc: stable@vger.kernel.org
Reported-by: Peter Münster <pm@a16n.net>
Suggested-by: Jonas Gorski <jonas.gorski@gmail.com>
Signed-off-by: Vaclav Svoboda <svoboda@neng.cz>
Tested-by: Peter Münster <pm@a16n.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Peter Münster <pm@a16n.net>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/87y192oolj.fsf@a16n.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
I added OOM and netpoll discard counters, naively assuming that
the cpr pointer is pointing to a common completion ring.
Turns out that is usually *a* completion ring but not *the*
completion ring which bnapi->cp_ring points to. bnapi->cp_ring
is where the stats are read from, so we end up reporting 0
thru ethtool -S and qstat even though the drop events have happened.
Make 100% sure we're recording statistics in the correct structure.
Fixes: 907fd4a294 ("bnxt: count discards due to memory allocation errors")
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20240424002148.3937059-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tony Nguyen says:
====================
ice: Support 5 layer Tx scheduler topology
Mateusz Polchlopek says:
For performance reasons there is a need to have support for selectable
Tx scheduler topology. Currently firmware supports only the default
9-layer and 5-layer topology. This patch series enables switch from
default to 5-layer topology, if user decides to opt-in.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
ice: Document tx_scheduling_layers parameter
ice: Add tx_scheduling_layers devlink param
ice: Enable switching default Tx scheduler topology
ice: Adjust the VSI/Aggregator layers
ice: Support 5 layer topology
devlink: extend devlink_param *set pointer
====================
Link: https://lore.kernel.org/r/20240422203913.225151-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This driver currently doesn't support any control flags.
Use flow_rule_match_has_control_flags() to check for control flags,
such as can be set through `tc flower ... ip_flags frag`.
In case any control flags are masked, flow_rule_match_has_control_flags()
sets a NL extended error message, and we return -EOPNOTSUPP.
Only compile-tested.
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Tested-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Link: https://lore.kernel.org/r/20240422152626.175569-1-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Extend devlink_param *set function pointer to take extack as a param.
Sometimes it is needed to pass information to the end user from set
function. It is more proper to use for that netlink instead of passing
message to dmesg.
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
During error recovery, such as AER fatal error slot reset, we call
bnxt_try_map_fw_health_reg() to try to get access to the health
register to determine the firmware state. Fix
bnxt_try_map_fw_health_reg() to recognize the P7 chip correctly
and set up the health register.
This fixes this type of AER slot reset failure:
bnxt_en 0000:04:00.0: AER: PCIe Bus Error: severity=Uncorrectable (Fatal), type=Inaccessible, (Unregistered Agent ID)
bnxt_en 0000:04:00.0 enp4s0f0np0: PCI I/O error detected
bnxt_en 0000:04:00.0 bnxt_re0: Handle device suspend call
bnxt_en 0000:04:00.1 enp4s0f1np1: PCI I/O error detected
bnxt_en 0000:04:00.1 bnxt_re1: Handle device suspend call
pcieport 0000:00:02.0: AER: Root Port link has been reset (0)
bnxt_en 0000:04:00.0 enp4s0f0np0: PCI Slot Reset
bnxt_en 0000:04:00.0: enabling device (0000 -> 0002)
bnxt_en 0000:04:00.0: Firmware not ready
bnxt_en 0000:04:00.1 enp4s0f1np1: PCI Slot Reset
bnxt_en 0000:04:00.1: enabling device (0000 -> 0002)
bnxt_en 0000:04:00.1: Firmware not ready
pcieport 0000:00:02.0: AER: device recovery failed
Fixes: a432a45bdb ("bnxt_en: Define basic P7 macros")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We do not support two simultaneous recoveries so check for reset
flag, BNXT_STATE_IN_FW_RESET, and do not proceed with AER further.
When the pci channel state is pci_channel_io_frozen, the PCIe link
can not be trusted so we disable the traffic immediately and stop
BAR access by calling bnxt_fw_fatal_close(). BAR access after
AER fatal error can cause an NMI.
Fixes: f75d9a0aa9 ("bnxt_en: Re-write PCI BARs after PCI fatal error.")
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce bnxt_fw_fatal_close() API which can be used
to stop data path and disable device when firmware
is in fatal state.
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When bringing down the TX rings we flush the rings but forget to
reclaimed the flushed packets. This leads to a memory leak since we
do not free the dma mapped buffers. This also leads to tx control
block corruption when bringing down the interface for power
management.
Fixes: 490cb41200 ("net: bcmasp: Add support for ASP2.0 Ethernet controller")
Signed-off-by: Justin Chen <justin.chen@broadcom.com>
Acked-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240418180541.2271719-1-justin.chen@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Update MODULE_DESCRIPTION to the more generic adapter family name.
The old name only includes the first generation of supported
adapters.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240409215431.41424-8-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If the RoCE driver is not registered for a RoCE capable device, add
flexibility to use the RoCE resources (MSIX/NQs) for L2 purposes,
such as additional rings configured by the user or for XDP.
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240409215431.41424-7-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The existing scheme sets aside a number of MSIX/NQs for the RoCE
driver whether the RoCE driver is registered or not. This scheme
is not flexible and limits the resources available for the L2 rings
if RoCE is never used.
Modify the scheme so that the RoCE MSIX/NQs can be used by the L2
driver if they are not used for RoCE. The MSIX/NQs are now
represented by 3 fields. bp->ulp_num_msix_want contains the
desired default value, edev->ulp_num_msix_vec contains the
available value (but not necessarily in use), and
ulp_tbl->msix_requested contains the actual value in use by RoCE.
The L2 driver can dip into edev->ulp_num_msix_vec if necessary.
We need to add rtnl_lock() back in bnxt_register_dev() and
bnxt_unregister_dev() to synchronize the MSIX usage between L2 and
RoCE.
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240409215431.41424-6-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In its current form, bnxt_rdma_aux_device_init() not only initializes
the necessary data structures of the newly created aux device but also
adds the aux device into the aux bus subsytem. Refactor the logic into
separate functions, first function to initialize the aux device along
with the required resources and second, to actually add the device to
the aux bus subsytem.
This separation helps to create bnxt_en_dev much earlier and save its
resources separately.
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240409215431.41424-5-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ever since commit:
3034322113 ("bnxt_en: Remove runtime interrupt vector allocation")
The MSIX base vector is effectively always 0. Remove all unneeded
structure fields and code referencing the MSIX base.
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240409215431.41424-4-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The memory for "edev->ulp_tbl" is allocated inside the
bnxt_rdma_aux_device_init() function. If it fails, the driver
will not create the auxiliary device for RoCE. Hence the NULL
check inside bnxt_register_dev() is unnecessary.
Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240409215431.41424-3-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The current implementation requires the ifstate to be up when
configuring the RSS contexts. It will try to fetch the RX ring
IDs and will crash if it is in ifdown state. Return error if
!netif_running() to prevent the crash.
An improved implementation is in the works to allow RSS contexts
to be changed while in ifdown state.
Fixes: b3d0083caf ("bnxt_en: Support RSS contexts in ethtool .{get|set}_rxfh()")
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240409215431.41424-2-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
It is possible that during error recovery and firmware reset,
there is a pending TX PTP packet waiting for the timestamp.
We need to reset this condition so that after recovery, the
tx_avail count for PTP is reset back to the initial value.
Otherwise, we may not accept any PTP TX timestamps after
recovery.
Fixes: 118612d519 ("bnxt_en: Add PTP clock APIs, ioctls, and ethtool methods")
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>