Wrap copying of drop stats on TX path from fbd->hw_stats by the
hw_stats_lock. Currently, it is being performed outside the lock and
another thread accessing fbd->hw_stats can lead to inconsistencies.
Fixes: 5f8bd2ce82 ("eth: fbnic: add support for TMI stats")
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250802024636.679317-3-mohsin.bashr@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Correctly copy the tx_dropped stats from the fbd->hw_stats to the
rtnl_link_stats64 struct.
Fixes: 5f8bd2ce82 ("eth: fbnic: add support for TMI stats")
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250802024636.679317-2-mohsin.bashr@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
CI hit a UaF in fbnic in the AF_XDP portion of the queues.py test.
The UaF is in the __sk_mark_napi_id_once() call in xsk_bind(),
NAPI has been freed. Looks like the device failed to open earlier,
and we lack clearing the NAPI pointer from the queue.
Fixes: 557d02238e ("eth: fbnic: centralize the queue count and NAPI<>queue setting")
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250728163129.117360-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There were several issues in the way we were handling the link info coming
from firmware.
First is the fact that we were carrying around "AUTO" flags to indicate
that we needed to populate the values. We can just drop this and assume
that we will always be populating the settings from firmware. With this we
can also clean up the masking as the "AUTO" flags were just there to be
stripped anyway.
Second since we are getting rid of the "AUTO" setting we still need a way
to report that the link is not configured. We convert the link_mode "AUTO"
to "UNKNOWN" to do this. With this we can avoid reporting link up in the
phylink_pcs_get_state call as we will just set link to 0 and return without
updating the link speed. This is preferred versus the driver just forcing
50G which makes it harder to recover when the FW does start providing valid
settings.
With this the plan is to eventually replace the link_mode we use with the
interface_mode from phylink for all intents and purposes and have the two
be interchangeable. At that point we can convert the FW provided settings
over to something closer to link partner settings and give phylink greater
control of the interface allowing for user override of the settings and an
asynchronous setup of the link versus having to pull early settings from
firmware.
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://patch.msgid.link/175028445548.625704.1367708155813490215.stgit@ahduyck-xeon-server.home.arpa
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
We have two issues that need to be addressed in our IRQ handling.
One is the fact that we can end up double-freeing IRQs in the event of an
exception handling error such as a PCIe reset/recovery that fails. To
prevent that from becoming an issue we can use the msix_vector values to
indicate that we have successfully requested/freed the IRQ by only setting
or clearing them when we have completed the given action.
The other issue is that we have several potential races in our IRQ path due
to us manipulating the mask before the vector has been truly disabled. In
order to handle that in the case of the FW mailbox we need to not
auto-enable the IRQ and instead will be enabling/disabling it separately.
In the case of the PCS vector we can mitigate this by unmapping it and
synchronizing the IRQ before we clear the mask.
The general order of operations after this change is now to request the
interrupt, poll the FW mailbox to ready, and then enable the interrupt. For
the shutdown we do the reverse where we disable the interrupt, flush any
pending Tx, and then free the IRQ. I am renaming the enable/disable to
request/free to be equivilent with the IRQ calls being used. We may see
additions in the future to enable/disable the IRQs versus request/free them
for certain use cases.
Fixes: da3cde0820 ("eth: fbnic: Add FW communication mechanism")
Fixes: 69684376ee ("eth: fbnic: Add link detection")
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/174654719271.499179.3634535105127848325.stgit@ahduyck-xeon-server.home.arpa
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Fix the tracking of rtnl_link_stats.tx_dropped. The counter
`tmi.drop.frames` is being double counted whereas, the counter
`tti.cm_drop.frames` is being skipped.
Fixes: f2957147ae ("eth: fbnic: add support for TTI HW stats")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250503020145.1868252-1-mohsin.bashr@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add coverage for the TX Extension (TEI) Interface (TTI) stats. We are
tracking packets and control message drops because of credit exhaustion
on the TX interface.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250410070859.4160768-6-mohsin.bashr@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
This patch add coverage for TMI stats including PTP stats and drop
stats.
PTP stats include illegal requests, bad timestamp and good timestamps.
The bad timestamp and illegal request counters are reported under as
`error` via `ethtool -T` Both these counters are individually being
reported via `ethtool -S`
The good timestamp stats are being reported as `pkts` via `ethtool -T`
ethtool -S eth0 | grep "ptp"
ptp_illegal_req: 0
ptp_good_ts: 0
ptp_bad_ts: 0
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250410070859.4160768-5-mohsin.bashr@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
This patch provides coverage to the RXB (RX Buffer) stats. RXB stats
are divided into 3 sections: RXB enqueue, RXB FIFO, and RXB dequeue
stats.
The RXB enqueue/dequeue stats are indexed from 0-3 and cater for the
input/output counters whereas, the RXB fifo stats are indexed from 0-7.
The RXB also supports pause frame stats counters which we are leaving
for a later patch.
ethtool -S eth0 | grep rxb
rxb_integrity_err0: 0
rxb_mac_err0: 0
rxb_parser_err0: 0
rxb_frm_err0: 0
rxb_drbo0_frames: 1433543
rxb_drbo0_bytes: 775949081
---
---
rxb_intf3_frames: 1195711
rxb_intf3_bytes: 739650210
rxb_pbuf3_frames: 1195711
rxb_pbuf3_bytes: 765948092
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250410070859.4160768-4-mohsin.bashr@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
This patch provides support for hardware queue stats and covers
packet errors for RX-DMA engine, RCQ drops and BDQ drops.
The packet errors are also aggregated with the `rx_errors` stats in the
`rtnl_link_stats` as well as with the `hw_drops` in the queue API.
The RCQ and BDQ drops are aggregated with `rx_over_errors` in the
`rtnl_link_stats` as well as with the `hw_drop_overruns` in the queue API.
ethtool -S eth0 | grep -E 'rde'
rde_0_pkt_err: 0
rde_0_pkt_cq_drop: 0
rde_0_pkt_bdq_drop: 0
---
---
rde_127_pkt_err: 0
rde_127_pkt_cq_drop: 0
rde_127_pkt_bdq_drop: 0
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250410070859.4160768-3-mohsin.bashr@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add ethtool support to configure the IRQ coalescing behavior. Support
separate timers for Rx and Tx for time based coalescing. For frame based
configuration, currently we only support the Rx side.
The hardware allows configuration of descriptor count instead of frame
count requiring conversion between the two. We assume 2 descriptors
per frame, one for the metadata and one for the data segment.
When rx-frames are not configured, we set the RX descriptor count to
half the ring size as a fail safe.
Default configuration:
ethtool -c eth0 | grep -E "rx-usecs:|tx-usecs:|rx-frames:"
rx-usecs: 30
rx-frames: 0
tx-usecs: 35
IRQ rate test:
With single iperf flow we monitor IRQ rate while changing the tx-usesc and
rx-usecs to high and low values.
ethtool -C eth0 rx-frames 8192 rx-usecs 150 tx-usecs 150
irq/sec 13k
irq/sec 14k
irq/sec 14k
ethtool -C eth0 rx-frames 8192 rx-usecs 10 tx-usecs 10
irq/sec 27k
irq/sec 28k
irq/sec 28k
Validating the use of extack:
ethtool -C eth0 rx-frames 16384
netlink error: fbnic: rx_frames is above device max
netlink error: Invalid argument
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/20250218023520.2038010-1-mohsin.bashr@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add TSO support to the driver. Device can handle unencapsulated or
IPv6-in-IPv6 packets. Any other tunnel stacks are handled with
GSO partial.
Validate that the packet can be offloaded in ndo_features_check.
Main thing we need to check for is that the header geometry can
be expressed in the decriptor fields (offsets aren't too large).
Report number of TSO super-packets via the qstat API.
Link: https://patch.msgid.link/20250216174109.2808351-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add ethtool -n / -N support. Support only "un-ordered" rule sets
(RX_CLS_LOC_ANY), just for simplicity of the code. It's unclear
anyone actually cares about the rule ordering.
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://patch.msgid.link/20250206235334.1425329-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
I realized when we were adding unicast addresses we were enabling
promiscuous mode. I did a bit of digging and realized we had overlooked
setting the driver private flag to indicate we supported unicast filtering.
Example below shows the table with 00deadbeef01 as the main NIC address,
and 5 additional addresses in the 00deadbeefX0 format.
# cat $dbgfs/mac_addr
Idx S TCAM Bitmap Addr/Mask
----------------------------------
00 0 00000000,00000000 000000000000
000000000000
01 0 00000000,00000000 000000000000
000000000000
02 0 00000000,00000000 000000000000
000000000000
...
24 0 00000000,00000000 000000000000
000000000000
25 1 00100000,00000000 00deadbeef50
000000000000
26 1 00100000,00000000 00deadbeef40
000000000000
27 1 00100000,00000000 00deadbeef30
000000000000
28 1 00100000,00000000 00deadbeef20
000000000000
29 1 00100000,00000000 00deadbeef10
000000000000
30 1 00100000,00000000 00deadbeef01
000000000000
31 0 00000000,00000000 000000000000
000000000000
Before rule 31 would be active. With this change it correctly sticks
to just the unicast filters.
Signed-off-by: Alexander Duyck <alexanderduyck@meta.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250204010038.1404268-2-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
To simplify dealing with RTNL_ASSERT() requirements further
down the line, move setting queue count and NAPI<>queue
association to their own helpers.
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://patch.msgid.link/20241220025241.1522781-9-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Change our method of swapping NAPIs without disturbing existing config.
This is primarily needed for "live reconfiguration" such as changing
the channel count when interface is already up.
Previously we were planning to use a trick of using shared interrupts.
We would install a second IRQ handler for the new NAPI, and make it
return IRQ_NONE until we were ready for it to take over. This works fine
functionally but breaks IRQ naming. The IRQ subsystem uses the IRQ name
to create the procfs entry, since both handlers used the same name
the second handler wouldn't get a proc directory registered.
When first one gets removed on success full ring count change
it would remove its directory and we would be left with none.
New approach uses a double pointer to the NAPI. The IRQ handler needs
to know how to locate the NAPI to schedule. We register a single IRQ handler
and give it a pointer to a pointer. We can then change what it points to
without re-registering. This may have a tiny perf impact, but really
really negligible.
Link: https://patch.msgid.link/20241220025241.1522781-8-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We will need an array for storing NAPIs in the upcoming IRQ handler
reuse rework. Replace the current list we have, so that we are able
to reuse it later.
In a few places replace i as the iterator with t when we iterate
over triads, this seems slightly less confusing than having
i, j, k variables.
Link: https://patch.msgid.link/20241220025241.1522781-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add support to redirect host-to-BMC traffic by writing MACDA entries
from the RPC (RX Parser and Classifier) to TCE-TCAM. The TCE TCAM is a
small L2 destination TCAM which is placed at the end of the TX path (TCE).
Unlike other NICs, where BMC diversion is typically handled by firmware,
for fbnic, firmware does not touch anything related to the host; hence,
the host uses TCE TCAM to divert BMC traffic.
Currently, we lack metadata to track where addresses have been written
in the TCAM, except for the last entry written. To address this issue,
we start at the opposite end of the table in each pass, so that adding
or deleting entries does not affect the availability of all entries,
assuming there is no significant reordering of entries.
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Link: https://patch.msgid.link/20241104031300.1330657-1-mohsin.bashr@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add callbacks to support timestamping configuration via ethtool.
Add processing of RX timestamps.
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Create PHC device and provide callbacks needed for ptp_clock device.
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add ethtool ops support and enable 'get_drvinfo' for fbnic. The driver
provides firmware version information while the driver name and bus
information is provided by ethtool_get_drvinfo().
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement netdev_stat_ops and export the basic per-queue stats.
This interface expect users to set the values that are used
either to zero or to some other preserved value (they are 0xff by
default). So here we export bytes/packets/drops from tx and rx_stats
plus set some of the values that are exposed by queue stats
to zero.
$ cd tools/testing/selftests/drivers/net && ./stats.py
[...]
Totals: pass:4 fail:0 xfail:0 xpass:0 skip:0 error:0
Reviewed-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20240810054322.2766421-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Count packets, bytes and drop on the datapath, and report
to the user. Since queues are completely freed when the
device is down - accumulate the stats in the main netdev struct.
This means that per-queue stats will only report values since
last reset (per qstat recommendation).
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20240810054322.2766421-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
RSS is controlled by the Rx filter tables. Program rules matching
on appropriate traffic types and set hashing fields using actions.
We need a separate set of rules for broadcast and multicast
because the action there needs to include forwarding to BMC.
This patch only initializes the default settings, the control
of the configuration using ethtool will come soon.
With this the necessary rules are put in place to enable Rx of packets by
the host.
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://patch.msgid.link/172079943591.1778861.17778587068185893750.stgit@ahduyck-xeon-server.home.arpa
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Program the Rx TCAM to control L2 forwarding. Since we are in full
control of the NIC we need to make sure we include BMC forwarding
in the rules. When host is not present BMC will program the TCAM
to get onto the network but once we take ownership it's up to
Linux driver to make sure BMC L2 addresses are handled correctly.
Co-developed-by: Sanman Pradhan <sanmanpradhan@meta.com>
Signed-off-by: Sanman Pradhan <sanmanpradhan@meta.com>
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://patch.msgid.link/172079943202.1778861.4410412697614789017.stgit@ahduyck-xeon-server.home.arpa
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Handle Rx packets with basic csum and Rx hash offloads.
NIC writes back to the completion ring a head buffer descriptor
(data buffer allocated from header pages), variable number of payload
descriptors (data buffers in payload pages), an optional metadata
descriptor (type 2) and finally the primary metadata descriptor
(type 3).
This format makes scatter support fairly easy - start gathering
the pages when we see head page, gather until we see the primary
metadata descriptor, do the processing. Use XDP infra to collect
the packet fragments as we traverse the descriptors. XDP itself
is not supported yet, but it will be soon.
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://patch.msgid.link/172079942839.1778861.10509071985738726125.stgit@ahduyck-xeon-server.home.arpa
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Handle Tx of simple packets. Support checksum offload and gather.
Use .ndo_features_check to make sure packet geometry will be
supported by the HW, i.e. we can fit the header lengths into
the descriptor fields.
The device writes to the completion rings the position of the tail
(consumer) pointer. Read all those writebacks, obviously the last
one will be the most recent, complete skbs up to that point.
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://patch.msgid.link/172079942464.1778861.17919428039428796180.stgit@ahduyck-xeon-server.home.arpa
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add basic support for detecting the link and reporting it at the netdev
layer. For now we will just use the values reporeted by the firmware as the
link configuration and assume that is the current configuration of the MAC
and PCS.
With this we start the stubbing out of the phylink interface that will be
used to provide the configuration interface for ethtool in a future patch
set.
The phylink interface isn't an exact fit. As such we are currently working
around several issues in this patch set that we plan to address in the
future such as:
1. Support for FEC
2. Support for multiple lanes to handle 50GbaseR2 vs 50GbaseR1
3. Support for BMC
CC: Russell King <linux@armlinux.org.uk>
CC: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://patch.msgid.link/172079939835.1778861.5964790909718481811.stgit@ahduyck-xeon-server.home.arpa
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After the driver loads we need to get some initial capabilities from the
firmware to determine what the device is capable of and what functionality
needs to be enabled. Specifically we receive information about the current
state of the link and if a BMC is present.
After that when we bring the interface up we will need the ability to take
ownership from the FW. To do that we will need to notify it that we are
taking control before we start configuring the traffic classifier and MAC.
Once we have ownership we need to notify the firmware that we are still
present and active. To do that we will send a regular heartbeat to the FW.
If the FW doesn't receive the heartbeat in a timely fashion it will retake
control of the RPC and MAC and assume that the host has gone offline.
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://patch.msgid.link/172079939458.1778861.8966209942099133957.stgit@ahduyck-xeon-server.home.arpa
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Implement control path parts of Rx queue handling.
The NIC consumes memory in pages. It takes a full page and places
packets into it in a configurable manner (with the ability to define
headroom / tailroom as well as head alignment requirements).
As mentioned in prior patches there are two page submissions queues
one for packet headers and second (optional) for packet payloads.
For now feed both queues from a single page pool.
Use the page pool "fragment" API, as we can't predict upfront
how the page will be sliced.
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://patch.msgid.link/172079939092.1778861.3780136633831329550.stgit@ahduyck-xeon-server.home.arpa
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Implement basic management operations for Tx queues.
Allocate memory for submission and completion rings.
Learn how to start the queues, stop them, and wait for HW
to be idle.
We call HW rings "descriptor rings" (stored in ring->desc),
and SW context rings "buffer rings" (stored in ring->*_buf union).
This is the first patch which actually touches CSRs so add CSR
helpers.
No actual datapath / packet handling here, yet.
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://patch.msgid.link/172079938724.1778861.8329677776612865169.stgit@ahduyck-xeon-server.home.arpa
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Allocate a netdev and figure out basics like how many queues
we need, MAC address, MTU bounds. Kick off a service task
to do various periodic things like health checking.
The service task only runs when device is open.
We have four levels of objects here:
- ring - A HW ring with head / tail pointers,
- triad - Two submission and one completion ring,
- NAPI - NAPI, with one IRQ and any number of Rx and Tx triads,
- Netdev - The ultimate container of the rings and napi vectors.
The "triad" is the only less-than-usual construct. On Rx we have
two "free buffer" submission rings, one for packet headers and
one for packet data. On Tx we have separate rings for XDP Tx
and normal Tx. So we ended up with ring triplets in both
directions.
We keep NAPIs on a local list, even though core already maintains a list.
Later on having a separate list will matter for live reconfig.
We introduce the list already, the churn would not be worth it.
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://patch.msgid.link/172079938358.1778861.11681469974633489463.stgit@ahduyck-xeon-server.home.arpa
Signed-off-by: Jakub Kicinski <kuba@kernel.org>