
The default RX IRQ coalescing settings of one IRQ per packet can represent
a significant CPU load. However, increasing the coalescing unilaterally can
result in undesirable latency under low load. Adaptive IRQ coalescing with
DIM offers a way to adjust the coalescing settings based on load.

This device only supports "CQE" mode [1], where each packet resets the
timer. Therefore, an interrupt is fired either when we receive
coalesce_count_rx packets or when the interface is idle for
coalesce_usec_rx. With this in mind, consider the following scenarios:

Link saturated
    Here we want to set coalesce_count_rx to a large value, in order to
    coalesce more packets and reduce CPU load. coalesce_usec_rx should be
    set to at least the time for one packet. Otherwise the link will be
    "idle" and we will get an interrupt for each packet anyway.

Bursts of packets
    Each burst should be coalesced into a single interrupt, although it may
    be prudent to reduce coalesce_count_rx for better latency.
    coalesce_usec_rx should be set to at least the time for one packet so
    bursts are coalesced. However, additional time beyond the packet time
    will just increase latency at the end of a burst.

Sporadic packets
    Due to low load, we can set coalesce_count_rx to 1 in order to reduce
    latency to the minimum. coalesce_usec_rx does not matter in this case.

Based on this analysis, I expected the CQE profiles to look something like

    usec =  0, pkts =   1 // Low load
    usec = 16, pkts =   4
    usec = 16, pkts =  16
    usec = 16, pkts =  64
    usec = 16, pkts = 256 // High load

where usec is set to 16 to be a few us greater than the 12.3 us packet time
of a 1500 MTU packet at 1 GBit/s. However, the CQE profile is instead

    usec =  2, pkts = 256 // Low load
    usec =  8, pkts = 128
    usec = 16, pkts =  64
    usec = 32, pkts =  64
    usec = 64, pkts =  64 // High load

I found this very surprising. The number of coalesced packets *decreases*
as load increases. But as load increases we have more opportunities to
coalesce packets without affecting latency as much. Additionally, the
profile *increases* the usec as the load increases. But as load increases,
the gaps between packets will tend to become smaller, making it possible to
*decrease* usec for better latency at the end of a "burst".

I consider the default CQE profile unsuitable for this NIC. Therefore, we
use the first profile outlined in this commit instead. coalesce_usec_rx is
set to 16 by default, but the user can customize it. This may be necessary
if they are using jumbo frames. I think adjusting the profile times based
on the link speed/MTU would be a good improvement for generic DIM.

In addition to the above profile problems, I noticed the following
additional issues with DIM while testing:

- DIM tends to "wander" when at low load, since the performance gradient is
  pretty flat. If you only have 10p/ms anyway, then adjusting the
  coalescing settings will not affect throughput very much.

- DIM takes a long time to adjust back to low indices when load is
  decreased following a period of high load. This is because it only
  re-evaluates its settings once every 64 interrupts. However, at low load
  64 interrupts can be several seconds.

Finally: performance. This patch increases receive throughput with iperf3
from 840 Mbits/sec to 938 Mbits/sec, decreases interrupts from 69920/sec to
316/sec, and decreases CPU utilization (4x Cortex-A53) from 43% to 9%.

[1] Who names this stuff?
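For reference, the 12.3 us figure quoted above is just the on-wire time of a
full-size frame. A minimal sketch of the arithmetic (the 38 bytes of
preamble, SFD, header, FCS and inter-frame gap are standard Ethernet
overhead, not a value taken from the patch):

  #include <stdio.h>

  int main(void)
  {
          /* 1500-byte MTU payload plus standard Ethernet overhead:
           * 7 (preamble) + 1 (SFD) + 14 (header) + 4 (FCS) + 12 (IFG)
           * = 38 bytes on the wire per frame.
           */
          const double wire_bits = (1500 + 38) * 8;
          const double link_bps  = 1e9;           /* 1 Gbit/s */

          printf("packet time: %.1f us\n", wire_bits / link_bps * 1e6);
          return 0;                               /* prints ~12.3 us */
  }

At 10 Gbit/s or with jumbo frames the same arithmetic gives a very
different packet time, which is why a fixed usec value cannot suit every
link.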
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://patch.msgid.link/20250206201036.1516800-5-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
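For context (background only, not code from this patch): a NAPI driver
typically plugs into dimlib by feeding it a sample at the end of each poll
and applying the chosen profile index from a work handler that dimlib
schedules. The sketch below assumes a hypothetical struct my_priv and uses
the profile proposed above as a driver-private table; the register write is
a stub because it is device specific, and on older kernels net_dim() takes
the sample by value rather than by reference.

  #include <linux/dim.h>
  #include <linux/workqueue.h>

  /* The profile proposed above, expressed as a driver-private table. */
  static const struct dim_cq_moder my_rx_dim_profile[NET_DIM_PARAMS_NUM_PROFILES] = {
          { .usec = 0,  .pkts = 1   },    /* low load: one IRQ per packet */
          { .usec = 16, .pkts = 4   },
          { .usec = 16, .pkts = 16  },
          { .usec = 16, .pkts = 64  },
          { .usec = 16, .pkts = 256 },    /* high load: maximum coalescing */
  };

  struct my_priv {
          struct dim rx_dim;
          u16 rx_irqs;                    /* RX interrupts seen so far */
          u64 rx_packets, rx_bytes;       /* RX packet/byte counters */
  };

  /* Hypothetical device register write; stubbed out for this sketch. */
  static void my_hw_set_rx_coalesce(struct my_priv *priv, u16 usec, u16 pkts)
  {
  }

  /* Called at the end of the NAPI poll routine. */
  static void my_rx_dim_update(struct my_priv *priv)
  {
          struct dim_sample sample = {};

          dim_update_sample(priv->rx_irqs, priv->rx_packets, priv->rx_bytes,
                            &sample);
          net_dim(&priv->rx_dim, &sample);        /* may schedule rx_dim.work */
  }

  /* dimlib schedules this work when it decides to switch profiles. */
  static void my_rx_dim_work(struct work_struct *work)
  {
          struct dim *dim = container_of(work, struct dim, work);
          struct my_priv *priv = container_of(dim, struct my_priv, rx_dim);
          const struct dim_cq_moder *mod = &my_rx_dim_profile[dim->profile_ix];

          /* Program coalesce_usec_rx / coalesce_count_rx into the NIC. */
          my_hw_set_rx_coalesce(priv, mod->usec, mod->pkts);

          dim->state = DIM_START_MEASURE;
  }

  /* Probe-time setup. */
  static void my_rx_dim_init(struct my_priv *priv)
  {
          INIT_WORK(&priv->rx_dim.work, my_rx_dim_work);
          priv->rx_dim.mode = DIM_CQ_PERIOD_MODE_START_FROM_CQE;
  }

A driver that is happy with dimlib's built-in tables would instead call
net_dim_get_rx_moderation(dim->mode, dim->profile_ix) in the work handler;
the point of this patch is precisely that those built-in CQE values do not
fit this NIC.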
# SPDX-License-Identifier: GPL-2.0-only
#
# Xilinx device configuration
#

config NET_VENDOR_XILINX
        bool "Xilinx devices"
        default y
        help
          If you have a network (Ethernet) card belonging to this class, say Y.

          Note that the answer to this question doesn't directly affect the
          kernel: saying N will just cause the configurator to skip all
          the questions about Xilinx devices. If you say Y, you will be asked
          for your specific card in the following questions.

if NET_VENDOR_XILINX

config XILINX_EMACLITE
        tristate "Xilinx 10/100 Ethernet Lite support"
        depends on HAS_IOMEM
        select PHYLIB
        help
          This driver supports the 10/100 Ethernet Lite from Xilinx.

config XILINX_AXI_EMAC
        tristate "Xilinx 10/100/1000 AXI Ethernet support"
        depends on HAS_IOMEM
        depends on XILINX_DMA
        select PHYLINK
        select DIMLIB
        help
          This driver supports the 10/100/1000 Ethernet from Xilinx for the
          AXI bus interface used in Xilinx Virtex FPGAs and Soc's.

config XILINX_LL_TEMAC
        tristate "Xilinx LL TEMAC (LocalLink Tri-mode Ethernet MAC) driver"
        depends on HAS_IOMEM
        select PHYLIB
        help
          This driver supports the Xilinx 10/100/1000 LocalLink TEMAC
          core used in Xilinx Spartan and Virtex FPGAs

endif # NET_VENDOR_XILINX