Commit Graph

580 Commits

Author SHA1 Message Date
ValdikSS
1468c1f97c igc: fix disabling L1.2 PCI-E link substate on I226 on init
Device ID comparison in igc_is_device_id_i226 is performed before
the ID is set, resulting in always failing check on init.

Before the patch:
* L1.2 is not disabled on init
* L1.2 is properly disabled after suspend-resume cycle

With the patch:
* L1.2 is properly disabled both on init and after suspend-resume

How to test:
Connect to the 1G link with 300+ mbit/s Internet speed, and run
the download speed test, such as:

    curl -o /dev/null http://speedtest.selectel.ru/1GB

Without L1.2 disabled, the speed would be no more than ~200 mbit/s.
With L1.2 disabled, the speed would reach 1 gbit/s.
Note: it's required that the latency between your host and the remote
be around 3-5 ms, the test inside LAN (<1 ms latency) won't trigger the
issue.

Link: https://lore.kernel.org/intel-wired-lan/15248b4f-3271-42dd-8e35-02bfc92b25e1@intel.com
Fixes: 0325143b59 ("igc: disable L1.2 PCI-E link substate to avoid performance issue")
Signed-off-by: ValdikSS <iam@valdikss.org.ru>
Reviewed-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20250819222000.3504873-6-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-20 18:46:30 -07:00
Bjorn Helgaas
fe09560f82 net: Fix typos
Fix typos in comments and error messages.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: David Arinzon <darinzon@amazon.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250723201528.2908218-1-helgaas@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-25 10:29:07 -07:00
Song Yoong Siang
d5b97c01ce igc: Add wildcard rule support to ethtool NFC using Default Queue
Introduce support for a lowest priority wildcard (catch-all) rule in
ethtool's Network Flow Classification (NFC) for the igc driver. The
wildcard rule directs all unmatched network traffic, including traffic not
captured by Receive Side Scaling (RSS), to a specified queue. This
functionality utilizes the Default Queue feature available in I225/I226
hardware.

The implementation has been validated on Intel ADL-S systems with two
back-to-back connected I226 network interfaces.

Testing Procedure:
1. On the Device Under Test (DUT), verify the initial statistic:
   $ ethtool -S enp1s0 | grep rx_q.*packets
        rx_queue_0_packets: 0
        rx_queue_1_packets: 0
        rx_queue_2_packets: 0
        rx_queue_3_packets: 0

2. From the Link Partner, send 10 ARP packets:
   $ arping -c 10 -I enp170s0 169.254.1.2

3. On the DUT, verify the packet reception on Queue 0:
   $ ethtool -S enp1s0 | grep rx_q.*packets
        rx_queue_0_packets: 10
        rx_queue_1_packets: 0
        rx_queue_2_packets: 0
        rx_queue_3_packets: 0

4. On the DUT, add a wildcard rule to route all packets to Queue 3:
   $ sudo ethtool -N enp1s0 flow-type ether queue 3

5. From the Link Partner, send another 10 ARP packets:
   $ arping -c 10 -I enp170s0 169.254.1.2

6. Now, packets are routed to Queue 3 by the wildcard (Default Queue) rule:
   $ ethtool -S enp1s0 | grep rx_q.*packets
        rx_queue_0_packets: 10
        rx_queue_1_packets: 0
        rx_queue_2_packets: 0
        rx_queue_3_packets: 10

7. On the DUT, add a EtherType rule to route ARP packet to Queue 1:
   $ sudo ethtool -N enp1s0 flow-type ether proto 0x0806 queue 1

8. From the Link Partner, send another 10 ARP packets:
   $ arping -c 10 -I enp170s0 169.254.1.2

9. Now, packets are routed to Queue 1 by the EtherType rule because it is
   higher priority than the wildcard (Default Queue) rule:
   $ ethtool -S enp1s0 | grep rx_q.*packets
        rx_queue_0_packets: 10
        rx_queue_1_packets: 10
        rx_queue_2_packets: 0
        rx_queue_3_packets: 10

10. On the DUT, delete all the NFC rules:
    $ sudo ethtool -N enp1s0 delete 63
    $ sudo ethtool -N enp1s0 delete 64

11. From the Link Partner, send another 10 ARP packets:
    $ arping -c 10 -I enp170s0 169.254.1.2

12. Now, packets are routed to Queue 0 because the value of Default Queue
    is reset back to 0:
    $ ethtool -S enp1s0 | grep rx_q.*packets
         rx_queue_0_packets: 20
         rx_queue_1_packets: 10
         rx_queue_2_packets: 0
         rx_queue_3_packets: 10

Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Co-developed-by: Blanco Alcaine Hector <hector.blanco.alcaine@intel.com>
Signed-off-by: Blanco Alcaine Hector <hector.blanco.alcaine@intel.com>
Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-18 09:02:28 -07:00
Song Yoong Siang
bdfaa8d70d igc: Relocate RSS field definitions to igc_defines.h
Move the RSS field definitions related to IPv4 and IPv6 UDP from igc.h to
igc_defines.h to consolidate the RSS field definitions in a single header
file, improving code organization and maintainability.

This refactoring does not alter the functionality of the driver but
enhances the logical grouping of related constants

Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-18 09:02:28 -07:00
Jakub Kicinski
4b62261def Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2025-07-03

Vladimir Oltean converts Intel drivers (ice, igc, igb, ixgbe, i40e) to
utilize new timestamping API (ndo_hwtstamp_get() and ndo_hwtstamp_set()).

For ixgbe:
Paul, Don, Slawomir, and Radoslaw add Malicious Driver Detection (MDD)
support for X550 and E610 devices to detect, report, and handle
potentially malicious VFs.

Simon Horman corrects spelling mistakes.

For igbvf:
Kohei Enju removes a couple of unreported counters and adds reporting
of Tx timeouts.

* '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  igbvf: add tx_timeout_count to ethtool statistics
  igbvf: remove unused interrupt counter fields from struct igbvf_adapter
  ixgbe: spelling corrections
  ixgbe: turn off MDD while modifying SRRCTL
  ixgbe: add Tx hang detection unhandled MDD
  ixgbe: check for MDD events
  ixgbe: add MDD support
  i40e: convert to ndo_hwtstamp_get() and ndo_hwtstamp_set()
  ixgbe: convert to ndo_hwtstamp_get() and ndo_hwtstamp_set()
  igb: convert to ndo_hwtstamp_get() and ndo_hwtstamp_set()
  igc: convert to ndo_hwtstamp_get() and ndo_hwtstamp_set()
  ice: convert to ndo_hwtstamp_get() and ndo_hwtstamp_set()
====================

Link: https://patch.msgid.link/20250703174242.3829277-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-07 19:06:13 -07:00
Paolo Abeni
6b9fd8857b Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.16-rc5).

No conflicts.

No adjacent changes.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-04 08:03:18 +02:00
Vladimir Oltean
033d0bcf4a igc: convert to ndo_hwtstamp_get() and ndo_hwtstamp_set()
New timestamping API was introduced in commit 66f7223039 ("net: add
NDOs for configuring hardware timestamping") from kernel v6.6.

It is time to convert the Intel igc driver to the new API, so that
timestamping configuration can be removed from the ndo_eth_ioctl() path
completely.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Milena Olech <milena.olech@intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-03 09:38:33 -07:00
Vitaly Lifshits
0325143b59 igc: disable L1.2 PCI-E link substate to avoid performance issue
I226 devices advertise support for the PCI-E link L1.2 substate. However,
due to a hardware limitation, the exit latency from this low-power state
is longer than the packet buffer can tolerate under high traffic
conditions. This can lead to packet loss and degraded performance.

To mitigate this, disable the L1.2 substate. The increased power draw
between L1.1 and L1.2 is insignificant.

Fixes: 4354621173 ("igc: Add new device ID's")
Link: https://lore.kernel.org/intel-wired-lan/15248b4f-3271-42dd-8e35-02bfc92b25e1@intel.com
Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-01 08:26:08 -07:00
Colin Ian King
deb21a6e5b igc: Make the const read-only array supported_sizes static
Don't populate the const read-only array supported_sizes on the
stack at run time, instead make it static.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Vitaly Lifshits <vitaly.lifshits@intel.com>>
Link: https://patch.msgid.link/20250618135408.1784120-1-colin.i.king@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-19 15:39:21 -07:00
Paolo Abeni
9149a63288 Merge branch 'intel-next-queue-1GbE'
Tony Nguyen says:

====================
Faizal Rahim says:

MAC Merge support for frame preemption was previously added for igc:
https://lore.kernel.org/netdev/20250418163822.3519810-1-anthony.l.nguyen@intel.com/

This series builds on that work and adds support for:
- Harmonizing taprio and mqprio queue priority behavior, based on past
  discussions and suggestions:
  https://lore.kernel.org/all/20250214102206.25dqgut5tbak2rkz@skbuf/
- Enabling preemptible queue support for both taprio and mqprio, with
  priority harmonization as a prerequisite.

Patch organization:
- Patches 1-3: Preparation work for patches 6 and 7
- Patches 4-5: Queue priority harmonization
- Patches 6-7: Add preemptible queue support
====================

Link: https://patch.msgid.link/20250611180314.2059166-1-anthony.l.nguyen@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-06-17 15:04:05 +02:00
Jakub Kicinski
575d1b28d2 eth: igc: migrate to new RXFH callbacks
Migrate to new callbacks added by commit 9bb00786fc ("net: ethtool:
add dedicated callbacks for getting and setting rxfh fields").

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Joe Damato <joe@dama.to>
Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250614180907.4167714-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-16 18:14:52 -07:00
Faizal Rahim
a7d45bcfde igc: add preemptible queue support in mqprio
igc already supports enabling MAC Merge for FPE. This patch adds
support for preemptible queues in mqprio.

Tested preemption with mqprio by:
1. Enable FPE:
   ethtool --set-mm enp1s0 pmac-enabled on tx-enabled on verify-enabled on
2. Enable preemptible queue in mqprio:
   mqprio num_tc 4 map 0 1 2 3 0 0 0 0 0 0 0 0 0 0 0 0 \
   queues 1@0 1@1 1@2 1@3 \
   fp P P P E

Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-11 10:06:24 -07:00
Faizal Rahim
17643482e9 igc: add preemptible queue support in taprio
Changes:
1. Introduce tx_enabled flag to control preemptible queue. tx_enabled
   is set via mmsv module based on multiple factors, including link
   up/down status, to determine if FPE is active or inactive.
2. Add priority field to TXDCTL for express queue to improve data
   fetch performance.
3. Block preemptible queue setup in taprio unless reverse-tsn-txq-prio
   private flag is set. Encourages adoption of standard queue priority
   scheme for new features.
4. Hardware-padded frames from preemptible queues result in incorrect
   mCRC values, as padding bytes are excluded from the computation. Pad
   frames to at least 60 bytes using skb_padto() before transmission to
   ensure the hardware includes padding in the mCRC calculation.

Tested preemption with taprio by:
1. Enable FPE:
   ethtool --set-mm enp1s0 pmac-enabled on tx-enabled on verify-enabled on
2. Enable private flag to reverse TX queue priority:
   ethtool --set-priv-flags enp1s0 reverse-txq-prio on
3. Enable preemptible queue in taprio:
   taprio num_tc 4 map 0 1 2 3 0 0 0 0 0 0 0 0 0 0 0 0 \
   queues 1@0 1@1 1@2 1@3 \
   fp P P P E

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Co-developed-by: Chwee-Lin Choong <chwee.lin.choong@intel.com>
Signed-off-by: Chwee-Lin Choong <chwee.lin.choong@intel.com>
Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-11 10:06:24 -07:00
Faizal Rahim
e395f6a690 igc: add private flag to reverse TX queue priority in TSN mode
By default, igc assigns TX hw queue 0 the highest priority and queue 3
the lowest. This is opposite of most NICs, where TX hw queue 3 has the
highest priority and queue 0 the lowest.

mqprio in igc already uses TX arbitration unconditionally to reverse TX
queue priority when mqprio is enabled. The TX arbitration logic does not
require a private flag, because mqprio was added recently and no known
users depend on the default queue ordering, which differs from the typical
convention.

taprio does not use TX arbitration, so it inherits the default igc TX
queue priority order. This causes tc command inconsistencies when
configuring frame preemption with taprio compared to mqprio in igc.
Other tc command inconsistencies and configuration issues already exist
when using taprio on igc compared to other network controllers. These
issues are described in a later section.

To harmonize TX queue priority behavior between taprio and mqprio, and
to fix these issues without breaking long-standing taprio use cases,
this patch adds a new private flag, called reverse-tsn-txq-prio, to
reverse the TX queue priority. It makes queue 3 the highest and queue 0
the lowest, reusing the TX arbitration logic already used by mqprio.

Users must set the private flag when enabling frame preemption with
taprio to follow the standard convention. Doing so promotes adoption of
the correct priority model for new features while preserving
compatibility with legacy configurations.

This new private flag addresses:

1.  Non-standard socket -> tc -> TX hw queue mapping for taprio in igc

Without the private flag:
- taprio maps (socket -> tc -> TX hardware queue) differently on igc
  compared to other network controllers
- On igc, mqprio maps tc differently from taprio, since mqprio already
  uses TX arbitration

The following examples compare taprio configuration on igc and other
network controllers:
a)  On other NICs (TX hw queue 3 is highest priority):
    taprio num_tc 4 map 0 1 2 3 .... \
    queues 1@0 1@1 1@2 1@3

    Mapping translates to:
    socket 0 -> tc 0 -> queue 0
    socket 3 -> tc 3 -> queue 3

    This is the normal mapping that respects the standard convention:
    higher socket number -> higher tc -> higher priority TX hw queue

b)  On igc (TX hw queue 0 is highest priority by default):
    taprio num_tc 4 map 3 2 1 0 .... \
    queues 1@0 1@1 1@2 1@3

    Mapping translates to:
    socket 0 -> tc 3 -> queue 3
    socket 3 -> tc 0 -> queue 0

    This igc tc mapping example is based on Intel's TSN validation test
    case, where a higher socket priority maps to a higher priority queue.
    It respects the mapping:
      higher socket number -> higher priority TX hw queue
    but breaks the expected ordering:
      higher tc -> higher priority TX hw queue
    as defined in [Ref1]. This custom mapping complicates common taprio
    setup across NICs.

2.  Non-standard frame preemption mapping for taprio in igc

Without the private flag:
- Compared to other network controllers, taprio on igc must flip the
  expected fp sequence, since express traffic is expected to map to the
  highest priority queue and preemptible traffic to lower ones
- On igc, frame preemption configuration for mqprio differs from taprio,
  since mqprio already uses TX arbitration

The following examples compare taprio frame preemption configuration on
igc and other network controllers:
a)  On other NICs (TX hw queue 3 is highest priority):
    taprio num_tc 4 map ..... \
    queues 1@0 1@1 1@2 1@3 \
    fp P P P E

    Mapping translates to:
    tc0, tc1, tc2 -> preemptible -> queue 0, 1, 2
    tc3           -> express     -> queue 3

    This is the normal mapping that respects the standard convention:
    higher tc -> express traffic -> higher priority TX hw queue
    lower tc  -> preemptible traffic -> lower priority TX hw queue

b)  On igc (TX hw queue 0 is highest priority by default):
    taprio num_tc 4 map ...... \
    queues 1@0 1@1 1@2 1@3 \
    fp E P P P

    Mapping translates to:
    tc0           -> express     -> queue 0
    tc1, tc2, tc3 -> preemptible -> queue 1, 2, 3

    This inversion respects the mapping of:
      express traffic -> higher priority TX hw queue
    but breaks the expected ordering:
      higher tc -> express traffic
    as defined in [Ref1] where higher tc indicates higher priority. In
    this case, the lower tc0 is assigned to express traffic. This custom
    mapping further complicates common preemption setup across NICs.

Tests were performed on taprio with the following combinations, where
two apps send traffic simultaneously on different queues:

  Private Flag   Traffic Sent By           Traffic Sent By
  ----------------------------------------------------------------
  enabled        iperf3 (queue 3)          iperf3 (queue 0)
  disabled       iperf3 (queue 0)          iperf3 (queue 3)
  enabled        iperf3 (queue 3)          real-time app (queue 0)
  disabled       iperf3 (queue 0)          real-time app (queue 3)
  enabled        real-time app (queue 3)   iperf3 (queue 0)
  disabled       real-time app (queue 0)   iperf3 (queue 3)
  enabled        real-time app (queue 3)   real-time app (queue 0)
  disabled       real-time app (queue 0)   real-time app (queue 3)

Private flag is controlled with:
 ethtool --set-priv-flags enp1s0 reverse-tsn-txq-prio <on|off>

[Ref1]
IEEE 802.1Q clause 8.6.8 Transmission selection:
"For a given Port and traffic class, frames are selected from the
corresponding queue for transmission if and only if:
...
b) For each queue corresponding to a numerically higher value of traffic
class supported by the Port, the operation of the transmission selection
algorithm supported by that queue determines that there is no frame
available for transmission."

Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-11 10:06:24 -07:00
Faizal Rahim
650a2fe795 igc: assign highest TX queue number as highest priority in mqprio
Previously, TX arbitration prioritized queues based on the TC they were
mapped to. A queue mapped to TC 3 had higher priority than one mapped to
TC 0.

To improve code reuse for upcoming patches and align with typical NIC
behavior, this patch updates the logic to prioritize higher queue numbers
when mqprio is used. As a result, queue 0 becomes the lowest priority and
queue 3 becomes the highest.

This patch also introduces igc_tsn_is_tc_to_queue_priority_ordered() to
preserve the original TC-based priority rule and reject configurations
where a higher TC maps to a lower queue offset.

Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-11 10:06:22 -07:00
Faizal Rahim
e35ba6d3c6 igc: refactor TXDCTL macros to use FIELD_PREP and GEN_MASK
Refactor TXDCTL macro handling to use FIELD_PREP and GENMASK macros.
This prepares the code for adding a new TXDCTL priority field in an
upcoming patch.

Verified that the macro values remain unchanged before and after
refactoring.

Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-11 08:56:48 -07:00
Faizal Rahim
4cdb4ef8a9 igc: add DCTL prefix to related macros
Rename macros to use the DCTL prefix for consistency with existing
macros that reference the same register. This prepares for an upcoming
patch that adds new fields to TXDCTL.

Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-11 08:55:46 -07:00
Faizal Rahim
fe4d9e8394 igc: move TXDCTL and RXDCTL related macros
Move and consolidate TXDCTL and RXDCTL macros in preparation for
upcoming TXDCTL changes. This improves organization and readability.

Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-11 08:52:14 -07:00
Ingo Molnar
41cb08555c treewide, timers: Rename from_timer() to timer_container_of()
Move this API to the canonical timer_*() namespace.

[ tglx: Redone against pre rc1 ]

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/aB2X0jCKQO56WdMt@gmail.com
2025-06-08 09:07:37 +02:00
Jakub Kicinski
08906eacce Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2025-04-29 (igb, igc, ixgbe, idpf)

For igb:
Kurt Kanzenbach adds linking of IRQs and queues to NAPI instances and
adds persistent NAPI config. Lastly, he removes undesired IRQs that
occur while busy polling.

For igc:
Kurt Kanzenbach switches the Tx mode for MQPRIO offload to harmonize the
current implementation with TAPRIO.

For ixgbe:
Jedrzej adds separate ethtool ops for E610 devices to account for device
differences.

Slawomir adds devlink region support for E610 devices.

For idpf:
Mateusz assigns and utilizes the ptype field out of libeth_rqe_info.

Michal removes unreachable code.

* '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  idpf: remove unreachable code from setting mailbox
  idpf: assign extracted ptype to struct libeth_rqe_info field
  ixgbe: devlink: add devlink region support for E610
  ixgbe: add E610 .set_phys_id() callback implementation
  ixgbe: apply different rules for setting FC on E610
  ixgbe: add support for ACPI WOL for E610
  ixgbe: create E610 specific ethtool_ops structure
  igc: Change Tx mode for MQPRIO offloading
  igc: Limit netdev_tc calls to MQPRIO
  igb: Get rid of spurious interrupts
  igb: Add support for persistent NAPI config
  igb: Link queues to NAPI instances
  igb: Link IRQs to NAPI instances
====================

Link: https://patch.msgid.link/20250429234651.3982025-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-01 17:51:31 -07:00
Jakub Kicinski
337079d31f Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.15-rc5).

No conflicts or adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-01 15:11:38 -07:00
Kurt Kanzenbach
876863c3fc igc: Change Tx mode for MQPRIO offloading
The current MQPRIO offload implementation uses the legacy TSN Tx mode. In
this mode the hardware uses four packet buffers and considers queue
priorities.

In order to harmonize the TAPRIO implementation with MQPRIO, switch to the
regular TSN Tx mode. This mode also uses four packet buffers and considers
queue priorities. In addition to the legacy mode, transmission is always
coupled to Qbv. The driver already has mechanisms to use a dummy schedule
of 1 second with all gates open for ETF. Simply use this for MQPRIO too.

This reduces code and makes it easier to add support for frame preemption
later.

Tested on i225 with real time application using high priority queue, iperf3
using low priority queue and network TAP device.

Acked-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-29 15:13:43 -07:00
Kurt Kanzenbach
68f37f26b0 igc: Limit netdev_tc calls to MQPRIO
Limit netdev_tc calls to MQPRIO. Currently these calls are made in
igc_tsn_enable_offload() and igc_tsn_disable_offload() which are used by
TAPRIO and ETF as well. However, these are only required for MQPRIO.

Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-29 15:13:43 -07:00
Jacob Keller
c7d6cb96d5 igc: fix lock order in igc_ptp_reset
Commit 1a931c4f5e ("igc: add lock preventing multiple simultaneous PTM
transactions") added a new mutex to protect concurrent PTM transactions.
This lock is acquired in igc_ptp_reset() in order to ensure the PTM
registers are properly disabled after a device reset.

The flow where the lock is acquired already holds a spinlock, so acquiring
a mutex leads to a sleep-while-locking bug, reported both by smatch,
and the kernel test robot.

The critical section in igc_ptp_reset() does correctly use the
readx_poll_timeout_atomic variants, but the standard PTM flow uses regular
sleeping variants. This makes converting the mutex to a spinlock a bit
tricky.

Instead, re-order the locking in igc_ptp_reset. Acquire the mutex first,
and then the tmreg_lock spinlock. This is safe because there is no other
ordering dependency on these locks, as this is the only place where both
locks were acquired simultaneously. Indeed, any other flow acquiring locks
in that order would be wrong regardless.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Fixes: 1a931c4f5e ("igc: add lock preventing multiple simultaneous PTM transactions")
Link: https://lore.kernel.org/intel-wired-lan/Z_-P-Hc1yxcw0lTB@stanley.mountain/
Link: https://lore.kernel.org/intel-wired-lan/202504211511.f7738f5d-lkp@intel.com/T/#u
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-29 14:28:40 -07:00
Faizal Rahim
f05ce73cc3 igc: add support to get frame preemption statistics via ethtool
Implemented "ethtool --include-statistics --show-mm" callback for IGC.

Tested preemption scenario to check preemption statistics:
1) Trigger verification handshake on both boards:
    $ sudo ethtool --set-mm enp1s0 pmac-enabled on
    $ sudo ethtool --set-mm enp1s0 tx-enabled on
    $ sudo ethtool --set-mm enp1s0 verify-enabled on
2) Set preemptible or express queue in taprio for tx board:
    $ sudo tc qdisc replace dev enp1s0 parent root handle 100 taprio \
      num_tc 4 map 3 2 1 0 3 3 3 3 3 3 3 3 3 3 3 3 \
      queues 1@0 1@1 1@2 1@3 base-time 0 sched-entry S F 100000 \
      fp E E P P
3) Send large size packets on preemptible queue
4) Send small size packets on express queue to preempt packets in
   preemptible queue
5) Show preemption statistics on the receiving board:
   $ ethtool --include-statistics --show-mm enp1s0
     MAC Merge layer state for enp1s0:
     pMAC enabled: on
     TX enabled: on
     TX active: on
     TX minimum fragment size: 64
     RX minimum fragment size: 60
     Verify enabled: on
     Verify time: 128
     Max verify time: 128
     Verification status: SUCCEEDED
     Statistics:
      MACMergeFrameAssErrorCount: 0
      MACMergeFrameSmdErrorCount: 0
      MACMergeFrameAssOkCount: 511
      MACMergeFragCountRx: 764
      MACMergeFragCountTx: 0
      MACMergeHoldCount: 0

Co-developed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Co-developed-by: Chwee-Lin Choong <chwee.lin.choong@intel.com>
Signed-off-by: Chwee-Lin Choong <chwee.lin.choong@intel.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-18 09:17:14 -07:00
Faizal Rahim
10e2ffe10e igc: add support to get MAC Merge data via ethtool
Implement "ethtool --show-mm" callback for IGC.

Tested with command:
$ ethtool --show-mm enp1s0.
  MAC Merge layer state for enp1s0:
  pMAC enabled: on
  TX enabled: on
  TX active: on
  TX minimum fragment size: 64
  RX minimum fragment size: 60
  Verify enabled: on
  Verify time: 128
  Max verify time: 128
  Verification status: SUCCEEDED

Verified that the fields value are retrieved correctly.

Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-18 09:17:14 -07:00
Faizal Rahim
e9074d7f37 igc: block setting preemptible traffic class in taprio
Since preemptible tc implementation is not ready yet, block it from being
set in taprio. The existing code already blocks it in mqprio.

Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-18 09:17:14 -07:00
Faizal Rahim
55ececab98 igc: add support to set tx-min-frag-size
Add support for setting tx-min-frag-size via the set_mm callback in igc.
If the requested value is unsupported, round it up to the smallest
supported i226 size (64, 128, 192, 256) and send a netlink message to
inform the user.

Co-developed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-18 09:17:14 -07:00
Faizal Rahim
5422570c00 igc: add support for frame preemption verification
This patch implements the "ethtool --set-mm" callback to trigger the
frame preemption verification handshake.

Uses the MAC Merge Software Verification (mmsv) mechanism in ethtool
to perform the verification handshake for igc.
The structure fpe.mmsv is set by mmsv in ethtool and should remain
read-only for the driver.

Other mmsv callbacks:
a) configure_tx() -> not used yet at this point
   - igc lacks registers to configure FPE in the transmit direction, so
     this API is not utilized for now. When igc supports preemptible queue,
     driver will use this API to manage its configuration.

b) configure_pmac() -> not used
   - this callback dynamically controls pmac_enabled at runtime. For
     example, mmsv calls configure_pmac() and disables pmac_enabled when
     the link partner goes down, even if the user previously enabled it.
     The intention is to save power but it is not feasible in igc
     because it causes an endless adapter reset loop:

   1) Board A and Board B complete the verification handshake. Tx mode
      register for both boards are in TSN mode.
   2) Board B link goes down.

   On Board A:
   3) mmsv calls configure_pmac() with pmac_enabled = false.
   4) configure_pmac() in igc updates a new field based on pmac_enabled.
      Driver uses this field in igc_tsn_new_flags() to indicate that the
      user enabled/disabled FPE.
   5) configure_pmac() in igc calls igc_tsn_offload_apply() to check
      whether an adapter reset is needed. Calls existing logic in
      igc_tsn_will_tx_mode_change() and igc_tsn_new_flags().
   6) Since pmac_enabled is now disabled and no other TSN feature is
      active, igc_tsn_will_tx_mode_change() evaluates to true because Tx
      mode will switch from TSN to Legacy.
   7) Driver resets the adapter.
   8) Registers are set, and Tx mode switches to Legacy.
   9) When link partner is up, steps 3-8 repeat, but this time with
      pmac_enabled = true, reactivating TSN.
      igc_tsn_will_tx_mode_change() evaluates to true again, since Tx
      mode will switch from Legacy to TSN.
  10) Driver resets the adapter.
  11) Adapter reset completes, registers are set, and Tx mode switches to
      TSN.

  On Board B:
  12) Adapter reset on Board A at step 10 causes it to detect its link
      partner as down.
  13) Repeats steps 3-8.
  14) Once reset adapter on Board A is completed at step 11, it detects
      its link partner as up.
  15) Repeats steps 9-11.

   - this cycle repeats indefinitely. To avoid this issue, igc only uses
     mmsv.pmac_enabled to track whether FPE is enabled or disabled.

Co-developed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Co-developed-by: Choong Yong Liang <yong.liang.choong@linux.intel.com>
Signed-off-by: Choong Yong Liang <yong.liang.choong@linux.intel.com>
Co-developed-by: Chwee-Lin Choong <chwee.lin.choong@intel.com>
Signed-off-by: Chwee-Lin Choong <chwee.lin.choong@intel.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-18 09:16:58 -07:00
Faizal Rahim
7663370e32 igc: set the RX packet buffer size for TSN mode
In preparation for supporting frame preemption, when entering TSN mode,
set the receive packet buffer to 15KB for the Express MAC, 15KB for
the Preemptible MAC and 2KB for the BMC.

References:
I225/I226 SW User Manual, Section 4.7.9, Section 7.1.3.2, Section 8.3.1

The newly introduced macros follow the naming from the i226 SW User Manual
for easy reference.

Co-developed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-18 08:43:10 -07:00
Faizal Rahim
9cd87aafc7 igc: use FIELD_PREP and GENMASK for existing RX packet buffer size
Prepare for an upcoming patch that modifies the RX buffer size in TSN mode.
Refactor IGC_RXPBSIZE_EXP_BMC_DEFAULT and IGC_RXPBS_CFG_TS_EN using
FIELD_PREP and GENMASK to improve clarity and maintainability. Refactor
both macros for consistency, even though the upcoming patch only use
IGC_RXPBSIZE_EXP_BMC_DEFAULT.

The newly introduced macros follow the naming from the i226 SW User Manual
for easy reference.

I've tested IGC_RXPBSIZE_EXP_BMC_DEFAULT and IGC_RXPBS_CFG_TS_EN before
and after the refactoring, and their values remain unchanged.

Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-18 08:43:10 -07:00
Faizal Rahim
0d58cdc902 igc: optimize TX packet buffer utilization for TSN mode
In preparation for upcoming frame preemption patches, optimize the TX
packet buffer size. The total packet buffer size (RX + TX) is 64KB, with
a maximum of 34KB for either RX or TX. Split the buffer evenly,
allocating 32KB to each.

For TX, assign 7KB to each of the four TX packet buffers (total 28KB)
and reserve 4KB for BMC.

References:
I225/I226 SW User Manual Section 4.7.9, Section 8.3.2

Co-developed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-18 08:43:10 -07:00
Faizal Rahim
425d8d9cb0 igc: use FIELD_PREP and GENMASK for existing TX packet buffer size
In preparation for an upcoming patch that will modify the TX buffer size
in  TSN mode, replace IGC_TXPBSIZE_TSN and IGC_TXPBSIZE_DEFAULT
implementation with new macros that utilizes FIELD_PREP and GENMASK for
clarity.

The newly introduced macros follow the naming from the i226 SW User Manual
for easy reference.

I've tested IGC_TXPBSIZE_TSN and IGC_TXPBSIZE_DEFAULT before and after the
refactoring, and their values remain unchanged.

Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-18 08:43:10 -07:00
Faizal Rahim
67287d67be igc: rename I225_RXPBSIZE_DEFAULT and I225_TXPBSIZE_DEFAULT
Rename RX and TX packet buffer size macros in preparation for an
upcoming patch that will refactor buffer size handling using FIELD_PREP
and GENMASK.

Changes:
- Rename I225_RXPBSIZE_DEFAULT to IGC_RXPBSIZE_EXP_BMC_DEFAULT.
  The EXP_BMC suffix explicitly indicates Express and BMC buffer
  default values, improving readability and reusability for the
  upcoming changes, while also better reflecting the current buffer
  allocations.
- Rename I225_TXPBSIZE_DEFAULT to IGC_TXPBSIZE_DEFAULT.

These registers apply to both i225 and i226, so using the IGC prefix
aligns with existing macro naming conventions.

Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-18 08:43:10 -07:00
Faizal Rahim
19d629079c igc: rename xdp_get_tx_ring() for non-xdp usage
Renamed xdp_get_tx_ring() function to a more generic name for use in
upcoming frame preemption patches.

Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-18 08:43:10 -07:00
Jakub Kicinski
240ce924d2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.15-rc3).

No conflicts. Adjacent changes:

tools/net/ynl/pyynl/ynl_gen_c.py
  4d07bbf2d4 ("tools: ynl-gen: don't declare loop iterator in place")
  7e8ba0c7de ("tools: ynl: don't use genlmsghdr in classic netlink")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-17 12:26:50 -07:00
Jacob Keller
d9f3e9ecc4 net: ptp: introduce .supported_perout_flags to ptp_clock_info
The PTP_PEROUT_REQUEST2 ioctl has gained support for flags specifying
specific output behavior including PTP_PEROUT_ONE_SHOT,
PTP_PEROUT_DUTY_CYCLE, PTP_PEROUT_PHASE.

Driver authors are notorious for not checking the flags of the request.
This results in misinterpreting the request, generating an output signal
that does not match the requested value. It is anticipated that even more
flags will be added in the future, resulting in even more broken requests.

Expecting these issues to be caught during review or playing whack-a-mole
after the fact is not a great solution.

Instead, introduce the supported_perout_flags field in the ptp_clock_info
structure. Update the core character device logic to explicitly reject any
request which has a flag not on this list.

This ensures that drivers must 'opt in' to the flags they support. Drivers
which don't set the .supported_perout_flags field will not need to check
that unsupported flags aren't passed, as the core takes care of this.

Update the drivers which do support flags to set this new field.

Note the following driver files set n_per_out to a non-zero value but did
not check the flags at all:

 • drivers/ptp/ptp_clockmatrix.c
 • drivers/ptp/ptp_idt82p33.c
 • drivers/ptp/ptp_fc3.c
 • drivers/net/ethernet/ti/am65-cpts.c
 • drivers/net/ethernet/aquantia/atlantic/aq_ptp.c
 • drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c
 • drivers/net/dsa/sja1105/sja1105_ptp.c
 • drivers/net/ethernet/freescale/dpaa2/dpaa2-ptp.c
 • drivers/net/ethernet/mscc/ocelot_vsc7514.c
 • drivers/net/ethernet/intel/i40e/i40e_ptp.c

Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/20250414-jk-supported-perout-flags-v2-2-f6b17d15475c@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15 20:20:58 -07:00
Jacob Keller
7c571ac57d net: ptp: introduce .supported_extts_flags to ptp_clock_info
The PTP_EXTTS_REQUEST(2) ioctl has a flags field which specifies how the
external timestamp request should behave. This includes which edge of the
signal to timestamp, as well as a specialized "offset" mode. It is expected
that more flags will be added in the future.

Driver authors routinely do not check the flags, often accepting requests
with flags which they do not support. Even drivers which do check flags may
not be future-proofed to reject flags not yet defined. Thus, any future
flag additions often require manually updating drivers to reject these
flags.

This approach of hoping we catch flag checks during review, or playing
whack-a-mole after the fact is the wrong approach.

Introduce the "supported_extts_flags" field to the ptp_clock_info
structure. This field defines the set of flags the device actually
supports.

Update the core character device logic to check this field and reject
unsupported requests. Getting this right is somewhat tricky. First, to
avoid unnecessary repetition and make basic functionality work when
.supported_extts_flags is 0, the core always accepts the PTP_ENABLE_FEATURE
flag. This flag is used to set the 'on' parameter to the .enable function
and is thus always 'supported' by all drivers.

For backwards compatibility, the PTP_RISING_EDGE and PTP_FALLING_EDGE flags
are merely "hints" when using the old PTP_EXTTS_REQUEST ioctl, and are not
expected to be enforced. If the user issues PTP_EXTTS_REQUEST2, the
PTP_STRICT_FLAGS flag is added which is supposed to inform the driver to
strictly validate the flags and reject unsupported requests. To handle
this, first check if the driver reports PTP_STRICT_FLAGS support. If it
does not, then always allow the PTP_RISING_EDGE and PTP_FALLING_EDGE flags.
This keeps backwards compatibility with the original PTP_EXTTS_REQUEST
ioctl where these flags are not guaranteed to be honored.

This way, drivers which do not set the supported_extts_flags will continue
to accept requests for the original PTP_EXTTS_REQUEST ioctl. The core will
automatically reject requests with new flags, and correctly reject requests
with PTP_STRICT_FLAGS, where the driver is supposed to strictly validate
the flags.

Update the various drivers, refactoring their validation logic into the
.supported_extts_flags field. For consistency and readability,
PTP_ENABLE_FEATURE is not set in the supported flags list, and
PTP_EXTTS_EDGES is expanded to PTP_RISING_EDGE | PTP_FALLING_EDGE in all
cases.

Note the following driver files set n_ext_ts to a non-zero value but did
not check flags at all:

 • drivers/net/ethernet/freescale/dpaa2/dpaa2-ptp.c
 • drivers/net/ethernet/freescale/enetc/enetc_ptp.c
 • drivers/net/ethernet/intel/i40e/i40e_ptp.c
 • drivers/net/ethernet/marvell/octeontx2/nic/otx2_ptp.c
 • drivers/net/ethernet/renesas/ravb_ptp.c
 • drivers/net/ethernet/renesas/rtsn.c
 • drivers/net/ethernet/renesas/rtsn.h
 • drivers/net/ethernet/ti/am65-cpts.c
 • drivers/net/ethernet/ti/cpts.h
 • drivers/net/ethernet/ti/icssg/icss_iep.c
 • drivers/net/ethernet/xscale/ptp_ixp46x.c
 • drivers/net/phy/bcm-phy-ptp.c
 • drivers/ptp/ptp_ocp.c
 • drivers/ptp/ptp_pch.c
 • drivers/ptp/ptp_qoriq.c

These drivers behavior does change slightly: they will now reject the
PTP_EXTTS_REQUEST2 ioctl, because they do not strictly validate their
flags. This also makes them no longer incorrectly accept PTP_EXT_OFFSET.

Also note that the renesas ravb driver does not support PTP_STRICT_FLAGS.
We could leave the .supported_extts_flags as 0, but I added the
PTP_RISING_EDGE | PTP_FALLING_EDGE since the driver previously manually
validated these flags. This is equivalent to 0 because the core will allow
these flags regardless unless PTP_STRICT_FLAGS is also set.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/20250414-jk-supported-perout-flags-v2-1-f6b17d15475c@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15 20:20:57 -07:00
Rui Salvaterra
f9c961efb0 igc: enable HW vlan tag insertion/stripping by default
This is enabled by default in other Intel drivers I've checked (e1000, e1000e,
iavf, igb and ice). Fixes an out-of-the-box performance issue when running
OpenWrt on typical mini-PCs with igc-supported Ethernet controllers and 802.1Q
VLAN configurations, as ethtool isn't part of the default packages and sane
defaults are expected.

In my specific case, with an Intel N100-based machine with four I226-V Ethernet
controllers, my upload performance increased from under 30 Mb/s to the expected
~1 Gb/s.

Signed-off-by: Rui Salvaterra <rsalvaterra@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-11 11:58:58 -07:00
Christopher S M Hall
1a931c4f5e igc: add lock preventing multiple simultaneous PTM transactions
Add a mutex around the PTM transaction to prevent multiple transactors

Multiple processes try to initiate a PTM transaction, one or all may
fail. This can be reproduced by running two instances of the
following:

$ sudo phc2sys -O 0 -i tsn0 -m

PHC2SYS exits with:

"ioctl PTP_OFFSET_PRECISE: Connection timed out" when the PTM transaction
 fails

Note: Normally two instance of PHC2SYS will not run, but one process
 should not break another.

Fixes: a90ec84837 ("igc: Add support for PTP getcrosststamp()")
Signed-off-by: Christopher S M Hall <christopher.s.hall@intel.com>
Reviewed-by: Corinna Vinschen <vinschen@redhat.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-11 09:13:28 -07:00
Christopher S M Hall
1f025759ba igc: cleanup PTP module if probe fails
Make sure that the PTP module is cleaned up if the igc_probe() fails by
calling igc_ptp_stop() on exit.

Fixes: d89f88419f ("igc: Add skeletal frame for Intel(R) 2.5G Ethernet Controller support")
Signed-off-by: Christopher S M Hall <christopher.s.hall@intel.com>
Reviewed-by: Corinna Vinschen <vinschen@redhat.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-11 09:13:28 -07:00
Christopher S M Hall
26a3910afd igc: handle the IGC_PTP_ENABLED flag correctly
All functions in igc_ptp.c called from igc_main.c should check the
IGC_PTP_ENABLED flag. Adding check for this flag to stop and reset
functions.

Fixes: 5f2958052c ("igc: Add basic skeleton for PTP")
Signed-off-by: Christopher S M Hall <christopher.s.hall@intel.com>
Reviewed-by: Corinna Vinschen <vinschen@redhat.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-11 09:13:28 -07:00
Christopher S M Hall
cd7f7328d6 igc: move ktime snapshot into PTM retry loop
Move ktime_get_snapshot() into the loop. If a retry does occur, a more
recent snapshot will result in a more accurate cross-timestamp.

Fixes: a90ec84837 ("igc: Add support for PTP getcrosststamp()")
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Tested-by: Avigail Dahan <avigailx.dahan@intel.com>
Signed-off-by: Christopher S M Hall <christopher.s.hall@intel.com>
Reviewed-by: Corinna Vinschen <vinschen@redhat.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-11 09:13:28 -07:00
Christopher S M Hall
714cd033da igc: increase wait time before retrying PTM
The i225/i226 hardware retries if it receives an inappropriate response
from the upstream device. If the device retries too quickly, the root
port does not respond.

The wait between attempts was reduced from 10us to 1us in commit
6b8aa753a9 ("igc: Decrease PTM short interval from 10 us to 1 us"), which
said:

  With the 10us interval, we were seeing PTM transactions take around
  12us. Hardware team suggested this interval could be lowered to 1us
  which was confirmed with PCIe sniffer. With the 1us interval, PTM
  dialogs took around 2us.

While a 1us short cycle time was thought to be theoretically sufficient, it
turns out in practice it is not quite long enough. It is unclear if the
problem is in the root port or an issue in i225/i226.

Increase the wait from 1us to 4us. Increasing to 2us appeared to work in
practice on the setups we have available. A value of 4us was chosen due to
the limited hardware available for testing, with a goal of ensuring we wait
long enough without overly penalizing the response time when unnecessary.

The issue can be reproduced with the following:

$ sudo phc2sys -R 1000 -O 0 -i tsn0 -m

Note: 1000 Hz (-R 1000) is unrealistically large, but provides a way to
quickly reproduce the issue.

PHC2SYS exits with:

"ioctl PTP_OFFSET_PRECISE: Connection timed out" when the PTM transaction
  fails

Fixes: 6b8aa753a9 ("igc: Decrease PTM short interval from 10 us to 1 us")
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Tested-by: Avigail Dahan <avigailx.dahan@intel.com>
Signed-off-by: Christopher S M Hall <christopher.s.hall@intel.com>
Reviewed-by: Corinna Vinschen <vinschen@redhat.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-11 09:13:28 -07:00
Christopher S M Hall
8e404ad95d igc: fix PTM cycle trigger logic
Writing to clear the PTM status 'valid' bit while the PTM cycle is
triggered results in unreliable PTM operation. To fix this, clear the
PTM 'trigger' and status after each PTM transaction.

The issue can be reproduced with the following:

$ sudo phc2sys -R 1000 -O 0 -i tsn0 -m

Note: 1000 Hz (-R 1000) is unrealistically large, but provides a way to
quickly reproduce the issue.

PHC2SYS exits with:

"ioctl PTP_OFFSET_PRECISE: Connection timed out" when the PTM transaction
  fails

This patch also fixes a hang in igc_probe() when loading the igc
driver in the kdump kernel on systems supporting PTM.

The igc driver running in the base kernel enables PTM trigger in
igc_probe().  Therefore the driver is always in PTM trigger mode,
except in brief periods when manually triggering a PTM cycle.

When a crash occurs, the NIC is reset while PTM trigger is enabled.
Due to a hardware problem, the NIC is subsequently in a bad busmaster
state and doesn't handle register reads/writes.  When running
igc_probe() in the kdump kernel, the first register access to a NIC
register hangs driver probing and ultimately breaks kdump.

With this patch, igc has PTM trigger disabled most of the time,
and the trigger is only enabled for very brief (10 - 100 us) periods
when manually triggering a PTM cycle.  Chances that a crash occurs
during a PTM trigger are not 0, but extremely reduced.

Fixes: a90ec84837 ("igc: Add support for PTP getcrosststamp()")
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Tested-by: Avigail Dahan <avigailx.dahan@intel.com>
Signed-off-by: Christopher S M Hall <christopher.s.hall@intel.com>
Reviewed-by: Corinna Vinschen <vinschen@redhat.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Corinna Vinschen <vinschen@redhat.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-11 09:13:11 -07:00
Thomas Gleixner
8fa7292fee treewide: Switch/rename to timer_delete[_sync]()
timer_delete[_sync]() replaces del_timer[_sync](). Convert the whole tree
over and remove the historical wrapper inlines.

Conversion was done with coccinelle plus manual fixups where necessary.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-04-05 10:30:12 +02:00
Zdenek Bouska
d931cf9b38 igc: Fix TX drops in XDP ZC
Fixes TX frame drops in AF_XDP zero copy mode when budget < 4.
xsk_tx_peek_desc() consumed TX frame and it was ignored because of
low budget. Not even AF_XDP completion was done for dropped frames.

It can be reproduced on i226 by sending 100000x 60 B frames with
launch time set to minimal IPG (672 ns between starts of frames)
on 1Gbit/s. Always 1026 frames are not sent and are missing a
completion.

Fixes: 9acf59a752 ("igc: Enable TX via AF_XDP zero-copy")
Signed-off-by: Zdenek Bouska <zdenek.bouska@siemens.com>
Reviewed-by: Song Yoong Siang <yoong.siang.song@intel.com>
Reviewed-by: Florian Bezdeka <florian.bezdeka@siemens.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-02 08:51:41 -07:00
Joe Damato
dddeeaa16c igc: Fix XSK queue NAPI ID mapping
In commit b65969856d ("igc: Link queues to NAPI instances"), the XSK
queues were incorrectly unmapped from their NAPI instances. After
discussion on the mailing list and the introduction of a test to codify
the expected behavior, we can see that the unmapping causes the
check_xsk test to fail:

NETIF=enp86s0 ./tools/testing/selftests/drivers/net/queues.py

[...]
  # Check|     ksft_eq(q.get('xsk', None), {},
  # Check failed None != {} xsk attr on queue we configured
  not ok 4 queues.check_xsk

After this commit, the test passes:

  ok 4 queues.check_xsk

Note that the test itself is only in net-next, so I tested this change
by applying it to my local net-next tree, booting, and running the test.

Cc: stable@vger.kernel.org
Fixes: b65969856d ("igc: Link queues to NAPI instances")
Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-02 08:51:41 -07:00
Linus Torvalds
1a9239bb42 Networking changes for 6.15.
Core & protocols
 ----------------
 
  - Continue Netlink conversions to per-namespace RTNL lock
    (IPv4 routing, routing rules, routing next hops, ARP ioctls).
 
  - Continue extending the use of netdev instance locks. As a driver
    opt-in protect queue operations and (in due course) ethtool
    operations with the instance lock and not RTNL lock.
 
  - Support collecting TCP timestamps (data submitted, sent, acked)
    in BPF, allowing for transparent (to the application) and lower
    overhead tracking of TCP RPC performance.
 
  - Tweak existing networking Rx zero-copy infra to support zero-copy
    Rx via io_uring.
 
  - Optimize MPTCP performance in single subflow mode by 29%.
 
  - Enable GRO on packets which went thru XDP CPU redirect (were queued
    for processing on a different CPU). Improving TCP stream performance
    up to 2x.
 
  - Improve performance of contended connect() by 200% by searching
    for an available 4-tuple under RCU rather than a spin lock.
    Bring an additional 229% improvement by tweaking hash distribution.
 
  - Avoid unconditionally touching sk_tsflags on RX, improving
    performance under UDP flood by as much as 10%.
 
  - Avoid skb_clone() dance in ping_rcv() to improve performance under
    ping flood.
 
  - Avoid FIB lookup in netfilter if socket is available, 20% perf win.
 
  - Rework network device creation (in-kernel) API to more clearly
    identify network namespaces and their roles.
    There are up to 4 namespace roles but we used to have just 2 netns
    pointer arguments, interpreted differently based on context.
 
  - Use sysfs_break_active_protection() instead of trylock to avoid
    deadlocks between unregistering objects and sysfs access.
 
  - Add a new sysctl and sockopt for capping max retransmit timeout
    in TCP.
 
  - Support masking port and DSCP in routing rule matches.
 
  - Support dumping IPv4 multicast addresses with RTM_GETMULTICAST.
 
  - Support specifying at what time packet should be sent on AF_XDP
    sockets.
 
  - Expose TCP ULP diagnostic info (for TLS and MPTCP) to non-admin users.
 
  - Add Netlink YAML spec for WiFi (nl80211) and conntrack.
 
  - Introduce EXPORT_IPV6_MOD() and EXPORT_IPV6_MOD_GPL() for symbols
    which only need to be exported when IPv6 support is built as a module.
 
  - Age FDB entries based on Rx not Tx traffic in VxLAN, similar
    to normal bridging.
 
  - Allow users to specify source port range for GENEVE tunnels.
 
  - netconsole: allow attaching kernel release, CPU ID and task name
    to messages as metadata
 
 Driver API
 ----------
 
  - Continue rework / fixing of Energy Efficient Ethernet (EEE) across
    the SW layers. Delegate the responsibilities to phylink where possible.
    Improve its handling in phylib.
 
  - Support symmetric OR-XOR RSS hashing algorithm.
 
  - Support tracking and preserving IRQ affinity by NAPI itself.
 
  - Support loopback mode speed selection for interface selftests.
 
 Device drivers
 --------------
 
  - Remove the IBM LCS driver for s390.
 
  - Remove the sb1000 cable modem driver.
 
  - Add support for SFP module access over SMBus.
 
  - Add MCTP transport driver for MCTP-over-USB.
 
  - Enable XDP metadata support in multiple drivers.
 
  - Ethernet high-speed NICs:
    - Broadcom (bnxt):
      - add PCIe TLP Processing Hints (TPH) support for new AMD platforms
      - support dumping RoCE queue state for debug
      - opt into instance locking
    - Intel (100G, ice, idpf):
      - ice: rework MSI-X IRQ management and distribution
      - ice: support for E830 devices
      - iavf: add support for Rx timestamping
      - iavf: opt into instance locking
    - nVidia/Mellanox:
      - mlx4: use page pool memory allocator for Rx
      - mlx5: support for one PTP device per hardware clock
      - mlx5: support for 200Gbps per-lane link modes
      - mlx5: move IPSec policy check after decryption
    - AMD/Solarflare:
      - support FW flashing via devlink
    - Cisco (enic):
      - use page pool memory allocator for Rx
      - enable 32, 64 byte CQEs
      - get max rx/tx ring size from the device
    - Meta (fbnic):
      - support flow steering and RSS configuration
      - report queue stats
      - support TCP segmentation
      - support IRQ coalescing
      - support ring size configuration
    - Marvell/Cavium:
      - support AF_XDP
    - Wangxun:
      - support for PTP clock and timestamping
    - Huawei (hibmcge):
      - checksum offload
      - add more statistics
 
  - Ethernet virtual:
    - VirtIO net:
      - aggressively suppress Tx completions, improve perf by 96% with
        1 CPU and 55% with 2 CPUs
      - expose NAPI to IRQ mapping and persist NAPI settings
    - Google (gve):
      - support XDP in DQO RDA Queue Format
      - opt into instance locking
    - Microsoft vNIC:
      - support BIG TCP
 
  - Ethernet NICs consumer, and embedded:
    - Synopsys (stmmac):
      - cleanup Tx and Tx clock setting and other link-focused cleanups
      - enable SGMII and 2500BASEX mode switching for Intel platforms
      - support Sophgo SG2044
    - Broadcom switches (b53):
      - support for BCM53101
    - TI:
      - iep: add perout configuration support
      - icssg: support XDP
    - Cadence (macb):
      - implement BQL
    - Xilinx (axinet):
      - support dynamic IRQ moderation and changing coalescing at runtime
      - implement BQL
      - report standard stats
    - MediaTek:
      - support phylink managed EEE
    - Intel:
      - igc: don't restart the interface on every XDP program change
    - RealTek (r8169):
      - support reading registers of internal PHYs directly
      - increase max jumbo packet size on RTL8125/RTL8126
    - Airoha:
      - support for RISC-V NPU packet processing unit
      - enable scatter-gather and support MTU up to 9kB
    - Tehuti (tn40xx):
      - support cards with TN4010 MAC and an Aquantia AQR105 PHY
 
  - Ethernet PHYs:
    - support for TJA1102S, TJA1121
    - dp83tg720: add randomized polling intervals for link detection
    - dp83822: support changing the transmit amplitude voltage
    - support for LEDs on 88q2xxx
 
  - CAN:
    - canxl: support Remote Request Substitution bit access
    - flexcan: add S32G2/S32G3 SoC
 
  - WiFi:
    - remove cooked monitor support
    - strict mode for better AP testing
    - basic EPCS support
    - OMI RX bandwidth reduction support
    - batman-adv: add support for jumbo frames
 
  - WiFi drivers:
    - RealTek (rtw88):
      - support RTL8814AE and RTL8814AU
    - RealTek (rtw89):
      - switch using wiphy_lock and wiphy_work
      - add BB context to manipulate two PHY as preparation of MLO
      - improve BT-coexistence mechanism to play A2DP smoothly
    - Intel (iwlwifi):
      - add new iwlmld sub-driver for latest HW/FW combinations
    - MediaTek (mt76):
      - preparation for mt7996 Multi-Link Operation (MLO) support
    - Qualcomm/Atheros (ath12k):
      - continued work on MLO
    - Silabs (wfx):
      - Wake-on-WLAN support
 
  - Bluetooth:
    - add support for skb TX SND/COMPLETION timestamping
    - hci_core: enable buffer flow control for SCO/eSCO
    - coredump: log devcd dumps into the monitor
 
  - Bluetooth drivers:
    - intel: add support to configure TX power
    - nxp: handle bootloader error during cmd5 and cmd7
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmfkLC8ACgkQMUZtbf5S
 Irsb5g/+L7oKOf0ALbaV9kxFsoz8AymZfAW9i/27F07omGJGpks8oX6j6rQLgIRO
 OQOFcp7XEdDh1+jh82gHVuPrw2/6lchLtW8ARtzdiQKFr5DRjrsbtua6GRc8iBqA
 DIRCBFoV2HuMkF39Vr09HMa9AZAT7QR2RLsRGpSq8E8Z8xxKz0X7oujs10PFpMTE
 IVKhTrVrk+NDot/IU2hzVpnpup+0ld+T2/ZaBklJGcU8uDffImsqNepHRyCG5UC3
 xz74Ju23MAj24Gct+og0yFUooF+lUltKyVm0FYCDCY3bASTwgY01NR3kEH/0NQvM
 cywLzd/ngHm/SMD2ggVAHkjZUieiIVHdaZ53dgjDeBOQoVP6p0dgUK7EumXX8Mx4
 8ReR2UiGoYRPaq9c4o+IjG4K027MwVK2p+mF1a6MLa+20XcyMbev8FIRbbHtC/V4
 z5/FsOAxcuICWkA1hU9bODrrGzIqemmdRgKG8sGuTJCt/kYGAn72/TCATGNSaCJ0
 00n2jN1aepa7wtywHJ5MhVzxN9iQX7+geUHXz0BI+lK4e1Pmk+vjGksymb9ai2fk
 eQAUV9ekub6q68/J16scD7XeOUM37bTLiMBQeIF8UtZBOJscKiS71zn9QP9Twwxv
 P2pm01RDZUI+z5ZX3hc12Pm1vjRHaAh9S1JpAw/pTOVlQ+mAJEM=
 =XY0S
 -----END PGP SIGNATURE-----

Merge tag 'net-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
 "Core & protocols:

   - Continue Netlink conversions to per-namespace RTNL lock
     (IPv4 routing, routing rules, routing next hops, ARP ioctls)

   - Continue extending the use of netdev instance locks. As a driver
     opt-in protect queue operations and (in due course) ethtool
     operations with the instance lock and not RTNL lock.

   - Support collecting TCP timestamps (data submitted, sent, acked) in
     BPF, allowing for transparent (to the application) and lower
     overhead tracking of TCP RPC performance.

   - Tweak existing networking Rx zero-copy infra to support zero-copy
     Rx via io_uring.

   - Optimize MPTCP performance in single subflow mode by 29%.

   - Enable GRO on packets which went thru XDP CPU redirect (were queued
     for processing on a different CPU). Improving TCP stream
     performance up to 2x.

   - Improve performance of contended connect() by 200% by searching for
     an available 4-tuple under RCU rather than a spin lock. Bring an
     additional 229% improvement by tweaking hash distribution.

   - Avoid unconditionally touching sk_tsflags on RX, improving
     performance under UDP flood by as much as 10%.

   - Avoid skb_clone() dance in ping_rcv() to improve performance under
     ping flood.

   - Avoid FIB lookup in netfilter if socket is available, 20% perf win.

   - Rework network device creation (in-kernel) API to more clearly
     identify network namespaces and their roles. There are up to 4
     namespace roles but we used to have just 2 netns pointer arguments,
     interpreted differently based on context.

   - Use sysfs_break_active_protection() instead of trylock to avoid
     deadlocks between unregistering objects and sysfs access.

   - Add a new sysctl and sockopt for capping max retransmit timeout in
     TCP.

   - Support masking port and DSCP in routing rule matches.

   - Support dumping IPv4 multicast addresses with RTM_GETMULTICAST.

   - Support specifying at what time packet should be sent on AF_XDP
     sockets.

   - Expose TCP ULP diagnostic info (for TLS and MPTCP) to non-admin
     users.

   - Add Netlink YAML spec for WiFi (nl80211) and conntrack.

   - Introduce EXPORT_IPV6_MOD() and EXPORT_IPV6_MOD_GPL() for symbols
     which only need to be exported when IPv6 support is built as a
     module.

   - Age FDB entries based on Rx not Tx traffic in VxLAN, similar to
     normal bridging.

   - Allow users to specify source port range for GENEVE tunnels.

   - netconsole: allow attaching kernel release, CPU ID and task name to
     messages as metadata

  Driver API:

   - Continue rework / fixing of Energy Efficient Ethernet (EEE) across
     the SW layers. Delegate the responsibilities to phylink where
     possible. Improve its handling in phylib.

   - Support symmetric OR-XOR RSS hashing algorithm.

   - Support tracking and preserving IRQ affinity by NAPI itself.

   - Support loopback mode speed selection for interface selftests.

  Device drivers:

   - Remove the IBM LCS driver for s390

   - Remove the sb1000 cable modem driver

   - Add support for SFP module access over SMBus

   - Add MCTP transport driver for MCTP-over-USB

   - Enable XDP metadata support in multiple drivers

   - Ethernet high-speed NICs:
      - Broadcom (bnxt):
         - add PCIe TLP Processing Hints (TPH) support for new AMD
           platforms
         - support dumping RoCE queue state for debug
         - opt into instance locking
      - Intel (100G, ice, idpf):
         - ice: rework MSI-X IRQ management and distribution
         - ice: support for E830 devices
         - iavf: add support for Rx timestamping
         - iavf: opt into instance locking
      - nVidia/Mellanox:
         - mlx4: use page pool memory allocator for Rx
         - mlx5: support for one PTP device per hardware clock
         - mlx5: support for 200Gbps per-lane link modes
         - mlx5: move IPSec policy check after decryption
      - AMD/Solarflare:
         - support FW flashing via devlink
      - Cisco (enic):
         - use page pool memory allocator for Rx
         - enable 32, 64 byte CQEs
         - get max rx/tx ring size from the device
      - Meta (fbnic):
         - support flow steering and RSS configuration
         - report queue stats
         - support TCP segmentation
         - support IRQ coalescing
         - support ring size configuration
      - Marvell/Cavium:
         - support AF_XDP
      - Wangxun:
         - support for PTP clock and timestamping
      - Huawei (hibmcge):
         - checksum offload
         - add more statistics

   - Ethernet virtual:
      - VirtIO net:
         - aggressively suppress Tx completions, improve perf by 96%
           with 1 CPU and 55% with 2 CPUs
         - expose NAPI to IRQ mapping and persist NAPI settings
      - Google (gve):
         - support XDP in DQO RDA Queue Format
         - opt into instance locking
      - Microsoft vNIC:
         - support BIG TCP

   - Ethernet NICs consumer, and embedded:
      - Synopsys (stmmac):
         - cleanup Tx and Tx clock setting and other link-focused
           cleanups
         - enable SGMII and 2500BASEX mode switching for Intel platforms
         - support Sophgo SG2044
      - Broadcom switches (b53):
         - support for BCM53101
      - TI:
         - iep: add perout configuration support
         - icssg: support XDP
      - Cadence (macb):
         - implement BQL
      - Xilinx (axinet):
         - support dynamic IRQ moderation and changing coalescing at
           runtime
         - implement BQL
         - report standard stats
      - MediaTek:
         - support phylink managed EEE
      - Intel:
         - igc: don't restart the interface on every XDP program change
      - RealTek (r8169):
         - support reading registers of internal PHYs directly
         - increase max jumbo packet size on RTL8125/RTL8126
      - Airoha:
         - support for RISC-V NPU packet processing unit
         - enable scatter-gather and support MTU up to 9kB
      - Tehuti (tn40xx):
         - support cards with TN4010 MAC and an Aquantia AQR105 PHY

   - Ethernet PHYs:
      - support for TJA1102S, TJA1121
      - dp83tg720: add randomized polling intervals for link detection
      - dp83822: support changing the transmit amplitude voltage
      - support for LEDs on 88q2xxx

   - CAN:
      - canxl: support Remote Request Substitution bit access
      - flexcan: add S32G2/S32G3 SoC

   - WiFi:
      - remove cooked monitor support
      - strict mode for better AP testing
      - basic EPCS support
      - OMI RX bandwidth reduction support
      - batman-adv: add support for jumbo frames

   - WiFi drivers:
      - RealTek (rtw88):
         - support RTL8814AE and RTL8814AU
      - RealTek (rtw89):
         - switch using wiphy_lock and wiphy_work
         - add BB context to manipulate two PHY as preparation of MLO
         - improve BT-coexistence mechanism to play A2DP smoothly
      - Intel (iwlwifi):
         - add new iwlmld sub-driver for latest HW/FW combinations
      - MediaTek (mt76):
         - preparation for mt7996 Multi-Link Operation (MLO) support
      - Qualcomm/Atheros (ath12k):
         - continued work on MLO
      - Silabs (wfx):
         - Wake-on-WLAN support

   - Bluetooth:
      - add support for skb TX SND/COMPLETION timestamping
      - hci_core: enable buffer flow control for SCO/eSCO
      - coredump: log devcd dumps into the monitor

   - Bluetooth drivers:
      - intel: add support to configure TX power
      - nxp: handle bootloader error during cmd5 and cmd7"

* tag 'net-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1681 commits)
  unix: fix up for "apparmor: add fine grained af_unix mediation"
  mctp: Fix incorrect tx flow invalidation condition in mctp-i2c
  net: usb: asix: ax88772: Increase phy_name size
  net: phy: Introduce PHY_ID_SIZE — minimum size for PHY ID string
  net: libwx: fix Tx L4 checksum
  net: libwx: fix Tx descriptor content for some tunnel packets
  atm: Fix NULL pointer dereference
  net: tn40xx: add pci-id of the aqr105-based Tehuti TN4010 cards
  net: tn40xx: prepare tn40xx driver to find phy of the TN9510 card
  net: tn40xx: create swnode for mdio and aqr105 phy and add to mdiobus
  net: phy: aquantia: add essential functions to aqr105 driver
  net: phy: aquantia: search for firmware-name in fwnode
  net: phy: aquantia: add probe function to aqr105 for firmware loading
  net: phy: Add swnode support to mdiobus_scan
  gve: add XDP DROP and PASS support for DQ
  gve: update XDP allocation path support RX buffer posting
  gve: merge packet buffer size fields
  gve: update GQ RX to use buf_size
  gve: introduce config-based allocation for XDP
  gve: remove xdp_xsk_done and xdp_xsk_wakeup statistics
  ...
2025-03-26 21:48:21 -07:00
Willem de Bruijn
e6116fc605 net: skb: free up one bit in tx_flags
The linked series wants to add skb tx completion timestamps.
That needs a bit in skb_shared_info.tx_flags, but all are in use.

A per-skb bit is only needed for features that are configured on a
per packet basis. Per socket features can be read from sk->sk_tsflags.

Per packet tsflags can be set in sendmsg using cmsg, but only those in
SOF_TIMESTAMPING_TX_RECORD_MASK.

Per packet tsflags can also be set without cmsg by sandwiching a
send inbetween two setsockopts:

    val |= SOF_TIMESTAMPING_$FEATURE;
    setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val));
    write(fd, buf, sz);
    val &= ~SOF_TIMESTAMPING_$FEATURE;
    setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val));

Changing a datapath test from skb_shinfo(skb)->tx_flags to
skb->sk->sk_tsflags can change behavior in that case, as the tx_flags
is written before the second setsockopt updates sk_tsflags.

Therefore, only bits can be reclaimed that cannot be set by cmsg and
are also highly unlikely to be used to target individual packets
otherwise.

Free up the bit currently used for SKBTX_HW_TSTAMP_USE_CYCLES. This
selects between clock and free running counter source for HW TX
timestamps. It is probable that all packets of the same socket will
always use the same source.

Link: https://lore.kernel.org/netdev/cover.1739988644.git.pav@iki.fi/
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Link: https://patch.msgid.link/20250225023416.2088705-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-26 19:01:08 -08:00