netdevice reg_state was split into two 16 bit enums back in 2010
in commit a2835763e1 ("rtnetlink: handle rtnl_link netlink
notifications manually"). Since the split the fields have been
moved apart, and last year we converted reg_state to a normal
u8 in commit 4d42b37def ("net: convert dev->reg_state to u8").
rtnl_link_state being a 16 bitfield makes no sense. Convert it
to a single bool, it seems very unlikely after 15 years that
we'll need more values in it.
We could drop dev->rtnl_link_ops from the conditions but feels
like having it there more clearly points at the reason for this
hack.
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250410014246.780885-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
DCCP was orphaned in 2021 by commit 054c4610bd ("MAINTAINERS: dccp:
move Gerrit Renker to CREDITS"), which noted that the last maintainer
had been inactive for five years.
In recent years, it has become a playground for syzbot, and most changes
to DCCP have been odd bug fixes triggered by syzbot. Apart from that,
the only changes have been driven by treewide or networking API updates
or adjustments related to TCP.
Thus, in 2023, we announced we would remove DCCP in 2025 via commit
b144fcaf46 ("dccp: Print deprecation notice.").
Since then, only one individual has contacted the netdev mailing list. [0]
There is ongoing research for Multipath DCCP. The repository is hosted
on GitHub [1], and development is not taking place through the upstream
community. While the repository is published under the GPLv2 license,
the scheduling part remains proprietary, with a LICENSE file [2] stating:
"This is not Open Source software."
The researcher mentioned a plan to address the licensing issue, upstream
the patches, and step up as a maintainer, but there has been no further
communication since then.
Maintaining DCCP for a decade without any real users has become a burden.
Therefore, it's time to remove it.
Removing DCCP will also provide significant benefits to TCP. It allows
us to freely reorganize the layout of struct inet_connection_sock, which
is currently shared with DCCP, and optimize it to reduce the number of
cachelines accessed in the TCP fast path.
Note that we keep DCCP netfilter modules as requested. [3]
Link: https://lore.kernel.org/netdev/20230710182253.81446-1-kuniyu@amazon.com/T/#u #[0]
Link: https://github.com/telekom/mp-dccp #[1]
Link: https://github.com/telekom/mp-dccp/blob/mpdccp_v03_k5.10/net/dccp/non_gpl_scheduler/LICENSE #[2]
Link: https://lore.kernel.org/netdev/Z_VQ0KlCRkqYWXa-@calendula/ #[3]
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Paul Moore <paul@paul-moore.com> (LSM and SELinux)
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Link: https://patch.msgid.link/20250410023921.11307-3-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When TCP is in TIME_WAIT state, PAWS verification uses
LINUX_PAWSESTABREJECTED, which is ambiguous and cannot be distinguished
from other PAWS verification processes.
We added a new counter, like the existing PAWS_OLD_ACK one.
Also we update the doc with previously missing PAWS_OLD_ACK.
usage:
'''
nstat -az | grep PAWSTimewait
TcpExtPAWSTimewait 1 0.0
'''
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250409112614.16153-3-jiayuan.chen@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We mostly needed rtnl_lock in qstat to make sure the queue count
is stable while we work. For "ops locked" drivers the instance
lock protects the queue count, so we don't have to take rtnl_lock.
For currently ops-locked drivers: netdevsim and bnxt need
the protection from netdev going down while we dump, which
instance lock provides. gve doesn't care.
Reviewed-by: Joe Damato <jdamato@fastly.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250408195956.412733-9-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Explicitly list all the ops structs and what locking they provide.
Use "ops locked" as a term for drivers which have ops called under
the instance lock.
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250408195956.412733-8-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Protect xdp_features with netdev->lock. This way pure readers
no longer have to take rtnl_lock to access the field.
This includes calling NETDEV_XDP_FEAT_CHANGE under the lock.
Looks like that's fine for bonding, the only "real" listener,
it's the same as ethtool feature change.
In terms of normal drivers - only GVE need special consideration
(other drivers don't use instance lock or don't support XDP).
It calls xdp_set_features_flag() helper from gve_init_priv() which
in turn is called from gve_reset_recovery() (locked), or prior
to netdev registration. So switch to _locked.
Reviewed-by: Joe Damato <jdamato@fastly.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Acked-by: Harshitha Ramamurthy <hramamurthy@google.com>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250408195956.412733-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We don't have a consistent state yet, but document where we think
we are and where we wanna be.
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250401163452.622454-8-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmfjTP8QHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpm6oEACnpGL52FAKTVj14GDqFo6Pq0Jmnh07x8qj
mpHFPwxfWAzRiuNyji2iS9ecS2cnlkixNyMWZipXRi4KJAUjJH6YDd7IofUI3Glf
6v7b6srFSvsWJIJ8LdkJHLHAJuzYnJvFZ8apwgQczEDqgHq7BAunM1sVQ+mydjYk
EXT4kN6DSBOPzwr9GAay52f8nXhbqdHfT+YTGHPHg+QToojL6gD7vvW57w/QqD/x
91hJef1z01cSIsDZOxA0EUeD+9bBAHpoamr/e3IOOCVYCN6hy0dGa9g0QGbbpVyE
AeU4FGZLV9J8OOfvHVraDt5Wn3IXxYaMu22dSn1S6tVinwjXhJR2LAA+t4fGHAkt
i38LjOsIbopSQn/cNhzwC8UZcHLqnVsdDolHlHzsVFVfcpck2/4JFpUeP8QhWgrk
f9tY12QUf/oEaWm0/sUCHZNFxpIGeFA5FFXf0Z92clnzBuiuWoesBNvxqY/2DeZn
IDNXiv+Trxr6kFEjTpzPeuxbWrn4PJ7afQSAFcEmOCguk5riM+zJZNIKg0TxUHSS
tt6sfxmTP1DhgDKad5kT3MLyzOcx47Kbjf4dj6KmRnD+3DGwwN2F7X7R1GJylPSp
RLOzJ+Ouuy9UmBN6JMsT4BmR9+FJTVirADU926d/ZqCTtRV8Tnq/6HHmKmmr4CR0
THJ0PJqQjg==
=MOve
-----END PGP SIGNATURE-----
Merge tag 'for-6.15/io_uring-rx-zc-20250325' of git://git.kernel.dk/linux
Pull io_uring zero-copy receive support from Jens Axboe:
"This adds support for zero-copy receive with io_uring, enabling fast
bulk receive of data directly into application memory, rather than
needing to copy the data out of kernel memory.
While this version only supports host memory as that was the initial
target, other memory types are planned as well, with notably GPU
memory coming next.
This work depends on some networking components which were queued up
on the networking side, but have now landed in your tree.
This is the work of Pavel Begunkov and David Wei. From the v14 posting:
'We configure a page pool that a driver uses to fill a hw rx queue
to hand out user pages instead of kernel pages. Any data that ends
up hitting this hw rx queue will thus be dma'd into userspace
memory directly, without needing to be bounced through kernel
memory. 'Reading' data out of a socket instead becomes a
_notification_ mechanism, where the kernel tells userspace where
the data is. The overall approach is similar to the devmem TCP
proposal
This relies on hw header/data split, flow steering and RSS to
ensure packet headers remain in kernel memory and only desired
flows hit a hw rx queue configured for zero copy. Configuring this
is outside of the scope of this patchset.
We share netdev core infra with devmem TCP. The main difference is
that io_uring is used for the uAPI and the lifetime of all objects
are bound to an io_uring instance. Data is 'read' using a new
io_uring request type. When done, data is returned via a new shared
refill queue. A zero copy page pool refills a hw rx queue from this
refill queue directly. Of course, the lifetime of these data
buffers are managed by io_uring rather than the networking stack,
with different refcounting rules.
This patchset is the first step adding basic zero copy support. We
will extend this iteratively with new features e.g. dynamically
allocated zero copy areas, THP support, dmabuf support, improved
copy fallback, general optimisations and more'
In a local setup, I was able to saturate a 200G link with a single CPU
core, and at netdev conf 0x19 earlier this month, Jamal reported
188Gbit of bandwidth using a single core (no HT, including soft-irq).
Safe to say the efficiency is there, as bigger links would be needed
to find the per-core limit, and it's considerably more efficient and
faster than the existing devmem solution"
* tag 'for-6.15/io_uring-rx-zc-20250325' of git://git.kernel.dk/linux:
io_uring/zcrx: add selftest case for recvzc with read limit
io_uring/zcrx: add a read limit to recvzc requests
io_uring: add missing IORING_MAP_OFF_ZCRX_REGION in io_uring_mmap
io_uring: Rename KConfig to Kconfig
io_uring/zcrx: fix leaks on failed registration
io_uring/zcrx: recheck ifq on shutdown
io_uring/zcrx: add selftest
net: add documentation for io_uring zcrx
io_uring/zcrx: add copy fallback
io_uring/zcrx: throttle receive requests
io_uring/zcrx: set pp memory provider for an rx queue
io_uring/zcrx: add io_recvzc request
io_uring/zcrx: dma-map area for the device
io_uring/zcrx: implement zerocopy receive pp memory provider
io_uring/zcrx: grab a net device
io_uring/zcrx: add io_zcrx_area
io_uring/zcrx: add interface queue and refill queue
- Removal of support for IBM Cell Blades
- SMP support for microwatt platform
- Support for inline static calls on PPC32
- Enable pmu selftests for power11 platform
- Enable hardware trace macro (HTM) hcall support
- Support for limited address mode capability
- Changes to RMA size from 512 MB to 768 MB to handle fadump
- Misc fixes and cleanups
Thanks to: Abhishek Dubey, Amit Machhiwal, Andreas Schwab, Arnd Bergmann,
Athira Rajeev, Avnish Chouhan, Christophe Leroy, Disha Goel, Donet Tom, Gaurav
Batra, Gautam Menghani, Hari Bathini, Kajol Jain, Kees Cook, Mahesh Salgaonkar,
Michael Ellerman, Paul Mackerras, Ritesh Harjani (IBM), Sathvika Vasireddy,
Segher Boessenkool, Sourabh Jain, Vaibhav Jain, Venkat Rao Bagalkote.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEqX2DNAOgU8sBX3pRpnEsdPSHZJQFAmfjZnYACgkQpnEsdPSH
ZJTmKg/+NGW7wyFY9d8Iai9ncYY7GSzsMSDTaan7qg0QWOd5gjHbsdbava7TM/DW
8p9XsC+17kSeftRNUjtc52bSN8Ei2gBdsXIagQG1alfB2X2e6wkNauifK+dz3Su6
usMEZZTO5R/jFDotFXNM1nsUj+8dvjnPgOUrji/P8k7PT5295wpza0hz1fy5SrOA
hM5cliBP36UgFe5Efvgm4OUX2gQIhbc3stt9MVfymW/k0Mit5f41UIPuVGiTWowY
s0cUJGkhxUlGXT3VfOVKuZfn4u9KMha7UCl9afSceJzXOdnUIKIbskui1VEv6cD/
iSIxi839uErAobFHlsLYprgYFciYLII3xe2qNZCA/ZxeIMS/Mm6xokESeWLhBnfa
P7ke6l0z3GDtTvgI2eSeU9BdrVveF1NgbP9GYSKgT6gtw/kRRnxgHF8tzmLON5PT
KXpQlzz8VuSBRtF2jnLFU89+FFwSA1bRUhDrp89HyYFqw1B5g4N7kFFTUJWHOuKS
fwPGy+cveKehmCUBedeTRFqHvvqdwpD/WnPlQzCly3WxqdL8U/eTXYftMiAwuK28
ovLuSs3vRThKRQ8DnUa5oB0UGsjMpRV5LdvYkhw+x8mZKUR59oj4fx2ae4TtPakg
dbAYuPPkCORdaSga/nV6vQgsLprFpcGX3dq6E19+BVBAY5D+1PE=
=GFcj
-----END PGP SIGNATURE-----
Merge tag 'powerpc-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Madhavan Srinivasan:
- Remove support for IBM Cell Blades
- SMP support for microwatt platform
- Support for inline static calls on PPC32
- Enable pmu selftests for power11 platform
- Enable hardware trace macro (HTM) hcall support
- Support for limited address mode capability
- Changes to RMA size from 512 MB to 768 MB to handle fadump
- Misc fixes and cleanups
Thanks to Abhishek Dubey, Amit Machhiwal, Andreas Schwab, Arnd Bergmann,
Athira Rajeev, Avnish Chouhan, Christophe Leroy, Disha Goel, Donet Tom,
Gaurav Batra, Gautam Menghani, Hari Bathini, Kajol Jain, Kees Cook,
Mahesh Salgaonkar, Michael Ellerman, Paul Mackerras, Ritesh Harjani
(IBM), Sathvika Vasireddy, Segher Boessenkool, Sourabh Jain, Vaibhav
Jain, and Venkat Rao Bagalkote.
* tag 'powerpc-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (61 commits)
powerpc/kexec: fix physical address calculation in clear_utlb_entry()
crypto: powerpc: Mark ghashp8-ppc.o as an OBJECT_FILES_NON_STANDARD
powerpc: Fix 'intra_function_call not a direct call' warning
powerpc/perf: Fix ref-counting on the PMU 'vpa_pmu'
KVM: PPC: Enable CAP_SPAPR_TCE_VFIO on pSeries KVM guests
powerpc/prom_init: Fixup missing #size-cells on PowerBook6,7
powerpc/microwatt: Add SMP support
powerpc: Define config option for processors with broadcast TLBIE
powerpc/microwatt: Define an idle power-save function
powerpc/microwatt: Device-tree updates
powerpc/microwatt: Select COMMON_CLK in order to get the clock framework
net: toshiba: Remove reference to PPC_IBM_CELL_BLADE
net: spider_net: Remove powerpc Cell driver
cpufreq: ppc_cbe: Remove powerpc Cell driver
genirq: Remove IRQ_EDGE_EOI_HANDLER
docs: Remove reference to removed CBE_CPUFREQ_SPU_GOVERNOR
powerpc: Remove UDBG_RTAS_CONSOLE
powerpc/io: Use standard barrier macros in io.c
powerpc/io: Rename _insw_ns() etc.
powerpc/io: Use generic raw accessors
...
Core & protocols
----------------
- Continue Netlink conversions to per-namespace RTNL lock
(IPv4 routing, routing rules, routing next hops, ARP ioctls).
- Continue extending the use of netdev instance locks. As a driver
opt-in protect queue operations and (in due course) ethtool
operations with the instance lock and not RTNL lock.
- Support collecting TCP timestamps (data submitted, sent, acked)
in BPF, allowing for transparent (to the application) and lower
overhead tracking of TCP RPC performance.
- Tweak existing networking Rx zero-copy infra to support zero-copy
Rx via io_uring.
- Optimize MPTCP performance in single subflow mode by 29%.
- Enable GRO on packets which went thru XDP CPU redirect (were queued
for processing on a different CPU). Improving TCP stream performance
up to 2x.
- Improve performance of contended connect() by 200% by searching
for an available 4-tuple under RCU rather than a spin lock.
Bring an additional 229% improvement by tweaking hash distribution.
- Avoid unconditionally touching sk_tsflags on RX, improving
performance under UDP flood by as much as 10%.
- Avoid skb_clone() dance in ping_rcv() to improve performance under
ping flood.
- Avoid FIB lookup in netfilter if socket is available, 20% perf win.
- Rework network device creation (in-kernel) API to more clearly
identify network namespaces and their roles.
There are up to 4 namespace roles but we used to have just 2 netns
pointer arguments, interpreted differently based on context.
- Use sysfs_break_active_protection() instead of trylock to avoid
deadlocks between unregistering objects and sysfs access.
- Add a new sysctl and sockopt for capping max retransmit timeout
in TCP.
- Support masking port and DSCP in routing rule matches.
- Support dumping IPv4 multicast addresses with RTM_GETMULTICAST.
- Support specifying at what time packet should be sent on AF_XDP
sockets.
- Expose TCP ULP diagnostic info (for TLS and MPTCP) to non-admin users.
- Add Netlink YAML spec for WiFi (nl80211) and conntrack.
- Introduce EXPORT_IPV6_MOD() and EXPORT_IPV6_MOD_GPL() for symbols
which only need to be exported when IPv6 support is built as a module.
- Age FDB entries based on Rx not Tx traffic in VxLAN, similar
to normal bridging.
- Allow users to specify source port range for GENEVE tunnels.
- netconsole: allow attaching kernel release, CPU ID and task name
to messages as metadata
Driver API
----------
- Continue rework / fixing of Energy Efficient Ethernet (EEE) across
the SW layers. Delegate the responsibilities to phylink where possible.
Improve its handling in phylib.
- Support symmetric OR-XOR RSS hashing algorithm.
- Support tracking and preserving IRQ affinity by NAPI itself.
- Support loopback mode speed selection for interface selftests.
Device drivers
--------------
- Remove the IBM LCS driver for s390.
- Remove the sb1000 cable modem driver.
- Add support for SFP module access over SMBus.
- Add MCTP transport driver for MCTP-over-USB.
- Enable XDP metadata support in multiple drivers.
- Ethernet high-speed NICs:
- Broadcom (bnxt):
- add PCIe TLP Processing Hints (TPH) support for new AMD platforms
- support dumping RoCE queue state for debug
- opt into instance locking
- Intel (100G, ice, idpf):
- ice: rework MSI-X IRQ management and distribution
- ice: support for E830 devices
- iavf: add support for Rx timestamping
- iavf: opt into instance locking
- nVidia/Mellanox:
- mlx4: use page pool memory allocator for Rx
- mlx5: support for one PTP device per hardware clock
- mlx5: support for 200Gbps per-lane link modes
- mlx5: move IPSec policy check after decryption
- AMD/Solarflare:
- support FW flashing via devlink
- Cisco (enic):
- use page pool memory allocator for Rx
- enable 32, 64 byte CQEs
- get max rx/tx ring size from the device
- Meta (fbnic):
- support flow steering and RSS configuration
- report queue stats
- support TCP segmentation
- support IRQ coalescing
- support ring size configuration
- Marvell/Cavium:
- support AF_XDP
- Wangxun:
- support for PTP clock and timestamping
- Huawei (hibmcge):
- checksum offload
- add more statistics
- Ethernet virtual:
- VirtIO net:
- aggressively suppress Tx completions, improve perf by 96% with
1 CPU and 55% with 2 CPUs
- expose NAPI to IRQ mapping and persist NAPI settings
- Google (gve):
- support XDP in DQO RDA Queue Format
- opt into instance locking
- Microsoft vNIC:
- support BIG TCP
- Ethernet NICs consumer, and embedded:
- Synopsys (stmmac):
- cleanup Tx and Tx clock setting and other link-focused cleanups
- enable SGMII and 2500BASEX mode switching for Intel platforms
- support Sophgo SG2044
- Broadcom switches (b53):
- support for BCM53101
- TI:
- iep: add perout configuration support
- icssg: support XDP
- Cadence (macb):
- implement BQL
- Xilinx (axinet):
- support dynamic IRQ moderation and changing coalescing at runtime
- implement BQL
- report standard stats
- MediaTek:
- support phylink managed EEE
- Intel:
- igc: don't restart the interface on every XDP program change
- RealTek (r8169):
- support reading registers of internal PHYs directly
- increase max jumbo packet size on RTL8125/RTL8126
- Airoha:
- support for RISC-V NPU packet processing unit
- enable scatter-gather and support MTU up to 9kB
- Tehuti (tn40xx):
- support cards with TN4010 MAC and an Aquantia AQR105 PHY
- Ethernet PHYs:
- support for TJA1102S, TJA1121
- dp83tg720: add randomized polling intervals for link detection
- dp83822: support changing the transmit amplitude voltage
- support for LEDs on 88q2xxx
- CAN:
- canxl: support Remote Request Substitution bit access
- flexcan: add S32G2/S32G3 SoC
- WiFi:
- remove cooked monitor support
- strict mode for better AP testing
- basic EPCS support
- OMI RX bandwidth reduction support
- batman-adv: add support for jumbo frames
- WiFi drivers:
- RealTek (rtw88):
- support RTL8814AE and RTL8814AU
- RealTek (rtw89):
- switch using wiphy_lock and wiphy_work
- add BB context to manipulate two PHY as preparation of MLO
- improve BT-coexistence mechanism to play A2DP smoothly
- Intel (iwlwifi):
- add new iwlmld sub-driver for latest HW/FW combinations
- MediaTek (mt76):
- preparation for mt7996 Multi-Link Operation (MLO) support
- Qualcomm/Atheros (ath12k):
- continued work on MLO
- Silabs (wfx):
- Wake-on-WLAN support
- Bluetooth:
- add support for skb TX SND/COMPLETION timestamping
- hci_core: enable buffer flow control for SCO/eSCO
- coredump: log devcd dumps into the monitor
- Bluetooth drivers:
- intel: add support to configure TX power
- nxp: handle bootloader error during cmd5 and cmd7
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmfkLC8ACgkQMUZtbf5S
Irsb5g/+L7oKOf0ALbaV9kxFsoz8AymZfAW9i/27F07omGJGpks8oX6j6rQLgIRO
OQOFcp7XEdDh1+jh82gHVuPrw2/6lchLtW8ARtzdiQKFr5DRjrsbtua6GRc8iBqA
DIRCBFoV2HuMkF39Vr09HMa9AZAT7QR2RLsRGpSq8E8Z8xxKz0X7oujs10PFpMTE
IVKhTrVrk+NDot/IU2hzVpnpup+0ld+T2/ZaBklJGcU8uDffImsqNepHRyCG5UC3
xz74Ju23MAj24Gct+og0yFUooF+lUltKyVm0FYCDCY3bASTwgY01NR3kEH/0NQvM
cywLzd/ngHm/SMD2ggVAHkjZUieiIVHdaZ53dgjDeBOQoVP6p0dgUK7EumXX8Mx4
8ReR2UiGoYRPaq9c4o+IjG4K027MwVK2p+mF1a6MLa+20XcyMbev8FIRbbHtC/V4
z5/FsOAxcuICWkA1hU9bODrrGzIqemmdRgKG8sGuTJCt/kYGAn72/TCATGNSaCJ0
00n2jN1aepa7wtywHJ5MhVzxN9iQX7+geUHXz0BI+lK4e1Pmk+vjGksymb9ai2fk
eQAUV9ekub6q68/J16scD7XeOUM37bTLiMBQeIF8UtZBOJscKiS71zn9QP9Twwxv
P2pm01RDZUI+z5ZX3hc12Pm1vjRHaAh9S1JpAw/pTOVlQ+mAJEM=
=XY0S
-----END PGP SIGNATURE-----
Merge tag 'net-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core & protocols:
- Continue Netlink conversions to per-namespace RTNL lock
(IPv4 routing, routing rules, routing next hops, ARP ioctls)
- Continue extending the use of netdev instance locks. As a driver
opt-in protect queue operations and (in due course) ethtool
operations with the instance lock and not RTNL lock.
- Support collecting TCP timestamps (data submitted, sent, acked) in
BPF, allowing for transparent (to the application) and lower
overhead tracking of TCP RPC performance.
- Tweak existing networking Rx zero-copy infra to support zero-copy
Rx via io_uring.
- Optimize MPTCP performance in single subflow mode by 29%.
- Enable GRO on packets which went thru XDP CPU redirect (were queued
for processing on a different CPU). Improving TCP stream
performance up to 2x.
- Improve performance of contended connect() by 200% by searching for
an available 4-tuple under RCU rather than a spin lock. Bring an
additional 229% improvement by tweaking hash distribution.
- Avoid unconditionally touching sk_tsflags on RX, improving
performance under UDP flood by as much as 10%.
- Avoid skb_clone() dance in ping_rcv() to improve performance under
ping flood.
- Avoid FIB lookup in netfilter if socket is available, 20% perf win.
- Rework network device creation (in-kernel) API to more clearly
identify network namespaces and their roles. There are up to 4
namespace roles but we used to have just 2 netns pointer arguments,
interpreted differently based on context.
- Use sysfs_break_active_protection() instead of trylock to avoid
deadlocks between unregistering objects and sysfs access.
- Add a new sysctl and sockopt for capping max retransmit timeout in
TCP.
- Support masking port and DSCP in routing rule matches.
- Support dumping IPv4 multicast addresses with RTM_GETMULTICAST.
- Support specifying at what time packet should be sent on AF_XDP
sockets.
- Expose TCP ULP diagnostic info (for TLS and MPTCP) to non-admin
users.
- Add Netlink YAML spec for WiFi (nl80211) and conntrack.
- Introduce EXPORT_IPV6_MOD() and EXPORT_IPV6_MOD_GPL() for symbols
which only need to be exported when IPv6 support is built as a
module.
- Age FDB entries based on Rx not Tx traffic in VxLAN, similar to
normal bridging.
- Allow users to specify source port range for GENEVE tunnels.
- netconsole: allow attaching kernel release, CPU ID and task name to
messages as metadata
Driver API:
- Continue rework / fixing of Energy Efficient Ethernet (EEE) across
the SW layers. Delegate the responsibilities to phylink where
possible. Improve its handling in phylib.
- Support symmetric OR-XOR RSS hashing algorithm.
- Support tracking and preserving IRQ affinity by NAPI itself.
- Support loopback mode speed selection for interface selftests.
Device drivers:
- Remove the IBM LCS driver for s390
- Remove the sb1000 cable modem driver
- Add support for SFP module access over SMBus
- Add MCTP transport driver for MCTP-over-USB
- Enable XDP metadata support in multiple drivers
- Ethernet high-speed NICs:
- Broadcom (bnxt):
- add PCIe TLP Processing Hints (TPH) support for new AMD
platforms
- support dumping RoCE queue state for debug
- opt into instance locking
- Intel (100G, ice, idpf):
- ice: rework MSI-X IRQ management and distribution
- ice: support for E830 devices
- iavf: add support for Rx timestamping
- iavf: opt into instance locking
- nVidia/Mellanox:
- mlx4: use page pool memory allocator for Rx
- mlx5: support for one PTP device per hardware clock
- mlx5: support for 200Gbps per-lane link modes
- mlx5: move IPSec policy check after decryption
- AMD/Solarflare:
- support FW flashing via devlink
- Cisco (enic):
- use page pool memory allocator for Rx
- enable 32, 64 byte CQEs
- get max rx/tx ring size from the device
- Meta (fbnic):
- support flow steering and RSS configuration
- report queue stats
- support TCP segmentation
- support IRQ coalescing
- support ring size configuration
- Marvell/Cavium:
- support AF_XDP
- Wangxun:
- support for PTP clock and timestamping
- Huawei (hibmcge):
- checksum offload
- add more statistics
- Ethernet virtual:
- VirtIO net:
- aggressively suppress Tx completions, improve perf by 96%
with 1 CPU and 55% with 2 CPUs
- expose NAPI to IRQ mapping and persist NAPI settings
- Google (gve):
- support XDP in DQO RDA Queue Format
- opt into instance locking
- Microsoft vNIC:
- support BIG TCP
- Ethernet NICs consumer, and embedded:
- Synopsys (stmmac):
- cleanup Tx and Tx clock setting and other link-focused
cleanups
- enable SGMII and 2500BASEX mode switching for Intel platforms
- support Sophgo SG2044
- Broadcom switches (b53):
- support for BCM53101
- TI:
- iep: add perout configuration support
- icssg: support XDP
- Cadence (macb):
- implement BQL
- Xilinx (axinet):
- support dynamic IRQ moderation and changing coalescing at
runtime
- implement BQL
- report standard stats
- MediaTek:
- support phylink managed EEE
- Intel:
- igc: don't restart the interface on every XDP program change
- RealTek (r8169):
- support reading registers of internal PHYs directly
- increase max jumbo packet size on RTL8125/RTL8126
- Airoha:
- support for RISC-V NPU packet processing unit
- enable scatter-gather and support MTU up to 9kB
- Tehuti (tn40xx):
- support cards with TN4010 MAC and an Aquantia AQR105 PHY
- Ethernet PHYs:
- support for TJA1102S, TJA1121
- dp83tg720: add randomized polling intervals for link detection
- dp83822: support changing the transmit amplitude voltage
- support for LEDs on 88q2xxx
- CAN:
- canxl: support Remote Request Substitution bit access
- flexcan: add S32G2/S32G3 SoC
- WiFi:
- remove cooked monitor support
- strict mode for better AP testing
- basic EPCS support
- OMI RX bandwidth reduction support
- batman-adv: add support for jumbo frames
- WiFi drivers:
- RealTek (rtw88):
- support RTL8814AE and RTL8814AU
- RealTek (rtw89):
- switch using wiphy_lock and wiphy_work
- add BB context to manipulate two PHY as preparation of MLO
- improve BT-coexistence mechanism to play A2DP smoothly
- Intel (iwlwifi):
- add new iwlmld sub-driver for latest HW/FW combinations
- MediaTek (mt76):
- preparation for mt7996 Multi-Link Operation (MLO) support
- Qualcomm/Atheros (ath12k):
- continued work on MLO
- Silabs (wfx):
- Wake-on-WLAN support
- Bluetooth:
- add support for skb TX SND/COMPLETION timestamping
- hci_core: enable buffer flow control for SCO/eSCO
- coredump: log devcd dumps into the monitor
- Bluetooth drivers:
- intel: add support to configure TX power
- nxp: handle bootloader error during cmd5 and cmd7"
* tag 'net-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1681 commits)
unix: fix up for "apparmor: add fine grained af_unix mediation"
mctp: Fix incorrect tx flow invalidation condition in mctp-i2c
net: usb: asix: ax88772: Increase phy_name size
net: phy: Introduce PHY_ID_SIZE — minimum size for PHY ID string
net: libwx: fix Tx L4 checksum
net: libwx: fix Tx descriptor content for some tunnel packets
atm: Fix NULL pointer dereference
net: tn40xx: add pci-id of the aqr105-based Tehuti TN4010 cards
net: tn40xx: prepare tn40xx driver to find phy of the TN9510 card
net: tn40xx: create swnode for mdio and aqr105 phy and add to mdiobus
net: phy: aquantia: add essential functions to aqr105 driver
net: phy: aquantia: search for firmware-name in fwnode
net: phy: aquantia: add probe function to aqr105 for firmware loading
net: phy: Add swnode support to mdiobus_scan
gve: add XDP DROP and PASS support for DQ
gve: update XDP allocation path support RX buffer posting
gve: merge packet buffer size fields
gve: update GQ RX to use buf_size
gve: introduce config-based allocation for XDP
gve: remove xdp_xsk_done and xdp_xsk_wakeup statistics
...
core:
- Add support for skb TX SND/COMPLETION timestamping
- hci_core: Enable buffer flow control for SCO/eSCO
- coredump: Log devcd dumps into the monitor
drivers:
- btusb: Add 2 HWIDs for MT7922
- btusb: Fix regression in the initialization of fake Bluetooth controllers
- btusb: Add 14 USB device IDs for Qualcomm WCN785x
- btintel: Add support for Intel Scorpius Peak
- btintel: Add support to configure TX power
- btintel: Add DSBR support for ScP
- btintel_pcie: Add device id of Whale Peak
- btintel_pcie: Setup buffers for firmware traces
- btintel_pcie: Read hardware exception data
- btintel_pcie: Add support for device coredump
- btintel_pcie: Trigger device coredump on hardware exception
- btnxpuart: Support for controller wakeup gpio config
- btnxpuart: Add support to set BD address
- btnxpuart: Add correct bootloader error codes
- btnxpuart: Handle bootloader error during cmd5 and cmd7
- btnxpuart: Fix kernel panic during FW release
- qca: add WCN3950 support
- hci_qca: use the power sequencer for wcn6750
- btmtksdio: Prevent enabling interrupts after IRQ handler removal
-----BEGIN PGP SIGNATURE-----
iQJNBAABCAA3FiEE7E6oRXp8w05ovYr/9JCA4xAyCykFAmfjA3AZHGx1aXoudm9u
LmRlbnR6QGludGVsLmNvbQAKCRD0kIDjEDILKeWWD/0QZYBrNP9QSTyYTeNlYhCC
Lw/n7n3+LhxOqIu+tOGS7UplTqR3p4WQGu2g+XV9wSu4dvLplZGn/40XtiXJA0+r
VsXivQ4IR/Vjd8sNLLixfmuH4g4CbMSblQvECD3/5wFTeSH8T6/gyts/WV/LDYOJ
jp5kG6HCFHMv7RaiaHdZ0Pe4c1xJ/Ek8TnrW4G/kPBaLzm+lhjRkfzxCx8cO0k0H
mpoheUohLtUSgfLf49826t7rp3HDuX9db2hiGXQfKSrL2milwufKNaMFTmbuU2Uq
IyAwR1CEdSsKlcpbnVNF05r94sjf8NjuBD3YWxB9OfVXq9aymJYZdGh54XT5nF3f
ccD1WNnOoTsXDbEAnuVx+EYrJAI70e7vE4m8XbpxJcxhFoSIQCmtDoXUY199LiEG
El7DlwouNFESmOIj6gSK7ogUEhmcQA0AJxBVpdxnGBAcQMn1hrGSb75VGg7rul8K
HcBtf5j4gxZum2fzrufeY1+eBxfSRjNFIsnutAr53gEtidPZYlmiRyAuqQJMo2sO
0hlpC6gQGQJ5HiTR5Jbo9u+OC1oeHHXNN5IpNrd6J65n0pqCRldDsoXCerEJsOou
cqWzb9lfQCrjRnWRxUWOLukdYRgBH15wuORU+Z7/qVhqOr49RvARnzq7AjGSenTS
YKb2N+NGlqpxG+vQscFJUA==
=XCdn
-----END PGP SIGNATURE-----
Merge tag 'for-net-next-2025-03-25' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Luiz Augusto von Dentz says:
====================
bluetooth-next pull request for net-next:
core:
- Add support for skb TX SND/COMPLETION timestamping
- hci_core: Enable buffer flow control for SCO/eSCO
- coredump: Log devcd dumps into the monitor
drivers:
- btusb: Add 2 HWIDs for MT7922
- btusb: Fix regression in the initialization of fake Bluetooth controllers
- btusb: Add 14 USB device IDs for Qualcomm WCN785x
- btintel: Add support for Intel Scorpius Peak
- btintel: Add support to configure TX power
- btintel: Add DSBR support for ScP
- btintel_pcie: Add device id of Whale Peak
- btintel_pcie: Setup buffers for firmware traces
- btintel_pcie: Read hardware exception data
- btintel_pcie: Add support for device coredump
- btintel_pcie: Trigger device coredump on hardware exception
- btnxpuart: Support for controller wakeup gpio config
- btnxpuart: Add support to set BD address
- btnxpuart: Add correct bootloader error codes
- btnxpuart: Handle bootloader error during cmd5 and cmd7
- btnxpuart: Fix kernel panic during FW release
- qca: add WCN3950 support
- hci_qca: use the power sequencer for wcn6750
- btmtksdio: Prevent enabling interrupts after IRQ handler removal
* tag 'for-net-next-2025-03-25' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next: (53 commits)
Bluetooth: MGMT: Add LL Privacy Setting
Bluetooth: hci_event: Fix handling of HCI_EV_LE_DIRECT_ADV_REPORT
Bluetooth: btnxpuart: Fix kernel panic during FW release
Bluetooth: btnxpuart: Handle bootloader error during cmd5 and cmd7
Bluetooth: btnxpuart: Add correct bootloader error codes
t blameBluetooth: btintel: Fix leading white space
Bluetooth: btintel: Add support to configure TX power
Bluetooth: btmtksdio: Prevent enabling interrupts after IRQ handler removal
Bluetooth: btmtk: Remove the resetting step before downloading the fw
Bluetooth: SCO: add TX timestamping
Bluetooth: L2CAP: add TX timestamping
Bluetooth: ISO: add TX timestamping
Bluetooth: add support for skb TX SND/COMPLETION timestamping
net-timestamp: COMPLETION timestamp on packet tx completion
HCI: coredump: Log devcd dumps into the monitor
Bluetooth: HCI: Add definition of hci_rp_remote_name_req_cancel
Bluetooth: hci_vhci: Mark Sync Flow Control as supported
Bluetooth: hci_core: Enable buffer flow control for SCO/eSCO
Bluetooth: btintel_pci: Fix build warning
Bluetooth: btintel_pcie: Trigger device coredump on hardware exception
...
====================
Link: https://patch.msgid.link/20250325192925.2497890-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
icsk->icsk_ack.timeout can be replaced by icsk->csk_delack_timer.expires
This saves 8 bytes in TCP/DCCP sockets and helps for better cache locality.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250324203607.703850-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
icsk->icsk_timeout can be replaced by icsk->icsk_retransmit_timer.expires
This saves 8 bytes in TCP/DCCP sockets and helps for better cache locality.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250324203607.703850-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add SOF_TIMESTAMPING_TX_COMPLETION, for requesting a software timestamp
when hardware reports a packet completed.
Completion tstamp is useful for Bluetooth, as hardware timestamps do not
exist in the HCI specification except for ISO packets, and the hardware
has a queue where packets may wait. In this case the software SND
timestamp only reflects the kernel-side part of the total latency
(usually small) and queue length (usually 0 unless HW buffers
congested), whereas the completion report time is more informative of
the true latency.
It may also be useful in other cases where HW TX timestamps cannot be
obtained and user wants to estimate an upper bound to when the TX
probably happened.
Signed-off-by: Pauli Virtanen <pav@iki.fi>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEH7ZpcWbFyOOp6OJbrB3Eaf9PW7cFAmfg9oUACgkQrB3Eaf9P
W7d8eRAAhUdQztUoVjNfRBScD34EHEBq8ruFMgbHcXkkdhg223CUaTYKqz3dYE1E
04OCSwe7yxFFSs7CKR4dRMSzcx7uH/ISPprk45g+wwoIzBCYOWjLBzS5qmamtSYu
E3sI/QZaVUU7mFDg5n3sr5nkCNz+LtTL2dUbOgi5AOqY9h7LZRzLBEB1RCVOioVO
14vDRBJ5RGIvmNx3pd0sRhB4BySUw1GQVTYLSFaL/V2JZcRu3yGhKT8q637fdKcS
blvdf5q4CYyu5i3p9LHhsg33D8fYUotBVFtzw3QaM1G9MdyhMF4WUbeZCa2kkk7J
N7j2N0WDjP7crKQkPbm9ATQiruF2zvPcYJqgIHKuNhWY35KNIFARWxbP8KGJJFUI
EqUUKuT0lUi39hULmm5YEqCGWLNQldTBOOZL1fccRj2DZelx4uhQsnt5B2yimUCD
7QJGH0Vx6NJAutuCgTXQtmD0qO2XYi50XLmh4K+HZ1M3DkSGohcYSz+aj8IIY/wT
+yT55uRoONqDbh9kOLHab0WWE6R+XJyY5rKnz6XpzyTPh0OwoxrbVXJF045NYIpB
tSl9ykVpymvWNT56nsfXEOm5THZhdvL+uDS/QV4rtF9En4TEJU0dGbhESvriuu2j
g6EAQpYJqL9OwA5aKVZCGlHf7siji7Z/uc57DADZ8mtmDXVMqco=
=9RMg
-----END PGP SIGNATURE-----
Merge tag 'ipsec-next-2025-03-24' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2025-03-24
1) Prevent setting high order sequence number bits input in
non-ESN mode. From Leon Romanovsky.
2) Support PMTU handling in tunnel mode for packet offload.
From Leon Romanovsky.
3) Make xfrm_state_lookup_byaddr lockless.
From Florian Westphal.
4) Remove unnecessary NULL check in xfrm_lookup_with_ifid().
From Dan Carpenter.
* tag 'ipsec-next-2025-03-24' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
xfrm: Remove unnecessary NULL check in xfrm_lookup_with_ifid()
xfrm: state: make xfrm_state_lookup_byaddr lockless
xfrm: check for PMTU in tunnel mode for packet offload
xfrm: provide common xdo_dev_offload_ok callback implementation
xfrm: rely on XFRM offload
xfrm: simplify SA initialization routine
xfrm: delay initialization of offload path till its actually requested
xfrm: prevent high SEQ input in non-ESN mode
====================
Link: https://patch.msgid.link/20250324061855.4116819-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Support adjusting/reading RTO MIN for socket level by using set/getsockopt().
This new option has the same effect as TCP_BPF_RTO_MIN, which means it
doesn't affect RTAX_RTO_MIN usage (by using ip route...). Considering that
bpf option was implemented before this patch, so we need to use a standalone
new option for pure tcp set/getsockopt() use.
When the socket is created, its icsk_rto_min is set to the default
value that is controlled by sysctl_tcp_rto_min_us. Then if application
calls setsockopt() with TCP_RTO_MIN_US flag to pass a valid value, then
icsk_rto_min will be overridden in jiffies unit.
This patch adds WRITE_ONCE/READ_ONCE to avoid data-race around
icsk_rto_min.
Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250317120314.41404-2-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
- Significant changes throughout the tree to bring Python code up to
current standards and raise the minimum Python required to 3.9. Much of
this is preparatory to replacing the ancient Perl scripts/kernel-doc
horror with a slightly less horrifying Python implementation, expected
for 6.16.
- Update the minimum Sphinx required to 3.4.3, allowing us to remove a
bunch of older compatibility code.
- Rework and improve the generation of the ABI documentation.
(All of the above done by Mauro)
- Lots of translation updates. Alex Shi and Yanteng Si are taking on
responsibility for the Chinese translations going forward; that work will
still get to you via docs-next
- Try to standardize the format for indicating a developer's affiliation in
commit tags.
- Clarify the TAB's role in CoC enforcement actions.
- Try to spell out the rules for when a commit tag can name another
developer without their explicit permission.
Plus lots of other typo fixes and updates.
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmfccxwPHGNvcmJldEBs
d24ubmV0AAoJEBdDWhNsDH5YoYcH/jL/nS8YAiJ3awF5PH5tR3m5ddt9l+fKXWJx
PB3KcHtDORbWltTA+Tvo2aP1jxGY9wqsIIvl+nvjJyUcfd72g4HNfTDUDXwP3OFU
wTkaEAQp3n/hqnLXtJ2AzV3Ir5cIfEL2d7F6QsN1Gnof8iu2OuMk5iMeb0iexUX6
FYjJq+jknh30VdAp2hxHy8q17R7h7PySh5OsjeAYJJroLv60n3DwQgnzHjXC/FT2
Qq1UuEzlSpRoso2o2NwVTND6OVW081umo6YrioqD7ZC2G2fhRgLFJJtJGXDNcyUl
gQv9xLSaTD97V4zaWPm28ObNBpY/GnAd4hMjB17wAH5xUfVS5Aw=
=Gvdp
-----END PGP SIGNATURE-----
Merge tag 'docs-6.15' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
"It has been a reasonably busy cycle for docs...
- Significant changes throughout the tree to bring Python code up to
current standards and raise the minimum Python required to 3.9
Much of this is preparatory to replacing the ancient Perl
scripts/kernel-doc horror with a slightly less horrifying Python
implementation, expected for 6.16
- Update the minimum Sphinx required to 3.4.3, allowing us to remove
a bunch of older compatibility code
- Rework and improve the generation of the ABI documentation
(All of the above done by Mauro)
- Lots of translation updates. Alex Shi and Yanteng Si are taking on
responsibility for the Chinese translations going forward; that
work will still get to you via docs-next
- Try to standardize the format for indicating a developer's
affiliation in commit tags
- Clarify the TAB's role in CoC enforcement actions
- Try to spell out the rules for when a commit tag can name another
developer without their explicit permission
Plus lots of other typo fixes and updates"
* tag 'docs-6.15' of git://git.lwn.net/linux: (98 commits)
docs/zh_CN: fix spelling mistake
docs/Chinese: change the disclaimer words
docs/zh_CN: Add snp-tdx-threat-model index Chinese translation
docs: driver-api: firmware: clarify userspace requirements
docs: clarify rules wrt tagging other people
docs: Remove outdated highuid.rst documentation
Documentation: dma-buf: heaps: Add heap name definitions
docs/.../submit-checklist: Use Documentation/admin-guide/abi.rst for cross-ref of README
docs: Correct installation instruction
Documentation: kcsan: fix "Plain Accesses and Data Races" URL in kcsan.rst
Documentation/CoC: Spell out the TAB role in enforcement decisions
Documentation: ocxl.rst: Update consortium site
scripts: get_feat.pl: substitute s390x with s390
scripts/kernel-doc: drop dead code for Wcontents_before_sections
scripts/kernel-doc: don't add not needed new lines
docs: driver-api/infiniband.rst: fix Kerneldoc markup
drivers: firewire: firewire-cdev.h: fix identation on a kernel-doc markup
drivers: media: intel-ipu3.h: fix identation on a kernel-doc markup
include/asm-generic/io.h: fix kerneldoc markup
Docs/arch/arm64: Fix spelling in amu.rst
...
The context indicates that 'than' is the correct word instead of 'then',
as a comparison is being performed.
Given that 'then' is also a valid English word, checkpatch.pl wouldn't
have picked up on this spelling error.
This typo was caught by AI during code review.
Suggested-by: Wentao Guan <guanwentao@uniontech.com>
Signed-off-by: WangYuli <wangyuli@uniontech.com>
Reviewed-by: Yanteng Si <si.yanteng@linux.dev>
Link: https://patch.msgid.link/A43BEA49ED5CC6E5+20250318074656.644391-1-wangyuli@uniontech.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This updates the old path and fixes the description of unavailable options.
Signed-off-by: Yui Washizu <yui.washidu@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20250318061251.775191-1-yui.washidu@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As a followup of my presentation in Zagreb for netdev 0x19:
icsk_clean_acked is only used by TCP when/if CONFIG_TLS_DEVICE
is enabled from tcp_ack().
Rename it to tcp_clean_acked, move it to tcp_sock structure
in the tcp_sock_read_rx for better cache locality in TCP
fast path.
Define this field only when CONFIG_TLS_DEVICE is enabled
saving 8 bytes on configs not using it.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250317085313.2023214-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Display recovery event of PPCNT recovery counters group. Counts (per
link) the number of total successful recovery events of any recovery
types during port reset cycle.
Signed-off-by: Yael Chemla <ychemla@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/1742112876-2890-5-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add documentation explaining the kernel release auto-population feature
in netconsole.
This feature appends kernel version information to the userdata
dictionary in every message sent when enabled via the `release_enabled`
file in the configfs hierarchy.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250314-netcons_release-v1-6-07979c4b86af@debian.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Similarly to net.mptcp.available_schedulers, this patch adds a new one
net.mptcp.available_path_managers to list the available path managers.
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250313-net-next-mptcp-pm-ops-intro-v1-11-f4e4a88efc50@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Similar to net.mptcp.scheduler, a new net.mptcp.path_manager sysctl knob
is added to determine which path manager will be used by each newly
created MPTCP socket by setting the name of it.
Dealing with an explicit name is easier than with a number, especially
when more PMs will be introduced.
This sysctl knob makes the old one "pm_type" deprecated.
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250313-net-next-mptcp-pm-ops-intro-v1-8-f4e4a88efc50@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Currently, netconsole has two methods of configuration - module
parameter and configfs. The former interface allows for netconsole
activation earlier during boot (by specifying the module parameter on
the kernel command line), so it is preferred for debugging issues which
arise before userspace is up/the configfs interface can be used. The
module parameter syntax requires specifying the egress interface name.
This requirement makes it hard to use for a couple reasons:
- The egress interface name can be hard or impossible to predict. For
example, installing a new network card in a system can change the
interface names assigned by the kernel.
- When constructing the module parameter, one may have trouble
determining the original (kernel-assigned) name of the interface
(which is the name that should be given to netconsole) if some stable
interface naming scheme is in effect. A human can usually look at
kernel logs to determine the original name, but this is very painful
if automation is constructing the parameter.
For these reasons, allow selection of the egress interface via MAC
address when configuring netconsole using the module parameter. Update
the netconsole documentation with an example of the new syntax.
Selection of egress interface by MAC address via configfs is far less
interesting (since when this interface can be used, one should be able
to easily convert between MAC address and interface name), so it is left
unimplemented.
Signed-off-by: Uday Shankar <ushankar@purestorage.com>
Reviewed-by: Breno Leitao <leitao@debian.org>
Tested-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250312-netconsole-v6-2-3437933e79b8@purestorage.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
- bump version strings, by Simon Wunderlich
- drop batadv_priv_debug_log struct, by Sven Eckelmann
- adopt netdev_hold() / netdev_put(), by Eric Dumazet
- add support for jumbo frames, by Sven Eckelmann
- use consistent name for mesh interface, by Sven Eckelmann
- cleanup B.A.T.M.A.N. IV OGM aggregation handling,
by Sven Eckelmann (4 patches)
- add missing newlines for log macros, by Sven Eckelmann
-----BEGIN PGP SIGNATURE-----
iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAmfTCE0WHHN3QHNpbW9u
d3VuZGVybGljaC5kZQAKCRChK+OYQpKeoea+EACQ3fJp7kf6iWxhJLiwon3Aoo4u
dg84xX5nc02KHT/zUC/ADSSN3FDdc3y8jgjpLfZv5TkWmRUqMNP9449PaoXpiIsW
oI6HKrVaEKIEPpMAaBrv0dPtG5fBZPku4tZBLP7ekU/2N0Ri3eKTCph7IznO3Ma5
cM4FN6H9Nxwnht2qN7Gx3nWoBn9LeaSyKfAFLR+hI3QxTh9PYLV26rGCLrUhbbvT
St8TtgdeULSH/50rhMqC4wpT8verk+BMJdShNPLyse4QSRky6DD6YOj7zS/c7akP
z8S77gvp0mcgzb/t49K4AFjUFEvK9l0rvPDx6DMrfJwQAZUPjFY93y1XI5QUOPRZ
+kVj0nr2XRc7c1/wDNfn4131T9Twi41+TX6BI8Fan8WpWKVTo2a/O5VWk5pHKjy+
a8aZqL41444wWyFhyCL+Io1y1mDc9TDa7644+fJjR5H4h47M9rGnLVEX3F9mVtBl
LIOwpHpHXg9z3JDgpCwLWykUzW4lEuF1hQFUD3DPIbVKENjvibiiB+qPNZJG1icL
n9Pt6843vF8gcw3aGbp0VGsESILiKr7EN7CMkhfcO1CaqAErJG4DinubH6DLQIjO
uW5Ss2751BfqWPOakWVzvvWafnleVRdw4rysbil4ko/xONg0qdfTMWe5fS42bkbk
CMmGPNeSiqVTXp5bIg==
=hmiI
-----END PGP SIGNATURE-----
Merge tag 'batadv-next-pullrequest-20250313' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
This feature/cleanup patchset includes the following patches:
- bump version strings, by Simon Wunderlich
- drop batadv_priv_debug_log struct, by Sven Eckelmann
- adopt netdev_hold() / netdev_put(), by Eric Dumazet
- add support for jumbo frames, by Sven Eckelmann
- use consistent name for mesh interface, by Sven Eckelmann
- cleanup B.A.T.M.A.N. IV OGM aggregation handling,
by Sven Eckelmann (4 patches)
- add missing newlines for log macros, by Sven Eckelmann
* tag 'batadv-next-pullrequest-20250313' of git://git.open-mesh.org/linux-merge:
batman-adv: add missing newlines for log macros
batman-adv: Limit aggregation size to outgoing MTU
batman-adv: Use actual packet count for aggregated packets
batman-adv: Switch to bitmap helper for aggregation handling
batman-adv: Limit number of aggregated packets directly
batman-adv: Use consistent name for mesh interface
batman-adv: Add support for jumbo frames
batman-adv: adopt netdev_hold() / netdev_put()
batman-adv: Drop batadv_priv_debug_log struct
batman-adv: Start new development cycle
====================
Link: https://patch.msgid.link/20250313164519.72808-1-sw@simonwunderlich.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add set/show support for the ENABLE_ROCE NVM parameter to
enable/disable RoCE for a PF.
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Co-developed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250310183129.3154117-4-michael.chan@broadcom.com
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
There is a couple of places from which we can arrive to ndo_setup_tc
with TC_SETUP_BLOCK/TC_SETUP_FT:
- netlink
- netlink notifier
- netdev notifier
Locking netdev too deep in this call chain seems to be problematic
(especially assuming some/all of the call_netdevice_notifiers
NETDEV_UNREGISTER) might soon be running with the instance lock).
Revert to lockless ndo_setup_tc for TC_SETUP_BLOCK/TC_SETUP_FT. NFT
framework already takes care of most of the locking. Document
the assumptions.
ndo_setup_tc TC_SETUP_BLOCK
nft_block_offload_cmd
nft_chain_offload_cmd
nft_flow_block_chain
nft_flow_offload_chain
nft_flow_rule_offload_abort
nft_flow_rule_offload_commit
nft_flow_rule_offload_commit
nf_tables_commit
nfnetlink_rcv_batch
nfnetlink_rcv_skb_batch
nfnetlink_rcv
nft_offload_netdev_event
NETDEV_UNREGISTER notifier
ndo_setup_tc TC_SETUP_FT
nf_flow_table_offload_cmd
nf_flow_table_offload_setup
nft_unregister_flowtable_hook
nft_register_flowtable_net_hooks
nft_flowtable_update
nf_tables_newflowtable
nfnetlink_rcv_batch (.call NFNL_CB_BATCH)
nft_flowtable_update
nf_tables_newflowtable
nft_flowtable_event
nf_tables_flowtable_event
NETDEV_UNREGISTER notifier
__nft_unregister_flowtable_net_hooks
nft_unregister_flowtable_net_hooks
nf_tables_commit
nfnetlink_rcv_batch (.call NFNL_CB_BATCH)
__nf_tables_abort
nf_tables_abort
nfnetlink_rcv_batch
__nft_release_hook
__nft_release_hooks
nf_tables_pre_exit_net -> module unload
nft_rcv_nl_event
netlink_register_notifier (oh boy)
nft_register_flowtable_net_hooks
nft_flowtable_update
nf_tables_newflowtable
nf_tables_newflowtable
Fixes: c4f0f30b42 ("net: hold netdev instance lock during nft ndo_setup_tc")
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Reported-by: syzbot+0afb4bcf91e5a1afdcad@syzkaller.appspotmail.com
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250308044726.1193222-1-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Also clarify ndo_get_stats (that read and write paths can run
concurrently) and mention only RCU.
Cc: Saeed Mahameed <saeed@kernel.org>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250305163732.2766420-14-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add documentation for the netconsole task name feature in
Documentation/networking/netconsole.rst. This explains how to enable
task name via configfs and demonstrates the output format.
The documentation includes:
- How to enable/disable the feature via taskname_enabled
- The format of the task name in the output
- An example showing the task name appearing in messages
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The name 'netns_local' is confusing. A following commit will export it via
netlink, so let's use a more explicit name.
Reported-by: Eric Dumazet <edumazet@google.com>
Suggested-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
A common task for most drivers is to remember the user-set CPU affinity
to its IRQs. On each netdev reset, the driver should re-assign the user's
settings to the IRQs. Unify this task across all drivers by moving the CPU
affinity to napi->config.
However, to move the CPU affinity to core, we also need to move aRFS
rmap management since aRFS uses its own IRQ notifiers.
For the aRFS, add a new netdev flag "rx_cpu_rmap_auto". Drivers supporting
aRFS should set the flag via netif_enable_cpu_rmap() and core will allocate
and manage the aRFS rmaps. Freeing the rmap is also done by core when the
netdev is freed. For better IRQ affinity management, move the IRQ rmap
notifier inside the napi_struct and add new notify.notify and
notify.release functions: netif_irq_cpu_rmap_notify() and
netif_napi_affinity_release().
Now we have the aRFS rmap management in core, add CPU affinity mask to
napi_config. To delegate the CPU affinity management to the core, drivers
must:
1 - set the new netdev flag "irq_affinity_auto":
netif_enable_irq_affinity(netdev)
2 - create the napi with persistent config:
netif_napi_add_config()
3 - bind an IRQ to the napi instance: netif_napi_set_irq()
the core will then make sure to use re-assign affinity to the napi's
IRQ.
The default IRQ mask is set to one cpu starting from the closest NUMA.
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Link: https://patch.msgid.link/20250224232228.990783-2-ahmed.zaki@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yong-Hao Zou mentioned that linux was not strict as other OS in 3WHS,
for flows using TCP TS option (RFC 7323)
As hinted by an old comment in tcp_check_req(),
we can check the TSEcr value in the incoming packet corresponds
to one of the SYNACK TSval values we have sent.
In this patch, I record the oldest and most recent values
that SYNACK packets have used.
Send a challenge ACK if we receive a TSEcr outside
of this range, and increase a new SNMP counter.
nstat -az | grep TSEcrRejected
TcpExtTSEcrRejected 0 0.0
Due to TCP fastopen implementation, do not apply yet these checks
for fastopen flows.
v2: No longer use req->num_timeout, but treq->snt_tsval_first
to detect when first SYNACK is prepared. This means
we make sure to not send an initial zero TSval.
Make sure MPTCP and TCP selftests are passing.
Change MIB name to TcpExtTSEcrRejected
v1: https://lore.kernel.org/netdev/CADVnQykD8i4ArpSZaPKaoNxLJ2if2ts9m4As+=Jvdkrgx1qMHw@mail.gmail.com/T/
Reported-by: Yong-Hao Zou <yonghaoz1994@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250225171048.3105061-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This driver can no longer be built since support for IBM Cell Blades was
removed, in particular PPC_IBM_CELL_BLADE.
Remove the driver and the documentation.
Remove the MAINTAINERS entry, and add Ishizaki and Geoff to CREDITS.
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20241218105523.416573-24-mpe@ellerman.id.au
The way how the virtual interface is called inside the batman-adv source
code is not consistent. The genl headers call it meshif and the rest of the
code calls is (mostly) softif.
The genl definitions cannot be touched because they are part of the UAPI.
But the rest of the batman-adv code can be touched to have a consistent
name again.
The bulk of the renaming was done using
sed -i -e 's/soft\(-\|\_\| \|\)i\([nf]\)/mesh\1i\2/g' \
-e 's/SOFT\(-\|\_\| \|\)I\([NF]\)/MESH\1I\2/g'
and then it was adjusted slightly when proofreading the changes.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQQ6NaUOruQGUkvPdG4raS+Z+3y5EwUCZ7ffOQAKCRAraS+Z+3y5
EzVHAP9h/QkeYoOZW9gul08I8vFiZsFe/lbOSLJWxeVfxb9JhgD/cMqby3qAxQK6
lsdNQ9jYG2232Wym89ag7fvTBK15Wg4=
=gkN2
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Martin KaFai Lau says:
====================
pull-request: bpf-next 2025-02-20
We've added 19 non-merge commits during the last 8 day(s) which contain
a total of 35 files changed, 1126 insertions(+), 53 deletions(-).
The main changes are:
1) Add TCP_RTO_MAX_MS support to bpf_set/getsockopt, from Jason Xing
2) Add network TX timestamping support to BPF sock_ops, from Jason Xing
3) Add TX metadata Launch Time support, from Song Yoong Siang
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
igc: Add launch time support to XDP ZC
igc: Refactor empty frame insertion for launch time support
net: stmmac: Add launch time support to XDP ZC
selftests/bpf: Add launch time request to xdp_hw_metadata
xsk: Add launch time hardware offload support to XDP Tx metadata
selftests/bpf: Add simple bpf tests in the tx path for timestamping feature
bpf: Support selective sampling for bpf timestamping
bpf: Add BPF_SOCK_OPS_TSTAMP_SENDMSG_CB callback
bpf: Add BPF_SOCK_OPS_TSTAMP_ACK_CB callback
bpf: Add BPF_SOCK_OPS_TSTAMP_SND_HW_CB callback
bpf: Add BPF_SOCK_OPS_TSTAMP_SND_SW_CB callback
bpf: Add BPF_SOCK_OPS_TSTAMP_SCHED_CB callback
net-timestamp: Prepare for isolating two modes of SO_TIMESTAMPING
bpf: Disable unsafe helpers in TX timestamping callbacks
bpf: Prevent unsafe access to the sock fields in the BPF timestamping callback
bpf: Prepare the sock_ops ctx and call bpf prog for TX timestamping
bpf: Add networking timestamping support to bpf_get/setsockopt()
selftests/bpf: Add rto max for bpf_setsockopt test
bpf: Support TCP_RTO_MAX_MS for bpf_setsockopt
====================
Link: https://patch.msgid.link/20250221022104.386462-1-martin.lau@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Almost all drivers except bond and nsim had same check if device
can perform XFRM offload on that specific packet. The check was that
packet doesn't have IPv4 options and IPv6 extensions.
In NIC drivers, the IPv4 HELEN comparison was slightly different, but
the intent was to check for the same conditions. So let's chose more
strict variant as a common base.
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
- Fix a soft-lockup in BPF arena_map_free on 64k page size
kernels (Alan Maguire)
- Fix a missing allocation failure check in BPF verifier's
acquire_lock_state (Kumar Kartikeya Dwivedi)
- Fix a NULL-pointer dereference in trace_kfree_skb by adding
kfree_skb to the raw_tp_null_args set (Kuniyuki Iwashima)
- Fix a deadlock when freeing BPF cgroup storage (Abel Wu)
- Fix a syzbot-reported deadlock when holding BPF map's
freeze_mutex (Andrii Nakryiko)
- Fix a use-after-free issue in bpf_test_init when
eth_skb_pkt_type is accessing skb data not containing an
Ethernet header (Shigeru Yoshida)
- Fix skipping non-existing keys in generic_map_lookup_batch
(Yan Zhai)
- Several BPF sockmap fixes to address incorrect TCP copied_seq
calculations, which prevented correct data reads from recv(2)
in user space (Jiayuan Chen)
- Two fixes for BPF map lookup nullness elision (Daniel Xu)
- Fix a NULL-pointer dereference from vmlinux BTF lookup in
bpf_sk_storage_tracing_allowed (Jared Kangas)
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-----BEGIN PGP SIGNATURE-----
iIsEABYKADMWIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZ7evlRUcZGFuaWVsQGlv
Z2VhcmJveC5uZXQACgkQ2yufC7HISIPwHgD/dTvM00Ck4Q73fPivyT7tcqxeXJlD
D6ggzWl/SG9LAbwA/2/cSgAM9Jm1g7ddvn/S9QaDYOs5GmFl6urq6krs+tYD
=FCs9
-----END PGP SIGNATURE-----
Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Pull BPF fixes from Daniel Borkmann:
- Fix a soft-lockup in BPF arena_map_free on 64k page size kernels
(Alan Maguire)
- Fix a missing allocation failure check in BPF verifier's
acquire_lock_state (Kumar Kartikeya Dwivedi)
- Fix a NULL-pointer dereference in trace_kfree_skb by adding kfree_skb
to the raw_tp_null_args set (Kuniyuki Iwashima)
- Fix a deadlock when freeing BPF cgroup storage (Abel Wu)
- Fix a syzbot-reported deadlock when holding BPF map's freeze_mutex
(Andrii Nakryiko)
- Fix a use-after-free issue in bpf_test_init when eth_skb_pkt_type is
accessing skb data not containing an Ethernet header (Shigeru
Yoshida)
- Fix skipping non-existing keys in generic_map_lookup_batch (Yan Zhai)
- Several BPF sockmap fixes to address incorrect TCP copied_seq
calculations, which prevented correct data reads from recv(2) in user
space (Jiayuan Chen)
- Two fixes for BPF map lookup nullness elision (Daniel Xu)
- Fix a NULL-pointer dereference from vmlinux BTF lookup in
bpf_sk_storage_tracing_allowed (Jared Kangas)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests: bpf: test batch lookup on array of maps with holes
bpf: skip non exist keys in generic_map_lookup_batch
bpf: Handle allocation failure in acquire_lock_state
bpf: verifier: Disambiguate get_constant_map_key() errors
bpf: selftests: Test constant key extraction on irrelevant maps
bpf: verifier: Do not extract constant map keys for irrelevant maps
bpf: Fix softlockup in arena_map_free on 64k page kernel
net: Add rx_skb of kfree_skb to raw_tp_null_args[].
bpf: Fix deadlock when freeing cgroup storage
selftests/bpf: Add strparser test for bpf
selftests/bpf: Fix invalid flag of recv()
bpf: Disable non stream socket for strparser
bpf: Fix wrong copied_seq calculation
strparser: Add read_sock callback
bpf: avoid holding freeze_mutex during mmap operation
bpf: unify VM_WRITE vs VM_MAYWRITE use in BPF map mmaping logic
selftests/bpf: Adjust data size to have ETH_HLEN
bpf, test_run: Fix use-after-free issue in eth_skb_pkt_type()
bpf: Remove unnecessary BTF lookups in bpf_sk_storage_tracing_allowed
Extend the XDP Tx metadata framework so that user can requests launch time
hardware offload, where the Ethernet device will schedule the packet for
transmission at a pre-determined time called launch time. The value of
launch time is communicated from user space to Ethernet driver via
launch_time field of struct xsk_tx_metadata.
Suggested-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250216093430.957880-2-yoong.siang.song@intel.com
Extend the J1939 stack documentation to include information about how
buffer sizes influence stack behavior, detailing handling of simple
sessions, TP, and ETP transfers.
Additionally, describe various setsockopt(2) options, including their
usage and potential error values that can be returned by the stack.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20241013181715.3488980-1-o.rempel@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Add documentation for io_uring zero copy Rx that explains requirements
and the user API.
Signed-off-by: David Wei <dw@davidwei.uk>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20250215000947.789731-11-dw@davidwei.uk
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Document the existence of persistent per-NAPI configuration space and
the API that drivers can opt into.
Update stale documentation which suggested that NAPI IDs cannot be
queried from userspace.
Signed-off-by: Joe Damato <jdamato@fastly.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://patch.msgid.link/20250213191535.38792-1-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ICM is a portion of the host's memory assigned to a function by the OS
through requests made by the NIC's firmware.
PF ICM consumption can be accessed directly, while VF/SF ICM consumption
can be accessed through their representors in switchdev mode.
The value is exposed to the user in granularity of 4KB through the vnic
health reporter as follows:
$ devlink health diagnose pci/0000:08:00.0 reporter vnic
vNIC env counters:
total_error_queues: 0 send_queue_priority_update_flow: 0
comp_eq_overrun: 0 async_eq_overrun: 0 cq_overrun: 0
invalid_command: 0 quota_exceeded_command: 0
nic_receive_steering_discard: 0 icm_consumption: 1032
Signed-off-by: Akiva Goldberger <agoldberger@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250209101716.112774-11-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Update the information in sfc's devlink documentation including
support for firmware update with devlink flash.
Also update the help text for CONFIG_SFC_MTD, as it is no longer
strictly required for firmware updates.
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://patch.msgid.link/3476b0ef04a0944f03e0b771ec8ed1a9c70db4dc.1739186253.git.ecree.xilinx@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Previous patch added a TCP_RTO_MAX_MS socket option
to tune a TCP socket max RTO value.
Many setups prefer to change a per netns sysctl.
This patch adds /proc/sys/net/ipv4/tcp_rto_max_ms
Its initial value is 120000 (120 seconds).
Keep in mind that a decrease of tcp_rto_max_ms
means shorter overall timeouts, unless tcp_retries2
sysctl is increased.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Currently, TCP stack uses a constant (120 seconds)
to limit the RTO value exponential growth.
Some applications want to set a lower value.
Add TCP_RTO_MAX_MS socket option to set a value (in ms)
between 1 and 120 seconds.
It is discouraged to change the socket rto max on a live
socket, as it might lead to unexpected disconnects.
Following patch is adding a netns sysctl to control the
default value at socket creation time.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Update the netconsole documentation to explain the new feature that
allows automatic population of the CPU number.
The key changes include introducing a new section titled "CPU number
auto population in userdata", explaining how to enable the CPU number
auto-population feature by writing to the "populate_cpu_nr" file in the
netconsole configfs hierarchy.
This documentation update ensures users are aware of the new CPU number
auto-population functionality and how to leverage it for better
demultiplexing and visibility of parallel netconsole output.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the current struct sockaddr_can tp is member of can_addr. tp is not
member of struct sockaddr_can.
Signed-off-by: Reyders Morales <reyders1@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://patch.msgid.link/20250203224720.42530-1-reyders1@gmail.com
Fixes: 67711e0425 ("Documentation: networking: document ISO 15765-2")
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Use generic devlink PF MSI-X parameter to allow user to change MSI-X
range.
Add notes about this parameters into ice devlink documentation.
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
All other sysctl entries mention it, and it is a per-namespace sysctl.
So mention it as well.
Fixes: 27069e7cb3 ("mptcp: disable active MPTCP in case of blackhole")
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Added a new read_sock handler, allowing users to customize read operations
instead of relying on the native socket's read_sock.
Signed-off-by: Jiayuan Chen <mrpre@163.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://patch.msgid.link/20250122100917.49845-2-mrpre@163.com
Fix a couple of typos/spelling mistakes in the documentation.
Signed-off-by: Khaled Elnaggar <khaledelnaggarlinux@gmail.com>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://patch.msgid.link/20250123082521.59997-1-khaledelnaggarlinux@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
- bump version strings, by Simon Wunderlich
- Reorder includes for distributed-arp-table.c, by Sven Eckelmann
- Fix translation table change handling, by Remi Pommarel (2 patches)
- Map VID 0 to untagged TT VLAN, by Sven Eckelmann
- Update MAINTAINERS/mailmap e-mail addresses, by the respective authors
(4 patches)
- netlink: reduce duplicate code by returning interfaces,
by Linus Lüssing
-----BEGIN PGP SIGNATURE-----
iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAmeKTuUWHHN3QHNpbW9u
d3VuZGVybGljaC5kZQAKCRChK+OYQpKeoREQEACKKLbl6IGJ9xXTlaRhtUwr55KD
JmvJT+y25IOmJBlRpZ6zHEjDxlsF/W+Se63k4tWrQSEscqBLXL9K0nY171cU+sJ0
Nx6AF2F6GobfVuzJGoJWjOZ3eCk4+stiCsBPz0ZJmu5cN24nZRkQZWpOJCSl1KtY
huRG+bum58hw4kvaK5IU+9iLxtN5ec3STVZjEe0GKVqB6vo19+XGnPJqNSlVDDiQ
iJSl+6VjaYYG2zclyQDefo4CuILVTAxGi5vaVFhoLVkUj2QPo41BEpkfQxZ8P1HR
bFXfnOITuhCeDgzF2uwocaeW0P06kyKa+cyH4PO5BvQOp5up667aDdXtgKPBvKzV
5Fn+WN0A/SsM+tLdGkVdbU1ceNgsNYLyROJgorDNkiGUmRvrFOmhXVzhJMlcppkR
O+qh7K7Kpoyuz3a2ZiktvG5sSG0aFhQ3GRJyZlqKGV5bLzJ8e2Z+0aCFKtVw0RlZ
3lg2/iEb5//+aJKZToHH97U3PG2ahO76WPSiaTPVVfvGYS2gE7fihHWtI9Uf9FzB
AHFhiOP5oPet5jSjH0jrybOEHcSvuxsOi+vIC0hCiPV4O/fN7tjeEj4XM3uPF5LQ
FI26vvS5izhZxOWVIUPnz5pSWzTmfiRPNbckcESHt8CHjlSTUw3TPlC0jua0iXMF
shhpONEPSjZvExybLg==
=cv/9
-----END PGP SIGNATURE-----
Merge tag 'batadv-next-pullrequest-20250117' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
This cleanup patchset includes the following patches:
- bump version strings, by Simon Wunderlich
- Reorder includes for distributed-arp-table.c, by Sven Eckelmann
- Fix translation table change handling, by Remi Pommarel (2 patches)
- Map VID 0 to untagged TT VLAN, by Sven Eckelmann
- Update MAINTAINERS/mailmap e-mail addresses, by the respective authors
(4 patches)
- netlink: reduce duplicate code by returning interfaces,
by Linus Lüssing
* tag 'batadv-next-pullrequest-20250117' of git://git.open-mesh.org/linux-merge:
batman-adv: netlink: reduce duplicate code by returning interfaces
MAINTAINERS: mailmap: add entries for Antonio Quartulli
mailmap: add entries for Sven Eckelmann
mailmap: add entries for Simon Wunderlich
MAINTAINERS: update email address of Marek Linder
batman-adv: Map VID 0 to untagged TT VLAN
batman-adv: Don't keep redundant TT change events
batman-adv: Remove atomic usage for tt.local_changes
batman-adv: Reorder includes for distributed-arp-table.c
batman-adv: Start new development cycle
====================
Link: https://patch.msgid.link/20250117123910.219278-1-sw@simonwunderlich.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The number of SYN + MPC retransmissions before falling back to TCP was
fixed to 2. This is certainly a good default value, but having a fixed
number can be a problem in some environments.
The current behaviour means that if all packets are dropped, there will
be:
- The initial SYN + MPC
- 2 retransmissions with MPC
- The next ones will be without MPTCP.
So typically ~3 seconds before falling back to TCP. In some networks
where some temporally blackholes are unfortunately frequent, or when a
client tries to initiate connections while the network is not ready yet,
this can cause new connections not to have MPTCP connections.
In such environments, it is now possible to increase the number of SYN
retransmissions with MPTCP options to make sure MPTCP is used.
Interesting values are:
- 0: the first retransmission will be done without MPTCP options: quite
aggressive, but also a higher risk of detecting false-positive
MPTCP blackholes.
- >= 128: all SYN retransmissions will keep the MPTCP options: back to
the < 6.12 behaviour.
The default behaviour is not changed here.
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250117-net-next-mptcp-syn_retrans_before_tcp_fallback-v1-1-ab4b187099b0@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For packets with two-step timestamp requests, the hardware timestamp
comes back to the driver through a confirmation mechanism of sorts,
which allows the driver to confidently bump the successful "pkts"
counter.
For one-step PTP, the NIC is supposed to autonomously insert its
hardware TX timestamp in the packet headers while simultaneously
transmitting it. There may be a confirmation that this was done
successfully, or there may not.
None of the current drivers which implement ethtool_ops :: get_ts_stats()
also support HWTSTAMP_TX_ONESTEP_SYNC or HWTSTAMP_TX_ONESTEP_SYNC, so it
is a bit unclear which model to follow. But there are NICs, such as DSA,
where there is no transmit confirmation at all. Here, it would be wrong /
misleading to increment the successful "pkts" counter, because one-step
PTP packets can be dropped on TX just like any other packets.
So introduce a special counter which signifies "yes, an attempt was made,
but we don't know whether it also exited the port or not". I expect that
for one-step PTP packets where a confirmation is available, the "pkts"
counter would be bumped.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250116104628.123555-2-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The hds-thresh option configures the threshold value of
the header-data-split.
If a received packet size is larger than this threshold value, a packet
will be split into header and payload.
The header indicates TCP and UDP header, but it depends on driver spec.
The bnxt_en driver supports HDS(Header-Data-Split) configuration at
FW level, affecting TCP and UDP too.
So, If hds-thresh is set, it affects UDP and TCP packets.
Example:
# ethtool -G <interface name> hds-thresh <value>
# ethtool -G enp14s0f0np0 tcp-data-split on hds-thresh 256
# ethtool -g enp14s0f0np0
Ring parameters for enp14s0f0np0:
Pre-set maximums:
...
HDS thresh: 1023
Current hardware settings:
...
TCP data split: on
HDS thresh: 256
The default/min/max values are not defined in the ethtool so the drivers
should define themself.
The 0 value means that all TCP/UDP packets' header and payload
will be split.
Tested-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Link: https://patch.msgid.link/20250114142852.3364986-3-ap420073@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Replace generic instructions for monitoring error counters with a
procedure using the unified PHY statistics interface (`--all-groups`).
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Introduce a new way to report PHY statistics in a structured and
standardized format using the netlink API. This new method does not
replace the old driver-specific stats, which can still be accessed with
`ethtool -S <eth name>`. The structured stats are available with
`ethtool -S <eth name> --all-groups`.
This new method makes it easier to diagnose problems by organizing stats
in a consistent and documented way.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEH7ZpcWbFyOOp6OJbrB3Eaf9PW7cFAmd/mNkACgkQrB3Eaf9P
W7cQww/9Hnv7+wosuBxFW2o2ptgZThiwj/Ovz0oiWGWvctpN7VlpwmrDOsXQ+XMn
xF6JBFEjnJanYoDBb78D0dMffJcMcUKZopTUU/ZMitSNr8aIHYiuB4SWPG1tqxl4
Ete7Mr2m3tS96YePQNnAaRZzEuGsx3BQb28VLTWl9So81MByD2OK4fsAbYz22Gg8
7A6tDHn1mUd9b2VG+LeeBZaDDFG8C0O2x4E/8Z3DX3z1N8y3LABPwZ38jcgTviKO
1ZldGrJT+PownBydu23bWDassKE2TuVvGH9e/SOPeQj8DJ4Lmd0bafMZTY6xwfNT
RJCwhlzZUpYRXFzvcf3+U3egsqEWEemV7/LzAapdT0V9OqfLWMUh3b1jMA4KcblZ
qmlm/MhZyXutDoeuASwtM4jgM3wGwovOofrKKsb13hD9VLBs5jFZmFSw5TlbmwE3
sjv7V4pFwNyfJnwQtyMmfuuHiy8w+fzqAA2GCg8mF3OosHABH/FOvEBP8xg1Vqu1
iKlLkByfyaCFD+GxTPzqSDvSB8nDzeZBgBM/ILGuwH0OWr+0gxBzl1sxrhAkw9hC
Gf+4J3wg7EknTxfrJk4LyqfyS50GvUIzpLSkHYxDtQRd4zv75bxRzMvoZvuapTtN
GGpWYftKO8n2kgLORQ0dTtFee3c4w/No+KXWsFdtRVNpyk3N0Yw=
=U79q
-----END PGP SIGNATURE-----
Merge tag 'ipsec-next-2025-01-09' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:
====================
ipsec-next-2025-01-09
1) Implement the AGGFRAG protocol and basic IP-TFS (RFC9347) functionality.
From Christian Hopps.
2) Support ESN context update to hardware for TX.
From Jianbo Liu.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Move python code to a separate directory so it can be
packaged as a python module. Updates existing references
in selftests and docs.
Also rename ynl-gen-[c|rst] to ynl_gen_[c|rst], avoid
dashes as these prevent easy imports for entrypoints.
Signed-off-by: Jan Stancek <jstancek@redhat.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/a4151bad0e6984e7164d395125ce87fd2e048bf1.1736343575.git.jstancek@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus suggested during one of past maintainer summits (in context of
a DMA_BUF discussion) that symbol namespaces can be used to prevent
unwelcome but in-tree code from using all exported functions.
Create a namespace for netdev.
Export netdev_rx_queue_restart(), drivers may want to use it since
it gives them a simple and safe way to restart a queue to apply
config changes. But it's both too low level and too actively developed
to be used outside netdev.
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Implement "mdd-auto-reset-vf" priv-flag to handle Tx and Rx MDD events for VFs.
This flag is also used in other network adapters like ICE.
Usage:
- "on" - The problematic VF will be automatically reset
if a malformed descriptor is detected.
- "off" - The problematic VF will be disabled.
In cases where a VF sends malformed packets classified as malicious, it can
cause the Tx queue to freeze, rendering it unusable for several minutes. When
an MDD event occurs, this new implementation allows for a graceful VF reset to
quickly restore operational state.
Currently, VF queues are disabled if an MDD event occurs. This patch adds the
ability to reset the VF if a Tx or Rx MDD event occurs. It also includes MDD
event logging throttling to avoid dmesg pollution and unifies the format of
Tx and Rx MDD messages.
Note: Standard message rate limiting functions like dev_info_ratelimited()
do not meet our requirements. Custom rate limiting is implemented,
please see the code for details.
Co-developed-by: Jan Sokolowski <jan.sokolowski@intel.com>
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Co-developed-by: Padraig J Connolly <padraig.j.connolly@intel.com>
Signed-off-by: Padraig J Connolly <padraig.j.connolly@intel.com>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Michal Schmidt <mschmidt@redhat.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20250106221929.956999-13-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Previously xfrm_dev_state_advance_esn() was added for RX only. But
it's possible that ESN context also need to be synced to hardware for
TX, so call it for outbound in this patch.
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Stefan Schmidt says:
====================
pull-request: ieee802154-next 2025-01-03
Leo Stone provided a documatation fix to improve the grammar.
David Gilbert spotted a non-used fucntion we can safely remove.
* tag 'ieee802154-for-net-next-2025-01-03' of git://git.kernel.org/pub/scm/linux/kernel/git/wpan/wpan-next:
net: mac802154: Remove unused ieee802154_mlme_tx_one
Documentation: ieee802154: fix grammar
====================
Link: https://patch.msgid.link/20250103154605.440478-1-stefan@datenfreihafen.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Marek Lindner <marek.lindner@mailbox.org>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Document expectations from drivers looking to add support for device
memory tcp or other netmem based features.
Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://patch.msgid.link/20241217201206.2360389-1-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The bonding documentation had several "insure" which is not
properly used in the context. Suggest to change to "ensure"
to improve readability.
Signed-off-by: shunlizhou <shunlizhou@aliyun.com>
Link: https://patch.msgid.link/20241216135447.57681-1-shunlizhou@aliyun.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Introduce support for ETHTOOL_MSG_TSCONFIG_GET/SET ethtool netlink socket
to read and configure hwtstamp configuration of a PHC provider. Note that
simultaneous hwtstamp isn't supported; configuring a new one disables the
previous setting.
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Either the MAC or the PHY can provide hwtstamp, so we should be able to
read the tsinfo for any hwtstamp provider.
Enhance 'get' command to retrieve tsinfo of hwtstamp providers within a
network topology.
Add support for a specific dump command to retrieve all hwtstamp
providers within the network topology, with added functionality for
filtered dump to target a single interface.
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Document the kernel's behavior and userspace expectations.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Today we have a hardcoded delay of 1 sec before a TIME-WAIT socket can be
reused by reopening a connection. This is a safe choice based on an
assumption that the other TCP timestamp clock frequency, which is unknown
to us, may be as low as 1 Hz (RFC 7323, section 5.4).
However, this means that in the presence of short lived connections with an
RTT of couple of milliseconds, the time during which a 4-tuple is blocked
from reuse can be orders of magnitude longer that the connection lifetime.
Combined with a reduced pool of ephemeral ports, when using
IP_LOCAL_PORT_RANGE to share an egress IP address between hosts [1], the
long TIME-WAIT reuse delay can lead to port exhaustion, where all available
4-tuples are tied up in TIME-WAIT state.
Turn the reuse delay into a per-netns setting so that sysadmins can make
more aggressive assumptions about remote TCP timestamp clock frequency and
shorten the delay in order to allow connections to reincarnate faster.
Note that applications can completely bypass the TIME-WAIT delay protection
already today by locking the local port with bind() before connecting. Such
immediate connection reuse may result in PAWS failing to detect old
duplicate segments, leaving us with just the sequence number check as a
safety net.
This new configurable offers a trade off where the sysadmin can balance
between the risk of PAWS detection failing to act versus exhausting ports
by having sockets tied up in TIME-WAIT state for too long.
[1] https://lpc.events/event/16/contributions/1349/
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://patch.msgid.link/20241209-jakub-krn-909-poc-msec-tw-tstamp-v2-2-66aca0eed03e@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net.ipv4.nexthop_compat_mode was added when nexthop objects were added to
provide the view of nexthop objects through the usual lens of the route
UAPI. As nexthop objects evolved, the information provided through this
lens became incomplete. For example, details of resilient nexthop groups
are obviously omitted.
Now that 16-bit nexthop group weights are a thing, the 8-bit UAPI cannot
convey the >8-bit weight accurately. Instead of inventing workarounds for
an obsolete interface, just document the expectations of inaccuracy.
Fixes: b72a6a7ab9 ("net: nexthop: Increase weight to u16")
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/b575e32399ccacd09079b2a218255164535123bd.1733740749.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Enhance observability of netconsole. Packet sends can fail.
Start tracking at least two failure possibilities: ENOMEM and
NET_XMIT_DROP for every target. Stats are exposed via an additional
attribute in CONFIGFS.
The exposed statistics allows easier debugging of cases when netconsole
messages were not seen by receivers, eliminating the guesswork if the
sender thinks that messages in question were sent out.
Stats are not reset on enable/disable/change remote ip/etc, they
belong to the netcons target itself.
Reported-by: Breno Leitao <leitao@debian.org>
Closes: https://lore.kernel.org/all/ZsWoUzyK5du9Ffl+@gmail.com/
Signed-off-by: Maksym Kutsevol <max@kutsevol.com>
Link: https://patch.msgid.link/20241202-netcons-add-udp-send-fail-statistics-to-netconsole-v5-2-70e82239f922@kutsevol.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The BareUDP documentation had several grammar and spelling mistakes,
making it harder to read. This patch fixes those errors to improve
clarity and readability for developers.
Signed-off-by: Vyshnav Ajith <puthen1977@gmail.com>
Link: https://patch.msgid.link/20241121224827.12293-1-puthen1977@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix typos and grammar where it improves readability.
Signed-off-by: Leo Stone <leocstone@gmail.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://patch.msgid.link/20241124230002.56058-1-leocstone@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The hyphen is removed to have the same style as the others.
Fixes: 09ef17863f ("Documentation: add more details in tipc.rst")
Signed-off-by: tuqiang <tu.qiang35@zte.com.cn>
Signed-off-by: Jiang Kun <jiang.kun2@zte.com.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jon Maloy <jmaloy@redhat.com>
Cc: Xin Long <lucien.xin@gmail.com>
Link: https://patch.msgid.link/20241115104731435HuFjZOmkaqumuFhB7mrwe@zte.com.cn
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Report Rx parser statistics via ethtool -S.
The parser stats are 32b, so we need to add refresh to the service
task to make sure we don't miss overflows.
Signed-off-by: Sanman Pradhan <sanman.p211993@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241115015344.757567-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add PCIe hardware statistics support to the fbnic driver. These stats
provide insight into PCIe transaction performance and error conditions.
Which includes, read/write and completion TLP counts and DWORD counts and
debug counters for tag, completion credit and NP credit exhaustion
The stats are exposed via debugfs and can be used to monitor PCIe
performance and debug PCIe issues.
Signed-off-by: Sanman Pradhan <sanman.p211993@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241115015344.757567-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Adds documentation for creating and configuring rvu port representors
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Describe irq suspension, the epoll ioctls, and the tradeoffs of using
different gro_flush_timeout values.
Signed-off-by: Joe Damato <jdamato@fastly.com>
Co-developed-by: Martin Karsten <mkarsten@uwaterloo.ca>
Signed-off-by: Martin Karsten <mkarsten@uwaterloo.ca>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://patch.msgid.link/20241109050245.191288-7-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Create a mapping between a netdev and its neighoburs,
allowing for much cheaper flushes.
Signed-off-by: Gilad Naaman <gnaaman@drivenets.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20241107160444.2913124-7-gnaaman@drivenets.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Unify the column width of the document to comply with specifications.
Signed-off-by: Jinjian Song <jinjian.song@fibocom.com>
Reviewed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add support for userspace to enable/disable the debug ports(ADB,MIPC).
- ADB port: /dev/wwan0adb0
- MIPC port: /dev/wwan0mipc0
Application can use ADB (Android Debug Bridge) port to implement
functions (shell, pull, push ...) by ADB protocol commands.
E.g., ADB commands:
- A_OPEN: OPEN(local-id, 0, "destination")
- A_WRTE: WRITE(local-id, remote-id, "data")
- A_OKEY: READY(local-id, remote-id, "")
- A_CLSE: CLOSE(local-id, remote-id, "")
Link: https://android.googlesource.com/platform/packages/modules/adb/+/refs/heads/main/README.md
Application can use MIPC (Modem Information Process Center) port
to debug antenna tuner or noise profiling through this MTK modem
diagnostic interface.
By default, debug ports are not exposed, so using the command
to enable or disable debug ports.
Enable debug ports:
- enable: 'echo 1 > /sys/bus/pci/devices/${bdf}/t7xx_debug_ports
Disable debug ports:
- disable: 'echo 0 > /sys/bus/pci/devices/${bdf}/t7xx_debug_ports
Signed-off-by: Jinjian Song <jinjian.song@fibocom.com>
Reviewed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The description of PDU1 format usage mistakenly referred to PDU2 format.
Signed-off-by: Alexander Hölzl <alexander.hoelzl@gmx.net>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://patch.msgid.link/20241023145257.82709-1-alexander.hoelzl@gmx.net
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
net_dim() is currently passed a struct dim_sample argument by value.
struct dim_sample is 24 bytes. Since this is greater 16 bytes, x86-64
passes it on the stack. All callers have already initialized dim_sample
on the stack, so passing it by value requires pushing a duplicated copy
to the stack. Either witing to the stack and immediately reading it, or
perhaps dereferencing addresses relative to the stack pointer in a chain
of push instructions, seems to perform quite poorly.
In a heavy TCP workload, mlx5e_handle_rx_dim() consumes 3% of CPU time,
94% of which is attributed to the first push instruction to copy
dim_sample on the stack for the call to net_dim():
// Call ktime_get()
0.26 |4ead2: call 4ead7 <mlx5e_handle_rx_dim+0x47>
// Pass the address of struct dim in %rdi
|4ead7: lea 0x3d0(%rbx),%rdi
// Set dim_sample.pkt_ctr
|4eade: mov %r13d,0x8(%rsp)
// Set dim_sample.byte_ctr
|4eae3: mov %r12d,0xc(%rsp)
// Set dim_sample.event_ctr
0.15 |4eae8: mov %bp,0x10(%rsp)
// Duplicate dim_sample on the stack
94.16 |4eaed: push 0x10(%rsp)
2.79 |4eaf1: push 0x10(%rsp)
0.07 |4eaf5: push %rax
// Call net_dim()
0.21 |4eaf6: call 4eafb <mlx5e_handle_rx_dim+0x6b>
To allow the caller to reuse the struct dim_sample already on the stack,
pass the struct dim_sample by reference to net_dim().
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Arthur Kiyanovski <akiyano@amazon.com>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Link: https://patch.msgid.link/20241031002326.3426181-2-csander@purestorage.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ETHTOOL_MSG_PHY_GET/GET_REPLY/NTF is missing in the ethtool message list.
Add it to the ethool netlink documentation.
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20241028132351.75922-1-kory.maincent@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The first -next "new features" pull request for v6.13. This is a big
one as we have not been able to send one earlier. We have also some
patches affecting other subsystems: in staging we deleted the rtl8192e
driver and in debugfs added a new interface to save struct
file_operations memory; both were acked by GregKH.
Because of the lib80211/libipw move there were quite a lot of
conflicts and to solve those we decided to merge net-next into
wireless-next.
Currently there's one conflict in
Documentation/networking/net_cachelines/net_device.rst. To fix that
just remove the iw_public_data line:
https://lore.kernel.org/all/20241011121014.674661a0@canb.auug.org.au/
And when net is merged to net-next there will be another simple
conflict in in net/mac80211/cfg.c:
https://lore.kernel.org/all/20241024115523.4cd35dde@canb.auug.org.au/
Major changes:
cfg80211/mac80211
* stop exporting wext symbols
* new mac80211 op to indicate that a new interface is to be added
* support radio separation of multi-band devices
Wireless Extensions
* move wext spy implementation to libiw
* remove iw_public_data from struct net_device
brcmfmac
* optional LPO clock support
ipw2x00
* move remaining lib80211 code into libiw
wilc1000
* WILC3000 support
rtw89
* RTL8852BE and RTL8852BE-VT BT-coexistence improvements
-----BEGIN PGP SIGNATURE-----
iQFFBAABCgAvFiEEiBjanGPFTz4PRfLobhckVSbrbZsFAmcbz9YRHGt2YWxvQGtl
cm5lbC5vcmcACgkQbhckVSbrbZsabQf8CWJ/kyonw/Z8hRxgfE/7D6Jiqoq7R+ML
8W8lbc6F5wra4eCBq/oo6UVV36Ss6mxQYcRcmLq+nCkXa4qdMpg/z55QECMHxx5Z
YnIBbD2vBrIj7W21gfCKH1WJ+b5IQFZl3zuxuCgXjxD9TJM2CjUfOkvrhrqqzrPn
clfUx5f01vfv2jdvClPR5977gFE5One/ANeRQNs7uDS0TeeD2P+61DEB1//htIJo
7GwwCyUJCeOcfWRMzQwhpoppWKcPAV70kSVJrl/fRstS68vQGSQbcx9yiNeWkSFw
JXjQGdc8eYLPzLqECwS0KwFkta6AXbafAYYXe1wdlAzr+kmJ9x5oqA==
=x+mr
-----END PGP SIGNATURE-----
Merge tag 'wireless-next-2024-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
Kalle Valo says:
====================
wireless-next patches for v6.13
The first -next "new features" pull request for v6.13. This is a big
one as we have not been able to send one earlier. We have also some
patches affecting other subsystems: in staging we deleted the rtl8192e
driver and in debugfs added a new interface to save struct
file_operations memory; both were acked by GregKH.
Because of the lib80211/libipw move there were quite a lot of
conflicts and to solve those we decided to merge net-next into
wireless-next.
Major changes:
cfg80211/mac80211
* stop exporting wext symbols
* new mac80211 op to indicate that a new interface is to be added
* support radio separation of multi-band devices
Wireless Extensions
* move wext spy implementation to libiw
* remove iw_public_data from struct net_device
brcmfmac
* optional LPO clock support
ipw2x00
* move remaining lib80211 code into libiw
wilc1000
* WILC3000 support
rtw89
* RTL8852BE and RTL8852BE-VT BT-coexistence improvements
* tag 'wireless-next-2024-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (126 commits)
mac80211: Remove NOP call to ieee80211_hw_config
wifi: iwlwifi: work around -Wenum-compare-conditional warning
wifi: mac80211: re-order assigning channel in activate links
wifi: mac80211: convert debugfs files to short fops
debugfs: add small file operations for most files
wifi: mac80211: remove misleading j_0 construction parts
wifi: mac80211_hwsim: use hrtimer_active()
wifi: mac80211: refactor BW limitation check for CSA parsing
wifi: mac80211: filter on monitor interfaces based on configured channel
wifi: mac80211: refactor ieee80211_rx_monitor
wifi: mac80211: add support for the monitor SKIP_TX flag
wifi: cfg80211: add monitor SKIP_TX flag
wifi: mac80211: add flag to opt out of virtual monitor support
wifi: cfg80211: pass net_device to .set_monitor_channel
wifi: mac80211: remove status->ampdu_delimiter_crc
wifi: cfg80211: report per wiphy radio antenna mask
wifi: mac80211: use vif radio mask to limit creating chanctx
wifi: mac80211: use vif radio mask to limit ibss scan frequencies
wifi: cfg80211: add option for vif allowed radios
wifi: iwlwifi: allow IWL_FW_CHECK() with just a string
...
====================
Link: https://patch.msgid.link/20241025170705.5F6B2C4CEC3@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add document about which modes have native XDP support.
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20241021031211.814-3-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The original link returns 404 now. This commit replaces the dead google
site link with archive.org link.
Signed-off-by: Levi Zim <rsworktech@outlook.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20241021-packet_mmap_fix_link-v1-1-dffae4a174c0@outlook.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix grammar where it improves readability.
Signed-off-by: Leo Stone <leocstone@gmail.com>
Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://lore.kernel.org/20241023041203.35313-1-leocstone@gmail.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Add a persistent NAPI config area for NAPI configuration to the core.
Drivers opt-in to setting the persistent config for a NAPI by passing an
index when calling netif_napi_add_config.
napi_config is allocated in alloc_netdev_mqs, freed in free_netdev
(after the NAPIs are deleted).
Drivers which call netif_napi_add_config will have persistent per-NAPI
settings: NAPI IDs, gro_flush_timeout, and defer_hard_irq settings.
Per-NAPI settings are saved in napi_disable and restored in napi_enable.
Co-developed-by: Martin Karsten <mkarsten@uwaterloo.ca>
Signed-off-by: Martin Karsten <mkarsten@uwaterloo.ca>
Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241011184527.16393-6-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Allow per-NAPI gro_flush_timeout setting.
The existing sysfs parameter is respected; writes to sysfs will write to
all NAPI structs for the device and the net_device gro_flush_timeout
field. Reads from sysfs will read from the net_device field.
The ability to set gro_flush_timeout on specific NAPI instances will be
added in a later commit, via netdev-genl.
Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20241011184527.16393-4-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add defer_hard_irqs to napi_struct in preparation for per-NAPI
settings.
The existing sysfs parameter is respected; writes to sysfs will write to
all NAPI structs for the device and the net_device defer_hard_irq field.
Reads from sysfs show the net_device field.
The ability to set defer_hard_irqs on specific NAPI instances will be
added in a later commit, via netdev-genl.
Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20241011184527.16393-2-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
sysctl_tcp_l3mdev_accept is read from TCP receive fast path from
tcp_v6_early_demux(),
__inet6_lookup_established,
inet_request_bound_dev_if().
Move it to netns_ipv4_read_rx.
Remove the '#ifdef CONFIG_NET_L3_MASTER_DEV' that was guarding
its definition.
Note this adds a hole of three bytes that could be filled later.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Cc: Wei Wang <weiwan@google.com>
Cc: Coco Li <lixiaoyan@google.com>
Link: https://patch.msgid.link/20241010034100.320832-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Introduce the basic infrastructure to implement the net-shaper
core functionality. Each network devices carries a net-shaper cache,
the NL get() operation fetches the data from such cache.
The cache is initially empty, will be fill by the set()/group()
operation implemented later and is destroyed at device cleanup time.
The net_shaper_fill_handle(), net_shaper_ctx_init(), and
net_shaper_generic_pre() implementations handle generic index type
attributes, despite the current caller always pass a constant value
to avoid more noise in later patches using them with different
attributes.
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Link: https://patch.msgid.link/ddd10fd645a9367803ad02fca4a5664ea5ace170.1728460186.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2024-10-08 (ice, iavf, igb, e1000e, e1000)
This series contains updates to ice, iavf, igb, e1000e, and e1000
drivers.
For ice:
Wojciech adds support for ethtool reset.
Paul adds support for hardware based VF mailbox limits for E830 devices.
Jake adjusts to a common iterator in ice_vc_cfg_qs_msg() and moves
storing of max_frame and rx_buf_len from VSI struct to the ring
structure.
Hongbo Li uses assign_bit() to replace an open-coded instance.
Markus Elfring adjusts a couple of PTP error paths to use a common,
shared exit point.
Yue Haibing removes unused declarations.
For iavf:
Yue Haibing removes unused declarations.
For igb:
Yue Haibing removes unused declarations.
For e1000e:
Takamitsu Iwai removes unneccessary writel() calls.
Joe Damato adds support for netdev-genl support to query IRQ, NAPI,
and queue information.
For e1000:
Joe Damato adds support for netdev-genl support to query IRQ, NAPI,
and queue information.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
e1000: Link NAPI instances to queues and IRQs
e1000e: Link NAPI instances to queues and IRQs
e1000e: Remove duplicated writel() in e1000_configure_tx/rx()
igb: Cleanup unused declarations
iavf: Remove unused declarations
ice: Cleanup unused declarations
ice: Use common error handling code in two functions
ice: Make use of assign_bit() API
ice: store max_frame and rx_buf_len only in ice_rx_ring
ice: consistently use q_idx in ice_vc_cfg_qs_msg()
ice: add E830 HW VF mailbox message limit support
ice: Implement ethtool reset support
====================
Link: https://patch.msgid.link/20241008233441.928802-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The doc pages under /networking/net_cachelines are unreadable because
they lack .rst formatting for the tabular text.
Add simple table markup and tidy up the table contents:
- remove dashes that represent empty cells because they render
as bullets and are not needed
- replace 'struct_*' with 'struct *' in the first column so that
sphinx can render links for any structs that appear in the docs
Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241008165329.45647-1-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The wireless-next tree was based on something older, and there
are now conflicts between -rc2 and work here. Merge net-next,
which has enough of -rc2 for the conflicts to happen, resolving
them in the process.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This patch introduces a diagnostic guide for troubleshooting Twisted
Pair Ethernet variants at OSI Layer 1. It provides detailed steps for
detecting and resolving common link issues, such as incorrect wiring,
cable damage, and power delivery problems. The guide also includes
interface verification steps and PHY-specific diagnostics.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20241004121824.1716303-1-o.rempel@pengutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Upcoming per-netns RTNL conversion needs to get rid
of shared hash tables.
fib_info_devhash[] is one of them.
It is unclear why we used a hash table, because
a single hlist_head per net device was cheaper and scalable.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20241004134720.579244-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
SOF_TIMESTAMPING_OPT_ID socket option flag gives a way to correlate TX
timestamps and packets sent via socket. Unfortunately, there is no way
to reliably predict socket timestamp ID value in case of error returned
by sendmsg. For UDP sockets it's impossible because of lockless
nature of UDP transmit, several threads may send packets in parallel. In
case of RAW sockets MSG_MORE option makes things complicated. More
details are in the conversation [1].
This patch adds new control message type to give user-space
software an opportunity to control the mapping between packets and
values by providing ID with each sendmsg for UDP sockets.
The documentation is also added in this patch.
[1] https://lore.kernel.org/netdev/CALCETrU0jB+kg0mhV6A8mrHfTE1D1pr1SD_B9Eaa9aDPfgHdtA@mail.gmail.com/
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Link: https://patch.msgid.link/20241001125716.2832769-2-vadfed@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix multiple grammatical issues and add a missing period to improve
readability.
Signed-off-by: Leo Stone <leocstone@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240929005001.370991-1-leocstone@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since commit 8380c81d5c ("net: Treat __napi_schedule_irqoff() as
__napi_schedule() on PREEMPT_RT"), napi_schedule_irqoff will do the
right thing if IRQs are threaded. Therefore, there is no need to use
IRQF_NO_THREAD.
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://patch.msgid.link/20240930153955.971657-1-sean.anderson@linux.dev
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The iptables example was added in commit d2f26037a3 (netfilter: Add
documentation for tproxy, 2008-10-08), but xt_socket 'transparent'
option was added in commit a31e1ffd22 (netfilter: xt_socket: added new
revision of the 'socket' match supporting flags, 2009-06-09).
Now add the 'transparent' option to the iptables example to ignore
non-transparent sockets, which is also consistent with the nft example.
Signed-off-by: 谢致邦 (XIE Zhibang) <Yeking@Red54.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Fix a missing end of phrase in the documentation. It describes the
ETHTOOL_A_C33_PSE_ACTUAL_PW attribute, which was not fully explained.
Also, fix grammar issues by using simple present tense instead of
present continuous.
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240912090550.743174-1-kory.maincent@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When SHAMPO can't identify the protocol/header of a packet, it will
yield a packet that is not split - all the packet is in the data part.
Count this value in packets and bytes.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20240911201757.1505453-15-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ENA Express metrics, called `ena_srd` are exposed to
customers via `ethtool`.
The metrics allow customers to check the configuration
(mode), tx/rx counters as well as resource utilization.
The documentation is also updated to provide a general
explanation about ENA Express as well as links for further
information about metrics and configurations.
Signed-off-by: Igor Chauskin <igorch@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Link: https://patch.msgid.link/20240909084704.13856-2-darinzon@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The IEEE 802.3cg project defines two 10 Mbit/s PHYs operating over a
single pair of conductors. The 10BASE-T1L (Clause 146) is a long reach
PHY supporting full duplex point-to-point operation over 1 km of single
balanced pair of conductors. The 10BASE-T1S (Clause 147) is a short reach
PHY supporting full / half duplex point-to-point operation over 15 m of
single balanced pair of conductors, or half duplex multidrop bus
operation over 25 m of single balanced pair of conductors.
Furthermore, the IEEE 802.3cg project defines the new Physical Layer
Collision Avoidance (PLCA) Reconciliation Sublayer (Clause 148) meant to
provide improved determinism to the CSMA/CD media access method. PLCA
works in conjunction with the 10BASE-T1S PHY operating in multidrop mode.
The aforementioned PHYs are intended to cover the low-speed / low-cost
applications in industrial and automotive environment. The large number
of pins (16) required by the MII interface, which is specified by the
IEEE 802.3 in Clause 22, is one of the major cost factors that need to be
addressed to fulfil this objective.
The MAC-PHY solution integrates an IEEE Clause 4 MAC and a 10BASE-T1x PHY
exposing a low pin count Serial Peripheral Interface (SPI) to the host
microcontroller. This also enables the addition of Ethernet functionality
to existing low-end microcontrollers which do not integrate a MAC
controller.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Parthiban Veerasooran <Parthiban.Veerasooran@microchip.com>
Link: https://patch.msgid.link/20240909082514.262942-2-Parthiban.Veerasooran@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add documentation outlining the usage and details of devmem TCP.
Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20240910171458.219195-12-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
An MPTCP firewall blackhole can be detected if the following SYN
retransmission after a fallback to "plain" TCP is accepted.
In case of blackhole, a similar technique to the one in place with TFO
is now used: MPTCP can be disabled for a certain period of time, 1h by
default. This time period will grow exponentially when more blackhole
issues get detected right after MPTCP is re-enabled and will reset to
the initial value when the blackhole issue goes away.
The blackhole period can be modified thanks to a new sysctl knob:
blackhole_timeout. Two new MIB counters help understanding what's
happening:
- 'Blackhole', incremented when a blackhole is detected.
- 'MPCapableSYNTXDisabled', incremented when an MPTCP connection
directly falls back to TCP during the blackhole period.
Because the technique is inspired by the one used by TFO, an important
part of the new code is similar to what can find in tcp_fastopen.c, with
some adaptations to the MPTCP case.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/57
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240909-net-next-mptcp-fallback-x-mpc-v1-3-da7ebb4cd2a3@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
HW-Managed Flow Steering in mlx5 driver
Yevgeny Kliteynik says:
=======================
1. Overview
-----------
ConnectX devices support packet matching, modification, and redirection.
This functionality is referred as Flow Steering.
To configure a steering rule, the rule is written to the device-owned
memory. This memory is accessed and cached by the device when processing
a packet.
The first implementation of Flow Steering was done in FW, and it is
referred in the mlx5 driver as Device-Managed Flow Steering (DMFS).
Later we introduced SW-managed Flow Steering (SWS or SMFS), where the
driver is writing directly to the device's configuration memory (ICM)
through RC QP using RDMA operations (RDMA-read and RDAM-write), thus
achieving higher rates of rule insertion/deletion.
Now we introduce a new flow steering implementation: HW-Managed Flow
Steering (HWS or HMFS).
In this new approach, the driver is configuring steering rules directly
to the HW using the WQs with a special new type of WQE. This way we can
reach higher rule insertion/deletion rate with much lower CPU utilization
compared to SWS.
The key benefits of HWS as opposed to SWS:
+ HW manages the steering decision tree
- HW calculates CRC for each entry
- HW handles tree hash collisions
- HW & FW manage objects refcount
+ HW keeps cache coherency:
- HW provides tree access locking and synchronization
- HW provides notification on completion
+ Insertion rate isn’t affected by background traffic
- Dedicated HW components that handle insertion
2. Performance
--------------
Measuring Connection Tracking with simple IPv4 flows w/o NAT, we
are able to get ~5 times more flows offloaded per second using HWS.
3. Configuration
----------------
The enablement of HWS mode in eswitch manager is done using the same
devlink param that is already used for switching between FW-managed
steering and SW-managed steering modes:
# devlink dev param set pci/<PCI_ID> name flow_steering_mode cmod runtime value hmfs
4. Upstream Submission
----------------------
HWS support consists of 3 main components:
+ Steering:
- The lower layer that exposes HWS API to upper layers and implements
all the management of flow steering building blocks
+ FS-Core
- Implementation of fs_hws layer to enable fs_core to use HWS instead
of FW or SW steering
- Create HW steering action pools to utilize the ability of HWS to
share steering actions among different rules
- Add support for configuring HWS mode through devlink command,
similar to configuring SWS mode
+ Connection Tracking
- Implementation of CT support for HW steering
- Hooks up the CT ops for the new steering mode and uses the HWS API
to implement connection tracking.
Because of the large number of patches, we need to perform the submission
in several separate patch series. This series is the first submission that
lays the ground work for the next submissions, where an actual user of HWS
will be added.
5. Patches in this series
-------------------------
This patch series contains implementation of the first bullet from above.
The patches are:
[patch 01/15] net/mlx5: Added missing mlx5_ifc definition for HW Steering
[patch 02/15] net/mlx5: Added missing definitions in preparation for HW Steering
[patch 03/15] net/mlx5: HWS, added actions handling
[patch 04/15] net/mlx5: HWS, added tables handling
[patch 05/15] net/mlx5: HWS, added rules handling
[patch 06/15] net/mlx5: HWS, added definers handling
[patch 07/15] net/mlx5: HWS, added matchers functionality
[patch 08/15] net/mlx5: HWS, added FW commands handling
[patch 09/15] net/mlx5: HWS, added modify header pattern and args handling
[patch 10/15] net/mlx5: HWS, added vport handling
[patch 11/15] net/mlx5: HWS, added memory management handling
[patch 12/15] net/mlx5: HWS, added backward-compatible API handling
[patch 13/15] net/mlx5: HWS, added debug dump and internal headers
[patch 14/15] net/mlx5: HWS, added send engine and context handling
[patch 15/15] net/mlx5: HWS, added API and enabled HWS support
=======================
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmbfOf4ACgkQSD+KveBX
+j7hWgf/UzlKp8uyqb+7MpMWP6EgT8WUwWdpDfAr1jubIFz7e+VGaA/7QCThe89u
alcgYvIDQCGrpB/0qXY+kaKPvqOej2wPCiLU7K2JB5y20pZ/RATlFuFBZjsMzufJ
7NxzZgTPUDz+8OWK0mm0LxEJRJYoJ69gAnR0jvLGx9uSjv/f9lNICvWBaI58hkzb
HJa6sJNBiFj4EnkipxWCP0GQ4dddMkgCIVYb91FtlBA4SGZtmPS35NqQJKtGnKF3
ZhZuaTeRdw8bFDJnhbu0ur9cs4EUorZE5QBWhoHYN0zFZF4JmqCCC1HvCS7LEKaU
PgtREk20H2jPIRFwQuX05D6M4zSizg==
=AsB3
-----END PGP SIGNATURE-----
Merge tag 'mlx5-updates-2024-09-02' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2024-08-29
HW-Managed Flow Steering in mlx5 driver
Yevgeny Kliteynik says:
=======================
1. Overview
-----------
ConnectX devices support packet matching, modification, and redirection.
This functionality is referred as Flow Steering.
To configure a steering rule, the rule is written to the device-owned
memory. This memory is accessed and cached by the device when processing
a packet.
The first implementation of Flow Steering was done in FW, and it is
referred in the mlx5 driver as Device-Managed Flow Steering (DMFS).
Later we introduced SW-managed Flow Steering (SWS or SMFS), where the
driver is writing directly to the device's configuration memory (ICM)
through RC QP using RDMA operations (RDMA-read and RDAM-write), thus
achieving higher rates of rule insertion/deletion.
Now we introduce a new flow steering implementation: HW-Managed Flow
Steering (HWS or HMFS).
In this new approach, the driver is configuring steering rules directly
to the HW using the WQs with a special new type of WQE. This way we can
reach higher rule insertion/deletion rate with much lower CPU utilization
compared to SWS.
The key benefits of HWS as opposed to SWS:
+ HW manages the steering decision tree
- HW calculates CRC for each entry
- HW handles tree hash collisions
- HW & FW manage objects refcount
+ HW keeps cache coherency:
- HW provides tree access locking and synchronization
- HW provides notification on completion
+ Insertion rate isn’t affected by background traffic
- Dedicated HW components that handle insertion
2. Performance
--------------
Measuring Connection Tracking with simple IPv4 flows w/o NAT, we
are able to get ~5 times more flows offloaded per second using HWS.
3. Configuration
----------------
The enablement of HWS mode in eswitch manager is done using the same
devlink param that is already used for switching between FW-managed
steering and SW-managed steering modes:
# devlink dev param set pci/<PCI_ID> name flow_steering_mode cmod runtime value hmfs
4. Upstream Submission
----------------------
HWS support consists of 3 main components:
+ Steering:
- The lower layer that exposes HWS API to upper layers and implements
all the management of flow steering building blocks
+ FS-Core
- Implementation of fs_hws layer to enable fs_core to use HWS instead
of FW or SW steering
- Create HW steering action pools to utilize the ability of HWS to
share steering actions among different rules
- Add support for configuring HWS mode through devlink command,
similar to configuring SWS mode
+ Connection Tracking
- Implementation of CT support for HW steering
- Hooks up the CT ops for the new steering mode and uses the HWS API
to implement connection tracking.
Because of the large number of patches, we need to perform the submission
in several separate patch series. This series is the first submission that
lays the ground work for the next submissions, where an actual user of HWS
will be added.
5. Patches in this series
-------------------------
This patch series contains implementation of the first bullet from above.
=======================
* tag 'mlx5-updates-2024-09-02' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5: HWS, added API and enabled HWS support
net/mlx5: HWS, added send engine and context handling
net/mlx5: HWS, added debug dump and internal headers
net/mlx5: HWS, added backward-compatible API handling
net/mlx5: HWS, added memory management handling
net/mlx5: HWS, added vport handling
net/mlx5: HWS, added modify header pattern and args handling
net/mlx5: HWS, added FW commands handling
net/mlx5: HWS, added matchers functionality
net/mlx5: HWS, added definers handling
net/mlx5: HWS, added rules handling
net/mlx5: HWS, added tables handling
net/mlx5: HWS, added actions handling
net/mlx5: Added missing definitions in preparation for HW Steering
net/mlx5: Added missing mlx5_ifc definition for HW Steering
====================
Link: https://patch.msgid.link/20240909181250.41596-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
introduce a new flag SOF_TIMESTAMPING_OPT_RX_FILTER in the receive
path. User can set it with SOF_TIMESTAMPING_SOFTWARE to filter
out rx software timestamp report, especially after a process turns on
netstamp_needed_key which can time stamp every incoming skb.
Previously, we found out if an application starts first which turns on
netstamp_needed_key, then another one only passing SOF_TIMESTAMPING_SOFTWARE
could also get rx timestamp. Now we handle this case by introducing this
new flag without breaking users.
Quoting Willem to explain why we need the flag:
"why a process would want to request software timestamp reporting, but
not receive software timestamp generation. The only use I see is when
the application does request
SOF_TIMESTAMPING_SOFTWARE | SOF_TIMESTAMPING_TX_SOFTWARE."
Similarly, this new flag could also be used for hardware case where we
can set it with SOF_TIMESTAMPING_RAW_HARDWARE, then we won't receive
hardware receive timestamp.
Another thing about errqueue in this patch I have a few words to say:
In this case, we need to handle the egress path carefully, or else
reporting the tx timestamp will fail. Egress path and ingress path will
finally call sock_recv_timestamp(). We have to distinguish them.
Errqueue is a good indicator to reflect the flow direction.
Suggested-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240909015612.3856-2-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This adds support to show firmware version information for both stored and
running firmware versions. The version and commit is displayed separately
to aid monitoring tools which only care about the version.
Example output:
# devlink dev info
pci/0000:01:00.0:
driver fbnic
serial_number 88-25-08-ff-ff-01-50-92
versions:
running:
fw 24.07.15-017
fw.commit h999784ae9df0
fw.bootloader 24.07.10-000
fw.bootloader.commit hfef3ac835ce7
stored:
fw 24.07.24-002
fw.commit hc9d14a68b3f2
fw.bootloader 24.07.22-000
fw.bootloader.commit h922f8493eb96
fw.undi 01.00.03-000
Signed-off-by: Lee Trager <lee@trager.us>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20240905233820.1713043-1-lee@trager.us
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
In commit 6f8b12d661 ("net: napi: add hard irqs deferral feature")
napi_defer_irqs was added to net_device and napi_defer_irqs_count was
added to napi_struct, both as type int.
This value never goes below zero, so there is not reason for it to be a
signed int. Change the type for both from int to u32, and add an
overflow check to sysfs to limit the value to S32_MAX.
The limit of S32_MAX was chosen because the practical limit before this
patch was S32_MAX (anything larger was an overflow) and thus there are
no behavioral changes introduced. If the extra bit is needed in the
future, the limit can be raised.
Before this patch:
$ sudo bash -c 'echo 2147483649 > /sys/class/net/eth4/napi_defer_hard_irqs'
$ cat /sys/class/net/eth4/napi_defer_hard_irqs
-2147483647
After this patch:
$ sudo bash -c 'echo 2147483649 > /sys/class/net/eth4/napi_defer_hard_irqs'
bash: line 0: echo: write error: Numerical result out of range
Similarly, /sys/class/net/XXXXX/tx_queue_len is defined as unsigned:
include/linux/netdevice.h: unsigned int tx_queue_len;
And has an overflow check:
dev_change_tx_queue_len(..., unsigned long new_len):
if (new_len != (unsigned int)new_len)
return -ERANGE;
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20240904153431.307932-1-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ability to handle maximum FCoE frames of 2158 bytes can never be changed
and thus more of an attribute, not a toggleable feature.
Move it from netdev_features_t to "cold" priv flags (bitfield bool) and
free yet another feature bit.
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
"Interface can't change network namespaces" is rather an attribute,
not a feature, and it can't be changed via Ethtool.
Make it a "cold" private flag instead of a netdev_feature and free
one more bit.
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
NETIF_F_LLTX can't be changed via Ethtool and is not a feature,
rather an attribute, very similar to IFF_NO_QUEUE (and hot).
Free one netdev_features_t bit and make it a "hot" private flag.
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Make dev->priv_flags `u32` back and define bits higher than 31 as
bitfield booleans as per Jakub's suggestion. This simplifies code
which accesses these bits with no optimization loss (testb both
before/after), allows to not extend &netdev_priv_flags each time,
but also scales better as bits > 63 in the future would only add
a new u64 to the structure with no complications, comparing to
that extending ::priv_flags would require converting it to a bitmap.
Note that I picked `unsigned long :1` to not lose any potential
optimizations comparing to `bool :1` etc.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Two fields, page_pools and *irq_moder, were added to struct net_device
in commit 083772c9f9 ("net: page_pool: record pools per netdev") and
commit f750dfe825 ("ethtool: provide customized dim profile
management"), respectively.
Add both to the net_cachelines documentation, as well.
Signed-off-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20240829155742.366584-1-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Extend the ethtool netlink cable testing interface by adding support for
specifying the source of cable testing results. This allows users to
differentiate between results obtained through different diagnostic
methods.
For example, some TI 10BaseT1L PHYs provide two variants of cable
diagnostics: Time Domain Reflectometry (TDR) and Active Link Cable
Diagnostic (ALCD). By introducing `ETHTOOL_A_CABLE_RESULT_SRC` and
`ETHTOOL_A_CABLE_FAULT_LENGTH_SRC` attributes, this update enables
drivers to indicate whether the result was derived from TDR or ALCD,
improving the clarity and utility of diagnostic information.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20240822120703.1393130-2-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The newly introduced phy_link_topology tracks all ethernet PHYs that are
attached to a netdevice. Document the base principle, internal and
external APIs. As the phy_link_topology is expected to be extended, this
documentation will hold any further improvements and additions made
relative to topology handling.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
As we have the ability to track the PHYs connected to a net_device
through the link_topology, we can expose this list to userspace. This
allows userspace to use these identifiers for phy-specific commands and
take the decision of which PHY to target by knowing the link topology.
Add PHY_GET and PHY_DUMP, which can be a filtered DUMP operation to list
devices on only one interface.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some netlink commands are target towards ethernet PHYs, to control some
of their features. As there's several such commands, add the ability to
pass a PHY index in the ethnl request, which will populate the generic
ethnl_req_info with the passed phy_index.
Add a helper that netlink command handlers need to use to grab the
targeted PHY from the req_info. This helper needs to hold rtnl_lock()
while interacting with the PHY, as it may be removed at any point.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Following commit 9f7e8fbb91 ("net/mlx5: offset comp irq index in name by one"),
which fixed the index in IRQ name to start once again from 0, we change
the documentation accordingly.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Link: https://patch.msgid.link/20240815142343.2254247-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Correct spelling problems for Documentation/networking/ as reported
by ispell.
Signed-off-by: Jing-Ping Jan <zoo868e@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240812170910.5760-1-zoo868e@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>