The description of PDU1 format usage mistakenly referred to PDU2 format.
Signed-off-by: Alexander Hölzl <alexander.hoelzl@gmx.net>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://patch.msgid.link/20241023145257.82709-1-alexander.hoelzl@gmx.net
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
net_dim() is currently passed a struct dim_sample argument by value.
struct dim_sample is 24 bytes. Since this is greater 16 bytes, x86-64
passes it on the stack. All callers have already initialized dim_sample
on the stack, so passing it by value requires pushing a duplicated copy
to the stack. Either witing to the stack and immediately reading it, or
perhaps dereferencing addresses relative to the stack pointer in a chain
of push instructions, seems to perform quite poorly.
In a heavy TCP workload, mlx5e_handle_rx_dim() consumes 3% of CPU time,
94% of which is attributed to the first push instruction to copy
dim_sample on the stack for the call to net_dim():
// Call ktime_get()
0.26 |4ead2: call 4ead7 <mlx5e_handle_rx_dim+0x47>
// Pass the address of struct dim in %rdi
|4ead7: lea 0x3d0(%rbx),%rdi
// Set dim_sample.pkt_ctr
|4eade: mov %r13d,0x8(%rsp)
// Set dim_sample.byte_ctr
|4eae3: mov %r12d,0xc(%rsp)
// Set dim_sample.event_ctr
0.15 |4eae8: mov %bp,0x10(%rsp)
// Duplicate dim_sample on the stack
94.16 |4eaed: push 0x10(%rsp)
2.79 |4eaf1: push 0x10(%rsp)
0.07 |4eaf5: push %rax
// Call net_dim()
0.21 |4eaf6: call 4eafb <mlx5e_handle_rx_dim+0x6b>
To allow the caller to reuse the struct dim_sample already on the stack,
pass the struct dim_sample by reference to net_dim().
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Arthur Kiyanovski <akiyano@amazon.com>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Link: https://patch.msgid.link/20241031002326.3426181-2-csander@purestorage.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ETHTOOL_MSG_PHY_GET/GET_REPLY/NTF is missing in the ethtool message list.
Add it to the ethool netlink documentation.
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20241028132351.75922-1-kory.maincent@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The first -next "new features" pull request for v6.13. This is a big
one as we have not been able to send one earlier. We have also some
patches affecting other subsystems: in staging we deleted the rtl8192e
driver and in debugfs added a new interface to save struct
file_operations memory; both were acked by GregKH.
Because of the lib80211/libipw move there were quite a lot of
conflicts and to solve those we decided to merge net-next into
wireless-next.
Currently there's one conflict in
Documentation/networking/net_cachelines/net_device.rst. To fix that
just remove the iw_public_data line:
https://lore.kernel.org/all/20241011121014.674661a0@canb.auug.org.au/
And when net is merged to net-next there will be another simple
conflict in in net/mac80211/cfg.c:
https://lore.kernel.org/all/20241024115523.4cd35dde@canb.auug.org.au/
Major changes:
cfg80211/mac80211
* stop exporting wext symbols
* new mac80211 op to indicate that a new interface is to be added
* support radio separation of multi-band devices
Wireless Extensions
* move wext spy implementation to libiw
* remove iw_public_data from struct net_device
brcmfmac
* optional LPO clock support
ipw2x00
* move remaining lib80211 code into libiw
wilc1000
* WILC3000 support
rtw89
* RTL8852BE and RTL8852BE-VT BT-coexistence improvements
-----BEGIN PGP SIGNATURE-----
iQFFBAABCgAvFiEEiBjanGPFTz4PRfLobhckVSbrbZsFAmcbz9YRHGt2YWxvQGtl
cm5lbC5vcmcACgkQbhckVSbrbZsabQf8CWJ/kyonw/Z8hRxgfE/7D6Jiqoq7R+ML
8W8lbc6F5wra4eCBq/oo6UVV36Ss6mxQYcRcmLq+nCkXa4qdMpg/z55QECMHxx5Z
YnIBbD2vBrIj7W21gfCKH1WJ+b5IQFZl3zuxuCgXjxD9TJM2CjUfOkvrhrqqzrPn
clfUx5f01vfv2jdvClPR5977gFE5One/ANeRQNs7uDS0TeeD2P+61DEB1//htIJo
7GwwCyUJCeOcfWRMzQwhpoppWKcPAV70kSVJrl/fRstS68vQGSQbcx9yiNeWkSFw
JXjQGdc8eYLPzLqECwS0KwFkta6AXbafAYYXe1wdlAzr+kmJ9x5oqA==
=x+mr
-----END PGP SIGNATURE-----
Merge tag 'wireless-next-2024-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
Kalle Valo says:
====================
wireless-next patches for v6.13
The first -next "new features" pull request for v6.13. This is a big
one as we have not been able to send one earlier. We have also some
patches affecting other subsystems: in staging we deleted the rtl8192e
driver and in debugfs added a new interface to save struct
file_operations memory; both were acked by GregKH.
Because of the lib80211/libipw move there were quite a lot of
conflicts and to solve those we decided to merge net-next into
wireless-next.
Major changes:
cfg80211/mac80211
* stop exporting wext symbols
* new mac80211 op to indicate that a new interface is to be added
* support radio separation of multi-band devices
Wireless Extensions
* move wext spy implementation to libiw
* remove iw_public_data from struct net_device
brcmfmac
* optional LPO clock support
ipw2x00
* move remaining lib80211 code into libiw
wilc1000
* WILC3000 support
rtw89
* RTL8852BE and RTL8852BE-VT BT-coexistence improvements
* tag 'wireless-next-2024-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (126 commits)
mac80211: Remove NOP call to ieee80211_hw_config
wifi: iwlwifi: work around -Wenum-compare-conditional warning
wifi: mac80211: re-order assigning channel in activate links
wifi: mac80211: convert debugfs files to short fops
debugfs: add small file operations for most files
wifi: mac80211: remove misleading j_0 construction parts
wifi: mac80211_hwsim: use hrtimer_active()
wifi: mac80211: refactor BW limitation check for CSA parsing
wifi: mac80211: filter on monitor interfaces based on configured channel
wifi: mac80211: refactor ieee80211_rx_monitor
wifi: mac80211: add support for the monitor SKIP_TX flag
wifi: cfg80211: add monitor SKIP_TX flag
wifi: mac80211: add flag to opt out of virtual monitor support
wifi: cfg80211: pass net_device to .set_monitor_channel
wifi: mac80211: remove status->ampdu_delimiter_crc
wifi: cfg80211: report per wiphy radio antenna mask
wifi: mac80211: use vif radio mask to limit creating chanctx
wifi: mac80211: use vif radio mask to limit ibss scan frequencies
wifi: cfg80211: add option for vif allowed radios
wifi: iwlwifi: allow IWL_FW_CHECK() with just a string
...
====================
Link: https://patch.msgid.link/20241025170705.5F6B2C4CEC3@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add document about which modes have native XDP support.
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20241021031211.814-3-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The original link returns 404 now. This commit replaces the dead google
site link with archive.org link.
Signed-off-by: Levi Zim <rsworktech@outlook.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20241021-packet_mmap_fix_link-v1-1-dffae4a174c0@outlook.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix grammar where it improves readability.
Signed-off-by: Leo Stone <leocstone@gmail.com>
Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://lore.kernel.org/20241023041203.35313-1-leocstone@gmail.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Add a persistent NAPI config area for NAPI configuration to the core.
Drivers opt-in to setting the persistent config for a NAPI by passing an
index when calling netif_napi_add_config.
napi_config is allocated in alloc_netdev_mqs, freed in free_netdev
(after the NAPIs are deleted).
Drivers which call netif_napi_add_config will have persistent per-NAPI
settings: NAPI IDs, gro_flush_timeout, and defer_hard_irq settings.
Per-NAPI settings are saved in napi_disable and restored in napi_enable.
Co-developed-by: Martin Karsten <mkarsten@uwaterloo.ca>
Signed-off-by: Martin Karsten <mkarsten@uwaterloo.ca>
Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241011184527.16393-6-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Allow per-NAPI gro_flush_timeout setting.
The existing sysfs parameter is respected; writes to sysfs will write to
all NAPI structs for the device and the net_device gro_flush_timeout
field. Reads from sysfs will read from the net_device field.
The ability to set gro_flush_timeout on specific NAPI instances will be
added in a later commit, via netdev-genl.
Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20241011184527.16393-4-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add defer_hard_irqs to napi_struct in preparation for per-NAPI
settings.
The existing sysfs parameter is respected; writes to sysfs will write to
all NAPI structs for the device and the net_device defer_hard_irq field.
Reads from sysfs show the net_device field.
The ability to set defer_hard_irqs on specific NAPI instances will be
added in a later commit, via netdev-genl.
Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20241011184527.16393-2-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
sysctl_tcp_l3mdev_accept is read from TCP receive fast path from
tcp_v6_early_demux(),
__inet6_lookup_established,
inet_request_bound_dev_if().
Move it to netns_ipv4_read_rx.
Remove the '#ifdef CONFIG_NET_L3_MASTER_DEV' that was guarding
its definition.
Note this adds a hole of three bytes that could be filled later.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Cc: Wei Wang <weiwan@google.com>
Cc: Coco Li <lixiaoyan@google.com>
Link: https://patch.msgid.link/20241010034100.320832-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Introduce the basic infrastructure to implement the net-shaper
core functionality. Each network devices carries a net-shaper cache,
the NL get() operation fetches the data from such cache.
The cache is initially empty, will be fill by the set()/group()
operation implemented later and is destroyed at device cleanup time.
The net_shaper_fill_handle(), net_shaper_ctx_init(), and
net_shaper_generic_pre() implementations handle generic index type
attributes, despite the current caller always pass a constant value
to avoid more noise in later patches using them with different
attributes.
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Link: https://patch.msgid.link/ddd10fd645a9367803ad02fca4a5664ea5ace170.1728460186.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2024-10-08 (ice, iavf, igb, e1000e, e1000)
This series contains updates to ice, iavf, igb, e1000e, and e1000
drivers.
For ice:
Wojciech adds support for ethtool reset.
Paul adds support for hardware based VF mailbox limits for E830 devices.
Jake adjusts to a common iterator in ice_vc_cfg_qs_msg() and moves
storing of max_frame and rx_buf_len from VSI struct to the ring
structure.
Hongbo Li uses assign_bit() to replace an open-coded instance.
Markus Elfring adjusts a couple of PTP error paths to use a common,
shared exit point.
Yue Haibing removes unused declarations.
For iavf:
Yue Haibing removes unused declarations.
For igb:
Yue Haibing removes unused declarations.
For e1000e:
Takamitsu Iwai removes unneccessary writel() calls.
Joe Damato adds support for netdev-genl support to query IRQ, NAPI,
and queue information.
For e1000:
Joe Damato adds support for netdev-genl support to query IRQ, NAPI,
and queue information.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
e1000: Link NAPI instances to queues and IRQs
e1000e: Link NAPI instances to queues and IRQs
e1000e: Remove duplicated writel() in e1000_configure_tx/rx()
igb: Cleanup unused declarations
iavf: Remove unused declarations
ice: Cleanup unused declarations
ice: Use common error handling code in two functions
ice: Make use of assign_bit() API
ice: store max_frame and rx_buf_len only in ice_rx_ring
ice: consistently use q_idx in ice_vc_cfg_qs_msg()
ice: add E830 HW VF mailbox message limit support
ice: Implement ethtool reset support
====================
Link: https://patch.msgid.link/20241008233441.928802-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The doc pages under /networking/net_cachelines are unreadable because
they lack .rst formatting for the tabular text.
Add simple table markup and tidy up the table contents:
- remove dashes that represent empty cells because they render
as bullets and are not needed
- replace 'struct_*' with 'struct *' in the first column so that
sphinx can render links for any structs that appear in the docs
Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241008165329.45647-1-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The wireless-next tree was based on something older, and there
are now conflicts between -rc2 and work here. Merge net-next,
which has enough of -rc2 for the conflicts to happen, resolving
them in the process.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This patch introduces a diagnostic guide for troubleshooting Twisted
Pair Ethernet variants at OSI Layer 1. It provides detailed steps for
detecting and resolving common link issues, such as incorrect wiring,
cable damage, and power delivery problems. The guide also includes
interface verification steps and PHY-specific diagnostics.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20241004121824.1716303-1-o.rempel@pengutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Upcoming per-netns RTNL conversion needs to get rid
of shared hash tables.
fib_info_devhash[] is one of them.
It is unclear why we used a hash table, because
a single hlist_head per net device was cheaper and scalable.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20241004134720.579244-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
SOF_TIMESTAMPING_OPT_ID socket option flag gives a way to correlate TX
timestamps and packets sent via socket. Unfortunately, there is no way
to reliably predict socket timestamp ID value in case of error returned
by sendmsg. For UDP sockets it's impossible because of lockless
nature of UDP transmit, several threads may send packets in parallel. In
case of RAW sockets MSG_MORE option makes things complicated. More
details are in the conversation [1].
This patch adds new control message type to give user-space
software an opportunity to control the mapping between packets and
values by providing ID with each sendmsg for UDP sockets.
The documentation is also added in this patch.
[1] https://lore.kernel.org/netdev/CALCETrU0jB+kg0mhV6A8mrHfTE1D1pr1SD_B9Eaa9aDPfgHdtA@mail.gmail.com/
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Link: https://patch.msgid.link/20241001125716.2832769-2-vadfed@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix multiple grammatical issues and add a missing period to improve
readability.
Signed-off-by: Leo Stone <leocstone@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240929005001.370991-1-leocstone@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since commit 8380c81d5c ("net: Treat __napi_schedule_irqoff() as
__napi_schedule() on PREEMPT_RT"), napi_schedule_irqoff will do the
right thing if IRQs are threaded. Therefore, there is no need to use
IRQF_NO_THREAD.
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://patch.msgid.link/20240930153955.971657-1-sean.anderson@linux.dev
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The iptables example was added in commit d2f26037a3 (netfilter: Add
documentation for tproxy, 2008-10-08), but xt_socket 'transparent'
option was added in commit a31e1ffd22 (netfilter: xt_socket: added new
revision of the 'socket' match supporting flags, 2009-06-09).
Now add the 'transparent' option to the iptables example to ignore
non-transparent sockets, which is also consistent with the nft example.
Signed-off-by: 谢致邦 (XIE Zhibang) <Yeking@Red54.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Fix a missing end of phrase in the documentation. It describes the
ETHTOOL_A_C33_PSE_ACTUAL_PW attribute, which was not fully explained.
Also, fix grammar issues by using simple present tense instead of
present continuous.
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240912090550.743174-1-kory.maincent@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When SHAMPO can't identify the protocol/header of a packet, it will
yield a packet that is not split - all the packet is in the data part.
Count this value in packets and bytes.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20240911201757.1505453-15-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ENA Express metrics, called `ena_srd` are exposed to
customers via `ethtool`.
The metrics allow customers to check the configuration
(mode), tx/rx counters as well as resource utilization.
The documentation is also updated to provide a general
explanation about ENA Express as well as links for further
information about metrics and configurations.
Signed-off-by: Igor Chauskin <igorch@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Link: https://patch.msgid.link/20240909084704.13856-2-darinzon@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The IEEE 802.3cg project defines two 10 Mbit/s PHYs operating over a
single pair of conductors. The 10BASE-T1L (Clause 146) is a long reach
PHY supporting full duplex point-to-point operation over 1 km of single
balanced pair of conductors. The 10BASE-T1S (Clause 147) is a short reach
PHY supporting full / half duplex point-to-point operation over 15 m of
single balanced pair of conductors, or half duplex multidrop bus
operation over 25 m of single balanced pair of conductors.
Furthermore, the IEEE 802.3cg project defines the new Physical Layer
Collision Avoidance (PLCA) Reconciliation Sublayer (Clause 148) meant to
provide improved determinism to the CSMA/CD media access method. PLCA
works in conjunction with the 10BASE-T1S PHY operating in multidrop mode.
The aforementioned PHYs are intended to cover the low-speed / low-cost
applications in industrial and automotive environment. The large number
of pins (16) required by the MII interface, which is specified by the
IEEE 802.3 in Clause 22, is one of the major cost factors that need to be
addressed to fulfil this objective.
The MAC-PHY solution integrates an IEEE Clause 4 MAC and a 10BASE-T1x PHY
exposing a low pin count Serial Peripheral Interface (SPI) to the host
microcontroller. This also enables the addition of Ethernet functionality
to existing low-end microcontrollers which do not integrate a MAC
controller.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Parthiban Veerasooran <Parthiban.Veerasooran@microchip.com>
Link: https://patch.msgid.link/20240909082514.262942-2-Parthiban.Veerasooran@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add documentation outlining the usage and details of devmem TCP.
Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20240910171458.219195-12-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
An MPTCP firewall blackhole can be detected if the following SYN
retransmission after a fallback to "plain" TCP is accepted.
In case of blackhole, a similar technique to the one in place with TFO
is now used: MPTCP can be disabled for a certain period of time, 1h by
default. This time period will grow exponentially when more blackhole
issues get detected right after MPTCP is re-enabled and will reset to
the initial value when the blackhole issue goes away.
The blackhole period can be modified thanks to a new sysctl knob:
blackhole_timeout. Two new MIB counters help understanding what's
happening:
- 'Blackhole', incremented when a blackhole is detected.
- 'MPCapableSYNTXDisabled', incremented when an MPTCP connection
directly falls back to TCP during the blackhole period.
Because the technique is inspired by the one used by TFO, an important
part of the new code is similar to what can find in tcp_fastopen.c, with
some adaptations to the MPTCP case.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/57
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240909-net-next-mptcp-fallback-x-mpc-v1-3-da7ebb4cd2a3@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
HW-Managed Flow Steering in mlx5 driver
Yevgeny Kliteynik says:
=======================
1. Overview
-----------
ConnectX devices support packet matching, modification, and redirection.
This functionality is referred as Flow Steering.
To configure a steering rule, the rule is written to the device-owned
memory. This memory is accessed and cached by the device when processing
a packet.
The first implementation of Flow Steering was done in FW, and it is
referred in the mlx5 driver as Device-Managed Flow Steering (DMFS).
Later we introduced SW-managed Flow Steering (SWS or SMFS), where the
driver is writing directly to the device's configuration memory (ICM)
through RC QP using RDMA operations (RDMA-read and RDAM-write), thus
achieving higher rates of rule insertion/deletion.
Now we introduce a new flow steering implementation: HW-Managed Flow
Steering (HWS or HMFS).
In this new approach, the driver is configuring steering rules directly
to the HW using the WQs with a special new type of WQE. This way we can
reach higher rule insertion/deletion rate with much lower CPU utilization
compared to SWS.
The key benefits of HWS as opposed to SWS:
+ HW manages the steering decision tree
- HW calculates CRC for each entry
- HW handles tree hash collisions
- HW & FW manage objects refcount
+ HW keeps cache coherency:
- HW provides tree access locking and synchronization
- HW provides notification on completion
+ Insertion rate isn’t affected by background traffic
- Dedicated HW components that handle insertion
2. Performance
--------------
Measuring Connection Tracking with simple IPv4 flows w/o NAT, we
are able to get ~5 times more flows offloaded per second using HWS.
3. Configuration
----------------
The enablement of HWS mode in eswitch manager is done using the same
devlink param that is already used for switching between FW-managed
steering and SW-managed steering modes:
# devlink dev param set pci/<PCI_ID> name flow_steering_mode cmod runtime value hmfs
4. Upstream Submission
----------------------
HWS support consists of 3 main components:
+ Steering:
- The lower layer that exposes HWS API to upper layers and implements
all the management of flow steering building blocks
+ FS-Core
- Implementation of fs_hws layer to enable fs_core to use HWS instead
of FW or SW steering
- Create HW steering action pools to utilize the ability of HWS to
share steering actions among different rules
- Add support for configuring HWS mode through devlink command,
similar to configuring SWS mode
+ Connection Tracking
- Implementation of CT support for HW steering
- Hooks up the CT ops for the new steering mode and uses the HWS API
to implement connection tracking.
Because of the large number of patches, we need to perform the submission
in several separate patch series. This series is the first submission that
lays the ground work for the next submissions, where an actual user of HWS
will be added.
5. Patches in this series
-------------------------
This patch series contains implementation of the first bullet from above.
The patches are:
[patch 01/15] net/mlx5: Added missing mlx5_ifc definition for HW Steering
[patch 02/15] net/mlx5: Added missing definitions in preparation for HW Steering
[patch 03/15] net/mlx5: HWS, added actions handling
[patch 04/15] net/mlx5: HWS, added tables handling
[patch 05/15] net/mlx5: HWS, added rules handling
[patch 06/15] net/mlx5: HWS, added definers handling
[patch 07/15] net/mlx5: HWS, added matchers functionality
[patch 08/15] net/mlx5: HWS, added FW commands handling
[patch 09/15] net/mlx5: HWS, added modify header pattern and args handling
[patch 10/15] net/mlx5: HWS, added vport handling
[patch 11/15] net/mlx5: HWS, added memory management handling
[patch 12/15] net/mlx5: HWS, added backward-compatible API handling
[patch 13/15] net/mlx5: HWS, added debug dump and internal headers
[patch 14/15] net/mlx5: HWS, added send engine and context handling
[patch 15/15] net/mlx5: HWS, added API and enabled HWS support
=======================
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmbfOf4ACgkQSD+KveBX
+j7hWgf/UzlKp8uyqb+7MpMWP6EgT8WUwWdpDfAr1jubIFz7e+VGaA/7QCThe89u
alcgYvIDQCGrpB/0qXY+kaKPvqOej2wPCiLU7K2JB5y20pZ/RATlFuFBZjsMzufJ
7NxzZgTPUDz+8OWK0mm0LxEJRJYoJ69gAnR0jvLGx9uSjv/f9lNICvWBaI58hkzb
HJa6sJNBiFj4EnkipxWCP0GQ4dddMkgCIVYb91FtlBA4SGZtmPS35NqQJKtGnKF3
ZhZuaTeRdw8bFDJnhbu0ur9cs4EUorZE5QBWhoHYN0zFZF4JmqCCC1HvCS7LEKaU
PgtREk20H2jPIRFwQuX05D6M4zSizg==
=AsB3
-----END PGP SIGNATURE-----
Merge tag 'mlx5-updates-2024-09-02' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2024-08-29
HW-Managed Flow Steering in mlx5 driver
Yevgeny Kliteynik says:
=======================
1. Overview
-----------
ConnectX devices support packet matching, modification, and redirection.
This functionality is referred as Flow Steering.
To configure a steering rule, the rule is written to the device-owned
memory. This memory is accessed and cached by the device when processing
a packet.
The first implementation of Flow Steering was done in FW, and it is
referred in the mlx5 driver as Device-Managed Flow Steering (DMFS).
Later we introduced SW-managed Flow Steering (SWS or SMFS), where the
driver is writing directly to the device's configuration memory (ICM)
through RC QP using RDMA operations (RDMA-read and RDAM-write), thus
achieving higher rates of rule insertion/deletion.
Now we introduce a new flow steering implementation: HW-Managed Flow
Steering (HWS or HMFS).
In this new approach, the driver is configuring steering rules directly
to the HW using the WQs with a special new type of WQE. This way we can
reach higher rule insertion/deletion rate with much lower CPU utilization
compared to SWS.
The key benefits of HWS as opposed to SWS:
+ HW manages the steering decision tree
- HW calculates CRC for each entry
- HW handles tree hash collisions
- HW & FW manage objects refcount
+ HW keeps cache coherency:
- HW provides tree access locking and synchronization
- HW provides notification on completion
+ Insertion rate isn’t affected by background traffic
- Dedicated HW components that handle insertion
2. Performance
--------------
Measuring Connection Tracking with simple IPv4 flows w/o NAT, we
are able to get ~5 times more flows offloaded per second using HWS.
3. Configuration
----------------
The enablement of HWS mode in eswitch manager is done using the same
devlink param that is already used for switching between FW-managed
steering and SW-managed steering modes:
# devlink dev param set pci/<PCI_ID> name flow_steering_mode cmod runtime value hmfs
4. Upstream Submission
----------------------
HWS support consists of 3 main components:
+ Steering:
- The lower layer that exposes HWS API to upper layers and implements
all the management of flow steering building blocks
+ FS-Core
- Implementation of fs_hws layer to enable fs_core to use HWS instead
of FW or SW steering
- Create HW steering action pools to utilize the ability of HWS to
share steering actions among different rules
- Add support for configuring HWS mode through devlink command,
similar to configuring SWS mode
+ Connection Tracking
- Implementation of CT support for HW steering
- Hooks up the CT ops for the new steering mode and uses the HWS API
to implement connection tracking.
Because of the large number of patches, we need to perform the submission
in several separate patch series. This series is the first submission that
lays the ground work for the next submissions, where an actual user of HWS
will be added.
5. Patches in this series
-------------------------
This patch series contains implementation of the first bullet from above.
=======================
* tag 'mlx5-updates-2024-09-02' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5: HWS, added API and enabled HWS support
net/mlx5: HWS, added send engine and context handling
net/mlx5: HWS, added debug dump and internal headers
net/mlx5: HWS, added backward-compatible API handling
net/mlx5: HWS, added memory management handling
net/mlx5: HWS, added vport handling
net/mlx5: HWS, added modify header pattern and args handling
net/mlx5: HWS, added FW commands handling
net/mlx5: HWS, added matchers functionality
net/mlx5: HWS, added definers handling
net/mlx5: HWS, added rules handling
net/mlx5: HWS, added tables handling
net/mlx5: HWS, added actions handling
net/mlx5: Added missing definitions in preparation for HW Steering
net/mlx5: Added missing mlx5_ifc definition for HW Steering
====================
Link: https://patch.msgid.link/20240909181250.41596-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
introduce a new flag SOF_TIMESTAMPING_OPT_RX_FILTER in the receive
path. User can set it with SOF_TIMESTAMPING_SOFTWARE to filter
out rx software timestamp report, especially after a process turns on
netstamp_needed_key which can time stamp every incoming skb.
Previously, we found out if an application starts first which turns on
netstamp_needed_key, then another one only passing SOF_TIMESTAMPING_SOFTWARE
could also get rx timestamp. Now we handle this case by introducing this
new flag without breaking users.
Quoting Willem to explain why we need the flag:
"why a process would want to request software timestamp reporting, but
not receive software timestamp generation. The only use I see is when
the application does request
SOF_TIMESTAMPING_SOFTWARE | SOF_TIMESTAMPING_TX_SOFTWARE."
Similarly, this new flag could also be used for hardware case where we
can set it with SOF_TIMESTAMPING_RAW_HARDWARE, then we won't receive
hardware receive timestamp.
Another thing about errqueue in this patch I have a few words to say:
In this case, we need to handle the egress path carefully, or else
reporting the tx timestamp will fail. Egress path and ingress path will
finally call sock_recv_timestamp(). We have to distinguish them.
Errqueue is a good indicator to reflect the flow direction.
Suggested-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240909015612.3856-2-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This adds support to show firmware version information for both stored and
running firmware versions. The version and commit is displayed separately
to aid monitoring tools which only care about the version.
Example output:
# devlink dev info
pci/0000:01:00.0:
driver fbnic
serial_number 88-25-08-ff-ff-01-50-92
versions:
running:
fw 24.07.15-017
fw.commit h999784ae9df0
fw.bootloader 24.07.10-000
fw.bootloader.commit hfef3ac835ce7
stored:
fw 24.07.24-002
fw.commit hc9d14a68b3f2
fw.bootloader 24.07.22-000
fw.bootloader.commit h922f8493eb96
fw.undi 01.00.03-000
Signed-off-by: Lee Trager <lee@trager.us>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20240905233820.1713043-1-lee@trager.us
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
In commit 6f8b12d661 ("net: napi: add hard irqs deferral feature")
napi_defer_irqs was added to net_device and napi_defer_irqs_count was
added to napi_struct, both as type int.
This value never goes below zero, so there is not reason for it to be a
signed int. Change the type for both from int to u32, and add an
overflow check to sysfs to limit the value to S32_MAX.
The limit of S32_MAX was chosen because the practical limit before this
patch was S32_MAX (anything larger was an overflow) and thus there are
no behavioral changes introduced. If the extra bit is needed in the
future, the limit can be raised.
Before this patch:
$ sudo bash -c 'echo 2147483649 > /sys/class/net/eth4/napi_defer_hard_irqs'
$ cat /sys/class/net/eth4/napi_defer_hard_irqs
-2147483647
After this patch:
$ sudo bash -c 'echo 2147483649 > /sys/class/net/eth4/napi_defer_hard_irqs'
bash: line 0: echo: write error: Numerical result out of range
Similarly, /sys/class/net/XXXXX/tx_queue_len is defined as unsigned:
include/linux/netdevice.h: unsigned int tx_queue_len;
And has an overflow check:
dev_change_tx_queue_len(..., unsigned long new_len):
if (new_len != (unsigned int)new_len)
return -ERANGE;
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20240904153431.307932-1-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ability to handle maximum FCoE frames of 2158 bytes can never be changed
and thus more of an attribute, not a toggleable feature.
Move it from netdev_features_t to "cold" priv flags (bitfield bool) and
free yet another feature bit.
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
"Interface can't change network namespaces" is rather an attribute,
not a feature, and it can't be changed via Ethtool.
Make it a "cold" private flag instead of a netdev_feature and free
one more bit.
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
NETIF_F_LLTX can't be changed via Ethtool and is not a feature,
rather an attribute, very similar to IFF_NO_QUEUE (and hot).
Free one netdev_features_t bit and make it a "hot" private flag.
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Make dev->priv_flags `u32` back and define bits higher than 31 as
bitfield booleans as per Jakub's suggestion. This simplifies code
which accesses these bits with no optimization loss (testb both
before/after), allows to not extend &netdev_priv_flags each time,
but also scales better as bits > 63 in the future would only add
a new u64 to the structure with no complications, comparing to
that extending ::priv_flags would require converting it to a bitmap.
Note that I picked `unsigned long :1` to not lose any potential
optimizations comparing to `bool :1` etc.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Two fields, page_pools and *irq_moder, were added to struct net_device
in commit 083772c9f9 ("net: page_pool: record pools per netdev") and
commit f750dfe825 ("ethtool: provide customized dim profile
management"), respectively.
Add both to the net_cachelines documentation, as well.
Signed-off-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20240829155742.366584-1-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Extend the ethtool netlink cable testing interface by adding support for
specifying the source of cable testing results. This allows users to
differentiate between results obtained through different diagnostic
methods.
For example, some TI 10BaseT1L PHYs provide two variants of cable
diagnostics: Time Domain Reflectometry (TDR) and Active Link Cable
Diagnostic (ALCD). By introducing `ETHTOOL_A_CABLE_RESULT_SRC` and
`ETHTOOL_A_CABLE_FAULT_LENGTH_SRC` attributes, this update enables
drivers to indicate whether the result was derived from TDR or ALCD,
improving the clarity and utility of diagnostic information.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20240822120703.1393130-2-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The newly introduced phy_link_topology tracks all ethernet PHYs that are
attached to a netdevice. Document the base principle, internal and
external APIs. As the phy_link_topology is expected to be extended, this
documentation will hold any further improvements and additions made
relative to topology handling.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
As we have the ability to track the PHYs connected to a net_device
through the link_topology, we can expose this list to userspace. This
allows userspace to use these identifiers for phy-specific commands and
take the decision of which PHY to target by knowing the link topology.
Add PHY_GET and PHY_DUMP, which can be a filtered DUMP operation to list
devices on only one interface.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some netlink commands are target towards ethernet PHYs, to control some
of their features. As there's several such commands, add the ability to
pass a PHY index in the ethnl request, which will populate the generic
ethnl_req_info with the passed phy_index.
Add a helper that netlink command handlers need to use to grab the
targeted PHY from the req_info. This helper needs to hold rtnl_lock()
while interacting with the PHY, as it may be removed at any point.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Following commit 9f7e8fbb91 ("net/mlx5: offset comp irq index in name by one"),
which fixed the index in IRQ name to start once again from 0, we change
the documentation accordingly.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Link: https://patch.msgid.link/20240815142343.2254247-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Correct spelling problems for Documentation/networking/ as reported
by ispell.
Signed-off-by: Jing-Ping Jan <zoo868e@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240812170910.5760-1-zoo868e@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Applications may want to deal with dynamic RSS contexts only.
So dumping context 0 will be counter-productive for them.
Support starting the dump from a given context ID.
Alternative would be to implement a dump flag to skip just
context 0, not sure which is better...
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
l2tp no longer uses sk_user_data in tunnel sockets and now manages
tunnel/session lifetimes slightly differently. Update docs to cover
this.
CC: linux-doc@vger.kernel.org
CC: corbet@lwn.net
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
draft-ietf-6man-pio-pflag is adding a new flag to the Prefix Information
Option to signal that the network can allocate a unique IPv6 prefix per
client via DHCPv6-PD (see draft-ietf-v6ops-dhcp-pd-per-device).
When ra_honor_pio_pflag is enabled, the presence of a P-flag causes
SLAAC autoconfiguration to be disabled for that particular PIO.
An automated test has been added in Android (r.android.com/3195335) to
go along with this change.
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: David Lamparter <equinox@opensourcerouting.org>
Cc: Simon Horman <horms@kernel.org>
Signed-off-by: Patrick Rohr <prohr@google.com>
Reviewed-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The response to a GET request in Netlink should fully identify
the queried object. RSS_GET accepts context id as an input,
so it must echo that attribute back to the response.
After (assuming context 1 has been created):
$ ./cli.py --spec netlink/specs/ethtool.yaml \
--do rss-get \
--json '{"header": {"dev-index": 2}, "context": 1}'
{'context': 1,
'header': {'dev-index': 2, 'dev-name': 'eth0'},
[...]
Fixes: 7112a04664 ("ethtool: add netlink based get rss support")
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20240724234249.2621109-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZqIl1AAKCRDbK58LschI
g/MdAP9oyZV9/IZ6Y6Z1fWfio0SB+yJGugcwbFjWcEtNrzsqJQEAwipQnemAI4NC
HBMfK2a/w7vhAFMXrP/SbkB/gUJJ7QE=
=vovf
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2024-07-25
We've added 14 non-merge commits during the last 8 day(s) which contain
a total of 19 files changed, 177 insertions(+), 70 deletions(-).
The main changes are:
1) Fix af_unix to disable MSG_OOB handling for sockets in BPF sockmap and
BPF sockhash. Also add test coverage for this case, from Michal Luczaj.
2) Fix a segmentation issue when downgrading gso_size in the BPF helper
bpf_skb_adjust_room(), from Fred Li.
3) Fix a compiler warning in resolve_btfids due to a missing type cast,
from Liwei Song.
4) Fix stack allocation for arm64 to align the stack pointer at a 16 byte
boundary in the fexit_sleep BPF selftest, from Puranjay Mohan.
5) Fix a xsk regression to require a flag when actuating tx_metadata_len,
from Stanislav Fomichev.
6) Fix function prototype BTF dumping in libbpf for prototypes that have
no input arguments, from Andrii Nakryiko.
7) Fix stacktrace symbol resolution in perf script for BPF programs
containing subprograms, from Hou Tao.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: Add XDP_UMEM_TX_METADATA_LEN to XSK TX metadata test
xsk: Require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len
bpf: Fix a segment issue when downgrading gso_size
tools/resolve_btfids: Fix comparison of distinct pointer types warning in resolve_btfids
bpf, events: Use prog to emit ksymbol event for main program
selftests/bpf: Test sockmap redirect for AF_UNIX MSG_OOB
selftests/bpf: Parametrize AF_UNIX redir functions to accept send() flags
selftests/bpf: Support SOCK_STREAM in unix_inet_redir_to_connected()
af_unix: Disable MSG_OOB handling for sockets in sockmap/sockhash
bpftool: Fix typo in usage help
libbpf: Fix no-args func prototype BTF dumping syntax
MAINTAINERS: Update powerpc BPF JIT maintainers
MAINTAINERS: Update email address of Naveen
selftests/bpf: fexit_sleep: Fix stack allocation for arm64
====================
Link: https://patch.msgid.link/20240725114312.32197-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Julian reports that commit 341ac980ea ("xsk: Support tx_metadata_len")
can break existing use cases which don't zero-initialize xdp_umem_reg
padding. Introduce new XDP_UMEM_TX_METADATA_LEN to make sure we
interpret the padding as tx_metadata_len only when being explicitly
asked.
Fixes: 341ac980ea ("xsk: Support tx_metadata_len")
Reported-by: Julian Schindel <mail@arctic-alpaca.de>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20240713015253.121248-2-sdf@fomichev.me
DT Bindings:
- Convert and add a bunch of IBM FSI related bindings
- Add a new schema listing legacy compatibles which will (probably)
never be documented. This will silence various checks warning about
them.
- Add bindings for Sierra Wireless mangOH Green SPI IoT interface, new
Arm 2024 Cortex and Neoverse CPUs, QCom sc8180x PDC, QCom SDX75 GPI
DMA, imx8mp/imx8qxp fsl,irqsteer, and Renesas RZ/G2UL CRU and CSI-2
blocks
- Convert Spreadtrum sprd-timer, FSL cpm_qe, FSL fsl,ls-scfg-msi, FSL
q(b)man-*, FSL qoriq-mc, and img,pdc-wdt bindings to DT schema
- Drop obsolete stericsson,abx500.txt
DT core:
- Update dtc to upstream version v1.7.0-93-g1df7b047fe43
- Add support to run DT validation on DTs with applied overlays
- Add helper for creating boolean properties in dynamic nodes and use
that for dynamic PCI nodes
- Clean-up early parsing of '#{address,size}-cells'
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEktVUI4SxYhzZyEuo+vtdtY28YcMFAmaW6UAACgkQ+vtdtY28
YcOyHRAAoDbhRxRtsF7pWwbiaEFi4y7yTyX6ogxGM3gL5xoXmT7Xri0OWakbHcTp
gfy9mWdeI9lw4eEheGDiX7qI66ax8SuuQjZ96wxMvsflFhnaLsL+088G208uGCMU
BuJroP2hvgOixeNi4hyy9ia2j036VpLLTqLHHFK7kzC7NCX2cWpaV2Tk7knHV8OY
OrJIUeRhcaTmotBJB0A2G+AkHTXQkfR1FdULvIQP8dewA2RI7R2Y6jffmh53gK+f
hLo1geUBVWe8y8xNjz9LVDYxrKPawAPOwO/n92kaSdw780suRUs4oq4L2+o1rYzV
sXTfx3+pZuL80FfTPheT4mHTTMZ2Hhq2wa4u2CWK4SHwv9KFBefYp6w7nlMELkM/
BQ1YLjtPh/GhywDa1TxGWPOha3wPFCewBNJuo4MrHKjhvSKBn7OPCdyNPBAahwQa
jFypbcWFhtcXtNTa4M9LhGJLlNK4RpTp4RGRcYvTNtZSa0TTUVz+1jvQ4ToPnXIf
C5VV1c370NpRJ1BUGeY8R4k946hzJAOxgaMGlkLaW90Cwn16VTCy666R9hwI1nx5
vdftlbgTHbZ/KOe6zTM6ywOsol8na1Wk7rqyfKR2vWHnmtj/DvFrKwXvBiKR0SuN
ru7vdOdi13YxcOmkgPoso+kBf1V0qELzxyrC4I8gPiOm68bPLZg=
=tjMz
-----END PGP SIGNATURE-----
Merge tag 'devicetree-for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull devicetree updates from Rob Herring:
"DT Bindings:
- Convert and add a bunch of IBM FSI related bindings
- Add a new schema listing legacy compatibles which will (probably)
never be documented. This will silence various checks warning about
them.
- Add bindings for Sierra Wireless mangOH Green SPI IoT interface,
new Arm 2024 Cortex and Neoverse CPUs, QCom sc8180x PDC, QCom SDX75
GPI DMA, imx8mp/imx8qxp fsl,irqsteer, and Renesas RZ/G2UL CRU and
CSI-2 blocks
- Convert Spreadtrum sprd-timer, FSL cpm_qe, FSL fsl,ls-scfg-msi, FSL
q(b)man-*, FSL qoriq-mc, and img,pdc-wdt bindings to DT schema
- Drop obsolete stericsson,abx500.txt
DT core:
- Update dtc to upstream version v1.7.0-93-g1df7b047fe43
- Add support to run DT validation on DTs with applied overlays
- Add helper for creating boolean properties in dynamic nodes and use
that for dynamic PCI nodes
- Clean-up early parsing of '#{address,size}-cells'"
* tag 'devicetree-for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (39 commits)
dt-bindings: timer: sprd-timer: convert to YAML
dt-bindings: incomplete-devices: document devices without bindings
dt-bindings: trivial-devices: document the Sierra Wireless mangOH Green SPI IoT interface
scripts/dtc: Update to upstream version v1.7.0-93-g1df7b047fe43
dt-bindings: soc: fsl: Add fsl,ls1028a-reset for reset syscon node
dt-bindings: soc: fsl: cpm_qe: convert to yaml format
dt-bindings: i2c: i2c-fsi: Convert to json-schema
dt-bindings: fsi: Document the FSI Hub Controller
dt-bindings: fsi: Document the AST2700 FSI controller
dt-bindings: fsi: ast2600-fsi-master: Convert to json-schema
dt-bindings: fsi: ibm,i2cr-fsi-master: Reference common FSI controller
dt-bindings: fsi: Document the FSI controller common properties
dt-bindings: fsi: Document the IBM SBEFIFO engine
dt-bindings: fsi: p9-occ: Convert to json-schema
dt-bindings: fsi: Document the IBM SCOM engine
dt-bindings: fsi: fsi2spi: Document SPI controller child nodes
dt-bindings: interrupt-controller: convert fsl,ls-scfg-msi to yaml
dt-bindings: soc: fsl: Convert q(b)man-* to yaml format
dt-bindings: misc: fsl,qoriq-mc: convert to yaml format
dt-bindings: drop stale Anson Huang from maintainers
...
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2024-07-11 (net/intel)
This series contains updates to most Intel network drivers.
Tony removes MODULE_AUTHOR from drivers containing the entry.
Simon Horman corrects a kdoc entry for i40e.
Pawel adds implementation for devlink param "local_forwarding" on ice.
Michal removes unneeded call, and code, for eswitch rebuild for ice.
Sasha removed a no longer used field from igc.
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
igc: Remove the internal 'eee_advert' field
ice: remove eswitch rebuild
ice: Add support for devlink local_forwarding param
i40e: correct i40e_addr_to_hkey() name in kdoc
net: intel: Remove MODULE_AUTHORs
====================
Link: https://patch.msgid.link/20240711201932.2019925-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Cross-merge networking fixes after downstream PR.
Conflicts:
net/sched/act_ct.c
26488172b0 ("net/sched: Fix UAF when resolving a clash")
3abbd7ed8b ("act_ct: prepare for stolen verdict coming from conntrack and nat engine")
No adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add support for driver-specific devlink local_forwarding param.
Supported values are "enabled", "disabled" and "prioritized".
Default configuration is set to "enabled".
Add documentation in networking/devlink/ice.rst.
In previous generations of Intel NICs the transmit scheduler was only
limited by PCIe bandwidth when scheduling/assigning hairpin-bandwidth
between VFs. Changes to E810 HW design introduced scheduler limitation,
so that available hairpin-bandwidth is bound to external port speed.
In order to address this limitation and enable NFV services such as
"service chaining" a knob to adjust the scheduler config was created.
Driver can send a configuration message to the FW over admin queue and
internal FW logic will reconfigure HW to prioritize and add more BW to
VF to VF traffic. An end result, for example, 10G port will no longer
limit hairpin-bandwidth to 10G and much higher speeds can be achieved.
Devlink local_forwarding param set to "prioritized" enables higher
hairpin-bandwitdh on related PFs. Configuration is applicable only to
8x10G and 4x25G cards.
Changing local_forwarding configuration will trigger CORER reset in
order to take effect.
Example command to change current value:
devlink dev param set pci/0000:b2:00.3 name local_forwarding \
value prioritized \
cmode runtime
Co-developed-by: Michal Wilczynski <michal.wilczynski@intel.com>
Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Pawel Kaminski <pawel.kaminski@intel.com>
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Convert fsl,qoriq-mc from txt to yaml format.
Addition changes:
- Child node name allow 'ethernet'.
- Use 32bit address in example.
- Fixed missed ';' in example.
- Allow dma-coherent.
- Remove smmu, its part in example.
- Change child node name as 'ethernet'
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Link: https://lore.kernel.org/r/20240617170934.813321-1-Frank.Li@nxp.com
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Correct the example to match the help text from the devlink utility.
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch expands the status information provided by ethtool for PSE c33
with available power limit and available power limit ranges. It also adds
a call to pse_ethtool_set_pw_limit() to configure the PSE control power
limit.
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/20240704-feature_poe_power_cap-v6-5-320003204264@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This update expands the status information provided by ethtool for PSE c33.
It includes details such as the detected class, current power delivered,
and extended state information.
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/20240704-feature_poe_power_cap-v6-1-320003204264@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
CMIS compliant modules such as QSFP-DD might be running a firmware that
can be updated in a vendor-neutral way by exchanging messages between
the host and the module as described in section 7.3.1 of revision 5.2 of
the CMIS standard.
Add a pair of new ethtool messages that allow:
* User space to trigger firmware update of transceiver modules
* The kernel to notify user space about the progress of the process
The user interface is designed to be asynchronous in order to avoid
RTNL being held for too long and to allow several modules to be
updated simultaneously. The interface is designed with CMIS compliant
modules in mind, but kept generic enough to accommodate future use
cases, if these arise.
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The NetDIM library, currently leveraged by an array of NICs, delivers
excellent acceleration benefits. Nevertheless, NICs vary significantly
in their dim profile list prerequisites.
Specifically, virtio-net backends may present diverse sw or hw device
implementation, making a one-size-fits-all parameter list impractical.
On Alibaba Cloud, the virtio DPU's performance under the default DIM
profile falls short of expectations, partly due to a mismatch in
parameter configuration.
I also noticed that ice/idpf/ena and other NICs have customized
profilelist or placed some restrictions on dim capabilities.
Motivated by this, I tried adding new params for "ethtool -C" that provides
a per-device control to modify and access a device's interrupt parameters.
Usage
========
The target NIC is named ethx.
Assume that ethx only declares support for rx profile setting
(with DIM_PROFILE_RX flag set in profile_flags) and supports modification
of usec and pkt fields.
1. Query the currently customized list of the device
$ ethtool -c ethx
...
rx-profile:
{.usec = 1, .pkts = 256, .comps = n/a,},
{.usec = 8, .pkts = 256, .comps = n/a,},
{.usec = 64, .pkts = 256, .comps = n/a,},
{.usec = 128, .pkts = 256, .comps = n/a,},
{.usec = 256, .pkts = 256, .comps = n/a,}
tx-profile: n/a
2. Tune
$ ethtool -C ethx rx-profile 1,1,n_2,n,n_3,3,n_4,4,n_n,5,n
"n" means do not modify this field.
$ ethtool -c ethx
...
rx-profile:
{.usec = 1, .pkts = 1, .comps = n/a,},
{.usec = 2, .pkts = 256, .comps = n/a,},
{.usec = 3, .pkts = 3, .comps = n/a,},
{.usec = 4, .pkts = 4, .comps = n/a,},
{.usec = 256, .pkts = 5, .comps = n/a,}
tx-profile: n/a
3. Hint
If the device does not support some type of customized dim profiles,
the corresponding "n/a" will display.
If the "n/a" field is being modified, -EOPNOTSUPP will be reported.
Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240621101353.107425-4-hengqi@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQFHBAABCgAxFiEEUEC6huC2BN0pvD5fKDiiPnotvG8FAmZ1MB4THG1rbEBwZW5n
dXRyb25peC5kZQAKCRAoOKI+ei28b+QQB/9wDkLdjgRsvI3c32IXxTrOmuX97Pm2
jYaHXmH1QkOKK0CXkPLEGjzpxSVXO6/d+ooeINkKmeF9VrtxxYWuE9pEsulyPz4y
dbeK5oAZdhtS6CEnArzbNHR6mdP1IcmJ3uKojsV71t+4GTdZw4WHe6LNgC4ua7n8
oYOpgbEV/YP/AyP8vO1L65M+tI69/0jeDpidtDQkNgWepE/Kh0W0ASfqjYrBgvZq
eycUmLTCigKNhG8fNJVHejyrnLynY2QBqwM7M5yYw1q8lX4W8Yf1y55VCqzPz63D
RiYVdVdG720jk9Ux9ZqxAKBPyzACnV6CG/MnzIgIdpV3O1aEf1WDkU2Y
=rRB9
-----END PGP SIGNATURE-----
Merge tag 'linux-can-next-for-6.11-20240621' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
Marc Kleine-Budde says:
====================
pull-request: can-next 2024-06-21
The first 2 patches are by Andy Shevchenko, one cleans up the includes
in the mcp251x driver, the other one updates the sja100 plx_pci driver
to make use of predefines PCI subvendor ID.
Mans Rullgard's patch cleans up the Kconfig help text of for the slcan
driver.
Oliver Hartkopp provides a patch to update the documentation, which
removes the ISO 15675-2 specification version where possible.
The next 2 patches are by Harini T and update the documentation of the
xilinx_can driver.
Francesco Valla provides documentation for the ISO 15765-2 protocol.
A patch by Dr. David Alan Gilbert removes an unused struct from the
mscan driver.
12 patches are by Martin Jocic. The first three add support for 3 new
devices to the kvaser_usb driver. The remaining 9 first clean up the
kvaser_pciefd driver, and then add support for MSI.
Krzysztof Kozlowski contributes 3 patches simplifies the CAN SPI
drivers by making use of spi_get_device_match_data().
The last patch is by Martin Hundebøll, which reworks the m_can driver
to not enable the CAN transceiver during probe.
* tag 'linux-can-next-for-6.11-20240621' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next: (24 commits)
can: m_can: don't enable transceiver when probing
can: mcp251xfd: simplify with spi_get_device_match_data()
can: mcp251x: simplify with spi_get_device_match_data()
can: hi311x: simplify with spi_get_device_match_data()
can: kvaser_pciefd: Add MSI interrupts
can: kvaser_pciefd: Move reset of DMA RX buffers to the end of the ISR
can: kvaser_pciefd: Change name of return code variable
can: kvaser_pciefd: Rename board_irq to pci_irq
can: kvaser_pciefd: Add unlikely
can: kvaser_pciefd: Add inline
can: kvaser_pciefd: Remove unnecessary comment
can: kvaser_pciefd: Skip redundant NULL pointer check in ISR
can: kvaser_pciefd: Group #defines together
can: kvaser_usb: Add support for Kvaser Mini PCIe 1xCAN
can: kvaser_usb: Add support for Kvaser USBcan Pro 5xCAN
can: kvaser_usb: Add support for Vining 800
can: mscan: remove unused struct 'mscan_state'
Documentation: networking: document ISO 15765-2
can: xilinx_can: Document driver description to list all supported IPs
can: isotp: remove ISO 15675-2 specification version where possible
...
====================
Link: https://patch.msgid.link/20240621080201.305471-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The existing method of reserving unicast filter count leads to wasted
MCAM entries if the functionality is not used or fewer entries are used.
Furthermore, the amount of MCAM entries differs amongst Octeon SoCs.
We implemented a means to adjust the UC filter count via devlink,
allowing for better use of MCAM entries across Netdev apps.
commands:
To get the current unicast filter count
# devlink dev param show pci/0002:02:00.0 name unicast_filter_count
To change/set the unicast filter count
# devlink dev param set pci/0002:02:00.0 name unicast_filter_count
value 5 cmode runtime
Signed-off-by: Sai Krishna <saikrishnag@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
New drivers were prevented from adding ndo_set_vf_* callbacks
over the last few years. This was expected to result in broader
switchdev adoption, but seems to have had little effect.
Based on recent netdev meeting there is broad support for allowing
adding those ops.
There is a problem with the current API supporting a limited number
of VFs (100+, which is less than some modern HW supports).
We can try to solve it by adding similar functionality on devlink
ports, but that'd be another API variation to maintain.
So a netlink attribute reshuffling is a more likely outcome.
Document the guidance, make it clear that the API is frozen.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Document basic concepts, APIs and behaviour of the ISO 15675-2 (ISO-TP)
CAN stack.
Signed-off-by: Francesco Valla <valla.francesco@gmail.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/all/20240501092413.414700-2-valla.francesco@gmail.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
10G-QXGMII is a MAC-to-PHY interface defined by the USXGMII multiport
specification. It uses the same signaling as USXGMII, but it multiplexes
4 ports over the link, resulting in a maximum speed of 2.5G per port.
Some in-tree SoCs like the NXP LS1028A use "usxgmii" when they mean
either the single-port USXGMII or the quad-port 10G-QXGMII variant, and
they could get away just fine with that thus far. But there is a need to
distinguish between the 2 as far as SerDes drivers are concerned.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
When calculating hashes for the purpose of multipath forwarding, both IPv4
and IPv6 code currently fall back on flow_hash_from_keys(). That uses a
randomly-generated seed. That's a fine choice by default, but unfortunately
some deployments may need a tighter control over the seed used.
In this patch, make the seed configurable by adding a new sysctl key,
net.ipv4.fib_multipath_hash_seed to control the seed. This seed is used
specifically for multipath forwarding and not for the other concerns that
flow_hash_from_keys() is used for, such as queue selection. Expose the knob
as sysctl because other such settings, such as headers to hash, are also
handled that way. Like those, the multipath hash seed is a per-netns
variable.
Despite being placed in the net.ipv4 namespace, the multipath seed sysctl
is used for both IPv4 and IPv6, similarly to e.g. a number of TCP
variables.
The seed used by flow_hash_from_keys() is a 128-bit quantity. However it
seems that usually the seed is a much more modest value. 32 bits seem
typical (Cisco, Cumulus), some systems go even lower. For that reason, and
to decouple the user interface from implementation details, go with a
32-bit quantity, which is then quadruplicated to form the siphash key.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240607151357.421181-3-petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Cross-merge networking fixes after downstream PR.
No conflicts.
Adjacent changes:
drivers/net/ethernet/pensando/ionic/ionic_txrx.c
d9c0420999 ("ionic: Mark error paths in the data path as unlikely")
491aee894a ("ionic: fix kernel panic in XDP_TX action")
net/ipv6/ip6_fib.c
b4cb4a1391 ("net: use unrcu_pointer() helper")
b01e1c0307 ("ipv6: fix possible race in __fib6_drop_pcpu_from()")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Count the number of header-only packets and bytes from SHAMPO.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240603212219.1037656-12-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After modifying rx_gro_packets to be more accurate, the
rx_gro_match_packets counter is redundant.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240603212219.1037656-11-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Don't count non GRO packets. A non GRO packet is a packet with
a GRO cb count of 1.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240603212219.1037656-10-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Adding a sysctl knob to allow user to specify a default
rto_min at socket init time, other than using the hard
coded 200ms default rto_min.
Note that the rto_min route option has the highest precedence
for configuring this setting, followed by the TCP_BPF_RTO_MIN
socket option, followed by the tcp_rto_min_us sysctl.
Signed-off-by: Kevin Yang <yyd@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
A general documentation about MPTCP was missing since its introduction
in v5.6.
Most of what is there comes from our recently updated mptcp.dev website,
with additional links to resources from the kernel documentation.
This is a first version, mainly targeting app developers and users.
Link: https://www.mptcp.dev
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240530-upstream-net-20240520-mptcp-doc-v3-3-e94cdd9f2673@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Similar to what is done in other 'sysctl' pages: it looks clearer from a
readability perspective.
This might cause troubles in the short/mid-term with the backports, but
by not putting new entries at the end, this can help to reduce conflicts
in case of backports in the long term. We don't change the information
here too often, so it looks OK to do that.
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240530-upstream-net-20240520-mptcp-doc-v3-2-e94cdd9f2673@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This sysctl knob has been added recently, but the documentation has not
been updated.
This knob is used to show the available schedulers choices that are
registered, similar to 'net.ipv4.tcp_available_congestion_control'.
Fixes: 73c900aa36 ("mptcp: add net.mptcp.available_schedulers")
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240530-upstream-net-20240520-mptcp-doc-v3-1-e94cdd9f2673@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZkGcZAAKCRDbK58LschI
g6o6APwLsqhrM2w71VUN5ciCxu4H5VDtZp6wkdqtVbxxU4qNxQEApKgYgKt8ZLF3
Kily5c7m+S4ZXhMX21rb8JhSAz0dfQk=
=5Dk7
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2024-05-13
We've added 119 non-merge commits during the last 14 day(s) which contain
a total of 134 files changed, 9462 insertions(+), 4742 deletions(-).
The main changes are:
1) Add BPF JIT support for 32-bit ARCv2 processors, from Shahab Vahedi.
2) Add BPF range computation improvements to the verifier in particular
around XOR and OR operators, refactoring of checks for range computation
and relaxing MUL range computation so that src_reg can also be an unknown
scalar, from Cupertino Miranda.
3) Add support to attach kprobe BPF programs through kprobe_multi link in
a session mode, meaning, a BPF program is attached to both function entry
and return, the entry program can decide if the return program gets
executed and the entry program can share u64 cookie value with return
program. Session mode is a common use-case for tetragon and bpftrace,
from Jiri Olsa.
4) Fix a potential overflow in libbpf's ring__consume_n() and improve libbpf
as well as BPF selftest's struct_ops handling, from Andrii Nakryiko.
5) Improvements to BPF selftests in context of BPF gcc backend,
from Jose E. Marchesi & David Faust.
6) Migrate remaining BPF selftest tests from test_sock_addr.c to prog_test-
-style in order to retire the old test, run it in BPF CI and additionally
expand test coverage, from Jordan Rife.
7) Big batch for BPF selftest refactoring in order to remove duplicate code
around common network helpers, from Geliang Tang.
8) Another batch of improvements to BPF selftests to retire obsolete
bpf_tcp_helpers.h as everything is available vmlinux.h,
from Martin KaFai Lau.
9) Fix BPF map tear-down to not walk the map twice on free when both timer
and wq is used, from Benjamin Tissoires.
10) Fix BPF verifier assumptions about socket->sk that it can be non-NULL,
from Alexei Starovoitov.
11) Change BTF build scripts to using --btf_features for pahole v1.26+,
from Alan Maguire.
12) Small improvements to BPF reusing struct_size() and krealloc_array(),
from Andy Shevchenko.
13) Fix s390 JIT to emit a barrier for BPF_FETCH instructions,
from Ilya Leoshkevich.
14) Extend TCP ->cong_control() callback in order to feed in ack and
flag parameters and allow write-access to tp->snd_cwnd_stamp
from BPF program, from Miao Xu.
15) Add support for internal-only per-CPU instructions to inline
bpf_get_smp_processor_id() helper call for arm64 and riscv64 BPF JITs,
from Puranjay Mohan.
16) Follow-up to remove the redundant ethtool.h from tooling infrastructure,
from Tushar Vyavahare.
17) Extend libbpf to support "module:<function>" syntax for tracing
programs, from Viktor Malik.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (119 commits)
bpf: make list_for_each_entry portable
bpf: ignore expected GCC warning in test_global_func10.c
bpf: disable strict aliasing in test_global_func9.c
selftests/bpf: Free strdup memory in xdp_hw_metadata
selftests/bpf: Fix a few tests for GCC related warnings.
bpf: avoid gcc overflow warning in test_xdp_vlan.c
tools: remove redundant ethtool.h from tooling infra
selftests/bpf: Expand ATTACH_REJECT tests
selftests/bpf: Expand getsockname and getpeername tests
sefltests/bpf: Expand sockaddr hook deny tests
selftests/bpf: Expand sockaddr program return value tests
selftests/bpf: Retire test_sock_addr.(c|sh)
selftests/bpf: Remove redundant sendmsg test cases
selftests/bpf: Migrate ATTACH_REJECT test cases
selftests/bpf: Migrate expected_attach_type tests
selftests/bpf: Migrate wildcard destination rewrite test
selftests/bpf: Migrate sendmsg6 v4 mapped address tests
selftests/bpf: Migrate sendmsg deny test cases
selftests/bpf: Migrate WILDCARD_IP test
selftests/bpf: Handle SYSCALL_EPERM and SYSCALL_ENOTSUPP test cases
...
====================
Link: https://lore.kernel.org/r/20240513134114.17575-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEN9lkrMBJgcdVAPub1V2XiooUIOQFAmZA578ACgkQ1V2XiooU
IOS18g//Zyuv+23GcUM7+FEXrlMN658xJyWiYvKjOaZZx5ZiV0QdZc4cbfPFD44p
qZBMmVC/WVoC89SLdwH1W47KoJU0xsK3/OdqGHHeNJ69111wIQMpOLfAetS2K0mb
F+Ue2vyWg1GQDICGsCdenHX7ihtVvnJJkomxc+3ObxtLCNsb2Dsr6JM5hMVP5Bil
4UZnPsrgfWy3A8O92burlPVE1sTWDFfFUGIf8geJc4QadwkgufkzxMhXNO7xHlpG
EZ99s8FPyD3R6tRPjf4gwdjr7JjinrdrYjZDuS4d3Uv8pKlUqcx8PgXG51/unr/y
qlynLXtEc1QU6SO2jENosHAG2/LQG2zsYEiiLFCP+a1JOtOxevZQKx8MyAFW8xDX
+RQhcBpTBocIyJ/tCDoM9lp69iYTR196Ct48v6pSGMNhZcddT4K4BkUL47GEs8T3
IA5x8h5gV2Q9ECMgqSaycdUsfLNgE/6fWx0ROs/wo3tMsgWrXCSJi8RFtN1sNbIO
rfuNnQiETIFBkQxBi7um8jadxdfIHm65cjgZBCyVbNNml3JwjYvXLxCXt2G7LxC4
Sg4nZvIbqWIifoMc1aQKypvFZjzzsWtFmYCuEUVLrnpj2SFTyh5CNzNo3MlHf7LG
sRb/XubdY6e0spLzd5VDjwH5qOT3poWAccatRr5BVUarxXCD5Vs=
=RD9p
-----END PGP SIGNATURE-----
Merge tag 'nf-next-24-05-12' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for net-next:
Patch #1 skips transaction if object type provides no .update interface.
Patch #2 skips NETDEV_CHANGENAME which is unused.
Patch #3 enables conntrack to handle Multicast Router Advertisements and
Multicast Router Solicitations from the Multicast Router Discovery
protocol (RFC4286) as untracked opposed to invalid packets.
From Linus Luessing.
Patch #4 updates DCCP conntracker to mark invalid as invalid, instead of
dropping them, from Jason Xing.
Patch #5 uses NF_DROP instead of -NF_DROP since NF_DROP is 0,
also from Jason.
Patch #6 removes reference in netfilter's sysctl documentation on pickup
entries which were already removed by Florian Westphal.
Patch #7 removes check for IPS_OFFLOAD flag to disable early drop which
allows to evict entries from the conntrack table,
also from Florian.
Patches #8 to #16 updates nf_tables pipapo set backend to allocate
the datastructure copy on-demand from preparation phase,
to better deal with OOM situations where .commit step is too late
to fail. Series from Florian Westphal.
Patch #17 adds a selftest with packetdrill to cover conntrack TCP state
transitions, also from Florian.
Patch #18 use GFP_KERNEL to clone elements from control plane to avoid
quick atomic reserves exhaustion with large sets, reporter refers
to million entries magnitude.
* tag 'nf-next-24-05-12' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
netfilter: nf_tables: allow clone callbacks to sleep
selftests: netfilter: add packetdrill based conntrack tests
netfilter: nft_set_pipapo: remove dirty flag
netfilter: nft_set_pipapo: move cloning of match info to insert/removal path
netfilter: nft_set_pipapo: prepare pipapo_get helper for on-demand clone
netfilter: nft_set_pipapo: merge deactivate helper into caller
netfilter: nft_set_pipapo: prepare walk function for on-demand clone
netfilter: nft_set_pipapo: prepare destroy function for on-demand clone
netfilter: nft_set_pipapo: make pipapo_clone helper return NULL
netfilter: nft_set_pipapo: move prove_locking helper around
netfilter: conntrack: remove flowtable early-drop test
netfilter: conntrack: documentation: remove reference to non-existent sysctl
netfilter: use NF_DROP instead of -NF_DROP
netfilter: conntrack: dccp: try not to drop skb in conntrack
netfilter: conntrack: fix ct-state for ICMPv6 Multicast Router Discovery
netfilter: nf_tables: remove NETDEV_CHANGENAME from netdev chain event handler
netfilter: nf_tables: skip transaction if update object is not implemented
====================
Link: https://lore.kernel.org/r/20240512161436.168973-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This will add eBPF JIT support to the 32-bit ARCv2 processors. The
implementation is qualified by running the BPF tests on a Synopsys HSDK
board with "ARC HS38 v2.1c at 500 MHz" as the 4-core CPU.
The test_bpf.ko reports 2-10 fold improvements in execution time of its
tests. For instance:
test_bpf: #33 tcpdump port 22 jited:0 704 1766 2104 PASS
test_bpf: #33 tcpdump port 22 jited:1 120 224 260 PASS
test_bpf: #141 ALU_DIV_X: 4294967295 / 4294967295 = 1 jited:0 238 PASS
test_bpf: #141 ALU_DIV_X: 4294967295 / 4294967295 = 1 jited:1 23 PASS
test_bpf: #776 JMP32_JGE_K: all ... magnitudes jited:0 2034681 PASS
test_bpf: #776 JMP32_JGE_K: all ... magnitudes jited:1 1020022 PASS
Deployment and structure
------------------------
The related codes are added to "arch/arc/net":
- bpf_jit.h -- The interface that a back-end translator must provide
- bpf_jit_core.c -- Knows how to handle the input eBPF byte stream
- bpf_jit_arcv2.c -- The back-end code that knows the translation logic
The bpf_int_jit_compile() at the end of bpf_jit_core.c is the entrance
to the whole process. Normally, the translation is done in one pass,
namely the "normal pass". In case some relocations are not known during
this pass, some data (arc_jit_data) is allocated for the next pass to
come. This possible next (and last) pass is called the "extra pass".
1. Normal pass # The necessary pass
1a. Dry run # Get the whole JIT length, epilogue offset, etc.
1b. Emit phase # Allocate memory and start emitting instructions
2. Extra pass # Only needed if there are relocations to be fixed
2a. Patch relocations
Support status
--------------
The JIT compiler supports BPF instructions up to "cpu=v4". However, it
does not yet provide support for:
- Tail calls
- Atomic operations
- 64-bit division/remainder
- BPF_PROBE_MEM* (exception table)
The result of "test_bpf" test suite on an HSDK board is:
hsdk-lnx# insmod test_bpf.ko test_suite=test_bpf
test_bpf: Summary: 863 PASSED, 186 FAILED, [851/851 JIT'ed]
All the failing test cases are due to the ones that were not JIT'ed.
Categorically, they can be represented as:
.-----------.------------.-------------.
| test type | opcodes | # of cases |
|-----------+------------+-------------|
| atomic | 0xC3, 0xDB | 149 |
| div64 | 0x37, 0x3F | 22 |
| mod64 | 0x97, 0x9F | 15 |
`-----------^------------+-------------|
| (total) 186 |
`-------------'
Setup: build config
-------------------
The following configs must be set to have a working JIT test:
CONFIG_BPF_JIT=y
CONFIG_BPF_JIT_ALWAYS_ON=y
CONFIG_TEST_BPF=m
The following options are not necessary for the tests module,
but are good to have:
CONFIG_DEBUG_INFO=y # prerequisite for below
CONFIG_DEBUG_INFO_BTF=y # so bpftool can generate vmlinux.h
CONFIG_FTRACE=y #
CONFIG_BPF_SYSCALL=y # all these options lead to
CONFIG_KPROBE_EVENTS=y # having CONFIG_BPF_EVENTS=y
CONFIG_PERF_EVENTS=y #
Some BPF programs provide data through /sys/kernel/debug:
CONFIG_DEBUG_FS=y
arc# mount -t debugfs debugfs /sys/kernel/debug
Setup: elfutils
---------------
The libdw.{so,a} library that is used by pahole for processing
the final binary must come from elfutils 0.189 or newer. The
support for ARCv2 [1] has been added since that version.
[1]
https://sourceware.org/git/?p=elfutils.git;a=commit;h=de3d46b3e7
Setup: pahole
-------------
The line below in linux/scripts/Makefile.btf must be commented out:
pahole-flags-$(call test-ge, $(pahole-ver), 121) += --btf_gen_floats
Or else, the build will fail:
$ make V=1
...
BTF .btf.vmlinux.bin.o
pahole -J --btf_gen_floats \
-j --lang_exclude=rust \
--skip_encoding_btf_inconsistent_proto \
--btf_gen_optimized .tmp_vmlinux.btf
Complex, interval and imaginary float types are not supported
Encountered error while encoding BTF.
...
BTFIDS vmlinux
./tools/bpf/resolve_btfids/resolve_btfids vmlinux
libbpf: failed to find '.BTF' ELF section in vmlinux
FAILED: load BTF from vmlinux: No data available
This is due to the fact that the ARC toolchains generate
"complex float" DIE entries in libgcc and at the moment, pahole
can't handle such entries.
Running the tests
-----------------
host$ scp /bld/linux/lib/test_bpf.ko arc:
arc # sysctl net.core.bpf_jit_enable=1
arc # insmod test_bpf.ko test_suite=test_bpf
...
test_bpf: #1048 Staggered jumps: JMP32_JSLE_X jited:1 697811 PASS
test_bpf: Summary: 863 PASSED, 186 FAILED, [851/851 JIT'ed]
Acknowledgments
---------------
- Claudiu Zissulescu for his unwavering support
- Yuriy Kolerov for testing and troubleshooting
- Vladimir Isaev for the pahole workaround
- Sergey Matyukevich for paving the road by adding the interpreter support
Signed-off-by: Shahab Vahedi <shahab@synopsys.com>
Link: https://lore.kernel.org/r/20240430145604.38592-1-list+bpf@vahedi.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEH7ZpcWbFyOOp6OJbrB3Eaf9PW7cFAmY0mEIACgkQrB3Eaf9P
W7fU7g//bQyydwei/4Vo+cNCPp82k8wL/qhDY3IjN10PfJOSNmeCAcSgkuuHTRSx
g/hoxZEVzLrQT5bt+Sb38JxADFiL787GjdGEUy1gzF7CnDKcGT5KnydYNjDqDVGt
nOv9kAGfWIkMKdNqrhHifddPMWd+ZqvpUcFz5olvqIE2mpNgMwy2i3NID9bNAV31
v5AEvNINa1LKOhX9cEka8iPQXwp+I6yTLqyOd4VciOuFr8dPg0FQqFaYR+OtMsV0
kIxdGTVfmRWaNgq/Tsg4z/2rXEwmEjTWzAhNVGu8o8L3JozXOvbjIrDG7Ws6qB3V
XTFl8ueRMk0UCTlY/QAfip5H7IlAo+H0FUBC45FNP1UhHeWisXT4D5rqAEqQTlZR
bddtuueLZyKclFpXRNi+/8vdDrXhhEzeNINkc52Ef33rUTtZJR8bXrEUKzaYCIuF
ldub0PA3+e5wvwIxq5/Chc/+MIaIHnXBMUmbCJSPnMrupBQtO+i6arPQcbtaBAgS
YyVGTRk9YN0UAjSriIuiViLlgUCMsvsWgfSz9rd0PE54MFBrvcLPeCtPxKZ+sTVG
Y2iSZ8d3ThvsMiQVNU8gj3SlTY1oTvuaijDDGjnR0nWkxV9LMJHCPKfIzsbOKLJe
+ee5hKP4TOFygnV58BkqdGK/LavNpouTIbrM43hgmJ0IX9kSt4o=
=QiGZ
-----END PGP SIGNATURE-----
Merge tag 'ipsec-next-2024-05-03' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2024-05-03
1) Remove Obsolete UDP_ENCAP_ESPINUDP_NON_IKE Support.
This was defined by an early version of an IETF draft
that did not make it to a standard.
2) Introduce direction attribute for xfrm states.
xfrm states have a direction, a stsate can be used
either for input or output packet processing.
Add a direction to xfrm states to make it clear
for what a xfrm state is used.
* tag 'ipsec-next-2024-05-03' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
xfrm: Restrict SA direction attribute to specific netlink message types
xfrm: Add dir validation to "in" data path lookup
xfrm: Add dir validation to "out" data path lookup
xfrm: Add Direction to the SA in or out
udpencap: Remove Obsolete UDP_ENCAP_ESPINUDP_NON_IKE Support
====================
Link: https://lore.kernel.org/r/20240503082732.2835810-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Introduces validation for the x->dir attribute within the XFRM input
data lookup path. If the configured direction does not match the
expected direction, input, increment the XfrmInStateDirError counter
and drop the packet to ensure data integrity and correct flow handling.
grep -vw 0 /proc/net/xfrm_stat
XfrmInStateDirError 1
Signed-off-by: Antony Antony <antony.antony@secunet.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Introduces validation for the x->dir attribute within the XFRM output
data lookup path. If the configured direction does not match the expected
direction, output, increment the XfrmOutStateDirError counter and drop
the packet to ensure data integrity and correct flow handling.
grep -vw 0 /proc/net/xfrm_stat
XfrmOutPolError 1
XfrmOutStateDirError 1
Signed-off-by: Antony Antony <antony.antony@secunet.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
New driver specific parameter 'tx_scheduling_layers' was introduced.
Describe parameter in the documentation.
Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Co-developed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The Power Sourcing Equipment Power Interface (PSE PI) plays a pivotal role
in the architecture of Power over Ethernet (PoE) systems. It is essentially
a blueprint that outlines how one or multiple power sources are connected
to the eight-pin modular jack, commonly known as the Ethernet RJ45 port.
This connection scheme is crucial for enabling the delivery of power
alongside data over Ethernet cables.
This patch adds support for getting the PSE controller node through PSE PI
device subnode.
This supports adds a way to get the PSE PI id from the pse_pi devicetree
subnode of a PSE controller node simply by reading the reg property.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://lore.kernel.org/r/20240417-feature_poe-v9-7-242293fd1900@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In the current PSE interface for Ethernet Power Equipment, support is
limited to PoDL. This patch extends the interface to accommodate the
objects specified in IEEE 802.3-2022 145.2 for Power sourcing
Equipment (PSE).
The following objects are now supported and considered mandatory:
- IEEE 802.3-2022 30.9.1.1.5 aPSEPowerDetectionStatus
- IEEE 802.3-2022 30.9.1.1.2 aPSEAdminState
- IEEE 802.3-2022 30.9.1.2.1 aPSEAdminControl
To avoid confusion between "PoDL PSE" and "PoE PSE", which have similar
names but distinct values, we have followed the suggestion of Oleksij
Rempel and Andrew Lunn to maintain separate naming schemes for each,
using c33 (clause 33) prefix for "PoE PSE".
You can find more details in the discussion threads here:
https://lore.kernel.org/netdev/20230912110637.GI780075@pengutronix.de/https://lore.kernel.org/netdev/2539b109-72ad-470a-9dae-9f53de4f64ec@lunn.ch/
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://lore.kernel.org/r/20240417-feature_poe-v9-1-242293fd1900@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add support to query scc version by devlink info for device V3.
Signed-off-by: Hao Chen <chenhao418@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Link: https://lore.kernel.org/r/20240410125354.2177067-5-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Newer NIC will introduce a new part number, now add it
into devlink device info.
This patch also updates the information of "board.id" in
nfp.rst to match the devlink-info.rst.
Signed-off-by: Fei Qin <fei.qin@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add definition and documentation for the new generic
info "board.part_number".
The new one is for part number specific use, and board.id
is modified to match the documentation in devlink-info.
Signed-off-by: Fei Qin <fei.qin@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
nla_put_uint can either write a u32 or u64 netlink attribute value. The
size depends on whether the value can be represented with a u32 or requires
a u64. Use a uint annotation in various documentation to represent this.
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Link: https://lore.kernel.org/r/20240409232520.237613-2-rrameshbabu@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Many devices send event notifications for the IO queues,
such as tx and rx queues, through event queues.
Enable a privileged owner, such as a hypervisor PF, to set the number
of IO event queues for the VF and SF during the provisioning stage.
example:
Get maximum IO event queues of the VF device::
$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
function:
hw_addr 00:00:00:00:00:00 ipsec_packet disabled max_io_eqs 10
Set maximum IO event queues of the VF device::
$ devlink port function set pci/0000:06:00.0/2 max_io_eqs 32
$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
function:
hw_addr 00:00:00:00:00:00 ipsec_packet disabled max_io_eqs 32
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some netlink commands are target towards ethernet PHYs, to control some
of their features. As there's several such commands, add the ability to
pass a PHY index in the ethnl request, which will populate the generic
ethnl_req_info with the relevant phydev when the command targets a PHY.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Count number of transmitted packets that were hardware timestamped at the
device DMA layer.
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Link: https://lore.kernel.org/r/20240403212931.128541-4-rrameshbabu@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Track the number of times a CQE was expected to not be delivered on PTP Tx
port timestamping CQ. A CQE is expected to not be delivered if a certain
amount of time passes since the corresponding CQE containing the DMA
timestamp information has arrived. Increment the late_cqe counter when such
a CQE does manage to be delivered to the CQ.
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20240403212931.128541-3-rrameshbabu@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Multiple network devices that support hardware timestamping appear to have
common behavior with regards to timestamp handling. Implement common Tx
hardware timestamping statistics in a tx_stats struct_group. Common Rx
hardware timestamping statistics can subsequently be implemented in a
rx_stats struct_group for ethtool_ts_stats.
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Link: https://lore.kernel.org/r/20240403212931.128541-2-rrameshbabu@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Provide devlink documentation for three eswitch attributes:
mode, inline-mode, and encap-mode.
Signed-off-by: William Tu <witu@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20240325181228.6244-1-witu@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix an incorrect module name and sysfs path in dns resolver
documentation.
Signed-off-by: Bharath SM <bharathsm@microsoft.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240324104338.44083-1-bharathsm@microsoft.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add information to our documentation for the XDP features
and related ethtool stats.
While we're here, we also add the missing timestamp stats.
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240319163534.38796-1-shannon.nelson@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
dev->state can be read in rx and tx fast paths.
netif_running() which needs dev->state is called from
- enqueue_to_backlog() [RX path]
- __dev_direct_xmit() [TX path]
Fixes: 43a71cd66b ("net-device: reorganize net_device fast path variables")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Coco Li <lixiaoyan@google.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240314200845.3050179-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Core & protocols
----------------
- Large effort by Eric to lower rtnl_lock pressure and remove locks:
- Make commonly used parts of rtnetlink (address, route dumps etc.)
lockless, protected by RCU instead of rtnl_lock.
- Add a netns exit callback which already holds rtnl_lock,
allowing netns exit to take rtnl_lock once in the core
instead of once for each driver / callback.
- Remove locks / serialization in the socket diag interface.
- Remove 6 calls to synchronize_rcu() while holding rtnl_lock.
- Remove the dev_base_lock, depend on RCU where necessary.
- Support busy polling on a per-epoll context basis. Poll length
and budget parameters can be set independently of system defaults.
- Introduce struct net_hotdata, to make sure read-mostly global config
variables fit in as few cache lines as possible.
- Add optional per-nexthop statistics to ease monitoring / debug
of ECMP imbalance problems.
- Support TCP_NOTSENT_LOWAT in MPTCP.
- Ensure that IPv6 temporary addresses' preferred lifetimes are long
enough, compared to other configured lifetimes, and at least 2 sec.
- Support forwarding of ICMP Error messages in IPSec, per RFC 4301.
- Add support for the independent control state machine for bonding
per IEEE 802.1AX-2008 5.4.15 in addition to the existing coupled
control state machine.
- Add "network ID" to MCTP socket APIs to support hosts with multiple
disjoint MCTP networks.
- Re-use the mono_delivery_time skbuff bit for packets which user
space wants to be sent at a specified time. Maintain the timing
information while traversing veth links, bridge etc.
- Take advantage of MSG_SPLICE_PAGES for RxRPC DATA and ACK packets.
- Simplify many places iterating over netdevs by using an xarray
instead of a hash table walk (hash table remains in place, for
use on fastpaths).
- Speed up scanning for expired routes by keeping a dedicated list.
- Speed up "generic" XDP by trying harder to avoid large allocations.
- Support attaching arbitrary metadata to netconsole messages.
Things we sprinkled into general kernel code
--------------------------------------------
- Enforce VM_IOREMAP flag and range in ioremap_page_range and introduce
VM_SPARSE kind and vm_area_[un]map_pages (used by bpf_arena).
- Rework selftest harness to enable the use of the full range of
ksft exit code (pass, fail, skip, xfail, xpass).
Netfilter
---------
- Allow userspace to define a table that is exclusively owned by a daemon
(via netlink socket aliveness) without auto-removing this table when
the userspace program exits. Such table gets marked as orphaned and
a restarting management daemon can re-attach/regain ownership.
- Speed up element insertions to nftables' concatenated-ranges set type.
Compact a few related data structures.
BPF
---
- Add BPF token support for delegating a subset of BPF subsystem
functionality from privileged system-wide daemons such as systemd
through special mount options for userns-bound BPF fs to a trusted
& unprivileged application.
- Introduce bpf_arena which is sparse shared memory region between BPF
program and user space where structures inside the arena can have
pointers to other areas of the arena, and pointers work seamlessly
for both user-space programs and BPF programs.
- Introduce may_goto instruction that is a contract between the verifier
and the program. The verifier allows the program to loop assuming it's
behaving well, but reserves the right to terminate it.
- Extend the BPF verifier to enable static subprog calls in spin lock
critical sections.
- Support registration of struct_ops types from modules which helps
projects like fuse-bpf that seeks to implement a new struct_ops type.
- Add support for retrieval of cookies for perf/kprobe multi links.
- Support arbitrary TCP SYN cookie generation / validation in the TC
layer with BPF to allow creating SYN flood handling in BPF firewalls.
- Add code generation to inline the bpf_kptr_xchg() helper which
improves performance when stashing/popping the allocated BPF objects.
Wireless
--------
- Add SPP (signaling and payload protected) AMSDU support.
- Support wider bandwidth OFDMA, as required for EHT operation.
Driver API
----------
- Major overhaul of the Energy Efficient Ethernet internals to support
new link modes (2.5GE, 5GE), share more code between drivers
(especially those using phylib), and encourage more uniform behavior.
Convert and clean up drivers.
- Define an API for querying per netdev queue statistics from drivers.
- IPSec: account in global stats for fully offloaded sessions.
- Create a concept of Ethernet PHY Packages at the Device Tree level,
to allow parameterizing the existing PHY package code.
- Enable Rx hashing (RSS) on GTP protocol fields.
Misc
----
- Improvements and refactoring all over networking selftests.
- Create uniform module aliases for TC classifiers, actions,
and packet schedulers to simplify creating modprobe policies.
- Address all missing MODULE_DESCRIPTION() warnings in networking.
- Extend the Netlink descriptions in YAML to cover message encapsulation
or "Netlink polymorphism", where interpretation of nested attributes
depends on link type, classifier type or some other "class type".
Drivers
-------
- Ethernet high-speed NICs:
- Add a new driver for Marvell's Octeon PCI Endpoint NIC VF.
- Intel (100G, ice, idpf):
- support E825-C devices
- nVidia/Mellanox:
- support devices with one port and multiple PCIe links
- Broadcom (bnxt):
- support n-tuple filters
- support configuring the RSS key
- Wangxun (ngbe/txgbe):
- implement irq_domain for TXGBE's sub-interrupts
- Pensando/AMD:
- support XDP
- optimize queue submission and wakeup handling (+17% bps)
- optimize struct layout, saving 28% of memory on queues
- Ethernet NICs embedded and virtual:
- Google cloud vNIC:
- refactor driver to perform memory allocations for new queue
config before stopping and freeing the old queue memory
- Synopsys (stmmac):
- obey queueMaxSDU and implement counters required by 802.1Qbv
- Renesas (ravb):
- support packet checksum offload
- suspend to RAM and runtime PM support
- Ethernet switches:
- nVidia/Mellanox:
- support for nexthop group statistics
- Microchip:
- ksz8: implement PHY loopback
- add support for KSZ8567, a 7-port 10/100Mbps switch
- PTP:
- New driver for RENESAS FemtoClock3 Wireless clock generator.
- Support OCP PTP cards designed and built by Adva.
- CAN:
- Support recvmsg() flags for own, local and remote traffic
on CAN BCM sockets.
- Support for esd GmbH PCIe/402 CAN device family.
- m_can:
- Rx/Tx submission coalescing
- wake on frame Rx
- WiFi:
- Intel (iwlwifi):
- enable signaling and payload protected A-MSDUs
- support wider-bandwidth OFDMA
- support for new devices
- bump FW API to 89 for AX devices; 90 for BZ/SC devices
- MediaTek (mt76):
- mt7915: newer ADIE version support
- mt7925: radio temperature sensor support
- Qualcomm (ath11k):
- support 6 GHz station power modes: Low Power Indoor (LPI),
Standard Power) SP and Very Low Power (VLP)
- QCA6390 & WCN6855: support 2 concurrent station interfaces
- QCA2066 support
- Qualcomm (ath12k):
- refactoring in preparation for Multi-Link Operation (MLO) support
- 1024 Block Ack window size support
- firmware-2.bin support
- support having multiple identical PCI devices (firmware needs to
have ATH12K_FW_FEATURE_MULTI_QRTR_ID)
- QCN9274: support split-PHY devices
- WCN7850: enable Power Save Mode in station mode
- WCN7850: P2P support
- RealTek:
- rtw88: support for more rtw8811cu and rtw8821cu devices
- rtw89: support SCAN_RANDOM_SN and SET_SCAN_DWELL
- rtlwifi: speed up USB firmware initialization
- rtwl8xxxu:
- RTL8188F: concurrent interface support
- Channel Switch Announcement (CSA) support in AP mode
- Broadcom (brcmfmac):
- per-vendor feature support
- per-vendor SAE password setup
- DMI nvram filename quirk for ACEPC W5 Pro
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmXv0mgACgkQMUZtbf5S
IrtgMxAAuRd+WJW++SENr4KxIWhYO1q6Xcxnai43wrNkan9swD24icG8TYALt4f3
yoT6idQvWReAb5JNlh9rUQz8R7E0nJXlvEFn5MtJwcthx2C6wFo/XkJlddlRrT+j
c2xGILwLjRhW65LaC0MZ2ECbEERkFz8xcGfK2SWzUgh6KYvPjcRfKFxugpM7xOQK
P/Wnqhs4fVRS/Mj/bCcXcO+yhwC121Q3qVeQVjGS0AzEC65hAW87a/kc2BfgcegD
EyI9R7mf6criQwX+0awubjfoIdr4oW/8oDVNvUDczkJkbaEVaLMQk9P5x/0XnnVS
UHUchWXyI80Q8Rj12uN1/I0h3WtwNQnCRBuLSmtm6GLfCAwbLvp2nGWDnaXiqryW
DVKUIHGvqPKjkOOMOVfSvfB3LvkS3xsFVVYiQBQCn0YSs/gtu4CoF2Nty9CiLPbK
tTuxUnLdPDZDxU//l0VArZmP8p2JM7XQGJ+JH8GFH4SBTyBR23e0iyPSoyaxjnYn
RReDnHMVsrS1i7GPhbqDJWn+uqMSs7N149i0XmmyeqwQHUVSJN3J2BApP2nCaDfy
H2lTuYly5FfEezt61NvCE4qr/VsWeEjm1fYlFQ9dFn4pGn+HghyCpw+xD1ZN56DN
lujemau5B3kk1UTtAT4ypPqvuqjkRFqpNV2LzsJSk/Js+hApw8Y=
=oY52
-----END PGP SIGNATURE-----
Merge tag 'net-next-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core & protocols:
- Large effort by Eric to lower rtnl_lock pressure and remove locks:
- Make commonly used parts of rtnetlink (address, route dumps
etc) lockless, protected by RCU instead of rtnl_lock.
- Add a netns exit callback which already holds rtnl_lock,
allowing netns exit to take rtnl_lock once in the core instead
of once for each driver / callback.
- Remove locks / serialization in the socket diag interface.
- Remove 6 calls to synchronize_rcu() while holding rtnl_lock.
- Remove the dev_base_lock, depend on RCU where necessary.
- Support busy polling on a per-epoll context basis. Poll length and
budget parameters can be set independently of system defaults.
- Introduce struct net_hotdata, to make sure read-mostly global
config variables fit in as few cache lines as possible.
- Add optional per-nexthop statistics to ease monitoring / debug of
ECMP imbalance problems.
- Support TCP_NOTSENT_LOWAT in MPTCP.
- Ensure that IPv6 temporary addresses' preferred lifetimes are long
enough, compared to other configured lifetimes, and at least 2 sec.
- Support forwarding of ICMP Error messages in IPSec, per RFC 4301.
- Add support for the independent control state machine for bonding
per IEEE 802.1AX-2008 5.4.15 in addition to the existing coupled
control state machine.
- Add "network ID" to MCTP socket APIs to support hosts with multiple
disjoint MCTP networks.
- Re-use the mono_delivery_time skbuff bit for packets which user
space wants to be sent at a specified time. Maintain the timing
information while traversing veth links, bridge etc.
- Take advantage of MSG_SPLICE_PAGES for RxRPC DATA and ACK packets.
- Simplify many places iterating over netdevs by using an xarray
instead of a hash table walk (hash table remains in place, for use
on fastpaths).
- Speed up scanning for expired routes by keeping a dedicated list.
- Speed up "generic" XDP by trying harder to avoid large allocations.
- Support attaching arbitrary metadata to netconsole messages.
Things we sprinkled into general kernel code:
- Enforce VM_IOREMAP flag and range in ioremap_page_range and
introduce VM_SPARSE kind and vm_area_[un]map_pages (used by
bpf_arena).
- Rework selftest harness to enable the use of the full range of ksft
exit code (pass, fail, skip, xfail, xpass).
Netfilter:
- Allow userspace to define a table that is exclusively owned by a
daemon (via netlink socket aliveness) without auto-removing this
table when the userspace program exits. Such table gets marked as
orphaned and a restarting management daemon can re-attach/regain
ownership.
- Speed up element insertions to nftables' concatenated-ranges set
type. Compact a few related data structures.
BPF:
- Add BPF token support for delegating a subset of BPF subsystem
functionality from privileged system-wide daemons such as systemd
through special mount options for userns-bound BPF fs to a trusted
& unprivileged application.
- Introduce bpf_arena which is sparse shared memory region between
BPF program and user space where structures inside the arena can
have pointers to other areas of the arena, and pointers work
seamlessly for both user-space programs and BPF programs.
- Introduce may_goto instruction that is a contract between the
verifier and the program. The verifier allows the program to loop
assuming it's behaving well, but reserves the right to terminate
it.
- Extend the BPF verifier to enable static subprog calls in spin lock
critical sections.
- Support registration of struct_ops types from modules which helps
projects like fuse-bpf that seeks to implement a new struct_ops
type.
- Add support for retrieval of cookies for perf/kprobe multi links.
- Support arbitrary TCP SYN cookie generation / validation in the TC
layer with BPF to allow creating SYN flood handling in BPF
firewalls.
- Add code generation to inline the bpf_kptr_xchg() helper which
improves performance when stashing/popping the allocated BPF
objects.
Wireless:
- Add SPP (signaling and payload protected) AMSDU support.
- Support wider bandwidth OFDMA, as required for EHT operation.
Driver API:
- Major overhaul of the Energy Efficient Ethernet internals to
support new link modes (2.5GE, 5GE), share more code between
drivers (especially those using phylib), and encourage more
uniform behavior. Convert and clean up drivers.
- Define an API for querying per netdev queue statistics from
drivers.
- IPSec: account in global stats for fully offloaded sessions.
- Create a concept of Ethernet PHY Packages at the Device Tree level,
to allow parameterizing the existing PHY package code.
- Enable Rx hashing (RSS) on GTP protocol fields.
Misc:
- Improvements and refactoring all over networking selftests.
- Create uniform module aliases for TC classifiers, actions, and
packet schedulers to simplify creating modprobe policies.
- Address all missing MODULE_DESCRIPTION() warnings in networking.
- Extend the Netlink descriptions in YAML to cover message
encapsulation or "Netlink polymorphism", where interpretation of
nested attributes depends on link type, classifier type or some
other "class type".
Drivers:
- Ethernet high-speed NICs:
- Add a new driver for Marvell's Octeon PCI Endpoint NIC VF.
- Intel (100G, ice, idpf):
- support E825-C devices
- nVidia/Mellanox:
- support devices with one port and multiple PCIe links
- Broadcom (bnxt):
- support n-tuple filters
- support configuring the RSS key
- Wangxun (ngbe/txgbe):
- implement irq_domain for TXGBE's sub-interrupts
- Pensando/AMD:
- support XDP
- optimize queue submission and wakeup handling (+17% bps)
- optimize struct layout, saving 28% of memory on queues
- Ethernet NICs embedded and virtual:
- Google cloud vNIC:
- refactor driver to perform memory allocations for new queue
config before stopping and freeing the old queue memory
- Synopsys (stmmac):
- obey queueMaxSDU and implement counters required by 802.1Qbv
- Renesas (ravb):
- support packet checksum offload
- suspend to RAM and runtime PM support
- Ethernet switches:
- nVidia/Mellanox:
- support for nexthop group statistics
- Microchip:
- ksz8: implement PHY loopback
- add support for KSZ8567, a 7-port 10/100Mbps switch
- PTP:
- New driver for RENESAS FemtoClock3 Wireless clock generator.
- Support OCP PTP cards designed and built by Adva.
- CAN:
- Support recvmsg() flags for own, local and remote traffic on CAN
BCM sockets.
- Support for esd GmbH PCIe/402 CAN device family.
- m_can:
- Rx/Tx submission coalescing
- wake on frame Rx
- WiFi:
- Intel (iwlwifi):
- enable signaling and payload protected A-MSDUs
- support wider-bandwidth OFDMA
- support for new devices
- bump FW API to 89 for AX devices; 90 for BZ/SC devices
- MediaTek (mt76):
- mt7915: newer ADIE version support
- mt7925: radio temperature sensor support
- Qualcomm (ath11k):
- support 6 GHz station power modes: Low Power Indoor (LPI),
Standard Power) SP and Very Low Power (VLP)
- QCA6390 & WCN6855: support 2 concurrent station interfaces
- QCA2066 support
- Qualcomm (ath12k):
- refactoring in preparation for Multi-Link Operation (MLO)
support
- 1024 Block Ack window size support
- firmware-2.bin support
- support having multiple identical PCI devices (firmware needs
to have ATH12K_FW_FEATURE_MULTI_QRTR_ID)
- QCN9274: support split-PHY devices
- WCN7850: enable Power Save Mode in station mode
- WCN7850: P2P support
- RealTek:
- rtw88: support for more rtw8811cu and rtw8821cu devices
- rtw89: support SCAN_RANDOM_SN and SET_SCAN_DWELL
- rtlwifi: speed up USB firmware initialization
- rtwl8xxxu:
- RTL8188F: concurrent interface support
- Channel Switch Announcement (CSA) support in AP mode
- Broadcom (brcmfmac):
- per-vendor feature support
- per-vendor SAE password setup
- DMI nvram filename quirk for ACEPC W5 Pro"
* tag 'net-next-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2255 commits)
nexthop: Fix splat with CONFIG_DEBUG_PREEMPT=y
nexthop: Fix out-of-bounds access during attribute validation
nexthop: Only parse NHA_OP_FLAGS for dump messages that require it
nexthop: Only parse NHA_OP_FLAGS for get messages that require it
bpf: move sleepable flag from bpf_prog_aux to bpf_prog
bpf: hardcode BPF_PROG_PACK_SIZE to 2MB * num_possible_nodes()
selftests/bpf: Add kprobe multi triggering benchmarks
ptp: Move from simple ida to xarray
vxlan: Remove generic .ndo_get_stats64
vxlan: Do not alloc tstats manually
devlink: Add comments to use netlink gen tool
nfp: flower: handle acti_netdevs allocation failure
net/packet: Add getsockopt support for PACKET_COPY_THRESH
net/netlink: Add getsockopt support for NETLINK_LISTEN_ALL_NSID
selftests/bpf: Add bpf_arena_htab test.
selftests/bpf: Add bpf_arena_list test.
selftests/bpf: Add unit tests for bpf_arena_alloc/free_pages
bpf: Add helper macro bpf_addr_space_cast()
libbpf: Recognize __arena global variables.
bpftool: Recognize arena map type
...
- Some cleanup of the main index page for easier navigation
- Rework some of the other top-level pages for better readability and, with
luck, fewer merge conflicts in the future.
- Submit-checklist improvements, hopefully the first of many.
- New Italian translations
- A fair number of kernel-doc fixes and improvements. We have also dropped
the recommendation to use an old version of Sphinx.
- A new document from Thorsten on bisection
...and lots of fixes and updates.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmXvKVIACgkQF0NaE2wM
flik1gf/ZFS1mHwDdmHA/vpx8UxdUlFEo0Pms8V24iPSW5aEIqkZ406c9DSyMTtp
CXTzW+RSCfB1Q3ciYtakHBgv0RzZ5+RyaEZ1l7zVmMyw4nYvK6giYKmg8Y0EVPKI
fAVuPWo5iE7io0sNVbKBKJJkj9Z8QEScM48hv/CV1FblMvHYn0lie6muJrF9G6Ez
HND+hlYZtWkbRd5M86CDBiFeGMLVPx17T+psQyQIcbUYm9b+RUqZRHIVRLYbad7r
18r9+83DsOhXTVJCBBSfCSZwzF8yAm+eD1w47sxnSItF8OiIjqCzQgXs3BZe9TXH
h2YyeWbMN3xByA4mEgpmOPP44RW7Pg==
=SC60
-----END PGP SIGNATURE-----
Merge tag 'docs-6.9' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
"A moderatly busy cycle for development this time around.
- Some cleanup of the main index page for easier navigation
- Rework some of the other top-level pages for better readability
and, with luck, fewer merge conflicts in the future.
- Submit-checklist improvements, hopefully the first of many.
- New Italian translations
- A fair number of kernel-doc fixes and improvements. We have also
dropped the recommendation to use an old version of Sphinx.
- A new document from Thorsten on bisection
... and lots of fixes and updates"
* tag 'docs-6.9' of git://git.lwn.net/linux: (54 commits)
docs: verify/bisect: fixes, finetuning, and support for Arch
docs: Makefile: Add dependency to $(YNL_INDEX) for targets other than htmldocs
docs: Move ja_JP/howto.rst to ja_JP/process/howto.rst
docs: submit-checklist: use subheadings
docs: submit-checklist: structure by category
docs: new text on bisecting which also covers bug validation
docs: drop the version constraints for sphinx and dependencies
docs: kerneldoc-preamble.sty: Remove code for Sphinx <2.4
docs: Restore "smart quotes" for quotes
docs/zh_CN: accurate translation of "function"
docs: Include simplified link titles in main index
docs: Correct formatting of title in admin-guide/index.rst
docs: kernel_feat.py: fix build error for missing files
MAINTAINERS: Set the field name for subsystem profile section
kasan: Add documentation for CONFIG_KASAN_EXTRA_INFO
Fixed case issue with 'fault-injection' in documentation
kernel-doc: handle #if in enums as well
Documentation: update mailing list addresses
doc: kerneldoc.py: fix indentation
scripts/kernel-doc: simplify signature printing
...
Add a space (' ') prefix to every userdata line to match docs for
dev-kmsg. To account for this extra character in each userdata entry,
reduce userdata entry names (directory name) from 54 characters to 53.
According to the dev-kmsg docs, a space is used for subsequent lines to
mark them as continuation lines.
> A line starting with ' ', is a continuation line, adding
> key/value pairs to the log message, which provide the machine
> readable context of the message, for reliable processing in
> userspace.
Testing for this patch::
cd /sys/kernel/config/netconsole && mkdir cmdline0
cd cmdline0
mkdir userdata/test && echo "hello" > userdata/test/value
mkdir userdata/test2 && echo "hello2" > userdata/test2/value
echo "message" > /dev/kmsg
Outputs::
6.8.0-rc5-virtme,12,493,231373579,-;message
test=hello
test2=hello2
And I confirmed all testing works as expected from the original patchset
Fixes: df03f830d0 ("net: netconsole: cache userdata formatted string in netconsole_target")
Signed-off-by: Matthew Wood <thepacketgeek@gmail.com>
Reviewed-by: Breno Leitao <leitao@debian.org>
Link: https://lore.kernel.org/r/20240308002525.248672-1-thepacketgeek@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tony Nguyen says:
====================
ethtool: ice: Support for RSS settings to GTP
Takeru Hayasaka enables RSS functionality for GTP packets on ice driver
with ethtool.
A user can include TEID and make RSS work for GTP-U over IPv4 by doing the
following:`ethtool -N ens3 rx-flow-hash gtpu4 sde`
In addition to gtpu(4|6), we now support gtpc(4|6),gtpc(4|6)t,gtpu(4|6)e,
gtpu(4|6)u, and gtpu(4|6)d.
gtpc(4|6): Used for GTP-C in IPv4 and IPv6, where the GTP header format does
not include a TEID.
gtpc(4|6)t: Used for GTP-C in IPv4 and IPv6, with a GTP header format that
includes a TEID.
gtpu(4|6): Used for GTP-U in both IPv4 and IPv6 scenarios.
gtpu(4|6)e: Used for GTP-U with extended headers in both IPv4 and IPv6.
gtpu(4|6)u: Used when the PSC (PDU session container) in the GTP-U extended
header includes Uplink, applicable to both IPv4 and IPv6.
gtpu(4|6)d: Used when the PSC in the GTP-U extended header includes Downlink,
for both IPv4 and IPv6.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This series adds support for combining multiple devices (PFs) of the
same port under one netdev instance. Passing traffic through different
devices belonging to different NUMA sockets saves cross-numa traffic and
allows apps running on the same netdev from different numas to still
feel a sense of proximity to the device and achieve improved
performance.
We achieve this by grouping PFs together, and creating the netdev only
once all group members are probed. Symmetrically, we destroy the netdev
once any of the PFs is removed.
The channels are distributed between all devices, a proper configuration
would utilize the correct close numa when working on a certain app/cpu.
We pick one device to be a primary (leader), and it fills a special
role. The other devices (secondaries) are disconnected from the network
in the chip level (set to silent mode). All RX/TX traffic is steered
through the primary to/from the secondaries.
Currently, we limit the support to PFs only, and up to two devices
(sockets).
V6:
- Address documentation comments from Jakub.
V5:
- Address documentation comments from Przemek Kitszel.
V4:
- Improve documentation for better user observability and understanding
of the feature, in terms of queues and their expected NUMA/CPU/IRQ
affinity.
V3:
- Fix documentation per Jakubs feedback.
- Fix typos
- Link new documentation in the networking index.rst
V2:
- Add documentation in a new patch.
- Add debugfs in a new patch.
- Add mlx5_ifc bit for MPIR cap check and use it before query.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmXpfYgACgkQSD+KveBX
+j5jIAf/VGIX/UQttq74MzK9pWgJNKtf7l8aSYtZuKXx68pmpr+25DfsxbKEeVfy
KzjvGFx5peoKisWILyaljQXSn7snmSqOsQf/IwDzmsmF/2ZTDyf6NPC6gND0bIjJ
Uu6cJ2T6Sa9ktg+ANz/gLDvGBBfPqSYTYIXrJnNQKsnW6nV8mDvy4WVf6etvCxOi
rMjfcqwNijf3GPTJd/qkaWhwneDG2AFWd5HzdORpNh6iuv8Cbc9aNhWgAPh18o7v
VWuAiFraTgaz6jj2H/NfziAk4ZrtVsCqhaFjJe3eLO+MCk/bZ/SizsAcR61JLkjL
pFqh5wqxA6v+5YJm4zVatZqPLIt4gQ==
=GZBa
-----END PGP SIGNATURE-----
Merge tag 'mlx5-socket-direct-v3' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
Support Multi-PF netdev (Socket Direct)
This series adds support for combining multiple devices (PFs) of the
same port under one netdev instance. Passing traffic through different
devices belonging to different NUMA sockets saves cross-numa traffic and
allows apps running on the same netdev from different numas to still
feel a sense of proximity to the device and achieve improved
performance.
We achieve this by grouping PFs together, and creating the netdev only
once all group members are probed. Symmetrically, we destroy the netdev
once any of the PFs is removed.
The channels are distributed between all devices, a proper configuration
would utilize the correct close numa when working on a certain app/cpu.
We pick one device to be a primary (leader), and it fills a special
role. The other devices (secondaries) are disconnected from the network
in the chip level (set to silent mode). All RX/TX traffic is steered
through the primary to/from the secondaries.
Currently, we limit the support to PFs only, and up to two devices
(sockets).
* tag 'mlx5-socket-direct-v3' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
Documentation: networking: Add description for multi-pf netdev
net/mlx5: Enable SD feature
net/mlx5e: Block TLS device offload on combined SD netdev
net/mlx5e: Support per-mdev queue counter
net/mlx5e: Support cross-vhca RSS
net/mlx5e: Let channels be SD-aware
net/mlx5e: Create EN core HW resources for all secondary devices
net/mlx5e: Create single netdev per SD group
net/mlx5: SD, Add debugfs
net/mlx5: SD, Add informative prints in kernel log
net/mlx5: SD, Implement steering for primary and secondaries
net/mlx5: SD, Implement devcom communication and primary election
net/mlx5: SD, Implement basic query and instantiation
net/mlx5: SD, Introduce SD lib
net/mlx5: Add MPIR bit in mcam_access_reg
====================
Link: https://lore.kernel.org/r/20240307084229.500776-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The ethtool-nl family does a good job exposing various protocol
related and IEEE/IETF statistics which used to get dumped under
ethtool -S, with creative names. Queue stats don't have a netlink
API, yet, and remain a lion's share of ethtool -S output for new
drivers. Not only is that bad because the names differ driver to
driver but it's also bug-prone. Intuitively drivers try to report
only the stats for active queues, but querying ethtool stats
involves multiple system calls, and the number of stats is
read separately from the stats themselves. Worse still when user
space asks for values of the stats, it doesn't inform the kernel
how big the buffer is. If number of stats increases in the meantime
kernel will overflow user buffer.
Add a netlink API for dumping queue stats. Queue information is
exposed via the netdev-genl family, so add the stats there.
Support per-queue and sum-for-device dumps. Latter will be useful
when subsequent patches add more interesting common stats than
just bytes and packets.
The API does not currently distinguish between HW and SW stats.
The expectation is that the source of the stats will either not
matter much (good packets) or be obvious (skb alloc errors).
Acked-by: Stanislav Fomichev <sdf@google.com>
Reviewed-by: Amritha Nambiar <amritha.nambiar@intel.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240306195509.1502746-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Now that phylink has a comprehensive PCS support, update the porting
guide to explain the process of supporting the PCS configuration. This
also removed outdated references to phylink_config fields that no longer
exists.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Following the addition of new GTP RSS hash options to ethtool.h, this patch
implements the corresponding RSS settings for GTP packets in the Intel ice
driver. It enables users to configure RSS for GTP-U and GTP-C traffic over IPv4
and IPv6, utilizing the newly defined hash options.
The implementation covers the handling of gtpu(4|6), gtpc(4|6), gtpc(4|6)t,
gtpu(4|6)e, gtpu(4|6)u, and gtpu(4|6)d traffic, providing enhanced load
distribution for GTP traffic across multiple processing units.
Signed-off-by: Takeru Hayasaka <hayatake396@gmail.com>
Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZeEKVAAKCRDbK58LschI
g7oYAQD5Jlv4fIVTvxvfZrTTZ2tU+OsPa75mc8SDKwpash3YygEA8kvESy8+t6pg
D6QmSf1DIZdFoSp/bV+pfkNWMeR8gwg=
=mTAj
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2024-02-29
We've added 119 non-merge commits during the last 32 day(s) which contain
a total of 150 files changed, 3589 insertions(+), 995 deletions(-).
The main changes are:
1) Extend the BPF verifier to enable static subprog calls in spin lock
critical sections, from Kumar Kartikeya Dwivedi.
2) Fix confusing and incorrect inference of PTR_TO_CTX argument type
in BPF global subprogs, from Andrii Nakryiko.
3) Larger batch of riscv BPF JIT improvements and enabling inlining
of the bpf_kptr_xchg() for RV64, from Pu Lehui.
4) Allow skeleton users to change the values of the fields in struct_ops
maps at runtime, from Kui-Feng Lee.
5) Extend the verifier's capabilities of tracking scalars when they
are spilled to stack, especially when the spill or fill is narrowing,
from Maxim Mikityanskiy & Eduard Zingerman.
6) Various BPF selftest improvements to fix errors under gcc BPF backend,
from Jose E. Marchesi.
7) Avoid module loading failure when the module trying to register
a struct_ops has its BTF section stripped, from Geliang Tang.
8) Annotate all kfuncs in .BTF_ids section which eventually allows
for automatic kfunc prototype generation from bpftool, from Daniel Xu.
9) Several updates to the instruction-set.rst IETF standardization
document, from Dave Thaler.
10) Shrink the size of struct bpf_map resp. bpf_array,
from Alexei Starovoitov.
11) Initial small subset of BPF verifier prepwork for sleepable bpf_timer,
from Benjamin Tissoires.
12) Fix bpftool to be more portable to musl libc by using POSIX's
basename(), from Arnaldo Carvalho de Melo.
13) Add libbpf support to gcc in CORE macro definitions,
from Cupertino Miranda.
14) Remove a duplicate type check in perf_event_bpf_event,
from Florian Lehner.
15) Fix bpf_spin_{un,}lock BPF helpers to actually annotate them
with notrace correctly, from Yonghong Song.
16) Replace the deprecated bpf_lpm_trie_key 0-length array with flexible
array to fix build warnings, from Kees Cook.
17) Fix resolve_btfids cross-compilation to non host-native endianness,
from Viktor Malik.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (119 commits)
selftests/bpf: Test if shadow types work correctly.
bpftool: Add an example for struct_ops map and shadow type.
bpftool: Generated shadow variables for struct_ops maps.
libbpf: Convert st_ops->data to shadow type.
libbpf: Set btf_value_type_id of struct bpf_map for struct_ops.
bpf: Replace bpf_lpm_trie_key 0-length array with flexible array
bpf, arm64: use bpf_prog_pack for memory management
arm64: patching: implement text_poke API
bpf, arm64: support exceptions
arm64: stacktrace: Implement arch_bpf_stack_walk() for the BPF JIT
bpf: add is_async_callback_calling_insn() helper
bpf: introduce in_sleepable() helper
bpf: allow more maps in sleepable bpf programs
selftests/bpf: Test case for lacking CFI stub functions.
bpf: Check cfi_stubs before registering a struct_ops type.
bpf: Clarify batch lookup/lookup_and_delete semantics
bpf, docs: specify which BPF_ABS and BPF_IND fields were zero
bpf, docs: Fix typos in instruction-set.rst
selftests/bpf: update tcp_custom_syncookie to use scalar packet offset
bpf: Shrink size of struct bpf_map/bpf_array.
...
====================
Link: https://lore.kernel.org/r/20240301001625.8800-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The fast path usage breakdown describes the detail for 'inet_sock', fix
the markup title.
Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The existing documentation was not telling that one has to create a PPP
channel and a PPP interface to get PPPoL2TP data offloading working.
Also, tunnel switching was not mentioned, so that people were thinking
it was not supported, while it actually is.
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Acked-by: Tom Parkin <tparkin@katalix.com>
Link: https://lore.kernel.org/r/20240217211425.qj576u3jmaa6yidf@begin
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The mailman2 server running on lists.linuxfoundation.org will be shut
down in very imminent future. Update all instances of obsolete list
addresses throughout the tree with their new destinations.
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20240214-lf-org-list-migration-v1-1-ef1eab4b1543@linuxfoundation.org
If the preferred lifetime was less than the minimum required lifetime,
ipv6_create_tempaddr would error out without creating any new address.
On my machine and network, this error happened immediately with the
preferred lifetime set to 5 seconds or less, after a few minutes with
the preferred lifetime set to 6 seconds, and not at all with the
preferred lifetime set to 7 seconds. During my investigation, I found a
Stack Exchange post from another person who seems to have had the same
problem: They stopped getting new addresses if they lowered the
preferred lifetime below 3 seconds, and they didn't really know why.
The preferred lifetime is a preference, not a hard requirement. The
kernel does not strictly forbid new connections on a deprecated address,
nor does it guarantee that the address will be disposed of the instant
its total valid lifetime expires. So rather than disable IPv6 privacy
extensions altogether if the minimum required lifetime swells above the
preferred lifetime, it is more in keeping with the user's intent to
increase the temporary address's lifetime to the minimum necessary for
the current network conditions.
With these fixes, setting the preferred lifetime to 5 or 6 seconds "just
works" because the extra fraction of a second is practically
unnoticeable. It's even possible to reduce the time before deprecation
to 1 or 2 seconds by setting /proc/sys/net/ipv6/conf/*/regen_min_advance
and /proc/sys/net/ipv6/conf/*/dad_transmits to 0. I realize that that is
a pretty niche use case, but I know at least one person who would gladly
sacrifice performance and convenience to be sure that they are getting
the maximum possible level of privacy.
Link: https://serverfault.com/a/1031168/310447
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
In RFC 8981, REGEN_ADVANCE cannot be less than 2 seconds, and the RFC
does not permit the creation of temporary addresses with lifetimes
shorter than that:
> When processing a Router Advertisement with a
> Prefix Information option carrying a prefix for the purposes of
> address autoconfiguration (i.e., the A bit is set), the host MUST
> perform the following steps:
> 5. A temporary address is created only if this calculated preferred
> lifetime is greater than REGEN_ADVANCE time units.
However, some users want to change their IPv6 address as frequently as
possible regardless of the RFC's arbitrary minimum lifetime. For the
benefit of those users, add a regen_min_advance sysctl parameter that
can be set to below or above 2 seconds.
Link: https://datatracker.ietf.org/doc/html/rfc8981
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
RFC 8981 defines REGEN_ADVANCE as follows:
REGEN_ADVANCE = 2 + (TEMP_IDGEN_RETRIES * DupAddrDetectTransmits * RetransTimer / 1000)
Thus, allowing it to be less than 2 seconds is technically a protocol
violation.
Link: https://datatracker.ietf.org/doc/html/rfc8981#name-defined-protocol-parameters
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Change comments incorrectly mentioning dev_base_lock.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQFHBAABCgAxFiEEUEC6huC2BN0pvD5fKDiiPnotvG8FAmXLUSQTHG1rbEBwZW5n
dXRyb25peC5kZQAKCRAoOKI+ei28b6VMB/0eqFcC233/c60/7iEbxXTGG52qs4mc
4LeTs57+4Msfibq7M81ZzBuZoMqFluFELunYT5gDPXgnSn4AWXyCv9ciYCW8vort
Z/2wcSNUMdOIbmKZhdc96gnqXuE6fNMx/eYTsn34HBkMkM7BfxZSIH3pZsys+eGw
JrVwhT2aBVKG5ji4YPZF/RuqHwuM00GLMs9G9GR6yw9JiCwI1n+Jjru/6zwJprpi
NAyLhJGgvgp+twLID2jH2Gy6Mqs/ZrXMyxPMqycbYOtZ4oQJOfTkg1SXzT/J3GsY
VFWvhGWrADSx7CnISuS9VXsoWpe5nZ7yMhFBOtKME3Gh3qmhQegPIMY3
=w4J5
-----END PGP SIGNATURE-----
Merge tag 'linux-can-next-for-6.9-20240213' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
Marc Kleine-Budde says:
====================
linux-can-next-for-6.9-20240213
this is a pull request of 23 patches for net-next/master.
The first patch is by Nicolas Maier and targets the CAN Broadcast
Manager (bcm), it adds message flags to distinguish between own local
and remote traffic.
Oliver Hartkopp contributes a patch for the CAN ISOTP protocol that
adds dynamic flow control parameters.
Stefan Mätje's patch series add support for the esd PCIe/402 CAN
interface family.
Markus Schneider-Pargmann contributes 14 patches for the m_can to
optimize for the SPI attached tcan4x5x controller.
A patch by Vincent Mailhol replaces Wolfgang Grandegger by Vincent
Mailhol as the CAN drivers Co-Maintainer.
Jimmy Assarsson's patch add support for the Kvaser M.2 PCIe 4xCAN
adapter.
A patch by Daniil Dulov removed a redundant NULL check in the softing
driver.
Oliver Hartkopp contributes a patch to add CANXL virtual CAN network
identifier support.
A patch by myself removes Naga Sureshkumar Relli as the maintainer of
the xilinx_can driver, as their email bounces.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
CAN RAW sockets allow userspace to tell if a received CAN frame comes
from the same socket, another socket on the same host, or another host.
See commit 1e55659ce6 ("can-raw: add msg_flags to distinguish local
traffic"). However, this feature is missing in CAN BCM sockets.
Add the same feature to CAN BCM sockets. When reading a received frame
(opcode RX_CHANGED) using recvmsg, two flags in msg->msg_flags may be
set following the previous convention (from CAN RAW), to distinguish
between 'own', 'local' and 'remote' CAN traffic.
Update the documentation to reflect this change.
Signed-off-by: Nicolas Maier <nicolas.maier.dev@gmail.com>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://lore.kernel.org/all/20240120081018.2319-1-socketcan@hartkopp.net
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
dev->lstats is notably used from loopback ndo_start_xmit()
and other virtual drivers.
Per cpu stats updates are dirtying per-cpu data,
but the pointer itself is read-only.
Fixes: 43a71cd66b ("net-device: reorganize net_device fast path variables")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Coco Li <lixiaoyan@google.com>
Cc: Simon Horman <horms@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
tp->tcp_usec_ts is a read mostly field, used in rx and tx fast paths.
Fixes: d5fed5addb ("tcp: reorganize tcp_sock fast path variables")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Coco Li <lixiaoyan@google.com>
Cc: Wei Wang <weiwan@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
tp->scaling_ratio is a read mostly field, used in rx and tx fast paths.
Fixes: d5fed5addb ("tcp: reorganize tcp_sock fast path variables")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Coco Li <lixiaoyan@google.com>
Cc: Wei Wang <weiwan@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add driver framework and device setup and initialization for Octeon
PCI Endpoint NIC VF.
Add implementation to load module, initialize, register network device,
cleanup and unload module.
Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On early detection of wwan device in fastboot mode, driver sets
up CLDMA0 HW tx/rx queues for raw data transfer and then create
fastboot port to userspace.
Application can use this port to flash firmware and collect
core dump by fastboot protocol commands.
E.g., flash firmware through fastboot port:
- "download:%08x": write data to memory with the download size.
- "flash:%s": write the previously downloaded image to the named partition.
- "reboot": reboot the device.
Link: https://android.googlesource.com/platform/system/core/+/refs/heads/main/fastboot/README.md
Signed-off-by: Jinjian Song <jinjian.song@fibocom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for userspace to get/set the device mode, device's state
machine changes between (unknown/ready/reset/fastboot).
Get the device state mode:
- 'cat /sys/bus/pci/devices/${bdf}/t7xx_mode'
Set the device state mode:
- reset(cold reset): 'echo reset > /sys/bus/pci/devices/${bdf}/t7xx_mode'
- fastboot: 'echo fastboot_switching > /sys/bus/pci/devices/${bdf}/t7xx_mode'
Reload driver to get the new device state after setting operation.
Signed-off-by: Jinjian Song <jinjian.song@fibocom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a new User Data section to the netconsole docs to describe the
appending of user data capability (for netconsole dynamic configuration)
with usage and netconsole output examples.
Co-developed-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Matthew Wood <thepacketgeek@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Command example string is not read as command.
Fix command annotation.
Fixes: a8ce7b26a5 ("devlink: Expose port function commands to control migratable")
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240206161717.466653-1-parav@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
1) IPSec global stats for xfrm and mlx5
2) XSK memory improvements for non-linear SKBs
3) Software steering debug dump to use seq_file ops
4) Various code clean-ups
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmXBgUEACgkQSD+KveBX
+j4qmQgAr5pNBUkxyylEKU0cB8kh21LaOzimVjpGrKCzPtb/Fz8h6E1h668icpFy
2G4iuyJ2/doA0Bn4FJC730i6Xk2sqkXiY85yclvMPObS7b0PckD72d6iH9UfzfYB
Vof1lg3HdVvTq1KhJ1T/sCEB2skBLzEGV/fZWmL39zPaxM9JfHA2o5xDxdW7/GhT
0LPg84MinikoKbfw9m45l2zp52yTRrLzTUbDubwXFC4Ltg/yM4uAsbTArQzA8VJ9
q95XGZ0ZmZDFsVbjp9suCJFaTVrKmh2yy16a3BpKpm72L+K8yciUTqcqDd5vH2wW
CQ0eKMqBtB8c1Jr16oKfQ42NFF0FIQ==
=0IjR
-----END PGP SIGNATURE-----
Merge tag 'mlx5-updates-2024-02-01' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2024-02-01
1) IPSec global stats for xfrm and mlx5
2) XSK memory improvements for non-linear SKBs
3) Software steering debug dump to use seq_file ops
4) Various code clean-ups
* tag 'mlx5-updates-2024-02-01' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5e: XDP, Exclude headroom and tailroom from memory calculations
net/mlx5e: XSK, Exclude tailroom from non-linear SKBs memory calculations
net/mlx5: DR, Change SWS usage to debug fs seq_file interface
net/mlx5: Change missing SyncE capability print to debug
net/mlx5: Remove initial segmentation duplicate definitions
net/mlx5: Return specific error code for timeout on wait_fw_init
net/mlx5: SF, Stop waiting for FW as teardown was called
net/mlx5: remove fw reporter dump option for non PF
net/mlx5: remove fw_fatal reporter dump option for non PF
net/mlx5: Rename mlx5_sf_dev_remove
Documentation: Fix counter name of mlx5 vnic reporter
net/mlx5e: Delete obsolete IPsec code
net/mlx5e: Connect mlx5 IPsec statistics with XFRM core
xfrm: get global statistics from the offloaded device
xfrm: generalize xdo_dev_state_update_curlft to allow statistics update
====================
Link: https://lore.kernel.org/r/20240206005527.1353368-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add support for the independent control state machine per IEEE
802.1AX-2008 5.4.15 in addition to the existing implementation of the
coupled control state machine.
Introduces two new states, AD_MUX_COLLECTING and AD_MUX_DISTRIBUTING in
the LACP MUX state machine for separated handling of an initial
Collecting state before the Collecting and Distributing state. This
enables a port to be in a state where it can receive incoming packets
while not still distributing. This is useful for reducing packet loss when
a port begins distributing before its partner is able to collect.
Added new functions such as bond_set_slave_tx_disabled_flags and
bond_set_slave_rx_enabled_flags to precisely manage the port's collecting
and distributing states. Previously, there was no dedicated method to
disable TX while keeping RX enabled, which this patch addresses.
Note that the regular flow process in the kernel's bonding driver remains
unaffected by this patch. The extension requires explicit opt-in by the
user (in order to ensure no disruptions for existing setups) via netlink
support using the new bonding parameter coupled_control. The default value
for coupled_control is set to 1 so as to preserve existing behaviour.
Signed-off-by: Aahil Awatramani <aahila@google.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/20240202175858.1573852-1-aahila@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Fix counter name in documentation of mlx5 vnic health reporter diagnose
output: total_error_queues.
While here fix alignment in the documentation file of another counter,
comp_eq_overrun, as it should have its own line and not be part of
another counter's description.
Example:
$ devlink health diagnose pci/0000:00:04.0 reporter vnic
vNIC env counters:
total_error_queues: 0 send_queue_priority_update_flow: 0
comp_eq_overrun: 0 async_eq_overrun: 0 cq_overrun: 0
invalid_command: 0 quota_exceeded_command: 0
nic_receive_steering_discard: 0
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
In order to allow drivers to fill all statistics, change the name
of xdo_dev_state_update_curlft to be xdo_dev_state_update_stats.
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
This patch contains more details about the functionality
of RX copybreak.
Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add a note when using esw_port_metadata. The parameter has runtime
mode but setting it does not take effect immediately. Setting it must
happen in legacy mode, and the port metadata takes effects when the
switchdev mode is enabled.
Disable eswitch port metadata::
$ devlink dev param set pci/0000:06:00.0 name esw_port_metadata value \
false cmode runtime
Change eswitch mode to switchdev mode where after choosing the metadata value::
$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev
Note that other mlx5 devlink runtime parameters, esw_multiport and
flow_steering_mode, do not have this limitation.
Signed-off-by: William Tu <witu@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- The minimum Sphinx requirement has been raised to 2.4.4, following a
warning that was added in 6.2.
- Some reworking of the Documentation/process front page to, hopefully,
make it more useful.
- Various kernel-doc tweaks to, for example, make it deal properly with
__counted_by annotations.
- We have also restored a warning for documentation of nonexistent
structure members that disappeared a while back. That had the delightful
consequence of adding some 600 warnings to the docs build. A sustained
effort by Randy, Vegard, and myself has addressed almost all of those,
bringing the documentation back into sync with the code. The fixes are
going through the appropriate maintainer trees.
- Various improvements to the HTML rendered docs, including automatic links
to Git revisions and a nice new pulldown to make translations easy to
access.
- Speaking of translations, more of those for Spanish and Chinese.
...plus the usual stream of documentation updates and typo fixes.
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmWcRKMPHGNvcmJldEBs
d24ubmV0AAoJEBdDWhNsDH5YTKIH/AxBt/3iWt40dPf18arZHLU6tdUbmg01ttef
CNKWkniCmABGKc//KYDXvjZMRDt0YlrS0KgUzrb8nIQTBlZG40D+88EwjXE0HeGP
xt1Fk7OPOiJEqBZ3HEe0PDVfOiA+4yR6CmDKklCJuKg77X9atklneBwPUw/cOASk
CWj+BdbwPBiSNQv48Lp87rGusKwnH/g0MN2uS0z9MPr1DYjM1K8+ngZjGW24lZHt
qs5yhP43mlZGBF/lwNJXQp/xhnKAqJ9XwylBX9Wmaoxaz9yyzNVsADGvROMudgzi
9YB+Jdy7Z0JSrVoLIRhUuDOv7aW8vk+8qLmGJt2aTIsqehbQ6pk=
=fCtT
-----END PGP SIGNATURE-----
Merge tag 'docs-6.8' of git://git.lwn.net/linux
Pull documentation update from Jonathan Corbet:
"Another moderately busy cycle for documentation, including:
- The minimum Sphinx requirement has been raised to 2.4.4, following
a warning that was added in 6.2
- Some reworking of the Documentation/process front page to,
hopefully, make it more useful
- Various kernel-doc tweaks to, for example, make it deal properly
with __counted_by annotations
- We have also restored a warning for documentation of nonexistent
structure members that disappeared a while back. That had the
delightful consequence of adding some 600 warnings to the docs
build. A sustained effort by Randy, Vegard, and myself has
addressed almost all of those, bringing the documentation back into
sync with the code. The fixes are going through the appropriate
maintainer trees
- Various improvements to the HTML rendered docs, including automatic
links to Git revisions and a nice new pulldown to make translations
easy to access
- Speaking of translations, more of those for Spanish and Chinese
... plus the usual stream of documentation updates and typo fixes"
* tag 'docs-6.8' of git://git.lwn.net/linux: (57 commits)
MAINTAINERS: use tabs for indent of CONFIDENTIAL COMPUTING THREAT MODEL
A reworked process/index.rst
ring-buffer/Documentation: Add documentation on buffer_percent file
Translated the RISC-V architecture boot documentation.
Docs: remove mentions of fdformat from util-linux
Docs/zh_CN: Fix the meaning of DEBUG to pr_debug()
Documentation: move driver-api/dcdbas to userspace-api/
Documentation: move driver-api/isapnp to userspace-api/
Documentation/core-api : fix typo in workqueue
Documentation/trace: Fixed typos in the ftrace FLAGS section
kernel-doc: handle a void function without producing a warning
scripts/get_abi.pl: ignore some temp files
docs: kernel_abi.py: fix command injection
scripts/get_abi: fix source path leak
CREDITS, MAINTAINERS, docs/process/howto: Update man-pages' maintainer
docs: translations: add translations links when they exist
kernel-doc: Align quick help and the code
MAINTAINERS: add reviewer for Spanish translations
docs: ignore __counted_by attribute in structure definitions
scripts: kernel-doc: Clarify missing struct member description
..
Core & protocols
----------------
- Analyze and reorganize core networking structs (socks, netdev,
netns, mibs) to optimize cacheline consumption and set up
build time warnings to safeguard against future header changes.
This improves TCP performances with many concurrent connections
up to 40%.
- Add page-pool netlink-based introspection, exposing the
memory usage and recycling stats. This helps indentify
bad PP users and possible leaks.
- Refine TCP/DCCP source port selection to no longer favor even
source port at connect() time when IP_LOCAL_PORT_RANGE is set.
This lowers the time taken by connect() for hosts having
many active connections to the same destination.
- Refactor the TCP bind conflict code, shrinking related socket
structs.
- Refactor TCP SYN-Cookie handling, as a preparation step to
allow arbitrary SYN-Cookie processing via eBPF.
- Tune optmem_max for 0-copy usage, increasing the default value
to 128KB and namespecifying it.
- Allow coalescing for cloned skbs coming from page pools, improving
RX performances with some common configurations.
- Reduce extension header parsing overhead at GRO time.
- Add bridge MDB bulk deletion support, allowing user-space to
request the deletion of matching entries.
- Reorder nftables struct members, to keep data accessed by the
datapath first.
- Introduce TC block ports tracking and use. This allows supporting
multicast-like behavior at the TC layer.
- Remove UAPI support for retired TC qdiscs (dsmark, CBQ and ATM) and
classifiers (RSVP and tcindex).
- More data-race annotations.
- Extend the diag interface to dump TCP bound-only sockets.
- Conditional notification of events for TC qdisc class and actions.
- Support for WPAN dynamic associations with nearby devices, to form
a sub-network using a specific PAN ID.
- Implement SMCv2.1 virtual ISM device support.
- Add support for Batman-avd mulicast packet type.
BPF
---
- Tons of verifier improvements:
- BPF register bounds logic and range support along with a large
test suite
- log improvements
- complete precision tracking support for register spills
- track aligned STACK_ZERO cases as imprecise spilled registers. It
improves the verifier "instructions processed" metric from single
digit to 50-60% for some programs
- support for user's global BPF subprogram arguments with few
commonly requested annotations for a better developer experience
- support tracking of BPF_JNE which helps cases when the compiler
transforms (unsigned) "a > 0" into "if a == 0 goto xxx" and the
like
- several fixes
- Add initial TX metadata implementation for AF_XDP with support in
mlx5 and stmmac drivers. Two types of offloads are supported right
now, that is, TX timestamp and TX checksum offload.
- Fix kCFI bugs in BPF all forms of indirect calls from BPF into
kernel and from kernel into BPF work with CFI enabled. This allows
BPF to work with CONFIG_FINEIBT=y.
- Change BPF verifier logic to validate global subprograms lazily
instead of unconditionally before the main program, so they can be
guarded using BPF CO-RE techniques.
- Support uid/gid options when mounting bpffs.
- Add a new kfunc which acquires the associated cgroup of a task
within a specific cgroup v1 hierarchy where the latter is identified
by its id.
- Extend verifier to allow bpf_refcount_acquire() of a map value field
obtained via direct load which is a use-case needed in sched_ext.
- Add BPF link_info support for uprobe multi link along with bpftool
integration for the latter.
- Support for VLAN tag in XDP hints.
- Remove deprecated bpfilter kernel leftovers given the project
is developed in user-space (https://github.com/facebook/bpfilter).
Misc
----
- Support for parellel TC self-tests execution.
- Increase MPTCP self-tests coverage.
- Updated the bridge documentation, including several so-far
undocumented features.
- Convert all the net self-tests to run in unique netns, to
avoid random failures due to conflict and allow concurrent
runs.
- Add TCP-AO self-tests.
- Add kunit tests for both cfg80211 and mac80211.
- Autogenerate Netlink families documentation from YAML spec.
- Add yml-gen support for fixed headers and recursive nests, the
tool can now generate user-space code for all genetlink families
for which we have specs.
- A bunch of additional module descriptions fixes.
- Catch incorrect freeing of pages belonging to a page pool.
Driver API
----------
- Rust abstractions for network PHY drivers; do not cover yet the
full C API, but already allow implementing functional PHY drivers
in rust.
- Introduce queue and NAPI support in the netdev Netlink interface,
allowing complete access to the device <> NAPIs <> queues
relationship.
- Introduce notifications filtering for devlink to allow control
application scale to thousands of instances.
- Improve PHY validation, requesting rate matching information for
each ethtool link mode supported by both the PHY and host.
- Add support for ethtool symmetric-xor RSS hash.
- ACPI based Wifi band RFI (WBRF) mitigation feature for the AMD
platform.
- Expose pin fractional frequency offset value over new DPLL generic
netlink attribute.
- Convert older drivers to platform remove callback returning void.
- Add support for PHY package MMD read/write.
New hardware / drivers
----------------------
- Ethernet:
- Octeon CN10K devices
- Broadcom 5760X P7
- Qualcomm SM8550 SoC
- Texas Instrument DP83TG720S PHY
- Bluetooth:
- IMC Networks Bluetooth radio
Removed
-------
- WiFi:
- libertas 16-bit PCMCIA support
- Atmel at76c50x drivers
- HostAP ISA/PCMCIA style 802.11b driver
- zd1201 802.11b USB dongles
- Orinoco ISA/PCMCIA 802.11b driver
- Aviator/Raytheon driver
- Planet WL3501 driver
- RNDIS USB 802.11b driver
Drivers
-------
- Ethernet high-speed NICs:
- Intel (100G, ice, idpf):
- allow one by one port representors creation and removal
- add temperature and clock information reporting
- add get/set for ethtool's header split ringparam
- add again FW logging
- adds support switchdev hardware packet mirroring
- iavf: implement symmetric-xor RSS hash
- igc: add support for concurrent physical and free-running timers
- i40e: increase the allowable descriptors
- nVidia/Mellanox:
- Preparation for Socket-Direct multi-dev netdev. That will allow
in future releases combining multiple PFs devices attached to
different NUMA nodes under the same netdev
- Broadcom (bnxt):
- TX completion handling improvements
- add basic ntuple filter support
- reduce MSIX vectors usage for MQPRIO offload
- add VXLAN support, USO offload and TX coalesce completion for P7
- Marvell Octeon EP:
- xmit-more support
- add PF-VF mailbox support and use it for FW notifications for VFs
- Wangxun (ngbe/txgbe):
- implement ethtool functions to operate pause param, ring param,
coalesce channel number and msglevel
- Netronome/Corigine (nfp):
- add flow-steering support
- support UDP segmentation offload
- Ethernet NICs embedded, slower, virtual:
- Xilinx AXI: remove duplicate DMA code adopting the dma engine driver
- stmmac: add support for HW-accelerated VLAN stripping
- TI AM654x sw: add mqprio, frame preemption & coalescing
- gve: add support for non-4k page sizes.
- virtio-net: support dynamic coalescing moderation
- nVidia/Mellanox Ethernet datacenter switches:
- allow firmware upgrade without a reboot
- more flexible support for bridge flooding via the compressed
FID flooding mode
- Ethernet embedded switches:
- Microchip:
- fine-tune flow control and speed configurations in KSZ8xxx
- KSZ88X3: enable setting rmii reference
- Renesas:
- add jumbo frames support
- Marvell:
- 88E6xxx: add "eth-mac" and "rmon" stats support
- Ethernet PHYs:
- aquantia: add firmware load support
- at803x: refactor the driver to simplify adding support for more
chip variants
- NXP C45 TJA11xx: Add MACsec offload support
- Wifi:
- MediaTek (mt76):
- NVMEM EEPROM improvements
- mt7996 Extremely High Throughput (EHT) improvements
- mt7996 Wireless Ethernet Dispatcher (WED) support
- mt7996 36-bit DMA support
- Qualcomm (ath12k):
- support for a single MSI vector
- WCN7850: support AP mode
- Intel (iwlwifi):
- new debugfs file fw_dbg_clear
- allow concurrent P2P operation on DFS channels
- Bluetooth:
- QCA2066: support HFP offload
- ISO: more broadcast-related improvements
- NXP: better recovery in case receiver/transmitter get out of sync
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmWdamsSHHBhYmVuaUBy
ZWRoYXQuY29tAAoJECkkeY3MjxOkGC4P/2xjLzdw22ckSssuE9ORbGko9SNjnqHk
PQh1E+26BHiCg5KB8VvzMsL78E79MRNXEattSW+1g7dhCvln3oi+Vd0WkdRkgt35
98Iv18zLbbwFAJeyKvmLAPAkQkMLtVj19QILBBRrugF+egEZgVSE3JBcTAiKv2ZQ
HzkabA171Ri6LpCcEEtY5XuaKvimGnGzF8YMFf8rX0wtqd2p5kbY9aMe47WAGxvU
Vf9548XvH+A5yVH2/4/gujtUOpA/RHuhuCMb+oo0cZ+VCC1x9MGzoXzj6r87OTkf
k2W1whNzcGoin92f+9Lk1JYMuiGKBH4QVaDdNXJnYFSJWPTE7RvRsPzYTSD4/GzK
yEZbzSJXpy/2vDQm16NoAxl7evRs8Sorzkw4LQRviZHI/5SAkK2ZQiCK5CO8QSYy
C1LELcV5kn6Foe24xWnrWLjAGug9oJnYoGPMU5gvPmFJMvUMXqm5rmbBgUWL5Rxw
q1M6gVzabCyWUy6z2G2vaqW2ZntNVvCkdsLtIX0XZkcTzNoP0MA+TuhyGz4wbiuo
PeyQp/mbGnDgCYggqKIA0YWrTVxkhFrKN520cbO8qXBQytV9oFbM/0/+C0/r/5WX
pL1JVzLrh6l5ME7EIQfha8UOF9j8q4ueSwb40P3AR2NaZiDABM0zfUZ6+sx+91WF
ucqPEcZB5cRE
=1bW6
-----END PGP SIGNATURE-----
Merge tag 'net-next-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni:
"The most interesting thing is probably the networking structs
reorganization and a significant amount of changes is around
self-tests.
Core & protocols:
- Analyze and reorganize core networking structs (socks, netdev,
netns, mibs) to optimize cacheline consumption and set up build
time warnings to safeguard against future header changes
This improves TCP performances with many concurrent connections up
to 40%
- Add page-pool netlink-based introspection, exposing the memory
usage and recycling stats. This helps indentify bad PP users and
possible leaks
- Refine TCP/DCCP source port selection to no longer favor even
source port at connect() time when IP_LOCAL_PORT_RANGE is set. This
lowers the time taken by connect() for hosts having many active
connections to the same destination
- Refactor the TCP bind conflict code, shrinking related socket
structs
- Refactor TCP SYN-Cookie handling, as a preparation step to allow
arbitrary SYN-Cookie processing via eBPF
- Tune optmem_max for 0-copy usage, increasing the default value to
128KB and namespecifying it
- Allow coalescing for cloned skbs coming from page pools, improving
RX performances with some common configurations
- Reduce extension header parsing overhead at GRO time
- Add bridge MDB bulk deletion support, allowing user-space to
request the deletion of matching entries
- Reorder nftables struct members, to keep data accessed by the
datapath first
- Introduce TC block ports tracking and use. This allows supporting
multicast-like behavior at the TC layer
- Remove UAPI support for retired TC qdiscs (dsmark, CBQ and ATM) and
classifiers (RSVP and tcindex)
- More data-race annotations
- Extend the diag interface to dump TCP bound-only sockets
- Conditional notification of events for TC qdisc class and actions
- Support for WPAN dynamic associations with nearby devices, to form
a sub-network using a specific PAN ID
- Implement SMCv2.1 virtual ISM device support
- Add support for Batman-avd mulicast packet type
BPF:
- Tons of verifier improvements:
- BPF register bounds logic and range support along with a large
test suite
- log improvements
- complete precision tracking support for register spills
- track aligned STACK_ZERO cases as imprecise spilled registers.
This improves the verifier "instructions processed" metric from
single digit to 50-60% for some programs
- support for user's global BPF subprogram arguments with few
commonly requested annotations for a better developer
experience
- support tracking of BPF_JNE which helps cases when the compiler
transforms (unsigned) "a > 0" into "if a == 0 goto xxx" and the
like
- several fixes
- Add initial TX metadata implementation for AF_XDP with support in
mlx5 and stmmac drivers. Two types of offloads are supported right
now, that is, TX timestamp and TX checksum offload
- Fix kCFI bugs in BPF all forms of indirect calls from BPF into
kernel and from kernel into BPF work with CFI enabled. This allows
BPF to work with CONFIG_FINEIBT=y
- Change BPF verifier logic to validate global subprograms lazily
instead of unconditionally before the main program, so they can be
guarded using BPF CO-RE techniques
- Support uid/gid options when mounting bpffs
- Add a new kfunc which acquires the associated cgroup of a task
within a specific cgroup v1 hierarchy where the latter is
identified by its id
- Extend verifier to allow bpf_refcount_acquire() of a map value
field obtained via direct load which is a use-case needed in
sched_ext
- Add BPF link_info support for uprobe multi link along with bpftool
integration for the latter
- Support for VLAN tag in XDP hints
- Remove deprecated bpfilter kernel leftovers given the project is
developed in user-space (https://github.com/facebook/bpfilter)
Misc:
- Support for parellel TC self-tests execution
- Increase MPTCP self-tests coverage
- Updated the bridge documentation, including several so-far
undocumented features
- Convert all the net self-tests to run in unique netns, to avoid
random failures due to conflict and allow concurrent runs
- Add TCP-AO self-tests
- Add kunit tests for both cfg80211 and mac80211
- Autogenerate Netlink families documentation from YAML spec
- Add yml-gen support for fixed headers and recursive nests, the tool
can now generate user-space code for all genetlink families for
which we have specs
- A bunch of additional module descriptions fixes
- Catch incorrect freeing of pages belonging to a page pool
Driver API:
- Rust abstractions for network PHY drivers; do not cover yet the
full C API, but already allow implementing functional PHY drivers
in rust
- Introduce queue and NAPI support in the netdev Netlink interface,
allowing complete access to the device <> NAPIs <> queues
relationship
- Introduce notifications filtering for devlink to allow control
application scale to thousands of instances
- Improve PHY validation, requesting rate matching information for
each ethtool link mode supported by both the PHY and host
- Add support for ethtool symmetric-xor RSS hash
- ACPI based Wifi band RFI (WBRF) mitigation feature for the AMD
platform
- Expose pin fractional frequency offset value over new DPLL generic
netlink attribute
- Convert older drivers to platform remove callback returning void
- Add support for PHY package MMD read/write
New hardware / drivers:
- Ethernet:
- Octeon CN10K devices
- Broadcom 5760X P7
- Qualcomm SM8550 SoC
- Texas Instrument DP83TG720S PHY
- Bluetooth:
- IMC Networks Bluetooth radio
Removed:
- WiFi:
- libertas 16-bit PCMCIA support
- Atmel at76c50x drivers
- HostAP ISA/PCMCIA style 802.11b driver
- zd1201 802.11b USB dongles
- Orinoco ISA/PCMCIA 802.11b driver
- Aviator/Raytheon driver
- Planet WL3501 driver
- RNDIS USB 802.11b driver
Driver updates:
- Ethernet high-speed NICs:
- Intel (100G, ice, idpf):
- allow one by one port representors creation and removal
- add temperature and clock information reporting
- add get/set for ethtool's header split ringparam
- add again FW logging
- adds support switchdev hardware packet mirroring
- iavf: implement symmetric-xor RSS hash
- igc: add support for concurrent physical and free-running
timers
- i40e: increase the allowable descriptors
- nVidia/Mellanox:
- Preparation for Socket-Direct multi-dev netdev. That will
allow in future releases combining multiple PFs devices
attached to different NUMA nodes under the same netdev
- Broadcom (bnxt):
- TX completion handling improvements
- add basic ntuple filter support
- reduce MSIX vectors usage for MQPRIO offload
- add VXLAN support, USO offload and TX coalesce completion
for P7
- Marvell Octeon EP:
- xmit-more support
- add PF-VF mailbox support and use it for FW notifications
for VFs
- Wangxun (ngbe/txgbe):
- implement ethtool functions to operate pause param, ring
param, coalesce channel number and msglevel
- Netronome/Corigine (nfp):
- add flow-steering support
- support UDP segmentation offload
- Ethernet NICs embedded, slower, virtual:
- Xilinx AXI: remove duplicate DMA code adopting the dma engine
driver
- stmmac: add support for HW-accelerated VLAN stripping
- TI AM654x sw: add mqprio, frame preemption & coalescing
- gve: add support for non-4k page sizes.
- virtio-net: support dynamic coalescing moderation
- nVidia/Mellanox Ethernet datacenter switches:
- allow firmware upgrade without a reboot
- more flexible support for bridge flooding via the compressed
FID flooding mode
- Ethernet embedded switches:
- Microchip:
- fine-tune flow control and speed configurations in KSZ8xxx
- KSZ88X3: enable setting rmii reference
- Renesas:
- add jumbo frames support
- Marvell:
- 88E6xxx: add "eth-mac" and "rmon" stats support
- Ethernet PHYs:
- aquantia: add firmware load support
- at803x: refactor the driver to simplify adding support for more
chip variants
- NXP C45 TJA11xx: Add MACsec offload support
- Wifi:
- MediaTek (mt76):
- NVMEM EEPROM improvements
- mt7996 Extremely High Throughput (EHT) improvements
- mt7996 Wireless Ethernet Dispatcher (WED) support
- mt7996 36-bit DMA support
- Qualcomm (ath12k):
- support for a single MSI vector
- WCN7850: support AP mode
- Intel (iwlwifi):
- new debugfs file fw_dbg_clear
- allow concurrent P2P operation on DFS channels
- Bluetooth:
- QCA2066: support HFP offload
- ISO: more broadcast-related improvements
- NXP: better recovery in case receiver/transmitter get out of sync"
* tag 'net-next-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1714 commits)
lan78xx: remove redundant statement in lan78xx_get_eee
lan743x: remove redundant statement in lan743x_ethtool_get_eee
bnxt_en: Fix RCU locking for ntuple filters in bnxt_rx_flow_steer()
bnxt_en: Fix RCU locking for ntuple filters in bnxt_srxclsrldel()
bnxt_en: Remove unneeded variable in bnxt_hwrm_clear_vnic_filter()
tcp: Revert no longer abort SYN_SENT when receiving some ICMP
Revert "mlx5 updates 2023-12-20"
Revert "net: stmmac: Enable Per DMA Channel interrupt"
ipvlan: Remove usage of the deprecated ida_simple_xx() API
ipvlan: Fix a typo in a comment
net/sched: Remove ipt action tests
net: stmmac: Use interrupt mode INTM=1 for per channel irq
net: stmmac: Add support for TX/RX channel interrupt
net: stmmac: Make MSI interrupt routine generic
dt-bindings: net: snps,dwmac: per channel irq
net: phy: at803x: make read_status more generic
net: phy: at803x: add support for cdt cross short test for qca808x
net: phy: at803x: refactor qca808x cable test get status function
net: phy: at803x: generalize cdt fault length function
net: ethernet: cortina: Drop TSO support
...
are included in this merge do the following:
- Peng Zhang has done some mapletree maintainance work in the
series
"maple_tree: add mt_free_one() and mt_attr() helpers"
"Some cleanups of maple tree"
- In the series "mm: use memmap_on_memory semantics for dax/kmem"
Vishal Verma has altered the interworking between memory-hotplug
and dax/kmem so that newly added 'device memory' can more easily
have its memmap placed within that newly added memory.
- Matthew Wilcox continues folio-related work (including a few
fixes) in the patch series
"Add folio_zero_tail() and folio_fill_tail()"
"Make folio_start_writeback return void"
"Fix fault handler's handling of poisoned tail pages"
"Convert aops->error_remove_page to ->error_remove_folio"
"Finish two folio conversions"
"More swap folio conversions"
- Kefeng Wang has also contributed folio-related work in the series
"mm: cleanup and use more folio in page fault"
- Jim Cromie has improved the kmemleak reporting output in the
series "tweak kmemleak report format".
- In the series "stackdepot: allow evicting stack traces" Andrey
Konovalov to permits clients (in this case KASAN) to cause
eviction of no longer needed stack traces.
- Charan Teja Kalla has fixed some accounting issues in the page
allocator's atomic reserve calculations in the series "mm:
page_alloc: fixes for high atomic reserve caluculations".
- Dmitry Rokosov has added to the samples/ dorectory some sample
code for a userspace memcg event listener application. See the
series "samples: introduce cgroup events listeners".
- Some mapletree maintanance work from Liam Howlett in the series
"maple_tree: iterator state changes".
- Nhat Pham has improved zswap's approach to writeback in the
series "workload-specific and memory pressure-driven zswap
writeback".
- DAMON/DAMOS feature and maintenance work from SeongJae Park in
the series
"mm/damon: let users feed and tame/auto-tune DAMOS"
"selftests/damon: add Python-written DAMON functionality tests"
"mm/damon: misc updates for 6.8"
- Yosry Ahmed has improved memcg's stats flushing in the series
"mm: memcg: subtree stats flushing and thresholds".
- In the series "Multi-size THP for anonymous memory" Ryan Roberts
has added a runtime opt-in feature to transparent hugepages which
improves performance by allocating larger chunks of memory during
anonymous page faults.
- Matthew Wilcox has also contributed some cleanup and maintenance
work against eh buffer_head code int he series "More buffer_head
cleanups".
- Suren Baghdasaryan has done work on Andrea Arcangeli's series
"userfaultfd move option". UFFDIO_MOVE permits userspace heap
compaction algorithms to move userspace's pages around rather than
UFFDIO_COPY'a alloc/copy/free.
- Stefan Roesch has developed a "KSM Advisor", in the series
"mm/ksm: Add ksm advisor". This is a governor which tunes KSM's
scanning aggressiveness in response to userspace's current needs.
- Chengming Zhou has optimized zswap's temporary working memory
use in the series "mm/zswap: dstmem reuse optimizations and
cleanups".
- Matthew Wilcox has performed some maintenance work on the
writeback code, both code and within filesystems. The series is
"Clean up the writeback paths".
- Andrey Konovalov has optimized KASAN's handling of alloc and
free stack traces for secondary-level allocators, in the series
"kasan: save mempool stack traces".
- Andrey also performed some KASAN maintenance work in the series
"kasan: assorted clean-ups".
- David Hildenbrand has gone to town on the rmap code. Cleanups,
more pte batching, folio conversions and more. See the series
"mm/rmap: interface overhaul".
- Kinsey Ho has contributed some maintenance work on the MGLRU
code in the series "mm/mglru: Kconfig cleanup".
- Matthew Wilcox has contributed lruvec page accounting code
cleanups in the series "Remove some lruvec page accounting
functions".
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZZyF2wAKCRDdBJ7gKXxA
jjWjAP42LHvGSjp5M+Rs2rKFL0daBQsrlvy6/jCHUequSdWjSgEAmOx7bc5fbF27
Oa8+DxGM9C+fwqZ/7YxU2w/WuUmLPgU=
=0NHs
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2024-01-08-15-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
"Many singleton patches against the MM code. The patch series which are
included in this merge do the following:
- Peng Zhang has done some mapletree maintainance work in the series
'maple_tree: add mt_free_one() and mt_attr() helpers'
'Some cleanups of maple tree'
- In the series 'mm: use memmap_on_memory semantics for dax/kmem'
Vishal Verma has altered the interworking between memory-hotplug
and dax/kmem so that newly added 'device memory' can more easily
have its memmap placed within that newly added memory.
- Matthew Wilcox continues folio-related work (including a few fixes)
in the patch series
'Add folio_zero_tail() and folio_fill_tail()'
'Make folio_start_writeback return void'
'Fix fault handler's handling of poisoned tail pages'
'Convert aops->error_remove_page to ->error_remove_folio'
'Finish two folio conversions'
'More swap folio conversions'
- Kefeng Wang has also contributed folio-related work in the series
'mm: cleanup and use more folio in page fault'
- Jim Cromie has improved the kmemleak reporting output in the series
'tweak kmemleak report format'.
- In the series 'stackdepot: allow evicting stack traces' Andrey
Konovalov to permits clients (in this case KASAN) to cause eviction
of no longer needed stack traces.
- Charan Teja Kalla has fixed some accounting issues in the page
allocator's atomic reserve calculations in the series 'mm:
page_alloc: fixes for high atomic reserve caluculations'.
- Dmitry Rokosov has added to the samples/ dorectory some sample code
for a userspace memcg event listener application. See the series
'samples: introduce cgroup events listeners'.
- Some mapletree maintanance work from Liam Howlett in the series
'maple_tree: iterator state changes'.
- Nhat Pham has improved zswap's approach to writeback in the series
'workload-specific and memory pressure-driven zswap writeback'.
- DAMON/DAMOS feature and maintenance work from SeongJae Park in the
series
'mm/damon: let users feed and tame/auto-tune DAMOS'
'selftests/damon: add Python-written DAMON functionality tests'
'mm/damon: misc updates for 6.8'
- Yosry Ahmed has improved memcg's stats flushing in the series 'mm:
memcg: subtree stats flushing and thresholds'.
- In the series 'Multi-size THP for anonymous memory' Ryan Roberts
has added a runtime opt-in feature to transparent hugepages which
improves performance by allocating larger chunks of memory during
anonymous page faults.
- Matthew Wilcox has also contributed some cleanup and maintenance
work against eh buffer_head code int he series 'More buffer_head
cleanups'.
- Suren Baghdasaryan has done work on Andrea Arcangeli's series
'userfaultfd move option'. UFFDIO_MOVE permits userspace heap
compaction algorithms to move userspace's pages around rather than
UFFDIO_COPY'a alloc/copy/free.
- Stefan Roesch has developed a 'KSM Advisor', in the series 'mm/ksm:
Add ksm advisor'. This is a governor which tunes KSM's scanning
aggressiveness in response to userspace's current needs.
- Chengming Zhou has optimized zswap's temporary working memory use
in the series 'mm/zswap: dstmem reuse optimizations and cleanups'.
- Matthew Wilcox has performed some maintenance work on the writeback
code, both code and within filesystems. The series is 'Clean up the
writeback paths'.
- Andrey Konovalov has optimized KASAN's handling of alloc and free
stack traces for secondary-level allocators, in the series 'kasan:
save mempool stack traces'.
- Andrey also performed some KASAN maintenance work in the series
'kasan: assorted clean-ups'.
- David Hildenbrand has gone to town on the rmap code. Cleanups, more
pte batching, folio conversions and more. See the series 'mm/rmap:
interface overhaul'.
- Kinsey Ho has contributed some maintenance work on the MGLRU code
in the series 'mm/mglru: Kconfig cleanup'.
- Matthew Wilcox has contributed lruvec page accounting code cleanups
in the series 'Remove some lruvec page accounting functions'"
* tag 'mm-stable-2024-01-08-15-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (361 commits)
mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER
mm, treewide: introduce NR_PAGE_ORDERS
selftests/mm: add separate UFFDIO_MOVE test for PMD splitting
selftests/mm: skip test if application doesn't has root privileges
selftests/mm: conform test to TAP format output
selftests: mm: hugepage-mmap: conform to TAP format output
selftests/mm: gup_test: conform test to TAP format output
mm/selftests: hugepage-mremap: conform test to TAP format output
mm/vmstat: move pgdemote_* out of CONFIG_NUMA_BALANCING
mm: zsmalloc: return -ENOSPC rather than -EINVAL in zs_malloc while size is too large
mm/memcontrol: remove __mod_lruvec_page_state()
mm/khugepaged: use a folio more in collapse_file()
slub: use a folio in __kmalloc_large_node
slub: use folio APIs in free_large_kmalloc()
slub: use alloc_pages_node() in alloc_slab_page()
mm: remove inc/dec lruvec page state functions
mm: ratelimit stat flush from workingset shrinker
kasan: stop leaking stack trace handles
mm/mglru: remove CONFIG_TRANSPARENT_HUGEPAGE
mm/mglru: add dummy pmd_dirty()
...
commit 23baf831a3 ("mm, treewide: redefine MAX_ORDER sanely") has
changed the definition of MAX_ORDER to be inclusive. This has caused
issues with code that was not yet upstream and depended on the previous
definition.
To draw attention to the altered meaning of the define, rename MAX_ORDER
to MAX_PAGE_ORDER.
Link: https://lkml.kernel.org/r/20231228144704.14033-2-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This reverts commit c902ba322c.
This reverts commit 50648968b3.
This reverts commit 77cef1e021.
This reverts commit 8f8d322bc4.
This reverts commit 6ca7b5486e.
This reverts commit db468f92c3.
This reverts commit 5f8c64c234.
This reverts commit ebdc193b2c.
The driver needs more work.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
xdp_prog is used in receive path, both from XDP enabled drivers
and from netif_elide_gro().
This patch also removes two 4-bytes holes.
Fixes: 43a71cd66b ("net-device: reorganize net_device fast path variables")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Coco Li <lixiaoyan@google.com>
Cc: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240102162220.750823-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
XDP system has a very large footprint in the driver's overall code.
makes the whole driver's code much harder to read.
Moving XDP code to dedicated files.
This patch doesn't make any changes to the code itself and only
cut-pastes the code into ena_xdp.c and ena_xdp.h files so the change
is purely cosmetic.
Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Link: https://lore.kernel.org/r/20240101190855.18739-2-darinzon@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add driver framework and device setup and initialization for Octeon
PCI Endpoint NIC VF.
Add implementation to load module, initialize, register network device,
cleanup and unload module.
Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
dev->gso_partial_features is read from tx fast path for GSO packets.
Move it to appropriate section to avoid a cache line miss.
Fixes: 43a71cd66b ("net-device: reorganize net_device fast path variables")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Coco Li <lixiaoyan@google.com>
Cc: David Ahern <dsahern@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The newly introduced phy_link_topology tracks all ethernet PHYs that are
attached to a netdevice. Document the base principle, internal and
external APIs. As the phy_link_topology is expected to be extended, this
documentation will hold any further improvements and additions made
relative to topology handling.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As we have the ability to track the PHYs connected to a net_device
through the link_topology, we can expose this list to userspace. This
allows userspace to use these identifiers for phy-specific commands and
take the decision of which PHY to target by knowing the link topology.
Add PHY_GET and PHY_DUMP, which can be a filtered DUMP operation to list
devices on only one interface.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some netlink commands are target towards ethernet PHYs, to control some
of their features. As there's several such commands, add the ability to
pass a PHY index in the ethnl request, which will populate the generic
ethnl_req_info with the relevant phydev when the command targets a PHY.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmWAz2EACgkQ6rmadz2v
bToqrw/9EwroZCc8GEHOKAlb/fzrMvn92rLo0ZW/cGN84QJPnx4zM6Zo0+fgLaaN
oqqztwMUwdzGC3uX3FfVXaaLKbJ/MeHeL9BXFZNW8zkRHciw4R7kIBhOdPnHyET7
uT+rQ4xPe1Mt7e9PjepKlSL5mEsxWfBkdUgsdn19Z2Vjdfr9mZMhYWYMJGcfTCD1
TwxHKBPhq5fN3IsshmMBB8IrRp1HStUKb65MgZ4dI22LJXxTsFkx5XMFXcmuqvkH
NhKj8jDcPEEh31bYcb6aG2Z4onw5F2lquygjk1Qyy5cyw45m/ipJKAXKdAyvJG+R
VZCWOET/9wbRwFSK5wxwihCuKghFiofK52i2PcGtXZh0PCouyZZneSJOKM0yVWKO
BvuJBxK4ETRnQyN6ZxhuJiEXG3/YMBBhyR2TX1LntVK9ct/k7qFVzATG49J39/sR
SYMbptBRj4a5oMJ1qn0nFVEDFkg0jTnTDNnsEpcz60Ayt6EsJ1XosO5yz2huf861
xgRMTKMseyG1/uV45tQ8ZPzbSPpBxjUi9Dl3coYsIm1a+y6clWUXcarONY5KVrpS
CR98DuFgl+E7dXuisd/Kz2p2KxxSPq8nytsmLlgOvrUqhwiXqB+TKN8EHgIapVOt
l1A5LrzXFTcGlT9MlaWBqEIy83Bu1nqQqbxrAFOE0k8A5jomXaw=
=stU2
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:
====================
pull-request: bpf-next 2023-12-18
This PR is larger than usual and contains changes in various parts
of the kernel.
The main changes are:
1) Fix kCFI bugs in BPF, from Peter Zijlstra.
End result: all forms of indirect calls from BPF into kernel
and from kernel into BPF work with CFI enabled. This allows BPF
to work with CONFIG_FINEIBT=y.
2) Introduce BPF token object, from Andrii Nakryiko.
It adds an ability to delegate a subset of BPF features from privileged
daemon (e.g., systemd) through special mount options for userns-bound
BPF FS to a trusted unprivileged application. The design accommodates
suggestions from Christian Brauner and Paul Moore.
Example:
$ sudo mkdir -p /sys/fs/bpf/token
$ sudo mount -t bpf bpffs /sys/fs/bpf/token \
-o delegate_cmds=prog_load:MAP_CREATE \
-o delegate_progs=kprobe \
-o delegate_attachs=xdp
3) Various verifier improvements and fixes, from Andrii Nakryiko, Andrei Matei.
- Complete precision tracking support for register spills
- Fix verification of possibly-zero-sized stack accesses
- Fix access to uninit stack slots
- Track aligned STACK_ZERO cases as imprecise spilled registers.
It improves the verifier "instructions processed" metric from single
digit to 50-60% for some programs.
- Fix verifier retval logic
4) Support for VLAN tag in XDP hints, from Larysa Zaremba.
5) Allocate BPF trampoline via bpf_prog_pack mechanism, from Song Liu.
End result: better memory utilization and lower I$ miss for calls to BPF
via BPF trampoline.
6) Fix race between BPF prog accessing inner map and parallel delete,
from Hou Tao.
7) Add bpf_xdp_get_xfrm_state() kfunc, from Daniel Xu.
It allows BPF interact with IPSEC infra. The intent is to support
software RSS (via XDP) for the upcoming ipsec pcpu work.
Experiments on AWS demonstrate single tunnel pcpu ipsec reaching
line rate on 100G ENA nics.
8) Expand bpf_cgrp_storage to support cgroup1 non-attach, from Yafang Shao.
9) BPF file verification via fsverity, from Song Liu.
It allows BPF progs get fsverity digest.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (164 commits)
bpf: Ensure precise is reset to false in __mark_reg_const_zero()
selftests/bpf: Add more uprobe multi fail tests
bpf: Fail uprobe multi link with negative offset
selftests/bpf: Test the release of map btf
s390/bpf: Fix indirect trampoline generation
selftests/bpf: Temporarily disable dummy_struct_ops test on s390
x86/cfi,bpf: Fix bpf_exception_cb() signature
bpf: Fix dtor CFI
cfi: Add CFI_NOSEAL()
x86/cfi,bpf: Fix bpf_struct_ops CFI
x86/cfi,bpf: Fix bpf_callback_t CFI
x86/cfi,bpf: Fix BPF JIT call
cfi: Flip headers
selftests/bpf: Add test for abnormal cnt during multi-kprobe attachment
selftests/bpf: Don't use libbpf_get_error() in kprobe_multi_test
selftests/bpf: Add test for abnormal cnt during multi-uprobe attachment
bpf: Limit the number of kprobes when attaching program to multiple kprobes
bpf: Limit the number of uprobes when attaching program to multiple uprobes
bpf: xdp: Register generic_kfunc_set with XDP programs
selftests/bpf: utilize string values for delegate_xxx mount options
...
====================
Link: https://lore.kernel.org/r/20231219000520.34178-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add documentation for FW logging in
Documentation/networking/device_drivers/ethernet/intel/ice.rst
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Symmetric RSS hash functions are beneficial in applications that monitor
both Tx and Rx packets of the same flow (IDS, software firewalls, ..etc).
Getting all traffic of the same flow on the same RX queue results in
higher CPU cache efficiency.
A NIC that supports "symmetric-xor" can achieve this RSS hash symmetry
by XORing the source and destination fields and pass the values to the
RSS hash algorithm.
The user may request RSS hash symmetry for a specific algorithm, via:
# ethtool -X eth0 hfunc <hash_alg> symmetric-xor
or turn symmetry off (asymmetric) by:
# ethtool -X eth0 hfunc <hash_alg>
The specific fields for each flow type should then be specified as usual
via:
# ethtool -N|-U eth0 rx-flow-hash <flow_type> s|d|f|n
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Link: https://lore.kernel.org/r/20231213003321.605376-4-ahmed.zaki@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
TCP got MSG_EOR support in linux-4.7.
This is a canonical way of making sure no coalescing
will be performed on the skb, even if it could not be
immediately sent.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20231212110608.3673677-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Implement functionality that enables drivers to expose VLAN tag
to XDP code.
VLAN tag is represented by 2 variables:
- protocol ID, which is passed to bpf code in BE
- VLAN TCI, in host byte order
Acked-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://lore.kernel.org/r/20231205210847.28460-10-larysa.zaremba@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Yep, my VIM spellchecker is not good enough for typos like this one.
Fixes: 7fe0e38bb6 ("Documentation/tcp: Add TCP-AO documentation")
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Reported-by: Markus Elfring <Markus.Elfring@web.de>
Closes: https://lore.kernel.org/all/2745ab4e-acac-40d4-83bf-37f2600d0c3d@web.de/
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
If Clock Generation Unit is present on NIC board user shall know its
details.
Provide the devlink info callback with a new:
- fixed type object (cgu.id) indicating hardware variant of onboard CGU,
- running type object (fw.cgu) consisting of CGU id, config and firmware
versions.
These information shall be known for debugging purposes.
Test (on NIC board with CGU)
$ devlink dev info <bus_name>/<dev_name> | grep cgu
cgu.id 36
fw.cgu 8032.16973825.6021
Test (on NIC board without CGU)
$ devlink dev info <bus_name>/<dev_name> | grep cgu -c
0
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add some features that are not appropriate for the existing section to
the "Others" part of the bridge document.
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add switchdev part for bridge document.
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add multicast part for bridge document.
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add VLAN part for bridge document.
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add STP part for bridge document.
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The current bridge kernel doc is too old. It only pointed to the
linuxfoundation wiki page which lacks of the new features.
Here let's start the new bridge document and put all the bridge info
so new developers and users could catch up the last bridge status soon.
In this patch, Convert the doc to rst format. Add bridge brief introduction,
FAQ and contact info.
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Analyzed a few structs in the networking stack by looking at variables
within them that are used in the TCP/IP fast path.
Fast path is defined as TCP path where data is transferred from sender to
receiver unidirectionally. It doesn't include phases other than
TCP_ESTABLISHED, nor does it look at error paths.
We hope to re-organizing variables that span many cachelines whose fast
path variables are also spread out, and this document can help future
developers keep networking fast path cachelines small.
Optimized_cacheline field is computed as
(Fastpath_Bytes/L3_cacheline_size_x86), and not the actual organized
results (see patches to come for these).
Investigation is done on 6.5
Name Struct_Cachelines Cur_fastpath_cache Fastpath_Bytes Optimized_cacheline
tcp_sock 42 (2664 Bytes) 12 396 8
net_device 39 (2240 bytes) 12 234 4
inet_sock 15 (960 bytes) 14 922 14
Inet_connection_sock 22 (1368 bytes) 18 1166 18
Netns_ipv4 (sysctls) 12 (768 bytes) 4 77 2
linux_mib 16 (1060) 6 104 2
Note how there isn't much improvement space for inet_sock and
Inet_connection_sock because sk and icsk_inet respectively takes up so
much of the struct that rest of the variables become a small portion of
the struct size.
So, we decided to reorganize tcp_sock, net_device, netns_ipv4
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Coco Li <lixiaoyan@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Physical Layer Collision Avoidance messages are correctly documented but
were left-out of the global list of ethnl messages, add them to the
list.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://lore.kernel.org/r/20231130191400.817948-1-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add PCI Endpoint NIC support for Octeon CN98 devices.
CN98 devices are part of Octeon 9 family products with
similar PCI NIC characteristics to CN93, already supported
driver.
Add CN98 card to the device id table, as well
as support differences in the register fields and
certain usage scenarios such as unload.
Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
Link: https://lore.kernel.org/r/20231129045348.2538843-3-srasheed@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZWiCPAAKCRDbK58LschI
g4djAQC1FdqCRIFkhbiIRNHTgHjnfQShELQbd9ofJqzylLqmmgD+JI1E7D9SXagm
pIXQ26EGmq8/VcCT3VLncA8EsC76Gg4=
=Xowm
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2023-11-30
We've added 30 non-merge commits during the last 7 day(s) which contain
a total of 58 files changed, 1598 insertions(+), 154 deletions(-).
The main changes are:
1) Add initial TX metadata implementation for AF_XDP with support in mlx5
and stmmac drivers. Two types of offloads are supported right now, that
is, TX timestamp and TX checksum offload, from Stanislav Fomichev with
stmmac implementation from Song Yoong Siang.
2) Change BPF verifier logic to validate global subprograms lazily instead
of unconditionally before the main program, so they can be guarded using
BPF CO-RE techniques, from Andrii Nakryiko.
3) Add BPF link_info support for uprobe multi link along with bpftool
integration for the latter, from Jiri Olsa.
4) Use pkg-config in BPF selftests to determine ld flags which is
in particular needed for linking statically, from Akihiko Odaki.
5) Fix a few BPF selftest failures to adapt to the upcoming LLVM18,
from Yonghong Song.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (30 commits)
bpf/tests: Remove duplicate JSGT tests
selftests/bpf: Add TX side to xdp_hw_metadata
selftests/bpf: Convert xdp_hw_metadata to XDP_USE_NEED_WAKEUP
selftests/bpf: Add TX side to xdp_metadata
selftests/bpf: Add csum helpers
selftests/xsk: Support tx_metadata_len
xsk: Add option to calculate TX checksum in SW
xsk: Validate xsk_tx_metadata flags
xsk: Document tx_metadata_len layout
net: stmmac: Add Tx HWTS support to XDP ZC
net/mlx5e: Implement AF_XDP TX timestamp and checksum offload
tools: ynl: Print xsk-features from the sample
xsk: Add TX timestamp and TX checksum offload support
xsk: Support tx_metadata_len
selftests/bpf: Use pkg-config for libelf
selftests/bpf: Override PKG_CONFIG for static builds
selftests/bpf: Choose pkg-config for the target
bpftool: Add support to display uprobe_multi links
selftests/bpf: Add link_info test for uprobe_multi link
selftests/bpf: Use bpf_link__destroy in fill_link_info tests
...
====================
Conflicts:
Documentation/netlink/specs/netdev.yaml:
839ff60df3 ("net: page_pool: add nlspec for basic access to page pools")
48eb03dd26 ("xsk: Add TX timestamp and TX checksum offload support")
https://lore.kernel.org/all/20231201094705.1ee3cab8@canb.auug.org.au/
While at it also regen, tree is dirty after:
48eb03dd26 ("xsk: Add TX timestamp and TX checksum offload support")
looks like code wasn't re-rendered after "render-max" was removed.
Link: https://lore.kernel.org/r/20231130145708.32573-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Be more explicit about devlink entities that may stay and that have to
be removed during reload reinit action.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
For XDP_COPY mode, add a UMEM option XDP_UMEM_TX_SW_CSUM
to call skb_checksum_help in transmit path. Might be useful
to debugging issues with real hardware. I also use this mode
in the selftests.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20231127190319.1190813-9-sdf@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The first features pull request for v6.8. Not so big in number of
commits but we removed quite a few ancient drivers: libertas 16-bit
PCMCIA support, atmel, hostap, zd1201, orinoco, ray_cs, wl3501 and
rndis_wlan.
Major changes:
cfg80211/mac80211
* extend support for scanning while Multi-Link Operation (MLO) connected
-----BEGIN PGP SIGNATURE-----
iQFFBAABCgAvFiEEiBjanGPFTz4PRfLobhckVSbrbZsFAmVk2JkRHGt2YWxvQGtl
cm5lbC5vcmcACgkQbhckVSbrbZtD4gf7B8eR+lq6+5Qg7ObiURTci/WYlDo3prz4
R7hDmuHI5S3AJIRXF87fmC1iThIigi6dri1akRdRo3UvNr9gBTRpDGHnALuDPxOp
p8MpsG1gPNhMo0aNFfX5RDaGkfKqZQMT3er3ELG4B+k4q0glNWud07D2/0t1zSt9
rag3vlFVPRzcCz2T/Xv/nXXWaiIaQx0RLHrQshbxg0Glu+m/l3/ACBrirMiWami4
nr4SX717yckqwRElH0l5fSmCbylTDCTslo4UnpVKusxTtt1f3RxFZvsxvRgI5O9q
LEYsMtjNj2Ea3uLb0DvSNY640O/dqbRSm0PjwYxPwlmrtQB/6GMx8g==
=a+Xs
-----END PGP SIGNATURE-----
Merge tag 'wireless-next-2023-11-27' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
Kalle Valo says:
====================
wireless-next patches for v6.8
The first features pull request for v6.8. Not so big in number of
commits but we removed quite a few ancient drivers: libertas 16-bit
PCMCIA support, atmel, hostap, zd1201, orinoco, ray_cs, wl3501 and
rndis_wlan.
Major changes:
cfg80211/mac80211
- extend support for scanning while Multi-Link Operation (MLO) connected
* tag 'wireless-next-2023-11-27' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (68 commits)
wifi: nl80211: Documentation update for NL80211_CMD_PORT_AUTHORIZED event
wifi: mac80211: Extend support for scanning while MLO connected
wifi: cfg80211: Extend support for scanning while MLO connected
wifi: ieee80211: fix PV1 frame control field name
rfkill: return ENOTTY on invalid ioctl
MAINTAINERS: update iwlwifi maintainers
wifi: rtw89: 8922a: read efuse content from physical map
wifi: rtw89: 8922a: read efuse content via efuse map struct from logic map
wifi: rtw89: 8852c: read RX gain offset from efuse for 6GHz channels
wifi: rtw89: mac: add to access efuse for WiFi 7 chips
wifi: rtw89: mac: use mac_gen pointer to access about efuse
wifi: rtw89: 8922a: add 8922A basic chip info
wifi: rtlwifi: drop unused const_amdpci_aspm
wifi: mwifiex: mwifiex_process_sleep_confirm_resp(): remove unused priv variable
wifi: rtw89: regd: update regulatory map to R65-R44
wifi: rtw89: regd: handle policy of 6 GHz according to BIOS
wifi: rtw89: acpi: process 6 GHz band policy from DSM
wifi: rtlwifi: simplify rtl_action_proc() and rtl_tx_agg_start()
wifi: rtw89: pci: update interrupt mitigation register for 8922AE
wifi: rtw89: pci: correct interrupt mitigation register for 8852CE
...
====================
Link: https://lore.kernel.org/r/20231127180056.0B48DC433C8@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add a new sysctl: net.smc.smcr_max_conns_per_lgr, which is
used to control the preferred max connections per lgr for
SMC-R v2.1. The default value of this sysctl is 255, and
the acceptable value ranges from 16 to 255.
Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a new sysctl: net.smc.smcr_max_links_per_lgr, which is
used to control the preferred max links per lgr for SMC-R
v2.1. The default value of this sysctl is 2, and the acceptable
value ranges from 1 to 2.
Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a simple script that parses the Netlink YAML spec files
(Documentation/netlink/specs/), and generates RST files to be rendered
in the Network -> Netlink Specification documentation page.
Create a python script that is invoked during 'make htmldocs', reads the
YAML specs input file and generate the correspondent RST file.
Create a new Documentation/networking/netlink_spec index page, and
reference each Netlink RST file that was processed above in this main
index.rst file.
In case of any exception during the parsing, dump the error and skip
the file.
Do not regenerate the RST files if the input files (YAML) were not
changed in-between invocations.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
----
Changelog:
V3:
* Do not regenerate the RST files if the input files were not
changed. In order to do it, a few things changed:
- Rely on Makefile more to find what changed, and trigger
individual file processing
- The script parses file by file now (instead of batches)
- Create a new option to generate the index file
V2:
* Moved the logic from a sphinx extension to a external script
* Adjust some formatting as suggested by Donald Hunter and Jakub
* Auto generating all the rsts instead of having stubs
* Handling error gracefully
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add PCI Endpoint NIC support for Octeon CN10K devices.
CN10K devices are part of Octeon 10 family products with
similar PCI NIC characteristics. These include:
- CN10KA
- CNF10KA
- CNF10KB
- CN10KB
Update supported device list in Documentation
Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
Link: https://lore.kernel.org/r/20231117103817.2468176-1-srasheed@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Revert following commits:
commit acec05fb78 ("net_tstamp: Add TIMESTAMPING SOFTWARE and HARDWARE mask")
commit 11d55be06d ("net: ethtool: Add a command to expose current time stamping layer")
commit bb8645b00c ("netlink: specs: Introduce new netlink command to get current timestamp")
commit d905f9c753 ("net: ethtool: Add a command to list available time stamping layers")
commit aed5004ee7 ("netlink: specs: Introduce new netlink command to list available time stamping layers")
commit 51bdf3165f ("net: Replace hwtstamp_source by timestamping layer")
commit 0f7f463d48 ("net: Change the API of PHY default timestamp to MAC")
commit 091fab1228 ("net: ethtool: ts: Update GET_TS to reply the current selected timestamp")
commit 152c75e1d0 ("net: ethtool: ts: Let the active time stamping layer be selectable")
commit ee60ea6be0 ("netlink: specs: Introduce time stamping set command")
They need more time for reviews.
Link: https://lore.kernel.org/all/20231118183529.6e67100c@kernel.org/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Now that the current timestamp is saved in a variable lets add the
ETHTOOL_MSG_TS_SET ethtool netlink socket to make it selectable.
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce a new netlink message that lists all available time stamping
layers on a given interface.
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Time stamping on network packets may happen either in the MAC or in
the PHY, but not both. In preparation for making the choice
selectable, expose both the current layers via ethtool.
In accordance with the kernel implementation as it stands, the current
layer will always read as "phy" when a PHY time stamping device is
present. Future patches will allow changing the current layer
administratively.
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There aren't a ton of references to commits in the documentation, but
they do exist, and we can use automarkup to linkify them to make them
easier to follow.
Use something like this to find references to commits:
git grep -P 'commit.*[0-9a-f]{8,}' Documentation/
Also fix a few of these to standardize on the exact format that is
already used in changelogs.
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20231027115420.205279-1-vegard.nossum@oracle.com
Current release - regressions:
- sched: fix SKB_NOT_DROPPED_YET splat under debug config
Current release - new code bugs:
- tcp: fix usec timestamps with TCP fastopen
- tcp_sigpool: fix some off by one bugs
- tcp: fix possible out-of-bounds reads in tcp_hash_fail()
- tcp: fix SYN option room calculation for TCP-AO
- bpf: fix compilation error without CGROUPS
- ptp:
- ptp_read() should not release queue
- fix tsevqs corruption
Previous releases - regressions:
- llc: verify mac len before reading mac header
Previous releases - always broken:
- bpf:
- fix check_stack_write_fixed_off() to correctly spill imm
- fix precision tracking for BPF_ALU | BPF_TO_BE | BPF_END
- check map->usercnt after timer->timer is assigned
- dsa: lan9303: consequently nested-lock physical MDIO
- dccp/tcp: call security_inet_conn_request() after setting IP addr
- tg3: fix the TX ring stall due to incorrect full ring handling
- phylink: initialize carrier state at creation
- ice: fix direction of VF rules in switchdev mode
Misc:
- fill in a bunch of missing MODULE_DESCRIPTION()s, more to come
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmVNRnAACgkQMUZtbf5S
IrsYaA/+IUoYi96/oLtvvrET6HIbXeMaLKef0UlytEicQKy8h5EWlhcTZPhQEY0g
dtaKOemQsO0dQTma4eQBiBDHeCeSkitgD9p7fh0i+//QFYWSFqHrBiF2mlToc/ZQ
T1p4BlVL7D2Xsr1Lki93zk+EhFGEy2KroYgrWbZc9TWE5ap9PtSVF9eqeHAVCmZ7
ocre/eo4pqUM9rAHIAyhoL+0xtVQ59dBevbJC0qYcmflhafr82Gtdveo6pBBKuYm
GhwbRrAXER3Neav9c6NHqat4zsMwGpC27SiN9dYWm6dlkeS9U9t2PUu71OkJGVfw
VaSE+utkC/WmzGbuiUIjqQLBrRe372ItHCr78BfSRMshS+RBTHtoK7njeH8Iv67E
RsMeCyVNj9dtGlOQG5JAv8IoCQ1WbMw9B36Yzw3ip/MmDX/ntXz7Dcr4ZMZ6VURS
CHhHFZPnmMykMXkT6SIlxeAg2r8ELtESzkvLimdTVFPAlk3cPkibKJbh3F/tEqXS
PDb3y0uoEgRQBAsWXXx9FQEvv9rTL6YrzbMhmJBIIEoNxppQYQ7FZBJX9utAVp5B
1GdyqhR6IRTaKb9cMRj/K1xPwm2KgCw9xj9pjKdAA7QUMslXbFp8blv1rIkFGshg
hiNXmPcI8wo0j+0lZYktEcIERL5y6c8BgK2NnPU6RULua96tuQ4=
=k6Wk
-----END PGP SIGNATURE-----
Merge tag 'net-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter and bpf.
Current release - regressions:
- sched: fix SKB_NOT_DROPPED_YET splat under debug config
Current release - new code bugs:
- tcp:
- fix usec timestamps with TCP fastopen
- fix possible out-of-bounds reads in tcp_hash_fail()
- fix SYN option room calculation for TCP-AO
- tcp_sigpool: fix some off by one bugs
- bpf: fix compilation error without CGROUPS
- ptp:
- ptp_read() should not release queue
- fix tsevqs corruption
Previous releases - regressions:
- llc: verify mac len before reading mac header
Previous releases - always broken:
- bpf:
- fix check_stack_write_fixed_off() to correctly spill imm
- fix precision tracking for BPF_ALU | BPF_TO_BE | BPF_END
- check map->usercnt after timer->timer is assigned
- dsa: lan9303: consequently nested-lock physical MDIO
- dccp/tcp: call security_inet_conn_request() after setting IP addr
- tg3: fix the TX ring stall due to incorrect full ring handling
- phylink: initialize carrier state at creation
- ice: fix direction of VF rules in switchdev mode
Misc:
- fill in a bunch of missing MODULE_DESCRIPTION()s, more to come"
* tag 'net-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (84 commits)
net: ti: icss-iep: fix setting counter value
ptp: fix corrupted list in ptp_open
ptp: ptp_read should not release queue
net_sched: sch_fq: better validate TCA_FQ_WEIGHTS and TCA_FQ_PRIOMAP
net: kcm: fill in MODULE_DESCRIPTION()
net/sched: act_ct: Always fill offloading tuple iifidx
netfilter: nat: fix ipv6 nat redirect with mapped and scoped addresses
netfilter: xt_recent: fix (increase) ipv6 literal buffer length
ipvs: add missing module descriptions
netfilter: nf_tables: remove catchall element in GC sync path
netfilter: add missing module descriptions
drivers/net/ppp: use standard array-copy-function
net: enetc: shorten enetc_setup_xdp_prog() error message to fit NETLINK_MAX_FMTMSG_LEN
virtio/vsock: Fix uninit-value in virtio_transport_recv_pkt()
r8169: respect userspace disabling IFF_MULTICAST
selftests/bpf: get trusted cgrp from bpf_iter__cgroup directly
bpf: Let verifier consider {task,cgroup} is trusted in bpf_iter_reg
net: phylink: initialize carrier state at creation
test/vsock: add dobule bind connect test
test/vsock: refactor vsock_accept
...
Here is the big set of staging driver updates for 6.7-rc1. A bit bigger
than 6.6 this time around, as it coincided with the Outreachy and
mentorship application process, so we got a bunch of new developers
sending in their first changes, which is nice to see.
Also in here is a removal of the qlge ethernet driver, and the rtl8192u
wireless driver. Both of these were very old and no one was maintaining
them, the wireless driver removal was due to no one using it anymore,
and no hardware to be found, and is part of a larger effort to remove
unused and old wifi drivers from the system.
The qlge ethernet driver did have one user pop up after it was dropped,
and we are working with the network mainainers to figure out what tree
it will come back in from and who will be responsible for it, and if it
really is being used or not. Odds are it will show up in a network
subsystem pull request after -rc1 is out, but we aren't sure yet.
Other smaller changes in here are:
- Lots of vc04_services work by Umang to clean up the mess created by
the rpi developers long ago, bringing it almost into good enough
shape to get out of staging, hopefully next major release, it's
getting close.
- rtl8192e variable cleanups and removal of unused code and structures
- vme_user coding style cleanups
- other small coding style cleanups to lots of the staging drivers
- octeon typedef removals, and then last-minute revert when it was
found to break the build in some configurations (it's a hard driver
to build properly, none of the normal automated testing catches it.)
All of these have been in linux-next for a while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZUTg9w8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ymL+QCg2YS9Mbr7W6zNAdRTbDybzp08ZzUAnRHQXUfB
WbZERHQpNOSTE3HwuW1D
=tZpl
-----END PGP SIGNATURE-----
Merge tag 'staging-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging driver updates from Greg KH:
"Here is the big set of staging driver updates for 6.7-rc1. A bit
bigger than 6.6 this time around, as it coincided with the Outreachy
and mentorship application process, so we got a bunch of new
developers sending in their first changes, which is nice to see.
Also in here is a removal of the qlge ethernet driver, and the
rtl8192u wireless driver. Both of these were very old and no one was
maintaining them, the wireless driver removal was due to no one using
it anymore, and no hardware to be found, and is part of a larger
effort to remove unused and old wifi drivers from the system.
The qlge ethernet driver did have one user pop up after it was
dropped, and we are working with the network mainainers to figure out
what tree it will come back in from and who will be responsible for
it, and if it really is being used or not. Odds are it will show up in
a network subsystem pull request after -rc1 is out, but we aren't sure
yet.
Other smaller changes in here are:
- Lots of vc04_services work by Umang to clean up the mess created by
the rpi developers long ago, bringing it almost into good enough
shape to get out of staging, hopefully next major release, it's
getting close.
- rtl8192e variable cleanups and removal of unused code and
structures
- vme_user coding style cleanups
- other small coding style cleanups to lots of the staging drivers
- octeon typedef removals, and then last-minute revert when it was
found to break the build in some configurations (it's a hard driver
to build properly, none of the normal automated testing catches
it.)
All of these have been in linux-next for a while with no reported
issues"
* tag 'staging-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (256 commits)
Revert "staging: octeon: remove typedef in enum cvmx_spi_mode_t"
Revert "staging: octeon: remove typedef in enum cvmx_helper_interface_mode_t"
Revert "staging: octeon: remove typedef in enum cvmx_pow_wait_t"
Revert "staging: octeon: remove typedef in struct cvmx_pko_lock_t"
Revert "staging: octeon: remove typedef in enum cvmx_pko_status_t"
Revert "staging: octeon: remove typedef in structs cvmx_pip_port_status_t and cvmx_pko_port_status_t"
staging: vt6655: Type encoding info dropped from variable name "byRxRate"
staging: vt6655: Type encoding info dropped from function name "CARDbUpdateTSF"
staging: vt6655: Type encoding info dropped from function name "CARDvSetRSPINF"
staging: vt6655: Type encoding info dropped from function name "CARDbyGetPktType"
staging: vt6655: Type encoding info dropped from variable name "byPacketType"
staging: vt6655: Type encoding info dropped from function name "CARDbSetPhyParameter"
staging: vt6655: Type encoding info dropped from variable name "pbyRsvTime"
staging: vt6655: Type encoding info dropped from variable name "pbyTxRate"
staging: vt6655: Type encoding info dropped from function name "s_vCalculateOFDMRParameter"
staging: vt6655: Type encoding info dropped from array name "cwRXBCNTSFOff"
staging: fbtft: Convert to platform remove callback returning void
staging: olpc_dcon: Remove I2C_CLASS_DDC support
staging: vc04_services: use snprintf instead of sprintf
staging: rtl8192e: Fix line break issue at priv->rx_buf[priv->rx_idx]
...
Since commit 833bac7ec3 ("net/smc: Fix setsockopt and sysctl to
specify same buffer size again") the SMC protocol uses its own
default values for the smc.rmem and smc.wmem sysctl variables
which are no longer derived from the TCP IPv4 buffer sizes.
Fixup the kernel documentation to reflect this change, too.
Fixes: 833bac7ec3 ("net/smc: Fix setsockopt and sysctl to specify same buffer size again")
Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Link: https://lore.kernel.org/r/20231030170343.748097-1-gbayer@linux.ibm.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The ia64 architecture gets its well-earned retirement as planned,
now that there is one last (mostly) working release that will
be maintained as an LTS kernel.
The architecture specific system call tables are updated for
the added map_shadow_stack() syscall and to remove references
to the long-gone sys_lookup_dcookie() syscall.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmVC40IACgkQYKtH/8kJ
Uidhmw/9EX+aWSXGoObJ3fngaNSMw+PmrEuP8qEKBHxfKHcCdX3hc451Oh4GlhaQ
tru91pPwgNvN2/rfoKusxT+V4PemGIzfNni/04rp+P0kvmdw5otQ2yNhsQNsfVmq
XGWvkxF4P2GO6bkjjfR/1dDq7GtlyXtwwPDKeLbYb6TnJOZjtx+EAN27kkfSn1Ms
R4Sa3zJ+DfHUmHL5S9g+7UD/CZ5GfKNmIskI4Mz5GsfoUz/0iiU+Bge/9sdcdSJQ
kmbLy5YnVzfooLZ3TQmBFsO3iAMWb0s/mDdtyhqhTVmTUshLolkPYyKnPFvdupyv
shXcpEST2XJNeaDRnL2K4zSCdxdbnCZHDpjfl9wfioBg7I8NfhXKpf1jYZHH1de4
LXq8ndEFEOVQw/zSpYWfQq1sux8Jiqr+UK/ukbVeFWiGGIUs91gEWtPAf8T0AZo9
ujkJvaWGl98O1g5wmBu0/dAR6QcFJMDfVwbmlIFpU8O+MEaz6X8mM+O5/T0IyTcD
eMbAUjj4uYcU7ihKzHEv/0SS9Of38kzff67CLN5k8wOP/9NlaGZ78o1bVle9b52A
BdhrsAefFiWHp1jT6Y9Rg4HOO/TguQ9e6EWSKOYFulsiLH9LEFaB9RwZLeLytV0W
vlAgY9rUW77g1OJcb7DoNv33nRFuxsKqsnz3DEIXtgozo9CzbYI=
=H1vH
-----END PGP SIGNATURE-----
Merge tag 'asm-generic-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull ia64 removal and asm-generic updates from Arnd Bergmann:
- The ia64 architecture gets its well-earned retirement as planned,
now that there is one last (mostly) working release that will be
maintained as an LTS kernel.
- The architecture specific system call tables are updated for the
added map_shadow_stack() syscall and to remove references to the
long-gone sys_lookup_dcookie() syscall.
* tag 'asm-generic-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
hexagon: Remove unusable symbols from the ptrace.h uapi
asm-generic: Fix spelling of architecture
arch: Reserve map_shadow_stack() syscall number for all architectures
syscalls: Cleanup references to sys_lookup_dcookie()
Documentation: Drop or replace remaining mentions of IA64
lib/raid6: Drop IA64 support
Documentation: Drop IA64 from feature descriptions
kernel: Drop IA64 support from sig_fault handlers
arch: Remove Itanium (IA-64) architecture
Aviator/Raytheon is an early PCMCIA driver, apparently predating 802.11b
and only supporting wireless extensions.
The driver has been orphaned since 2010 and only seen cosmetic updates
long before than. Jean Tourrilhes pointed out in a 2005 changelog that
he tested a change on actual hardware, which was apparently already
noteworthy back then.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
It has Frequently Asked Questions (FAQ) on RFC 5925 - I found it very
useful answering those before writing the actual code. It provides answers
to common questions that arise on a quick read of the RFC, as well as how
they were answered. There's also comparison to TCP-MD5 option,
evaluation of per-socket vs in-kernel-DB approaches and description of
uAPI provided.
Hopefully, it will be as useful for reviewing the code as it was for writing.
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Signed-off-by: Dmitry Safonov <dima@arista.com>
Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The MPTCP protocol allows sockets with no alive subflows to stay
in ESTABLISHED status for and user-defined timeout, to allow for
later subflows creation.
Currently such timeout is constant - TCP_TIMEWAIT_LEN. Let the
user-space configure them via a newly added sysctl, to better cope
with busy servers and simplify (make them faster) the relevant
pktdrill tests.
Note that the new know does not apply to orphaned MPTCP socket
waiting for the data_fin handshake completion: they always wait
TCP_TIMEWAIT_LEN.
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20231023-send-net-next-20231023-2-v1-1-9dc60939d371@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This preserves the existing IFLA_DSA_MASTER which is part of the uAPI
and creates an alias named IFLA_DSA_CONDUIT.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://lore.kernel.org/r/20231023181729.1191071-3-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use more inclusive terms throughout the DSA subsystem by moving away
from "master" which is replaced by "conduit" and "slave" which is
replaced by "user". No functional changes.
Acked-by: Rob Herring <robh@kernel.org>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://lore.kernel.org/r/20231023181729.1191071-2-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As more drivers begin to use the fragment API, update the
document about how to decide which API to use for the
driver author.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
CC: Lorenzo Bianconi <lorenzo@kernel.org>
CC: Alexander Duyck <alexander.duyck@gmail.com>
CC: Liang Chen <liangchen.linux@gmail.com>
CC: Alexander Lobakin <aleksander.lobakin@intel.com>
CC: Dima Tisnek <dimaqq@gmail.com>
Link: https://lore.kernel.org/r/20231020095952.11055-5-linyunsheng@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
No significant improvements have been done to this driver since commit
a7c3ddf29a ("staging: qlge: clean up debugging code in the QL_ALL_DUMP
ifdef land") in January 2021. The driver should not stay in staging
forever. Since it has been abandoned by the vendor and no one has stepped
up to maintain it, delete it.
If some users manifest themselves, the driver will be restored to
drivers/net/ as suggested in the linked message.
Link: https://lore.kernel.org/netdev/20231019074237.7ef255d7@kernel.org/
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Cc: Manish Chopra <manishc@marvell.com>
Cc: Coiby Xu <coiby.xu@gmail.com>
Signed-off-by: Benjamin Poirier <benjamin.poirier@gmail.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20231020124457.312449-3-benjamin.poirier@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Align devlink info versions with ice driver so change 'fw.mgmt'
version to be 2-digit version [major.minor], add 'fw.mgmt.build'
that reports mgmt firmware build number and use '"fw.psid.api'
for NVM format version instead of incorrect '"fw.psid'.
Additionally add missing i40e devlink documentation.
Fixes: 5a423552e0 ("i40e: Add handler for devlink .info_get")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20231018123558.552453-1-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There seems to be no docs for the concept of multiple RSS
contexts and how to configure it. I had to explain it three
times recently, the last one being the charm, document it.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://lore.kernel.org/r/20231018010758.2382742-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add a note describing the locking order of taking RTNL lock with devlink
instance lock.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a part talking about nested devlink instances describing
the helpers and locking ordering.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
1) Adham Faris, Increase max supported channels number to 256
2) Leon Romanovsky, Allow IPsec soft/hard limits in bytes
3) Shay Drory, Replace global mlx5_intf_lock with
HCA devcom component lock
4) Wei Zhang, Optimize SF creation flow
During SF creation, HCA state gets changed from INVALID to
IN_USE step by step. Accordingly, FW sends vhca event to
driver to inform about this state change asynchronously.
Each vhca event is critical because all related SW/FW
operations are triggered by it.
Currently there is only a single mlx5 general event handler
which not only handles vhca event but many other events.
This incurs huge bottleneck because all events are forced
to be handled in serial manner.
Moreover, all SFs share same table_lock which inevitably
impacts each other when they are created in parallel.
This series will solve this issue by:
1. A dedicated vhca event handler is introduced to eliminate
the mutual impact with other mlx5 events.
2. Max FW threads work queues are employed in the vhca event
handler to fully utilize FW capability.
3. Redesign SF active work logic to completely remove
table_lock.
With above optimization, SF creation time is reduced by 25%,
i.e. from 80s to 60s when creating 100 SFs.
Patches summary:
Patch 1 - implement dedicated vhca event handler with max FW
cmd threads of work queues.
Patch 2 - remove table_lock by redesigning SF active work
logic.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmUqzPEACgkQSD+KveBX
+j52rwf/c/LnKjPdcWztDIj4YQys9VfyWm9rkYmuiQ9XTYjzY7Y9RTW9XvrJSTdA
GsD8EmWnqysr0HyRqksuhR6QHydU+fOrOIYxA0OeIbqXXmhIoDqnqTPQftK1q8Cq
xZ4wDWowwoOGs5BXYsj6m+53ukyB91/dVHS8qiqL6SzDhm6pEEqjeoum7bxM1lKF
HxLHNLWp3wolLn361Qlvd7SyOcDu3/c7+DYUJ04Hc2TcJEc7G5ZkBtqMNUZ599Z8
06lXXj1FgIGtqzuddHLpS71gvp5yvtiw/2ujmIX7YuH8zHDqGRhKUYiJ1eRk80iw
aYPESXC0OsQDWgNFc13f5eTQgc2p/Q==
=Qaae
-----END PGP SIGNATURE-----
Merge tag 'mlx5-updates-2023-10-10' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2023-10-10
1) Adham Faris, Increase max supported channels number to 256
2) Leon Romanovsky, Allow IPsec soft/hard limits in bytes
3) Shay Drory, Replace global mlx5_intf_lock with
HCA devcom component lock
4) Wei Zhang, Optimize SF creation flow
During SF creation, HCA state gets changed from INVALID to
IN_USE step by step. Accordingly, FW sends vhca event to
driver to inform about this state change asynchronously.
Each vhca event is critical because all related SW/FW
operations are triggered by it.
Currently there is only a single mlx5 general event handler
which not only handles vhca event but many other events.
This incurs huge bottleneck because all events are forced
to be handled in serial manner.
Moreover, all SFs share same table_lock which inevitably
impacts each other when they are created in parallel.
This series will solve this issue by:
1. A dedicated vhca event handler is introduced to eliminate
the mutual impact with other mlx5 events.
2. Max FW threads work queues are employed in the vhca event
handler to fully utilize FW capability.
3. Redesign SF active work logic to completely remove
table_lock.
With above optimization, SF creation time is reduced by 25%,
i.e. from 80s to 60s when creating 100 SFs.
Patches summary:
Patch 1 - implement dedicated vhca event handler with max FW
cmd threads of work queues.
Patch 2 - remove table_lock by redesigning SF active work
logic.
* tag 'mlx5-updates-2023-10-10' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5e: Allow IPsec soft/hard limits in bytes
net/mlx5e: Increase max supported channels number to 256
net/mlx5e: Preparations for supporting larger number of channels
net/mlx5e: Refactor mlx5e_rss_init() and mlx5e_rss_free() API's
net/mlx5e: Refactor mlx5e_rss_set_rxfh() and mlx5e_rss_get_rxfh()
net/mlx5e: Refactor rx_res_init() and rx_res_free() APIs
net/mlx5e: Use PTR_ERR_OR_ZERO() to simplify code
net/mlx5: Use PTR_ERR_OR_ZERO() to simplify code
net/mlx5: fix config name in Kconfig parameter documentation
net/mlx5: Remove unused declaration
net/mlx5: Replace global mlx5_intf_lock with HCA devcom component lock
net/mlx5: Refactor LAG peer device lookout bus logic to mlx5 devcom
net/mlx5: Avoid false positive lockdep warning by adding lock_class_key
net/mlx5: Redesign SF active work to remove table_lock
net/mlx5: Parallelize vhca event handling
====================
Link: https://lore.kernel.org/r/20231014171908.290428-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The MAC .validate() method is no longer used, so remove it from the
phylink_mac_ops structure, and remove the callsite in
phylink_validate_mac_and_pcs().
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/E1qsPkF-009wij-QM@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Provide a new method, mac_get_caps() to get the MAC capabilities for
the specified interface mode. This is for MACs which have special
requirements, such as not supporting half-duplex in certain interface
modes, and will replace the validate() method.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/E1qsPk5-009wiX-G5@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZS1d4wAKCRDbK58LschI
g4DSAP441CdKh8fd+wNKUSKHFbpCQ6EvocR6Nf+Sj2DFUx/w/QEA7mfju7Abqjc3
xwDEx0BuhrjMrjV5MmEpxc7lYl9XcQU=
=vuWk
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2023-10-16
We've added 90 non-merge commits during the last 25 day(s) which contain
a total of 120 files changed, 3519 insertions(+), 895 deletions(-).
The main changes are:
1) Add missed stats for kprobes to retrieve the number of missed kprobe
executions and subsequent executions of BPF programs, from Jiri Olsa.
2) Add cgroup BPF sockaddr hooks for unix sockets. The use case is
for systemd to reimplement the LogNamespace feature which allows
running multiple instances of systemd-journald to process the logs
of different services, from Daan De Meyer.
3) Implement BPF CPUv4 support for s390x BPF JIT, from Ilya Leoshkevich.
4) Improve BPF verifier log output for scalar registers to better
disambiguate their internal state wrt defaults vs min/max values
matching, from Andrii Nakryiko.
5) Extend the BPF fib lookup helpers for IPv4/IPv6 to support retrieving
the source IP address with a new BPF_FIB_LOOKUP_SRC flag,
from Martynas Pumputis.
6) Add support for open-coded task_vma iterator to help with symbolization
for BPF-collected user stacks, from Dave Marchevsky.
7) Add libbpf getters for accessing individual BPF ring buffers which
is useful for polling them individually, for example, from Martin Kelly.
8) Extend AF_XDP selftests to validate the SHARED_UMEM feature,
from Tushar Vyavahare.
9) Improve BPF selftests cross-building support for riscv arch,
from Björn Töpel.
10) Add the ability to pin a BPF timer to the same calling CPU,
from David Vernet.
11) Fix libbpf's bpf_tracing.h macros for riscv to use the generic
implementation of PT_REGS_SYSCALL_REGS() to access syscall arguments,
from Alexandre Ghiti.
12) Extend libbpf to support symbol versioning for uprobes, from Hengqi Chen.
13) Fix bpftool's skeleton code generation to guarantee that ELF data
is 8 byte aligned, from Ian Rogers.
14) Inherit system-wide cpu_mitigations_off() setting for Spectre v1/v4
security mitigations in BPF verifier, from Yafang Shao.
15) Annotate struct bpf_stack_map with __counted_by attribute to prepare
BPF side for upcoming __counted_by compiler support, from Kees Cook.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (90 commits)
bpf: Ensure proper register state printing for cond jumps
bpf: Disambiguate SCALAR register state output in verifier logs
selftests/bpf: Make align selftests more robust
selftests/bpf: Improve missed_kprobe_recursion test robustness
selftests/bpf: Improve percpu_alloc test robustness
selftests/bpf: Add tests for open-coded task_vma iter
bpf: Introduce task_vma open-coded iterator kfuncs
selftests/bpf: Rename bpf_iter_task_vma.c to bpf_iter_task_vmas.c
bpf: Don't explicitly emit BTF for struct btf_iter_num
bpf: Change syscall_nr type to int in struct syscall_tp_t
net/bpf: Avoid unused "sin_addr_len" warning when CONFIG_CGROUP_BPF is not set
bpf: Avoid unnecessary audit log for CPU security mitigations
selftests/bpf: Add tests for cgroup unix socket address hooks
selftests/bpf: Make sure mount directory exists
documentation/bpf: Document cgroup unix socket address hooks
bpftool: Add support for cgroup unix socket address hooks
libbpf: Add support for cgroup unix socket address hooks
bpf: Implement cgroup sockaddr hooks for unix sockets
bpf: Add bpf_sock_addr_set_sun_path() to allow writing unix sockaddr from bpf
bpf: Propagate modified uaddrlen from cgroup sockaddr programs
...
====================
Link: https://lore.kernel.org/r/20231016204803.30153-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
TCP pingpong threshold is 1 by default. But some applications, like SQL DB
may prefer a higher pingpong threshold to activate delayed acks in quick
ack mode for better performance.
The pingpong threshold and related code were changed to 3 in the year
2019 in:
commit 4a41f453be ("tcp: change pingpong threshold to 3")
And reverted to 1 in the year 2022 in:
commit 4d8f24eeed ("Revert "tcp: change pingpong threshold to 3"")
There is no single value that fits all applications.
Add net.ipv4.tcp_pingpong_thresh sysctl tunable, so it can be tuned for
optimal performance based on the application needs.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/1697056244-21888-1-git-send-email-haiyangz@microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This adds description of MSG_ZEROCOPY flag support for AF_VSOCK type of
socket.
Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit a12ba19269 ("net/mlx5: Update Kconfig parameter documentation")
adds documentation on Kconfig options for the mlx5 driver. It refers to the
config MLX5_EN_MACSEC for MACSec offloading, but the config is actually
called MLX5_MACSEC.
Fix the reference to the right config name in the documentation.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
With the previous patches, there is no more limitation at modifying the
targets created at boot time (or module load time).
Document the way on how to create the configfs directories to be able to
modify these netconsole targets.
The design discussion about this topic could be found at:
https://lore.kernel.org/all/ZRWRal5bW93px4km@gmail.com/
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://lore.kernel.org/r/20231012111401.333798-5-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The COPS Appletalk support is very old, never said to actually work
properly, and the firmware code for the devices are under a very suspect
license. Remove it all to clear up the license issue, if it is still
needed and actually used by anyone, we can add it back later once the
license is cleared up.
Reported-by: Prarit Bhargava <prarit@redhat.com>
Cc: jschlst@samba.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Christoph Hellwig <hch@lst.de>
Acked-by: Prarit Bhargava <prarit@redhat.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20230927090029.44704-2-gregkh@linuxfoundation.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This change adds a sysctl to opt-out of RFC4862 section 5.5.3e's valid
lifetime derivation mechanism.
RFC4862 section 5.5.3e prescribes that the valid lifetime in a Router
Advertisement PIO shall be ignored if it less than 2 hours and to reset
the lifetime of the corresponding address to 2 hours. An in-progress
6man draft (see draft-ietf-6man-slaac-renum-07 section 4.2) is currently
looking to remove this mechanism. While this draft has not been moving
particularly quickly for other reasons, there is widespread consensus on
section 4.2 which updates RFC4862 section 5.5.3e.
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: Jen Linkova <furry@google.com>
Signed-off-by: Patrick Rohr <prohr@google.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230925214711.959704-1-prohr@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, skbs generated by pktgen always have their reference count
incremented before transmission, causing their reference count to be
always greater than 1, leading to two issues:
1. Only the code paths for shared skbs can be tested.
2. In certain situations, skbs can only be released by pktgen.
To enhance testing comprehensiveness, we are introducing the "SHARED"
flag to indicate whether an SKB is shared. This flag is enabled by
default, aligning with the current behavior. However, disabling this
flag allows skbs with a reference count of 1 to be transmitted.
So we can test non-shared skbs and code paths where skbs are released
within the stack.
Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
Reviewed-by: Benjamin Poirier <bpoirier@nvidia.com>
Link: https://lore.kernel.org/r/20230920125658.46978-2-liangchen.linux@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
As BPF JIT support for loongarch64 was added about one year ago
with commit 5dc615520c ("LoongArch: Add BPF JIT support"), it
is appropriate to add loongarch64 as arch supporting BPF JIT in
bpf and sysctl docs as well.
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Link: https://lore.kernel.org/r/1695111937-19697-1-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
http://linux-ax25.org has been down for nearly a year. Its official
replacement is https://linux-ax25.in-berlin.de.
Update the documentation to point there instead. And acknowledge that
while the linux-hams list isn't entirely dead, it isn't what most would
call 'active'. Remove that word.
Link: https://marc.info/?m=166792551600315
Signed-off-by: Peter Lafreniere <peter@n8pjl.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov says:
====================
The following pull-request contains BPF updates for your *net-next* tree.
We've added 73 non-merge commits during the last 9 day(s) which contain
a total of 79 files changed, 5275 insertions(+), 600 deletions(-).
The main changes are:
1) Basic BTF validation in libbpf, from Andrii Nakryiko.
2) bpf_assert(), bpf_throw(), exceptions in bpf progs, from Kumar Kartikeya Dwivedi.
3) next_thread cleanups, from Oleg Nesterov.
4) Add mcpu=v4 support to arm32, from Puranjay Mohan.
5) Add support for __percpu pointers in bpf progs, from Yonghong Song.
6) Fix bpf tailcall interaction with bpf trampoline, from Leon Hwang.
7) Raise irq_work in bpf_mem_alloc while irqs are disabled to improve refill probabablity, from Hou Tao.
Please consider pulling these changes from:
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git
Thanks a lot!
Also thanks to reporters, reviewers and testers of commits in this pull-request:
Alan Maguire, Andrey Konovalov, Dave Marchevsky, "Eric W. Biederman",
Jiri Olsa, Maciej Fijalkowski, Quentin Monnet, Russell King (Oracle),
Song Liu, Stanislav Fomichev, Yonghong Song
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add new xdp-rx-metadata-features member to netdev netlink
which exports a bitmask of supported kfuncs. Most of the patch
is autogenerated (headers), the only relevant part is netdev.yaml
and the changes in netdev-genl.c to marshal into netlink.
Example output on veth:
$ ip link add veth0 type veth peer name veth1 # ifndex == 12
$ ./tools/net/ynl/samples/netdev 12
Select ifc ($ifindex; or 0 = dump; or -2 ntf check): 12
veth1[12] xdp-features (23): basic redirect rx-sg xdp-rx-metadata-features (3): timestamp hash xdp-zc-max-segs=0
Cc: netdev@vger.kernel.org
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230913171350.369987-3-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Add support for SRIOV: send the requested number of VFs
to the device Control Plane, via the virtchnl message
and then enable the VFs using 'pci_enable_sriov'.
Add other ndo ops supported by the driver such as features_check,
set_rx_mode, validate_addr, set_mac_address, change_mtu, get_stats64,
set_features, and tx_timeout. Initialize the statistics task which
requests the queue related statistics to the CP. Add loopback
and promiscuous mode support and the respective virtchnl messages.
Finally, add documentation and build support for the driver.
Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
Co-developed-by: Alan Brady <alan.brady@intel.com>
Signed-off-by: Alan Brady <alan.brady@intel.com>
Co-developed-by: Madhu Chittim <madhu.chittim@intel.com>
Signed-off-by: Madhu Chittim <madhu.chittim@intel.com>
Co-developed-by: Phani Burra <phani.r.burra@intel.com>
Signed-off-by: Phani Burra <phani.r.burra@intel.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Co-developed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
This idea came after a particular workload requested
the quickack attribute set on routes, and a performance
drop was noticed for large bulk transfers.
For high throughput flows, it is best to use one cpu
running the user thread issuing socket system calls,
and a separate cpu to process incoming packets from BH context.
(With TSO/GRO, bottleneck is usually the 'user' cpu)
Problem is the user thread can spend a lot of time while holding
the socket lock, forcing BH handler to queue most of incoming
packets in the socket backlog.
Whenever the user thread releases the socket lock, it must first
process all accumulated packets in the backlog, potentially
adding latency spikes. Due to flood mitigation, having too many
packets in the backlog increases chance of unexpected drops.
Backlog processing unfortunately shifts a fair amount of cpu cycles
from the BH cpu to the 'user' cpu, thus reducing max throughput.
This patch takes advantage of the backlog processing,
and the fact that ACK are mostly cumulative.
The idea is to detect we are in the backlog processing
and defer all eligible ACK into a single one,
sent from tcp_release_cb().
This saves cpu cycles on both sides, and network resources.
Performance of a single TCP flow on a 200Gbit NIC:
- Throughput is increased by 20% (100Gbit -> 120Gbit).
- Number of generated ACK per second shrinks from 240,000 to 40,000.
- Number of backlog drops per second shrinks from 230 to 0.
Benchmark context:
- Regular netperf TCP_STREAM (no zerocopy)
- Intel(R) Xeon(R) Platinum 8481C (Saphire Rapids)
- MAX_SKB_FRAGS = 17 (~60KB per GRO packet)
This feature is guarded by a new sysctl, and enabled by default:
/proc/sys/net/ipv4/tcp_backlog_ack_defer
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
- VFIO direct character device (cdev) interface support. This extracts
the vfio device fd from the container and group model, and is intended
to be the native uAPI for use with IOMMUFD. (Yi Liu)
- Enhancements to the PCI hot reset interface in support of cdev usage.
(Yi Liu)
- Fix a potential race between registering and unregistering vfio files
in the kvm-vfio interface and extend use of a lock to avoid extra
drop and acquires. (Dmitry Torokhov)
- A new vfio-pci variant driver for the AMD/Pensando Distributed Services
Card (PDS) Ethernet device, supporting live migration. (Brett Creeley)
- Cleanups to remove redundant owner setup in cdx and fsl bus drivers,
and simplify driver init/exit in fsl code. (Li Zetao)
- Fix uninitialized hole in data structure and pad capability structures
for alignment. (Stefan Hajnoczi)
-----BEGIN PGP SIGNATURE-----
iQJPBAABCAA5FiEEQvbATlQL0amee4qQI5ubbjuwiyIFAmTvnDUbHGFsZXgud2ls
bGlhbXNvbkByZWRoYXQuY29tAAoJECObm247sIsimEEP/AzG+VRcu5LfYbLGLe0z
zB8ts6G7S78wXlmfN/LYi3v92XWvMMcm+vYF8oNAMfr1YL5sibWN6UtQfY1KCr7h
nWKdQdqjajJ5yDDZnOFdhqHJGNfmZw6+fey8Z0j8zRI2oymK4DncWWX3g/7L1SNr
9tIexGJef+mOdAmC94yOut3YviAaZ+f95T/xrdXHzzoNr50DD0+PD6AJdKJfKggP
vhiC/DAYH3Fofaa6tRasgWuKCYWdjZLR/kxgNpeEmW6kZnbq/dnzZ+kgn4HH1f9G
8p7UKVARR6FfG5aLheWu6Y9PDaKnfnqu8y/hobuE/ivXcmqqK+a6xSxrjgbVs8WJ
94SYnTBRoTlDJaKWa7GxqdgzJnV+s5ZyAgPhjzdi6mLTPWGzkuLhFWGtYL+LZAQ6
pNeZSM6CFBk+bva/xT0nNPCXxPh+/j/Y0G18FREj8aPFc03HrJQqz0RLydvTnoDz
nX/by5KdzMSVSVLPr4uDMtAsgxsGqWiFcp7QMw1HhhlLWxqmYbA+mLZaqyMZUUOx
6b/P8WXT9P2I+qPVKWQ5CWyqpsEqm6P+72yg6LOM9kINvgwDhOa7cagMXIuMWYMH
Rf97FL+K8p1eIy6AnvRHgFBMM5185uG+0YcJyVqtucDr/k8T/Om6ujAI6JbWtNe6
cLgaVAqKOYqCR4HC9bfVGSbd
=eKSR
-----END PGP SIGNATURE-----
Merge tag 'vfio-v6.6-rc1' of https://github.com/awilliam/linux-vfio
Pull VFIO updates from Alex Williamson:
- VFIO direct character device (cdev) interface support. This extracts
the vfio device fd from the container and group model, and is
intended to be the native uAPI for use with IOMMUFD (Yi Liu)
- Enhancements to the PCI hot reset interface in support of cdev usage
(Yi Liu)
- Fix a potential race between registering and unregistering vfio files
in the kvm-vfio interface and extend use of a lock to avoid extra
drop and acquires (Dmitry Torokhov)
- A new vfio-pci variant driver for the AMD/Pensando Distributed
Services Card (PDS) Ethernet device, supporting live migration (Brett
Creeley)
- Cleanups to remove redundant owner setup in cdx and fsl bus drivers,
and simplify driver init/exit in fsl code (Li Zetao)
- Fix uninitialized hole in data structure and pad capability
structures for alignment (Stefan Hajnoczi)
* tag 'vfio-v6.6-rc1' of https://github.com/awilliam/linux-vfio: (53 commits)
vfio/pds: Send type for SUSPEND_STATUS command
vfio/pds: fix return value in pds_vfio_get_lm_file()
pds_core: Fix function header descriptions
vfio: align capability structures
vfio/type1: fix cap_migration information leak
vfio/fsl-mc: Use module_fsl_mc_driver macro to simplify the code
vfio/cdx: Remove redundant initialization owner in vfio_cdx_driver
vfio/pds: Add Kconfig and documentation
vfio/pds: Add support for firmware recovery
vfio/pds: Add support for dirty page tracking
vfio/pds: Add VFIO live migration support
vfio/pds: register with the pds_core PF
pds_core: Require callers of register/unregister to pass PF drvdata
vfio/pds: Initial support for pds VFIO driver
vfio: Commonize combine_ranges for use in other VFIO drivers
kvm/vfio: avoid bouncing the mutex when adding and deleting groups
kvm/vfio: ensure kvg instance stays around in kvm_vfio_group_add()
docs: vfio: Add vfio device cdev description
vfio: Compile vfio_group infrastructure optionally
vfio: Move the IOMMU_CAP_CACHE_COHERENCY check in __vfio_register_dev()
...
- Work from Carlos Bilbao to integrate rustdoc output into the generated
HTML documentation. This took some work to figure out how to do it
without slowing the docs build and without creating people who don't have
Rust installed, but Carlos got there.
- Move the loongarch and mips architecture documentation under
Documentation/arch/.
- Some more maintainer documentation from Jakub
...plus the usual assortment of updates, translations, and fixes.
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmTvqNkPHGNvcmJldEBs
d24ubmV0AAoJEBdDWhNsDH5YgIgH/3drfLtlFtzLqDOzrzDXS8yGnE3pPdxw796b
/ZFzAK16wYKaKevYoIz8bVGGKaE1sEUW0mhlq4KGdfZuxLG8YnWS8URyCW4FDU2E
6qNL+8oJ8LZfID46f9Q8ZgfEz7yF/mhCqPk7MEswYtwbscs2ZTGCTGYB/5BHlBuT
LR+M89uLmHgr8S1o24v30OgiX+VvQFyu0xoxIhbiqUZvBd/XdfX2pgYd9BGzMj5q
C2ZP+V14g36c5pV0EO9TwhCXOF/WVrp7DbjbfWAsqBSLxvpXPydH2q1DUzGeQtP1
exujrBD1O8q3pPdaNA5R+h6cWlHmUZug9mE4BRLp9ErGrozwJsQ=
=C3Uv
-----END PGP SIGNATURE-----
Merge tag 'docs-6.6' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
"Documentation work keeps chugging along; this includes:
- Work from Carlos Bilbao to integrate rustdoc output into the
generated HTML documentation. This took some work to figure out how
to do it without slowing the docs build and without creating people
who don't have Rust installed, but Carlos got there
- Move the loongarch and mips architecture documentation under
Documentation/arch/
- Some more maintainer documentation from Jakub
... plus the usual assortment of updates, translations, and fixes"
* tag 'docs-6.6' of git://git.lwn.net/linux: (56 commits)
Docu: genericirq.rst: fix irq-example
input: docs: pxrc: remove reference to phoenix-sim
Documentation: serial-console: Fix literal block marker
docs/mm: remove references to hmm_mirror ops and clean typos
docs/zh_CN: correct regi_chg(),regi_add() to region_chg(),region_add()
Documentation: Fix typos
Documentation/ABI: Fix typos
scripts: kernel-doc: fix macro handling in enums
scripts: kernel-doc: parse DEFINE_DMA_UNMAP_[ADDR|LEN]
Documentation: riscv: Update boot image header since EFI stub is supported
Documentation: riscv: Add early boot document
Documentation: arm: Add bootargs to the table of added DT parameters
docs: kernel-parameters: Refer to the correct bitmap function
doc: update params of memhp_default_state=
docs: Add book to process/kernel-docs.rst
docs: sparse: fix invalid link addresses
docs: vfs: clean up after the iterate() removal
docs: Add a section on surveys to the researcher guidelines
docs: move mips under arch
docs: move loongarch under arch
...
Implement devlink port function commands to enable / disable IPsec
packet offloads. This is used to control the IPsec capability of the
device.
When ipsec_offload is enabled for a VF, it prevents adding IPsec packet
offloads on the PF, because the two cannot be active simultaneously due
to HW constraints. Conversely, if there are any active IPsec packet
offloads on the PF, it's not allowed to enable ipsec_packet on a VF,
until PF IPsec offloads are cleared.
Signed-off-by: Dima Chumak <dchumak@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20230825062836.103744-9-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Implement devlink port function commands to enable / disable IPsec
crypto offloads. This is used to control the IPsec capability of the
device.
When ipsec_crypto is enabled for a VF, it prevents adding IPsec crypto
offloads on the PF, because the two cannot be active simultaneously due
to HW constraints. Conversely, if there are any active IPsec crypto
offloads on the PF, it's not allowed to enable ipsec_crypto on a VF,
until PF IPsec offloads are cleared.
Signed-off-by: Dima Chumak <dchumak@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20230825062836.103744-8-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Expose port function commands to enable / disable IPsec packet offloads,
this is used to control the port IPsec capabilities.
When IPsec packet is disabled for a function of the port (default),
function cannot offload IPsec packet operations (encapsulation and XFRM
policy offload). When enabled, IPsec packet operations can be offloaded
by the function of the port, which includes crypto operation
(Encrypt/Decrypt), IPsec encapsulation and XFRM state and policy
offload.
Example of a PCI VF port which supports IPsec packet offloads:
$ devlink port show pci/0000:06:00.0/1
pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0
function:
hw_addr 00:00:00:00:00:00 roce enable ipsec_packet disable
$ devlink port function set pci/0000:06:00.0/1 ipsec_packet enable
$ devlink port show pci/0000:06:00.0/1
pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0
function:
hw_addr 00:00:00:00:00:00 roce enable ipsec_packet enable
Signed-off-by: Dima Chumak <dchumak@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20230825062836.103744-3-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Expose port function commands to enable / disable IPsec crypto offloads,
this is used to control the port IPsec capabilities.
When IPsec crypto is disabled for a function of the port (default),
function cannot offload any IPsec crypto operations (Encrypt/Decrypt and
XFRM state offloading). When enabled, IPsec crypto operations can be
offloaded by the function of the port.
Example of a PCI VF port which supports IPsec crypto offloads:
$ devlink port show pci/0000:06:00.0/1
pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0
function:
hw_addr 00:00:00:00:00:00 roce enable ipsec_crypto disable
$ devlink port function set pci/0000:06:00.0/1 ipsec_crypto enable
$ devlink port show pci/0000:06:00.0/1
pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0
function:
hw_addr 00:00:00:00:00:00 roce enable ipsec_crypto enable
Signed-off-by: Dima Chumak <dchumak@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20230825062836.103744-2-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch adds a new sysctl, named scheduler, to support for selection
of different schedulers. Export mptcp_get_scheduler helper to get this
sysctl.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-4-0c860fb256a8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Point to NVIDIA documentation for device specific information now that the
Mellanox documentation site is deprecated. Refer to kernel documentation
sources for generic information not specific to mlx5 devices.
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Improve aRFS observability by adding new set of counters. Each Rx
ring will have this set of counters listed below.
These counters are exposed through ethtool -S.
1) arfs_add: number of times a new rule has been created.
2) arfs_request_in: number of times a rule was requested to move from
its current Rx ring to a new Rx ring (incremented on the destination
Rx ring).
3) arfs_request_out: number of times a rule was requested to move out
from its current Rx ring (incremented on source/current Rx ring).
4) arfs_expired: number of times a rule has been expired by the
kernel and removed from HW.
5) arfs_err: number of times a rule creation or modification has
failed.
This patch removes rx[i]_xsk_arfs_err counter and its documentation in
mlx5/counters.rst since aRFS activity does not occur in XSK RQ's.
Signed-off-by: Adham Faris <afaris@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Cross-merge networking fixes after downstream PR.
Conflicts:
drivers/net/ethernet/sfc/tc.c
fa165e1949 ("sfc: don't unregister flow_indr if it was never registered")
3bf969e88a ("sfc: add MAE table machinery for conntrack table")
https://lore.kernel.org/all/20230818112159.7430e9b4@canb.auug.org.au/
No adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add Kconfig entries and pds-vfio-pci.rst. Also, add an entry in the
MAINTAINERS file for this new driver.
It's not clear where documentation for vendor specific VFIO
drivers should live, so just re-use the current amd
ethernet location.
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20230807205755.29579-9-brett.creeley@amd.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
In SCTP protocol, it is using the same timer (T2 timer) for SHUTDOWN and
SHUTDOWN_ACK retransmission. However in sctp conntrack the default timeout
value for SCTP_CONNTRACK_SHUTDOWN_ACK_SENT state is 3 secs while it's 300
msecs for SCTP_CONNTRACK_SHUTDOWN_SEND/RECV state.
As Paolo Valerio noticed, this might cause unwanted expiration of the ct
entry. In my test, with 1s tc netem delay set on the NAT path, after the
SHUTDOWN is sent, the sctp ct entry enters SCTP_CONNTRACK_SHUTDOWN_SEND
state. However, due to 300ms (too short) delay, when the SHUTDOWN_ACK is
sent back from the peer, the sctp ct entry has expired and been deleted,
and then the SHUTDOWN_ACK has to be dropped.
Also, it is confusing these two sysctl options always show 0 due to all
timeout values using sec as unit:
net.netfilter.nf_conntrack_sctp_timeout_shutdown_recd = 0
net.netfilter.nf_conntrack_sctp_timeout_shutdown_sent = 0
This patch fixes it by also using 3 secs for sctp shutdown send and recv
state in sctp conntrack, which is also RTO.initial value in SCTP protocol.
Note that the very short time value for SCTP_CONNTRACK_SHUTDOWN_SEND/RECV
was probably used for a rare scenario where SHUTDOWN is sent on 1st path
but SHUTDOWN_ACK is replied on 2nd path, then a new connection started
immediately on 1st path. So this patch also moves from SHUTDOWN_SEND/RECV
to CLOSE when receiving INIT in the ORIGINAL direction.
Fixes: 9fb9cbb108 ("[NETFILTER]: Add nf_conntrack subsystem.")
Reported-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
A new check for the tx devlink health reporter is introduced for
determining when the PTP port timestamping SQ is considered unhealthy. If
there are enough CQEs considered never to be delivered, the space that can
be utilized on the SQ decreases significantly, impacting performance and
usability of the SQ. The health reporter is triggered when the number of
likely never delivered port timestamping CQEs that utilize the space of the
PTP SQ is greater than 93.75% of the total capacity of the SQ. A devlink
health reporter recover method is also provided for this specific TX error
context that restarts the PTP SQ.
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>