linux-loongson

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson synced 2025-08-28 00:19:36 +00:00

Author	SHA1	Message	Date
Kuniyuki Iwashima	a3c4a125ec	netlink: Fix rmem check in netlink_broadcast_deliver(). We need to allow queuing at least one skb even when skb is larger than sk->sk_rcvbuf. The cited commit made a mistake while converting a condition in netlink_broadcast_deliver(). Let's correct the rmem check for the allow-one-skb rule. Fixes: `ae8f160e7e` ("netlink: Fix wraparounds of sk->sk_rmem_alloc.") Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250711053208.2965945-1-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-11 07:31:41 -07:00
Jakub Kicinski	0f26870a98	netfilter pull request 25-07-10 -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEjF9xRqF1emXiQiqU1w0aZmrPKyEFAmhvEc8ACgkQ1w0aZmrP KyGYmBAAxw/EnlVp9sHWAPBNka6KOQECAb1LrnBVjv5nFZK3jHy8ehb/XjW0Q/rR ScjTgGJLBVuXMSGfFYBsDezesLYZLs/QK3amsTP9+HtYYGdOxdgvmTPMrUe2mF4W He/xrvWpVKTV00vnpVvZe5zbWnpw9iDTXbDecxmArqNVu78LOsRFU20S2MPc86ne wbgJohCS/QvhHPsKBUd83niGtKXR8uaFitLDaCgaaqqF4noXfNxIpdxm2O3NIqu1 2gbEAp3qi1YWHUYHQUxsK7W7JjY8xfd9EFLViRdQCI0aevE01GmwJc4J2oDlZ7Ki DqRFEYzpwpxqe0YwihAsB72faXMOsWFODwIiojlAue26W5iVLzu+w7hBUiOBiq6E CEBH2OSL47n3s/Va6e1Gb9W2A24x+Qp+EZEZb0Ci0I0MlstOw0aUutoc/P6F9MxI v9ApavHkt5/FImeXiTB3j61ZodgduC5WLZ7HCM5EjrCjlZznsJFuxtWSYARyYTUM aj3U00KRW85WgeuvWj1aTGTiT7dmTsQVo7XllEpzfXWvVxE0ATwfiLRfUQwtUwGx 6SecA49s34AwgdUKMynfPtlkxIzraGjhzJmGdTCA1Hi1fw2TUYniASmHRDKVQvUA 1jYCQucYf+Q1zybosscXepUCtsVSPTZTZE/gQSDCUy6WNvHnv84= =zECx -----END PGP SIGNATURE----- Merge tag 'nf-next-25-07-10' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next (v2) The following series contains an initial small batch of Netfilter updates for net-next: 1) Remove DCCP conntrack support, keep DCCP matches around in order to avoid breakage when loading ruleset, add Kconfig to wrap the code so it can be disabled by distributors. 2) Remove buggy code aiming at shrinking netlink deletion event, then re-add it correctly in another patch. This is to prevent -stable to pick up on a fix that breaks old userspace. From Phil Sutter. 3) Missing WARN_ON_ONCE() to check for lockdep_commit_lock_is_held() to uncover bugs. From Fedor Pchelkin. * tag 'nf-next-25-07-10' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: nf_tables: adjust lockdep assertions handling netfilter: nf_tables: Reintroduce shortened deletion notifications netfilter: nf_tables: Drop dead code from fill_*_info routines netfilter: conntrack: remove DCCP protocol support ==================== Link: https://patch.msgid.link/20250710010706.2861281-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-10 18:18:41 -07:00
Guillaume Nault	4e914ef063	gre: Fix IPv6 multicast route creation. Use addrconf_add_dev() instead of ipv6_find_idev() in addrconf_gre_config() so that we don't just get the inet6_dev, but also install the default ff00::/8 multicast route. Before commit `3e6a0243ff` ("gre: Fix again IPv6 link-local address generation."), the multicast route was created at the end of the function by addrconf_add_mroute(). But this code path is now only taken in one particular case (gre devices not bound to a local IP address and in EUI64 mode). For all other cases, the function exits early and addrconf_add_mroute() is not called anymore. Using addrconf_add_dev() instead of ipv6_find_idev() in addrconf_gre_config(), fixes the problem as it will create the default multicast route for all gre devices. This also brings addrconf_gre_config() a bit closer to the normal netdevice IPv6 configuration code (addrconf_dev_config()). Cc: stable@vger.kernel.org Fixes: `3e6a0243ff` ("gre: Fix again IPv6 link-local address generation.") Reported-by: Aiden Yang <ling@moedove.com> Closes: https://lore.kernel.org/netdev/CANR=AhRM7YHHXVxJ4DmrTNMeuEOY87K2mLmo9KMed1JMr20p6g@mail.gmail.com/ Reviewed-by: Gary Guo <gary@garyguo.net> Tested-by: Gary Guo <gary@garyguo.net> Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/027a923dcb550ad115e6d93ee8bb7d310378bd01.1752070620.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-10 18:10:47 -07:00
Kito Xu	711c80f7d8	net: appletalk: Fix device refcount leak in atrtr_create() When updating an existing route entry in atrtr_create(), the old device reference was not being released before assigning the new device, leading to a device refcount leak. Fix this by calling dev_put() to release the old device reference before holding the new one. Fixes: `c7f905f0f6` ("[ATALK]: Add missing dev_hold() to atrtr_create().") Signed-off-by: Kito Xu <veritas501@foxmail.com> Link: https://patch.msgid.link/tencent_E1A26771CDAB389A0396D1681A90A49E5D09@qq.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-10 18:01:08 -07:00
Jakub Kicinski	178331743c	ethtool: rss: report which fields are configured for hashing Implement ETHTOOL_GRXFH over Netlink. The number of flow types is reasonable (around 20) so report all of them at once for simplicity. Do not maintain the flow ID mapping with ioctl at the uAPI level. This gives us a chance to clean up the confusion that come from RxNFC vs RxFH (flow direction vs hashing) in the ioctl. Try to align with the names used in ethtool CLI, they seem to have stood the test of time just fine. One annoyance is that we still call L4 ports the weird names, but I guess they also apply to IPSec (where they cover the SPI) so it is what it is. $ ynl --family ethtool --dump rss-get { "header": { "dev-index": 1, "dev-name": "enp1s0" }, "hfunc": 1, "hkey": b"...", "indir": [0, 1, ...], "flow-hash": { "ether": {"l2da"}, "ah-esp4": {"ip-src", "ip-dst"}, "ah-esp6": {"ip-src", "ip-dst"}, "ah4": {"ip-src", "ip-dst"}, "ah6": {"ip-src", "ip-dst"}, "esp4": {"ip-src", "ip-dst"}, "esp6": {"ip-src", "ip-dst"}, "ip4": {"ip-src", "ip-dst"}, "ip6": {"ip-src", "ip-dst"}, "sctp4": {"ip-src", "ip-dst"}, "sctp6": {"ip-src", "ip-dst"}, "udp4": {"ip-src", "ip-dst"}, "udp6": {"ip-src", "ip-dst"} "tcp4": {"l4-b-0-1", "l4-b-2-3", "ip-src", "ip-dst"}, "tcp6": {"l4-b-0-1", "l4-b-2-3", "ip-src", "ip-dst"}, }, } Link: https://patch.msgid.link/20250708220640.2738464-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-10 17:57:49 -07:00
Jakub Kicinski	d7974697de	ethtool: mark ETHER_FLOW as usable for Rx hash Looks like some drivers (ena, enetc, fbnic.. there's probably more) consider ETHER_FLOW to be legitimate target for flow hashing. I'm not sure how intentional that is from the uAPI perspective vs just an effect of ethtool IOCTL doing minimal input validation. But Netlink will do strict validation, so we need to decide whether we allow this use case or not. I don't see a strong reason against it, and rejecting it would potentially regress a number of drivers. So update the comments and flow_type_hashable(). Link: https://patch.msgid.link/20250708220640.2738464-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-10 17:57:49 -07:00
Jakub Kicinski	400244eaa2	ethtool: rss: make sure dump takes the rss lock After commit `040cef30b5` ("net: ethtool: move get_rxfh callback under the rss_lock") we're expected to take rss_lock around get. Switch dump to using the new prep helper and move the locking into it. Link: https://patch.msgid.link/20250708220640.2738464-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-10 17:57:49 -07:00
Jakub Kicinski	809f683324	Quite a bit more work, notably: - mt76: firmware recovery improvements, MLO work - iwlwifi: use embedded PNVM in (to be released) FW images to fix compatibility issues - cfg80211/mac80211: extended regulatory info support (6 GHz) - cfg80211: use "faux device" for regulatory -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAmhvskoACgkQ10qiO8sP aACdthAAiN1wzi9Hrpm2ULqDecEL553ecvfOPAzlYibF4wiulVEdKJd7nFn03udA n2ylEgiANbTb+yEgDjj6IwQ9w5XmYvlLSGO/wIag3UUSWUYuCC5if/JEq56lom+B IZFMOFXpvO35jSe1h1v0HuuJvbeZv7KDWMJlA4aSXbLx4Y1juAuQid4/YllUkuXt QgAziAhF7laNk+8nfQLQ3N1DQytftiDK32vCJ6kJ7ciEhh8qxwT5aVkmE0Q/iOLg na4TdrcRRMQ7kkgODqksGz1nVrr/0AHyMVi/rQ/YaL1uY8fBxvbs0nmjU8G78LkO gM/401kySEzgLl6x/xrnaVUupbAtsyYrHYH+4q6Z07UPg1OgYY4G71rJIMp/Hq52 /iQVMOFmbhoIqFLqlvmh92QqZRH+rMf/KnO9Pnh/wVdxGH74ZP0bFYB02bqdaRie uxtO9lHfnuMedET85D747rmgAqrBJ2t3BqAD++LNdqF530eOaWiBkAPeAbTRSgis d1BJoDtWL3l0tjAA1ivdUw/fPjqWxffpdDTtzSwRjuFUNE5YPpz3VuKNSurSgvMZ DCSvxTHf5DLVrg6b4W/YXickD2kArJLpaCwkxzkpyBzh6wyRvxs+b/oZSQRVeKvL C36I3WNDqARfC29ilYUiz/G8kJry8IQaSuLaMP6X92Fa6Hdl7Kg= =qkYl -----END PGP SIGNATURE----- Merge tag 'wireless-next-2025-07-10' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Johannes Berg says: ==================== Quite a bit more work, notably: - mt76: firmware recovery improvements, MLO work - iwlwifi: use embedded PNVM in (to be released) FW images to fix compatibility issues - cfg80211/mac80211: extended regulatory info support (6 GHz) - cfg80211: use "faux device" for regulatory * tag 'wireless-next-2025-07-10' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (48 commits) wifi: mac80211: don't complete management TX on SAE commit wifi: cfg80211/mac80211: implement dot11ExtendedRegInfoSupport wifi: mac80211: send extended MLD capa/ops if AP has it wifi: mac80211: copy first_part into HW scan wifi: cfg80211: add a flag for the first part of a scan wifi: mac80211: remove DISALLOW_PUNCTURING_5GHZ code wifi: cfg80211: only verify part of Extended MLD Capabilities wifi: nl80211: make nl80211_check_scan_flags() type safe wifi: cfg80211: hide scan internals wifi: mac80211: fix deactivated link CSA wifi: mac80211: add mandatory bitrate support for 6 GHz wifi: mac80211: remove spurious blank line wifi: mac80211: verify state before connection wifi: mac80211: avoid weird state in error path wifi: iwlwifi: mvm: remove support for iwl_wowlan_info_notif_v4 wifi: iwlwifi: bump minimum API version in BZ wifi: iwlwifi: mvm: remove unneeded argument wifi: iwlwifi: mvm: remove MLO GTK rekey code wifi: iwlwifi: pcie: rename iwl_pci_gen1_2_probe() argument wifi: iwlwifi: match discrete/integrated to fix some names ... ==================== Link: https://patch.msgid.link/20250710123113.24878-3-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-10 17:24:22 -07:00
Jakub Kicinski	7ac5cc2616	Quite a number of fixes still: - mt76 (hadn't sent any fixes so far) - RCU - scanning - decapsulation offload - interface combinations - rt2x00: build fix (bad function pointer prototype) - cfg80211: prevent A-MSDU flipping attacks in mesh - zd1211rw: prevent race ending with NULL ptr deref - cfg80211/mac80211: more S1G fixes - mwifiex: avoid WARN on certain RX frames - mac80211: - avoid stack data leak in WARN cases - fix non-transmitted BSSID search (on certain multi-BSSID APs) - always initialize key list so driver iteration won't crash - fix monitor interface in device restart - fix __free() annotation usage -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAmhvr/IACgkQ10qiO8sP aAB6axAAkDobxlyAB2SwqM9Es5JQcK/iAmQg3mdAsYxFGMRMz5nzzBszCfqAoAX/ 2PQvODLfd14Bbfc5svWtWmUz/Fie0Agk6GavOr9zMtIPLJL6/Q7lInjhbZ4zCNiD zxM2HjUzZMkomKHBfniiUPLd9WBQwrBKjvV5ub/f+w5ExCV+xILoP5+Mm42cPTCU in96FKqe2j0TSXrrPBnmKwMlS93s2NRmzKdyO5U94q8kCZXbQ93mosLKqjH8dpW5 UGvcu1tY0gWqLR434UcBeCKEQhWsrgVq6YgmuBya2OlnQ0Z29lVsK6Jjf3bYuJJK 6+zJS0vp6yRYUehVukhqyosz4WtHoMbNZz6JMF+glzJTm2pheke3XByHWkNVvbhL d6kTsMmtYkQHSFe9d3y54HO2eg910MnRSQwaxOA0id0FbcZ+57l7VpuNK6/Y8SF6 OvhPt6Rsm0zwtd4rCrdcnMJwxYLFMzdlFw3rkXgAoHrU5yXxlB7mG+Nbh6rAn5t9 VcT1iXqPZqsevgoGiWa0/VRd/U5sL/pXoV/7zigvOQZ78v6q2GJ5LD0Uwyx+0kMm T+cIPxjMb9kGHfvKRQ1aGCUm97415CdMNBKFErkQIXYxAykstN0RXZ8Ad5ZB9ZUg zL4TCThsWpSv0mJfVQD/KbgsBRMunnZNXiabJ40XIIFksLJHpO4= =Gldt -----END PGP SIGNATURE----- Merge tag 'wireless-2025-07-10' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== Quite a number of fixes still: - mt76 (hadn't sent any fixes so far) - RCU - scanning - decapsulation offload - interface combinations - rt2x00: build fix (bad function pointer prototype) - cfg80211: prevent A-MSDU flipping attacks in mesh - zd1211rw: prevent race ending with NULL ptr deref - cfg80211/mac80211: more S1G fixes - mwifiex: avoid WARN on certain RX frames - mac80211: - avoid stack data leak in WARN cases - fix non-transmitted BSSID search (on certain multi-BSSID APs) - always initialize key list so driver iteration won't crash - fix monitor interface in device restart - fix __free() annotation usage * tag 'wireless-2025-07-10' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: (26 commits) wifi: mac80211: add the virtual monitor after reconfig complete wifi: mac80211: always initialize sdata::key_list wifi: mac80211: Fix uninitialized variable with __free() in ieee80211_ml_epcs() wifi: mt76: mt792x: Limit the concurrent STA and SoftAP to operate on the same channel wifi: mt76: mt7925: Fix null-ptr-deref in mt7925_thermal_init() wifi: mt76: fix queue assignment for deauth packets wifi: mt76: add a wrapper for wcid access with validation wifi: mt76: mt7921: prevent decap offload config before STA initialization wifi: mt76: mt7925: prevent NULL pointer dereference in mt7925_sta_set_decap_offload() wifi: mt76: mt7925: fix incorrect scan probe IE handling for hw_scan wifi: mt76: mt7925: fix invalid array index in ssid assignment during hw scan wifi: mt76: mt7925: fix the wrong config for tx interrupt wifi: mt76: Remove RCU section in mt7996_mac_sta_rc_work() wifi: mt76: Move RCU section in mt7996_mcu_add_rate_ctrl() wifi: mt76: Move RCU section in mt7996_mcu_add_rate_ctrl_fixed() wifi: mt76: Move RCU section in mt7996_mcu_set_fixed_field() wifi: mt76: Assume __mt76_connac_mcu_alloc_sta_req runs in atomic context wifi: prevent A-MSDU attacks in mesh networks wifi: rt2x00: fix remove callback type mismatch wifi: mac80211: reject VHT opmode for unsupported channel widths ... ==================== Link: https://patch.msgid.link/20250710122212.24272-3-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-10 17:13:46 -07:00
Wang Liang	96698d1898	net: replace ND_PRINTK with dynamic debug ND_PRINTK with val > 1 only works when the ND_DEBUG was set in compilation phase. Replace it with dynamic debug. Convert ND_PRINTK with val <= 1 to net_{err,warn}_ratelimited, and convert the rest to net_dbg_ratelimited. Suggested-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Wang Liang <wangliang74@huawei.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250708033342.1627636-1-wangliang74@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-10 15:27:32 -07:00
Jakub Kicinski	3321e97eab	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.16-rc6). No conflicts. Adjacent changes: Documentation/devicetree/bindings/net/allwinner,sun8i-a83t-emac.yaml `0a12c435a1` ("dt-bindings: net: sun8i-emac: Add A100 EMAC compatible") `b3603c0466` ("dt-bindings: net: sun8i-emac: Rename A523 EMAC0 to GMAC0") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-10 10:10:49 -07:00
Jason Xing	45e359be1c	net: xsk: introduce XDP_MAX_TX_SKB_BUDGET setsockopt This patch provides a setsockopt method to let applications leverage to adjust how many descs to be handled at most in one send syscall. It mitigates the situation where the default value (32) that is too small leads to higher frequency of triggering send syscall. Considering the prosperity/complexity the applications have, there is no absolutely ideal suggestion fitting all cases. So keep 32 as its default value like before. The patch does the following things: - Add XDP_MAX_TX_SKB_BUDGET socket option. - Set max_tx_budget to 32 by default in the initialization phase as a per-socket granular control. - Set the range of max_tx_budget as [32, xs->tx->nentries]. The idea behind this comes out of real workloads in production. We use a user-level stack with xsk support to accelerate sending packets and minimize triggering syscalls. When the packets are aggregated, it's not hard to hit the upper bound (namely, 32). The moment user-space stack fetches the -EAGAIN error number passed from sendto(), it will loop to try again until all the expected descs from tx ring are sent out to the driver. Enlarging the XDP_MAX_TX_SKB_BUDGET value contributes to less frequency of sendto() and higher throughput/PPS. Here is what I did in production, along with some numbers as follows: For one application I saw lately, I suggested using 128 as max_tx_budget because I saw two limitations without changing any default configuration: 1) XDP_MAX_TX_SKB_BUDGET, 2) socket sndbuf which is 212992 decided by net.core.wmem_default. As to XDP_MAX_TX_SKB_BUDGET, the scenario behind this was I counted how many descs are transmitted to the driver at one time of sendto() based on [1] patch and then I calculated the possibility of hitting the upper bound. Finally I chose 128 as a suitable value because 1) it covers most of the cases, 2) a higher number would not bring evident results. After twisting the parameters, a stable improvement of around 4% for both PPS and throughput and less resources consumption were found to be observed by strace -c -p xxx: 1) %time was decreased by 7.8% 2) error counter was decreased from 18367 to 572 [1]: https://lore.kernel.org/all/20250619093641.70700-1-kerneljasonxing@gmail.com/ Signed-off-by: Jason Xing <kernelxing@tencent.com> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://patch.msgid.link/20250704160138.48677-1-kerneljasonxing@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-07-10 14:48:29 +02:00
Miri Korenblit	c07981af55	wifi: mac80211: add the virtual monitor after reconfig complete In reconfig we add the virtual monitor in 2 cases: 1. If we are resuming (it was deleted on suspend) 2. If it was added after an error but before the reconfig (due to the last non-monitor interface removal). In the second case, the removal of the non-monitor interface will succeed but the addition of the virtual monitor will fail, so we add it in the reconfig. The problem is that we mislead the driver to think that this is an existing interface that is getting re-added - while it is actually a completely new interface from the drivers' point of view. Some drivers act differently when a interface is re-added. For example, it might not initialize things because they were already initialized. Such drivers will - in this case - be left with a partialy initialized vif. To fix it, add the virtual monitor after reconfig_complete, so the driver will know that this is a completely new interface. Fixes: `3c3e21e744` ("mac80211: destroy virtual monitor interface across suspend") Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250709233451.648d39b041e8.I2e37b68375278987e303d6c00cc5f3d8334d2f96@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-07-10 13:27:14 +02:00
Miri Korenblit	d7a54d02db	wifi: mac80211: always initialize sdata::key_list This is currently not initialized for a virtual monitor, leading to a NULL pointer dereference when - for example - iterating over all the keys of all the vifs. Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250709233400.8dcefe578497.I4c90a00ae3256520e063199d7f6f2580d5451acf@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-07-10 13:26:13 +02:00
Xiang Mei	dd831ac822	net/sched: sch_qfq: Fix null-deref in agg_dequeue To prevent a potential crash in agg_dequeue (net/sched/sch_qfq.c) when cl->qdisc->ops->peek(cl->qdisc) returns NULL, we check the return value before using it, similar to the existing approach in sch_hfsc.c. To avoid code duplication, the following changes are made: 1. Changed qdisc_warn_nonwc(include/net/pkt_sched.h) into a static inline function. 2. Moved qdisc_peek_len from net/sched/sch_hfsc.c to include/net/pkt_sched.h so that sch_qfq can reuse it. 3. Applied qdisc_peek_len in agg_dequeue to avoid crashing. Signed-off-by: Xiang Mei <xmei5@asu.edu> Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> Link: https://patch.msgid.link/20250705212143.3982664-1-xmei5@asu.edu Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-07-10 11:08:35 +02:00
David Howells	880a88f318	rxrpc: Fix oops due to non-existence of prealloc backlog struct If an AF_RXRPC service socket is opened and bound, but calls are preallocated, then rxrpc_alloc_incoming_call() will oops because the rxrpc_backlog struct doesn't get allocated until the first preallocation is made. Fix this by returning NULL from rxrpc_alloc_incoming_call() if there is no backlog struct. This will cause the incoming call to be aborted. Reported-by: Junvyyang, Tencent Zhuque Lab <zhuque@tencent.com> Suggested-by: Junvyyang, Tencent Zhuque Lab <zhuque@tencent.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: LePremierHomme <kwqcheii@proton.me> cc: Marc Dionne <marc.dionne@auristor.com> cc: Willy Tarreau <w@1wt.eu> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20250708211506.2699012-3-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-09 19:41:44 -07:00
David Howells	69e4186773	rxrpc: Fix bug due to prealloc collision When userspace is using AF_RXRPC to provide a server, it has to preallocate incoming calls and assign to them call IDs that will be used to thread related recvmsg() and sendmsg() together. The preallocated call IDs will automatically be attached to calls as they come in until the pool is empty. To the kernel, the call IDs are just arbitrary numbers, but userspace can use the call ID to hold a pointer to prepared structs. In any case, the user isn't permitted to create two calls with the same call ID (call IDs become available again when the call ends) and EBADSLT should result from sendmsg() if an attempt is made to preallocate a call with an in-use call ID. However, the cleanup in the error handling will trigger both assertions in rxrpc_cleanup_call() because the call isn't marked complete and isn't marked as having been released. Fix this by setting the call state in rxrpc_service_prealloc_one() and then marking it as being released before calling the cleanup function. Fixes: `00e907127e` ("rxrpc: Preallocate peers, conns and calls for incoming service requests") Reported-by: Junvyyang, Tencent Zhuque Lab <zhuque@tencent.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: LePremierHomme <kwqcheii@proton.me> cc: Marc Dionne <marc.dionne@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20250708211506.2699012-2-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-09 19:41:44 -07:00
Xuewei Niu	f7c7226592	vsock: Add support for SIOCINQ ioctl Add support for SIOCINQ ioctl, indicating the length of bytes unread in the socket. The value is obtained from `vsock_stream_has_data()`. Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Luigi Leonardi <leonardi@redhat.com> Link: https://patch.msgid.link/20250708-siocinq-v6-2-3775f9a9e359@antgroup.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-09 19:29:52 -07:00
Dexuan Cui	f0c5827d07	hv_sock: Return the readable bytes in hvs_stream_has_data() When hv_sock was originally added, __vsock_stream_recvmsg() and vsock_stream_has_data() actually only needed to know whether there is any readable data or not, so hvs_stream_has_data() was written to return 1 or 0 for simplicity. However, now hvs_stream_has_data() should return the readable bytes because vsock_data_ready() -> vsock_stream_has_data() needs to know the actual bytes rather than a boolean value of 1 or 0. The SIOCINQ ioctl support also needs hvs_stream_has_data() to return the readable bytes. Let hvs_stream_has_data() return the readable bytes of the payload in the next host-to-guest VMBus hv_sock packet. Note: there may be multiple incoming hv_sock packets pending in the VMBus channel's ringbuffer, but so far there is not a VMBus API that allows us to know all the readable bytes in total without reading and caching the payload of the multiple packets, so let's just return the readable bytes of the next single packet. In the future, we'll either add a VMBus API that allows us to know the total readable bytes without touching the data in the ringbuffer, or the hv_sock driver needs to understand the VMBus packet format and parse the packets directly. Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Wei Liu <wei.liu@kernel.org> Link: https://patch.msgid.link/20250708-siocinq-v6-1-3775f9a9e359@antgroup.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-09 19:29:52 -07:00
Feng Yang	76d727ae02	skbuff: Add MSG_MORE flag to optimize tcp large packet transmission When using sockmap for forwarding, the average latency for different packet sizes after sending 10,000 packets is as follows: size old(us) new(us) 512 56 55 1472 58 58 1600 106 81 3000 145 105 5000 182 125 Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Feng Yang <yangfeng@kylinos.cn> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250708054053.39551-1-yangfeng59949@163.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-09 19:25:57 -07:00
Easwar Hariharan	31326d9841	net: ipconfig: convert timeouts to secs_to_jiffies() Commit `b35108a51c` ("jiffies: Define secs_to_jiffies()") introduced secs_to_jiffies(). As the value here is a multiple of 1000, use secs_to_jiffies() instead of msecs_to_jiffies to avoid the multiplication. This is converted using scripts/coccinelle/misc/secs_to_jiffies.cocci with the following Coccinelle rules: @depends on patch@ expression E; @@ -msecs_to_jiffies(E * 1000) +secs_to_jiffies(E) -msecs_to_jiffies(E * MSEC_PER_SEC) +secs_to_jiffies(E) While here, manually convert a couple timeouts denominated in seconds Signed-off-by: Easwar Hariharan <eahariha@linux.microsoft.com> Link: https://patch.msgid.link/20250707-netdev-secs-to-jiffies-part-2-v2-2-b7817036342f@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-09 19:25:01 -07:00
Easwar Hariharan	4814f9110e	net/smc: convert timeouts to secs_to_jiffies() Commit `b35108a51c` ("jiffies: Define secs_to_jiffies()") introduced secs_to_jiffies(). As the value here is a multiple of 1000, use secs_to_jiffies() instead of msecs_to_jiffies to avoid the multiplication. This is converted using scripts/coccinelle/misc/secs_to_jiffies.cocci with the following Coccinelle rules: @depends on patch@ expression E; @@ -msecs_to_jiffies(E * 1000) +secs_to_jiffies(E) -msecs_to_jiffies(E * MSEC_PER_SEC) +secs_to_jiffies(E) Signed-off-by: Easwar Hariharan <eahariha@linux.microsoft.com> Reviewed-by: Dust Li <dust.li@linux.alibaba.com> Reviewed-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> Link: https://patch.msgid.link/20250707-netdev-secs-to-jiffies-part-2-v2-1-b7817036342f@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-09 19:25:01 -07:00
Eric Dumazet	1a03edeb84	tcp: refine sk_rcvbuf increase for ooo packets When a passive flow has not been accepted yet, it is not wise to increase sk_rcvbuf when receiving ooo packets. A very busy server might tune down tcp_rmem[1] to better control how much memory can be used by sockets waiting in its listeners accept queues. Fixes: `63ad7dfedf` ("tcp: adjust rcvbuf in presence of reorders") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250707213900.1543248-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-09 19:24:10 -07:00
Victor Nogueira	ffdde7bf5a	net/sched: Abort __tc_modify_qdisc if parent class does not exist Lion's patch [1] revealed an ancient bug in the qdisc API. Whenever a user creates/modifies a qdisc specifying as a parent another qdisc, the qdisc API will, during grafting, detect that the user is not trying to attach to a class and reject. However grafting is performed after qdisc_create (and thus the qdiscs' init callback) is executed. In qdiscs that eventually call qdisc_tree_reduce_backlog during init or change (such as fq, hhf, choke, etc), an issue arises. For example, executing the following commands: sudo tc qdisc add dev lo root handle a: htb default 2 sudo tc qdisc add dev lo parent a: handle beef fq Qdiscs such as fq, hhf, choke, etc unconditionally invoke qdisc_tree_reduce_backlog() in their control path init() or change() which then causes a failure to find the child class; however, that does not stop the unconditional invocation of the assumed child qdisc's qlen_notify with a null class. All these qdiscs make the assumption that class is non-null. The solution is ensure that qdisc_leaf() which looks up the parent class, and is invoked prior to qdisc_create(), should return failure on not finding the class. In this patch, we leverage qdisc_leaf to return ERR_PTRs whenever the parentid doesn't correspond to a class, so that we can detect it earlier on and abort before qdisc_create is called. [1] https://lore.kernel.org/netdev/d912cbd7-193b-4269-9857-525bee8bbb6a@gmail.com/ Fixes: `5e50da01d0` ("[NET_SCHED]: Fix endless loops (part 2): "simple" qdiscs") Reported-by: syzbot+d8b58d7b0ad89a678a16@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/68663c93.a70a0220.5d25f.0857.GAE@google.com/ Reported-by: syzbot+5eccb463fa89309d8bdc@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/68663c94.a70a0220.5d25f.0858.GAE@google.com/ Reported-by: syzbot+1261670bbdefc5485a06@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/686764a5.a00a0220.c7b3.0013.GAE@google.com/ Reported-by: syzbot+15b96fc3aac35468fe77@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/686764a5.a00a0220.c7b3.0014.GAE@google.com/ Reported-by: syzbot+4dadc5aecf80324d5a51@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/68679e81.a70a0220.29cf51.0016.GAE@google.com/ Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Victor Nogueira <victor@mojatatu.com> Link: https://patch.msgid.link/20250707210801.372995-1-victor@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-09 19:23:25 -07:00
Yue Haibing	22fc46cea9	atm: clip: Fix NULL pointer dereference in vcc_sendmsg() atmarpd_dev_ops does not implement the send method, which may cause crash as bellow. BUG: kernel NULL pointer dereference, address: 0000000000000000 PGD 0 P4D 0 Oops: Oops: 0010 [#1] SMP KASAN NOPTI CPU: 0 UID: 0 PID: 5324 Comm: syz.0.0 Not tainted 6.15.0-rc6-syzkaller-00346-g5723cc3450bc #0 PREEMPT(full) Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 RIP: 0010:0x0 Code: Unable to access opcode bytes at 0xffffffffffffffd6. RSP: 0018:ffffc9000d3cf778 EFLAGS: 00010246 RAX: 1ffffffff1910dd1 RBX: 00000000000000c0 RCX: dffffc0000000000 RDX: ffffc9000dc82000 RSI: ffff88803e4c4640 RDI: ffff888052cd0000 RBP: ffffc9000d3cf8d0 R08: ffff888052c9143f R09: 1ffff1100a592287 R10: dffffc0000000000 R11: 0000000000000000 R12: 1ffff92001a79f00 R13: ffff888052cd0000 R14: ffff88803e4c4640 R15: ffffffff8c886e88 FS: 00007fbc762566c0(0000) GS:ffff88808d6c2000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffffffffd6 CR3: 0000000041f1b000 CR4: 0000000000352ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> vcc_sendmsg+0xa10/0xc50 net/atm/common.c:644 sock_sendmsg_nosec net/socket.c:712 [inline] __sock_sendmsg+0x219/0x270 net/socket.c:727 ____sys_sendmsg+0x52d/0x830 net/socket.c:2566 ___sys_sendmsg+0x21f/0x2a0 net/socket.c:2620 __sys_sendmmsg+0x227/0x430 net/socket.c:2709 __do_sys_sendmmsg net/socket.c:2736 [inline] __se_sys_sendmmsg net/socket.c:2733 [inline] __x64_sys_sendmmsg+0xa0/0xc0 net/socket.c:2733 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xf6/0x210 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Reported-by: syzbot+e34e5e6b5eddb0014def@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/682f82d5.a70a0220.1765ec.0143.GAE@google.com/T Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250705085228.329202-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-09 19:09:36 -07:00
Ivan Vecera	de9ccf2296	devlink: Add new "clock_id" generic device param Add a new device generic parameter to specify clock ID that should be used by the device for registering DPLL devices and pins. Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://patch.msgid.link/20250704182202.1641943-5-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-09 19:08:52 -07:00
Ivan Vecera	c0ef144695	devlink: Add support for u64 parameters Only 8, 16 and 32-bit integers are supported for numeric devlink parameters. The subsequent patch adds support for DPLL clock ID that is defined as 64-bit number. Add support for u64 parameter type. Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://patch.msgid.link/20250704182202.1641943-4-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-09 19:08:52 -07:00
Fedor Pchelkin	8df1b40de7	netfilter: nf_tables: adjust lockdep assertions handling It's needed to check the return value of lockdep_commit_lock_is_held(), otherwise there's no point in this assertion as it doesn't print any debug information on itself. Found by Linux Verification Center (linuxtesting.org) with Svace static analysis tool. Fixes: `b04df3da1b` ("netfilter: nf_tables: do not defer rule destruction via call_rcu") Reported-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2025-07-10 03:01:22 +02:00
Phil Sutter	a1050dd071	netfilter: nf_tables: Reintroduce shortened deletion notifications Restore commit `28339b21a3` ("netfilter: nf_tables: do not send complete notification of deletions") and fix it: - Avoid upfront modification of 'event' variable so the conditionals become effective. - Always include NFTA_OBJ_TYPE attribute in object notifications, user space requires it for proper deserialisation. - Catch DESTROY events, too. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2025-07-10 03:01:14 +02:00
Phil Sutter	8080357a8c	netfilter: nf_tables: Drop dead code from fill_*_info routines This practically reverts commit `28339b21a3` ("netfilter: nf_tables: do not send complete notification of deletions"): The feature was never effective, due to prior modification of 'event' variable the conditional early return never happened. User space also relies upon the current behaviour, so better reintroduce the shortened deletion notifications once it is fixed. Fixes: `28339b21a3` ("netfilter: nf_tables: do not send complete notification of deletions") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2025-07-10 03:01:02 +02:00
Kuniyuki Iwashima	c489f3283d	atm: clip: Fix infinite recursive call of clip_push(). syzbot reported the splat below. [0] This happens if we call ioctl(ATMARP_MKIP) more than once. During the first call, clip_mkip() sets clip_push() to vcc->push(), and the second call copies it to clip_vcc->old_push(). Later, when the socket is close()d, vcc_destroy_socket() passes NULL skb to clip_push(), which calls clip_vcc->old_push(), triggering the infinite recursion. Let's prevent the second ioctl(ATMARP_MKIP) by checking vcc->user_back, which is allocated by the first call as clip_vcc. Note also that we use lock_sock() to prevent racy calls. [0]: BUG: TASK stack guard page was hit at ffffc9000d66fff8 (stack is ffffc9000d670000..ffffc9000d678000) Oops: stack guard page: 0000 [#1] SMP KASAN NOPTI CPU: 0 UID: 0 PID: 5322 Comm: syz.0.0 Not tainted 6.16.0-rc4-syzkaller #0 PREEMPT(full) Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 RIP: 0010:clip_push+0x5/0x720 net/atm/clip.c:191 Code: e0 8f aa 8c e8 1c ad 5b fa eb ae 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 55 <41> 57 41 56 41 55 41 54 53 48 83 ec 20 48 89 f3 49 89 fd 48 bd 00 RSP: 0018:ffffc9000d670000 EFLAGS: 00010246 RAX: 1ffff1100235a4a5 RBX: ffff888011ad2508 RCX: ffff8880003c0000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff888037f01000 RBP: dffffc0000000000 R08: ffffffff8fa104f7 R09: 1ffffffff1f4209e R10: dffffc0000000000 R11: ffffffff8a99b300 R12: ffffffff8a99b300 R13: ffff888037f01000 R14: ffff888011ad2500 R15: ffff888037f01578 FS: 000055557ab6d500(0000) GS:ffff88808d250000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffc9000d66fff8 CR3: 0000000043172000 CR4: 0000000000352ef0 Call Trace: <TASK> clip_push+0x6dc/0x720 net/atm/clip.c:200 clip_push+0x6dc/0x720 net/atm/clip.c:200 clip_push+0x6dc/0x720 net/atm/clip.c:200 ... clip_push+0x6dc/0x720 net/atm/clip.c:200 clip_push+0x6dc/0x720 net/atm/clip.c:200 clip_push+0x6dc/0x720 net/atm/clip.c:200 vcc_destroy_socket net/atm/common.c:183 [inline] vcc_release+0x157/0x460 net/atm/common.c:205 __sock_release net/socket.c:647 [inline] sock_close+0xc0/0x240 net/socket.c:1391 __fput+0x449/0xa70 fs/file_table.c:465 task_work_run+0x1d1/0x260 kernel/task_work.c:227 resume_user_mode_work include/linux/resume_user_mode.h:50 [inline] exit_to_user_mode_loop+0xec/0x110 kernel/entry/common.c:114 exit_to_user_mode_prepare include/linux/entry-common.h:330 [inline] syscall_exit_to_user_mode_work include/linux/entry-common.h:414 [inline] syscall_exit_to_user_mode include/linux/entry-common.h:449 [inline] do_syscall_64+0x2bd/0x3b0 arch/x86/entry/syscall_64.c:100 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7ff31c98e929 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fffb5aa1f78 EFLAGS: 00000246 ORIG_RAX: 00000000000001b4 RAX: 0000000000000000 RBX: 0000000000012747 RCX: 00007ff31c98e929 RDX: 0000000000000000 RSI: 000000000000001e RDI: 0000000000000003 RBP: 00007ff31cbb7ba0 R08: 0000000000000001 R09: 0000000db5aa226f R10: 00007ff31c7ff030 R11: 0000000000000246 R12: 00007ff31cbb608c R13: 00007ff31cbb6080 R14: ffffffffffffffff R15: 00007fffb5aa2090 </TASK> Modules linked in: Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Reported-by: syzbot+0c77cccd6b7cd917b35a@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=2371d94d248d126c1eb1 Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250704062416.1613927-4-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-09 17:52:26 -07:00
Kuniyuki Iwashima	62dba28275	atm: clip: Fix memory leak of struct clip_vcc. ioctl(ATMARP_MKIP) allocates struct clip_vcc and set it to vcc->user_back. The code assumes that vcc_destroy_socket() passes NULL skb to vcc->push() when the socket is close()d, and then clip_push() frees clip_vcc. However, ioctl(ATMARPD_CTRL) sets NULL to vcc->push() in atm_init_atmarp(), resulting in memory leak. Let's serialise two ioctl() by lock_sock() and check vcc->push() in atm_init_atmarp() to prevent memleak. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250704062416.1613927-3-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-09 17:52:26 -07:00
Kuniyuki Iwashima	706cc36477	atm: clip: Fix potential null-ptr-deref in to_atmarpd(). atmarpd is protected by RTNL since commit `f3a0592b37` ("[ATM]: clip causes unregister hang"). However, it is not enough because to_atmarpd() is called without RTNL, especially clip_neigh_solicit() / neigh_ops->solicit() is unsleepable. Also, there is no RTNL dependency around atmarpd. Let's use a private mutex and RCU to protect access to atmarpd in to_atmarpd(). Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250704062416.1613927-2-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-09 17:52:26 -07:00
Johannes Berg	6b04716cdc	wifi: mac80211: don't complete management TX on SAE commit When SAE commit is sent and received in response, there's no ordering for the SAE confirm messages. As such, don't call drivers to stop listening on the channel when the confirm message is still expected. This fixes an issue if the local confirm is transmitted later than the AP's confirm, for iwlwifi (and possibly mt76) the AP's confirm would then get lost since the device isn't on the channel at the time the AP transmit the confirm. For iwlwifi at least, this also improves the overall timing of the authentication handshake (by about 15ms according to the report), likely since the session protection won't be aborted and rescheduled. Note that even before this, mgd_complete_tx() wasn't always called for each call to mgd_prepare_tx() (e.g. in the case of WEP key shared authentication), and the current drivers that have the complete callback don't seem to mind. Document this as well though. Reported-by: Jan Hendrik Farr <kernel@jfarr.cc> Closes: https://lore.kernel.org/all/aB30Ea2kRG24LINR@archlinux/ Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250609213232.12691580e140.I3f1d3127acabcd58348a110ab11044213cf147d3@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-07-09 11:56:45 +02:00
Somashekhar Puttagangaiah	a11ec0dc92	wifi: cfg80211/mac80211: implement dot11ExtendedRegInfoSupport Implement dot11ExtendedRegInfoSupport to advertise non-AP station regulatory power capability as part of regulatory connectivity element in (Re)Association request frames so that AP can achieve maximum client connectivity. Control field which was interpreted using value of 3-bits B5 to B3, now uses value of 4-bits B6 to B3 to interpret the type of AP. Hence update IEEE80211_HE_6GHZ_OPER_CTRL_REG_INFO to parse 4-bits control field. If older AP still updates only 3-bits value of control field, station can still interpret the value as per section E.2.7 of IEEE 802.11 REVme D7.0 and support the appropriate AP type. Also update IEEE80211_6GHZ_CTRL_REG_INDOOR_SP_AP as the value of standard power AP is changed to 8 instead of 4 so that AP can support both LPI AP and SP AP to maximize the connectivity with stations. For backward compatibility, keeping value 4 as old AP by limiting it to SP AP only. Signed-off-by: Somashekhar Puttagangaiah <somashekhar.puttagangaiah@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250609213232.90cdef116aad.I85da390fbee59355e3855691933e6a5e55c47ac4@changeid [fix kernel-doc] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-07-09 11:56:37 +02:00
Johannes Berg	a9681efa1b	wifi: mac80211: send extended MLD capa/ops if AP has it Currently the code only sends extended MLD capa/ops in strict mode, but if the AP has it then it should also be able to parse it. There could be cases where the AP doesn't have it but we would want to advertise it (e.g. if the AP supports nothing but we want to have BTM.), but given the broken deployed APs out there right now this is the best we can do. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Reviewed-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250609213232.c9b8b3a6ca77.I1153d4283d1fbb9e5db60e7b939cc133a6345db5@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-07-09 11:52:36 +02:00
Benjamin Berg	ff1ac756ea	wifi: mac80211: copy first_part into HW scan cfg80211 now reports whether this is the first part of a scan. Copy that information into the driver request. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250609213231.63f6078bd7be.Ia6e5cee945e6d9617c2f427552d89d23c92eee83@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-07-09 11:52:36 +02:00
Benjamin Berg	62c57ebb31	wifi: cfg80211: add a flag for the first part of a scan When there are no non-6 GHz channels, then the 6 GHz scan is the first part of a split scan. Add a boolean denoting whether the scan is the first part of a scan as it might be useful to drivers for internal bookkeeping. This flag is also set if the scan is not split. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250609213231.07e5a8a452ec.Ibf18f513e507422078fb31b28947e582a20df87a@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-07-09 11:52:36 +02:00
Johannes Berg	984462751d	wifi: mac80211: remove DISALLOW_PUNCTURING_5GHZ code Since iwlwifi was the only driver using this and no longer does, we can remove all this code. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250609213231.4dff5fb8890f.Ie531f912b252a0042c18c0734db50c3afe1adfb5@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-07-09 11:52:36 +02:00
Benjamin Berg	afebe192eb	wifi: cfg80211: only verify part of Extended MLD Capabilities We verify that the Extended MLD Capabilities are matching between links. However, some bits are reserved and in particular the Recommended Max Links subfield may not necessarily match. So only verify the known subfields that can reliably be expected to be the same. More information can be found in Table 9-417o, in IEEE P802.11be/D7.0. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250609213231.a2fad48dd3e6.Iae1740cd2ac833bc4a64fd2af718e1485158fd42@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-07-09 11:52:35 +02:00
Johannes Berg	a1d9979c36	wifi: nl80211: make nl80211_check_scan_flags() type safe The cast from void * here coupled with the boolean argument on what to cast to is confusing and really not needed, just split the code and make a type-safe interface. It seems to even reduce the code size slightly, at least on x86-64. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250609213231.bdb3c96570b0.Ia153e6ce06dc9a636ff5bcc1d52468a1afd06e13@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-07-09 11:52:35 +02:00
Johannes Berg	f0df91b6a7	wifi: cfg80211: hide scan internals Hide the internal scan fields from mac80211 and drivers, the 'notified' variable is for internal tracking, and the 'info' is output that's passed to cfg80211_scan_done() and stored only for delayed userspace notification. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Reviewed-by: Benjamin Berg <benjamin.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250609213231.6a62e41858e2.I004f66e9c087cc6e6ae4a24951cf470961ee9466@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-07-09 11:52:35 +02:00
Johannes Berg	6f9e701c16	wifi: mac80211: fix deactivated link CSA If the link is deactivated and the CSA completes, then that needs to update the link station's bandwidth (only the AP STA can exist at this point, no TDLS on inactive links) and set the CSA to no longer be active. Fix this. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250609213231.07f120cf687d.I5a868c501ee73fcc2355d61c2ee06e5f444b350f@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-07-09 11:52:35 +02:00
Somashekhar Puttagangaiah	bc7566fbc4	wifi: mac80211: add mandatory bitrate support for 6 GHz When a new station is added, ensure that mandatory bit-rates are enabled for 6 GHz band. Signed-off-by: Somashekhar Puttagangaiah <somashekhar.puttagangaiah@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250609213231.4aecd7f3b85b.I33a54872a3267c9f6155ce537d6c9c2a31c3f117@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-07-09 11:52:35 +02:00
Johannes Berg	798dd0e260	wifi: mac80211: remove spurious blank line ieee80211_process_ml_reconf_resp() has a blank line between an if statement and the covered code, remove it. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250609213231.a1f4ceae700d.I1d7aae17cc466c1648f31c42b935165db85d2809@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-07-09 11:52:35 +02:00
Miri Korenblit	eb7186bd82	wifi: mac80211: verify state before connection ieee80211_prep_connection is supposed to be called when both bitmaps (valid_links and active_links) are cleared. Make sure of it and WARN if this is not the case, to avoid weird issues. Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Link: https://patch.msgid.link/20250609213231.f616c7b693df.Ie983155627ad0d2e7c19c30ce642915246d0ed9d@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-07-09 11:52:35 +02:00
Pagadala Yesu Anjaneyulu	a066917360	wifi: mac80211: Fix uninitialized variable with __free() in ieee80211_ml_epcs() The cleanup attribute runs kfree() when the variable goes out of scope. There is a possibility that the link_elems variable is uninitialized if the loop ends before an assignment is made to this variable. This leads to uninitialized variable bug. Fix this by assigning link_elems to NULL. Signed-off-by: Pagadala Yesu Anjaneyulu <pagadala.yesu.anjaneyulu@intel.com> Reviewed-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250609213231.eeacd3738a7b.I0f876fa1359daeec47ab3aef098255a9c23efd70@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-07-09 11:42:16 +02:00
Miri Korenblit	be1ba9ed22	wifi: mac80211: avoid weird state in error path If we get to the error path of ieee80211_prep_connection, for example because of a FW issue, then ieee80211_vif_set_links is called with 0. But the call to drv_change_vif_links from ieee80211_vif_update_links will probably fail as well, for the same reason. In this case, the valid_links and active_links bitmaps will be reverted to the value of the failing connection. Then, in the next connection, due to the logic of ieee80211_set_vif_links_bitmaps, valid_links will be set to the ID of the new connection assoc link, but the active_links will remain with the ID of the old connection's assoc link. If those IDs are different, we get into a weird state of valid_links and active_links being different. One of the consequences of this state is to call drv_change_vif_links with new_links as 0, since the & operation between the bitmaps will be 0. Since a removal of a link should always succeed, ignore the return value of drv_change_vif_links if it was called to only remove links, which is the case for the ieee80211_prep_connection's error path. That way, the bitmaps will not be reverted to have the value from the failing connection and will have 0, so the next connection will have a good state. Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Link: https://patch.msgid.link/20250609213231.ba2011fb435f.Id87ff6dab5e1cf757b54094ac2d714c656165059@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-07-09 11:41:38 +02:00
Eric Dumazet	ea988b4506	udp: remove udp_tunnel_gro_init() Use DEFINE_MUTEX() to initialize udp_tunnel_gro_type_lock. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250707091634.311974-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-08 18:58:38 -07:00
Kuniyuki Iwashima	db38443dcd	ipv6: Remove setsockopt_needs_rtnl(). We no longer need to hold RTNL for IPv6 socket options. Let's remove setsockopt_needs_rtnl(). Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250702230210.3115355-16-kuni1840@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-08 18:32:39 -07:00
Kuniyuki Iwashima	eb1ac9ff6c	ipv6: anycast: Don't hold RTNL for IPV6_JOIN_ANYCAST. inet6_sk(sk)->ipv6_ac_list is protected by lock_sock(). In ipv6_sock_ac_join(), only __dev_get_by_index(), __dev_get_by_flags(), and __in6_dev_get() require RTNL. __dev_get_by_flags() is only used by ipv6_sock_ac_join() and can be converted to RCU version. Let's replace RCU version helper and drop RTNL from IPV6_JOIN_ANYCAST. setsockopt_needs_rtnl() will be removed in the next patch. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250702230210.3115355-15-kuni1840@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-08 18:32:39 -07:00
Kuniyuki Iwashima	976fa9b205	ipv6: anycast: Unify two error paths in ipv6_sock_ac_join(). The next patch will replace __dev_get_by_index() and __dev_get_by_flags() to RCU + refcount version. Then, we will need to call dev_put() in some error paths. Let's unify two error paths to make the next patch cleaner. Note that we add READ_ONCE() for net->ipv6.devconf_all->forwarding and idev->conf.forwarding as we will drop RTNL that protects them. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250702230210.3115355-14-kuni1840@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-08 18:32:39 -07:00
Kuniyuki Iwashima	f7fdf13bf1	ipv6: anycast: Don't hold RTNL for IPV6_LEAVE_ANYCAST and IPV6_ADDRFORM. inet6_sk(sk)->ipv6_ac_list is protected by lock_sock(). In ipv6_sock_ac_drop() and ipv6_sock_ac_close(), only __dev_get_by_index() and __in6_dev_get() requrie RTNL. Let's replace them with dev_get_by_index() and in6_dev_get() and drop RTNL from IPV6_LEAVE_ANYCAST and IPV6_ADDRFORM. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250702230210.3115355-13-kuni1840@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-08 18:32:39 -07:00
Kuniyuki Iwashima	7b6b53a76f	ipv6: anycast: Don't use rtnl_dereference(). inet6_dev->ac_list is protected by inet6_dev->lock, so rtnl_dereference() is a bit rough annotation. As done in mcast.c, we can use ac_dereference() that checks if inet6_dev->lock is held. Let's replace rtnl_dereference() with a new helper ac_dereference(). Note that now addrconf_join_solict() / addrconf_leave_solict() in __ipv6_dev_ac_inc() / __ipv6_dev_ac_dec() does not need RTNL, so we can remove ASSERT_RTNL() there. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250702230210.3115355-12-kuni1840@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-08 18:32:39 -07:00
Kuniyuki Iwashima	49b8223fa9	ipv6: mcast: Remove unnecessary ASSERT_RTNL and comment. Now, RTNL is not needed for mcast code, and what's commented in ip6_mc_msfget() is apparent by for_each_pmc_socklock(), which has lockdep annotation for lock_sock(). Let's remove the comment and ASSERT_RTNL() in ipv6_mc_rejoin_groups(). Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250702230210.3115355-11-kuni1840@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-08 18:32:38 -07:00
Kuniyuki Iwashima	e6e14d582d	ipv6: mcast: Don't hold RTNL for MCAST_ socket options. In ip6_mc_source() and ip6_mc_msfilter(), per-socket mld data is protected by lock_sock() and inet6_dev->mc_lock is also held for some per-interface functions. ip6_mc_find_dev_rtnl() only depends on RTNL. If we want to remove it, we need to check inet6_dev->dead under mc_lock to close the race with addrconf_ifdown(), as mentioned earlier. Let's do that and drop RTNL for the rest of MCAST_ socket options. Note that ip6_mc_msfilter() has unnecessary lock dances and they are integrated into one to avoid the last-minute error and simplify the error handling. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250702230210.3115355-10-kuni1840@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-08 18:32:38 -07:00
Kuniyuki Iwashima	1e589db389	ipv6: mcast: Don't hold RTNL in ipv6_sock_mc_close(). In __ipv6_sock_mc_close(), per-socket mld data is protected by lock_sock(), and only __dev_get_by_index() and __in6_dev_get() require RTNL. Let's call __ipv6_sock_mc_drop() and drop RTNL in ipv6_sock_mc_close(). Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250702230210.3115355-9-kuni1840@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-08 18:32:38 -07:00
Kuniyuki Iwashima	2ceb71ce7d	ipv6: mcast: Don't hold RTNL for IPV6_DROP_MEMBERSHIP and MCAST_LEAVE_GROUP. In __ipv6_sock_mc_drop(), per-socket mld data is protected by lock_sock(), and only __dev_get_by_index() and __in6_dev_get() require RTNL. Let's use dev_get_by_index() and in6_dev_get() and drop RTNL for IPV6_ADD_MEMBERSHIP and MCAST_JOIN_GROUP. Note that __ipv6_sock_mc_drop() is factorised to reuse in the next patch. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250702230210.3115355-8-kuni1840@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-08 18:32:38 -07:00
Kuniyuki Iwashima	1767bb2d47	ipv6: mcast: Don't hold RTNL for IPV6_ADD_MEMBERSHIP and MCAST_JOIN_GROUP. In __ipv6_sock_mc_join(), per-socket mld data is protected by lock_sock(), and only __dev_get_by_index() requires RTNL. Let's use dev_get_by_index() and drop RTNL for IPV6_ADD_MEMBERSHIP and MCAST_JOIN_GROUP. Note that we must call rt6_lookup() and dev_hold() under RCU. If rt6_lookup() returns an entry from the exception table, dst_dev_put() could change rt->dev.dst to loopback concurrently, and the original device could lose the refcount before dev_hold() and unblock device registration. dst_dev_put() is called from NETDEV_UNREGISTER and synchronize_net() follows it, so as long as rt6_lookup() and dev_hold() are called within the same RCU critical section, the dev is alive. Even if the race happens, they are synchronised by idev->dead and mcast addresses are cleaned up. For the racy access to rt->dst.dev, we use dst_dev(). Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250702230210.3115355-7-kuni1840@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-08 18:32:38 -07:00
Kuniyuki Iwashima	e01b193e0b	ipv6: mcast: Use in6_dev_get() in ipv6_dev_mc_dec(). As well as __ipv6_dev_mc_inc(), all code in __ipv6_dev_mc_dec() are protected by inet6_dev->mc_lock, and RTNL is not needed. Let's use in6_dev_get() in ipv6_dev_mc_dec() and remove ASSERT_RTNL() in __ipv6_dev_mc_dec(). Now, we can remove the RTNL comment above addrconf_leave_solict() too. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250702230210.3115355-6-kuni1840@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-08 18:32:38 -07:00
Kuniyuki Iwashima	d22faae8c5	ipv6: mcast: Remove mca_get(). Since commit `63ed8de4be` ("mld: add mc_lock for protecting per-interface mld data"), the newly allocated struct ifmcaddr6 cannot be removed until inet6_dev->mc_lock is released, so mca_get() and mc_put() are unnecessary. Let's remove the extra refcounting. Note that mca_get() was only used in __ipv6_dev_mc_inc(). Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250702230210.3115355-5-kuni1840@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-08 18:32:38 -07:00
Kuniyuki Iwashima	dbd40f318c	ipv6: mcast: Check inet6_dev->dead under idev->mc_lock in __ipv6_dev_mc_inc(). Since commit `63ed8de4be` ("mld: add mc_lock for protecting per-interface mld data"), every multicast resource is protected by inet6_dev->mc_lock. RTNL is unnecessary in terms of protection but still needed for synchronisation between addrconf_ifdown() and __ipv6_dev_mc_inc(). Once we removed RTNL, there would be a race below, where we could add a multicast address to a dead inet6_dev. CPU1 CPU2 ==== ==== addrconf_ifdown() __ipv6_dev_mc_inc() if (idev->dead) <-- false dead = true return -ENODEV; ipv6_mc_destroy_dev() / ipv6_mc_down() mutex_lock(&idev->mc_lock) ... mutex_unlock(&idev->mc_lock) mutex_lock(&idev->mc_lock) ... mutex_unlock(&idev->mc_lock) The race window can be easily closed by checking inet6_dev->dead under inet6_dev->mc_lock in __ipv6_dev_mc_inc() as addrconf_ifdown() will acquire it after marking inet6_dev dead. Let's check inet6_dev->dead under mc_lock in __ipv6_dev_mc_inc(). Note that now __ipv6_dev_mc_inc() no longer depends on RTNL and we can remove ASSERT_RTNL() there and the RTNL comment above addrconf_join_solict(). Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250702230210.3115355-4-kuni1840@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-08 18:32:37 -07:00
Kuniyuki Iwashima	818ae1a5ec	ipv6: mcast: Replace locking comments with lockdep annotations. Commit `63ed8de4be` ("mld: add mc_lock for protecting per-interface mld data") added the same comments regarding locking to many functions. Let's replace the comments with lockdep annotation, which is more helpful. Note that we just remove the comment for mld_clear_zeros() and mld_send_cr(), where mc_dereference() is used in the entry of the function. While at it, a comment for __ipv6_sock_mc_join() is moved back to the correct place. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250702230210.3115355-3-kuni1840@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-08 18:32:37 -07:00
Kuniyuki Iwashima	fb60b74e4e	ipv6: ndisc: Remove __in6_dev_get() in pndisc_{constructor,destructor}(). ipv6_dev_mc_{inc,dec}() has the same check. Let's remove __in6_dev_get() from pndisc_constructor() and pndisc_destructor(). Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250702230210.3115355-2-kuni1840@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-08 18:32:37 -07:00
Jason Xing	1eb8b0dac1	net: xsk: update tx queue consumer immediately after transmission For afxdp, the return value of sendto() syscall doesn't reflect how many descs handled in the kernel. One of use cases is that when user-space application tries to know the number of transmitted skbs and then decides if it continues to send, say, is it stopped due to max tx budget? The following formular can be used after sending to learn how many skbs/descs the kernel takes care of: tx_queue.consumers_before - tx_queue.consumers_after Prior to the current patch, in non-zc mode, the consumer of tx queue is not immediately updated at the end of each sendto syscall when error occurs, which leads to the consumer value out-of-dated from the perspective of user space. So this patch requires store operation to pass the cached value to the shared value to handle the problem. More than those explicit errors appearing in the while() loop in __xsk_generic_xmit(), there are a few possible error cases that might be neglected in the following call trace: __xsk_generic_xmit() xskq_cons_peek_desc() xskq_cons_read_desc() xskq_cons_is_valid_desc() It will also cause the premature exit in the while() loop even if not all the descs are consumed. Based on the above analysis, using @sent_frame could cover all the possible cases where it might lead to out-of-dated global state of consumer after finishing __xsk_generic_xmit(). The patch also adds a common helper __xsk_tx_release() to keep align with the zc mode usage in xsk_tx_release(). Signed-off-by: Jason Xing <kernelxing@tencent.com> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250703141712.33190-2-kerneljasonxing@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-08 18:28:07 -07:00
Kuniyuki Iwashima	df30285b36	af_unix: Introduce SO_INQ. We have an application that uses almost the same code for TCP and AF_UNIX (SOCK_STREAM). TCP can use TCP_INQ, but AF_UNIX doesn't have it and requires an extra syscall, ioctl(SIOCINQ) or getsockopt(SO_MEMINFO) as an alternative. Let's introduce the generic version of TCP_INQ. If SO_INQ is enabled, recvmsg() will put a cmsg of SCM_INQ that contains the exact value of ioctl(SIOCINQ). The cmsg is also included when msg->msg_get_inq is non-zero to make sockets io_uring-friendly. Note that SOCK_CUSTOM_SOCKOPT is flagged only for SOCK_STREAM to override setsockopt() for SOL_SOCKET. By having the flag in struct unix_sock, instead of struct sock, we can later add SO_INQ support for TCP and reuse tcp_sk(sk)->recvmsg_inq. Note also that supporting custom getsockopt() for SOL_SOCKET will need preparation for other SOCK_CUSTOM_SOCKOPT users (UDP, vsock, MPTCP). Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250702223606.1054680-7-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-08 18:05:25 -07:00
Kuniyuki Iwashima	8b77338eb2	af_unix: Cache state->msg in unix_stream_read_generic(). In unix_stream_read_generic(), state->msg is fetched multiple times. Let's cache it in a local variable. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250702223606.1054680-6-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-08 18:05:25 -07:00
Kuniyuki Iwashima	f4e1fb04c1	af_unix: Use cached value for SOCK_STREAM in unix_inq_len(). Compared to TCP, ioctl(SIOCINQ) for AF_UNIX SOCK_STREAM socket is more expensive, as unix_inq_len() requires iterating through the receive queue and accumulating skb->len. Let's cache the value for SOCK_STREAM to a new field during sendmsg() and recvmsg(). The field is protected by the receive queue lock. Note that ioctl(SIOCINQ) for SOCK_DGRAM returns the length of the first skb in the queue. SOCK_SEQPACKET still requires iterating through the queue because we do not touch functions shared with unix_dgram_ops. But, if really needed, we can support it by switching __skb_try_recv_datagram() to a custom version. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250702223606.1054680-5-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-08 18:05:25 -07:00
Kuniyuki Iwashima	d0aac85449	af_unix: Don't use skb_recv_datagram() in unix_stream_read_skb(). unix_stream_read_skb() calls skb_recv_datagram() with MSG_DONTWAIT, which is mostly equivalent to sock_error(sk) + skb_dequeue(). In the following patch, we will add a new field to cache the number of bytes in the receive queue. Then, we want to avoid introducing atomic ops in the fast path, so we will reuse the receive queue lock. As a preparation for the change, let's not use skb_recv_datagram() in unix_stream_read_skb(). Note that sock_error() is now moved out of the u->iolock mutex as the mutex does not synchronise the peer's close() at all. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250702223606.1054680-4-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-08 18:05:25 -07:00
Kuniyuki Iwashima	772f01049c	af_unix: Don't check SOCK_DEAD in unix_stream_read_skb(). unix_stream_read_skb() checks SOCK_DEAD only when the dequeued skb is OOB skb. unix_stream_read_skb() is called for a SOCK_STREAM socket in SOCKMAP when data is sent to it. The function is invoked via sk_psock_verdict_data_ready(), which is set to sk->sk_data_ready(). During sendmsg(), we check if the receiver has SOCK_DEAD, so there is no point in checking it again later in ->read_skb(). Also, unix_read_skb() for SOCK_DGRAM does not have the test either. Let's remove the SOCK_DEAD test in unix_stream_read_skb(). Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250702223606.1054680-3-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-08 18:05:25 -07:00
Kuniyuki Iwashima	b429a5ad19	af_unix: Don't hold unix_state_lock() in __unix_dgram_recvmsg(). When __skb_try_recv_datagram() returns NULL in __unix_dgram_recvmsg(), we hold unix_state_lock() unconditionally. This is because SOCK_SEQPACKET sk needs to return EOF in case its peer has been close()d concurrently. This behaviour totally depends on the timing of the peer's close() and reading sk->sk_shutdown, and taking the lock does not play a role. Let's drop the lock from __unix_dgram_recvmsg() and use READ_ONCE(). Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250702223606.1054680-2-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-08 18:05:25 -07:00
David Howells	31ec70afaa	rxrpc: Fix over large frame size warning Under some circumstances, the compiler will emit the following warning for rxrpc_send_response(): net/rxrpc/output.c: In function 'rxrpc_send_response': net/rxrpc/output.c:974:1: warning: the frame size of 1160 bytes is larger than 1024 bytes This occurs because the local variables include a 16-element scatterlist array and a 16-element bio_vec array. It's probably not actually a problem as this function is only called by the rxrpc I/O thread function in a kernel thread and there won't be much on the stack before it. Fix this by overlaying the bio_vec array over the kvec array in the rxrpc_local struct. There is one of these per I/O thread and the kvec array is intended for pointing at bits of a packet to be transmitted, typically a DATA or an ACK packet. As packets for a local endpoint are only transmitted by its specific I/O thread, there can be no race, and so overlaying this bit of memory should be no problem. Fixes: `5800b1cf3f` ("rxrpc: Allow CHALLENGEs to the passed to the app for a RESPONSE") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202506240423.E942yKJP-lkp@intel.com/ Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20250707102435.2381045-2-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-08 13:03:52 -07:00
Jakub Kicinski	cd7e8841b6	net: ethtool: reduce indent for _rxfh_context ops Now that we don't have the compat code we can reduce the indent a little. No functional changes. Reviewed-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://patch.msgid.link/20250707184115.2285277-6-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-08 11:56:40 -07:00
Jakub Kicinski	4e655028c2	net: ethtool: remove the compat code for _rxfh_context ops All drivers are now converted to dedicated _rxfh_context ops. Remove the use of >set_rxfh() to manage additional contexts. Reviewed-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://patch.msgid.link/20250707184115.2285277-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-08 11:56:40 -07:00
Xin Guo	19c066f940	tcp: update the outdated ref draft-ietf-tcpm-rack As RACK-TLP was published as a standards-track RFC8985, so the outdated ref draft-ietf-tcpm-rack need to be updated. Signed-off-by: Xin Guo <guoxin0309@gmail.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Link: https://patch.msgid.link/20250705163647.301231-1-guoxin0309@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-08 09:01:52 -07:00
Fengyuan Gong	a41851bea7	net: account for encap headers in qdisc pkt len Refine qdisc_pkt_len_init to include headers up through the inner transport header when computing header size for encapsulations. Also refine net/sched/sch_cake.c borrowed from qdisc_pkt_len_init(). Signed-off-by: Fengyuan Gong <gfengyuan@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://patch.msgid.link/20250702160741.1204919-1-gfengyuan@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-08 08:55:33 -07:00
Faisal Bukhari	effdbb29fd	netlink: spelling: fix appened -> appended in a comment Fix spelling mistake in net/netlink/af_netlink.c appened -> appended Signed-off-by: Faisal Bukhari <faisalbukhari523@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250705030841.353424-1-faisalbukhari523@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-08 08:42:39 -07:00
Michal Luczaj	1e7d9df379	vsock: Fix IOCTL_VM_SOCKETS_GET_LOCAL_CID to check also `transport_local` Support returning VMADDR_CID_LOCAL in case no other vsock transport is available. Fixes: `0e12190578` ("vsock: add local transport support in the vsock core") Suggested-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Michal Luczaj <mhal@rbox.co> Link: https://patch.msgid.link/20250703-vsock-transports-toctou-v4-3-98f0eb530747@rbox.co Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-08 08:39:49 -07:00
Michal Luczaj	687aa0c558	vsock: Fix transport_* TOCTOU Transport assignment may race with module unload. Protect new_transport from becoming a stale pointer. This also takes care of an insecure call in vsock_use_local_transport(); add a lockdep assert. BUG: unable to handle page fault for address: fffffbfff8056000 Oops: Oops: 0000 [#1] SMP KASAN RIP: 0010:vsock_assign_transport+0x366/0x600 Call Trace: vsock_connect+0x59c/0xc40 __sys_connect+0xe8/0x100 __x64_sys_connect+0x6e/0xc0 do_syscall_64+0x92/0x1c0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 Fixes: `c0cfa2d8a7` ("vsock: add multi-transports support") Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Michal Luczaj <mhal@rbox.co> Link: https://patch.msgid.link/20250703-vsock-transports-toctou-v4-2-98f0eb530747@rbox.co Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-08 08:39:49 -07:00
Michal Luczaj	209fd72083	vsock: Fix transport_{g2h,h2g} TOCTOU vsock_find_cid() and vsock_dev_do_ioctl() may race with module unload. transport_{g2h,h2g} may become NULL after the NULL check. Introduce vsock_transport_local_cid() to protect from a potential null-ptr-deref. KASAN: null-ptr-deref in range [0x0000000000000118-0x000000000000011f] RIP: 0010:vsock_find_cid+0x47/0x90 Call Trace: __vsock_bind+0x4b2/0x720 vsock_bind+0x90/0xe0 __sys_bind+0x14d/0x1e0 __x64_sys_bind+0x6e/0xc0 do_syscall_64+0x92/0x1c0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 KASAN: null-ptr-deref in range [0x0000000000000118-0x000000000000011f] RIP: 0010:vsock_dev_do_ioctl.isra.0+0x58/0xf0 Call Trace: __x64_sys_ioctl+0x12d/0x190 do_syscall_64+0x92/0x1c0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 Fixes: `c0cfa2d8a7` ("vsock: add multi-transports support") Suggested-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Michal Luczaj <mhal@rbox.co> Link: https://patch.msgid.link/20250703-vsock-transports-toctou-v4-1-98f0eb530747@rbox.co Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-08 08:39:49 -07:00
Michal Luczaj	ab34e14258	net: skbuff: Drop unused @skb Since its introduction in commit `6fa01ccd88` ("skbuff: Add pskb_extract() helper function"), pskb_carve_frag_list() never used the argument @skb. Drop it and adapt the only caller. No functional change intended. Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Michal Luczaj <mhal@rbox.co> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-08 08:37:22 -07:00
Michal Luczaj	ad0ac6cd9c	net: skbuff: Drop unused @skb Since its introduction in commit `ce098da149` ("skbuff: Introduce slab_build_skb()"), __slab_build_skb() never used the @skb argument. Remove it and adapt both callers. No functional change intended. Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Michal Luczaj <mhal@rbox.co> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-08 08:37:22 -07:00
Michal Luczaj	25489a4f55	net: splice: Drop unused @gfp Since its introduction in commit `2e910b9532` ("net: Add a function to splice pages into an skbuff for MSG_SPLICE_PAGES"), skb_splice_from_iter() never used the @gfp argument. Remove it and adapt callers. No functional change intended. Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Michal Luczaj <mhal@rbox.co> Link: https://patch.msgid.link/20250702-splice-drop-unused-v3-2-55f68b60d2b7@rbox.co Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-08 08:37:15 -07:00
Michal Luczaj	1024f12071	net: splice: Drop unused @pipe Since commit `41c73a0d44` ("net: speedup skb_splice_bits()"), __splice_segment() and spd_fill_page() do not use the @pipe argument. Drop it. While adapting the callers, move one line to enforce reverse xmas tree order. No functional change intended. Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Michal Luczaj <mhal@rbox.co> Link: https://patch.msgid.link/20250702-splice-drop-unused-v3-1-55f68b60d2b7@rbox.co Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-08 08:37:14 -07:00
Jiayuan Chen	d3a5f2871a	tcp: Correct signedness in skb remaining space calculation Syzkaller reported a bug [1] where sk->sk_forward_alloc can overflow. When we send data, if an skb exists at the tail of the write queue, the kernel will attempt to append the new data to that skb. However, the code that checks for available space in the skb is flawed: ''' copy = size_goal - skb->len ''' The types of the variables involved are: ''' copy: ssize_t (s64 on 64-bit systems) size_goal: int skb->len: unsigned int ''' Due to C's type promotion rules, the signed size_goal is converted to an unsigned int to match skb->len before the subtraction. The result is an unsigned int. When this unsigned int result is then assigned to the s64 copy variable, it is zero-extended, preserving its non-negative value. Consequently, copy is always >= 0. Assume we are sending 2GB of data and size_goal has been adjusted to a value smaller than skb->len. The subtraction will result in copy holding a very large positive integer. In the subsequent logic, this large value is used to update sk->sk_forward_alloc, which can easily cause it to overflow. The syzkaller reproducer uses TCP_REPAIR to reliably create this condition. However, this can also occur in real-world scenarios. The tcp_bound_to_half_wnd() function can also reduce size_goal to a small value. This would cause the subsequent tcp_wmem_schedule() to set sk->sk_forward_alloc to a value close to INT_MAX. Further memory allocation requests would then cause sk_forward_alloc to wrap around and become negative. [1]: https://syzkaller.appspot.com/bug?extid=de6565462ab540f50e47 Reported-by: syzbot+de6565462ab540f50e47@syzkaller.appspotmail.com Fixes: `270a1c3de4` ("tcp: Support MSG_SPLICE_PAGES") Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Howells <dhowells@redhat.com> Link: https://patch.msgid.link/20250707054112.101081-1-jiayuan.chen@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-08 07:56:26 -07:00
Hannes Reinecke	e22da46850	net/handshake: Add new parameter 'HANDSHAKE_A_ACCEPT_KEYRING' Add a new netlink parameter 'HANDSHAKE_A_ACCEPT_KEYRING' to provide the serial number of the keyring to use. Signed-off-by: Hannes Reinecke <hare@kernel.org> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20250701144657.104401-1-hare@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-07-08 15:31:44 +02:00
Wang Liang	5d288658ee	net: replace ADDRLABEL with dynamic debug ADDRLABEL only works when it was set in compilation phase. Replace it with net_dbg_ratelimited(). Signed-off-by: Wang Liang <wangliang74@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250702104417.1526138-1-wangliang74@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-07-08 15:04:05 +02:00
Sabrina Dubroca	2a198bbec6	Revert "xfrm: destroy xfrm_state synchronously on net exit path" This reverts commit `f75a2804da`. With all states (whether user or kern) removed from the hashtables during deletion, there's no need for synchronous destruction of states. xfrm6_tunnel states still need to have been destroyed (which will be the case when its last user is deleted (not destroyed)) so that xfrm6_tunnel_free_spi removes it from the per-netns hashtable before the netns is destroyed. This has the benefit of skipping one synchronize_rcu per state (in __xfrm_state_destroy(sync=true)) when we exit a netns. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2025-07-08 13:28:29 +02:00
Sabrina Dubroca	b441cf3f8c	xfrm: delete x->tunnel as we delete x The ipcomp fallback tunnels currently get deleted (from the various lists and hashtables) as the last user state that needed that fallback is destroyed (not deleted). If a reference to that user state still exists, the fallback state will remain on the hashtables/lists, triggering the WARN in xfrm_state_fini. Because of those remaining references, the fix in commit `f75a2804da` ("xfrm: destroy xfrm_state synchronously on net exit path") is not complete. We recently fixed one such situation in TCP due to defered freeing of skbs (commit `9b6412e697` ("tcp: drop secpath at the same time as we currently drop dst")). This can also happen due to IP reassembly: skbs with a secpath remain on the reassembly queue until netns destruction. If we can't guarantee that the queues are flushed by the time xfrm_state_fini runs, there may still be references to a (user) xfrm_state, preventing the timely deletion of the corresponding fallback state. Instead of chasing each instance of skbs holding a secpath one by one, this patch fixes the issue directly within xfrm, by deleting the fallback state as soon as the last user state depending on it has been deleted. Destruction will still happen when the final reference is dropped. A separate lockdep class for the fallback state is required since we're going to lock x->tunnel while x is locked. Fixes: `9d4139c769` ("netns xfrm: per-netns xfrm_state_all list") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2025-07-08 13:28:27 +02:00
Eric Dumazet	84a7d6797e	net/sched: acp_api: no longer acquire RTNL in tc_action_net_exit() tc_action_net_exit() got an rtnl exclusion in commit `a159d3c4b8` ("net_sched: acquire RTNL in tc_action_net_exit()") Since then, commit `16af606739` ("net: sched: implement reference counted action release") made this RTNL exclusion obsolete for most cases. Only tcf_action_offload_del() might still require it. Move the rtnl locking into tcf_idrinfo_destroy() when an offload action is found. Most netns do not have actions, yet deleting them is adding a lot of pressure on RTNL, which is for many the most contended mutex in the kernel. We are moving to a per-netns 'rtnl', so tc_action_net_exit() will not be able to grab 'rtnl' a single time for a batch of netns. Before the patch: perf probe -a rtnl_lock perf record -e probe:rtnl_lock -a /bin/bash -c 'unshare -n "/bin/true"; sleep 1' [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.305 MB perf.data (25 samples) ] After the patch: perf record -e probe:rtnl_lock -a /bin/bash -c 'unshare -n "/bin/true"; sleep 1' [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.304 MB perf.data (9 samples) ] Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Vlad Buslov <vladbu@nvidia.com> Cc: Jiri Pirko <jiri@resnulli.us> Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Link: https://patch.msgid.link/20250702071230.1892674-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-07-08 13:00:24 +02:00
Jeremy Kerr	48e1736e5d	net: mctp: test: Add tests for gateway routes Add a few kunit tests for the gateway routing. Because we have multiple route types now (direct and gateway), rename mctp_test_create_route to mctp_test_create_route_direct, and add a _gateway variant too. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Link: https://patch.msgid.link/20250702-dev-forwarding-v5-14-1468191da8a4@codeconstruct.com.au Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-07-08 12:39:24 +02:00
Jeremy Kerr	ad39c12fce	net: mctp: add gateway routing support This change allows for gateway routing, where a route table entry may reference a routable endpoint (by network and EID), instead of routing directly to a netdevice. We add support for a RTM_GATEWAY attribute for netlink route updates, with an attribute format of: struct mctp_fq_addr { unsigned int net; mctp_eid_t eid; } - we need the net here to uniquely identify the target EID, as we no longer have the device reference directly (which would provide the net id in the case of direct routes). This makes route lookups recursive, as a route lookup that returns a gateway route must be resolved into a direct route (ie, to a device) eventually. We provide a limit to the route lookups, to prevent infinite loop routing. The route lookup populates a new 'nexthop' field in the dst structure, which now specifies the key for the neighbour table lookup on device output, rather than using the packet destination address directly. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Link: https://patch.msgid.link/20250702-dev-forwarding-v5-13-1468191da8a4@codeconstruct.com.au Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-07-08 12:39:24 +02:00
Jeremy Kerr	28ddbb2abe	net: mctp: allow NL parsing directly into a struct mctp_route The netlink route parsing functions end up setting a bunch of output variables from the rt attributes. This will get messy when the routes become more complex. So, split the rt parsing into two types: a lookup (returning route target data suitable for a route lookup, like when deleting a route) and a populate (setting fields of a struct mctp_route). In doing this, we need to separate the route allocation from mctp_route_add, so add some comments on the lifetime semantics for the latter. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Link: https://patch.msgid.link/20250702-dev-forwarding-v5-12-1468191da8a4@codeconstruct.com.au Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-07-08 12:39:24 +02:00
Jeremy Kerr	4a1de053d7	net: mctp: remove routes by netid, not by device In upcoming changes, a route may not have a device associated. Since the route is matched on the (network, eid) tuple, pass the netid itself into mctp_route_remove. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Link: https://patch.msgid.link/20250702-dev-forwarding-v5-11-1468191da8a4@codeconstruct.com.au Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-07-08 12:39:24 +02:00
Jeremy Kerr	48e6aa60bf	net: mctp: pass net into route creation We may not have a mdev pointer, from which we currently extract the net. Instead, pass the net directly. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Link: https://patch.msgid.link/20250702-dev-forwarding-v5-10-1468191da8a4@codeconstruct.com.au Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-07-08 12:39:24 +02:00
Jeremy Kerr	9b4a8c38f4	net: mctp: test: Add initial socket tests Recent changes have modified the extaddr path a little, so add a couple of kunit tests to af-mctp.c. These check that we're correctly passing lladdr data between sendmsg/recvmsg and the routing layer. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Link: https://patch.msgid.link/20250702-dev-forwarding-v5-9-1468191da8a4@codeconstruct.com.au Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-07-08 12:39:24 +02:00
Jeremy Kerr	19396179a0	net: mctp: test: add sock test infrastructure Add a new test object, for use with the af_mctp socket code. This is intially empty, but we'll start populating actual tests in an upcoming change. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Link: https://patch.msgid.link/20250702-dev-forwarding-v5-8-1468191da8a4@codeconstruct.com.au Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-07-08 12:39:23 +02:00
Jeremy Kerr	80bcf05e54	net: mctp: test: move functions into utils.[ch] A future change will add another mctp test .c file, so move some of the common test setup from route.c into the utils object. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Link: https://patch.msgid.link/20250702-dev-forwarding-v5-7-1468191da8a4@codeconstruct.com.au Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-07-08 12:39:23 +02:00
Jeremy Kerr	46ee16462f	net: mctp: test: Add extaddr routing output test Test that the routing code preserves the haddr data in a skb through an input route operation. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Link: https://patch.msgid.link/20250702-dev-forwarding-v5-6-1468191da8a4@codeconstruct.com.au Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-07-08 12:39:23 +02:00
Jeremy Kerr	96b341a8e7	net: mctp: test: Add an addressed device constructor Upcoming tests will check semantics of hardware addressing, which require a dev with ->addr_len != 0. Add a constructor to create a MCTP interface using a physically-addressed bus type. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Link: https://patch.msgid.link/20250702-dev-forwarding-v5-5-1468191da8a4@codeconstruct.com.au Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-07-08 12:39:23 +02:00
Jeremy Kerr	3007f90ec0	net: mctp: separate cb from direct-addressing routing Now that we have the dst->haddr populated by sendmsg (when extended addressing is in use), we no longer need to stash the link-layer address in the skb->cb. Instead, only use skb->cb for incoming lladdr data. While we're at it: remove cb->src, as was never used. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Link: https://patch.msgid.link/20250702-dev-forwarding-v5-4-1468191da8a4@codeconstruct.com.au Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-07-08 12:39:23 +02:00
Jeremy Kerr	269936db5e	net: mctp: separate routing database from routing operations This change adds a struct mctp_dst, representing the result of a routing lookup. This decouples the struct mctp_route from the actual implementation of a routing operation. This will allow for future routing changes which may require more involved lookup logic, such as gateway routing - which may require multiple traversals of the routing table. Since we only use the struct mctp_route at lookup time, we no longer hold routes over a routing operation, as we only need it to populate the dst. However, we do hold the dev while the dst is active. This requires some changes to the route test infrastructure, as we no longer have a mock route to handle the route output operation, and transient dsts are created by the routing code, so we can't override them as easily. Instead, we use kunit->priv to stash a packet queue, and a custom dst_output function queues into that packet queue, which we can use for later expectations. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Link: https://patch.msgid.link/20250702-dev-forwarding-v5-3-1468191da8a4@codeconstruct.com.au Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-07-08 12:39:23 +02:00
Jeremy Kerr	fc2b87d036	net: mctp: test: make cloned_frag buffers more appropriately-sized In our input_cloned_frag test, we currently allocate our test buffers arbitrarily-sized at 100 bytes. We only expect to receive a max of 15 bytes from the socket, so reduce to a more appropriate size. There are some upcoming changes to the routing code which hit a frame-size limit on s390, so reduce the usage before that lands. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Link: https://patch.msgid.link/20250702-dev-forwarding-v5-2-1468191da8a4@codeconstruct.com.au Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-07-08 12:39:23 +02:00
Jeremy Kerr	e0f3c79cc0	net: mctp: don't use source cb data when forwarding, ensure pkt_type is set In the output path, only check the skb->cb data when we know it's from a local socket; input packets will have source address information there instead. In order to detect when we're forwarding, set skb->pkt_type on input/output. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Link: https://patch.msgid.link/20250702-dev-forwarding-v5-1-1468191da8a4@codeconstruct.com.au Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-07-08 12:39:23 +02:00
Hari Chandrakanthan	cc2b722132	wifi: mac80211: fix rx link assignment for non-MLO stations Currently, ieee80211_rx_data_set_sta() does not correctly handle the case where the interface supports multiple links (MLO), but the station does not (non-MLO). This can lead to incorrect link assignment or unexpected warnings when accessing link information. Hence, add a fix to check if the station lacks valid link support and use its default link ID for rx->link assignment. If the station unexpectedly has valid links, fall back to the default link. This ensures correct link association and prevents potential issues in mixed MLO/non-MLO environments. Signed-off-by: Hari Chandrakanthan <quic_haric@quicinc.com> Signed-off-by: Sarika Sharma <quic_sarishar@quicinc.com> Link: https://patch.msgid.link/20250630084119.3583593-1-quic_sarishar@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-07-08 11:11:06 +02:00
Greg Kroah-Hartman	f440a12d26	wifi: cfg80211: move away from using a fake platform device Downloading regulatory "firmware" needs a device to hang off of, and so a platform device seemed like the simplest way to do this. Now that we have a faux device interface, use that instead as this "regulatory device" is not anything resembling a platform device at all. Cc: Johannes Berg <johannes@sipsolutions.net> Cc: <linux-wireless@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://patch.msgid.link/2025070116-growing-skeptic-494c@gregkh Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-07-08 10:22:51 +02:00
Jakub Kicinski	80852774ba	bluetooth pull request for net: - hci_sync: Fix not disabling advertising instance - hci_core: Remove check of BDADDR_ANY in hci_conn_hash_lookup_big_state - hci_sync: Fix attempting to send HCI_Disconnect to BIS handle - hci_event: Fix not marking Broadcast Sink BIS as connected -----BEGIN PGP SIGNATURE----- iQJNBAABCgA3FiEE7E6oRXp8w05ovYr/9JCA4xAyCykFAmhmpiMZHGx1aXoudm9u LmRlbnR6QGludGVsLmNvbQAKCRD0kIDjEDILKTZOD/9ggH1wJ52hD6NeRow2dZHt BAppV0qNg7p9PkmYPDTOKNkdASmp5VfdLyCXH0vBUjC7+iutrkA/oq5UmIzAM7Uz 1HSELyvsLBL3Gj2AxWyElBtuwo3F7cA70veesZKDJJ92VHlCMMXPJugMCrcy0iJn sVYENIJKtMnLJMTnnpmBCpRQszQfHGYnzQerbeUlCPoFgn7vsRy+onZ9XiIj6R2K HGZVk6MhUnNWrXthwcgwAZ0SRq+5nhxhB4jOSSZ6nNnT9Wt8oloDM7KtYSy976QA Ub2LzvR2E8/XtnTeWus5oLL2kAvz1vClS6gpGrvziElnIryq8aYhwKOO7yhRFmDK kllpOIPaiDrl1H1nVR4z7t/IK5z/0A0wkTcXXm4WNZkLOj8YmY+BdjPKO39PRxK4 PpcHzsmI6PQjnLeGRi3a4PYfMgVYRvdO/OKSO9/xphqgE+dQdonH7/Dsvw+QkEw8 TgFvfKj8bMNjd11Mynd46P45tnQ3HIrTSYGTbdvoU/n8ZFEjATuHtIxRscw0bRmj iy4u55JV+U3BAdm2CsRouiiTHnvXt9S2HMyMnemwwxsGpim/JG+IsccYUZd8mvLf d+N2LWsq71CpVhRSJ50WJDrCEJdKT71KwHbSYLJl8vGhObV3VpTBZ57eVF98SNjm Zqg8JqLJFvCLOhl4RWj/xA== =o7T9 -----END PGP SIGNATURE----- Merge tag 'for-net-2025-07-03' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - hci_sync: Fix not disabling advertising instance - hci_core: Remove check of BDADDR_ANY in hci_conn_hash_lookup_big_state - hci_sync: Fix attempting to send HCI_Disconnect to BIS handle - hci_event: Fix not marking Broadcast Sink BIS as connected * tag 'for-net-2025-07-03' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: hci_event: Fix not marking Broadcast Sink BIS as connected Bluetooth: hci_sync: Fix attempting to send HCI_Disconnect to BIS handle Bluetooth: hci_core: Remove check of BDADDR_ANY in hci_conn_hash_lookup_big_state Bluetooth: hci_sync: Fix not disabling advertising instance ==================== Link: https://patch.msgid.link/20250703160409.1791514-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-07 19:01:29 -07:00
Breno Leitao	eb4e773f13	netpoll: move Ethernet setup to push_eth() helper Refactor Ethernet header population into dedicated function, completing the layered abstraction with: - push_eth() for link layer - push_udp() for transport - push_ipv4()/push_ipv6() for network Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250702-netpoll_untagle_ip-v2-6-13cf3db24e2b@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-07 18:52:56 -07:00
Breno Leitao	cacfb1f4e9	netpoll: factor out UDP header setup into push_udp() helper Move UDP header construction from netpoll_send_udp() into a new static helper function push_udp(). This completes the protocol layer refactoring by: 1. Creating a dedicated helper for UDP header assembly 2. Removing UDP-specific logic from the main send function 3. Establishing a consistent pattern with existing IPv4/IPv6 helpers: - push_udp() - push_ipv4() - push_ipv6() The change improves code organization and maintains the encapsulation pattern established in previous refactorings. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250702-netpoll_untagle_ip-v2-5-13cf3db24e2b@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-07 18:52:56 -07:00
Breno Leitao	8c27639dbe	netpoll: factor out IPv4 header setup into push_ipv4() helper Move IPv4 header construction from netpoll_send_udp() into a new static helper function push_ipv4(). This completes the refactoring started with IPv6 header handling, creating symmetric helper functions for both IP versions. Changes include: 1. Extracting IPv4 header setup logic into push_ipv4() 2. Replacing inline IPv4 code with helper call 3. Moving eth assignment after helper calls for consistency The refactoring reduces code duplication and improves maintainability by isolating IP version-specific logic. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250702-netpoll_untagle_ip-v2-4-13cf3db24e2b@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-07 18:52:56 -07:00
Breno Leitao	839388f39a	netpoll: factor out IPv6 header setup into push_ipv6() helper Move IPv6 header construction from netpoll_send_udp() into a new static helper function, push_ipv6(). This refactoring reduces code duplication and improves readability in netpoll_send_udp(). Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250702-netpoll_untagle_ip-v2-3-13cf3db24e2b@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-07 18:52:55 -07:00
Breno Leitao	01dae7a61c	netpoll: factor out UDP checksum calculation into helper Extract UDP checksum calculation logic from netpoll_send_udp() into a new static helper function netpoll_udp_checksum(). This reduces code duplication and improves readability for both IPv4 and IPv6 cases. No functional change intended. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250702-netpoll_untagle_ip-v2-2-13cf3db24e2b@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-07 18:52:55 -07:00
Breno Leitao	4b52cdfcce	netpoll: Improve code clarity with explicit struct size calculations Replace pointer-dereference sizeof() operations with explicit struct names for improved readability and maintainability. This change: 1. Replaces `sizeof(udph)` with `sizeof(struct udphdr)` 2. Replaces `sizeof(ip6h)` with `sizeof(struct ipv6hdr)` 3. Replaces `sizeof(*iph)` with `sizeof(struct iphdr)` This will make it easy to move code in the upcoming patches. No functional changes are introduced by this patch. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250702-netpoll_untagle_ip-v2-1-13cf3db24e2b@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-07 18:52:55 -07:00
Eric Dumazet	6058099da5	net: remove RTNL use for /proc/sys/net/core/rps_default_mask Use a dedicated mutex instead. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250702061558.1585870-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-07 18:42:12 -07:00
Byungchul Park	b56ce86846	page_pool: rename __page_pool_alloc_pages_slow() to __page_pool_alloc_netmems_slow() Now that __page_pool_alloc_pages_slow() is for allocating netmem, not struct page, rename it to __page_pool_alloc_netmems_slow() to reflect what it does. Signed-off-by: Byungchul Park <byungchul@sk.com> Reviewed-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://patch.msgid.link/20250702053256.4594-4-byungchul@sk.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-07 18:40:09 -07:00
Byungchul Park	4ad125ae38	page_pool: rename __page_pool_release_page_dma() to __page_pool_release_netmem_dma() Now that __page_pool_release_page_dma() is for releasing netmem, not struct page, rename it to __page_pool_release_netmem_dma() to reflect what it does. Signed-off-by: Byungchul Park <byungchul@sk.com> Reviewed-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Link: https://patch.msgid.link/20250702053256.4594-3-byungchul@sk.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-07 18:40:09 -07:00
Byungchul Park	61a3324753	page_pool: rename page_pool_return_page() to page_pool_return_netmem() Now that page_pool_return_page() is for returning netmem, not struct page, rename it to page_pool_return_netmem() to reflect what it does. Signed-off-by: Byungchul Park <byungchul@sk.com> Reviewed-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Link: https://patch.msgid.link/20250702053256.4594-2-byungchul@sk.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-07 18:40:08 -07:00
Kuniyuki Iwashima	667eeab499	tipc: Fix use-after-free in tipc_conn_close(). syzbot reported a null-ptr-deref in tipc_conn_close() during netns dismantle. [0] tipc_topsrv_stop() iterates tipc_net(net)->topsrv->conn_idr and calls tipc_conn_close() for each tipc_conn. The problem is that tipc_conn_close() is called after releasing the IDR lock. At the same time, there might be tipc_conn_recv_work() running and it could call tipc_conn_close() for the same tipc_conn and release its last ->kref. Once we release the IDR lock in tipc_topsrv_stop(), there is no guarantee that the tipc_conn is alive. Let's hold the ref before releasing the lock and put the ref after tipc_conn_close() in tipc_topsrv_stop(). [0]: BUG: KASAN: use-after-free in tipc_conn_close+0x122/0x140 net/tipc/topsrv.c:165 Read of size 8 at addr ffff888099305a08 by task kworker/u4:3/435 CPU: 0 PID: 435 Comm: kworker/u4:3 Not tainted 4.19.204-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: netns cleanup_net Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1fc/0x2ef lib/dump_stack.c:118 print_address_description.cold+0x54/0x219 mm/kasan/report.c:256 kasan_report_error.cold+0x8a/0x1b9 mm/kasan/report.c:354 kasan_report mm/kasan/report.c:412 [inline] __asan_report_load8_noabort+0x88/0x90 mm/kasan/report.c:433 tipc_conn_close+0x122/0x140 net/tipc/topsrv.c:165 tipc_topsrv_stop net/tipc/topsrv.c:701 [inline] tipc_topsrv_exit_net+0x27b/0x5c0 net/tipc/topsrv.c:722 ops_exit_list+0xa5/0x150 net/core/net_namespace.c:153 cleanup_net+0x3b4/0x8b0 net/core/net_namespace.c:553 process_one_work+0x864/0x1570 kernel/workqueue.c:2153 worker_thread+0x64c/0x1130 kernel/workqueue.c:2296 kthread+0x33f/0x460 kernel/kthread.c:259 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:415 Allocated by task 23: kmem_cache_alloc_trace+0x12f/0x380 mm/slab.c:3625 kmalloc include/linux/slab.h:515 [inline] kzalloc include/linux/slab.h:709 [inline] tipc_conn_alloc+0x43/0x4f0 net/tipc/topsrv.c:192 tipc_topsrv_accept+0x1b5/0x280 net/tipc/topsrv.c:470 process_one_work+0x864/0x1570 kernel/workqueue.c:2153 worker_thread+0x64c/0x1130 kernel/workqueue.c:2296 kthread+0x33f/0x460 kernel/kthread.c:259 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:415 Freed by task 23: __cache_free mm/slab.c:3503 [inline] kfree+0xcc/0x210 mm/slab.c:3822 tipc_conn_kref_release net/tipc/topsrv.c:150 [inline] kref_put include/linux/kref.h:70 [inline] conn_put+0x2cd/0x3a0 net/tipc/topsrv.c:155 process_one_work+0x864/0x1570 kernel/workqueue.c:2153 worker_thread+0x64c/0x1130 kernel/workqueue.c:2296 kthread+0x33f/0x460 kernel/kthread.c:259 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:415 The buggy address belongs to the object at ffff888099305a00 which belongs to the cache kmalloc-512 of size 512 The buggy address is located 8 bytes inside of 512-byte region [ffff888099305a00, ffff888099305c00) The buggy address belongs to the page: page:ffffea000264c140 count:1 mapcount:0 mapping:ffff88813bff0940 index:0x0 flags: 0xfff00000000100(slab) raw: 00fff00000000100 ffffea00028b6b88 ffffea0002cd2b08 ffff88813bff0940 raw: 0000000000000000 ffff888099305000 0000000100000006 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888099305900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888099305980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff888099305a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff888099305a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888099305b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb Fixes: `c5fa7b3cf3` ("tipc: introduce new TIPC server infrastructure") Reported-by: syzbot+d333febcf8f4bc5f6110@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=27169a847a70550d17be Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Tung Nguyen <tung.quang.nguyen@est.tech> Link: https://patch.msgid.link/20250702014350.692213-1-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-07 18:38:24 -07:00
Kuniyuki Iwashima	ae8f160e7e	netlink: Fix wraparounds of sk->sk_rmem_alloc. Netlink has this pattern in some places if (atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf) atomic_add(skb->truesize, &sk->sk_rmem_alloc); , which has the same problem fixed by commit `5a465a0da1` ("udp: Fix multiple wraparounds of sk->sk_rmem_alloc."). For example, if we set INT_MAX to SO_RCVBUFFORCE, the condition is always false as the two operands are of int. Then, a single socket can eat as many skb as possible until OOM happens, and we can see multiple wraparounds of sk->sk_rmem_alloc. Let's fix it by using atomic_add_return() and comparing the two variables as unsigned int. Before: [root@fedora ~]# ss -f netlink Recv-Q Send-Q Local Address:Port Peer Address:Port -1668710080 0 rtnl:nl_wraparound/293 * After: [root@fedora ~]# ss -f netlink Recv-Q Send-Q Local Address:Port Peer Address:Port 2147483072 0 rtnl:nl_wraparound/290 * ^ `--- INT_MAX - 576 Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Reported-by: Jason Baron <jbaron@akamai.com> Closes: https://lore.kernel.org/netdev/cover.1750285100.git.jbaron@akamai.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250704054824.1580222-1-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-07 16:45:35 -07:00
Ilya Maximets	59f44c9ccc	net: openvswitch: allow providing upcall pid for the 'execute' command When a packet enters OVS datapath and there is no flow to handle it, packet goes to userspace through a MISS upcall. With per-CPU upcall dispatch mechanism, we're using the current CPU id to select the Netlink PID on which to send this packet. This allows us to send packets from the same traffic flow through the same handler. The handler will process the packet, install required flow into the kernel and re-inject the original packet via OVS_PACKET_CMD_EXECUTE. While handling OVS_PACKET_CMD_EXECUTE, however, we may hit a recirculation action that will pass the (likely modified) packet through the flow lookup again. And if the flow is not found, the packet will be sent to userspace again through another MISS upcall. However, the handler thread in userspace is likely running on a different CPU core, and the OVS_PACKET_CMD_EXECUTE request is handled in the syscall context of that thread. So, when the time comes to send the packet through another upcall, the per-CPU dispatch will choose a different Netlink PID, and this packet will end up processed by a different handler thread on a different CPU. The process continues as long as there are new recirculations, each time the packet goes to a different handler thread before it is sent out of the OVS datapath to the destination port. In real setups the number of recirculations can go up to 4 or 5, sometimes more. There is always a chance to re-order packets while processing upcalls, because userspace will first install the flow and then re-inject the original packet. So, there is a race window when the flow is already installed and the second packet can match it and be forwarded to the destination before the first packet is re-injected. But the fact that packets are going through multiple upcalls handled by different userspace threads makes the reordering noticeably more likely, because we not only have a race between the kernel and a userspace handler (which is hard to avoid), but also between multiple userspace handlers. For example, let's assume that 10 packets got enqueued through a MISS upcall for handler-1, it will start processing them, will install the flow into the kernel and start re-injecting packets back, from where they will go through another MISS to handler-2. Handler-2 will install the flow into the kernel and start re-injecting the packets, while handler-1 continues to re-inject the last of the 10 packets, they will hit the flow installed by handler-2 and be forwarded without going to the handler-2, while handler-2 still re-injects the first of these 10 packets. Given multiple recirculations and misses, these 10 packets may end up completely mixed up on the output from the datapath. Let's allow userspace to specify on which Netlink PID the packets should be upcalled while processing OVS_PACKET_CMD_EXECUTE. This makes it possible to ensure that all the packets are processed by the same handler thread in the userspace even with them being upcalled multiple times in the process. Packets will remain in order since they will be enqueued to the same socket and re-injected in the same order. This doesn't eliminate re-ordering as stated above, since we still have a race between kernel and the userspace thread, but it allows to eliminate races between multiple userspace threads. Userspace knows the PID of the socket on which the original upcall is received, so there is no need to send it up from the kernel. Solution requires storing the value somewhere for the duration of the packet processing. There are two potential places for this: our skb extension or the per-CPU storage. It's not clear which is better, so just following currently used scheme of storing this kind of things along the skb. We still have a decent amount of space in the cb. Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Aaron Conole <aconole@redhat.com> Link: https://patch.msgid.link/20250702155043.2331772-1-i.maximets@ovn.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-07 14:30:39 -07:00
Mathy Vanhoef	737bb912eb	wifi: prevent A-MSDU attacks in mesh networks This patch is a mitigation to prevent the A-MSDU spoofing vulnerability for mesh networks. The initial update to the IEEE 802.11 standard, in response to the FragAttacks, missed this case (CVE-2025-27558). It can be considered a variant of CVE-2020-24588 but for mesh networks. This patch tries to detect if a standard MSDU was turned into an A-MSDU by an adversary. This is done by parsing a received A-MSDU as a standard MSDU, calculating the length of the Mesh Control header, and seeing if the 6 bytes after this header equal the start of an rfc1042 header. If equal, this is a strong indication of an ongoing attack attempt. This defense was tested with mac80211_hwsim against a mesh network that uses an empty Mesh Address Extension field, i.e., when four addresses are used, and when using a 12-byte Mesh Address Extension field, i.e., when six addresses are used. Functionality of normal MSDUs and A-MSDUs was also tested, and confirmed working, when using both an empty and 12-byte Mesh Address Extension field. It was also tested with mac80211_hwsim that A-MSDU attacks in non-mesh networks keep being detected and prevented. Note that the vulnerability being patched, and the defense being implemented, was also discussed in the following paper and in the following IEEE 802.11 presentation: https://papers.mathyvanhoef.com/wisec2025.pdf https://mentor.ieee.org/802.11/dcn/25/11-25-0949-00-000m-a-msdu-mesh-spoof-protection.docx Cc: stable@vger.kernel.org Signed-off-by: Mathy Vanhoef <Mathy.Vanhoef@kuleuven.be> Link: https://patch.msgid.link/20250616004635.224344-1-Mathy.Vanhoef@kuleuven.be Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-07-07 10:54:13 +02:00
Moon Hee Lee	58fcb1b428	wifi: mac80211: reject VHT opmode for unsupported channel widths VHT operating mode notifications are not defined for channel widths below 20 MHz. In particular, 5 MHz and 10 MHz are not valid under the VHT specification and must be rejected. Without this check, malformed notifications using these widths may reach ieee80211_chan_width_to_rx_bw(), leading to a WARN_ON due to invalid input. This issue was reported by syzbot. Reject these unsupported widths early in sta_link_apply_parameters() when opmode_notif is used. The accepted set includes 20, 40, 80, 160, and 80+80 MHz, which are valid for VHT. While 320 MHz is not defined for VHT, it is allowed to avoid rejecting HE or EHT clients that may still send a VHT opmode notification. Reported-by: syzbot+ededba317ddeca8b3f08@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=ededba317ddeca8b3f08 Fixes: `751e7489c1` ("wifi: mac80211: expose ieee80211_chan_width_to_rx_bw() to drivers") Tested-by: syzbot+ededba317ddeca8b3f08@syzkaller.appspotmail.com Signed-off-by: Moon Hee Lee <moonhee.lee.ca@gmail.com> Link: https://patch.msgid.link/20250703193756.46622-2-moonhee.lee.ca@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-07-07 10:45:21 +02:00
Johannes Berg	e1e6ebf490	wifi: mac80211: fix non-transmitted BSSID profile search When the non-transmitted BSSID profile is found, immediately return from the search to not return the wrong profile_len when the profile is found in a multiple BSSID element that isn't the last one in the frame. Fixes: `5023b14cf4` ("mac80211: support profile split between elements") Reported-by: Michael-CY Lee <michael-cy.lee@mediatek.com> Link: https://patch.msgid.link/20250630154501.f26cd45a0ecd.I28e0525d06e8a99e555707301bca29265cf20dc8@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-07-07 10:42:48 +02:00
Johannes Berg	8af596e8ae	wifi: mac80211: clear frame buffer to never leak stack In disconnect paths paths, local frame buffers are used to build deauthentication frames to send them over the air and as notifications to userspace. Some internal error paths (that, given no other bugs, cannot happen) don't always initialize the buffers before sending them to userspace, so in the presence of other bugs they can leak stack content. Initialize the buffers to avoid the possibility of this happening. Suggested-by: Zhongqiu Han <quic_zhonhan@quicinc.com> Link: https://patch.msgid.link/20250701072213.13004-2-johannes@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-07-07 10:42:36 +02:00
Lachlan Hodges	c5fd399a24	wifi: mac80211: correctly identify S1G short beacon mac80211 identifies a short beacon by the presence of the next TBTT field, however the standard actually doesn't explicitly state that the next TBTT can't be in a long beacon or even that it is required in a short beacon - and as a result this validation does not work for all vendor implementations. The standard explicitly states that an S1G long beacon shall contain the S1G beacon compatibility element as the first element in a beacon transmitted at a TBTT that is not a TSBTT (Target Short Beacon Transmission Time) as per IEEE80211-2024 11.1.3.10.1. This is validated by 9.3.4.3 Table 9-76 which states that the S1G beacon compatibility element is only allowed in the full set and is not allowed in the minimum set of elements permitted for use within short beacons. Correctly identify short beacons by the lack of an S1G beacon compatibility element as the first element in an S1G beacon frame. Fixes: `9eaffe5078` ("cfg80211: convert S1G beacon to scan results") Signed-off-by: Simon Wadsworth <simon@morsemicro.com> Signed-off-by: Lachlan Hodges <lachlan.hodges@morsemicro.com> Link: https://patch.msgid.link/20250701075541.162619-1-lachlan.hodges@morsemicro.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-07-07 10:42:15 +02:00
Eric Biggers	1cf5cdf8d2	libceph: Rename hmac_sha256() to ceph_hmac_sha256() Rename hmac_sha256() to ceph_hmac_sha256(), to avoid a naming conflict with the upcoming hmac_sha256() library function. This code will be able to use the HMAC-SHA256 library, but that's left for a later commit. Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250630160645.3198-2-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>	2025-07-04 10:18:52 -07:00
Alexander Mikhalitsyn	c679d17d3f	af_unix: enable handing out pidfds for reaped tasks in SCM_PIDFD Now everything is ready to pass PIDFD_STALE to pidfd_prepare(). This will allow opening pidfd for reaped tasks. Cc: linux-kernel@vger.kernel.org Cc: netdev@vger.kernel.org Cc: David S. Miller <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Simon Horman <horms@kernel.org> Cc: Willem de Bruijn <willemb@google.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Brauner <brauner@kernel.org> Cc: Kuniyuki Iwashima <kuniyu@google.com> Cc: Lennart Poettering <mzxreary@0pointer.de> Cc: Luca Boccassi <bluca@debian.org> Cc: David Rheinsberg <david@readahead.eu> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Link: https://lore.kernel.org/20250703222314.309967-7-aleksandr.mikhalitsyn@canonical.com Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-07-04 09:32:35 +02:00
Alexander Mikhalitsyn	2775832f71	af_unix: stash pidfs dentry when needed We need to ensure that pidfs dentry is allocated when we meet any struct pid for the first time. This will allows us to open pidfd even after the task it corresponds to is reaped. Basically, we need to identify all places where we fill skb/scm_cookie with struct pid reference for the first time and call pidfs_register_pid(). Tricky thing here is that we have a few places where this happends depending on what userspace is doing: - [__scm_replace_pid()] explicitly sending an SCM_CREDENTIALS message and specified pid in a numeric format - [unix_maybe_add_creds()] enabled SO_PASSCRED/SO_PASSPIDFD but didn't send SCM_CREDENTIALS explicitly - [scm_send()] force_creds is true. Netlink case, we don't need to touch it. Cc: linux-kernel@vger.kernel.org Cc: netdev@vger.kernel.org Cc: David S. Miller <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Simon Horman <horms@kernel.org> Cc: Leon Romanovsky <leon@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Brauner <brauner@kernel.org> Cc: Kuniyuki Iwashima <kuniyu@google.com> Cc: Lennart Poettering <mzxreary@0pointer.de> Cc: Luca Boccassi <bluca@debian.org> Cc: David Rheinsberg <david@readahead.eu> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Link: https://lore.kernel.org/20250703222314.309967-6-aleksandr.mikhalitsyn@canonical.com Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-07-04 09:32:35 +02:00
Alexander Mikhalitsyn	2b9996417e	af_unix/scm: fix whitespace errors Fix whitespace/formatting errors. Cc: linux-kernel@vger.kernel.org Cc: netdev@vger.kernel.org Cc: David S. Miller <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Simon Horman <horms@kernel.org> Cc: Leon Romanovsky <leon@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Brauner <brauner@kernel.org> Cc: Kuniyuki Iwashima <kuniyu@google.com> Cc: Lennart Poettering <mzxreary@0pointer.de> Cc: Luca Boccassi <bluca@debian.org> Cc: David Rheinsberg <david@readahead.eu> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Link: https://lore.kernel.org/20250703222314.309967-5-aleksandr.mikhalitsyn@canonical.com Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-07-04 09:32:35 +02:00
Alexander Mikhalitsyn	30580dc96a	af_unix: introduce and use scm_replace_pid() helper Existing logic in __scm_send() related to filling an struct scm_cookie with a proper struct pid reference is already pretty tricky. Let's simplify it a bit by introducing a new helper. This helper will be extended in one of the next patches. Cc: linux-kernel@vger.kernel.org Cc: netdev@vger.kernel.org Cc: David S. Miller <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Simon Horman <horms@kernel.org> Cc: Willem de Bruijn <willemb@google.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Brauner <brauner@kernel.org> Cc: Kuniyuki Iwashima <kuniyu@google.com> Cc: Lennart Poettering <mzxreary@0pointer.de> Cc: Luca Boccassi <bluca@debian.org> Cc: David Rheinsberg <david@readahead.eu> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Link: https://lore.kernel.org/20250703222314.309967-4-aleksandr.mikhalitsyn@canonical.com Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-07-04 09:32:35 +02:00
Alexander Mikhalitsyn	ee47976264	af_unix: introduce unix_skb_to_scm helper Instead of open-coding let's consolidate this logic in a separate helper. This will simplify further changes. Cc: linux-kernel@vger.kernel.org Cc: netdev@vger.kernel.org Cc: David S. Miller <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Simon Horman <horms@kernel.org> Cc: Leon Romanovsky <leon@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Brauner <brauner@kernel.org> Cc: Kuniyuki Iwashima <kuniyu@google.com> Cc: Lennart Poettering <mzxreary@0pointer.de> Cc: Luca Boccassi <bluca@debian.org> Cc: David Rheinsberg <david@readahead.eu> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Link: https://lore.kernel.org/20250703222314.309967-3-aleksandr.mikhalitsyn@canonical.com Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-07-04 09:32:35 +02:00
Alexander Mikhalitsyn	9bedee7cdf	af_unix: rework unix_maybe_add_creds() to allow sleep As a preparation for the next patches we need to allow sleeping in unix_maybe_add_creds() and also return err. Currently, we can't do that as unix_maybe_add_creds() is being called under unix_state_lock(). There is no need for this, really. So let's move call sites of this helper a bit and do necessary function signature changes. Cc: linux-kernel@vger.kernel.org Cc: netdev@vger.kernel.org Cc: David S. Miller <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Simon Horman <horms@kernel.org> Cc: Leon Romanovsky <leon@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Brauner <brauner@kernel.org> Cc: Kuniyuki Iwashima <kuniyu@google.com> Cc: Lennart Poettering <mzxreary@0pointer.de> Cc: Luca Boccassi <bluca@debian.org> Cc: David Rheinsberg <david@readahead.eu> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Link: https://lore.kernel.org/20250703222314.309967-2-aleksandr.mikhalitsyn@canonical.com Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-07-04 09:32:31 +02:00
Jianbo Liu	95cfe23285	xfrm: Skip redundant statistics update for crypto offload In the crypto offload path, every packet is still processed by the software stack. The state's statistics required for the expiration check are being updated in software. However, the code also calls xfrm_dev_state_update_stats(), which triggers a query to the hardware device to fetch statistics. This hardware query is redundant and introduces unnecessary performance overhead. Skip this call when it's crypto offload (not packet offload) to avoid the unnecessary hardware access, thereby improving performance. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2025-07-04 09:30:22 +02:00
Eyal Birger	a90b2a1aaa	xfrm: interface: fix use-after-free after changing collect_md xfrm interface collect_md property on xfrm interfaces can only be set on device creation, thus xfrmi_changelink() should fail when called on such interfaces. The check to enforce this was done only in the case where the xi was returned from xfrmi_locate() which doesn't look for the collect_md interface, and thus the validation was never reached. Calling changelink would thus errornously place the special interface xi in the xfrmi_net->xfrmi hash, but since it also exists in the xfrmi_net->collect_md_xfrmi pointer it would lead to a double free when the net namespace was taken down [1]. Change the check to use the xi from netdev_priv which is available earlier in the function to prevent changes in xfrm collect_md interfaces. [1] resulting oops: [ 8.516540] kernel BUG at net/core/dev.c:12029! [ 8.516552] Oops: invalid opcode: 0000 [#1] SMP NOPTI [ 8.516559] CPU: 0 UID: 0 PID: 12 Comm: kworker/u80:0 Not tainted 6.15.0-virtme #5 PREEMPT(voluntary) [ 8.516565] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 8.516569] Workqueue: netns cleanup_net [ 8.516579] RIP: 0010:unregister_netdevice_many_notify+0x101/0xab0 [ 8.516590] Code: 90 0f 0b 90 48 8b b0 78 01 00 00 48 8b 90 80 01 00 00 48 89 56 08 48 89 32 4c 89 80 78 01 00 00 48 89 b8 80 01 00 00 eb ac 90 <0f> 0b 48 8b 45 00 4c 8d a0 88 fe ff ff 48 39 c5 74 5c 41 80 bc 24 [ 8.516593] RSP: 0018:ffffa93b8006bd30 EFLAGS: 00010206 [ 8.516598] RAX: ffff98fe4226e000 RBX: ffffa93b8006bd58 RCX: ffffa93b8006bc60 [ 8.516601] RDX: 0000000000000004 RSI: 0000000000000000 RDI: dead000000000122 [ 8.516603] RBP: ffffa93b8006bdd8 R08: dead000000000100 R09: ffff98fe4133c100 [ 8.516605] R10: 0000000000000000 R11: 00000000000003d2 R12: ffffa93b8006be00 [ 8.516608] R13: ffffffff96c1a510 R14: ffffffff96c1a510 R15: ffffa93b8006be00 [ 8.516615] FS: 0000000000000000(0000) GS:ffff98fee73b7000(0000) knlGS:0000000000000000 [ 8.516619] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 8.516622] CR2: 00007fcd2abd0700 CR3: 000000003aa40000 CR4: 0000000000752ef0 [ 8.516625] PKRU: 55555554 [ 8.516627] Call Trace: [ 8.516632] <TASK> [ 8.516635] ? rtnl_is_locked+0x15/0x20 [ 8.516641] ? unregister_netdevice_queue+0x29/0xf0 [ 8.516650] ops_undo_list+0x1f2/0x220 [ 8.516659] cleanup_net+0x1ad/0x2e0 [ 8.516664] process_one_work+0x160/0x380 [ 8.516673] worker_thread+0x2aa/0x3c0 [ 8.516679] ? __pfx_worker_thread+0x10/0x10 [ 8.516686] kthread+0xfb/0x200 [ 8.516690] ? __pfx_kthread+0x10/0x10 [ 8.516693] ? __pfx_kthread+0x10/0x10 [ 8.516697] ret_from_fork+0x82/0xf0 [ 8.516705] ? __pfx_kthread+0x10/0x10 [ 8.516709] ret_from_fork_asm+0x1a/0x30 [ 8.516718] </TASK> Fixes: `abc340b38b` ("xfrm: interface: support collect metadata mode") Reported-by: Lonial Con <kongln9170@gmail.com> Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2025-07-04 09:25:25 +02:00
Paolo Abeni	6b9fd8857b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.16-rc5). No conflicts. No adjacent changes. Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-07-04 08:03:18 +02:00
Linus Torvalds	17bbde2e17	Including fixes from Bluetooth. Current release - new code bugs: - eth: txgbe: fix the issue of TX failure - eth: ngbe: specify IRQ vector when the number of VFs is 7 Previous releases - regressions: - sched: always pass notifications when child class becomes empty - ipv4: fix stat increase when udp early demux drops the packet - bluetooth: prevent unintended pause by checking if advertising is active - virtio: fix error reporting in virtqueue_resize - eth: virtio-net: - ensure the received length does not exceed allocated size - fix the xsk frame's length check - eth: lan78xx: fix WARN in __netif_napi_del_locked on disconnect Previous releases - always broken: - bluetooth: mesh: check instances prior disabling advertising - eth: idpf: convert control queue mutex to a spinlock - eth: dpaa2: fix xdp_rxq_info leak - eth: amd-xgbe: align CL37 AN sequence as per databook Signed-off-by: Paolo Abeni <pabeni@redhat.com> -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmhmfzQSHHBhYmVuaUBy ZWRoYXQuY29tAAoJECkkeY3MjxOk/GMP/ixlapKjTP/ggGIFO0nEDTm1tAFnhQl3 bBuwBDoGPjalb46WBO24SFSFYqvZwV6ZIYxCxCeBfmkPyEun0FBX6xjqUIZqohTZ u5ZSmKFkODMoxQWAG0hXBGvfeKg/GBMWJT761o5IB2XvknRlqHq6uufUBcalvlJK t58ykSYp2wjfowXSRQ4jEZnr4HZzVuvarhbCB9hJWv206fdk4LiC07teHB1VhW4w LYmBQChp8SXDFCCYZajum0cNCzx78q90lGzz+MEErVXdXXnRVeqRAUY+k4Vd/Fz+ 0OY1vZJ7xgFpy2ns3Z6TH8D41P9whBI8jUYXZ5nA45J8N5wdEQo8oVHlRe9a6Y/E 0oC+DPahhSQAq8BKGFtYSyyURGJvd4+TpQP/LV4e83myReW8i0ZKtyXVgH0Cibwb 529l6wIXBAcLK03tyYwmoCI2VjJbRoMV3nMCeiACCtDExK1YCa3dhjQ82fa8voLc MIn7zXAGf12IKca39ZapRrdaooaqvSG4htxTn94vEqScNu0wi1cymvG47h9bDrES cPyS4/MIUH0sduSDVL5PpFYfIDhqS3mpc0e8Nc3pOy7VLQ9kvtBX37OaO/tX5aeh SWU+8q8y1Cnq0+mcUUHpENFMOgZEC5UO6rdeaJB3Nu0vlHlDEZoEkUXSkHEfsf2F aodwE/oPyQCg =O7OS -----END PGP SIGNATURE----- Merge tag 'net-6.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from Bluetooth. Current release - new code bugs: - eth: - txgbe: fix the issue of TX failure - ngbe: specify IRQ vector when the number of VFs is 7 Previous releases - regressions: - sched: always pass notifications when child class becomes empty - ipv4: fix stat increase when udp early demux drops the packet - bluetooth: prevent unintended pause by checking if advertising is active - virtio: fix error reporting in virtqueue_resize - eth: - virtio-net: - ensure the received length does not exceed allocated size - fix the xsk frame's length check - lan78xx: fix WARN in __netif_napi_del_locked on disconnect Previous releases - always broken: - bluetooth: mesh: check instances prior disabling advertising - eth: - idpf: convert control queue mutex to a spinlock - dpaa2: fix xdp_rxq_info leak - amd-xgbe: align CL37 AN sequence as per databook" * tag 'net-6.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (38 commits) vsock/vmci: Clear the vmci transport packet properly when initializing it dt-bindings: net: sophgo,sg2044-dwmac: Drop status from the example net: ngbe: specify IRQ vector when the number of VFs is 7 net: wangxun: revert the adjustment of the IRQ vector sequence net: txgbe: request MISC IRQ in ndo_open virtio_net: Enforce minimum TX ring size for reliability virtio_net: Cleanup '2+MAX_SKB_FRAGS' virtio_ring: Fix error reporting in virtqueue_resize virtio-net: xsk: rx: fix the frame's length check virtio-net: use the check_mergeable_len helper virtio-net: remove redundant truesize check with PAGE_SIZE virtio-net: ensure the received length does not exceed allocated size net: ipv4: fix stat increase when udp early demux drops the packet net: libwx: fix the incorrect display of the queue number amd-xgbe: do not double read link status net/sched: Always pass notifications when child class becomes empty nui: Fix dma_mapping_error() check rose: fix dangling neighbour pointers in rose_rt_device_down() enic: fix incorrect MTU comparison in enic_change_mtu() amd-xgbe: align CL37 AN sequence as per databook ...	2025-07-03 09:18:55 -07:00
Luiz Augusto von Dentz	c7349772c2	Bluetooth: hci_event: Fix not marking Broadcast Sink BIS as connected Upon receiving HCI_EVT_LE_BIG_SYNC_ESTABLISHED with status 0x00 (success) the corresponding BIS hci_conn state shall be set to BT_CONNECTED otherwise they will be left with BT_OPEN which is invalid at that point, also create the debugfs and sysfs entries following the same logic as the likes of Broadcast Source BIS and CIS connections. Fixes: `f777d88278` ("Bluetooth: ISO: Notify user space about failed bis connections") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2025-07-03 11:37:43 -04:00
Luiz Augusto von Dentz	314d30b150	Bluetooth: hci_sync: Fix attempting to send HCI_Disconnect to BIS handle BIS/PA connections do have their own cleanup proceedure which are performed by hci_conn_cleanup/bis_cleanup. Fixes: `23205562ff` ("Bluetooth: separate CIS_LINK and BIS_LINK link types") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2025-07-03 11:37:24 -04:00
Luiz Augusto von Dentz	ef9675b0ef	Bluetooth: hci_sync: Fix not disabling advertising instance As the code comments on hci_setup_ext_adv_instance_sync suggests the advertising instance needs to be disabled in order to update its parameters, but it was wrongly checking that !adv->pending. Fixes: `cba6b75871` ("Bluetooth: hci_sync: Make use of hci_cmd_sync_queue set 2") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2025-07-03 11:35:52 -04:00
Yue Haibing	5f712c3877	ipv6: Cleanup fib6_drop_pcpu_from() Since commit `0e23387491` ("ipv6: fix races in ip6_dst_destroy()"), 'table' is unused in __fib6_drop_pcpu_from(), no need pass it from fib6_drop_pcpu_from(). Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250701041235.1333687-1-yuehaibing@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-07-03 16:00:50 +02:00
Pablo Neira Ayuso	fd72f265bb	netfilter: conntrack: remove DCCP protocol support The DCCP socket family has now been removed from this tree, see: `8bb3212be4` ("Merge branch 'net-retire-dccp-socket'") Remove connection tracking and NAT support for this protocol, this should not pose a problem because no DCCP traffic is expected to be seen on the wire. As for the code for matching on dccp header for iptables and nftables, mark it as deprecated and keep it in place. Ruleset restoration is an atomic operation. Without dccp matching support, an astray match on dccp could break this operation leaving your computer with no policy in place, so let's follow a more conservative approach for matches. Add CONFIG_NFT_EXTHDR_DCCP which is set to 'n' by default to deprecate dccp extension support. Similarly, label CONFIG_NETFILTER_XT_MATCH_DCCP as deprecated too and also set it to 'n' by default. Code to match on DCCP protocol from ebtables also remains in place, this is just a few checks on IPPROTO_DCCP from _check() path which is exercised when ruleset is loaded. There is another use of IPPROTO_DCCP from the _check() path in the iptables multiport match. Another check for IPPROTO_DCCP from the packet in the reject target is also removed. So let's schedule removal of the dccp matching for a second stage, this should not interfer with the dccp retirement since this is only matching on the dccp header. Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2025-07-03 13:51:39 +02:00
HarshaVardhana S A	223e2288f4	vsock/vmci: Clear the vmci transport packet properly when initializing it In vmci_transport_packet_init memset the vmci_transport_packet before populating the fields to avoid any uninitialised data being left in the structure. Cc: Bryan Tan <bryan-bt.tan@broadcom.com> Cc: Vishnu Dasa <vishnu.dasa@broadcom.com> Cc: Broadcom internal kernel review list Cc: Stefano Garzarella <sgarzare@redhat.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Simon Horman <horms@kernel.org> Cc: virtualization@lists.linux.dev Cc: netdev@vger.kernel.org Cc: stable <stable@kernel.org> Signed-off-by: HarshaVardhana S A <harshavardhana.sa@broadcom.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Fixes: `d021c34405` ("VSOCK: Introduce VM Sockets") Acked-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20250701122254.2397440-1-gregkh@linuxfoundation.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-07-03 12:52:52 +02:00
Al Viro	350db61fbe	rpc_create_client_dir(): return 0 or -E... Callers couldn't care less which dentry did we get - anything valid is treated as success. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2025-07-02 22:44:55 -04:00
Al Viro	3ee735ef5a	rpc_create_client_dir(): don't bother with rpc_populate() not for a single file... Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2025-07-02 22:44:55 -04:00
Al Viro	db83fa912e	rpc_new_dir(): the last argument is always NULL All callers except the one in rpc_populate() pass explicit NULL there; rpc_populate() passes its last argument ('private') instead, but in the only call of rpc_populate() that creates any subdirectories (when creating fixed subdirectories of root) private itself is NULL. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2025-07-02 22:44:55 -04:00
Al Viro	805060a69c	rpc_pipe: expand the calls of rpc_mkdir_populate() ... and get rid of convoluted callbacks. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2025-07-02 22:44:55 -04:00
Al Viro	065e88fa33	rpc_gssd_dummy_populate(): don't bother with rpc_populate() Just have it create gssd (in root), clntXX in gssd, then info and gssd in clntXX - all with explicit rpc_new_dir()/rpc_new_file()/rpc_mkpipe_dentry(). Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2025-07-02 22:44:55 -04:00
Al Viro	a117bf4caa	rpc_mkpipe_dentry(): switch to simple_start_creating() ... and make sure we set the fs-private part of inode up before attaching it to dentry. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2025-07-02 22:44:55 -04:00
Al Viro	5c1da75895	rpc_pipe: saner primitive for creating regular files rpc_new_file(); similar to rpc_new_dir(), except that here we pass file_operations as well. Callers don't care about dentry, just return 0 or -E... Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2025-07-02 22:44:55 -04:00
Al Viro	fc1abdca51	rpc_pipe: saner primitive for creating subdirectories All users of __rpc_mkdir() have the same form - start_creating(), followed, in case of success, by __rpc_mkdir() and unlocking parent. Combine that into a single helper, expanding __rpc_mkdir() into it, along with the call of __rpc_create_common() in it. Don't mess with d_drop() + d_add() - just d_instantiate() and be done with that. The reason __rpc_create_common() goes for that dance is that dentry it gets might or might not be hashed; here we know it's hashed. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2025-07-02 22:44:55 -04:00
Al Viro	41a6b9e52b	rpc_pipe: don't overdo directory locking Don't try to hold directories locked more than VFS requires; lock just before getting a child to be made positive (using simple_start_creating()) and unlock as soon as the child is created. There's no benefit in keeping the parent locked while populating the child - it won't stop dcache lookups anyway. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2025-07-02 22:44:55 -04:00
Al Viro	19a6314a99	rpc_mkpipe_dentry(): saner calling conventions Instead of returning a dentry or ERR_PTR(-E...), return 0 and store dentry into pipe->dentry on success and return -E... on failure. Callers are happier that way... NOTE: dummy rpc_pipe is getting ->dentry set; we never access that, since we 1) never call rpc_unlink() for it (dentry is taken out by ->kill_sb()) 2) never call rpc_queue_upcall() for it (writing to that sucker fails; no downcalls are ever submitted, so no replies are going to arrive) IOW, having that ->dentry set (and left dangling) is harmless, if ugly; cleaner solution will take more massage. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2025-07-02 22:44:55 -04:00
Al Viro	bccea4ed06	rpc_unlink(): saner calling conventions 1) pass it pipe instead of pipe->dentry 2) zero pipe->dentry afterwards 3) it always returns 0; why bother? Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2025-07-02 22:44:55 -04:00
Al Viro	8be22c4964	rpc_populate(): lift cleanup into callers rpc_populate() is called either from fill_super (where we don't need to remove any files on failure - rpc_kill_sb() will take them all out anyway) or from rpc_mkdir_populate(), where we need to remove the directory we'd been trying to populate along with whatever we'd put into it before we failed. Simpler to combine that into simple_recursive_removal() there. Note that rpc_pipe is overlocking directories quite a bit - locked parent is no obstacle to finding a child in dcache, so keeping it locked won't prevent userland observing a partially built subtree. All we need is to follow minimal VFS requirements; it's not as if clients used directory locking for exclusion - tree changes are serialized, but that's done on ->pipefs_sb_lock. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2025-07-02 22:44:55 -04:00
Al Viro	3829b30e77	rpc_unlink(): use simple_recursive_removal() note that the callback of simple_recursive_removal() is called with the parent locked; the victim isn't locked by the caller. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2025-07-02 22:44:55 -04:00
Al Viro	8e7490c40e	rpc_{rmdir_,}depopulate(): use simple_recursive_removal() instead no need to give an exact list of objects to be remove when it's simply every child of the victim directory. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2025-07-02 22:44:54 -04:00
Al Viro	4b2f61af8a	rpc_pipe: clean failure exits in fill_super ->kill_sb() will be called immediately after a failure return anyway, so we don't need to bother with notifier chain and rpc_gssd_dummy_depopulate(). What's more, rpc_gssd_dummy_populate() failure exits do not need to bother with __rpc_depopulate() - anything added to the tree will be taken out by ->kill_sb(). Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2025-07-02 22:44:54 -04:00
Chenguang Zhao	8b98f34ce1	net: ipv6: Fix spelling mistake change 'Maximium' to 'Maximum' Signed-off-by: Chenguang Zhao <zhaochenguang@kylinos.cn> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Paul Moore <paul@paul-moore.com> Link: https://patch.msgid.link/20250702055820.112190-1-zhaochenguang@kylinos.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-02 15:42:29 -07:00
Carolina Jubran	566e8f108f	devlink: Extend devlink rate API with traffic classes bandwidth management Introduce support for specifying relative bandwidth shares between traffic classes (TC) in the devlink-rate API. This new option allows users to allocate bandwidth across multiple traffic classes in a single command. This feature provides a more granular control over traffic management, especially for scenarios requiring Enhanced Transmission Selection. Users can now define a relative bandwidth share for each traffic class. For example, assigning share values of 20 to TC0 (TCP/UDP) and 80 to TC5 (RoCE) will result in TC0 receiving 20% and TC5 receiving 80% of the total bandwidth. The actual percentage each class receives depends on the ratio of its share value to the sum of all shares. Example: DEV=pci/0000:08:00.0 $ devlink port function rate add $DEV/vfs_group tx_share 10Gbit \ tx_max 50Gbit tc-bw 0:20 1:0 2:0 3:0 4:0 5:80 6:0 7:0 $ devlink port function rate set $DEV/vfs_group \ tc-bw 0:20 1:0 2:0 3:0 4:0 5:20 6:60 7:0 Example usage with ynl: ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/devlink.yaml \ --do rate-set --json '{ "bus-name": "pci", "dev-name": "0000:08:00.0", "port-index": 1, "rate-tc-bws": [ {"rate-tc-index": 0, "rate-tc-bw": 50}, {"rate-tc-index": 1, "rate-tc-bw": 50}, {"rate-tc-index": 2, "rate-tc-bw": 0}, {"rate-tc-index": 3, "rate-tc-bw": 0}, {"rate-tc-index": 4, "rate-tc-bw": 0}, {"rate-tc-index": 5, "rate-tc-bw": 0}, {"rate-tc-index": 6, "rate-tc-bw": 0}, {"rate-tc-index": 7, "rate-tc-bw": 0} ] }' ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/devlink.yaml \ --do rate-get --json '{ "bus-name": "pci", "dev-name": "0000:08:00.0", "port-index": 1 }' output for rate-get: {'bus-name': 'pci', 'dev-name': '0000:08:00.0', 'port-index': 1, 'rate-tc-bws': [{'rate-tc-bw': 50, 'rate-tc-index': 0}, {'rate-tc-bw': 50, 'rate-tc-index': 1}, {'rate-tc-bw': 0, 'rate-tc-index': 2}, {'rate-tc-bw': 0, 'rate-tc-index': 3}, {'rate-tc-bw': 0, 'rate-tc-index': 4}, {'rate-tc-bw': 0, 'rate-tc-index': 5}, {'rate-tc-bw': 0, 'rate-tc-index': 6}, {'rate-tc-bw': 0, 'rate-tc-index': 7}], 'rate-tx-max': 0, 'rate-tx-priority': 0, 'rate-tx-share': 0, 'rate-tx-weight': 0, 'rate-type': 'leaf'} Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250629142138.361537-3-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-02 15:39:05 -07:00
Willem de Bruijn	d2527ad3a9	net: preserve MSG_ZEROCOPY with forwarding MSG_ZEROCOPY data must be copied before data is queued to local sockets, to avoid indefinite timeout until memory release. This test is performed by skb_orphan_frags_rx, which is called when looping an egress skb to packet sockets, error queue or ingress path. To preserve zerocopy for skbs that are looped to ingress but are then forwarded to an egress device rather than delivered locally, defer this last check until an skb enters the local IP receive path. This is analogous to existing behavior of skb_clear_delivery_time. Signed-off-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250630194312.1571410-2-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-02 15:07:16 -07:00
Antoine Tenart	c2a2ff6b4d	net: ipv4: fix stat increase when udp early demux drops the packet udp_v4_early_demux now returns drop reasons as it either returns 0 or ip_mc_validate_source, which returns itself a drop reason. However its use was not converted in ip_rcv_finish_core and the drop reason is ignored, leading to potentially skipping increasing LINUX_MIB_IPRPFILTER if the drop reason is SKB_DROP_REASON_IP_RPFILTER. This is a fix and we're not converting udp_v4_early_demux to explicitly return a drop reason to ease backports; this can be done as a follow-up. Fixes: `d46f827016` ("net: ip: make ip_mc_validate_source() return drop reason") Cc: Menglong Dong <menglong8.dong@gmail.com> Reported-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Antoine Tenart <atenart@kernel.org> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://patch.msgid.link/20250701074935.144134-1-atenart@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-02 14:46:44 -07:00
Lion Ackermann	103406b38c	net/sched: Always pass notifications when child class becomes empty Certain classful qdiscs may invoke their classes' dequeue handler on an enqueue operation. This may unexpectedly empty the child qdisc and thus make an in-flight class passive via qlen_notify(). Most qdiscs do not expect such behaviour at this point in time and may re-activate the class eventually anyways which will lead to a use-after-free. The referenced fix commit attempted to fix this behavior for the HFSC case by moving the backlog accounting around, though this turned out to be incomplete since the parent's parent may run into the issue too. The following reproducer demonstrates this use-after-free: tc qdisc add dev lo root handle 1: drr tc filter add dev lo parent 1: basic classid 1:1 tc class add dev lo parent 1: classid 1:1 drr tc qdisc add dev lo parent 1:1 handle 2: hfsc def 1 tc class add dev lo parent 2: classid 2:1 hfsc rt m1 8 d 1 m2 0 tc qdisc add dev lo parent 2:1 handle 3: netem tc qdisc add dev lo parent 3:1 handle 4: blackhole echo 1 \| socat -u STDIN UDP4-DATAGRAM:127.0.0.1:8888 tc class delete dev lo classid 1:1 echo 1 \| socat -u STDIN UDP4-DATAGRAM:127.0.0.1:8888 Since backlog accounting issues leading to a use-after-frees on stale class pointers is a recurring pattern at this point, this patch takes a different approach. Instead of trying to fix the accounting, the patch ensures that qdisc_tree_reduce_backlog always calls qlen_notify when the child qdisc is empty. This solves the problem because deletion of qdiscs always involves a call to qdisc_reset() and / or qdisc_purge_queue() which ultimately resets its qlen to 0 thus causing the following qdisc_tree_reduce_backlog() to report to the parent. Note that this may call qlen_notify on passive classes multiple times. This is not a problem after the recent patch series that made all the classful qdiscs qlen_notify() handlers idempotent. Fixes: `3f98113810` ("sch_hfsc: Fix qlen accounting bug when using peek in hfsc_enqueue()") Signed-off-by: Lion Ackermann <nnamrec@gmail.com> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://patch.msgid.link/d912cbd7-193b-4269-9857-525bee8bbb6a@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-02 14:42:17 -07:00
Eric Dumazet	46a94e44b9	ipv6: ip6_mc_input() and ip6_mr_input() cleanups Both functions are always called under RCU. We remove the extra rcu_read_lock()/rcu_read_unlock(). We use skb_dst_dev_net_rcu() instead of skb_dst_dev_net(). We use dev_net_rcu() instead of dev_net(). Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250630121934.3399505-11-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-02 14:32:30 -07:00
Eric Dumazet	93d1cff35a	ipv6: adopt skb_dst_dev() and skb_dst_dev_net[_rcu]() helpers Use the new helpers as a step to deal with potential dst->dev races. v2: fix typo in ipv6_rthdr_rcv() (kernel test robot <lkp@intel.com>) Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250630121934.3399505-10-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-02 14:32:30 -07:00
Eric Dumazet	1caf272972	ipv6: adopt dst_dev() helper Use the new helper as a step to deal with potential dst->dev races. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250630121934.3399505-9-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-02 14:32:30 -07:00
Eric Dumazet	a74fc62eec	ipv4: adopt dst_dev, skb_dst_dev and skb_dst_dev_net[_rcu] Use the new helpers as a first step to deal with potential dst->dev races. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250630121934.3399505-8-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-02 14:32:30 -07:00
Eric Dumazet	88fe14253e	net: dst: add four helpers to annotate data-races around dst->dev dst->dev is read locklessly in many contexts, and written in dst_dev_put(). Fixing all the races is going to need many changes. We probably will have to add full RCU protection. Add three helpers to ease this painful process. static inline struct net_device dst_dev(const struct dst_entry dst) { return READ_ONCE(dst->dev); } static inline struct net_device skb_dst_dev(const struct sk_buff skb) { return dst_dev(skb_dst(skb)); } static inline struct net skb_dst_dev_net(const struct sk_buff skb) { return dev_net(skb_dst_dev(skb)); } static inline struct net skb_dst_dev_net_rcu(const struct sk_buff skb) { return dev_net_rcu(skb_dst_dev(skb)); } Fixes: `4a6ce2b6f2` ("net: introduce a new function dst_dev_put()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250630121934.3399505-7-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-02 14:32:30 -07:00
Eric Dumazet	2dce8c52a9	net: dst: annotate data-races around dst->output dst_dev_put() can overwrite dst->output while other cpus might read this field (for instance from dst_output()) Add READ_ONCE()/WRITE_ONCE() annotations to suppress potential issues. We will likely need RCU protection in the future. Fixes: `4a6ce2b6f2` ("net: introduce a new function dst_dev_put()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250630121934.3399505-6-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-02 14:32:30 -07:00
Eric Dumazet	f1c5fd3489	net: dst: annotate data-races around dst->input dst_dev_put() can overwrite dst->input while other cpus might read this field (for instance from dst_input()) Add READ_ONCE()/WRITE_ONCE() annotations to suppress potential issues. We will likely need full RCU protection later. Fixes: `4a6ce2b6f2` ("net: introduce a new function dst_dev_put()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250630121934.3399505-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-02 14:32:29 -07:00
Eric Dumazet	8f2b2282d0	net: dst: annotate data-races around dst->lastuse (dst_entry)->lastuse is read and written locklessly, add corresponding annotations. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250630121934.3399505-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-02 14:32:29 -07:00
Eric Dumazet	36229b2cac	net: dst: annotate data-races around dst->expires (dst_entry)->expires is read and written locklessly, add corresponding annotations. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250630121934.3399505-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-02 14:32:29 -07:00
Eric Dumazet	8a402bbe54	net: dst: annotate data-races around dst->obsolete (dst_entry)->obsolete is read locklessly, add corresponding annotations. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250630121934.3399505-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-02 14:32:29 -07:00
Eric Dumazet	e3d4825124	udp: move udp_memory_allocated into net_aligned_data ____cacheline_aligned_in_smp attribute only makes sure to align a field to a cache line. It does not prevent the linker to use the remaining of the cache line for other variables, causing potential false sharing. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250630093540.3052835-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-02 14:22:02 -07:00
Eric Dumazet	8308133741	tcp: move tcp_memory_allocated into net_aligned_data ____cacheline_aligned_in_smp attribute only makes sure to align a field to a cache line. It does not prevent the linker to use the remaining of the cache line for other variables, causing potential false sharing. Move tcp_memory_allocated into a dedicated cache line. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250630093540.3052835-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-02 14:22:02 -07:00
Eric Dumazet	998642e999	net: move net_cookie into net_aligned_data Using per-cpu data for net->net_cookie generation is overkill, because even busy hosts do not create hundreds of netns per second. Make sure to put net_cookie in a private cache line to avoid potential false sharing. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250630093540.3052835-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-02 14:22:02 -07:00
Eric Dumazet	3715b5df09	net: add struct net_aligned_data This structure will hold networking data that must consume a full cache line to avoid accidental false sharing. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250630093540.3052835-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-02 14:22:02 -07:00
Aakash Kumar S	94f39804d8	xfrm: Duplicate SPI Handling The issue originates when Strongswan initiates an XFRM_MSG_ALLOCSPI Netlink message, which triggers the kernel function xfrm_alloc_spi(). This function is expected to ensure uniqueness of the Security Parameter Index (SPI) for inbound Security Associations (SAs). However, it can return success even when the requested SPI is already in use, leading to duplicate SPIs assigned to multiple inbound SAs, differentiated only by their destination addresses. This behavior causes inconsistencies during SPI lookups for inbound packets. Since the lookup may return an arbitrary SA among those with the same SPI, packet processing can fail, resulting in packet drops. According to RFC 4301 section 4.4.2 , for inbound processing a unicast SA is uniquely identified by the SPI and optionally protocol. Reproducing the Issue Reliably: To consistently reproduce the problem, restrict the available SPI range in charon.conf : spi_min = 0x10000000 spi_max = 0x10000002 This limits the system to only 2 usable SPI values. Next, create more than 2 Child SA. each using unique pair of src/dst address. As soon as the 3rd Child SA is initiated, it will be assigned a duplicate SPI, since the SPI pool is already exhausted. With a narrow SPI range, the issue is consistently reproducible. With a broader/default range, it becomes rare and unpredictable. Current implementation: xfrm_spi_hash() lookup function computes hash using daddr, proto, and family. So if two SAs have the same SPI but different destination addresses, then they will: a. Hash into different buckets b. Be stored in different linked lists (byspi + h) c. Not be seen in the same hlist_for_each_entry_rcu() iteration. As a result, the lookup will result in NULL and kernel allows that Duplicate SPI Proposed Change: xfrm_state_lookup_spi_proto() does a truly global search - across all states, regardless of hash bucket and matches SPI and proto. Signed-off-by: Aakash Kumar S <saakashkumar@marvell.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2025-07-02 10:34:01 +02:00
Fernando Fernandez Mancera	2ca58d87eb	xfrm: ipcomp: adjust transport header after decompressing The skb transport header pointer needs to be adjusted by network header pointer plus the size of the ipcomp header. This shows up when running traffic over ipcomp using transport mode. After being reinjected, packets are dropped because the header isn't adjusted properly and some checks can be triggered. E.g the skb is mistakenly considered as IP fragmented packet and later dropped. kworker/30:1-mm 443 [030] 102.055250: skb:kfree_skb:skbaddr=0xffff8f104aa3ce00 rx_sk=( ffffffff8419f1f4 sk_skb_reason_drop+0x94 ([kernel.kallsyms]) ffffffff8419f1f4 sk_skb_reason_drop+0x94 ([kernel.kallsyms]) ffffffff84281420 ip_defrag+0x4b0 ([kernel.kallsyms]) ffffffff8428006e ip_local_deliver+0x4e ([kernel.kallsyms]) ffffffff8432afb1 xfrm_trans_reinject+0xe1 ([kernel.kallsyms]) ffffffff83758230 process_one_work+0x190 ([kernel.kallsyms]) ffffffff83758f37 worker_thread+0x2d7 ([kernel.kallsyms]) ffffffff83761cc9 kthread+0xf9 ([kernel.kallsyms]) ffffffff836c3437 ret_from_fork+0x197 ([kernel.kallsyms]) ffffffff836718da ret_from_fork_asm+0x1a ([kernel.kallsyms]) Fixes: `eb2953d269` ("xfrm: ipcomp: Use crypto_acomp interface") Link: https://bugzilla.suse.com/1244532 Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2025-07-02 09:34:21 +02:00
Tobias Brunner	3ac9e29211	xfrm: Set transport header to fix UDP GRO handling The referenced commit replaced a call to __xfrm4\|6_udp_encap_rcv() with a custom check for non-ESP markers. But what the called function also did was setting the transport header to the ESP header. The function that follows, esp4\|6_gro_receive(), relies on that being set when it calls xfrm_parse_spi(). We have to set the full offset as the skb's head was not moved yet so adding just the UDP header length won't work. Fixes: `e3fd057776` ("xfrm: Fix UDP GRO handling for some corner cases") Signed-off-by: Tobias Brunner <tobias@strongswan.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2025-07-02 09:19:56 +02:00
Andrea Mayer	db3e2ceab3	seg6: fix lenghts typo in a comment Fix a typo: lenghts -> length The typo has been identified using codespell, and the tool currently does not report any additional issues in comments. Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250629171226.4988-2-andrea.mayer@uniroma2.it Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-01 19:32:45 -07:00
Kohei Enju	34a500caf4	rose: fix dangling neighbour pointers in rose_rt_device_down() There are two bugs in rose_rt_device_down() that can cause use-after-free: 1. The loop bound `t->count` is modified within the loop, which can cause the loop to terminate early and miss some entries. 2. When removing an entry from the neighbour array, the subsequent entries are moved up to fill the gap, but the loop index `i` is still incremented, causing the next entry to be skipped. For example, if a node has three neighbours (A, A, B) with count=3 and A is being removed, the second A is not checked. i=0: (A, A, B) -> (A, B) with count=2 ^ checked i=1: (A, B) -> (A, B) with count=2 ^ checked (B, not A!) i=2: (doesn't occur because i < count is false) This leaves the second A in the array with count=2, but the rose_neigh structure has been freed. Code that accesses these entries assumes that the first `count` entries are valid pointers, causing a use-after-free when it accesses the dangling pointer. Fix both issues by iterating over the array in reverse order with a fixed loop bound. This ensures that all entries are examined and that the removal of an entry doesn't affect subsequent iterations. Reported-by: syzbot+e04e2c007ba2c80476cb@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=e04e2c007ba2c80476cb Tested-by: syzbot+e04e2c007ba2c80476cb@syzkaller.appspotmail.com Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Kohei Enju <enjuk@amazon.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250629030833.6680-1-enjuk@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-01 19:28:48 -07:00
Nicolas Dichtel	0341e34727	ip6_tunnel: enable to change proto of fb tunnels This is possible via the ioctl API: > ip -6 tunnel change ip6tnl0 mode any Let's align the netlink API: > ip link set ip6tnl0 type ip6tnl mode any Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Link: https://patch.msgid.link/20250630145602.1027220-1-nicolas.dichtel@6wind.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-01 17:40:57 -07:00
Jakub Kicinski	3249eae7e4	net: ethtool: fix leaking netdev ref if ethnl_default_parse() failed Ido spotted that I made a mistake in commit under Fixes, ethnl_default_parse() may acquire a dev reference even when it returns an error. This may have been driven by the code structure in dumps (which unconditionally release dev before handling errors), but it's too much of a trap. Functions should undo what they did before returning an error, rather than expecting caller to clean up. Rather than fixing ethnl_default_set_doit() directly make ethnl_default_parse() clean up errors. Reported-by: Ido Schimmel <idosch@idosch.org> Link: https://lore.kernel.org/aGEPszpq9eojNF4Y@shredder Fixes: `963781bdfe` ("net: ethtool: call .parse_request for SET handlers") Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250630154053.1074664-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-01 17:27:53 -07:00
Linus Torvalds	ce95858aee	NFS Client Bugfixes for Linux 6.16 * Fix loop in GSS sequence number cache * Clean up /proc/net/rpc/nfs if nfs_fs_proc_net_init() fails * Fix a race to wake on NFS_LAYOUT_DRAIN * Fix handling of NFS level errors in I/O -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAmhkRaUACgkQ18tUv7Cl QOskFxAAiyf6OIZ6M1mBfmbDDb+O5Gl6zofn0+OW2V9puJT/0U7pgNzepK9gFtO8 o3Bq0/GU3I2oxO7wEFrWFQXl3hkPvMqCN7ai1Vb2DjRGWhu97E0Mk3DWltSmuDFQ IaofuURjJdhgvjLb03mI6ReQNxONbMU3qD0JgK4/WIfvm44574Fah6jTnod32G23 EHj8cBw+iIvGh8MmPb4g01XivMGM36bA08NP4qkU/wgeLnkJzFYb5XZf16v821T6 ZxwwruclX2fbpLtsQsHfJpOgW/TFRJTyjBcZw581H8fpkgh1PlJ96OFwrbOU7RCp gVzDw3hvWoKFaMjVlkKk3wSWzwtMWLnB8a7TmgssuNU+DqmN3qMzkaRqrOxWSYMc t7SycQ+PReaR2gQdlJNrN5/Q75OLpqplwPi6O5cqOMQXC2aMK+nhXVW9QiC1SPFI ZcymKk4anzdgIgH+8TR3JpFVmPoEuuIeLV24+DQ0rlh7+4SI3TooTygfsl3/DErb 6Ic6nXgeSBWBPvuemnPbsq9DuAqGFbLrbdutVu4LUx/9XoGd8AfA9dVLMIb/0hgm C3Lwt1xeata8dz1v2jHHS1Tzs8ZphXnUCU7gzcf4TDs3UQUGzKnnNfdfb1r2cvxU LVz2guJ9xH4r3TsVNn2GQijbccxwPVFxszzPm0JobxiQYOna0Ss= =F4+b -----END PGP SIGNATURE----- Merge tag 'nfs-for-6.16-2' of git://git.linux-nfs.org/projects/anna/linux-nfs Pull NFS client fixes from Anna Schumaker: - Fix loop in GSS sequence number cache - Clean up /proc/net/rpc/nfs if nfs_fs_proc_net_init() fails - Fix a race to wake on NFS_LAYOUT_DRAIN - Fix handling of NFS level errors in I/O * tag 'nfs-for-6.16-2' of git://git.linux-nfs.org/projects/anna/linux-nfs: NFSv4/flexfiles: Fix handling of NFS level errors in I/O NFSv4/pNFS: Fix a race to wake on NFS_LAYOUT_DRAIN nfs: Clean up /proc/net/rpc/nfs when nfs_fs_proc_net_init() fails. sunrpc: fix loop in gss seqno cache	2025-07-01 13:42:30 -07:00
RubenKelevra	21deb2d966	net: ieee8021q: fix insufficient table-size assertion _Static_assert(ARRAY_SIZE(map) != IEEE8021Q_TT_MAX - 1) rejects only a length of 7 and allows any other mismatch. Replace it with a strict equality test via a helper macro so that every mapping table must have exactly IEEE8021Q_TT_MAX (8) entries. Signed-off-by: RubenKelevra <rubenkelevra@gmail.com> Link: https://patch.msgid.link/20250626205907.1566384-1-rubenkelevra@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-07-01 12:55:49 +02:00
Eric Dumazet	aed4969f2b	net: net->nsid_lock does not need BH safety At the time of commit `bc51dddf98` ("netns: avoid disabling irq for netns id") peernet2id() was not yet using RCU. Commit `2dce224f46` ("netns: protect netns ID lookups with RCU") changed peernet2id() to no longer acquire net->nsid_lock (potentially from BH context). We do not need to block soft interrupts when acquiring net->nsid_lock anymore. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Guillaume Nault <gnault@redhat.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250627163242.230866-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-30 18:58:20 -07:00
Ido Schimmel	03dc03fa04	neighbor: Add NTF_EXT_VALIDATED flag for externally validated entries tl;dr ===== Add a new neighbor flag ("extern_valid") that can be used to indicate to the kernel that a neighbor entry was learned and determined to be valid externally. The kernel will not try to remove or invalidate such an entry, leaving these decisions to the user space control plane. This is needed for EVPN multi-homing where a neighbor entry for a multi-homed host needs to be synced across all the VTEPs among which the host is multi-homed. Background ========== In a typical EVPN multi-homing setup each host is multi-homed using a set of links called ES (Ethernet Segment, i.e., LAG) to multiple leaf switches (VTEPs). VTEPs that are connected to the same ES are called ES peers. When a neighbor entry is learned on a VTEP, it is distributed to both ES peers and remote VTEPs using EVPN MAC/IP advertisement routes. ES peers use the neighbor entry when routing traffic towards the multi-homed host and remote VTEPs use it for ARP/NS suppression. Motivation ========== If the ES link between a host and the VTEP on which the neighbor entry was locally learned goes down, the EVPN MAC/IP advertisement route will be withdrawn and the neighbor entries will be removed from both ES peers and remote VTEPs. Routing towards the multi-homed host and ARP/NS suppression can fail until another ES peer locally learns the neighbor entry and distributes it via an EVPN MAC/IP advertisement route. "draft-rbickhart-evpn-ip-mac-proxy-adv-03" [1] suggests avoiding these intermittent failures by having the ES peers install the neighbor entries as before, but also injecting EVPN MAC/IP advertisement routes with a proxy indication. When the previously mentioned ES link goes down and the original EVPN MAC/IP advertisement route is withdrawn, the ES peers will not withdraw their neighbor entries, but instead start aging timers for the proxy indication. If an ES peer locally learns the neighbor entry (i.e., it becomes "reachable"), it will restart its aging timer for the entry and emit an EVPN MAC/IP advertisement route without a proxy indication. An ES peer will stop its aging timer for the proxy indication if it observes the removal of the proxy indication from at least one of the ES peers advertising the entry. In the event that the aging timer for the proxy indication expired, an ES peer will withdraw its EVPN MAC/IP advertisement route. If the timer expired on all ES peers and they all withdrew their proxy advertisements, the neighbor entry will be completely removed from the EVPN fabric. Implementation ============== In the above scheme, when the control plane (e.g., FRR) advertises a neighbor entry with a proxy indication, it expects the corresponding entry in the data plane (i.e., the kernel) to remain valid and not be removed due to garbage collection or loss of carrier. The control plane also expects the kernel to notify it if the entry was learned locally (i.e., became "reachable") so that it will remove the proxy indication from the EVPN MAC/IP advertisement route. That is why these entries cannot be programmed with dummy states such as "permanent" or "noarp". Instead, add a new neighbor flag ("extern_valid") which indicates that the entry was learned and determined to be valid externally and should not be removed or invalidated by the kernel. The kernel can probe the entry and notify user space when it becomes "reachable" (it is initially installed as "stale"). However, if the kernel does not receive a confirmation, have it return the entry to the "stale" state instead of the "failed" state. In other words, an entry marked with the "extern_valid" flag behaves like any other dynamically learned entry other than the fact that the kernel cannot remove or invalidate it. One can argue that the "extern_valid" flag should not prevent garbage collection and that instead a neighbor entry should be programmed with both the "extern_valid" and "extern_learn" flags. There are two reasons for not doing that: 1. Unclear why a control plane would like to program an entry that the kernel cannot invalidate but can completely remove. 2. The "extern_learn" flag is used by FRR for neighbor entries learned on remote VTEPs (for ARP/NS suppression) whereas here we are concerned with local entries. This distinction is currently irrelevant for the kernel, but might be relevant in the future. Given that the flag only makes sense when the neighbor has a valid state, reject attempts to add a neighbor with an invalid state and with this flag set. For example: # ip neigh add 192.0.2.1 nud none dev br0.10 extern_valid Error: Cannot create externally validated neighbor with an invalid state. # ip neigh add 192.0.2.1 lladdr 00:11:22:33:44:55 nud stale dev br0.10 extern_valid # ip neigh replace 192.0.2.1 nud failed dev br0.10 extern_valid Error: Cannot mark neighbor as externally validated with an invalid state. The above means that a neighbor cannot be created with the "extern_valid" flag and flags such as "use" or "managed" as they result in a neighbor being created with an invalid state ("none") and immediately getting probed: # ip neigh add 192.0.2.1 lladdr 00:11:22:33:44:55 nud stale dev br0.10 extern_valid use Error: Cannot create externally validated neighbor with an invalid state. However, these flags can be used together with "extern_valid" after the neighbor was created with a valid state: # ip neigh add 192.0.2.1 lladdr 00:11:22:33:44:55 nud stale dev br0.10 extern_valid # ip neigh replace 192.0.2.1 lladdr 00:11:22:33:44:55 nud stale dev br0.10 extern_valid use One consequence of preventing the kernel from invalidating a neighbor entry is that by default it will only try to determine reachability using unicast probes. This can be changed using the "mcast_resolicit" sysctl: # sysctl net.ipv4.neigh.br0/10.mcast_resolicit 0 # tcpdump -nn -e -i br0.10 -Q out arp & # ip neigh replace 192.0.2.1 lladdr 00:11:22:33:44:55 nud stale dev br0.10 extern_valid use 62:50:1d:11:93:6f > 00:11:22:33:44:55, ethertype ARP (0x0806), length 42: Request who-has 192.0.2.1 tell 192.0.2.2, length 28 62:50:1d:11:93:6f > 00:11:22:33:44:55, ethertype ARP (0x0806), length 42: Request who-has 192.0.2.1 tell 192.0.2.2, length 28 62:50:1d:11:93:6f > 00:11:22:33:44:55, ethertype ARP (0x0806), length 42: Request who-has 192.0.2.1 tell 192.0.2.2, length 28 # sysctl -wq net.ipv4.neigh.br0/10.mcast_resolicit=3 # ip neigh replace 192.0.2.1 lladdr 00:11:22:33:44:55 nud stale dev br0.10 extern_valid use 62:50:1d:11:93:6f > 00:11:22:33:44:55, ethertype ARP (0x0806), length 42: Request who-has 192.0.2.1 tell 192.0.2.2, length 28 62:50:1d:11:93:6f > 00:11:22:33:44:55, ethertype ARP (0x0806), length 42: Request who-has 192.0.2.1 tell 192.0.2.2, length 28 62:50:1d:11:93:6f > 00:11:22:33:44:55, ethertype ARP (0x0806), length 42: Request who-has 192.0.2.1 tell 192.0.2.2, length 28 62:50:1d:11:93:6f > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.0.2.1 tell 192.0.2.2, length 28 62:50:1d:11:93:6f > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.0.2.1 tell 192.0.2.2, length 28 62:50:1d:11:93:6f > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.0.2.1 tell 192.0.2.2, length 28 iproute2 patches can be found here [2]. [1] https://datatracker.ietf.org/doc/html/draft-rbickhart-evpn-ip-mac-proxy-adv-03 [2] https://github.com/idosch/iproute2/tree/submit/extern_valid_v1 Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://patch.msgid.link/20250626073111.244534-2-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-30 18:14:23 -07:00
Eric Dumazet	af232e7615	ipv6: guard ip6_mr_output() with rcu syzbot found at least one path leads to an ip_mr_output() without RCU being held. Add guard(rcu)() to fix this in a concise way. WARNING: net/ipv6/ip6mr.c:2376 at ip6_mr_output+0xe0b/0x1040 net/ipv6/ip6mr.c:2376, CPU#1: kworker/1:2/121 Call Trace: <TASK> ip6tunnel_xmit include/net/ip6_tunnel.h:162 [inline] udp_tunnel6_xmit_skb+0x640/0xad0 net/ipv6/ip6_udp_tunnel.c:112 send6+0x5ac/0x8d0 drivers/net/wireguard/socket.c:152 wg_socket_send_skb_to_peer+0x111/0x1d0 drivers/net/wireguard/socket.c:178 wg_packet_create_data_done drivers/net/wireguard/send.c:251 [inline] wg_packet_tx_worker+0x1c8/0x7c0 drivers/net/wireguard/send.c:276 process_one_work kernel/workqueue.c:3239 [inline] process_scheduled_works+0xae1/0x17b0 kernel/workqueue.c:3322 worker_thread+0x8a0/0xda0 kernel/workqueue.c:3403 kthread+0x70e/0x8a0 kernel/kthread.c:464 ret_from_fork+0x3fc/0x770 arch/x86/kernel/process.c:148 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245 </TASK> Fixes: `96e8f5a9fe` ("net: ipv6: Add ip6_mr_output()") Reported-by: syzbot+0141c834e47059395621@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/685e86b3.a00a0220.129264.0003.GAE@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Roopa Prabhu <roopa@nvidia.com> Cc: Benjamin Poirier <bpoirier@nvidia.com> Cc: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/20250627115822.3741390-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-30 17:57:29 -07:00
Jakub Kicinski	040cef30b5	net: ethtool: move get_rxfh callback under the rss_lock We already call get_rxfh under the rss_lock when we read back context state after changes. Let's be consistent and always hold the lock. The existing callers are all under rtnl_lock so this should make no difference in practice, but it makes the locking rules far less confusing IMHO. Any RSS callback and any access to the RSS XArray should hold the lock. Link: https://patch.msgid.link/20250626202848.104457-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-30 08:41:25 -07:00
Jakub Kicinski	739d18cce1	net: ethtool: move rxfh_fields callbacks under the rss_lock Netlink code will want to perform the RSS_SET operation atomically under the rss_lock. sfc wants to hold the rss_lock in rxfh_fields_get, which makes that difficult. Lets move the locking up to the core so that for all driver-facing callbacks rss_lock is taken consistently by the core. Link: https://patch.msgid.link/20250626202848.104457-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-30 08:41:24 -07:00
Jakub Kicinski	5ec353dbff	net: ethtool: take rss_lock for all rxfh changes Always take the rss_lock in ethtool_set_rxfh(). We will want to make a similar change in ethtool_set_rxfh_fields() and some drivers lock that callback regardless of rss context ID being set. Having some callbacks locked unconditionally and some only if context ID is set would be very confusing. ethtool handling is under rtnl_lock, so rss_lock is very unlikely to ever be congested. Link: https://patch.msgid.link/20250626202848.104457-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-30 08:41:24 -07:00
Jakub Kicinski	99e3eb454c	net: ethtool: avoid OOB accesses in PAUSE_SET We now reuse .parse_request() from GET on SET, so we need to make sure that the policies for both cover the attributes used for .parse_request(). genetlink will only allocate space in info->attrs for ARRAY_SIZE(policy). Reported-by: syzbot+430f9f76633641a62217@syzkaller.appspotmail.com Fixes: `963781bdfe` ("net: ethtool: call .parse_request for SET handlers") Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250626233926.199801-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-30 08:32:37 -07:00
Lachlan Hodges	1fe44a86ff	wifi: cfg80211: fix S1G beacon head validation in nl80211 S1G beacons contain fixed length optional fields that precede the variable length elements, ensure we take this into account when validating the beacon. This particular case was missed in `1e1f706fc2` ("wifi: cfg80211/mac80211: correctly parse S1G beacon optional elements"). Fixes: `1d47f1198d` ("nl80211: correctly validate S1G beacon head") Signed-off-by: Lachlan Hodges <lachlan.hodges@morsemicro.com> Link: https://patch.msgid.link/20250626115118.68660-1-lachlan.hodges@morsemicro.com [shorten/reword subject] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-06-30 15:33:46 +02:00
Greg Kroah-Hartman	815ac67919	Merge 6.16-rc4 into tty-next We need the tty/serial fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-06-30 07:50:04 +02:00
Eric Dumazet	beead7eea8	net: ipv4: guard ip_mr_output() with rcu syzbot found at least one path leads to an ip_mr_output() without RCU being held. Add guard(rcu)() to fix this in a concise way. WARNING: CPU: 0 PID: 0 at net/ipv4/ipmr.c:2302 ip_mr_output+0xbb1/0xe70 net/ipv4/ipmr.c:2302 Call Trace: <IRQ> igmp_send_report+0x89e/0xdb0 net/ipv4/igmp.c:799 igmp_timer_expire+0x204/0x510 net/ipv4/igmp.c:-1 call_timer_fn+0x17e/0x5f0 kernel/time/timer.c:1747 expire_timers kernel/time/timer.c:1798 [inline] __run_timers kernel/time/timer.c:2372 [inline] __run_timer_base+0x61a/0x860 kernel/time/timer.c:2384 run_timer_base kernel/time/timer.c:2393 [inline] run_timer_softirq+0xb7/0x180 kernel/time/timer.c:2403 handle_softirqs+0x286/0x870 kernel/softirq.c:579 __do_softirq kernel/softirq.c:613 [inline] invoke_softirq kernel/softirq.c:453 [inline] __irq_exit_rcu+0xca/0x1f0 kernel/softirq.c:680 irq_exit_rcu+0x9/0x30 kernel/softirq.c:696 instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1050 [inline] sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1050 Fixes: `35bec72a24` ("net: ipv4: Add ip_mr_output()") Reported-by: syzbot+f02fb9e43bd85c6c66ae@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/685e841a.a00a0220.129264.0002.GAE@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Petr Machata <petrm@nvidia.com> Cc: Roopa Prabhu <roopa@nvidia.com> Cc: Nikolay Aleksandrov <razor@blackwall.org> Cc: Benjamin Poirier <bpoirier@nvidia.com> Cc: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2025-06-28 10:44:21 +01:00
xin.guo	a041f70e57	tcp: fix tcp_ofo_queue() to avoid including too much DUP SACK range If the new coming segment covers more than one skbs in the ofo queue, and which seq is equal to rcv_nxt, then the sequence range that is duplicated will be sent as DUP SACK, the detail as below, in step6, the {501,2001} range is clearly including too much DUP SACK range, in violation of RFC 2883 rules. 1. client > server: Flags [.], seq 501:1001, ack 1325288529, win 20000, length 500 2. server > client: Flags [.], ack 1, [nop,nop,sack 1 {501:1001}], length 0 3. client > server: Flags [.], seq 1501:2001, ack 1325288529, win 20000, length 500 4. server > client: Flags [.], ack 1, [nop,nop,sack 2 {1501:2001} {501:1001}], length 0 5. client > server: Flags [.], seq 1:2001, ack 1325288529, win 20000, length 2000 6. server > client: Flags [.], ack 2001, [nop,nop,sack 1 {501:2001}], length 0 After this fix, the final ACK is as below: 6. server > client: Flags [.], ack 2001, options [nop,nop,sack 1 {501:1001}], length 0 [edumazet] added a new packetdrill test in the following patch. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: xin.guo <guoxin0309@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250626123420.1933835-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-27 15:35:03 -07:00
Eric Dumazet	cf56a98202	tcp: remove inet_rtx_syn_ack() inet_rtx_syn_ack() is a simple wrapper around tcp_rtx_synack(), if we move req->num_retrans update. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250626153017.2156274-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-27 15:34:19 -07:00
Eric Dumazet	8d68411a12	tcp: remove rtx_syn_ack field Now inet_rtx_syn_ack() is only used by TCP, it can directly call tcp_rtx_synack() instead of using an indirect call to req->rsk_ops->rtx_syn_ack(). Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250626153017.2156274-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-27 15:34:18 -07:00
Christian Eggers	89fb8acc38	Bluetooth: HCI: Set extended advertising data synchronously Currently, for controllers with extended advertising, the advertising data is set in the asynchronous response handler for extended adverstising params. As most advertising settings are performed in a synchronous context, the (asynchronous) setting of the advertising data is done too late (after enabling the advertising). Move setting of adverstising data from asynchronous response handler into synchronous context to fix ordering of HCI commands. Signed-off-by: Christian Eggers <ceggers@arri.de> Fixes: `a0fb3726ba` ("Bluetooth: Use Set ext adv/scan rsp data if controller supports") Cc: stable@vger.kernel.org v2: https://lore.kernel.org/linux-bluetooth/20250626115209.17839-1-ceggers@arri.de/ Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2025-06-27 14:01:20 -04:00
Christian Eggers	f3cb5676e5	Bluetooth: MGMT: mesh_send: check instances prior disabling advertising The unconditional call of hci_disable_advertising_sync() in mesh_send_done_sync() also disables other LE advertisings (non mesh related). I am not sure whether this call is required at all, but checking the adv_instances list (like done at other places) seems to solve the problem. Fixes: `b338d91703` ("Bluetooth: Implement support for Mesh") Cc: stable@vger.kernel.org Signed-off-by: Christian Eggers <ceggers@arri.de> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2025-06-27 14:01:02 -04:00
Christian Eggers	e5af67a870	Bluetooth: MGMT: set_mesh: update LE scan interval and window According to the message of commit `b338d91703` ("Bluetooth: Implement support for Mesh"), MGMT_OP_SET_MESH_RECEIVER should set the passive scan parameters. Currently the scan interval and window parameters are silently ignored, although user space (bluetooth-meshd) expects that they can be used [1] [1] https://git.kernel.org/pub/scm/bluetooth/bluez.git/tree/mesh/mesh-io-mgmt.c#n344 Fixes: `b338d91703` ("Bluetooth: Implement support for Mesh") Cc: stable@vger.kernel.org Signed-off-by: Christian Eggers <ceggers@arri.de> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2025-06-27 14:00:44 -04:00
Christian Eggers	46c0d947b6	Bluetooth: hci_sync: revert some mesh modifications This reverts minor parts of the changes made in commit `b338d91703` ("Bluetooth: Implement support for Mesh"). It looks like these changes were only made for development purposes but shouldn't have been part of the commit. Fixes: `b338d91703` ("Bluetooth: Implement support for Mesh") Cc: stable@vger.kernel.org Signed-off-by: Christian Eggers <ceggers@arri.de> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2025-06-27 14:00:27 -04:00
Yang Li	1f029b4e30	Bluetooth: Prevent unintended pause by checking if advertising is active When PA Create Sync is enabled, advertising resumes unexpectedly. Therefore, it's necessary to check whether advertising is currently active before attempting to pause it. < HCI Command: LE Add Device To... (0x08\|0x0011) plen 7 #1345 [hci0] 48.306205 Address type: Random (0x01) Address: 4F:84:84:5F:88:17 (Resolvable) Identity type: Random (0x01) Identity: FC:5B:8C:F7:5D:FB (Static) < HCI Command: LE Set Address Re.. (0x08\|0x002d) plen 1 #1347 [hci0] 48.308023 Address resolution: Enabled (0x01) ... < HCI Command: LE Set Extended A.. (0x08\|0x0039) plen 6 #1349 [hci0] 48.309650 Extended advertising: Enabled (0x01) Number of sets: 1 (0x01) Entry 0 Handle: 0x01 Duration: 0 ms (0x00) Max ext adv events: 0 ... < HCI Command: LE Periodic Adve.. (0x08\|0x0044) plen 14 #1355 [hci0] 48.314575 Options: 0x0000 Use advertising SID, Advertiser Address Type and address Reporting initially enabled SID: 0x02 Adv address type: Random (0x01) Adv address: 4F:84:84:5F:88:17 (Resolvable) Identity type: Random (0x01) Identity: FC:5B:8C:F7:5D:FB (Static) Skip: 0x0000 Sync timeout: 20000 msec (0x07d0) Sync CTE type: 0x0000 Fixes: `ad383c2c65` ("Bluetooth: hci_sync: Enable advertising when LL privacy is enabled") Signed-off-by: Yang Li <yang.li@amlogic.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2025-06-27 13:37:23 -04:00
Yue Haibing	77e12dba07	ipv4: fib: Remove unnecessary encap_type check lwtunnel_build_state() has check validity of encap_type, so no need to do this before call it. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250625022059.3958215-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-26 17:16:03 -07:00
Jakub Kicinski	32155c6fd9	bpf-next-for-netdev -----BEGIN PGP SIGNATURE----- iIsEABYKADMWIQTFp0I1jqZrAX+hPRXbK58LschIgwUCaF3LFhUcZGFuaWVsQGlv Z2VhcmJveC5uZXQACgkQ2yufC7HISINtRgD+JagJmBokoPnsk7DfauJnVhaP95aV tsnna+fU1kGwS7MBAMINCoLyeISiD/XG0O+Om38czhhglWbl4+TgrthegPkE =opKf -----END PGP SIGNATURE----- Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2025-06-27 We've added 6 non-merge commits during the last 8 day(s) which contain a total of 6 files changed, 120 insertions(+), 20 deletions(-). The main changes are: 1) Fix RCU usage in task_cls_state() for BPF programs using helpers like bpf_get_cgroup_classid_curr() outside of networking, from Charalampos Mitrodimas. 2) Fix a sockmap race between map_update and a pending workqueue from an earlier map_delete freeing the old psock where both pointed to the same psock->sk, from Jiayuan Chen. 3) Fix a data corruption issue when using bpf_msg_pop_data() in kTLS which failed to recalculate the ciphertext length, also from Jiayuan Chen. 4) Remove xdp_redirect_map{,_err} trace events since they are unused and also hide XDP trace events under CONFIG_BPF_SYSCALL, from Steven Rostedt. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: xdp: tracing: Hide some xdp events under CONFIG_BPF_SYSCALL xdp: Remove unused events xdp_redirect_map and xdp_redirect_map_err net, bpf: Fix RCU usage in task_cls_state() for BPF programs selftests/bpf: Add test to cover ktls with bpf_msg_pop_data bpf, ktls: Fix data corruption when using bpf_msg_pop_data() in ktls bpf, sockmap: Fix psock incorrectly pointing to sk ==================== Link: https://patch.msgid.link/20250626230111.24772-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-26 17:13:17 -07:00
Jakub Kicinski	28aa52b618	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.16-rc4). Conflicts: Documentation/netlink/specs/mptcp_pm.yaml `9e6dd4c256` ("netlink: specs: mptcp: replace underscores with dashes in names") `ec362192aa` ("netlink: specs: fix up indentation errors") https://lore.kernel.org/20250626122205.389c2cd4@canb.auug.org.au Adjacent changes: Documentation/netlink/specs/fou.yaml `791a9ed0a4` ("netlink: specs: fou: replace underscores with dashes in names") `880d43ca9a` ("netlink: specs: clean up spaces in brackets") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-26 10:40:50 -07:00
Alexei Starovoitov	886178a33a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf after rc3 Cross-merge BPF, perf and other fixes after downstream PRs. It restores BPF CI to green after critical fix commit `bc4394e5e7` ("perf: Fix the throttle error of some clock events") No conflicts. Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-06-26 09:49:39 -07:00
Linus Torvalds	e34a79b96a	Including fixes from bluetooth and wireless. Current release - regressions: - bridge: fix use-after-free during router port configuration Current release - new code bugs: - eth: wangxun: fix the creation of page_pool Previous releases - regressions: - netpoll: initialize UDP checksum field before checksumming - wifi: mac80211: finish link init before RCU publish - bluetooth: fix use-after-free in vhci_flush() - eth: ionic: fix DMA mapping test - eth: bnxt: properly flush XDP redirect lists Previous releases - always broken: - netlink: specs: enforce strict naming of properties - unix: don't leave consecutive consumed OOB skbs. - vsock: fix linux/vm_sockets.h userspace compilation errors - selftests: fix TCP packet checksum Signed-off-by: Paolo Abeni <pabeni@redhat.com> -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmhdIbESHHBhYmVuaUBy ZWRoYXQuY29tAAoJECkkeY3MjxOkB70P/21uclzfzXS9Ijlata4aFjBtRB0Ebvat FqHZaB6ldVceRZYG4vb2FVoQawi5Ex6Aju4fgS0y34KDiwAA7wkmTVMbmSHOxPuy 5oHDPilUv4T1cbiMBU7/BncT0XFwgA1pDsD8OBQtAB0ghSCidKdmberWZXeEW4Va J65Crdf8VLBgUb/enowRUA/TJ0SfZouAC/N+GmqfewxHCXM0a0JwJ9Kp1UjcTpEz 7JifBHJbepa3mCCqHlXcJP87Q7soMz/V0o3B6IVm75MjgmR5I/BTiBKvsWurNqLZ AUtTy4icVOr6gSFmGjeiDH2OT6silF2JwqrR+ajNKNvgWJOOxFjEY+fy3RrRYLBD WLXkO20AdRsl77CdiDQkHl8Y3R88aeils7AnZVwJ91QjcDRfnRbr5U57bdCrxAGv ZJR9jWnFrTyC7UZeil7LMf/f09mb8jQqxKwKQAvO5tLgUh9TEaqlveapLHSUdN67 ZyMZpzF5hGfslGf3Gc6tl324NkAtZB7bfAhTG8FyEIZqeOBYcL2Drr9dBLgAVTmS tTjgsoEQVGxLcDz6y5HK5kScTM5WnZybWBkGKRLl4/n5pU05SkMxjXmSr/nTpB2F CxVEjmutbfM/1tiClXCiHLLmV5DguAUV4Dz01/uiaOLjbxQb8NBzPDmRK9TNrfqh KFDQro/JEcdH =ywOM -----END PGP SIGNATURE----- Merge tag 'net-6.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from bluetooth and wireless. Current release - regressions: - bridge: fix use-after-free during router port configuration Current release - new code bugs: - eth: wangxun: fix the creation of page_pool Previous releases - regressions: - netpoll: initialize UDP checksum field before checksumming - wifi: mac80211: finish link init before RCU publish - bluetooth: fix use-after-free in vhci_flush() - eth: - ionic: fix DMA mapping test - bnxt: properly flush XDP redirect lists Previous releases - always broken: - netlink: specs: enforce strict naming of properties - unix: don't leave consecutive consumed OOB skbs. - vsock: fix linux/vm_sockets.h userspace compilation errors - selftests: fix TCP packet checksum" * tag 'net-6.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (38 commits) net: libwx: fix the creation of page_pool net: selftests: fix TCP packet checksum atm: Release atm_dev_mutex after removing procfs in atm_dev_deregister(). netlink: specs: enforce strict naming of properties netlink: specs: tc: replace underscores with dashes in names netlink: specs: rt-link: replace underscores with dashes in names netlink: specs: mptcp: replace underscores with dashes in names netlink: specs: ovs_flow: replace underscores with dashes in names netlink: specs: devlink: replace underscores with dashes in names netlink: specs: dpll: replace underscores with dashes in names netlink: specs: ethtool: replace underscores with dashes in names netlink: specs: fou: replace underscores with dashes in names netlink: specs: nfsd: replace underscores with dashes in names net: enetc: Correct endianness handling in _enetc_rd_reg64 atm: idt77252: Add missing `dma_map_error()` bnxt: properly flush XDP redirect lists vsock/uapi: fix linux/vm_sockets.h userspace compilation errors wifi: mac80211: finish link init before RCU publish wifi: iwlwifi: mvm: assume '1' as the default mac_config_cmd version selftest: af_unix: Add tests for -ECONNRESET. ...	2025-06-26 09:13:27 -07:00
Jakub Kicinski	8d89661a36	net: selftests: fix TCP packet checksum The length in the pseudo header should be the length of the L3 payload AKA the L4 header+payload. The selftest code builds the packet from the lower layers up, so all the headers are pushed already when it constructs L4. We need to subtract the lower layer headers from skb->len. Fixes: `3e1e58d64c` ("net: add generic selftest support") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com> Reported-by: Oleksij Rempel <o.rempel@pengutronix.de> Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20250624183258.3377740-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-06-26 10:50:49 +02:00
Yue Haibing	f6fa45d67e	net: Reoder rxq_idx check in __net_mp_open_rxq() array_index_nospec() clamp the rxq_idx within the range of [0, dev->real_num_rx_queues), move the check before it. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Mina Almasry <almasrymina@google.com> Link: https://patch.msgid.link/20250624140159.3929503-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-25 16:53:51 -07:00
Yue Haibing	3b3ccf9ed0	net: Remove unnecessary NULL check for lwtunnel_fill_encap() lwtunnel_fill_encap() has NULL check and return 0, so no need to check before call it. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250624140015.3929241-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-25 16:52:46 -07:00
Kuniyuki Iwashima	a433791aea	atm: Release atm_dev_mutex after removing procfs in atm_dev_deregister(). syzbot reported a warning below during atm_dev_register(). [0] Before creating a new device and procfs/sysfs for it, atm_dev_register() looks up a duplicated device by __atm_dev_lookup(). These operations are done under atm_dev_mutex. However, when removing a device in atm_dev_deregister(), it releases the mutex just after removing the device from the list that __atm_dev_lookup() iterates over. So, there will be a small race window where the device does not exist on the device list but procfs/sysfs are still not removed, triggering the splat. Let's hold the mutex until procfs/sysfs are removed in atm_dev_deregister(). [0]: proc_dir_entry 'atm/atmtcp:0' already registered WARNING: CPU: 0 PID: 5919 at fs/proc/generic.c:377 proc_register+0x455/0x5f0 fs/proc/generic.c:377 Modules linked in: CPU: 0 UID: 0 PID: 5919 Comm: syz-executor284 Not tainted 6.16.0-rc2-syzkaller-00047-g52da431bf03b #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025 RIP: 0010:proc_register+0x455/0x5f0 fs/proc/generic.c:377 Code: 48 89 f9 48 c1 e9 03 80 3c 01 00 0f 85 a2 01 00 00 48 8b 44 24 10 48 c7 c7 20 c0 c2 8b 48 8b b0 d8 00 00 00 e8 0c 02 1c ff 90 <0f> 0b 90 90 48 c7 c7 80 f2 82 8e e8 0b de 23 09 48 8b 4c 24 28 48 RSP: 0018:ffffc9000466fa30 EFLAGS: 00010282 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff817ae248 RDX: ffff888026280000 RSI: ffffffff817ae255 RDI: 0000000000000001 RBP: ffff8880232bed48 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000001 R12: ffff888076ed2140 R13: dffffc0000000000 R14: ffff888078a61340 R15: ffffed100edda444 FS: 00007f38b3b0c6c0(0000) GS:ffff888124753000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f38b3bdf953 CR3: 0000000076d58000 CR4: 00000000003526f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> proc_create_data+0xbe/0x110 fs/proc/generic.c:585 atm_proc_dev_register+0x112/0x1e0 net/atm/proc.c:361 atm_dev_register+0x46d/0x890 net/atm/resources.c:113 atmtcp_create+0x77/0x210 drivers/atm/atmtcp.c:369 atmtcp_attach drivers/atm/atmtcp.c:403 [inline] atmtcp_ioctl+0x2f9/0xd60 drivers/atm/atmtcp.c:464 do_vcc_ioctl+0x12c/0x930 net/atm/ioctl.c:159 sock_do_ioctl+0x115/0x280 net/socket.c:1190 sock_ioctl+0x227/0x6b0 net/socket.c:1311 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:907 [inline] __se_sys_ioctl fs/ioctl.c:893 [inline] __x64_sys_ioctl+0x18b/0x210 fs/ioctl.c:893 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f38b3b74459 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 51 18 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f38b3b0c198 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007f38b3bfe318 RCX: 00007f38b3b74459 RDX: 0000000000000000 RSI: 0000000000006180 RDI: 0000000000000005 RBP: 00007f38b3bfe310 R08: 65732f636f72702f R09: 65732f636f72702f R10: 65732f636f72702f R11: 0000000000000246 R12: 00007f38b3bcb0ac R13: 00007f38b3b0c1a0 R14: 0000200000000200 R15: 00007f38b3bcb03b </TASK> Fixes: `64bf69ddff` ("[ATM]: deregistration removes device from atm_devs list immediately") Reported-by: syzbot+8bd335d2ad3b93e80715@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/685316de.050a0220.216029.0087.GAE@google.com/ Tested-by: syzbot+8bd335d2ad3b93e80715@syzkaller.appspotmail.com Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250624214505.570679-1-kuni1840@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-25 16:43:39 -07:00
Yue Haibing	9b19b50c8d	neighbour: Remove redundant assignment to err 'err' has been checked against 0 in the if statement. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250624014216.3686659-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-25 15:26:03 -07:00
Jakub Kicinski	46837be5af	net: ethtool: rss: add notifications In preparation for RSS_SET handling in ethnl introduce Netlink notifications for RSS. Only cover modifications, not creation and not removal of a context, because the latter may deserve a different notification type. We should cross that bridge when we add the support for context add / remove via Netlink. Link: https://patch.msgid.link/20250623231720.3124717-7-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-25 15:24:14 -07:00
Jakub Kicinski	3073947de3	net: ethtool: copy req_info from SET to NTF Copy information parsed for SET with .req_parse to NTF handling and therefore the GET-equivalent that it ends up executing. This way if the SET was on a sub-object (like RSS context) the notification will also be appropriately scoped. Also copy the phy_index, Maxime suggests this will help PLCA commands generate accurate notifications as well. Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20250623231720.3124717-6-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-25 15:24:14 -07:00
Jakub Kicinski	f9dc3e52d8	net: ethtool: remove the data argument from ethtool_notify() ethtool_notify() takes a const void *data argument, which presumably was intended to pass information from the call site to the subcommand handler. This argument currently has no users. Expecting the data to be subcommand-specific has two complications. Complication #1 is that its not plumbed thru any of the standardized callbacks. It gets propagated to ethnl_default_notify() where it remains unused. Coming from the ethnl_default_set_doit() side we pass in NULL, because how could we have a command specific attribute in a generic handler. Complication #2 is that we expect the ethtool_notify() callers to know what attribute type to pass in. Again, the data pointer is untyped. RSS will need to pass the context ID to the notifications. I think it's a better design if the "subcommand" exports its own typed interface and constructs the appropriate argument struct (which will be req_info). Remove the unused data argument from ethtool_notify() but retain it in a new internal helper which subcommands can use to build a typed interface. Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20250623231720.3124717-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-25 15:24:14 -07:00
Jakub Kicinski	963781bdfe	net: ethtool: call .parse_request for SET handlers In preparation for using req_info to carry parameters between SET and NTF - call .parse_request during ethnl_default_set_doit(). The main question here is whether .parse_request is intended to be GET-specific. Originally the SET handling was delegated to each subcommand directly - ethnl_default_set_doit() and .set callbacks in ethnl_request_ops did not exist. Looking at existing users does not shed much light, all of the following subcommands use .parse_request but have no SET handler (and no NTF): net/ethtool/eeprom.c net/ethtool/rss.c net/ethtool/stats.c net/ethtool/strset.c net/ethtool/tsinfo.c There's only one which does have a SET: net/ethtool/pause.c where .parse_request handling is used to select which statistics to query. Not relevant for SET but also harmless. Going back to RSS (which doesn't have SET today) .parse_request parses the rss_context ID. Using the req_info struct to pass the context ID from SET to NTF will be very useful. Switch to ethnl_default_parse(), effectively adding the .parse_request for SET handlers. Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20250623231720.3124717-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-25 15:24:13 -07:00
Jakub Kicinski	ceca0769e8	net: ethtool: dynamically allocate full req size req In preparation for using req_info to carry parameters between SET and NTF allocate a full request info struct. Since the size depends on the subcommand we need to allocate it on the heap. Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20250623231720.3124717-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-25 15:24:13 -07:00
Jakub Kicinski	ab4eb6a25d	The usual features/cleanups/etc., notably: - rtw88: IBSS mode for SDIO devices - rtw89: - BT coex for MLO/WiFi7 - work on station + P2P concurrency - ath: fix W=2 export.h warnings - ath12k: fix scan on multi-radio devices - cfg80211/mac80211: MLO statistics - mac80211: S1G aggregation - cfg80211/mac80211: per-radio RTS threshold -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAmhb5KsACgkQ10qiO8sP aAC8mQ//RPsycyRQ3XRf3E1OsGV61M1iJJRvdmNtW7GMi6Ztc96xkf1oGHQRH2x1 lQ6PiDGEG9dvtQNvE2rS1vtgsFAa0pWB4+pD2t+mIs3umwxGI8OhgwgSIY1GYTwe Jq9IqsxSFMc38pI4RPVL2TgKJ81W3Ak51DAbCBxP5Lo2vFyFYe6toVc8+88WyPTX +hhY5cw2WSDwWiRQ9BPTgs5+LKRnIVh8xmX0I69bUX8zZ0MxU2z0UGzjVQOjywDK w89jdqwwbMVMIL8Ti2foIfI8UC4N+KwDeMB7HLJhkTEf8B4VScd8aeSFIOtjz/j0 V20fegLfJ31DvsXWI5I2ia7DKDnMRA3Kb1XZzsS5qeY7jTolLMUS9c96fw7SUfVd P3clV5H1kKretnzxKijUbrG2MgHsuIzX3x6GXWtuZEcVyrovh9wSz08B4/DbfPe7 PSqVug+GOnH5YV7RpP9jByvY86MT846O/V+MYi+CkrwgMATRyX0jA6EXUVe26kzT mL3Xr68aSUKLxkLgiCWSdzQw7e+YF56JbpcbJQOq0Z16CIOw1k84r62kBUFzNB3k tG5InbFYAGiNJu2LmhFFZN8grD1TrDYKuzmgBoia48TpI2gwlwcW3kCWfMfty25L IJ4p+SAsTvnYpbLesWQHDZ0bouQCKPouB5NUqVHFNhBCIBMo634= =16yP -----END PGP SIGNATURE----- Merge tag 'wireless-next-2025-06-25' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Johannes Berg says: ==================== The usual features/cleanups/etc., notably: - rtw88: IBSS mode for SDIO devices - rtw89: - BT coex for MLO/WiFi7 - work on station + P2P concurrency - ath: fix W=2 export.h warnings - ath12k: fix scan on multi-radio devices - cfg80211/mac80211: MLO statistics - mac80211: S1G aggregation - cfg80211/mac80211: per-radio RTS threshold * tag 'wireless-next-2025-06-25' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (171 commits) wifi: iwlwifi: dvm: fix potential overflow in rs_fill_link_cmd() iwlwifi: Add missing check for alloc_ordered_workqueue wifi: iwlwifi: Fix memory leak in iwl_mvm_init() iwlwifi: api: delete repeated words iwlwifi: remove unused no_sleep_autoadjust declaration iwlwifi: Fix comment typo iwlwifi: use DECLARE_BITMAP macro iwlwifi: fw: simplify the iwl_fw_dbg_collect_trig() wifi: iwlwifi: mld: ftm: fix switch end indentation MAINTAINERS: update iwlwifi git link wifi: iwlwifi: pcie: fix non-MSIX handshake register wifi: iwlwifi: mld: don't exit EMLSR when we shouldn't wifi: iwlwifi: move _iwl_trans_set_bits_mask utilities wifi: iwlwifi: mld: make iwl_mld_add_all_rekeys void wifi: iwlwifi: move iwl_trans_pcie_write_mem to iwl-trans.c wifi: iwlwifi: pcie: move iwl_trans_pcie_dump_regs() to utils.c wifi: iwlwifi: mld: advertise support for TTLM changes wifi: iwlwifi: mld: Block EMLSR when scanning on P2P Device wifi: iwlwifi: mld: use the correct struct size for tracing wifi: iwlwifi: support RZL platform device ID ... ==================== Link: https://patch.msgid.link/20250625120135.41933-55-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-25 10:28:13 -07:00
Jakub Kicinski	010c40c1f5	Just a few fixes: - iwlegacy: work around large stack with clang/kasan - mac80211: fix integer overflow - mac80211: fix link struct init vs. RCU publish - iwlwifi: fix warning on IFF_UP -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAmhb4zAACgkQ10qiO8sP aAAJMA/+KBq3/uvXAv7Mxtg2YTvmzKMs/Zsk3Uh1yCijsm6K9/yMkcuvyRDs9n+M yobTGwlNKsg+xZmwKM6vT2TwFN4cJhFsz9pL/QQ7M6k4ZARv6MkIgXxhJ5504trv ufsea2iCN2xHQ7Y81SNFjtZwckMPhS1Cgs7OdjOkP8GJPDsTXrdTNSZZh69l3XX1 RG0Fp98RhPnONmzj+ewj3leVMwJlCyAdzqx1B3Hk1tJQojUtkmwg1F+bpAifYBLs arORBtkL4cVYXcYkoBJ+WFLMztooAaqo4Oal8s4zl5/VubjCb30+mBFRQytD9ldD gp2gqY6Y1BAiCZKBxjAwRvG743CBPEBD7tHUDtS4cbPPChjMKWXrakWkLGjTDv50 wYnb9EsPKt03L353vaQ/BHW2m88ebxIYiaSVUY+animRvadpFihzT2fF9P+0/eHi zU2AFlmF0bS74OtjyOVXSSin0pTmBEHDIRfqBdw/szGhMqBdHDq+qPb2H2TJnfZy eec1MS0td8MYN9SV9xPac+4EqUbOHDlQ4fVD3rCn74+wESQp7HUnSJOgobB6cOJg N3tao+n/K8kYxdaCplqAeWMD3bFSYs9oz404K3wOeO85lHz+lB581j6as8zJ86cL nJVLEflnDCQHnxFCEMHeV/1q/GjIU7Xv5+rwDpfMEY3hx0jH1d8= =6koP -----END PGP SIGNATURE----- Merge tag 'wireless-2025-06-25' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== Just a few fixes: - iwlegacy: work around large stack with clang/kasan - mac80211: fix integer overflow - mac80211: fix link struct init vs. RCU publish - iwlwifi: fix warning on IFF_UP * tag 'wireless-2025-06-25' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: mac80211: finish link init before RCU publish wifi: iwlwifi: mvm: assume '1' as the default mac_config_cmd version wifi: mac80211: fix beacon interval calculation overflow wifi: iwlegacy: work around excessive stack usage on clang/kasan ==================== Link: https://patch.msgid.link/20250625115433.41381-3-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-25 10:26:16 -07:00
Lachlan Hodges	a505229624	wifi: mac80211: add support for S1G aggregation Allow an S1G station to use aggregation. Signed-off-by: Sophronia Koilpillai <sophronia.koilpillai@morsemicro.com> Signed-off-by: Lachlan Hodges <lachlan.hodges@morsemicro.com> Link: https://patch.msgid.link/20250617080610.756048-5-lachlan.hodges@morsemicro.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-06-24 15:19:28 +02:00
Lachlan Hodges	037dc18ac3	wifi: mac80211: add support for storing station S1G capabilities When a station configuration is updated, update the stations S1G capabilities. Signed-off-by: Lachlan Hodges <lachlan.hodges@morsemicro.com> Link: https://patch.msgid.link/20250617080610.756048-4-lachlan.hodges@morsemicro.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-06-24 15:19:28 +02:00
Lachlan Hodges	2a8a6b7c4c	wifi: mac80211: handle station association response with S1G Add support for updating the stations S1G capabilities when an S1G association occurs. Signed-off-by: Lachlan Hodges <lachlan.hodges@morsemicro.com> Link: https://patch.msgid.link/20250617080610.756048-3-lachlan.hodges@morsemicro.com [remove unused S1G_CAP3_MAX_MPDU_LEN_3895/_7791] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-06-24 15:19:28 +02:00
Lachlan Hodges	5ea255673c	wifi: cfg80211: support configuration of S1G station capabilities Currently there is no support for initialising a peers S1G capabilities, this patch adds support for configuring an S1G station. Signed-off-by: Lachlan Hodges <lachlan.hodges@morsemicro.com> Link: https://patch.msgid.link/20250617080610.756048-2-lachlan.hodges@morsemicro.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-06-24 15:19:28 +02:00
Roopni Devanathan	407bc77b70	wifi: mac80211: Set RTS threshold on per-radio basis Add support to get the radio for which RTS threshold needs to be changed from userspace. Pass on this radio index to underlying drivers as an additional argument. A value of -1 indicates radio index is not mentioned and that the configuration applies to all radio(s) of the wiphy. Signed-off-by: Roopni Devanathan <quic_rdevanat@quicinc.com> Link: https://patch.msgid.link/20250615082312.619639-5-quic_rdevanat@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-06-24 15:19:27 +02:00
Roopni Devanathan	8959519005	wifi: cfg80211: Report per-radio RTS threshold to userspace In case of multi-radio wiphys, with per-radio RTS threshold brought into use, RTS threshold for each radio in a wiphy can be recorded in wiphy parameter - wiphy_radio_cfg, as an array. Add a new attribute - NL80211_WIPHY_RADIO_ATTR_RTS_THRESHOLD in nested parameter - NL80211_ATTR_WIPHY_RADIOS. When a request for getting RTS threshold for a particular radio is received, parse the radio id and get the required data. Add this data to the newly added nested attribute NL80211_WIPHY_RADIO_ATTR_RTS_THRESHOLD. Add support to report this data to userspace. Signed-off-by: Roopni Devanathan <quic_rdevanat@quicinc.com> Link: https://patch.msgid.link/20250615082312.619639-4-quic_rdevanat@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-06-24 15:19:27 +02:00
Roopni Devanathan	264637941c	wifi: cfg80211: Add Support to Set RTS Threshold for each Radio Currently, setting RTS threshold is based on per-phy basis, i.e., all the radios present in a wiphy will take RTS threshold value to be the one sent from userspace. But each radio in a multi-radio wiphy can have different RTS threshold requirements. To extend support to set RTS threshold for each radio, get the radio for which RTS threshold needs to be changed from the user. Use the attribute in NL - NL80211_ATTR_WIPHY_RADIO_INDEX, to identify the radio of interest. Create a new structure - wiphy_radio_cfg and add rts_threshold in it as a u32 value to store RTS threshold of each radio in a wiphy and allocate memory for it during wiphy register based on the wiphy.n_radio updated by drivers. Pass radio id received from the user to mac80211 drivers along with its corresponding RTS threshold. Signed-off-by: Roopni Devanathan <quic_rdevanat@quicinc.com> Link: https://patch.msgid.link/20250615082312.619639-3-quic_rdevanat@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-06-24 15:19:27 +02:00
Roopni Devanathan	b74947b4f6	wifi: cfg80211/mac80211: Add support to get radio index Currently, per-radio attributes are set on per-phy basis, i.e., all the radios present in a wiphy will take attributes values sent from user. But each radio in a wiphy can get different values from userspace based on its requirement. To extend support to set per-radio attributes, add support to get radio index from userspace. Add an NL attribute - NL80211_ATTR_WIPHY_RADIO_INDEX, to get user specified radio index for which attributes should be changed. Pass this to individual drivers, so that the drivers can use this radio index to change per-radio attributes when necessary. Currently, per-radio attributes identified are: NL80211_ATTR_WIPHY_TX_POWER_LEVEL NL80211_ATTR_WIPHY_ANTENNA_TX NL80211_ATTR_WIPHY_ANTENNA_RX NL80211_ATTR_WIPHY_RETRY_SHORT NL80211_ATTR_WIPHY_RETRY_LONG NL80211_ATTR_WIPHY_FRAG_THRESHOLD NL80211_ATTR_WIPHY_RTS_THRESHOLD NL80211_ATTR_WIPHY_COVERAGE_CLASS NL80211_ATTR_TXQ_LIMIT NL80211_ATTR_TXQ_MEMORY_LIMIT NL80211_ATTR_TXQ_QUANTUM By default, the radio index is set to -1. This means the attribute should be treated as a global configuration. If the user has not specified any index, then the radio index passed to individual drivers would be -1. This would indicate that the attribute applies to all radios in that wiphy. Signed-off-by: Roopni Devanathan <quic_rdevanat@quicinc.com> Link: https://patch.msgid.link/20250615082312.619639-2-quic_rdevanat@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-06-24 15:19:27 +02:00
Sarika Sharma	4cb1ce7e25	wifi: mac80211: add link_sta_statistics ops to fill link station statistics Currently, link station statistics for MLO are filled by mac80211. But there are some statistics that kept by mac80211 might not be accurate, so let the driver pre-fill the link statistics. The driver can fill the values (indicating which field is filled, by setting the filled bitmapin in link_station structure). Statistics that driver don't fill are filled by mac80211. Hence, add link_sta_statistics callback to fill link station statistics for MLO in sta_set_link_sinfo() by drivers. Signed-off-by: Sarika Sharma <quic_sarishar@quicinc.com> Link: https://patch.msgid.link/20250528054420.3050133-11-quic_sarishar@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-06-24 15:19:27 +02:00
Sarika Sharma	5e9129f574	wifi: mac80211: correct RX stats packet increment for multi-link Currently, RX stats packets are incremented for deflink member for non-ML and multi-link(ML) station case. However, for ML station, packets should be incremented based on the specific link. Therefore, if a valid link_id is present, fetch the corresponding link station information and increment the RX packets for that link. For non-MLO stations, the deflink will still be used. Signed-off-by: Sarika Sharma <quic_sarishar@quicinc.com> Link: https://patch.msgid.link/20250528054420.3050133-10-quic_sarishar@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-06-24 15:19:27 +02:00
Sarika Sharma	505991fba9	wifi: mac80211: extend support to fill link level sinfo structure Currently, sinfo structure is supported to fill information at deflink( or one of the links) level for station. This has problems when applied to fetch multi-link(ML) station information. Hence, if valid_links are present, support filling link_station structure for each link. This will be helpful to check the link related statistics during MLO. Additionally, TXQ stats for pertid are applicable at station level not at link level. Therefore check link_id is less then 0, before filling TXQ stats in pertid stats. Signed-off-by: Sarika Sharma <quic_sarishar@quicinc.com> Link: https://patch.msgid.link/20250528054420.3050133-9-quic_sarishar@quicinc.com [fix some indentation] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-06-24 15:19:26 +02:00
Sarika Sharma	8af903e454	wifi: cfg80211: clear sinfo->filled for MLO station statistics Currently, sinfo->filled is for set in sta_set_sinfo() after filling the corresponding fields in station_info structure for station statistics. For non-ML stations, the fields are correctly filled from sta->deflink and corresponding sinfo->filled bit are set, but for MLO any one of link's data is filled and corresponding sinfo->filled bit is set. For MLO before embed NL message, fields of sinfo structure like bytes, packets, signal are updated with accumulated, best, least of all links data. But some of fields like rssi, pertid don't make much sense at MLO level. Hence, to prevent misinterpretation, clear sinfo->filled for fields which don't make much sense at MLO level. This will prevent filling misleading values in NL message. Signed-off-by: Sarika Sharma <quic_sarishar@quicinc.com> Link: https://patch.msgid.link/20250528054420.3050133-8-quic_sarishar@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-06-24 15:19:26 +02:00
Sarika Sharma	80b2fa4679	wifi: mac80211: add support to accumulate removed link statistics Currently, if a link gets removed in between for a station then directly accumulated data will fall down to sum of other active links. This will bring inconsistency in station dump statistics. For instance, let's take Tx packets - at t=0-> link-0:2 link-1:3 Tx packets => accumulated = 5 - at t=1-> link-0:4 link-1:6 Tx packets => accumulated = 10 let say at t=2, link-0 went down => link-0:0 link-1:7 => accumulated = 7 Here, suddenly accumulated Tx packets will come down to 7 from 10. This is showing inconsistency. Therefore, store link-0 data when it went down and add to accumulated Tx packet = 11. Hence, store the removed link statistics data in sta structure and add it in accumulated statistics for consistency. Signed-off-by: Sarika Sharma <quic_sarishar@quicinc.com> Link: https://patch.msgid.link/20250528054420.3050133-7-quic_sarishar@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-06-24 15:19:26 +02:00
Sarika Sharma	49e47223ec	wifi: cfg80211: allocate memory for link_station info structure Currently, station_info structure is passed to fill station statistics from mac80211/drivers. After NL message send to user space for requested station statistics, memory for station statistics is freed in cfg80211. Therefore, memory allocation/free for link station statistics should also happen in cfg80211 only. Hence, allocate the memory for link_station structure for all possible links and free in cfg80211_sinfo_release_content(). Signed-off-by: Sarika Sharma <quic_sarishar@quicinc.com> Link: https://patch.msgid.link/20250528054420.3050133-6-quic_sarishar@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-06-24 15:19:26 +02:00
Sarika Sharma	2d226d41db	wifi: cfg80211: add statistics for providing overview for MLO station Currently statistics are handled at link level for multi-link operation(MLO). There is no provision to check accumulated statistics for a multi-link(ML) station. Other statistics, such as signal, rates, are also managed at the link level only. Statistics such as packets, bytes, signal, rates, etc are useful to provide overall overview for the ML stations. Statistics such as packets, bytes are accumulated statistics at MLO level. However, MLO statistics for rates and signal can not be accumulated since it won't make much sense. Hence, handle other statistics such as signal, rates, etc bit differently at MLO level. The signal could be the best of all links- e.g. if Link 1 has a signal strength of -70 dBm and Link 2 has -65 dBm, the signal for MLO will be -65 dBm. The rate could be determined based on the most recently updated link- e.g. if link 1 has a rate of 300 Mbps and link 2 has a rate of 450 Mbps, the MLO rate can be calculated based on the inactivity of each link. If the inactive time for link 1 is 20 seconds and for link 2 is 10 seconds, the MLO rate will be the most recently updated rate, which is link 2's rate of 450 Mbps. The inactive time, dtim_period and beacon_interval can be taken as the least value of field from link level. Similarly, other MLO level applicable fields are handled and the fields which don't make much sense at MLO level, a subsequent change will handle to embed NL message. Hence, add accumulated and other statistics for MLO station if valid links are present to represent comprehensive overview for the ML stations. Signed-off-by: Sarika Sharma <quic_sarishar@quicinc.com> Link: https://patch.msgid.link/20250528054420.3050133-5-quic_sarishar@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-06-24 15:19:26 +02:00
Sarika Sharma	82d7f841d9	wifi: cfg80211: extend to embed link level statistics in NL message Currently, statistics is supported at deflink( or one of the links) level for station. This has problems when applied to multi-link(ML) connections. Hence, add changes to support link level statistics to embed NL message with link related information if valid links are present. This will be helpful to check the link related statistics during MLO. The statistics will be embedded into NL message as below: For non-ML: cmd-> NL80211_ATTR_IFINDEX NL80211_ATTR_MAC NL80211_ATTR_GENERATION ....etc NL80211_ATTR_STA_INFO \| nested NL80211_STA_INFO_CONNECTED_TIME, NL80211_STA_INFO_STA_FLAGS, NL80211_STA_INFO_RX_BYTES, NL80211_STA_INFO_TX_BYTES, .........etc For MLO: cmd -> NL80211_ATTR_IFINDEX NL80211_ATTR_MAC NL80211_ATTR_GENERATION .......etc NL80211_ATTR_STA_INFO \| nested NL80211_STA_INFO_CONNECTED_TIME, NL80211_STA_INFO_STA_FLAGS, ........etc NL80211_ATTR_MLO_LINK_ID, NL80211_ATTR_MLD_ADDR, NL80211_ATTR_MLO_LINKS \| nested link_id-1 \| nested NL80211_ATTR_MLO_LINK_ID, NL80211_ATTR_MAC, NL80211_ATTR_STA_INFO \| nested NL80211_STA_INFO_RX_BYTES, NL80211_STA_INFO_TX_BYTES, NL80211_STA_INFO_CONNECTED_TIME, ..........etc. link_id-2 \| nested NL80211_ATTR_MLO_LINK_ID, NL80211_ATTR_MAC, NL80211_ATTR_STA_INFO \| nested NL80211_STA_INFO_RX_BYTES, NL80211_STA_INFO_TX_BYTES, NL80211_STA_INFO_CONNECTED_TIME, .........etc Signed-off-by: Sarika Sharma <quic_sarishar@quicinc.com> Link: https://patch.msgid.link/20250528054420.3050133-4-quic_sarishar@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-06-24 15:19:26 +02:00
Sarika Sharma	e581b7fe62	wifi: mac80211: add support towards MLO handling of station statistics Currently, in supporting API's to fill sinfo structure from sta structure, is mapped to fill the fields from sta->deflink. However, for multi-link (ML) station, sinfo structure should be filled from corresponding link_id. Therefore, add link_id as an additional argument in supporting API's for filling sinfo structure correctly. Link_id is set to -1 for non-ML station and corresponding link_id for ML stations. In supporting API's for filling sinfo structure, check for link_id, if link_id < 0, fill the sinfo structure from sta->deflink, otherwise fill from sta->link[link_id]. Current, changes are done at the deflink level i.e, pass -1 as link_id. Actual link_id will be added in subsequent patches to support station statistics for MLO. Signed-off-by: Sarika Sharma <quic_sarishar@quicinc.com> Link: https://patch.msgid.link/20250528054420.3050133-2-quic_sarishar@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-06-24 15:19:26 +02:00
Johannes Berg	d87c3ca0f8	wifi: mac80211: finish link init before RCU publish Since the link/conf pointers can be accessed without any protection other than RCU, make sure the data is actually set up before publishing the structures. Fixes: `b2e8434f18` ("wifi: mac80211: set up/tear down client vif links properly") Link: https://patch.msgid.link/20250624130749.9a308b713c74.I4a80f5eead112a38730939ea591d2e275c721256@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-06-24 15:16:43 +02:00
Paolo Abeni	1fd26729e0	bluetooth pull request for net: - L2CAP: Fix L2CAP MTU negotiation - hci_core: Fix use-after-free in vhci_flush() - btintel_pcie: Fix potential race condition in firmware download - hci_qca: fix unable to load the BT driver -----BEGIN PGP SIGNATURE----- iQJNBAABCgA3FiEE7E6oRXp8w05ovYr/9JCA4xAyCykFAmhZhnIZHGx1aXoudm9u LmRlbnR6QGludGVsLmNvbQAKCRD0kIDjEDILKZX6EACeRMlkRoRe7oUKqfpXw+R+ RLdYvvco2LVSIwaMq664kHQWAbNcSj2okXcbJqPTXjHR1CZhaFGp6Rufgi1kz363 r19Ym4eGL2ZN9JlQwSDJqbWMrXxCo5tB4ymNof3yW+2ZPRurqDJDwCjcnhikfG14 v8ZOQRxDJOJdPSyqtW75qgFNv9Szekg272ZeLjAK/6XSqMpiPzyLsN4TJdij2Wf6 Ie02w368TQkmJMHly37QE24hDkyZ/UuR0lmTyB0bTCAiDwnTya8oWEUPRwa5I1+S 1FubHOBUGlx907bEJXZkBow98sCsChg/PNGqO0dsoJD/GJo4U6lUX1Lb/6qWNL+d T6PcDLMKRrDcY9ZgPAqSq7sYvzPGjaw+JWTN01okr5mVoVEsh8XDLhEQYPKJ8NzU TJy6FHXtZyGuqw21VD9+VbGrOFJMNYUVhUxQZidAaxQbqE7Vgl59Hj08Z6zGad26 hE8srBlGH7PcJN+DSXX9coYP11bSWVgXKLgmXgtPF3jVYQOch+Vxhcz3kIGSIZe1 W5qFfJhurHjdFEop6IRAXOJUeZWL8/YEcQJbFcS3Z2pfGAitY438Y18tvHibcvbc 7c/DMt7586qh1VbQSz0Gf8IeUuXMlAXlVaIYTKTtlURuo1w8+f+cz/PowtSKIjum J1N/Jf+1DzCVPO2tazdd1w== =vb9E -----END PGP SIGNATURE----- Merge tag 'for-net-2025-06-23' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - L2CAP: Fix L2CAP MTU negotiation - hci_core: Fix use-after-free in vhci_flush() - btintel_pcie: Fix potential race condition in firmware download - hci_qca: fix unable to load the BT driver * tag 'for-net-2025-06-23' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: hci_core: Fix use-after-free in vhci_flush() driver: bluetooth: hci_qca:fix unable to load the BT driver Bluetooth: L2CAP: Fix L2CAP MTU negotiation Bluetooth: btintel_pcie: Fix potential race condition in firmware download ==================== Link: https://patch.msgid.link/20250623165405.227619-1-luiz.dentz@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-06-24 12:40:54 +02:00
Ramya Gnanasekar	140c6a61d8	wifi: mac80211: update radar_required in channel context after channel switch Currently, when a non-DFS channel is brought up and the bandwidth is expanded from 80 MHz to 160 MHz, where the primary 80 MHz is non-DFS and the secondary 80 MHz consists of DFS channels, radar detection fails if radar occurs in the secondary 80 MHz. When the channel is switched from 80 MHz to 160 MHz, with the primary 80 MHz being non-DFS and the secondary 80 MHz consisting of DFS channels, the radar required flag in the channel switch parameters is set to true. However, when using a reserved channel context, it is not updated in sdata, which disables radar detection in the secondary 80 MHz DFS channels. Update the radar required flag in sdata to fix this issue when using a reserved channel context. Signed-off-by: Ramya Gnanasekar <ramya.gnanasekar@oss.qualcomm.com> Signed-off-by: Ramasamy Kaliappan <ramasamy.kaliappan@oss.qualcomm.com> Link: https://patch.msgid.link/20250608140324.1687117-1-ramasamy.kaliappan@oss.qualcomm.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-06-24 11:37:10 +02:00
Kuniyuki Iwashima	2a5a484184	af_unix: Don't set -ECONNRESET for consumed OOB skb. Christian Brauner reported that even after MSG_OOB data is consumed, calling close() on the receiver socket causes the peer's recv() to return -ECONNRESET: 1. send() and recv() an OOB data. >>> from socket import * >>> s1, s2 = socketpair(AF_UNIX, SOCK_STREAM) >>> s1.send(b'x', MSG_OOB) 1 >>> s2.recv(1, MSG_OOB) b'x' 2. close() for s2 sets ECONNRESET to s1->sk_err even though s2 consumed the OOB data >>> s2.close() >>> s1.recv(10, MSG_DONTWAIT) ... ConnectionResetError: [Errno 104] Connection reset by peer Even after being consumed, the skb holding the OOB 1-byte data stays in the recv queue to mark the OOB boundary and break recv() at that point. This must be considered while close()ing a socket. Let's skip the leading consumed OOB skb while checking the -ECONNRESET condition in unix_release_sock(). Fixes: `314001f0bf` ("af_unix: Add OOB support") Reported-by: Christian Brauner <brauner@kernel.org> Closes: https://lore.kernel.org/netdev/20250529-sinkt-abfeuern-e7b08200c6b0@brauner/ Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Acked-by: Christian Brauner <brauner@kernel.org> Link: https://patch.msgid.link/20250619041457.1132791-4-kuni1840@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-06-24 10:10:07 +02:00
Kuniyuki Iwashima	32ca245464	af_unix: Don't leave consecutive consumed OOB skbs. Jann Horn reported a use-after-free in unix_stream_read_generic(). The following sequences reproduce the issue: $ python3 from socket import * s1, s2 = socketpair(AF_UNIX, SOCK_STREAM) s1.send(b'x', MSG_OOB) s2.recv(1, MSG_OOB) # leave a consumed OOB skb s1.send(b'y', MSG_OOB) s2.recv(1, MSG_OOB) # leave a consumed OOB skb s1.send(b'z', MSG_OOB) s2.recv(1) # recv 'z' illegally s2.recv(1, MSG_OOB) # access 'z' skb (use-after-free) Even though a user reads OOB data, the skb holding the data stays on the recv queue to mark the OOB boundary and break the next recv(). After the last send() in the scenario above, the sk2's recv queue has 2 leading consumed OOB skbs and 1 real OOB skb. Then, the following happens during the next recv() without MSG_OOB 1. unix_stream_read_generic() peeks the first consumed OOB skb 2. manage_oob() returns the next consumed OOB skb 3. unix_stream_read_generic() fetches the next not-yet-consumed OOB skb 4. unix_stream_read_generic() reads and frees the OOB skb , and the last recv(MSG_OOB) triggers KASAN splat. The 3. above occurs because of the SO_PEEK_OFF code, which does not expect unix_skb_len(skb) to be 0, but this is true for such consumed OOB skbs. while (skip >= unix_skb_len(skb)) { skip -= unix_skb_len(skb); skb = skb_peek_next(skb, &sk->sk_receive_queue); ... } In addition to this use-after-free, there is another issue that ioctl(SIOCATMARK) does not function properly with consecutive consumed OOB skbs. So, nothing good comes out of such a situation. Instead of complicating manage_oob(), ioctl() handling, and the next ECONNRESET fix by introducing a loop for consecutive consumed OOB skbs, let's not leave such consecutive OOB unnecessarily. Now, while receiving an OOB skb in unix_stream_recv_urg(), if its previous skb is a consumed OOB skb, it is freed. [0]: BUG: KASAN: slab-use-after-free in unix_stream_read_actor (net/unix/af_unix.c:3027) Read of size 4 at addr ffff888106ef2904 by task python3/315 CPU: 2 UID: 0 PID: 315 Comm: python3 Not tainted 6.16.0-rc1-00407-gec315832f6f9 #8 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-4.fc42 04/01/2014 Call Trace: <TASK> dump_stack_lvl (lib/dump_stack.c:122) print_report (mm/kasan/report.c:409 mm/kasan/report.c:521) kasan_report (mm/kasan/report.c:636) unix_stream_read_actor (net/unix/af_unix.c:3027) unix_stream_read_generic (net/unix/af_unix.c:2708 net/unix/af_unix.c:2847) unix_stream_recvmsg (net/unix/af_unix.c:3048) sock_recvmsg (net/socket.c:1063 (discriminator 20) net/socket.c:1085 (discriminator 20)) __sys_recvfrom (net/socket.c:2278) __x64_sys_recvfrom (net/socket.c:2291 (discriminator 1) net/socket.c:2287 (discriminator 1) net/socket.c:2287 (discriminator 1)) do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1) arch/x86/entry/syscall_64.c:94 (discriminator 1)) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) RIP: 0033:0x7f8911fcea06 Code: 5d e8 41 8b 93 08 03 00 00 59 5e 48 83 f8 fc 75 19 83 e2 39 83 fa 08 75 11 e8 26 ff ff ff 66 0f 1f 44 00 00 48 8b 45 10 0f 05 <48> 8b 5d f8 c9 c3 0f 1f 40 00 f3 0f 1e fa 55 48 89 e5 48 83 ec 08 RSP: 002b:00007fffdb0dccb0 EFLAGS: 00000202 ORIG_RAX: 000000000000002d RAX: ffffffffffffffda RBX: 00007fffdb0dcdc8 RCX: 00007f8911fcea06 RDX: 0000000000000001 RSI: 00007f8911a5e060 RDI: 0000000000000006 RBP: 00007fffdb0dccd0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000202 R12: 00007f89119a7d20 R13: ffffffffc4653600 R14: 0000000000000000 R15: 0000000000000000 </TASK> Allocated by task 315: kasan_save_stack (mm/kasan/common.c:48) kasan_save_track (mm/kasan/common.c:60 (discriminator 1) mm/kasan/common.c:69 (discriminator 1)) __kasan_slab_alloc (mm/kasan/common.c:348) kmem_cache_alloc_node_noprof (./include/linux/kasan.h:250 mm/slub.c:4148 mm/slub.c:4197 mm/slub.c:4249) __alloc_skb (net/core/skbuff.c:660 (discriminator 4)) alloc_skb_with_frags (./include/linux/skbuff.h:1336 net/core/skbuff.c:6668) sock_alloc_send_pskb (net/core/sock.c:2993) unix_stream_sendmsg (./include/net/sock.h:1847 net/unix/af_unix.c:2256 net/unix/af_unix.c:2418) __sys_sendto (net/socket.c:712 (discriminator 20) net/socket.c:727 (discriminator 20) net/socket.c:2226 (discriminator 20)) __x64_sys_sendto (net/socket.c:2233 (discriminator 1) net/socket.c:2229 (discriminator 1) net/socket.c:2229 (discriminator 1)) do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1) arch/x86/entry/syscall_64.c:94 (discriminator 1)) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) Freed by task 315: kasan_save_stack (mm/kasan/common.c:48) kasan_save_track (mm/kasan/common.c:60 (discriminator 1) mm/kasan/common.c:69 (discriminator 1)) kasan_save_free_info (mm/kasan/generic.c:579 (discriminator 1)) __kasan_slab_free (mm/kasan/common.c:271) kmem_cache_free (mm/slub.c:4643 (discriminator 3) mm/slub.c:4745 (discriminator 3)) unix_stream_read_generic (net/unix/af_unix.c:3010) unix_stream_recvmsg (net/unix/af_unix.c:3048) sock_recvmsg (net/socket.c:1063 (discriminator 20) net/socket.c:1085 (discriminator 20)) __sys_recvfrom (net/socket.c:2278) __x64_sys_recvfrom (net/socket.c:2291 (discriminator 1) net/socket.c:2287 (discriminator 1) net/socket.c:2287 (discriminator 1)) do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1) arch/x86/entry/syscall_64.c:94 (discriminator 1)) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) The buggy address belongs to the object at ffff888106ef28c0 which belongs to the cache skbuff_head_cache of size 224 The buggy address is located 68 bytes inside of freed 224-byte region [ffff888106ef28c0, ffff888106ef29a0) The buggy address belongs to the physical page: page: refcount:0 mapcount:0 mapping:0000000000000000 index:0xffff888106ef3cc0 pfn:0x106ef2 head: order:1 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0 flags: 0x200000000000040(head\|node=0\|zone=2) page_type: f5(slab) raw: 0200000000000040 ffff8881001d28c0 ffffea000422fe00 0000000000000004 raw: ffff888106ef3cc0 0000000080190010 00000000f5000000 0000000000000000 head: 0200000000000040 ffff8881001d28c0 ffffea000422fe00 0000000000000004 head: ffff888106ef3cc0 0000000080190010 00000000f5000000 0000000000000000 head: 0200000000000001 ffffea00041bbc81 00000000ffffffff 00000000ffffffff head: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888106ef2800: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc ffff888106ef2880: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb >ffff888106ef2900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff888106ef2980: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc ffff888106ef2a00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb Fixes: `314001f0bf` ("af_unix: Add OOB support") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Jann Horn <jannh@google.com> Link: https://patch.msgid.link/20250619041457.1132791-2-kuni1840@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-06-24 10:10:06 +02:00
Lachlan Hodges	7a3750ff0f	wifi: mac80211: fix beacon interval calculation overflow As we are converting from TU to usecs, a beacon interval of 100*1024 usecs will lead to integer wrapping. To fix change to use a u32. Fixes: `057d5f4ba1` ("mac80211: sync dtim_count to TSF") Signed-off-by: Lachlan Hodges <lachlan.hodges@morsemicro.com> Link: https://patch.msgid.link/20250621123209.511796-1-lachlan.hodges@morsemicro.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-06-24 09:04:55 +02:00
Ido Schimmel	7544f3f5b0	bridge: mcast: Fix use-after-free during router port configuration The bridge maintains a global list of ports behind which a multicast router resides. The list is consulted during forwarding to ensure multicast packets are forwarded to these ports even if the ports are not member in the matching MDB entry. When per-VLAN multicast snooping is enabled, the per-port multicast context is disabled on each port and the port is removed from the global router port list: # ip link add name br1 up type bridge vlan_filtering 1 mcast_snooping 1 # ip link add name dummy1 up master br1 type dummy # ip link set dev dummy1 type bridge_slave mcast_router 2 $ bridge -d mdb show \| grep router router ports on br1: dummy1 # ip link set dev br1 type bridge mcast_vlan_snooping 1 $ bridge -d mdb show \| grep router However, the port can be re-added to the global list even when per-VLAN multicast snooping is enabled: # ip link set dev dummy1 type bridge_slave mcast_router 0 # ip link set dev dummy1 type bridge_slave mcast_router 2 $ bridge -d mdb show \| grep router router ports on br1: dummy1 Since commit `4b30ae9adb` ("net: bridge: mcast: re-implement br_multicast_{enable, disable}_port functions"), when per-VLAN multicast snooping is enabled, multicast disablement on a port will disable the per-{port, VLAN} multicast contexts and not the per-port one. As a result, a port will remain in the global router port list even after it is deleted. This will lead to a use-after-free [1] when the list is traversed (when adding a new port to the list, for example): # ip link del dev dummy1 # ip link add name dummy2 up master br1 type dummy # ip link set dev dummy2 type bridge_slave mcast_router 2 Similarly, stale entries can also be found in the per-VLAN router port list. When per-VLAN multicast snooping is disabled, the per-{port, VLAN} contexts are disabled on each port and the port is removed from the per-VLAN router port list: # ip link add name br1 up type bridge vlan_filtering 1 mcast_snooping 1 mcast_vlan_snooping 1 # ip link add name dummy1 up master br1 type dummy # bridge vlan add vid 2 dev dummy1 # bridge vlan global set vid 2 dev br1 mcast_snooping 1 # bridge vlan set vid 2 dev dummy1 mcast_router 2 $ bridge vlan global show dev br1 vid 2 \| grep router router ports: dummy1 # ip link set dev br1 type bridge mcast_vlan_snooping 0 $ bridge vlan global show dev br1 vid 2 \| grep router However, the port can be re-added to the per-VLAN list even when per-VLAN multicast snooping is disabled: # bridge vlan set vid 2 dev dummy1 mcast_router 0 # bridge vlan set vid 2 dev dummy1 mcast_router 2 $ bridge vlan global show dev br1 vid 2 \| grep router router ports: dummy1 When the VLAN is deleted from the port, the per-{port, VLAN} multicast context will not be disabled since multicast snooping is not enabled on the VLAN. As a result, the port will remain in the per-VLAN router port list even after it is no longer member in the VLAN. This will lead to a use-after-free [2] when the list is traversed (when adding a new port to the list, for example): # ip link add name dummy2 up master br1 type dummy # bridge vlan add vid 2 dev dummy2 # bridge vlan del vid 2 dev dummy1 # bridge vlan set vid 2 dev dummy2 mcast_router 2 Fix these issues by removing the port from the relevant (global or per-VLAN) router port list in br_multicast_port_ctx_deinit(). The function is invoked during port deletion with the per-port multicast context and during VLAN deletion with the per-{port, VLAN} multicast context. Note that deleting the multicast router timer is not enough as it only takes care of the temporary multicast router states (1 or 3) and not the permanent one (2). [1] BUG: KASAN: slab-out-of-bounds in br_multicast_add_router.part.0+0x3f1/0x560 Write of size 8 at addr ffff888004a67328 by task ip/384 [...] Call Trace: <TASK> dump_stack_lvl+0x6f/0xa0 print_address_description.constprop.0+0x6f/0x350 print_report+0x108/0x205 kasan_report+0xdf/0x110 br_multicast_add_router.part.0+0x3f1/0x560 br_multicast_set_port_router+0x74e/0xac0 br_setport+0xa55/0x1870 br_port_slave_changelink+0x95/0x120 __rtnl_newlink+0x5e8/0xa40 rtnl_newlink+0x627/0xb00 rtnetlink_rcv_msg+0x6fb/0xb70 netlink_rcv_skb+0x11f/0x350 netlink_unicast+0x426/0x710 netlink_sendmsg+0x75a/0xc20 __sock_sendmsg+0xc1/0x150 ____sys_sendmsg+0x5aa/0x7b0 ___sys_sendmsg+0xfc/0x180 __sys_sendmsg+0x124/0x1c0 do_syscall_64+0xbb/0x360 entry_SYSCALL_64_after_hwframe+0x4b/0x53 [2] BUG: KASAN: slab-use-after-free in br_multicast_add_router.part.0+0x378/0x560 Read of size 8 at addr ffff888009f00840 by task bridge/391 [...] Call Trace: <TASK> dump_stack_lvl+0x6f/0xa0 print_address_description.constprop.0+0x6f/0x350 print_report+0x108/0x205 kasan_report+0xdf/0x110 br_multicast_add_router.part.0+0x378/0x560 br_multicast_set_port_router+0x6f9/0xac0 br_vlan_process_options+0x8b6/0x1430 br_vlan_rtm_process_one+0x605/0xa30 br_vlan_rtm_process+0x396/0x4c0 rtnetlink_rcv_msg+0x2f7/0xb70 netlink_rcv_skb+0x11f/0x350 netlink_unicast+0x426/0x710 netlink_sendmsg+0x75a/0xc20 __sock_sendmsg+0xc1/0x150 ____sys_sendmsg+0x5aa/0x7b0 ___sys_sendmsg+0xfc/0x180 __sys_sendmsg+0x124/0x1c0 do_syscall_64+0xbb/0x360 entry_SYSCALL_64_after_hwframe+0x4b/0x53 Fixes: `2796d846d7` ("net: bridge: vlan: convert mcast router global option to per-vlan entry") Fixes: `4b30ae9adb` ("net: bridge: mcast: re-implement br_multicast_{enable, disable}_port functions") Reported-by: syzbot+7bfa4b72c6a5da128d32@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/684c18bd.a00a0220.279073.000b.GAE@google.com/T/ Signed-off-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250619182228.1656906-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-23 18:19:10 -07:00
Eric Dumazet	935b67675a	net: make sk->sk_rcvtimeo lockless Followup of commit `285975dd67` ("net: annotate data-races around sk->sk_{rcv\|snd}timeo"). Remove lock_sock()/release_sock() from ksmbd_tcp_rcv_timeout() and add READ_ONCE()/WRITE_ONCE() where it is needed. Also SO_RCVTIMEO_OLD and SO_RCVTIMEO_NEW can call sock_set_timeout() without holding the socket lock. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250620155536.335520-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-23 17:05:12 -07:00
Eric Dumazet	3169e36ae1	net: make sk->sk_sndtimeo lockless Followup of commit `285975dd67` ("net: annotate data-races around sk->sk_{rcv\|snd}timeo"). Remove lock_sock()/release_sock() from sock_set_sndtimeo(), and add READ_ONCE()/WRITE_ONCE() where it is needed. Also SO_SNDTIMEO_OLD and SO_SNDTIMEO_NEW can call sock_set_timeout() without holding the socket lock. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250620155536.335520-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-23 17:05:11 -07:00
Eric Dumazet	c51da3f7a1	net: remove sock_i_uid() Difference between sock_i_uid() and sk_uid() is that after sock_orphan(), sock_i_uid() returns GLOBAL_ROOT_UID while sk_uid() returns the last cached sk->sk_uid value. None of sock_i_uid() callers care about this. Use sk_uid() which is much faster and inlined. Note that diag/dump users are calling sock_i_ino() and can not see the full benefit yet. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Lorenzo Colitti <lorenzo@google.com> Reviewed-by: Maciej Żenczykowski <maze@google.com> Link: https://patch.msgid.link/20250620133001.4090592-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-23 17:04:03 -07:00
Eric Dumazet	e84a4927a4	net: annotate races around sk->sk_uid sk->sk_uid can be read while another thread changes its value in sockfs_setattr(). Add sk_uid(const struct sock *sk) helper to factorize the needed READ_ONCE() annotations, and add corresponding WRITE_ONCE() where needed. Fixes: `86741ec254` ("net: core: Add a UID field to struct sock.") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Lorenzo Colitti <lorenzo@google.com> Reviewed-by: Maciej Żenczykowski <maze@google.com> Link: https://patch.msgid.link/20250620133001.4090592-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-23 17:04:03 -07:00
Arnd Bergmann	b630c781bc	caif: reduce stack size, again I tried to fix the stack usage in this function a couple of years ago, but there is still a problem with the latest gcc versions in some configurations: net/caif/cfctrl.c:553:1: error: the frame size of 1296 bytes is larger than 1280 bytes [-Werror=frame-larger-than=] Reduce this once again, with a separate cfctrl_link_setup() function that holds the bulk of all the local variables. It also turns out that the param[] array that takes up a large portion of the stack is write-only and can be left out here. Fixes: `ce6289661b` ("caif: reduce stack size with KASAN") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://patch.msgid.link/20250620112244.3425554-1-arnd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-23 16:58:43 -07:00
Pranav Tyagi	b04202d606	net/sched: replace strncpy with strscpy Replace the deprecated strncpy() with the two-argument version of strscpy() as the destination is an array and buffer should be NUL-terminated. Signed-off-by: Pranav Tyagi <pranav.tyagi03@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250620103653.6957-1-pranav.tyagi03@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-23 16:56:51 -07:00
Pranav Tyagi	ae2402bf88	net/smc: replace strncpy with strscpy Replace the deprecated strncpy() with two-argument version of strscpy() as the destination is an array and should be NUL-terminated. Signed-off-by: Pranav Tyagi <pranav.tyagi03@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250620102559.6365-1-pranav.tyagi03@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-23 16:56:21 -07:00
Breno Leitao	f599020702	net: netpoll: Initialize UDP checksum field before checksumming commit `f1fce08e63` ("netpoll: Eliminate redundant assignment") removed the initialization of the UDP checksum, which was wrong and broke netpoll IPv6 transmission due to bad checksumming. udph->check needs to be set before calling csum_ipv6_magic(). Fixes: `f1fce08e63` ("netpoll: Eliminate redundant assignment") Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250620-netpoll_fix-v1-1-f9f0b82bc059@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-23 13:14:51 -07:00
Kory Maincent	96c16c59b7	ethtool: pse-pd: Add missing linux/export.h include Fix missing linux/export.h header include in net/ethtool/pse-pd.c to resolve build warning reported by the kernel test robot. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202506200024.T3O0FWeR-lkp@intel.com/ Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Link: https://patch.msgid.link/20250619162547.1989468-1-kory.maincent@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-23 13:13:01 -07:00
Nikhil Jha	9c19b3315c	sunrpc: fix loop in gss seqno cache There was a silly bug in the initial implementation where a loop variable was not incremented. This commit increments the loop variable. This bug is somewhat tricky to catch because it can only happen on loops of two or more. If it is hit, it locks up a kernel thread in an infinite loop. Signed-off-by: Nikhil Jha <njha@janestreet.com> Tested-by: Nikhil Jha <njha@janestreet.com> Fixes: `08d6ee6d8a` ("sunrpc: implement rfc2203 rpcsec_gss seqnum cache") Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2025-06-23 11:01:15 -04:00
Jens Axboe	5be5726e1a	Merge branch 'timestamp-for-jens' of https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next into for-6.17/io_uring Pull networking side timestamp prep patch from Jakub. * 'timestamp-for-jens' of https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: net: timestamp: add helper returning skb's tx tstamp	2025-06-23 08:59:38 -06:00
Kuniyuki Iwashima	1d6123102e	Bluetooth: hci_core: Fix use-after-free in vhci_flush() syzbot reported use-after-free in vhci_flush() without repro. [0] From the splat, a thread close()d a vhci file descriptor while its device was being used by iotcl() on another thread. Once the last fd refcnt is released, vhci_release() calls hci_unregister_dev(), hci_free_dev(), and kfree() for struct vhci_data, which is set to hci_dev->dev->driver_data. The problem is that there is no synchronisation after unlinking hdev from hci_dev_list in hci_unregister_dev(). There might be another thread still accessing the hdev which was fetched before the unlink operation. We can use SRCU for such synchronisation. Let's run hci_dev_reset() under SRCU and wait for its completion in hci_unregister_dev(). Another option would be to restore hci_dev->destruct(), which was removed in commit `587ae086f6` ("Bluetooth: Remove unused hci-destruct cb"). However, this would not be a good solution, as we should not run hci_unregister_dev() while there are in-flight ioctl() requests, which could lead to another data-race KCSAN splat. Note that other drivers seem to have the same problem, for exmaple, virtbt_remove(). [0]: BUG: KASAN: slab-use-after-free in skb_queue_empty_lockless include/linux/skbuff.h:1891 [inline] BUG: KASAN: slab-use-after-free in skb_queue_purge_reason+0x99/0x360 net/core/skbuff.c:3937 Read of size 8 at addr ffff88807cb8d858 by task syz.1.219/6718 CPU: 1 UID: 0 PID: 6718 Comm: syz.1.219 Not tainted 6.16.0-rc1-syzkaller-00196-g08207f42d3ff #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025 Call Trace: <TASK> dump_stack_lvl+0x189/0x250 lib/dump_stack.c:120 print_address_description mm/kasan/report.c:408 [inline] print_report+0xd2/0x2b0 mm/kasan/report.c:521 kasan_report+0x118/0x150 mm/kasan/report.c:634 skb_queue_empty_lockless include/linux/skbuff.h:1891 [inline] skb_queue_purge_reason+0x99/0x360 net/core/skbuff.c:3937 skb_queue_purge include/linux/skbuff.h:3368 [inline] vhci_flush+0x44/0x50 drivers/bluetooth/hci_vhci.c:69 hci_dev_do_reset net/bluetooth/hci_core.c:552 [inline] hci_dev_reset+0x420/0x5c0 net/bluetooth/hci_core.c:592 sock_do_ioctl+0xd9/0x300 net/socket.c:1190 sock_ioctl+0x576/0x790 net/socket.c:1311 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:907 [inline] __se_sys_ioctl+0xf9/0x170 fs/ioctl.c:893 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7fcf5b98e929 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fcf5c7b9038 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007fcf5bbb6160 RCX: 00007fcf5b98e929 RDX: 0000000000000000 RSI: 00000000400448cb RDI: 0000000000000009 RBP: 00007fcf5ba10b39 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000000 R14: 00007fcf5bbb6160 R15: 00007ffd6353d528 </TASK> Allocated by task 6535: kasan_save_stack mm/kasan/common.c:47 [inline] kasan_save_track+0x3e/0x80 mm/kasan/common.c:68 poison_kmalloc_redzone mm/kasan/common.c:377 [inline] __kasan_kmalloc+0x93/0xb0 mm/kasan/common.c:394 kasan_kmalloc include/linux/kasan.h:260 [inline] __kmalloc_cache_noprof+0x230/0x3d0 mm/slub.c:4359 kmalloc_noprof include/linux/slab.h:905 [inline] kzalloc_noprof include/linux/slab.h:1039 [inline] vhci_open+0x57/0x360 drivers/bluetooth/hci_vhci.c:635 misc_open+0x2bc/0x330 drivers/char/misc.c:161 chrdev_open+0x4c9/0x5e0 fs/char_dev.c:414 do_dentry_open+0xdf0/0x1970 fs/open.c:964 vfs_open+0x3b/0x340 fs/open.c:1094 do_open fs/namei.c:3887 [inline] path_openat+0x2ee5/0x3830 fs/namei.c:4046 do_filp_open+0x1fa/0x410 fs/namei.c:4073 do_sys_openat2+0x121/0x1c0 fs/open.c:1437 do_sys_open fs/open.c:1452 [inline] __do_sys_openat fs/open.c:1468 [inline] __se_sys_openat fs/open.c:1463 [inline] __x64_sys_openat+0x138/0x170 fs/open.c:1463 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f Freed by task 6535: kasan_save_stack mm/kasan/common.c:47 [inline] kasan_save_track+0x3e/0x80 mm/kasan/common.c:68 kasan_save_free_info+0x46/0x50 mm/kasan/generic.c:576 poison_slab_object mm/kasan/common.c:247 [inline] __kasan_slab_free+0x62/0x70 mm/kasan/common.c:264 kasan_slab_free include/linux/kasan.h:233 [inline] slab_free_hook mm/slub.c:2381 [inline] slab_free mm/slub.c:4643 [inline] kfree+0x18e/0x440 mm/slub.c:4842 vhci_release+0xbc/0xd0 drivers/bluetooth/hci_vhci.c:671 __fput+0x44c/0xa70 fs/file_table.c:465 task_work_run+0x1d1/0x260 kernel/task_work.c:227 exit_task_work include/linux/task_work.h:40 [inline] do_exit+0x6ad/0x22e0 kernel/exit.c:955 do_group_exit+0x21c/0x2d0 kernel/exit.c:1104 __do_sys_exit_group kernel/exit.c:1115 [inline] __se_sys_exit_group kernel/exit.c:1113 [inline] __x64_sys_exit_group+0x3f/0x40 kernel/exit.c:1113 x64_sys_call+0x21ba/0x21c0 arch/x86/include/generated/asm/syscalls_64.h:232 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f The buggy address belongs to the object at ffff88807cb8d800 which belongs to the cache kmalloc-1k of size 1024 The buggy address is located 88 bytes inside of freed 1024-byte region [ffff88807cb8d800, ffff88807cb8dc00) Fixes: `bf18c7118c` ("Bluetooth: vhci: Free driver_data on file release") Reported-by: syzbot+2faa4825e556199361f9@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=f62d64848fc4c7c30cd6 Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Acked-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2025-06-23 10:59:29 -04:00
Greg Kroah-Hartman	63dafeb392	Merge 6.16-rc3 into driver-core-next We need the driver-core fixes that are in 6.16-rc3 into here as well to build on top of. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-06-23 07:53:36 +02:00
Eric Dumazet	b993ea46b3	atm: clip: prevent NULL deref in clip_push() Blamed commit missed that vcc_destroy_socket() calls clip_push() with a NULL skb. If clip_devs is NULL, clip_push() then crashes when reading skb->truesize. Fixes: `93a2014afb` ("atm: fix a UAF in lec_arp_clear_vccs()") Reported-by: syzbot+1316233c4c6803382a8b@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/68556f59.a00a0220.137b3.004e.GAE@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Gengming Liu <l.dmxcsnsbh@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2025-06-22 19:31:14 +01:00
Linus Torvalds	739a6c93cc	nfsd-6.16 fixes: - Two fixes for commits in the nfsd-6.16 merge - One fix for the recently-added NFSD netlink facility - One fix for a remote SunRPC crasher -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmhWykQACgkQM2qzM29m f5cfIA/+MtqpF8iHCNWBk2mmgQHtfS78qhQexEsQ69F1k90+bzCJ9Mu7AZT67xRZ 2E6Fm3gyJuFPaJ9YOGewyD6OV/eE0SioPQteZIIwUVclKcoyVtqJpc57A/aQfMJm Te1bPWYrc5jfxoF/QzkOjzd4KWXOej2nIqsybFIDK5uTe7gn9QpPiFL8ePRuf7bJ 5oWBTZPpmuWpDcebQrz56OT0bvvrCNH9EV9TZNYwVa/B7yl3UNHwM2YB5O9m2Rl7 GUeWytc2Y7WgdDJe9poPwNrYyXFb8JX4pCbS54q5wCRLYLDgo8PsgsFCAFfgjx0l ArATKDhzfYK0azASDtJma2eCJkrm4/nXRNFON7xucN79pYRh9GChcmhhfubb8Yo3 HopUaCSLOgy7PNsbbz3lNREn5uBpomeK1Mb88/UkJ/d8fmKGIO1ITTlarazYv2w5 GgmzZSBS43VdUmipqNyddIMg2uy7uaeAQZpr4jHVFVctPypjXCn/zIrhG2mmjRhW M7PCJqtFj0a7DM48LA6r9K8N4Yi7EoSQKNAKfwduE++XXYWgBk9blaF/7om/fIRJ VJm/RfiuFuyGaxCLd7WpPpASeS/nq410BMV5/kxol+UgiCxk8xjsvU7y8OWLOl7S rUend5Yei7/h0NGoSmQPnBoYnb1YuTeMMzjfaRWSsV5gbvVDejQ= =1C8u -----END PGP SIGNATURE----- Merge tag 'nfsd-6.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: - Two fixes for commits in the nfsd-6.16 merge - One fix for the recently-added NFSD netlink facility - One fix for a remote SunRPC crasher * tag 'nfsd-6.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: sunrpc: handle SVC_GARBAGE during svc auth processing as auth error nfsd: use threads array as-is in netlink interface SUNRPC: Cleanup/fix initial rq_pages allocation NFSD: Avoid corruption of a referring call list	2025-06-21 09:20:15 -07:00
Wang Liang	091d019adc	net/smc: remove unused function smc_lo_supports_v2 The smcd_ops->supports_v2 is only called in smcd_register_dev(), which calls function smcd_supports_v2 for ism. For loopback-ism, function smc_lo_supports_v2 is unused, remove it. Signed-off-by: Wang Liang <wangliang74@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250619030854.1536676-1-wangliang74@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-21 08:26:42 -07:00
Jakub Kicinski	72792461c8	net: ethtool: don't mux RXFH via rxnfc callbacks All drivers have been converted. Stop using the rxnfc fallbacks for Rx Flow Hashing configuration. Joe pointed out in earlier review that in ethtool_set_rxfh() we need both .get_rxnfc and .get_rxfh_fields, because we need both the ring count and flow hashing (because we call ethtool_check_flow_types()). IOW the existing check added for transitioning drivers was buggy. Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250618203823.1336156-11-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-21 07:55:00 -07:00
Simon Horman	433dce0692	rds: Correct spelling Correct spelling as flagged by codespell. With these changes in place codespell only flags false positives in net/rds. Signed-off-by: Simon Horman <horms@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Link: https://patch.msgid.link/20250619-rds-minor-v1-2-86d2ee3a98b9@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-21 07:35:39 -07:00
Simon Horman	6e307a873d	rds: Correct endian annotation of port and addr assignments Correct the endianness annotation of port assignments: A host byte order value (RDS_TCP_PORT) is correctly converted to network byte order (big endian) using htons. But it is then cast back to host byte order before assigning to a variable that expects a big endian value. Address this by dropping the cast. This is not a bug because, while the endian annotation is changed by this patch, the assigned value is unchanged. Also correct the endianness of address assignment. A host byte order value (INADDR_ANY) is incorrectly assigned as-is to a variable that expects a big endian value. Address this by converting the value to network byte order (big endian). This is not a bug because INADDR_ANY is 0, which is isomorphic with regards to endian conversions. IOW, while the endian annotation is changed by this patch, the assigned value is unchanged. Incorrect endian annotations appear to date back to IPv4-only code added by commit `70041088e3` ("RDS: Add TCP transport to RDS"). Flagged by Sparse. Signed-off-by: Simon Horman <horms@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Link: https://patch.msgid.link/20250619-rds-minor-v1-1-86d2ee3a98b9@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-21 07:35:38 -07:00
Frédéric Danis	042bb9603c	Bluetooth: L2CAP: Fix L2CAP MTU negotiation OBEX download from iPhone is currently slow due to small packet size used to transfer data which doesn't follow the MTU negotiated during L2CAP connection, i.e. 672 bytes instead of 32767: < ACL Data TX: Handle 11 flags 0x00 dlen 12 L2CAP: Connection Request (0x02) ident 18 len 4 PSM: 4103 (0x1007) Source CID: 72 > ACL Data RX: Handle 11 flags 0x02 dlen 16 L2CAP: Connection Response (0x03) ident 18 len 8 Destination CID: 14608 Source CID: 72 Result: Connection successful (0x0000) Status: No further information available (0x0000) < ACL Data TX: Handle 11 flags 0x00 dlen 27 L2CAP: Configure Request (0x04) ident 20 len 19 Destination CID: 14608 Flags: 0x0000 Option: Maximum Transmission Unit (0x01) [mandatory] MTU: 32767 Option: Retransmission and Flow Control (0x04) [mandatory] Mode: Enhanced Retransmission (0x03) TX window size: 63 Max transmit: 3 Retransmission timeout: 2000 Monitor timeout: 12000 Maximum PDU size: 1009 > ACL Data RX: Handle 11 flags 0x02 dlen 26 L2CAP: Configure Request (0x04) ident 72 len 18 Destination CID: 72 Flags: 0x0000 Option: Retransmission and Flow Control (0x04) [mandatory] Mode: Enhanced Retransmission (0x03) TX window size: 32 Max transmit: 255 Retransmission timeout: 0 Monitor timeout: 0 Maximum PDU size: 65527 Option: Frame Check Sequence (0x05) [mandatory] FCS: 16-bit FCS (0x01) < ACL Data TX: Handle 11 flags 0x00 dlen 29 L2CAP: Configure Response (0x05) ident 72 len 21 Source CID: 14608 Flags: 0x0000 Result: Success (0x0000) Option: Maximum Transmission Unit (0x01) [mandatory] MTU: 672 Option: Retransmission and Flow Control (0x04) [mandatory] Mode: Enhanced Retransmission (0x03) TX window size: 32 Max transmit: 255 Retransmission timeout: 2000 Monitor timeout: 12000 Maximum PDU size: 1009 > ACL Data RX: Handle 11 flags 0x02 dlen 32 L2CAP: Configure Response (0x05) ident 20 len 24 Source CID: 72 Flags: 0x0000 Result: Success (0x0000) Option: Maximum Transmission Unit (0x01) [mandatory] MTU: 32767 Option: Retransmission and Flow Control (0x04) [mandatory] Mode: Enhanced Retransmission (0x03) TX window size: 63 Max transmit: 3 Retransmission timeout: 2000 Monitor timeout: 12000 Maximum PDU size: 1009 Option: Frame Check Sequence (0x05) [mandatory] FCS: 16-bit FCS (0x01) ... > ACL Data RX: Handle 11 flags 0x02 dlen 680 Channel: 72 len 676 ctrl 0x0202 [PSM 4103 mode Enhanced Retransmission (0x03)] {chan 8} I-frame: Unsegmented TxSeq 1 ReqSeq 2 < ACL Data TX: Handle 11 flags 0x00 dlen 13 Channel: 14608 len 9 ctrl 0x0204 [PSM 4103 mode Enhanced Retransmission (0x03)] {chan 8} I-frame: Unsegmented TxSeq 2 ReqSeq 2 > ACL Data RX: Handle 11 flags 0x02 dlen 680 Channel: 72 len 676 ctrl 0x0304 [PSM 4103 mode Enhanced Retransmission (0x03)] {chan 8} I-frame: Unsegmented TxSeq 2 ReqSeq 3 The MTUs are negotiated for each direction. In this traces 32767 for iPhone->localhost and no MTU for localhost->iPhone, which based on '4.4 L2CAP_CONFIGURATION_REQ' (Core specification v5.4, Vol. 3, Part A): The only parameters that should be included in the L2CAP_CONFIGURATION_REQ packet are those that require different values than the default or previously agreed values. ... Any missing configuration parameters are assumed to have their most recently explicitly or implicitly accepted values. and '5.1 Maximum transmission unit (MTU)': If the remote device sends a positive L2CAP_CONFIGURATION_RSP packet it should include the actual MTU to be used on this channel for traffic flowing into the local device. ... The default value is 672 octets. is set by BlueZ to 672 bytes. It seems that the iPhone used the lowest negotiated value to transfer data to the localhost instead of the negotiated one for the incoming direction. This could be fixed by using the MTU negotiated for the other direction, if exists, in the L2CAP_CONFIGURATION_RSP. This allows to use segmented packets as in the following traces: < ACL Data TX: Handle 11 flags 0x00 dlen 12 L2CAP: Connection Request (0x02) ident 22 len 4 PSM: 4103 (0x1007) Source CID: 72 < ACL Data TX: Handle 11 flags 0x00 dlen 27 L2CAP: Configure Request (0x04) ident 24 len 19 Destination CID: 2832 Flags: 0x0000 Option: Maximum Transmission Unit (0x01) [mandatory] MTU: 32767 Option: Retransmission and Flow Control (0x04) [mandatory] Mode: Enhanced Retransmission (0x03) TX window size: 63 Max transmit: 3 Retransmission timeout: 2000 Monitor timeout: 12000 Maximum PDU size: 1009 > ACL Data RX: Handle 11 flags 0x02 dlen 26 L2CAP: Configure Request (0x04) ident 15 len 18 Destination CID: 72 Flags: 0x0000 Option: Retransmission and Flow Control (0x04) [mandatory] Mode: Enhanced Retransmission (0x03) TX window size: 32 Max transmit: 255 Retransmission timeout: 0 Monitor timeout: 0 Maximum PDU size: 65527 Option: Frame Check Sequence (0x05) [mandatory] FCS: 16-bit FCS (0x01) < ACL Data TX: Handle 11 flags 0x00 dlen 29 L2CAP: Configure Response (0x05) ident 15 len 21 Source CID: 2832 Flags: 0x0000 Result: Success (0x0000) Option: Maximum Transmission Unit (0x01) [mandatory] MTU: 32767 Option: Retransmission and Flow Control (0x04) [mandatory] Mode: Enhanced Retransmission (0x03) TX window size: 32 Max transmit: 255 Retransmission timeout: 2000 Monitor timeout: 12000 Maximum PDU size: 1009 > ACL Data RX: Handle 11 flags 0x02 dlen 32 L2CAP: Configure Response (0x05) ident 24 len 24 Source CID: 72 Flags: 0x0000 Result: Success (0x0000) Option: Maximum Transmission Unit (0x01) [mandatory] MTU: 32767 Option: Retransmission and Flow Control (0x04) [mandatory] Mode: Enhanced Retransmission (0x03) TX window size: 63 Max transmit: 3 Retransmission timeout: 2000 Monitor timeout: 12000 Maximum PDU size: 1009 Option: Frame Check Sequence (0x05) [mandatory] FCS: 16-bit FCS (0x01) ... > ACL Data RX: Handle 11 flags 0x02 dlen 1009 Channel: 72 len 1005 ctrl 0x4202 [PSM 4103 mode Enhanced Retransmission (0x03)] {chan 8} I-frame: Start (len 21884) TxSeq 1 ReqSeq 2 > ACL Data RX: Handle 11 flags 0x02 dlen 1009 Channel: 72 len 1005 ctrl 0xc204 [PSM 4103 mode Enhanced Retransmission (0x03)] {chan 8} I-frame: Continuation TxSeq 2 ReqSeq 2 This has been tested with kernel 5.4 and BlueZ 5.77. Cc: stable@vger.kernel.org Signed-off-by: Frédéric Danis <frederic.danis@collabora.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2025-06-20 11:54:48 -04:00
Kavita Kavita	7c598c653a	wifi: cfg80211: Add support for link reconfiguration negotiation offload to driver In the case of SME-in-driver, the driver can internally choose to update the links based on the AP MLD recommendation and do link reconfiguration negotiation with AP MLD. (e.g., After the driver processing the BSS Transition Management request frame received from the AP MLD with Neighbor Report containing Multi-Link element with recommended links information chooses to do link reconfiguration negotiation with AP MLD). To support this, extend cfg80211_mlo_reconf_add_done() and NL80211_CMD_ASSOC_MLO_RECONF to indicate added links information for driver-initiated link reconfiguration requests. For removed links, the driver indicates links information using the NL80211_CMD_LINKS_REMOVED event for driver-initiated cases, the same as supplicant initiated cases. For the driver-initiated case, cfg80211 will receive link reconfiguration result asynchronously from driver so holding BSSes of the accepted add links is needed in the event path. Also, no need of unhold call for the rejected add link BSSes since there was no hold call happened previously. Once the supplicant receives the NL80211_CMD_ASSOC_MLO_RECONF event, it needs to process the information about newly added links and install per-link group keys (e.g., GTK/IGTK/BIGTK etc.). In case of the SME-in-driver, using a vendor interface etc. to notify the supplicant to initiate a link reconfiguration request and then supplicant sending command to the cfg80211 can lead to race conditions. The correct design to avoid this is that the driver indicates the cfg80211 directly with the results of the link reconfiguration negotiation. Signed-off-by: Kavita Kavita <quic_kkavita@quicinc.com> Link: https://patch.msgid.link/20250604105757.2542-3-quic_kkavita@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-06-20 10:46:31 +02:00
Rameshkumar Sundaram	2eb7c1baf4	wifi: mac80211: Fix bssid_indicator for MBSSID in AP mode Currently, in ieee80211_assign_beacon() mbssid count is updated as link's bssid_indicator. mbssid count is the total number of MBSSID elements in the beacon instead of Max BSSID indicator of the Multiple BSS set. This will result in drivers obtaining an invalid bssid_indicator for BSSes in a Multiple BSS set. Fix this by updating link's bssid_indicator from MBSSID element for Transmitting BSS and update the same for all of its Non-Transmitting BSSes. Fixes: `dde78aa520` ("mac80211: update bssid_indicator in ieee80211_assign_beacon") Signed-off-by: Rameshkumar Sundaram <rameshkumar.sundaram@oss.qualcomm.com> Link: https://patch.msgid.link/20250530040940.3188537-1-rameshkumar.sundaram@oss.qualcomm.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-06-20 10:45:19 +02:00
Raj Kumar Bhagat	c9172fae4b	wifi: mac80211: Allow scan on a radio while operating on DFS on another radio Currently, in multi-radio wiphy cases, if one radio is operating on a DFS channel, -EBUSY is returned even when a scan is requested on a different radio. Because of this, an MLD AP with one radio (link) on a DFS channel and Automatic Channel Selection (ACS) on another radio (link) cannot be brought up. In multi-radio wiphy cases, multiple radios are grouped under a single wiphy. Hence, if a radio is operating on a DFS channel and a scan is requested on a different radio of the same wiphy, the scan can be allowed simultaneously without impacting the DFS operations. Add logic to check the underlying radio used for the requested scan. If the radio on which DFS is already running is not being used, allow the scan operation; otherwise, return -EBUSY. Signed-off-by: Raj Kumar Bhagat <quic_rajkbhag@quicinc.com> Link: https://patch.msgid.link/20250527-mlo-dfs-acs-v2-3-92c2f37c81d9@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-06-20 10:44:05 +02:00
Aditya Kumar Singh	fe8582dbb4	wifi: mac80211: Allow DFS/CSA on a radio if scan is ongoing on another radio Currently, in multi-radio wiphy cases, if a scan is ongoing on one radio, -EBUSY is returned when DFS or a channel switch is initiated on another radio. Because of this, an MLD AP with one radio (link) in an ongoing scan cannot initiate DFS or a channel switch on another radio (link). In multi-radio wiphy cases, multiple radios are grouped under a single wiphy. Hence, if a scan is ongoing on one underlying radio and DFS or a channel switch is requested on a different underlying radio of the same wiphy, these operations can be allowed simultaneously. Add logic to check the underlying radio used for the ongoing scan. If the radio on which DFS or a channel switch is requested is not being used for the scan, allow the operation; otherwise, return -EBUSY. Signed-off-by: Aditya Kumar Singh <aditya.kumar.singh@oss.qualcomm.com> Co-developed-by: Raj Kumar Bhagat <quic_rajkbhag@quicinc.com> Signed-off-by: Raj Kumar Bhagat <quic_rajkbhag@quicinc.com> Link: https://patch.msgid.link/20250527-mlo-dfs-acs-v2-2-92c2f37c81d9@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-06-20 10:44:05 +02:00
Vasanthakumar Thiagarajan	df42bfc96e	wifi: cfg80211: Add utility API to get radio index from channel Add utility API cfg80211_get_radio_idx_by_chan() to retrieve the radio index corresponding to a given channel in a multi-radio wiphy. This utility function can be used when we want to check the radio-specific data for a channel in a multi-radio wiphy. For example, it can help determine the radio index required to handle a scan request. This index can then be used to decide whether the scan can proceed without interfering with ongoing DFS operations on another radio. Signed-off-by: Vasanthakumar Thiagarajan <vasanthakumar.thiagarajan@oss.qualcomm.com> Co-developed-by: Raj Kumar Bhagat <quic_rajkbhag@quicinc.com> Signed-off-by: Raj Kumar Bhagat <quic_rajkbhag@quicinc.com> Link: https://patch.msgid.link/20250527-mlo-dfs-acs-v2-1-92c2f37c81d9@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-06-20 10:44:05 +02:00
Jianbo Liu	b05d42eefa	xfrm: hold device only for the asynchronous decryption The dev_hold() on skb->dev during packet reception was originally added to prevent the device from being released prematurely during asynchronous decryption operations. As current hardware can offload decryption, this asynchronous path is not always utilized. This often results in a pattern of dev_hold() immediately followed by dev_put() for each packet, creating unnecessary reference counting overhead detrimental to performance. This patch optimizes this by skipping the dev_hold() and subsequent dev_put() when asynchronous decryption is not being performed. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2025-06-20 10:39:19 +02:00
Jakub Kicinski	77f08133bc	Merge branch 'ref_tracker-add-ability-to-register-a-debugfs-file-for-a-ref_tracker_dir' Jeff Layton says: ==================== ref_tracker: add ability to register a debugfs file for a ref_tracker_dir For those just joining in, this series adds a new top-level "ref_tracker" debugfs directory, and has each ref_tracker_dir register a file in there as part of its initialization. It also adds the ability to register a symlink with a more human-usable name that points to the file, and does some general cleanup of how the ref_tracker object names are handled. v14: https://lore.kernel.org/20250610-reftrack-dbgfs-v14-0-efb532861428@kernel.org v13: https://lore.kernel.org/20250603-reftrack-dbgfs-v13-0-7b2a425019d8@kernel.org v12: https://lore.kernel.org/20250529-reftrack-dbgfs-v12-0-11b93c0c0b6e@kernel.org v11: https://lore.kernel.org/20250528-reftrack-dbgfs-v11-0-94ae0b165841@kernel.org v10: https://lore.kernel.org/20250527-reftrack-dbgfs-v10-0-dc55f7705691@kernel.org v9: https://lore.kernel.org/20250509-reftrack-dbgfs-v9-0-8ab888a4524d@kernel.org v8: https://lore.kernel.org/20250507-reftrack-dbgfs-v8-0-607717d3bb98@kernel.org v7: https://lore.kernel.org/20250505-reftrack-dbgfs-v7-0-f78c5d97bcca@kernel.org v6: https://lore.kernel.org/20250430-reftrack-dbgfs-v6-0-867c29aff03a@kernel.org v5: https://lore.kernel.org/20250428-reftrack-dbgfs-v5-0-1cbbdf2038bd@kernel.org v4: https://lore.kernel.org/20250418-reftrack-dbgfs-v4-0-5ca5c7899544@kernel.org v3: https://lore.kernel.org/20250417-reftrack-dbgfs-v3-0-c3159428c8fb@kernel.org v2: https://lore.kernel.org/20250415-reftrack-dbgfs-v2-0-b18c4abd122f@kernel.org v1: https://lore.kernel.org/20250414-reftrack-dbgfs-v1-0-f03585832203@kernel.org ==================== Link: https://patch.msgid.link/20250618-reftrack-dbgfs-v15-0-24fc37ead144@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-19 17:02:07 -07:00
Jeff Layton	707bd05be7	ref_tracker: eliminate the ref_tracker_dir name field Now that we have dentries and the ability to create meaningful symlinks to them, don't keep a name string in each tracker. Switch the output format to print "class@address", and drop the name field. Also, add a kerneldoc header for ref_tracker_dir_init(). Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20250618-reftrack-dbgfs-v15-9-24fc37ead144@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-19 17:02:04 -07:00
Jeff Layton	8f2079f8da	net: add symlinks to ref_tracker_dir for netns After assigning the inode number to the namespace, use it to create a unique name for each netns refcount tracker with the ns.inum and net_cookie values in it, and register a symlink to the debugfs file for it. init_net is registered before the ref_tracker dir is created, so add a late_initcall() to register its files and symlinks. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20250618-reftrack-dbgfs-v15-8-24fc37ead144@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-19 17:02:04 -07:00
Jeff Layton	aa7d26c3c3	ref_tracker: add a static classname string to each ref_tracker_dir A later patch in the series will be adding debugfs files for each ref_tracker that get created in ref_tracker_dir_init(). The format will be "class@%px". The current "name" string can vary between ref_tracker_dir objects of the same type, so it's not suitable for this purpose. Add a new "class" string to the ref_tracker dir that describes the the type of object (sans any individual info for that object). Also, in the i915 driver, gate the creation of debugfs files on whether the dentry pointer is still set to NULL. CI has shown that the ref_tracker_dir can be initialized more than once. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20250618-reftrack-dbgfs-v15-4-24fc37ead144@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-19 17:02:04 -07:00
Breno Leitao	6ad7969a36	netpoll: Extract IPv6 address retrieval function Extract the IPv6 address retrieval logic from netpoll_setup() into a dedicated helper function netpoll_take_ipv6() to improve code organization and readability. The function handles obtaining the local IPv6 address from the network device, including proper address type matching between local and remote addresses (link-local vs global), and includes appropriate error handling when IPv6 is not supported or no suitable address is available. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250618-netpoll_ip_ref-v1-3-c2ac00fe558f@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-19 16:15:35 -07:00
Breno Leitao	3699f992e8	netpoll: extract IPv4 address retrieval into helper function Move the IPv4 address retrieval logic from netpoll_setup() into a separate netpoll_take_ipv4() function to improve code organization and readability. This change consolidates the IPv4-specific logic and error handling into a dedicated function while maintaining the same functionality. No functional changes. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250618-netpoll_ip_ref-v1-2-c2ac00fe558f@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-19 16:15:35 -07:00
Breno Leitao	76d30b51e8	netpoll: Extract carrier wait function Extract the carrier waiting logic into a dedicated helper function netpoll_wait_carrier() to improve code readability and reduce duplication in netpoll_setup(). Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250618-netpoll_ip_ref-v1-1-c2ac00fe558f@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-19 16:15:35 -07:00
Wang Liang	c3ee72ded0	net/smc: remove unused input parameters in smc_buf_get_slot The input parameter "compressed_bufsize" of smc_buf_get_slot is unused, remove it. Signed-off-by: Wang Liang <wangliang74@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250618103342.1423913-1-wangliang74@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-19 16:15:34 -07:00
Eric Dumazet	f64bd2045d	tcp: tcp_time_to_recover() cleanup tcp_time_to_recover() does not need the @flag argument. Its first parameter can be marked const, and of tcp_sock type. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250618091246.1260322-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-19 16:15:34 -07:00
Nicolas Escande	c7d78566bb	neighbour: add support for NUD_PERMANENT proxy entries As discussesd before in [0] proxy entries (which are more configuration than runtime data) should stay when the link (carrier) goes does down. This is what happens for regular neighbour entries. So lets fix this by: - storing in proxy entries the fact that it was added as NUD_PERMANENT - not removing NUD_PERMANENT proxy entries when the carrier goes down (same as how it's done in neigh_flush_dev() for regular neigh entries) [0]: https://lore.kernel.org/netdev/c584ef7e-6897-01f3-5b80-12b53f7b4bf4@kernel.org/ Signed-off-by: Nicolas Escande <nico.escande@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250617141334.3724863-1-nico.escande@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-19 16:15:34 -07:00
Jakub Kicinski	62deb67fc5	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.16-rc3). No conflicts or adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-19 13:00:24 -07:00
Jakub Kicinski	dccf87ea49	More fixes: - ath12k - avoid busy-waiting - activate correct number of links - iwlwifi - iwldvm regression (lots of warnings) - iwlmld merge damage regression (crash) - fix build with some old gcc versions - carl9170: don't talk to device w/o FW [syzbot] - ath6kl: remove bad FW WARN [syzbot] - ieee80211: use variable-length arrays [syzbot] - mac80211 - remove WARN on delayed beacon update [syzbot] - drop OCB frames with invalid source [syzbot] -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAmhSg10ACgkQ10qiO8sP aAD10w//fdel2PXhAT6weaNRXmpUStR/TWCcKeUy6pko1BanxVbJNK9DPsCmg9Ym ApDewdsmTF600VAW+/JpoHCckzIGhx8Q/i5DJ0Xi/sbLweFMeDcmAsHkqVu11wTn gatTXlwP0/iXCL/EEtuS8fS1vs7HF29Q4QVNHPtOEaoNSm7LpE20Ukve4fBB8eXr oG3l7bNxW4STIFJL1+EWIXXv+tuyfORsIoD65N+NephJvTYvdKocWjjjDgnsLrPR HOW7Yli9G7xebXvDhdUVzdDjG0KEcarEHbyIosiPCv/xSnFZlrq4pg2zz8psHG4H MxeHlDgKR2gSdAeX207vB2yZ884yrRtA2XuQ7t8JrVATvXI6LE9puB+P0ssqLsvz jO9LjU9cfg8EoaBnK1w6vjL+OH3z2D1D8CvrkRjAUxZUKcBfk/NIQu4zsei5GuL5 fGDSowLiMUMMyecvKbLggxMEwp3qTs9zTb4w5Vn+QtbVFjj8+HgEb4AHrQM9tr/y GavBZvCnKqEnv/VgkTh2Z9KfVznFfXKi3N0XR6gotVDueMeG6fkXvwFTSg6dW0UX c1vHJTZ0H0AHelWoSz8/1FO6NXmqU76hnW7EwcYkwkGJ3/tji1Bzyt6ihknIttKl AVYno+nmAC1WJNzKxJDFE+HX/F1RwowKlek/nLUMU1Ti4kzgRWo= =Q6JR -----END PGP SIGNATURE----- Merge tag 'wireless-2025-06-18' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== More fixes: - ath12k - avoid busy-waiting - activate correct number of links - iwlwifi - iwldvm regression (lots of warnings) - iwlmld merge damage regression (crash) - fix build with some old gcc versions - carl9170: don't talk to device w/o FW [syzbot] - ath6kl: remove bad FW WARN [syzbot] - ieee80211: use variable-length arrays [syzbot] - mac80211 - remove WARN on delayed beacon update [syzbot] - drop OCB frames with invalid source [syzbot] * tag 'wireless-2025-06-18' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: iwlwifi: Fix incorrect logic on cmd_ver range checking wifi: iwlwifi: dvm: restore n_no_reclaim_cmds setting wifi: iwlwifi: cfg: Limit cb_size to valid range wifi: iwlwifi: restore missing initialization of async_handlers_list (again) wifi: ath6kl: remove WARN on bad firmware input wifi: carl9170: do not ping device which has failed to load firmware wifi: ath12k: don't wait when there is no vdev started wifi: ath12k: don't use static variables in ath12k_wmi_fw_stats_process() wifi: ath12k: avoid burning CPU while waiting for firmware stats wifi: ath12k: fix documentation on firmware stats wifi: ath12k: don't activate more links than firmware supports wifi: ath12k: update link active in case two links fall on the same MAC wifi: ath12k: support WMI_MLO_LINK_SET_ACTIVE_CMDID command wifi: ath12k: update freq range for each hardware mode wifi: ath12k: parse and save sbs_lower_band_end_freq from WMI_SERVICE_READY_EXT2_EVENTID event wifi: ath12k: parse and save hardware mode info from WMI_SERVICE_READY_EXT_EVENTID event for later use wifi: ath12k: Avoid CPU busy-wait by handling VDEV_STAT and BCN_STAT wifi: mac80211: don't WARN for late channel/color switch wifi: mac80211: drop invalid source address OCB frames wifi: remove zero-length arrays ==================== Link: https://patch.msgid.link/20250618210642.35805-6-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-19 08:38:41 -07:00
Eric Dumazet	d03b79f459	net: atm: fix /proc/net/atm/lec handling /proc/net/atm/lec must ensure safety against dev_lec[] changes. It appears it had dev_put() calls without prior dev_hold(), leading to imbalance and UAF. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Francois Romieu <romieu@fr.zoreil.com> # Minor atm contributor Link: https://patch.msgid.link/20250618140844.1686882-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-19 08:36:31 -07:00
Eric Dumazet	d13a3824bf	net: atm: add lec_mutex syzbot found its way in net/atm/lec.c, and found an error path in lecd_attach() could leave a dangling pointer in dev_lec[]. Add a mutex to protect dev_lecp[] uses from lecd_attach(), lec_vcc_attach() and lec_mcast_attach(). Following patch will use this mutex for /proc/net/atm/lec. BUG: KASAN: slab-use-after-free in lecd_attach net/atm/lec.c:751 [inline] BUG: KASAN: slab-use-after-free in lane_ioctl+0x2224/0x23e0 net/atm/lec.c:1008 Read of size 8 at addr ffff88807c7b8e68 by task syz.1.17/6142 CPU: 1 UID: 0 PID: 6142 Comm: syz.1.17 Not tainted 6.16.0-rc1-syzkaller-00239-g08215f5486ec #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025 Call Trace: <TASK> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120 print_address_description mm/kasan/report.c:408 [inline] print_report+0xcd/0x680 mm/kasan/report.c:521 kasan_report+0xe0/0x110 mm/kasan/report.c:634 lecd_attach net/atm/lec.c:751 [inline] lane_ioctl+0x2224/0x23e0 net/atm/lec.c:1008 do_vcc_ioctl+0x12c/0x930 net/atm/ioctl.c:159 sock_do_ioctl+0x118/0x280 net/socket.c:1190 sock_ioctl+0x227/0x6b0 net/socket.c:1311 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:907 [inline] __se_sys_ioctl fs/ioctl.c:893 [inline] __x64_sys_ioctl+0x18e/0x210 fs/ioctl.c:893 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f </TASK> Allocated by task 6132: kasan_save_stack+0x33/0x60 mm/kasan/common.c:47 kasan_save_track+0x14/0x30 mm/kasan/common.c:68 poison_kmalloc_redzone mm/kasan/common.c:377 [inline] __kasan_kmalloc+0xaa/0xb0 mm/kasan/common.c:394 kasan_kmalloc include/linux/kasan.h:260 [inline] __do_kmalloc_node mm/slub.c:4328 [inline] __kvmalloc_node_noprof+0x27b/0x620 mm/slub.c:5015 alloc_netdev_mqs+0xd2/0x1570 net/core/dev.c:11711 lecd_attach net/atm/lec.c:737 [inline] lane_ioctl+0x17db/0x23e0 net/atm/lec.c:1008 do_vcc_ioctl+0x12c/0x930 net/atm/ioctl.c:159 sock_do_ioctl+0x118/0x280 net/socket.c:1190 sock_ioctl+0x227/0x6b0 net/socket.c:1311 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:907 [inline] __se_sys_ioctl fs/ioctl.c:893 [inline] __x64_sys_ioctl+0x18e/0x210 fs/ioctl.c:893 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f Freed by task 6132: kasan_save_stack+0x33/0x60 mm/kasan/common.c:47 kasan_save_track+0x14/0x30 mm/kasan/common.c:68 kasan_save_free_info+0x3b/0x60 mm/kasan/generic.c:576 poison_slab_object mm/kasan/common.c:247 [inline] __kasan_slab_free+0x51/0x70 mm/kasan/common.c:264 kasan_slab_free include/linux/kasan.h:233 [inline] slab_free_hook mm/slub.c:2381 [inline] slab_free mm/slub.c:4643 [inline] kfree+0x2b4/0x4d0 mm/slub.c:4842 free_netdev+0x6c5/0x910 net/core/dev.c:11892 lecd_attach net/atm/lec.c:744 [inline] lane_ioctl+0x1ce8/0x23e0 net/atm/lec.c:1008 do_vcc_ioctl+0x12c/0x930 net/atm/ioctl.c:159 sock_do_ioctl+0x118/0x280 net/socket.c:1190 sock_ioctl+0x227/0x6b0 net/socket.c:1311 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:907 [inline] __se_sys_ioctl fs/ioctl.c:893 [inline] __x64_sys_ioctl+0x18e/0x210 fs/ioctl.c:893 Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Reported-by: syzbot+8b64dec3affaed7b3af5@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/6852c6f6.050a0220.216029.0018.GAE@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250618140844.1686882-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-19 08:36:13 -07:00
Krzysztof Kozlowski	fc27ab4890	NFC: nci: uart: Set tty->disc_data only in success path Setting tty->disc_data before opening the NCI device means we need to clean it up on error paths. This also opens some short window if device starts sending data, even before NCIUARTSETDRIVER IOCTL succeeded (broken hardware?). Close the window by exposing tty->disc_data only on the success path, when opening of the NCI device and try_module_get() succeeds. The code differs in error path in one aspect: tty->disc_data won't be ever assigned thus NULL-ified. This however should not be relevant difference, because of "tty->disc_data=NULL" in nci_uart_tty_open(). Cc: Linus Torvalds <torvalds@linuxfoundation.org> Fixes: `9961127d4b` ("NFC: nci: add generic uart support") Cc: <stable@vger.kernel.org> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://patch.msgid.link/20250618073649.25049-2-krzysztof.kozlowski@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-19 08:33:54 -07:00
Kuniyuki Iwashima	10876da918	calipso: Fix null-ptr-deref in calipso_req_{set,del}attr(). syzkaller reported a null-ptr-deref in sock_omalloc() while allocating a CALIPSO option. [0] The NULL is of struct sock, which was fetched by sk_to_full_sk() in calipso_req_setattr(). Since commit `a1a5344ddb` ("tcp: avoid two atomic ops for syncookies"), reqsk->rsk_listener could be NULL when SYN Cookie is returned to its client, as hinted by the leading SYN Cookie log. Here are 3 options to fix the bug: 1) Return 0 in calipso_req_setattr() 2) Return an error in calipso_req_setattr() 3) Alaways set rsk_listener 1) is no go as it bypasses LSM, but 2) effectively disables SYN Cookie for CALIPSO. 3) is also no go as there have been many efforts to reduce atomic ops and make TCP robust against DDoS. See also commit `3b24d854cb` ("tcp/dccp: do not touch listener sk_refcnt under synflood"). As of the blamed commit, SYN Cookie already did not need refcounting, and no one has stumbled on the bug for 9 years, so no CALIPSO user will care about SYN Cookie. Let's return an error in calipso_req_setattr() and calipso_req_delattr() in the SYN Cookie case. This can be reproduced by [1] on Fedora and now connect() of nc times out. [0]: TCP: request_sock_TCPv6: Possible SYN flooding on port [::]:20002. Sending cookies. Oops: general protection fault, probably for non-canonical address 0xdffffc0000000006: 0000 [#1] PREEMPT SMP KASAN NOPTI KASAN: null-ptr-deref in range [0x0000000000000030-0x0000000000000037] CPU: 3 UID: 0 PID: 12262 Comm: syz.1.2611 Not tainted 6.14.0 #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 RIP: 0010:read_pnet include/net/net_namespace.h:406 [inline] RIP: 0010:sock_net include/net/sock.h:655 [inline] RIP: 0010:sock_kmalloc+0x35/0x170 net/core/sock.c:2806 Code: 89 d5 41 54 55 89 f5 53 48 89 fb e8 25 e3 c6 fd e8 f0 91 e3 00 48 8d 7b 30 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 26 01 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8b RSP: 0018:ffff88811af89038 EFLAGS: 00010216 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffff888105266400 RDX: 0000000000000006 RSI: ffff88800c890000 RDI: 0000000000000030 RBP: 0000000000000050 R08: 0000000000000000 R09: ffff88810526640e R10: ffffed1020a4cc81 R11: ffff88810526640f R12: 0000000000000000 R13: 0000000000000820 R14: ffff888105266400 R15: 0000000000000050 FS: 00007f0653a07640(0000) GS:ffff88811af80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f863ba096f4 CR3: 00000000163c0005 CR4: 0000000000770ef0 PKRU: 80000000 Call Trace: <IRQ> ipv6_renew_options+0x279/0x950 net/ipv6/exthdrs.c:1288 calipso_req_setattr+0x181/0x340 net/ipv6/calipso.c:1204 calipso_req_setattr+0x56/0x80 net/netlabel/netlabel_calipso.c:597 netlbl_req_setattr+0x18a/0x440 net/netlabel/netlabel_kapi.c:1249 selinux_netlbl_inet_conn_request+0x1fb/0x320 security/selinux/netlabel.c:342 selinux_inet_conn_request+0x1eb/0x2c0 security/selinux/hooks.c:5551 security_inet_conn_request+0x50/0xa0 security/security.c:4945 tcp_v6_route_req+0x22c/0x550 net/ipv6/tcp_ipv6.c:825 tcp_conn_request+0xec8/0x2b70 net/ipv4/tcp_input.c:7275 tcp_v6_conn_request+0x1e3/0x440 net/ipv6/tcp_ipv6.c:1328 tcp_rcv_state_process+0xafa/0x52b0 net/ipv4/tcp_input.c:6781 tcp_v6_do_rcv+0x8a6/0x1a40 net/ipv6/tcp_ipv6.c:1667 tcp_v6_rcv+0x505e/0x5b50 net/ipv6/tcp_ipv6.c:1904 ip6_protocol_deliver_rcu+0x17c/0x1da0 net/ipv6/ip6_input.c:436 ip6_input_finish+0x103/0x180 net/ipv6/ip6_input.c:480 NF_HOOK include/linux/netfilter.h:314 [inline] NF_HOOK include/linux/netfilter.h:308 [inline] ip6_input+0x13c/0x6b0 net/ipv6/ip6_input.c:491 dst_input include/net/dst.h:469 [inline] ip6_rcv_finish net/ipv6/ip6_input.c:79 [inline] ip6_rcv_finish+0xb6/0x490 net/ipv6/ip6_input.c:69 NF_HOOK include/linux/netfilter.h:314 [inline] NF_HOOK include/linux/netfilter.h:308 [inline] ipv6_rcv+0xf9/0x490 net/ipv6/ip6_input.c:309 __netif_receive_skb_one_core+0x12e/0x1f0 net/core/dev.c:5896 __netif_receive_skb+0x1d/0x170 net/core/dev.c:6009 process_backlog+0x41e/0x13b0 net/core/dev.c:6357 __napi_poll+0xbd/0x710 net/core/dev.c:7191 napi_poll net/core/dev.c:7260 [inline] net_rx_action+0x9de/0xde0 net/core/dev.c:7382 handle_softirqs+0x19a/0x770 kernel/softirq.c:561 do_softirq.part.0+0x36/0x70 kernel/softirq.c:462 </IRQ> <TASK> do_softirq arch/x86/include/asm/preempt.h:26 [inline] __local_bh_enable_ip+0xf1/0x110 kernel/softirq.c:389 local_bh_enable include/linux/bottom_half.h:33 [inline] rcu_read_unlock_bh include/linux/rcupdate.h:919 [inline] __dev_queue_xmit+0xc2a/0x3c40 net/core/dev.c:4679 dev_queue_xmit include/linux/netdevice.h:3313 [inline] neigh_hh_output include/net/neighbour.h:523 [inline] neigh_output include/net/neighbour.h:537 [inline] ip6_finish_output2+0xd69/0x1f80 net/ipv6/ip6_output.c:141 __ip6_finish_output net/ipv6/ip6_output.c:215 [inline] ip6_finish_output+0x5dc/0xd60 net/ipv6/ip6_output.c:226 NF_HOOK_COND include/linux/netfilter.h:303 [inline] ip6_output+0x24b/0x8d0 net/ipv6/ip6_output.c:247 dst_output include/net/dst.h:459 [inline] NF_HOOK include/linux/netfilter.h:314 [inline] NF_HOOK include/linux/netfilter.h:308 [inline] ip6_xmit+0xbbc/0x20d0 net/ipv6/ip6_output.c:366 inet6_csk_xmit+0x39a/0x720 net/ipv6/inet6_connection_sock.c:135 __tcp_transmit_skb+0x1a7b/0x3b40 net/ipv4/tcp_output.c:1471 tcp_transmit_skb net/ipv4/tcp_output.c:1489 [inline] tcp_send_syn_data net/ipv4/tcp_output.c:4059 [inline] tcp_connect+0x1c0c/0x4510 net/ipv4/tcp_output.c:4148 tcp_v6_connect+0x156c/0x2080 net/ipv6/tcp_ipv6.c:333 __inet_stream_connect+0x3a7/0xed0 net/ipv4/af_inet.c:677 tcp_sendmsg_fastopen+0x3e2/0x710 net/ipv4/tcp.c:1039 tcp_sendmsg_locked+0x1e82/0x3570 net/ipv4/tcp.c:1091 tcp_sendmsg+0x2f/0x50 net/ipv4/tcp.c:1358 inet6_sendmsg+0xb9/0x150 net/ipv6/af_inet6.c:659 sock_sendmsg_nosec net/socket.c:718 [inline] __sock_sendmsg+0xf4/0x2a0 net/socket.c:733 __sys_sendto+0x29a/0x390 net/socket.c:2187 __do_sys_sendto net/socket.c:2194 [inline] __se_sys_sendto net/socket.c:2190 [inline] __x64_sys_sendto+0xe1/0x1c0 net/socket.c:2190 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xc3/0x1d0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f06553c47ed Code: 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f0653a06fc8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 00007f0655605fa0 RCX: 00007f06553c47ed RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000000000b RBP: 00007f065545db38 R08: 0000200000000140 R09: 000000000000001c R10: f7384d4ea84b01bd R11: 0000000000000246 R12: 0000000000000000 R13: 00007f0655605fac R14: 00007f0655606038 R15: 00007f06539e7000 </TASK> Modules linked in: [1]: dnf install -y selinux-policy-targeted policycoreutils netlabel_tools procps-ng nmap-ncat mount -t selinuxfs none /sys/fs/selinux load_policy netlabelctl calipso add pass doi:1 netlabelctl map del default netlabelctl map add default address:::1 protocol:calipso,1 sysctl net.ipv4.tcp_syncookies=2 nc -l ::1 80 & nc ::1 80 Fixes: `e1adea9270` ("calipso: Allow request sockets to be relabelled by the lsm.") Reported-by: syzkaller <syzkaller@googlegroups.com> Reported-by: John Cheung <john.cs.hey@gmail.com> Closes: https://lore.kernel.org/netdev/CAP=Rh=MvfhrGADy+-WJiftV2_WzMH4VEhEFmeT28qY+4yxNu4w@mail.gmail.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Acked-by: Paul Moore <paul@paul-moore.com> Link: https://patch.msgid.link/20250617224125.17299-1-kuni1840@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-19 08:33:09 -07:00
Jeff Layton	94d10a4dba	sunrpc: handle SVC_GARBAGE during svc auth processing as auth error tianshuo han reported a remotely-triggerable crash if the client sends a kernel RPC server a specially crafted packet. If decoding the RPC reply fails in such a way that SVC_GARBAGE is returned without setting the rq_accept_statp pointer, then that pointer can be dereferenced and a value stored there. If it's the first time the thread has processed an RPC, then that pointer will be set to NULL and the kernel will crash. In other cases, it could create a memory scribble. The server sunrpc code treats a SVC_GARBAGE return from svc_authenticate or pg_authenticate as if it should send a GARBAGE_ARGS reply. RFC 5531 says that if authentication fails that the RPC should be rejected instead with a status of AUTH_ERR. Handle a SVC_GARBAGE return as an AUTH_ERROR, with a reason of AUTH_BADCRED instead of returning GARBAGE_ARGS in that case. This sidesteps the whole problem of touching the rpc_accept_statp pointer in this situation and avoids the crash. Cc: stable@kernel.org Fixes: `29cd2927fb` ("SUNRPC: Fix encoding of accepted but unsuccessful RPC replies") Reported-by: tianshuo han <hantianshuo233@gmail.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-06-19 09:35:45 -04:00
Christian Brauner	804d679449	pidfs: remove pidfs_{get,put}_pid() Now that we stash persistent information in struct pid there's no need to play volatile games with pinning struct pid via dentries in pidfs. Link: https://lore.kernel.org/20250618-work-pidfs-persistent-v2-8-98f3456fd552@kernel.org Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-06-19 14:28:24 +02:00
Guillaume Nault	d3623dd5bd	ipv6: Simplify link-local address generation for IPv6 GRE. Since commit `3e6a0243ff` ("gre: Fix again IPv6 link-local address generation."), addrconf_gre_config() has stopped handling IP6GRE devices specially and just calls the regular addrconf_addr_gen() function to create their link-local IPv6 addresses. We can thus avoid using addrconf_gre_config() for IP6GRE devices and use the normal IPv6 initialisation path instead (that is, jump directly to addrconf_dev_config() in addrconf_init_auto_addrs()). See commit `3e6a0243ff` ("gre: Fix again IPv6 link-local address generation.") for a deeper explanation on how and why GRE devices started handling their IPv6 link-local address generation specially, why it was a problem, and why this is not even necessary in most cases (especially for GRE over IPv6). Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/a9144be9c7ec3cf09f25becae5e8fdf141fde9f6.1750075076.git.gnault@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-06-19 12:11:40 +02:00
Kory Maincent (Dent Project)	eeb0c8f72f	net: ethtool: Add PSE port priority support feature This patch expands the status information provided by ethtool for PSE c33 with current port priority and max port priority. It also adds a call to pse_ethtool_set_prio() to configure the PSE port priority. Signed-off-by: Kory Maincent (Dent Project) <kory.maincent@bootlin.com> Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20250617-feature_poe_port_prio-v14-8-78a1a645e2ee@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-18 19:00:17 -07:00
Kory Maincent (Dent Project)	1176978ed8	net: ethtool: Add support for new power domains index description Report the index of the newly introduced PSE power domain to the user, enabling improved management of the power budget for PSE devices. Signed-off-by: Kory Maincent (Dent Project) <kory.maincent@bootlin.com> Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20250617-feature_poe_port_prio-v14-5-78a1a645e2ee@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-18 19:00:16 -07:00
Kory Maincent (Dent Project)	fc0e6db309	net: pse-pd: Add support for reporting events Add support for devm_pse_irq_helper() to register PSE interrupts and report events such as over-current or over-temperature conditions. This follows a similar approach to the regulator API but also sends notifications using a dedicated PSE ethtool netlink socket. Signed-off-by: Kory Maincent (Dent Project) <kory.maincent@bootlin.com> Link: https://patch.msgid.link/20250617-feature_poe_port_prio-v14-2-78a1a645e2ee@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-18 19:00:15 -07:00
David Arinzon	cea465a96a	devlink: Add new "enable_phc" generic device param Add a new device generic parameter to enable/disable the PHC (PTP Hardware Clock) functionality in the device associated with the devlink instance. Signed-off-by: David Arinzon <darinzon@amazon.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://patch.msgid.link/20250617110545.5659-6-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-18 18:57:29 -07:00
Stanislav Fomichev	3a321b6b1f	net: remove redundant ASSERT_RTNL() in queue setup functions The existing netdev_ops_assert_locked() already asserts that either the RTNL lock or the per-device lock is held, making the explicit ASSERT_RTNL() redundant. Cc: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Stanislav Fomichev <stfomichev@gmail.com> Link: https://patch.msgid.link/20250616162117.287806-5-stfomichev@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-18 18:53:51 -07:00
Stanislav Fomichev	1ead750109	udp_tunnel: remove rtnl_lock dependency Drivers that are using ops lock and don't depend on RTNL lock still need to manage it because udp_tunnel's RTNL dependency. Introduce new udp_tunnel_nic_lock and use it instead of rtnl_lock. Drop non-UDP_TUNNEL_NIC_INFO_MAY_SLEEP mode from udp_tunnel infra (udp_tunnel_nic_device_sync_work needs to grab udp_tunnel_nic_lock mutex and might sleep). Cover more places in v4: - netlink - udp_tunnel_notify_add_rx_port (ndo_open) - triggers udp_tunnel_nic_device_sync_work - udp_tunnel_notify_del_rx_port (ndo_stop) - triggers udp_tunnel_nic_device_sync_work - udp_tunnel_get_rx_info (__netdev_update_features) - triggers NETDEV_UDP_TUNNEL_PUSH_INFO - udp_tunnel_drop_rx_info (__netdev_update_features) - triggers NETDEV_UDP_TUNNEL_DROP_INFO - udp_tunnel_nic_reset_ntf (ndo_open) - notifiers - udp_tunnel_nic_netdevice_event, depending on the event: - triggers NETDEV_UDP_TUNNEL_PUSH_INFO - triggers NETDEV_UDP_TUNNEL_DROP_INFO - ethnl_tunnel_info_reply_size - udp_tunnel_nic_set_port_priv (two intel drivers) Cc: Michael Chan <michael.chan@broadcom.com> Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Stanislav Fomichev <stfomichev@gmail.com> Link: https://patch.msgid.link/20250616162117.287806-4-stfomichev@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-18 18:53:51 -07:00
David Wei	dbe0ca8da1	tcp: fix passive TFO socket having invalid NAPI ID There is a bug with passive TFO sockets returning an invalid NAPI ID 0 from SO_INCOMING_NAPI_ID. Normally this is not an issue, but zero copy receive relies on a correct NAPI ID to process sockets on the right queue. Fix by adding a sk_mark_napi_id_set(). Fixes: `e5907459ce` ("tcp: Record Rx hash and NAPI ID in tcp_child_process") Signed-off-by: David Wei <dw@davidwei.uk> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250617212102.175711-5-dw@davidwei.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-18 18:30:51 -07:00
Haixia Qu	f82727adcf	tipc: fix null-ptr-deref when acquiring remote ip of ethernet bearer The reproduction steps: 1. create a tun interface 2. enable l2 bearer 3. TIPC_NL_UDP_GET_REMOTEIP with media name set to tun tipc: Started in network mode tipc: Node identity 8af312d38a21, cluster identity 4711 tipc: Enabled bearer <eth:syz_tun>, priority 1 Oops: general protection fault KASAN: null-ptr-deref in range CPU: 1 UID: 1000 PID: 559 Comm: poc Not tainted 6.16.0-rc1+ #117 PREEMPT Hardware name: QEMU Ubuntu 24.04 PC RIP: 0010:tipc_udp_nl_dump_remoteip+0x4a4/0x8f0 the ub was in fact a struct dev. when bid != 0 && skip_cnt != 0, bearer_list[bid] may be NULL or other media when other thread changes it. fix this by checking media_id. Fixes: `832629ca5c` ("tipc: add UDP remoteip dump to netlink API") Signed-off-by: Haixia Qu <hxqu@hillstonenet.com> Reviewed-by: Tung Nguyen <tung.quang.nguyen@est.tech> Link: https://patch.msgid.link/20250617055624.2680-1-hxqu@hillstonenet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-18 14:19:50 -07:00
Simon Horman	a9874d961e	nfc: Remove checks for nla_data returning NULL The implementation of nla_data is as follows: static inline void nla_data(const struct nlattr nla) { return (char *) nla + NLA_HDRLEN; } Excluding the case where nla is exactly -NLA_HDRLEN, it will not return NULL. And it seems misleading to assume that it can, other than in this corner case. So drop checks for this condition. Flagged by Smatch. Compile tested only. Signed-off-by: Simon Horman <horms@kernel.org> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://patch.msgid.link/20250617-nfc-null-data-v1-1-c7525ead2e95@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-18 14:17:32 -07:00
Neal Cardwell	d0fa59897e	tcp: fix tcp_packet_delayed() for tcp_is_non_sack_preventing_reopen() behavior After the following commit from 2024: commit `e37ab73736` ("tcp: fix to allow timestamp undo if no retransmits were sent") ...there was buggy behavior where TCP connections without SACK support could easily see erroneous undo events at the end of fast recovery or RTO recovery episodes. The erroneous undo events could cause those connections to suffer repeated loss recovery episodes and high retransmit rates. The problem was an interaction between the non-SACK behavior on these connections and the undo logic. The problem is that, for non-SACK connections at the end of a loss recovery episode, if snd_una == high_seq, then tcp_is_non_sack_preventing_reopen() holds steady in CA_Recovery or CA_Loss, but clears tp->retrans_stamp to 0. Then upon the next ACK the "tcp: fix to allow timestamp undo if no retransmits were sent" logic saw the tp->retrans_stamp at 0 and erroneously concluded that no data was retransmitted, and erroneously performed an undo of the cwnd reduction, restoring cwnd immediately to the value it had before loss recovery. This caused an immediate burst of traffic and build-up of queues and likely another immediate loss recovery episode. This commit fixes tcp_packet_delayed() to ignore zero retrans_stamp values for non-SACK connections when snd_una is at or above high_seq, because tcp_is_non_sack_preventing_reopen() clears retrans_stamp in this case, so it's not a valid signal that we can undo. Note that the commit named in the Fixes footer restored long-present behavior from roughly 2005-2019, so apparently this bug was present for a while during that era, and this was simply not caught. Fixes: `e37ab73736` ("tcp: fix to allow timestamp undo if no retransmits were sent") Reported-by: Eric Wheeler <netdev@lists.ewheeler.net> Closes: https://lore.kernel.org/netdev/64ea9333-e7f9-0df-b0f2-8d566143acab@ewheeler.net/ Signed-off-by: Neal Cardwell <ncardwell@google.com> Co-developed-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2025-06-18 09:04:34 +01:00
Kuniyuki Iwashima	7851263998	atm: Revert atm_account_tx() if copy_from_iter_full() fails. In vcc_sendmsg(), we account skb->truesize to sk->sk_wmem_alloc by atm_account_tx(). It is expected to be reverted by atm_pop_raw() later called by vcc->dev->ops->send(vcc, skb). However, vcc_sendmsg() misses the same revert when copy_from_iter_full() fails, and then we will leak a socket. Let's factorise the revert part as atm_return_tx() and call it in the failure path. Note that the corresponding sk_wmem_alloc operation can be found in alloc_tx() as of the blamed commit. $ git blame -L:alloc_tx net/atm/common.c c55fa3cccbc2c~ Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Reported-by: Simon Horman <horms@kernel.org> Closes: https://lore.kernel.org/netdev/20250614161959.GR414686@horms.kernel.org/ Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250616182147.963333-3-kuni1840@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-17 18:42:44 -07:00
Tejun Heo	fd0406e5ca	net: tcp: tsq: Convert from tasklet to BH workqueue The only generic interface to execute asynchronously in the BH context is tasklet; however, it's marked deprecated and has some design flaws. To replace tasklets, BH workqueue support was recently added. A BH workqueue behaves similarly to regular workqueues except that the queued work items are executed in the BH context. This patch converts TCP Small Queues implementation from tasklet to BH workqueue. Semantically, this is an equivalent conversion and there shouldn't be any user-visible behavior changes. While workqueue's queueing and execution paths are a bit heavier than tasklet's, unless the work item is being queued every packet, the difference hopefully shouldn't matter. My experience with the networking stack is very limited and this patch definitely needs attention from someone who actually understands networking. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Cc: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/aFBeJ38AS1ZF3Dq5@slm.duckdns.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-17 18:29:21 -07:00
Kuniyuki Iwashima	6dbb0d97c5	mpls: Use rcu_dereference_rtnl() in mpls_route_input_rcu(). As syzbot reported [0], mpls_route_input_rcu() can be called from mpls_getroute(), where is under RTNL. net->mpls.platform_label is only updated under RTNL. Let's use rcu_dereference_rtnl() in mpls_route_input_rcu() to silence the splat. [0]: WARNING: suspicious RCU usage 6.15.0-rc7-syzkaller-00082-g5cdb2c77c4c3 #0 Not tainted ---------------------------- net/mpls/af_mpls.c:84 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by syz.2.4451/17730: #0: ffffffff9012a3e8 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_lock net/core/rtnetlink.c:80 [inline] #0: ffffffff9012a3e8 (rtnl_mutex){+.+.}-{4:4}, at: rtnetlink_rcv_msg+0x371/0xe90 net/core/rtnetlink.c:6961 stack backtrace: CPU: 1 UID: 0 PID: 17730 Comm: syz.2.4451 Not tainted 6.15.0-rc7-syzkaller-00082-g5cdb2c77c4c3 #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025 Call Trace: <TASK> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x16c/0x1f0 lib/dump_stack.c:120 lockdep_rcu_suspicious+0x166/0x260 kernel/locking/lockdep.c:6865 mpls_route_input_rcu+0x1d4/0x200 net/mpls/af_mpls.c:84 mpls_getroute+0x621/0x1ea0 net/mpls/af_mpls.c:2381 rtnetlink_rcv_msg+0x3c9/0xe90 net/core/rtnetlink.c:6964 netlink_rcv_skb+0x16d/0x440 net/netlink/af_netlink.c:2534 netlink_unicast_kernel net/netlink/af_netlink.c:1313 [inline] netlink_unicast+0x53a/0x7f0 net/netlink/af_netlink.c:1339 netlink_sendmsg+0x8d1/0xdd0 net/netlink/af_netlink.c:1883 sock_sendmsg_nosec net/socket.c:712 [inline] __sock_sendmsg net/socket.c:727 [inline] ____sys_sendmsg+0xa98/0xc70 net/socket.c:2566 ___sys_sendmsg+0x134/0x1d0 net/socket.c:2620 __sys_sendmmsg+0x200/0x420 net/socket.c:2709 __do_sys_sendmmsg net/socket.c:2736 [inline] __se_sys_sendmmsg net/socket.c:2733 [inline] __x64_sys_sendmmsg+0x9c/0x100 net/socket.c:2733 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xcd/0x230 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f0a2818e969 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f0a28f52038 EFLAGS: 00000246 ORIG_RAX: 0000000000000133 RAX: ffffffffffffffda RBX: 00007f0a283b5fa0 RCX: 00007f0a2818e969 RDX: 0000000000000003 RSI: 0000200000000080 RDI: 0000000000000003 RBP: 00007f0a28210ab1 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000000 R14: 00007f0a283b5fa0 R15: 00007ffce5e9f268 </TASK> Fixes: `0189197f44` ("mpls: Basic routing support") Reported-by: syzbot+8a583bdd1a5cc0b0e068@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/68507981.a70a0220.395abc.01ef.GAE@google.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250616201532.1036568-1-kuni1840@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-17 18:21:59 -07:00
Petr Machata	96e8f5a9fe	net: ipv6: Add ip6_mr_output() Multicast routing is today handled in the input path. Locally generated MC packets don't hit the IPMR code today. Thus if a VXLAN remote address is multicast, the driver needs to set an OIF during route lookup. Thus MC routing configuration needs to be kept in sync with the VXLAN FDB and MDB. Ideally, the VXLAN packets would be routed by the MC routing code instead. To that end, this patch adds support to route locally generated multicast packets. The newly-added routines do largely what ip6_mr_input() and ip6_mr_forward() do: make an MR cache lookup to find where to send the packets, and use ip6_output() to send each of them. When no cache entry is found, the packet is punted to the daemon for resolution. Similarly to the IPv4 case in a previous patch, the new logic is contingent on a newly-added IP6CB flag being set. Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/3bcc034a3ab4d3c291072fff38f78d7fbbeef4e6.1750113335.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-17 18:18:46 -07:00
Petr Machata	1b02f4475d	net: ipv6: ip6mr: Split ip6mr_forward2() in two Some of the work of ip6mr_forward2() is specific to IPMR forwarding, and should not take place on the output path. In order to allow reuse of the common parts, extract out of the function a helper, ip6mr_prepare_forward(). Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/8932bd5c0fbe3f662b158803b8509604fa7bc113.1750113335.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-17 18:18:46 -07:00
Petr Machata	094f39d5e8	net: ipv6: ip6mr: Make ip6mr_forward2() void Nobody uses the return value, so convert the function to void. Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/e0bee259da0da58da96647ea9e21e6360c8f7e11.1750113335.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-17 18:18:45 -07:00
Petr Machata	3365afd3ab	net: ipv6: ip6mr: Fix in/out netdev to pass to the FORWARD chain The netfilter hook is invoked with skb->dev for input netdevice, and vif_dev for output netdevice. However at the point of invocation, skb->dev is already set to vif_dev, and MR-forwarded packets are reported with in=out: # ip6tables -A FORWARD -j LOG --log-prefix '[forw]' # cd tools/testing/selftests/net/forwarding # ./router_multicast.sh # dmesg \| fgrep '[forw]' [ 1670.248245] [forw]IN=v5 OUT=v5 [...] For reference, IPv4 MR code shows in and out as appropriate. Fix by caching skb->dev and using the updated value for output netdev. Fixes: `7bc570c8b4` ("[IPV6] MROUTE: Support multicast forwarding.") Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/3141ae8386fbe13fef4b793faa75e6bae58d798a.1750113335.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-17 18:18:45 -07:00
Petr Machata	f78c75d84f	net: ipv6: Add a flags argument to ip6tunnel_xmit(), udp_tunnel6_xmit_skb() ip6tunnel_xmit() erases the contents of the SKB control block. In order to be able to set particular IP6CB flags on the SKB, add a corresponding parameter, and propagate it to udp_tunnel6_xmit_skb() as well. In one of the following patches, VXLAN driver will use this facility to mark packets as subject to IPv6 multicast routing. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/acb4f9f3e40c3a931236c3af08a720b017fbfbfb.1750113335.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-17 18:18:45 -07:00
Petr Machata	6a7d88ca15	net: ipv6: Make udp_tunnel6_xmit_skb() void The function always returns zero, thus the return value does not carry any signal. Just make it void. Most callers already ignore the return value. However: - Refold arguments of the call from sctp_v6_xmit() so that they fit into the 80-column limit. - tipc_udp_xmit() initializes err from the return value, but that should already be always zero at that point. So there's no practical change, but elision of the assignment prompts a couple more tweaks to clean up the function. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/7facacf9d8ca3ca9391a4aee88160913671b868d.1750113335.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-17 18:18:45 -07:00
Petr Machata	35bec72a24	net: ipv4: Add ip_mr_output() Multicast routing is today handled in the input path. Locally generated MC packets don't hit the IPMR code today. Thus if a VXLAN remote address is multicast, the driver needs to set an OIF during route lookup. Thus MC routing configuration needs to be kept in sync with the VXLAN FDB and MDB. Ideally, the VXLAN packets would be routed by the MC routing code instead. To that end, this patch adds support to route locally generated multicast packets. The newly-added routines do largely what ip_mr_input() and ip_mr_forward() do: make an MR cache lookup to find where to send the packets, and use ip_mc_output() to send each of them. When no cache entry is found, the packet is punted to the daemon for resolution. However, an installation that uses a VXLAN underlay netdevice for which it also has matching MC routes, would get a different routing with this patch. Previously, the MC packets would be delivered directly to the underlay port, whereas now they would be MC-routed. In order to avoid this change in behavior, introduce an IPCB flag. Only if the flag is set will ip_mr_output() actually engage, otherwise it reverts to ip_mc_output(). This code is based on work by Roopa Prabhu and Nikolay Aleksandrov. Signed-off-by: Roopa Prabhu <roopa@nvidia.com> Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/0aadbd49330471c0f758d54afb05eb3b6e3a6b65.1750113335.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-17 18:18:45 -07:00
Petr Machata	b2e653bcff	net: ipv4: ipmr: Split ipmr_queue_xmit() in two Some of the work of ipmr_queue_xmit() is specific to IPMR forwarding, and should not take place on the output path. In order to allow reuse of the common parts, split the function into two: the ipmr_prepare_xmit() helper that takes care of the common bits, and the ipmr_queue_fwd_xmit(), which invokes the former and encapsulates the whole forwarding algorithm. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/4e8db165572a4f8bd29a723a801e854e9d20df4d.1750113335.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-17 18:18:45 -07:00
Petr Machata	3b7bc938e0	net: ipv4: ipmr: ipmr_queue_xmit(): Drop local variable `dev' The variable is used for caching of rt->dst.dev. The netdevice referenced therein does not change during the scope of validity of that local. At the same time, the local is only used twice, and each of these uses will end up in a different function in the following patches, further eliminating any use the local could have had. Drop the local altogether and inline the uses. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/c80600a4b51679fe78f429ccb6d60892c2f9e4de.1750113335.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-17 18:18:45 -07:00
Petr Machata	e3411e326f	net: ipv4: Add a flags argument to iptunnel_xmit(), udp_tunnel_xmit_skb() iptunnel_xmit() erases the contents of the SKB control block. In order to be able to set particular IPCB flags on the SKB, add a corresponding parameter, and propagate it to udp_tunnel_xmit_skb() as well. In one of the following patches, VXLAN driver will use this facility to mark packets as subject to IP multicast routing. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Acked-by: Antonio Quartulli <antonio@openvpn.net> Link: https://patch.msgid.link/89c9daf9f2dc088b6b92ccebcc929f51742de91f.1750113335.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-17 18:18:44 -07:00
Álvaro Fernández Rojas	ef07df397a	net: dsa: tag_brcm: add support for legacy FCS tags Add support for legacy Broadcom FCS tags, which are similar to DSA_TAG_PROTO_BRCM_LEGACY. BCM5325 and BCM5365 switches require including the original FCS value and length, as opposed to BCM63xx switches. Adding the original FCS value and length to DSA_TAG_PROTO_BRCM_LEGACY would impact performance of BCM63xx switches, so it's better to create a new tag. Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20250614080000.1884236-3-noltari@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-17 17:52:01 -07:00
Álvaro Fernández Rojas	a4daaf063f	net: dsa: tag_brcm: legacy: reorganize functions Move brcm_leg_tag_rcv() definition to top. This function is going to be shared between two different tags. Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com> Link: https://patch.msgid.link/20250614080000.1884236-2-noltari@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-17 17:51:59 -07:00
Neal Cardwell	db16319efc	tcp: remove RFC3517/RFC6675 tcp_clear_retrans_hints_partial() Now that we have removed the RFC3517/RFC6675 hints, tcp_clear_retrans_hints_partial() is empty, and can be removed. Suggested-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250615001435.2390793-4-ncardwell.sw@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-17 16:19:04 -07:00
Neal Cardwell	ba4618885b	tcp: remove RFC3517/RFC6675 hint state: lost_skb_hint, lost_cnt_hint Now that obsolete RFC3517/RFC6675 TCP loss detection has been removed, we can remove the somewhat complex and intrusive code to maintain its hint state: lost_skb_hint and lost_cnt_hint. This commit makes tcp_clear_retrans_hints_partial() empty. We will remove tcp_clear_retrans_hints_partial() and its call sites in the next commit. Suggested-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250615001435.2390793-3-ncardwell.sw@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-17 16:19:04 -07:00
Neal Cardwell	1c120191dc	tcp: remove obsolete and unused RFC3517/RFC6675 loss recovery code RACK-TLP loss detection has been enabled as the default loss detection algorithm for Linux TCP since 2018, in: commit `b38a51fec1` ("tcp: disable RFC6675 loss detection") In case users ran into unexpected bugs or performance regressions, that commit allowed Linux system administrators to revert to using RFC3517/RFC6675 loss recovery by setting net.ipv4.tcp_recovery to 0. In the seven years since 2018, our team has not heard reports of anyone reverting Linux TCP to use RFC3517/RFC6675 loss recovery, and we can't find any record in web searches of such a revert. RACK-TLP was published as a standards-track RFC, RFC8985, in February 2021. Several other major TCP implementations have default-enabled RACK-TLP at this point as well. RACK-TLP offers several significant performance advantages over RFC3517/RFC6675 loss recovery, including much better performance in the common cases of tail drops, lost retransmissions, and reordering. It is now time to remove the obsolete and unused RFC3517/RFC6675 loss recovery code. This will allow a substantial simplification of the Linux TCP code base, and removes 12 bytes of state in every tcp_sock for 64-bit machines (8 bytes on 32-bit machines). To arrange the commits in reasonable sizes, this patch series is split into 3 commits. The following 2 commits remove bookkeeping state and code that is no longer needed after this removal of RFC3517/RFC6675 loss recovery. Suggested-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250615001435.2390793-2-ncardwell.sw@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-17 16:18:48 -07:00
Hyunwoo Kim	b160766e26	net/sched: fix use-after-free in taprio_dev_notifier Since taprio’s taprio_dev_notifier() isn’t protected by an RCU read-side critical section, a race with advance_sched() can lead to a use-after-free. Adding rcu_read_lock() inside taprio_dev_notifier() prevents this. Fixes: `fed87cc671` ("net/sched: taprio: automatically calculate queueMaxSDU based on TC gate durations") Cc: stable@vger.kernel.org Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/aEzIYYxt0is9upYG@v4bel-B760M-AORUS-ELITE-AX Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-17 16:14:04 -07:00
Jakub Kicinski	68a4abb1ef	linux-can-fixes-for-6.16-20250617 -----BEGIN PGP SIGNATURE----- iQFHBAABCgAxFiEEn/sM2K9nqF/8FWzzDHRl3/mQkZwFAmhRji0THG1rbEBwZW5n dXRyb25peC5kZQAKCRAMdGXf+ZCRnIhnB/4sPyASpIMeCZh20cL8WHFKhz1F2daS znxvsL3sQZfol4a6pEc4/lYHEcpODitUhF4sPbMgU7qlSEysZbEsfuYZw1RLy8j9 bsIS+LqRqSslwS2/XuLwOjXly4+bgPwc4wj2RwH9+D9TlaLkZ02NZAqDNkidPxm/ 8udZxMqorVIUhqsckfsmIqVonllz7l3InSZ3gYeXVLXAMTgsvperjVtN/YivdbPm H1T6b6qopUEmIqDdsKelb1D9URMoiW9exDX05OpSzxZWWeTPZyJclOraKu6Otf4W Jd7IVgOv03prcOcerlXZoD6tEC0Amsl1RdkPiX6csAa6m63XvxjHIJyH =hk/l -----END PGP SIGNATURE----- Merge tag 'linux-can-fixes-for-6.16-20250617' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2025-06-17 The patch is by Brett Werling, and fixes the power regulator retrieval during probe of the tcan4x5x glue code for the m_can driver. * tag 'linux-can-fixes-for-6.16-20250617' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can: can: tcan4x5x: fix power regulator retrieval during probe openvswitch: Allocate struct ovs_pcpu_storage dynamically ==================== Link: https://patch.msgid.link/20250617155123.2141584-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-17 15:48:38 -07:00
Mina Almasry	6f793a1d05	net: netmem: fix skb_ensure_writable with unreadable skbs skb_ensure_writable should succeed when it's trying to write to the header of the unreadable skbs, so it doesn't need an unconditional skb_frags_readable check. The preceding pskb_may_pull() call will succeed if write_len is within the head and fail if we're trying to write to the unreadable payload, so we don't need an additional check. Removing this check restores DSCP functionality with unreadable skbs as it's called from dscp_tg. Cc: willemb@google.com Cc: asml.silence@gmail.com Fixes: `65249feb6b` ("net: add support for skbs with unreadable frags") Signed-off-by: Mina Almasry <almasrymina@google.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250615200733.520113-1-almasrymina@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-17 15:48:20 -07:00
Jakub Kicinski	b9ebe0cd5d	Merge branch 'io_uring-cmd-for-tx-timestamps' Pavel Begunkov says: ==================== io_uring cmd for tx timestamps (part) Apply the networking helpers for the io_uring timestamp API. ==================== Link: https://patch.msgid.link/cover.1750065793.git.asml.silence@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-17 15:24:30 -07:00
Pavel Begunkov	2410251cde	net: timestamp: add helper returning skb's tx tstamp Add a helper function skb_get_tx_timestamp() that returns a tx timestamp associated with an error queue skb. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/702357dd8936ef4c0d3864441e853bfe3224a677.1750065793.git.asml.silence@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-17 15:24:23 -07:00
Heiko Carstens	8a56977051	s390/drivers: Explicitly include <linux/export.h> Explicitly include <linux/export.h> in files which contain an EXPORT_SYMBOL(). See commit `a934a57a42` ("scripts/misc-check: check missing #include <linux/export.h> when W=1") for more details. Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>	2025-06-17 18:18:02 +02:00
Sebastian Andrzej Siewior	7b4ac12cc9	openvswitch: Allocate struct ovs_pcpu_storage dynamically PERCPU_MODULE_RESERVE defines the maximum size that can by used for the per-CPU data size used by modules. This is 8KiB. Commit `035fcdc4d2` ("openvswitch: Merge three per-CPU structures into one") restructured the per-CPU memory allocation for the module and moved the separate alloc_percpu() invocations at module init time to a static per-CPU variable which is allocated by the module loader. The size of the per-CPU data section for openvswitch is 6488 bytes which is ~80% of the available per-CPU memory. Together with a few other modules it is easy to exhaust the available 8KiB of memory. Allocate ovs_pcpu_storage dynamically at module init time. Reported-by: Gal Pressman <gal@nvidia.com> Closes: https://lore.kernel.org/all/c401e017-f8db-4f57-a1cd-89beb979a277@nvidia.com Fixes: `035fcdc4d2` ("openvswitch: Merge three per-CPU structures into one") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Aaron Conole <aconole@redhat.com> Link: https://patch.msgid.link/20250613123629.-XSoQTCu@linutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-06-17 14:47:46 +02:00
Johannes Berg	d19bac3d4e	wifi: mac80211: don't WARN for late channel/color switch There's really no value in the WARN stack trace etc., the reason for this happening isn't directly related to the calling function anyway. Also, syzbot has been observing it constantly, and there's no way we can resolve it there - those systems are just slow. Instead print an error message (once) and add a comment about what really causes this message. Reported-by: syzbot+468656785707b0e995df@syzkaller.appspotmail.com Reported-by: syzbot+18c783c5cf6a781e3e2c@syzkaller.appspotmail.com Reported-by: syzbot+d5924d5cffddfccab68e@syzkaller.appspotmail.com Reported-by: syzbot+7d73d99525d1ff7752ef@syzkaller.appspotmail.com Reported-by: syzbot+8e6e002c74d1927edaf5@syzkaller.appspotmail.com Reported-by: syzbot+97254a3b10c541879a65@syzkaller.appspotmail.com Reported-by: syzbot+dfd1fd46a1960ad9c6ec@syzkaller.appspotmail.com Reported-by: syzbot+85e0b8d12d9ca877d806@syzkaller.appspotmail.com Link: https://patch.msgid.link/20250617104902.146e10919be1.I85f352ca4a2dce6f556e5ff45ceaa5f3769cb5ce@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-06-17 14:30:06 +02:00
Johannes Berg	d1b1a5eb27	wifi: mac80211: drop invalid source address OCB frames In OCB, don't accept frames from invalid source addresses (and in particular don't try to create stations for them), drop the frames instead. Reported-by: syzbot+8b512026a7ec10dcbdd9@syzkaller.appspotmail.com Closes: https://lore.kernel.org/r/6788d2d9.050a0220.20d369.0028.GAE@google.com/ Signed-off-by: Johannes Berg <johannes.berg@intel.com> Tested-by: syzbot+8b512026a7ec10dcbdd9@syzkaller.appspotmail.com Link: https://patch.msgid.link/20250616171838.7433379cab5d.I47444d63c72a0bd58d2e2b67bb99e1fea37eec6f@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-06-17 14:29:49 +02:00
Jiri Slaby (SUSE)	2b5eac0f8c	tty: introduce and use tty_port_tty_vhangup() helper This code (tty_get -> vhangup -> tty_put) is repeated on few places. Introduce a helper similar to tty_port_tty_hangup() (asynchronous) to handle even vhangup (synchronous). And use it on those places. In fact, reuse the tty_port_tty_hangup()'s code and call tty_vhangup() depending on a new bool parameter. Signed-off-by: "Jiri Slaby (SUSE)" <jirislaby@kernel.org> Cc: Karsten Keil <isdn@linux-pingi.de> Cc: David Lin <dtwlin@gmail.com> Cc: Johan Hovold <johan@kernel.org> Cc: Alex Elder <elder@kernel.org> Cc: Oliver Neukum <oneukum@suse.com> Cc: Marcel Holtmann <marcel@holtmann.org> Cc: Johan Hedberg <johan.hedberg@gmail.com> Cc: Luiz Augusto von Dentz <luiz.dentz@gmail.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/r/20250611100319.186924-2-jirislaby@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-06-17 13:42:33 +02:00
Thomas Weißschuh	2fbe82037a	sysfs: treewide: switch back to bin_attribute::read()/write() The bin_attribute argument of bin_attribute::read() is now const. This makes the _new() callbacks unnecessary. Switch all users back. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://lore.kernel.org/r/20250530-sysfs-const-bin_attr-final-v3-3-724bfcf05b99@weissschuh.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-06-17 10:44:13 +02:00
Ido Schimmel	a2840d4e25	seg6: Allow End.X behavior to accept an oif Extend the End.X behavior to accept an output interface as an optional attribute and make use of it when resolving a route. This is needed when user space wants to use a link-local address as the nexthop address. Before: # ip route add 2001:db8:1::/64 encap seg6local action End.X nh6 fe80::1 oif eth0 dev sr6 # ip route add 2001:db8:2::/64 encap seg6local action End.X nh6 2001:db8:10::1 dev sr6 $ ip -6 route show 2001:db8:1::/64 encap seg6local action End.X nh6 fe80::1 dev sr6 metric 1024 pref medium 2001:db8:2::/64 encap seg6local action End.X nh6 2001:db8:10::1 dev sr6 metric 1024 pref medium After: # ip route add 2001:db8:1::/64 encap seg6local action End.X nh6 fe80::1 oif eth0 dev sr6 # ip route add 2001:db8:2::/64 encap seg6local action End.X nh6 2001:db8:10::1 dev sr6 $ ip -6 route show 2001:db8:1::/64 encap seg6local action End.X nh6 fe80::1 oif eth0 dev sr6 metric 1024 pref medium 2001:db8:2::/64 encap seg6local action End.X nh6 2001:db8:10::1 dev sr6 metric 1024 pref medium Note that the oif attribute is not dumped to user space when it was not specified (as an oif of 0) since each entry keeps track of the optional attributes that it parsed during configuration (see struct seg6_local_lwt::parsed_optattrs). Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Andrea Mayer <andrea.mayer@uniroma2.it> Link: https://patch.msgid.link/20250612122323.584113-4-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-16 15:31:14 -07:00
Ido Schimmel	3159671855	seg6: Call seg6_lookup_any_nexthop() from End.X behavior seg6_lookup_nexthop() is a wrapper around seg6_lookup_any_nexthop(). Change End.X behavior to invoke seg6_lookup_any_nexthop() directly so that we would not need to expose the new output interface argument outside of the seg6local module. No functional changes intended. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Andrea Mayer <andrea.mayer@uniroma2.it> Link: https://patch.msgid.link/20250612122323.584113-3-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-16 15:31:14 -07:00
Ido Schimmel	01c411238c	seg6: Extend seg6_lookup_any_nexthop() with an oif argument seg6_lookup_any_nexthop() is called by the different endpoint behaviors (e.g., End, End.X) to resolve an IPv6 route. Extend the function with an output interface argument so that it could be used to resolve a route with a certain output interface. This will be used by subsequent patches that will extend the End.X behavior with an output interface as an optional argument. ip6_route_input_lookup() cannot be used when an output interface is specified as it ignores this parameter. Similarly, calling ip6_pol_route() when a table ID was not specified (e.g., End.X behavior) is wrong. Therefore, when an output interface is specified without a table ID, resolve the route using ip6_route_output() which will take the output interface into account. Note that no endpoint behavior currently passes both a table ID and an output interface, so the oif argument passed to ip6_pol_route() is always zero and there are no functional changes in this regard. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Andrea Mayer <andrea.mayer@uniroma2.it> Link: https://patch.msgid.link/20250612122323.584113-2-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-16 15:31:14 -07:00
Breno Leitao	ccc7edf0ad	netpoll: move netpoll_print_options to netconsole Move netpoll_print_options() from net/core/netpoll.c to drivers/net/netconsole.c and make it static. This function is only used by netconsole, so there's no need to export it or keep it in the public netpoll API. This reduces the netpoll API surface and improves code locality by keeping netconsole-specific functionality within the netconsole driver. Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://patch.msgid.link/20250613-rework-v3-4-0752bf2e6912@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-16 15:18:33 -07:00
Breno Leitao	5a34c9a853	netpoll: relocate netconsole-specific functions to netconsole module Move netpoll_parse_ip_addr() and netpoll_parse_options() from the generic netpoll module to the netconsole module where they are actually used. These functions were originally placed in netpoll but are only consumed by netconsole. This refactoring improves code organization by: - Removing unnecessary exported symbols from netpoll - Making netpoll_parse_options() static (no longer needs global visibility) - Reducing coupling between netpoll and netconsole modules The functions remain functionally identical - this is purely a code reorganization to better reflect their actual usage patterns. Here are the changes: 1) Move both functions from netpoll to netconsole 2) Add static to netpoll_parse_options() 3) Removed the EXPORT_SYMBOL() PS: This diff does not change the function format, so, it is easy to review, but, checkpatch will not be happy. A follow-up patch will address the current issues reported by checkpatch. Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://patch.msgid.link/20250613-rework-v3-3-0752bf2e6912@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-16 15:18:33 -07:00
Breno Leitao	afb023329c	netpoll: expose netpoll logging macros in public header Move np_info(), np_err(), and np_notice() macros from internal implementation to the public netpoll header file to make them available for use by netpoll consumers. These logging macros provide consistent formatting for netpoll-related messages by automatically prefixing log output with the netpoll instance name. The goal is to use the exact same format that is being displayed today, instead of creating something netconsole-specific. Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://patch.msgid.link/20250613-rework-v3-2-0752bf2e6912@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-16 15:18:33 -07:00
Breno Leitao	260948993a	netpoll: remove __netpoll_cleanup from exported API Since commit `97714695ef` ("net: netconsole: Defer netpoll cleanup to avoid lock release during list traversal"), netconsole no longer uses __netpoll_cleanup(). With no remaining users, remove this function from the exported netpoll API. The function remains available internally within netpoll for use by netpoll_cleanup(). Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://patch.msgid.link/20250613-rework-v3-1-0752bf2e6912@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-16 15:18:33 -07:00
Yajun Deng	0c17270f9b	net: sysfs: Implement is_visible for phys_(port_id, port_name, switch_id) phys_port_id_show, phys_port_name_show and phys_switch_id_show would return -EOPNOTSUPP if the netdev didn't implement the corresponding method. There is no point in creating these files if they are unsupported. Put these attributes in netdev_phys_group and implement the is_visible method. make phys_(port_id, port_name, switch_id) invisible if the netdev dosen't implement the corresponding method. Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250612142707.4644-1-yajun.deng@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-14 11:26:59 -07:00
Qiu Yutan	0051ea4aca	net: arp: use kfree_skb_reason() in arp_rcv() Replace kfree_skb() with kfree_skb_reason() in arp_rcv(). Signed-off-by: Qiu Yutan <qiu.yutan@zte.com.cn> Signed-off-by: Jiang Kun <jiang.kun2@zte.com.cn> Link: https://patch.msgid.link/20250612110259698Q2KNNOPQhnIApRskKN3Hi@zte.com.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-13 17:30:02 -07:00
Yonghong Song	4fc012daf9	bpf: Fix an issue in bpf_prog_test_run_xdp when page size greater than 4K The bpf selftest xdp_adjust_tail/xdp_adjust_frags_tail_grow failed on arm64 with 64KB page: xdp_adjust_tail/xdp_adjust_frags_tail_grow:FAIL In bpf_prog_test_run_xdp(), the xdp->frame_sz is set to 4K, but later on when constructing frags, with 64K page size, the frag data_len could be more than 4K. This will cause problems in bpf_xdp_frags_increase_tail(). To fix the failure, the xdp->frame_sz is set to be PAGE_SIZE so kernel can test different page size properly. With the kernel change, the user space and bpf prog needs adjustment. Currently, the MAX_SKB_FRAGS default value is 17, so for 4K page, the maximum packet size will be less than 68K. To test 64K page, a bigger maximum packet size than 68K is desired. So two different functions are implemented for subtest xdp_adjust_frags_tail_grow. Depending on different page size, different data input/output sizes are used to adapt with different page size. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20250612035032.2207498-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-06-12 19:07:51 -07:00
Hari Kalavakunta	8e16170ae9	net: ncsi: Fix buffer overflow in fetching version id In NC-SI spec v1.2 section 8.4.44.2, the firmware name doesn't need to be null terminated while its size occupies the full size of the field. Fix the buffer overflow issue by adding one additional byte for null terminator. Signed-off-by: Hari Kalavakunta <kalavakunta.hari.prasad@gmail.com> Reviewed-by: Paul Fertser <fercerpav@gmail.com> Link: https://patch.msgid.link/20250610193338.1368-1-kalavakunta.hari.prasad@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-12 18:21:59 -07:00
Benjamin Coddington	8b3ac9faba	SUNRPC: Cleanup/fix initial rq_pages allocation While investigating some reports of memory-constrained NUMA machines failing to mount v3 and v4.0 nfs mounts, we found that svc_init_buffer() was not attempting to retry allocations from the bulk page allocator. Typically, this results in a single page allocation being returned and the mount attempt fails with -ENOMEM. A retry would have allowed the mount to succeed. Additionally, it seems that the bulk allocation in svc_init_buffer() is redundant because svc_alloc_arg() will perform the required allocation and does the correct thing to retry the allocations. The call to allocate memory in svc_alloc_arg() drops the preferred node argument, but I expect we'll still allocate on the preferred node because the allocation call happens within the svc thread context, which chooses the node with memory closest to the current thread's execution. This patch cleans out the bulk allocation in svc_init_buffer() to allow svc_alloc_arg() to handle the allocation/retry logic for rq_pages. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Fixes: `ed603bcf4f` ("sunrpc: Replace the rq_pages array with dynamically-allocated memory") Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-06-12 20:37:33 -04:00
Jakub Kicinski	9bb00786fc	net: ethtool: add dedicated callbacks for getting and setting rxfh fields We mux multiple calls to the drivers via the .get_nfc and .set_nfc callbacks. This is slightly inconvenient to the drivers as they have to de-mux them back. It will also be awkward for netlink code to construct struct ethtool_rxnfc when it wants to get info about RX Flow Hash, from the RSS module. Add dedicated driver callbacks. Create struct ethtool_rxfh_fields which contains only data relevant to RXFH. Maintain the names of the fields to avoid having to heavily modify the drivers. For now support both callbacks, once all drivers are converted ethtool_*et_rxfh_fields() will stop using the rxnfc callbacks. Link: https://patch.msgid.link/20250611145949.2674086-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-12 17:16:20 -07:00
Jakub Kicinski	fac4b41741	net: ethtool: require drivers to opt into the per-RSS ctx RXFH RX Flow Hashing supports using different configuration for different RSS contexts. Only two drivers seem to support it. Make sure we uniformly error out for drivers which don't. Reviewed-by: Joe Damato <joe@dama.to> Link: https://patch.msgid.link/20250611145949.2674086-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-12 17:16:19 -07:00
Jakub Kicinski	2a644c5cec	net: ethtool: remove the duplicated handling from rxfh and rxnfc Now that the handles have been separated - remove the RX Flow Hash handling from rxnfc functions and vice versa. Reviewed-by: Joe Damato <joe@dama.to> Link: https://patch.msgid.link/20250611145949.2674086-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-12 17:16:19 -07:00
Jakub Kicinski	f4f1265355	net: ethtool: copy the rxfh flow handling RX Flow Hash configuration uses the same argument structure as flow filters. This is probably why ethtool IOCTL handles them together. The more checks we add the more convoluted this code is getting (as some of the checks apply only to flow filters and others only to the hashing). Copy the code to separate the handling. This is an exact copy, the next change will remove unnecessary handling. Reviewed-by: Joe Damato <joe@dama.to> Link: https://patch.msgid.link/20250611145949.2674086-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-12 17:16:19 -07:00
Jakub Kicinski	535de52801	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.16-rc2). No conflicts or adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-12 10:09:10 -07:00
Linus Torvalds	27605c8c0f	Including fixes from bluetooth and wireless. Current release - regressions: - af_unix: allow passing cred for embryo without SO_PASSCRED/SO_PASSPIDFD Current release - new code bugs: - eth: airoha: correct enable mask for RX queues 16-31 - veth: prevent NULL pointer dereference in veth_xdp_rcv when peer disappears under traffic - ipv6: move fib6_config_validate() to ip6_route_add(), prevent invalid routes Previous releases - regressions: - phy: phy_caps: don't skip better duplex match on non-exact match - dsa: b53: fix untagged traffic sent via cpu tagged with VID 0 - Revert "wifi: mwifiex: Fix HT40 bandwidth issue.", it caused transient packet loss, exact reason not fully understood, yet Previous releases - always broken: - net: clear the dst when BPF is changing skb protocol (IPv4 <> IPv6) - sched: sfq: fix a potential crash on gso_skb handling - Bluetooth: intel: improve rx buffer posting to avoid causing issues in the firmware - eth: intel: i40e: make reset handling robust against multiple requests - eth: mlx5: ensure FW pages are always allocated on the local NUMA node, even when device is configure to 'serve' another node - wifi: ath12k: fix GCC_GCC_PCIE_HOT_RST definition for WCN7850, prevent kernel crashes - wifi: ath11k: avoid burning CPU in ath11k_debugfs_fw_stats_request() for 3 sec if fw_stats_done is not set Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmhK/3IACgkQMUZtbf5S IruE5A//RdwiBW/pqoMIiRKLA3HZeUA/beYOl4DwVf8WFQNUIqdboeAi6k4yFrS+ SykKN0s1z8fW45lA46iFv3sR0QKYGln/v/cANsqojYqKBD3PF42dRifFlEAIz2M5 fnXK1VHPJOFK/OBOyKiiW3R6mFv+v9epZM8BKED77vFy7osDV2zkObePeE8/34B7 yVAr6JNTpB5Ex4ziG+e/6tFF6IX9RJLBl4fkRRynLDSsb1NFuy39LxPsxRQPxnzo tlfHfxEFl5qDNGondUoSxmp38HoO6MRofWp1d1GZoBbTXi0gXV26I5WaaBHBqPkm jZ7AtIMQq2+JuEg0y4dFFRehZLwLEMuhvlbacbIOKNBngVIsploBzvbG3ntWuUa4 Z5VFayQXumsHB5g7+vEFK6vCPaIpatKt419JsFXogNvVmmQzghALFlSymm/WbyGL Bj3R448xGDJw+2zDAXSH/nMMHkRaQd2Ptj2czvJ0Y7Fj8bxJgH0okaHOBrk9RQTQ bdUGCiMY84p6WI7rKDkFyyohMxppdYsY8A9hSdGgpqvu7dZi5yGmzz1Sp9+uSfSF Lj61am4LSvRsIuTP5cdqmTBn3mZS5R49hvJsFddgXRhF+Y9gB7LSm0sypZbuOEKD m9ijKcNETglzer0iMCwAVrIbDHGjqqHS74DkRzsuPsQ8kaCjsno= =0mtm -----END PGP SIGNATURE----- Merge tag 'net-6.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bluetooth and wireless. Current release - regressions: - af_unix: allow passing cred for embryo without SO_PASSCRED/SO_PASSPIDFD Current release - new code bugs: - eth: airoha: correct enable mask for RX queues 16-31 - veth: prevent NULL pointer dereference in veth_xdp_rcv when peer disappears under traffic - ipv6: move fib6_config_validate() to ip6_route_add(), prevent invalid routes Previous releases - regressions: - phy: phy_caps: don't skip better duplex match on non-exact match - dsa: b53: fix untagged traffic sent via cpu tagged with VID 0 - Revert "wifi: mwifiex: Fix HT40 bandwidth issue.", it caused transient packet loss, exact reason not fully understood, yet Previous releases - always broken: - net: clear the dst when BPF is changing skb protocol (IPv4 <> IPv6) - sched: sfq: fix a potential crash on gso_skb handling - Bluetooth: intel: improve rx buffer posting to avoid causing issues in the firmware - eth: intel: i40e: make reset handling robust against multiple requests - eth: mlx5: ensure FW pages are always allocated on the local NUMA node, even when device is configure to 'serve' another node - wifi: ath12k: fix GCC_GCC_PCIE_HOT_RST definition for WCN7850, prevent kernel crashes - wifi: ath11k: avoid burning CPU in ath11k_debugfs_fw_stats_request() for 3 sec if fw_stats_done is not set" * tag 'net-6.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (70 commits) selftests: drv-net: rss_ctx: Add test for ntuple rules targeting default RSS context net: ethtool: Don't check if RSS context exists in case of context 0 af_unix: Allow passing cred for embryo without SO_PASSCRED/SO_PASSPIDFD. ipv6: Move fib6_config_validate() to ip6_route_add(). net: drv: netdevsim: don't napi_complete() from netpoll net/mlx5: HWS, Add error checking to hws_bwc_rule_complex_hash_node_get() veth: prevent NULL pointer dereference in veth_xdp_rcv net_sched: remove qdisc_tree_flush_backlog() net_sched: ets: fix a race in ets_qdisc_change() net_sched: tbf: fix a race in tbf_change() net_sched: red: fix a race in __red_change() net_sched: prio: fix a race in prio_tune() net_sched: sch_sfq: reject invalid perturb period net: phy: phy_caps: Don't skip better duplex macth on non-exact match MAINTAINERS: Update Kuniyuki Iwashima's email address. selftests: net: add test case for NAT46 looping back dst net: clear the dst when changing skb protocol net/mlx5e: Fix number of lanes to UNKNOWN when using data_rate_oper net/mlx5e: Fix leak of Geneve TLV option object net/mlx5: HWS, make sure the uplink is the last destination ...	2025-06-12 09:50:36 -07:00
Jakub Kicinski	d5705afbac	Another quick round of updates: - revert mwifiex HT40 that was causing issues - many ath10k/ath11k/ath12k fixes - re-add some iwlwifi code I lost in a merge - use kfree_sensitive() on an error path in cfg80211 -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAmhKjoYACgkQ10qiO8sP aAADgQ//fAOMAGzuuyxy3KwxtytpYWq/k0jb3HmHct135qxteoOSv/ah0/+nvYFD 4BNAkDa44hqAP5ynWYgGQIqssJ0WkkZFooCzMpb3mzsN5sONy7XfkqG0M8RIC3xC d28nt5zDufKt+0QtWUq9pUHamm6f+4kG+LQa9kGSlUNJ3wHUMSsONTgC7T8Rpb3u CxW5vyeIp0OJDKN65qsN1iGqzzA5hF7j4jX2BH+NF/8eoztY3t5C/o0mpRaHqY/d RWB9Sm5TmIXKnEHvy8CxIwm4+5goEdRi1ua/xJAC/SWmLm3NEEQPotJASnP3+xky 1Ft2EEGkYJHExYnGZaAHjykVY1JGNZos5gitp13325iFGLy9CyeCQ87Uml/k0uw9 k3xZKLzbCwIr1gPy6gTn0ai2V2P3CLmDuuvKiulIvOkfVsXbtB6zCxgansmVW8Xx WklAX7ZMUxSvI628FFAbGW2Gt5OSPQOXRIsk4LdMO6JQs85mMRux69rIM69F4aEV 8Kean7BjT/7RZGyd5VKjkDwYvOlh2z6+UUzShudqW7otsdA2P+W3+6yu2zPq8oC/ KUQTHtb9wrYux1CdSOovmmIVnc2NALegKLP6rf3VlVF+51TIAPMz8QhDA3N29ArZ /lhImuPhDpva08pgmFyT6WkHb3lj3KAQsp/O5gLHpLKhjkSdhi8= =wXKD -----END PGP SIGNATURE----- Merge tag 'wireless-2025-06-12' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== Another quick round of updates: - revert mwifiex HT40 that was causing issues - many ath10k/ath11k/ath12k fixes - re-add some iwlwifi code I lost in a merge - use kfree_sensitive() on an error path in cfg80211 * tag 'wireless-2025-06-12' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: cfg80211: use kfree_sensitive() for connkeys cleanup wifi: iwlwifi: fix merge damage related to iwl_pci_resume Revert "wifi: mwifiex: Fix HT40 bandwidth issue." wifi: ath12k: fix uaf in ath12k_core_init() wifi: ath12k: Fix hal_reo_cmd_status kernel-doc wifi: ath12k: fix GCC_GCC_PCIE_HOT_RST definition for WCN7850 wifi: ath11k: validate ath11k_crypto_mode on top of ath11k_core_qmi_firmware_ready wifi: ath11k: consistently use ath11k_mac_get_fw_stats() wifi: ath11k: move locking outside of ath11k_mac_get_fw_stats() wifi: ath11k: adjust unlock sequence in ath11k_update_stats_event() wifi: ath11k: move some firmware stats related functions outside of debugfs wifi: ath11k: don't wait when there is no vdev started wifi: ath11k: don't use static variables in ath11k_debugfs_fw_stats_process() wifi: ath11k: avoid burning CPU in ath11k_debugfs_fw_stats_request() wil6210: fix support for sparrow chipsets wifi: ath10k: Avoid vdev delete timeout when firmware is already down ath10k: snoc: fix unbalanced IRQ enable in crash recovery ==================== Link: https://patch.msgid.link/20250612082519.11447-3-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-12 08:16:47 -07:00
Gal Pressman	d78ebc772c	net: ethtool: Don't check if RSS context exists in case of context 0 Context 0 (default context) always exists, there is no need to check whether it exists or not when adding a flow steering rule. The existing check fails when creating a flow steering rule for context 0 as it is not stored in the rss_ctx xarray. For example: $ ethtool --config-ntuple eth2 flow-type tcp4 dst-ip 194.237.147.23 dst-port 19983 context 0 loc 618 rmgr: Cannot insert RX class rule: Invalid argument Cannot insert classification rule An example usecase for this could be: - A high-priority rule (loc 0) directing specific port traffic to context 0. - A low-priority rule (loc 1) directing all other TCP traffic to context 1. This is a user-visible regression that was caught in our testing environment, it was not reported by a user yet. Fixes: `de7f7582df` ("net: ethtool: prevent flow steering to RSS contexts which don't exist") Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Nimrod Oren <noren@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://patch.msgid.link/20250612071958.1696361-2-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-12 08:15:35 -07:00
Jakub Kicinski	d5441acae7	bluetooth pull request for net: - eir: Fix NULL pointer deference on eir_get_service_data - eir: Fix possible crashes on eir_create_adv_data - hci_sync: Fix broadcast/PA when using an existing instance - ISO: Fix using BT_SK_PA_SYNC to detect BIS sockets - ISO: Fix not using bc_sid as advertisement SID - MGMT: Fix sparse errors -----BEGIN PGP SIGNATURE----- iQJNBAABCgA3FiEE7E6oRXp8w05ovYr/9JCA4xAyCykFAmhJ66MZHGx1aXoudm9u LmRlbnR6QGludGVsLmNvbQAKCRD0kIDjEDILKfp/D/0VTEMF4PiA2eLHIPSwyIHr pvpz3nY1WE84lAVL0VKNJalA15dk6TVs3Vxgns62BHLdajBOmYPpuJGXaSERBfLB t5eb4nU9rx9F7+SW8zVLNwtnn5bTENNYKQIjfLmslDQQGfOjeaUP5sO/rIcLEiO3 0rEi55pE4nM6S2wUcmQlhWPC6tr3vIptg4lAz3MWlATDuUnkLjJ3rzEZdkg2kt39 2VJGNxXEG7sBrwv+coO3ROe54YSOrb+gvd9HOL0vq3MVBcvncCRqc7TuBlYi7/5C p+WdEyG26FgS/TzdgMJKuVISQp6kNKulbuRhsnD2XZA3Gik+t+79Ex9haYW+HLDS AWQNBm1FgYdCc4LsAxKfwGdvp8wAx1ci1vLNniYVTelyUAc5LosEZ/15DCCyTKdK 9zXEAfxwn72dLVtryVIRKqDR39QVqsxDSuV9ydgXzPJWwjisHX3AB01EqN5PGjYH aspNgMGfYL9zSw6N1LQ+99M+/JLbvLs7b4jui4CbD3EI7nxN0YqOcKlHw7vEje5s auU/UEL7DgWOzHTxCcidwATuV79pfx0CRSwsXaPLV1yA9lhS5AYdpBlsRB+wRFbN vhpw8dwj/WCM0GVYnG87BU3mriyfNgaERTVA2nLKZXvn+cRkVBUkLwBV3Jpi7vQZ cJ22gcrRj7uYotfvyCHv9g== =dulg -----END PGP SIGNATURE----- Merge tag 'for-net-2025-06-11' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - eir: Fix NULL pointer deference on eir_get_service_data - eir: Fix possible crashes on eir_create_adv_data - hci_sync: Fix broadcast/PA when using an existing instance - ISO: Fix using BT_SK_PA_SYNC to detect BIS sockets - ISO: Fix not using bc_sid as advertisement SID - MGMT: Fix sparse errors * tag 'for-net-2025-06-11' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: MGMT: Fix sparse errors Bluetooth: ISO: Fix not using bc_sid as advertisement SID Bluetooth: ISO: Fix using BT_SK_PA_SYNC to detect BIS sockets Bluetooth: eir: Fix possible crashes on eir_create_adv_data Bluetooth: hci_sync: Fix broadcast/PA when using an existing instance Bluetooth: Fix NULL pointer deference on eir_get_service_data ==================== Link: https://patch.msgid.link/20250611204944.1559356-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-12 08:13:48 -07:00
Kuniyuki Iwashima	43fb2b30ee	af_unix: Allow passing cred for embryo without SO_PASSCRED/SO_PASSPIDFD. Before the cited commit, the kernel unconditionally embedded SCM credentials to skb for embryo sockets even when both the sender and listener disabled SO_PASSCRED and SO_PASSPIDFD. Now, the credentials are added to skb only when configured by the sender or the listener. However, as reported in the link below, it caused a regression for some programs that assume credentials are included in every skb, but sometimes not now. The only problematic scenario would be that a socket starts listening before setting the option. Then, there will be 2 types of non-small race window, where a client can send skb without credentials, which the peer receives as an "invalid" message (and aborts the connection it seems ?): Client Server ------ ------ s1.listen() <-- No SO_PASS{CRED,PIDFD} s2.connect() s2.send() <-- w/o cred s1.setsockopt(SO_PASS{CRED,PIDFD}) s2.send() <-- w/ cred or Client Server ------ ------ s1.listen() <-- No SO_PASS{CRED,PIDFD} s2.connect() s2.send() <-- w/o cred s3, _ = s1.accept() <-- Inherit cred options s2.send() <-- w/o cred but not set yet s3.setsockopt(SO_PASS{CRED,PIDFD}) s2.send() <-- w/ cred It's unfortunate that buggy programs depend on the behaviour, but let's restore the previous behaviour. Fixes: `3f84d577b7` ("af_unix: Inherit sk_flags at connect().") Reported-by: Jacek Łuczak <difrost.kernel@gmail.com> Closes: https://lore.kernel.org/all/68d38b0b-1666-4974-85d4-15575789c8d4@gmail.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Tested-by: Christian Heusel <christian@heusel.eu> Tested-by: André Almeida <andrealmeid@igalia.com> Tested-by: Jacek Łuczak <difrost.kernel@gmail.com> Link: https://patch.msgid.link/20250611202758.3075858-1-kuni1840@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-12 08:13:06 -07:00

... 5 6 7 8 9 ...

81527 Commits