linux-loongson

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson synced 2025-09-01 23:46:45 +00:00

Author	SHA1	Message	Date
Paolo Abeni	941defcea7	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.14-rc6). Conflicts: tools/testing/selftests/drivers/net/ping.py `75cc19c8ff` ("selftests: drv-net: add xdp cases for ping.py") `de94e86974` ("selftests: drv-net: store addresses in dict indexed by ipver") https://lore.kernel.org/netdev/20250311115758.17a1d414@canb.auug.org.au/ net/core/devmem.c `a70f891e0f` ("net: devmem: do not WARN conditionally after netdev_rx_queue_restart()") `1d22d3060b` ("net: drop rtnl_lock for queue_mgmt operations") https://lore.kernel.org/netdev/20250313114929.43744df1@canb.auug.org.au/ Adjacent changes: tools/testing/selftests/net/Makefile `6f50175cca` ("selftests: Add IPv6 link-local address generation tests for GRE devices.") `2e5584e0f9` ("selftests/net: expand cmsg_ipv6.sh with ipv4") drivers/net/ethernet/broadcom/bnxt/bnxt.c `661958552e` ("eth: bnxt: do not use BNXT_VNIC_NTUPLE unconditionally in queue restart logic") `fe96d717d3` ("bnxt_en: Extend queue stop/start for TX rings") Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-03-13 23:08:11 +01:00
Stanislav Fomichev	1d22d3060b	net: drop rtnl_lock for queue_mgmt operations All drivers that use queue API are already converted to use netdev instance lock. Move netdev instance lock management to the netlink layer and drop rtnl_lock. Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Reviewed-by: Mina Almasry. <almasrymina@google.com> Link: https://patch.msgid.link/20250311144026.4154277-4-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-03-12 13:32:35 -07:00
Stanislav Fomichev	adbf627f17	eth: bnxt: add missing netdev lock management to bnxt_dl_reload_up bnxt_dl_reload_up is completely missing instance lock management which can result in `devlink dev reload` leaving with instance lock held. Add the missing calls. Also add netdev_assert_locked to make it clear that the up() method is running with the instance lock grabbed. v2: - add net/netdev_lock.h include to bnxt_devlink.c for netdev_assert_locked Fixes: `004b500801` ("eth: bnxt: remove most dependencies on RTNL") Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250309215851.2003708-3-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-03-12 13:19:15 -07:00
Stanislav Fomichev	eaca6e5dc6	eth: bnxt: request unconditional ops lock netdev_lock_ops conditionally grabs instance lock when queue_mgmt_ops is defined. However queue_mgmt_ops support is signaled via FW so we can sometimes boot without queue_mgmt_ops being set. This will result in bnxt running without instance lock which the driver now heavily depends on. Set request_ops_lock to true unconditionally to always request netdev instance lock. Fixes: `004b500801` ("eth: bnxt: remove most dependencies on RTNL") Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250309215851.2003708-2-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-03-12 13:19:15 -07:00
Stanislav Fomichev	110eff172d	eth: bnxt: switch to netif_close All (error) paths that call dev_close are already holding instance lock, so switch to netif_close to avoid the deadlock. v2: - add missing EXPORT_MODULE for netif_close Fixes: `004b500801` ("eth: bnxt: remove most dependencies on RTNL") Reported-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250309215851.2003708-1-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-03-12 13:19:15 -07:00
Taehee Yoo	87dd285083	eth: bnxt: fix memory leak in queue reset When the queue is reset, the bnxt_alloc_one_tpa_info() is called to allocate tpa_info for the new queue. And then the old queue's tpa_info should be removed by the bnxt_free_one_tpa_info(), but it is not called. So memory leak occurs. It adds the bnxt_free_one_tpa_info() in the bnxt_queue_mem_free(). unreferenced object 0xffff888293cc0000 (size 16384): comm "ncdevmem", pid 2076, jiffies 4296604081 hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 40 75 78 93 82 88 ff ff ........@ux..... 40 75 78 93 02 00 00 00 00 00 00 00 00 00 00 00 @ux............. backtrace (crc 5d7d4798): ___kmalloc_large_node+0x10d/0x1b0 __kmalloc_large_node_noprof+0x17/0x60 __kmalloc_noprof+0x3f6/0x520 bnxt_alloc_one_tpa_info+0x5f/0x300 [bnxt_en] bnxt_queue_mem_alloc+0x8e8/0x14f0 [bnxt_en] netdev_rx_queue_restart+0x233/0x620 net_devmem_bind_dmabuf_to_queue+0x2a3/0x600 netdev_nl_bind_rx_doit+0xc00/0x10a0 genl_family_rcv_msg_doit+0x1d4/0x2b0 genl_rcv_msg+0x3fb/0x6c0 netlink_rcv_skb+0x12c/0x360 genl_rcv+0x24/0x40 netlink_unicast+0x447/0x710 netlink_sendmsg+0x712/0xbc0 __sys_sendto+0x3fd/0x4d0 __x64_sys_sendto+0xdc/0x1b0 Fixes: `2d694c27d3` ("bnxt_en: implement netdev_queue_mgmt_ops") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Link: https://patch.msgid.link/20250309134219.91670-7-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-03-10 13:31:11 -07:00
Taehee Yoo	f09af5fdfb	eth: bnxt: fix kernel panic in the bnxt_get_queue_stats{rx \| tx} When qstats-get operation is executed, callbacks of netdev_stats_ops are called. The bnxt_get_queue_stats{rx \| tx} collect per-queue stats from sw_stats in the rings. But {rx \| tx \| cp}_ring are allocated when the interface is up. So, these rings are not allocated when the interface is down. The qstats-get is allowed even if the interface is down. However, the bnxt_get_queue_stats{rx \| tx}() accesses cp_ring and tx_ring without null check. So, it needs to avoid accessing rings if the interface is down. Reproducer: ip link set $interface down ./cli.py --spec netdev.yaml --dump qstats-get OR ip link set $interface down python ./stats.py Splat looks like: BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 1680fa067 P4D 1680fa067 PUD 16be3b067 PMD 0 Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 0 UID: 0 PID: 1495 Comm: python3 Not tainted 6.14.0-rc4+ #32 5cd0f999d5a15c574ac72b3e4b907341 Hardware name: ASUS System Product Name/PRIME Z690-P D4, BIOS 0603 11/01/2021 RIP: 0010:bnxt_get_queue_stats_rx+0xf/0x70 [bnxt_en] Code: c6 87 b5 18 00 00 02 eb a2 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 01 RSP: 0018:ffffabef43cdb7e0 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffffffffc04c8710 RCX: 0000000000000000 RDX: ffffabef43cdb858 RSI: 0000000000000000 RDI: ffff8d504e850000 RBP: ffff8d506c9f9c00 R08: 0000000000000004 R09: ffff8d506bcd901c R10: 0000000000000015 R11: ffff8d506bcd9000 R12: 0000000000000000 R13: ffffabef43cdb8c0 R14: ffff8d504e850000 R15: 0000000000000000 FS: 00007f2c5462b080(0000) GS:ffff8d575f600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000167fd0000 CR4: 00000000007506f0 PKRU: 55555554 Call Trace: <TASK> ? __die+0x20/0x70 ? page_fault_oops+0x15a/0x460 ? sched_balance_find_src_group+0x58d/0xd10 ? exc_page_fault+0x6e/0x180 ? asm_exc_page_fault+0x22/0x30 ? bnxt_get_queue_stats_rx+0xf/0x70 [bnxt_en cdd546fd48563c280cfd30e9647efa420db07bf1] netdev_nl_stats_by_netdev+0x2b1/0x4e0 ? xas_load+0x9/0xb0 ? xas_find+0x183/0x1d0 ? xa_find+0x8b/0xe0 netdev_nl_qstats_get_dumpit+0xbf/0x1e0 genl_dumpit+0x31/0x90 netlink_dump+0x1a8/0x360 Fixes: `af7b3b4add` ("eth: bnxt: support per-queue statistics") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Link: https://patch.msgid.link/20250309134219.91670-6-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-03-10 13:31:11 -07:00
Taehee Yoo	c03e7d05aa	eth: bnxt: do not update checksum in bnxt_xdp_build_skb() The bnxt_rx_pkt() updates ip_summed value at the end if checksum offload is enabled. When the XDP-MB program is attached and it returns XDP_PASS, the bnxt_xdp_build_skb() is called to update skb_shared_info. The main purpose of bnxt_xdp_build_skb() is to update skb_shared_info, but it updates ip_summed value too if checksum offload is enabled. This is actually duplicate work. When the bnxt_rx_pkt() updates ip_summed value, it checks if ip_summed is CHECKSUM_NONE or not. It means that ip_summed should be CHECKSUM_NONE at this moment. But ip_summed may already be updated to CHECKSUM_UNNECESSARY in the XDP-MB-PASS path. So the by skb_checksum_none_assert() WARNS about it. This is duplicate work and updating ip_summed in the bnxt_xdp_build_skb() is not needed. Splat looks like: WARNING: CPU: 3 PID: 5782 at ./include/linux/skbuff.h:5155 bnxt_rx_pkt+0x479b/0x7610 [bnxt_en] Modules linked in: bnxt_re bnxt_en rdma_ucm rdma_cm iw_cm ib_cm ib_uverbs veth xt_nat xt_tcpudp xt_conntrack nft_chain_nat xt_MASQUERADE nf_] CPU: 3 UID: 0 PID: 5782 Comm: socat Tainted: G W 6.14.0-rc4+ #27 Tainted: [W]=WARN Hardware name: ASUS System Product Name/PRIME Z690-P D4, BIOS 0603 11/01/2021 RIP: 0010:bnxt_rx_pkt+0x479b/0x7610 [bnxt_en] Code: 54 24 0c 4c 89 f1 4c 89 ff c1 ea 1f ff d3 0f 1f 00 49 89 c6 48 85 c0 0f 84 4c e5 ff ff 48 89 c7 e8 ca 3d a0 c8 e9 8f f4 ff ff <0f> 0b f RSP: 0018:ffff88881ba09928 EFLAGS: 00010202 RAX: 0000000000000000 RBX: 00000000c7590303 RCX: 0000000000000000 RDX: 1ffff1104e7d1610 RSI: 0000000000000001 RDI: ffff8881c91300b8 RBP: ffff88881ba09b28 R08: ffff888273e8b0d0 R09: ffff888273e8b070 R10: ffff888273e8b010 R11: ffff888278b0f000 R12: ffff888273e8b080 R13: ffff8881c9130e00 R14: ffff8881505d3800 R15: ffff888273e8b000 FS: 00007f5a2e7be080(0000) GS:ffff88881ba00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fff2e708ff8 CR3: 000000013e3b0000 CR4: 00000000007506f0 PKRU: 55555554 Call Trace: <IRQ> ? __warn+0xcd/0x2f0 ? bnxt_rx_pkt+0x479b/0x7610 ? report_bug+0x326/0x3c0 ? handle_bug+0x53/0xa0 ? exc_invalid_op+0x14/0x50 ? asm_exc_invalid_op+0x16/0x20 ? bnxt_rx_pkt+0x479b/0x7610 ? bnxt_rx_pkt+0x3e41/0x7610 ? __pfx_bnxt_rx_pkt+0x10/0x10 ? napi_complete_done+0x2cf/0x7d0 __bnxt_poll_work+0x4e8/0x1220 ? __pfx___bnxt_poll_work+0x10/0x10 ? __pfx_mark_lock.part.0+0x10/0x10 bnxt_poll_p5+0x36a/0xfa0 ? __pfx_bnxt_poll_p5+0x10/0x10 __napi_poll.constprop.0+0xa0/0x440 net_rx_action+0x899/0xd00 ... Following ping.py patch adds xdp-mb-pass case. so ping.py is going to be able to reproduce this issue. Fixes: `1dc4c557bf` ("bnxt: adding bnxt_xdp_build_skb to build skb from multibuffer xdp_buff") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Link: https://patch.msgid.link/20250309134219.91670-5-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-03-10 13:31:11 -07:00
Taehee Yoo	661958552e	eth: bnxt: do not use BNXT_VNIC_NTUPLE unconditionally in queue restart logic When a queue is restarted, it sets MRU to 0 for stopping packet flow. MRU variable is a member of vnic_info[], the first vnic_info is default and the second is ntuple. Only when ntuple is enabled(ethtool -K eth0 ntuple on), vnic_info for ntuple is allocated in init logic. The bp->nr_vnics indicates how many vnic_info are allocated. However bnxt_queue_{start \| stop}() accesses vnic_info[BNXT_VNIC_NTUPLE] regardless of ntuple state. Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Fixes: `b9d2956e86` ("bnxt_en: stop packet flow during bnxt_queue_stop/start") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Link: https://patch.msgid.link/20250309134219.91670-4-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-03-10 13:31:11 -07:00
Taehee Yoo	ca2456e073	eth: bnxt: return fail if interface is down in bnxt_queue_mem_alloc() The bnxt_queue_mem_alloc() is called to allocate new queue memory when a queue is restarted. It internally accesses rx buffer descriptor corresponding to the index. The rx buffer descriptor is allocated and set when the interface is up and it's freed when the interface is down. So, if queue is restarted if interface is down, kernel panic occurs. Splat looks like: BUG: unable to handle page fault for address: 000000000000b240 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 3 UID: 0 PID: 1563 Comm: ncdevmem2 Not tainted 6.14.0-rc2+ #9 844ddba6e7c459cafd0bf4db9a3198e Hardware name: ASUS System Product Name/PRIME Z690-P D4, BIOS 0603 11/01/2021 RIP: 0010:bnxt_queue_mem_alloc+0x3f/0x4e0 [bnxt_en] Code: 41 54 4d 89 c4 4d 69 c0 c0 05 00 00 55 48 89 f5 53 48 89 fb 4c 8d b5 40 05 00 00 48 83 ec 15 RSP: 0018:ffff9dcc83fef9e8 EFLAGS: 00010202 RAX: ffffffffc0457720 RBX: ffff934ed8d40000 RCX: 0000000000000000 RDX: 000000000000001f RSI: ffff934ea508f800 RDI: ffff934ea508f808 RBP: ffff934ea508f800 R08: 000000000000b240 R09: ffff934e84f4b000 R10: ffff9dcc83fefa30 R11: ffff934e84f4b000 R12: 000000000000001f R13: ffff934ed8d40ac0 R14: ffff934ea508fd40 R15: ffff934e84f4b000 FS: 00007fa73888c740(0000) GS:ffff93559f780000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000000b240 CR3: 0000000145a2e000 CR4: 00000000007506f0 PKRU: 55555554 Call Trace: <TASK> ? __die+0x20/0x70 ? page_fault_oops+0x15a/0x460 ? exc_page_fault+0x6e/0x180 ? asm_exc_page_fault+0x22/0x30 ? __pfx_bnxt_queue_mem_alloc+0x10/0x10 [bnxt_en 7f85e76f4d724ba07471d7e39d9e773aea6597b7] ? bnxt_queue_mem_alloc+0x3f/0x4e0 [bnxt_en 7f85e76f4d724ba07471d7e39d9e773aea6597b7] netdev_rx_queue_restart+0xc5/0x240 net_devmem_bind_dmabuf_to_queue+0xf8/0x200 netdev_nl_bind_rx_doit+0x3a7/0x450 genl_family_rcv_msg_doit+0xd9/0x130 genl_rcv_msg+0x184/0x2b0 ? __pfx_netdev_nl_bind_rx_doit+0x10/0x10 ? __pfx_genl_rcv_msg+0x10/0x10 netlink_rcv_skb+0x54/0x100 genl_rcv+0x24/0x40 ... Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Fixes: `2d694c27d3` ("bnxt_en: implement netdev_queue_mgmt_ops") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Reviewed-by: Mina Almasry <almasrymina@google.com> Link: https://patch.msgid.link/20250309134219.91670-3-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-03-10 13:31:11 -07:00
Taehee Yoo	9f7b2aa503	eth: bnxt: fix truesize for mb-xdp-pass case When mb-xdp is set and return is XDP_PASS, packet is converted from xdp_buff to sk_buff with xdp_update_skb_shared_info() in bnxt_xdp_build_skb(). bnxt_xdp_build_skb() passes incorrect truesize argument to xdp_update_skb_shared_info(). The truesize is calculated as BNXT_RX_PAGE_SIZE * sinfo->nr_frags but the skb_shared_info was wiped by napi_build_skb() before. So it stores sinfo->nr_frags before bnxt_xdp_build_skb() and use it instead of getting skb_shared_info from xdp_get_shared_info_from_buff(). Splat looks like: ------------[ cut here ]------------ WARNING: CPU: 2 PID: 0 at net/core/skbuff.c:6072 skb_try_coalesce+0x504/0x590 Modules linked in: xt_nat xt_tcpudp veth af_packet xt_conntrack nft_chain_nat xt_MASQUERADE nf_conntrack_netlink xfrm_user xt_addrtype nft_coms CPU: 2 UID: 0 PID: 0 Comm: swapper/2 Not tainted 6.14.0-rc2+ #3 RIP: 0010:skb_try_coalesce+0x504/0x590 Code: 4b fd ff ff 49 8b 34 24 40 80 e6 40 0f 84 3d fd ff ff 49 8b 74 24 48 40 f6 c6 01 0f 84 2e fd ff ff 48 8d 4e ff e9 25 fd ff ff <0f> 0b e99 RSP: 0018:ffffb62c4120caa8 EFLAGS: 00010287 RAX: 0000000000000003 RBX: ffffb62c4120cb14 RCX: 0000000000000ec0 RDX: 0000000000001000 RSI: ffffa06e5d7dc000 RDI: 0000000000000003 RBP: ffffa06e5d7ddec0 R08: ffffa06e6120a800 R09: ffffa06e7a119900 R10: 0000000000002310 R11: ffffa06e5d7dcec0 R12: ffffe4360575f740 R13: ffffe43600000000 R14: 0000000000000002 R15: 0000000000000002 FS: 0000000000000000(0000) GS:ffffa0755f700000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f147b76b0f8 CR3: 00000001615d4000 CR4: 00000000007506f0 PKRU: 55555554 Call Trace: <IRQ> ? __warn+0x84/0x130 ? skb_try_coalesce+0x504/0x590 ? report_bug+0x18a/0x1a0 ? handle_bug+0x53/0x90 ? exc_invalid_op+0x14/0x70 ? asm_exc_invalid_op+0x16/0x20 ? skb_try_coalesce+0x504/0x590 inet_frag_reasm_finish+0x11f/0x2e0 ip_defrag+0x37a/0x900 ip_local_deliver+0x51/0x120 ip_sublist_rcv_finish+0x64/0x70 ip_sublist_rcv+0x179/0x210 ip_list_rcv+0xf9/0x130 How to reproduce: <Node A> ip link set $interface1 xdp obj xdp_pass.o ip link set $interface1 mtu 9000 up ip a a 10.0.0.1/24 dev $interface1 <Node B> ip link set $interfac2 mtu 9000 up ip a a 10.0.0.2/24 dev $interface2 ping 10.0.0.1 -s 65000 Following ping.py patch adds xdp-mb-pass case. so ping.py is going to be able to reproduce this issue. Fixes: `1dc4c557bf` ("bnxt: adding bnxt_xdp_build_skb to build skb from multibuffer xdp_buff") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Link: https://patch.msgid.link/20250309134219.91670-2-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-03-10 13:31:11 -07:00
Jakub Kicinski	8ef890df40	net: move misc netdev_lock flavors to a separate header Move the more esoteric helpers for netdev instance lock to a dedicated header. This avoids growing netdevice.h to infinity and makes rebuilding the kernel much faster (after touching the header with the helpers). The main netdev_lock() / netdev_unlock() functions are used in static inlines in netdevice.h and will probably be used most commonly, so keep them in netdevice.h. Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250307183006.2312761-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-03-08 09:06:50 -08:00
Doug Berger	254f3239dd	net: bcmgenet: revise suspend/resume If the network interface is configured for Wake-on-LAN we should avoid bringing the interface down and up since it slows the time to reestablish network traffic on resume. Redundant calls to phy_suspend() and phy_resume() are removed since they are already invoked from within phy_stop() and phy_start() called from bcmgenet_netif_stop() and bcmgenet_netif_start(). Signed-off-by: Doug Berger <opendmb@gmail.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20250306192643.2383632-15-opendmb@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-03-07 19:33:48 -08:00
Doug Berger	2432b9817b	net: bcmgenet: allow return of power up status It is possible for a WoL power up to fail due to the GENET being reset while in the suspend state. Allow these failures to be returned as error codes to allow different recovery behavior when necessary. Signed-off-by: Doug Berger <opendmb@gmail.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20250306192643.2383632-14-opendmb@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-03-07 19:33:48 -08:00
Doug Berger	ffce2bedd3	net: bcmgenet: move bcmgenet_power_up into resume_noirq The bcmgenet_power_up() function is moved from the resume method to the resume_noirq method for symmetry with the suspend_noirq method. This allows the wol_active flag to be removed. The UMAC_IRQ_WAKE_EVENT interrupts that can be unmasked by the bcmgenet_wol_power_down_cfg() function are now re-masked by the bcmgenet_wol_power_up_cfg() function at the resume_noirq level as well. Signed-off-by: Doug Berger <opendmb@gmail.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20250306192643.2383632-13-opendmb@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-03-07 19:33:48 -08:00
Doug Berger	f1bacae8b6	net: bcmgenet: support reclaiming unsent Tx packets When disabling the transmitter any outstanding packets can now be reclaimed by bcmgenet_tx_reclaim_all() rather than by the bcmgenet_fini_dma() function. Signed-off-by: Doug Berger <opendmb@gmail.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20250306192643.2383632-12-opendmb@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-03-07 19:33:48 -08:00
Doug Berger	791f349d02	net: bcmgenet: introduce bcmgenet_[r\|t]dma_disable The bcmgenet_rdma_disable and bcmgenet_tdma_disable functions are introduced to provide a common method for disabling each dma and the code is simplified. Signed-off-by: Doug Berger <opendmb@gmail.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20250306192643.2383632-11-opendmb@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-03-07 19:33:47 -08:00
Doug Berger	58affb23b6	net: bcmgenet: consolidate dma initialization The functions bcmgenet_dma_disable and bcmgenet_enable_dma are only used as part of dma initialization. Their functionality is moved inside bcmgenet_init_dma and the functions are removed. Since the dma is always disabled inside of bcmgenet_init_dma, the initialization functions bcmgenet_init_rx_queues and bcmgenet_init_tx_queues no longer need to attempt to manage its state. Signed-off-by: Doug Berger <opendmb@gmail.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20250306192643.2383632-10-opendmb@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-03-07 19:33:47 -08:00
Doug Berger	8b031d4e9b	net: bcmgenet: remove dma_ctrl argument Since the individual queues manage their own DMA enables there is no need to return dma_ctrl from bcmgenet_dma_disable() and pass it back to bcmgenet_enable_dma(). Signed-off-by: Doug Berger <opendmb@gmail.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20250306192643.2383632-9-opendmb@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-03-07 19:33:47 -08:00
Doug Berger	6d31f8fc6c	net: bcmgenet: add support for RX_CLS_FLOW_DISC Now that the DESC_INDEX ring descriptor is no longer used we can enable hardware discarding of flows by routing them to a queue that is not enabled. Signed-off-by: Doug Berger <opendmb@gmail.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20250306192643.2383632-8-opendmb@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-03-07 19:33:47 -08:00
Doug Berger	3b5d4f5a82	net: bcmgenet: move DESC_INDEX flow to ring 0 The default transmit and receive packet handling is moved from the DESC_INDEX (i.e. 16) descriptor rings to the Ring 0 queues. This saves a fair amount of special case code by unifying the handling. A default dummy filter is enabled in the Hardware Filter Block to route default receive packets to Ring 0. Signed-off-by: Doug Berger <opendmb@gmail.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20250306192643.2383632-7-opendmb@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-03-07 19:33:47 -08:00
Doug Berger	f841f5ef99	net: bcmgenet: extend bcmgenet_hfb_* API Extend the bcmgenet_hfb_* API to allow initialization and programming of the Hardware Filter Block on GENET v1 and GENET v2 hardware. Programming of ethtool flows is still not supported on this older hardware. Signed-off-by: Doug Berger <opendmb@gmail.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20250306192643.2383632-6-opendmb@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-03-07 19:33:47 -08:00
Doug Berger	59a97b8184	net: bcmgenet: BCM7712 is GENETv5 compatible The major revision of the GENET core in the BCM7712 SoC was bumped to 7 but it is compatible with the GENETv5 implementation. This commit maps the version accordingly to avoid a warning. Signed-off-by: Doug Berger <opendmb@gmail.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20250306192643.2383632-5-opendmb@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-03-07 19:33:47 -08:00
Doug Berger	a2bdde505f	net: bcmgenet: move feature flags to bcmgenet_priv The feature flags are moved and consolidated to the primary private driver structure and are now initialized from the platform device data rather than the hardware parameters to allow finer control over which platforms use which features. Signed-off-by: Doug Berger <opendmb@gmail.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20250306192643.2383632-4-opendmb@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-03-07 19:33:46 -08:00
Doug Berger	07c1a756a5	net: bcmgenet: add bcmgenet_has_* helpers Introduce helper functions to indicate whether the driver should make use of a particular feature that it supports. These helpers abstract the implementation of how the feature availability is encoded. Signed-off-by: Doug Berger <opendmb@gmail.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20250306192643.2383632-3-opendmb@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-03-07 19:33:46 -08:00
Doug Berger	d2b4106805	net: bcmgenet: bcmgenet_hw_params clean up The entries of the bcmgenet_hw_params array are broken out to remove unused and duplicate entries and are made read only since they should not change for a specific version of the GENET hardware. Signed-off-by: Doug Berger <opendmb@gmail.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20250306192643.2383632-2-opendmb@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-03-07 19:33:46 -08:00
Stanislav Fomichev	004b500801	eth: bnxt: remove most dependencies on RTNL Only devlink and sriov paths are grabbing rtnl explicitly. The rest is covered by netdev instance lock which the core now grabs, so there is no need to manage rtnl in most places anymore. On the core side we can now try to drop rtnl in some places (do_setlink for example) for the drivers that signal non-rtnl mode (TBD). Boot-tested and with `ethtool -L eth1 combined 24` to trigger reset. Cc: Saeed Mahameed <saeed@kernel.org> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250305163732.2766420-15-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-03-06 12:59:45 -08:00
Stanislav Fomichev	cae03e5bdd	net: hold netdev instance lock during queue operations For the drivers that use queue management API, switch to the mode where core stack holds the netdev instance lock. This affects the following drivers: - bnxt - gve - netdevsim Originally I locked only start/stop, but switched to holding the lock over all iterations to make them look atomic to the device (feels like it should be easier to reason about). Reviewed-by: Eric Dumazet <edumazet@google.com> Cc: Saeed Mahameed <saeed@kernel.org> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250305163732.2766420-6-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-03-06 12:59:43 -08:00
Manoj Panicker	c214410c47	bnxt_en: Add TPH support in BNXT driver Add TPH support to the Broadcom BNXT device driver. This allows the driver to utilize TPH functions for retrieving and configuring Steering Tags when changing interrupt affinity. With compatible NIC firmware, network traffic will be tagged correctly with Steering Tags, resulting in significant memory bandwidth savings and other advantages as demonstrated by real network benchmarks on TPH-capable platforms. Co-developed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Co-developed-by: Wei Huang <wei.huang2@amd.com> Signed-off-by: Wei Huang <wei.huang2@amd.com> Signed-off-by: Manoj Panicker <manoj.panicker2@amd.com> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250213011240.1640031-12-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-14 19:50:23 -08:00
Somnath Kotur	fe96d717d3	bnxt_en: Extend queue stop/start for TX rings In order to use queue_stop/queue_start to support the new Steering Tags, we need to free the TX ring and TX completion ring if it is a combined channel with TX/RX sharing the same NAPI. Otherwise TX completions will not have the updated Steering Tag. If TPH is not enabled, we just stop the TX ring without freeing the TX/TX cmpl rings. With that we can now add napi_disable() and napi_enable() during queue_stop()/ queue_start(). This will guarantee that NAPI will stop processing the completion entries in case there are additional pending entries in the completion rings after queue_stop(). There could be some NQEs sitting unprocessed while NAPI is disabled thereby leaving the NQ unarmed. Explicitly re-arm the NQ after napi_enable() in queue start so that NAPI will resume properly. Error handling in bnxt_queue_start() requires a reset. If a TX ring cannot be allocated or initialized properly, it will cause TX timeout. The reset will also free any partially allocated rings. We don't expect to hit this error path because re-allocating previously reserved and allocated rings with the same parameters should never fail. Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250213011240.1640031-11-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-14 19:50:23 -08:00
Michael Chan	c8a0f7652d	bnxt_en: Refactor TX ring free logic Add a new bnxt_hwrm_tx_ring_free() function to handle freeing a HW transmit ring. The new function will also be used in the next patch to free the TX ring in queue_stop. Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250213011240.1640031-10-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-14 19:50:23 -08:00
Somnath Kotur	6b6bf60fc9	bnxt_en: Reallocate RX completion ring for TPH support In order to program the correct Steering Tag during an IRQ affinity change, we need to free/re-allocate the RX completion ring during queue_restart. If TPH is enabled, call FW to free the Rx completion ring and clear the ring entries in queue_stop(). Re-allocate it in queue_start() if TPH is enabled. Note that TPH mode is not enabled in this patch and will be enabled later in the patch series. While modifying bnxt_queue_start(), remove the unnecessary zeroing of rxr->rx_next_cons. It gets overwritten by the clone in bnxt_queue_start(). Remove the rx_reset counter increment since restart is not reset. Add comment to clarify that the ring allocations in queue_start should never fail. Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250213011240.1640031-9-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-14 19:50:23 -08:00
Michael Chan	4c8e612c9a	bnxt_en: Pass NQ ID to the FW when allocating RX/RX AGG rings Newer firmware can use the NQ ring ID associated with each RX/RX AGG ring to enable PCIe Steering Tags on P5_PLUS chips. When allocating RX/RX AGG rings, pass along NQ ring ID for the firmware to use. This information helps optimize DMA writes by directing them to the cache closer to the CPU consuming the data, potentially improving the processing speed. This change is backward-compatible with older firmware, which will simply disregard the information. Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250213011240.1640031-8-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-14 19:50:23 -08:00
Michael Chan	e1714de532	bnxt_en: Refactor RX/RX AGG ring parameters setup for P5_PLUS There is some common code for setting up RX and RX AGG ring allocation parameters for P5_PLUS chips. Refactor the logic into a new function. Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250213011240.1640031-7-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-14 19:50:22 -08:00
Somnath Kotur	09cc58d594	bnxt_en: Refactor bnxt_free_tx_rings() to free per TX ring Modify bnxt_free_tx_rings() to free the skbs per TX ring. This will be useful later in the series. Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250213011240.1640031-6-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-14 19:50:22 -08:00
Somnath Kotur	f33a508c23	bnxt_en: Refactor completion ring free routine Add a wrapper routine to free L2 completion rings. This will be useful later in the series. Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250213011240.1640031-5-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-14 19:50:22 -08:00
Michael Chan	e6ec504856	bnxt_en: Refactor TX ring allocation logic Add a new bnxt_hwrm_tx_ring_alloc() function to handle allocating a transmit ring. This will be useful later in the series. Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250213011240.1640031-4-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-14 19:50:22 -08:00
Michael Chan	0fed290525	bnxt_en: Refactor completion ring allocation logic for P5_PLUS chips Add a new bnxt_hwrm_cp_ring_alloc_p5() function to handle allocating one completion ring on P5_PLUS chips. This simplifies the existing code and will be useful later in the series. Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250213011240.1640031-3-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-14 19:50:22 -08:00
Michael Chan	ebdf7fe488	bnxt_en: Set NPAR 1.2 support when registering with firmware NPAR (Network interface card partitioning)[1] 1.2 adds a transparent VLAN tag for all packets between the NIC and the switch. Because of that, RX VLAN acceleration cannot be supported for any additional host configured VLANs. The driver has to acknowledge that it can support no RX VLAN acceleration and set the NPAR 1.2 supported flag when registering with the FW. Otherwise, the FW call will fail and the driver will abort on these NPAR 1.2 NICs with this error: bnxt_en 0000:26:00.0 (unnamed net_device) (uninitialized): hwrm req_type 0x1d seq id 0xb error 0x2 [1] https://techdocs.broadcom.com/us/en/storage-and-ethernet-connectivity/ethernet-nic-controllers/bcm957xxx/adapters/introduction/features/network-partitioning-npar.html Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250213011240.1640031-2-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-14 19:50:22 -08:00
Eric Biggers	8df3682904	lib/crc32: standardize on crc32c() name for Castagnoli CRC32 For historical reasons, the Castagnoli CRC32 is available under 3 names: crc32c(), crc32c_le(), and __crc32c_le(). Most callers use crc32c(). The more verbose versions are not really warranted; there is no "_be" version that the "_le" version needs to be differentiated from, and the leading underscores are pointless. Therefore, let's standardize on just crc32c(). Remove the other two names, and update callers accordingly. Specifically, the new crc32c() comes from what was previously __crc32c_le(), so compared to the old crc32c() it now takes a size_t length rather than unsigned int, and it's now in linux/crc32.h instead of just linux/crc32c.h (which includes linux/crc32.h). Later patches will also rename __crc32c_le_combine(), crc32c_le_base(), and crc32c_le_arch(). Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250208024911.14936-5-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>	2025-02-08 20:06:30 -08:00
Lenny Szubowicz	e0efe83ed3	tg3: Disable tg3 PCIe AER on system reboot Disable PCIe AER on the tg3 device on system reboot on a limited list of Dell PowerEdge systems. This prevents a fatal PCIe AER event on the tg3 device during the ACPI _PTS (prepare to sleep) method for S5 on those systems. The _PTS is invoked by acpi_enter_sleep_state_prep() as part of the kernel's reboot sequence as a result of commit `38f34dba80` ("PM: ACPI: reboot: Reinstate S5 for reboot"). There was an earlier fix for this problem by commit `2ca1c94ce0` ("tg3: Disable tg3 device on system reboot to avoid triggering AER"). But it was discovered that this earlier fix caused a reboot hang when some Dell PowerEdge servers were booted via ipxe. To address this reboot hang, the earlier fix was essentially reverted by commit `9fc3bc7643` ("tg3: power down device only on SYSTEM_POWER_OFF"). This re-exposed the tg3 PCIe AER on reboot problem. This fix is not an ideal solution because the root cause of the AER is in system firmware. Instead, it's a targeted work-around in the tg3 driver. Note also that the PCIe AER must be disabled on the tg3 device even if the system is configured to use "firmware first" error handling. V3: - Fix sparse warning on improper comparison of pdev->current_state - Adhere to netdev comment style Fixes: `9fc3bc7643` ("tg3: power down device only on SYSTEM_POWER_OFF") Signed-off-by: Lenny Szubowicz <lszubowi@redhat.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2025-02-03 10:12:14 +00:00
Florian Fainelli	46ded70923	net: bcmgenet: Correct overlaying of PHY and MAC Wake-on-LAN Some Wake-on-LAN modes such as WAKE_FILTER may only be supported by the MAC, while others might be only supported by the PHY. Make sure that the .get_wol() returns the union of both rather than only that of the PHY if the PHY supports Wake-on-LAN. When disabling Wake-on-LAN, make sure that this is done at both the PHY and MAC level, rather than doing an early return from the PHY driver. Fixes: `7e400ff35c` ("net: bcmgenet: Add support for PHY-based Wake-on-LAN") Fixes: `9ee09edc05` ("net: bcmgenet: Properly overlay PHY and MAC Wake-on-LAN capabilities") Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20250129231342.35013-1-florian.fainelli@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-02-01 16:53:33 -08:00
Linus Torvalds	c2933b2bef	First batch of fixes for 6.14. Nothing really stands out, but as usual there's a slight concentration of fixes for issues added in the last two weeks before the MW, and driver bugs from 6.13 which tend to get discovered upon wider distribution. Including fixes from IPSec, netfilter and Bluetooth. Current release - regressions: - net: revert RTNL changes in unregister_netdevice_many_notify() - Bluetooth: fix possible infinite recursion of btusb_reset - eth: adjust locking in some old drivers which protect their state with spinlocks to avoid sleeping in atomic; core protects netdev state with a mutex now Previous releases - regressions: - eth: mlx5e: make sure we pass node ID, not CPU ID to kvzalloc_node() - eth: bgmac: reduce max frame size to support just 1500 bytes; the jumbo frame support would previously cause OOB writes, but now fails outright - mptcp: blackhole only if 1st SYN retrans w/o MPC is accepted, avoid false detection of MPTCP blackholing Previous releases - always broken: - mptcp: handle fastopen disconnect correctly - xfrm: make sure skb->sk is a full sock before accessing its fields - xfrm: fix taking a lock with preempt disabled for RT kernels - usb: ipheth: improve safety of packet metadata parsing; prevent potential OOB accesses - eth: renesas: fix missing rtnl lock in suspend/resume path Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmebzXsACgkQMUZtbf5S IrvGBQ//auOF2yY1sg40fBvc6Hr1jpZBcr+uqTL6Qka1uVOvTFY51hAN54lBt32+ ixmcHsD0xdcHrr7VrqSXqurQLiGsdwUpnxZFCj/FymQuMunVysEqudvPeKDVHpsw JW5c4nJOexEA2viByK9iB23Qq0P3uBoPEnKrbSTVSDvYaXUj6y8Cvt3/vXc+H/tc T7GaxHH55NNNPkRz34YU3OWcaZsgkQEcdVpZf4tODPmg7J5VQj8SQeMhk/HI0sdO WKjWB0woZkiQECtamqAOXnv47PXd6igv8NALRPlJcKjs0EszUvuYhD/9MEOeghjI sjcQn9JnPpG+ca/qFVCSpEEOo2zGVn5dkJT5x26udH+5XHf7Pq+zpJwB6LHo98yF bGMpIrF6gi2EnBtS/tRjMyBU9Ut9KiUtjXMvn9EsD1U1FGIbz6wyQLlT0pG0hwJb rEdKfegrcyhWKHOD4vH9ciEg/7lgGfsGyfJDktIMdyailZc6tBcxwbdlc+5jRDA9 0RqGASIXaiX7AC3WOSeQzMgbV+WXdhaX/yrMJL5KBfBTzxG2audnJt1tPN3mbh3z NM6M2cnMsoX4QLSiaukJaCL7LWHSTlVttZVg8FGXHj1PejMQQBVjGVvzk2UF55UR gV7X9/VkXhmIDAZgThWtOdPLz+ItksfSKiruhUsXust6JgqRuqc= =GjBE -----END PGP SIGNATURE----- Merge tag 'net-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from IPSec, netfilter and Bluetooth. Nothing really stands out, but as usual there's a slight concentration of fixes for issues added in the last two weeks before the merge window, and driver bugs from 6.13 which tend to get discovered upon wider distribution. Current release - regressions: - net: revert RTNL changes in unregister_netdevice_many_notify() - Bluetooth: fix possible infinite recursion of btusb_reset - eth: adjust locking in some old drivers which protect their state with spinlocks to avoid sleeping in atomic; core protects netdev state with a mutex now Previous releases - regressions: - eth: - mlx5e: make sure we pass node ID, not CPU ID to kvzalloc_node() - bgmac: reduce max frame size to support just 1500 bytes; the jumbo frame support would previously cause OOB writes, but now fails outright - mptcp: blackhole only if 1st SYN retrans w/o MPC is accepted, avoid false detection of MPTCP blackholing Previous releases - always broken: - mptcp: handle fastopen disconnect correctly - xfrm: - make sure skb->sk is a full sock before accessing its fields - fix taking a lock with preempt disabled for RT kernels - usb: ipheth: improve safety of packet metadata parsing; prevent potential OOB accesses - eth: renesas: fix missing rtnl lock in suspend/resume path" * tag 'net-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (88 commits) MAINTAINERS: add Neal to TCP maintainers net: revert RTNL changes in unregister_netdevice_many_notify() net: hsr: fix fill_frame_info() regression vs VLAN packets doc: mptcp: sysctl: blackhole_timeout is per-netns mptcp: blackhole only if 1st SYN retrans w/o MPC is accepted netfilter: nf_tables: reject mismatching sum of field_len with set key length net: sh_eth: Fix missing rtnl lock in suspend/resume path net: ravb: Fix missing rtnl lock in suspend/resume path selftests/net: Add test for loading devbound XDP program in generic mode net: xdp: Disallow attaching device-bound programs in generic mode tcp: correct handling of extreme memory squeeze bgmac: reduce max frame size to support just MTU 1500 vsock/test: Add test for connect() retries vsock/test: Add test for UAF due to socket unbinding vsock/test: Introduce vsock_connect_fd() vsock/test: Introduce vsock_bind() vsock: Allow retrying on connect() failure vsock: Keep the binding until socket destruction Bluetooth: L2CAP: accept zero as a special value for MTU auto-selection Bluetooth: btnxpuart: Fix glitches seen in dual A2DP streaming ...	2025-01-30 12:24:20 -08:00
Rafał Miłecki	752e5fcc2e	bgmac: reduce max frame size to support just MTU 1500 bgmac allocates new replacement buffer before handling each received frame. Allocating & DMA-preparing 9724 B each time consumes a lot of CPU time. Ideally bgmac should just respect currently set MTU but it isn't the case right now. For now just revert back to the old limited frame size. This change bumps NAT masquerade speed by ~95%. Since commit `8218f62c9c` ("mm: page_frag: use initial zero offset for page_frag_alloc_align()"), the bgmac driver fails to open its network interface successfully and runs out of memory in the following call stack: bgmac_open -> bgmac_dma_init -> bgmac_dma_rx_skb_for_slot -> netdev_alloc_frag BGMAC_RX_ALLOC_SIZE = 10048 and PAGE_FRAG_CACHE_MAX_SIZE = 32768. Eventually we land into __page_frag_alloc_align() with the following parameters across multiple successive calls: __page_frag_alloc_align: fragsz=10048, align_mask=-1, size=32768, offset=0 __page_frag_alloc_align: fragsz=10048, align_mask=-1, size=32768, offset=10048 __page_frag_alloc_align: fragsz=10048, align_mask=-1, size=32768, offset=20096 __page_frag_alloc_align: fragsz=10048, align_mask=-1, size=32768, offset=30144 So in that case we do indeed have offset + fragsz (40192) > size (32768) and so we would eventually return NULL. Reverting to the older 1500 bytes MTU allows the network driver to be usable again. Fixes: `8c7da63978` ("bgmac: configure MTU and add support for frames beyond 8192 byte size") Signed-off-by: Rafał Miłecki <rafal@milecki.pl> [florian: expand commit message about recent commits] Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20250127175159.1788246-1-florian.fainelli@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-01-29 18:57:42 -08:00
Jakub Kicinski	8ed47e4e0b	eth: tg3: fix calling napi_enable() in atomic context tg3 has a spin lock protecting most of the config, switch to taking netdev_lock() explicitly on enable/start paths. Disable/stop paths seem to not be under the spin lock (since napi_disable() already needs to sleep), so leave that side as is. tg3_restart_hw() releases and re-takes the spin lock, we need to do the same because dev_close() needs to take netdev_lock(). Fixes: `413f0271f3` ("net: protect NAPI enablement with netdev_lock()") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://lore.kernel.org/dcfd56bc-de32-4b11-9e19-d8bd1543745d@stanley.mountain Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250124031841.1179756-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-01-27 14:30:49 -08:00
Linus Torvalds	0afd22092d	RDMA v6.14 merge window pull request Lighter that normal, but the now usual collection of driver fixes and small improvements: - Small fixes and minor improvements to cxgb4, bnxt_re, rxe, srp, efa, cxgb4 - Update mlx4 to use the new umem APIs, avoiding direct use of scatterlist - Support ROCEv2 in erdma - Remove various uncalled functions, constify bin_attribute - Provide core infrastructure to catch netdev events and route them to drivers, consolidating duplicated driver code - Fix rare race condition crashes in mlx5 ODP flows -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZ5JkNAAKCRCFwuHvBreF YeCwAP9nrzFZMBEa0DVx7V8w3sotQCOzaaoi+4UOigeleppKSAD+K7QA5CIwNf7j x0apvJlNsXKSYJVUI2rch/qIL8ckQwM= =wIYu -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull rdma updates from Jason Gunthorpe: "Lighter that normal, but the now usual collection of driver fixes and small improvements: - Small fixes and minor improvements to cxgb4, bnxt_re, rxe, srp, efa, cxgb4 - Update mlx4 to use the new umem APIs, avoiding direct use of scatterlist - Support ROCEv2 in erdma - Remove various uncalled functions, constify bin_attribute - Provide core infrastructure to catch netdev events and route them to drivers, consolidating duplicated driver code - Fix rare race condition crashes in mlx5 ODP flows" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (63 commits) RDMA/mlx5: Fix implicit ODP use after free RDMA/mlx5: Fix a race for an ODP MR which leads to CQE with error RDMA/qib: Constify 'struct bin_attribute' RDMA/hfi1: Constify 'struct bin_attribute' RDMA/rxe: Fix the warning "__rxe_cleanup+0x12c/0x170 [rdma_rxe]" RDMA/cxgb4: Notify rdma stack for IB_EVENT_QP_LAST_WQE_REACHED event RDMA/bnxt_re: Allocate dev_attr information dynamically RDMA/bnxt_re: Pass the context for ulp_irq_stop RDMA/bnxt_re: Add support to handle DCB_CONFIG_CHANGE event RDMA/bnxt_re: Query firmware defaults of CC params during probe RDMA/bnxt_re: Add Async event handling support bnxt_en: Add ULP call to notify async events RDMA/mlx5: Fix indirect mkey ODP page count MAINTAINERS: Update the bnxt_re maintainers RDMA/hns: Clean up the legacy CONFIG_INFINIBAND_HNS RDMA/rtrs: Add missing deinit() call RDMA/efa: Align interrupt related fields to same type RDMA/bnxt_re: Fix to drop reference to the mmap entry in case of error RDMA/mlx5: Fix link status down event for MPV RDMA/erdma: Support create_ah/destroy_ah in non-sleepable contexts ...	2025-01-24 12:21:28 -08:00
Jakub Kicinski	99d028c634	eth: bnxt: update header sizing defaults 300-400B RPC requests are fairly common. With the current default of 256B HDS threshold bnxt ends up splitting those, lowering PCIe bandwidth efficiency and increasing the number of memory allocation. Increase the HDS threshold to fit 4 buffers in a 4k page. This works out to 640B as the threshold on a typical kernel confing. This change increases the performance for a microbenchmark which receives 400B RPCs and sends empty responses by 4.5%. Admittedly this is just a single benchmark, but 256B works out to just 6 (so 2 more) packets per head page, because shinfo size dominates the headers. Now that we use page pool for the header pages I was also tempted to default rx_copybreak to 0, but in synthetic testing the copybreak size doesn't seem to make much difference. Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250119020518.1962249-8-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-01-20 11:44:58 -08:00
Jakub Kicinski	bee018052d	eth: bnxt: allocate enough buffer space to meet HDS threshold Now that we can configure HDS threshold separately from the rx_copybreak HDS threshold may be higher than rx_copybreak. We need to make sure that we have enough space for the headers. Fixes: `6b43673a25` ("bnxt_en: add support for hds-thresh ethtool command") Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250119020518.1962249-7-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-01-20 11:44:58 -08:00
Jakub Kicinski	928459bbda	net: ethtool: populate the default HDS params in the core The core has the current HDS config, it can pre-populate the values for the drivers. While at it, remove the zero-setting in netdevsim. Zero are the default values since the config is zalloc'ed. Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250119020518.1962249-6-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-01-20 11:44:58 -08:00
Jakub Kicinski	e58263e911	eth: bnxt: apply hds_thrs settings correctly Use the pending config for hds_thrs. Core will only update the "current" one after we return success. Without this change 2 reconfigs would be required for the setting to reach the device. Fixes: `6b43673a25` ("bnxt_en: add support for hds-thresh ethtool command") Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250119020518.1962249-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-01-20 11:44:57 -08:00
Jakub Kicinski	3c836451ca	net: move HDS config from ethtool state Separate the HDS config from the ethtool state struct. The HDS config contains just simple parameters, not state. Having it as a separate struct will make it easier to clone / copy and also long term potentially make it per-queue. Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250119020518.1962249-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-01-20 11:44:57 -08:00
Jakub Kicinski	17656eb5cf	eth: bnxt: fix string truncation warning in FW version W=1 builds with gcc 14.2.1 report: drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c:4193:32: error: ‘%s’ directive output may be truncated writing up to 31 bytes into a region of size 27 [-Werror=format-truncation=] 4193 \| "/pkg %s", buf); It's upset that we let buf be full length but then we use 5 characters for "/pkg ". The builds is also clear with clang version 19.1.5 now. Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250117183726.1481524-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-01-18 17:32:45 -08:00
Jakub Kicinski	2ee738e90e	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.13-rc8). Conflicts: drivers/net/ethernet/realtek/r8169_main.c `1f691a1fc4` ("r8169: remove redundant hwmon support") `152d00a913` ("r8169: simplify setting hwmon attribute visibility") https://lore.kernel.org/20250115122152.760b4e8d@canb.auug.org.au Adjacent changes: drivers/net/ethernet/broadcom/bnxt/bnxt.c `152f4da05a` ("bnxt_en: add support for rx-copybreak ethtool command") `f0aa6a37a3` ("eth: bnxt: always recalculate features after XDP clearing, fix null-deref") drivers/net/ethernet/intel/ice/ice_type.h `50327223a8` ("ice: add lock to protect low latency interface") `dc26548d72` ("ice: Fix quad registers read on E825") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-01-16 10:34:59 -08:00
Taehee Yoo	6b43673a25	bnxt_en: add support for hds-thresh ethtool command The bnxt_en driver has configured the hds_threshold value automatically when TPA is enabled based on the rx-copybreak default value. Now the hds-thresh ethtool command is added, so it adds an implementation of hds-thresh option. Configuration of the hds-thresh is applied only when the tcp-data-split is enabled. The default value of hds-thresh is 256, which is the default value of rx-copybreak, which used to be the hds_thresh value. The maximum hds-thresh is 1023. # Example: # ethtool -G enp14s0f0np0 tcp-data-split on hds-thresh 256 # ethtool -g enp14s0f0np0 Ring parameters for enp14s0f0np0: Pre-set maximums: ... HDS thresh: 1023 Current hardware settings: ... TCP data split: on HDS thresh: 256 Tested-by: Stanislav Fomichev <sdf@fomichev.me> Tested-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Taehee Yoo <ap420073@gmail.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250114142852.3364986-9-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-01-15 14:42:12 -08:00
Taehee Yoo	87c8f8496a	bnxt_en: add support for tcp-data-split ethtool command NICs that uses bnxt_en driver supports tcp-data-split feature by the name of HDS(header-data-split). But there is no implementation for the HDS to enable by ethtool. Only getting the current HDS status is implemented and The HDS is just automatically enabled only when either LRO, HW-GRO, or JUMBO is enabled. The hds_threshold follows rx-copybreak value. and it was unchangeable. This implements `ethtool -G <interface name> tcp-data-split <value>` command option. The value can be <on> and <auto>. The value is <auto> and one of LRO/GRO/JUMBO is enabled, HDS is automatically enabled and all LRO/GRO/JUMBO are disabled, HDS is automatically disabled. HDS feature relies on the aggregation ring. So, if HDS is enabled, the bnxt_en driver initializes the aggregation ring. This is the reason why BNXT_FLAG_AGG_RINGS contains HDS condition. Acked-by: Jakub Kicinski <kuba@kernel.org> Tested-by: Stanislav Fomichev <sdf@fomichev.me> Tested-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Taehee Yoo <ap420073@gmail.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250114142852.3364986-8-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-01-15 14:42:12 -08:00
Taehee Yoo	152f4da05a	bnxt_en: add support for rx-copybreak ethtool command The bnxt_en driver supports rx-copybreak, but it couldn't be set by userspace. Only the default value(256) has worked. This patch makes the bnxt_en driver support following command. `ethtool --set-tunable <devname> rx-copybreak <value> ` and `ethtool --get-tunable <devname> rx-copybreak`. By this patch, hds_threshol is set to the rx-copybreak value. But it will be set by `ethtool -G eth0 hds-thresh N` in the next patch. Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Brett Creeley <brett.creeley@amd.com> Tested-by: Stanislav Fomichev <sdf@fomichev.me> Tested-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Taehee Yoo <ap420073@gmail.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250114142852.3364986-7-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-01-15 14:42:11 -08:00
Russell King (Oracle)	21f56ad1b2	net: bcm: asp2: convert to phylib managed EEE Convert the Broadcom ASP2 driver to use phylib managed EEE support. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Tested-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/E1tXk81-000r4x-TS@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-01-15 13:17:56 -08:00
Russell King (Oracle)	df8017e8a1	net: bcm: asp2: remove tx_lpi_enabled Phylib maintains a copy of tx_lpi_enabled, which will be used to populate the member when phy_ethtool_get_eee(). Therefore, writing to this member before phy_ethtool_get_eee() will have no effect. Remove it. Also remove setting our copy of info->eee.tx_lpi_enabled which becomes write-only. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Tested-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/E1tXk7w-000r4r-Pq@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-01-15 13:17:56 -08:00
Russell King (Oracle)	54033f5512	net: bcm: asp2: fix LPI timer handling Fix the LPI timer handling in Broadcom ASP2 driver after the phylib managed EEE patches were merged. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Tested-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/E1tXk7r-000r4l-Li@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-01-15 13:17:56 -08:00
Kalesh AP	57e6464c22	RDMA/bnxt_re: Pass the context for ulp_irq_stop ulp_irq_stop() can be invoked from a context where FW is healthy or when FW is in a reset state. In the latter case, ULP must stop all interactions with HW/FW and also with application and stack. Added a new parameter to the ulp_irq_stop() function to achieve that. Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Chandramohan Akula <chandramohan.akula@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1736446693-6692-2-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-01-14 06:22:10 -05:00
Kalesh AP	7fea327840	RDMA/bnxt_re: Add Async event handling support Using the option provided by Ethernet driver, register for FW Async event. During probe, while registeriung with Ethernet driver, provide the ulp hook 'ulp_async_notifier' for receiving the firmware events. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/20250107024553.2926983-3-kalesh-anakkur.purayil@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-01-14 04:05:20 -05:00
Michael Chan	184fe6f238	bnxt_en: Add ULP call to notify async events When the driver receives an async event notification from the Firmware, we make the new ulp_async_notifier() call to inform the RDMA driver that a firmware async event has been received. RDMA driver can then take necessary actions based on the event type. In the next patch, we will implement the ulp_async_notifier() callbacks in the RDMA driver. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/20250107024553.2926983-2-kalesh-anakkur.purayil@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-01-14 03:39:46 -05:00
Jakub Kicinski	f0aa6a37a3	eth: bnxt: always recalculate features after XDP clearing, fix null-deref Recalculate features when XDP is detached. Before: # ip li set dev eth0 xdp obj xdp_dummy.bpf.o sec xdp # ip li set dev eth0 xdp off # ethtool -k eth0 \| grep gro rx-gro-hw: off [requested on] After: # ip li set dev eth0 xdp obj xdp_dummy.bpf.o sec xdp # ip li set dev eth0 xdp off # ethtool -k eth0 \| grep gro rx-gro-hw: on The fact that HW-GRO doesn't get re-enabled automatically is just a minor annoyance. The real issue is that the features will randomly come back during another reconfiguration which just happens to invoke netdev_update_features(). The driver doesn't handle reconfiguring two things at a time very robustly. Starting with commit `98ba1d931f` ("bnxt_en: Fix RSS logic in __bnxt_reserve_rings()") we only reconfigure the RSS hash table if the "effective" number of Rx rings has changed. If HW-GRO is enabled "effective" number of rings is 2x what user sees. So if we are in the bad state, with HW-GRO re-enablement "pending" after XDP off, and we lower the rings by / 2 - the HW-GRO rings doing 2x and the ethtool -L doing / 2 may cancel each other out, and the: if (old_rx_rings != bp->hw_resc.resv_rx_rings && condition in __bnxt_reserve_rings() will be false. The RSS map won't get updated, and we'll crash with: BUG: kernel NULL pointer dereference, address: 0000000000000168 RIP: 0010:__bnxt_hwrm_vnic_set_rss+0x13a/0x1a0 bnxt_hwrm_vnic_rss_cfg_p5+0x47/0x180 __bnxt_setup_vnic_p5+0x58/0x110 bnxt_init_nic+0xb72/0xf50 __bnxt_open_nic+0x40d/0xab0 bnxt_open_nic+0x2b/0x60 ethtool_set_channels+0x18c/0x1d0 As we try to access a freed ring. The issue is present since XDP support was added, really, but prior to commit `98ba1d931f` ("bnxt_en: Fix RSS logic in __bnxt_reserve_rings()") it wasn't causing major issues. Fixes: `1054aee823` ("bnxt_en: Use NETIF_F_GRO_HW.") Fixes: `98ba1d931f` ("bnxt_en: Fix RSS logic in __bnxt_reserve_rings()") Reviewed-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Link: https://patch.msgid.link/20250109043057.2888953-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-01-10 18:01:29 -08:00
Jakub Kicinski	14ea4cd1b1	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.13-rc7). Conflicts: `a42d71e322` ("net_sched: sch_cake: Add drop reasons") `737d4d91d3` ("sched: sch_cake: add bounds checks to host bulk flow fairness counts") Adjacent changes: drivers/net/ethernet/meta/fbnic/fbnic.h `3a856ab347` ("eth: fbnic: add IRQ reuse support") `95978931d5` ("eth: fbnic: Revert "eth: fbnic: Add hardware monitoring support via HWMON interface"") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-01-09 16:11:47 -08:00
Michael Chan	40452969a5	bnxt_en: Fix DIM shutdown DIM work will call the firmware to adjust the coalescing parameters on the RX rings. We should cancel DIM work before we call the firmware to free the RX rings. Otherwise, FW will reject the call from DIM work if the RX ring has been freed. This will generate an error message like this: bnxt_en 0000:21:00.1 ens2f1np1: hwrm req_type 0x53 seq id 0x6fca error 0x2 and cause unnecessary concern for the user. It is also possible to modify the coalescing parameters of the wrong ring if the ring has been re-allocated. To prevent this, cancel DIM work right before freeing the RX rings. We also have to add a check in NAPI poll to not schedule DIM if the RX rings are shutting down. Check that the VNIC is active before we schedule DIM. The VNIC is always disabled before we free the RX rings. Fixes: `0bc0b97fca` ("bnxt_en: cleanup DIM work on device shutdown") Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250104043849.3482067-3-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-01-06 16:40:26 -08:00
Kalesh AP	c8dafb0e43	bnxt_en: Fix possible memory leak when hwrm_req_replace fails When hwrm_req_replace() fails, the driver is not invoking bnxt_req_drop() which could cause a memory leak. Fixes: `bbf33d1d98` ("bnxt_en: update all firmware calls to use the new APIs") Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250104043849.3482067-2-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-01-06 16:40:26 -08:00
Jakub Kicinski	385f186aba	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.13-rc6). No conflicts. Adjacent changes: include/linux/if_vlan.h `f91a5b8089` ("af_packet: fix vlan_get_protocol_dgram() vs MSG_PEEK") `3f330db306` ("net: reformat kdoc return statements") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-01-03 16:29:29 -08:00
Vitalii Mordan	b255ef45fc	eth: bcmsysport: fix call balance of priv->clk handling routines Check the return value of clk_prepare_enable to ensure that priv->clk has been successfully enabled. If priv->clk was not enabled during bcm_sysport_probe, bcm_sysport_resume, or bcm_sysport_open, it must not be disabled in any subsequent execution paths. Fixes: `31bc72d976` ("net: systemport: fetch and use clock resources") Signed-off-by: Vitalii Mordan <mordan@ispras.ru> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20241227123007.2333397-1-mordan@ispras.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-12-30 17:33:46 -08:00
Michael Chan	bf2afe0f14	bnxt_en: Skip reading PXP registers during ethtool -d if unsupported Newer firmware does not allow reading the PXP registers during ethtool -d, so skip the firmware call in that case. Userspace (bnxt.c) always expects the register block to be populated so zeroes will be returned instead. Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20241217182620.2454075-6-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-12-19 17:30:00 -08:00
Michael Chan	b45a850585	bnxt_en: Skip MAC loopback selftest if it is unsupported by FW Call the new HWRM_PORT_MAC_QCAPS to check if mac loopback is supported. Skip the MAC loopback ethtool self test if it is not supported. Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/20241217182620.2454075-5-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-12-19 17:30:00 -08:00
Michael Chan	36d1e70a90	bnxt_en: Skip PHY loopback ethtool selftest if unsupported by FW Skip PHY loopback selftest if firmware advertises that it is unsupported in the HWRM_PORT_PHY_QCAPS call. Only show PHY loopback test result to be 0 if the test has run and passes. Do the same for external loopback to be consistent. Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20241217182620.2454075-4-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-12-19 17:30:00 -08:00
Michael Chan	fac5472fc8	bnxt_en: Do not allow ethtool -m on an untrusted VF Block all ethtool module operations on an untrusted VF. The firmware won't allow it and will return error. Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20241217182620.2454075-3-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-12-19 17:30:00 -08:00
Hongguang Gao	b1b66ae094	bnxt_en: Use FW defined resource limits for RoCE If FW supports setting resource limits for RoCE, then just use the FW limits instead of using some fixed values in the driver. These limits will be used to allocate context memory for QP, SRQ, AH, and MR resources for RoCE. Reviewed-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Hongguang Gao <hongguang.gao@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20241217182620.2454075-2-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-12-19 17:30:00 -08:00
Joe Hattori	0cb2c504d7	net: ethernet: bgmac-platform: fix an OF node reference leak The OF node obtained by of_parse_phandle() is not freed. Call of_node_put() to balance the refcount. This bug was found by an experimental static analysis tool that I am developing. Fixes: `1676aba5ef` ("net: ethernet: bgmac: device tree phy enablement") Signed-off-by: Joe Hattori <joe@pf.is.s.u-tokyo.ac.jp> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241214014912.2810315-1-joe@pf.is.s.u-tokyo.ac.jp Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-12-17 13:22:05 +01:00
Michael Chan	24c6843b73	bnxt_en: Fix aggregation ID mask to prevent oops on 5760X chips The 5760X (P7) chip's HW GRO/LRO interface is very similar to that of the previous generation (5750X or P5). However, the aggregation ID fields in the completion structures on P7 have been redefined from 16 bits to 12 bits. The freed up 4 bits are redefined for part of the metadata such as the VLAN ID. The aggregation ID mask was not modified when adding support for P7 chips. Including the extra 4 bits for the aggregation ID can potentially cause the driver to store or fetch the packet header of GRO/LRO packets in the wrong TPA buffer. It may hit the BUG() condition in __skb_pull() because the SKB contains no valid packet header: kernel BUG at include/linux/skbuff.h:2766! Oops: invalid opcode: 0000 1 PREEMPT SMP NOPTI CPU: 4 UID: 0 PID: 0 Comm: swapper/4 Kdump: loaded Tainted: G OE 6.12.0-rc2+ #7 Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE Hardware name: Dell Inc. PowerEdge R760/0VRV9X, BIOS 1.0.1 12/27/2022 RIP: 0010:eth_type_trans+0xda/0x140 Code: 80 00 00 00 eb c1 8b 47 70 2b 47 74 48 8b 97 d0 00 00 00 83 f8 01 7e 1b 48 85 d2 74 06 66 83 3a ff 74 09 b8 00 04 00 00 eb a5 <0f> 0b b8 00 01 00 00 eb 9c 48 85 ff 74 eb 31 f6 b9 02 00 00 00 48 RSP: 0018:ff615003803fcc28 EFLAGS: 00010283 RAX: 00000000000022d2 RBX: 0000000000000003 RCX: ff2e8c25da334040 RDX: 0000000000000040 RSI: ff2e8c25c1ce8000 RDI: ff2e8c25869f9000 RBP: ff2e8c258c31c000 R08: ff2e8c25da334000 R09: 0000000000000001 R10: ff2e8c25da3342c0 R11: ff2e8c25c1ce89c0 R12: ff2e8c258e0990b0 R13: ff2e8c25bb120000 R14: ff2e8c25c1ce89c0 R15: ff2e8c25869f9000 FS: 0000000000000000(0000) GS:ff2e8c34be300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055f05317e4c8 CR3: 000000108bac6006 CR4: 0000000000773ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <IRQ> ? die+0x33/0x90 ? do_trap+0xd9/0x100 ? eth_type_trans+0xda/0x140 ? do_error_trap+0x65/0x80 ? eth_type_trans+0xda/0x140 ? exc_invalid_op+0x4e/0x70 ? eth_type_trans+0xda/0x140 ? asm_exc_invalid_op+0x16/0x20 ? eth_type_trans+0xda/0x140 bnxt_tpa_end+0x10b/0x6b0 [bnxt_en] ? bnxt_tpa_start+0x195/0x320 [bnxt_en] bnxt_rx_pkt+0x902/0xd90 [bnxt_en] ? __bnxt_tx_int.constprop.0+0x89/0x300 [bnxt_en] ? kmem_cache_free+0x343/0x440 ? __bnxt_tx_int.constprop.0+0x24f/0x300 [bnxt_en] __bnxt_poll_work+0x193/0x370 [bnxt_en] bnxt_poll_p5+0x9a/0x300 [bnxt_en] ? try_to_wake_up+0x209/0x670 __napi_poll+0x29/0x1b0 Fix it by redefining the aggregation ID mask for P5_PLUS chips to be 12 bits. This will work because the maximum aggregation ID is less than 4096 on all P5_PLUS chips. Fixes: `13d2d3d381` ("bnxt_en: Add new P7 hardware interface definitions") Reviewed-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241209015448.1937766-1-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-12-10 18:25:38 -08:00
Hongguang Gao	fab4b4d2c9	bnxt_en: Fix potential crash when dumping FW log coredump If the FW log context memory is retained after FW reset, the existing code is not handling the condition correctly and zeroes out the data structures. This potentially will cause a division by zero crash when the user runs ethtool -w. The last_type is also not set correctly when the context memory is retained. This will cause errors because the last_type signals to the FW that all context memory types have been configured. Oops: divide error: 0000 1 PREEMPT SMP NOPTI CPU: 53 UID: 0 PID: 7019 Comm: ethtool Kdump: loaded Tainted: G OE 6.12.0-rc7+ #1 Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE Hardware name: Supermicro SYS-621C-TN12R/X13DDW-A, BIOS 1.4 08/10/2023 RIP: 0010:__bnxt_copy_ctx_mem.constprop.0.isra.0+0x86/0x160 [bnxt_en] Code: 0a 31 d2 4c 89 6c 24 10 45 8b a5 fc df ff ff 4c 8b 74 24 20 31 db 66 89 44 24 06 48 63 c5 c1 e5 09 4c 0f af e0 48 8b 44 24 30 <49> f7 f4 4c 89 64 24 08 48 63 c5 4d 89 ec 31 ed 48 89 44 24 18 49 RSP: 0018:ff480591603d78b8 EFLAGS: 00010206 RAX: 0000000000100000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ff23959e46740000 RDI: 0000000000000000 RBP: 0000000000000000 R08: 0000000000100000 R09: ff23959e46740000 R10: ff480591603d7a18 R11: 0000000000000010 R12: 0000000000000000 R13: ff23959e46742008 R14: 0000000000000000 R15: 0000000000000000 FS: 00007f04227c1740(0000) GS:ff2395adbf680000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f04225b33a5 CR3: 000000108b9a4001 CR4: 0000000000773ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> ? die+0x33/0x90 ? do_trap+0xd9/0x100 ? __bnxt_copy_ctx_mem.constprop.0.isra.0+0x86/0x160 [bnxt_en] ? do_error_trap+0x65/0x80 ? __bnxt_copy_ctx_mem.constprop.0.isra.0+0x86/0x160 [bnxt_en] ? exc_divide_error+0x36/0x50 ? __bnxt_copy_ctx_mem.constprop.0.isra.0+0x86/0x160 [bnxt_en] ? asm_exc_divide_error+0x16/0x20 ? __bnxt_copy_ctx_mem.constprop.0.isra.0+0x86/0x160 [bnxt_en] ? __bnxt_copy_ctx_mem.constprop.0.isra.0+0xda/0x160 [bnxt_en] bnxt_get_ctx_coredump.constprop.0+0x1ed/0x390 [bnxt_en] ? __memcg_slab_post_alloc_hook+0x21c/0x3c0 ? __bnxt_get_coredump+0x473/0x4b0 [bnxt_en] __bnxt_get_coredump+0x473/0x4b0 [bnxt_en] ? security_file_alloc+0x74/0xe0 ? cred_has_capability.isra.0+0x78/0x120 bnxt_get_coredump_length+0x4b/0xf0 [bnxt_en] bnxt_get_dump_flag+0x40/0x60 [bnxt_en] __dev_ethtool+0x17e4/0x1fc0 ? syscall_exit_to_user_mode+0xc/0x1d0 ? do_syscall_64+0x85/0x150 ? unmap_page_range+0x299/0x4b0 ? vma_interval_tree_remove+0x215/0x2c0 ? __kmalloc_cache_noprof+0x10a/0x300 dev_ethtool+0xa8/0x170 dev_ioctl+0x1b5/0x580 ? sk_ioctl+0x4a/0x110 sock_do_ioctl+0xab/0xf0 sock_ioctl+0x1ca/0x2e0 __x64_sys_ioctl+0x87/0xc0 do_syscall_64+0x79/0x150 Fixes: `24d694aec1` ("bnxt_en: Allocate backing store memory for FW trace logs") Signed-off-by: Hongguang Gao <hongguang.gao@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20241204215918.1692597-3-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-12-06 17:39:13 -08:00
Michael Chan	de37faf41a	bnxt_en: Fix GSO type for HW GRO packets on 5750X chips The existing code is using RSS profile to determine IPV4/IPV6 GSO type on all chips older than 5760X. This won't work on 5750X chips that may be using modified RSS profiles. This commit from 2018 has updated the driver to not use RSS profile for HW GRO packets on newer chips: `50f011b63d` ("bnxt_en: Update RSS setup and GRO-HW logic according to the latest spec.") However, a recent commit to add support for the newest 5760X chip broke the logic. If the GRO packet needs to be re-segmented by the stack, the wrong GSO type will cause the packet to be dropped. Fix it to only use RSS profile to determine GSO type on the oldest 5730X/5740X chips which cannot use the new method and is safe to use the RSS profiles. Also fix the L3/L4 hash type for RX packets by not using the RSS profile for the same reason. Use the ITYPE field in the RX completion to determine L3/L4 hash types correctly. Fixes: `a7445d6980` ("bnxt_en: Add support for new RX and TPA_START completion types for P7") Reviewed-by: Colin Winegarden <colin.winegarden@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20241204215918.1692597-2-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-12-06 17:39:13 -08:00
David Wei	bd649c5cc9	bnxt_en: handle tpa_info in queue API implementation Commit `7ed816be35` ("eth: bnxt: use page pool for head frags") added a page pool for header frags, which may be distinct from the existing pool for the aggregation ring. Prior to this change, frags used in the TPA ring rx_tpa were allocated from system memory e.g. napi_alloc_frag() meaning their lifetimes were not associated with a page pool. They can be returned at any time and so the queue API did not alloc or free rx_tpa. But now frags come from a separate head_pool which may be different to page_pool. Without allocating and freeing rx_tpa, frags allocated from the old head_pool may be returned to a different new head_pool which causes a mismatch between the pp hold/release count. Fix this problem by properly freeing and allocating rx_tpa in the queue API implementation. Signed-off-by: David Wei <dw@davidwei.uk> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20241204041022.56512-4-dw@davidwei.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-12-04 19:23:35 -08:00
David Wei	bf1782d70d	bnxt_en: refactor bnxt_alloc_rx_rings() to call bnxt_alloc_rx_agg_bmap() Refactor bnxt_alloc_rx_rings() to call bnxt_alloc_rx_agg_bmap() for allocating rx_agg_bmap. Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: David Wei <dw@davidwei.uk> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20241204041022.56512-3-dw@davidwei.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-12-04 19:23:35 -08:00
David Wei	5883a3e0ba	bnxt_en: refactor tpa_info alloc/free into helpers Refactor bnxt_rx_ring_info->tpa_info operations into helpers that work on a single tpa_info in prep for queue API using them. There are 2 pairs of operations: * bnxt_alloc_one_tpa_info() * bnxt_free_one_tpa_info() These alloc/free the tpa_info array itself. * bnxt_alloc_one_tpa_info_data() * bnxt_free_one_tpa_info_data() These alloc/free the frags stored in tpa_info array. Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: David Wei <dw@davidwei.uk> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20241204041022.56512-2-dw@davidwei.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-12-04 19:23:35 -08:00
Daniel Xu	be75cda92a	bnxt_en: ethtool: Supply ntuple rss context action Commit `2f4f9fe5bf` ("bnxt_en: Support adding ntuple rules on RSS contexts") added support for redirecting to an RSS context as an ntuple rule action. However, it forgot to update the ETHTOOL_GRXCLSRULE codepath. This caused `ethtool -n` to always report the action as "Action: Direct to queue 0" which is wrong. Fix by teaching bnxt driver to report the RSS context when applicable. Fixes: `2f4f9fe5bf` ("bnxt_en: Support adding ntuple rules on RSS contexts") Reviewed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Link: https://patch.msgid.link/2e884ae39e08dc5123be7c170a6089cefe6a78f7.1732748253.git.dxu@dxuuu.xyz Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-30 14:16:12 -08:00
Linus Torvalds	65ae975e97	Including fixes from bluetooth. Current release - regressions: - rtnetlink: fix rtnl_dump_ifinfo() error path - bluetooth: remove the redundant sco_conn_put Previous releases - regressions: - netlink: fix false positive warning in extack during dumps - sched: sch_fq: don't follow the fast path if Tx is behind now - ipv6: delete temporary address if mngtmpaddr is removed or unmanaged - tcp: fix use-after-free of nreq in reqsk_timer_handler(). - bluetooth: fix slab-use-after-free Read in set_powered_sync - l2tp: fix warning in l2tp_exit_net found - eth: bnxt_en: fix receive ring space parameters when XDP is active - eth: lan78xx: fix double free issue with interrupt buffer allocation - eth: tg3: set coherent DMA mask bits to 31 for BCM57766 chipsets Previous releases - always broken: - ipmr: fix tables suspicious RCU usage - iucv: MSG_PEEK causes memory leak in iucv_sock_destruct() - eth: octeontx2-af: fix low network performance - eth: stmmac: dwmac-socfpga: set RX watchdog interrupt as broken - eth: rtase: correct the speed for RTL907XD-V1 Misc: - some documentation fixup Signed-off-by: Paolo Abeni <pabeni@redhat.com> -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmdIolwSHHBhYmVuaUBy ZWRoYXQuY29tAAoJECkkeY3MjxOk/fEP/01Nuobq5teEiJgfV25xMqKT8EtvtrTk QatoPMD4UrpxbTBlA6wc23wBewBCVHG6IKVTVH00mUsWbZv561PNnXexD5yTLlor p4XSyaUwXeUzD+9LsxlTJGyp2gKGrir6NY6R/pYaJJ7pjxuRQKOl+qXf7s7IjIye Fnh8LAxIhr/LdBCJBV4tajS5VfCB6svT+uFCflbOw0Ng/quGfKchTHGTBxyHr3Ef mw0XsFew+6hDt72l9u0BNUewsSNfcfxSR343Z/DCaS03ZRQxhsB9I2v0WfgteO+U 3xdRG1WvphfYsN/C/zJ19OThAmbKE+u4gz8Z07yebpgFN5jbe5Rcf7IVcXiexd0Y 2fivK7DFU06TLukqBkUqqwPzAgh1w/KA+ia119WteYKxxTchu9td7+L4pr9qU4Tg Nipq0MYaj0cEebf+DdlG+2UFjMzaTiN/Ph1Cdh15bqMaVhn/eOk+L959y/XUlBm0 vpNL2SaFg8ki1N3SyTCFvmS3w8P+jM/KaA3fQv8hfG9Ceab5NKEoUff1VdjDBh9X sS7I15rg8s0CV1DWDJn6Mvex30e2+/yesjJbD/D9HDcb1y2vmbwz9t5L3yFpoNbc +qxRawoxj+Vi/4DZNnZKHvTkc0+hOm4f+BtUGiGBfBnIIrqvYh3DnQTc5res6l0e ZdG0B4yEZedj =7dW1 -----END PGP SIGNATURE----- Merge tag 'net-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from bluetooth. Current release - regressions: - rtnetlink: fix rtnl_dump_ifinfo() error path - bluetooth: remove the redundant sco_conn_put Previous releases - regressions: - netlink: fix false positive warning in extack during dumps - sched: sch_fq: don't follow the fast path if Tx is behind now - ipv6: delete temporary address if mngtmpaddr is removed or unmanaged - tcp: fix use-after-free of nreq in reqsk_timer_handler(). - bluetooth: fix slab-use-after-free Read in set_powered_sync - l2tp: fix warning in l2tp_exit_net found - eth: - bnxt_en: fix receive ring space parameters when XDP is active - lan78xx: fix double free issue with interrupt buffer allocation - tg3: set coherent DMA mask bits to 31 for BCM57766 chipsets Previous releases - always broken: - ipmr: fix tables suspicious RCU usage - iucv: MSG_PEEK causes memory leak in iucv_sock_destruct() - eth: - octeontx2-af: fix low network performance - stmmac: dwmac-socfpga: set RX watchdog interrupt as broken - rtase: correct the speed for RTL907XD-V1 Misc: - some documentation fixup" * tag 'net-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (49 commits) ipmr: fix build with clang and DEBUG_NET disabled. Documentation: tls_offload: fix typos and grammar Fix spelling mistake ipmr: fix tables suspicious RCU usage ip6mr: fix tables suspicious RCU usage ipmr: add debug check for mr table cleanup selftests: rds: move test.py to TEST_FILES net_sched: sch_fq: don't follow the fast path if Tx is behind now tcp: Fix use-after-free of nreq in reqsk_timer_handler(). net: phy: fix phy_ethtool_set_eee() incorrectly enabling LPI net: Comment copy_from_sockptr() explaining its behaviour rxrpc: Improve setsockopt() handling of malformed user input llc: Improve setsockopt() handling of malformed user input Bluetooth: SCO: remove the redundant sco_conn_put Bluetooth: MGMT: Fix possible deadlocks Bluetooth: MGMT: Fix slab-use-after-free Read in set_powered_sync bnxt_en: Unregister PTP during PCI shutdown and suspend bnxt_en: Refactor bnxt_ptp_init() bnxt_en: Fix receive ring space parameters when XDP is active bnxt_en: Fix queue start to update vnic RSS table ...	2024-11-28 10:15:20 -08:00
Michael Chan	3661c05c54	bnxt_en: Unregister PTP during PCI shutdown and suspend If we go through the PCI shutdown or suspend path, we shutdown the NIC but PTP remains registered. If the kernel continues to run for a little bit, the periodic PTP .do_aux_work() function may be called and it will read the PHC from the BAR register. Since the device has already been disabled, it will cause a PCIe completion timeout. Fix it by calling bnxt_ptp_clear() in the PCI shutdown/suspend handlers. bnxt_ptp_clear() will unregister from PTP and .do_aux_work() will be canceled. In bnxt_resume(), we need to re-initialize PTP. Fixes: `a521c8a01d` ("bnxt_en: Move bnxt_ptp_init() from bnxt_open() back to bnxt_init_one()") Cc: Richard Cochran <richardcochran@gmail.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-11-26 15:29:31 +01:00
Michael Chan	1e9614cd95	bnxt_en: Refactor bnxt_ptp_init() Instead of passing the 2nd parameter phc_cfg to bnxt_ptp_init(). Store it in bp->ptp_cfg so that the caller doesn't need to know what the value should be. In the next patch, we'll need to call bnxt_ptp_init() in bnxt_resume() and this will make it easier. Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-11-26 15:29:31 +01:00
Shravya KN	3051a77a09	bnxt_en: Fix receive ring space parameters when XDP is active The MTU setting at the time an XDP multi-buffer is attached determines whether the aggregation ring will be used and the rx_skb_func handler. This is done in bnxt_set_rx_skb_mode(). If the MTU is later changed, the aggregation ring setting may need to be changed and it may become out-of-sync with the settings initially done in bnxt_set_rx_skb_mode(). This may result in random memory corruption and crashes as the HW may DMA data larger than the allocated buffer size, such as: BUG: kernel NULL pointer dereference, address: 00000000000003c0 PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 17 PID: 0 Comm: swapper/17 Kdump: loaded Tainted: G S OE 6.1.0-226bf9805506 #1 Hardware name: Wiwynn Delta Lake PVT BZA.02601.0150/Delta Lake-Class1, BIOS F0E_3A12 08/26/2021 RIP: 0010:bnxt_rx_pkt+0xe97/0x1ae0 [bnxt_en] Code: 8b 95 70 ff ff ff 4c 8b 9d 48 ff ff ff 66 41 89 87 b4 00 00 00 e9 0b f7 ff ff 0f b7 43 0a 49 8b 95 a8 04 00 00 25 ff 0f 00 00 <0f> b7 14 42 48 c1 e2 06 49 03 95 a0 04 00 00 0f b6 42 33f RSP: 0018:ffffa19f40cc0d18 EFLAGS: 00010202 RAX: 00000000000001e0 RBX: ffff8e2c805c6100 RCX: 00000000000007ff RDX: 0000000000000000 RSI: ffff8e2c271ab990 RDI: ffff8e2c84f12380 RBP: ffffa19f40cc0e48 R08: 000000000001000d R09: 974ea2fcddfa4cbf R10: 0000000000000000 R11: ffffa19f40cc0ff8 R12: ffff8e2c94b58980 R13: ffff8e2c952d6600 R14: 0000000000000016 R15: ffff8e2c271ab990 FS: 0000000000000000(0000) GS:ffff8e3b3f840000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000003c0 CR3: 0000000e8580a004 CR4: 00000000007706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <IRQ> __bnxt_poll_work+0x1c2/0x3e0 [bnxt_en] To address the issue, we now call bnxt_set_rx_skb_mode() within bnxt_change_mtu() to properly set the AGG rings configuration and update rx_skb_func based on the new MTU value. Additionally, BNXT_FLAG_NO_AGG_RINGS is cleared at the beginning of bnxt_set_rx_skb_mode() to make sure it gets set or cleared based on the current MTU. Fixes: `08450ea98a` ("bnxt_en: Fix max_mtu setting for multi-buf XDP") Co-developed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Shravya KN <shravya.k-n@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-11-26 15:29:31 +01:00
Somnath Kotur	5ac066b7b0	bnxt_en: Fix queue start to update vnic RSS table HWRM_RING_FREE followed by a HWRM_RING_ALLOC is not guaranteed to have the same FW ring ID as before. So we must reinitialize the RSS table with the correct ring IDs. Otherwise, traffic may not resume properly if the restarted ring ID is stale. Since this feature is only supported on P5_PLUS chips, we call bnxt_vnic_set_rss_p5() to update the HW RSS table. Fixes: `2d694c27d3` ("bnxt_en: implement netdev_queue_mgmt_ops") Cc: David Wei <dw@davidwei.uk> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-11-26 15:29:31 +01:00
Shravya KN	5007991670	bnxt_en: Set backplane link modes correctly for ethtool Use the return value from bnxt_get_media() to determine the port and link modes. bnxt_get_media() returns the proper BNXT_MEDIA_KR when the PHY is backplane. This will correct the ethtool settings for backplane devices. Fixes: `5d4e1bf606` ("bnxt_en: extend media types to supported and autoneg modes") Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Shravya KN <shravya.k-n@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-11-26 15:29:31 +01:00
Saravanan Vajravel	5311598f7f	bnxt_en: Reserve rings after PCIe AER recovery if NIC interface is down After successful PCIe AER recovery, FW will reset all resource reservations. If it is IF_UP, the driver will call bnxt_open() and all resources will be reserved again. It it is IF_DOWN, we should call bnxt_reserve_rings() so that we can reserve resources including RoCE resources to allow RoCE to resume after AER. Without this patch, RoCE fails to resume in this IF_DOWN scenario. Later, if it becomes IF_UP, bnxt_open() will see that resources have been reserved and will not reserve again. Fixes: `fb1e6e562b` ("bnxt_en: Fix AER recovery.") Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-11-26 15:29:31 +01:00
Pavan Chebbi	614f4d166e	tg3: Set coherent DMA mask bits to 31 for BCM57766 chipsets The hardware on Broadcom 1G chipsets have a known limitation where they cannot handle DMA addresses that cross over 4GB. When such an address is encountered, the hardware sets the address overflow error bit in the DMA status register and triggers a reset. However, BCM57766 hardware is setting the overflow bit and triggering a reset in some cases when there is no actual underlying address overflow. The hardware team analyzed the issue and concluded that it is happening when the status block update has an address with higher (b16 to b31) bits as 0xffff following a previous update that had lowest bits as 0xffff. To work around this bug in the BCM57766 hardware, set the coherent dma mask from the current 64b to 31b. This will ensure that upper bits of the status block DMA address are always at most 0x7fff, thus avoiding the improper overflow check described above. This work around is intended for only status block and ring memories and has no effect on TX and RX buffers as they do not require coherent memory. Fixes: `72f2afb8a6` ("[TG3]: Add DMA address workaround") Reported-by: Salam Noureddine <noureddine@arista.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Link: https://patch.msgid.link/20241119055741.147144-1-pavan.chebbi@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-24 16:45:48 -08:00
Linus Torvalds	2a163a4cea	RDMA v6.13 merge window pull request Seveal fixes scattered across the drivers and a few new features: - Minor updates and bug fixes to hfi1, efa, iopob, bnxt, hns - Force disassociate the userspace FD when hns does an async reset - bnxt new features for optimized modify QP to skip certain stayes, CQ coalescing, better debug dumping - mlx5 new data placement ordering feature - Faster destruction of mlx5 devx HW objects - Improvements to RDMA CM mad handling -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZz4ENwAKCRCFwuHvBreF YQYQAP9R54r5J1Iylg+zqhCc+e/9oveuuZbfLvy/EJiEpmdprQEAgPs1RrB0z7U6 1xrVStUKNPhGd5XeVVZGkIV0zYv6Tw4= =V5xI -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull rdma updates from Jason Gunthorpe: "Seveal fixes scattered across the drivers and a few new features: - Minor updates and bug fixes to hfi1, efa, iopob, bnxt, hns - Force disassociate the userspace FD when hns does an async reset - bnxt new features for optimized modify QP to skip certain stayes, CQ coalescing, better debug dumping - mlx5 new data placement ordering feature - Faster destruction of mlx5 devx HW objects - Improvements to RDMA CM mad handling" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (51 commits) RDMA/bnxt_re: Correct the sequence of device suspend RDMA/bnxt_re: Use the default mode of congestion control RDMA/bnxt_re: Support different traffic class IB/cm: Rework sending DREQ when destroying a cm_id IB/cm: Do not hold reference on cm_id unless needed IB/cm: Explicitly mark if a response MAD is a retransmission RDMA/mlx5: Move events notifier registration to be after device registration RDMA/bnxt_re: Cache MSIx info to a local structure RDMA/bnxt_re: Refurbish CQ to NQ hash calculation RDMA/bnxt_re: Refactor NQ allocation RDMA/bnxt_re: Fail probe early when not enough MSI-x vectors are reserved RDMA/hns: Fix different dgids mapping to the same dip_idx RDMA/bnxt_re: Add set_func_resources support for P5/P7 adapters RDMA/bnxt_re: Enhance RoCE SRIOV resource configuration design bnxt_en: Add support for RoCE sriov configuration RDMA/hns: Fix NULL pointer derefernce in hns_roce_map_mr_sg() RDMA/hns: Fix out-of-order issue of requester when setting FENCE RDMA/nldev: Add IB device and net device rename events RDMA/mlx5: Add implementation for ufile_hw_cleanup device operation RDMA/core: Move ib_uverbs_file struct to uverbs_types.h ...	2024-11-22 20:03:57 -08:00
Shruti Parab	3c2179e663	bnxt_en: Add FW trace coredump segments to the coredump The FW trace coredump segments are very similar to the context memory segments in the previous patch. The main difference is to call HWRM_DBG_LOG_BUFFER_FLUSH to flush the FW data to host memory and to include an additional record in the coredump that contains the head and tail information of the trace data. Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com> Signed-off-by: Shruti Parab <shruti.parab@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20241115151438.550106-12-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-18 19:48:55 -08:00
Michael Chan	bda2e63a50	bnxt_en: Add a new ethtool -W dump flag Add a new ethtool -W dump flag (2) to include driver coredump segments. This patch adds the host backing store context memory pages used by the chip and FW to store various states to the coredump. The pages for each context memory type is dumped into a separate coredump segment. Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Selvin Thyparampil Xavier <selvin.xavier@broadcom.com> Reviewed-by: Shruti Parab <shruti.parab@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com> Reviewed-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20241115151438.550106-11-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-18 19:48:55 -08:00
Shruti Parab	a854a17097	bnxt_en: Add 2 parameters to bnxt_fill_coredump_seg_hdr() Pass the component ID and segment ID to this function to create the coredump segment header. This will be needed in the next patches to create more segments for the coredump. Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com> Signed-off-by: Shruti Parab <shruti.parab@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20241115151438.550106-10-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-18 19:48:55 -08:00
Sreekanth Reddy	23a18b91b6	bnxt_en: Add functions to copy host context memory Host context memory is used by the newer chips to store context information for various L2 and RoCE states and FW logs. This information will be useful for debugging. This patch adds the functions to copy all pages of a context memory type to a contiguous buffer. The next patches will include the context memory dump during ethtool -w coredump. Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com> Co-developed-by: Shruti Parab <shruti.parab@broadcom.com> Signed-off-by: Shruti Parab <shruti.parab@broadcom.com> Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20241115151438.550106-9-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-18 19:48:55 -08:00
Hongguang Gao	de999362ad	bnxt_en: Do not free FW log context memory If FW supports appending new FW logs to an offset in the context memory after FW reset, then do not free this type of context memory during reset. The driver will provide the initial offset to the FW when configuring this type of context memory. This way, we don't lose the older FW logs after reset. Signed-off-by: Hongguang Gao <hongguang.gao@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20241115151438.550106-8-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-18 19:48:54 -08:00
Shruti Parab	84fcd9449f	bnxt_en: Manage the FW trace context memory The FW trace memory pages will be added to the ethtool -w coredump in later patches. In addition to the raw data, the driver has to add a header to provide the head and tail information on each FW trace log segment when creating the coredump. The FW sends an async message to the driver after DMAing a chunk of logs to the context memory to indicate the last offset containing the tail of the logs. The driver needs to keep track of that. Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com> Signed-off-by: Shruti Parab <shruti.parab@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20241115151438.550106-7-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-18 19:48:54 -08:00
Shruti Parab	24d694aec1	bnxt_en: Allocate backing store memory for FW trace logs Allocate the new FW trace log backing store context memory types if they are supported by the FW. FW debug logs are DMA'ed to the host backing store memory when the on-chip buffers are full. If host memory cannot be allocated for these memory types, the driver will not abort. Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com> Signed-off-by: Shruti Parab <shruti.parab@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20241115151438.550106-6-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-18 19:48:54 -08:00
Hongguang Gao	46010d43ab	bnxt_en: Add a 'force' parameter to bnxt_free_ctx_mem() If 'force' is false, it will keep the memory pages and all data structures for the context memory type if the memory is valid. This patch always passes true for the 'force' parameter so there is no change in behavior. Later patches will adjust the 'force' parameter for the FW log context memory types so that the logs will not be reset after FW reset. Signed-off-by: Hongguang Gao <hongguang.gao@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20241115151438.550106-5-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-18 19:48:54 -08:00
Hongguang Gao	968d2cc07c	bnxt_en: Refactor bnxt_free_ctx_mem() Add a new function bnxt_free_one_ctx_mem() to free one context memory type. bnxt_free_ctx_mem() now calls the new function in the loop to free each context memory type. There is no change in behavior. Later patches will further make use of the new function. Signed-off-by: Hongguang Gao <hongguang.gao@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20241115151438.550106-4-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-18 19:48:54 -08:00
Shruti Parab	0b350b4927	bnxt_en: Add mem_valid bit to struct bnxt_ctx_mem_type Add a new bit to struct bnxt_ctx_mem_type to indicate that host memory has been successfully allocated for this context memory type. In the next patches, we'll be adding some additional context memory types for FW debugging/logging. If memory cannot be allocated for any of these new types, we will not abort and the cleared mem_valid bit will indicate to skip configuring the memory type. Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com> Signed-off-by: Shruti Parab <shruti.parab@broadcom.com> Signed-of-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20241115151438.550106-3-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-18 19:48:54 -08:00
Michael Chan	ff00bcc9ec	bnxt_en: Update firmware interface spec to 1.10.3.85 The major change is the new firmware command to flush the FW debug logs to the host backing store context memory buffers. Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20241115151438.550106-2-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-18 19:48:54 -08:00
Kees Cook	1cfb5e5788	Revert "net: ethtool: Avoid thousands of -Wflex-array-member-not-at-end warnings" This reverts commit `3bd9b9abdf`. We cannot use the new tagged struct group because it throws C++ errors even under "extern C". Signed-off-by: Kees Cook <kees@kernel.org> Link: https://patch.msgid.link/20241115204308.3821419-1-kees@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-18 18:52:11 -08:00
Vadim Fedorenko	c7a21af711	bnxt_en: optimize gettimex64 Current implementation of gettimex64() makes at least 3 PCIe reads to get current PHC time. It takes at least 2.2us to get this value back to userspace. At the same time there is cached value of upper bits of PHC available for packet timestamps already. This patch reuses cached value to speed up reading of PHC time. Signed-off-by: Vadim Fedorenko <vadfed@meta.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20241114114820.1411660-1-vadfed@meta.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-15 14:26:05 -08:00
Jakub Kicinski	7ed816be35	eth: bnxt: use page pool for head frags Testing small size RPCs (300B-400B) on a large AMD system suggests that page pool recycling is very useful even for just the head frags. With this patch (and copy break disabled) I see a 30% performance improvement (82Gbps -> 106Gbps). Convert bnxt from normal page frags to page pool frags for head buffers. On systems with small page size we can use the same pool as for TPA pages. On systems with large pages the frag allocation logic of the page pool is already used to split a large page into TPA chunks. TPA chunks are much larger than heads (8k or 64k, AFAICT vs 1kB) and we always allocate the same sized chunks. Mixing allocation of TPA and head pages would lead to sub-optimal memory use. Plus Taehee's work on zero-copy / devmem will need to differentiate between TPA and non-TPA page pool, anyway. Conditionally allocate a new page pool for heads. Link: https://patch.msgid.link/20241109035119.3391864-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-12 18:26:38 -08:00
Bhargava Chenna Marreddy	304cc83807	RDMA/bnxt_re: Enhance RoCE SRIOV resource configuration design Refine RoCE SRIOV resource configuration design, using the INITIALIZE_FW's flag as an indication for the new design to the firmware. RoCE driver does not have to provision resources to VF when firmware advertises support for RoCE resource management by NIC driver. Signed-off-by: Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com> Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com> CC: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1730882676-24434-3-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-11-12 03:04:04 -05:00
Vikas Gupta	53371c5c21	bnxt_en: Add support for RoCE sriov configuration During driver load, PF RDMA driver provisions resources to the RDMA VFs. This logic takes into consideration of the total number of VFs supported on the PF while allocating resources. Firmware now advertises a capability where NIC driver can allocate resources for RDMA VFs when the user actually creates a VF. So this resource distribution can be based on the number of active VFs. This patch adds the support to check for the firmware capability and follow the new RDMA VF resource allocation strategy. The current logic in the RDMA driver will be removed for the newer Firmware versions in a subsequent patch in this series. Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1730882676-24434-2-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-11-12 03:04:04 -05:00
Vadim Fedorenko	f0fe51a043	bnxt_en: add unlocked version of bnxt_refclk_read Serialization of PHC read with FW reset mechanism uses ptp_lock which also protects timecounter updates. This means we cannot grab it when called from bnxt_cc_read(). Let's move locking into different function. Fixes: `6c0828d00f` ("bnxt_en: replace PTP spinlock with seqlock") Signed-off-by: Vadim Fedorenko <vadfed@meta.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://patch.msgid.link/20241107214917.2980976-1-vadfed@meta.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-11 17:32:29 -08:00
Mohammad Heib	fcf42409c6	bnxt_en: use irq_update_affinity_hint() irq_set_affinity_hint() is deprecated, Use irq_update_affinity_hint() instead. This removes the side-effect of actually applying the affinity. The driver does not really need to worry about spreading its IRQs across CPUs. The core code already takes care of that. when the driver applies the affinities by itself, it breaks the users' expectations: 1. The user configures irqbalance with IRQBALANCE_BANNED_CPULIST in order to prevent IRQs from being moved to certain CPUs that run a real-time workload. 2. bnxt_en device reopening will resets the affinity in bnxt_open(). 3. bnxt_en has no idea about irqbalance's config, so it may move an IRQ to a banned CPU. The real-time workload suffers unacceptable latency. Signed-off-by: Mohammad Heib <mheib@redhat.com> Reviewed-by: Andy Gospodarek <gospo@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Link: https://patch.msgid.link/20241106180811.385175-1-mheib@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-11 15:31:03 -08:00
Rosen Penev	fda960354e	net: broadcom: use ethtool string helpers The latter is the preferred way to copy ethtool strings. Avoids manually incrementing the pointer. Cleans up the code quite well. Signed-off-by: Rosen Penev <rosenp@gmail.com> Tested-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20241104205317.306140-1-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-06 17:51:02 -08:00
Rosen Penev	4069dcb7da	net: bnx2x: use ethtool string helpers The latter is the preferred way to copy ethtool strings. Avoids manually incrementing the pointer. Cleans up the code quite well. Signed-off-by: Rosen Penev <rosenp@gmail.com> Link: https://patch.msgid.link/20241104202326.78418-1-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-06 17:50:37 -08:00
Daniel Xu	5f143efd38	bnxt_en: ethtool: Support unset l4proto on ip4/ip6 ntuple rules Previously, trying to insert an ip4/ip6 ntuple rule with an unset l4proto would get rejected with -EOPNOTSUPP. For example, the following would fail: ethtool -N eth0 flow-type ip6 dst-ip $IP6 context 1 The reason was that all the l4proto validation was being run despite the l4proto mask being set to 0x0. Fix by respecting the mask on l4proto and treating a mask of 0x0 as wildcard l4proto. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/1ac93a2836b25f79e7045f8874d9a17875229ffc.1730778566.git.dxu@dxuuu.xyz Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-06 17:39:59 -08:00
Daniel Xu	050eb2cebb	bnxt_en: ethtool: Remove ip4/ip6 ntuple support for IPPROTO_RAW Commit `9ba0e56199` ("bnxt_en: Enhance ethtool ntuple support for ip flows besides TCP/UDP") added support for ip4/ip6 ntuple rules. However, if you wanted to wildcard over l4proto, you had to provide 0xFF. The choice of 0xFF is non-standard and non-intuitive. Delete support for it in this commit. Next commit we will introduce a cleaner way to wildcard l4proto. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/a5ba0d3bd926d27977c317efa7fdfbc8a704d2b8.1730778566.git.dxu@dxuuu.xyz Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-06 17:39:59 -08:00
Vadim Fedorenko	6c0828d00f	bnxt_en: replace PTP spinlock with seqlock We can see high contention on ptp_lock while doing RX timestamping on high packet rates over several queues. Spinlock is not effecient to protect timecounter for RX timestamps when reads are the most usual operations and writes are only occasional. It's better to use seqlock in such cases. Reviewed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Vadim Fedorenko <vadfed@meta.com> Link: https://patch.msgid.link/20241103215108.557531-2-vadfed@meta.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-05 17:33:26 -08:00
Vadim Fedorenko	bb2ef9b92b	bnxt_en: cache only 24 bits of hw counter This hardware can provide only 48 bits of cycle counter. We can leave only 24 bits in the cache to extend RX timestamps from 32 bits to 48 bits. Lower 8 bits of the cached value will be used to check for roll-over while extending to full 48 bits. This change makes cache writes atomic even on 32 bit platforms and we can simply use READ_ONCE()/WRITE_ONCE() pair and remove spinlock. The configuration structure will be also reduced by 4 bytes. Reviewed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Vadim Fedorenko <vadfed@meta.com> Link: https://patch.msgid.link/20241103215108.557531-1-vadfed@meta.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-05 17:33:26 -08:00
Caleb Sander Mateos	61bf0009a7	dim: pass dim_sample to net_dim() by reference net_dim() is currently passed a struct dim_sample argument by value. struct dim_sample is 24 bytes. Since this is greater 16 bytes, x86-64 passes it on the stack. All callers have already initialized dim_sample on the stack, so passing it by value requires pushing a duplicated copy to the stack. Either witing to the stack and immediately reading it, or perhaps dereferencing addresses relative to the stack pointer in a chain of push instructions, seems to perform quite poorly. In a heavy TCP workload, mlx5e_handle_rx_dim() consumes 3% of CPU time, 94% of which is attributed to the first push instruction to copy dim_sample on the stack for the call to net_dim(): // Call ktime_get() 0.26 \|4ead2: call 4ead7 <mlx5e_handle_rx_dim+0x47> // Pass the address of struct dim in %rdi \|4ead7: lea 0x3d0(%rbx),%rdi // Set dim_sample.pkt_ctr \|4eade: mov %r13d,0x8(%rsp) // Set dim_sample.byte_ctr \|4eae3: mov %r12d,0xc(%rsp) // Set dim_sample.event_ctr 0.15 \|4eae8: mov %bp,0x10(%rsp) // Duplicate dim_sample on the stack 94.16 \|4eaed: push 0x10(%rsp) 2.79 \|4eaf1: push 0x10(%rsp) 0.07 \|4eaf5: push %rax // Call net_dim() 0.21 \|4eaf6: call 4eafb <mlx5e_handle_rx_dim+0x6b> To allow the caller to reuse the struct dim_sample already on the stack, pass the struct dim_sample by reference to net_dim(). Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Arthur Kiyanovski <akiyano@amazon.com> Reviewed-by: Louis Peens <louis.peens@corigine.com> Link: https://patch.msgid.link/20241031002326.3426181-2-csander@purestorage.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-03 12:36:54 -08:00
Rosen Penev	9b4b2e02c1	net: bnxt: use ethtool string helpers Avoids having to use manual pointer manipulation. Signed-off-by: Rosen Penev <rosenp@gmail.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20241029233229.9385-1-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-03 11:09:25 -08:00
Gustavo A. R. Silva	3bd9b9abdf	net: ethtool: Avoid thousands of -Wflex-array-member-not-at-end warnings -Wflex-array-member-not-at-end was introduced in GCC-14, and we are getting ready to enable it, globally. Change the type of the middle struct member currently causing trouble from `struct ethtool_link_settings` to `struct ethtool_link_settings_hdr`. Additionally, update the type of some variables in various functions that don't access the flexible-array member, changing them to the newly created `struct ethtool_link_settings_hdr`. These changes are needed because the type of the conflicting middle members changed. So, those instances that expect the type to be `struct ethtool_link_settings` should be adjusted to the newly created type `struct ethtool_link_settings_hdr`. Also, adjust variable declarations to follow the reverse xmas tree convention. Fix 3338 of the following -Wflex-array-member-not-at-end warnings: include/linux/ethtool.h:214:38: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://patch.msgid.link/0bc2809fe2a6c11dd4c8a9a10d9bd65cccdb559b.1730238285.git.gustavoars@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-03 11:06:58 -08:00
Florian Fainelli	e69fbd287d	net: systemport: Move IO macros to header file Move the BCM_SYSPORT_IO_MACRO() definition and its use to bcmsysport.h where it is more appropriate and where static inline helpers are acceptable. While at it, make sure that the macro 'offset' argument does not trigger a checkpatch warning due to possible argument re-use. Suggested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241021174935.57658-3-florian.fainelli@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-28 15:54:37 -07:00
Florian Fainelli	890bde75a2	net: systemport: Remove unused txchk accessors Vladimir reported the following warning with clang-16 and W=1: warning: unused function 'txchk_readl' [-Wunused-function] BCM_SYSPORT_IO_MACRO(txchk, SYS_PORT_TXCHK_OFFSET); note: expanded from macro 'BCM_SYSPORT_IO_MACRO' warning: unused function 'txchk_writel' [-Wunused-function] note: expanded from macro 'BCM_SYSPORT_IO_MACRO' warning: unused function 'tbuf_readl' [-Wunused-function] BCM_SYSPORT_IO_MACRO(tbuf, SYS_PORT_TBUF_OFFSET); note: expanded from macro 'BCM_SYSPORT_IO_MACRO' warning: unused function 'tbuf_writel' [-Wunused-function] note: expanded from macro 'BCM_SYSPORT_IO_MACRO' The TXCHK and RBUF blocks are not being accessed, remove the IO macros used to access those blocks. No functional impact. Reported-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241021174935.57658-2-florian.fainelli@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-28 15:54:37 -07:00
Paolo Abeni	03fc07a247	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR. No conflicts and no adjacent changes. Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-10-25 09:08:22 +02:00
Paolo Abeni	91afa49a3e	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.12-rc4). Conflicts: `107a034d5c` ("net/mlx5: qos: Store rate groups in a qos domain") `1da9cfd6c4` ("net/mlx5: Unregister notifier on eswitch init failure") Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-10-21 09:14:18 +02:00
WangYuli	9e2ffec543	eth: Fix typo 'accelaration'. 'exprienced' and 'rewritting' There are some spelling mistakes of 'accelaration', 'exprienced' and 'rewritting' in comments which should be 'acceleration', 'experienced' and 'rewriting'. Suggested-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/all/20241017162846.GA51712@kernel.org/ Signed-off-by: WangYuli <wangyuli@uniontech.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Message-ID: <90D42CB167CA0842+20241018021910.31359-1-wangyuli@uniontech.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch>	2024-10-20 11:06:48 -05:00
Vadim Fedorenko	4ab3e4983b	bnxt_en: replace ptp_lock with irqsave variant In netpoll configuration the completion processing can happen in hard irq context which will break with spin_lock_bh() for fullfilling RX timestamp in case of all packets timestamping. Replace it with spin_lock_irqsave() variant. Fixes: `7f5515d19c` ("bnxt_en: Get the RX packet timestamp") Reviewed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Vadim Fedorenko <vadfed@meta.com> Message-ID: <20241016195234.2622004-1-vadfed@meta.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch>	2024-10-19 16:16:25 -05:00
Andy Shevchenko	abb7c98b99	tg3: Increase buffer size for IRQ label GCC is not happy with the current code, e.g.: .../tg3.c:11313:37: error: ‘-txrx-’ directive output may be truncated writing 6 bytes into a region of size between 1 and 16 [-Werror=format-truncation=] 11313 \| "%s-txrx-%d", tp->dev->name, irq_num); \| ^~~~~~ .../tg3.c:11313:34: note: using the range [-2147483648, 2147483647] for directive argument 11313 \| "%s-txrx-%d", tp->dev->name, irq_num); When `make W=1` is supplied, this prevents kernel building. Fix it by increasing the buffer size for IRQ label and use sizeoF() instead of hard coded constants. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Message-ID: <20241016090647.691022-1-andriy.shevchenko@linux.intel.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch>	2024-10-17 21:24:24 -05:00
Wang Hai	fed07d3eb8	net: bcmasp: fix potential memory leak in bcmasp_xmit() The bcmasp_xmit() returns NETDEV_TX_OK without freeing skb in case of mapping fails, add dev_kfree_skb() to fix it. Fixes: `490cb41200` ("net: bcmasp: Add support for ASP2.0 Ethernet controller") Signed-off-by: Wang Hai <wanghai38@huawei.com> Acked-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20241014145901.48940-1-wanghai38@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-15 17:10:27 -07:00
Wang Hai	c401ed1c70	net: systemport: fix potential memory leak in bcm_sysport_xmit() The bcm_sysport_xmit() returns NETDEV_TX_OK without freeing skb in case of dma_map_single() fails, add dev_kfree_skb() to fix it. Fixes: `80105befdb` ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver") Signed-off-by: Wang Hai <wanghai38@huawei.com> Link: https://patch.msgid.link/20241014145115.44977-1-wanghai38@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-15 12:53:52 -07:00
Joe Damato	4193652274	bnxt: Add support for persistent NAPI config Use netif_napi_add_config to assign persistent per-NAPI config when initializing NAPIs. Signed-off-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20241011184527.16393-8-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-14 17:54:29 -07:00
Simon Horman	76d37e4fd6	tg3: Address byte-order miss-matches Address byte-order miss-matches flagged by Sparse. In tg3_load_firmware_cpu() and tg3_get_device_address() this is done using appropriate types to store big endian values. In the cases of tg3_test_nvram(), where buf is an array which contains values of several different types, cast to __le32 before converting values to host byte order. Reported by Sparse as: .../tg3.c:3745:34: warning: cast to restricted __be32 .../tg3.c:13096:21: warning: cast to restricted __le32 .../tg3.c:13096:21: warning: cast from restricted __be32 .../tg3.c:13101:21: warning: cast to restricted __le32 .../tg3.c:13101:21: warning: cast from restricted __be32 .../tg3.c:17070:63: warning: incorrect type in argument 3 (different base types) .../tg3.c:17070:63: expected restricted __be32 [usertype] val .../tg3.c:17070:63: got unsigned int dr.../tg3.c:17071:63: warning: incorrect type in argument 3 (different base types) .../tg3.c:17071:63: expected restricted __be32 [usertype] val .../tg3.c:17071:63: got unsigned int Also, address white-space issues on lines modified for the above. And, for consistency, lines adjacent to them. Compile tested only. No functional change intended. Signed-off-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241009-tg3-sparse-v1-1-6af38a7bf4ff@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-14 17:27:10 -07:00
Justin Chen	c531f2269a	net: bcmasp: enable SW timestamping Add skb_tx_timestamp() call and enable support for SW timestamping. Signed-off-by: Justin Chen <justin.chen@broadcom.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20241010221506.802730-1-justin.chen@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-11 16:03:31 -07:00
Justin Chen	ea22f8eabb	net: broadcom: remove select MII from brcmstb Ethernet drivers The MII driver isn't used by brcmstb Ethernet drivers. Remove it from the BCMASP, GENET, and SYSTEMPORT drivers. Signed-off-by: Justin Chen <justin.chen@broadcom.com> Acked-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20241010191332.1074642-1-justin.chen@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-11 16:00:17 -07:00
Joe Damato	aec5514d73	tg3: Link queues to NAPIs Link queues to NAPIs using the netdev-genl API so this information is queryable. First, test with the default setting on my tg3 NIC at boot with 1 TX queue: $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ --dump queue-get --json='{"ifindex": 2}' [{'id': 0, 'ifindex': 2, 'napi-id': 8194, 'type': 'rx'}, {'id': 1, 'ifindex': 2, 'napi-id': 8195, 'type': 'rx'}, {'id': 2, 'ifindex': 2, 'napi-id': 8196, 'type': 'rx'}, {'id': 3, 'ifindex': 2, 'napi-id': 8197, 'type': 'rx'}, {'id': 0, 'ifindex': 2, 'napi-id': 8193, 'type': 'tx'}] Now, adjust the number of TX queues to be 4 via ethtool: $ sudo ethtool -L eth0 tx 4 $ sudo ethtool -l eth0 \| tail -5 Current hardware settings: RX: 4 TX: 4 Other: n/a Combined: n/a Despite "Combined: n/a" in the ethtool output, /proc/interrupts shows the tg3 has renamed the IRQs to be combined: 343: [...] eth0-0 344: [...] eth0-txrx-1 345: [...] eth0-txrx-2 346: [...] eth0-txrx-3 347: [...] eth0-txrx-4 Now query this via netlink to ensure the queues are linked properly to their NAPIs: $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ --dump queue-get --json='{"ifindex": 2}' [{'id': 0, 'ifindex': 2, 'napi-id': 8960, 'type': 'rx'}, {'id': 1, 'ifindex': 2, 'napi-id': 8961, 'type': 'rx'}, {'id': 2, 'ifindex': 2, 'napi-id': 8962, 'type': 'rx'}, {'id': 3, 'ifindex': 2, 'napi-id': 8963, 'type': 'rx'}, {'id': 0, 'ifindex': 2, 'napi-id': 8960, 'type': 'tx'}, {'id': 1, 'ifindex': 2, 'napi-id': 8961, 'type': 'tx'}, {'id': 2, 'ifindex': 2, 'napi-id': 8962, 'type': 'tx'}, {'id': 3, 'ifindex': 2, 'napi-id': 8963, 'type': 'tx'}] As you can see above, id 0 for both TX and RX share a NAPI, NAPI ID 8960, and so on for each queue index up to 3. Signed-off-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20241009175509.31753-3-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-10 18:40:29 -07:00
Joe Damato	25118cce66	tg3: Link IRQs to NAPI instances Link IRQs to NAPI instances with netif_napi_set_irq. This information can be queried with the netdev-genl API. Begin by testing my tg3 device in its default state: 1 TX queue and 4 RX queues. Compare the output of /proc/interrupts for my tg3 device with the output of netdev-genl after applying this patch: $ cat /proc/interrupts \| grep eth0 343: [...] eth0-tx-0 344: [...] eth0-rx-1 345: [...] eth0-rx-2 346: [...] eth0-rx-3 347: [...] eth0-rx-4 As you can see above, tg3 has named the IRQs such that there is a dedicated tx IRQ and 4 dedicated rx IRQs, for a total of 5 IRQs. $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ --dump napi-get --json='{"ifindex": 2}' [{'id': 8197, 'ifindex': 2, 'irq': 347}, {'id': 8196, 'ifindex': 2, 'irq': 346}, {'id': 8195, 'ifindex': 2, 'irq': 345}, {'id': 8194, 'ifindex': 2, 'irq': 344}, {'id': 8193, 'ifindex': 2, 'irq': 343}] Netlink displays the same IRQs as above, noting that each is mapped to a unique NAPI instance. Now, reconfigure the NIC to have 4 TX queues and 4 RX queues: $ sudo ethtool -L eth0 rx 4 tx 4 $ sudo ethtool -l eth0 \| tail -5 Current hardware settings: RX: 4 TX: 4 Other: n/a Combined: n/a Examine /proc/interrupts once again, noting that tg3 will now rename the IRQs to suggest that they are combined tx and rx without allocating additional IRQs, so the total IRQ count in /proc/interrupts is unchanged: 343: [...] eth0-0 344: [...] eth0-txrx-1 345: [...] eth0-txrx-2 346: [...] eth0-txrx-3 347: [...] eth0-txrx-4 Check the output from netlink again: $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ --dump napi-get --json='{"ifindex": 2}' [{'id': 8973, 'ifindex': 2, 'irq': 347}, {'id': 8972, 'ifindex': 2, 'irq': 346}, {'id': 8971, 'ifindex': 2, 'irq': 345}, {'id': 8970, 'ifindex': 2, 'irq': 344}, {'id': 8969, 'ifindex': 2, 'irq': 343}] Signed-off-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20241009175509.31753-2-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-10 18:40:29 -07:00
Uwe Kleine-König	e96321fad3	net: ethernet: Switch back to struct platform_driver::remove() After commit `0edb555a65` ("platform: Make platform_driver::remove() return void") .remove() is (again) the right callback to implement for platform drivers. Convert all platform drivers below drivers/net/ethernet to use .remove(), with the eventual goal to drop struct platform_driver::remove_new(). As .remove() and .remove_new() have the same prototypes, conversion is done by just changing the structure member name in the driver initializer. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Link: https://patch.msgid.link/18f7c585a1a8a8ac8b03a2fca7de19bd5c52ac2b.1727949050.git.u.kleine-koenig@baylibre.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-04 16:39:56 -07:00
Al Viro	5f60d5f6bb	move asm/unaligned.h to linux/unaligned.h asm/unaligned.h is always an include of asm-generic/unaligned.h; might as well move that thing to linux/unaligned.h and include that - there's nothing arch-specific in that header. auto-generated by the following: for i in `git grep -l -w asm/unaligned.h`; do sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i done for i in `git grep -l -w asm-generic/unaligned.h`; do sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i done git mv include/asm-generic/unaligned.h include/linux/unaligned.h git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h	2024-10-02 17:23:23 -04:00
Edwin Peer	f77cdee5db	bnxt_en: resize bnxt_irq name field to fit format string The name field of struct bnxt_irq is written using snprintf in bnxt_setup_msix(). Make the field large enough to fit the maximal formatted string to prevent truncation. Truncated IRQ names are less meaningful to the user. For example, "enp4s0f0np0-TxRx-0" gets truncated to "enp4s0f0np0-TxRx-" with the existing code. Make sure we have space for the extra characters added to the IRQ names: - the characters introduced by the static format string: hyphens - the maximal static substituted ring type string: "TxRx" - the maximum length of an integer formatted as a string, even though reasonable ring numbers would never be as long as this. Signed-off-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240909202737.93852-4-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-09-10 18:42:45 -07:00
Michael Chan	2d51eb0bd8	bnxt_en: Add MSIX check in bnxt_check_rings() bnxt_check_rings() is called to ensure that we have the hardware ring resources before committing to reinitialize with the new number of rings. MSIX vectors are never checked at this point, because up until recently we must first disable MSIX before we can allocate the new set of MSIX vectors. Now that we support dynamic MSIX allocation, check to make sure we can dynamically allocate the new MSIX vectors as the last step in bnxt_check_rings() if dynamic MSIX is supported. For example, the IOMMU group may limit the number of MSIX vectors for the device. With this patch, the ring change will fail more gracefully when there is not enough MSIX vectors. It is also better to move bnxt_check_rings() to be called as the last step when changing ethtool rings. Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240909202737.93852-3-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-09-10 18:42:45 -07:00
Michael Chan	f775cb1bbf	bnxt_en: Increase the number of MSIX vectors for RoCE device If RocE is supported on the device, set the number of RoCE MSIX vectors to the number of online CPUs + 1 and capped at these maximums: VF: 2 NPAR: 5 PF: 64 For the PF, the maximum is now increased from the previous value of 9 to get better performance for kernel applications. Remove the unnecessary check for BNXT_FLAG_ROCE_CAP. bnxt_set_dflt_ulp_msix() will only be called if the flag is set. Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240909202737.93852-2-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-09-10 18:42:45 -07:00
Gal Pressman	0644646d91	tg3: Remove setting of RX software timestamp The responsibility for reporting of RX software timestamp has moved to the core layer (see __ethtool_get_ts_info()), remove usage from the device drivers. Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://patch.msgid.link/20240906144632.404651-3-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-09-09 17:44:40 -07:00
Gal Pressman	3fc85527b0	bnxt_en: Remove setting of RX software timestamp The responsibility for reporting of RX software timestamp has moved to the core layer (see __ethtool_get_ts_info()), remove usage from the device drivers. Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://patch.msgid.link/20240906144632.404651-2-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-09-09 17:44:40 -07:00
Gal Pressman	26f74155df	bnx2x: Remove setting of RX software timestamp The responsibility for reporting of RX software timestamp has moved to the core layer (see __ethtool_get_ts_info()), remove usage from the device drivers. Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-09-06 09:34:18 +01:00
Jinjie Ruan	e8ac897445	net: bcmasp: Simplify with scoped for each OF child loop Use scoped for_each_available_child_of_node_scoped() when iterating over device nodes to make code a bit simpler. Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Justin Chen <justin.chen@broadcom.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-09-03 12:54:43 +02:00
Michael Chan	e68256c8a7	bnxt_en: Support dynamic MSIX A range of MSIX vectors are allocated at initialization for the number needed for RocE and L2. During run-time, if the user increases or decreases the number of L2 rings, all the MSIX vectors have to be freed and a new range has to be allocated. This is not optimal and causes disruptions to RoCE traffic every time there is a change in L2 MSIX. If the system supports dynamic MSIX allocations, use dynamic allocation to add new L2 MSIX vectors or free unneeded L2 MSIX vectors. RoCE traffic is not affected using this scheme. Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/20240828183235.128948-10-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-29 15:33:25 -07:00
Michael Chan	f049d699ae	bnxt_en: Allocate the max bp->irq_tbl size for dynamic msix allocation If dynamic MSIX allocation is supported, additional MSIX can be allocated at run-time without reinitializing the existing MSIX entries. The first step to support this dynamic scheme is to allocate a large enough bp->irq_tbl if dynamic allocation is supported. Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20240828183235.128948-9-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-29 15:33:25 -07:00
Michael Chan	4343838ca5	bnxt_en: Replace deprecated PCI MSIX APIs Use the new pci_alloc_irq_vectors() and pci_free_irq_vectors() to replace the deprecated pci_enable_msix_range() and pci_disable_msix(). Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20240828183235.128948-8-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-29 15:33:25 -07:00
Michael Chan	af756aad3d	bnxt_en: Remove register mapping to support INTX In legacy INTX mode, a register is mapped so that the INTX handler can read it to determine if the NIC is the source of the interrupt. This and all the related macros are no longer needed now that INTX is no longer supported. Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20240828183235.128948-7-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-29 15:33:25 -07:00
Michael Chan	e94d8d97c7	bnxt_en: Remove BNXT_FLAG_USING_MSIX flag Now that we only support MSIX, the BNXT_FLAG_USING_MSIX is always true. Remove it and any if conditions checking for it. Remove the INTX handler and associated logic. Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20240828183235.128948-6-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-29 15:33:24 -07:00
Michael Chan	2a659a4603	bnxt_en: Deprecate support for legacy INTX mode Firmware has deprecated support for legacy INTX in 2022 (since v2.27) and INTX hasn't been tested for many years before that. INTX was only used as a fallback mechansim in case MSIX wasn't available. MSIX is always supported by all firmware. If MSIX capability in PCI config space is not found during probe, abort. Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20240828183235.128948-5-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-29 15:33:24 -07:00
Sreekanth Reddy	26e3846e23	bnxt_en: Support QOS and TPID settings for the SRIOV VLAN With recent changes in the .ndo_set_vf_*() guidelines, resubmitting this patch that was reverted eariler in 2023: `c27153682e` ("Revert "bnxt_en: Support QOS and TPID settings for the SRIOV VLAN") Add these missing settings in the .ndo_set_vf_vlan() method. Older firmware does not support the TPID setting so check for proper support. Remove the unused BNXT_VF_QOS flag. Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20240828183235.128948-4-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-29 15:33:24 -07:00
Vikas Gupta	9e7b880b92	bnxt_en: add support for retrieving crash dump using ethtool Add support for retrieving crash dump using ethtool -w on the supported interface. Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20240828183235.128948-3-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-29 15:33:24 -07:00
Vikas Gupta	c33626d83e	bnxt_en: add support for storing crash dump into host memory Newer firmware supports automatic DMA of crash dump to host memory when it crashes. If the feature is supported, allocate the required memory using the existing context memory infrastructure. Communicate the page table containing the DMA addresses to the firmware. Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20240828183235.128948-2-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-08-29 15:33:24 -07:00

1 2 3 4 5 ...

4368 Commits