mirror_ubuntu-kernels

mirror of https://git.proxmox.com/git/mirror_ubuntu-kernels.git synced 2025-12-22 19:16:06 +00:00

Author	SHA1	Message	Date
Jakub Kicinski	7f5acea727	Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2023-07-17 (iavf) This series contains updates to iavf driver only. Ding Hui fixes use-after-free issue by calling netif_napi_del() for all allocated q_vectors. He also resolves out-of-bounds issue by not updating to new values when timeout is encountered. Marcin and Ahmed change the way resets are handled so that the callback operating under the RTNL lock will wait for the reset to finish, the rtnl_lock sensitive functions in reset flow will schedule the netdev update for later in order to remove circular dependency with the critical lock. * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: iavf: fix reset task race with iavf_remove() iavf: fix a deadlock caused by rtnl and driver's lock circular dependencies Revert "iavf: Do not restart Tx queues after reset task failure" Revert "iavf: Detach device during reset task" iavf: Wait for reset in callbacks which trigger it iavf: use internal state to free traffic IRQs iavf: Fix out-of-bounds when setting channels on remove iavf: Fix use-after-free in free_netdev ==================== Link: https://lore.kernel.org/r/20230717175205.3217774-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-07-18 19:49:08 -07:00
Subbaraya Sundeep	e7002b3b20	octeontx2-pf: mcs: Generate hash key using ecb(aes) Hardware generated encryption and ICV tags are found to be wrong when tested with IEEE MACSEC test vectors. This is because as per the HRM, the hash key (derived by AES-ECB block encryption of an all 0s block with the SAK) has to be programmed by the software in MCSX_RS_MCS_CPM_TX_SLAVE_SA_PLCY_MEM_4X register. Hence fix this by generating hash key in software and configuring in hardware. Fixes: `c54ffc7360` ("octeontx2-pf: mcs: Introduce MACSEC hardware offloading") Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://lore.kernel.org/r/1689574603-28093-1-git-send-email-sbhatta@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-07-18 19:21:56 -07:00
Florian Kauer	78adb4bcf9	igc: Prevent garbled TX queue with XDP ZEROCOPY In normal operation, each populated queue item has next_to_watch pointing to the last TX desc of the packet, while each cleaned item has it set to 0. In particular, next_to_use that points to the next (necessarily clean) item to use has next_to_watch set to 0. When the TX queue is used both by an application using AF_XDP with ZEROCOPY as well as a second non-XDP application generating high traffic, the queue pointers can get in an invalid state where next_to_use points to an item where next_to_watch is NOT set to 0. However, the implementation assumes at several places that this is never the case, so if it does hold, bad things happen. In particular, within the loop inside of igc_clean_tx_irq(), next_to_clean can overtake next_to_use. Finally, this prevents any further transmission via this queue and it never gets unblocked or signaled. Secondly, if the queue is in this garbled state, the inner loop of igc_clean_tx_ring() will never terminate, completely hogging a CPU core. The reason is that igc_xdp_xmit_zc() reads next_to_use before acquiring the lock, and writing it back (potentially unmodified) later. If it got modified before locking, the outdated next_to_use is written pointing to an item that was already used elsewhere (and thus next_to_watch got written). Fixes: `9acf59a752` ("igc: Enable TX via AF_XDP zero-copy") Signed-off-by: Florian Kauer <florian.kauer@linutronix.de> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Tested-by: Kurt Kanzenbach <kurt@linutronix.de> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20230717175444.3217831-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-07-18 18:43:39 -07:00
Geetha sowjanya	8fcd7c7b3a	octeontx2-pf: Dont allocate BPIDs for LBK interfaces Current driver enables backpressure for LBK interfaces. But these interfaces do not support this feature. Hence, this patch fixes the issue by skipping the backpressure configuration for these interfaces. Fixes: `75f3627099` ("octeontx2-pf: Support to enable/disable pause frames via ethtool"). Signed-off-by: Geetha sowjanya <gakula@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Link: https://lore.kernel.org/r/20230716093741.28063-1-gakula@marvell.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-07-18 14:51:45 +02:00
Paolo Abeni	03803083a9	Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2023-07-14 (ice) This series contains updates to ice driver only. Petr Oros removes multiple calls made to unregister netdev and devlink_port. Michal fixes null pointer dereference that can occur during reload. ==================== Link: https://lore.kernel.org/r/20230714201041.1717834-1-anthony.l.nguyen@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-07-18 11:50:10 +02:00
Ahmed Zaki	c34743daca	iavf: fix reset task race with iavf_remove() The reset task is currently scheduled from the watchdog or adminq tasks. First, all direct calls to schedule the reset task are replaced with the iavf_schedule_reset(), which is modified to accept the flag showing the type of reset. To prevent the reset task from starting once iavf_remove() starts, we need to check the __IAVF_IN_REMOVE_TASK bit before we schedule it. This is now easily added to iavf_schedule_reset(). Finally, remove the check for IAVF_FLAG_RESET_NEEDED in the watchdog task. It is redundant since all callers who set the flag immediately schedules the reset task. Fixes: `3ccd54ef44` ("iavf: Fix init state closure on remove") Fixes: `14756b2ae2` ("iavf: Fix __IAVF_RESETTING state usage") Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2023-07-17 10:20:28 -07:00
Ahmed Zaki	d1639a1731	iavf: fix a deadlock caused by rtnl and driver's lock circular dependencies A driver's lock (crit_lock) is used to serialize all the driver's tasks. Lockdep, however, shows a circular dependency between rtnl and crit_lock. This happens when an ndo that already holds the rtnl requests the driver to reset, since the reset task (in some paths) tries to grab rtnl to either change real number of queues of update netdev features. [566.241851] ====================================================== [566.241893] WARNING: possible circular locking dependency detected [566.241936] 6.2.14-100.fc36.x86_64+debug #1 Tainted: G OE [566.241984] ------------------------------------------------------ [566.242025] repro.sh/2604 is trying to acquire lock: [566.242061] ffff9280fc5ceee8 (&adapter->crit_lock){+.+.}-{3:3}, at: iavf_close+0x3c/0x240 [iavf] [566.242167] but task is already holding lock: [566.242209] ffffffff9976d350 (rtnl_mutex){+.+.}-{3:3}, at: iavf_remove+0x6b5/0x730 [iavf] [566.242300] which lock already depends on the new lock. [566.242353] the existing dependency chain (in reverse order) is: [566.242401] -> #1 (rtnl_mutex){+.+.}-{3:3}: [566.242451] __mutex_lock+0xc1/0xbb0 [566.242489] iavf_init_interrupt_scheme+0x179/0x440 [iavf] [566.242560] iavf_watchdog_task+0x80b/0x1400 [iavf] [566.242627] process_one_work+0x2b3/0x560 [566.242663] worker_thread+0x4f/0x3a0 [566.242696] kthread+0xf2/0x120 [566.242730] ret_from_fork+0x29/0x50 [566.242763] -> #0 (&adapter->crit_lock){+.+.}-{3:3}: [566.242815] __lock_acquire+0x15ff/0x22b0 [566.242869] lock_acquire+0xd2/0x2c0 [566.242901] __mutex_lock+0xc1/0xbb0 [566.242934] iavf_close+0x3c/0x240 [iavf] [566.242997] __dev_close_many+0xac/0x120 [566.243036] dev_close_many+0x8b/0x140 [566.243071] unregister_netdevice_many_notify+0x165/0x7c0 [566.243116] unregister_netdevice_queue+0xd3/0x110 [566.243157] iavf_remove+0x6c1/0x730 [iavf] [566.243217] pci_device_remove+0x33/0xa0 [566.243257] device_release_driver_internal+0x1bc/0x240 [566.243299] pci_stop_bus_device+0x6c/0x90 [566.243338] pci_stop_and_remove_bus_device+0xe/0x20 [566.243380] pci_iov_remove_virtfn+0xd1/0x130 [566.243417] sriov_disable+0x34/0xe0 [566.243448] ice_free_vfs+0x2da/0x330 [ice] [566.244383] ice_sriov_configure+0x88/0xad0 [ice] [566.245353] sriov_numvfs_store+0xde/0x1d0 [566.246156] kernfs_fop_write_iter+0x15e/0x210 [566.246921] vfs_write+0x288/0x530 [566.247671] ksys_write+0x74/0xf0 [566.248408] do_syscall_64+0x58/0x80 [566.249145] entry_SYSCALL_64_after_hwframe+0x72/0xdc [566.249886] other info that might help us debug this: [566.252014] Possible unsafe locking scenario: [566.253432] CPU0 CPU1 [566.254118] ---- ---- [566.254800] lock(rtnl_mutex); [566.255514] lock(&adapter->crit_lock); [566.256233] lock(rtnl_mutex); [566.256897] lock(&adapter->crit_lock); [566.257388] * DEADLOCK * The deadlock can be triggered by a script that is continuously resetting the VF adapter while doing other operations requiring RTNL, e.g: while :; do ip link set $VF up ethtool --set-channels $VF combined 2 ip link set $VF down ip link set $VF up ethtool --set-channels $VF combined 4 ip link set $VF down done Any operation that triggers a reset can substitute "ethtool --set-channles" As a fix, add a new task "finish_config" that do all the work which needs rtnl lock. With the exception of iavf_remove(), all work that require rtnl should be called from this task. As for iavf_remove(), at the point where we need to call unregister_netdevice() (and grab rtnl_lock), we make sure the finish_config task is not running (cancel_work_sync()) to safely grab rtnl. Subsequent finish_config work cannot restart after that since the task is guarded by the __IAVF_IN_REMOVE_TASK bit in iavf_schedule_finish_config(). Fixes: `5ac49f3c27` ("iavf: use mutexes for locking of critical sections") Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2023-07-17 10:20:28 -07:00
Marcin Szycik	d916d27304	Revert "iavf: Do not restart Tx queues after reset task failure" This reverts commit `08f1c147b7`. Netdev is no longer being detached during reset, so this fix can be reverted. We leave the removal of "hacky" IFF_UP flag update. Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2023-07-17 10:20:11 -07:00
Marcin Szycik	d2806d960e	Revert "iavf: Detach device during reset task" This reverts commit `aa626da947`. Detaching device during reset was not fully fixing the rtnl locking issue, as there could be a situation where callback was already in progress before detaching netdev. Furthermore, detaching netdevice causes TX timeouts if traffic is running. To reproduce: ip netns exec ns1 iperf3 -c $PEER_IP -t 600 --logfile /dev/null & while :; do for i in 200 7000 400 5000 300 3000 ; do ip netns exec ns1 ip link set $VF1 mtu $i sleep 2 done sleep 10 done Currently, callbacks such as iavf_change_mtu() wait for the reset. If the reset fails to acquire the rtnl_lock, they schedule the netdev update for later while continuing the reset flow. Operations like MTU changes are performed under the rtnl_lock. Therefore, when the operation finishes, another callback that uses rtnl_lock can start. Signed-off-by: Dawid Wesierski <dawidx.wesierski@intel.com> Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2023-07-17 10:08:22 -07:00
Marcin Szycik	c2ed2403f1	iavf: Wait for reset in callbacks which trigger it There was a fail when trying to add the interface to bonding right after changing the MTU on the interface. It was caused by bonding interface unable to open the interface due to interface being in __RESETTING state because of MTU change. Add new reset_waitqueue to indicate that reset has finished. Add waiting for reset to finish in callbacks which trigger hw reset: iavf_set_priv_flags(), iavf_change_mtu() and iavf_set_ringparam(). We use a 5000ms timeout period because on Hyper-V based systems, this operation takes around 3000-4000ms. In normal circumstances, it doesn't take more than 500ms to complete. Add a function iavf_wait_for_reset() to reuse waiting for reset code and use it also in iavf_set_channels(), which already waits for reset. We don't use error handling in iavf_set_channels() as this could cause the device to be in incorrect state if the reset was scheduled but hit timeout or the waitng function was interrupted by a signal. Fixes: `4e5e6b5d9d` ("iavf: Fix return of set the new channel count") Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Co-developed-by: Dawid Wesierski <dawidx.wesierski@intel.com> Signed-off-by: Dawid Wesierski <dawidx.wesierski@intel.com> Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com> Signed-off-by: Kamil Maziarz <kamil.maziarz@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2023-07-17 10:08:22 -07:00
Ahmed Zaki	a77ed5c5b7	iavf: use internal state to free traffic IRQs If the system tries to close the netdev while iavf_reset_task() is running, __LINK_STATE_START will be cleared and netif_running() will return false in iavf_reinit_interrupt_scheme(). This will result in iavf_free_traffic_irqs() not being called and a leak as follows: [7632.489326] remove_proc_entry: removing non-empty directory 'irq/999', leaking at least 'iavf-enp24s0f0v0-TxRx-0' [7632.490214] WARNING: CPU: 0 PID: 10 at fs/proc/generic.c:718 remove_proc_entry+0x19b/0x1b0 is shown when pci_disable_msix() is later called. Fix by using the internal adapter state. The traffic IRQs will always exist if state == __IAVF_RUNNING. Fixes: `5b36e8d04b` ("i40evf: Enable VF to request an alternate queue allocation") Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2023-07-17 10:08:22 -07:00
Ding Hui	7c4bced3ca	iavf: Fix out-of-bounds when setting channels on remove If we set channels greater during iavf_remove(), and waiting reset done would be timeout, then returned with error but changed num_active_queues directly, that will lead to OOB like the following logs. Because the num_active_queues is greater than tx/rx_rings[] allocated actually. Reproducer: [root@host ~]# cat repro.sh #!/bin/bash pf_dbsf="0000:41:00.0" vf0_dbsf="0000:41:02.0" g_pids=() function do_set_numvf() { echo 2 >/sys/bus/pci/devices/${pf_dbsf}/sriov_numvfs sleep $((RANDOM%3+1)) echo 0 >/sys/bus/pci/devices/${pf_dbsf}/sriov_numvfs sleep $((RANDOM%3+1)) } function do_set_channel() { local nic=$(ls -1 --indicator-style=none /sys/bus/pci/devices/${vf0_dbsf}/net/) [ -z "$nic" ] && { sleep $((RANDOM%3)) ; return 1; } ifconfig $nic 192.168.18.5 netmask 255.255.255.0 ifconfig $nic up ethtool -L $nic combined 1 ethtool -L $nic combined 4 sleep $((RANDOM%3)) } function on_exit() { local pid for pid in "${g_pids[@]}"; do kill -0 "$pid" &>/dev/null && kill "$pid" &>/dev/null done g_pids=() } trap "on_exit; exit" EXIT while :; do do_set_numvf ; done & g_pids+=($!) while :; do do_set_channel ; done & g_pids+=($!) wait Result: [ 3506.152887] iavf 0000:41:02.0: Removing device [ 3510.400799] ================================================================== [ 3510.400820] BUG: KASAN: slab-out-of-bounds in iavf_free_all_tx_resources+0x156/0x160 [iavf] [ 3510.400823] Read of size 8 at addr ffff88b6f9311008 by task repro.sh/55536 [ 3510.400823] [ 3510.400830] CPU: 101 PID: 55536 Comm: repro.sh Kdump: loaded Tainted: G O --------- -t - 4.18.0 #1 [ 3510.400832] Hardware name: Powerleader PR2008AL/H12DSi-N6, BIOS 2.0 04/09/2021 [ 3510.400835] Call Trace: [ 3510.400851] dump_stack+0x71/0xab [ 3510.400860] print_address_description+0x6b/0x290 [ 3510.400865] ? iavf_free_all_tx_resources+0x156/0x160 [iavf] [ 3510.400868] kasan_report+0x14a/0x2b0 [ 3510.400873] iavf_free_all_tx_resources+0x156/0x160 [iavf] [ 3510.400880] iavf_remove+0x2b6/0xc70 [iavf] [ 3510.400884] ? iavf_free_all_rx_resources+0x160/0x160 [iavf] [ 3510.400891] ? wait_woken+0x1d0/0x1d0 [ 3510.400895] ? notifier_call_chain+0xc1/0x130 [ 3510.400903] pci_device_remove+0xa8/0x1f0 [ 3510.400910] device_release_driver_internal+0x1c6/0x460 [ 3510.400916] pci_stop_bus_device+0x101/0x150 [ 3510.400919] pci_stop_and_remove_bus_device+0xe/0x20 [ 3510.400924] pci_iov_remove_virtfn+0x187/0x420 [ 3510.400927] ? pci_iov_add_virtfn+0xe10/0xe10 [ 3510.400929] ? pci_get_subsys+0x90/0x90 [ 3510.400932] sriov_disable+0xed/0x3e0 [ 3510.400936] ? bus_find_device+0x12d/0x1a0 [ 3510.400953] i40e_free_vfs+0x754/0x1210 [i40e] [ 3510.400966] ? i40e_reset_all_vfs+0x880/0x880 [i40e] [ 3510.400968] ? pci_get_device+0x7c/0x90 [ 3510.400970] ? pci_get_subsys+0x90/0x90 [ 3510.400982] ? pci_vfs_assigned.part.7+0x144/0x210 [ 3510.400987] ? __mutex_lock_slowpath+0x10/0x10 [ 3510.400996] i40e_pci_sriov_configure+0x1fa/0x2e0 [i40e] [ 3510.401001] sriov_numvfs_store+0x214/0x290 [ 3510.401005] ? sriov_totalvfs_show+0x30/0x30 [ 3510.401007] ? __mutex_lock_slowpath+0x10/0x10 [ 3510.401011] ? __check_object_size+0x15a/0x350 [ 3510.401018] kernfs_fop_write+0x280/0x3f0 [ 3510.401022] vfs_write+0x145/0x440 [ 3510.401025] ksys_write+0xab/0x160 [ 3510.401028] ? __ia32_sys_read+0xb0/0xb0 [ 3510.401031] ? fput_many+0x1a/0x120 [ 3510.401032] ? filp_close+0xf0/0x130 [ 3510.401038] do_syscall_64+0xa0/0x370 [ 3510.401041] ? page_fault+0x8/0x30 [ 3510.401043] entry_SYSCALL_64_after_hwframe+0x65/0xca [ 3510.401073] RIP: 0033:0x7f3a9bb842c0 [ 3510.401079] Code: 73 01 c3 48 8b 0d d8 cb 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 89 24 2d 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 fe dd 01 00 48 89 04 24 [ 3510.401080] RSP: 002b:00007ffc05f1fe18 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 3510.401083] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f3a9bb842c0 [ 3510.401085] RDX: 0000000000000002 RSI: 0000000002327408 RDI: 0000000000000001 [ 3510.401086] RBP: 0000000002327408 R08: 00007f3a9be53780 R09: 00007f3a9c8a4700 [ 3510.401086] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000002 [ 3510.401087] R13: 0000000000000001 R14: 00007f3a9be52620 R15: 0000000000000001 [ 3510.401090] [ 3510.401093] Allocated by task 76795: [ 3510.401098] kasan_kmalloc+0xa6/0xd0 [ 3510.401099] __kmalloc+0xfb/0x200 [ 3510.401104] iavf_init_interrupt_scheme+0x26f/0x1310 [iavf] [ 3510.401108] iavf_watchdog_task+0x1d58/0x4050 [iavf] [ 3510.401114] process_one_work+0x56a/0x11f0 [ 3510.401115] worker_thread+0x8f/0xf40 [ 3510.401117] kthread+0x2a0/0x390 [ 3510.401119] ret_from_fork+0x1f/0x40 [ 3510.401122] 0xffffffffffffffff [ 3510.401123] In timeout handling, we should keep the original num_active_queues and reset num_req_queues to 0. Fixes: `4e5e6b5d9d` ("iavf: Fix return of set the new channel count") Signed-off-by: Ding Hui <dinghui@sangfor.com.cn> Cc: Donglin Peng <pengdonglin@sangfor.com.cn> Cc: Huang Cun <huangcun@sangfor.com.cn> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2023-07-17 10:08:22 -07:00
Ding Hui	5f4fa1672d	iavf: Fix use-after-free in free_netdev We do netif_napi_add() for all allocated q_vectors[], but potentially do netif_napi_del() for part of them, then kfree q_vectors and leave invalid pointers at dev->napi_list. Reproducer: [root@host ~]# cat repro.sh #!/bin/bash pf_dbsf="0000:41:00.0" vf0_dbsf="0000:41:02.0" g_pids=() function do_set_numvf() { echo 2 >/sys/bus/pci/devices/${pf_dbsf}/sriov_numvfs sleep $((RANDOM%3+1)) echo 0 >/sys/bus/pci/devices/${pf_dbsf}/sriov_numvfs sleep $((RANDOM%3+1)) } function do_set_channel() { local nic=$(ls -1 --indicator-style=none /sys/bus/pci/devices/${vf0_dbsf}/net/) [ -z "$nic" ] && { sleep $((RANDOM%3)) ; return 1; } ifconfig $nic 192.168.18.5 netmask 255.255.255.0 ifconfig $nic up ethtool -L $nic combined 1 ethtool -L $nic combined 4 sleep $((RANDOM%3)) } function on_exit() { local pid for pid in "${g_pids[@]}"; do kill -0 "$pid" &>/dev/null && kill "$pid" &>/dev/null done g_pids=() } trap "on_exit; exit" EXIT while :; do do_set_numvf ; done & g_pids+=($!) while :; do do_set_channel ; done & g_pids+=($!) wait Result: [ 4093.900222] ================================================================== [ 4093.900230] BUG: KASAN: use-after-free in free_netdev+0x308/0x390 [ 4093.900232] Read of size 8 at addr ffff88b4dc145640 by task repro.sh/6699 [ 4093.900233] [ 4093.900236] CPU: 10 PID: 6699 Comm: repro.sh Kdump: loaded Tainted: G O --------- -t - 4.18.0 #1 [ 4093.900238] Hardware name: Powerleader PR2008AL/H12DSi-N6, BIOS 2.0 04/09/2021 [ 4093.900239] Call Trace: [ 4093.900244] dump_stack+0x71/0xab [ 4093.900249] print_address_description+0x6b/0x290 [ 4093.900251] ? free_netdev+0x308/0x390 [ 4093.900252] kasan_report+0x14a/0x2b0 [ 4093.900254] free_netdev+0x308/0x390 [ 4093.900261] iavf_remove+0x825/0xd20 [iavf] [ 4093.900265] pci_device_remove+0xa8/0x1f0 [ 4093.900268] device_release_driver_internal+0x1c6/0x460 [ 4093.900271] pci_stop_bus_device+0x101/0x150 [ 4093.900273] pci_stop_and_remove_bus_device+0xe/0x20 [ 4093.900275] pci_iov_remove_virtfn+0x187/0x420 [ 4093.900277] ? pci_iov_add_virtfn+0xe10/0xe10 [ 4093.900278] ? pci_get_subsys+0x90/0x90 [ 4093.900280] sriov_disable+0xed/0x3e0 [ 4093.900282] ? bus_find_device+0x12d/0x1a0 [ 4093.900290] i40e_free_vfs+0x754/0x1210 [i40e] [ 4093.900298] ? i40e_reset_all_vfs+0x880/0x880 [i40e] [ 4093.900299] ? pci_get_device+0x7c/0x90 [ 4093.900300] ? pci_get_subsys+0x90/0x90 [ 4093.900306] ? pci_vfs_assigned.part.7+0x144/0x210 [ 4093.900309] ? __mutex_lock_slowpath+0x10/0x10 [ 4093.900315] i40e_pci_sriov_configure+0x1fa/0x2e0 [i40e] [ 4093.900318] sriov_numvfs_store+0x214/0x290 [ 4093.900320] ? sriov_totalvfs_show+0x30/0x30 [ 4093.900321] ? __mutex_lock_slowpath+0x10/0x10 [ 4093.900323] ? __check_object_size+0x15a/0x350 [ 4093.900326] kernfs_fop_write+0x280/0x3f0 [ 4093.900329] vfs_write+0x145/0x440 [ 4093.900330] ksys_write+0xab/0x160 [ 4093.900332] ? __ia32_sys_read+0xb0/0xb0 [ 4093.900334] ? fput_many+0x1a/0x120 [ 4093.900335] ? filp_close+0xf0/0x130 [ 4093.900338] do_syscall_64+0xa0/0x370 [ 4093.900339] ? page_fault+0x8/0x30 [ 4093.900341] entry_SYSCALL_64_after_hwframe+0x65/0xca [ 4093.900357] RIP: 0033:0x7f16ad4d22c0 [ 4093.900359] Code: 73 01 c3 48 8b 0d d8 cb 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 89 24 2d 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 fe dd 01 00 48 89 04 24 [ 4093.900360] RSP: 002b:00007ffd6491b7f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 4093.900362] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f16ad4d22c0 [ 4093.900363] RDX: 0000000000000002 RSI: 0000000001a41408 RDI: 0000000000000001 [ 4093.900364] RBP: 0000000001a41408 R08: 00007f16ad7a1780 R09: 00007f16ae1f2700 [ 4093.900364] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000002 [ 4093.900365] R13: 0000000000000001 R14: 00007f16ad7a0620 R15: 0000000000000001 [ 4093.900367] [ 4093.900368] Allocated by task 820: [ 4093.900371] kasan_kmalloc+0xa6/0xd0 [ 4093.900373] __kmalloc+0xfb/0x200 [ 4093.900376] iavf_init_interrupt_scheme+0x63b/0x1320 [iavf] [ 4093.900380] iavf_watchdog_task+0x3d51/0x52c0 [iavf] [ 4093.900382] process_one_work+0x56a/0x11f0 [ 4093.900383] worker_thread+0x8f/0xf40 [ 4093.900384] kthread+0x2a0/0x390 [ 4093.900385] ret_from_fork+0x1f/0x40 [ 4093.900387] 0xffffffffffffffff [ 4093.900387] [ 4093.900388] Freed by task 6699: [ 4093.900390] __kasan_slab_free+0x137/0x190 [ 4093.900391] kfree+0x8b/0x1b0 [ 4093.900394] iavf_free_q_vectors+0x11d/0x1a0 [iavf] [ 4093.900397] iavf_remove+0x35a/0xd20 [iavf] [ 4093.900399] pci_device_remove+0xa8/0x1f0 [ 4093.900400] device_release_driver_internal+0x1c6/0x460 [ 4093.900401] pci_stop_bus_device+0x101/0x150 [ 4093.900402] pci_stop_and_remove_bus_device+0xe/0x20 [ 4093.900403] pci_iov_remove_virtfn+0x187/0x420 [ 4093.900404] sriov_disable+0xed/0x3e0 [ 4093.900409] i40e_free_vfs+0x754/0x1210 [i40e] [ 4093.900415] i40e_pci_sriov_configure+0x1fa/0x2e0 [i40e] [ 4093.900416] sriov_numvfs_store+0x214/0x290 [ 4093.900417] kernfs_fop_write+0x280/0x3f0 [ 4093.900418] vfs_write+0x145/0x440 [ 4093.900419] ksys_write+0xab/0x160 [ 4093.900420] do_syscall_64+0xa0/0x370 [ 4093.900421] entry_SYSCALL_64_after_hwframe+0x65/0xca [ 4093.900422] 0xffffffffffffffff [ 4093.900422] [ 4093.900424] The buggy address belongs to the object at ffff88b4dc144200 which belongs to the cache kmalloc-8k of size 8192 [ 4093.900425] The buggy address is located 5184 bytes inside of 8192-byte region [ffff88b4dc144200, ffff88b4dc146200) [ 4093.900425] The buggy address belongs to the page: [ 4093.900427] page:ffffea00d3705000 refcount:1 mapcount:0 mapping:ffff88bf04415c80 index:0x0 compound_mapcount: 0 [ 4093.900430] flags: 0x10000000008100(slab\|head) [ 4093.900433] raw: 0010000000008100 dead000000000100 dead000000000200 ffff88bf04415c80 [ 4093.900434] raw: 0000000000000000 0000000000030003 00000001ffffffff 0000000000000000 [ 4093.900434] page dumped because: kasan: bad access detected [ 4093.900435] [ 4093.900435] Memory state around the buggy address: [ 4093.900436] ffff88b4dc145500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 4093.900437] ffff88b4dc145580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 4093.900438] >ffff88b4dc145600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 4093.900438] ^ [ 4093.900439] ffff88b4dc145680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 4093.900440] ffff88b4dc145700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 4093.900440] ================================================================== Although the patch #2 (of 2) can avoid the issue triggered by this repro.sh, there still are other potential risks that if num_active_queues is changed to less than allocated q_vectors[] by unexpected, the mismatched netif_napi_add/del() can also cause UAF. Since we actually call netif_napi_add() for all allocated q_vectors unconditionally in iavf_alloc_q_vectors(), so we should fix it by letting netif_napi_del() match to netif_napi_add(). Fixes: `5eae00c57f` ("i40evf: main driver core") Signed-off-by: Ding Hui <dinghui@sangfor.com.cn> Cc: Donglin Peng <pengdonglin@sangfor.com.cn> Cc: Huang Cun <huangcun@sangfor.com.cn> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2023-07-17 10:08:10 -07:00
Heiner Kallweit	162d626f30	r8169: fix ASPM-related problem for chip version 42 and 43 Referenced commit missed that for chip versions 42 and 43 ASPM remained disabled in the respective rtl_hw_start_...() routines. This resulted in problems as described in the referenced bug ticket. Therefore re-instantiate the previous logic. Fixes: `5fc3f6c90c` ("r8169: consolidate disabling ASPM before EPHY access") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217635 Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-17 07:47:39 +01:00
Michal Swiatkowski	b3e7b3a6ee	ice: prevent NULL pointer deref during reload Calling ethtool during reload can lead to call trace, because VSI isn't configured for some time, but netdev is alive. To fix it add rtnl lock for VSI deconfig and config. Set ::num_q_vectors to 0 after freeing and add a check for ::tx/rx_rings in ring related ethtool ops. Add proper unroll of filters in ice_start_eth(). Reproduction: $watch -n 0.1 -d 'ethtool -g enp24s0f0np0' $devlink dev reload pci/0000:18:00.0 action driver_reinit Call trace before fix: [66303.926205] BUG: kernel NULL pointer dereference, address: 0000000000000000 [66303.926259] #PF: supervisor read access in kernel mode [66303.926286] #PF: error_code(0x0000) - not-present page [66303.926311] PGD 0 P4D 0 [66303.926332] Oops: 0000 [#1] PREEMPT SMP PTI [66303.926358] CPU: 4 PID: 933821 Comm: ethtool Kdump: loaded Tainted: G OE 6.4.0-rc5+ #1 [66303.926400] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.00.01.0014.070920180847 07/09/2018 [66303.926446] RIP: 0010:ice_get_ringparam+0x22/0x50 [ice] [66303.926649] Code: 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 48 8b 87 c0 09 00 00 c7 46 04 e0 1f 00 00 c7 46 10 e0 1f 00 00 48 8b 50 20 <48> 8b 12 0f b7 52 3a 89 56 14 48 8b 40 28 48 8b 00 0f b7 40 58 48 [66303.926722] RSP: 0018:ffffad40472f39c8 EFLAGS: 00010246 [66303.926749] RAX: ffff98a8ada05828 RBX: ffff98a8c46dd060 RCX: ffffad40472f3b48 [66303.926781] RDX: 0000000000000000 RSI: ffff98a8c46dd068 RDI: ffff98a8b23c4000 [66303.926811] RBP: ffffad40472f3b48 R08: 00000000000337b0 R09: 0000000000000000 [66303.926843] R10: 0000000000000001 R11: 0000000000000100 R12: ffff98a8b23c4000 [66303.926874] R13: ffff98a8c46dd060 R14: 000000000000000f R15: ffffad40472f3a50 [66303.926906] FS: 00007f6397966740(0000) GS:ffff98b390900000(0000) knlGS:0000000000000000 [66303.926941] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [66303.926967] CR2: 0000000000000000 CR3: 000000011ac20002 CR4: 00000000007706e0 [66303.926999] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [66303.927029] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [66303.927060] PKRU: 55555554 [66303.927075] Call Trace: [66303.927094] <TASK> [66303.927111] ? __die+0x23/0x70 [66303.927140] ? page_fault_oops+0x171/0x4e0 [66303.927176] ? exc_page_fault+0x7f/0x180 [66303.927209] ? asm_exc_page_fault+0x26/0x30 [66303.927244] ? ice_get_ringparam+0x22/0x50 [ice] [66303.927433] rings_prepare_data+0x62/0x80 [66303.927469] ethnl_default_doit+0xe2/0x350 [66303.927501] genl_family_rcv_msg_doit.isra.0+0xe3/0x140 [66303.927538] genl_rcv_msg+0x1b1/0x2c0 [66303.927561] ? __pfx_ethnl_default_doit+0x10/0x10 [66303.927590] ? __pfx_genl_rcv_msg+0x10/0x10 [66303.927615] netlink_rcv_skb+0x58/0x110 [66303.927644] genl_rcv+0x28/0x40 [66303.927665] netlink_unicast+0x19e/0x290 [66303.927691] netlink_sendmsg+0x254/0x4d0 [66303.927717] sock_sendmsg+0x93/0xa0 [66303.927743] __sys_sendto+0x126/0x170 [66303.927780] __x64_sys_sendto+0x24/0x30 [66303.928593] do_syscall_64+0x5d/0x90 [66303.929370] ? __count_memcg_events+0x60/0xa0 [66303.930146] ? count_memcg_events.constprop.0+0x1a/0x30 [66303.930920] ? handle_mm_fault+0x9e/0x350 [66303.931688] ? do_user_addr_fault+0x258/0x740 [66303.932452] ? exc_page_fault+0x7f/0x180 [66303.933193] entry_SYSCALL_64_after_hwframe+0x72/0xdc Fixes: `5b246e533d` ("ice: split probe into smaller functions") Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2023-07-14 09:56:20 -07:00
Petr Oros	24a3298ac9	ice: Unregister netdev and devlink_port only once Since commit `6624e780a5` ("ice: split ice_vsi_setup into smaller functions") ice_vsi_release does things twice. There is unregister netdev which is unregistered in ice_deinit_eth also. It also unregisters the devlink_port twice which is also unregistered in ice_deinit_eth(). This double deregistration is hidden because devl_port_unregister ignores the return value of xa_erase. [ 68.642167] Call Trace: [ 68.650385] ice_devlink_destroy_pf_port+0xe/0x20 [ice] [ 68.655656] ice_vsi_release+0x445/0x690 [ice] [ 68.660147] ice_deinit+0x99/0x280 [ice] [ 68.664117] ice_remove+0x1b6/0x5c0 [ice] [ 171.103841] Call Trace: [ 171.109607] ice_devlink_destroy_pf_port+0xf/0x20 [ice] [ 171.114841] ice_remove+0x158/0x270 [ice] [ 171.118854] pci_device_remove+0x3b/0xc0 [ 171.122779] device_release_driver_internal+0xc7/0x170 [ 171.127912] driver_detach+0x54/0x8c [ 171.131491] bus_remove_driver+0x77/0xd1 [ 171.135406] pci_unregister_driver+0x2d/0xb0 [ 171.139670] ice_module_exit+0xc/0x55f [ice] Fixes: `6624e780a5` ("ice: split ice_vsi_setup into smaller functions") Signed-off-by: Petr Oros <poros@redhat.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2023-07-14 09:45:10 -07:00
Wang Ming	a822551c51	net: ethernet: Remove repeating expression Identify issues that arise by using the tests/doublebitand.cocci semantic patch. Need to remove duplicate expression in if statement. Signed-off-by: Wang Ming <machel@vivo.com> Reviewed-by: Jiawen Wu <jiawenwu@trustnetic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-14 09:11:10 +01:00
Wang Ming	4ad23d2368	bna: Remove error checking for debugfs_create_dir() It is expected that most callers should _ignore_ the errors return by debugfs_create_dir() in bnad_debugfs_init(). Signed-off-by: Wang Ming <machel@vivo.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-14 09:09:12 +01:00
Daniel Golle	1d6d537dc5	net: ethernet: mtk_eth_soc: handle probe deferral Move the call to of_get_ethdev_address to mtk_add_mac which is part of the probe function and can hence itself return -EPROBE_DEFER should of_get_ethdev_address return -EPROBE_DEFER. This allows us to entirely get rid of the mtk_init function. The problem of of_get_ethdev_address returning -EPROBE_DEFER surfaced in situations in which the NVMEM provider holding the MAC address has not yet be loaded at the time mtk_eth_soc is initially probed. In this case probing of mtk_eth_soc should be deferred instead of falling back to use a random MAC address, so once the NVMEM provider becomes available probing can be repeated. Fixes: `656e705243` ("net-next: mediatek: add support for MT7623 ethernet") Signed-off-by: Daniel Golle <daniel@makrotopia.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-14 08:44:56 +01:00
Tanmay Patil	b685f1a589	net: ethernet: ti: cpsw_ale: Fix cpsw_ale_get_field()/cpsw_ale_set_field() CPSW ALE has 75 bit ALE entries which are stored within three 32 bit words. The cpsw_ale_get_field() and cpsw_ale_set_field() functions assume that the field will be strictly contained within one word. However, this is not guaranteed to be the case and it is possible for ALE field entries to span across up to two words at the most. Fix the methods to handle getting/setting fields spanning up to two words. Fixes: `db82173f23` ("netdev: driver: ethernet: add cpsw address lookup engine support") Signed-off-by: Tanmay Patil <t-patil@ti.com> [s-vadapalli@ti.com: rephrased commit message and added Fixes tag] Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-14 08:36:43 +01:00
Jiawen Wu	aa846677a9	net: txgbe: fix eeprom calculation error For some device types like TXGBE_ID_XAUI, checksum computed in txgbe_calc_eeprom_checksum() is larger than TXGBE_EEPROM_SUM. Remove the limit on the size of checksum. Fixes: `049fe53653` ("net: txgbe: Add operations to interact with firmware") Fixes: `5e2ea7801f` ("net: txgbe: Fix unsigned comparison to zero in txgbe_calc_eeprom_checksum()") Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Link: https://lore.kernel.org/r/20230711063414.3311-1-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-07-12 16:54:52 -07:00
Krister Johansen	1e9cb763e9	net: ena: fix shift-out-of-bounds in exponential backoff The ENA adapters on our instances occasionally reset. Once recently logged a UBSAN failure to console in the process: UBSAN: shift-out-of-bounds in build/linux/drivers/net/ethernet/amazon/ena/ena_com.c:540:13 shift exponent 32 is too large for 32-bit type 'unsigned int' CPU: 28 PID: 70012 Comm: kworker/u72:2 Kdump: loaded not tainted 5.15.117 Hardware name: Amazon EC2 c5d.9xlarge/, BIOS 1.0 10/16/2017 Workqueue: ena ena_fw_reset_device [ena] Call Trace: <TASK> dump_stack_lvl+0x4a/0x63 dump_stack+0x10/0x16 ubsan_epilogue+0x9/0x36 __ubsan_handle_shift_out_of_bounds.cold+0x61/0x10e ? __const_udelay+0x43/0x50 ena_delay_exponential_backoff_us.cold+0x16/0x1e [ena] wait_for_reset_state+0x54/0xa0 [ena] ena_com_dev_reset+0xc8/0x110 [ena] ena_down+0x3fe/0x480 [ena] ena_destroy_device+0xeb/0xf0 [ena] ena_fw_reset_device+0x30/0x50 [ena] process_one_work+0x22b/0x3d0 worker_thread+0x4d/0x3f0 ? process_one_work+0x3d0/0x3d0 kthread+0x12a/0x150 ? set_kthread_struct+0x50/0x50 ret_from_fork+0x22/0x30 </TASK> Apparently, the reset delays are getting so large they can trigger a UBSAN panic. Looking at the code, the current timeout is capped at 5000us. Using a base value of 100us, the current code will overflow after (1<<29). Even at values before 32, this function wraps around, perhaps unintentionally. Cap the value of the exponent used for this backoff at (1<<16) which is larger than currently necessary, but large enough to support bigger values in the future. Cc: stable@vger.kernel.org Fixes: `4bb7f4cf60` ("net: ena: reduce driver load time") Signed-off-by: Krister Johansen <kjlx@templeofstupid.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Shay Agroskin <shayagr@amazon.com> Link: https://lore.kernel.org/r/20230711013621.GE1926@templeofstupid.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-07-12 15:57:57 -07:00
David S. Miller	b6c9ebde5a	Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== igc: Fix corner cases for TSN offload Florian Kauer says: The igc driver supports several different offloading capabilities relevant in the TSN context. Recent patches in this area introduced regressions for certain corner cases that are fixed in this series. Each of the patches (except the first one) addresses a different regression that can be separately reproduced. Still, they have overlapping code changes so they should not be separately applied. Especially #4 and #6 address the same observation, but both need to be applied to avoid TX hang occurrences in the scenario described in the patches. ==================== Signed-off-by: Florian Kauer <florian.kauer@linutronix.de> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Reviewed-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-12 10:07:24 +01:00
Suman Ghosh	8278ee2a26	octeontx2-pf: Add additional check for MCAM rules Due to hardware limitation, MCAM drop rule with ether_type == 802.1Q and vlan_id == 0 is not supported. Hence rejecting such rules. Fixes: `dce677da57` ("octeontx2-pf: Add vlan-etype to ntuple filters") Signed-off-by: Suman Ghosh <sumang@marvell.com> Link: https://lore.kernel.org/r/20230710103027.2244139-1-sumang@marvell.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-07-11 12:56:57 +02:00
Wei Fang	84a1094719	net: fec: use netdev_err_once() instead of netdev_err() In the case of heavy XDP traffic to be transmitted, the console will print the error log continuously if there are lack of enough BDs to accommodate the frames. The log looks like below. [ 160.013112] fec 30be0000.ethernet eth0: NOT enough BD for SG! [ 160.023116] fec 30be0000.ethernet eth0: NOT enough BD for SG! [ 160.028926] fec 30be0000.ethernet eth0: NOT enough BD for SG! [ 160.038946] fec 30be0000.ethernet eth0: NOT enough BD for SG! [ 160.044758] fec 30be0000.ethernet eth0: NOT enough BD for SG! Not only will this log be replicated and redundant, it will also degrade XDP performance. So we use netdev_err_once() instead of netdev_err() now. Fixes: `6d6b39f180` ("net: fec: add initial XDP support") Signed-off-by: Wei Fang <wei.fang@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-07-11 10:00:49 +02:00
Wei Fang	56b3c6ba53	net: fec: increase the size of tx ring and update tx_wake_threshold When the XDP feature is enabled and with heavy XDP frames to be transmitted, there is a considerable probability that available tx BDs are insufficient. This will lead to some XDP frames to be discarded and the "NOT enough BD for SG!" error log will appear in the console (as shown below). [ 160.013112] fec 30be0000.ethernet eth0: NOT enough BD for SG! [ 160.023116] fec 30be0000.ethernet eth0: NOT enough BD for SG! [ 160.028926] fec 30be0000.ethernet eth0: NOT enough BD for SG! [ 160.038946] fec 30be0000.ethernet eth0: NOT enough BD for SG! [ 160.044758] fec 30be0000.ethernet eth0: NOT enough BD for SG! In the case of heavy XDP traffic, sometimes the speed of recycling tx BDs may be slower than the speed of sending XDP frames. There may be several specific reasons, such as the interrupt is not responsed in time, the efficiency of the NAPI callback function is too low due to all the queues (tx queues and rx queues) share the same NAPI, and so on. After trying various methods, I think that increase the size of tx BD ring is simple and effective. Maybe the best resolution is that allocate NAPI for each queue to improve the efficiency of the NAPI callback, but this change is a bit big and I didn't try this method. Perheps this method will be implemented in a future patch. This patch also updates the tx_wake_threshold of tx ring which is related to the size of tx ring in the previous logic. Otherwise, the tx_wake_threshold will be too high (403 BDs), which is more likely to impact the slow path in the case of heavy XDP traffic, because XDP path and slow path share the tx BD rings. According to Jakub's suggestion, the tx_wake_threshold is at least equal to tx_stop_threshold + 2 * MAX_SKB_FRAGS, if a queue of hundreds of entries is overflowing, we should be able to apply a hysteresis of a few tens of entries. Fixes: `6d6b39f180` ("net: fec: add initial XDP support") Signed-off-by: Wei Fang <wei.fang@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-07-11 10:00:49 +02:00
Wei Fang	20f7973990	net: fec: recycle pages for transmitted XDP frames Once the XDP frames have been successfully transmitted through the ndo_xdp_xmit() interface, it's the driver responsibility to free the frames so that the page_pool can recycle the pages and reuse them. However, this action is not implemented in the fec driver. This leads to a user-visible problem that the console will print the following warning log. [ 157.568851] page_pool_release_retry() stalled pool shutdown 1389 inflight 60 sec [ 217.983446] page_pool_release_retry() stalled pool shutdown 1389 inflight 120 sec [ 278.399006] page_pool_release_retry() stalled pool shutdown 1389 inflight 181 sec [ 338.812885] page_pool_release_retry() stalled pool shutdown 1389 inflight 241 sec [ 399.226946] page_pool_release_retry() stalled pool shutdown 1389 inflight 302 sec Therefore, to solve this issue, we free XDP frames via xdp_return_frame() while cleaning the tx BD ring. Fixes: `6d6b39f180` ("net: fec: add initial XDP support") Signed-off-by: Wei Fang <wei.fang@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-07-11 10:00:49 +02:00
Wei Fang	be7ecbe7ec	net: fec: dynamically set the NETDEV_XDP_ACT_NDO_XMIT feature of XDP When a XDP program is installed or uninstalled, fec_restart() will be invoked to reset MAC and buffer descriptor rings. It's reasonable not to transmit any packet during the process of reset. However, the NETDEV_XDP_ACT_NDO_XMIT bit of xdp_features is enabled by default, that is to say, it's possible that the fec_enet_xdp_xmit() will be invoked even if the process of reset is not finished. In this case, the redirected XDP frames might be dropped and available transmit BDs may be incorrectly deemed insufficient. So this patch disable the NETDEV_XDP_ACT_NDO_XMIT feature by default and dynamically configure this feature when the bpf program is installed or uninstalled. Fixes: `e4ac7cc6e5` ("net: fec: turn on XDP features") Signed-off-by: Wei Fang <wei.fang@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-07-11 10:00:49 +02:00
Florian Kauer	0bcc62858d	igc: Fix inserting of empty frame for launchtime The insertion of an empty frame was introduced with commit `db0b124f02` ("igc: Enhance Qbv scheduling by using first flag bit") in order to ensure that the current cycle has at least one packet if there is some packet to be scheduled for the next cycle. However, the current implementation does not properly check if a packet is already scheduled for the current cycle. Currently, an empty packet is always inserted if and only if txtime >= end_of_cycle && txtime > last_tx_cycle but since last_tx_cycle is always either the end of the current cycle (end_of_cycle) or the end of a previous cycle, the second part (txtime > last_tx_cycle) is always true unless txtime == last_tx_cycle. What actually needs to be checked here is if the last_tx_cycle was already written within the current cycle, so an empty frame should only be inserted if and only if txtime >= end_of_cycle && end_of_cycle > last_tx_cycle. This patch does not only avoid an unnecessary insertion, but it can actually be harmful to insert an empty packet if packets are already scheduled in the current cycle, because it can lead to a situation where the empty packet is actually processed as the first packet in the upcoming cycle shifting the packet with the first_flag even one cycle into the future, finally leading to a TX hang. The TX hang can be reproduced on a i225 with: sudo tc qdisc replace dev enp1s0 parent root handle 100 taprio \ num_tc 1 \ map 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 \ queues 1@0 \ base-time 0 \ sched-entry S 01 300000 \ flags 0x1 \ txtime-delay 500000 \ clockid CLOCK_TAI sudo tc qdisc replace dev enp1s0 parent 100:1 etf \ clockid CLOCK_TAI \ delta 500000 \ offload \ skip_sock_check and traffic generator sudo trafgen -i traffic.cfg -o enp1s0 --cpp -n0 -q -t1400ns with traffic.cfg #define ETH_P_IP 0x0800 { /* Ethernet Header / 0x30, 0x1f, 0x9a, 0xd0, 0xf0, 0x0e, # MAC Dest - adapt as needed 0x24, 0x5e, 0xbe, 0x57, 0x2e, 0x36, # MAC Src - adapt as needed const16(ETH_P_IP), / IPv4 Header / 0b01000101, 0, # IPv4 version, IHL, TOS const16(1028), # IPv4 total length (UDP length + 20 bytes (IP header)) const16(2), # IPv4 ident 0b01000000, 0, # IPv4 flags, fragmentation off 64, # IPv4 TTL 17, # Protocol UDP csumip(14, 33), # IPv4 checksum / UDP Header / 10, 0, 48, 1, # IP Src - adapt as needed 10, 0, 48, 10, # IP Dest - adapt as needed const16(5555), # UDP Src Port const16(6666), # UDP Dest Port const16(1008), # UDP length (UDP header 8 bytes + payload length) csumudp(14, 34), # UDP checksum / Payload */ fill('W', 1000), } and the observed message with that is for example igc 0000:01:00.0 enp1s0: Detected Tx Unit Hang Tx Queue <0> TDH <32> TDT <3c> next_to_use <3c> next_to_clean <32> buffer_info[next_to_clean] time_stamp <ffff26a8> next_to_watch <00000000632a1828> jiffies <ffff27f8> desc.status <1048000> Fixes: `db0b124f02` ("igc: Enhance Qbv scheduling by using first flag bit") Signed-off-by: Florian Kauer <florian.kauer@linutronix.de> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2023-07-10 08:59:08 -07:00
Florian Kauer	c1bca9ac0b	igc: Fix launchtime before start of cycle It is possible (verified on a running system) that frames are processed by igc_tx_launchtime with a txtime before the start of the cycle (baset_est). However, the result of txtime - baset_est is written into a u32, leading to a wrap around to a positive number. The following launchtime > 0 check will only branch to executing launchtime = 0 if launchtime is already 0. Fix it by using a s32 before checking launchtime > 0. Fixes: `db0b124f02` ("igc: Enhance Qbv scheduling by using first flag bit") Signed-off-by: Florian Kauer <florian.kauer@linutronix.de> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2023-07-10 08:59:08 -07:00
Florian Kauer	8b86f10ab6	igc: No strict mode in pure launchtime/CBS offload The flags IGC_TXQCTL_STRICT_CYCLE and IGC_TXQCTL_STRICT_END prevent the packet transmission over slot and cycle boundaries. This is important for taprio offload where the slots and cycles correspond to the slots and cycles configured for the network. However, the Qbv offload feature of the i225 is also used for enabling TX launchtime / ETF offload. In that case, however, the cycle has no meaning for the network and is only used internally to adapt the base time register after a second has passed. Enabling strict mode in this case would unnecessarily prevent the transmission of certain packets (i.e. at the boundary of a second) and thus interferes with the ETF qdisc that promises transmission at a certain point in time. Similar to ETF, this also applies to CBS offload that also should not be influenced by strict mode unless taprio offload would be enabled at the same time. This fully reverts commit `d8f45be01d` ("igc: Use strict cycles for Qbv scheduling") but its commit message only describes what was already implemented before that commit. The difference to a plain revert of that commit is that it now copes with the base_time = 0 case that was fixed with commit `e17090eb24` ("igc: allow BaseTime 0 enrollment for Qbv") In particular, enabling strict mode leads to TX hang situations under high traffic if taprio is applied WITHOUT taprio offload but WITH ETF offload, e.g. as in sudo tc qdisc replace dev enp1s0 parent root handle 100 taprio \ num_tc 1 \ map 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 \ queues 1@0 \ base-time 0 \ sched-entry S 01 300000 \ flags 0x1 \ txtime-delay 500000 \ clockid CLOCK_TAI sudo tc qdisc replace dev enp1s0 parent 100:1 etf \ clockid CLOCK_TAI \ delta 500000 \ offload \ skip_sock_check and traffic generator sudo trafgen -i traffic.cfg -o enp1s0 --cpp -n0 -q -t1400ns with traffic.cfg #define ETH_P_IP 0x0800 { /* Ethernet Header / 0x30, 0x1f, 0x9a, 0xd0, 0xf0, 0x0e, # MAC Dest - adapt as needed 0x24, 0x5e, 0xbe, 0x57, 0x2e, 0x36, # MAC Src - adapt as needed const16(ETH_P_IP), / IPv4 Header / 0b01000101, 0, # IPv4 version, IHL, TOS const16(1028), # IPv4 total length (UDP length + 20 bytes (IP header)) const16(2), # IPv4 ident 0b01000000, 0, # IPv4 flags, fragmentation off 64, # IPv4 TTL 17, # Protocol UDP csumip(14, 33), # IPv4 checksum / UDP Header / 10, 0, 48, 1, # IP Src - adapt as needed 10, 0, 48, 10, # IP Dest - adapt as needed const16(5555), # UDP Src Port const16(6666), # UDP Dest Port const16(1008), # UDP length (UDP header 8 bytes + payload length) csumudp(14, 34), # UDP checksum / Payload */ fill('W', 1000), } and the observed message with that is for example igc 0000:01:00.0 enp1s0: Detected Tx Unit Hang Tx Queue <0> TDH <d0> TDT <f0> next_to_use <f0> next_to_clean <d0> buffer_info[next_to_clean] time_stamp <ffff661f> next_to_watch <00000000245a4efb> jiffies <ffff6e48> desc.status <1048000> Fixes: `d8f45be01d` ("igc: Use strict cycles for Qbv scheduling") Signed-off-by: Florian Kauer <florian.kauer@linutronix.de> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2023-07-10 08:58:16 -07:00
Florian Kauer	e5d88c53d0	igc: Handle already enabled taprio offload for basetime 0 Since commit `e17090eb24` ("igc: allow BaseTime 0 enrollment for Qbv") it is possible to enable taprio offload with a basetime of 0. However, the check if taprio offload is already enabled (and thus -EALREADY should be returned for igc_save_qbv_schedule) still relied on adapter->base_time > 0. This can be reproduced as follows: # TAPRIO offload (flags == 0x2) and base-time = 0 sudo tc qdisc replace dev enp1s0 parent root handle 100 stab overhead 24 taprio \ num_tc 1 \ map 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 \ queues 1@0 \ base-time 0 \ sched-entry S 01 300000 \ flags 0x2 # The second call should fail with "Error: Device failed to setup taprio offload." # But that only happens if base-time was != 0 sudo tc qdisc replace dev enp1s0 parent root handle 100 stab overhead 24 taprio \ num_tc 1 \ map 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 \ queues 1@0 \ base-time 0 \ sched-entry S 01 300000 \ flags 0x2 Fixes: `e17090eb24` ("igc: allow BaseTime 0 enrollment for Qbv") Signed-off-by: Florian Kauer <florian.kauer@linutronix.de> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2023-07-10 08:40:22 -07:00
Florian Kauer	82ff5f29b7	igc: Do not enable taprio offload for invalid arguments Only set adapter->taprio_offload_enable after validating the arguments. Otherwise, it stays set even if the offload was not enabled. Since the subsequent code does not get executed in case of invalid arguments, it will not be read at first. However, by activating and then deactivating another offload (e.g. ETF/TX launchtime offload), taprio_offload_enable is read and erroneously keeps the offload feature of the NIC enabled. This can be reproduced as follows: # TAPRIO offload (flags == 0x2) and negative base-time leading to expected -ERANGE sudo tc qdisc replace dev enp1s0 parent root handle 100 stab overhead 24 taprio \ num_tc 1 \ map 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 \ queues 1@0 \ base-time -1000 \ sched-entry S 01 300000 \ flags 0x2 # IGC_TQAVCTRL is 0x0 as expected (iomem=relaxed for reading register) sudo pcimem /sys/bus/pci/devices/0000:01:00.0/resource0 0x3570 w1 # Activate ETF offload sudo tc qdisc replace dev enp1s0 parent root handle 6666 mqprio \ num_tc 3 \ map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \ queues 1@0 1@1 2@2 \ hw 0 sudo tc qdisc add dev enp1s0 parent 6666:1 etf \ clockid CLOCK_TAI \ delta 500000 \ offload # IGC_TQAVCTRL is 0x9 as expected sudo pcimem /sys/bus/pci/devices/0000:01:00.0/resource0 0x3570 w1 # Deactivate ETF offload again sudo tc qdisc delete dev enp1s0 parent 6666:1 # IGC_TQAVCTRL should now be 0x0 again, but is observed as 0x9 sudo pcimem /sys/bus/pci/devices/0000:01:00.0/resource0 0x3570 w*1 Fixes: `e17090eb24` ("igc: allow BaseTime 0 enrollment for Qbv") Signed-off-by: Florian Kauer <florian.kauer@linutronix.de> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2023-07-10 08:40:22 -07:00
Florian Kauer	8046063df8	igc: Rename qbv_enable to taprio_offload_enable In the current implementation the flags adapter->qbv_enable and IGC_FLAG_TSN_QBV_ENABLED have a similar name, but do not have the same meaning. The first one is used only to indicate taprio offload (i.e. when igc_save_qbv_schedule was called), while the second one corresponds to the Qbv mode of the hardware. However, the second one is also used to support the TX launchtime feature, i.e. ETF qdisc offload. This leads to situations where adapter->qbv_enable is false, but the flag IGC_FLAG_TSN_QBV_ENABLED is set. This is prone to confusion. The rename should reduce this confusion. Since it is a pure rename, it has no impact on functionality. Fixes: `e17090eb24` ("igc: allow BaseTime 0 enrollment for Qbv") Signed-off-by: Florian Kauer <florian.kauer@linutronix.de> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2023-07-10 08:40:22 -07:00
Junfeng Guo	9d0aba9831	gve: unify driver name usage Current codebase contained the usage of two different names for this driver (i.e., `gvnic` and `gve`), which is quite unfriendly for users to use, especially when trying to bind or unbind the driver manually. The corresponding kernel module is registered with the name of `gve`. It's more reasonable to align the name of the driver with the module. Fixes: `893ce44df5` ("gve: Add basic driver framework for Compute Engine Virtual NIC") Cc: csully@google.com Signed-off-by: Junfeng Guo <junfeng.guo@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-10 08:29:55 +01:00
Simon Horman	73c4d1b307	net: lan743x: select FIXED_PHY The blamed commit introduces usage of fixed_phy_register() but not a corresponding dependency on FIXED_PHY. This can result in a build failure. s390-linux-ld: drivers/net/ethernet/microchip/lan743x_main.o: in function `lan743x_phy_open': drivers/net/ethernet/microchip/lan743x_main.c:1514: undefined reference to `fixed_phy_register' Fixes: `624864fbff` ("net: lan743x: add fixed phy support for LAN7431 device") Cc: stable@vger.kernel.org Reported-by: Randy Dunlap <rdunlap@infradead.org> Closes: https://lore.kernel.org/netdev/725bf1c5-b252-7d19-7582-a6809716c7d6@infradead.org/ Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Signed-off-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-09 11:23:47 +01:00
Rafał Miłecki	e7731194fd	net: bgmac: postpone turning IRQs off to avoid SoC hangs Turning IRQs off is done by accessing Ethernet controller registers. That can't be done until device's clock is enabled. It results in a SoC hang otherwise. This bug remained unnoticed for years as most bootloaders keep all Ethernet interfaces turned on. It seems to only affect a niche SoC family BCM47189. It has two Ethernet controllers but CFE bootloader uses only the first one. Fixes: `34322615cb` ("net: bgmac: Mask interrupts during probe") Signed-off-by: Rafał Miłecki <rafal@milecki.pl> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-08 10:02:15 +01:00
Shannon Nelson	3a7af34fb6	ionic: remove dead device fail path Remove the probe error path code that leaves the driver bound to the device, but with essentially a dead device. This was useful maybe twice early in the driver's life and no longer makes sense to keep. Fixes: `30a1e6d0f8` ("ionic: keep ionic dev on lif init fail") Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-07 09:14:45 +01:00
Nitya Sunkad	abfb2a58a5	ionic: remove WARN_ON to prevent panic_on_warn Remove unnecessary early code development check and the WARN_ON that it uses. The irq alloc and free paths have long been cleaned up and this check shouldn't have stuck around so long. Fixes: `77ceb68e29` ("ionic: Add notifyq support") Signed-off-by: Nitya Sunkad <nitya.sunkad@amd.com> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-07 09:13:16 +01:00
Sai Krishna	7709fbd492	octeontx2-af: Move validation of ptp pointer before its usage Moved PTP pointer validation before its use to avoid smatch warning. Also used kzalloc/kfree instead of devm_kzalloc/devm_kfree. Fixes: `2ef4e45d99` ("octeontx2-af: Add PTP PPS Errata workaround on CN10K silicon") Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Sai Krishna <saikrishnag@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-07 09:08:57 +01:00
Ratheesh Kannoth	af42088bda	octeontx2-af: Promisc enable/disable through mbox In legacy silicon, promiscuous mode is only modified through CGX mbox messages. In CN10KB silicon, it is modified from CGX mbox and NIX. This breaks legacy application behaviour. Fix this by removing call from NIX. Fixes: `d6c9784baf` ("octeontx2-af: Invoke exact match functions if supported") Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-07 09:06:50 +01:00
David S. Miller	b61aac027b	Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2023-07-05 (igc) This series contains updates to igc driver only. Husaini adds check to increment Qbv change error counter only on taprio Qbvs. He also removes delay during Tx ring configuration and resolves Tx hang that could occur when transmitting on a gate to be closed. Prasad Koya reports ethtool link mode as TP (twisted pair). Tee Min corrects value for max SDU. Aravindhan ensures that registers for PPS are always programmed to occur in future. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-07 08:56:12 +01:00
Junfeng Guo	0503efeadb	gve: Set default duplex configuration to full Current duplex mode was unset in the driver, resulting in the default parameter being set to 0, which corresponds to half duplex. It might mislead users to have incorrect expectation about the driver's transmission capabilities. Set the default duplex configuration to full, as the driver runs in full duplex mode at this point. Fixes: `7e074d5a76` ("gve: Enable Link Speed Reporting in the driver.") Signed-off-by: Junfeng Guo <junfeng.guo@intel.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Message-ID: <20230706044128.2726747-1-junfeng.guo@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-07-06 19:14:24 -07:00
Jakub Kicinski	41b9eff0ce	Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2023-07-05 (ice) This series contains updates to ice driver only. Sridhar fixes incorrect comparison of max Tx rate limit to occur against each TC value rather than the aggregate. He also resolves an issue with the wrong VSI being used when setting max Tx rate when TCs are enabled. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: ice: Fix tx queue rate limit when TCs are configured ice: Fix max_rate check while configuring TX rate limits ==================== Link: https://lore.kernel.org/r/20230705201346.49370-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-07-06 19:14:16 -07:00
Jakub Kicinski	4863b57bfd	mlx5-fixes-2023-07-05 -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmSlrvAACgkQSD+KveBX +j4EcggAsZlHQHRaA9re/5Fr8VV/YNmf+eLM6/2F6CD1sLcwWEsmDXkqZpL/wqVv bcP/dE/ehiMd1FI6XmEb9aGZY29mQBoItwCnxeUrK75d8CQib/pH87VkQdT9Uf6W RVqPk2rOL778sTy6V/UZDecfmZB5XfEoOO6f0YP/j2t8HxmfdetBN0orE0iRQBmO +0W8X5bIfCmGyWdQzOJB4a+S7Wi1JACr6CIOdNtADuZ7C7kKiPWrvEl7DMHlyX3Y TyrE//piSJuxaHwdMNBHPsXK5ZYyn7ZJjqeCA+Kd+lKMJlJEO7R+TkTOP0RKE++a ZdDetCqLKegQPpWZgjGzei7PpSpaFA== =cNgk -----END PGP SIGNATURE----- Merge tag 'mlx5-fixes-2023-07-05' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 fixes 2023-07-05 This series provides bug fixes to mlx5 driver. * tag 'mlx5-fixes-2023-07-05' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5e: RX, Fix page_pool page fragment tracking for XDP net/mlx5: Query hca_cap_2 only when supported net/mlx5e: TC, CT: Offload ct clear only once net/mlx5e: Check for NOT_READY flag state after locking net/mlx5: Register a unique thermal zone per device net/mlx5e: RX, Fix flush and close release flow of regular rq for legacy rq net/mlx5e: fix memory leak in mlx5e_ptp_open net/mlx5e: fix memory leak in mlx5e_fs_tt_redirect_any_create net/mlx5e: fix double free in mlx5e_destroy_flow_table ==================== Link: https://lore.kernel.org/r/20230705175757.284614-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-07-06 19:11:21 -07:00
Vladimir Oltean	c6efb4ae38	net: mscc: ocelot: fix oversize frame dropping for preemptible TCs This switch implements Hold/Release in a strange way, with no control from the user as required by IEEE 802.1Q-2018 through Set-And-Hold-MAC and Set-And-Release-MAC, but rather, it emits HOLD requests implicitly based on the schedule. Namely, when the gate of a preemptible TC is about to close (actually QSYS::PREEMPTION_CFG.HOLD_ADVANCE octet times in advance of this event), the QSYS seems to emit a HOLD request pulse towards the MAC which preempts the currently transmitted packet, and further packets are held back in the queue system. This allows large frames to be squeezed through small time slots, because HOLD requests initiated by the gate events result in the frame being segmented in multiple fragments, the bit time of which is equal to the size of the time slot. It has been reported that the vsc9959_tas_guard_bands_update() logic breaks this, because it doesn't take preemptible TCs into account, and enables oversized frame dropping when the time slot doesn't allow a full MTU to be sent, but it does allow 2minFragSize to be sent (128B). Packets larger than 128B are dropped instead of being sent in multiple fragments. Confusingly, the manual says: \| For guard band, SDU calculation of a traffic class of a port, if \| preemption is enabled (through 'QSYS::PREEMPTION_CFG.P_QUEUES') then \| QSYS::PREEMPTION_CFG.HOLD_ADVANCE is used, otherwise \| QSYS::QMAXSDU_CFG_.QMAXSDU_* is used. but this only refers to the static guard band durations, and the QMAXSDU_CFG_* registers have dual purpose - the other being oversized frame dropping, which takes place irrespective of whether frames are preemptible or express. So, to fix the problem, we need to call vsc9959_tas_guard_bands_update() from ocelot_port_update_active_preemptible_tcs(), and modify the guard band logic to consider a different (lower) oversize limit for preemptible traffic classes. Fixes: `403ffc2c34` ("net: mscc: ocelot: add support for preemptible traffic classes") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Message-ID: <20230705104422.49025-4-vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-07-06 19:10:22 -07:00
Vladimir Oltean	009d30f1a7	net: mscc: ocelot: extend ocelot->fwd_domain_lock to cover ocelot->tas_lock In a future commit we will have to call vsc9959_tas_guard_bands_update() from ocelot_port_update_active_preemptible_tcs(), and that will be impossible due to the AB/BA locking dependencies between ocelot->tas_lock and ocelot->fwd_domain_lock. Just like we did in commit `3ff468ef98` ("net: mscc: ocelot: remove struct ocelot_mm_state :: lock"), the only solution is to expand the scope of ocelot->fwd_domain_lock for it to also serialize changes made to the Time-Aware Shaper, because those will have to result in a recalculation of cut-through TCs, which is something that depends on the forwarding domain. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Message-ID: <20230705104422.49025-2-vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-07-06 19:10:22 -07:00
Klaus Kudielka	21327f81db	net: mvneta: fix txq_map in case of txq_number==1 If we boot with mvneta.txq_number=1, the txq_map is set incorrectly: MVNETA_CPU_TXQ_ACCESS(1) refers to TX queue 1, but only TX queue 0 is initialized. Fix this. Fixes: `50bf8cb6fc` ("net: mvneta: Configure XPS support") Signed-off-by: Klaus Kudielka <klaus.kudielka@gmail.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Link: https://lore.kernel.org/r/20230705053712.3914-1-klaus.kudielka@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-07-06 10:29:39 +02:00
Linus Torvalds	6843306689	Including fixes from bluetooth, bpf and wireguard. Current release - regressions: - nvme-tcp: fix comma-related oops after sendpage changes Current release - new code bugs: - ptp: make max_phase_adjustment sysfs device attribute invisible when not supported Previous releases - regressions: - sctp: fix potential deadlock on &net->sctp.addr_wq_lock - mptcp: - ensure subflow is unhashed before cleaning the backlog - do not rely on implicit state check in mptcp_listen() Previous releases - always broken: - net: fix net_dev_start_xmit trace event vs skb_transport_offset() - Bluetooth: - fix use-bdaddr-property quirk - L2CAP: fix multiple UaFs - ISO: use hci_sync for setting CIG parameters - hci_event: fix Set CIG Parameters error status handling - hci_event: fix parsing of CIS Established Event - MGMT: fix marking SCAN_RSP as not connectable - wireguard: queuing: use saner cpu selection wrapping - sched: act_ipt: various bug fixes for iptables <> TC interactions - sched: act_pedit: add size check for TCA_PEDIT_PARMS_EX - dsa: fixes for receiving PTP packets with 8021q and sja1105 tagging - eth: sfc: fix null-deref in devlink port without MAE access - eth: ibmvnic: do not reset dql stats on NON_FATAL err Misc: - xsk: honor SO_BINDTODEVICE on bind Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmSlu+MACgkQMUZtbf5S Irslgw//S7jf/GL8V6y8VL3te+/OPOZnLDTzFFOdy64/y97FE6XIacJUpyWRhtmz oSzcSNHETPW9U+xSGa2ZQlKhAXt6n9iRNvUegql+VBb13Iz+l7AdTeoxRv/YuwDo 5lTOIB6cBw+ATd0oxS6wr8SyUlcvktUKBfTAItjbVM55aXfIUpXIa84+F7avJgIA XP1u/3PHhwItmwo/hXhHH0+P0QA8ix1q2SvRB7DAlQLBsTuQhaKjXWQkYYTKw/Nt dtvh8iQSs/YXaHMjTa5CK28HOD8+ywIizr+uJ9VaNqIzV0W5JE9IE8P4NFpBcY7t kGjTYODOph7dkNmZ5RLj3N+B6CyC57OXDzoo/tr8940UytCLVj9EVyduarLGLx57 edqK9cUz5kWejyGoyZ4Pvlo/SKvCQ2HKMeiAJ0/nNpTJMFuygMoqGsaD6ttzkXMj fZLPjRUK3axd+15ZzhLEf8HyL5Qh+qPqqX9p7NljfMKwhxMWJ5fuICJfdGOSdMJR ndL+wPfRPFQwszZ4pbTY2Ivn29mo8ScBOSOEgQs2mOny+zFzTzmqNWz/jcFfQnjS cylxBEHrgudT2FuCImZ/v66TM5yakHXqIdpTGG+zsvJWQqjM96Z3I7WRvi0g9d75 n84il+j34mnzl90j2xEutqUiK7BQ9ZpZBsutPVTKBIHKWWiortI= =9yzk -----END PGP SIGNATURE----- Merge tag 'net-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bluetooth, bpf and wireguard. Current release - regressions: - nvme-tcp: fix comma-related oops after sendpage changes Current release - new code bugs: - ptp: make max_phase_adjustment sysfs device attribute invisible when not supported Previous releases - regressions: - sctp: fix potential deadlock on &net->sctp.addr_wq_lock - mptcp: - ensure subflow is unhashed before cleaning the backlog - do not rely on implicit state check in mptcp_listen() Previous releases - always broken: - net: fix net_dev_start_xmit trace event vs skb_transport_offset() - Bluetooth: - fix use-bdaddr-property quirk - L2CAP: fix multiple UaFs - ISO: use hci_sync for setting CIG parameters - hci_event: fix Set CIG Parameters error status handling - hci_event: fix parsing of CIS Established Event - MGMT: fix marking SCAN_RSP as not connectable - wireguard: queuing: use saner cpu selection wrapping - sched: act_ipt: various bug fixes for iptables <> TC interactions - sched: act_pedit: add size check for TCA_PEDIT_PARMS_EX - dsa: fixes for receiving PTP packets with 8021q and sja1105 tagging - eth: sfc: fix null-deref in devlink port without MAE access - eth: ibmvnic: do not reset dql stats on NON_FATAL err Misc: - xsk: honor SO_BINDTODEVICE on bind" * tag 'net-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (70 commits) nfp: clean mc addresses in application firmware when closing port selftests: mptcp: pm_nl_ctl: fix 32-bit support selftests: mptcp: depend on SYN_COOKIES selftests: mptcp: userspace_pm: report errors with 'remove' tests selftests: mptcp: userspace_pm: use correct server port selftests: mptcp: sockopt: return error if wrong mark selftests: mptcp: sockopt: use 'iptables-legacy' if available selftests: mptcp: connect: fail if nft supposed to work mptcp: do not rely on implicit state check in mptcp_listen() mptcp: ensure subflow is unhashed before cleaning the backlog s390/qeth: Fix vipa deletion octeontx-af: fix hardware timestamp configuration net: dsa: sja1105: always enable the send_meta options net: dsa: tag_sja1105: fix MAC DA patching from meta frames net: Replace strlcpy with strscpy pptp: Fix fib lookup calls. mlxsw: spectrum_router: Fix an IS_ERR() vs NULL check net/sched: act_pedit: Add size check for TCA_PEDIT_PARMS_EX xsk: Honor SO_BINDTODEVICE on bind ptp: Make max_phase_adjustment sysfs device attribute invisible when not supported ...	2023-07-05 15:44:45 -07:00
Aravindhan Gunasekaran	84a192e461	igc: Handle PPS start time programming for past time values I225/6 hardware can be programmed to start PPS output once the time in Target Time registers is reached. The time programmed in these registers should always be into future. Only then PPS output is triggered when SYSTIM register reaches the programmed value. There are two modes in i225/6 hardware to program PPS, pulse and clock mode. There were issues reported where PPS is not generated when start time is in past. Example 1, "echo 0 0 0 2 0 > /sys/class/ptp/ptp0/period" In the current implementation, a value of '0' is programmed into Target time registers and PPS output is in pulse mode. Eventually an interrupt which is triggered upon SYSTIM register reaching Target time is not fired. Thus no PPS output is generated. Example 2, "echo 0 0 0 1 0 > /sys/class/ptp/ptp0/period" Above case, a value of '0' is programmed into Target time registers and PPS output is in clock mode. Here, HW tries to catch-up the current time by incrementing Target Time register. This catch-up time seem to vary according to programmed PPS period time as per the HW design. In my experiments, the delay ranged between few tens of seconds to few minutes. The PPS output is only generated after the Target time register reaches current time. In my experiments, I also observed PPS stopped working with below test and could not recover until module is removed and loaded again. 1) echo 0 <future time> 0 1 0 > /sys/class/ptp/ptp1/period 2) echo 0 0 0 1 0 > /sys/class/ptp/ptp1/period 3) echo 0 0 0 1 0 > /sys/class/ptp/ptp1/period After this PPS did not work even if i re-program with proper values. I could only get this back working by reloading the driver. This patch takes care of calculating and programming appropriate future time value into Target Time registers. Fixes: `5e91c72e56` ("igc: Fix PPS delta between two synchronized end-points") Signed-off-by: Aravindhan Gunasekaran <aravindhan.gunasekaran@intel.com> Reviewed-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2023-07-05 11:18:56 -07:00
Tan Tee Min	25102893e4	igc: Include the length/type field and VLAN tag in queueMaxSDU IEEE 802.1Q does not have clear definitions of what constitutes an SDU (Service Data Unit), but IEEE Std 802.3 clause 3.1.2 does define the MAC service primitives and clause 3.2.7 does define the MAC Client Data for Q-tagged frames. It shows that the mac_service_data_unit (MSDU) does NOT contain the preamble, destination and source address, or FCS. The MSDU does contain the length/type field, MAC client data, VLAN tag and any padding data (prior to the FCS). Thus, the maximum 802.3 frame size that is allowed to be transmitted should be QueueMaxSDU (MSDU) + 16 (6 byte SA + 6 byte DA + 4 byte FCS). Fixes: `92a0dcb842` ("igc: offload queue max SDU from tc-taprio") Signed-off-by: Tan Tee Min <tee.min.tan@linux.intel.com> Reviewed-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2023-07-05 11:18:56 -07:00
Prasad Koya	9ac3fc2f42	igc: set TP bit in 'supported' and 'advertising' fields of ethtool_link_ksettings set TP bit in the 'supported' and 'advertising' fields. i225/226 parts only support twisted pair copper. Fixes: `8c5ad0dae9` ("igc: Add ethtool support") Signed-off-by: Prasad Koya <prasad@arista.com> Acked-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2023-07-05 11:18:43 -07:00
Yinjun Zhang	cc7eab25b1	nfp: clean mc addresses in application firmware when closing port When moving devices from one namespace to another, mc addresses are cleaned in software while not removed from application firmware. Thus the mc addresses are remained and will cause resource leak. Now use `__dev_mc_unsync` to clean mc addresses when closing port. Fixes: `e20aa071cd` ("nfp: fix schedule in atomic context when sync mc address") Cc: stable@vger.kernel.org Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com> Acked-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Message-ID: <20230705052818.7122-1-louis.peens@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-07-05 10:59:12 -07:00
Dragos Tatulea	7abd955a58	net/mlx5e: RX, Fix page_pool page fragment tracking for XDP Currently mlx5e releases pages directly to the page_pool for XDP_TX and does page fragment counting for XDP_REDIRECT. RX pages from the page_pool are leaking on XDP_REDIRECT because the xdp core will release only one fragment out of MLX5E_PAGECNT_BIAS_MAX and subsequently the page is marked as "skip release" which avoids the driver release. A fix would be to take an extra fragment for XDP_REDIRECT and not set the "skip release" bit so that the release on the driver side can handle the remaining bias fragments. But this would be a shortsighted solution. Instead, this patch converges the two XDP paths (XDP_TX and XDP_REDIRECT) to always do fragment tracking. The "skip release" bit is no longer necessary for XDP. Fixes: `6f57428460` ("net/mlx5e: RX, Enable skb page recycling through the page_pool") Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2023-07-05 10:57:04 -07:00
Maher Sanalla	6496357aa5	net/mlx5: Query hca_cap_2 only when supported On vport enable, where fw's hca caps are queried, the driver queries hca_caps_2 without checking if fw truly supports them, causing a false failure of vfs vport load and blocking SRIOV enablement on old devices such as CX4 where hca_caps_2 support is missing. Thus, add a check for the said caps support before accessing them. Fixes: `e5b9642a33` ("net/mlx5: E-Switch, Implement devlink port function cmds to control migratable") Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2023-07-05 10:57:04 -07:00
Yevgeny Kliteynik	f7a485115a	net/mlx5e: TC, CT: Offload ct clear only once Non-clear CT action causes a flow rule split, while CT clear action doesn't and is just a header-rewrite to the current flow rule. But ct offload is done in post_parse and is per ct action instance, so ct clear offload is parsed multiple times, while its deleted once. Fix this by post_parsing the ct action only once per flow attribute (which is per flow rule) by using a offloaded ct_attr flag. Fixes: `08fe94ec5f` ("net/mlx5e: TC, Remove special handling of CT action") Signed-off-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2023-07-05 10:57:04 -07:00
Vlad Buslov	65e64640e9	net/mlx5e: Check for NOT_READY flag state after locking Currently the check for NOT_READY flag is performed before obtaining the necessary lock. This opens a possibility for race condition when the flow is concurrently removed from unready_flows list by the workqueue task, which causes a double-removal from the list and a crash[0]. Fix the issue by moving the flag check inside the section protected by uplink_priv->unready_flows_lock mutex. [0]: [44376.389654] general protection fault, probably for non-canonical address 0xdead000000000108: 0000 [#1] SMP [44376.391665] CPU: 7 PID: 59123 Comm: tc Not tainted 6.4.0-rc4+ #1 [44376.392984] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [44376.395342] RIP: 0010:mlx5e_tc_del_fdb_flow+0xb3/0x340 [mlx5_core] [44376.396857] Code: 00 48 8b b8 68 ce 02 00 e8 8a 4d 02 00 4c 8d a8 a8 01 00 00 4c 89 ef e8 8b 79 88 e1 48 8b 83 98 06 00 00 48 8b 93 90 06 00 00 <48> 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 83 90 06 [44376.399167] RSP: 0018:ffff88812cc97570 EFLAGS: 00010246 [44376.399680] RAX: dead000000000122 RBX: ffff8881088e3800 RCX: ffff8881881bac00 [44376.400337] RDX: dead000000000100 RSI: ffff88812cc97500 RDI: ffff8881242f71b0 [44376.401001] RBP: ffff88811cbb0940 R08: 0000000000000400 R09: 0000000000000001 [44376.401663] R10: 0000000000000001 R11: 0000000000000000 R12: ffff88812c944000 [44376.402342] R13: ffff8881242f71a8 R14: ffff8881222b4000 R15: 0000000000000000 [44376.402999] FS: 00007f0451104800(0000) GS:ffff88852cb80000(0000) knlGS:0000000000000000 [44376.403787] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [44376.404343] CR2: 0000000000489108 CR3: 0000000123a79003 CR4: 0000000000370ea0 [44376.405004] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [44376.405665] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [44376.406339] Call Trace: [44376.406651] <TASK> [44376.406939] ? die_addr+0x33/0x90 [44376.407311] ? exc_general_protection+0x192/0x390 [44376.407795] ? asm_exc_general_protection+0x22/0x30 [44376.408292] ? mlx5e_tc_del_fdb_flow+0xb3/0x340 [mlx5_core] [44376.408876] __mlx5e_tc_del_fdb_peer_flow+0xbc/0xe0 [mlx5_core] [44376.409482] mlx5e_tc_del_flow+0x42/0x210 [mlx5_core] [44376.410055] mlx5e_flow_put+0x25/0x50 [mlx5_core] [44376.410529] mlx5e_delete_flower+0x24b/0x350 [mlx5_core] [44376.411043] tc_setup_cb_reoffload+0x22/0x80 [44376.411462] fl_reoffload+0x261/0x2f0 [cls_flower] [44376.411907] ? mlx5e_rep_indr_setup_ft_cb+0x160/0x160 [mlx5_core] [44376.412481] ? mlx5e_rep_indr_setup_ft_cb+0x160/0x160 [mlx5_core] [44376.413044] tcf_block_playback_offloads+0x76/0x170 [44376.413497] tcf_block_unbind+0x7b/0xd0 [44376.413881] tcf_block_setup+0x17d/0x1c0 [44376.414269] tcf_block_offload_cmd.isra.0+0xf1/0x130 [44376.414725] tcf_block_offload_unbind+0x43/0x70 [44376.415153] __tcf_block_put+0x82/0x150 [44376.415532] ingress_destroy+0x22/0x30 [sch_ingress] [44376.415986] qdisc_destroy+0x3b/0xd0 [44376.416343] qdisc_graft+0x4d0/0x620 [44376.416706] tc_get_qdisc+0x1c9/0x3b0 [44376.417074] rtnetlink_rcv_msg+0x29c/0x390 [44376.419978] ? rep_movs_alternative+0x3a/0xa0 [44376.420399] ? rtnl_calcit.isra.0+0x120/0x120 [44376.420813] netlink_rcv_skb+0x54/0x100 [44376.421192] netlink_unicast+0x1f6/0x2c0 [44376.421573] netlink_sendmsg+0x232/0x4a0 [44376.421980] sock_sendmsg+0x38/0x60 [44376.422328] ____sys_sendmsg+0x1d0/0x1e0 [44376.422709] ? copy_msghdr_from_user+0x6d/0xa0 [44376.423127] ___sys_sendmsg+0x80/0xc0 [44376.423495] ? ___sys_recvmsg+0x8b/0xc0 [44376.423869] __sys_sendmsg+0x51/0x90 [44376.424226] do_syscall_64+0x3d/0x90 [44376.424587] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [44376.425046] RIP: 0033:0x7f045134f887 [44376.425403] Code: 0a 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10 [44376.426914] RSP: 002b:00007ffd63a82b98 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [44376.427592] RAX: ffffffffffffffda RBX: 000000006481955f RCX: 00007f045134f887 [44376.428195] RDX: 0000000000000000 RSI: 00007ffd63a82c00 RDI: 0000000000000003 [44376.428796] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000 [44376.429404] R10: 00007f0451208708 R11: 0000000000000246 R12: 0000000000000001 [44376.430039] R13: 0000000000409980 R14: 000000000047e538 R15: 0000000000485400 [44376.430644] </TASK> [44376.430907] Modules linked in: mlx5_ib mlx5_core act_mirred act_tunnel_key cls_flower vxlan dummy sch_ingress openvswitch nsh rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm ib_uverbs ib_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_g ss_krb5 auth_rpcgss oid_registry overlay zram zsmalloc fuse [last unloaded: mlx5_core] [44376.433936] ---[ end trace 0000000000000000 ]--- [44376.434373] RIP: 0010:mlx5e_tc_del_fdb_flow+0xb3/0x340 [mlx5_core] [44376.434951] Code: 00 48 8b b8 68 ce 02 00 e8 8a 4d 02 00 4c 8d a8 a8 01 00 00 4c 89 ef e8 8b 79 88 e1 48 8b 83 98 06 00 00 48 8b 93 90 06 00 00 <48> 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 83 90 06 [44376.436452] RSP: 0018:ffff88812cc97570 EFLAGS: 00010246 [44376.436924] RAX: dead000000000122 RBX: ffff8881088e3800 RCX: ffff8881881bac00 [44376.437530] RDX: dead000000000100 RSI: ffff88812cc97500 RDI: ffff8881242f71b0 [44376.438179] RBP: ffff88811cbb0940 R08: 0000000000000400 R09: 0000000000000001 [44376.438786] R10: 0000000000000001 R11: 0000000000000000 R12: ffff88812c944000 [44376.439393] R13: ffff8881242f71a8 R14: ffff8881222b4000 R15: 0000000000000000 [44376.439998] FS: 00007f0451104800(0000) GS:ffff88852cb80000(0000) knlGS:0000000000000000 [44376.440714] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [44376.441225] CR2: 0000000000489108 CR3: 0000000123a79003 CR4: 0000000000370ea0 [44376.441843] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [44376.442471] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Fixes: `ad86755b18` ("net/mlx5e: Protect unready flows with dedicated lock") Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2023-07-05 10:57:03 -07:00
Saeed Mahameed	631079e08a	net/mlx5: Register a unique thermal zone per device Prior to this patch only one "mlx5" thermal zone could have been registered regardless of the number of individual mlx5 devices in the system. To fix this setup a unique name per device to register its own thermal zone. In order to not register a thermal zone for a virtual device (VF/SF) add a check for PF device type. The new name is a concatenation between "mlx5_" and "<PCI_DEV_BDF>", which will also help associating a thermal zone with its PCI device. $ lspci \| grep ConnectX 00:04.0 Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx] 00:05.0 Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx] $ cat /sys/devices/virtual/thermal/thermal_zone0/type mlx5_0000:00:04.0 $ cat /sys/devices/virtual/thermal/thermal_zone1/type mlx5_0000:00:05.0 Fixes: `c1fef618d6` ("net/mlx5: Implement thermal zone") CC: Sandipan Patra <spatra@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2023-07-05 10:57:03 -07:00
Dragos Tatulea	2e2d196579	net/mlx5e: RX, Fix flush and close release flow of regular rq for legacy rq Regular (non-XSK) RQs get flushed on XSK setup and re-activated on XSK close. If the same regular RQ is closed (a config change for example) soon after the XSK close, a double release occurs because the missing wqes get released a second time. Fixes: `3f93f82988` ("net/mlx5e: RX, Defer page release in legacy rq for better recycling") Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2023-07-05 10:57:03 -07:00
Zhengchao Shao	d543b649ff	net/mlx5e: fix memory leak in mlx5e_ptp_open When kvzalloc_node or kvzalloc failed in mlx5e_ptp_open, the memory pointed by "c" or "cparams" is not freed, which can lead to a memory leak. Fix by freeing the array in the error path. Fixes: `145e5637d9` ("net/mlx5e: Add TX PTP port object support") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2023-07-05 10:57:03 -07:00
Zhengchao Shao	3250affdc6	net/mlx5e: fix memory leak in mlx5e_fs_tt_redirect_any_create The memory pointed to by the fs->any pointer is not freed in the error path of mlx5e_fs_tt_redirect_any_create, which can lead to a memory leak. Fix by freeing the memory in the error path, thereby making the error path identical to mlx5e_fs_tt_redirect_any_destroy(). Fixes: `0f575c20bf` ("net/mlx5e: Introduce Flow Steering ANY API") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2023-07-05 10:57:03 -07:00
Zhengchao Shao	884abe45a9	net/mlx5e: fix double free in mlx5e_destroy_flow_table In function accel_fs_tcp_create_groups(), when the ft->g memory is successfully allocated but the 'in' memory fails to be allocated, the memory pointed to by ft->g is released once. And in function accel_fs_tcp_create_table, mlx5e_destroy_flow_table is called to release the memory pointed to by ft->g again. This will cause double free problem. Fixes: `c062d52ac2` ("net/mlx5e: Receive flow steering framework for accelerated TCP flows") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2023-07-05 10:57:02 -07:00
Muhammad Husaini Zulkifli	175c241288	igc: Fix TX Hang issue when QBV Gate is closed If a user schedules a Gate Control List (GCL) to close one of the QBV gates while also transmitting a packet to that closed gate, TX Hang will be happen. HW would not drop any packet when the gate is closed and keep queuing up in HW TX FIFO until the gate is re-opened. This patch implements the solution to drop the packet for the closed gate. This patch will also reset the adapter to perform SW initialization for each 1st Gate Control List (GCL) to avoid hang. This is due to the HW design, where changing to TSN transmit mode requires SW initialization. Intel Discrete I225/6 transmit mode cannot be changed when in dynamic mode according to Software User Manual Section 7.5.2.1. Subsequent Gate Control List (GCL) operations will proceed without a reset, as they already are in TSN Mode. Step to reproduce: DUT: 1) Configure GCL List with certain gate close. BASE=$(date +%s%N) tc qdisc replace dev $IFACE parent root handle 100 taprio \ num_tc 4 \ map 0 1 2 3 3 3 3 3 3 3 3 3 3 3 3 3 \ queues 1@0 1@1 1@2 1@3 \ base-time $BASE \ sched-entry S 0x8 500000 \ sched-entry S 0x4 500000 \ flags 0x2 2) Transmit the packet to closed gate. You may use udp_tai application to transmit UDP packet to any of the closed gate. ./udp_tai -i <interface> -P 100000 -p 90 -c 1 -t <0/1> -u 30004 Fixes: `ec50a9d437` ("igc: Add support for taprio offloading") Co-developed-by: Tan Tee Min <tee.min.tan@linux.intel.com> Signed-off-by: Tan Tee Min <tee.min.tan@linux.intel.com> Tested-by: Chwee Lin Choong <chwee.lin.choong@intel.com> Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2023-07-05 10:39:48 -07:00
Muhammad Husaini Zulkifli	cca28ceac7	igc: Remove delay during TX ring configuration Remove unnecessary delay during the TX ring configuration. This will cause delay, especially during link down and link up activity. Furthermore, old SKUs like as I225 will call the reset_adapter to reset the controller during TSN mode Gate Control List (GCL) setting. This will add more time to the configuration of the real-time use case. It doesn't mentioned about this delay in the Software User Manual. It might have been ported from legacy code I210 in the past. Fixes: `13b5b7fd6a` ("igc: Add support for Tx/Rx rings") Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Acked-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2023-07-05 10:21:33 -07:00
Muhammad Husaini Zulkifli	ed89b74d2d	igc: Add condition for qbv_config_change_errors counter Add condition to increase the qbv counter during taprio qbv configuration only. There might be a case when TC already been setup then user configure the ETF/CBS qdisc and this counter will increase if no condition above. Fixes: `ae4fe46983` ("igc: Add qbv_config_change_errors counter") Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2023-07-05 10:21:33 -07:00
Sridhar Samudrala	479cdfe388	ice: Fix tx queue rate limit when TCs are configured Configuring tx_maxrate via sysfs interface /sys/class/net/eth0/queues/tx-1/tx_maxrate was not working when TCs are configured because always main VSI was being used. Fix by using correct VSI in ice_set_tx_maxrate when TCs are configured. Fixes: `1ddef455f4` ("ice: Add NDO callback to set the maximum per-queue bitrate") Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com> Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2023-07-05 09:29:37 -07:00
Sridhar Samudrala	5f16da6ee6	ice: Fix max_rate check while configuring TX rate limits Remove incorrect check in ice_validate_mqprio_opt() that limits filter configuration when sum of max_rates of all TCs exceeds the link speed. The max rate of each TC is unrelated to value used by other TCs and is valid as long as it is less than link speed. Fixes: `fbc7b27af0` ("ice: enable ndo_setup_tc support for mqprio_qdisc") Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com> Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2023-07-05 09:22:02 -07:00
Hariprasad Kelam	14bb236b29	octeontx-af: fix hardware timestamp configuration MAC block on CN10K (RPM) supports hardware timestamp configuration. The previous patch which added timestamp configuration support has a bug. Though the netdev driver requests to disable timestamp configuration, the driver is always enabling it. This patch fixes the same. Fixes: `d148920868` ("octeontx2-af: cn10k: RPM hardware timestamp configuration") Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-04 19:52:56 +01:00
Dan Carpenter	90a8007bbe	mlxsw: spectrum_router: Fix an IS_ERR() vs NULL check The mlxsw_sp_crif_alloc() function returns NULL on error. It doesn't return error pointers. Fix the check. Fixes: `78126cfd5d` ("mlxsw: spectrum_router: Maintain CRIF for fallback loopback RIF") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-04 19:38:37 +01:00
Hariprasad Kelam	2e3e94c2f5	octeontx2-af: Reset MAC features in FLR AF driver configures MAC features like internal loopback and PFC upon receiving the request from PF and its VF netdev. But these features are not getting reset in FLR. This patch fixes the issue by resetting the same. Fixes: `23999b30ae` ("octeontx2-af: Enable or disable CGX internal loopback") Fixes: `1121f6b02e` ("octeontx2-af: Priority flow control configuration support") Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-02 15:47:19 +01:00
Hariprasad Kelam	79ebb53772	octeontx2-af: Add validation before accessing cgx and lmac with the addition of new MAC blocks like CN10K RPM and CN10KB RPM_USX, LMACs are noncontiguous and CGX blocks are also noncontiguous. But during RVU driver initialization, the driver is assuming they are contiguous and trying to access cgx or lmac with their id which is resulting in kernel panic. This patch fixes the issue by adding proper checks. [ 23.219150] pc : cgx_lmac_read+0x38/0x70 [ 23.219154] lr : rvu_program_channels+0x3f0/0x498 [ 23.223852] sp : ffff000100d6fc80 [ 23.227158] x29: ffff000100d6fc80 x28: ffff00010009f880 x27: 000000000000005a [ 23.234288] x26: ffff000102586768 x25: 0000000000002500 x24: fffffffffff0f000 Fixes: `91c6945ea1` ("octeontx2-af: cn10k: Add RPM MAC support") Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-02 15:47:19 +01:00
Hariprasad Kelam	2e7bc57b97	octeontx2-af: Fix mapping for NIX block from CGX connection Firmware configures NIX block mapping for all MAC blocks. The current implementation reads the configuration and creates the mapping between RVU PF and NIX blocks. But this configuration is only valid for silicons that support multiple blocks. For all other silicons, all MAC blocks map to NIX0. This patch corrects the mapping by adding a check for the same. Fixes: `c5a73b632b` ("octeontx2-af: Map NIX block from CGX connection") Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-02 15:47:18 +01:00
Hariprasad Kelam	4c5a331cac	octeontx2-af: cn10kb: fix interrupt csr addresses The current design is that, for asynchronous events like link_up and link_down firmware raises the interrupt to kernel. The previous patch which added RPM_USX driver has a bug where it uses old csr addresses for configuring interrupts. Which is resulting in losing interrupts from source firmware. This patch fixes the issue by correcting csr addresses. Fixes: `b9d0fedc62` ("octeontx2-af: cn10kb: Add RPM_USX MAC support") Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-02 15:47:18 +01:00
Linus Torvalds	9070577ae9	pci-v6.5-changes -----BEGIN PGP SIGNATURE----- iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmSdtQYUHGJoZWxnYWFz QGdvb2dsZS5jb20ACgkQWYigwDrT+vxp7A/+KmoBOm9ytiM2HPiq6pmHiJ9zF/DP 0jvKqDlc0BkmCyRw+/woxA5ZQgDnIYXxxt31toeSu+n31p6AR4wZ14Q5HBahABMw O/NUXmLAhYaczcp8hK4lS4Opz65+MaDHomu5fNuD7j7CagIwu20MegSEoyo35YC1 nDRN0IVYRRy/58wW509deQi/3U04TuC3kflc/iBToa/34g77L9ESoxpsZuAzo7wI nc9DF28H6PkuOtnp26iVufqkeYD3wfE5VAtaIZhyO+/WkhcTTUU5JB2hgFSACr0G qJTofncvGQrRYTNS7aIYPVrtTZpSMmPS08tgZc+iDkTr8iKth1jcPf3YHLpenQzx 2B197BUOLENOiWJFPIAe2TAgoGYYBgKhnnwcPHCHoinvVLTO82wUUW5qfn/GckgY WNYmS4PofkjlbJH3JdyHdH+vsL0VRzAmkhH+k6F+V02T78Lk+QdXKegLbr5yzRwh YF/GjX0JYjnONQeL1LQQ/4hfiA1VzmoXbHvXc0XJew6d/RYMon5G5xBAAZ8tnUHC PX6WMd/CG8RBxFv7IsPh6hTGKEXw4/1ElynPVP/ZEBGVelHda4Sm4G5nJ6/r2cwK VICfsSb9Nw76STGHpVeQB2O2Y/yZ1xIwWRiAsCndsYOlnGi+knRVxH5UH/1aQPOE N3DsyT0s/sZaJvc= =xpTd -----END PGP SIGNATURE----- Merge tag 'pci-v6.5-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull pci updates from Bjorn Helgaas: "Enumeration: - Export pcie_retrain_link() for use outside ASPM - Add Data Link Layer Link Active Reporting as another way for pcie_retrain_link() to determine the link is up - Work around link training failures (especially on the ASMedia ASM2824 switch) by training first at 2.5GT/s and then attempting higher rates Resource management: - When we coalesce host bridge windows, remove invalidated resources from the resource tree so future allocations work correctly Hotplug: - Cancel bringup sequence if card is not present, to keep from blinking Power Indicator indefinitely - Reassign bridge resources if necessary for ACPI hotplug Driver binding: - Convert platform_device .remove() callbacks to return void instead of a mostly useless int Power management: - Reduce wait time for secondary bus to be ready to speed up resume - Avoid putting EloPOS E2/S2/H2 (as well as Elo i2) PCIe Ports in D3cold - Call _REG when transitioning D-states so AML that uses the PCI config space OpRegion works, which fixes some ASMedia GPIO controllers after resume Virtualization: - Delay extra 250ms after FLR of Solidigm P44 Pro NVMe to avoid KVM hang when guest is rebooted - Add function 1 DMA alias quirk for Marvell 88SE9235 Error handling: - Unexport pci_save_aer_state() since it's only used in drivers/pci/ - Drop recommendation for drivers to configure AER Capability, since the PCI core does this for all devices ASPM: - Disable ASPM on MFD function removal to avoid use-after-free - Tighten up pci_enable_link_state() and pci_disable_link_state() interfaces so they don't enable/disable states the driver didn't specify - Avoid link retraining race that can happen if ASPM sets link control parameters while the link is in the midst of training for some other reason Endpoint framework: - Change "PCI Endpoint Virtual NTB driver" Kconfig prompt to be different from "PCI Endpoint NTB driver" - Automatically create a function specific attributes group for endpoint drivers to avoid reference counting issues - Fix many EPC test issues - Return pci_epf_type_add_cfs() error if EPF has no driver - Add kernel-doc for pci_epc_raise_irq() and pci_epc_map_msi_irq() MSI vector parameters - Pass EPF device ID to driver probe functions - Return -EALREADY if EPC has already been started/stopped - Add linkdown notifier support and use it in qcom-ep - Add Bus Master Enable event support and use it in qcom-ep - Add Qualcomm Modem Host Interface (MHI) endpoint driver - Add Layerscape PME interrupt handling to manage link-up notification Cadence PCIe controller driver: - Wait for link retrain to complete when working around the J721E i2085 erratum with Gen2 mode Faraday FTPC100 PCI controller driver: - Release clock resources on error paths Freescale i.MX6 PCIe controller driver: - Save and restore Root Port MSI control to work around hardware defect Intel VMD host bridge driver: - Reset VMD config register between soft reboots - Capture pci_reset_bus() return value instead of printing junk when it fails Qualcomm PCIe controller driver: - Add SDX65 endpoint compatible string to DT binding - Disable register write access after init for IP v2.3.3, v2.9.0 - Use DWC helpers for enabling/disabling writes to DBI registers - Hide slot hotplug capability for IP v1.0.0, v1.9.0, v2.1.0, v2.3.2, v2.3.3, v2.7.0, v2.9.0 - Reuse v2.3.2 post-init sequence for v2.4.0 Renesas R-Car PCIe controller driver: - Remove unused static pcie_base and pcie_dev Rockchip PCIe controller driver: - Remove writes to unused registers - Write endpoint Device ID using correct register - Assert PCI Configuration Enable bit after probe so endpoint responds instead of generating Request Retry Status messages - Poll waiting for PHY PLLs to lock - Update RK3399 example DT binding to be valid - Use RK3399 PCIE_CLIENT_LEGACY_INT_CTRL to generate INTx instead of manually generating PCIe message - Use multiple windows to avoid address translation conflicts - Use u32 (not u16) when accessing 32-bit registers - Hide MSI-X Capability, since RK3399 can't generate MSI-X - Set endpoint controller required alignment to 256 Synopsys DesignWare PCIe controller driver: - Wait for link to come up only if we've initiated link training Miscellaneous: - Add pci_clear_master() stub for non-CONFIG_PCI" * tag 'pci-v6.5-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (116 commits) Documentation: PCI: correct spelling PCI: vmd: Fix uninitialized variable usage in vmd_enable_domain() PCI: xgene-msi: Convert to platform remove callback returning void PCI: tegra: Convert to platform remove callback returning void PCI: rockchip-host: Convert to platform remove callback returning void PCI: mvebu: Convert to platform remove callback returning void PCI: mt7621: Convert to platform remove callback returning void PCI: mediatek-gen3: Convert to platform remove callback returning void PCI: mediatek: Convert to platform remove callback returning void PCI: iproc: Convert to platform remove callback returning void PCI: hisi-error: Convert to platform remove callback returning void PCI: dwc: Convert to platform remove callback returning void PCI: j721e: Convert to platform remove callback returning void PCI: brcmstb: Convert to platform remove callback returning void PCI: altera-msi: Convert to platform remove callback returning void PCI: altera: Convert to platform remove callback returning void PCI: aardvark: Convert to platform remove callback returning void PCI: rcar: Use correct product family name for Renesas R-Car PCI: layerscape: Add the endpoint linkup notifier support PCI: endpoint: pci-epf-vntb: Fix typo in comments ...	2023-06-30 15:06:45 -07:00
Linus Torvalds	7ede5f78a0	v6.5 merge window RDMA pull request This cycle saw a focus on rxe and bnxt_re drivers: - Code cleanups for irdma, rxe, rtrs, hns, vmw_pvrdma - rxe uses workqueues instead of tasklets - rxe has better compliance around access checks for MRs and rereg_mr - mana supportst he 'v2' FW interface for RX coalescing - hfi1 bug fix for stale cache entries in its MR cache - mlx5 buf fix to handle FW failures when destroying QPs - erdma HW has a new doorbell allocation mechanism for uverbs that is secure - Lots of small cleanups and rework in bnxt_re * Use the common mmap functions * Support disassociation * Improve FW command flow - bnxt_re support for "low latency push", this allows a packet -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZJxq+wAKCRCFwuHvBreF YSN+AQDKAmWdKCqmc3boMIq5wt4h7yYzdW47LpzGarOn5Hf+UgEA6mpPJyRqB43C CNXYIbASl/LLaWzFvxCq/AYp6tzuog4= =I8ju -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull rdma updates from Jason Gunthorpe: "This cycle saw a focus on rxe and bnxt_re drivers: - Code cleanups for irdma, rxe, rtrs, hns, vmw_pvrdma - rxe uses workqueues instead of tasklets - rxe has better compliance around access checks for MRs and rereg_mr - mana supportst he 'v2' FW interface for RX coalescing - hfi1 bug fix for stale cache entries in its MR cache - mlx5 buf fix to handle FW failures when destroying QPs - erdma HW has a new doorbell allocation mechanism for uverbs that is secure - Lots of small cleanups and rework in bnxt_re: - Use the common mmap functions - Support disassociation - Improve FW command flow - support for 'low latency push'" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (71 commits) RDMA/bnxt_re: Fix an IS_ERR() vs NULL check RDMA/bnxt_re: Fix spelling mistake "priviledged" -> "privileged" RDMA/bnxt_re: Remove duplicated include in bnxt_re/main.c RDMA/bnxt_re: Refactor code around bnxt_qplib_map_rc() RDMA/bnxt_re: Remove incorrect return check from slow path RDMA/bnxt_re: Enable low latency push RDMA/bnxt_re: Reorg the bar mapping RDMA/bnxt_re: Move the interface version to chip context structure RDMA/bnxt_re: Query function capabilities from firmware RDMA/bnxt_re: Optimize the bnxt_re_init_hwrm_hdr usage RDMA/bnxt_re: Add disassociate ucontext support RDMA/bnxt_re: Use the common mmap helper functions RDMA/bnxt_re: Initialize opcode while sending message RDMA/cma: Remove NULL check before dev_{put, hold} RDMA/rxe: Simplify cq->notify code RDMA/rxe: Fixes mr access supported list RDMA/bnxt_re: optimize the parameters passed to helper functions RDMA/bnxt_re: remove redundant cmdq_bitmap RDMA/bnxt_re: use firmware provided max request timeout RDMA/bnxt_re: cancel all control path command waiters upon error ...	2023-06-29 21:01:17 -07:00
Zhengchao Shao	08fc75735f	mlxsw: minimal: fix potential memory leak in mlxsw_m_linecards_init The line cards array is not freed in the error path of mlxsw_m_linecards_init(), which can lead to a memory leak. Fix by freeing the array in the error path, thereby making the error path identical to mlxsw_m_linecards_fini(). Fixes: `01328e23a4` ("mlxsw: minimal: Extend module to port mapping with slot index") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20230630012647.1078002-1-shaozhengchao@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-29 19:10:07 -07:00
Linus Torvalds	e4c8d01865	ARM: SoC drivers for 6.5 Nothing surprising in the SoC specific drivers, with the usual updates: * Added or improved SoC driver support for Tegra234, Exynos4121, RK3588, as well as multiple Mediatek and Qualcomm chips * SCMI firmware gains support for multiple SMC/HVC transport and version 3.2 of the protocol * Cleanups amd minor changes for the reset controller, memory controller, firmware and sram drivers * Minor changes to amd/xilinx, samsung, tegra, nxp, ti, qualcomm, amlogic and renesas SoC specific drivers -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmSdmbIACgkQYKtH/8kJ UicewQ/6Aq8j5pBFYBimZoyQ0bi9z+prGrHoDDYLew2vKjtOXJl5z7ZnM3J1oyPt Zvis3IaGkHJCuuqotPdsquZrzHq8slzXzwkHPfHORJBC4gV0V/vMS8w32tO5FfTq ULrMyWnbsU7Udeywc2xuEpAoC9+bXX9brnCpa3H41peIGZKM+0g7EE6FASt3YaOk O+ZMSGqF8QbCqSQrUH3GudFlFMy/VxIvwuUsbLt8aNkRACunQZXVgUdArvLV49nX SElFN7hOVRoVDv0rgYMxlwElymrta/kMyjLba8GU1GIhzyDGozVqIJQAnsQ3f6CC yyzaJm27zzJH0mx9jx4W+JLBdjqDL4ctE2WyllRVIpTGYMHiMQtutHNwtNupIuD5 j9j/fIVQWZqOdWXnA6V/CHYN1MZBRTH3KQcnLlYPC01dWKThPDnrHGfwOkfsrwtN zuERJJ+gd5b8KW4dmy1ueDOSB8162LxbS7iHxpOBGySmqVOYj3XUqACZhKRfXfIQ BVj9punCE/gO2fMb9IZByjeOzgtV+PBRmPxoglyaGkT4fVfL06kEbpKFYbXXq9b/ aAS/U84gGr8ebWsOXszwDnBzTZRzjMVv/T9KDTTJuWbBEPNyCR7fUG0cZ50rSKnJ 2cTPe3a0sS6LaBt71qfExCIfxG+cJ2c3N1U5/jb2C49Aob45obs= =zvLr -----END PGP SIGNATURE----- Merge tag 'soc-drivers-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC driver updates from Arnd Bergmann: "Nothing surprising in the SoC specific drivers, with the usual updates: - Added or improved SoC driver support for Tegra234, Exynos4121, RK3588, as well as multiple Mediatek and Qualcomm chips - SCMI firmware gains support for multiple SMC/HVC transport and version 3.2 of the protocol - Cleanups amd minor changes for the reset controller, memory controller, firmware and sram drivers - Minor changes to amd/xilinx, samsung, tegra, nxp, ti, qualcomm, amlogic and renesas SoC specific drivers" * tag 'soc-drivers-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (118 commits) dt-bindings: interrupt-controller: Convert Amlogic Meson GPIO interrupt controller binding MAINTAINERS: add PHY-related files to Amlogic SoC file list drivers: meson: secure-pwrc: always enable DMA domain tee: optee: Use kmemdup() to replace kmalloc + memcpy soc: qcom: geni-se: Do not bother about enable/disable of interrupts in secondary sequencer dt-bindings: sram: qcom,imem: document qdu1000 soc: qcom: icc-bwmon: Fix MSM8998 count unit dt-bindings: soc: qcom,rpmh-rsc: Require power-domains soc: qcom: socinfo: Add Soc ID for IPQ5300 dt-bindings: arm: qcom,ids: add SoC ID for IPQ5300 soc: qcom: Fix a IS_ERR() vs NULL bug in probe soc: qcom: socinfo: Add support for new fields in revision 19 soc: qcom: socinfo: Add support for new fields in revision 18 dt-bindings: firmware: scm: Add compatible for SDX75 soc: qcom: mdt_loader: Fix split image detection dt-bindings: memory-controllers: drop unneeded quotes soc: rockchip: dtpm: use C99 array init syntax firmware: tegra: bpmp: Add support for DRAM MRQ GSCs soc/tegra: pmc: Use devm_clk_notifier_register() soc/tegra: pmc: Simplify debugfs initialization ...	2023-06-29 15:22:19 -07:00
Nick Child	48538ccb82	ibmvnic: Do not reset dql stats on NON_FATAL err All ibmvnic resets, make a call to netdev_tx_reset_queue() when re-opening the device. netdev_tx_reset_queue() resets the num_queued and num_completed byte counters. These stats are used in Byte Queue Limit (BQL) algorithms. The difference between these two stats tracks the number of bytes currently sitting on the physical NIC. ibmvnic increases the number of queued bytes though calls to netdev_tx_sent_queue() in the drivers xmit function. When, VIOS reports that it is done transmitting bytes, the ibmvnic device increases the number of completed bytes through calls to netdev_tx_completed_queue(). It is important to note that the driver batches its transmit calls and num_queued is increased every time that an skb is added to the next batch, not necessarily when the batch is sent to VIOS for transmission. Unlike other reset types, a NON FATAL reset will not flush the sub crq tx buffers. Therefore, it is possible for the batched skb array to be partially full. So if there is call to netdev_tx_reset_queue() when re-opening the device, the value of num_queued (0) would not account for the skb's that are currently batched. Eventually, when the batch is sent to VIOS, the call to netdev_tx_completed_queue() would increase num_completed to a value greater than the num_queued. This causes a BUG_ON crash: ibmvnic 30000002: Firmware reports error, cause: adapter problem. Starting recovery... ibmvnic 30000002: tx error 600 ibmvnic 30000002: tx error 600 ibmvnic 30000002: tx error 600 ibmvnic 30000002: tx error 600 ------------[ cut here ]------------ kernel BUG at lib/dynamic_queue_limits.c:27! Oops: Exception in kernel mode, sig: 5 [....] NIP dql_completed+0x28/0x1c0 LR ibmvnic_complete_tx.isra.0+0x23c/0x420 [ibmvnic] Call Trace: ibmvnic_complete_tx.isra.0+0x3f8/0x420 [ibmvnic] (unreliable) ibmvnic_interrupt_tx+0x40/0x70 [ibmvnic] __handle_irq_event_percpu+0x98/0x270 ---[ end trace ]--- Therefore, do not reset the dql stats when performing a NON_FATAL reset. Fixes: `0d97338818` ("ibmvnic: Introduce xmit_more support using batched subCRQ hcalls") Signed-off-by: Nick Child <nnac123@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-29 11:12:19 -07:00
Martin Habets	915057ae79	sfc: support for devlink port requires MAE access On systems without MAE permission efx->mae is not initialised, and trying to lookup an mport results in a NULL pointer dereference. Fixes: `25414b2a64` ("sfc: add devlink port support for ef100") Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-29 11:10:27 -07:00
Tobias Heider	046f753da6	Add MODULE_FIRMWARE() for FIRMWARE_TG357766. Fixes a bug where on the M1 mac mini initramfs-tools fails to include the necessary firmware into the initrd. Fixes: `c4dab50697` ("tg3: Download 57766 EEE service patch firmware") Signed-off-by: Tobias Heider <me@tobhe.de> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/ZJt7LKzjdz8+dClx@tobhe.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-29 09:47:12 -07:00
Vladimir Oltean	45d0fcb5bc	net: mscc: ocelot: don't keep PTP configuration of all ports in single structure In a future change, the driver will need to determine whether PTP RX timestamping is enabled on a port (including whether traps were set up on that port in particular) and that is currently not possible. The driver supports different RX filters (L2, L4) and kinds of TX timestamping (one-step, two-step) on its ports, but it saves all configuration in a single struct hwtstamp_config that is global to the switch. So, the latest timestamping configuration on one port (including a request to disable timestamping) affects what gets reported for all ports, even though the configuration itself is still individual to each port. The port timestamping configurations are only coupled because of the common structure, so replace the hwtstamp_config with a mask of trapped protocols saved per port. We also have the ptp_cmd to distinguish between one-step and two-step PTP timestamping, so with those 2 bits of information we can fully reconstruct a descriptive struct hwtstamp_config for each port, during the SIOCGHWTSTAMP ioctl. Fixes: `4e3b0468e6` ("net: mscc: PTP Hardware Clock (PHC) support") Fixes: `96ca08c058` ("net: mscc: ocelot: set up traps for PTP packets") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-06-29 12:40:27 +02:00
Vladimir Oltean	4fd44b82b7	net: mscc: ocelot: don't report that RX timestamping is enabled by default PTP RX timestamping should be enabled when the user requests it, not by default. If it is enabled by default, it can be problematic when the ocelot driver is a DSA master, and it sidesteps what DSA tries to avoid through __dsa_master_hwtstamp_validate(). Additionally, after the change which made ocelot trap PTP packets only to the CPU at ocelot_hwtstamp_set() time, it is no longer even true that RX timestamping is enabled by default, because until ocelot_hwtstamp_set() is called, the PTP traps are actually not set up. So the rx_filter field of ocelot->hwtstamp_config reflects an incorrect reality. Fixes: `96ca08c058` ("net: mscc: ocelot: set up traps for PTP packets") Fixes: `4e3b0468e6` ("net: mscc: PTP Hardware Clock (PHC) support") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-06-29 12:40:27 +02:00
Moritz Fischer	7a8227b2e7	net: lan743x: Don't sleep in atomic context dev_set_rx_mode() grabs a spin_lock, and the lan743x implementation proceeds subsequently to go to sleep using readx_poll_timeout(). Introduce a helper wrapping the readx_poll_timeout_atomic() function and use it to replace the calls to readx_polL_timeout(). Fixes: `23f0703c12` ("lan743x: Add main source files for new lan743x driver") Cc: stable@vger.kernel.org Cc: Bryan Whitehead <bryan.whitehead@microchip.com> Cc: UNGLinuxDriver@microchip.com Signed-off-by: Moritz Fischer <moritzf@google.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20230627035000.1295254-1-moritzf@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-06-29 10:59:43 +02:00
Linus Torvalds	3a8a670eee	Networking changes for 6.5. Core ---- - Rework the sendpage & splice implementations. Instead of feeding data into sockets page by page extend sendmsg handlers to support taking a reference on the data, controlled by a new flag called MSG_SPLICE_PAGES. Rework the handling of unexpected-end-of-file to invoke an additional callback instead of trying to predict what the right combination of MORE/NOTLAST flags is. Remove the MSG_SENDPAGE_NOTLAST flag completely. - Implement SCM_PIDFD, a new type of CMSG type analogous to SCM_CREDENTIALS, but it contains pidfd instead of plain pid. - Enable socket busy polling with CONFIG_RT. - Improve reliability and efficiency of reporting for ref_tracker. - Auto-generate a user space C library for various Netlink families. Protocols --------- - Allow TCP to shrink the advertised window when necessary, prevent sk_rcvbuf auto-tuning from growing the window all the way up to tcp_rmem[2]. - Use per-VMA locking for "page-flipping" TCP receive zerocopy. - Prepare TCP for device-to-device data transfers, by making sure that payloads are always attached to skbs as page frags. - Make the backoff time for the first N TCP SYN retransmissions linear. Exponential backoff is unnecessarily conservative. - Create a new MPTCP getsockopt to retrieve all info (MPTCP_FULL_INFO). - Avoid waking up applications using TLS sockets until we have a full record. - Allow using kernel memory for protocol ioctl callbacks, paving the way to issuing ioctls over io_uring. - Add nolocalbypass option to VxLAN, forcing packets to be fully encapsulated even if they are destined for a local IP address. - Make TCPv4 use consistent hash in TIME_WAIT and SYN_RECV. Ensure in-kernel ECMP implementation (e.g. Open vSwitch) select the same link for all packets. Support L4 symmetric hashing in Open vSwitch. - PPPoE: make number of hash bits configurable. - Allow DNS to be overwritten by DHCPACK in the in-kernel DHCP client (ipconfig). - Add layer 2 miss indication and filtering, allowing higher layers (e.g. ACL filters) to make forwarding decisions based on whether packet matched forwarding state in lower devices (bridge). - Support matching on Connectivity Fault Management (CFM) packets. - Hide the "link becomes ready" IPv6 messages by demoting their printk level to debug. - HSR: don't enable promiscuous mode if device offloads the proto. - Support active scanning in IEEE 802.15.4. - Continue work on Multi-Link Operation for WiFi 7. BPF --- - Add precision propagation for subprogs and callbacks. This allows maintaining verification efficiency when subprograms are used, or in fact passing the verifier at all for complex programs, especially those using open-coded iterators. - Improve BPF's {g,s}setsockopt() length handling. Previously BPF assumed the length is always equal to the amount of written data. But some protos allow passing a NULL buffer to discover what the output buffer should be, without writing anything. - Accept dynptr memory as memory arguments passed to helpers. - Add routing table ID to bpf_fib_lookup BPF helper. - Support O_PATH FDs in BPF_OBJ_PIN and BPF_OBJ_GET commands. - Drop bpf_capable() check in BPF_MAP_FREEZE command (used to mark maps as read-only). - Show target_{obj,btf}_id in tracing link fdinfo. - Addition of several new kfuncs (most of the names are self-explanatory): - Add a set of new dynptr kfuncs: bpf_dynptr_adjust(), bpf_dynptr_is_null(), bpf_dynptr_is_rdonly(), bpf_dynptr_size() and bpf_dynptr_clone(). - bpf_task_under_cgroup() - bpf_sock_destroy() - force closing sockets - bpf_cpumask_first_and(), rework bpf_cpumask_any() kfuncs Netfilter --------- - Relax set/map validation checks in nf_tables. Allow checking presence of an entry in a map without using the value. - Increase ip_vs_conn_tab_bits range for 64BIT builds. - Allow updating size of a set. - Improve NAT tuple selection when connection is closing. Driver API ---------- - Integrate netdev with LED subsystem, to allow configuring HW "offloaded" blinking of LEDs based on link state and activity (i.e. packets coming in and out). - Support configuring rate selection pins of SFP modules. - Factor Clause 73 auto-negotiation code out of the drivers, provide common helper routines. - Add more fool-proof helpers for managing lifetime of MDIO devices associated with the PCS layer. - Allow drivers to report advanced statistics related to Time Aware scheduler offload (taprio). - Allow opting out of VF statistics in link dump, to allow more VFs to fit into the message. - Split devlink instance and devlink port operations. New hardware / drivers ---------------------- - Ethernet: - Synopsys EMAC4 IP support (stmmac) - Marvell 88E6361 8 port (5x1GE + 3x2.5GE) switches - Marvell 88E6250 7 port switches - Microchip LAN8650/1 Rev.B0 PHYs - MediaTek MT7981/MT7988 built-in 1GE PHY driver - WiFi: - Realtek RTL8192FU, 2.4 GHz, b/g/n mode, 2T2R, 300 Mbps - Realtek RTL8723DS (SDIO variant) - Realtek RTL8851BE - CAN: - Fintek F81604 Drivers ------- - Ethernet NICs: - Intel (100G, ice): - support dynamic interrupt allocation - use meta data match instead of VF MAC addr on slow-path - nVidia/Mellanox: - extend link aggregation to handle 4, rather than just 2 ports - spawn sub-functions without any features by default - OcteonTX2: - support HTB (Tx scheduling/QoS) offload - make RSS hash generation configurable - support selecting Rx queue using TC filters - Wangxun (ngbe/txgbe): - add basic Tx/Rx packet offloads - add phylink support (SFP/PCS control) - Freescale/NXP (enetc): - report TAPRIO packet statistics - Solarflare/AMD: - support matching on IP ToS and UDP source port of outer header - VxLAN and GENEVE tunnel encapsulation over IPv4 or IPv6 - add devlink dev info support for EF10 - Virtual NICs: - Microsoft vNIC: - size the Rx indirection table based on requested configuration - support VLAN tagging - Amazon vNIC: - try to reuse Rx buffers if not fully consumed, useful for ARM servers running with 16kB pages - Google vNIC: - support TCP segmentation of >64kB frames - Ethernet embedded switches: - Marvell (mv88e6xxx): - enable USXGMII (88E6191X) - Microchip: - lan966x: add support for Egress Stage 0 ACL engine - lan966x: support mapping packet priority to internal switch priority (based on PCP or DSCP) - Ethernet PHYs: - Broadcom PHYs: - support for Wake-on-LAN for BCM54210E/B50212E - report LPI counter - Microsemi PHYs: support RGMII delay configuration (VSC85xx) - Micrel PHYs: receive timestamp in the frame (LAN8841) - Realtek PHYs: support optional external PHY clock - Altera TSE PCS: merge the driver into Lynx PCS which it is a variant of - CAN: Kvaser PCIEcan: - support packet timestamping - WiFi: - Intel (iwlwifi): - major update for new firmware and Multi-Link Operation (MLO) - configuration rework to drop test devices and split the different families - support for segmented PNVM images and power tables - new vendor entries for PPAG (platform antenna gain) feature - Qualcomm 802.11ax (ath11k): - Multiple Basic Service Set Identifier (MBSSID) and Enhanced MBSSID Advertisement (EMA) support in AP mode - support factory test mode - RealTek (rtw89): - add RSSI based antenna diversity - support U-NII-4 channels on 5 GHz band - RealTek (rtl8xxxu): - AP mode support for 8188f - support USB RX aggregation for the newer chips Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmSbJM4ACgkQMUZtbf5S IrtoDhAAhEim1+LBIKf4lhPcVdZ2p/TkpnwTz5jsTwSeRBAxTwuNJ2fQhFXg13E3 MnRq6QaEp8G4/tA/gynLvQop+FEZEnv+horP0zf/XLcC8euU7UrKdrpt/4xxdP07 IL/fFWsoUGNO+L9LNaHwBo8g7nHvOkPscHEBHc2Xrvzab56TJk6vPySfLqcpKlNZ CHWDwTpgRqNZzSKiSpoMVd9OVMKUXcPYHpDmfEJ5l+e8vTXmZzOLHrSELHU5nP5f mHV7gxkDCTshoGcaed7UTiOvgu1p6E5EchDJxiLaSUbgsd8SZ3u4oXwRxgj33RK/ fB2+UaLrRt/DdlHvT/Ph8e8Ygu77yIXMjT49jsfur/zVA0HEA2dFb7V6QlsYRmQp J25pnrdXmE15llgqsC0/UOW5J1laTjII+T2T70UOAqQl4LWYAQDG4WwsAqTzU0KY dueydDouTp9XC2WYrRUEQxJUzxaOaazskDUHc5c8oHp/zVBT+djdgtvVR9+gi6+7 yy4elI77FlEEqL0ItdU/lSWINayAlPLsIHkMyhSGKX0XDpKjeycPqkNx4UterXB/ JKIR5RBWllRft+igIngIkKX0tJGMU0whngiw7d1WLw25wgu4sB53hiWWoSba14hv tXMxwZs5iGaPcT38oRVMZz8I1kJM4Dz3SyI7twVvi4RUut64EG4= =9i4I -----END PGP SIGNATURE----- Merge tag 'net-next-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking changes from Jakub Kicinski: "WiFi 7 and sendpage changes are the biggest pieces of work for this release. The latter will definitely require fixes but I think that we got it to a reasonable point. Core: - Rework the sendpage & splice implementations Instead of feeding data into sockets page by page extend sendmsg handlers to support taking a reference on the data, controlled by a new flag called MSG_SPLICE_PAGES Rework the handling of unexpected-end-of-file to invoke an additional callback instead of trying to predict what the right combination of MORE/NOTLAST flags is Remove the MSG_SENDPAGE_NOTLAST flag completely - Implement SCM_PIDFD, a new type of CMSG type analogous to SCM_CREDENTIALS, but it contains pidfd instead of plain pid - Enable socket busy polling with CONFIG_RT - Improve reliability and efficiency of reporting for ref_tracker - Auto-generate a user space C library for various Netlink families Protocols: - Allow TCP to shrink the advertised window when necessary, prevent sk_rcvbuf auto-tuning from growing the window all the way up to tcp_rmem[2] - Use per-VMA locking for "page-flipping" TCP receive zerocopy - Prepare TCP for device-to-device data transfers, by making sure that payloads are always attached to skbs as page frags - Make the backoff time for the first N TCP SYN retransmissions linear. Exponential backoff is unnecessarily conservative - Create a new MPTCP getsockopt to retrieve all info (MPTCP_FULL_INFO) - Avoid waking up applications using TLS sockets until we have a full record - Allow using kernel memory for protocol ioctl callbacks, paving the way to issuing ioctls over io_uring - Add nolocalbypass option to VxLAN, forcing packets to be fully encapsulated even if they are destined for a local IP address - Make TCPv4 use consistent hash in TIME_WAIT and SYN_RECV. Ensure in-kernel ECMP implementation (e.g. Open vSwitch) select the same link for all packets. Support L4 symmetric hashing in Open vSwitch - PPPoE: make number of hash bits configurable - Allow DNS to be overwritten by DHCPACK in the in-kernel DHCP client (ipconfig) - Add layer 2 miss indication and filtering, allowing higher layers (e.g. ACL filters) to make forwarding decisions based on whether packet matched forwarding state in lower devices (bridge) - Support matching on Connectivity Fault Management (CFM) packets - Hide the "link becomes ready" IPv6 messages by demoting their printk level to debug - HSR: don't enable promiscuous mode if device offloads the proto - Support active scanning in IEEE 802.15.4 - Continue work on Multi-Link Operation for WiFi 7 BPF: - Add precision propagation for subprogs and callbacks. This allows maintaining verification efficiency when subprograms are used, or in fact passing the verifier at all for complex programs, especially those using open-coded iterators - Improve BPF's {g,s}setsockopt() length handling. Previously BPF assumed the length is always equal to the amount of written data. But some protos allow passing a NULL buffer to discover what the output buffer should* be, without writing anything - Accept dynptr memory as memory arguments passed to helpers - Add routing table ID to bpf_fib_lookup BPF helper - Support O_PATH FDs in BPF_OBJ_PIN and BPF_OBJ_GET commands - Drop bpf_capable() check in BPF_MAP_FREEZE command (used to mark maps as read-only) - Show target_{obj,btf}_id in tracing link fdinfo - Addition of several new kfuncs (most of the names are self-explanatory): - Add a set of new dynptr kfuncs: bpf_dynptr_adjust(), bpf_dynptr_is_null(), bpf_dynptr_is_rdonly(), bpf_dynptr_size() and bpf_dynptr_clone(). - bpf_task_under_cgroup() - bpf_sock_destroy() - force closing sockets - bpf_cpumask_first_and(), rework bpf_cpumask_any() kfuncs Netfilter: - Relax set/map validation checks in nf_tables. Allow checking presence of an entry in a map without using the value - Increase ip_vs_conn_tab_bits range for 64BIT builds - Allow updating size of a set - Improve NAT tuple selection when connection is closing Driver API: - Integrate netdev with LED subsystem, to allow configuring HW "offloaded" blinking of LEDs based on link state and activity (i.e. packets coming in and out) - Support configuring rate selection pins of SFP modules - Factor Clause 73 auto-negotiation code out of the drivers, provide common helper routines - Add more fool-proof helpers for managing lifetime of MDIO devices associated with the PCS layer - Allow drivers to report advanced statistics related to Time Aware scheduler offload (taprio) - Allow opting out of VF statistics in link dump, to allow more VFs to fit into the message - Split devlink instance and devlink port operations New hardware / drivers: - Ethernet: - Synopsys EMAC4 IP support (stmmac) - Marvell 88E6361 8 port (5x1GE + 3x2.5GE) switches - Marvell 88E6250 7 port switches - Microchip LAN8650/1 Rev.B0 PHYs - MediaTek MT7981/MT7988 built-in 1GE PHY driver - WiFi: - Realtek RTL8192FU, 2.4 GHz, b/g/n mode, 2T2R, 300 Mbps - Realtek RTL8723DS (SDIO variant) - Realtek RTL8851BE - CAN: - Fintek F81604 Drivers: - Ethernet NICs: - Intel (100G, ice): - support dynamic interrupt allocation - use meta data match instead of VF MAC addr on slow-path - nVidia/Mellanox: - extend link aggregation to handle 4, rather than just 2 ports - spawn sub-functions without any features by default - OcteonTX2: - support HTB (Tx scheduling/QoS) offload - make RSS hash generation configurable - support selecting Rx queue using TC filters - Wangxun (ngbe/txgbe): - add basic Tx/Rx packet offloads - add phylink support (SFP/PCS control) - Freescale/NXP (enetc): - report TAPRIO packet statistics - Solarflare/AMD: - support matching on IP ToS and UDP source port of outer header - VxLAN and GENEVE tunnel encapsulation over IPv4 or IPv6 - add devlink dev info support for EF10 - Virtual NICs: - Microsoft vNIC: - size the Rx indirection table based on requested configuration - support VLAN tagging - Amazon vNIC: - try to reuse Rx buffers if not fully consumed, useful for ARM servers running with 16kB pages - Google vNIC: - support TCP segmentation of >64kB frames - Ethernet embedded switches: - Marvell (mv88e6xxx): - enable USXGMII (88E6191X) - Microchip: - lan966x: add support for Egress Stage 0 ACL engine - lan966x: support mapping packet priority to internal switch priority (based on PCP or DSCP) - Ethernet PHYs: - Broadcom PHYs: - support for Wake-on-LAN for BCM54210E/B50212E - report LPI counter - Microsemi PHYs: support RGMII delay configuration (VSC85xx) - Micrel PHYs: receive timestamp in the frame (LAN8841) - Realtek PHYs: support optional external PHY clock - Altera TSE PCS: merge the driver into Lynx PCS which it is a variant of - CAN: Kvaser PCIEcan: - support packet timestamping - WiFi: - Intel (iwlwifi): - major update for new firmware and Multi-Link Operation (MLO) - configuration rework to drop test devices and split the different families - support for segmented PNVM images and power tables - new vendor entries for PPAG (platform antenna gain) feature - Qualcomm 802.11ax (ath11k): - Multiple Basic Service Set Identifier (MBSSID) and Enhanced MBSSID Advertisement (EMA) support in AP mode - support factory test mode - RealTek (rtw89): - add RSSI based antenna diversity - support U-NII-4 channels on 5 GHz band - RealTek (rtl8xxxu): - AP mode support for 8188f - support USB RX aggregation for the newer chips" tag 'net-next-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1602 commits) net: scm: introduce and use scm_recv_unix helper af_unix: Skip SCM_PIDFD if scm->pid is NULL. net: lan743x: Simplify comparison netlink: Add __sock_i_ino() for __netlink_diag_dump(). net: dsa: avoid suspicious RCU usage for synced VLAN-aware MAC addresses Revert "af_unix: Call scm_recv() only after scm_set_cred()." phylink: ReST-ify the phylink_pcs_neg_mode() kdoc libceph: Partially revert changes to support MSG_SPLICE_PAGES net: phy: mscc: fix packet loss due to RGMII delays net: mana: use vmalloc_array and vcalloc net: enetc: use vmalloc_array and vcalloc ionic: use vmalloc_array and vcalloc pds_core: use vmalloc_array and vcalloc gve: use vmalloc_array and vcalloc octeon_ep: use vmalloc_array and vcalloc net: usb: qmi_wwan: add u-blox 0x1312 composition perf trace: fix MSG_SPLICE_PAGES build error ipvlan: Fix return value of ipvlan_queue_xmit() netfilter: nf_tables: fix underflow in chain reference counter netfilter: nf_tables: unbind non-anonymous set if rule construction fails ...	2023-06-28 16:43:10 -07:00
Linus Torvalds	582c161cf3	hardening updates for v6.5-rc1 - Fix KMSAN vs FORTIFY in strlcpy/strlcat (Alexander Potapenko) - Convert strreplace() to return string start (Andy Shevchenko) - Flexible array conversions (Arnd Bergmann, Wyes Karny, Kees Cook) - Add missing function prototypes seen with W=1 (Arnd Bergmann) - Fix strscpy() kerndoc typo (Arne Welzel) - Replace strlcpy() with strscpy() across many subsystems which were either Acked by respective maintainers or were trivial changes that went ignored for multiple weeks (Azeem Shaikh) - Remove unneeded cc-option test for UBSAN_TRAP (Nick Desaulniers) - Add KUnit tests for strcat()-family - Enable KUnit tests of FORTIFY wrappers under UML - Add more complete FORTIFY protections for strlcat() - Add missed disabling of FORTIFY for all arch purgatories. - Enable -fstrict-flex-arrays=3 globally - Tightening UBSAN_BOUNDS when using GCC - Improve checkpatch to check for strcpy, strncpy, and fake flex arrays - Improve use of const variables in FORTIFY - Add requested struct_size_t() helper for types not pointers - Add __counted_by macro for annotating flexible array size members -----BEGIN PGP SIGNATURE----- iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmSbftQWHGtlZXNjb29r QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJj0MD/9X9jzJzCmsAU+yNldeoAzC84Sk GVU3RBxGcTNysL1gZXynkIgigw7DWc4htMGeSABHHwQRVP65JCH1Kw/VqIkyumbx 9LdX6IklMJb4pRT4PVU3azebV4eNmSjlur2UxMeW54Czm91/6I8RHbJOyAPnOUmo 2oomGdP/hpEHtKR7hgy8Axc6w5ySwQixh2V5sVZG3VbvCS5WKTmTXbs6puuRT5hz iHt7v+7VtEg/Qf1W7J2oxfoghvVBsaRrSLrExWT/oZYh1ZxM7DsCAAoG/IsDgHGA 9LBXiRECgAFThbHVxLvvKZQMXdVk0i8iXLX43XMKC0wTA+NTyH7wlcQQ4RWNMuo8 sfA9Qm9gMArXaf64aymr3Uwn20Zan0391HdlbhOJZAE6v3PPJbleUnM58AzD2d3r 5Lz6AIFBxDImy+3f9iDWgacCT5/PkeiXTHzk9QnKhJyKKtRA58XJxj4q2+rPnGJP n4haXqoxD5FJbxdXiGKk31RS0U5HBug7wkOcUrTqDHUbc/QNU2b7dxTKUx+zYtCU uV5emPzpF4H4z+91WpO47n9gkMAfwV0lt9S2dwS8pxsgqctbmIan+Jgip7rsqZ2G OgLXBsb43eEs+6WgO8tVt/ZHYj9ivGMdrcNcsIfikzNs/xweUJ53k2xSEn2xEa5J cwANDmkL6QQK7yfeeg== =s0j1 -----END PGP SIGNATURE----- Merge tag 'hardening-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening updates from Kees Cook: "There are three areas of note: A bunch of strlcpy()->strscpy() conversions ended up living in my tree since they were either Acked by maintainers for me to carry, or got ignored for multiple weeks (and were trivial changes). The compiler option '-fstrict-flex-arrays=3' has been enabled globally, and has been in -next for the entire devel cycle. This changes compiler diagnostics (though mainly just -Warray-bounds which is disabled) and potential UBSAN_BOUNDS and FORTIFY _warning_ coverage. In other words, there are no new restrictions, just potentially new warnings. Any new FORTIFY warnings we've seen have been fixed (usually in their respective subsystem trees). For more details, see commit `df8fc4e934`. The under-development compiler attribute __counted_by has been added so that we can start annotating flexible array members with their associated structure member that tracks the count of flexible array elements at run-time. It is possible (likely?) that the exact syntax of the attribute will change before it is finalized, but GCC and Clang are working together to sort it out. Any changes can be made to the macro while we continue to add annotations. As an example of that last case, I have a treewide commit waiting with such annotations found via Coccinelle: https://git.kernel.org/linus/adc5b3cb48a049563dc673f348eab7b6beba8a9b Also see commit `dd06e72e68` for more details. Summary: - Fix KMSAN vs FORTIFY in strlcpy/strlcat (Alexander Potapenko) - Convert strreplace() to return string start (Andy Shevchenko) - Flexible array conversions (Arnd Bergmann, Wyes Karny, Kees Cook) - Add missing function prototypes seen with W=1 (Arnd Bergmann) - Fix strscpy() kerndoc typo (Arne Welzel) - Replace strlcpy() with strscpy() across many subsystems which were either Acked by respective maintainers or were trivial changes that went ignored for multiple weeks (Azeem Shaikh) - Remove unneeded cc-option test for UBSAN_TRAP (Nick Desaulniers) - Add KUnit tests for strcat()-family - Enable KUnit tests of FORTIFY wrappers under UML - Add more complete FORTIFY protections for strlcat() - Add missed disabling of FORTIFY for all arch purgatories. - Enable -fstrict-flex-arrays=3 globally - Tightening UBSAN_BOUNDS when using GCC - Improve checkpatch to check for strcpy, strncpy, and fake flex arrays - Improve use of const variables in FORTIFY - Add requested struct_size_t() helper for types not pointers - Add __counted_by macro for annotating flexible array size members" * tag 'hardening-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (54 commits) netfilter: ipset: Replace strlcpy with strscpy uml: Replace strlcpy with strscpy um: Use HOST_DIR for mrproper kallsyms: Replace all non-returning strlcpy with strscpy sh: Replace all non-returning strlcpy with strscpy of/flattree: Replace all non-returning strlcpy with strscpy sparc64: Replace all non-returning strlcpy with strscpy Hexagon: Replace all non-returning strlcpy with strscpy kobject: Use return value of strreplace() lib/string_helpers: Change returned value of the strreplace() jbd2: Avoid printing outside the boundary of the buffer checkpatch: Check for 0-length and 1-element arrays riscv/purgatory: Do not use fortified string functions s390/purgatory: Do not use fortified string functions x86/purgatory: Do not use fortified string functions acpi: Replace struct acpi_table_slit 1-element array with flex-array clocksource: Replace all non-returning strlcpy with strscpy string: use __builtin_memcpy() in strlcpy/strlcat staging: most: Replace all non-returning strlcpy with strscpy drm/i2c: tda998x: Replace all non-returning strlcpy with strscpy ...	2023-06-27 21:24:18 -07:00
Linus Torvalds	72dc6db7e3	workqueue: Ordered workqueue creation cleanups For historical reasons, unbound workqueues with max concurrency limit of 1 are considered ordered, even though the concurrency limit hasn't been system-wide for a long time. This creates ambiguity around whether ordered execution is actually required for correctness, which was actually confusing for e.g. btrfs (btrfs updates are being routed through the btrfs tree). There aren't that many users in the tree which use the combination and there are pending improvements to unbound workqueue affinity handling which will make inadvertent use of ordered workqueue a bigger loss. This pull request clarifies the situation for most of them by updating the ones which require ordered execution to use alloc_ordered_workqueue(). There are some conversions being routed through subsystem-specific trees and likely a few stragglers. Once they're all converted, workqueue can trigger a warning on unbound + @max_active==1 usages and eventually drop the implicit ordered behavior. -----BEGIN PGP SIGNATURE----- iIQEABYIACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCZJoKnA4cdGpAa2VybmVs Lm9yZwAKCRCxYfJx3gVYGc5SAQDOtjML7Cx9AYzbY5+nYc0wTebRRTXGeOu7A3Xy j50rVgEAjHgvHLIdmeYmVhCeHOSN4q7Wn5AOwaIqZalOhfLyKQk= =hs79 -----END PGP SIGNATURE----- Merge tag 'wq-for-6.5-cleanup-ordered' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull ordered workqueue creation updates from Tejun Heo: "For historical reasons, unbound workqueues with max concurrency limit of 1 are considered ordered, even though the concurrency limit hasn't been system-wide for a long time. This creates ambiguity around whether ordered execution is actually required for correctness, which was actually confusing for e.g. btrfs (btrfs updates are being routed through the btrfs tree). There aren't that many users in the tree which use the combination and there are pending improvements to unbound workqueue affinity handling which will make inadvertent use of ordered workqueue a bigger loss. This clarifies the situation for most of them by updating the ones which require ordered execution to use alloc_ordered_workqueue(). There are some conversions being routed through subsystem-specific trees and likely a few stragglers. Once they're all converted, workqueue can trigger a warning on unbound + @max_active==1 usages and eventually drop the implicit ordered behavior" * tag 'wq-for-6.5-cleanup-ordered' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: rxrpc: Use alloc_ordered_workqueue() to create ordered workqueues net: qrtr: Use alloc_ordered_workqueue() to create ordered workqueues net: wwan: t7xx: Use alloc_ordered_workqueue() to create ordered workqueues dm integrity: Use alloc_ordered_workqueue() to create ordered workqueues media: amphion: Use alloc_ordered_workqueue() to create ordered workqueues scsi: NCR5380: Use default @max_active for hostdata->work_q media: coda: Use alloc_ordered_workqueue() to create ordered workqueues crypto: octeontx2: Use alloc_ordered_workqueue() to create ordered workqueues wifi: ath10/11/12k: Use alloc_ordered_workqueue() to create ordered workqueues wifi: mwifiex: Use default @max_active for workqueues wifi: iwlwifi: Use default @max_active for trans_pcie->rba.alloc_wq xen/pvcalls: Use alloc_ordered_workqueue() to create ordered workqueues virt: acrn: Use alloc_ordered_workqueue() to create ordered workqueues net: octeontx2: Use alloc_ordered_workqueue() to create ordered workqueues net: thunderx: Use alloc_ordered_workqueue() to create ordered workqueues greybus: Use alloc_ordered_workqueue() to create ordered workqueues powerpc, workqueue: Use alloc_ordered_workqueue() to create ordered workqueues	2023-06-27 16:46:06 -07:00
Arnd Bergmann	356fa49759	NXP/FSL SoC driver updates for v6.5 - fsl-mc: Make remove function return void - QE USB: fix build issue caused by missing dependency -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEhb3UXAyxp6UQ0v6khtxQDvusFVQFAmSTduQACgkQhtxQDvus FVRsWg/+KIGaiTwEOvmhysMXE0Papwo+sdy/dOonvlIuPgO5S4MeisC9XFy5JqV9 qj9keqBVeB6dw9zqJIS6PI6OiOYZGrrkQIU5m/r36jqFBdUw0Ys8JAbZTFlFQa4X uwceN8jK1l0pdC+fmAf4SrVg9U1/3MvMZNsWKmvgfHKTSUNIalXvXI6uI2ohGBsr DV0/epPA/rvy6ffNQebXzudbXgPZL8p7V7LDUyCGxdsndIvZ25+JMZ63XQWbz/Qi B1Lkr9jiIn+GLDbemy846MK5tMDDJRLPss4Pk6rEbF+PDaWMhxMxfbRFHekPCbzV MrYrTRUrTNj4kz4J1QvkX0N6Kb2PbQsfNaN6w0lP/jdAgCRx+fbh1Q83PjMHp7ov ET3n0sX1JPmgAvMNRVNzbszX4vo5WlRvEC9nXyBUrTv/1jlt89oPtruiyCY2Ym0e nWv7UWUJuuiJIrSWVol3Egf9FHibfLGlLNwpYHZQnSAi53AXmqocIxiWEVvXUd6R 3xmAGEItrwojpV+DHUNGRVGUzDqHaPguzItnTuDTqXJ9lwKPk/PdO7bSHnewxNRi 9RQpP7AoU8qtRxf0i5aWCe5TjqQJeSEIN0t7jF6IZ8Is1rfaRfvIEDmML8WArSj6 nUh5LaM5Qvg9l2RMZoDa0n5hGsvpnIigRcx2dN6k46jyVG4lqeg= =cBX5 -----END PGP SIGNATURE----- gpgsig -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmSbTIoACgkQYKtH/8kJ UieZlA/+IOYfWVN7hTsZ22LUL5c0GWvUTsacD+wf2YlXC7ioJ0RSzL+yCdbYGbGw 62di/ZsiUFaXhYszNyAW/w3qFgal578OiyR79lkadESizJGexScejATPph/5J+JI XT5JDCXMwAYN78uoOpEDoTjC8Upv1mOoMZBwMCx1j3K3dVkkPI+/Fe9PuGRn3Njs Uc5OBtvNBVf0nulHupS3d52s6jWEL7TX6FZCdPyWQ2A4rYunze46cOXtBBEi+x9A +jZ4pR382zKFII2e2rtIp7P0q896vnLbuVaetEZUIrPKKFWANXfhoFmzl0aUQCgP Lqyui78CYFabKddkVWWd/JqQy8ZRzPWkkHtW6bW31DPZWzlOXjZds8tykP45ekeh Q6wTvqpZulgLgl7J8dE029go9N5hc2y/fuGExXJh6uaMUhpexY3OwBZ9DuJBAbwu mDpKLaDBGFfF21r903usumRArQPC/Y0V6cJSmb7Np9LnUhnef1jT2f1+cwL1Q7M5 5VhxsfJo4GhtCljZ8E3CZ4rrB3NWwggRDm71MNiY9UpLwqcZbYqdtEcvwdq37r0/ K8av+uzLWE/RTeA+1PXcERMXmylWrKy0BmcGYwjOkALje+v5NNgJwdwxG2U1yWcc VEjf/5bynMkNwNt3zTEkstN1chcafJTw4uqOJoqrh+55NGZHNpw= =E30d -----END PGP SIGNATURE----- Merge tag 'soc-fsl-next-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/leo/linux into soc/drivers NXP/FSL SoC driver updates for v6.5 - fsl-mc: Make remove function return void - QE USB: fix build issue caused by missing dependency * tag 'soc-fsl-next-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/leo/linux: bus: fsl-mc: fsl-mc-allocator: Drop a write-only variable bus: fsl-mc: fsl-mc-allocator: Initialize mc_bus_dev before use soc/fsl/qe: fix usb.c build errors bus: fsl-mc: Make remove function return void soc: fsl: dpio: Suppress duplicated error reporting on device remove bus: fsl-mc: fsl-mc-allocator: Improve error reporting bus: fsl-mc: fsl-mc-allocator: Drop if block with always wrong condition bus: fsl-mc: dprc: Push down error message from fsl_mc_driver_remove() bus: fsl-mc: Only warn once about errors on device unbind Link: https://lore.kernel.org/r/20230621222503.12402-1-leoyang.li@nxp.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2023-06-27 22:54:34 +02:00
Jason Gunthorpe	5f004bcaee	Linux 6.4 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmSYzfYeHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG/ucH/iOM/1Py/fSg0qSs 7NJ4XXlourT5zrnRMom3cm3d9gYqgTzgvKFL3kjMEexTRVYbhlcO4ZPRsiry8zxF ToGX+V8tDMqb8WSdFHzkljRY+zDRyfEUDMlTzROAD9DunLmQtkJKyrggkeGdjkpP OyfGqKpwlLXZRAXBil/U8Mx9MHdjJubloZwghLZr33VdUZa68+JJ9l6w163Oe/ET K264NM0wxN/kvN57JvePgqMccQwpINylg8IhRI+XelgczjUXeJBsOA8TDv4bDN4Q bjCLhkWbIaZtTYqvOXa/kD0T8wd7KETsMBQN8YzyDh6W0GmAlJjTawyAhA6jA5in x3uz2W8= =L3zp -----END PGP SIGNATURE----- Merge tag 'v6.4' into rdma.git for-next Linux 6.4 Resolve conflicts between rdma rc and next in rxe_cq matching linux-next: drivers/infiniband/sw/rxe/rxe_cq.c: https://lore.kernel.org/r/20230622115246.365d30ad@canb.auug.org.au Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-06-27 14:06:29 -03:00
Moritz Fischer	30ac666a2f	net: lan743x: Simplify comparison Simplify comparison, no functional changes. Cc: Bryan Whitehead <bryan.whitehead@microchip.com> Cc: UNGLinuxDriver@microchip.com Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Moritz Fischer <moritzf@google.com> Link: https://lore.kernel.org/r/20230627035432.1296760-1-moritzf@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-27 09:53:00 -07:00
Jakub Kicinski	3674fbf045	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Merge in late fixes to prepare for the 6.5 net-next PR. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-27 09:45:22 -07:00
Julia Lawall	e9c74f8b8a	net: mana: use vmalloc_array and vcalloc Use vmalloc_array and vcalloc to protect against multiplication overflows. The changes were done using the following Coccinelle semantic patch: // <smpl> @initialize:ocaml@ @@ let rename alloc = match alloc with "vmalloc" -> "vmalloc_array" \| "vzalloc" -> "vcalloc" \| _ -> failwith "unknown" @@ size_t e1,e2; constant C1, C2; expression E1, E2, COUNT, x1, x2, x3; typedef u8; typedef __u8; type t = {u8,__u8,char,unsigned char}; identifier alloc = {vmalloc,vzalloc}; fresh identifier realloc = script:ocaml(alloc) { rename alloc }; @@ ( alloc(x1x2x3) \| alloc(C1 * C2) \| alloc((sizeof(t)) * (COUNT), ...) \| - alloc((e1) * (e2)) + realloc(e1, e2) \| - alloc((e1) * (COUNT)) + realloc(COUNT, e1) \| - alloc((E1) * (E2)) + realloc(E1, E2) ) // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Link: https://lore.kernel.org/r/20230627144339.144478-23-Julia.Lawall@inria.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-27 09:30:23 -07:00
Julia Lawall	fa87c54693	net: enetc: use vmalloc_array and vcalloc Use vmalloc_array and vcalloc to protect against multiplication overflows. The changes were done using the following Coccinelle semantic patch: // <smpl> @initialize:ocaml@ @@ let rename alloc = match alloc with "vmalloc" -> "vmalloc_array" \| "vzalloc" -> "vcalloc" \| _ -> failwith "unknown" @@ size_t e1,e2; constant C1, C2; expression E1, E2, COUNT, x1, x2, x3; typedef u8; typedef __u8; type t = {u8,__u8,char,unsigned char}; identifier alloc = {vmalloc,vzalloc}; fresh identifier realloc = script:ocaml(alloc) { rename alloc }; @@ ( alloc(x1x2x3) \| alloc(C1 * C2) \| alloc((sizeof(t)) * (COUNT), ...) \| - alloc((e1) * (e2)) + realloc(e1, e2) \| - alloc((e1) * (COUNT)) + realloc(COUNT, e1) \| - alloc((E1) * (E2)) + realloc(E1, E2) ) // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Link: https://lore.kernel.org/r/20230627144339.144478-19-Julia.Lawall@inria.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-27 09:30:23 -07:00
Julia Lawall	f712c8297e	ionic: use vmalloc_array and vcalloc Use vmalloc_array and vcalloc to protect against multiplication overflows. The changes were done using the following Coccinelle semantic patch: // <smpl> @initialize:ocaml@ @@ let rename alloc = match alloc with "vmalloc" -> "vmalloc_array" \| "vzalloc" -> "vcalloc" \| _ -> failwith "unknown" @@ size_t e1,e2; constant C1, C2; expression E1, E2, COUNT, x1, x2, x3; typedef u8; typedef __u8; type t = {u8,__u8,char,unsigned char}; identifier alloc = {vmalloc,vzalloc}; fresh identifier realloc = script:ocaml(alloc) { rename alloc }; @@ ( alloc(x1x2x3) \| alloc(C1 * C2) \| alloc((sizeof(t)) * (COUNT), ...) \| - alloc((e1) * (e2)) + realloc(e1, e2) \| - alloc((e1) * (COUNT)) + realloc(COUNT, e1) \| - alloc((E1) * (E2)) + realloc(E1, E2) ) // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Link: https://lore.kernel.org/r/20230627144339.144478-12-Julia.Lawall@inria.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-27 09:30:23 -07:00
Julia Lawall	906a76cc76	pds_core: use vmalloc_array and vcalloc Use vmalloc_array and vcalloc to protect against multiplication overflows. The changes were done using the following Coccinelle semantic patch: // <smpl> @initialize:ocaml@ @@ let rename alloc = match alloc with "vmalloc" -> "vmalloc_array" \| "vzalloc" -> "vcalloc" \| _ -> failwith "unknown" @@ size_t e1,e2; constant C1, C2; expression E1, E2, COUNT, x1, x2, x3; typedef u8; typedef __u8; type t = {u8,__u8,char,unsigned char}; identifier alloc = {vmalloc,vzalloc}; fresh identifier realloc = script:ocaml(alloc) { rename alloc }; @@ ( alloc(x1x2x3) \| alloc(C1 * C2) \| alloc((sizeof(t)) * (COUNT), ...) \| - alloc((e1) * (e2)) + realloc(e1, e2) \| - alloc((e1) * (COUNT)) + realloc(COUNT, e1) \| - alloc((E1) * (E2)) + realloc(E1, E2) ) // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Link: https://lore.kernel.org/r/20230627144339.144478-10-Julia.Lawall@inria.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-27 09:30:23 -07:00
Julia Lawall	a13de901e8	gve: use vmalloc_array and vcalloc Use vmalloc_array and vcalloc to protect against multiplication overflows. The changes were done using the following Coccinelle semantic patch: // <smpl> @initialize:ocaml@ @@ let rename alloc = match alloc with "vmalloc" -> "vmalloc_array" \| "vzalloc" -> "vcalloc" \| _ -> failwith "unknown" @@ size_t e1,e2; constant C1, C2; expression E1, E2, COUNT, x1, x2, x3; typedef u8; typedef __u8; type t = {u8,__u8,char,unsigned char}; identifier alloc = {vmalloc,vzalloc}; fresh identifier realloc = script:ocaml(alloc) { rename alloc }; @@ ( alloc(x1x2x3) \| alloc(C1 * C2) \| alloc((sizeof(t)) * (COUNT), ...) \| - alloc((e1) * (e2)) + realloc(e1, e2) \| - alloc((e1) * (COUNT)) + realloc(COUNT, e1) \| - alloc((E1) * (E2)) + realloc(E1, E2) ) // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Link: https://lore.kernel.org/r/20230627144339.144478-5-Julia.Lawall@inria.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-27 09:30:23 -07:00
Julia Lawall	32d462a5c3	octeon_ep: use vmalloc_array and vcalloc Use vmalloc_array and vcalloc to protect against multiplication overflows. The changes were done using the following Coccinelle semantic patch: // <smpl> @initialize:ocaml@ @@ let rename alloc = match alloc with "vmalloc" -> "vmalloc_array" \| "vzalloc" -> "vcalloc" \| _ -> failwith "unknown" @@ size_t e1,e2; constant C1, C2; expression E1, E2, COUNT, x1, x2, x3; typedef u8; typedef __u8; type t = {u8,__u8,char,unsigned char}; identifier alloc = {vmalloc,vzalloc}; fresh identifier realloc = script:ocaml(alloc) { rename alloc }; @@ ( alloc(x1x2x3) \| alloc(C1 * C2) \| alloc((sizeof(t)) * (COUNT), ...) \| - alloc((e1) * (e2)) + realloc(e1, e2) \| - alloc((e1) * (COUNT)) + realloc(COUNT, e1) \| - alloc((E1) * (E2)) + realloc(E1, E2) ) // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Link: https://lore.kernel.org/r/20230627144339.144478-3-Julia.Lawall@inria.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-27 09:30:23 -07:00
Linus Torvalds	8d7868c41d	Thermal control updates for 6.5-rc1 - Add new IOCTLs to the int340x thermal driver to allow user space to retrieve the Passive v2 thermal table (Srinivas Pandruvada). - Add DT bindings for SM6375, MSM8226 and QCM2290 Qcom platforms (Konrad Dybcio). - Add DT bindings and support for QCom MSM8226 (Matti Lehtimäki). - Add DT bindings for QCom ipq9574 (Praveenkumar I). - Convert bcm2835 DT bindings to the yaml schema (Stefan Wahren). - Allow selecting the bang-bang governor as default (Thierry Reding). - Refactor and prepare the code to set the scene for RCar Gen4 (Wolfram Sang). - Clean up and fix the QCom tsens drivers. Add DT bindings and calibration for the MSM8909 platform (Stephan Gerhold). - Revert a patch introducing a wrong usage of devm_of_iomap() on the Mediatek platform (Ricardo Cañuelo). - Fix the clock vs reset ordering in order to conform to the documentation on the sun8i (Christophe JAILLET). - Prevent setting up undocumented registers, enable the only described sensors and add the version 2.1 on the Qoriq sensor (Peng Fan). - Add DT bindings and support for the Armada AP807 (Alex Leibovich). - Update the mlx5 driver with the recent thermal changes (Daniel Lezcano). - Convert to platform remove callback returning void on STM32 (Uwe Kleine-König). - Add an error information printing for devm_thermal_add_hwmon_sysfs() and remove the error from the Sun8i, Amlogic, i.MX, TI, K3, Tegra, Qoriq, Mediateka and QCom (Yangtao Li). - Register as hwmon sensor for the Generic ADC (Chen-Yu Tsai). - Use the dev_err_probe() function in the QCom tsens alarm driver (Luca Weiss). -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmSZxuYSHHJqd0Byand5 c29ja2kubmV0AAoJEILEb/54YlRxVicP/0Gl7fQc2l2/AEwe+FuwZz8xERhbLc4s kSXVIJ1B6GScefUmLBqDZoWokVqP4gXgjTnYdqZNJmnCo+hP9i6VzVjVXSI2eNe7 426cJd7la2K0zWcjsw/apdGa81Btnf0IjIMjeWiZOPVkRE6UmnHBwG52C5Hd+1pa EJaRcxv37MvrRKgzwUV+4WSBd5p1v3UrElzs10DXt+mVfRjsV7Y3bxnfW0+kh1sK 3J/AzPXx1v6e6mnKQQQAso0r1g22n/lfh9C7VQT01fXt704PLx1jNr4vtlGhCJkQ sRlodAx+6uSX2GyeVs9hKOPZees3lY9VcZOI3c4g0Sy2ad4Hl7QDRbMmMUiiEVNZ fmiIs26RbfeRvl2lNDEYxLxsBaMva4rerVuOlRjkKeXDboa/TyZ2/MwvaE/uaxT7 yWBoqoY67aPfyS8YgSu/DqRK60J+6PShaYeOJ4deYA6skctCe1v7nRJu07n62RBd JuPUUifiBb6jpxDPCRBHLxwUIMK9gzJIRM74ieZs/XSG56v4dJR2e8GmHZLoiLPV cc/uyHopQc75z7fYNDHdu7EHzUUGsjI6A9cNoCcdG86+b/pUEXTBLxpEuOv3bXfm DtpoS2O0PEJtblTmS4HFJJuJ3I1qUzJZgo/KmwmFEcXOfo2KEqVzcleSr1t+zAbR rCTcgVTaYpt4 =dtvf -----END PGP SIGNATURE----- Merge tag 'thermal-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull thermal control updates from Rafael Wysocki: "These extend the int340x thermal driver, add thermal DT bindings for some Qcom platforms, add DT bindings and support for Armada AP807 and MSM8909, allow selecting the bang-bang thermal governor as the default one, address issues in several thermal drivers for ARM platforms and clean up code. Specifics: - Add new IOCTLs to the int340x thermal driver to allow user space to retrieve the Passive v2 thermal table (Srinivas Pandruvada) - Add DT bindings for SM6375, MSM8226 and QCM2290 Qcom platforms (Konrad Dybcio) - Add DT bindings and support for QCom MSM8226 (Matti Lehtimäki) - Add DT bindings for QCom ipq9574 (Praveenkumar I) - Convert bcm2835 DT bindings to the yaml schema (Stefan Wahren) - Allow selecting the bang-bang governor as default (Thierry Reding) - Refactor and prepare the code to set the scene for RCar Gen4 (Wolfram Sang) - Clean up and fix the QCom tsens drivers. Add DT bindings and calibration for the MSM8909 platform (Stephan Gerhold) - Revert a patch introducing a wrong usage of devm_of_iomap() on the Mediatek platform (Ricardo Cañuelo) - Fix the clock vs reset ordering in order to conform to the documentation on the sun8i (Christophe JAILLET) - Prevent setting up undocumented registers, enable the only described sensors and add the version 2.1 on the Qoriq sensor (Peng Fan) - Add DT bindings and support for the Armada AP807 (Alex Leibovich) - Update the mlx5 driver with the recent thermal changes (Daniel Lezcano) - Convert to platform remove callback returning void on STM32 (Uwe Kleine-König) - Add an error information printing for devm_thermal_add_hwmon_sysfs() and remove the error from the Sun8i, Amlogic, i.MX, TI, K3, Tegra, Qoriq, Mediateka and QCom (Yangtao Li) - Register as hwmon sensor for the Generic ADC (Chen-Yu Tsai) - Use the dev_err_probe() function in the QCom tsens alarm driver (Luca Weiss)" * tag 'thermal-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (39 commits) thermal/drivers/qcom/temp-alarm: Use dev_err_probe thermal/drivers/generic-adc: Register thermal zones as hwmon sensors thermal/drivers/mediatek/lvts_thermal: Remove redundant msg in lvts_ctrl_start() thermal/drivers/qcom: Remove redundant msg at probe time thermal/drivers/ti-soc: Remove redundant msg in ti_thermal_expose_sensor() thermal/drivers/qoriq: Remove redundant msg in qoriq_tmu_register_tmu_zone() thermal/drivers/tegra: Remove redundant msg in tegra_tsensor_register_channel() drivers/thermal/k3: Remove redundant msg in k3_bandgap_probe() thermal/drivers/imx: Remove redundant msg in imx8mm_tmu_probe() and imx_sc_thermal_probe() thermal/drivers/amlogic: Remove redundant msg in amlogic_thermal_probe() thermal/drivers/sun8i: Remove redundant msg in sun8i_ths_register() thermal/hwmon: Add error information printing for devm_thermal_add_hwmon_sysfs() thermal/drivers/stm32: Convert to platform remove callback returning void net/mlx5: Update the driver with the recent thermal changes thermal/drivers/armada: Add support for AP807 thermal data dt-bindings: armada-thermal: Add armada-ap807-thermal compatible thermal/drivers/qoriq: Support version 2.1 thermal/drivers/qoriq: Only enable supported sensors thermal/drivers/qoriq: No need to program site adjustment register thermal/drivers/mediatek/lvts_thermal: Register thermal zones as hwmon sensors ...	2023-06-26 19:41:26 -07:00
Daniel Lezcano	a5639fade0	net/mlx5: Update the driver with the recent thermal changes The thermal framework is migrating to the generic trip points. The set of changes also implies a self-encapsulation of the thermal zone device structure where the internals are no longer directly accessible but with accessors. Use the new API instead, so the next changes can be pushed in the thermal framework without this driver failing to compile. No functional changes intended. Cc: Sandipan Patra <spatra@nvidia.com> Cc: Gal Pressman <gal@nvidia.com> Cc: Saeed Mahameed <saeedm@nvidia.com> Cc: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230525140135.3589917-2-daniel.lezcano@linaro.org	2023-06-26 12:03:14 +02:00
Edward Cree	1186c6b31e	sfc: falcon: use padding to fix alignment in loopback test Add two bytes of padding to the start of struct ef4_loopback_payload, which are not sent on the wire. This ensures the 'ip' member is 4-byte aligned, preventing the following W=1 warning: net/ethernet/sfc/falcon/selftest.c:43:15: error: field ip within 'struct ef4_loopback_payload' is less aligned than 'struct iphdr' and is usually due to 'struct ef4_loopback_payload' being packed, which can lead to unaligned accesses [-Werror,-Wunaligned-access] struct iphdr ip; Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-06-26 10:36:48 +01:00
Edward Cree	30c24dd87f	sfc: siena: use padding to fix alignment in loopback test Add two bytes of padding to the start of struct efx_loopback_payload, which are not sent on the wire. This ensures the 'ip' member is 4-byte aligned, preventing the following W=1 warning: net/ethernet/sfc/siena/selftest.c:46:15: error: field ip within 'struct efx_loopback_payload' is less aligned than 'struct iphdr' and is usually due to 'struct efx_loopback_payload' being packed, which can lead to unaligned accesses [-Werror,-Wunaligned-access] struct iphdr ip; Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-06-26 10:36:48 +01:00
Edward Cree	cf60ed4696	sfc: use padding to fix alignment in loopback test Add two bytes of padding to the start of struct efx_loopback_payload, which are not sent on the wire. This ensures the 'ip' member is 4-byte aligned, preventing the following W=1 warning: net/ethernet/sfc/selftest.c:46:15: error: field ip within 'struct efx_loopback_payload' is less aligned than 'struct iphdr' and is usually due to 'struct efx_loopback_payload' being packed, which can lead to unaligned accesses [-Werror,-Wunaligned-access] struct iphdr ip; Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-06-26 10:36:48 +01:00
Edward Cree	d1b355438b	sfc: fix crash when reading stats while NIC is resetting efx_net_stats() (.ndo_get_stats64) can be called during an ethtool selftest, during which time nic_data->mc_stats is NULL as the NIC has been fini'd. In this case do not attempt to fetch the latest stats from the hardware, else we will crash on a NULL dereference: BUG: kernel NULL pointer dereference, address: 0000000000000038 RIP efx_nic_update_stats abridged calltrace: efx_ef10_update_stats_pf efx_net_stats dev_get_stats dev_seq_printf_stats Skipping the read is safe, we will simply give out stale stats. To ensure that the free in efx_ef10_fini_nic() does not race against efx_ef10_update_stats_pf(), which could cause a TOCTTOU bug, take the efx->stats_lock in fini_nic (it is already held across update_stats). Fixes: `d3142c193d` ("sfc: refactor EF10 stats handling") Reviewed-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com> Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-06-26 09:28:27 +01:00
David Howells	dc97391e66	sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) Remove ->sendpage() and ->sendpage_locked(). sendmsg() with MSG_SPLICE_PAGES should be used instead. This allows multiple pages and multipage folios to be passed through. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> # for net/can cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> cc: linux-afs@lists.infradead.org cc: mptcp@lists.linux.dev cc: rds-devel@oss.oracle.com cc: tipc-discussion@lists.sourceforge.net cc: virtualization@lists.linux-foundation.org Link: https://lore.kernel.org/r/20230623225513.2732256-16-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 15:50:13 -07:00
Jakub Kicinski	b545a13ca9	mlx5-updates-2023-06-21 mlx5 driver minor cleanup and fixes to net-next -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmSV8icACgkQSD+KveBX +j6WgAf+Ph5w2dAfuadjlAcQlK2WKXyS1OliVpvCuglepgsP9Zree11WVoyiAKef 70Xd5Vu4xAQTdpCsw4DGM7sh6xW1SxvZFP1A/FVk7UU1E0zL2SXzKyEHK29I1MH5 nej2hRf/W1dwYtxfwNOTKAFco3wr0e1vLgMDEqZBZbJXzcUetDRADkgWrx9U+pno lBPhVMZWK5R0GzOjlWZXoedaXx2RIcm+5U02ov5S6d1y8AA+sE93tiYxrP9z/2lj nml1KeQQl0Ku/y+e8RMSUd9mPdomQZi6CVHMD1wV5DNv6dnrT1bPrUdt4torSXEI KAzkVie979XP2jvkDKY2nyXZ8dn7cw== =woKn -----END PGP SIGNATURE----- Merge tag 'mlx5-updates-2023-06-21' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2023-06-21 mlx5 driver minor cleanup and fixes to net-next * tag 'mlx5-updates-2023-06-21' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5: Remove pointless vport lookup from mlx5_esw_check_port_type() net/mlx5: Remove redundant check from mlx5_esw_query_vport_vhca_id() net/mlx5: Remove redundant is_mdev_switchdev_mode() check from is_ib_rep_supported() net/mlx5: Remove redundant MLX5_ESWITCH_MANAGER() check from is_ib_rep_supported() net/mlx5e: E-Switch, Fix shared fdb error flow net/mlx5e: Remove redundant comment net/mlx5e: E-Switch, Pass other_vport flag if vport is not 0 net/mlx5e: E-Switch, Use xarray for devcom paired device index net/mlx5e: E-Switch, Add peer fdb miss rules for vport manager or ecpf net/mlx5e: Use vhca_id for device index in vport rx rules net/mlx5: Lag, Remove duplicate code checking lag is supported net/mlx5: Fix error code in mlx5_is_reset_now_capable() net/mlx5: Fix reserved at offset in hca_cap register net/mlx5: Fix SFs kernel documentation error net/mlx5: Fix UAF in mlx5_eswitch_cleanup() ==================== Link: https://lore.kernel.org/r/20230623192907.39033-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 15:48:04 -07:00
Maxim Kochetkov	f1bc9fc4a0	net: axienet: Move reset before 64-bit DMA detection 64-bit DMA detection will fail if axienet was started before (by boot loader, boot ROM, etc). In this state axienet will not start properly. XAXIDMA_TX_CDESC_OFFSET + 4 register (MM2S_CURDESC_MSB) is used to detect 64-bit DMA capability here. But datasheet says: When DMACR.RS is 1 (axienet is in enabled state), CURDESC_PTR becomes Read Only (RO) and is used to fetch the first descriptor. So iowrite32()/ioread32() trick to this register to detect 64-bit DMA will not work. So move axienet reset before 64-bit DMA detection. Fixes: `f735c40ed9` ("net: axienet: Autodetect 64-bit DMA capability") Signed-off-by: Maxim Kochetkov <fido_max@inbox.ru> Reviewed-by: Robert Hancock <robert.hancock@calian.com> Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com> Link: https://lore.kernel.org/r/20230622192245.116864-1-fido_max@inbox.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 15:39:38 -07:00
Bartosz Golaszewski	4194f32a4b	net: stmmac: dwmac-qcom-ethqos: use devm_stmmac_pltfr_probe() Use the devres variant of stmmac_pltfr_probe() and finally drop the remove() callback entirely. Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Link: https://lore.kernel.org/r/20230623100417.93592-12-brgl@bgdev.pl Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 15:36:00 -07:00
Bartosz Golaszewski	fc9ee2ac4f	net: stmmac: platform: provide devm_stmmac_pltfr_probe() Provide a devres variant of stmmac_pltfr_probe() which allows users to skip calling stmmac_pltfr_remove() at driver detach. Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Link: https://lore.kernel.org/r/20230623100417.93592-11-brgl@bgdev.pl Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 15:36:00 -07:00
Bartosz Golaszewski	061425d933	net: stmmac: dwmac-qco-ethqos: use devm_stmmac_probe_config_dt() Significantly simplify the driver's probe() function by using the devres variant of stmmac_probe_config_dt(). This allows to drop the goto jumps entirely. The remove_new() callback now needs to be switched to stmmac_pltfr_remove_no_dt(). Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Link: https://lore.kernel.org/r/20230623100417.93592-10-brgl@bgdev.pl Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 15:36:00 -07:00
Bartosz Golaszewski	d740654273	net: stmmac: platform: provide devm_stmmac_probe_config_dt() Provide a devres variant of stmmac_probe_config_dt() that allows users to skip calling stmmac_remove_config_dt() at driver detach. Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Link: https://lore.kernel.org/r/20230623100417.93592-9-brgl@bgdev.pl Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 15:36:00 -07:00
Bartosz Golaszewski	1be0c9d65e	net: stmmac: platform: provide stmmac_pltfr_remove_no_dt() Add a variant of stmmac_pltfr_remove() that only frees resources allocated by stmmac_pltfr_probe() and - unlike stmmac_pltfr_remove() - does not call stmmac_remove_config_dt(). Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Link: https://lore.kernel.org/r/20230623100417.93592-8-brgl@bgdev.pl Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 15:35:59 -07:00
Bartosz Golaszewski	0a68a59493	net: stmmac: dwmac-generic: use stmmac_pltfr_probe() Shrink the code and remove labels by using the new stmmac_pltfr_probe() function. Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Link: https://lore.kernel.org/r/20230623100417.93592-7-brgl@bgdev.pl Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 15:35:59 -07:00
Bartosz Golaszewski	3d5bf75d76	net: stmmac: platform: provide stmmac_pltfr_probe() Implement stmmac_pltfr_probe() which is the logical API counterpart for stmmac_pltfr_remove(). It calls the platform's init() callback and then probes the stmmac device. Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Link: https://lore.kernel.org/r/20230623100417.93592-6-brgl@bgdev.pl Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 15:35:59 -07:00
Bartosz Golaszewski	40db9f1ddf	net: stmmac: dwmac-generic: use stmmac_pltfr_exit() Shrink the code in dwmac-generic by using the new stmmac_pltfr_exit() helper. Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Link: https://lore.kernel.org/r/20230623100417.93592-5-brgl@bgdev.pl Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 15:35:59 -07:00
Bartosz Golaszewski	5b0acf8dd2	net: stmmac: platform: provide stmmac_pltfr_exit() Provide a helper wrapper around calling the platform's exit() callback. This allows users to skip checking if the callback exists. Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Link: https://lore.kernel.org/r/20230623100417.93592-4-brgl@bgdev.pl Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 15:35:59 -07:00
Bartosz Golaszewski	4450e7d423	net: stmmac: dwmac-generic: use stmmac_pltfr_init() Shrink the code in dwmac-generic by using the new stmmac_pltfr_init() helper. Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Link: https://lore.kernel.org/r/20230623100417.93592-3-brgl@bgdev.pl Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 15:35:59 -07:00
Bartosz Golaszewski	97117eb51e	net: stmmac: platform: provide stmmac_pltfr_init() Provide a helper wrapper around calling the platform's init() callback. This allows users to skip checking if the callback exists. Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Link: https://lore.kernel.org/r/20230623100417.93592-2-brgl@bgdev.pl Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 15:35:59 -07:00
Jakub Kicinski	cfd40b82a5	Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2023-06-22 (ice) This series contains updates to ice driver only. Jake adds a slight wait on control queue send to reduce wait time for responses that occur within normal times. Maciej allows for hot-swapping XDP programs. Przemek removes unnecessary checks when enabling SR-IOV and freeing allocated memory. Christophe Jaillet converts a managed memory allocation to a regular one. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: ice: use ice_down_up() where applicable ice: Remove managed memory usage in ice_get_fw_log_cfg() ice: remove null checks before devm_kfree() calls ice: clean up freeing SR-IOV VFs ice: allow hot-swapping XDP programs ice: reduce initial wait for control queue messages ==================== Link: https://lore.kernel.org/r/20230622183601.2406499-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 15:32:18 -07:00
Jakub Kicinski	1c78eb8760	Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2023-06-22 (iavf) This series contains updates to iavf driver only. Przemek defers removing, previous, primary MAC address until after getting result of adding its replacement. He also does some cleanup by removing unused functions and making applicable functions static. * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: iavf: make functions static where possible iavf: remove some unused functions and pointless wrappers iavf: fix err handling for MAC replace ==================== Link: https://lore.kernel.org/r/20230622165914.2203081-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 15:12:05 -07:00
Jakub Kicinski	eb441289f9	Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== igc: TX timestamping fixes This is the fixes part of the series intended to add support for using the 4 timestamp registers present in i225/i226. Moving the timestamp handling to be inline with the interrupt handling has the advantage of improving the TX timestamping retrieval latency, here are some numbers using ntpperf: Before: $ sudo ./ntpperf -i enp3s0 -m 10:22:22:22:22:21 -d 192.168.1.3 -s 172.18.0.0/16 -I -H -o -37 \| responses \| TX timestamp offset (ns) rate clients \| lost invalid basic xleave \| min mean max stddev 1000 100 0.00% 0.00% 0.00% 100.00% -56 +9 +52 19 1500 150 0.00% 0.00% 0.00% 100.00% -40 +30 +75 22 2250 225 0.00% 0.00% 0.00% 100.00% -11 +29 +72 15 3375 337 0.00% 0.00% 0.00% 100.00% -18 +40 +88 22 5062 506 0.00% 0.00% 0.00% 100.00% -19 +23 +77 15 7593 759 0.00% 0.00% 0.00% 100.00% +7 +47 +5168 43 11389 1138 0.00% 0.00% 0.00% 100.00% -11 +41 +5240 39 17083 1708 0.00% 0.00% 0.00% 100.00% +19 +60 +5288 50 25624 2562 0.00% 0.00% 0.00% 100.00% +1 +56 +5368 58 38436 3843 0.00% 0.00% 0.00% 100.00% -84 +12 +8847 66 57654 5765 0.00% 0.00% 100.00% 0.00% 86481 8648 0.00% 0.00% 100.00% 0.00% 129721 12972 0.00% 0.00% 100.00% 0.00% 194581 16384 0.00% 0.00% 100.00% 0.00% 291871 16384 27.35% 0.00% 72.65% 0.00% 437806 16384 50.05% 0.00% 49.95% 0.00% After: $ sudo ./ntpperf -i enp3s0 -m 10:22:22:22:22:21 -d 192.168.1.3 -s 172.18.0.0/16 -I -H -o -37 \| responses \| TX timestamp offset (ns) rate clients \| lost invalid basic xleave \| min mean max stddev 1000 100 0.00% 0.00% 0.00% 100.00% -44 +0 +61 19 1500 150 0.00% 0.00% 0.00% 100.00% -6 +39 +81 16 2250 225 0.00% 0.00% 0.00% 100.00% -22 +25 +69 15 3375 337 0.00% 0.00% 0.00% 100.00% -28 +15 +56 14 5062 506 0.00% 0.00% 0.00% 100.00% +7 +78 +143 27 7593 759 0.00% 0.00% 0.00% 100.00% -54 +24 +144 47 11389 1138 0.00% 0.00% 0.00% 100.00% -90 -33 +28 21 17083 1708 0.00% 0.00% 0.00% 100.00% -50 -2 +35 14 25624 2562 0.00% 0.00% 0.00% 100.00% -62 +7 +66 23 38436 3843 0.00% 0.00% 0.00% 100.00% -33 +30 +5395 36 57654 5765 0.00% 0.00% 100.00% 0.00% 86481 8648 0.00% 0.00% 100.00% 0.00% 129721 12972 0.00% 0.00% 100.00% 0.00% 194581 16384 19.50% 0.00% 80.50% 0.00% 291871 16384 35.81% 0.00% 64.19% 0.00% 437806 16384 55.40% 0.00% 44.60% 0.00% During this series, and to show that as is always the case, things are never easy as they should be, a hardware issue was found, and it took some time to find the workaround(s). The bug and workaround are better explained in patch 4/4. Note: the workaround has a simpler alternative, but it would involve adding support for the other timestamp registers, and only using the TXSTMP{H/L}_0 as a way to clear the interrupt. But I feel bad about throwing this kind of resources away. Didn't test this extensively but it should work. Also, as Marc Kleine-Budde suggested, after some consensus is reached on this series, most parts of it will be proposed for igb. * '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: igc: Work around HW bug causing missing timestamps igc: Retrieve TX timestamp during interrupt handling igc: Check if hardware TX timestamping is enabled earlier igc: Fix race condition in PTP tx code ==================== Link: https://lore.kernel.org/r/20230622165244.2202786-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 15:04:12 -07:00
Petr Machata	9464a3d68e	mlxsw: spectrum_router: Track next hops at CRIFs Move the list of next hops from struct mlxsw_sp_rif to mlxsw_sp_crif. The reason is that eventually, next hops for mlxsw uppers should be offloaded and unoffloaded on demand as a netdevice becomes an upper, or stops being one. Currently, next hops are tracked at RIFs, but RIFs do not exist when a netdevice is not an mlxsw uppers. CRIFs are kept track of throughout the netdevice lifetime. Correspondingly, track at each next hop not its RIF, but its CRIF (from which a RIF can always be deduced). Note that now that next hops are tracked at a CRIF, it is not necessary to move each over to a new RIF when it is necessary to edit a RIF. Therefore drop mlxsw_sp_nexthop_rif_migrate() and have mlxsw_sp_rif_migrate_destroy() call mlxsw_sp_nexthop_rif_update() directly. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Link: https://lore.kernel.org/r/e7c1c0a7dd13883b0f09aeda12c4fcf4d63a70e3.1687438411.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-23 19:01:56 -07:00
Petr Machata	a285d66423	mlxsw: spectrum_router: Split nexthop finalization to two stages Nexthop finalization consists of two steps: the part where the offload is removed, because the backing RIF is now gone; and the part where the association to the RIF is severed. Extract from mlxsw_sp_nexthop_type_fini() a helper that covers the unoffloading part, mlxsw_sp_nexthop_type_rif_gone(), so that it can later be called independently. Note that this swaps around the ordering of mlxsw_sp_nexthop_ipip_fini() vs. mlxsw_sp_nexthop_rif_fini(). The current ordering is more of a historical happenstance than a conscious decision. The two cleanups do not depend on each other, and this change should have no observable effects. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Link: https://lore.kernel.org/r/7134559534c5f5c4807c3a1569fae56f8887e763.1687438411.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-23 19:01:56 -07:00
Petr Machata	bdc0b78e79	mlxsw: spectrum_router: Use router.lb_crif instead of .lb_rif_index A previous patch added a pointer to loopback CRIF to the router data structure. That makes the loopback RIF index redundant, as everything necessary can be derived from the CRIF. Drop the field and adjust the code accordingly. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Link: https://lore.kernel.org/r/8637bf959bc5b6c9d5184b9bd8a0cd53c5132835.1687438411.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-23 19:01:56 -07:00
Petr Machata	aa21242b07	mlxsw: spectrum_router: Link CRIFs to RIFs When a RIF is about to be created, the registration of the netdevice that it should be associated with must have been seen in the past, and a CRIF created. Therefore make this a hard requirement by looking up the CRIF during RIF creation, and complaining loudly when there isn't one. This then allows to keep a link between a RIF and its corresponding CRIF (and back, as the relationship is one-to-at-most-one), which do. The CRIF will later be useful as the objects tracked there will be offloaded lazily as a result of RIF creation. CRIFs are created when an "interesting" netdevice is registered, and destroyed after such device is unregistered. CRIFs are supposed to already exist when a RIF creation request arises, and exist at least as long as that RIF exists. This makes for a simple invariant: it is always safe to dereference CRIF pointer from "its" RIF. To guarantee this, CRIFs cannot be removed immediately when the UNREGISTER event is delivered. The reason is that if a RIF's netdevices has an IPv6 address, removal of this address is notified in an atomic block. To remove the RIF, the IPv6 removal handler schedules a work item. It must be safe for this work item to access the associated CRIF as well. Thus when a netdevice that backs the CRIF is removed, if it still has a RIF, do not actually free the CRIF, only toggle its can_destroy flag, which this patch adds. Later on, mlxsw_sp_rif_destroy() collects the CRIF. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Link: https://lore.kernel.org/r/68c8e33afa6b8c03c431b435e1685ffdff752e63.1687438411.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-23 19:01:56 -07:00
Petr Machata	78126cfd5d	mlxsw: spectrum_router: Maintain CRIF for fallback loopback RIF CRIFs are generally not maintained for loopback RIFs. However, the RIF for the default VRF is used for offloading of blackhole nexthops. Nexthops expect to have a valid CRIF. Therefore in this patch, add code to maintain CRIF for the loopback RIF as well. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Link: https://lore.kernel.org/r/7f2b2fcc98770167ed1254a904c3f7f585ba43f0.1687438411.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-23 19:01:56 -07:00
Petr Machata	4796c287b7	mlxsw: spectrum_router: Maintain a hash table of CRIFs CRIFs are objects that mlxsw maintains for netdevices that may not have an associated RIF (i.e. they may not have been instantiated in the ASIC), but if indeed they do not, it is quite possible they will in the future. These netdevices are candidate RIFs, hence CRIFs. Netdevices for which CRIFs are created include e.g. bridges, LAGs, or front panel ports. The idea is that next hops would be kept at CRIFs, not RIFs, and thus it would be easier to offload and unoffload the entities that have been added before the RIF was created. In this patch, add the code for low-level CRIF maintenance: create and destroy, and keep in a table keyed by the netdevice pointer for easy recall. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Link: https://lore.kernel.org/r/186d44e399c475159da20689f2c540719f2d1ed0.1687438411.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-23 19:01:56 -07:00
Petr Machata	f3c85eed1a	mlxsw: spectrum_router: Use mlxsw_sp_ul_rif_get() to get main VRF LB RIF The current function, mlxsw_sp_router_ul_rif_get(), is a wrapper around the function mentioned in the subject. As such it forms an external interface of the router code. In future patches we will want to maintain connection between RIFs and the CRIFs (introduced in the next patch) that back them. That will not hold for the VRF-based loopback netdevices, so the whole CRIF business can be kept hidden from the rest of mlxsw. But for the main VRF loopback RIF we do want to keep the RIF-CRIF connection, because that RIF is used for blackhole next hops, and the next hop code can be kept simpler for assuming rif->crif is valid. Hence, instead, call mlxsw_sp_ul_rif_get() to create the main VRF loopback RIF. This being an internal function will take the CRIF argument anyway. Furthermore, the function does not lock, which is not necessary at this point in code yet. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Link: https://lore.kernel.org/r/7a39a011a02a84164cd7f5da7985ec5b2ae01ba5.1687438411.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-23 19:01:56 -07:00
Petr Machata	ebbd17ce29	mlxsw: spectrum_router: Add extack argument to mlxsw_sp_lb_rif_init() The extack will be handy in later patches. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Link: https://lore.kernel.org/r/e87ba300121010d580b80a281877573a7b1377ca.1687438411.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-23 19:01:56 -07:00
Florian Fainelli	1b5ea7ffb7	net: bcmgenet: Ensure MDIO unregistration has clocks enabled With support for Ethernet PHY LEDs having been added, while unregistering a MDIO bus and its child device liks PHYs there may be "late" accesses to the MDIO bus. One typical use case is setting the PHY LEDs brightness to OFF for instance. We need to ensure that the MDIO bus controller remains entirely functional since it runs off the main GENET adapter clock. Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20230617155500.4005881-1-andrew@lunn.ch/ Fixes: `9a4e796970` ("net: bcmgenet: utilize generic Broadcom UniMAC MDIO controller driver") Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20230622103107.1760280-1-florian.fainelli@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-23 18:59:10 -07:00
Jiri Pirko	29e4c95fae	net/mlx5: Remove pointless vport lookup from mlx5_esw_check_port_type() As xa_get_mark() returns false in case the entry is not present, no need to redundantly check if vport is present. Remove the lookup. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2023-06-23 12:27:35 -07:00
Jiri Pirko	899862b653	net/mlx5: Remove redundant check from mlx5_esw_query_vport_vhca_id() Since mlx5_esw_query_vport_vhca_id() could be called either from mlx5_esw_vport_enable() or mlx5_esw_vport_disable() where the the check is done, this is always false here. Remove the redundant check. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2023-06-23 12:27:34 -07:00
Jiri Pirko	0d0946d648	net/mlx5: Remove redundant is_mdev_switchdev_mode() check from is_ib_rep_supported() is_mdev_switchdev_mode() check is done in is_eth_rep_supported(). Function is_ib_rep_supported() calls is_eth_rep_supported(). Remove the redundant check from it. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2023-06-23 12:27:34 -07:00
Jiri Pirko	61955da523	net/mlx5: Remove redundant MLX5_ESWITCH_MANAGER() check from is_ib_rep_supported() MLX5_ESWITCH_MANAGER() check is done in is_eth_rep_supported(). Function is_ib_rep_supported() calls is_eth_rep_supported(). Remove the redundant check from it. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2023-06-23 12:27:34 -07:00
Roi Dayan	15ddd72ee3	net/mlx5e: E-Switch, Fix shared fdb error flow On error flow resources being freed in esw_master_egress_destroy_resources() but pointers not being set to null if error flow is from creating a bounce rule. Then in esw_acl_egress_ofld_cleanup() we try to access already freed pointers. Fix it by resetting the pointers to null. Also if error is from creating a second or later bounce rule then the flow group and table being used and cannot and should not be freed. Add a check to destroy the flow group and table if there are no bounce rules. mlx5_core.sf mlx5_core.sf.2: mlx5_destroy_flow_group:2306:(pid 2235): Flow group 4 wasn't destroyed, refcount > 1 mlx5_core.sf mlx5_core.sf.2: mlx5_destroy_flow_table:2295:(pid 2235): Flow table 3 wasn't destroyed, refcount > 1 Fixes: `5e0202eb49` ("net/mlx5: E-switch, Handle multiple master egress rules") Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2023-06-23 12:27:34 -07:00
Roi Dayan	ae4de89493	net/mlx5e: Remove redundant comment The function comment says what it is and the comment is redundant. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2023-06-23 12:27:34 -07:00
Roi Dayan	4575ab3b7d	net/mlx5e: E-Switch, Pass other_vport flag if vport is not 0 When creating flow table for shared fdb resources, there is only need to pass other_vport flag if vport is not 0 or if the port is ECPF in BlueField. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2023-06-23 12:27:34 -07:00
Roi Dayan	70c3643839	net/mlx5e: E-Switch, Use xarray for devcom paired device index To allow devcom events on E-Switch that is not a vport group manager, use vhca id as an index instead of device index which might be shared between several E-Switches. for example SF and its PF. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2023-06-23 12:27:33 -07:00
Roi Dayan	1552e9b518	net/mlx5e: E-Switch, Add peer fdb miss rules for vport manager or ecpf Add peer fdb rules for E-Switch that are vport managers or ecpf device. It is not needed for other devices. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2023-06-23 12:27:33 -07:00
Roi Dayan	1da9f36252	net/mlx5e: Use vhca_id for device index in vport rx rules Device index is like PF index and limited to max physical ports. For example, SFs created under PF the device index is the PF device index. Use vhca_id which gets the FW index per vport, for vport rx rules and vport pair events. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2023-06-23 12:27:33 -07:00
Roi Dayan	8ec91f5d07	net/mlx5: Lag, Remove duplicate code checking lag is supported Remove duplicate function for checking if device has lag support. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2023-06-23 12:27:33 -07:00
Dan Carpenter	690ad62fc6	net/mlx5: Fix error code in mlx5_is_reset_now_capable() The mlx5_is_reset_now_capable() function returns bool, not negative error codes. So if fast teardown is not supported it should return false instead of -EOPNOTSUPP. Fixes: `92501fa6e4` ("net/mlx5: Ack on sync_reset_request only if PF can do reset_now") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2023-06-23 12:27:33 -07:00
Shay Drory	da744fd136	net/mlx5: Fix UAF in mlx5_eswitch_cleanup() mlx5_eswitch_cleanup() is using esw right after freeing it for releasing devlink_param. Fix it by releasing the devlink_param before freeing the esw, and adjust the create function accordingly. Fixes: `3f90840305` ("net/mlx5: Move esw multiport devlink param to eswitch code") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Automatic Verification <verifier@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2023-06-23 12:27:32 -07:00
Peiyang Wang	ed1c6f35b7	net: hns3: clear hns unused parameter alarm Several functions in the hns3 driver have unused parameters. The compiler will warn about them when building with -Wunused-parameter option of hns3. Signed-off-by: Peiyang Wang <wangpeiyang1@huawei.com> Signed-off-by: Hao Lan <lanhao@huawei.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-06-23 10:59:17 +02:00
Hao Chen	1cf3d5567f	net: hns3: fix strncpy() not using dest-buf length as length issue Now, strncpy() in hns3_dbg_fill_content() use src-length as copy-length, it may result in dest-buf overflow. This patch is to fix intel compile warning for csky-linux-gcc (GCC) 12.1.0 compiler. The warning reports as below: hclge_debugfs.c:92:25: warning: 'strncpy' specified bound depends on the length of the source argument [-Wstringop-truncation] strncpy(pos, items[i].name, strlen(items[i].name)); hclge_debugfs.c:90:25: warning: 'strncpy' output truncated before terminating nul copying as many bytes from a string as its length [-Wstringop-truncation] strncpy(pos, result[i], strlen(result[i])); strncpy() use src-length as copy-length, it may result in dest-buf overflow. So,this patch add some values check to avoid this issue. Signed-off-by: Hao Chen <chenhao418@huawei.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/lkml/202207170606.7WtHs9yS-lkp@intel.com/T/ Signed-off-by: Hao Lan <lanhao@huawei.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-06-23 10:59:17 +02:00
Jian Shen	9b476494da	net: hns3: refine the tcam key convert handle The result of expression '(k ^ ~v) & k' is exactly the same with 'k & v', so simplify it. (k ^ ~v) & k == k & v The truth table (in non table form): k == 0, v == 0: (k ^ ~v) & k == (0 ^ ~0) & 0 == (0 ^ 1) & 0 == 1 & 0 == 0 k & v == 0 & 0 == 0 k == 0, v == 1: (k ^ ~v) & k == (0 ^ ~1) & 0 == (0 ^ 0) & 0 == 1 & 0 == 0 k & v == 0 & 1 == 0 k == 1, v == 0: (k ^ ~v) & k == (1 ^ ~0) & 1 == (1 ^ 1) & 1 == 0 & 1 == 0 k & v == 1 & 0 == 0 k == 1, v == 1: (k ^ ~v) & k == (1 ^ ~1) & 1 == (1 ^ 0) & 1 == 1 & 1 == 1 k & v == 1 & 1 == 1 Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Hao Lan <lanhao@huawei.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-06-23 10:59:17 +02:00
Edward Cree	9a14f2e3da	sfc: keep alive neighbour entries while a TC encap action is using them When processing counter updates, if any action set using the newly incremented counter includes an encap action, prod the corresponding neighbouring entry to indicate to the neighbour cache that the entry is still in use and passing traffic. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://lore.kernel.org/r/20230621121504.17004-1-edward.cree@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-22 19:54:14 -07:00
Ying Hsu	004d25060c	igb: Fix igb_down hung on surprise removal In a setup where a Thunderbolt hub connects to Ethernet and a display through USB Type-C, users may experience a hung task timeout when they remove the cable between the PC and the Thunderbolt hub. This is because the igb_down function is called multiple times when the Thunderbolt hub is unplugged. For example, the igb_io_error_detected triggers the first call, and the igb_remove triggers the second call. The second call to igb_down will block at napi_synchronize. Here's the call trace: __schedule+0x3b0/0xddb ? __mod_timer+0x164/0x5d3 schedule+0x44/0xa8 schedule_timeout+0xb2/0x2a4 ? run_local_timers+0x4e/0x4e msleep+0x31/0x38 igb_down+0x12c/0x22a [igb 6615058754948bfde0bf01429257eb59f13030d4] __igb_close+0x6f/0x9c [igb 6615058754948bfde0bf01429257eb59f13030d4] igb_close+0x23/0x2b [igb 6615058754948bfde0bf01429257eb59f13030d4] __dev_close_many+0x95/0xec dev_close_many+0x6e/0x103 unregister_netdevice_many+0x105/0x5b1 unregister_netdevice_queue+0xc2/0x10d unregister_netdev+0x1c/0x23 igb_remove+0xa7/0x11c [igb 6615058754948bfde0bf01429257eb59f13030d4] pci_device_remove+0x3f/0x9c device_release_driver_internal+0xfe/0x1b4 pci_stop_bus_device+0x5b/0x7f pci_stop_bus_device+0x30/0x7f pci_stop_bus_device+0x30/0x7f pci_stop_and_remove_bus_device+0x12/0x19 pciehp_unconfigure_device+0x76/0xe9 pciehp_disable_slot+0x6e/0x131 pciehp_handle_presence_or_link_change+0x7a/0x3f7 pciehp_ist+0xbe/0x194 irq_thread_fn+0x22/0x4d ? irq_thread+0x1fd/0x1fd irq_thread+0x17b/0x1fd ? irq_forced_thread_fn+0x5f/0x5f kthread+0x142/0x153 ? __irq_get_irqchip_state+0x46/0x46 ? kthread_associate_blkcg+0x71/0x71 ret_from_fork+0x1f/0x30 In this case, igb_io_error_detected detaches the network interface and requests a PCIE slot reset, however, the PCIE reset callback is not being invoked and thus the Ethernet connection breaks down. As the PCIE error in this case is a non-fatal one, requesting a slot reset can be avoided. This patch fixes the task hung issue and preserves Ethernet connection by ignoring non-fatal PCIE errors. Signed-off-by: Ying Hsu <yinghsu@chromium.org> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230620174732.4145155-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-22 19:49:44 -07:00
Zhengchao Shao	2a441a3dbe	net: txgbe: remove unused buffer in txgbe_calc_eeprom_checksum Half a year passed since commit `049fe53653` ("net: txgbe: Add operations to interact with firmware") was submitted, the buffer in txgbe_calc_eeprom_checksum was not used. So remove it and the related branch codes. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202306200242.FXsHokaJ-lkp@intel.com/ Reviewed-by: Jiawen Wu <jiawenwu@trustnetic.com> Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230620062519.1575298-1-shaozhengchao@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-22 19:45:57 -07:00
Russell King (Oracle)	f40df95d37	net: macb: update PCS driver to use neg_mode Update macb's embedded PCS drivers to use neg_mode, even though it makes no use of it or the "mode" argument. This makes the driver consistent with converted drivers. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/E1qA8Eo-00EaGX-KJ@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-22 19:41:02 -07:00
Russell King (Oracle)	6e5bb3da98	net: sparx5: update PCS driver to use neg_mode Update Sparx5's embedded PCS driver to use neg_mode rather than the mode argument. As there is no pcs_link_up() method, this only affects the pcs_config() method. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/E1qA8EZ-00EaGF-6F@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-22 19:41:02 -07:00
Russell King (Oracle)	d5a0529930	net: prestera: update PCS driver to use neg_mode Update prestera's embedded PCS driver to use neg_mode rather than the mode argument. As there is no pcs_link_up() method, this only affects the pcs_config() method. Acked-by: Elad Nachman <enachman@marvell.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/E1qA8EO-00EaG3-TR@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-22 19:41:01 -07:00

1 2 3 4 5 ...

46973 Commits