backport fix for network crashes when exiting ovs-tcpdump

A user in enterprise support reported a low (<1/50) chance of network crash (with soft lockups reported) when exiting ovs-tcpdump (a tcpdump wrapper provided by Open vSwitch) which they could only resolve by rebooting the host. After reporting the issue upstream with a reproducer [1], an OVS developer submitted a kernel patch which is now included 6.13 and some stable kernels. With this patch, the reproducer does not seem to trigger the issue anymore. Hence, backport the patch. [1] https://mail.openvswitch.org/pipermail/ovs-discuss/2025-January/053423.html Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
2025-10-04 09:42:10 +00:00 · 2025-01-24 11:21:23 +01:00 · 2025-01-24 11:21:23 +01:00 · 1fb1563b45
commit 1fb1563b45
parent 8f6fe4f2f0
1 changed files with 77 additions and 0 deletions
--- a/patches/kernel/0016-openvswitch-fix-lockup-on-tx-to-unregistering-netdev.patch
+++ b/patches/kernel/0016-openvswitch-fix-lockup-on-tx-to-unregistering-netdev.patch
@ -0,0 +1,77 @@
+From 179893798c518361c20cb11b5b0c71cecf019ad4 Mon Sep 17 00:00:00 2001
+From: Ilya Maximets <i.maximets@ovn.org>
+Date: Thu, 9 Jan 2025 13:21:24 +0100
+Subject: [PATCH] openvswitch: fix lockup on tx to unregistering netdev with
+ carrier
+
+Commit in a fixes tag attempted to fix the issue in the following
+sequence of calls:
+
+    do_output
+    -> ovs_vport_send
+       -> dev_queue_xmit
+          -> __dev_queue_xmit
+             -> netdev_core_pick_tx
+                -> skb_tx_hash
+
+When device is unregistering, the 'dev->real_num_tx_queues' goes to
+zero and the 'while (unlikely(hash >= qcount))' loop inside the
+'skb_tx_hash' becomes infinite, locking up the core forever.
+
+But unfortunately, checking just the carrier status is not enough to
+fix the issue, because some devices may still be in unregistering
+state while reporting carrier status OK.
+
+One example of such device is a net/dummy.  It sets carrier ON
+on start, but it doesn't implement .ndo_stop to set the carrier off.
+And it makes sense, because dummy doesn't really have a carrier.
+Therefore, while this device is unregistering, it's still easy to hit
+the infinite loop in the skb_tx_hash() from the OVS datapath.  There
+might be other drivers that do the same, but dummy by itself is
+important for the OVS ecosystem, because it is frequently used as a
+packet sink for tcpdump while debugging OVS deployments.  And when the
+issue is hit, the only way to recover is to reboot.
+
+Fix that by also checking if the device is running.  The running
+state is handled by the net core during unregistering, so it covers
+unregistering case better, and we don't really need to send packets
+to devices that are not running anyway.
+
+While only checking the running state might be enough, the carrier
+check is preserved.  The running and the carrier states seem disjoined
+throughout the code and different drivers.  And other core functions
+like __dev_direct_xmit() check both before attempting to transmit
+a packet.  So, it seems safer to check both flags in OVS as well.
+
+Fixes: 066b86787fa3 ("net: openvswitch: fix race on port output")
+Reported-by: Friedrich Weber <f.weber@proxmox.com>
+Closes: https://mail.openvswitch.org/pipermail/ovs-discuss/2025-January/053423.html
+Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
+Tested-by: Friedrich Weber <f.weber@proxmox.com>
+Reviewed-by: Aaron Conole <aconole@redhat.com>
+Link: https://patch.msgid.link/20250109122225.4034688-1-i.maximets@ovn.org
+Signed-off-by: Jakub Kicinski <kuba@kernel.org>
+(cherry picked from commit 47e55e4b410f7d552e43011baa5be1aab4093990)
+Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
+---
+ net/openvswitch/actions.c | 4 +++-
+ 1 file changed, 3 insertions(+), 1 deletion(-)
+
+diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
+index 101f9a23792c..896f50387347 100644
+--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
+@@ -930,7 +930,9 @@ static void do_output(struct datapath *dp, struct sk_buff *skb, int out_port,
+ {
+ 	struct vport *vport = ovs_vport_rcu(dp, out_port);
+ 
+-	if (likely(vport && netif_carrier_ok(vport->dev))) {
+	if (likely(vport &&
+		   netif_running(vport->dev) &&
+		   netif_carrier_ok(vport->dev))) {
+ 		u16 mru = OVS_CB(skb)->mru;
+ 		u32 cutlen = OVS_CB(skb)->cutlen;
+ 
+-- 
+2.39.5
+