mirror_ubuntu-kernels/drivers/net/ethernet
Manish Chopra af68656d66 bnx2x: fix napi API usage sequence
While handling PCI errors (AER flow), the driver tries to
disable NAPI [napi_disable()] after NAPI has already been deleted
[__netif_napi_del()], which causes an unexpected system
hang/crash.

System message log shows the following:
=======================================
[ 3222.537510] EEH: Detected PCI bus error on PHB#384-PE#800000
[ 3222.537511] EEH: This PCI device has failed 2 times in the last hour and will be permanently disabled after 5 failures.
[ 3222.537512] EEH: Notify device drivers to shutdown
[ 3222.537513] EEH: Beginning: 'error_detected(IO frozen)'
[ 3222.537514] EEH: PE#800000 (PCI 0384:80:00.0): Invoking bnx2x->error_detected(IO frozen)
[ 3222.537516] bnx2x: [bnx2x_io_error_detected:14236(eth14)]IO error detected
[ 3222.537650] EEH: PE#800000 (PCI 0384:80:00.0): bnx2x driver reports: 'need reset'
[ 3222.537651] EEH: PE#800000 (PCI 0384:80:00.1): Invoking bnx2x->error_detected(IO frozen)
[ 3222.537651] bnx2x: [bnx2x_io_error_detected:14236(eth13)]IO error detected
[ 3222.537729] EEH: PE#800000 (PCI 0384:80:00.1): bnx2x driver reports: 'need reset'
[ 3222.537729] EEH: Finished:'error_detected(IO frozen)' with aggregate recovery state:'need reset'
[ 3222.537890] EEH: Collect temporary log
[ 3222.583481] EEH: of node=0384:80:00.0
[ 3222.583519] EEH: PCI device/vendor: 168e14e4
[ 3222.583557] EEH: PCI cmd/status register: 00100140
[ 3222.583557] EEH: PCI-E capabilities and status follow:
[ 3222.583744] EEH: PCI-E 00: 00020010 012c8da2 00095d5e 00455c82
[ 3222.583892] EEH: PCI-E 10: 10820000 00000000 00000000 00000000
[ 3222.583893] EEH: PCI-E 20: 00000000
[ 3222.583893] EEH: PCI-E AER capability register set follows:
[ 3222.584079] EEH: PCI-E AER 00: 13c10001 00000000 00000000 00062030
[ 3222.584230] EEH: PCI-E AER 10: 00002000 000031c0 000001e0 00000000
[ 3222.584378] EEH: PCI-E AER 20: 00000000 00000000 00000000 00000000
[ 3222.584416] EEH: PCI-E AER 30: 00000000 00000000
[ 3222.584416] EEH: of node=0384:80:00.1
[ 3222.584454] EEH: PCI device/vendor: 168e14e4
[ 3222.584491] EEH: PCI cmd/status register: 00100140
[ 3222.584492] EEH: PCI-E capabilities and status follow:
[ 3222.584677] EEH: PCI-E 00: 00020010 012c8da2 00095d5e 00455c82
[ 3222.584825] EEH: PCI-E 10: 10820000 00000000 00000000 00000000
[ 3222.584826] EEH: PCI-E 20: 00000000
[ 3222.584826] EEH: PCI-E AER capability register set follows:
[ 3222.585011] EEH: PCI-E AER 00: 13c10001 00000000 00000000 00062030
[ 3222.585160] EEH: PCI-E AER 10: 00002000 000031c0 000001e0 00000000
[ 3222.585309] EEH: PCI-E AER 20: 00000000 00000000 00000000 00000000
[ 3222.585347] EEH: PCI-E AER 30: 00000000 00000000
[ 3222.586872] RTAS: event: 5, Type: Platform Error (224), Severity: 2
[ 3222.586873] EEH: Reset without hotplug activity
[ 3224.762767] EEH: Beginning: 'slot_reset'
[ 3224.762770] EEH: PE#800000 (PCI 0384:80:00.0): Invoking bnx2x->slot_reset()
[ 3224.762771] bnx2x: [bnx2x_io_slot_reset:14271(eth14)]IO slot reset initializing...
[ 3224.762887] bnx2x 0384:80:00.0: enabling device (0140 -> 0142)
[ 3224.768157] bnx2x: [bnx2x_io_slot_reset:14287(eth14)]IO slot reset --> driver unload

Uninterruptible tasks
=====================
crash> ps | grep UN
     213      2  11  c000000004c89e00  UN   0.0       0      0  [eehd]
     215      2   0  c000000004c80000  UN   0.0       0      0  [kworker/0:2]
    2196      1  28  c000000004504f00  UN   0.1   15936  11136  wickedd
    4287      1   9  c00000020d076800  UN   0.0    4032   3008  agetty
    4289      1  20  c00000020d056680  UN   0.0    7232   3840  agetty
   32423      2  26  c00000020038c580  UN   0.0       0      0  [kworker/26:3]
   32871   4241  27  c0000002609ddd00  UN   0.1   18624  11648  sshd
   32920  10130  16  c00000027284a100  UN   0.1   48512  12608  sendmail
   33092  32987   0  c000000205218b00  UN   0.1   48512  12608  sendmail
   33154   4567  16  c000000260e51780  UN   0.1   48832  12864  pickup
   33209   4241  36  c000000270cb6500  UN   0.1   18624  11712  sshd
   33473  33283   0  c000000205211480  UN   0.1   48512  12672  sendmail
   33531   4241  37  c00000023c902780  UN   0.1   18624  11648  sshd

EEH handler hung while bnx2x sleeping and holding RTNL lock
===========================================================
crash> bt 213
PID: 213    TASK: c000000004c89e00  CPU: 11  COMMAND: "eehd"
  #0 [c000000004d477e0] __schedule at c000000000c70808
  #1 [c000000004d478b0] schedule at c000000000c70ee0
  #2 [c000000004d478e0] schedule_timeout at c000000000c76dec
  #3 [c000000004d479c0] msleep at c0000000002120cc
  #4 [c000000004d479f0] napi_disable at c000000000a06448
                                        ^^^^^^^^^^^^^^^^
  #5 [c000000004d47a30] bnx2x_netif_stop at c0080000018dba94 [bnx2x]
  #6 [c000000004d47a60] bnx2x_io_slot_reset at c0080000018a551c [bnx2x]
  #7 [c000000004d47b20] eeh_report_reset at c00000000004c9bc
  #8 [c000000004d47b90] eeh_pe_report at c00000000004d1a8
  #9 [c000000004d47c40] eeh_handle_normal_event at c00000000004da64

And the sleeping source code
============================
crash> dis -ls c000000000a06448
FILE: ../net/core/dev.c
LINE: 6702

   6697  {
   6698          might_sleep();
   6699          set_bit(NAPI_STATE_DISABLE, &n->state);
   6700
   6701          while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
* 6702                  msleep(1);
   6703          while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
   6704                  msleep(1);
   6705
   6706          hrtimer_cancel(&n->timer);
   6707
   6708          clear_bit(NAPI_STATE_DISABLE, &n->state);
   6709  }

Based on the system log above, EEH calls into bnx2x twice, first through
bnx2x_io_error_detected() and then through bnx2x_io_slot_reset(), which
execute the following call chains:

bnx2x_io_error_detected()
  +-> bnx2x_eeh_nic_unload()
       +-> bnx2x_del_all_napi()
            +-> __netif_napi_del()

bnx2x_io_slot_reset()
  +-> bnx2x_netif_stop()
       +-> bnx2x_napi_disable()
            +-> napi_disable()

Fix this by correcting the order of the NAPI API calls, that is,
delete NAPI only after disabling it.

Fixes: 7fa6f34081 ("bnx2x: AER revised")
Reported-by: David Christensen <drc@linux.vnet.ibm.com>
Tested-by: David Christensen <drc@linux.vnet.ibm.com>
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Link: https://lore.kernel.org/r/20220426153913.6966-1-manishc@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-27 17:50:32 -07:00
3com net: typhoon: include <net/vxlan.h> 2022-02-07 19:53:38 -08:00
8390 ethernet: 8390: Remove unnecessary print function dev_err() 2022-03-11 22:59:03 -08:00
actions
adaptec
aeroflex
agere et131x: support arbitrary MAX_SKB_FRAGS 2022-02-08 16:51:23 -08:00
alacritech
allwinner
alteon
altera net: ethernet: altera: cleanup comments 2022-02-16 20:33:04 -08:00
amazon
amd net: amd-xgbe: disable interrupts during pci removal 2022-02-09 12:52:59 +00:00
apm drivers: net: xgene: Fix regression in CRC stripping 2022-03-23 10:30:05 -07:00
apple
aquantia net: atlantic: invert deep par in pm functions, preventing null derefs 2022-04-18 13:34:36 +01:00
arc net: arc_emac: Fix use after free in arc_mdio_probe() 2022-03-10 14:49:21 -08:00
asix net: ethernet: Use netif_rx(). 2022-03-04 12:02:19 +00:00
atheros atl1c: remove redundant assignment to variable size 2022-03-18 14:16:47 -07:00
broadcom bnx2x: fix napi API usage sequence 2022-04-27 17:50:32 -07:00
brocade
cadence net: macb: Restart tx only if queue pointer is lagging 2022-04-11 18:18:07 -07:00
calxeda
cavium Revert "net: ethernet: cavium: use div64_u64() instead of do_div()" 2022-02-11 16:54:47 -08:00
chelsio net: cxgb3: Fix an error code when probing the driver 2022-03-07 22:18:52 -08:00
cirrus
cisco
cortina
davicom net: ethernet: Use netif_rx(). 2022-03-04 12:02:19 +00:00
dec
dlink net: sundance: Replace one-element array with non-array object 2022-02-05 15:30:32 +00:00
emulex
engleder
ezchip net: ethernet: ezchip: fix platform_get_irq.cocci warning 2022-03-11 11:07:23 +00:00
faraday net: ftgmac100: access hardware register after clock ready 2022-04-13 12:43:55 +01:00
freescale dpaa_eth: Fix missing of_node_put in dpaa_get_ts_info() 2022-04-11 12:02:33 +01:00
fujitsu
fungible net/fungible: Fix reference to __udivdi3 on 32b builds 2022-04-01 21:32:30 -07:00
google gve: Fix spelling mistake "droping" -> "dropping" 2022-03-16 19:29:00 -07:00
hisilicon net: hns: Add missing fwnode_handle_put in hns_mac_init 2022-04-25 11:06:53 +01:00
huawei
i825xx Networking changes for 5.18. 2022-03-24 13:13:26 -07:00
ibm Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2022-03-23 10:53:49 -07:00
intel ice: fix use-after-free when deinitializing mailbox snapshot 2022-04-26 09:26:48 -07:00
litex net: ethernet: litex: Add the dependency on HAS_IOMEM 2022-02-08 20:43:40 -08:00
marvell net: ethernet: mv643xx: Fix over zealous checking of_get_mac_address() 2022-04-05 18:12:55 -07:00
mediatek
mellanox net: Handle l3mdev in ip_tunnel_init_flow 2022-04-15 14:27:30 -07:00
micrel net: micrel: Fix KS8851 Kconfig 2022-04-05 17:32:05 -07:00
microchip net: lan966x: fix a couple off by one bugs 2022-04-25 11:25:37 +01:00
microsoft net: mana: Remove unnecessary check of cqe_type in mana_process_rx_cqe() 2022-02-05 15:26:00 +00:00
moxa net: moxa: use GFP_KERNEL 2022-02-11 14:39:08 -08:00
mscc net: mscc: ocelot: don't add VID 0 to ocelot->vlans when leaving VLAN-aware bridge 2022-04-25 11:47:55 +01:00
myricom myri10ge: fix an incorrect free for skb in myri10ge_sw_tso 2022-04-06 15:29:18 +01:00
natsemi
neterion
netronome devlink: hold the instance lock during eswitch_mode callbacks 2022-03-21 14:11:38 +00:00
ni net: nixge: Use GFP_KERNEL instead of GFP_ATOMIC when possible 2022-02-17 20:03:39 -08:00
nvidia
nxp net: ethernet: lpc_eth: Handle error for clk_enable 2022-03-09 12:15:20 +00:00
oki-semi
packetengines drivers: net: packetengines: fix typos in comments 2022-03-14 10:04:28 -07:00
pasemi
pensando ionic: no transition while stopping 2022-02-28 11:42:45 +00:00
qlogic qede: confirm skb is allocated before using 2022-04-06 15:16:23 +01:00
qualcomm net: add per-cpu storage and net->core_stats 2022-03-11 23:17:24 -08:00
rdc
realtek r8169: improve driver unload and system shutdown behavior on DASH-enabled systems 2022-03-17 16:47:32 -07:00
renesas ravb: Use GFP_KERNEL instead of GFP_ATOMIC when possible 2022-02-21 12:00:46 +00:00
rocker
samsung Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2022-03-03 11:55:12 -08:00
seeq
sfc net: sfc: fix using uninitialized xdp tx_queue 2022-04-06 13:50:17 +01:00
sgi
silan
sis
smsc ethernet: smc911x: fix indentation in get/set EEPROM 2022-02-01 19:59:03 -08:00
socionext net: netsec: enable pp skb recycling 2022-02-28 11:39:23 +00:00
stmicro net: ethernet: stmmac: fix write to sgmii_adapter_base 2022-04-22 16:31:56 -07:00
sun ethernet: sun: Fix spelling mistake "mis-matched" -> "mismatched" 2022-03-17 16:36:05 -07:00
synopsys
tehuti
ti Networking changes for 5.18. 2022-03-24 13:13:26 -07:00
toshiba
tundra
vertexcom net: ethernet: Use netif_rx(). 2022-03-04 12:02:19 +00:00
via
wiznet net: ethernet: Use netif_rx(). 2022-03-04 12:02:19 +00:00
xilinx net: axiemac: use a phandle to reference pcs_phy 2022-04-06 13:54:52 +01:00
xircom
xscale ARM: ixp4xx: Drop all common code 2022-02-12 18:20:04 +01:00
dnet.c
dnet.h
ec_bhf.c
ethoc.c
fealnx.c
jme.c net: ethernet: use time_is_before_eq_jiffies() instead of open coding it 2022-02-28 13:21:31 +00:00
jme.h
Kconfig net: restore alpha order to Ethernet devices in config 2022-04-15 11:56:16 +01:00
korina.c
lantiq_etop.c
lantiq_xrx200.c net: lantiq_xrx200: fix use after free bug 2022-03-07 11:29:35 +00:00
Makefile net/fungible: Kconfig, Makefiles, and MAINTAINERS 2022-02-27 10:51:23 +00:00