Commit Graph

37686 Commits

Author SHA1 Message Date
Haiyue Wang
f995f95af6 iavf: change the flex-byte support number to macro definition
The maximum number (2) of supported flex-bytes is derived from the ethtool
user-def data size (8 bytes).

Change the magic number 2 to a macro definition, and add a comment to
record the design rationale, so the code is clear and easily maintained.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-23 09:03:56 -07:00
Stefan Assmann
1a0e880b02 iavf: remove duplicate free resources calls
Both iavf_free_all_tx_resources() and iavf_free_all_rx_resources() have
already been called in the very same function.
Remove the duplicate calls.

Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-23 09:03:56 -07:00
Coiby Xu
5c208e9f49 i40e: use minimal admin queue for kdump
The minimum size of admin send/receive queue is 1 and 2 respectively.
The admin send queue can't be set to 1 because in that case, the
firmware would fail to init.

Signed-off-by: Coiby Xu <coxu@redhat.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-23 09:03:56 -07:00
Coiby Xu
dcb75338f6 i40e: use minimal Rx and Tx ring buffers for kdump
Use the minimum number of descriptors so that we allocate the
minimal ring buffers for kdump.

Signed-off-by: Coiby Xu <coxu@redhat.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-23 09:03:56 -07:00
Coiby Xu
065aa694a7 i40e: use minimal Tx and Rx pairs for kdump
Set the number of the MSI-X vectors to 1. When MSI-X is enabled,
it's not allowed to use more TC queue pairs than MSI-X vectors
(pf->num_lan_msix) exist. Thus the number of Tx and Rx pairs
(vsi->num_queue_pairs) will be equal to the number of MSI-X vectors,
i.e., 1.

Signed-off-by: Coiby Xu <coxu@redhat.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-23 09:03:56 -07:00
Aleksandr Loktionov
6d2c322cce i40e: refactor repeated link state reporting code
Refactor repeated link state reporting code into separate helper
functions: i40e_set_vf_link_state() and i40e_vc_link_speed2mbps().
Add support for VIRTCHNL_VF_CAP_ADV_LINK_SPEED.

Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-23 09:03:56 -07:00
Mohammad Athari Bin Ismail
676b7ec67d stmmac: intel: Enable HW descriptor prefetch by default
Enable HW descriptor prefetch by default by setting plat->dma_cfg->dche =
true in intel_mgbe_common_data(). Note that this capability is only
supported in DWMAC core version 5.20 onwards. stmmac checks the core
version; if it is below 5.20, this capability isn't configured.

Below is the iperf result comparison between HW descriptor prefetch
disabled(DCHE=0b) and enabled(DCHE=1b). Tested on Intel Elkhartlake
platform with DWMAC Core 5.20. Observed line rate performance
improvement with HW descriptor prefetch enabled.

DCHE = 0b
[  5] local 169.254.1.162 port 42123 connected to 169.254.244.142 port 5201
[ ID] Interval           Transfer     Bitrate         Total Datagrams
[  5]   0.00-1.00   sec  96.7 MBytes   811 Mbits/sec  70050
[  5]   1.00-2.00   sec  96.5 MBytes   809 Mbits/sec  69850
[  5]   2.00-3.00   sec  96.3 MBytes   808 Mbits/sec  69720
[  5]   3.00-4.00   sec  95.9 MBytes   804 Mbits/sec  69450
[  5]   4.00-5.00   sec  96.0 MBytes   806 Mbits/sec  69530
[  5]   5.00-6.00   sec  96.8 MBytes   812 Mbits/sec  70080
[  5]   6.00-7.00   sec  96.9 MBytes   813 Mbits/sec  70140
[  5]   7.00-8.00   sec  96.8 MBytes   812 Mbits/sec  70080
[  5]   8.00-9.00   sec  97.0 MBytes   814 Mbits/sec  70230
[  5]   9.00-10.00  sec  96.9 MBytes   813 Mbits/sec  70170
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-10.00  sec   966 MBytes   810 Mbits/sec  0.000 ms  0/699300 (0%)  sender
[  5]   0.00-10.00  sec   966 MBytes   810 Mbits/sec  0.011 ms  0/699265 (0%)  receiver

DCHE = 1b
[  5] local 169.254.1.162 port 49740 connected to 169.254.244.142 port 5201
[ ID] Interval           Transfer     Bitrate         Total Datagrams
[  5]   0.00-1.00   sec  97.9 MBytes   821 Mbits/sec  70880
[  5]   1.00-2.00   sec  98.1 MBytes   823 Mbits/sec  71060
[  5]   2.00-3.00   sec  98.2 MBytes   824 Mbits/sec  71140
[  5]   3.00-4.00   sec  98.2 MBytes   824 Mbits/sec  71090
[  5]   4.00-5.00   sec  98.1 MBytes   823 Mbits/sec  71050
[  5]   5.00-6.00   sec  98.1 MBytes   823 Mbits/sec  71040
[  5]   6.00-7.00   sec  98.1 MBytes   823 Mbits/sec  71050
[  5]   7.00-8.00   sec  98.2 MBytes   824 Mbits/sec  71140
[  5]   8.00-9.00   sec  98.2 MBytes   824 Mbits/sec  71120
[  5]   9.00-10.00  sec  98.3 MBytes   824 Mbits/sec  71150
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-10.00  sec   981 MBytes   823 Mbits/sec  0.000 ms  0/710720 (0%)  sender
[  5]   0.00-10.00  sec   981 MBytes   823 Mbits/sec  0.041 ms  0/710650 (0%) receiver

Signed-off-by: Mohammad Athari Bin Ismail <mohammad.athari.ismail@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-22 15:02:40 -07:00
Mohammad Athari Bin Ismail
96874c619c net: stmmac: Add HW descriptor prefetch setting for DWMAC Core 5.20 onwards
DWMAC Core 5.20 onwards supports HW descriptor prefetching.
Additionally, it also depends on platform specific RTL configuration.
This capability could be enabled by setting DMA_Mode bit-19 (DCHE).

So, to enable this capability, the platform must set plat->dma_cfg->dche = true
and the DWMAC core version must be 5.20 onwards. Otherwise, this capability
isn't configured.

Signed-off-by: Mohammad Athari Bin Ismail <mohammad.athari.ismail@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-22 15:02:40 -07:00
Hans Westgaard Ry
79ebfb11fe net/mlx4: Treat VFs fair when handling comm_channel_events
Handling comm_channel_event in mlx4_master_comm_channel uses a double
loop to determine which slaves have requested work. The search is
always started at the lowest slave. This leads to unfairness; lower VFs
tend to be prioritized over higher VFs.

The patch uses find_next_bit to determine which slaves to handle.
Fairness is implemented by always starting the search at the slave after
the previous starting point.
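
The wrap-around scan can be sketched in userspace as follows. This is an illustration under stated assumptions, not the mlx4 code: next_set_bit() is a hypothetical stand-in for the kernel's find_next_bit() restricted to one 64-bit word, and NSLAVES and service_order() are made-up names. The caller is assumed to pass a start value no larger than NSLAVES and to advance it between calls.

```c
#include <stdint.h>

#define NSLAVES 64u	/* illustrative size: one 64-bit word of pending bits */

/* Userspace stand-in for the kernel's find_next_bit() over a single word:
 * returns the index of the first set bit in [offset, size), or size if none. */
static unsigned int next_set_bit(uint64_t word, unsigned int size,
				 unsigned int offset)
{
	unsigned int i;

	for (i = offset; i < size; i++)
		if (word & (UINT64_C(1) << i))
			return i;
	return size;
}

/* Record, in service order, every slave with a pending bit: first those at
 * or above `start`, then wrap around to those below it. Returns the count. */
static unsigned int service_order(uint64_t pending, unsigned int start,
				  unsigned int *order)
{
	unsigned int i, n = 0;

	for (i = next_set_bit(pending, NSLAVES, start); i < NSLAVES;
	     i = next_set_bit(pending, NSLAVES, i + 1))
		order[n++] = i;
	for (i = next_set_bit(pending, start, 0); i < start;
	     i = next_set_bit(pending, start, i + 1))
		order[n++] = i;
	return n;
}
```

With pending bits 1, 3 and 5 set and a start of 3, the sketch visits slaves 3, 5 and then 1, so a low-numbered slave no longer always goes first.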

An MPI program has been used to measure improvements. It runs 500
ibv_reg_mr, synchronizes with all other instances and then runs 500
ibv_dereg_mr.

Results with 500 processes running; the time reported is for 500
calls:

ibv_reg_mr:
          Mod.      Org.
mlx4_1    403.356ms 424.674ms
mlx4_2    403.355ms 424.674ms
mlx4_3    403.354ms 424.674ms
mlx4_4    403.355ms 424.674ms
mlx4_5    403.357ms 424.677ms
mlx4_6    403.354ms 424.676ms
mlx4_7    403.357ms 424.675ms
mlx4_8    403.355ms 424.675ms

ibv_dereg_mr:
          Mod.      Org.
mlx4_1    116.408ms 142.818ms
mlx4_2    116.434ms 142.793ms
mlx4_3    116.488ms 143.247ms
mlx4_4    116.679ms 143.230ms
mlx4_5    112.017ms 107.204ms
mlx4_6    112.032ms 107.516ms
mlx4_7    112.083ms 184.195ms
mlx4_8    115.089ms 190.618ms

Suggested-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-22 14:59:26 -07:00
Dan Carpenter
27537929f3 bnxt_en: fix ternary sign extension bug in bnxt_show_temp()
The problem is that bnxt_show_temp() returns long but "rc" is an int
and "len" is a u32.  With ternary operations the type promotion is quite
tricky.  The negative "rc" is first promoted to u32 and then to long so
it ends up being a high positive value instead of a negative as we
intended.

Fix this by removing the ternary.
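
The promotion problem can be reproduced outside the kernel. The following is a minimal userspace sketch, not the driver code: show_buggy() and show_fixed() are hypothetical names, and uint32_t stands in for the kernel's u32.

```c
#include <stdint.h>

/* Buggy shape: the two result operands are int and uint32_t, so the usual
 * arithmetic conversions give the ternary an unsigned 32-bit type. A
 * negative rc becomes a large unsigned value before it is widened to long. */
static long show_buggy(int rc, uint32_t len)
{
	return rc ? rc : len;
}

/* Fixed shape: without the ternary, each return converts its own operand
 * to long, so a negative rc stays negative. */
static long show_fixed(int rc, uint32_t len)
{
	if (rc)
		return rc;
	return len;
}
```

Where long is wider than 32 bits, show_buggy() turns a negative rc into a positive value, while show_fixed() returns rc unchanged.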

Fixes: d69753fa1e ("bnxt_en: return proper error codes in bnxt_show_temp")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-22 14:56:38 -07:00
David S. Miller
9904e1ee96 Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:

====================
100GbE Intel Wired LAN Driver Updates 2021-04-22

This series contains updates to virtchnl header file, ice, and iavf
drivers.

Vignesh adds support to warn about potentially malicious VFs; those that
are overflowing the mailbox for the ice driver.

Michal adds support for an allowlist/denylist of VF commands based on
supported capabilities for the ice driver.

Brett adds support for iavf UDP segmentation offload by adding the
capability bit to virtchnl, advertising support in the ice driver, and
enabling it in the iavf driver. He also adds a helper function for
getting the VF VSI for ice.

Colin Ian King removes an unneeded pointer assignment.

Qi enables support in the ice driver to support virtchnl requests from
the iavf to configure its own RSS input set. This includes adding new
capability bits, structures, and commands to virtchnl header file.

Haiyue enables configuring RSS flow hash via ethtool to support TCP, UDP
and SCTP protocols in iavf.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-22 13:57:21 -07:00
Arnd Bergmann
3197a98c70 vxge: avoid -Wemtpy-body warnings
There are a few warnings about empty debug macros in this driver:

drivers/net/ethernet/neterion/vxge/vxge-main.c: In function 'vxge_probe':
drivers/net/ethernet/neterion/vxge/vxge-main.c:4480:76: error: suggest braces around empty body in an 'if' statement [-Werror=empty-body]
 4480 |                                 "Failed in enabling SRIOV mode: %d\n", ret);

Change them to proper 'do { } while (0)' expressions to make the
code a little more robust and avoid the warnings.
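
As a rough illustration of the pattern, assuming a hypothetical demo_dbg() macro and DEMO_DEBUG switch rather than the vxge macros themselves:

```c
#include <stdio.h>

#ifdef DEMO_DEBUG
#define demo_dbg(...) fprintf(stderr, __VA_ARGS__)
#else
/* Expanding to nothing would leave "if (ret) ;" behind, which is what
 * -Wempty-body complains about. An empty do-while is a real statement. */
#define demo_dbg(...) do { } while (0)
#endif

static int demo_probe(int ret)
{
	if (ret)
		demo_dbg("Failed in enabling SRIOV mode: %d\n", ret);
	return ret;
}
```

The do-while form also behaves as a single statement inside an unbraced if/else, which is why it is the usual choice for statement-like macros.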

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-22 13:27:36 -07:00
Arnd Bergmann
74c97ea3b6 net: enetc: fix link error again
A link time bug that I had fixed before has come back now that
another sub-module was added to the enetc driver:

ERROR: modpost: "enetc_ierb_register_pf" [drivers/net/ethernet/freescale/enetc/fsl-enetc.ko] undefined!

The problem is that the enetc Makefile is not actually used for
the ierb module if that is the only built-in driver in there
and everything else is a loadable module.

Fix it by always entering the directory this time, regardless
of which symbols are configured. This should reliably fix the
problem and prevent it from coming back another time.

Fixes: 112463ddbe ("net: dsa: felix: fix link error")
Fixes: e7d48e5fbf ("net: enetc: add a mini driver for the Integrated Endpoint Register Block")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-22 13:23:07 -07:00
Arnd Bergmann
45b102dd81 net: mana: fix PCI_HYPERV dependency
The MANA driver causes a build failure in some configurations when
it selects an unavailable symbol:

WARNING: unmet direct dependencies detected for PCI_HYPERV
  Depends on [n]: PCI [=y] && X86_64 [=y] && HYPERV [=n] && PCI_MSI [=y] && PCI_MSI_IRQ_DOMAIN [=y] && SYSFS [=y]
  Selected by [y]:
  - MICROSOFT_MANA [=y] && NETDEVICES [=y] && ETHERNET [=y] && NET_VENDOR_MICROSOFT [=y] && PCI_MSI [=y] && X86_64 [=y]
drivers/pci/controller/pci-hyperv.c: In function 'hv_irq_unmask':
drivers/pci/controller/pci-hyperv.c:1217:9: error: implicit declaration of function 'hv_set_msi_entry_from_desc' [-Werror=implicit-function-declaration]
 1217 |         hv_set_msi_entry_from_desc(&params->int_entry.msi_entry, msi_desc);
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~

A PCI driver should never depend on a particular host bridge
implementation in the first place, but if we have this dependency
it's better to express it as a 'depends on' rather than 'select'.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-22 13:22:28 -07:00
Haiyue Wang
e41985f0fe iavf: Support for modifying SCTP RSS flow hashing
Provide the ability to enable SCTP RSS hashing by ethtool.

It gives users the option of generating the RSS hash based on the SCTP source
and destination port numbers and the IPv4 or IPv6 source and destination
addresses.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-22 09:26:23 -07:00
Haiyue Wang
7b8f3f957b iavf: Support for modifying UDP RSS flow hashing
Provides the ability to enable UDP RSS hashing by ethtool.

It gives users the option of generating the RSS hash based on the UDP source
and destination port numbers and the IPv4 or IPv6 source and destination
addresses.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-22 09:26:23 -07:00
Haiyue Wang
5ab91e0593 iavf: Support for modifying TCP RSS flow hashing
Provides the ability to enable TCP RSS hashing by ethtool.

It gives users the option of generating the RSS hash based on the TCP source
and destination port numbers and the IPv4 or IPv6 source and destination
addresses.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-22 09:26:23 -07:00
Haiyue Wang
0aaeb4fbc8 iavf: Add framework to enable ethtool RSS config
Add the virtchnl message interface to VF, so that VF can request RSS
input set(s) based on PF's capability.

This framework allows ethtool RSS config support on the VF driver.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-22 09:26:23 -07:00
Qi Zhang
ddd1f3cfed ice: Support RSS configure removal for AVF
Add the handler for virtchnl message VIRTCHNL_OP_DEL_RSS_CFG to remove
an existing RSS configuration with matching hashed fields.

Signed-off-by: Vignesh Sridhar <vignesh.sridhar@intel.com>
Co-developed-by: Jia Guo <jia.guo@intel.com>
Signed-off-by: Jia Guo <jia.guo@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Tested-by: Bo Chen <BoX.C.Chen@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-22 09:26:22 -07:00
Qi Zhang
222a8ab016 ice: Enable RSS configure for AVF
Currently, the RSS hash input cannot be configured for AVF via ethtool; it is
set by the PF directly.

Add the RSS configure support for AVF through new virtchnl message, and
define the capability flag VIRTCHNL_VF_OFFLOAD_ADV_RSS_PF to query this
new RSS offload support.

Signed-off-by: Jia Guo <jia.guo@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Tested-by: Bo Chen <BoX.C.Chen@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-22 09:26:22 -07:00
Brett Creeley
c5afbe99b7 ice: Add helper function to get the VF's VSI
Currently, the driver gets the VF's VSI by using a long string of
dereferences (i.e. vf->pf->vsi[vf->lan_vsi_idx]). If the method to get
the VF's VSI were to change the driver would have to change it in every
location. Fix this by adding the helper ice_get_vf_vsi().
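
The idea can be sketched with stand-in types. The demo_* structs below are simplified placeholders, not the real ice structures, and demo_get_vf_vsi() only mirrors the shape of the helper described above.

```c
#include <stddef.h>

/* Simplified placeholders for the driver's VSI, PF and VF objects. */
struct demo_vsi { int vsi_num; };
struct demo_pf  { struct demo_vsi **vsi; };
struct demo_vf  { struct demo_pf *pf; unsigned int lan_vsi_idx; };

/* Single place that knows how a VF's VSI is looked up. If the lookup ever
 * changes, only this function needs to be updated. */
static struct demo_vsi *demo_get_vf_vsi(const struct demo_vf *vf)
{
	return vf->pf->vsi[vf->lan_vsi_idx];
}
```

Callers then write demo_get_vf_vsi(vf) instead of repeating the dereference chain.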

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-22 09:26:22 -07:00
Colin Ian King
c9b5f681fe ice: remove redundant assignment to pointer vsi
Pointer vsi is being re-assigned a value that is never read,
the assignment is redundant and can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-22 09:26:22 -07:00
Brett Creeley
c91a4f9feb iavf: add support for UDP Segmentation Offload
Add code to support UDP segmentation offload (USO) for
hardware that supports it.

Suggested-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-22 09:26:22 -07:00
Brett Creeley
142da08c4d ice: Advertise virtchnl UDP segmentation offload capability
As the hardware is capable of supporting UDP segmentation offload, add a
capability bit to virtchnl.h to communicate this and have the driver
advertise its support.

Suggested-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-22 09:26:22 -07:00
Michal Swiatkowski
c0dcaa55f9 ice: Allow ignoring opcodes on specific VF
Declare a bitmap of allowed commands on the VF. Initialize the default
opcode list that should always be supported. Declare an array of
supported opcodes for each capability used in virtchnl code.

Change the allowed bitmap by setting or clearing the corresponding
bit to allowlist (bit set) or denylist (bit clear) an opcode.
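
A minimal sketch of such a per-VF bitmap, assuming at most 64 opcodes so that one word suffices. All demo_* names are hypothetical, and the real driver uses the kernel bitmap helpers rather than open-coded shifts.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_OP_MAX 64u	/* assumed upper bound on opcode numbers */

struct demo_vf_ops {
	uint64_t allowed;	/* bit n set: opcode n is allowlisted */
};

/* Allowlist every opcode in ops[] by setting its bit. */
static void demo_allowlist(struct demo_vf_ops *vf, const unsigned int *ops,
			   unsigned int n)
{
	unsigned int i;

	for (i = 0; i < n; i++)
		if (ops[i] < DEMO_OP_MAX)
			vf->allowed |= UINT64_C(1) << ops[i];
}

/* Denylist every opcode in ops[] by clearing its bit. */
static void demo_denylist(struct demo_vf_ops *vf, const unsigned int *ops,
			  unsigned int n)
{
	unsigned int i;

	for (i = 0; i < n; i++)
		if (ops[i] < DEMO_OP_MAX)
			vf->allowed &= ~(UINT64_C(1) << ops[i]);
}

/* A message is handled only if its opcode bit is currently set. */
static bool demo_is_allowed(const struct demo_vf_ops *vf, unsigned int op)
{
	return op < DEMO_OP_MAX && ((vf->allowed >> op) & 1u);
}
```

A per-capability opcode array can be passed to demo_allowlist() when the capability is negotiated and to demo_denylist() when it is withdrawn.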

Signed-off-by: Michal Swiatkowski <michal.swiatkowski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-22 09:26:22 -07:00
Vignesh Sridhar
0891c89674 ice: warn about potentially malicious VFs
Attempt to detect malicious VFs and, if suspected, log the information but
keep going to allow the user to take any desired actions.

Potentially malicious VFs are identified by checking if the VFs are
transmitting too many messages via the PF-VF mailbox which could cause an
overflow of this channel resulting in denial of service. This is done by
creating a snapshot or static capture of the mailbox buffer which can be
traversed and in which the messages sent by VFs are tracked.

Co-developed-by: Yashaswini Raghuram Prathivadi Bhayankaram <yashaswini.raghuram.prathivadi.bhayankaram@intel.com>
Signed-off-by: Yashaswini Raghuram Prathivadi Bhayankaram <yashaswini.raghuram.prathivadi.bhayankaram@intel.com>
Co-developed-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Co-developed-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Vignesh Sridhar <vignesh.sridhar@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-22 09:26:22 -07:00
Adam Ford
36e69da892 net: ethernet: ravb: Fix release of refclk
The call to clk_disable_unprepare() can happen before priv is
initialized. This means moving clk_disable_unprepare out of
out_release into a new label.

Fixes: 8ef7adc6be ("net: ethernet: ravb: Enable optional refclk")
Signed-off-by: Adam Ford <aford173@gmail.com>
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-21 11:17:09 -07:00
Yoshihiro Shimoda
5718458b09 net: renesas: ravb: Fix a stuck issue when a lot of frames are received
When a lot of frames were received in a short time, the driver
stopped receiving until a new frame arrived. For example,
the following command from another device could cause this issue.

    $ sudo ping -f -l 1000 -c 1000 <this driver's ipaddress>

The previous code always cleared the RX interrupt flag but checked
the interrupt flags in ravb_poll(). So, if ravb_rx() returned true,
ravb_poll() could not call ravb_rx() again until a new RX frame was
received. To fix the issue, always call ravb_rx() regardless of the
interrupt flags.

Fixes: c156633f13 ("Renesas Ethernet AVB driver proper")
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-21 10:56:37 -07:00
Ong Boon Leong
5e6038b88a net: stmmac: fix TSO and TBS feature enabling during driver open
TSO and TBS cannot co-exist and current implementation requires two
fixes:

 1) stmmac_open() does not need to call stmmac_enable_tbs() because
    the MAC is reset in stmmac_init_dma_engine() anyway.
 2) Inside stmmac_hw_setup(), we should call stmmac_enable_tso() for
    TX Q that is _not_ configured for TBS.

Fixes: 579a25a854 ("net: stmmac: Initial support for TBS")
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-21 10:51:26 -07:00
Ong Boon Leong
17cb00704c stmmac: intel: set TSO/TBS TX Queues default settings
TSO and TBS cannot coexist. For now, we set the Intel mGbE controller to use
the following TX queue mapping: TxQ0 uses TSO and the remaining TxQs support TBS.

Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-21 10:49:02 -07:00
Dan Carpenter
53e35ebb9a stmmac: intel: unlock on error path in intel_crosststamp()
We recently added some new locking to this function but one error path
was overlooked.  We need to drop the lock before returning.

Fixes: f4da56529d ("net: stmmac: Add support for external trigger timestamping")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Wong Vee Khee <vee.khee.wong@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-21 10:27:55 -07:00
Yinjun Zhang
90b669d65d nfp: devlink: initialize the devlink port attribute "lanes"
The number of lanes of devlink port should be correctly initialized
when registering the port, so that the input check when running
"devlink port split <port> count <N>" can pass.

Fixes: a21cf0a833 ("devlink: Add a new devlink port lanes attribute and pass to netlink")
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-21 10:26:35 -07:00
Srujana Challa
2e2ee4cd0a octeontx2-af: Add mailbox for CPT stats
Adds a new mailbox to get CPT stats, including performance
counters, CPT engines status and RXC status.

Signed-off-by: Narayana Prasad Raju Atherya <pathreya@marvell.com>
Signed-off-by: Srujana Challa <schalla@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-21 10:23:17 -07:00
Srujana Challa
ecad2ce8c4 octeontx2-af: cn10k: Add mailbox to configure reassembly timeout
CN10K CPT coprocessor includes a component named RXC which
is responsible for reassembly of inner IP packets. RXC has
the feature to evict oldest entries based on age/threshold.
This patch adds a new mailbox to configure reassembly age
or threshold.

Signed-off-by: Jerin Jacob Kollanukkaran <jerinj@marvell.com>
Signed-off-by: Srujana Challa <schalla@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-21 10:23:17 -07:00
Srujana Challa
e4bbc5c53a octeontx2-af: cn10k: Mailbox changes for CN10K CPT
Adds changes to existing CPT mailbox messages to support
CN10K CPT block. This patch also adds new register defines
for CN10K CPT.

Signed-off-by: Vidya Sagar Velumuri <vvelumuri@marvell.com>
Signed-off-by: Srujana Challa <schalla@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-21 10:23:17 -07:00
Colin Ian King
d83b8aa520 net: davinci_emac: Fix incorrect masking of tx and rx error channel
The bit-masks used for the TXERRCH and RXERRCH (tx and rx error channels)
are incorrect and always lead to a zero result. The mask values are
currently the incorrect post-right-shift values; fix this by setting
them to the correct values.

(I double checked these against the TMS320TCI6482 data sheet, section
5.30, page 127 to ensure I had the correct mask values for the TXERRCH
and RXERRCH fields in the MACSTATUS register).
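
The effect of a post-shift mask can be seen in a small sketch. The field position used here (3 bits at bits 18:16) and the DEMO_* names are illustrative assumptions, not values taken from the driver or the data sheet.

```c
#include <stdint.h>

/* Illustrative layout: a 3-bit channel field at bits 18:16 of a status word. */
#define DEMO_ERRCH_SHIFT	16
#define DEMO_ERRCH_MASK_BUGGY	0x7u		/* mask as it looks after shifting */
#define DEMO_ERRCH_MASK_FIXED	0x70000u	/* mask in register position */

/* Buggy: masking with 0x7 keeps only bits 2:0, and shifting those right by
 * 16 always yields zero. */
static uint32_t demo_errch_buggy(uint32_t status)
{
	return (status & DEMO_ERRCH_MASK_BUGGY) >> DEMO_ERRCH_SHIFT;
}

/* Fixed: mask the field where it lives, then shift it down. */
static uint32_t demo_errch_fixed(uint32_t status)
{
	return (status & DEMO_ERRCH_MASK_FIXED) >> DEMO_ERRCH_SHIFT;
}
```

For a status word of 0x50007 the buggy form yields 0 and the fixed form yields 5.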

Addresses-Coverity: ("Operands don't affect result")
Fixes: a6286ee630 ("net: Add TI DaVinci EMAC driver")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-20 17:12:18 -07:00
Vadym Kochan
333980481b net: marvell: prestera: fix port event handling on init
For some reason there might be a crash during port creation if port
events are handled at the same time, because the fw may send an initial
port event with down state.

The crash points to cancel_delayed_work(), which is called when the port
goes down. I have not yet found the real cause of the issue, so it is
fixed by cancelling the port stats work only if the port's previous state
was up & running.

The following is the crash which can be triggered:

[   28.311104] Unable to handle kernel paging request at virtual address 000071775f776600
[   28.319097] Mem abort info:
[   28.321914]   ESR = 0x96000004
[   28.324996]   EC = 0x25: DABT (current EL), IL = 32 bits
[   28.330350]   SET = 0, FnV = 0
[   28.333430]   EA = 0, S1PTW = 0
[   28.336597] Data abort info:
[   28.339499]   ISV = 0, ISS = 0x00000004
[   28.343362]   CM = 0, WnR = 0
[   28.346354] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000100bf7000
[   28.352842] [000071775f776600] pgd=0000000000000000, p4d=0000000000000000
[   28.359695] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[   28.365310] Modules linked in: prestera_pci(+) prestera uio_pdrv_genirq
[   28.372005] CPU: 0 PID: 1291 Comm: kworker/0:1H Not tainted 5.11.0-rc4 #1
[   28.378846] Hardware name: DNI AmazonGo1 A7040 board (DT)
[   28.384283] Workqueue: prestera_fw_wq prestera_fw_evt_work_fn [prestera_pci]
[   28.391413] pstate: 60000085 (nZCv daIf -PAN -UAO -TCO BTYPE=--)
[   28.397468] pc : get_work_pool+0x48/0x60
[   28.401442] lr : try_to_grab_pending+0x6c/0x1b0
[   28.406018] sp : ffff80001391bc60
[   28.409358] x29: ffff80001391bc60 x28: 0000000000000000
[   28.414725] x27: ffff000104fc8b40 x26: ffff80001127de88
[   28.420089] x25: 0000000000000000 x24: ffff000106119760
[   28.425452] x23: ffff00010775dd60 x22: ffff00010567e000
[   28.430814] x21: 0000000000000000 x20: ffff80001391bcb0
[   28.436175] x19: ffff00010775deb8 x18: 00000000000000c0
[   28.441537] x17: 0000000000000000 x16: 000000008d9b0e88
[   28.446898] x15: 0000000000000001 x14: 00000000000002ba
[   28.452261] x13: 80a3002c00000002 x12: 00000000000005f4
[   28.457622] x11: 0000000000000030 x10: 000000000000000c
[   28.462985] x9 : 000000000000000c x8 : 0000000000000030
[   28.468346] x7 : ffff800014400000 x6 : ffff000106119758
[   28.473708] x5 : 0000000000000003 x4 : ffff00010775dc60
[   28.479068] x3 : 0000000000000000 x2 : 0000000000000060
[   28.484429] x1 : 000071775f776600 x0 : ffff00010775deb8
[   28.489791] Call trace:
[   28.492259]  get_work_pool+0x48/0x60
[   28.495874]  cancel_delayed_work+0x38/0xb0
[   28.500011]  prestera_port_handle_event+0x90/0xa0 [prestera]
[   28.505743]  prestera_evt_recv+0x98/0xe0 [prestera]
[   28.510683]  prestera_fw_evt_work_fn+0x180/0x228 [prestera_pci]
[   28.516660]  process_one_work+0x1e8/0x360
[   28.520710]  worker_thread+0x44/0x480
[   28.524412]  kthread+0x154/0x160
[   28.527670]  ret_from_fork+0x10/0x38
[   28.531290] Code: a8c17bfd d50323bf d65f03c0 9278dc21 (f9400020)
[   28.537429] ---[ end trace 5eced933df3a080b ]---

Fixes: 501ef3066c ("net: marvell: prestera: Add driver for Prestera family ASIC devices")
Signed-off-by: Vadym Kochan <vkochan@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-20 17:11:30 -07:00
Colin Ian King
55cdc26a91 net: mana: remove redundant initialization of variable err
The variable err is being initialized with a value that is
never read and it is being updated later with a new value.  The
initialization is redundant and can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-20 17:05:19 -07:00
Edward Cree
172e269edf sfc: ef10: fix TX queue lookup in TX event handling
We're starting from a TXQ label, not a TXQ type, so
 efx_channel_get_tx_queue() is inappropriate.  This worked by chance,
 because labels and types currently match on EF10, but we shouldn't
 rely on that.

Fixes: 12804793b1 ("sfc: decouple TXQ type from label")
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-20 17:03:53 -07:00
Edward Cree
83b09a1807 sfc: farch: fix TX queue lookup in TX event handling
We're starting from a TXQ label, not a TXQ type, so
 efx_channel_get_tx_queue() is inappropriate (and could return NULL,
 leading to panics).

Fixes: 12804793b1 ("sfc: decouple TXQ type from label")
Cc: stable@vger.kernel.org
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-20 17:03:53 -07:00
Edward Cree
5b1faa9228 sfc: farch: fix TX queue lookup in TX flush done handling
We're starting from a TXQ instance number ('qid'), not a TXQ type, so
 efx_get_tx_queue() is inappropriate (and could return NULL, leading
 to panics).

Fixes: 12804793b1 ("sfc: decouple TXQ type from label")
Reported-by: Trevor Hemsley <themsley@voiceflex.com>
Cc: stable@vger.kernel.org
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-20 17:03:53 -07:00
Michael Walle
1b8caefaf4 net: enetc: automatically select IERB module
Now that enetc supports flow control we have to make sure the settings in
the IERB are correct. Therefore, we actually depend on the enetc-ierb
module. Previously it was possible that this module was disabled while the
enetc was enabled. Fix it by automatically selecting the enetc-ierb module.

Fixes: e7d48e5fbf ("net: enetc: add a mini driver for the Integrated Endpoint Register Block")
Signed-off-by: Michael Walle <michael@walle.cc>
Acked-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-20 16:56:32 -07:00
Petr Machata
7de85b0431 mlxsw: spectrum_qdisc: Index future FIFOs by band number
mlxsw used to hold an array of qdiscs indexed by the TC number. In the
previous patch, it was changed to allocate child qdiscs dynamically, and
they are now indexed by band number. Follow suit with the array of future
FIFOs.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-20 16:43:13 -07:00
Petr Machata
5cbd960253 mlxsw: spectrum_qdisc: Allocate child qdiscs dynamically
Instead of keeping qdiscs in globally-preallocated arrays, introduce a
per-qdisc-kind value num_classes, and then allocate the necessary child
qdiscs (if any) based on that value. Since now dynamic allocation is
involved, mlxsw_sp_qdisc_replace() gets messy enough that it is worth it to
split it to two cases: a new qdisc allocation and a change of existing
qdisc. (Note that the change also includes what TC formally calls replace,
if the qdisc kind is the same.)

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-20 16:43:13 -07:00
Petr Machata
cff99e2045 mlxsw: spectrum_qdisc: Guard all qdisc accesses with a lock
The FIFO handler currently guards accesses to the future FIFO tracking by
asserting RTNL. In the future, the changes to the qdisc state will be more
thorough, so other qdiscs will need this guarding as well. In order
to not further the RTNL infestation, instead convert to a custom lock that
will guard accesses to the qdisc state.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-20 16:43:13 -07:00
Petr Machata
51d52ed955 mlxsw: spectrum_qdisc: Track children per qdisc
mlxsw currently allows a two-level structure of qdiscs: the root and
possibly a number of children. In order to support offloading more general
qdisc trees, introduce to struct mlxsw_sp_qdisc a pointer to child qdiscs.
Refer to the child qdiscs through this pointer, instead of going through
the tclass_qdiscs in qdisc_state. Additionally introduce a field
num_classes, which holds the number of a given qdisc's children.

Also introduce a generic function for walking qdisc trees. Rewrite
mlxsw_sp_qdisc_find() and _find_by_handle() to use the generic walker.

For now, keep the qdisc_state.tclass_qdisc, and just point root_qdiscs's
children to this array. Following patches will make the allocation dynamic.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-20 16:43:13 -07:00
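The generic tree walker introduced in 51d52ed955 can be pictured with the rough Python model below. It is only a sketch of the idea (a depth-first walk over per-qdisc child slots, with the lookup built on top of it); the class layout, the callback signature and the handle values are assumptions made for illustration, not the driver's actual code.

```python
class Qdisc:
    """Toy stand-in for struct mlxsw_sp_qdisc; the fields are illustrative."""

    def __init__(self, handle, num_classes=0):
        self.handle = handle
        self.num_classes = num_classes
        # One slot per class; None means no child qdisc is installed there.
        self.qdiscs = [None] * num_classes


def walk(qdisc, pre, data):
    """Depth-first walk that returns the first non-None result of pre()."""
    if qdisc is None:
        return None
    found = pre(qdisc, data)
    if found is not None:
        return found
    for child in qdisc.qdiscs:
        found = walk(child, pre, data)
        if found is not None:
            return found
    return None


def find_by_handle(root, handle):
    """Look up a qdisc by handle using the generic walker."""
    return walk(root, lambda q, h: q if q.handle == h else None, handle)
```

Because the lookup is expressed through the walker, it keeps working unchanged once trees deeper than two levels are offloaded.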
Petr Machata
b21832b568 mlxsw: spectrum_qdisc: Promote backlog reduction to mlxsw_sp_qdisc_destroy()
When a qdisc is removed, it is necessary to update the backlog value at its
parent--unless the qdisc is at root position. RED, TBF and FIFO all do
that, each separately. Since all of them need to do this, just promote the
operation directly to mlxsw_sp_qdisc_destroy(), instead of deferring it to
individual destructors. Since FIFO dtor thus becomes trivial, remove it.

Add struct mlxsw_sp_qdisc.parent to point at the parent qdisc. This will be
handy later as deeper structures are offloaded. Use the parent qdisc to
find the chain of parents whose backlog value needs to be updated.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-20 16:43:13 -07:00
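As a minimal sketch of the parent-chain update described in b21832b568, the Python model below subtracts a removed qdisc's backlog from every ancestor. The class and its fields are hypothetical simplifications; the real driver keeps its backlog accounting in its own statistics structures.

```python
class Qdisc:
    """Toy qdisc with a parent pointer and a backlog counter (illustrative)."""

    def __init__(self, parent=None, backlog=0):
        self.parent = parent
        self.backlog = backlog


def destroy(qdisc):
    """Remove qdisc's backlog from each ancestor up to the root."""
    removed = qdisc.backlog
    node = qdisc.parent
    while node is not None:  # a root qdisc has no parent, so nothing to update
        node.backlog -= removed
        node = node.parent
    qdisc.backlog = 0
```

Doing this once in the common destroy path is what lets the individual RED, TBF and FIFO destructors drop their own copies of the logic.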
Petr Machata
017a131cde mlxsw: spectrum_qdisc: Track tclass_num as int, not u8
tclass_num is just a number, a value that would be ordinarily passed around
as an int. (Which is unlike a u8 prio_bitmap.) In several places,
tclass_num already is an int. Convert the remaining instances.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-20 16:43:13 -07:00
Petr Machata
549f2aae84 mlxsw: spectrum_qdisc: Drop an always-true condition
The function mlxsw_sp_qdisc_compare() is invoked a couple lines above this
check, which will bounce any requests where this condition does not hold.
Therefore drop it.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-20 16:43:13 -07:00
Petr Machata
290fe2c595 mlxsw: spectrum_qdisc: Simplify mlxsw_sp_qdisc_compare()
The purpose of this function is to filter out events that are related to
qdiscs that are not offloaded, or are not offloaded anymore. But the
function is unnecessarily thorough:

- mlxsw_sp_qdisc pointer is never NULL in the context where it is called
- Two qdiscs with the same handle will never have different types. Even
  when replacing one qdisc with another in the same class, Linux will not
  permit handle reuse unless the qdisc type also matches.

Simplify the function by omitting these two unnecessary conditions.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-20 16:43:13 -07:00
Petr Machata
17c0e6d175 mlxsw: spectrum_qdisc: Drop one argument from check_params callback
The mlxsw_sp_qdisc argument is not used in any of the actual callbacks.
Drop it.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-20 16:43:13 -07:00
David S. Miller
790aad0ecc korina: Fix build.
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-20 16:40:08 -07:00
David S. Miller
56e2e5de44 korina: Fix conflict with global symbol desc_empty on x86.
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-20 16:24:26 -07:00
David S. Miller
ff254dad0e mlx5-updates-2021-04-19
Merge tag 'mlx5-updates-2021-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2021-04-19

This patchset provides some updates to mlx5e and mlx5 SW steering drivers:

1) Tariq and Vladyslav both provide some trivial updates to mlx5e netdev.

The next 12 patches in the patchset are focused toward mlx5 SW steering:
2) 3 trivial cleanup patches

3) Dynamic Flex parser support:
   Flex parser is a HW parser that can support protocols that are not
    natively supported by the HCA, such as Geneve (TLV options) and GTP-U.
    There are 8 such parsers, and each of them can be assigned to parse a
    specific set of protocols.

4) Enable matching on Geneve TLV options

5) Use Flex parser for MPLS over UDP/GRE

6) Enable matching on tunnel GTP-U and GTP-U first extension
   header using dynamic flex parser

7) Improved QoS for SW steering internal QPair for a better insertion rate
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-20 16:14:02 -07:00
Oleksij Rempel
b62a12fc04 net: ag71xx: make use of generic NET_SELFTESTS library
With this patch the ag71xx on Atheros AR9331 will be able to run generic net
selftests.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-20 16:08:02 -07:00
Oleksij Rempel
6016ba345f net: fec: make use of generic NET_SELFTESTS library
With this patch FEC on iMX will be able to run generic net selftests.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-20 16:08:02 -07:00
Parav Pandit
dedbc2d358 IB/mlx5: Set right RoCE l3 type and roce version while deleting GID
Currently, when a GID is deleted, the driver zeroes out all the fields of the
RoCE address in the SET_ROCE_ADDRESS command for the specified index.

roce_version = 0 means RoCEv1 in the SET_ROCE_ADDRESS command.

This assumes that the device always has RoCEv1 enabled, which is not always
correct. For example, a Subfunction does not support RoCEv1.

Due to this assumption, a previously added RoCEv2 GID is always deleted as a
RoCEv1 GID. This results in the syndrome below:

   mlx5_core.sf mlx5_core.sf.4: mlx5_cmd_check:777:(pid 4256): SET_ROCE_ADDRESS(0x761) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x12822d)

Hence set the right RoCE version during GID deletion provided by the core.

Link: https://lore.kernel.org/r/d3f54129c90ca329caf438dbe31875d8ad08d91a.1618753425.git.leonro@nvidia.com
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-20 09:41:10 -03:00
Yevgeny Kliteynik
aeacb52a8d net/mlx5: DR, Add support for isolate_vl_tc QP
When using SW steering, rule insertion rate depends on the RDMA RC QP
performance used for writing to the ICM. During stress this QP is competing
on the HW resources with all the other QPs that are used to send data.
To protect SW steering QP's performance in such cases, we set this QP to
use isolated VL. The VL number is reserved by FW and is not exposed to the
driver.
Support for this QP on isolated VL exists only when both force-loopback and
isolate_vl_tc capabilities are set.

Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-19 20:17:46 -07:00
Yevgeny Kliteynik
7304d603a5 net/mlx5: DR, Add support for force-loopback QP
When supported by the device, SW steering RoCE RC QP that is used to
write/read to/from ICM will be created with force-loopback attribute.
Such QP doesn't require GID index upon creation.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-19 20:17:43 -07:00
Yevgeny Kliteynik
df9dd15ae1 net/mlx5: DR, Add support for matching tunnel GTP-U
Enable matching on tunnel GTP-U and GTP-U first extension
header using dynamic flex parser.

Signed-off-by: Muhammad Sammar <muhammads@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-19 20:17:40 -07:00
Yevgeny Kliteynik
35ba005d82 net/mlx5: DR, Set flex parser for TNL_MPLS dynamically
Query the flex_parser id that's intended for TNL_MPLS
and use an appropriate flex parser for MPLS over UDP/GRE.

Signed-off-by: Muhammad Sammar <muhammads@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-19 20:17:37 -07:00
Yevgeny Kliteynik
3442e0335e net/mlx5: DR, Add support for matching on geneve TLV option
Enable matching on tunnel geneve TLV option using the flex parser.

Signed-off-by: Muhammad Sammar <muhammads@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-19 20:17:34 -07:00
Yevgeny Kliteynik
4923938d2f net/mlx5: DR, Set STEv0 ICMP flex parser dynamically
Set the flex parser ID dynamically for ICMP instead of relying
on hardcoded values.

Signed-off-by: Muhammad Sammar <muhammads@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-19 20:17:31 -07:00
Yevgeny Kliteynik
160e9cb37a net/mlx5: DR, Add support for dynamic flex parser
Flex parser is a HW parser that can support protocols that are not
natively supported by the HCA, such as Geneve (TLV options) and GTP-U.
There are 8 such parsers, and each of them can be assigned to parse a
specific set of protocols.
This patch adds misc4 match params which allows using a correct flex parser
that was programmed to the required protocol.

Signed-off-by: Muhammad Sammar <muhammads@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-19 20:17:28 -07:00
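To illustrate the idea behind 160e9cb37a, the sketch below looks up which of the 8 flex parsers was programmed for a protocol instead of hardcoding a parser number. The assignment table and the protocol names are invented for the example; in the driver the assignment is queried from the device.

```python
NUM_FLEX_PARSERS = 8  # "There are 8 such parsers" (from the commit message)

# Hypothetical assignment of protocols to parser IDs, for illustration only.
EXAMPLE_ASSIGNMENTS = {
    1: {"geneve_tlv_option"},
    3: {"mpls_over_udp", "mpls_over_gre"},
    4: {"gtpu"},
}


def find_flex_parser(protocol, assignments):
    """Return the ID of the parser programmed for protocol, or None."""
    for parser_id in range(NUM_FLEX_PARSERS):
        if protocol in assignments.get(parser_id, ()):
            return parser_id
    return None
```

A matcher that needs GTP-U would call `find_flex_parser("gtpu", EXAMPLE_ASSIGNMENTS)` and build its match on whichever parser is returned, rather than assuming a fixed one.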
Muhammad Sammar
323b91acc1 net/mlx5: DR, Remove protocol-specific flex_parser_3 definitions
Remove MPLS specific fields from flex parser 3 layout.
Flex parser can be used for multiple protocols and should
not be hardcoded to a specific type.

Signed-off-by: Muhammad Sammar <muhammads@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-19 20:17:24 -07:00
Yevgeny Kliteynik
25cb317680 net/mlx5: E-Switch, Improve error messages in term table creation
Add the error code to the error messages and remove a duplicated message:
if termination table creation failed, we already get an error message
in mlx5_eswitch_termtbl_create, so no need for the additional error print
in the calling function.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-19 20:17:18 -07:00
Yevgeny Kliteynik
ff1925bb0d net/mlx5: DR, Fix SQ/RQ in doorbell bitmask
QP doorbell size is 16 bits.
Fix SW steering's QP doorbell bitmask, which had 20 bits.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-19 20:17:15 -07:00
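The effect of the mask width in ff1925bb0d can be shown with a small arithmetic sketch. The function name and the sample counter value are made up for the example; only the 16-bit and 20-bit widths come from the commit message.

```python
DOORBELL_MASK_16 = (1 << 16) - 1  # 0xffff, matches the 16-bit doorbell
DOORBELL_MASK_20 = (1 << 20) - 1  # 0xfffff, the too-wide mask being fixed


def doorbell_value(counter, mask=DOORBELL_MASK_16):
    """Truncate a free-running counter to the width the doorbell expects."""
    return counter & mask
```

With the 20-bit mask, a counter that has wrapped past 16 bits (for example 0x10005) leaks its upper bits into the written value, while the 16-bit mask yields 0x0005.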
Yevgeny Kliteynik
7d22ad732d net/mlx5: DR, Rename an argument in dr_rdma_segments
Rename the argument to better reflect that its meaning is
not the number of records, but whether or not we should
ring the doorbell.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-19 20:17:12 -07:00
Tariq Toukan
6980ffa0c5 net/mlx5e: RX, Add checks for calculated Striding RQ attributes
Striding RQ attributes below are mutually dependent. An unaware
change to one might take the others out of the valid range derived
by the HW caps:
- The MPWQE size in bytes
- The number of strides in a MPWQE
- The stride size

Add checks to verify they are valid and comply with the HW spec
and SW assumptions/requirements.
This is not a fix, no particular issue exists today.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-19 20:17:09 -07:00
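The mutual dependency checked in 6980ffa0c5 can be sketched as follows, assuming the MPWQE size equals the number of strides multiplied by the stride size, so that the log2 values add up. The function, the dictionary keys and the limit values are hypothetical and only serve to exercise the checks; the real ranges are derived from the HW caps.

```python
# Hypothetical (min, max) log2 limits, chosen only for illustration.
EXAMPLE_CAPS = {"log_stride_sz": (6, 13), "log_num_strides": (9, 16)}


def striding_rq_valid(log_stride_sz, log_num_strides, log_wqe_sz, caps):
    """Return True if the three attributes are consistent and in range."""
    # Assumed relation: wqe_size == num_strides * stride_size.
    if log_stride_sz + log_num_strides != log_wqe_sz:
        return False
    lo, hi = caps["log_stride_sz"]
    if not lo <= log_stride_sz <= hi:
        return False
    lo, hi = caps["log_num_strides"]
    return lo <= log_num_strides <= hi
```

The point of the sketch is that changing one value (say, the stride size) without adjusting the others fails either the consistency check or a range check.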
Vladyslav Tarasiuk
6a5689ba02 net/mlx5e: Fix possible non-initialized struct usage
If mlx5e_devlink_port_register() fails, driver may try to register
devlink health TX and RX reporters on non-registered devlink port.

Instead, create health reporters only if mlx5e_devlink_port_register()
does not fail. And destroy reporters only if devlink_port is registered.

Also, change mlx5e_get_devlink_port() behavior and return NULL in case
port is not registered to replicate devlink's wrapper when ndo is not
implemented.

Signed-off-by: Vladyslav Tarasiuk <vladyslavt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-19 20:17:06 -07:00
Tariq Toukan
d408c01cae net/mlx5e: Fix lost changes during code movements
The changes done in commit [1] were missed by the code movements
done in [2], as they were developed in ~parallel.
Here we re-apply them.

[1] commit e4484d9df5 ("net/mlx5e: Enable striding RQ for Connect-X IPsec capable devices")
[2] commit b3a131c2a1 ("net/mlx5e: Move params logic into its dedicated file")

Fixes: b3a131c2a1 ("net/mlx5e: Move params logic into its dedicated file")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-19 20:17:03 -07:00
Jakub Kicinski
37434782d6 bnxt: add more ethtool standard stats
Michael suggested a few more stats we can expose.

$ ethtool -S eth0 --groups eth-mac
Standard stats for eth0:
eth-mac-FramesTransmittedOK: 902623288966
eth-mac-FramesReceivedOK: 28727667047
eth-mac-FrameCheckSequenceErrors: 1
eth-mac-AlignmentErrors: 0
eth-mac-OutOfRangeLengthField: 0
$ ethtool -S eth0 | grep '\(fcs\|align\|oor\)'
     rx_fcs_err_frames: 1
     rx_align_err_frames: 0
     tx_fcs_err_frames: 0

Suggested-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-19 16:22:23 -07:00
Vadym Kochan
ced97eea39 net: marvell: prestera: add support for AC3X 98DX3265 device
Add PCI match for AC3X 98DX3265 device which is supported by the current
driver and firmware.

Signed-off-by: Vadym Kochan <vkochan@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-19 16:13:58 -07:00
Wong Vee Khee
d7f576dc98 net: stmmac: fix memory leak during driver probe
On driver probe, kmemleak reported the following memory leak which was
due to allocated bitmap that was not being freed in stmmac_dvr_probe().

unreferenced object 0xffff9276014b13c0 (size 8):
  comm "systemd-udevd", pid 2143, jiffies 4294681112 (age 116.720s)
  hex dump (first 8 bytes):
    00 00 00 00 00 00 00 00                          ........
  backtrace:
    [<00000000c51e34b2>] stmmac_dvr_probe+0x1c0/0x440 [stmmac]
    [<00000000b530eb41>] intel_eth_pci_probe.cold+0x2b/0x14e [dwmac_intel]
    [<00000000b10f8929>] pci_device_probe+0xd2/0x150
    [<00000000fb254c74>] really_probe+0xf8/0x410
    [<0000000034128a59>] driver_probe_device+0x5d/0x150
    [<00000000016104d5>] device_driver_attach+0x53/0x60
    [<00000000cb18cd07>] __driver_attach+0x96/0x140
    [<00000000da9ffd5c>] bus_for_each_dev+0x7a/0xc0
    [<00000000af061a88>] bus_add_driver+0x184/0x1f0
    [<000000008be5c1c5>] driver_register+0x6c/0xc0
    [<0000000052b18a9e>] do_one_initcall+0x4d/0x210
    [<00000000154d4f07>] do_init_module+0x5c/0x230
    [<000000009b648d09>] load_module+0x2a5a/0x2d40
    [<000000000d86b76d>] __do_sys_finit_module+0xb5/0x120
    [<000000002b0cef95>] do_syscall_64+0x33/0x40
    [<0000000067b45bbb>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: bba2556efa ("net: stmmac: Enable RX via AF_XDP zero-copy")
Cc: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: Wong Vee Khee <vee.khee.wong@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-19 16:02:27 -07:00
Thomas Bogendoerfer
6ef92063bf net: korina: Make driver COMPILE_TESTable
Move structs/defines for ethernet/dma register into driver, since they
are only used for this driver and remove any MIPS specific includes.
This makes it possible to COMPILE_TEST the driver.

Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-19 15:58:14 -07:00
Thomas Bogendoerfer
e4cd854ec4 net: korina: Get mdio input clock via common clock framework
With device tree, the clock is provided via CCF. For non-device-tree
setups, use a maximum clock value so as not to overclock the PHY. The non
device tree usage will go away after the platform is converted to DT.

Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-19 15:58:14 -07:00
Thomas Bogendoerfer
10b26f0781 net: korina: Add support for device tree
If there is no mac address passed via platform data try to get it via
device tree and fall back to a random mac address, if all fail.

Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-19 15:58:14 -07:00
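The fallback order described in 10b26f0781 (platform data, then device tree, then a random address) can be sketched as below. The function names are hypothetical, and the random address follows the usual convention of clearing the multicast bit and setting the locally administered bit, which is an assumption about how the fallback address is formed.

```python
import os


def random_mac():
    """Generate a random unicast, locally administered MAC address."""
    addr = bytearray(os.urandom(6))
    addr[0] &= 0xFE  # clear the multicast bit
    addr[0] |= 0x02  # set the locally administered bit
    return bytes(addr)


def pick_mac(platform_mac=None, dt_mac=None):
    """Prefer platform data, then device tree, then a random address."""
    for candidate in (platform_mac, dt_mac):
        if candidate is not None:
            return candidate
    return random_mac()
```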
Thomas Bogendoerfer
af80425e05 net: korina: Only pass mac address via platform data
Get rid of access to struct korina_device by just passing the mac
address via platform data and use drvdata for passing netdev to remove
function.

Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-19 15:58:14 -07:00
Thomas Bogendoerfer
0fc96939a9 net: korina: Use DMA API
Instead of messing with MIPS specific macros use DMA API for mapping
descriptors and skbs.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-19 15:58:14 -07:00
Thomas Bogendoerfer
0fe632471a net: korina: Remove nested helpers
Remove helpers, which are only used in one call site.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-19 15:58:14 -07:00
Thomas Bogendoerfer
e42f10533d net: korina: Remove not needed cache flushes
Descriptors are mapped uncached so there is no need to do any cache
handling for them.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-19 15:58:14 -07:00
Thomas Bogendoerfer
b4cd249a8c net: korina: Use devres functions
Simplify probe/remove code by using devm_ functions.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-19 15:58:14 -07:00
Thomas Bogendoerfer
89f9d5400b net: korina: Fix MDIO functions
Fix MDIO functions to work reliably and not just by accident.

Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-19 15:58:14 -07:00
Pablo Neira Ayuso
f5c2cb583a net: ethernet: mtk_eth_soc: handle VLAN pop action
Do not hit EOPNOTSUPP when flowtable offload provides a VLAN pop action.

Fixes: efce49dfe6 ("netfilter: flowtable: add vlan pop action offload support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-19 15:55:05 -07:00
Pablo Neira Ayuso
014d029876 net: ethernet: mtk_eth_soc: missing mutex
Patch 2ed37183ab ("netfilter: flowtable: separate replace, destroy and
stats to different workqueues") splits the workqueue per event type. Add
a mutex to serialize updates.

Fixes: 502e84e238 ("net: ethernet: mtk_eth_soc: add flow offloading support")
Reported-by: Frank Wunderlich <frank-w@public-files.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-19 15:55:05 -07:00
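The serialization added in 014d029876 can be modelled roughly as below: replace, destroy and stats requests may arrive from different workers, so each one takes the same lock before touching the shared table. This is a user-space Python analogy with invented names, not the driver's data structures.

```python
import threading


class FlowTable:
    """Toy offload table whose updates are serialized by a single mutex."""

    def __init__(self):
        self._lock = threading.Lock()
        self.entries = {}

    def replace(self, cookie, entry):
        with self._lock:
            self.entries[cookie] = entry

    def destroy(self, cookie):
        with self._lock:
            self.entries.pop(cookie, None)

    def stats(self, cookie):
        with self._lock:
            return self.entries.get(cookie)
```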
Pablo Neira Ayuso
0e389028ad net: ethernet: mtk_eth_soc: fix undefined reference to `dsa_port_from_netdev'
Caused by:

 CONFIG_NET_DSA=m
 CONFIG_NET_MEDIATEK_SOC=y

mtk_ppe_offload.c:undefined reference to `dsa_port_from_netdev'

Fixes: 502e84e238 ("net: ethernet: mtk_eth_soc: add flow offloading support")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-19 15:55:04 -07:00
Linus Walleij
8d892d6094 net: ethernet: ixp4xx: Set the DMA masks explicitly
The former fix only papered over the actual problem: the
ethernet core expects the netdev .dev member to have the
proper DMA masks set, or there will be BUG_ON() triggered
in kernel/dma/mapping.c.

Fix this by simply copying dma_mask and dma_mask_coherent
from the parent device.

Fixes: e45d0fad4a ("net: ethernet: ixp4xx: Use parent dev for DMA pool")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-19 15:47:56 -07:00
DENG Qingfang
6ecaf81d4a net: ethernet: mediatek: fix a typo bug in flow offloading
Issue was traffic problems after a while with increased ping times if
flow offload is active. It turns out that key_offset with cookie is
needed in rhashtable_params but was re-assigned to head_offset.
Fix the assignment.

Fixes: 502e84e238 ("net: ethernet: mtk_eth_soc: add flow offloading support")
Signed-off-by: DENG Qingfang <dqfext@gmail.com>
Tested-by: Frank Wunderlich <frank-w@public-files.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-19 15:39:52 -07:00
Huazhong Tan
e407efdd94 net: hns3: change the value of the SEPARATOR_VALUE macro in hclgevf_main.c
The SEPARATOR_VALUE macro is used as separator when getting
the register value, but the value of this macro is different
between pf and vf, which is a bit confusing for the user, so
synchronize the value of vf with pf.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-19 15:38:05 -07:00
Huazhong Tan
8ed64dbe0b net: hns3: cleanup inappropriate spaces in struct hlcgevf_tqp_stats
Modify some inappropriate spaces in comments of struct
hlcgevf_tqp_stats.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-19 15:38:05 -07:00
Huazhong Tan
1c5a2ba679 net: hns3: remove a duplicate pf reset counting
When entering suspend mode, the counter of pf reset will be increased
twice, since both hclge_prepare_general() and hclge_prepare_wait()
increase this counter. So remove the duplicate counting in
hclge_prepare_general().

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-19 15:38:05 -07:00
Randy Dunlap
46fd447161 net: xilinx: drivers need/depend on HAS_IOMEM
kernel test robot reports build errors in 3 Xilinx ethernet drivers.
They all use ioremap functions that are only available when HAS_IOMEM
is set/enabled. If it is not enabled, they all have build errors,
so make these 3 drivers depend on HAS_IOMEM.

ld: drivers/net/ethernet/xilinx/xilinx_emaclite.o: in function `xemaclite_of_probe':
xilinx_emaclite.c:(.text+0x9fc): undefined reference to `devm_ioremap_resource'

ld: drivers/net/ethernet/xilinx/xilinx_axienet_main.o: in function `axienet_probe':
xilinx_axienet_main.c:(.text+0x942): undefined reference to `devm_ioremap_resource'

ld: drivers/net/ethernet/xilinx/ll_temac_main.o: in function `temac_probe':
ll_temac_main.c:(.text+0x1283): undefined reference to `devm_platform_ioremap_resource_byname'
ld: ll_temac_main.c:(.text+0x13ad): undefined reference to `devm_of_iomap'
ld: ll_temac_main.c:(.text+0x162e): undefined reference to `devm_platform_ioremap_resource'

Fixes: 8a3b7a252d ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Cc: Gary Guo <gary@garyguo.net>
Cc: Zhang Changzhong <zhangchangzhong@huawei.com>
Cc: Andre Przywara <andre.przywara@arm.com>
Cc: stable@vger.kernel.org
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-19 15:36:00 -07:00
Vladimir Oltean
a864888788 net: enetc: add support for flow control
In the ENETC receive path, a frame received by the MAC is first stored
in a 256KB 'FIFO' memory, then transferred to DRAM when enqueuing it to
the RX ring. The FIFO is a shared resource for all ENETC ports, but
every port keeps track of its own memory utilization, on RX and on TX.

There is a setting for RX rings through which they can either operate in
'lossy' mode (where the lack of a free buffer causes an immediate
discard of the frame) or in 'lossless' mode (where the lack of a free
buffer in the ring makes the frame stay longer in the FIFO).

In turn, when the memory utilization of the FIFO exceeds a certain
margin, the MAC can be configured to emit PAUSE frames.

There is enough FIFO memory to buffer up to 3 MTU-sized frames per RX
port while not jeopardizing the other use cases (jumbo frames), and
also not consume bytes from the port TX allocations. Also, 3 MTU-sized
frames worth of memory is enough to ensure zero loss for 64 byte packets
at 1G line rate.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-19 15:31:45 -07:00
Vladimir Oltean
e7d48e5fbf net: enetc: add a mini driver for the Integrated Endpoint Register Block
The NXP ENETC is a 4-port Ethernet controller which 'smells' to
operating systems like 4 distinct PCIe PFs with SR-IOV, each PF having
its own driver instance, but in fact there are some hardware resources
which are shared between all ports, like for example the 256 KB SRAM
FIFO between the MACs and the Host Transfer Agent which DMAs frames to
DRAM.

To hide the stuff that cannot be neatly exposed per port, the hardware
designers came up with this idea of having a dedicated register block
which is supposed to be populated by the bootloader, and contains
everything configuration-related: MAC addresses, FIFO partitioning, etc.

When a port is reset using PCIe Function Level Reset, its defaults are
transferred from the IERB configuration. Most of the time, the settings
made through the IERB are read-only in the port's memory space (if they
are even visible), so they cannot be modified at runtime.

Linux doesn't have any advanced FIFO partitioning requirements at all,
but when reading through the hardware manual, it became clear that, even
though there are many good 'recommendations' for default values, many of
them were not actually put in practice on LS1028A. So we end up with a
default configuration that:

(a) does not have enough TX and RX byte credits to support the max MTU
    of 9600 (which the Linux driver claims already) properly (at full speed)
(b) allows the FIFO to be overrun with RX traffic, potentially
    overwriting internal data structures.

The last part sounds a bit catastrophic, but it isn't. Frames are
supposed to transit the FIFO for a very short time, but they can
actually accumulate there under 2 conditions:

(a) there is very severe congestion on DRAM memory, or
(b) the RX rings visible to the operating system were configured for
    lossless operation, and they just ran out of free buffers to copy
    the frame to. This is what is used to put backpressure onto the MAC
    with flow control.

So since ENETC has not supported flow control thus far, RX FIFO overruns
were never seen with Linux. But with the addition of flow control, we
should configure some registers to prevent this from happening. What we
are trying to protect against are bad actors which continue to send us
traffic despite the fact that we have signaled a PAUSE condition. Of
course we can't be lossless in that case, but it is best to configure
the FIFO to do tail dropping rather than letting it overrun.

So in a nutshell, this driver is a fixup for all the IERB default values
that should have been but aren't.

The IERB configuration needs to be done _before_ the PFs are enabled.
So every PF searches for the presence of the "fsl,ls1028a-enetc-ierb"
node in the device tree, and if it finds it, it "registers" with the
IERB, which means that it requests the IERB to fix up its default
values. This is done through -EPROBE_DEFER. The IERB driver is part of
the fsl_enetc module, but is technically a platform driver, since the
IERB is a good old fashioned MMIO region, as opposed to ENETC ports
which pretend to be PCIe devices.

The driver was already configuring ENETC_PTXMBAR (FIFO allocation for
TX) because due to an omission, TXMBAR is a read/write register in the
PF memory space. But the manual is quite clear that the formula for this
should depend upon the TX byte credits (TXBCR). In turn, the TX byte
credits are only readable/writable through the IERB. So if we want to
ensure that the TXBCR register also has a value that is correct and in
line with TXMBAR, there is simply no way this can be done from the PF
driver, access to the IERB is needed.

I could have modified U-Boot to fix up the IERB values, but that is
quite undesirable, as old U-Boot versions are likely to be floating
around for quite some time from now.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-19 15:31:45 -07:00
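The probe ordering described in e7d48e5fbf can be modelled with the sketch below: a PF that finds the IERB node tries to register with it, and gets -EPROBE_DEFER back until the IERB driver itself has probed. The classes and method names are hypothetical; only the use of -EPROBE_DEFER and the "register with the IERB" step come from the commit message.

```python
EPROBE_DEFER = 517  # kernel-internal errno value for deferred probing


class Ierb:
    """Toy model of the IERB platform driver."""

    def __init__(self):
        self.probed = False
        self.registered_pfs = []

    def probe(self):
        self.probed = True

    def register_pf(self, pf):
        if not self.probed:
            return -EPROBE_DEFER  # ask the driver core to retry later
        # This is where the default values for this PF would be fixed up.
        self.registered_pfs.append(pf)
        return 0


def probe_pf(pf, ierb):
    """Probe a PF; ierb is None when no IERB node exists in the device tree."""
    if ierb is not None:
        err = ierb.register_pf(pf)
        if err:
            return err
    return 0
```

A deferred PF is simply probed again later, by which time the IERB has had a chance to apply its fix-ups before the PF is enabled.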
Vladimir Oltean
87614b931c net: enetc: create a common enetc_pf_to_port helper
Even though ENETC interfaces are exposed as individual PCIe PFs with
their own driver instances, the ENETC is still fundamentally a
multi-port Ethernet controller, and some parts of the IP take a port
number (as can be seen in the PSFP implementation).

Create a common helper that can be used outside of the TSN code for
retrieving the ENETC port number based on the PF number. This is only
correct for LS1028A, the only Linux-capable instantiation of ENETC thus
far.

Note that ENETC port 3 is PF 6. The TSN code did not care about this
because ENETC port 3 does not support TSN, so the wrong mapping done by
enetc_get_port for PF 6 could have never been hit.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-19 15:31:45 -07:00
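A helper of the kind described in 87614b931c might look like the sketch below. Only the "ENETC port 3 is PF 6" entry is stated in the commit message; the identity mapping for PFs 0 to 2 and the -1 return value for other PFs are assumptions made for the example.

```python
# Only the PF 6 -> port 3 entry comes from the commit message; the
# entries for PFs 0-2 are assumed for illustration.
PF_TO_PORT = {0: 0, 1: 1, 2: 2, 6: 3}


def pf_to_port(pf_number):
    """Return the ENETC port for a PF number, or -1 if it is not a port."""
    return PF_TO_PORT.get(pf_number, -1)
```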
Dexuan Cui
ca9c54d2d6 net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)
Add a VF driver for Microsoft Azure Network Adapter (MANA) that will be
available in the future.

Co-developed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Co-developed-by: Shachar Raindel <shacharr@microsoft.com>
Signed-off-by: Shachar Raindel <shacharr@microsoft.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-19 15:24:25 -07:00
Leon Romanovsky
bcf9ee0520 RDMA/bnxt_re: Create direct symbol link between bnxt modules
Convert indirect probe call to its direct equivalent to create a symbol
link between RDMA and netdev modules. This will give us an ability to
remove custom module reference counting that doesn't belong to the driver.

Link: https://lore.kernel.org/r/20210401065715.565226-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Acked-By: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-19 14:57:03 -03:00
Heiner Kallweit
11ac4e668a r8169: keep pause settings on interface down/up cycle
Currently, if the user changes the pause settings, the default settings
will be restored after an interface down/up cycle, and also when
resuming from suspend. This doesn't seem to provide the best user
experience. Change this to keep user settings, and just ensure that in
jumbo mode pause is disabled.
Small drawback: When switching back mtu from jumbo to non-jumbo then
pause remains disabled (but user can enable it using ethtool).
I think that's a not too common scenario and acceptable.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-17 14:02:48 -07:00
Jakub Kicinski
8203c7ce4e Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
 - keep the ZC code, drop the code related to reinit
net/bridge/netfilter/ebtables.c
 - fix build after move to net_generic

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-04-17 11:08:07 -07:00
Vladimir Oltean
24e3930971 net: enetc: apply the MDIO workaround for XDP_REDIRECT too
Described in fd5736bf9f ("enetc: Workaround for MDIO register access
issue") is a workaround for a hardware bug that requires a register
access of the MDIO controller to never happen concurrently with a
register access of a port PF. To avoid that, a mutual exclusion scheme
with rwlocks was implemented - the port PF accessors are the 'read'
side, and the MDIO accessors are the 'write' side.

When we do XDP_REDIRECT between two ENETC interfaces, all is fine
because the MDIO lock is already taken from the NAPI poll loop.

But when the ingress interface is not ENETC, just the egress is, the
MDIO lock is not taken, so we might access the port PF registers
concurrently with MDIO, which will make the link flap due to wrong
values returned from the PHY.

To avoid this, let's just slap an enetc_lock_mdio/enetc_unlock_mdio at
the beginning and ending of enetc_xdp_xmit. The fact that the MDIO lock
is designed as a rwlock is important here, because the read side is
reentrant (that is one of the main reasons why we chose it). Usually,
the way we benefit from its reentrancy is by running the data path
concurrently on both CPUs, but in this case, we benefit from the
reentrancy by taking the lock even when the lock is already taken
(and that's the situation where ENETC is both the ingress and the egress
interface for XDP_REDIRECT, which was fine before and still is fine now).

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16 17:08:40 -07:00
Vladimir Oltean
92ff9a6e57 net: enetc: fix buffer leaks with XDP_TX enqueue rejections
If the TX ring is congested, enetc_xdp_tx() returns false for the
current XDP frame (represented as an array of software BDs).

This array of software TX BDs is constructed in enetc_rx_swbd_to_xdp_tx_swbd
from software BDs freshly cleaned from the RX ring. The issue is that we
scrub the RX software BDs too soon, more precisely before we know that
we can enqueue the TX BDs successfully into the TX ring.

If we can't enqueue them (and enetc_xdp_tx returns false), we call
enetc_xdp_drop which attempts to recycle the buffers held by the RX
software BDs. But because we scrubbed those RX BDs already, two things
happen:

(a) we leak their memory
(b) we populate the RX software BD ring with an all-zero rx_swbd
    structure, which makes the buffer refill path allocate more memory.

enetc_refill_rx_ring
-> if (unlikely(!rx_swbd->page))
   -> enetc_new_page

That is a recipe for fast OOM.

Fixes: 7ed2bc8007 ("net: enetc: add support for XDP_TX")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16 17:08:40 -07:00
Vladimir Oltean
975acc833c net: enetc: handle the invalid XDP action the same way as XDP_DROP
When the XDP program returns an invalid action, we should free the RX
buffer.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16 17:08:40 -07:00
Vladimir Oltean
7eab503b11 net: enetc: use dedicated TX rings for XDP
It is possible for one CPU to perform TX hashing (see netdev_pick_tx)
between the 8 ENETC TX rings, and the TX hashing to select TX queue 1.

At the same time, it is possible for the other CPU to already use TX
ring 1 for XDP (either XDP_TX or XDP_REDIRECT). Since there is no mutual
exclusion between XDP and the network stack, we run into an issue
because the ENETC TX procedure is not reentrant.

The obvious approach would be to just make XDP take the lock of the
network stack's TX queue corresponding to the ring it's about to enqueue
in.

For XDP_REDIRECT, this is quite straightforward, a lock at the beginning
and end of enetc_xdp_xmit() should do the trick.

But for XDP_TX, it's a bit more complicated. For one, we do TX batching
all by ourselves for frames with the XDP_TX verdict. This is something
we would like to keep the way it is, for performance reasons. But
batching means that the network stack's lock should be kept from the
first enqueued XDP_TX frame and until we ring the doorbell. That is
mostly fine, except for cases when in the same NAPI loop we have mixed
XDP_TX and XDP_REDIRECT frames. So if enetc_xdp_xmit() gets called while
we are holding the lock from the RX NAPI, then bam, deadlock. The naive
answer could be 'just flush the XDP_TX frames first, then release the
network stack's TX queue lock, then call xdp_do_flush_map()'. But even
xdp_do_redirect() is capable of flushing the batched XDP_REDIRECT
frames, so unless we unlock/relock the TX queue around xdp_do_redirect(),
there simply isn't any clean way to protect XDP_TX from concurrent
network stack .ndo_start_xmit() on another CPU.

So we need to take a different approach, and that is to reserve two
rings for the sole use of XDP. We leave TX rings
0..ndev->real_num_tx_queues-1 to be handled by the network stack, and we
pick the XDP rings from the end of the priv->tx_ring array.

We make an effort to keep the mapping done by enetc_alloc_msix() which
decides which CPU handles the TX completions of which TX ring in its
NAPI poll. So the XDP TX ring of CPU 0 is handled by TX ring 6, and the
XDP TX ring of CPU 1 is handled by TX ring 7.
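
The index arithmetic behind this can be sketched as follows. The helper
names are hypothetical and this is plain user-space C, not the driver
code; it only assumes that the last rings of the array are set aside for
XDP, one per CPU, as described above.

```c
#include <assert.h>

/* Index of the TX ring reserved for XDP on a given CPU, when the last
 * num_xdp_rings entries of the TX ring array are set aside for XDP.
 */
static int xdp_tx_ring_index(int num_tx_rings, int num_xdp_rings, int cpu)
{
	assert(num_xdp_rings > 0 && num_xdp_rings <= num_tx_rings);
	assert(cpu >= 0 && cpu < num_xdp_rings);

	return num_tx_rings - num_xdp_rings + cpu;
}

/* Number of TX rings left to the network stack (rings 0..result-1). */
static int stack_tx_rings(int num_tx_rings, int num_xdp_rings)
{
	assert(num_xdp_rings >= 0 && num_xdp_rings <= num_tx_rings);

	return num_tx_rings - num_xdp_rings;
}
```

With 8 TX rings and 2 reserved rings, CPU 0 maps to ring 6 and CPU 1 to
ring 7, and the network stack keeps rings 0..5.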

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16 17:08:40 -07:00
Vladimir Oltean
ee3e875f10 net: enetc: increase TX ring size
Now that commit d6a2829e82 ("net: enetc: increase RX ring default
size") has increased the RX ring size, it is quite easy to congest the
TX rings when the traffic is predominantly XDP_TX, as the RX ring is
quite a bit larger than the TX one.

Since we bit the bullet and did the expensive thing already (larger RX
rings consume more memory pages), it seems quite foolish to keep the TX
rings small. So make them equally sized with RX.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16 17:08:39 -07:00
Vladimir Oltean
a6369fe6e0 net: enetc: remove unneeded xdp_do_flush_map()
xdp_do_redirect already contains:
-> dev_map_enqueue
   -> __xdp_enqueue
      -> bq_enqueue
         -> bq_xmit_all // if we have more than 16 frames

So the logic from enetc will never be hit, because ENETC_DEFAULT_TX_WORK
is 128.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16 17:08:39 -07:00
Vladimir Oltean
8f50d8bb3f net: enetc: stop XDP NAPI processing when build_skb() fails
When the code path below fails:

enetc_clean_rx_ring_xdp // XDP_PASS
-> enetc_build_skb
   -> enetc_map_rx_buff_to_skb
      -> build_skb

enetc_clean_rx_ring_xdp will 'break', but that 'break' instruction isn't
strong enough to actually break the NAPI poll loop, just the switch/case
statement for XDP actions. So we increment rx_frm_cnt and go to the next
frames minding our own business.

Instead let's do what the skb NAPI poll function does, and break the
loop now, waiting for the memory pressure to go away. Otherwise the next
calls to build_skb() are likely to fail too.
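
The C pitfall is easy to reproduce outside the driver. The sketch below is
a simplified stand-in with hypothetical names (build_ok() simulates a
build_skb() failure on one frame); it is not the ENETC code.

```c
#include <assert.h>
#include <stdbool.h>

enum verdict { VERDICT_PASS, VERDICT_DROP };

/* Simulated skb construction: fails only for frame number fail_at. */
static bool build_ok(int frame, int fail_at)
{
	return frame != fail_at;
}

/* 'break' leaves only the switch, so the loop keeps counting frames. */
static int poll_break_only(int budget, int fail_at)
{
	int rx_frm_cnt = 0;

	assert(budget >= 0);
	while (rx_frm_cnt < budget) {
		enum verdict v = VERDICT_PASS;

		switch (v) {
		case VERDICT_PASS:
			if (!build_ok(rx_frm_cnt, fail_at))
				break;	/* exits the switch, not the loop */
			break;
		case VERDICT_DROP:
			break;
		}
		rx_frm_cnt++;
	}
	return rx_frm_cnt;
}

/* Jumping past the loop stops processing at the failing frame. */
static int poll_stop_on_failure(int budget, int fail_at)
{
	int rx_frm_cnt = 0;

	assert(budget >= 0);
	while (rx_frm_cnt < budget) {
		enum verdict v = VERDICT_PASS;

		switch (v) {
		case VERDICT_PASS:
			if (!build_ok(rx_frm_cnt, fail_at))
				goto out;	/* leaves the loop */
			break;
		case VERDICT_DROP:
			break;
		}
		rx_frm_cnt++;
	}
out:
	return rx_frm_cnt;
}
```

With a budget of 8 and a failure on frame 3, the first variant still
reports 8 frames, while the second stops at 3.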

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16 17:08:39 -07:00
Vladimir Oltean
672f9a2198 net: enetc: recycle buffers for frames with RX errors
When receiving a frame with errors, currently we do nothing with it (we
don't construct an skb or an xdp_buff), we just exit the NAPI poll loop.

Let's put the buffer back into the RX ring (similar to XDP_DROP).

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16 17:08:39 -07:00
Vladimir Oltean
6b04830d5e net: enetc: rename the buffer reuse helpers
enetc_put_xdp_buff has nothing to do with XDP, frankly, it is just a
helper to populate the recycle end of the shadow RX BD ring
(next_to_alloc) with a given buffer.

On the other hand, enetc_put_rx_buff plays more tricks than its name
would suggest.

So let's rename enetc_put_rx_buff into enetc_flip_rx_buff to reflect the
half-page buffer reuse tricks that it employs, and enetc_put_xdp_buff
into enetc_put_rx_buff which suggests a more garden-variety operation.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16 17:08:39 -07:00
Vladimir Oltean
e9e49ae88e net: enetc: remove redundant clearing of skb/xdp_frame pointer in TX conf path
Later in enetc_clean_tx_ring we have:

		/* Scrub the swbd here so we don't have to do that
		 * when we reuse it during xmit
		 */
		memset(tx_swbd, 0, sizeof(*tx_swbd));

So these assignments are unnecessary.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16 17:08:39 -07:00
David S. Miller
bc45f524d9 Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:

====================
1GbE Intel Wired LAN Driver Updates 2021-04-16

This series contains updates to igb and igc drivers.

Ederson adjusts Tx buffer distributions in Qav mode to improve
TSN-aware traffic for igb. He also enables PPS support and auxiliary PHC
functions for igc.

Grzegorz checks that the MTA register was properly written and
retries if not for igb.

Sasha adds reporting of EEE low power idle counters to ethtool and fixes
a return value being overwritten through looping for igc.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16 17:06:14 -07:00
Jakub Kicinski
b572ec9ff0 mlx5: implement ethtool standard stats
Add support for PHY/MAC/Ctrl/RMON stats.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16 16:59:47 -07:00
Jakub Kicinski
782bc00aff bnxt: implement ethtool standard stats
Most of the names seem to strongly correlate with names from
the standard and RFC. Whether ..+good_frames are indeed Frames..OK
I'm the least sure of.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16 16:59:20 -07:00
Jakub Kicinski
c1912ab0ee mlxsw: implement ethtool standard stats
mlxsw has nicely grouped stats, add support for standard uAPI.
I'm guessing the register access part. Compile tested only.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16 16:59:20 -07:00
David S. Miller
03e481e88b mlx5-updates-2021-04-16
Merge tag 'mlx5-updates-2021-04-16' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2021-04-16

This patchset introduces updates to mlx5e netdev driver.

1) Tariq refactors TLS offloads and adds resiliency against RX resync
   failures

2) Maxim reduces code duplications by unifying channels reset flow
   regardless if channels are closed or open

3) Aya enhances TX/RX health reporters diagnostics to expose the
   internal clock time-stamping format

4) Moshe adds support for ethtool extended link state, to show the reason
   for link down
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16 16:53:52 -07:00
Claudiu Manoil
8eda54c5e6 gianfar: Drop GFAR_MQ_POLLING support
Gianfar used to enable all 8 Rx queues (DMA rings) per
ethernet device, even though the controller can only
support 2 interrupt lines at most.  This meant that
multiple Rx queues would have to be grouped per NAPI poll
routine, and the CPU would have to split the budget and
service them in a round robin manner.  The overhead of
this scheme proved to outweigh the potential benefits.
The alternative was to introduce the "Single Queue" polling
mode, supporting one Rx queue per NAPI, which became the
default packet processing option and helped improve the
performance of the driver.
MQ_POLLING also relies on undocumented device tree properties
to specify how to map the 8 Rx and Tx queues to a given
interrupt line (aka "interrupt group").  Using module parameters
to enable this mode wasn't an option either.  Long story short,
MQ_POLLING became obsolete, now it is just dead code, and no
one asked for it so far.
For the Tx queues, multi-queue support (more than 1 Tx queue
per CPU) could be revisited by adding tc MQPRIO support, but
again, one has to consider that there are only 2 interrupt lines.
So the NAPI poll routine would have to service multiple Tx rings.

Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16 15:46:15 -07:00
Stefan Chulski
4ad29b1a48 net: mvpp2: Add parsing support for different IPv4 IHL values
Add parser entries for different IPv4 IHL values.
Each entry will set the L4 header offset according to the IPv4 IHL field.
L3 header offset will be set during the parsing of the IPv4 protocol.

Because parser support for IP header lengths > 20 bytes was missing, RX
IPv4 checksum HW offload fails for such frames and skb->ip_summed is set
to CHECKSUM_NONE (the checksum is computed by the network stack).
This patch adds RX IPv4 checksum HW offload capability for frames with
an IP header length > 20 bytes.
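
For reference, the relationship between the IHL field and the L4 offset
can be sketched in plain C. The helper names are hypothetical and this is
not the mvpp2 parser, which uses hardware parser entries; it relies only
on the IPv4 rule that IHL counts 32-bit words and is at least 5.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* IPv4 header length in bytes, taken from the first header byte
 * (version in the high nibble, IHL in the low nibble).
 * Returns 0 if the byte does not describe a valid IPv4 header.
 */
static size_t ipv4_header_len(uint8_t version_ihl)
{
	unsigned int version = version_ihl >> 4;
	unsigned int ihl = version_ihl & 0x0f;

	if (version != 4 || ihl < 5)
		return 0;

	return (size_t)ihl * 4;	/* IHL is expressed in 32-bit words */
}

/* L4 header offset, given the offset at which the L3 header starts. */
static size_t l4_offset(size_t l3_offset, uint8_t version_ihl)
{
	size_t hdr_len = ipv4_header_len(version_ihl);

	assert(hdr_len != 0);	/* caller must pass a valid IPv4 header byte */
	return l3_offset + hdr_len;
}
```

An IHL of 5 gives the common 20-byte header; any value from 6 to 15 means
IP options are present and the L4 header starts later.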

v1 --> v2
- Improve commit message.

Suggested-by: Dana Vardi <danat@marvell.com>
Signed-off-by: Stefan Chulski <stefanc@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16 15:29:40 -07:00
Ilya Lipnitskiy
c5d66587b8 net: ethernet: mediatek: ppe: fix busy wait loop
The intention is for the loop to time out if the body does not succeed.
The current logic calls time_is_before_jiffies(timeout) which is false
until after the timeout, so the loop body never executes.

Fix by using readl_poll_timeout as a more standard and less error-prone
solution.
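
The inverted condition can be reproduced with a small user-space
simulation. Everything below is a hypothetical stand-in: a fake tick
counter replaces jiffies and a fake device replaces the register read.
The commit itself switches to readl_poll_timeout; the second function
here only inverts the condition to show the difference.

```c
#include <assert.h>
#include <stdbool.h>

static unsigned long fake_jiffies;	/* simulated tick counter */
static int polls_until_idle;		/* simulated device stays busy this long */

/* Same comparison as time_is_before_jiffies(): true once t is in the past. */
static bool deadline_is_before_now(unsigned long t)
{
	return (long)(t - fake_jiffies) < 0;
}

/* One poll of the simulated device; each poll costs one tick. */
static bool device_busy(void)
{
	fake_jiffies++;
	if (polls_until_idle > 0) {
		polls_until_idle--;
		return true;
	}
	return false;
}

/* Inverted condition: false on entry, so the device is never polled. */
static int wait_busy_inverted(unsigned long timeout_ticks)
{
	unsigned long timeout = fake_jiffies + timeout_ticks;

	assert(timeout_ticks > 0);
	while (deadline_is_before_now(timeout)) {
		if (!device_busy())
			return 0;
	}
	return -1;	/* always reports a timeout */
}

/* Corrected condition: poll until the deadline has passed. */
static int wait_busy_fixed(unsigned long timeout_ticks)
{
	unsigned long timeout = fake_jiffies + timeout_ticks;

	assert(timeout_ticks > 0);
	while (!deadline_is_before_now(timeout)) {
		if (!device_busy())
			return 0;
	}
	return -1;
}
```

Even when the simulated device is idle from the start, the inverted
variant reports a timeout because its loop body never runs.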

Fixes: ba37b7caf1 ("net: ethernet: mtk_eth_soc: add support for initializing the PPE")
Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
Cc: Felix Fietkau <nbd@nbd.name>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16 15:24:18 -07:00
Gatis Peisenieks
a1150a04b7 atl1c: move tx cleanup processing out of interrupt
Tx queue cleanup happens in the interrupt handler on the same core as
rx queue processing. Both can take a considerable amount of processing
in high packet-per-second scenarios.

Sending large numbers of packets can stall the rx processing, which is
unfair and can also lead to an out-of-memory condition, since
__dev_kfree_skb_irq queues the skbs for later kfree in softirq, which
is not allowed to happen with heavy load in the interrupt handler.

This puts tx cleanup in its own napi and enables threaded napi to
allow the rx/tx queue processing to happen on different cores.

The ability to sustain equal amounts of tx/rx traffic increased:
from 280Kpps to 1130Kpps on Threadripper 3960X with upcoming
Mikrotik 10/25G NIC,
from 520Kpps to 850Kpps on Intel i3-3320 with Mikrotik RB44Ge adapter.

Signed-off-by: Gatis Peisenieks <gatis@mikrotik.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16 15:16:54 -07:00
Vladimir Oltean
2c4eca3ef7 net: bridge: switchdev: include local flag in FDB notifications
As explained in bugfix commit 6ab4c3117a ("net: bridge: don't notify
switchdev for local FDB addresses") as well as in this discussion:
https://lore.kernel.org/netdev/20210117193009.io3nungdwuzmo5f7@skbuf/

the switchdev notifiers for FDB entries managed to have a zero-day bug,
which was that drivers would not know what to do with local FDB entries,
because they were not told that they are local. The bug fix was to
simply not notify them of those addresses.

Let us now add the 'is_local' bit to bridge FDB entries, and make all
drivers ignore these entries by their own choice.

Co-developed-by: Tobias Waldekranz <tobias@waldekranz.com>
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16 15:15:45 -07:00
Sasha Neftin
1feaf60ff2 igc: Expose LPI counters
Expose EEE Tx and Rx low power idle counters via ethtool.
An EEE TX or RX LPI event occurs when the transmitter or the receiver
enters the EEE (IEEE802.3az) LPI state.
ethtool --statistics <iface>

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-16 13:15:45 -07:00
Sasha Neftin
b3d4f40562 igc: Fix overwrites return value
drivers/net/ethernet/intel/igc/igc_i225.c:235 igc_write_nvm_srwr()
warn: loop overwrites return value 'ret_val'

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-16 13:15:45 -07:00
Ederson de Souza
87938851b6 igc: enable auxiliary PHC functions for the i225
The i225 device offers a number of special PTP Hardware Clock features on
the Software Defined Pins (SDPs) - much like i210, which is used as
inspiration for this patch. It enables two possible functions, namely
time stamping external events and periodic output signals.

The assignment of PHC functions to the four SDPs can be freely chosen by
the user.

For the external events time stamping, when the SDP (configured as input
by user) level changes, an interrupt is generated and the kernel
Precision Time Protocol (PTP) is informed.

For the periodic output signals, the i225 is configured to generate them
(so the SDP level will change periodically) and the driver also has to
keep updating the time of the next level change. However, this work is
not necessary for some frequencies as the i225 takes care of them
(namely, anything with a half-cycle of 500ms, 250ms, 125ms or < 70ms).
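
The rule in the parenthesis can be written down as a small predicate. This
is a user-space sketch with hypothetical names, not the igc code; treating
exactly 70ms as "not handled" follows the "< 70ms" wording above and is an
assumption about the boundary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_MSEC 1000000LL

/* True if the hardware generates this half-cycle on its own, so the
 * driver does not need to keep reprogramming the next level change.
 */
static bool hw_handles_half_cycle(int64_t half_cycle_ns)
{
	assert(half_cycle_ns > 0);

	return half_cycle_ns == 500 * NSEC_PER_MSEC ||
	       half_cycle_ns == 250 * NSEC_PER_MSEC ||
	       half_cycle_ns == 125 * NSEC_PER_MSEC ||
	       half_cycle_ns < 70 * NSEC_PER_MSEC;
}

/* Same check, starting from the full period of the output signal. */
static bool hw_handles_period(int64_t period_ns)
{
	return hw_handles_half_cycle(period_ns / 2);
}
```

A 1 Hz output (500ms half-cycle) needs no driver work under this rule,
while a 300ms half-cycle does.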

While i225 allows up to four timers to be used to source the time used
on the external events or output signals, this patch uses only one of
those timers. Main reason is to keep it simple, as it's not clear how
these extra timers would be exposed to users. Note that currently a NIC
can expose a single PTP device.

Signed-off-by: Ederson de Souza <ederson.desouza@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-16 13:15:45 -07:00
Ederson de Souza
64433e5bf4 igc: Enable internal i225 PPS
The i225 device can produce one interrupt on the full second, much
like the i210, which inspired this patch.

This patch sets up the full second interrupt on the i225 and, when
receiving it, sends a PPS event to the PTP (Precision Time Protocol)
kernel subsystem.

The PTP subsystem exposes the PPS events via ioctl and sysfs, and one
can use the `testptp` tool (tools/testing/selftests/ptp) to check that
the events are being generated.

Signed-off-by: Ederson de Souza <ederson.desouza@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-16 13:15:45 -07:00
Grzegorz Siwik
1d3cb90cb0 igb: Add double-check MTA_REGISTER for i210 and i211
Add a new function which checks whether MTA_REGISTER is filled correctly.
If it is not, the function writes the same register again.
The i210 and i211 may fail to accept MTA_REGISTER settings, especially
when many multicast addresses are added and removed in a short time.
Without this patch, multicast settings may not always be set correctly
in hardware.

Signed-off-by: Grzegorz Siwik <grzegorz.siwik@intel.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-16 13:15:39 -07:00
Aya Levin
95742c1cc5 net/mlx5: Enhance diagnostics info for TX/RX reporters
Add ts_format to 'Common Config' section of the TX/RX devlink reporters
diagnostics info. Possible values for ts_format: 'RT' or 'FRC'
which stands for: Real Time and Free Running Counters correspondingly.

Signed-off-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-16 11:48:34 -07:00
Aya Levin
302522e67c net/mlx5: Add helper to initialize 1PPS
Wrap 1PPS initialization in a helper for a cleaner init flow.

Signed-off-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-16 11:48:31 -07:00
Moshe Tal
b3446acb2b net/mlx5e: Add ethtool extended link state
In case the interface was set up but cannot establish the link, ethtool
will print more information to help the user troubleshoot the state.

For example, no link due to missing cable:
$ ethtool eth1
...
Link detected: no (No cable)

Besides the general extended state, drivers can pass additional
information about the link state using the sub-state field. For example:

$ ethtool eth1
...
Link detected: no (Autoneg, No partner detected)

The extended state is available only for specific cases; in other cases
ethtool will print only "Link detected: no" as before.

Signed-off-by: Moshe Tal <moshet@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-16 11:48:28 -07:00
Maor Dickman
5cec6de0ae net/mlx5: Allocate FC bulk structs with kvzalloc() instead of kzalloc()
The bulk size is larger than 16K so use kvzalloc().
The bulk bitmask upper size limit is 16K so use kvcalloc().

Signed-off-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-16 11:48:22 -07:00
Maxim Mikityanskiy
94872d4ef9 net/mlx5e: Cleanup safe switch channels API by passing params
mlx5e_safe_switch_channels accepts new_chs as a parameter and opens new
channels in place, then copying them to priv->channels. It requires all
the callers to allocate space for this temporary storage of the new
channels.

This commit cleans up the API by replacing new_chs with new_params, a
meaningful subset of new_chs to be filled by the caller. The temporary
space for the new channels is allocated inside mlx5e_safe_switch_params
(a new name for mlx5e_safe_switch_channels). An extra copy of params is
made, but since it's control flow, it's not critical.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-16 11:48:20 -07:00
Maxim Mikityanskiy
b3b886cf96 net/mlx5e: Refactor on-the-fly configuration changes
This commit extends mlx5e_safe_switch_channels() to support on-the-fly
configuration changes, when the channels are open, but don't need to be
recreated. Such flows exist when a parameter being changed doesn't
affect how the queues are created, or when the queues can be modified
while remaining active.

Before this commit, such flows were handled as special cases on the
caller site. This commit adds this functionality to
mlx5e_safe_switch_channels(), allowing the caller to pass a boolean
indicating whether it's required to recreate the channels or it's
allowed to skip it. The logic of switching channel parameters is now
completely encapsulated into mlx5e_safe_switch_channels().

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-16 11:48:17 -07:00
Maxim Mikityanskiy
69cc4185dc net/mlx5e: Use mlx5e_safe_switch_channels when channels are closed
This commit uses new functionality of mlx5e_safe_switch_channels
introduced by the previous commit to reduce the amount of repeating
similar code all over the driver.

It's very common in mlx5e to call mlx5e_safe_switch_channels when the
channels are open, but assign parameters and run hardware commands
manually when the channels are closed.

After the previous commit it's no longer needed to do such manual things
every time, so this commit removes unneeded code and relies on the new
functionality of mlx5e_safe_switch_channels. Some of the places are
refactored and simplified, where more complex flows are used to change
configuration on the fly, without recreating the channels (the logic is
rewritten in a more robust way, with a reset required by default and a
list of exceptions).

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-16 11:48:14 -07:00
Maxim Mikityanskiy
6cad120d9e net/mlx5e: Allow mlx5e_safe_switch_channels to work with channels closed
mlx5e_safe_switch_channels is used to modify channel parameters and/or
hardware configuration in a safe way, so that if anything goes wrong,
everything reverts to the old configuration and remains in a consistent
state.

However, this function only works when the channels are open. When the
caller needs to modify some parameters, first it has to check that the
channels are open, otherwise it has to assign parameters directly, and
such boilerplate repeats in many different places.

This commit prepares for the refactoring of such places by allowing
mlx5e_safe_switch_channels to work when the channels are closed. In this
case it will assign the new parameters and run the preactivate hook.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-16 11:48:11 -07:00
Tariq Toukan
e9ce991bce net/mlx5e: kTLS, Add resiliency to RX resync failures
When the TLS logic finds a tcp seq match for a kTLS RX resync
request, it calls the driver callback function mlx5e_ktls_resync()
to handle it and communicate it to the device.

Errors might occur during mlx5e_ktls_resync(), however, they are not
reported to the stack. Moreover, there is no error handling in the
stack for these errors.

In this patch, the driver takes responsibility for error handling,
adding queue and retry mechanisms to these resyncs.

We maintain a linked list of resync matches, and try posting them
to the async ICOSQ in the NAPI context.

The only possible failure that demands driver handling is ICOSQ being full.
By relying on the NAPI mechanism, we make sure that the entries in the list
will be handled when ICOSQ completions arrive and make some room
available.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-16 11:48:08 -07:00
Tariq Toukan
72f6f2f8d6 net/mlx5e: TX, Inline function mlx5e_tls_handle_tx_wqe()
When TLS is supported, WQE ctrl segment of every transmitted packet
is updated with the (possibly empty, for non-TLS packets) TISN field.

Take this one-liner function into the header file and inline it,
to save the overhead of a function call per packet.

While here, remove unused function parameter.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-16 11:48:05 -07:00
Tariq Toukan
b6b3ad2175 net/mlx5e: TX, Inline TLS skb check
When TLS is supported and enabled, every transmitted packet is tested
to identify if TLS offload is required.

Take the early-return condition into an inline function, to save
the overhead of a function call for non-TLS packets.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-16 11:48:02 -07:00
Tariq Toukan
8668587a33 net/mlx5e: Cleanup unused function parameter
Socket parameter is not used in accel_rule_init(), remove it.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-16 11:47:59 -07:00
Tariq Toukan
2f014f4016 net/mlx5e: Remove non-essential TLS SQ state bit
There is no real need to maintain an SQ state bit to indicate TLS
support; a simple and fast test [1] for the SKB is
almost equally good.

[1] !skb->sk || !tls_is_sk_tx_device_offloaded(skb->sk)

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-16 11:47:56 -07:00
Ederson de Souza
26b67f5a1e igb: Redistribute memory for transmit packet buffers when in Qav mode
i210 has a total of 24KB of transmit packet buffer. When in Qav mode,
this buffer is divided into four pieces, one for each Tx queue.
Currently, 8KB are given to each of the two SR queues and 4KB are given
to each of the two SP queues.

However, it was noticed that such distribution can make best effort
traffic (which would usually go to the SP queues when Qav is enabled, as
the SR queues would be used by ETF or CBS qdiscs for TSN-aware traffic)
perform poorly. Using iperf3 to measure, one could see the performance
of best effort traffic drop by nearly a third (from 935Mbps to 578Mbps),
with no TSN traffic competing.

This patch redistributes the 24KB to each queue equally: 6KB each. On
tests, there was no notable reduction in best effort traffic
performance when there was no TSN traffic competing.
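
The arithmetic of the two splits can be checked with a few lines of C.
The macro and function names are hypothetical; the sizes are the ones
quoted above.

```c
#include <assert.h>

#define TX_PBUF_TOTAL_KB 24	/* total i210 transmit packet buffer */
#define NUM_TX_QUEUES 4		/* two SR queues plus two SP queues */

/* Sum of a per-queue split, in KB. */
static int split_total_kb(const int kb[NUM_TX_QUEUES])
{
	int total = 0;

	for (int i = 0; i < NUM_TX_QUEUES; i++) {
		assert(kb[i] >= 0);
		total += kb[i];
	}
	return total;
}

/* Per-queue share when the buffer is divided equally. */
static int equal_share_kb(void)
{
	return TX_PBUF_TOTAL_KB / NUM_TX_QUEUES;
}
```

Both the previous 8/8/4/4 split and the new 6/6/6/6 split add up to the
same 24KB; only the share of the SP queues changes, from 4KB to 6KB each.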

Below, more details about the data collected:

All experiments were run using the following qdisc setup:

qdisc taprio 100: root refcnt 9 tc 4 map 3 3 3 2 3 0 0 3 3 3 3 3 3 3 3 3
    queues offset 0 count 1 offset 1 count 1 offset 2 count 1 offset 3 count 1
    clockid TAI base-time 0 cycle-time 10000000 cycle-time-extension 0
    index 0 cmd S gatemask 0xf interval 10000000

qdisc etf 8045: parent 100:1 clockid TAI delta 1000000 offload on
    deadline_mode off skip_sock_check off

TSN traffic, when enabled, had these characteristics:
 Packet size: 1500 bytes
 Transmission interval: 125us
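
For a sense of scale, the load offered by this TSN stream can be
estimated as follows. This is a rough sketch that assumes the 1500 bytes
are the whole packet and ignores any framing overhead on the wire; the
function name is hypothetical.

```python
# Rough estimate of the TSN stream's offered load.
# Assumption: one packet of packet_bytes is sent every interval_us,
# and framing overhead on the wire is ignored.


def offered_load_mbps(packet_bytes, interval_us):
    """Return the offered load in Mbit/s (bits per microsecond)."""
    if interval_us <= 0:
        raise ValueError("interval must be positive")
    return packet_bytes * 8 / interval_us


if __name__ == "__main__":
    print(offered_load_mbps(1500, 125), "Mbit/s")
```

With the values above this comes to about 96 Mbit/s, roughly a tenth of
a 1Gbit/s link.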

----------------------------------
Without this patch:
----------------------------------
- TCP data:
    - No TSN traffic:
        [ ID] Interval           Transfer     Bitrate         Retr
        [  5]   0.00-20.00  sec  1.35 GBytes   578 Mbits/sec    0

    - With TSN traffic:
        [ ID] Interval           Transfer     Bitrate         Retr
        [  5]   0.00-20.00  sec  1.07 GBytes   460 Mbits/sec    1

- TCP data limiting iperf3 buffer size to 4K:
    - No TSN traffic:
        [ ID] Interval           Transfer     Bitrate         Retr
        [  5]   0.00-20.00  sec  1.35 GBytes   579 Mbits/sec    0

    - With TSN traffic:
        [ ID] Interval           Transfer     Bitrate         Retr
        [  5]   0.00-20.00  sec  1.08 GBytes   462 Mbits/sec    0

- TCP data limiting iperf3 buffer size to 192 bytes (smallest size without
 serious performance degradation):
    - No TSN traffic:
        [ ID] Interval           Transfer     Bitrate         Retr
        [  5]   0.00-20.00  sec  1.34 GBytes   577 Mbits/sec    0

    - With TSN traffic:
        [ ID] Interval           Transfer     Bitrate         Retr
        [  5]   0.00-20.00  sec  1.07 GBytes   461 Mbits/sec    1

- UDP data at 1000Mbit/sec:
    - No TSN traffic:
        [ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
        [  5]   0.00-20.00  sec  1.36 GBytes   586 Mbits/sec  0.000 ms  0/1011407 (0%)

    - With TSN traffic:
        [ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
        [  5]   0.00-20.00  sec  1.05 GBytes   451 Mbits/sec  0.000 ms  0/778672 (0%)

----------------------------------
With this patch:
----------------------------------

- TCP data:
    - No TSN traffic:
        [ ID] Interval           Transfer     Bitrate         Retr
        [  5]   0.00-20.00  sec  2.17 GBytes   932 Mbits/sec    0

    - With TSN traffic:
        [ ID] Interval           Transfer     Bitrate         Retr
        [  5]   0.00-20.00  sec  1.50 GBytes   646 Mbits/sec    1

- TCP data limiting iperf3 buffer size to 4K:
    - No TSN traffic:
        [ ID] Interval           Transfer     Bitrate         Retr
        [  5]   0.00-20.00  sec  2.17 GBytes   931 Mbits/sec    0

    - With TSN traffic:
        [ ID] Interval           Transfer     Bitrate         Retr
        [  5]   0.00-20.00  sec  1.50 GBytes   645 Mbits/sec    0

- TCP data limiting iperf3 buffer size to 192 bytes (smallest size without
 serious performance degradation):
    - No TSN traffic:
        [ ID] Interval           Transfer     Bitrate         Retr
        [  5]   0.00-20.00  sec  2.17 GBytes   932 Mbits/sec    1

    - With TSN traffic:
        [ ID] Interval           Transfer     Bitrate         Retr
        [  5]   0.00-20.00  sec  1.50 GBytes   645 Mbits/sec    0

- UDP data at 1000Mbit/sec:
    - No TSN traffic:
        [ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
        [  5]   0.00-20.00  sec  2.23 GBytes   956 Mbits/sec  0.000 ms  0/1650226 (0%)

    - With TSN traffic:
        [ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
        [  5]   0.00-20.00  sec  1.51 GBytes   649 Mbits/sec  0.000 ms  0/1120264 (0%)
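
The bitrates above can be summarized as relative gains. The sketch below
only copies the reported Mbit/sec values into a table; the dictionary
layout and function name are assumptions made for illustration.

```python
# Relative throughput gain computed from the iperf3 bitrates reported above.
# Each value is (without patch, with patch) in Mbit/s.

RESULTS_MBPS = {
    ("TCP", "no TSN"): (578, 932),
    ("TCP", "with TSN"): (460, 646),
    ("TCP, 4K buffer", "no TSN"): (579, 931),
    ("TCP, 4K buffer", "with TSN"): (462, 645),
    ("TCP, 192 byte buffer", "no TSN"): (577, 932),
    ("TCP, 192 byte buffer", "with TSN"): (461, 645),
    ("UDP", "no TSN"): (586, 956),
    ("UDP", "with TSN"): (451, 649),
}


def gain_percent(before, after):
    """Percentage increase going from before to after."""
    return 100.0 * (after - before) / before


if __name__ == "__main__":
    for (test, tsn), (before, after) in RESULTS_MBPS.items():
        print(f"{test:22s} {tsn:9s} {gain_percent(before, after):5.1f}%")
```

For the plain TCP runs this works out to roughly 61% more throughput
without TSN traffic and roughly 40% more with TSN traffic competing.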

Signed-off-by: Ederson de Souza <ederson.desouza@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-16 10:42:52 -07:00
Jakub Kicinski
1703bb50df mlx5: implement ethtool::get_fec_stats
Report corrected bits.

v2: catch reg access errors (Saeed)

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-15 17:08:29 -07:00
Jakub Kicinski
cab351be53 sfc: ef10: implement ethtool::get_fec_stats
Report what appears to be the standard block counts:
 - 30.5.1.1.17 aFECCorrectedBlocks
 - 30.5.1.1.18 aFECUncorrectableBlocks

Don't report the per-lane symbol counts; if those really
count symbols, they are not what the standard calls for
(even if symbols seem like the most useful thing to count).

Fingers crossed that fec_corrected_errors is not in symbols.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-15 17:08:29 -07:00
Jakub Kicinski
c9ca5c3aab bnxt: implement ethtool::get_fec_stats
Report corrected bits.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-15 17:08:29 -07:00
Vinay Kumar Yadav
e8a4155567 ch_ktls: do not send snd_una update to TCB in middle
The snd_una update should not be done while the same skb is being
sent out. chcr_short_record_handler() sends it again even
though the SND_UNA update has already been sent for the skb in
chcr_ktls_xmit(), which causes a mismatch in the un-acked
TCP seq number and later causes problems in sending out
the complete record.

Fixes: 429765a149 ("chcr: handle partial end part of a record")
Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com>
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-15 16:55:49 -07:00
Vinay Kumar Yadav
21d8c25e3f ch_ktls: tcb close causes tls connection failure
The HW doesn't need the TCB to be marked closed. This TCB state change
sometimes causes problems for a new connection which gets
the same tid.

Fixes: 34aba2c450 ("cxgb4/chcr : Register to tls add and del callback")
Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com>
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-15 16:55:49 -07:00
Vinay Kumar Yadav
bc16efd243 ch_ktls: fix device connection close
When the sge queue is full and chcr_ktls_xmit_wr_complete()
returns failure, the skb is not freed if the record is not the last tls
record in this skb. As a result, the refcount is never released and
tls_dev_del() never gets called on this connection.

Fixes: 5a4b9fe7fe ("cxgb4/chcr: complete record tx handling")
Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com>
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-15 16:55:49 -07:00
Vinay Kumar Yadav
1a73e427b8 ch_ktls: Fix kernel panic
Taking a page refcount is not ideal and sometimes causes a kernel
panic. It's better to hold the tx_ctx lock for the complete
skb transmit, to avoid page cleanup if an ACK is received in the middle.

Fixes: 5a4b9fe7fe ("cxgb4/chcr: complete record tx handling")
Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com>
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-15 16:55:49 -07:00
Yangbo Lu
b6faf160d0 enetc: convert to schedule_work()
Convert system_wq queue_work() to schedule_work(), which is
a wrapper around it, since the former is a rare construct.

Fixes: 7294380c52 ("enetc: support PTP Sync packet one-step timestamping")
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-15 16:53:08 -07:00
Guangbin Huang
01305e16eb net: hns3: VF not request link status when PF support push link status feature
To reduce the processing of unnecessary mailbox commands when the PF
supports actively pushing its link status to VFs, VFs stop sending the
request link status command in the periodic service task in this case.

Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-15 16:51:29 -07:00
Guangbin Huang
18b6e31f8b net: hns3: PF add support for pushing link status to VFs
Previously, a VF updated its link status every second by sending a query
command to the PF in the periodic service task. If the link status of the
PF changed, the VF could need up to one second to update its link status.

To reduce the link status delay between the PF and VFs, the PF actively
pushes its link status to VFs when its link status is updated. And to let
the VF know that the PF supports this new feature, the link status changed
mailbox command adds one bit to indicate it.

Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-15 16:51:29 -07:00
David S. Miller
61d773586e mlx5-fixes-2021-04-14
Merge tag 'mlx5-fixes-2021-04-14' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5 fixes 2021-04-14

This series provides 3 small fixes to mlx5 driver.
Please pull and let me know if there is any problem.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-15 16:43:29 -07:00
Jason Xing
4e39a072a6 i40e: fix the panic when running bpf in xdpdrv mode
Fix this panic by adding more rules to calculate the value of @rss_size_max,
which may be used to allocate the queues when bpf is loaded. That
allocation could fail and then trigger a NULL pointer dereference of
vsi->rx_rings. Prior to this fix, the driver did not take into account how
many cpus are online, and allocated 256 queues on a machine with only 32
cpus online.
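
The commit message does not spell out the added rules. As a minimal
sketch, assuming the simplest possible rule (cap the queue count at the
number of online cpus), the calculation could look like the following.
The function and parameter names are hypothetical and are not taken from
the i40e driver.

```python
# Illustrative only: one plausible way to bound the RSS queue count.
# Assumption: the limit is the smaller of the hardware maximum and the
# number of online cpus. The actual rules in the fix may differ.


def clamp_rss_size(hw_rss_size_max, online_cpus):
    """Return a queue count that never exceeds the online cpu count."""
    if hw_rss_size_max < 1 or online_cpus < 1:
        raise ValueError("both limits must be at least 1")
    return min(hw_rss_size_max, online_cpus)


if __name__ == "__main__":
    # The case described above: 256 queues requested on a 32-cpu machine.
    print(clamp_rss_size(256, 32))
```

Under this assumption, the machine in the report would ask for 32 queues
instead of 256.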

Once loading of bpf begins, the log shows "failed to get
tracking for 256 queues for VSI 0 err -12" and "setup of MAIN VSI
failed".

The key information from the crash log is attached here.

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
RIP: 0010:i40e_xdp+0xdd/0x1b0 [i40e]
Call Trace:
[2160294.717292]  ? i40e_reconfig_rss_queues+0x170/0x170 [i40e]
[2160294.717666]  dev_xdp_install+0x4f/0x70
[2160294.718036]  dev_change_xdp_fd+0x11f/0x230
[2160294.718380]  ? dev_disable_lro+0xe0/0xe0
[2160294.718705]  do_setlink+0xac7/0xe70
[2160294.719035]  ? __nla_parse+0xed/0x120
[2160294.719365]  rtnl_newlink+0x73b/0x860

Fixes: 41c445ff0f ("i40e: main driver core")
Co-developed-by: Shujin Li <lishujin@kuaishou.com>
Signed-off-by: Shujin Li <lishujin@kuaishou.com>
Signed-off-by: Jason Xing <xingwanli@kuaishou.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-15 14:37:35 -07:00