Commit Graph

7252 Commits

Author SHA1 Message Date
Linus Torvalds
3672ac8ac0 RDMA 5.12 merge window pull request
- Driver updates and bug fixes: siw, hns, bnxt_re, mlx5, efa
 
 - Significant rework in rxe to get it ready to have XRC support added
 
 - Several rts bug fixes
 
 - Big series to get to 'make W=1' cleanness, primarily updating kdocs
 
 - Support for creating a RDMA MR from a DMABUF fd to allow PCI peer to
   peer transfers to GPU VRAM
 
 - Device disassociation now works properly with umad
 
 - Work to support more than 255 ports on a RDMA device
 
 - Further support for the new HNS HIP09 hardware
 
 - Coding style cleanups: comma to semicolon, unneded semicolon/blank
   lines, remove 'h' printk format, don't check for NULL before kfree,
   use true/false for bool.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAmAzvugACgkQOG33FX4g
 mxpE4w/9HqJF0lsHkRHhorrVZnJwO5hMs1QzY4wya+BqWtMEi5DreS/75uMiRmYH
 EsvO4LOvzNuP8uDjUmznRe7MLBRUg7GqIfDrxhGDIJ4tWBJ5amoordoDCY/IKcTW
 fBETEGcL92wTnZBxXX8jsVA+7QUYgGenFr6ozpdQ9EldQeEBb2CHzn5sxD/CCHXS
 k49mdk2FvPanb0r7ZIkqsDDMXjP/n7/hi9JX9fK4oCbsap0S5YavCuwVMkV0XHPe
 l7hjxsrztHZwrxFq846Sz0tIdwPIiHam+3CWpV5pUJxaI7xUZkgmCaXHRTeRCYRR
 amDOpXL7FjvUShnTyp2wUAFNR/xHdHx2uMSGR0KR5chUTmSixwD4H6xQlg2ZCvgd
 hAVWIliMh5mMqFy1+gz6ES98/Wh4u+Iv7ws5iQ8qQXWVB26+OyWL1l9ArVVysuXW
 vMIXkDR2lMk//qSz8klnqQjPR2gpjnmZ9PYq6a6EQa6xRaS3oWj2E/OWXCkdo4mv
 ISpqTNq/aZPz5+wsiv6ngxMl36Vof0T8rPudCuN+SGYTG4D7s3gZu3IGsB0cbqbW
 DMXUXLzUWx/KlMeErxWjPOQQReHZ7jq4O/A8aXBe3q13hKRlmk15MY66YFu9Poad
 mUuqxRavINNxqfPP0dkxpVL1/1w5QDMREP6AHQRs4AGl5qzzvhk=
 =Z4cZ
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "This is quite a small cycle, if not for Lee's 70 patches cleaning the
  kdocs it would be well below typical for patch count.

  Most of the interesting work here was in the HNS and rxe drivers which
  got fairly major internal changes.

  Summary:

   - Driver updates and bug fixes: siw, hns, bnxt_re, mlx5, efa

   - Significant rework in rxe to get it ready to have XRC support added

   - Several rts bug fixes

   - Big series to get to 'make W=1' cleanness, primarily updating kdocs

   - Support for creating a RDMA MR from a DMABUF fd to allow PCI peer
     to peer transfers to GPU VRAM

   - Device disassociation now works properly with umad

   - Work to support more than 255 ports on a RDMA device

   - Further support for the new HNS HIP09 hardware

   - Coding style cleanups: comma to semicolon, unneded semicolon/blank
     lines, remove 'h' printk format, don't check for NULL before kfree,
     use true/false for bool"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (205 commits)
  RDMA/rtrs-srv: Do not pass a valid pointer to PTR_ERR()
  RDMA/srp: Fix support for unpopulated and unbalanced NUMA nodes
  RDMA/mlx5: Fail QP creation if the device can not support the CQE TS
  RDMA/mlx5: Allow CQ creation without attached EQs
  RDMA/rtrs-srv-sysfs: fix missing put_device
  RDMA/rtrs-srv: fix memory leak by missing kobject free
  RDMA/rtrs: Only allow addition of path to an already established session
  RDMA/rtrs-srv: Fix stack-out-of-bounds
  RDMA/rxe: Remove unused pkt->offset
  RDMA/ucma: Fix use-after-free bug in ucma_create_uevent
  RDMA/core: Fix kernel doc warnings for ib_port_immutable_read()
  RDMA/qedr: Use true and false for bool variable
  RDMA/hns: Adjust definition of FRMR fields
  RDMA/hns: Refactor process of posting CMDQ
  RDMA/hns: Adjust fields and variables about CMDQ tail/head
  RDMA/hns: Remove redundant operations on CMDQ
  RDMA/hns: Fixes missing error code of CMDQ
  RDMA/hns: Remove unused member and variable of CMDQ
  RDMA/ipoib: Remove racy Subnet Manager sendonly join checks
  RDMA/mlx5: Support 400Gbps IB rate in mlx5 driver
  ...
2021-02-22 10:27:48 -08:00
Jason Gunthorpe
7289e26f39 Linux 5.11
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmAppPgeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGeXYH/imZPBd4A1jIMehN
 5HV2A53Z+MXmmaMuGj9X1KV6vsf55/xB+IhOoFdtRAIsO8c2yYSCO8i4+4R0XfYA
 +/YFJeq672rojQnmh6XbpR8dugaAV7CUHy6n7KDsyvtT6EOCpwFSwkOb4X3tBRX6
 TlYgm2d/xgV/wRHSgLVugK0MdFCLMAnyb7mkPfar9QrMgG1BiDKLq07xmwnS23On
 TkqpJ9yZ/rJpUrrUqQYPShSO/FmA+fSfWs0CDv7EIrJ40LUScD6PZxSHWTIHtjLk
 E4jFda6wuqLRVWsBwaBzUIdD0zk7X5quHRzEpbC5ga16SK6yrWvE5YJJXCguIEuZ
 f3FMRYs=
 =CAjn
 -----END PGP SIGNATURE-----

Merge tag 'v5.11' into rdma.git for-next

Linux 5.11

Merged to resolve conflicts with RDMA rc commits

- drivers/infiniband/sw/rxe/rxe_net.c
  The final logic is to call rxe_get_dev_from_net() again with the master
  netdev if the packet was rx'd on a vlan. To keep the elimination of the
  local variables requires a trivial edit to the code in -rc

Link: https://lore.kernel.org/r/20210210131542.215ea67c@canb.auug.org.au
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-02-18 11:19:29 -04:00
Jakub Kicinski
b646acd5eb net: re-solve some conflicts after net -> net-next merge
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-16 23:12:23 -08:00
David S. Miller
d489ded1a3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-02-16 17:51:13 -08:00
David S. Miller
44c3203975 Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Saeed Mahameed says:
====================
pull-request: mlx5-next 2021-02-16

The patches in this pr are already submitted and reviewed through the
netdev and rdma mailing lists.

The series includes mlx5 HW bits and definitions for mlx5 real time clock
translation and handling in the mlx5 driver clock module to enable and
support such mode [1]

[1] https://patchwork.kernel.org/project/netdevbpf/patch/20210212223042.449816-7-saeed@kernel.org/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-16 14:53:30 -08:00
Aya Levin
432119de33 net/mlx5: Add cyc2time HW translation mode support
Device timestamp can be in real time mode (cycles to time translation is
offloaded into the Hardware). With real time mode, HW provides timestamp
which is already translated into nanoseconds.

With this mode, driver adjusts both the HW and timecounter (to keep
clock_info_page updated) using callbacks: adjfreq, adjtime and settime.
HW clock modifications are done via MTUTC access reg commands. Driver is
allowed to modify HW real time clock only if MCAM ptpcyc2realtime_modify
capability is set.

Add MTUTC set function to be used for configuring the HW real time
clock. Modify existing code to support both internal timer (with
conversion via timecounter_cyc2time() and real time (no conversions).

Align the signatures of the helpers converting from timestamp to
nanoseconds. With that, when allocating a queue assign the corresponding
callback with respect to the capability.

Adjust 1PPS timestamp calculation flows based on the timestamp mode.

Cyc2time offload brings two major advantages:
- Improve MTAE (Max Time Absolute Error) for HW TS by up to 160 ns over a
  100% loaded CPU.
- Faster data-path timestamp to nanoseconds, as translation is
  lock-less and done in HW.

On real time mode, timestamp format is 32 high bits of seconds and 32
low bits of nanoseconds. On some flows, driver shall convert this format
into nanoseconds wall-clock with REAL_TIME_TO_NS macro.

HW supports a single clock, and it is shared by all functions on a
device. In case real time clock is used, it is recommended to use
a single GM to all device's functions.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-16 14:04:54 -08:00
Eran Ben Elisha
de19cd6cc9 net/mlx5: Move some PPS logic into helper functions
Some of PPS logic (timestamp calculations) fits only internal timer
timestamp mode. Move these logics into helper functions. Later in the
patchset cyc2time HW translation mode will expose its own PPS timestamp
calculations.

With this change, main flow will only hold calling PPS logic based on run
time mode.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-16 14:04:54 -08:00
Eran Ben Elisha
d6f3dc8f50 net/mlx5: Move all internal timer metadata into a dedicated struct
Internal timer mode (SW clock) requires some PTP clock related metadata
structs. Real time mode (HW clock) will not need these metadata structs.
This separation emphasize the different interfaces for HW clock and SW
clock.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-16 14:04:54 -08:00
Eran Ben Elisha
1436de0b99 net/mlx5: Refactor init clock function
Function mlx5_init_clock() is responsible for internal PTP related metadata
initializations. Break mlx5_init_clock() to sub functions, each takes care
of its own logic.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-16 14:04:54 -08:00
Vladimir Oltean
e18f4c18ab net: switchdev: pass flags and mask to both {PRE_,}BRIDGE_FLAGS attributes
This switchdev attribute offers a counterproductive API for a driver
writer, because although br_switchdev_set_port_flag gets passed a
"flags" and a "mask", those are passed piecemeal to the driver, so while
the PRE_BRIDGE_FLAGS listener knows what changed because it has the
"mask", the BRIDGE_FLAGS listener doesn't, because it only has the final
value. But certain drivers can offload only certain combinations of
settings, like for example they cannot change unicast flooding
independently of multicast flooding - they must be both on or both off.
The way the information is passed to switchdev makes drivers not
expressive enough, and unable to reject this request ahead of time, in
the PRE_BRIDGE_FLAGS notifier, so they are forced to reject it during
the deferred BRIDGE_FLAGS attribute, where the rejection is currently
ignored.

This patch also changes drivers to make use of the "mask" field for edge
detection when possible.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12 17:08:04 -08:00
Vladimir Oltean
4c08c586ff net: switchdev: propagate extack to port attributes
When a struct switchdev_attr is notified through switchdev, there is no
way to report informational messages, unlike for struct switchdev_obj.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12 17:08:04 -08:00
Tariq Toukan
2af3e35c5a net/mlx5: Remove TLS dependencies on XPS
No real dependency on XPS, but on RX queue mapping, which
is being selected by TLS_DEVICE.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-11 19:08:06 -08:00
Moshe Shemesh
e1c3940c60 net/mlx5e: Check tunnel offload is required before setting SWP
Check that tunnel offload is required before setting Software Parser
offsets to get Geneve HW offload. In case of Geneve packet we check HW
offload support of SWP in mlx5e_tunnel_features_check() and set features
accordingly, this should be reflected in skb offload requested by the
kernel and we should add the Software Parser offsets only if requested.
Otherwise, in case HW doesn't support SWP for Geneve, data path will
mistakenly try to offload Geneve SKBs with skb->encapsulation set,
regardless of whether offload was requested or not on this specific SKB.

Fixes: e3cfc7e6b7 ("net/mlx5e: TX, Add geneve tunnel stateless offload support")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-11 18:50:16 -08:00
Oz Shlomo
a217313152 net/mlx5e: CT: manage the lifetime of the ct entry object
The ct entry object is accessed by the ct add, del, stats and restore
methods. In addition, it is referenced from several hash tables.

The lifetime of the ct entry object was not managed which triggered race
conditions as in the following kasan dump:
[ 3374.973945] ==================================================================
[ 3374.988552] BUG: KASAN: use-after-free in memcmp+0x4c/0x98
[ 3374.999590] Read of size 1 at addr ffff00036129ea55 by task ksoftirqd/1/15
[ 3375.016415] CPU: 1 PID: 15 Comm: ksoftirqd/1 Tainted: G           O      5.4.31+ #1
[ 3375.055301] Call trace:
[ 3375.060214]  dump_backtrace+0x0/0x238
[ 3375.067580]  show_stack+0x24/0x30
[ 3375.074244]  dump_stack+0xe0/0x118
[ 3375.081085]  print_address_description.isra.9+0x74/0x3d0
[ 3375.091771]  __kasan_report+0x198/0x1e8
[ 3375.099486]  kasan_report+0xc/0x18
[ 3375.106324]  __asan_load1+0x60/0x68
[ 3375.113338]  memcmp+0x4c/0x98
[ 3375.119409]  mlx5e_tc_ct_restore_flow+0x3a4/0x6f8 [mlx5_core]
[ 3375.131073]  mlx5e_rep_tc_update_skb+0x1d4/0x2f0 [mlx5_core]
[ 3375.142553]  mlx5e_handle_rx_cqe_rep+0x198/0x308 [mlx5_core]
[ 3375.154034]  mlx5e_poll_rx_cq+0x2a0/0x1060 [mlx5_core]
[ 3375.164459]  mlx5e_napi_poll+0x1d4/0xa78 [mlx5_core]
[ 3375.174453]  net_rx_action+0x28c/0x7a8
[ 3375.182004]  __do_softirq+0x1b4/0x5d0

Manage the lifetime of the ct entry object by using synchornization
mechanisms for concurrent access.

Fixes: ac991b48d4 ("net/mlx5e: CT: Offload established flows")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-11 18:50:15 -08:00
Shay Drory
edac23c2b3 net/mlx5: Disable devlink reload for lag devices
Devlink reload can't be allowed on lag devices since reloading one lag
device will cause traffic on the bond to get stucked.
Users who wish to reload a lag device, need to remove the device from
the bond, and only then reload it.

Fixes: 4383cfcc65 ("net/mlx5: Add devlink reload")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-11 18:50:15 -08:00
Shay Drory
7ab91f2b03 net/mlx5: Disallow RoCE on lag device
In lag mode, setting roce enabled/disable of lag device have no effect.
e.g.: bond device (roce/vf_lag) roce status remain unchanged.
Therefore disable it and add an error message.

Fixes: cc9defcbb8 ("net/mlx5: Handle "enable_roce" devlink param")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-11 18:50:15 -08:00
Shay Drory
c70f8597fc net/mlx5: Disallow RoCE on multi port slave device
In dual port mode, setting roce enabled/disable for the slave device
have no effect. e.g.: the slave device roce status remain unchanged.
Therefore disable it and add an error message.
Enable or disable roce of the master device affect both master and slave
devices.

Fixes: cc9defcbb8 ("net/mlx5: Handle "enable_roce" devlink param")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-11 18:50:14 -08:00
Shay Drory
d89ddaae17 net/mlx5: Disable devlink reload for multi port slave device
Devlink reload can't be allowed on a multi port slave device, because
reload of slave device doesn't take effect.

The right flow is to disable devlink reload for multi port slave
device. Hence, disabling it in mlx5_core probing.

Fixes: 4383cfcc65 ("net/mlx5: Add devlink reload")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-11 18:50:14 -08:00
Maxim Mikityanskiy
b850bbff96 net/mlx5e: kTLS, Use refcounts to free kTLS RX priv context
wait_for_resync is unreliable - if it timeouts, priv_rx will be freed
anyway. However, mlx5e_ktls_handle_get_psv_completion will be called
sooner or later, leading to use-after-free. For example, it can happen
if a CQ error happened, and ICOSQ stopped, but later on the queues are
destroyed, and ICOSQ is flushed with mlx5e_free_icosq_descs.

This patch converts the lifecycle of priv_rx to fully refcount-based, so
that the struct won't be freed before the refcount goes to zero.

Fixes: 0419d8c9d8 ("net/mlx5e: kTLS, Add kTLS RX resync support")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-11 18:50:13 -08:00
Maxim Mikityanskiy
ebf79b6be6 net/mlx5e: Fix CQ params of ICOSQ and async ICOSQ
The commit mentioned below has split the parameters of ICOSQ and async
ICOSQ, but it contained a typo: the CQ parameters were swapped for ICOSQ
and async ICOSQ. Async ICOSQ is longer than the normal ICOSQ, and the CQ
size must be the same as the size of the corresponding SQ, but due to
this bug, the CQ of async ICOSQ was much shorter than async ICOSQ
itself. It led to overflows of the CQ with such messages in dmesg, in
particular, when running multiple kTLS-offloaded streams:

mlx5_core 0000:08:00.0: cq_err_event_notifier:529:(pid 9422): CQ error
on CQN 0x406, syndrome 0x1
mlx5_core 0000:08:00.0 eth2: mlx5e_cq_error_event: cqn=0x000406
event=0x04

This commit fixes the issue by using the corresponding parameters for
ICOSQ and async ICOSQ.

Fixes: c293ac927f ("net/mlx5e: Refactor build channel params")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-11 18:50:12 -08:00
Maxim Mikityanskiy
4d6e6b0c6d net/mlx5e: Replace synchronize_rcu with synchronize_net
The commit cited below switched from using napi_synchronize to
synchronize_rcu to have a guarantee that it will finish in finite time.
However, on average, synchronize_rcu takes more time than
napi_synchronize. Given that it's called multiple times per channel on
deactivation, it accumulates to a significant amount, which causes
timeouts in some applications (for example, when using bonding with
NetworkManager).

This commit replaces synchronize_rcu with synchronize_net, which is
faster when called under rtnl_lock, allowing to speed up the described
flow.

Fixes: 9c25a22dfb ("net/mlx5e: Use synchronize_rcu to sync with NAPI")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-11 18:50:12 -08:00
Shay Drory
51d138c261 net/mlx5: Fix health error state handling
Currently, when we discover a fatal error, we are queueing a work that
will wait for a lock in order to enter the device to error state.
Meanwhile, FW commands are still being processed, and gets timeouts.
This can block the driver for few minutes before the work will manage
to get the lock and enter to error state.

Setting the device to error state before queueing health work, in order
to avoid FW commands being processed while the work is waiting for the
lock.

Fixes: c1d4d2e92a ("net/mlx5: Avoid calling sleeping function by the health poll thread")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-11 18:50:11 -08:00
Maxim Mikityanskiy
65ba8594a2 net/mlx5e: Change interrupt moderation channel params also when channels are closed
struct mlx5e_params contains fields ({rx,tx}_cq_moderation) that depend
on two things: whether DIM is enabled and the state of a private flag
(MLX5E_PFLAG_{RX,TX}_CQE_BASED_MODER). Whenever the DIM state changes,
mlx5e_reset_{rx,tx}_moderation is called to update the fields, however,
only if the channels are open. The flow where the channels are closed
misses the required update of the fields. This commit moves the calls of
mlx5e_reset_{rx,tx}_moderation, so that they run in both flows.

Fixes: ebeaf084ad ("net/mlx5e: Properly set default values when disabling adaptive moderation")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-11 18:50:11 -08:00
Maxim Mikityanskiy
019f93bc4b net/mlx5e: Don't change interrupt moderation params when DIM is enabled
When mlx5e_ethtool_set_coalesce doesn't change DIM state
(enabled/disabled), it calls mlx5e_set_priv_channels_coalesce
unconditionally, which in turn invokes a firmware command to set
interrupt moderation parameters. It shouldn't happen while DIM manages
those parameters dynamically (it might even be happening at the same
time).

This patch fixes it by splitting mlx5e_set_priv_channels_coalesce into
two functions (for RX and TX) and calling them only when DIM is disabled
(for RX and TX respectively).

Fixes: cb3c7fd4f8 ("net/mlx5e: Support adaptive RX coalescing")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-11 18:50:10 -08:00
Raed Salem
e33f9f5f2d net/mlx5e: Enable XDP for Connect-X IPsec capable devices
This limitation was inherited by previous Innova (FPGA) IPsec
implementation, it uses its private set of RQ handlers which
does not support XDP, for Connect-X this is no longer true.

Fix by keeping this limitation only for Innova IPsec supporting devices,
as otherwise this limitation effectively wrongly blocks XDP for all
future Connect-X devices for all flows even if IPsec offload is not
used.

Fixes: 2d64663cd5 ("net/mlx5: IPsec: Add HW crypto offload support")
Signed-off-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Alaa Hleihel <alaa@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-11 18:50:10 -08:00
Raed Salem
e4484d9df5 net/mlx5e: Enable striding RQ for Connect-X IPsec capable devices
This limitation was inherited by previous Innova (FPGA) IPsec
implementation, it uses its private set of RQ handlers which does
not support striding rq, for Connect-X this is no longer true.

Fix by keeping this limitation only for Innova IPsec supporting devices,
as otherwise this limitation effectively wrongly blocks striding RQs for
all future Connect-X devices for all flows even if IPsec offload is not
used.

Fixes: 2d64663cd5 ("net/mlx5: IPsec: Add HW crypto offload support")
Signed-off-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-11 18:50:09 -08:00
Parav Pandit
0e22bfb7c0 net/mlx5e: E-switch, Fix rate calculation for overflow
rate_bytes_ps is a 64-bit field. It passed as 32-bit field to
apply_police_params(). Due to this when police rate is higher
than 4Gbps, 32-bit calculation ignores the carry. This results
in incorrect rate configurationn the device.

Fix it by performing 64-bit calculation.

Fixes: fcb64c0f56 ("net/mlx5: E-Switch, add ingress rate support")
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-11 18:50:09 -08:00
Wei Yongjun
b50c4892cb net/mlx5: SF, Fix error return code in mlx5_sf_dev_probe()
Fix to return negative error code -ENOMEM from the ioremap() error
handling case instead of 0, as done elsewhere in this function.

Fixes: 1958fc2f07 ("net/mlx5: SF, Add auxiliary device driver")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-10 20:47:15 -08:00
Wei Yongjun
2b6c3c1e74 net/mlx5e: Fix error return code in mlx5e_tc_esw_init()
Fix to return negative error code from the mlx5e_tc_tun_init() error
handling case instead of 0, as done elsewhere in this function.

This commit also using 0 instead of 'ret' when success since it is
always equal to 0.

Fixes: 8914add2c9 ("net/mlx5e: Handle FIB events to update tunnel endpoint device")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-10 20:47:14 -08:00
Dan Carpenter
4782c5d8b9 net/mlx5: Fix a NULL vs IS_ERR() check
The mlx5_chains_get_table() function doesn't return NULL, it returns
error pointers so we need to fix this condition.

Fixes: 34ca65352d ("net/mlx5: E-Switch, Indirect table infrastructure")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-10 20:47:14 -08:00
Vlad Buslov
36280f0797 net/mlx5e: Fix tc_tun.h to verify MLX5_ESWITCH config
Exclude contents of tc_tun.h header when CONFIG_MLX5_ESWITCH is disabled to
prevent compile-time errors when compiling with such config.

Fixes: 0d9f964714 ("net/mlx5e: Extract tc tunnel encap/decap code to dedicated file")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-10 20:47:13 -08:00
Jiapeng Zhong
793985432d net/mlx5: Assign boolean values to a bool variable
Fix the following coccicheck warnings:

./drivers/net/ethernet/mellanox/mlx5/core/fs_core.c:575:2-14: WARNING:
Assignment of 0/1 to bool variable.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Zhong <abaci-bugfix@linux.alibaba.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-10 20:47:13 -08:00
Colin Ian King
a3f5a45200 net/mlx5e: Fix spelling mistake "Unknouwn" -> "Unknown"
There is a spelling mistake in a netdev_warn message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-10 20:47:13 -08:00
Colin Ian King
83907506f7 net/mlx5e: Fix spelling mistake "channles" -> "channels"
There is a spelling mistake in a netdev_warn message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-10 20:47:12 -08:00
Zou Wei
b171fcd29c net/mlx5_core: remove unused including <generated/utsrelease.h>
Remove including <generated/utsrelease.h> that don't need it.

Fixes: 17a7612b99 ("net/mlx5_core: Clean driver version and name")
Signed-off-by: Zou Wei <zou_wei@huawei.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-10 20:47:12 -08:00
Colin Ian King
1b7eb33750 net/mlx5: fix spelling mistake in Kconfig "accelaration" -> "acceleration"
There are some spelling mistakes in the Kconfig. Fix these.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-10 20:47:11 -08:00
Amit Cohen
a4cb1c02c3 mlxsw: spectrum_router: Set offload_failed flag
When FIB_EVENT_ENTRY_{REPLACE, APPEND} are triggered and route insertion
fails, FIB abort is triggered.

After aborting, set the appropriate hardware flag to make the kernel emit
RTM_NEWROUTE notification with RTM_F_OFFLOAD_FAILED flag.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-08 16:47:03 -08:00
Amit Cohen
0c5fcf9e24 IPv6: Add "offload failed" indication to routes
After installing a route to the kernel, user space receives an
acknowledgment, which means the route was installed in the kernel, but not
necessarily in hardware.

The asynchronous nature of route installation in hardware can lead to a
routing daemon advertising a route before it was actually installed in
hardware. This can result in packet loss or mis-routed packets until the
route is installed in hardware.

To avoid such cases, previous patch set added the ability to emit
RTM_NEWROUTE notifications whenever RTM_F_OFFLOAD/RTM_F_TRAP flags
are changed, this behavior is controlled by sysctl.

With the above mentioned behavior, it is possible to know from user-space
if the route was offloaded, but if the offload fails there is no indication
to user-space. Following a failure, a routing daemon will wait indefinitely
for a notification that will never come.

This patch adds an "offload_failed" indication to IPv6 routes, so that
users will have better visibility into the offload process.

'struct fib6_info' is extended with new field that indicates if route
offload failed. Note that the new field is added using unused bit and
therefore there is no need to increase struct size.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-08 16:47:03 -08:00
Amit Cohen
36c5100e85 IPv4: Add "offload failed" indication to routes
After installing a route to the kernel, user space receives an
acknowledgment, which means the route was installed in the kernel, but not
necessarily in hardware.

The asynchronous nature of route installation in hardware can lead to a
routing daemon advertising a route before it was actually installed in
hardware. This can result in packet loss or mis-routed packets until the
route is installed in hardware.

To avoid such cases, previous patch set added the ability to emit
RTM_NEWROUTE notifications whenever RTM_F_OFFLOAD/RTM_F_TRAP flags
are changed, this behavior is controlled by sysctl.

With the above mentioned behavior, it is possible to know from user-space
if the route was offloaded, but if the offload fails there is no indication
to user-space. Following a failure, a routing daemon will wait indefinitely
for a notification that will never come.

This patch adds an "offload_failed" indication to IPv4 routes, so that
users will have better visibility into the offload process.

'struct fib_alias', and 'struct fib_rt_info' are extended with new field
that indicates if route offload failed. Note that the new field is added
using unused bit and therefore there is no need to increase structs size.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-08 16:47:03 -08:00
Yishai Hadas
db72438c93 RDMA/mlx5: Cleanup the synchronize_srcu() from the ODP flow
Cleanup the synchronize_srcu() from the ODP flow as it was found to be a
very heavy time consumer as part of dereg_mr.

For example de-registration of 10000 ODP MRs each with size of 2M hugepage
took 19.6 sec comparing de-registration of same number of non ODP MRs that
took 172 ms.

The new locking scheme uses the wait_event() mechanism which follows the
use count of the MR instead of using synchronize_srcu().

By that change, the time required for the above test took 95 ms which is
even better than the non ODP flow.

Once fully dropped the srcu usage, had to come with a lock to protect the
XA access.

As part of using the above mechanism we could also clean the
num_deferred_work stuff and follow the use count instead.

Link: https://lore.kernel.org/r/20210202071309.2057998-1-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-02-08 20:31:11 -04:00
Vlad Buslov
8914add2c9 net/mlx5e: Handle FIB events to update tunnel endpoint device
Process FIB route update events to dynamically update the stack device
rules when tunnel routing changes. Use rtnl lock to prevent FIB event
handler from running concurrently with neigh update and neigh stats
workqueue tasks. Use encap_tbl_lock mutex to synchronize with TC rule
update path that doesn't use rtnl lock.

FIB event workflow for encap flows:

- Unoffload all flows attached to route encaps from slow or fast path
depending on encap destination endpoint neigh state.

- Update encap IP header according to new route dev.

- Update flows mod_hdr action that is responsible for overwriting reg_c0
source port bits to source port of new underlying VF of new route dev. This
step requires changing flow create/delete code to save flow parse attribute
mod_hdr_acts structure for whole flow lifetime instead of deallocating it
after flow creation. Refactor mod_hdr code to allow saving id of individual
mod_hdr actions and updating them with dedicated helper.

- Offload all flows to either slow or fast path depending on encap
destination endpoint neigh state.

FIB event workflow for decap flows:

- Unoffload all route flows from hardware. When last route flow is deleted
all indirect table rules for the route dev will also be deleted.

- Update flow attr decap_vport and destination MAC according to underlying
VF of new rote dev.

- Offload all route flows back to hardware creating new indirect table
rules according to updated flow attribute data.

Extract some neigh update code to helper functions to be used by both neigh
update and route update infrastructure.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-05 20:53:39 -08:00
Vlad Buslov
021905f806 net/mlx5e: Rename some encap-specific API to generic names
Some of the encap-specific functions and fields will also be used by route
update infrastructure in following patches. Rename them to generic names.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-05 20:53:38 -08:00
Vlad Buslov
c7b9038d8a net/mlx5e: TC preparation refactoring for routing update event
Following patch in series implement routing update event which requires
ability to modify rule match_to_reg modify header actions dynamically
during rule lifetime. In order to accommodate such behavior, refactor and
extend TC infrastructure in following ways:

- Modify mod_hdr infrastructure to preserve its parse attribute for whole
rule lifetime, instead of deallocating it after rule creation.

- Extend match_to_reg infrastructure with new function
mlx5e_tc_match_to_reg_set_and_get_id() that returns mod_hdr action id that
can be used afterwards to update the action, and
mlx5e_tc_match_to_reg_mod_hdr_change() that can modify existing actions by
its id.

- Extend tun API with new functions mlx5e_tc_tun_update_header_ipv{4|6}()
that are used to updated existing encap entry tunnel header.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-05 20:53:38 -08:00
Vlad Buslov
2221d954d9 net/mlx5e: Refactor neigh update infrastructure
Following patches in series implements route update which can cause encap
entries to migrate between routing devices. Consecutively, their parent
nhe's need to be also transferable between devices instead of having neigh
device as a part of their immutable key. Move neigh device from struct
mlx5_neigh to struct mlx5e_neigh_hash_entry and check that nhe and neigh
devices are the same in workqueue neigh update handler.

Save neigh net_device that can change dynamically in dedicated nhe->dev
field. With FIB event handler that is implemented in following patches
changing nhe->dev, NETEVENT_DELAY_PROBE_TIME_UPDATE handler can
concurrently access the nhe entry when traversing neigh list under rcu read
lock. Processing stale values in that handler doesn't change the handler
logic, so just wrap all accesses to the dev pointer in {WRITE|READ}_ONCE()
helpers.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-05 20:53:38 -08:00
Vlad Buslov
777bb800c6 net/mlx5e: Create route entry infrastructure
Implement dedicated route entry infrastructure to be used in following
patch by route update event. Both encap (indirectly through their
corresponding encap entries) and decap (directly) flows are attached to
routing entry. Since route update also requires updating encap (route
device MAC address is a source MAC address of tunnel encapsulation), same
encap_tbl_lock mutex is used for synchronization.

The new infrastructure looks similar to existing infrastructures for shared
encap, mod_hdr and hairpin entries:

- Per-eswitch hash table is used for quick entry lookup.

- Flows are attached to per-entry linked list and hold reference to entry
  during their lifetime.

- Atomic reference counting and rcu mechanisms are used as synchronization
  primitives for concurrent access.

The infrastructure also enables connection tracking on stacked devices
topology by attaching CT chain 0 flow on tunneling dev to decap route
entry.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-05 20:53:38 -08:00
Vlad Buslov
0d9f964714 net/mlx5e: Extract tc tunnel encap/decap code to dedicated file
Following patches in series extend the extracted code with routing
infrastructure. To improve code modularity created a dedicated
tc_tun_encap.c source file and move encap/decap related code to the new
file. Export code that is used by both regular TC code and encap/decap code
into tc_priv.h (new header intended to be used only by TC module). Rename
some exported functions by adding "mlx5e_" prefix to their names.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-05 20:53:37 -08:00
Vlad Buslov
8e404fefa5 net/mlx5e: Match recirculated packet miss in slow table using reg_c1
Previous patch in series that implements stack devices RX path implements
indirect table rules that match on tunnel VNI. After such rule is created
all tunnel traffic is recirculated to root table. However, recirculated
packet might not match on any rules installed in the table (for example,
when IP traffic follows ARP traffic). In that case packets appear on
representor of tunnel endpoint VF instead being redirected to the VF
itself.

Extend slow table with additional flow group that matches on reg_c0 (source
port value set by indirect tables implemented by previous patch in series)
and reg_c1 (special 0xFFF mark). When creating offloads fdb tables, install
one rule per VF vport to match on recirculated miss packets and redirect
them to appropriate VF vport. Modify indirect tables code to also rewrite
reg_c1 with special 0xFFF mark.

Implementation reuses reg_c1 tunnel id bits. This is safe to do because
recirculated packets are always matched before decapsulation.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-05 20:53:37 -08:00
Vlad Buslov
48d216e559 net/mlx5e: Refactor reg_c1 usage
Following patch in series uses reg_c1 in eswitch code. To use reg_c1
helpers in both TC and eswitch code, refactor existing helpers according to
similar use case of reg_c0 and move the functionality into eswitch.h.
Calculate reg mappings length from new defines to ensure that they are
always in sync and only need to be changed in single place.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-05 20:53:37 -08:00
Vlad Buslov
a508728a4c net/mlx5e: VF tunnel RX traffic offloading
When tunnel endpoint is on VF the encapsulated RX traffic is exposed on the
representor of the VF without any further processing of rules installed on
the VF. Detect such case by checking if the device returned by route lookup
in decap rule handling code is a mlx5 VF and handle it with new redirection
tables API.

Example TC rules for VF tunnel traffic:

1. Rule that encapsulates the tunneled flow and redirects packets from
source VF rep to tunnel device:

$ tc -s filter show dev enp8s0f0_1 ingress
filter protocol ip pref 4 flower chain 0
filter protocol ip pref 4 flower chain 0 handle 0x1
  dst_mac 0a:40:bd:30:89:99
  src_mac ca:2e:a7:3f:f5:0f
  eth_type ipv4
  ip_tos 0/0x3
  ip_flags nofrag
  in_hw in_hw_count 1
        action order 1: tunnel_key  set
        src_ip 7.7.7.5
        dst_ip 7.7.7.1
        key_id 98
        dst_port 4789
        nocsum
        ttl 64 pipe
         index 1 ref 1 bind 1 installed 411 sec used 411 sec
        Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0
        no_percpu
        used_hw_stats delayed

        action order 2: mirred (Egress Redirect to device vxlan_sys_4789) stolen
        index 1 ref 1 bind 1 installed 411 sec used 0 sec
        Action statistics:
        Sent 5615833 bytes 4028 pkt (dropped 0, overlimits 0 requeues 0)
        Sent software 0 bytes 0 pkt
        Sent hardware 5615833 bytes 4028 pkt
        backlog 0b 0p requeues 0
        cookie bb406d45d343bf7ade9690ae80c7cba4
        no_percpu
        used_hw_stats delayed

2. Rule that redirects from tunnel device to UL rep:

$ tc -s filter show dev vxlan_sys_4789 ingress
filter protocol ip pref 4 flower chain 0
filter protocol ip pref 4 flower chain 0 handle 0x1
  dst_mac ca:2e:a7:3f:f5:0f
  src_mac 0a:40:bd:30:89:99
  eth_type ipv4
  enc_dst_ip 7.7.7.5
  enc_src_ip 7.7.7.1
  enc_key_id 98
  enc_dst_port 4789
  enc_tos 0
  ip_flags nofrag
  in_hw in_hw_count 1
        action order 1: tunnel_key  unset pipe
         index 2 ref 1 bind 1 installed 434 sec used 434 sec
        Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0
        used_hw_stats delayed

        action order 2: mirred (Egress Redirect to device enp8s0f0_1) stolen
        index 4 ref 1 bind 1 installed 434 sec used 0 sec
        Action statistics:
        Sent 129936 bytes 1082 pkt (dropped 0, overlimits 0 requeues 0)
        Sent software 0 bytes 0 pkt
        Sent hardware 129936 bytes 1082 pkt
        backlog 0b 0p requeues 0
        cookie ac17cf398c4c69e4a5b2f7aabd1b88ff
        no_percpu
        used_hw_stats delayed

Co-developed-by: Dmytro Linkin <dlinkin@nvidia.com>
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-05 20:53:36 -08:00
Vlad Buslov
4ad9116c84 net/mlx5e: Remove redundant match on tunnel destination mac
Remove hardcoded match on tunnel destination MAC address. Such match is no
longer required and would be wrong for stacked devices topology where
encapsulation destination MAC address will be the address of tunnel VF that
can change dynamically on route change (implemented in following patches in
the series).

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-05 20:53:36 -08:00