Commit Graph

9406 Commits

Author SHA1 Message Date
Kees Cook
2ab6478d12 mlxsw: spectrum_router: Replace 0-length array with flexible array
Zero-length arrays are deprecated[1]. Replace struct
mlxsw_sp_nexthop_group_info's "nexthops" 0-length array with a flexible
array. Detected with GCC 13, using -fstrict-flex-arrays=3:

drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c: In function 'mlxsw_sp_nexthop_group_hash_obj':
drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:3278:38: warning: array subscript i is outside array bounds of 'struct mlxsw_sp_nexthop[0]' [-Warray-bounds=]
 3278 |                         val ^= jhash(&nh->ifindex, sizeof(nh->ifindex), seed);
      |                                      ^~~~~~~~~~~~
drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:2954:33: note: while referencing 'nexthops'
 2954 |         struct mlxsw_sp_nexthop nexthops[0];
      |                                 ^~~~~~~~

[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays

Cc: Ido Schimmel <idosch@nvidia.com>
Cc: Petr Machata <petrm@nvidia.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Tested-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-09 07:32:21 +00:00
Eli Cohen
4d1c1379d7 net/mlx5: Lag, fix failure to cancel delayed bond work
Commit 0d4e8ed139 ("net/mlx5: Lag, avoid lockdep warnings")
accidentally removed a call to cancel delayed bond work thus it may
cause queued delay to expire and fall on an already destroyed work
queue.

Fix by restoring the call cancel_delayed_work_sync() before
destroying the workqueue.

This prevents call trace such as this:

[  329.230417] BUG: kernel NULL pointer dereference, address: 0000000000000000
 [  329.231444] #PF: supervisor write access in kernel mode
 [  329.232233] #PF: error_code(0x0002) - not-present page
 [  329.233007] PGD 0 P4D 0
 [  329.233476] Oops: 0002 [#1] SMP
 [  329.234012] CPU: 5 PID: 145 Comm: kworker/u20:4 Tainted: G OE      6.0.0-rc5_mlnx #1
 [  329.235282] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
 [  329.236868] Workqueue: mlx5_cmd_0000:08:00.1 cmd_work_handler [mlx5_core]
 [  329.237886] RIP: 0010:_raw_spin_lock+0xc/0x20
 [  329.238585] Code: f0 0f b1 17 75 02 f3 c3 89 c6 e9 6f 3c 5f ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 31 c0 ba 01 00 00 00 <f0> 0f b1 17 75 02 f3 c3 89 c6 e9 45 3c 5f ff 0f 1f 44 00 00 0f 1f
 [  329.241156] RSP: 0018:ffffc900001b0e98 EFLAGS: 00010046
 [  329.241940] RAX: 0000000000000000 RBX: ffffffff82374ae0 RCX: 0000000000000000
 [  329.242954] RDX: 0000000000000001 RSI: 0000000000000014 RDI: 0000000000000000
 [  329.243974] RBP: ffff888106ccf000 R08: ffff8881004000c8 R09: ffff888100400000
 [  329.244990] R10: 0000000000000000 R11: ffffffff826669f8 R12: 0000000000002000
 [  329.246009] R13: 0000000000000005 R14: ffff888100aa7ce0 R15: ffff88852ca80000
 [  329.247030] FS:  0000000000000000(0000) GS:ffff88852ca80000(0000) knlGS:0000000000000000
 [  329.248260] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 [  329.249111] CR2: 0000000000000000 CR3: 000000016d675001 CR4: 0000000000770ee0
 [  329.250133] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 [  329.251152] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 [  329.252176] PKRU: 55555554

Fixes: 0d4e8ed139 ("net/mlx5: Lag, avoid lockdep warnings")
Signed-off-by: Eli Cohen <elic@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-12-28 11:38:51 -08:00
Maor Dickman
e54638a838 net/mlx5e: Set geneve_tlv_option_0_exist when matching on geneve option
The cited patch added support of matching on geneve option by setting
geneve_tlv_option_0_data mask and key but didn't set geneve_tlv_option_0_exist
bit which is required on some HWs when matching geneve_tlv_option_0_data parameter,
this may cause in some cases for packets to wrongly match on rules with different
geneve option.

Example of such case is packet with geneve_tlv_object class=789 and data=456
will wrongly match on rule with match geneve_tlv_object class=123 and data=456.

Fix it by setting geneve_tlv_option_0_exist bit when supported by the HW when matching
on geneve_tlv_option_0_data parameter.

Fixes: 9272e3df30 ("net/mlx5e: Geneve, Add support for encap/decap flows offload")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-12-28 11:38:51 -08:00
Adham Faris
1e267ab88d net/mlx5e: Fix hw mtu initializing at XDP SQ allocation
Current xdp xmit functions logic (mlx5e_xmit_xdp_frame_mpwqe or
mlx5e_xmit_xdp_frame), validates xdp packet length by comparing it to
hw mtu (configured at xdp sq allocation) before xmiting it. This check
does not account for ethernet fcs length (calculated and filled by the
nic). Hence, when we try sending packets with length > (hw-mtu -
ethernet-fcs-size), the device port drops it and tx_errors_phy is
incremented. Desired behavior is to catch these packets and drop them
by the driver.

Fix this behavior in XDP SQ allocation function (mlx5e_alloc_xdpsq) by
subtracting ethernet FCS header size (4 Bytes) from current hw mtu
value, since ethernet FCS is calculated and written to ethernet frames
by the nic.

Fixes: d8bec2b29a ("net/mlx5e: Support bpf_xdp_adjust_head()")
Signed-off-by: Adham Faris <afaris@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-12-28 11:38:51 -08:00
Chris Mi
2951b2e142 net/mlx5e: Always clear dest encap in neigh-update-del
The cited commit introduced a bug for multiple encapsulations flow.
If one dest encap becomes invalid, the flow is set slow path flag.
But when other dests encap become invalid, they are not cleared due
to slow path flag of the flow. When neigh-update-add is running, it
will use invalid encap.

Fix it by checking slow path flag after clearing dest encap.

Fixes: 9a5f9cc794 ("net/mlx5e: Fix possible use-after-free deleting fdb rule")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-12-28 11:38:50 -08:00
Chris Mi
849190e3e4 net/mlx5e: CT: Fix ct debugfs folder name
Need to use sprintf to build a string instead of sscanf. Otherwise
dirname is null and both "ct_nic" and "ct_fdb" won't be created.
But its redundant anyway as driver could be in switchdev mode but
still add nic rules. So use "ct" as folder name.

Fixes: 77422a8f6f ("net/mlx5e: CT: Add ct driver counters")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-12-28 11:38:50 -08:00
Tariq Toukan
f8c18a5749 net/mlx5e: Fix RX reporter for XSK RQs
RX reporter mistakenly reads from the regular (inactive) RQ
when XSK RQ is active. Fix it here.

Fixes: 3db4c85cde ("net/mlx5e: xsk: Use queue indices starting from 0 for XSK queues")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-12-28 11:38:50 -08:00
Dragos Tatulea
b12d581e83 net/mlx5e: IPoIB, Don't allow CQE compression to be turned on by default
mlx5e_build_nic_params will turn CQE compression on if the hardware
capability is enabled and the slow_pci_heuristic condition is detected.
As IPoIB doesn't support CQE compression, make sure to disable the
feature in the IPoIB profile init.

Please note that the feature is not exposed to the user for IPoIB
interfaces, so it can't be subsequently turned on.

Fixes: b797a684b0 ("net/mlx5e: Enable CQE compression when PCI is slower than link")
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-12-28 11:38:50 -08:00
Shay Drory
c4ad5f2bda net/mlx5: Fix RoCE setting at HCA level
mlx5 PF can disable RoCE for its VFs and SFs. In such case RoCE is
marked as unsupported on those VFs/SFs.
The cited patch added an option for disable (and enable) RoCE at HCA
level. However, that commit didn't check whether RoCE is supported on
the HCA and enabled user to try and set RoCE to on.
Fix it by checking whether the HCA supports RoCE.

Fixes: fbfa97b4d7 ("net/mlx5: Disable roce at HCA level")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-12-28 11:38:50 -08:00
Shay Drory
9078e843ef net/mlx5: Avoid recovery in probe flows
Currently, recovery is done without considering whether the device is
still in probe flow.
This may lead to recovery before device have finished probed
successfully. e.g.: while mlx5_init_one() is running. Recovery flow is
using functionality that is loaded only by mlx5_init_one(), and there
is no point in running recovery without mlx5_init_one() finished
successfully.

Fix it by waiting for probe flow to finish and checking whether the
device is probed before trying to perform recovery.

Fixes: 51d138c261 ("net/mlx5: Fix health error state handling")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-12-28 11:38:50 -08:00
Shay Drory
44aee8ea15 net/mlx5: Fix io_eq_size and event_eq_size params validation
io_eq_size and event_eq_size params are of param type
DEVLINK_PARAM_TYPE_U32. But, the validation callback is addressing them
as DEVLINK_PARAM_TYPE_U16.

This cause mismatch in validation in big-endian systems, in which
values in range were rejected while 268500991 was accepted.
Fix it by checking the U32 value in the validation callback.

Fixes: 0844fa5f7b ("net/mlx5: Let user configure io_eq_size param")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-12-28 11:38:49 -08:00
Jiri Pirko
2a35b2c2e6 net/mlx5: Add forgotten cleanup calls into mlx5_init_once() error path
There are two cleanup calls missing in mlx5_init_once() error path.
Add them making the error path flow to be the same as
mlx5_cleanup_once().

Fixes: 52ec462eca ("net/mlx5: Add reserved-gids support")
Fixes: 7c39afb394 ("net/mlx5: PTP code migration to driver core section")
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-12-28 11:38:49 -08:00
Moshe Shemesh
1f0ae22ab4 net/mlx5: E-Switch, properly handle ingress tagged packets on VST
Fix SRIOV VST mode behavior to insert cvlan when a guest tag is already
present in the frame. Previous VST mode behavior was to drop packets or
override existing tag, depending on the device version.

In this patch we fix this behavior by correctly building the HW steering
rule with a push vlan action, or for older devices we ask the FW to stack
the vlan when a vlan is already present.

Fixes: 07bab95026 ("net/mlx5: E-Switch, Refactor eswitch ingress acl codes")
Fixes: dfcb1ed3c3 ("net/mlx5: E-Switch, Vport ingress/egress ACLs rules for VST mode")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-12-28 11:38:49 -08:00
Jakub Kicinski
dd8b3a802b ipsec-next-2022-12-09
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEH7ZpcWbFyOOp6OJbrB3Eaf9PW7cFAmOS/ooACgkQrB3Eaf9P
 W7cOVA/+L8rHwLe78DDz/PNESyShtTVCYBDF/ngYMV8AIvjSfPresMbFV3NKqO5E
 3qbMl199QH2eWI7dhQaQ+edynSG0QCx5FmPai0UuHPLxATct1pNPJPpvBryO/4jC
 ZouYBIVjdMbq6Y8vD2gJ8UtA7TZpncP0HYOKTvYyDL9kQ+nUmu9KUYxcEcNHL5w+
 TjL9jJafR+GqczCRiwAoMKIFV7lUrTFzh7slfINNN5DVTuzN33H7Tp70z6IKOfVL
 1LATlZv7mqpLVF6dQuMXOt6kd/BEBl1y4ZHTHow5nstJvwu99P96iKwEfIXuOvWK
 fulhDU61eIik8D9QJWeM7TuZDbYewWI77plwVY/R/zRt0At4VLpq7I1m33CmLLMY
 Fb5fMxJPkM8YAtDID+BknYPrSAcxo8ji04BWFrVqQ6InPmtGfnP83XSSkYfxY7FB
 3hUfz4igsJpV5vrS1EFRhjklNwI+jY2yAvIggQtdkJ97ubSUY3E4ACfNqlJ5lJbv
 2KqWnSKlG21F9ZTR68VzcQVhFIQF6j/EuQqro+4TQUIdZswcml2iK32zrel0rs9C
 iAsgQQaMV9a2vEaScRZqdOJ4HENTbm9wD7Mso/i5vr+lnpr1ThKjQo8osU8YUlbC
 SDTMeWRRos+esFML6SP+YZ7SM/qXMluou204x/llJ/VDMXQ5e8k=
 =enQp
 -----END PGP SIGNATURE-----

Merge tag 'ipsec-next-2022-12-09' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next

Steffen Klassert says:

====================
ipsec-next 2022-12-09

1) Add xfrm packet offload core API.
   From Leon Romanovsky.

2) Add xfrm packet offload support for mlx5.
   From Leon Romanovsky and Raed Salem.

3) Fix a typto in a error message.
   From Colin Ian King.

* tag 'ipsec-next-2022-12-09' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next: (38 commits)
  xfrm: Fix spelling mistake "oflload" -> "offload"
  net/mlx5e: Open mlx5 driver to accept IPsec packet offload
  net/mlx5e: Handle ESN update events
  net/mlx5e: Handle hardware IPsec limits events
  net/mlx5e: Update IPsec soft and hard limits
  net/mlx5e: Store all XFRM SAs in Xarray
  net/mlx5e: Provide intermediate pointer to access IPsec struct
  net/mlx5e: Skip IPsec encryption for TX path without matching policy
  net/mlx5e: Add statistics for Rx/Tx IPsec offloaded flows
  net/mlx5e: Improve IPsec flow steering autogroup
  net/mlx5e: Configure IPsec packet offload flow steering
  net/mlx5e: Use same coding pattern for Rx and Tx flows
  net/mlx5e: Add XFRM policy offload logic
  net/mlx5e: Create IPsec policy offload tables
  net/mlx5e: Generalize creation of default IPsec miss group and rule
  net/mlx5e: Group IPsec miss handles into separate struct
  net/mlx5e: Make clear what IPsec rx_err does
  net/mlx5e: Flatten the IPsec RX add rule path
  net/mlx5e: Refactor FTE setup code to be more clear
  net/mlx5e: Move IPsec flow table creation to separate function
  ...
====================

Link: https://lore.kernel.org/r/20221209093310.4018731-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-09 20:06:35 -08:00
Jakub Kicinski
c80edd8d41 mlx5-updates-2022-12-08
1) Support range match action in SW steering
 
 Yevgeny Kliteynik says:
 =======================
 
 The following patch series adds support for a range match action in
 SW Steering.
 
 SW steering is able to match only on the exact values of the packet fields,
 as requested by the user: the user provides mask for the fields that are of
 interest, and the exact values to be matched on when the traffic is handled.
 
 The following patch series add new type of action - Range Match, where the
 user provides a field to be matched on and a range of values (min to max)
 that will be considered as hit.
 
 There are several new notions that were implemented in order to support
 Range Match:
  - MATCH_RANGES Steering Table Entry (STE): the new STE type that allows
    matching the packets' fields on the range of values instead of a specific
    value.
  - Match Definer: this is a general FW object that defines which fields
    in the packet will be referenced by the mask and tag of each STE.
    Match definer ID is part of STE fields, and it defines how the HW needs
    to interpret the STE's mask/tag values.
    Till now SW steering used the definers that were managed by FW and
    implemented the STE layout as described by the HW spec.
    Now that we're adding a new type of STE, SW steering needs to also be
    able to define this new STE's layout, and this is do
 
 =======================
 
 2) From OZ add support for meter mtu offload
    2.1: Refactor the code to allow both metering and range post actions as a
         pre-step for adding police mtu offload support.
    2.2: Instantiate mtu green/red flow tables with a single match-all rule.
         Add the green/red actions to the hit/miss table accordingly
    2.3: Initialize the meter object with the TC police mtu parameter.
         Use the hardware range match action feature.
 
 3) From MaorD, support routes with more than 2 nexthops in multipath
 
 4) Michael and Or, improve and extend vport representor counters.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmOSfRAACgkQSD+KveBX
 +j7T2Qf/cRCmUOa6RqTMGs+u5LSACfGrFQapJG4Kd4yAcUIWl4lGpihWZKNjb0Hf
 fJWjy5mpUfCvhpTU139SVGZJ/bCJel0GK3Vu2UG++6q8m2L+XJUMKO/kIKxF8gdu
 qUo0JuCmSvgdLgSsDCqwY+8+tO6wSGE8VdV9uuOZj3jvtcHWNRCEwUDziMu0W1qh
 cYsx/yORSHk2n0YyL7De+c71+QZIa3kIwXMoVsZPpk27RkVMSvVL0oJ7vO3Zg73Y
 C6kK1KNg0GQ8UofHbtqrGW4NXvWvKPL986maYmGQDrP1jOEtey20QI3jW5G3OWZt
 xIzTs9D+MpmB4Cb/2O+lVL8aBfkQyw==
 =6ApF
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2022-12-08' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2022-12-08

1) Support range match action in SW steering

Yevgeny Kliteynik says:
=======================

The following patch series adds support for a range match action in
SW Steering.

SW steering is able to match only on the exact values of the packet fields,
as requested by the user: the user provides mask for the fields that are of
interest, and the exact values to be matched on when the traffic is handled.

The following patch series add new type of action - Range Match, where the
user provides a field to be matched on and a range of values (min to max)
that will be considered as hit.

There are several new notions that were implemented in order to support
Range Match:
 - MATCH_RANGES Steering Table Entry (STE): the new STE type that allows
   matching the packets' fields on the range of values instead of a specific
   value.
 - Match Definer: this is a general FW object that defines which fields
   in the packet will be referenced by the mask and tag of each STE.
   Match definer ID is part of STE fields, and it defines how the HW needs
   to interpret the STE's mask/tag values.
   Till now SW steering used the definers that were managed by FW and
   implemented the STE layout as described by the HW spec.
   Now that we're adding a new type of STE, SW steering needs to also be
   able to define this new STE's layout, and this is do

=======================

2) From OZ add support for meter mtu offload
   2.1: Refactor the code to allow both metering and range post actions as a
        pre-step for adding police mtu offload support.
   2.2: Instantiate mtu green/red flow tables with a single match-all rule.
        Add the green/red actions to the hit/miss table accordingly
   2.3: Initialize the meter object with the TC police mtu parameter.
        Use the hardware range match action feature.

3) From MaorD, support routes with more than 2 nexthops in multipath

4) Michael and Or, improve and extend vport representor counters.

* tag 'mlx5-updates-2022-12-08' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5: Expose steering dropped packets counter
  net/mlx5: Refactor and expand rep vport stat group
  net/mlx5e: multipath, support routes with more than 2 nexthops
  net/mlx5e: TC, add support for meter mtu offload
  net/mlx5e: meter, add mtu post meter tables
  net/mlx5e: meter, refactor to allow multiple post meter tables
  net/mlx5: DR, Add support for range match action
  net/mlx5: DR, Add function that tells if STE miss addr has been initialized
  net/mlx5: DR, Some refactoring of miss address handling
  net/mlx5: DR, Manage definers with refcounts
  net/mlx5: DR, Handle FT action in a separate function
  net/mlx5: DR, Rework is_fw_table function
  net/mlx5: DR, Add functions to create/destroy MATCH_DEFINER general object
  net/mlx5: fs, add match on ranges API
  net/mlx5: mlx5_ifc updates for MATCH_DEFINER general object
====================

Link: https://lore.kernel.org/r/20221209001420.142794-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-09 19:44:43 -08:00
Ido Schimmel
7ec5364351 mlxsw: spectrum_ipip: Add Spectrum-1 ip6gre support
As explained in the previous patch, the existing Spectrum-2 ip6gre
implementation can be reused for Spectrum-1. Change the Spectrum-1
ip6gre operations structure to use the common operations.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08 18:46:32 -08:00
Ido Schimmel
ab30e4d4b2 mlxsw: spectrum_ipip: Rename Spectrum-2 ip6gre operations
There are two main differences between Spectrum-1 and newer ASICs in
terms of IP-in-IP support:

1. In Spectrum-1, RIFs representing ip6gre tunnels require two entries
   in the RIF table.

2. In Spectrum-2 and newer ASICs, packets ingress the underlay (during
   encapsulation) and egress the underlay (during decapsulation) via a
   special generic loopback RIF.

The first difference was handled in previous patches by adding the
'double_rif_entry' field to the Spectrum-1 operations structure of
ip6gre RIFs. The second difference is handled during RIF creation, by
only creating a generic loopback RIF in Spectrum-2 and newer ASICs.

Therefore, the ip6gre operations can be shared between Spectrum-1 and
newer ASIC in a similar fashion to how the ipgre operations are shared.

Rename the operations to not be Spectrum-2 specific and move them
earlier in the file so that they could later be used for Spectrum-1.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08 18:46:32 -08:00
Ido Schimmel
5ca1b208c5 mlxsw: spectrum_router: Add support for double entry RIFs
In Spectrum-1, loopback router interfaces (RIFs) used for IP-in-IP
encapsulation with an IPv6 underlay require two RIF entries and the RIF
index must be even.

Prepare for this change by extending the RIF parameters structure with a
'double_entry' field that indicates if the RIF being created requires
two RIF entries or not. Only set it for RIFs representing ip6gre tunnels
in Spectrum-1.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08 18:46:32 -08:00
Ido Schimmel
1a2f65b4a2 mlxsw: spectrum_router: Parametrize RIF allocation size
Currently, each router interface (RIF) consumes one entry in the RIFs
table. This is going to change in subsequent patches where some RIFs
will consume two table entries.

Prepare for this change by parametrizing the RIF allocation size. For
now, always pass '1'.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08 18:46:31 -08:00
Ido Schimmel
40ef76de8a mlxsw: spectrum_router: Use gen_pool for RIF index allocation
Currently, each router interface (RIF) consumes one entry in the RIFs
table and there are no alignment constraints. This is going to change in
subsequent patches where some RIFs will consume two table entries and
their indexes will need to be aligned to the allocation size (even).

Prepare for this change by converting the RIF index allocation to use
gen_pool with the 'gen_pool_first_fit_order_align' algorithm.

No Kconfig changes necessary as mlxsw already selects
'GENERIC_ALLOCATOR'.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08 18:46:31 -08:00
Michael Guralnik
4fe1b3a5f8 net/mlx5: Expose steering dropped packets counter
Add rx steering discarded packets counter to the vnic_diag debugfs.

Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-12-08 16:10:56 -08:00
Or Har-Toov
64b68e3696 net/mlx5: Refactor and expand rep vport stat group
Expand representor vport stat group to support all counters from the
vport stat group, to count all the traffic passing through the vport.

Fix current implementation where fill_stats and update_stats use
different structs.

Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-12-08 16:10:55 -08:00
Maor Dickman
7c33e73995 net/mlx5e: multipath, support routes with more than 2 nexthops
Today multipath offload is only supported when the number of
nexthops is 2 which block the use of it in case of system with
2 NICs.

This patch solve it by enabling multipath offload per NIC if
2 nexthops of the route are its uplinks.

Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-12-08 16:10:55 -08:00
Oz Shlomo
6fda078d5f net/mlx5e: TC, add support for meter mtu offload
Initialize the meter object with the TC police mtu parameter.
Use the hardware range destination to compare the pkt len to the mtu setting.
Assign the range destination hit/miss ft to the police conform/exceed
attributes.

Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-12-08 16:10:55 -08:00
Oz Shlomo
d56713250a net/mlx5e: meter, add mtu post meter tables
TC police action may configure the maximum packet size to be handled by
the policer, in addition to byte/packet rate.
MTU check is realized in hardware using the range destination, specifying
a hit ft, if packet len is in the range, or miss ft otherwise.

Instantiate mtu green/red flow tables with a single match-all rule.
Add the green/red actions to the hit/miss table accordingly.

Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-12-08 16:10:55 -08:00
Oz Shlomo
fd6fa76146 net/mlx5e: meter, refactor to allow multiple post meter tables
TC police action may configure the maximum packet size to be handled by
the policer, in addition to byte/packet rate.
Currently the post meter table steers the packet according to the meter
aso output.

Refactor the code to allow both metering and range post actions as a
pre-step for adding police mtu offload support.

Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-12-08 16:10:55 -08:00
Yevgeny Kliteynik
be6d5daeaa net/mlx5: DR, Add support for range match action
Add support for matching on range.
The supported type of range is L2 frame size.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Erez Shitrit <erezsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-12-08 16:10:54 -08:00
Yevgeny Kliteynik
1207a772c0 net/mlx5: DR, Add function that tells if STE miss addr has been initialized
Up until now miss address in all the STEs was used to connect miss lists
and to link the last STE in the list to end anchor.
Match range STE will require special handling because its miss address is
part of the 'action'. That is, range action has hit and miss addresses.
Since the range action is always the last action, need to make sure that
its miss address isn't overwritten by the end anchor.

Adding new function mlx5dr_ste_is_miss_addr_set() to answer the question
whether the STE's miss address has already been set as part of STE
initialization. Use a callback that always returns false right now. Once
match range is added, a different callback will be used for that STE type.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Erez Shitrit <erezsh@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-12-08 16:10:54 -08:00
Yevgeny Kliteynik
f31bda789f net/mlx5: DR, Some refactoring of miss address handling
In preparation for MATCH RANGE STE support, create a function
to set the miss address of an STE.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Erez Shitrit <erezsh@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-12-08 16:10:54 -08:00
Yevgeny Kliteynik
1339678fdd net/mlx5: DR, Manage definers with refcounts
In many cases different actions will ask for the same definer format.
Instead of allocating new definer general object and running out of
definers, have an xarray of allocated definers and keep track of their
usage with refcounts: allocate a new definer only when there isn't
one with the same format already created, and destroy definer only
when its refcount runs down to zero.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-12-08 16:10:54 -08:00
Yevgeny Kliteynik
c72a57ad6e net/mlx5: DR, Handle FT action in a separate function
As preparation for range action support, moving the handling
of final ICM address for flow table action to a separate function.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-12-08 16:10:54 -08:00
Yevgeny Kliteynik
0a8c20e23f net/mlx5: DR, Rework is_fw_table function
This patch handles the following two changes w.r.t. is_fw_table function:

1. When SW steering is asked to create/destroy FW table, we allow for
creation/destruction of only termination tables. Rename mlx5_dr_is_fw_table
both to comply with the static function naming and to reflect that we're
actually checking for FW termination table.

2. When the action 'go to flow table' is created, the destination flow
table can be any FW table, not only termination table. Adding function
to check if the dest table is FW table. This function will also be used
by the later creation of range match action, so putting it the header file.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-12-08 16:10:54 -08:00
Yevgeny Kliteynik
e046b86e29 net/mlx5: DR, Add functions to create/destroy MATCH_DEFINER general object
SW steering is able to match only on the exact values of the packet fields,
as requested by the user: the user provides mask for the fields that are of
interest, and the exact values to be matched on when the traffic is handled.

Match Definer is a general FW object that defines which fields in the
packet will be referenced by the mask and tag of each STE. Match definer ID
is part of STE fields, and it defines how the HW needs to interpret the STE's
mask/tag values.
Till now SW steering used the definers that were managed by FW and implemented
the STE layout as described by the HW spec. Now that we're adding a new type
of STE, SW steering needs to define for the HW how it should interpret this
new STE's layout.
This is done with a programmable match definer.

The programmable definer allows to selects which fields will be included in
the definer, and their layout: it has up to 9 DW selectors 8 Byte selectors.
Each selector indicates a DW/Byte worth of fields out of the table that
is defined by HW spec by referencing the offset of the required DW/Byte.

This patch adds dr_cmd function to create and destroy MATCH_DEFINER
general object.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-12-08 16:10:53 -08:00
Yevgeny Kliteynik
38bf24c38d net/mlx5: fs, add match on ranges API
Range is a new flow destination type which allows matching on
a range of values instead of matching on a specific value.

Range flow destination has the following fields:
 - hit_ft: flow table to forward the traffic in case of hit
 - miss_ft: flow table to forward the traffic in case of miss
 - field: which packet characteristic to match on
 - min: minimal value for the selected field
 - max: maximal value for the selected field

Note:
 - In order to match, the value in the packet should meet
   the following criteria: min <= value < max
 - Currently, the only supported field type is L2 packet length

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-12-08 16:10:53 -08:00
Eric Dumazet
0e706f7961 net/mlx4: small optimization in mlx4_en_xmit()
Test against MLX4_MAX_DESC_TXBBS only matters if the TX
bounce buffer is going to be used.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Wei Wang <weiwan@google.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08 14:27:48 -08:00
Eric Dumazet
26782aad00 net/mlx4: MLX4_TX_BOUNCE_BUFFER_SIZE depends on MAX_SKB_FRAGS
Google production kernel has increased MAX_SKB_FRAGS to 45
for BIG-TCP rollout.

Unfortunately mlx4 TX bounce buffer is not big enough whenever
an skb has up to 45 page fragments.

This can happen often with TCP TX zero copy, as one frag usually
holds 4096 bytes of payload (order-0 page).

Tested:
 Kernel built with MAX_SKB_FRAGS=45
 ip link set dev eth0 gso_max_size 185000
 netperf -t TCP_SENDFILE

I made sure that "ethtool -G eth0 tx 64" was properly working,
ring->full_size being set to 15.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Wei Wang <weiwan@google.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08 14:27:48 -08:00
Eric Dumazet
35f31ff0c0 net/mlx4: rename two constants
MAX_DESC_SIZE is really the size of the bounce buffer used
when reaching the right side of TX ring buffer.

MAX_DESC_TXBBS get a MLX4_ prefix.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08 14:27:48 -08:00
Oz Shlomo
3603f26633 net/mlx5e: TC, allow meter jump control action
Separate the matchall police action validation from flower validation.
Isolate the action validation logic in the police action parser.

Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221203221337.29267-12-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08 13:04:26 -08:00
Oz Shlomo
0d8c38d44f net/mlx5e: TC, init post meter rules with branching attributes
Instantiate the post meter actions with the platform initialized branching
action attributes.

Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221203221337.29267-11-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08 13:04:26 -08:00
Oz Shlomo
3fcb94e393 net/mlx5e: TC, rename post_meter actions
Currently post meter supports only the pipe/drop conform-exceed policy.
This assumption is reflected in several variable names.
Rename the following variables as a pre-step for using the generalized
branching action platform.

Rename fwd_green_rule/drop_red_rule to green_rule/red_rule respectively.
Repurpose red_counter/green_counter to act_counter/drop_counter to allow
police conform-exceed configurations that do not drop.

Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221203221337.29267-10-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08 13:04:26 -08:00
Oz Shlomo
c84fa1ab94 net/mlx5e: TC, initialize branching action with target attr
Identify the jump target action when iterating the action list.
Initialize the jump target attr with the jumping attribute during the
parsing phase. Initialize the jumping attr post action with the target
during the offload phase.

Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221203221337.29267-9-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08 13:04:26 -08:00
Oz Shlomo
f86488cb46 net/mlx5e: TC, initialize branch flow attributes
Initialize flow attribute for drop, accept, pipe and jump branching actions.

Instantiate a flow attribute instance according to the specified branch
control action. Store the branching attributes on the branching action
flow attribute during the parsing phase. Then, during the offload phase,
allocate the relevant mod header objects to the branching actions.

Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221203221337.29267-8-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08 13:04:26 -08:00
Oz Shlomo
ec5878552b net/mlx5e: TC, set control params for branching actions
Extend the act tc api to set the branch control params aligning with
the police conform/exceed use case.

Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221203221337.29267-7-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08 13:04:26 -08:00
Oz Shlomo
6442638251 net/mlx5e: TC, validate action list per attribute
Currently the entire flow action list is validate for offload limitations.
For example, flow with both forward and drop actions are declared invalid
due to hardware restrictions.
However, a multi-table hardware model changes the limitations from a flow
scope to a single flow attribute scope.

Apply offload limitations to flow attributes instead of the entire flow.

Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221203221337.29267-6-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08 13:04:26 -08:00
Oz Shlomo
d3f6b0df91 net/mlx5e: TC, add terminating actions
Extend act api to identify actions that terminate action list.
Pre-step for terminating branching actions.

Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221203221337.29267-5-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08 13:04:26 -08:00
Oz Shlomo
8facc02f22 net/mlx5e: TC, reuse flow attribute post parser processing
After the tc action parsing phase the flow attribute is initialized with
relevant eswitch offload objects such as tunnel, vlan, header modify and
counter attributes. The post processing is done both for fdb and post-action
attributes.

Reuse the flow attribute post parsing logic by both fdb and post-action
offloads.

Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221203221337.29267-4-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08 13:04:26 -08:00
Oz Shlomo
f07d8afb1c net/mlx5: fs, assert null dest pointer when dest_num is 0
Currently create_flow_handle() assumes a null dest pointer when there
are no destinations.
This might not be the case as the caller may pass an allocated dest
array while setting the dest_num parameter to 0.

Assert null dest array for flow rules that have no destinations (e.g. drop
rule).

Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221203221337.29267-3-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08 13:04:26 -08:00
Oz Shlomo
5a5624d1ed net/mlx5e: E-Switch, handle flow attribute with no destinations
Rules with drop action are not required to have a destination.
Currently the destination list is allocated with the maximum number of
destinations and passed to the fs_core layer along with the actual number
of destinations.

Remove redundant passing of dest pointer when count of dest is 0.

Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221203221337.29267-2-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08 13:04:26 -08:00
Leon Romanovsky
37d244ad18 net/mlx5e: Open mlx5 driver to accept IPsec packet offload
Enable configuration of IPsec packet offload through XFRM state add
interface together with moving specific to IPsec packet mode limitations
to specific switch-case section.

Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-12-08 10:36:10 +01:00
Leon Romanovsky
cee137a634 net/mlx5e: Handle ESN update events
Extend event logic to update ESN state (esn_msb, esn_overlap)
for an IPsec Offload context.

Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-12-08 10:36:09 +01:00