Commit Graph

1325 Commits

Author SHA1 Message Date
Linus Torvalds
7ede5f78a0 v6.5 merge window RDMA pull request
This cycle saw a focus on rxe and bnxt_re drivers:
 
 - Code cleanups for irdma, rxe, rtrs, hns, vmw_pvrdma
 
 - rxe uses workqueues instead of tasklets
 
 - rxe has better compliance around access checks for MRs and rereg_mr
 
 - mana supportst he 'v2' FW interface for RX coalescing
 
 - hfi1 bug fix for stale cache entries in its MR cache
 
 - mlx5 buf fix to handle FW failures when destroying QPs
 
 - erdma HW has a new doorbell allocation mechanism for uverbs that is
   secure
 
 - Lots of small cleanups and rework in bnxt_re
    * Use the common mmap functions
    * Support disassociation
    * Improve FW command flow
 
 - bnxt_re support for "low latency push", this allows a packet
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZJxq+wAKCRCFwuHvBreF
 YSN+AQDKAmWdKCqmc3boMIq5wt4h7yYzdW47LpzGarOn5Hf+UgEA6mpPJyRqB43C
 CNXYIbASl/LLaWzFvxCq/AYp6tzuog4=
 =I8ju
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "This cycle saw a focus on rxe and bnxt_re drivers:

   - Code cleanups for irdma, rxe, rtrs, hns, vmw_pvrdma

   - rxe uses workqueues instead of tasklets

   - rxe has better compliance around access checks for MRs and rereg_mr

   - mana supportst he 'v2' FW interface for RX coalescing

   - hfi1 bug fix for stale cache entries in its MR cache

   - mlx5 buf fix to handle FW failures when destroying QPs

   - erdma HW has a new doorbell allocation mechanism for uverbs that is
     secure

   - Lots of small cleanups and rework in bnxt_re:
       - Use the common mmap functions
       - Support disassociation
       - Improve FW command flow
       - support for 'low latency push'"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (71 commits)
  RDMA/bnxt_re: Fix an IS_ERR() vs NULL check
  RDMA/bnxt_re: Fix spelling mistake "priviledged" -> "privileged"
  RDMA/bnxt_re: Remove duplicated include in bnxt_re/main.c
  RDMA/bnxt_re: Refactor code around bnxt_qplib_map_rc()
  RDMA/bnxt_re: Remove incorrect return check from slow path
  RDMA/bnxt_re: Enable low latency push
  RDMA/bnxt_re: Reorg the bar mapping
  RDMA/bnxt_re: Move the interface version to chip context structure
  RDMA/bnxt_re: Query function capabilities from firmware
  RDMA/bnxt_re: Optimize the bnxt_re_init_hwrm_hdr usage
  RDMA/bnxt_re: Add disassociate ucontext support
  RDMA/bnxt_re: Use the common mmap helper functions
  RDMA/bnxt_re: Initialize opcode while sending message
  RDMA/cma: Remove NULL check before dev_{put, hold}
  RDMA/rxe: Simplify cq->notify code
  RDMA/rxe: Fixes mr access supported list
  RDMA/bnxt_re: optimize the parameters passed to helper functions
  RDMA/bnxt_re: remove redundant cmdq_bitmap
  RDMA/bnxt_re: use firmware provided max request timeout
  RDMA/bnxt_re: cancel all control path command waiters upon error
  ...
2023-06-29 21:01:17 -07:00
Jason Gunthorpe
5f004bcaee Linux 6.4
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmSYzfYeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG/ucH/iOM/1Py/fSg0qSs
 7NJ4XXlourT5zrnRMom3cm3d9gYqgTzgvKFL3kjMEexTRVYbhlcO4ZPRsiry8zxF
 ToGX+V8tDMqb8WSdFHzkljRY+zDRyfEUDMlTzROAD9DunLmQtkJKyrggkeGdjkpP
 OyfGqKpwlLXZRAXBil/U8Mx9MHdjJubloZwghLZr33VdUZa68+JJ9l6w163Oe/ET
 K264NM0wxN/kvN57JvePgqMccQwpINylg8IhRI+XelgczjUXeJBsOA8TDv4bDN4Q
 bjCLhkWbIaZtTYqvOXa/kD0T8wd7KETsMBQN8YzyDh6W0GmAlJjTawyAhA6jA5in
 x3uz2W8=
 =L3zp
 -----END PGP SIGNATURE-----

Merge tag 'v6.4' into rdma.git for-next

Linux 6.4

Resolve conflicts between rdma rc and next in rxe_cq matching linux-next:

drivers/infiniband/sw/rxe/rxe_cq.c:
  https://lore.kernel.org/r/20230622115246.365d30ad@canb.auug.org.au

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-06-27 14:06:29 -03:00
Lama Kayal
9ee473c259 net/mlx5: Fix reserved at offset in hca_cap register
A member of struct mlx5_ifc_cmd_hca_cap_bits has been mistakenly
assigned the wrong reserved_at offset value. Correct it to align to the
right value, thus avoid future miscalculation.

Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-23 12:27:33 -07:00
Or Har-Toov
0bd2e6fc78 net/mlx5: Expose bits for local loopback counter
Add needed HW bits for querying local loopback counter and the
HCA capability for it.

Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Reviewed-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-16 12:02:08 -07:00
Moshe Shemesh
7a9770f1bf net/mlx5: Handle sync reset unload event
Added a new event handler to firmware sync reset, which is used to
support firmware sync reset flow on smart NIC. Adding this new stage to
the flow enables the firmware to ensure host PFs unload before ECPFs
unload, to avoid race of PFs recovery.

If firmware sends sync_reset_unload event to driver the driver should
unload and close all HW resources of the function. Once the driver
finishes unloading part, it can't get any more events from firmware as
event queues are closed, so it polls the reset state field to know when
to continue to next stage of the sync reset flow.

Added capability bit for supporting sync_reset_unload event.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-16 12:02:07 -07:00
Moshe Shemesh
8bb42ed421 net/mlx5: Expose timeout for sync reset unload stage
Expose new timoueout in Default Timeouts Register to be used on sync
reset flow running on smart NIC. In this flow the driver should know how
much time to wait from getting unload request till firmware will ask the
PF to continue to next stage of the flow.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-16 12:02:06 -07:00
Jakub Kicinski
173780ff18 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

Conflicts:

include/linux/mlx5/driver.h
  617f5db1a6 ("RDMA/mlx5: Fix affinity assignment")
  dc13180824 ("net/mlx5: Enable devlink port for embedded cpu VF vports")
https://lore.kernel.org/all/20230613125939.595e50b8@canb.auug.org.au/

tools/testing/selftests/net/mptcp/mptcp_join.sh
  47867f0a7e ("selftests: mptcp: join: skip check if MIB counter not supported")
  425ba80312 ("selftests: mptcp: join: support RM_ADDR for used endpoints or not")
  45b1a1227a ("mptcp: introduces more address related mibs")
  0639fa230a ("selftests: mptcp: add explicit check for new mibs")
https://lore.kernel.org/netdev/20230609-upstream-net-20230610-mptcp-selftests-support-old-kernels-part-3-v1-0-2896fe2ee8a3@tessares.net/

No adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-15 22:19:41 -07:00
Linus Torvalds
93fd8eb053 v6.4 second rc RDMA pull request
Many rc bug fixes:
 
 - Two rtrs bug fixes for error unwind bugs
 
 - Several rxe bug fixes:
    * Incorrect Rx packet validation
    * Using memory without a refcount
    * Syzkaller found use before initialization
    * Regression fix for missing locking with the tasklet conversion from
      this merge window
 
 - Have bnxt report the correct link properties to userspace, this was
   a regression in v6.3
 
 - Several mlx5 bug fixes:
    * Kernel crash triggerable by userspace for the RAW ethernet profile
    * Defend against steering refcounting issues created by userspace
    * Incorrect change of QP port affinity parameters in some LAG configurations
 
 - Fix mlx5 Q counters:
    * Do not over allocate Q counters to allow userspace to use the full
      port capacity
    * Kernel crash triggered by eswitch due to mis-use of Q counters
    * Incorrect mlx5_device for Q counters in some LAG configurations
 
 - Properly implement the IBA spec restricting privileged qkeys to root
 
 - Always an error when reading from a disassociated device's event queue
 
 - isert bug fixes:
    * Avoid a deadlock with the CM handler and CM ID destruction
    * Correct list corruption due to incorrect locking
    * Fix a use after free around connection tear down
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZIoZ7gAKCRCFwuHvBreF
 YUSyAPsH+VdqzYd1z/bk0zMds9JR2oqMOR8l02e4gq8K9hjTJAD/ePEuT5PTpFZu
 FrT4SDhOfA30XNz+RofRNisJC92OOAo=
 =KyF7
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
 "This is an unusually large bunch of bug fixes for the later rc cycle,
  rxe and mlx5 both dumped a lot of things at once. rxe continues to fix
  itself, and mlx5 is fixing a bunch of "queue counters" related bugs.

  There is one highly notable bug fix regarding the qkey. This small
  security check was missed in the original 2005 implementation and it
  allows some significant issues.

  Summary:

   - Two rtrs bug fixes for error unwind bugs

   - Several rxe bug fixes:
      * Incorrect Rx packet validation
      * Using memory without a refcount
      * Syzkaller found use before initialization
      * Regression fix for missing locking with the tasklet conversion
        from this merge window

   - Have bnxt report the correct link properties to userspace, this was
     a regression in v6.3

   - Several mlx5 bug fixes:
      * Kernel crash triggerable by userspace for the RAW ethernet
        profile
      * Defend against steering refcounting issues created by userspace
      * Incorrect change of QP port affinity parameters in some LAG
        configurations

   - Fix mlx5 Q counters:
      * Do not over allocate Q counters to allow userspace to use the
        full port capacity
      * Kernel crash triggered by eswitch due to mis-use of Q counters
      * Incorrect mlx5_device for Q counters in some LAG configurations

   - Properly implement the IBA spec restricting privileged qkeys to
     root

   - Always an error when reading from a disassociated device's event
     queue

   - isert bug fixes:
      * Avoid a deadlock with the CM handler and CM ID destruction
      * Correct list corruption due to incorrect locking
      * Fix a use after free around connection tear down"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/rxe: Fix rxe_cq_post
  IB/isert: Fix incorrect release of isert connection
  IB/isert: Fix possible list corruption in CMA handler
  IB/isert: Fix dead lock in ib_isert
  RDMA/mlx5: Fix affinity assignment
  IB/uverbs: Fix to consider event queue closing also upon non-blocking mode
  RDMA/uverbs: Restrict usage of privileged QKEYs
  RDMA/cma: Always set static rate to 0 for RoCE
  RDMA/mlx5: Fix Q-counters query in LAG mode
  RDMA/mlx5: Remove vport Q-counters dependency on normal Q-counters
  RDMA/mlx5: Fix Q-counters per vport allocation
  RDMA/mlx5: Create an indirect flow table for steering anchor
  RDMA/mlx5: Initiate dropless RQ for RAW Ethernet functions
  RDMA/rxe: Fix the use-before-initialization error of resp_pkts
  RDMA/bnxt_re: Fix reporting active_{speed,width} attributes
  RDMA/rxe: Fix ref count error in check_rkey()
  RDMA/rxe: Fix packet length checks
  RDMA/rtrs: Fix rxe_dealloc_pd warning
  RDMA/rtrs: Fix the last iu->buf leak in err path
2023-06-15 20:13:56 -07:00
Mark Bloch
617f5db1a6 RDMA/mlx5: Fix affinity assignment
The cited commit aimed to ensure that Virtual Functions (VFs) assign a
queue affinity to a Queue Pair (QP) to distribute traffic when
the LAG master creates a hardware LAG. If the affinity was set while
the hardware was not in LAG, the firmware would ignore the affinity value.

However, this commit unintentionally assigned an affinity to QPs on the LAG
master's VPORT even if the RDMA device was not marked as LAG-enabled.
In most cases, this was not an issue because when the hardware entered
hardware LAG configuration, the RDMA device of the LAG master would be
destroyed and a new one would be created, marked as LAG-enabled.

The problem arises when a user configures Equal-Cost Multipath (ECMP).
In ECMP mode, traffic can be directed to different physical ports based on
the queue affinity, which is intended for use by VPORTS other than the
E-Switch manager. ECMP mode is supported only if both E-Switch managers are
in switchdev mode and the appropriate route is configured via IP. In this
configuration, the RDMA device is not destroyed, and we retain the RDMA
device that is not marked as LAG-enabled.

To ensure correct behavior, Send Queues (SQs) opened by the E-Switch
manager through verbs should be assigned strict affinity. This means they
will only be able to communicate through the native physical port
associated with the E-Switch manager. This will prevent the firmware from
assigning affinity and will not allow the SQs to be remapped in case of
failover.

Fixes: 802dcc7fc5 ("RDMA/mlx5: Support TX port affinity for VF drivers in LAG mode")
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://lore.kernel.org/r/425b05f4da840bc684b0f7e8ebf61aeb5cef09b0.1685960567.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-06-11 11:27:17 +03:00
Patrisious Haddad
afff248998 RDMA/mlx5: Handle DCT QP logic separately from low level QP interface
Previously when destroying a DCT, if the firmware function for the
destruction failed, the common resource would have been destroyed
either way, since it was destroyed before the firmware object.
Which leads to kernel warning "refcount_t: underflow" which indicates
possible use-after-free.
Which is triggered when we try to destroy the common resource for the
second time and execute refcount_dec_and_test(&common->refcount).

So, let's fix the destruction order by factoring out the DCT QP logic
to be in separate XArray database.

refcount_t: underflow; use-after-free.
WARNING: CPU: 8 PID: 1002 at lib/refcount.c:28 refcount_warn_saturate+0xd8/0xe0
Modules linked in: xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core overlay mlx5_core fuse
CPU: 8 PID: 1002 Comm: python3 Not tainted 5.16.0-rc5+ #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:refcount_warn_saturate+0xd8/0xe0
Code: ff 48 c7 c7 18 f5 23 82 c6 05 60 70 ff 00 01 e8 d0 0a 45 00 0f 0b c3 48 c7 c7 c0 f4 23 82 c6 05 4c 70 ff 00 01 e8 ba 0a 45 00 <0f> 0b c3 0f 1f 44 00 00 8b 07 3d 00 00 00 c0 74 12 83 f8 01 74 13
RSP: 0018:ffff8881221d3aa8 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff8881313e8d40 RCX: ffff88852cc1b5c8
RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff88852cc1b5c0
RBP: ffff888100f70000 R08: ffff88853ffd1ba8 R09: 0000000000000003
R10: 00000000fffff000 R11: 3fffffffffffffff R12: 0000000000000246
R13: ffff888100f71fa0 R14: ffff8881221d3c68 R15: 0000000000000020
FS:  00007efebbb13740(0000) GS:ffff88852cc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005611aac29f80 CR3: 00000001313de004 CR4: 0000000000370ea0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 destroy_resource_common+0x6e/0x95 [mlx5_ib]
 mlx5_core_destroy_rq_tracked+0x38/0xbe [mlx5_ib]
 mlx5_ib_destroy_wq+0x22/0x80 [mlx5_ib]
 ib_destroy_wq_user+0x1f/0x40 [ib_core]
 uverbs_free_wq+0x19/0x40 [ib_uverbs]
 destroy_hw_idr_uobject+0x18/0x50 [ib_uverbs]
 uverbs_destroy_uobject+0x2f/0x190 [ib_uverbs]
 uobj_destroy+0x3c/0x80 [ib_uverbs]
 ib_uverbs_cmd_verbs+0x3e4/0xb80 [ib_uverbs]
 ? uverbs_free_wq+0x40/0x40 [ib_uverbs]
 ? ip_list_rcv+0xf7/0x120
 ? netif_receive_skb_list_internal+0x1b6/0x2d0
 ? task_tick_fair+0xbf/0x450
 ? __handle_mm_fault+0x11fc/0x1450
 ib_uverbs_ioctl+0xa4/0x110 [ib_uverbs]
 __x64_sys_ioctl+0x3e4/0x8e0
 ? handle_mm_fault+0xb9/0x210
 do_syscall_64+0x3d/0x90
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7efebc0be17b
Code: 0f 1e fa 48 8b 05 1d ad 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ed ac 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffe71813e78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007ffe71813fb8 RCX: 00007efebc0be17b
RDX: 00007ffe71813fa0 RSI: 00000000c0181b01 RDI: 0000000000000005
RBP: 00007ffe71813f80 R08: 00005611aae96020 R09: 000000000000004f
R10: 00007efebbf9ffa0 R11: 0000000000000246 R12: 00007ffe71813f80
R13: 00007ffe71813f4c R14: 00005611aae2eca0 R15: 00007efeae6c89d0
 </TASK>

Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/4470888466c8a898edc9833286967529cc5f3c0d.1685953497.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-06-11 11:21:40 +03:00
Leon Romanovsky
2ecfd94616 RDMA/mlx5: Reduce QP table exposure
driver.h is common header to whole mlx5 code base, but struct
mlx5_qp_table is used in mlx5_ib driver only. So move that struct
to be under sole responsibility of mlx5_ib.

Link: https://lore.kernel.org/r/bec0dc1158e795813b135d1143147977f26bf668.1685953497.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2023-06-11 11:21:28 +03:00
Daniel Jurgens
395ccd6eb4 net/mlx5: Add new page type for EC VF pages
When the embedded cpu supports SRIOV it can be enabled and disabled
independently from the host SRIOV. Track the pages separately so we can
properly wait for returned VF pages.

Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-09 18:40:51 -07:00
Daniel Jurgens
9ac0b12824 net/mlx5: Update vport caps query/set for EC VFs
These functions are for query/set by vport, there was an underlying
assumption that vport was equal to function ID. That's not the case for
EC VF functions. Set the ec_vf_function bit accordingly.

Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-09 18:40:51 -07:00
Daniel Jurgens
dc13180824 net/mlx5: Enable devlink port for embedded cpu VF vports
Enable creation of a devlink port for EC VF vports.

Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-09 18:40:51 -07:00
Daniel Jurgens
93b36d0f28 net/mlx5: mlx5_ifc updates for embedded CPU SRIOV
Add ec_vf_vport_base to HCA Capabilities 2. This indicates the base vport
of embedded CPU virtual functions that are connected to the eswitch.

Add ec_vf_function to query/set_hca_caps. If set this indicates
accessing a virtual function on the embedded CPU by function ID. This
should only be used with other_function set to 1.

Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: Bodong Wang <bodong@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-09 18:40:50 -07:00
Lama Kayal
a33682e4e7 net/mlx5e: Expose catastrophic steering error counters
Add generated_pkt_steering_fail and handled_pkt_steering_fail to devlink
heatlth reporter.
generated_pkt_steering_fail indicates the number of packets dropped due to
illegal steering operation within the vport steering domain.
handled_pkt_steering_fail indicates the number of packets dropped due to
illegal steering operation, originated by the vport.

Also, update devlink reporter functionality documentation with the newly
exposed counters.

Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-07 14:00:43 -07:00
Shay Drory
222dd18583 {net/RDMA}/mlx5: introduce lag_for_each_peer
Introduce a generic APIs to iterate over all the devices which are part
of the LAG. This API replace mlx5_lag_get_peer_mdev() which retrieve
only a single peer device from the lag.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-07 14:00:42 -07:00
Dragos Tatulea
1db1f21cae net/mlx5e: Use query_special_contexts cmd only once per mdev
Don't query the firmware so many times (num rqs * num wqes * wqe frags)
because it slows down linearly the interface creation time when the
product is larger. Do it only once per mdev and store the result in
mlx5e_param.

Due to helper function being called from different files, move it to
an appropriate location. Rename the function with a proper prefix and
add a small cleanup.

This fix applies only for legacy rq.

Fixes: 1b1e486883 ("net/mlx5e: Use query_special_contexts for mkeys")
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Or Har-Toov <ohartoov@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-24 20:44:18 -07:00
Yevgeny Kliteynik
c7dd225bc2 net/mlx5: DR, Check force-loopback RC QP capability independently from RoCE
SW Steering uses RC QP for writing STEs to ICM. This writingis done in LB
(loopback), and FL (force-loopback) QP is preferred for performance. FL is
available when RoCE is enabled or disabled based on RoCE caps.
This patch adds reading of FL capability from HCA caps in addition to the
existing reading from RoCE caps, thus fixing the case where we didn't
have loopback enabled when RoCE was disabled.

Fixes: 7304d603a5 ("net/mlx5: DR, Add support for force-loopback QP")
Signed-off-by: Itamar Gozlan <igozlan@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-22 22:38:05 -07:00
Linus Torvalds
af3877265d v6.4 merge window RDMA pull request
Usual wide collection of unrelated items in drivers:
 
 - Driver bug fixes and treewide cleanups in hfi1, siw, qib, mlx5, rxe,
   usnic, usnic, bnxt_re, ocrdma, iser
    * Unnecessary NULL checks
    * kmap obsolescence
    * pci_enable_pcie_error_reporting() obsolescence
    * Unused variables and macros
    * trace event related warnings
    * casting warnings
 
 - Code cleanups for irdm and erdma
 
 - EFA reporting of 128 byte PCIe TLP support
 
 - mlx5 more agressively uses the out of order HW feature
 
 - Big rework of how state machines and tasks work in rxe
 
 - Fix a syzkaller found crash netdev refcount leak in siw
 
 - bnxt_re revises their HW description header
 
 - Congestion control for bnxt_re
 
 - Use mmu_notifiers more safely in hfi1
 
 - mlx5 gets better support for PCIe relaxed ordering inside VMs
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZEva5wAKCRCFwuHvBreF
 YZFmAQC9T3b/XQ3bRknYciuzbatC98o9xB0FTqmEFYGj+Y2lVAD9EEVe3HKfHfi3
 t/GxXYB5r22oxg5bgsblZfEdEdTVCg8=
 =akMm
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "Usual wide collection of unrelated items in drivers:

   - Driver bug fixes and treewide cleanups in hfi1, siw, qib, mlx5,
     rxe, usnic, usnic, bnxt_re, ocrdma, iser:
       - remove unnecessary NULL checks
       - kmap obsolescence
       - pci_enable_pcie_error_reporting() obsolescence
       - unused variables and macros
       - trace event related warnings
       - casting warnings

   - Code cleanups for irdm and erdma

   - EFA reporting of 128 byte PCIe TLP support

   - mlx5 more agressively uses the out of order HW feature

   - Big rework of how state machines and tasks work in rxe

   - Fix a syzkaller found crash netdev refcount leak in siw

   - bnxt_re revises their HW description header

   - Congestion control for bnxt_re

   - Use mmu_notifiers more safely in hfi1

   - mlx5 gets better support for PCIe relaxed ordering inside VMs"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (81 commits)
  RDMA/efa: Add rdma write capability to device caps
  RDMA/mlx5: Use correct device num_ports when modify DC
  RDMA/irdma: Drop spurious WQ_UNBOUND from alloc_ordered_workqueue() call
  RDMA/rxe: Fix spinlock recursion deadlock on requester
  RDMA/mlx5: Fix flow counter query via DEVX
  RDMA/rxe: Protect QP state with qp->state_lock
  RDMA/rxe: Move code to check if drained to subroutine
  RDMA/rxe: Remove qp->req.state
  RDMA/rxe: Remove qp->comp.state
  RDMA/rxe: Remove qp->resp.state
  RDMA/mlx5: Allow relaxed ordering read in VFs and VMs
  net/mlx5: Update relaxed ordering read HCA capabilities
  RDMA/mlx5: Check pcie_relaxed_ordering_enabled() in UMR
  RDMA/mlx5: Remove pcie_relaxed_ordering_enabled() check for RO write
  RDMA: Add ib_virt_dma_to_page()
  RDMA/rxe: Fix the error "trying to register non-static key in rxe_cleanup_task"
  RDMA/irdma: Slightly optimize irdma_form_ah_cm_frame()
  RDMA/rxe: Fix incorrect TASKLET_STATE_SCHED check in rxe_task.c
  IB/hfi1: Place struct mmu_rb_handler on cache line start
  IB/hfi1: Fix bugs with non-PAGE_SIZE-end multi-iovec user SDMA requests
  ...
2023-04-29 17:21:24 -07:00
Jakub Kicinski
fbc1449d38 mlx5-updates-2023-04-20
1) Dragos Improves RX page pool, and provides some fixes to his previous series:
  1.1) Fix releasing page_pool for striding RQ and legacy RQ nonlinear case
  1.2) Hook NAPIs to page pools to gain more performance.
 
 2) From Roi, Some cleanups to TC and eswitch modules.
 
 3) Maher migrates vnic diagnostic counters reporting from debugfs to a
     dedicated devlink health reporter
 
 Maher Says:
 ===========
  net/mlx5: Expose vnic diagnostic counters using devlink
 
 Currently, vnic diagnostic counters are exposed through the following
 debugfs:
 
 $ ls /sys/kernel/debug/mlx5/0000:08:00.0/esw/vf_0/vnic_diag/
 cq_overrun
 quota_exceeded_command
 total_q_under_processor_handle
 invalid_command
 send_queue_priority_update_flow
 nic_receive_steering_discard
 
 The current design does not allow the hypervisor to view the diagnostic
 counters of its VFs, in case the VFs get bound to a VM. In other words,
 the counters are not exposed for representor interfaces.
 Furthermore, the debugfs design is inconvenient future-wise, in case more
 counters need to be reported by the driver in the future.
 
 As these counters pertain to vNIC health, it is more appropriate to
 utilize the devlink health reporter to expose them.
 
 Thus, this patchest includes the following changes:
 
 * Drop the current vnic diagnostic counters debugfs interface.
 * Add a vnic devlink health reporter for PFs/VFs core devices, which
   when diagnosed will dump vnic diagnostic counter values that are
   queried from FW.
 * Add a vnic devlink health reporter for the representor interface, which
   serves the same purpose listed in the previous point, in addition to
   allowing the hypervisor to view its VFs diagnostic counters, even when
   the VFs are bounded to external VMs.
 
 Example of devlink health reporter usage is:
 $devlink health diagnose pci/0000:08:00.0 reporter vnic
  vNIC env counters:
     total_error_queues: 0 send_queue_priority_update_flow: 0
     comp_eq_overrun: 0 async_eq_overrun: 0 cq_overrun: 0
     invalid_command: 0 quota_exceeded_command: 0
     nic_receive_steering_discard: 0
 
 ===========
 
 4) SW steering fixes and improvements
 
 Yevgeny Kliteynik Says:
 =======================
 These short patch series are just small fixes / improvements for
 SW steering:
 
  - Patch 1: Fix dumping of legacy modify_hdr in debug dump to
    align to what is expected by parser
  - Patch 2: Have separate threshold for ICM sync per ICM type
  - Patch 3: Add more info to the steering debug dump - Linux
    version and device name
  - Patch 4: Keep track of number of buddies that are currently
    in use per domain per buddy type
 
 =======================
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmRB6HcACgkQSD+KveBX
 +j4+nAf/cCJ7poDQQOp/Oug6H3xn/COYPv3Iasl0dhT8yu+LlNMQTF0bRrU4UpAQ
 aIKcja2biOBAnD96EhA1nJoo9bJUTtKLokUDDyK/xRHS+wIyr8Lia6vxTz1yjj3C
 jDqX3+ZP4rFuhAvh+92AT1I0JvS0g+ocokPVKmm+Pwf4y7sG69CZ7phVGSc0iFfT
 y+gnP4C6cdIr7kNLByeeX6alDHL/q83vfNFWrugRPna2uXjcSR5Gtp03pJ0OVkI5
 qHxG7Bz0BE/hcMYwNcNVTu/5e02+5PS6B8kN/ho5DkhVFhp+h17XWWvOMZxC3jfI
 k0ijRSWocG1jIgRgkioBRgzXmc6nNw==
 =jUe8
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2023-04-20' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2023-04-20

1) Dragos Improves RX page pool, and provides some fixes to his previous
   series:
 1.1) Fix releasing page_pool for striding RQ and legacy RQ nonlinear case
 1.2) Hook NAPIs to page pools to gain more performance.

2) From Roi, Some cleanups to TC and eswitch modules.

3) Maher migrates vnic diagnostic counters reporting from debugfs to a
    dedicated devlink health reporter

Maher Says:
===========
 net/mlx5: Expose vnic diagnostic counters using devlink

Currently, vnic diagnostic counters are exposed through the following
debugfs:

$ ls /sys/kernel/debug/mlx5/0000:08:00.0/esw/vf_0/vnic_diag/
cq_overrun
quota_exceeded_command
total_q_under_processor_handle
invalid_command
send_queue_priority_update_flow
nic_receive_steering_discard

The current design does not allow the hypervisor to view the diagnostic
counters of its VFs, in case the VFs get bound to a VM. In other words,
the counters are not exposed for representor interfaces.
Furthermore, the debugfs design is inconvenient future-wise, in case more
counters need to be reported by the driver in the future.

As these counters pertain to vNIC health, it is more appropriate to
utilize the devlink health reporter to expose them.

Thus, this patchest includes the following changes:

* Drop the current vnic diagnostic counters debugfs interface.
* Add a vnic devlink health reporter for PFs/VFs core devices, which
  when diagnosed will dump vnic diagnostic counter values that are
  queried from FW.
* Add a vnic devlink health reporter for the representor interface, which
  serves the same purpose listed in the previous point, in addition to
  allowing the hypervisor to view its VFs diagnostic counters, even when
  the VFs are bounded to external VMs.

Example of devlink health reporter usage is:
$devlink health diagnose pci/0000:08:00.0 reporter vnic
 vNIC env counters:
    total_error_queues: 0 send_queue_priority_update_flow: 0
    comp_eq_overrun: 0 async_eq_overrun: 0 cq_overrun: 0
    invalid_command: 0 quota_exceeded_command: 0
    nic_receive_steering_discard: 0

===========

4) SW steering fixes and improvements

Yevgeny Kliteynik Says:
=======================
These short patch series are just small fixes / improvements for
SW steering:

 - Patch 1: Fix dumping of legacy modify_hdr in debug dump to
   align to what is expected by parser
 - Patch 2: Have separate threshold for ICM sync per ICM type
 - Patch 3: Add more info to the steering debug dump - Linux
   version and device name
 - Patch 4: Keep track of number of buddies that are currently
   in use per domain per buddy type

=======================

* tag 'mlx5-updates-2023-04-20' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5: Update op_mode to op_mod for port selection
  net/mlx5: E-Switch, Remove unused mlx5_esw_offloads_vport_metadata_set()
  net/mlx5: E-Switch, Remove redundant dev arg from mlx5_esw_vport_alloc()
  net/mlx5: Include linux/pci.h for pci_msix_can_alloc_dyn()
  net/mlx5e: RX, Hook NAPIs to page pools
  net/mlx5e: RX, Fix XDP_TX page release for legacy rq nonlinear case
  net/mlx5e: RX, Fix releasing page_pool pages twice for striding RQ
  net/mlx5e: Add vnic devlink health reporter to representors
  net/mlx5: Add vnic devlink health reporter to PFs/VFs
  Revert "net/mlx5: Expose vnic diagnostic counters for eswitch managed vports"
  Revert "net/mlx5: Expose steering dropped packets counter"
  net/mlx5: DR, Add memory statistics for domain object
  net/mlx5: DR, Add more info in domain dbg dump
  net/mlx5: DR, Calculate sync threshold of each pool according to its type
  net/mlx5: DR, Fix dumping of legacy modify_hdr in debug dump
====================

Link: https://lore.kernel.org/r/20230421013850.349646-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-21 20:47:05 -07:00
Roi Dayan
f9c895a72a net/mlx5: Update op_mode to op_mod for port selection
To be consistent with the other enum keys use OP_MOD
instead of OP_MODE.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-04-20 18:35:50 -07:00
Maher Sanalla
b0bc615df4 net/mlx5: Add vnic devlink health reporter to PFs/VFs
Create a vnic devlink health reporter for PFs/VFs interfaces.
The reporter's diagnose callback displays the values of vNIC/vport
transport debug counters of PFs/VFs, as follows:

$ devlink health diagnose pci/0000:08:00.0 reporter vnic
 vNIC env counters:
    total_error_queues: 0 send_queue_priority_update_flow: 0
    comp_eq_overrun: 0 async_eq_overrun: 0 cq_overrun: 0
    invalid_command: 0 quota_exceeded_command: 0
    nic_receive_steering_discard: 0

Moreover, add documentation on the reporter functionality and the
counters description.

While at it, expose the vNIC counters diagnose function to be used by
the downstream patch, which will reveal the counters for representor
interfaces.

Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-04-20 18:35:49 -07:00
Jakub Kicinski
681c5b51dc Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Adjacent changes:

net/mptcp/protocol.h
  63740448a3 ("mptcp: fix accept vs worker race")
  2a6a870e44 ("mptcp: stops worker on unaccepted sockets at listener close")
  ddb1a072f8 ("mptcp: move first subflow allocation at mpc access time")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-20 16:29:51 -07:00
Jakub Kicinski
f52cc627b8 Revert "net/mlx5: Enable management PF initialization"
This reverts commit fe998a3c77.

Paul reports that it causes a regression with IB on CX4
and FW 12.18.1000. In addition I think that the concept
of "management PF" is not fully accepted and requires
a discussion.

Fixes: fe998a3c77 ("net/mlx5: Enable management PF initialization")
Reported-by: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/all/CAHC9VhQ7A4+msL38WpbOMYjAqLp0EtOjeLh4Dc6SQtD6OUvCQg@mail.gmail.com/
Link: https://lore.kernel.org/r/20230413222547.56901-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-19 18:51:28 -07:00
Mark Bloch
3e358ea861 RDMA/mlx5: Fix flow counter query via DEVX
Commit cited in "fixes" tag added bulk support for flow counters but it
didn't account that's also possible to query a counter using a non-base id
if the counter was allocated as bulk.

When a user performs a query, validate the flow counter id given in the
mailbox is inside the valid range taking bulk value into account.

Fixes: 208d70f562 ("IB/mlx5: Support flow counters offset for bulk counters")
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Link: https://lore.kernel.org/r/79d7fbe291690128e44672418934256254d93115.1681377114.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-04-18 08:47:10 +03:00
Leon Romanovsky
1210af3b99 net/mlx5e: Add IPsec packet offload tunnel bits
Extend packet reformat types and flow table capabilities with
IPsec packet offload tunnel bits.

Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-17 18:55:25 -07:00
Avihai Horon
ccbbfe0682 net/mlx5: Update relaxed ordering read HCA capabilities
Rename existing HCA capability relaxed_ordering_read to
relaxed_ordering_read_pci_enabled. This is in accordance with recent PRM
change to better describe the capability, as it's set only if both the
device supports relaxed ordering (RO) read and RO is enabled in PCI
config space.

In addition, add new HCA capability relaxed_ordering_read which is set
if the device supports RO read, regardless of RO in PCI config space.
This will be used in the following patch to allow RO in VFs and VMs.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Link: https://lore.kernel.org/r/caa0002fd8135086357dfcc368e2f5cc73b08480.1681131553.git.leon@kernel.org
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-04-16 13:29:19 +03:00
Jakub Kicinski
e473ea818b mlx5-updates-2023-04-11
1) Vlad adds the support for linux bridge multicast offload support
    Patches #1 through #9
    Synopsis
 
 Vlad Says:
 ==============
 Implement support of bridge multicast offload in mlx5. Handle port object
 attribute SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED notification to toggle multicast
 offload and bridge snooping support on bridge. Handle port object
 SWITCHDEV_OBJ_ID_PORT_MDB notification to attach a bridge port to MDB.
 
 Steering architecture
 
 Existing offload infrastructure relies on two levels of flow tables - bridge
 ingress and egress. For multicast offload the architecture is extended with
 additional layer of per-port multicast replication tables. Such tables filter
 loopback traffic (so packets are not replicated to their source port) and pop
 VLAN headers for "untagged" VLANs. The tables are referenced by the MDB rules in
 egress table. MDB egress rule can point to multiple per-port multicast tables,
 which causes matching multicast traffic to be replicated to all of them, and,
 consecutively, to several bridge ports:
 
                                                                                                                             +--------+--+
                                                                                     +---------------------------------------> Port 1 |  |
                                                                                     |                                       +-^------+--+
                                                                                     |                                         |
                                                                                     |                                         |
                                        +-----------------------------------------+  |     +---------------------------+       |
                                        | EGRESS table                            |  |  +--> PORT 1 multicast table    |       |
 +----------------------------------+   +-----------------------------------------+  |  |  +---------------------------+       |
 | INGRESS table                    |   |                                         |  |  |  |                           |       |
 +----------------------------------+   | dst_mac=P1,vlan=X -> pop vlan, goto P1  +--+  |  | FG0:                      |       |
 |                                  |   | dst_mac=P1,vlan=Y -> pop vlan, goto P1  |     |  | src_port=dst_port -> drop |       |
 | src_mac=M1,vlan=X -> goto egress +---> dst_mac=P2,vlan=X -> pop vlan, goto P2  +--+  |  | FG1:                      |       |
 | ...                              |   | dst_mac=P2,vlan=Y -> goto P2            |  |  |  | VLAN X -> pop, goto port  |       |
 |                                  |   | dst_mac=MDB1,vlan=Y -> goto mcast P1,P2 +-----+  | ...                       |       |
 +----------------------------------+   |                                         |  |  |  | VLAN Y -> pop, goto port  +-------+
                                        +-----------------------------------------+  |  |  | FG3:                      |
                                                                                     |  |  | matchall -> goto port     |
                                                                                     |  |  |                           |
                                                                                     |  |  +---------------------------+
                                                                                     |  |
                                                                                     |  |
                                                                                     |  |                                    +--------+--+
                                                                                     +---------------------------------------> Port 2 |  |
                                                                                        |                                    +-^------+--+
                                                                                        |                                      |
                                                                                        |                                      |
                                                                                        |  +---------------------------+       |
                                                                                        +--> PORT 2 multicast table    |       |
                                                                                           +---------------------------+       |
                                                                                           |                           |       |
                                                                                           | FG0:                      |       |
                                                                                           | src_port=dst_port -> drop |       |
                                                                                           | FG1:                      |       |
                                                                                           | VLAN X -> pop, goto port  |       |
                                                                                           | ...                       |       |
                                                                                           |                           |       |
                                                                                           | FG3:                      |       |
                                                                                           | matchall -> goto port     +-------+
                                                                                           |                           |
                                                                                           +---------------------------+
 
 Patches overview:
 
 - Patch 1 adds hardware definition bits for capabilities required to replicate
   multicast packets to multiple per-port tables. These bits are used by
   following patches to only attempt multicast offload if firmware and hardware
   provide necessary support.
 
 - Pathces 2-4 patches are preparations and refactoring.
 
 - Patch 5 implements necessary infrastructure to toggle multicast offload
   via SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED port object attribute notification.
   This also enabled IGMP and MLD snooping.
 
 - Patch 6 implements per-port multicast replication tables. It only supports
   filtering of loopback packets.
 
 - Patch 7 extends per-port multicast tables with VLAN pop support for 'untagged'
   VLANs.
 
 - Patch 8 handles SWITCHDEV_OBJ_ID_PORT_MDB port object notifications. It
   creates MDB replication rules in egress table that can replicate packets to
   multiple per-port multicast tables.
 
 - Patch 9 adds tracepoints for MDB events.
 
 ==============
 
 2) Parav Create a new allocation profile for SFs, to save on memory
 
 3) Yevgeny provides some initial patches for upcoming software steering
    support new pattern/arguments type of modify_header actions.
 
 Starting with ConnectX-6 DX, we use a new design of modify_header FW object.
 The current modify_header object allows for having only limited number of
 these FW objects, which means that we are limited in the number of offloaded
 flows that require modify_header action.
 
 As a preparation Yevgeny provides the following 4 patches:
  - Patch 1: Add required mlx5_ifc HW bits
  - Patch 2, 3: Add new WQE type and opcode that is required for pattern/arg
    support and adds appropriate support in dr_send.c
  - Patch 4: Add ICM pool for modify-header-pattern objects and implement
    patterns cache, allowing patterns reuse for different flows
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmQ2LDIACgkQSD+KveBX
 +j59UAf+NfLEq7i/gYOri1rLwXOrgPd13w64Oob79+pKVXzRqik/gBg4d0CKdc2X
 Qafntudus7IvVrGJCM0vurw7nC75H9BMFh3dQW2u3uaR9m6i0k4tSKIAoJUFHMdj
 dMgAJywYkCCXlBbGVhRE3yybal2EXfOSJ+Fr2tGeqpL4vRlg4kIaVuFGcGk2gMpq
 gOahX6wtuop3ABzY3sqKCIQ1Q2jWx9AWWz+lEmw7HJGmHGugscYrgSIloHDnu1fz
 Bt6uODn7GFr866rQr9+JemG0VyK8ahyisEFnX1oJ3642ZidPcDrbIrrtMWeEqo7X
 1tcmY/wNti87xFbg3gL+DZekhHTHoQ==
 =oIN8
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2023-04-11' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2023-04-11

1) Vlad adds the support for linux bridge multicast offload support
   Patches #1 through #9
   Synopsis

Vlad Says:
==============
Implement support of bridge multicast offload in mlx5. Handle port object
attribute SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED notification to toggle multicast
offload and bridge snooping support on bridge. Handle port object
SWITCHDEV_OBJ_ID_PORT_MDB notification to attach a bridge port to MDB.

Steering architecture

Existing offload infrastructure relies on two levels of flow tables - bridge
ingress and egress. For multicast offload the architecture is extended with
additional layer of per-port multicast replication tables. Such tables filter
loopback traffic (so packets are not replicated to their source port) and pop
VLAN headers for "untagged" VLANs. The tables are referenced by the MDB rules in
egress table. MDB egress rule can point to multiple per-port multicast tables,
which causes matching multicast traffic to be replicated to all of them, and,
consecutively, to several bridge ports:

                                                                                                                            +--------+--+
                                                                                    +---------------------------------------> Port 1 |  |
                                                                                    |                                       +-^------+--+
                                                                                    |                                         |
                                                                                    |                                         |
                                       +-----------------------------------------+  |     +---------------------------+       |
                                       | EGRESS table                            |  |  +--> PORT 1 multicast table    |       |
+----------------------------------+   +-----------------------------------------+  |  |  +---------------------------+       |
| INGRESS table                    |   |                                         |  |  |  |                           |       |
+----------------------------------+   | dst_mac=P1,vlan=X -> pop vlan, goto P1  +--+  |  | FG0:                      |       |
|                                  |   | dst_mac=P1,vlan=Y -> pop vlan, goto P1  |     |  | src_port=dst_port -> drop |       |
| src_mac=M1,vlan=X -> goto egress +---> dst_mac=P2,vlan=X -> pop vlan, goto P2  +--+  |  | FG1:                      |       |
| ...                              |   | dst_mac=P2,vlan=Y -> goto P2            |  |  |  | VLAN X -> pop, goto port  |       |
|                                  |   | dst_mac=MDB1,vlan=Y -> goto mcast P1,P2 +-----+  | ...                       |       |
+----------------------------------+   |                                         |  |  |  | VLAN Y -> pop, goto port  +-------+
                                       +-----------------------------------------+  |  |  | FG3:                      |
                                                                                    |  |  | matchall -> goto port     |
                                                                                    |  |  |                           |
                                                                                    |  |  +---------------------------+
                                                                                    |  |
                                                                                    |  |
                                                                                    |  |                                    +--------+--+
                                                                                    +---------------------------------------> Port 2 |  |
                                                                                       |                                    +-^------+--+
                                                                                       |                                      |
                                                                                       |                                      |
                                                                                       |  +---------------------------+       |
                                                                                       +--> PORT 2 multicast table    |       |
                                                                                          +---------------------------+       |
                                                                                          |                           |       |
                                                                                          | FG0:                      |       |
                                                                                          | src_port=dst_port -> drop |       |
                                                                                          | FG1:                      |       |
                                                                                          | VLAN X -> pop, goto port  |       |
                                                                                          | ...                       |       |
                                                                                          |                           |       |
                                                                                          | FG3:                      |       |
                                                                                          | matchall -> goto port     +-------+
                                                                                          |                           |
                                                                                          +---------------------------+

Patches overview:

- Patch 1 adds hardware definition bits for capabilities required to replicate
  multicast packets to multiple per-port tables. These bits are used by
  following patches to only attempt multicast offload if firmware and hardware
  provide necessary support.

- Pathces 2-4 patches are preparations and refactoring.

- Patch 5 implements necessary infrastructure to toggle multicast offload
  via SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED port object attribute notification.
  This also enabled IGMP and MLD snooping.

- Patch 6 implements per-port multicast replication tables. It only supports
  filtering of loopback packets.

- Patch 7 extends per-port multicast tables with VLAN pop support for 'untagged'
  VLANs.

- Patch 8 handles SWITCHDEV_OBJ_ID_PORT_MDB port object notifications. It
  creates MDB replication rules in egress table that can replicate packets to
  multiple per-port multicast tables.

- Patch 9 adds tracepoints for MDB events.

==============

2) Parav Create a new allocation profile for SFs, to save on memory

3) Yevgeny provides some initial patches for upcoming software steering
   support new pattern/arguments type of modify_header actions.

Starting with ConnectX-6 DX, we use a new design of modify_header FW object.
The current modify_header object allows for having only limited number of
these FW objects, which means that we are limited in the number of offloaded
flows that require modify_header action.

As a preparation Yevgeny provides the following 4 patches:
 - Patch 1: Add required mlx5_ifc HW bits
 - Patch 2, 3: Add new WQE type and opcode that is required for pattern/arg
   support and adds appropriate support in dr_send.c
 - Patch 4: Add ICM pool for modify-header-pattern objects and implement
   patterns cache, allowing patterns reuse for different flows

* tag 'mlx5-updates-2023-04-11' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5: DR, Add modify-header-pattern ICM pool
  net/mlx5: DR, Prepare sending new WQE type
  net/mlx5: Add new WQE for updating flow table
  net/mlx5: Add mlx5_ifc bits for modify header argument
  net/mlx5: DR, Set counter ID on the last STE for STEv1 TX
  net/mlx5: Create a new profile for SFs
  net/mlx5: Bridge, add tracepoints for multicast
  net/mlx5: Bridge, implement mdb offload
  net/mlx5: Bridge, support multicast VLAN pop
  net/mlx5: Bridge, add per-port multicast replication tables
  net/mlx5: Bridge, snoop igmp/mld packets
  net/mlx5: Bridge, extract code to lookup parent bridge of port
  net/mlx5: Bridge, move additional data structures to priv header
  net/mlx5: Bridge, increase bridge tables sizes
  net/mlx5: Add mlx5_ifc definitions for bridge multicast support
====================

Link: https://lore.kernel.org/r/20230412040752.14220-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-13 22:28:03 -07:00
Jakub Kicinski
800e68c44f Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Conflicts:

tools/testing/selftests/net/config
  62199e3f16 ("selftests: net: Add VXLAN MDB test")
  3a0385be13 ("selftests: add the missing CONFIG_IP_SCTP in net config")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-13 16:04:28 -07:00
Jesper Dangaard Brouer
67f245c2ec mlx5: bpf_xdp_metadata_rx_hash add xdp rss hash type
Update API for bpf_xdp_metadata_rx_hash() with arg for xdp rss hash type
via mapping table.

The mlx5 hardware can also identify and RSS hash IPSEC.  This indicate
hash includes SPI (Security Parameters Index) as part of IPSEC hash.

Extend xdp core enum xdp_rss_hash_type with IPSEC hash type.

Fixes: bc8d405b1b ("net/mlx5e: Support RX XDP metadata")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/168132892548.340624.11185734579430124869.stgit@firesoul
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-13 11:15:10 -07:00
Yevgeny Kliteynik
977c4a3e7c net/mlx5: Add new WQE for updating flow table
Add new WQE type: FLOW_TBL_ACCESS, which will be used for
writing modify header arguments.
This type has specific control segment and special data segment.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-04-11 20:57:37 -07:00
Yevgeny Kliteynik
9fa7f1de3d net/mlx5: Add mlx5_ifc bits for modify header argument
Add enum value for modify-header argument object and mlx5_bits
for the related capabilities.

Signed-off-by: Muhammad Sammar <muhammads@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-04-11 20:57:37 -07:00
Parav Pandit
9df839a711 net/mlx5: Create a new profile for SFs
Create a new profile for SFs in order to disable the command cache.
Each function command cache consumes ~500KB of memory, when using a
large number of SFs this savings is notable on memory constarined
systems.

Use a new profile to provide for future differences between SFs and PFs.

The mr_cache not used for non-PF functions, so it is excluded from the
new profile.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Bodong Wang <bodong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-04-11 20:57:37 -07:00
Vlad Buslov
e5688f6fb9 net/mlx5: Add mlx5_ifc definitions for bridge multicast support
Add the required hardware definitions to mlx5_ifc: fdb_uplink_hairpin,
fdb_multi_path_any_table_limit_regc, fdb_multi_path_any_table.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-04-11 20:57:35 -07:00
Patrisious Haddad
77f7eb9f34 net/mlx5: Introduce other vport query for Q-counters
These new fields in QUERY_Q_COUNTER command allow us to access
another vport counters during the query command, which is specially
useful to query representor vports.

In addition also add the required caps to check if this capability
is actually supported.

Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Link: https://lore.kernel.org/r/75c73a4a0e60f18c37b35a4a11ca2e2415e4a6f3.1679566038.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-29 10:08:40 +03:00
Jakub Kicinski
de7494524d mlx5-updates-2023-03-20
mlx5 dynamic msix
 
 This patch series adds support for dynamic msix vectors allocation in mlx5.
 
 Eli Cohen Says:
 
 ================
 
 The following series of patches modifies mlx5_core to work with the
 dynamic MSIX API. Currently, mlx5_core allocates all the interrupt
 vectors it needs and distributes them amongst the consumers. With the
 introduction of dynamic MSIX support, which allows for allocation of
 interrupts more than once, we now allocate vectors as we need them.
 This allows other drivers running on top of mlx5_core to allocate
 interrupt vectors for their own use. An example for this is mlx5_vdpa,
 which uses these vectors to propagate interrupts directly from the
 hardware to the vCPU [1].
 
 As a preparation for using this series, a use after free issue is fixed
 in lib/cpu_rmap.c and the allocator for rmap entries has been modified.
 A complementary API for irq_cpu_rmap_add() has also been introduced.
 
 [1] https://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git/patch/?id=0f2bf1fcae96a83b8c5581854713c9fc3407556e
 
 ================
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmQeTIUACgkQSD+KveBX
 +j7oCQgAx9yNHM4BZD2UfIx/P+W13v1B+xOds04Vezl9JlakoqvviPxm3vvuKkl+
 j/8DdyoqMUbWV0j5XxgZ+GG91bc14jN1GQ+4fUf63SzA99vAGb9GJPV2aQt5roGh
 JmMqI2utDfoz+29qtQ+kVchY5AN5AoPXSQH2zkEZmJaPUjYb9Dr/4IayL0JaViAw
 S31QLHKkSJ8bL8Wc6Op1emNVV7eXs18f7IIjVs3sYOb3WJRPVpmdKneRqLgVYplf
 Td40Gwobl1elpjEqSSRTJI5YUSR8gcAJlBqIwHeJzFFpO3Pnciopl761osNKKs/a
 5ctES5DS6JHqqFGbWV1gKYcRMil3LA==
 =9i8l
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2023-03-20' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2023-03-20

mlx5 dynamic msix

This patch series adds support for dynamic msix vectors allocation in mlx5.

Eli Cohen Says:

================

The following series of patches modifies mlx5_core to work with the
dynamic MSIX API. Currently, mlx5_core allocates all the interrupt
vectors it needs and distributes them amongst the consumers. With the
introduction of dynamic MSIX support, which allows for allocation of
interrupts more than once, we now allocate vectors as we need them.
This allows other drivers running on top of mlx5_core to allocate
interrupt vectors for their own use. An example for this is mlx5_vdpa,
which uses these vectors to propagate interrupts directly from the
hardware to the vCPU [1].

As a preparation for using this series, a use after free issue is fixed
in lib/cpu_rmap.c and the allocator for rmap entries has been modified.
A complementary API for irq_cpu_rmap_add() has also been introduced.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git/patch/?id=0f2bf1fcae96a83b8c5581854713c9fc3407556e

================

* tag 'mlx5-updates-2023-03-20' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5: Provide external API for allocating vectors
  net/mlx5: Use one completion vector if eth is disabled
  net/mlx5: Refactor calculation of required completion vectors
  net/mlx5: Move devlink registration before mlx5_load
  net/mlx5: Use dynamic msix vectors allocation
  net/mlx5: Refactor completion irq request/release code
  net/mlx5: Improve naming of pci function vectors
  net/mlx5: Use newer affinity descriptor
  net/mlx5: Modify struct mlx5_irq to use struct msi_map
  net/mlx5: Fix wrong comment
  net/mlx5e: Coding style fix, add empty line
  lib: cpu_rmap: Add irq_cpu_rmap_remove to complement irq_cpu_rmap_add
  lib: cpu_rmap: Use allocator for rmap entries
  lib: cpu_rmap: Avoid use after free on rmap->obj array entries
====================

Link: https://lore.kernel.org/r/20230324231341.29808-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-28 23:52:12 -07:00
Eli Cohen
fb0a6a268d net/mlx5: Provide external API for allocating vectors
Provide external API to be used by other drivers relying on mlx5_core,
for allocating MSIX vectors. An example for such a driver would be
mlx5_vdpa.

Signed-off-by: Eli Cohen <elic@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
2023-03-24 16:04:30 -07:00
Or Har-Toov
6e2a3a324a net/mlx5: Expose bits for enabling out-of-order by default
Add needed HW bits for enabling out-of-order by default and
use go_back_n when out-of-order is not needed.

Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/75d6dfe263989a05c08c43406132b336ea12d00a.1679230449.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-23 10:21:13 +02:00
Gavin Li
6ee44c5181 net/mlx5e: TC, Add support for VxLAN GBP encap/decap flows offload
Add HW offloading support for TC flows with VxLAN GBP encap/decap.

Example of encap rule:
tc filter add dev eth0 protocol ip ingress flower \
    action tunnel_key set id 42 vxlan_opts 512 \
    action mirred egress redirect dev vxlan1

Example of decap rule:
tc filter add dev vxlan1 protocol ip ingress flower \
    enc_key_id 42 enc_dst_port 4789 vxlan_opts 1024 \
    action tunnel_key unset action mirred egress redirect dev eth0

Signed-off-by: Gavin Li <gavinl@nvidia.com>
Reviewed-by: Gavi Teitz <gavi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Acked-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-17 22:41:16 -07:00
Gal Pressman
028522e284 net/mlx5: Move needed PTYS functions to core layer
Downstream patches require devlink params to access the PTYS register,
move the needed functions from mlx5e to the core layer.

Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20230314054234.267365-11-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-15 22:12:08 -07:00
Sandipan Patra
c1fef618d6 net/mlx5: Implement thermal zone
Implement thermal zone support for mlx5 based HW. The NIC
uses temperature sensor provided by ASIC to report current temperature
to thermal core.

Signed-off-by: Sandipan Patra <spatra@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20230314054234.267365-5-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-15 22:09:14 -07:00
Linus Torvalds
8cbd92339d v6.3 RDMA pull request
Small cycle this time:
 
 - Minor driver updates for hfi1, cxgb4, erdma, hns, irdma, mlx5, siw, mana
 
 - inline CQE support for hns
 
 - Have mlx5 display device error codes
 
 - Pinned DMABUF support for irdma
 
 - Continued rxe cleanups, particularly converting the MRs to use xarray
 
 - Improvements to what can be cached in the mlx5 mkey cache
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCY/gPmgAKCRCFwuHvBreF
 YW5IAP4xOAiTif4f87vD1twRU/ebq4VEX0r+C2NX5x5fwlCJrAEA7RLV8uG9Uii2
 ez0BuWNxfajuvFHntnZ1E+7UDP0S8gk=
 =CgUH
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "Quite a small cycle this time, even with the rc8. I suppose everyone
  went to sleep over xmas.

   - Minor driver updates for hfi1, cxgb4, erdma, hns, irdma, mlx5, siw,
     mana

   - inline CQE support for hns

   - Have mlx5 display device error codes

   - Pinned DMABUF support for irdma

   - Continued rxe cleanups, particularly converting the MRs to use
     xarray

   - Improvements to what can be cached in the mlx5 mkey cache"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (61 commits)
  IB/mlx5: Extend debug control for CC parameters
  IB/hfi1: Fix sdma.h tx->num_descs off-by-one errors
  IB/hfi1: Fix math bugs in hfi1_can_pin_pages()
  RDMA/irdma: Add support for dmabuf pin memory regions
  RDMA/mlx5: Use query_special_contexts for mkeys
  net/mlx5e: Use query_special_contexts for mkeys
  net/mlx5: Change define name for 0x100 lkey value
  net/mlx5: Expose bits for querying special mkeys
  RDMA/rxe: Fix missing memory barriers in rxe_queue.h
  RDMA/mana_ib: Fix a bug when the PF indicates more entries for registering memory on first packet
  RDMA/rxe: Remove rxe_alloc()
  RDMA/cma: Distinguish between sockaddr_in and sockaddr_in6 by size
  Subject: RDMA/rxe: Handle zero length rdma
  iw_cxgb4: Fix potential NULL dereference in c4iw_fill_res_cm_id_entry()
  RDMA/mlx5: Use rdma_umem_for_each_dma_block()
  RDMA/umem: Remove unused 'work' member from struct ib_umem
  RDMA/irdma: Cap MSIX used to online CPUs + 1
  RDMA/mlx5: Check reg_create() create for errors
  RDMA/restrack: Correct spelling
  RDMA/cxgb4: Fix potential null-ptr-deref in pass_establish()
  ...
2023-02-24 15:11:03 -08:00
Edward Srouji
66fb1d5df6 IB/mlx5: Extend debug control for CC parameters
This patch adds rtt_resp_dscp to the current debug controllability of
congestion control (CC) parameters.
rtt_resp_dscp can be read or written through debugfs.
If set, its value overwrites the DSCP of the generated RTT response.

Signed-off-by: Edward Srouji <edwards@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Link: https://lore.kernel.org/r/1dcc3440ee53c688f19f579a051ded81a2aaa70a.1676538714.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-02-19 11:50:59 +02:00
Jason Gunthorpe
5ef17179da Merge mlx5-next into rdma.git for-next
Synchronize the shared mlx5 branch with net:
 - From Jiri: fixe a deadlock in mlx5_ib's netdev notifier unregister.
 - From Mark and Patrisious: add IPsec RoCEv2 support.
 - From Or: Rely on firmware to get special mkeys

* branch mlx5-next:
  RDMA/mlx5: Use query_special_contexts for mkeys
  net/mlx5e: Use query_special_contexts for mkeys
  net/mlx5: Change define name for 0x100 lkey value
  net/mlx5: Expose bits for querying special mkeys
  net/mlx5: Configure IPsec steering for egress RoCEv2 traffic
  net/mlx5: Configure IPsec steering for ingress RoCEv2 traffic
  net/mlx5: Add IPSec priorities in RDMA namespaces
  net/mlx5: Implement new destination type TABLE_TYPE
  net/mlx5: Introduce new destination type TABLE_TYPE
  RDMA/mlx5: Track netdev to avoid deadlock during netdev notifier unregister
  net/mlx5e: Propagate an internal event in case uplink netdev changes
  net/mlx5e: Fix trap event handling

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-02-17 16:24:14 -04:00
Or Har-Toov
a419bfb763 net/mlx5: Change define name for 0x100 lkey value
Change define of 0x100 lkey value from MLX5_INVALID_LKEY to be
MLX5_TERMINATE_SCATTER_LIST_LKEY as 0x100 is the value of
terminate_scatter_list_mkey.

Link: https://lore.kernel.org/r/3a116dc3fbae4cb6b76a63d27d418830b06ade0c.1673960981.git.leon@kernel.org
Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-02-17 16:22:22 -04:00
Or Har-Toov
4b7296aa6c net/mlx5: Expose bits for querying special mkeys
Add needed HW bits to query the values of all special mkeys.

Link: https://lore.kernel.org/r/080ebb563a9717c15b1ea75d669aede676df386b.1673960981.git.leon@kernel.org
Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-02-17 16:22:22 -04:00
Jakub Kicinski
84cb1b53cd Merge branch 'mlx5-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Leon Romanovsky says:

====================
mlx5-next changes

Following previous conversations [1] and our clear commitment to do
the TC work [2], please pull mlx5-next shared branch, which includes
low-level steering logic to allow RoCEv2 traffic to be encrypted/
decrypted through IPsec.

[1] https://lore.kernel.org/all/20230126230815.224239-1-saeed@kernel.org/
[2] https://lore.kernel.org/all/Y+Z7lVVWqnRBiPh2@nvidia.com/

* 'mlx5-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  net/mlx5: Configure IPsec steering for egress RoCEv2 traffic
  net/mlx5: Configure IPsec steering for ingress RoCEv2 traffic
  net/mlx5: Add IPSec priorities in RDMA namespaces
  net/mlx5: Implement new destination type TABLE_TYPE
  net/mlx5: Introduce new destination type TABLE_TYPE
====================

Link: https://lore.kernel.org/r/20230215095624.1365200-1-leon@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-16 11:36:14 -08:00
Mark Zhang
f91ddd3aa4 net/mlx5: Add IPSec priorities in RDMA namespaces
Add IPSec flow steering priorities in RDMA namespaces. This allows
adding tables/rules to forward RoCEv2 traffic to the IPSec crypto
tables in NIC_TX domain, and accept RoCEv2 traffic from NIC_RX domain.

Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-02-15 11:29:59 +02:00
Mark Zhang
4f226b71f5 net/mlx5: Implement new destination type TABLE_TYPE
Implement new destination type to support flow transition between
different table types.
e.g. from NIC_RX to RDMA_RX or from RDMA_TX to NIC_TX.
The new destination is described in the tracepoint as follows:
"mlx5_fs_add_rule: rule=00000000d53cd0ed fte=0000000048a8a6ed index=0 sw_action=<> [dst] flow_table_type=7 id:262152"

Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-02-15 11:29:52 +02:00