Commit Graph

995 Commits

Author SHA1 Message Date
Gustavo A. R. Silva
290562075d net/mlx5: Replace zero-length array with flexible-array member
There is a regular need in the kernel to provide a way to declare having a
dynamically sized set of trailing elements in a structure. Kernel code should
always use “flexible array members”[1] for these cases. The older style of
one-element or zero-length arrays should no longer be used[2].

[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://www.kernel.org/doc/html/v5.9/process/deprecated.html#zero-length-and-one-element-arrays

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-10-30 16:57:42 -05:00
Parav Pandit
fbdd0049d9 RDMA/mlx5: Fix devlink deadlock on net namespace deletion
When a mlx5 core devlink instance is reloaded in different net namespace,
its associated IB device is deleted and recreated.

Example sequence is:
$ ip netns add foo
$ devlink dev reload pci/0000:00:08.0 netns foo
$ ip netns del foo

mlx5 IB device needs to attach and detach the netdevice to it through the
netdev notifier chain during load and unload sequence.  A below call graph
of the unload flow.

cleanup_net()
   down_read(&pernet_ops_rwsem); <- first sem acquired
     ops_pre_exit_list()
       pre_exit()
         devlink_pernet_pre_exit()
           devlink_reload()
             mlx5_devlink_reload_down()
               mlx5_unload_one()
               [...]
                 mlx5_ib_remove()
                   mlx5_ib_unbind_slave_port()
                     mlx5_remove_netdev_notifier()
                       unregister_netdevice_notifier()
                         down_write(&pernet_ops_rwsem);<- recurrsive lock

Hence, when net namespace is deleted, mlx5 reload results in deadlock.

When deadlock occurs, devlink mutex is also held. This not only deadlocks
the mlx5 device under reload, but all the processes which attempt to
access unrelated devlink devices are deadlocked.

Hence, fix this by mlx5 ib driver to register for per net netdev notifier
instead of global one, which operats on the net namespace without holding
the pernet_ops_rwsem.

Fixes: 4383cfcc65 ("net/mlx5: Add devlink reload")
Link: https://lore.kernel.org/r/20201026134359.23150-1-parav@nvidia.com
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:18:19 -03:00
Linus Torvalds
a1e16bc7d5 RDMA 5.10 pull request
The typical set of driver updates across the subsystem:
 
  - Driver minor changes and bug fixes for mlx5, efa, rxe, vmw_pvrdma, hns,
    usnic, qib, qedr, cxgb4, hns, bnxt_re
 
  - Various rtrs fixes and updates
 
  - Bug fix for mlx4 CM emulation for virtualization scenarios where MRA
    wasn't working right
 
  - Use tracepoints instead of pr_debug in the CM code
 
  - Scrub the locking in ucma and cma to close more syzkaller bugs
 
  - Use tasklet_setup in the subsystem
 
  - Revert the idea that 'destroy' operations are not allowed to fail at
    the driver level. This proved unworkable from a HW perspective.
 
  - Revise how the umem API works so drivers make fewer mistakes using it
 
  - XRC support for qedr
 
  - Convert uverbs objects RWQ and MW to new the allocation scheme
 
  - Large queue entry sizes for hns
 
  - Use hmm_range_fault() for mlx5 On Demand Paging
 
  - uverbs APIs to inspect the GID table instead of sysfs
 
  - Move some of the RDMA code for building large page SGLs into
    lib/scatterlist
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl+J37MACgkQOG33FX4g
 mxrKfRAAnIecwdE8df0yvVU5k0Eg6qVjMy9MMHq4va9m7g6GpUcNNI0nIlOASxH2
 l+9vnUQS3ebgsPeECaDYzEr0hh/u53+xw2g4WV5ts/hE8KkQ6erruXb9kasCe8yi
 5QWJ9K36T3c03Cd3EeH6JVtytAxuH42ombfo9BkFLPVyfG/R2tsAzvm5pVi73lxk
 46wtU1Bqi4tsLhyCbifn1huNFGbHp08OIBPAIKPUKCA+iBRPaWS+Dpi+93h3g3Bp
 oJwDhL9CBCGcHM+rKWLzek3Dy87FnQn7R1wmTpUFwkK+4AH3U/XazivhX035w1vL
 YJyhakVU0kosHlX9hJTNKDHJGkt0YEV2mS8dxAuqilFBtdnrVszb5/MirvlzC310
 /b5xCPSEusv9UVZV0G4zbySVNA9knZ4YaRiR3VDVMLKl/pJgTOwEiHIIx+vs3ejk
 p8GRWa1SjXw5LfZEQcq39J689ljt6xjCTonyuBSv7vSQq5v8pjBxvHxiAe2FIa2a
 ZyZeSCYoSh0SwJQukO2VO7aprhHP3TcCJ/987+X03LQ8tV2VWPktHqm62YCaDcOl
 fgiQuQdPivRjDDkJgMfDWDGKfZeHoWLKl5XsJhWByt0lablVrsvc+8ylUl1UI7gI
 16hWB/Qtlhfwg10VdApn+aOFpIS+s5P4XIp8ik57MZO+VeJzpmE=
 =LKpl
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "A usual cycle for RDMA with a typical mix of driver and core subsystem
  updates:

   - Driver minor changes and bug fixes for mlx5, efa, rxe, vmw_pvrdma,
     hns, usnic, qib, qedr, cxgb4, hns, bnxt_re

   - Various rtrs fixes and updates

   - Bug fix for mlx4 CM emulation for virtualization scenarios where
     MRA wasn't working right

   - Use tracepoints instead of pr_debug in the CM code

   - Scrub the locking in ucma and cma to close more syzkaller bugs

   - Use tasklet_setup in the subsystem

   - Revert the idea that 'destroy' operations are not allowed to fail
     at the driver level. This proved unworkable from a HW perspective.

   - Revise how the umem API works so drivers make fewer mistakes using
     it

   - XRC support for qedr

   - Convert uverbs objects RWQ and MW to new the allocation scheme

   - Large queue entry sizes for hns

   - Use hmm_range_fault() for mlx5 On Demand Paging

   - uverbs APIs to inspect the GID table instead of sysfs

   - Move some of the RDMA code for building large page SGLs into
     lib/scatterlist"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (191 commits)
  RDMA/ucma: Fix use after free in destroy id flow
  RDMA/rxe: Handle skb_clone() failure in rxe_recv.c
  RDMA/rxe: Move the definitions for rxe_av.network_type to uAPI
  RDMA: Explicitly pass in the dma_device to ib_register_device
  lib/scatterlist: Do not limit max_segment to PAGE_ALIGNED values
  IB/mlx4: Convert rej_tmout radix-tree to XArray
  RDMA/rxe: Fix bug rejecting all multicast packets
  RDMA/rxe: Fix skb lifetime in rxe_rcv_mcast_pkt()
  RDMA/rxe: Remove duplicate entries in struct rxe_mr
  IB/hfi,rdmavt,qib,opa_vnic: Update MAINTAINERS
  IB/rdmavt: Fix sizeof mismatch
  MAINTAINERS: CISCO VIC LOW LATENCY NIC DRIVER
  RDMA/bnxt_re: Fix sizeof mismatch for allocation of pbl_tbl.
  RDMA/bnxt_re: Use rdma_umem_for_each_dma_block()
  RDMA/umem: Move to allocate SG table from pages
  lib/scatterlist: Add support in dynamic allocation of SG table from pages
  tools/testing/scatterlist: Show errors in human readable form
  tools/testing/scatterlist: Rejuvenate bit-rotten test
  RDMA/ipoib: Set rtnl_link_ops for ipoib interfaces
  RDMA/uverbs: Expose the new GID query API to user space
  ...
2020-10-17 11:18:18 -07:00
Huy Nguyen
9b9d454ddb net/mlx5e: IPsec: Add TX steering rule per IPsec state
Add new FTE in TX IPsec FT per IPsec state. It has the
same matching criteria as the RX steering rule.

The IPsec FT is created/destroyed when the first/last rule
is added/deleted respectively.

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-10-12 15:37:45 -07:00
Huy Nguyen
ee92e4f1f9 net/mlx5: Add NIC TX domain namespace
Add new namespace that represents the NIC TX domain.

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-10-12 15:37:44 -07:00
Moshe Shemesh
2d69356752 net/mlx5: Add support for fw live patch event
Firmware live patch event notifies the driver that the firmware was just
updated using live patch. In such case the driver should not reload or
re-initiate entities, part to updating the firmware version and
re-initiate the firmware tracer which can be updated by live patch with
new strings database to help debugging an issue.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-09 12:06:53 -07:00
Moshe Shemesh
38b9f903f2 net/mlx5: Handle sync reset request event
Once the driver gets sync_reset_request from firmware it prepares for the
coming reset and sends acknowledge.
After getting this event the driver expects device reset, either it will
trigger PCI reset on sync_reset_now event or such PCI reset will be
triggered by another PF of the same device. So it moves to reset
requested mode and if it gets PCI reset triggered by the other PF it
detect the reset and reloads.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-09 12:06:52 -07:00
David S. Miller
8b0308fe31 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Rejecting non-native endian BTF overlapped with the addition
of support for it.

The rest were more simple overlapping changes, except the
renesas ravb binding update, which had to follow a file
move as well as a YAML conversion.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-05 18:40:01 -07:00
Saeed Mahameed
b898ce7bcc net/mlx5: cmdif, Avoid skipping reclaim pages if FW is not accessible
In case of pci is offline reclaim_pages_cmd() will still try to call
the FW to release FW pages, cmd_exec() in this case will return a silent
success without actually calling the FW.

This is wrong and will cause page leaks, what we should do is to detect
pci offline or command interface un-available before tying to access the
FW and manually release the FW pages in the driver.

In this patch we share the code to check for FW command interface
availability and we call it in sensitive places e.g. reclaim_pages_cmd().

Alternative fix:
 1. Remove MLX5_CMD_OP_MANAGE_PAGES form mlx5_internal_err_ret_value,
    command success simulation list.
 2. Always Release FW pages even if cmd_exec fails in reclaim_pages_cmd().

Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-10-02 10:59:55 -07:00
Eran Ben Elisha
50b2412b7e net/mlx5: Avoid possible free of command entry while timeout comp handler
Upon command completion timeout, driver simulates a forced command
completion. In a rare case where real interrupt for that command arrives
simultaneously, it might release the command entry while the forced
handler might still access it.

Fix that by adding an entry refcount, to track current amount of allowed
handlers. Command entry to be released only when this refcount is
decremented to zero.

Command refcount is always initialized to one. For callback commands,
command completion handler is the symmetric flow to decrement it. For
non-callback commands, it is wait_func().

Before ringing the doorbell, increment the refcount for the real completion
handler. Once the real completion handler is called, it will decrement it.

For callback commands, once the delayed work is scheduled, increment the
refcount. Upon callback command completion handler, we will try to cancel
the timeout callback. In case of success, we need to decrement the callback
refcount as it will never run.

In addition, gather the entry index free and the entry free into a one
flow for all command types release.

Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-10-02 10:59:54 -07:00
sunils
7cd7becddd net/mlx5: E-switch, Use PF num in metadata reg c0
Currently only 256 vports can be supported as only 8 bits are
reserved for them and 8 bits are reserved for vhca_ids in
metadata reg c0. To support more than 256 vports, replace
vhca_id with a unique shorter 4-bit PF number which covers
upto 16 PF's. Use remaining 12 bits for vports ranging 1-4095.
This will continue to generate unique metadata even if
multiple PCI devices have same switch_id.

Signed-off-by: sunils <sunils@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-09-30 21:26:28 -07:00
Jason Gunthorpe
a1255fff5d Merge branch 'mlx_sw_owner_v2' into rdma.git for-next
Leon Romanovsky says:

====================
This series from Alex extends software steering interface to support
devices with extra capability "sw_owner_2" which will replace existing
"sw_owner".
====================

Based on the mlx5-next branch at
     git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
due to dependencies.

* branch 'mlx5_sw_owner_v2:
  RDMA/mlx5: Expose TIR and QP ICM address for sw_owner_v2 devices
  RDMA/mlx5: Allow DM allocation for sw_owner_v2 enabled devices
  RDMA/mlx5: Add sw_owner_v2 bit capability
2020-09-18 10:31:45 -03:00
Jason Gunthorpe
5dee5872f8 Merge branch 'mlx5_active_speed' into rdma.git for-next
Leon Romanovsky says:

====================
IBTA declares speed as 16 bits, but kernel stores it in u8. This series
fixes in-kernel declaration while keeping external interface intact.
====================

Based on the mlx5-next branch at
     git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
due to dependencies.

* branch 'mlx5_active_speed':
  RDMA: Fix link active_speed size
  RDMA/mlx5: Delete duplicated mlx5_ptys_width enum
  net/mlx5: Refactor query port speed functions
2020-09-18 10:31:45 -03:00
Alex Vesker
9d8feb460a RDMA/mlx5: Add sw_owner_v2 bit capability
Added sw_owner_v2 which will be enabled for future devices,
replacing sw_owner bit.

Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2020-09-17 21:12:28 +03:00
Aharon Landau
e27014bdb4 RDMA/mlx5: Delete duplicated mlx5_ptys_width enum
Combine two same enums to avoid duplication.

Signed-off-by: Aharon Landau <aharonl@mellanox.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2020-09-17 19:33:03 +03:00
Aharon Landau
639bf4415c net/mlx5: Refactor query port speed functions
The functions mlx5_query_port_link_width_oper and
mlx5_query_port_ib_proto_oper are always called together, so combine them
to a new function called mlx5_query_port_oper to avoid duplication.

And while the mlx5i_get_port_settings is the same as
mlx5_query_port_oper therefore let's remove it.

According to the IB spec link_width_oper and ib_proto_oper should be u16
and not as written u8, so perform casting as a preparation to cross-RDMA
patch which will fix that type for all drivers in the RDMA subsystem.

Fixes: ada68c31ba ("net/mlx5: Introduce a new header file for physical port functions")
Signed-off-by: Aharon Landau <aharonl@mellanox.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2020-09-17 19:33:02 +03:00
Ofer Levi
b7cf0806e8 net/mlx5e: Add CQE compression support for multi-strides packets
Add CQE compression support for completions of packets that span
multiple strides in a Striding RQ, per the HW capability.
In our memory model, we use small strides (256B as of today) for the
non-linear SKB mode. This feature allows CQE compression to work also
for multiple strides packets. In this case decompressing the mini CQE
array will use stride index provided by HW as part of the mini CQE.
Before this feature, compression was possible only for single-strided
packets, i.e. for packets of size up to 256 bytes when in non-linear
mode, and the index was maintained by SW.
This feature is supported for ConnectX-5 and above.

Feature performance test:
This was whitebox-tested, we reduced the PCI speed from 125Gb/s to
62.5Gb/s to overload pci and manipulated mlx5 driver to drop incoming
packets before building the SKB to achieve low cpu utilization.
Outcome is low cpu utilization and bottleneck on pci only.
Test setup:
Server: Intel(R) Xeon(R) Silver 4108 CPU @ 1.80GHz server, 32 cores
NIC: ConnectX-6 DX.
Sender side generates 300 byte packets at full pci bandwidth.
Receiver side configuration:
Single channel, one cpu processing with one ring allocated. Cpu utilization
is ~20% while pci bandwidth is fully utilized.
For the generated traffic and interface MTU of 4500B (to activate the
non-linear SKB mode), packet rate improvement is about 19% from ~17.6Mpps
to ~21Mpps.
Without this feature, counters show no CQE compression blocks for
this setup, while with the feature, counters show ~20.7Mpps compressed CQEs
in ~500K compression blocks.

Signed-off-by: Ofer Levi <oferle@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-09-15 11:59:53 -07:00
Eran Ben Elisha
fb609b5112 net/mlx5: Always use container_of to find mdev pointer from clock struct
Clock struct is part of struct mlx5_core_dev. Code was inconsistent, on
some cases used container_of and on another used clock->mdev.

Align code to use container_of amd remove clock->mdev pointer.
While here, fix reverse xmas tree coding style.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
2020-09-15 11:59:49 -07:00
Mark Zhang
7c4b1ab9f1 IB/mlx5: Add DCT RoCE LAG support
When DCT QPs work in RoCE LAG mode:
 1. DCT creation is allowed only when it is supported
 2. The "port" of a DCT QP is assigned in a round-robin way

Link: https://lore.kernel.org/r/20200818115245.700581-3-leon@kernel.org
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 08:34:28 -03:00
Linus Torvalds
d7806bbd22 RDMA 5.9 merge window pull request
Smaller set of RDMA updates. A smaller number of 'big topics' with the
 majority of changes being driver updates.
 
 - Driver updates for hfi1, rxe, mlx5, hns, qedr, usnic, bnxt_re
 
 - Removal of dead or redundant code across the drivers
 
 - RAW resource tracker dumps to include a device specific data blob for
   device objects to aide device debugging
 
 - Further advance the IOCTL interface, remove the ability to turn it off.
   Add QUERY_CONTEXT, QUERY_MR, and QUERY_PD commands
 
 - Remove stubs related to devices with no pkey table
 
 - A shared CQ scheme to allow multiple ULPs to share the CQ rings of a
   device to give higher performance
 
 - Several more static checker, syzkaller and rare crashers fixed
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl8sSA0ACgkQOG33FX4g
 mxpp1w/8Df/KIB38PVHpKraIW10bX03KsXwoskMYCA+ITYWM5ce+P7YF+yXXGs69
 Vh2vUYHlr1RvqXQkq3Y3LjzCPKTYFuNFVQRZF1LrfbfOpSS9aoQqoxwgKs08dibm
 YDeRwueWneksWhXeEZLA0QoKd4kEWrScA/n7VGYQ4YcWw8FLKa9t6OMSGivCrFLu
 QA+sA9nytrvMWC5uJUCdeVwlRnoaICPYHmM5yafOykPyEciRw2jU1kzTRVy5Z0Hu
 iCsXm2lJPcVoMgSjW6SgktY3oBkQeSu3ZZesT3eTM6FJsoDYkuSiKjNmWSZjW1zv
 x6CFGjVVin41rN4FMTeqqnwYoML9Q/obbyHvBHs5MTd5J8tLDhesQj3Ev7CUaUed
 b0s38v+oEL1w22nkOChfeyfh7eLcy3yiszqvkIU9ABk8mF0p1guGQYsfguzbsq0K
 3ZRw/361SxCUBvU6P8CdQbIJlhkH+Un7d81qyt+rhLgaZYm/N+d8auIKUxP1jCxh
 q9hss2Cj2U9eZsA/wGNqV1LNazfEAAj/5qjItMirbRd90FL8h+AP2LfJfC7p+id3
 3BfOui0JbZqNTTl4ftTxPuxtWDEdTPgwi7JvQd/be9HRlSV8DYCSMUzYFn8A+Zya
 cbxjxFuBJWmF+y9csDIVBTdFi+j9hO6notw+G89NznuB3QlPl50=
 =0z2L
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "A quiet cycle after the larger 5.8 effort. Substantially cleanup and
  driver work with a few smaller features this time.

   - Driver updates for hfi1, rxe, mlx5, hns, qedr, usnic, bnxt_re

   - Removal of dead or redundant code across the drivers

   - RAW resource tracker dumps to include a device specific data blob
     for device objects to aide device debugging

   - Further advance the IOCTL interface, remove the ability to turn it
     off. Add QUERY_CONTEXT, QUERY_MR, and QUERY_PD commands

   - Remove stubs related to devices with no pkey table

   - A shared CQ scheme to allow multiple ULPs to share the CQ rings of
     a device to give higher performance

   - Several more static checker, syzkaller and rare crashers fixed"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (121 commits)
  RDMA/mlx5: Fix flow destination setting for RDMA TX flow table
  RDMA/rxe: Remove pkey table
  RDMA/umem: Add a schedule point in ib_umem_get()
  RDMA/hns: Fix the unneeded process when getting a general type of CQE error
  RDMA/hns: Fix error during modify qp RTS2RTS
  RDMA/hns: Delete unnecessary memset when allocating VF resource
  RDMA/hns: Remove redundant parameters in set_rc_wqe()
  RDMA/hns: Remove support for HIP08_A
  RDMA/hns: Refactor hns_roce_v2_set_hem()
  RDMA/hns: Remove redundant hardware opcode definitions
  RDMA/netlink: Remove CAP_NET_RAW check when dump a raw QP
  RDMA/include: Replace license text with SPDX tags
  RDMA/rtrs: remove WQ_MEM_RECLAIM for rtrs_wq
  RDMA/rtrs-clt: add an additional random 8 seconds before reconnecting
  RDMA/cma: Execute rdma_cm destruction from a handler properly
  RDMA/cma: Remove unneeded locking for req paths
  RDMA/cma: Using the standard locking pattern when delivering the removal event
  RDMA/cma: Simplify DEVICE_REMOVAL for internal_id
  RDMA/efa: Add EFA 0xefa1 PCI ID
  RDMA/efa: User/kernel compatibility handshake mechanism
  ...
2020-08-06 16:43:36 -07:00
David S. Miller
bd0b33b248 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Resolved kernel/bpf/btf.c using instructions from merge commit
69138b34a7

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-02 01:02:12 -07:00
Ron Diskin
7d0314b11c net/mlx5e: Modify uplink state on interface up/down
When setting the PF interface up/down, notify the firmware to update
uplink state via MODIFY_VPORT_STATE, when E-Switch is enabled.

This behavior will prevent sending traffic out on uplink port when PF is
down, such as sending traffic from a VF interface which is still up.
Currently when calling mlx5e_open/close(), the driver only sends PAOS
command to notify the firmware to set the physical port state to
up/down, however, it is not sufficient. When VF is in "auto" state, it
follows the uplink state, which was not updated on mlx5e_open/close()
before this patch.

When switchdev mode is enabled and uplink representor is first enabled,
set the uplink port state value back to its FW default "AUTO".

Fixes: 63bfd399de ("net/mlx5e: Send PAOS command on interface up/down")
Signed-off-by: Ron Diskin <rondi@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-07-28 12:55:51 -07:00
Eran Ben Elisha
d6945242f4 net/mlx5: Hold pages RB tree per VF
Per page request event, FW request to allocated or release pages for a
single function. Driver maintains FW pages object per function, so there
is no need to hold one global page data-base. Instead, have a page
data-base per function, which will improve performance release flow in all
cases, especially for "release all pages".

As the range of function IDs is large and not sequential, use xarray to
store a per function ID page data-base, where the function ID is the key.

Upon first allocation of a page to a function ID, create the page
data-base per function. This data-base will be released only at pagealloc
mechanism cleanup.

NIC: ConnectX-4 Lx
CPU: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
Test case: 32 VFs, measure release pages on one VF as part of FLR
Before: 0.021 Sec
After:  0.014 Sec

The improvement depends on amount of VFs and memory utilization
by them. Time measurements above were taken from idle system.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-07-28 02:37:26 -07:00
Meir Lichtinger
896ec97353 RDMA/mlx5: Set mkey relaxed ordering by UMR with ConnectX-7
Up to ConnectX-7 UMR is not used when user passes relaxed ordering access
flag. ConnectX-7 supports setting relaxed ordering read/write mkey
attribute by UMR, indicated by new HCA capabilities.

With ConnectX-7 driver uses UMR when user set relaxed ordering access
flag, in contrast to previous silicon models. Specifically it includes
setting relvant flags of mkey context mask in UMR control segment, and
relaxed ordering write and read flags in UMR mkey context segment.

Link: https://lore.kernel.org/r/20200716105248.1423452-4-leon@kernel.org
Signed-off-by: Meir Lichtinger <meirl@mellanox.com>
Reviewed-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-27 11:19:00 -03:00
Meir Lichtinger
2224635938 RDMA/mlx5: Use MLX5_SET macro instead of local structure
Use generic mlx5 structure defined in mlx5_ifc.h to represent ConnectX
device data structures instead of using structure defined specifically for
mlx5_ib module.

Link: https://lore.kernel.org/r/20200716105248.1423452-3-leon@kernel.org
Signed-off-by: Meir Lichtinger <meirl@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-27 11:19:00 -03:00
Meir Lichtinger
042dd05bdd RDMA/mlx5: ConnectX-7 new capabilities to set relaxed ordering by UMR
Up to ConnectX-7 setting mkey relaxed ordering read/write attributes
by UMR is not supported. ConnectX-7 supports this option, which is
indicated by two new HCA capabilities - relaxed_ordering_write_umr
and relaxed_ordering_read_umr.

Signed-off-by: Meir Lichtinger <meirl@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-07-24 22:28:29 +03:00
Huy Nguyen
78fb6122fa net/mlx5: Add IPsec related Flow steering entry's fields
Add FTE actions IPsec ENCRYPT/DECRYPT
Add ipsec_obj_id field in FTE
Add new action field MLX5_ACTION_IN_FIELD_IPSEC_SYNDROME

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-07-16 16:36:46 -07:00
Raed Salem
9a6ad1ad71 net/mlx5: Accel, Add core IPsec support for the Connect-X family
This to set the base for downstream patches to support
the new IPsec implementation of the Connect-X family.

Following modifications made:
- Remove accel layer dependency from MLX5_FPGA_IPSEC.
- Introduce accel_ipsec_ops, each IPsec device will
  have to support these ops.

Signed-off-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-07-16 16:36:42 -07:00
Eli Cohen
8a06a79b0a net/mlx5: Add interface changes required for VDPA
Rename mlx5_ifc_device_virtio_emulation_cap_bits to
mlx5_ifc_virtio_emulation_cap_bits to match names produced by the
tools producing these auto generated files.

In addition missing capabilities that will be required by VDPA
implementation.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-07-15 22:21:29 -07:00
Eli Cohen
2a913f2344 net/mlx5: Add VDPA interface type to supported enumerations
VDPA is a new interface that will be added in subsequent patches. It
uses mlx5 core devices and resources. Add an interface type for it.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-07-15 22:21:29 -07:00
Eli Cohen
1dcb6c36a5 net/mlx5: Support setting access rights of dma addresses
mlx5_fill_page_frag_array() is used to populate dma addresses to
resources that require it, such as QPs, RQs etc. When the resource is
used, PA list permissions are ignored. For resources that use MTT list,
the user is required to provide the access rights. Subsequent patches
use resources that require MTT lists, so modify API and implementation
to support that.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-07-15 22:21:29 -07:00
David S. Miller
71930d6102 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
All conflicts seemed rather trivial, with some guidance from
Saeed Mameed on the tc_ct.c one.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-11 00:46:00 -07:00
Eran Ben Elisha
88b3d5c90e net/mlx5e: Fix port buffers cell size value
Device unit for port buffers size, xoff_threshold and xon_threshold is
cells. Fix a bug in driver where cell unit size was hard-coded to
128 bytes. This hard-coded value is buggy, as it is wrong for some hardware
versions.

Driver to read cell size from SBCAM register and translate bytes to cell
units accordingly.

In order to fix the bug, this patch exposes SBCAM (Shared buffer
capabilities mask) layout and defines.

If SBCAM.cap_cell_size is valid, use it for all bytes to cells
calculations. If not valid, fallback to 128.

Cell size do not change on the fly per device. Instead of issuing SBCAM
access reg command every time such translation is needed, cache it in
mlx5e_dcbx as part of mlx5e_dcbnl_initialize(). Pass dcbx.port_buff_cell_sz
as a param to every function that needs bytes to cells translation.

While fixing the bug, move MLX5E_BUFFER_CELL_SHIFT macro to
en_dcbnl.c, as it is only used by that file.

Fixes: 0696d60853 ("net/mlx5e: Receive buffer configuration")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-07-09 19:27:07 -07:00
Meir Lichtinger
12fdafb817 net/mlx5: Added support for 100Gbps per lane link modes
This patch exposes new link modes using 100Gbps per lane, including 100G,
200G and 400G modes.

Signed-off-by: Meir Lichtinger <meirl@mellanox.com>
Reviewed-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-08 15:30:42 -07:00
Michael Guralnik
4dca650991 net/mlx5: Enable QP number request when creating IPoIB underlay QP
If in the process of creating the underlay QP for an IPoIB interface
the user has set the address and specifically the 1st-3rd bytes
representing the QP number, use the requested QP number when creating
the underlay QP.

For a user to be able to request a QP number on QP creation, the MKEY_BY_NAME
NVCONFIG should be set. As mkey_by_name and qp_by_name are coupled in FW.
This requires driver to query the mkey_by_name max cap during initialization
and set the current cap if it was enabled in FW.

Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-07-03 18:38:01 +03:00
Saeed Mahameed
e396eccf0f Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  net/mlx5: kTLS, Improve TLS params layout structures
  net/mlx5: Avoid eswitch header inclusion in fs core layer
  net/mlx5: Avoid RDMA file inclusion in core driver
  net/mlx5: Add support in query QP, CQ and MKEY segments
  net/mlx5: Export resource dump interface

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-06-27 14:00:13 -07:00
Tariq Toukan
2d1b69ed65 net/mlx5: kTLS, Improve TLS params layout structures
Add explicit WQE segment structures for the TLS static and progress
params.
According to the HW spec, TISN is not part of the progress params context,
take it out of it.
Rename the control segment tisn field as it could hold either a TIS or
a TIR number.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-06-27 13:50:46 -07:00
Parav Pandit
9205d7b1c1 net/mlx5: Avoid RDMA file inclusion in core driver
mlx5 cq.h does not depend on RDMA verbs.
Remove RDMA verbs file inclusion.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-06-27 13:50:46 -07:00
David S. Miller
7bed145516 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Minor overlapping changes in xfrm_device.c, between the double
ESP trailing bug fix setting the XFRM_INIT flag and the changes
in net-next preparing for bonding encryption support.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25 19:29:51 -07:00
Maor Gottlieb
608ca553c9 net/mlx5: Add support in query QP, CQ and MKEY segments
Introduce new resource dump segments - PRM_QUERY_QP,
PRM_QUERY_CQ and PRM_QUERY_MKEY. These segments contains the resource
dump in PRM query format.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-06-23 17:26:10 +03:00
Maor Gottlieb
d63cc24933 net/mlx5: Export resource dump interface
Export some of the resource dump API. mlx5_ib driver will use
it in downstream patches.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-06-23 17:26:10 +03:00
Parav Pandit
fa997825eb net/mlx5: Constify mac address pointer
Since none of the functions need to modify the input mac address,
constify them.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-22 15:29:19 -07:00
Leon Romanovsky
ab183d460d RDMA/mlx5: Add missed RST2INIT and INIT2INIT steps during ECE handshake
Missed steps during ECE handshake left userspace application with less
options for the ECE handshake. Pass ECE options in the additional
transitions.

Fixes: 50aec2c313 ("RDMA/mlx5: Return ECE data after modify QP")
Link: https://lore.kernel.org/r/20200616104536.2426384-1-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-18 09:52:29 -03:00
Linus Torvalds
242b233198 RDMA 5.8 merge window pull request
A few large, long discussed works this time. The RNBD block driver has
 been posted for nearly two years now, and the removal of FMR has been a
 recurring discussion theme for a long time. The usual smattering of
 features and bug fixes.
 
 - Various small driver bugs fixes in rxe, mlx5, hfi1, and efa
 
 - Continuing driver cleanups in bnxt_re, hns
 
 - Big cleanup of mlx5 QP creation flows
 
 - More consistent use of src port and flow label when LAG is used and a
   mlx5 implementation
 
 - Additional set of cleanups for IB CM
 
 - 'RNBD' network block driver and target. This is a network block RDMA
   device specific to ionos's cloud environment. It brings strong multipath
   and resiliency capabilities.
 
 - Accelerated IPoIB for HFI1
 
 - QP/WQ/SRQ ioctl migration for uverbs, and support for multiple async fds
 
 - Support for exchanging the new IBTA defiend ECE data during RDMA CM
   exchanges
 
 - Removal of the very old and insecure FMR interface from all ULPs and
   drivers. FRWR should be preferred for at least a decade now.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl7X/IwACgkQOG33FX4g
 mxp2uw/+MI2S/aXqEBvZfTT8yrkAwqYezS0VeTDnwH/T6UlTMDhHVN/2Ji3tbbX3
 FEKT1i2mnAL5RqUAL1lr9g4sG/bVozrpN46Ws5Lu9dTbIPLKTNPWDuLFQDUShKY7
 OyMI/bRx6anGnsOy20iiBqnrQbrrZj5TECgnmrkAl62QFdcl7aBWe/yYjy4CT11N
 ub+aBXBREN1F1pc0HIjd2tI+8gnZc+mNm1LVVDRH9Capun/pI26qDNh7e6QwGyIo
 n8ItraC8znLwv/nsUoTE7/JRcsTEe6vJI26PQmczZfNJs/4O65G7fZg0eSBseZYi
 qKf7Uwtb3qW0R7jRUMEgFY4DKXVAA0G2ph40HXBuzOSsqlT6HqYMO2wgG8pJkrTc
 qAjoSJGzfAHIsjxzxKI8wKuufCddjCm30VWWU7EKeriI6h1J0uPVqKkQMfYBTkik
 696eZSBycAVgwayOng3XaehiTxOL7qGMTjUpDjUR6UscbiPG919vP+QsbIUuBXdb
 YoddBQJdyGJiaCXv32ciJjo9bjPRRi/bII7Q5qzCNI2mi4ZVbudF4ffzyQvdHtNJ
 nGnpRXoPi7kMvUrKTMPWkFjj0R5/UsPszsA51zbxPydfgBe0Dlc2PrrIG8dlzYAp
 wbV0Lec+iJucKlt7EZtrjz1xOiOOaQt/5/cW1bWqL+wk2t6gAuY=
 =9zTe
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "A more active cycle than most of the recent past, with a few large,
  long discussed works this time.

  The RNBD block driver has been posted for nearly two years now, and
  flowing through RDMA due to it also introducing a new ULP.

  The removal of FMR has been a recurring discussion theme for a long
  time.

  And the usual smattering of features and bug fixes.

  Summary:

   - Various small driver bugs fixes in rxe, mlx5, hfi1, and efa

   - Continuing driver cleanups in bnxt_re, hns

   - Big cleanup of mlx5 QP creation flows

   - More consistent use of src port and flow label when LAG is used and
     a mlx5 implementation

   - Additional set of cleanups for IB CM

   - 'RNBD' network block driver and target. This is a network block
     RDMA device specific to ionos's cloud environment. It brings strong
     multipath and resiliency capabilities.

   - Accelerated IPoIB for HFI1

   - QP/WQ/SRQ ioctl migration for uverbs, and support for multiple
     async fds

   - Support for exchanging the new IBTA defiend ECE data during RDMA CM
     exchanges

   - Removal of the very old and insecure FMR interface from all ULPs
     and drivers. FRWR should be preferred for at least a decade now"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (247 commits)
  RDMA/cm: Spurious WARNING triggered in cm_destroy_id()
  RDMA/mlx5: Return ECE DC support
  RDMA/mlx5: Don't rely on FW to set zeros in ECE response
  RDMA/mlx5: Return an error if copy_to_user fails
  IB/hfi1: Use free_netdev() in hfi1_netdev_free()
  RDMA/hns: Uninitialized variable in modify_qp_init_to_rtr()
  RDMA/core: Move and rename trace_cm_id_create()
  IB/hfi1: Fix hfi1_netdev_rx_init() error handling
  RDMA: Remove 'max_map_per_fmr'
  RDMA: Remove 'max_fmr'
  RDMA/core: Remove FMR device ops
  RDMA/rdmavt: Remove FMR memory registration
  RDMA/mthca: Remove FMR support for memory registration
  RDMA/mlx4: Remove FMR support for memory registration
  RDMA/i40iw: Remove FMR leftovers
  RDMA/bnxt_re: Remove FMR leftovers
  RDMA/mlx5: Remove FMR leftovers
  RDMA/core: Remove FMR pool API
  RDMA/rds: Remove FMR support for memory registration
  RDMA/srp: Remove support for FMR memory registration
  ...
2020-06-05 14:05:57 -07:00
Leon Romanovsky
a645a89d9a RDMA/mlx5: Return ECE DC support
The DC QPs are many-to-one QP types that means that first connection will
establish ECE options that coming connections should follow.  Due to this
property, the ECE code was removed between first [1] and second [2] ECE
submissions.

This patch returns the dropped code, because ECE is a property of a
connection and like any other connection users are needed to manage this
data. Allow them to set ECE parameter for DC too and avoid need of having
compatibility flag for the DC ECE.

[1]
https://lore.kernel.org/linux-rdma/20200523132243.817936-1-leon@kernel.org/
[2]
https://lore.kernel.org/linux-rdma/20200525174401.71152-1-leon@kernel.org/

Link: https://lore.kernel.org/r/20200602125548.172654-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-03 15:45:02 -03:00
Saeed Mahameed
44345c4c13 net/mlx5: IPSec: Fix incorrect type for spi
spi is __be32, fix that.

Fixes sparse warning:
drivers/net/ethernet/mellanox/mlx5/core/accel/ipsec.c:74:64
warning: incorrect type

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-29 21:20:22 -07:00
Saeed Mahameed
2553f421f4 net/mlx5: cmd: Fix memset with byte count warning
Fix sparse warning:
drivers/net/ethernet/mellanox/mlx5/core/cmd.c:1949:15:
warning: memset with byte count of 271720

mlx5_cmd_stats array is too big to be held inline in mlx5_cmd.
Allocate it separately.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-29 21:20:21 -07:00
Saeed Mahameed
971ae1ed03 Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
net/mlx5: Add ability to read and write ECE options
  net/mlx5: Add support for RDMA TX FT headers modifying
  net/mlx5: Move iseg access helper routines close to mlx5_core driver
  net/mlx5: Cleanup mlx5_ifc_fte_match_set_misc2_bits
  net/mlx5: Add support in forward to namespace
  {IB/net}/mlx5: Simplify don't trap code
  net/mlx5: Replace zero-length array with flexible-array

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-29 14:38:57 -07:00
Leon Romanovsky
f18e26af6a RDMA/mlx5: Convert modify QP to use MLX5_SET macros
Instead of hand crafted mlx5_qp_context and mlx5_qp_path use common
MLX5_SET() macros.

Link: https://lore.kernel.org/r/20200526115440.205922-7-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-27 16:07:49 -03:00
Jason Gunthorpe
e4fdf7625b Merge branch 'mellanox/mlx5-next' into rdma.git for/next
From the mlx5-next branch at
  git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

Required for dependencies in following patches

* branch 'mellanox/mlx5-next':
  net/mlx5: Add ability to read and write ECE options
  net/mlx5: Add support for RDMA TX FT headers modifying
  net/mlx5: Move iseg access helper routines close to mlx5_core driver
  net/mlx5: Cleanup mlx5_ifc_fte_match_set_misc2_bits

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-27 16:01:17 -03:00
Leon Romanovsky
6b646a7e4a net/mlx5: Add ability to read and write ECE options
The end result of RDMA-CM ECE handshake is ECE options, which is
needed to be used while configuring data QPs. Such options can
come in any QP state, so add in/out fields to set and query
ECE options.

OUT fields:
* create_qp() - default ECE options for that type of QP.
* modify_qp() - enabled ECE options after QP state transition.

IN fields:
* create_qp() - create QP with this ECE option.
* modify_qp() - requested options. For unconnected QPs, the FW
will return an error if ECE is already configured with any options
that not equal to previously set.

Reviewed-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-05-27 14:08:28 +03:00
David S. Miller
13209a8f73 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
The MSCC bug fix in 'net' had to be slightly adjusted because the
register accesses are done slightly differently in net-next.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-24 13:47:27 -07:00
Eran Ben Elisha
f7936ddd35 net/mlx5: Avoid processing commands before cmdif is ready
When driver is reloading during recovery flow, it can't get new commands
till command interface is up again. Otherwise we may get to null pointer
trying to access non initialized command structures.

Add cmdif state to avoid processing commands while cmdif is not ready.

Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-22 17:28:38 -07:00
Eran Ben Elisha
d43b7007db net/mlx5: Fix a race when moving command interface to events mode
After driver creates (via FW command) an EQ for commands, the driver will
be informed on new commands completion by EQE. However, due to a race in
driver's internal command mode metadata update, some new commands will
still be miss-handled by driver as if we are in polling mode. Such commands
can get two non forced completion, leading to already freed command entry
access.

CREATE_EQ command, that maps EQ to the command queue must be posted to the
command queue while it is empty and no other command should be posted.

Add SW mechanism that once the CREATE_EQ command is about to be executed,
all other commands will return error without being sent to the FW. Allow
sending other commands only after successfully changing the driver's
internal command mode metadata.
We can safely return error to all other commands while creating the command
EQ, as all other commands might be sent from the user/application during
driver load. Application can rerun them later after driver's load was
finished.

Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-22 17:28:36 -07:00
Moshe Shemesh
17d00e839d net/mlx5: Add command entry handling completion
When FW response to commands is very slow and all command entries in
use are waiting for completion we can have a race where commands can get
timeout before they get out of the queue and handled. Timeout
completion on uninitialized command will cause releasing command's
buffers before accessing it for initialization and then we will get NULL
pointer exception while trying access it. It may also cause releasing
buffers of another command since we may have timeout completion before
even allocating entry index for this command.
Add entry handling completion to avoid this race.

Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-22 17:28:34 -07:00
Parav Pandit
555af0c3fa net/mlx5: Move iseg access helper routines close to mlx5_core driver
Only mlx5_core driver handles fw initialization check and command
interface revision check.
Hence move them inside the mlx5_core driver where it is used.
This avoid exposing these helpers to all mlx5 drivers.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-18 09:21:46 -07:00
Raed Salem
356d411c26 net/mlx5: Cleanup mlx5_ifc_fte_match_set_misc2_bits
Remove the "metadata_reg_b" field and all uses of this field in code
to match the device specification. As this field is not in use in SW
steering it is safe to remove it.

Signed-off-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-18 09:21:46 -07:00
Jason Gunthorpe
10c2615513 Merge branch 'mellanox/mlx5-next' into rdma.git for/next
From the mlx5-next branch at
  git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

Required for dependencies in following patches

* branch 'mellanox/mlx5-next':
  net/mlx5: Add support in forward to namespace
  {IB/net}/mlx5: Simplify don't trap code
  net/mlx5: Replace zero-length array with flexible-array

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-13 15:54:19 -03:00
Maor Gottlieb
9254f8ed15 net/mlx5: Add support in forward to namespace
Currently, fs_core supports rule of forward the traffic
to continue matching in the next priority, now we add support
to forward the traffic matching in the next namespace.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-05-13 18:56:31 +03:00
Gustavo A. R. Silva
b6ca09cb15 net/mlx5: Replace zero-length array with flexible-array
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
        int stuff;
        struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

sizeof(flexible-array-member) triggers a warning because flexible array
members have incomplete type[1]. There are some instances of code in
which the sizeof operator is being incorrectly/erroneously applied to
zero-length arrays and the result is zero. Such instances may be hiding
some bugs. So, this work (flexible-array member conversions) will also
help to get completely rid of those sorts of issues.

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-10 18:05:38 -07:00
Saeed Mahameed
76cd622fe2 Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
This merge includes updates to bonding driver needed for the rdma stack,
to avoid conflicts with the RDMA branch.

Maor Gottlieb Says:

====================
Bonding: Add support to get xmit slave

The following series adds support to get the LAG master xmit slave by
introducing new .ndo - ndo_get_xmit_slave. Every LAG module can
implement it and it first implemented in the bond driver.
This is follow-up to the RFC discussion [1].

The main motivation for doing this is for drivers that offload part
of the LAG functionality. For example, Mellanox Connect-X hardware
implements RoCE LAG which selects the TX affinity when the resources
are created and port is remapped when it goes down.

The first part of this patchset introduces the new .ndo and add the
support to the bonding module.

The second part adds support to get the RoCE LAG xmit slave by building
skb of the RoCE packet based on the AH attributes and call to the new
.ndo.

The third part change the mlx5 driver driver to set the QP's affinity
port according to the slave which found by the .ndo.
====================

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-09 01:05:30 -07:00
Maor Gottlieb
cfc1a89e44 RDMA/mlx5: Set lag tx affinity according to slave
The patch sets the lag tx affinity of the data QPs and the GSI QPs
according to the LAG xmit slave.

For GSI QPs, in case the link layer is Ethenet (RoCE) we create two GSI
QPs, one for each physical port. When the driver selects the GSI QP, it
will consider the port affinity result.  For connected QPs, the driver
sets the affinity of the xmit slave.

The above, ensures that RC QP and it's corresponding GSI QP will transmit
from the same physical port.

Link: https://lore.kernel.org/r/20200430192146.12863-17-maorg@mellanox.com
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-02 20:19:54 -03:00
Maor Gottlieb
c6bc6041b1 net/mlx5: Add support to get lag physical port
Add function to get the device physical port of the lag slave.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-01 12:15:38 -07:00
Saeed Mahameed
a6b1b93605 Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
mlx5 updates for both net-next and rdma-next:

1) HW bits and definitions for TLS and IPsec offlaods
2) Release all pages capability bits
3) New command interface helpers and some code cleanup as a result
4) Move qp.c out of mlx5 core driver into mlx5_ib rdma driver

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-30 09:49:53 -07:00
Raed Salem
2dc8b5246d net/mlx5: TX WQE Add trailer insertion field
Add new TX WQE field for Connect-X6DX trailer insertion support,
when set, the HW adds a trailer to the packet, the WQE trailer
association flags are used to set to HW the header which the
trailer belongs.

Signed-off-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-28 12:45:18 -07:00
Eran Ben Elisha
0e1533bb9c net/mlx5: Add release all pages capability bit
Add a bit in HCA capabilities layout to indicate if release all pages is
supported.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-28 12:45:18 -07:00
Tariq Toukan
ee5cdf7a5e net/mlx5: Introduce TLS RX offload hardware bits
Add TLS RX offload related IFC hardware fields and enumerations.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-28 12:45:18 -07:00
Moshe Shemesh
3df0107784 net/mlx5: Add structure and defines for pci sync for fw update event
Add needed structure layouts and defines for pci sync for fw update
event. The downstream patches will include event handlers for this event
type.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-28 12:45:18 -07:00
Moshe Shemesh
0693953626 net/mlx5: Add structure layout and defines for MFRL register
Add needed structure layouts and defines for MFRL (Management Firmware
Reset Level) register. This structure will be used for the firmware
upgrade and reset flow in the downstream patches.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-28 12:45:18 -07:00
Raed Salem
244faedfd4 net/mlx5: Refactor imm_inval_pkey field in cqe struct
The imm_inval_pkey field can hold four different types of data,
depends on the usage, the data could be one of the below:
- Immediate field of the received message
- Invalidate rkey
- Pkey of the packet
- Flow table metadata

Current implementation doesn't reflect the intended usage of the
field at usage time.

Reflect the different types by replace this field with a union,
modify code where this field is used to reflect its intended
usage.

Signed-off-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-28 12:45:15 -07:00
Erez Shitrit
dff8e2d152 net/mlx5: Use aligned variable while allocating ICM memory
The alignment value is part of the input structure, so use it and spare
extra memory allocation when is not needed.
Now, using the new ability when allocating icm for Direct-Rule
insertion.
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-28 12:45:10 -07:00
Raed Salem
2b58f6d9df net/mlx5: Introduce IPsec Connect-X offload hardware bits and structures
Add IPsec offload related IFC structs, layouts and enumerations.

Signed-off-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-28 12:44:45 -07:00
Huy Nguyen
d65dbedfd2 net/mlx5: Add support for COPY steering action
Add COPY type to modify_header action. IPsec feature is the first
feature that needs COPY steering action.

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
2020-04-28 12:44:44 -07:00
Leon Romanovsky
e0b4b4722d net/mlx5: Update transobj.c new cmd interface
Do mass update of transobj.c to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-04-23 21:42:16 +03:00
Leon Romanovsky
d1f620500c net/mlx5: Update cq.c to new cmd interface
Do mass update of cq.c to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.

Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-04-23 21:42:03 +03:00
Leon Romanovsky
5d1c9a114a net/mlx5: Update vport.c to new cmd interface
Do mass update of vport.c to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-04-23 21:42:02 +03:00
Raed Salem
1dbd51d0a7 net/mlx5: Refactor mlx5_accel_esp_create_hw_context parameter list
Currently the FPGA IPsec is the only hw implementation of the IPsec
acceleration api, and so the mlx5_accel_esp_create_hw_context was
wrongly made to suit this HW api, among other in its parameter list
and some of its parameter endianness.

This implementation might not be suitable for different HW.

Refactor by group and pass all function arguments of
mlx5_accel_esp_create_hw_context in common mlx5_accel_esp_xfrm_attrs
struct field of mlx5_accel_esp_xfrm struct and correct the endianness
according to the HW being called.

Signed-off-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Reviewed-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-20 14:20:19 -07:00
Mark Zhang
59e9e8e4fe net/mlx5: Enable SW-defined RoCEv2 UDP source port
When this is enabled, UDP source port for RoCEv2 packets are defined
by software instead of firmware.

Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-04-19 15:57:53 +03:00
Leon Romanovsky
333fbaa025 net/mlx5: Move QP logic to mlx5_ib
The mlx5_core doesn't need any functionality coded in qp.c, so move
that file to drivers/infiniband/ be under mlx5_ib responsibility.

Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-04-19 15:53:21 +03:00
Leon Romanovsky
66247fbb28 net/mlx5: Remove Q counter low level helper APIs
mlx5 core users are encouraged to use low level API (mlx5_cmd_exec)
without the need of helper functions, do this for q counters, remove
helper functions and call mlx5_cmd_exec directly from users.

This will help reduce the total amount of code and reduction of the
mlx5_core symbol table.

Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-04-19 15:53:20 +03:00
Leon Romanovsky
bb7fc86372 net/mlx5: Provide simplified command interfaces
Many mlx5_cmd_exec() callers are not interested in the output from that
command or have standard in/out structures. Those callers simply allocate
those structure on the stack and use sizeof() to provide in/out arguments.

In this naive approach provide simplified versions of mlx5_cmd_exec().

Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-04-19 15:53:17 +03:00
Linus Torvalds
919dce2470 RDMA 5.7 pull request
The majority of the patches are cleanups, refactorings and clarity
 improvements
 
 - Various driver updates for siw, bnxt_re, rxe, efa, mlx5, hfi1
 
 - Lots of cleanup patches for hns
 
 - Convert more places to use refcount
 
 - Aggressively lock the RDMA CM code that syzkaller says isn't working
 
 - Work to clarify ib_cm
 
 - Use the new ib_device lifecycle model in bnxt_re
 
 - Fix mlx5's MR cache which seems to be failing more often with the new
   ODP code
 
 - mlx5 'dynamic uar' and 'tx steering' user interfaces
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl6CSr0ACgkQOG33FX4g
 mxrtKg//XovbOfYAO7nC05FtGz9iEkIUBiwQOjUojgNSi6RMNDqRW1bmqKUugm1o
 9nXA6tw+fueEvUNSD541SCcxkUZJzWvubO9wHB6N3Fgy68N3Vf2rKV3EBBTh99rK
 Cb7rnmTTN6izRyI1wdyP2sjDJGyF8zvsgIrG2sibzLnqlyScrnD98YS0FdPZUfOa
 a1mmXBN/T7eaQ4TbE3lLLzGnifRlGmZ5vxEvOQmAlOdqAfIKQdbbW7oCRLVIleso
 gfQlOOvIgzHRwQ3VrFa3i6ETYtywXq7EgmQxCjqPVJQjCA79n5TLBkP1iRhvn8xi
 3+LO4YCkiSJ/NjTA2d9KwT6K4djj3cYfcueuqo2MoXXr0YLiY6TLv1OffKcUIq7c
 LM3d4CSwIAG+C2FZwaQrdSGa2h/CNfLAEeKxv430zggeDNKlwHJPV5w3rUJ8lT56
 wlyT7Lzosl0O9Z/1BSLYckTvbBCtYcmanVyCfHG8EJKAM1/tXy5LS8btJ3e51rPu
 XekR9ELrTTA2CTuoSCQGP6J0dBD2U7qO4XRCQ9N5BYLrI6RdP7Z4xYzzey49Z3Cx
 JaF86eurM7nS5biUszTtwww8AJMyYicB+0VyjBfk+mhv90w8tS1vZ1aZKzaQ1L6Z
 jWn8WgIN4rWY0YGQs6PiovT1FplyGs3p1wNmjn92WO0wZZ3WsmQ=
 =ae+a
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "The majority of the patches are cleanups, refactorings and clarity
  improvements.

  This cycle saw some more activity from Syzkaller, I think we are now
  clean on all but one of those bugs, including the long standing and
  obnoxious rdma_cm locking design defect. Continue to see many drivers
  getting cleanups, with a few new user visible features.

  Summary:

   - Various driver updates for siw, bnxt_re, rxe, efa, mlx5, hfi1

   - Lots of cleanup patches for hns

   - Convert more places to use refcount

   - Aggressively lock the RDMA CM code that syzkaller says isn't
     working

   - Work to clarify ib_cm

   - Use the new ib_device lifecycle model in bnxt_re

   - Fix mlx5's MR cache which seems to be failing more often with the
     new ODP code

   - mlx5 'dynamic uar' and 'tx steering' user interfaces"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (144 commits)
  RDMA/bnxt_re: make bnxt_re_ib_init static
  IB/qib: Delete struct qib_ivdev.qp_rnd
  RDMA/hns: Fix uninitialized variable bug
  RDMA/hns: Modify the mask of QP number for CQE of hip08
  RDMA/hns: Reduce the maximum number of extend SGE per WQE
  RDMA/hns: Reduce PFC frames in congestion scenarios
  RDMA/mlx5: Add support for RDMA TX flow table
  net/mlx5: Add support for RDMA TX steering
  IB/hfi1: Call kobject_put() when kobject_init_and_add() fails
  IB/hfi1: Fix memory leaks in sysfs registration and unregistration
  IB/mlx5: Move to fully dynamic UAR mode once user space supports it
  IB/mlx5: Limit the scope of struct mlx5_bfreg_info to mlx5_ib
  IB/mlx5: Extend QP creation to get uar page index from user space
  IB/mlx5: Extend CQ creation to get uar page index from user space
  IB/mlx5: Expose UAR object and its alloc/destroy commands
  IB/hfi1: Get rid of a warning
  RDMA/hns: Remove redundant judgment of qp_type
  RDMA/hns: Remove redundant assignment of wc->smac when polling cq
  RDMA/hns: Remove redundant qpc setup operations
  RDMA/hns: Remove meaningless prints
  ...
2020-04-01 18:18:18 -07:00
Saeed Mahameed
e999a7343d Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  mlx5: Remove uninitialized use of key in mlx5_core_create_mkey
  {IB,net}/mlx5: Move asynchronous mkey creation to mlx5_ib
  {IB,net}/mlx5: Assign mkey variant in mlx5_ib only
  {IB,net}/mlx5: Setup mkey variant before mr create command invocation

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-29 23:42:11 -07:00
Jason Gunthorpe
dbdf8909d0 Merge branch 'mlx5_tx_steering' into rdma.git for-next
Leon Romanovsky says:

====================
Those two patches from Michael extends mlx5_core and mlx5_ib flow steering
to support RDMA TX in similar way to already supported RDMA RX.
====================

Based on the mlx5-next branch at
 git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Due to dependencies

* branch 'mlx5_tx_steering':
  RDMA/mlx5: Add support for RDMA TX flow table
  net/mlx5: Add support for RDMA TX steering
2020-03-27 13:26:59 -03:00
Michael Guralnik
24670b1a31 net/mlx5: Add support for RDMA TX steering
Add new RDMA TX flow steering namespace. Flow steering rules in
this namespace are used to filter transmitted RDMA traffic.

Link: https://lore.kernel.org/r/20200324061425.1570190-2-leon@kernel.org
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-27 13:24:48 -03:00
Leon Romanovsky
2152862298 IB/mlx5: Limit the scope of struct mlx5_bfreg_info to mlx5_ib
struct mlx5_bfreg_info is used by mlx5_ib only but is exposed to both RDMA
and netdev parts of mlx5 driver. Move that struct to mlx5_ib namespace,
clean vertical space alignment and convert lib_uar_4k from bool to
bitfield.

Link: https://lore.kernel.org/r/20200324060143.1569116-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-27 12:59:04 -03:00
Eli Cohen
613f53fe09 net/mlx5: Eswitch, enable forwarding back to uplink port
Add dependencny on cap termination_table_raw_traffic to allow non
encapsulated packets received from uplink to be forwarded back to the
received uplink port.

Refactor the conditions into a separate function.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-17 19:41:39 -07:00
Jason Gunthorpe
d613bd64c6 Merge branch 'mlx5_mr_cache' into rdma.git for-next
Leon Romanovsky says:

====================
This series fixes various corner cases in the mlx5_ib MR cache
implementation, see specific commit messages for more information.
====================

Based on the mlx5-next branch at
 git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Due to dependencies

* branch 'mlx5_mr-cache':
  RDMA/mlx5: Allow MRs to be created in the cache synchronously
  RDMA/mlx5: Revise how the hysteresis scheme works for cache filling
  RDMA/mlx5: Fix locking in MR cache work queue
  RDMA/mlx5: Lock access to ent->available_mrs/limit when doing queue_work
  RDMA/mlx5: Fix MR cache size and limit debugfs
  RDMA/mlx5: Always remove MRs from the cache before destroying them
  RDMA/mlx5: Simplify how the MR cache bucket is located
  RDMA/mlx5: Rename the tracking variables for the MR cache
  RDMA/mlx5: Replace spinlock protected write with atomic var
  {IB,net}/mlx5: Move asynchronous mkey creation to mlx5_ib
  {IB,net}/mlx5: Assign mkey variant in mlx5_ib only
  {IB,net}/mlx5: Setup mkey variant before mr create command invocation
2020-03-13 11:11:07 -03:00
Michael Guralnik
a3cfdd3928 {IB,net}/mlx5: Move asynchronous mkey creation to mlx5_ib
As mlx5_ib is the only user of the mlx5_core_create_mkey_cb, move the
logic inside mlx5_ib and cleanup the code in mlx5_core.

Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-03-13 15:48:10 +02:00
Saeed Mahameed
fc6a9f86f0 {IB,net}/mlx5: Assign mkey variant in mlx5_ib only
mkey variant is not required for mlx5_core use, move the mkey variant
counter to mlx5_ib.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-03-13 15:48:04 +02:00
Paul Blakey
5b7cb74515 net/mlx5: E-Switch, Enable reg c1 loopback when possible
Enable reg c1 loopback if firmware reports it's supported,
as this is needed for restoring packet metadata (e.g chain).

Also define helper to query if it is enabled.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12 15:00:38 -07:00
David S. Miller
bf3347c4d1 Merge branch 'ct-offload' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux 2020-03-12 12:34:23 -07:00
Alex Vesker
41e684ef3f IB/mlx5: Replace tunnel mpls capability bits for tunnel_offloads
Until now the flex parser capability was used in ib_query_device() to
indicate tunnel_offloads_caps support for mpls_over_gre/mpls_over_udp.

Newer devices and firmware will have configurations with the flexparser
but without mpls support.

Testing for the flex parser capability was a mistake, the tunnel_stateless
capability was intended for detecting mpls and was introduced at the same
time as the flex parser capability.

Otherwise userspace will be incorrectly informed that a future device
supports MPLS when it does not.

Link: https://lore.kernel.org/r/20200305123841.196086-1-leon@kernel.org
Cc: <stable@vger.kernel.org> # 4.17
Fixes: e818e255a5 ("IB/mlx5: Expose MPLS related tunneling offloads")
Signed-off-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-10 14:41:35 -03:00
Jason Gunthorpe
3e3cf2e82c Merge branch 'mlx5_packet_pacing' into rdma.git for-next
Yishai Hadas Says:

====================
Expose raw packet pacing APIs to be used by DEVX based applications.  The
existing code was refactored to have a single flow with the new raw APIs.
====================

Based on the mlx5-next branch at
 git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Due to dependencies

* branch 'mlx5_packet_pacing':
  IB/mlx5: Introduce UAPIs to manage packet pacing
  net/mlx5: Expose raw packet pacing APIs
2020-03-10 11:54:17 -03:00
Saeed Mahameed
a70ed9d8ec Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
This series adds some HW bits and definitions for mlx5 driver, to be
used by downstream features in both rdma and netdev branches.

* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  net/mlx5: HW bit for goto chain offload support
  net/mlx5: Expose link speed directly
  net/mlx5: Introduce TLS and IPSec objects enums
  net/mlx5: Introduce egress acl forward-to-vport capability
  net/mlx5: Expose raw packet pacing APIs
  net/mlx5e: Replace zero-length array with flexible-array member
  net/mlx5: fix spelling mistake "reserverd" -> "reserved"

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-09 16:58:26 -07:00
Eli Cohen
e0ebd8eb36 net/mlx5: HW bit for goto chain offload support
Add the HW bit definition indecating goto chain offload support.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-07 13:19:25 -08:00
Mark Bloch
dc392fc56f net/mlx5: Expose link speed directly
Expose port rate as part of the port speed register fields.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-07 13:19:25 -08:00
Saeed Mahameed
bd673da6d9 net/mlx5: Introduce TLS and IPSec objects enums
Expose the TLS encryption key general object type enum correctly,
and add the IPSec encryption key general object type enum.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-07 13:19:25 -08:00
Vu Pham
86f5d0f3d4 net/mlx5: Introduce egress acl forward-to-vport capability
Add HCA_CAP.egress_acl_forward_to_vport field to check whether HW
supports e-switch vport's egress acl to forward packets to other
e-switch vport or not.

By default E-Switch egress ACL forwards eswitch vports egress packets
to their corresponding NIC/VF vports.

With this cap enabled, the driver is allowed to alter this behavior
and forward packets to arbitrary NIC/VF vports with the following
limitations:

   a. Multiple processing paths are supported if all of the following
      conditions are met:
      - HCA_CAP.egress_acl_forward_to_vport is set ==1.
      - A destination of type Flow Table only appears once, as the
        last destination in the list.
      - Vport destination is supported if
        HCA_CAP.egress_acl_forward_to_vport==1. Vport must not be
        the Uplink.
   b. Flow_tag not supported.
   c. This table is only applicable after an FDB table is created.
   d. Push VLAN action is not supported.
   e. Pop VLAN action cannot be added concurrently to this table and
      FDB table.

This feature will be used during port failover in bonding scenario
where two VFs representors are bonded to handle failover egress traffic
(VM's ingress/receive traffic).

Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-07 13:19:25 -08:00
Yishai Hadas
1326034b3c net/mlx5: Expose raw packet pacing APIs
Expose raw packet pacing APIs to be used by DEVX based applications.
The existing code was refactored to have a single flow with the new raw
APIs.

The new raw APIs considered the input of 'pp_rate_limit_context', uid,
'dedicated', upon looking for an existing entry.

This raw mode enables future device specification data in the raw
context without changing the existing logic and code.

The ability to ask for a dedicated entry gives control for application
to allocate entries according to its needs.

A dedicated entry may not be used by some other process and it also
enables the process spreading its resources to some different entries
for use different hardware resources as part of enforcing the rate.

The counter per entry was changed to be u64 to prevent any option to
overflow.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-03-05 14:18:09 +02:00