Commit Graph

1527 Commits

Author SHA1 Message Date
Lee Jones
30cd9fc5e7 RDMA/hw/mlx5/qp: Demote non-conformant kernel-doc header
Fixes the following W=1 kernel build warning(s):

 drivers/infiniband/hw/mlx5/qp.c:5384: warning: Function parameter or member 'qp' not described in 'mlx5_ib_qp_set_counter'
 drivers/infiniband/hw/mlx5/qp.c:5384: warning: Function parameter or member 'counter' not described in 'mlx5_ib_qp_set_counter'

Link: https://lore.kernel.org/r/20210121094519.2044049-3-lee.jones@linaro.org
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-22 14:37:29 -04:00
Lee Jones
5e769e444d RDMA/hw/mlx5/odp: Fix formatting and add missing descriptions in 'pagefault_data_segments()'
Fixes the following W=1 kernel build warning(s):

 drivers/infiniband/hw/mlx5/odp.c:1062: warning: Function parameter or member 'dev' not described in 'pagefault_data_segments'
 drivers/infiniband/hw/mlx5/odp.c:1062: warning: Function parameter or member 'pfault' not described in 'pagefault_data_segments'
 drivers/infiniband/hw/mlx5/odp.c:1062: warning: Function parameter or member 'wqe' not described in 'pagefault_data_segments'
 drivers/infiniband/hw/mlx5/odp.c:1062: warning: Function parameter or member 'wqe_end' not described in 'pagefault_data_segments'
 drivers/infiniband/hw/mlx5/odp.c:1062: warning: Function parameter or member 'bytes_mapped' not described in 'pagefault_data_segments'
 drivers/infiniband/hw/mlx5/odp.c:1062: warning: Function parameter or member 'total_wqe_bytes' not described in 'pagefault_data_segments'
 drivers/infiniband/hw/mlx5/odp.c:1062: warning: Function parameter or member 'receive_queue' not described in 'pagefault_data_segments'

Link: https://lore.kernel.org/r/20210121094519.2044049-2-lee.jones@linaro.org
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-22 14:37:29 -04:00
Jianxin Xiong
90da7dc820 RDMA/mlx5: Support dma-buf based userspace memory region
Implement the new driver method 'reg_user_mr_dmabuf'.  Utilize the core
functions to import dma-buf based memory region and update the mappings.

Add code to handle dma-buf related page fault.

Link: https://lore.kernel.org/r/1608067636-98073-5-git-send-email-jianxin.xiong@intel.com
Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Acked-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Acked-by: Christian Koenig <christian.koenig@amd.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-20 16:49:25 -04:00
Parav Pandit
de641d74fb Revert "RDMA/mlx5: Fix devlink deadlock on net namespace deletion"
This reverts commit fbdd0049d9.

Due to commit in fixes tag, netdevice events were received only in one net
namespace of mlx5_core_dev. Due to this when netdevice events arrive in
net namespace other than net namespace of mlx5_core_dev, they are missed.

This results in empty GID table due to RDMA device being detached from its
net device.

Hence, revert back to receive netdevice events in all net namespaces to
restore back RDMA functionality in non init_net net namespace. The
deadlock will have to be addressed in another patch.

Fixes: fbdd0049d9 ("RDMA/mlx5: Fix devlink deadlock on net namespace deletion")
Link: https://lore.kernel.org/r/20210117092633.10690-1-leon@kernel.org
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-19 20:22:59 -04:00
Parav Pandit
559a3eacc4 IB/mlx5: Make function static
mlx5_query_mad_ifc_smp_attr_node_info() is internal to mad.c Hence, make
it static.

Link: https://lore.kernel.org/r/20210113121703.559778-5-leon@kernel.org
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-19 20:14:14 -04:00
Parav Pandit
ab40530a2e IB/mlx5: Add mutex destroy call to cap_mask_mutex mutex
mutex_destroy() call for device's cap_mask_mutex mutex is missing, let's
add it to annotate destruction.

Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Link: https://lore.kernel.org/r/20210113121703.559778-4-leon@kernel.org
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-19 20:14:14 -04:00
Yishai Hadas
1368ead04c RDMA/mlx5: Use strict get/set operations for obj_id
Use strict get/set operations for obj_id based on the specific object
type. This comes to prevent any miss match between the general header to
the legacy header commands.

Link: https://lore.kernel.org/r/20201230130121.180350-4-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-19 10:40:54 -04:00
Yishai Hadas
8798e4ad0a RDMA/mlx5: Use the correct obj_id upon DEVX TIR creation
Use the correct obj_id upon DEVX TIR creation by strictly taking the tirn
24 bits and not the general obj_id which is 32 bits.

Fixes: 7efce3691d ("IB/mlx5: Add obj create and destroy functionality")
Link: https://lore.kernel.org/r/20201230130121.180350-2-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-19 10:40:54 -04:00
Mark Bloch
1c3aa6bd0b RDMA/mlx5: Fix wrong free of blue flame register on error
If the allocation of the fast path blue flame register fails, the driver
should free the regular blue flame register allocated a statement above,
not the one that it just failed to allocate.

Fixes: 16c1975f10 ("IB/mlx5: Create profile infrastructure to add and remove stages")
Link: https://lore.kernel.org/r/20210113121703.559778-6-leon@kernel.org
Reported-by: Hans Petter Selasky <hanss@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-14 12:51:29 -04:00
Parav Pandit
2cb091f629 IB/mlx5: Fix error unwinding when set_has_smi_cap fails
When set_has_smi_cap() fails, multiport master cleanup is missed. Fix it
by doing the correct error unwinding goto.

Fixes: a989ea01cb ("RDMA/mlx5: Move SMI caps logic")
Link: https://lore.kernel.org/r/20210113121703.559778-3-leon@kernel.org
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-14 12:51:18 -04:00
Linus Torvalds
009bd55dfc RDMA 5.11 pull request
A smaller set of patches, nothing stands out as being particularly major
 this cycle:
 
 - Driver bug fixes and updates: bnxt_re, cxgb4, rxe, hns, i40iw, cxgb4,
   mlx4 and mlx5
 
 - Bug fixes and polishing for the new rts ULP
 
 - Cleanup of uverbs checking for allowed driver operations
 
 - Use sysfs_emit all over the place
 
 - Lots of bug fixes and clarity improvements for hns
 
 - hip09 support for hns
 
 - NDR and 50/100Gb signaling rates
 
 - Remove dma_virt_ops and go back to using the IB DMA wrappers
 
 - mlx5 optimizations for contiguous DMA regions
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl/aNXUACgkQOG33FX4g
 mxqlMQ/+O6UhxKnDAnMB+HzDGvOm+KXNHOQBuzxz4ZWXqtUrW8WU5ca3PhXovc4z
 /QX0HhMhQmVsva5mjp1OGVATxQ2E+yasqFLg4QXAFWFR3N7s0u/sikE9i1DoPvOC
 lsmLTeRauCFaE4mJD5nvYwm+riECX0GmyVVW7v6V05xwAp0hwdhyU7Kb6Yh3lxsE
 umTz+onPNJcD6Tc4snziyC5QEp5ebEjAaj4dVI1YPR5X0c2RwC5E1CIDI6u4OQ2k
 j7/+Kvo8LNdYNERGiR169x6c1L7WS6dYnGMMeXRgyy0BVbVdRGDnvCV9VRmF66w5
 99fHfDjNMNmqbGNt/4/gwNdVrR9aI4jMZWCh7SmsguX6XwNOlhYldy3x3WnlkfkQ
 e4O0huJceJqcB2Uya70GqufnAetRXsbjzcvWxpR5YAwRmcRkm1f6aGK3BxPjWEbr
 BbYRpiKMxxT4yTe65BuuThzx6g4pNQHe0z3BM/dzMJQAX+PZcs1CPQR8F8PbCrZR
 Ad7qw4HJ587PoSxPi3toVMpYZRP6cISh1zx9q/JCj8cxH9Ri4MovUCS3cF63Ny3B
 1LJ2q0x8FuLLjgZJogKUyEkS8OO6q7NL8WumjvrYWWx19+jcYsV81jTRGSkH3bfY
 F7Esv5K2T1F2gVsCe1ZFFplQg6ja1afIcc+LEl8cMJSyTdoSub4=
 =9t8b
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "A smaller set of patches, nothing stands out as being particularly
  major this cycle. The biggest item would be the new HIP09 HW support
  from HNS, otherwise it was pretty quiet for new work here:

   - Driver bug fixes and updates: bnxt_re, cxgb4, rxe, hns, i40iw,
     cxgb4, mlx4 and mlx5

   - Bug fixes and polishing for the new rts ULP

   - Cleanup of uverbs checking for allowed driver operations

   - Use sysfs_emit all over the place

   - Lots of bug fixes and clarity improvements for hns

   - hip09 support for hns

   - NDR and 50/100Gb signaling rates

   - Remove dma_virt_ops and go back to using the IB DMA wrappers

   - mlx5 optimizations for contiguous DMA regions"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (147 commits)
  RDMA/cma: Don't overwrite sgid_attr after device is released
  RDMA/mlx5: Fix MR cache memory leak
  RDMA/rxe: Use acquire/release for memory ordering
  RDMA/hns: Simplify AEQE process for different types of queue
  RDMA/hns: Fix inaccurate prints
  RDMA/hns: Fix incorrect symbol types
  RDMA/hns: Clear redundant variable initialization
  RDMA/hns: Fix coding style issues
  RDMA/hns: Remove unnecessary access right set during INIT2INIT
  RDMA/hns: WARN_ON if get a reserved sl from users
  RDMA/hns: Avoid filling sl in high 3 bits of vlan_id
  RDMA/hns: Do shift on traffic class when using RoCEv2
  RDMA/hns: Normalization the judgment of some features
  RDMA/hns: Limit the length of data copied between kernel and userspace
  RDMA/mlx4: Remove bogus dev_base_lock usage
  RDMA/uverbs: Fix incorrect variable type
  RDMA/core: Do not indicate device ready when device enablement fails
  RDMA/core: Clean up cq pool mechanism
  RDMA/core: Update kernel documentation for ib_create_named_qp()
  MAINTAINERS: SOFT-ROCE: Change Zhu Yanjun's email address
  ...
2020-12-16 13:42:26 -08:00
Maor Gottlieb
e899389029 RDMA/mlx5: Fix MR cache memory leak
If the MR cache entry invalidation failed, then we detach this entry from
the cache, therefore we must to free the memory as well.

Allcation backtrace for the leaker:

    [<00000000d8e423b0>] alloc_cache_mr+0x23/0xc0 [mlx5_ib]
    [<000000001f21304c>] create_cache_mr+0x3f/0xf0 [mlx5_ib]
    [<000000009d6b45dc>] mlx5_ib_alloc_implicit_mr+0x41/0×210 [mlx5_ib]
    [<00000000879d0d68>] mlx5_ib_reg_user_mr+0x9e/0×6e0 [mlx5_ib]
    [<00000000be74bf89>] create_qp+0x2fc/0xf00 [ib_uverbs]
    [<000000001a532d22>] ib_uverbs_handler_UVERBS_METHOD_COUNTERS_READ+0x1d9/0×230 [ib_uverbs]
    [<0000000070f46001>] rdma_alloc_commit_uobject+0xb5/0×120 [ib_uverbs]
    [<000000006d8a0b38>] uverbs_alloc+0x2b/0xf0 [ib_uverbs]
    [<00000000075217c9>] ksysioctl+0x234/0×7d0
    [<00000000eb5c120b>] __x64_sys_ioctl+0x16/0×20
    [<00000000db135b48>] do_syscall_64+0x59/0×2e0

Fixes: 1769c4c575 ("RDMA/mlx5: Always remove MRs from the cache before destroying them")
Link: https://lore.kernel.org/r/20201213132940.345554-2-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-12-14 15:14:27 -04:00
Tom Rix
7f1d2dfa30 RDMA/mlx5: Remove unneeded semicolon
A semicolon is not needed after a switch statement.

Link: https://lore.kernel.org/r/20201031134638.2135060-1-trix@redhat.com
Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-12-10 12:00:27 -04:00
Maor Gottlieb
ca991a7d14 RDMA/mlx5: Assign dev to DM MR
Currently, DM MR registration flow doesn't set the mlx5_ib_dev pointer and
can cause a NULL pointer dereference if userspace dumps the MR via rdma
tool.

Assign the IB device together with the other fields and remove the
redundant reference of mlx5_ib_dev from mlx5_ib_mr.

Cc: stable@vger.kernel.org
Fixes: 6c29f57ea4 ("IB/mlx5: Device memory mr registration support")
Link: https://lore.kernel.org/r/20201203190807.127189-1-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-12-07 15:52:54 -04:00
Jason Gunthorpe
ef3642c4f5 RDMA/mlx5: Fix error unwinds for rereg_mr
This is all a giant train wreck of error handling, in many cases the MR is
left in some corrupted state where continuing on is going to lead to
chaos, or various unwinds/order is missed.

rereg had three possible completely different actions, depending on flags
and various details about the MR. Split the three actions into three
functions, and call the right action from the start.

For each action carefully design the error handling to fit the action:

- UMR access/PD update is a simple UMR, if it fails the MR isn't changed,
  so do nothing

- PAS update over UMR is multiple UMR operations. To keep everything sane
  revoke access to the MKey while it is being changed and restore it once
  the MR is correct.

- Recreating the mkey should completely build a parallel MR with a fully
  loaded PAS then swap and destroy the old one. If it fails the original
  should be left untouched. This is handled in the core code. Directly
  call the normal MR creation functions, possibly re-using the existing
  umem.

Add support for working with ODP MRs. The READ/WRITE access flags can be
changed by UMR and we can trivially convert to/from ODP MRs using the
logic to build a completely new MR.

This new logic also fixes various problems with MRs continuing to work
while their PAS lists are no longer valid, eg during a page size change.

Link: https://lore.kernel.org/r/20201130075839.278575-6-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-12-07 14:06:23 -04:00
Jason Gunthorpe
38f8ff5b44 RDMA/mlx5: Reorganize mlx5_ib_reg_user_mr()
This function handles an ODP and regular MR flow all mushed together, even
though the two flows are quite different. Split them into two dedicated
functions.

Link: https://lore.kernel.org/r/20201130075839.278575-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-12-07 14:06:23 -04:00
Jason Gunthorpe
6e0954b11c RDMA/uverbs: Allow drivers to create a new HW object during rereg_mr
mlx5 has an ugly flow where it tries to allocate a new MR and replace the
existing MR in the same memory during rereg. This is very complicated and
buggy. Instead of trying to replace in-place inside the driver, provide
support from uverbs to change the entire HW object assigned to a handle
during rereg_mr.

Since destroying a MR is allowed to fail (ie if a MW is pointing at it)
and can't be detected in advance, the algorithm creates a completely new
uobject to hold the new MR and swaps the IDR entries of the two objects.

The old MR in the temporary IDR entry is destroyed, and if it fails
rereg_mr succeeds and destruction is deferred to FD release. This
complexity is why this cannot live in a driver safely.

Link: https://lore.kernel.org/r/20201130075839.278575-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-12-07 14:06:23 -04:00
Jason Gunthorpe
adac4cb3c1 RDMA/uverbs: Check ODP in ib_check_mr_access() as well
No reason only one caller checks this. This properly blocks ODP
from the rereg flow if the device does not support ODP.

Link: https://lore.kernel.org/r/20201130075839.278575-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-12-07 14:06:23 -04:00
Leon Romanovsky
04b222f957 RDMA/mlx5: Remove IB representors dead code
Delete dead code.

Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2020-12-06 07:43:54 +02:00
Leon Romanovsky
e87114022e net/mlx5: Simplify eswitch mode check
Provide mlx5_core device instead of "priv" pointer while checking
eswith mode.

Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2020-12-06 07:43:54 +02:00
Leon Romanovsky
93f8244431 RDMA/mlx5: Convert mlx5_ib to use auxiliary bus
The conversion to auxiliary bus solves long standing issue with
existing mlx5_ib<->mlx5_core coupling. It required to have both
modules in initramfs if one of them needed for the boot.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2020-12-06 07:43:50 +02:00
Leon Romanovsky
2b1f747071 RDMA/core: Allow drivers to disable restrack DB
Driver QP types are special case with no IBTA restrictions. For example,
EFA implemented creation of this QP type as regular one, while mlx5
separated create to two step: create and modify. That separation causes to
the situation where DC QP (mlx5) is always added to the same xarray index
zero.

This change allows to drivers like mlx5 simply disable restrack DB
tracking, but it doesn't disable kref on the memory.

Fixes: 52e0a118a2 ("RDMA/restrack: Track driver QP types in resource tracker")
Link: https://lore.kernel.org/r/20201117070148.1974114-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-27 11:38:46 -04:00
Parav Pandit
7ec3df174f RDMA/mlx5: Use PCI device for dma mappings
DMA operation of the IB device is done using ib_device->dma_device.

Instead of accessing parent of the IB device, use the PCI dma device which
is setup to ib_device->dma_device during IB device registration.

Link: https://lore.kernel.org/r/20201125064628.8431-1-leon@kernel.org
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-26 15:49:05 -04:00
Leon Romanovsky
d4b2d19dc5 RDMA/mlx5: Silence the overflow warning while building offset mask
Coverity reports "Potentially overflowing expression ..." warning, which
is correct thing to complain from the compiler point of view, but this is
not possible in the current code. Still, this is a small error as there
are some future situations that might need to use a 32 bit offset. Use ULL
so the calculation works up to 63.

Fixes: b045db62f6 ("RDMA/mlx5: Use ib_umem_find_best_pgoff() for SRQ")
Link: https://lore.kernel.org/r/20201125061704.6580-1-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-26 15:49:05 -04:00
Jason Gunthorpe
d0b7721c5e RDMA/mlx5: Check for ERR_PTR from uverbs_zalloc()
The return code from uverbs_zalloc() was wrongly checked, it is ERR_PTR
not NULL like other allocators:

drivers/infiniband/hw/mlx5/devx.c:2110 devx_umem_reg_cmd_alloc() warn: passing zero to 'PTR_ERR'

Fixes: 878f7b31c3 ("RDMA/mlx5: Use ib_umem_find_best_pgsz() for devx")
Link: https://lore.kernel.org/r/0-v1-4d05ccc1c223+173-devx_err_ptr_jgg@nvidia.com
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-26 15:49:04 -04:00
Avihai Horon
f957d4d09a RDMA/mlx5: Enable querying AH for XRC QP types
Address handle is set for connected QP types such as RC and UC, and thus
can also be queried.

Since XRC QP types INI and TGT are connected, it should be possible to
query their address handle as well.

Until now it was not the case, and although the firmware supported it, the
driver allowed querying the address handle only for RC and UC.

Hence, we enable it now for INI and TGT QPs as well.

Link: https://lore.kernel.org/r/20201115121425.139833-2-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-26 11:58:19 -04:00
Gustavo A. R. Silva
808b2c925d IB/mlx5: Fix fall-through warnings for Clang
In preparation to enable -Wimplicit-fallthrough for Clang, fix a warning
by explicitly adding the new pseudo-keyword fallthrough; instead of
letting the code fall through to the next case.

Link: https://lore.kernel.org/r/2b0c87362bc86f6adfe56a5a6685837b71022bbf.1605896059.git.gustavoars@kernel.org
Link: https://github.com/KSPP/linux/issues/115
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-23 15:54:11 -04:00
Jason Gunthorpe
bf3b7b7ba9 Merge branch 'for-rc' into rdma.git
From https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git

The rc RDMA branch is needed due to dependencies on the next patches.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-17 15:20:26 -04:00
Jason Gunthorpe
8a7904a672 RDMA/mlx5: Lower setting the umem's PAS for SRQ
Some of the SRQ types are created using a WQ, and the WQ requires a
different parameter set to mlx5_umem_find_best_quantized_pgoff() as it has
a 5 bit page_offset.

Add the umem to the mlx5_srq_attr and defer computing the PAS data until
the code has figured out what kind of mailbox to use. Compute the PAS
directly from the umem for each of the four unique mailbox types.

This also avoids allocating memory to store the user PAS, instead it is
written directly to the mailbox as in most other cases.

Fixes: 01949d0109 ("net/mlx5_core: Enable XRCs and SRQs when using ISSI > 0")
Link: https://lore.kernel.org/r/20201115114311.136250-8-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-16 16:53:30 -04:00
Jason Gunthorpe
878f7b31c3 RDMA/mlx5: Use ib_umem_find_best_pgsz() for devx
Since devx uses the new rdma_for_each_block() to fill the PAS it can also
use ib_umem_find_best_pgsz().

However, the umem constructionin devx is complicated, the umem must still
respect all the HW limits such as page_offset_quantized and the IOVA
alignment.

Since we don't know what the user intends to use the umem for we have to
limit it to PAGE_SIZE.

There are users trying to mix umem's with mkeys so this makes them work
reliably, at least for an identity IOVA, by ensuring the IOVA matches the
selected page size.

Last user of mlx5_ib_get_buf_offset() so it can also be removed.

Fixes: aeae94579c ("IB/mlx5: Add DEVX support for memory registration")
Link: https://lore.kernel.org/r/20201115114311.136250-7-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-16 16:53:30 -04:00
Jason Gunthorpe
c08fbdc577 RDMA/mlx5: mlx5_umem_find_best_quantized_pgoff() for CQ
This fixes a bug where the page_offset was not being considered when
building a CQ. The HW specification says it 'must be zero', so use
a variant of mlx5_umem_find_best_quantized_pgoff() with a 0 pgoff_bitmask
to force this result.

Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Link: https://lore.kernel.org/r/20201115114311.136250-6-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-16 16:53:30 -04:00
Jason Gunthorpe
a59b7b05ef RDMA/mlx5: Use mlx5_umem_find_best_quantized_pgoff() for QP
Delete custom logic in the QP in favor of more general variant.

Link: https://lore.kernel.org/r/20201115114311.136250-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-16 16:53:30 -04:00
Jason Gunthorpe
7579dcdf73 RDMA/mlx5: Directly compute the PAS list for raw QP RQ's
The RQ WQ created when making a raw ethernet QP copies the PAS list from
a dummy QPC command created earlier in the flow. The WQC and QPC PAS lists
are not fully compatible as the page_offset is a different size.

Create the RQ WQ's PAS list directly and do not try to copy it from
another command structure.

Like the prior patch, this also means that badly aligned buffers were not
correctly rejected.

Link: https://lore.kernel.org/r/20201115114311.136250-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-16 16:53:29 -04:00
Jason Gunthorpe
ad480ea5d6 RDMA/mlx5: Use mlx5_umem_find_best_quantized_pgoff() for WQ
This fixes a subtle bug, the WQ mailbox has only 5 bits to describe the
page_offset, while mlx5_ib_get_buf_offset() is hard wired to only work
with 6 bit page_offsets.

Thus it did not properly reject badly aligned buffers.

Fixes: 79b20a6c30 ("IB/mlx5: Add receive Work Queue verbs")
Fixes: 0fb2ed66a1 ("IB/mlx5: Add create and destroy functionality for Raw Packet QP")
Link: https://lore.kernel.org/r/20201115114311.136250-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-16 16:53:29 -04:00
Jason Gunthorpe
b045db62f6 RDMA/mlx5: Use ib_umem_find_best_pgoff() for SRQ
SRQ uses a quantized and scaled page_offset, which is another variation of
ib_umem_find_best_pgsz(). Add mlx5_umem_find_best_quantized_pgoff() to
perform this calculation for each mailbox. A macro shows how the
calculation is directly connected to the mailbox format.

This new routine replaces the limited mlx5_ib_cont_pages() and
mlx5_ib_get_buf_offset() pairing which would reject valid configurations
rather than adjust the page_size to make it work.

In turn this is much more aggressive about choosing large page sizes for
these objects and when THP is enabled it will now often find a single page
solution.

Link: https://lore.kernel.org/r/20201115114311.136250-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-16 16:53:29 -04:00
Leon Romanovsky
c5633a72a1 RDMA/core: Make FD destroy callback void
All FD object destroy implementations return 0, so declare this callback
void.

Link: https://lore.kernel.org/r/20201104144556.3809085-3-leon@kernel.org
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-12 12:32:17 -04:00
Leon Romanovsky
efa968ee20 RDMA/core: Postpone uobject cleanup on failure till FD close
Remove the ib_is_destroyable_retryable() concept.

The idea here was to allow the drivers to forcibly clean the HW object
even if they otherwise didn't want to (eg because of usecnt). This was an
attempt to clean up in a world where drivers were not allowed to fail HW
object destruction.

Now that we are going back to allowing HW objects to fail destroy this
doesn't make sense. Instead if a uobject's HW object can't be destroyed it
is left on the uobject list and it is up to uverbs_destroy_ufile_hw() to
clean it. Multiple passes over the uobject list allow hidden dependencies
to be resolved. If that fails the HW driver is broken, throw a WARN_ON and
leak the HW object memory.

All the other tricky failure paths (eg on creation error unwind) have
already been updated to this new model.

Link: https://lore.kernel.org/r/20201104144556.3809085-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-12 12:32:17 -04:00
Meir Lichtinger
f946e45f59 IB/mlx5: Add support for NDR link speed
The IBTA specification has new speed - NDR. That speed supports signaling
rate of 100Gb. mlx5 IB driver translates link modes reported by ConnectX
device to IB speed and width. Added translation of new 100Gb, 200Gb and
400Gb link modes to NDR IB type and width of x1, x2 or x4 respectively.

Link: https://lore.kernel.org/r/20201026133738.1340432-3-leon@kernel.org
Signed-off-by: Meir Lichtinger <meirl@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-02 15:48:57 -04:00
Jason Gunthorpe
d5c7916fe4 RDMA/mlx5: Use ib_umem_find_best_pgsz() for mkc's
Now that all the PAS arrays or UMR XLT's for mkcs are filled using
rdma_for_each_block() we can use the common ib_umem_find_best_pgsz()
algorithm.

Link: https://lore.kernel.org/r/20201026132314.1336717-6-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-02 15:10:50 -04:00
Jason Gunthorpe
f1eaac37da RDMA/mlx5: Split mlx5_ib_update_xlt() into ODP and non-ODP cases
Mixing these together is just a mess, make a dedicated version,
mlx5_ib_update_mr_pas(), which directly loads the whole MTT for a non-ODP
MR.

The split out version can trivially use a simple loop with
rdma_for_each_block() which allows using the core code to compute the MR
pages and avoids seeking in the SGL list after each chunk as the
__mlx5_ib_populate_pas() call required.

Significantly speeds loading large MTTs.

Link: https://lore.kernel.org/r/20201026132314.1336717-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-02 15:10:50 -04:00
Jason Gunthorpe
8010d74b99 RDMA/mlx5: Split the WR setup out of mlx5_ib_update_xlt()
The memory allocation is quite complicated, and makes this function hard
to understand. Refactor things so that a function call sets up the WR, SG,
DMA mapping and buffer, further splitting that into buffer and DMA/wr.

This also slightly changes the buffer allocation logic to try an order 0
page allocation (with OOM warnings on) before going to the emergency page.

Link: https://lore.kernel.org/r/20201026132314.1336717-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-02 14:53:55 -04:00
Jason Gunthorpe
f22c30aa6d RDMA/mlx5: Move xlt_emergency_page_mutex into mr.c
This is the only user, so remove the wrappers.

Link: https://lore.kernel.org/r/20201026132314.1336717-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-02 14:53:54 -04:00
Jason Gunthorpe
aab8d3966d RDMA/mlx5: Change mlx5_ib_populate_pas() to use rdma_for_each_block()
This routine converts the umem SGL into a list of fixed pages for DMA,
which is exactly what rdma_umem_for_each_dma_block() is for, use the
common code directly.

Link: https://lore.kernel.org/r/20201026132314.1336717-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-02 14:53:54 -04:00
Jason Gunthorpe
f8fb311063 RDMA/mlx5: Remove npages from mlx5_ib_cont_pages()
Most callers don't need this, and the few that do can get it as
ib_umem_num_pages(umem).

Link: https://lore.kernel.org/r/20201026131936.1335664-8-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-02 14:52:26 -04:00
Jason Gunthorpe
7db0eea916 RDMA/mlx5: Remove ncont from mlx5_ib_cont_pages()
This is the same as ib_umem_num_dma_blocks(umem, 1UL << page_shift), have
the callers compute it directly.

Link: https://lore.kernel.org/r/20201026131936.1335664-7-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-02 14:52:26 -04:00
Jason Gunthorpe
95741ee3f0 RDMA/mlx5: Remove order from mlx5_ib_cont_pages()
Only alloc_mr_from_cache() needs order and can trivially compute it, so
lift it to the one call site and remove the NULL arguments.

Link: https://lore.kernel.org/r/20201026131936.1335664-6-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-02 14:52:26 -04:00
Jason Gunthorpe
f0093fb1a7 RDMA/mlx5: Move mlx5_ib_cont_pages() to the creation of the mlx5_ib_mr
For the user MR path, instead of calling this after getting the umem, call
it as part of creating the struct mlx5_ib_mr and distill its output to a
single page_shift stored inside the mr.

This avoids passing around the tuple of its output. Based on the umem and
page_shift, the output arguments can be computed using:

  count == ib_umem_num_pages(mr->umem)
  shift == mr->page_shift
  ncont == ib_umem_num_dma_blocks(mr->umem, 1 << mr->page_shift)
  order == order_base_2(ncont)

And since mr->page_shift == umem_odp->page_shift then ncont ==
ib_umem_num_dma_blocks() == ib_umem_odp_num_pages() for ODP umems.

Link: https://lore.kernel.org/r/20201026131936.1335664-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-02 14:52:26 -04:00
Jason Gunthorpe
1c3d247eee RDMA/mlx5: Remove mlx5_ib_mr->npages
This is the same value as ib_umem_num_pages(mr->umem), use that instead.

Link: https://lore.kernel.org/r/20201026131936.1335664-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-02 14:52:26 -04:00
Jason Gunthorpe
fc3325701a RDMA/mlx5: Fix corruption of reg_pages in mlx5_ib_rereg_user_mr()
reg_pages should always contain mr->npage since when the mr is finally
de-reg'd it is always subtracted out.

If there were any error exits then mlx5_ib_rereg_user_mr() would leave the
reg_pages adjusted and this will cause it to be double subtracted
eventually.

The manipulation of reg_pages is inherently connected to the umem, so lift
it out of set_mr_fields() and only adjust it around creating/destroying a
umem.

reg_pages is only used for diagnostics in sysfs.

Fixes: 7d0cc6edcc ("IB/mlx5: Add MR cache for large UMR regions")
Link: https://lore.kernel.org/r/20201026131936.1335664-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-02 14:31:40 -04:00
Jason Gunthorpe
b4d031cdae RDMA/mlx5: Remove mlx5_ib_mr->order
The is only ever set to non-zero if the MR is from the cache, and if it is
cached then the order is in cached_ent->order.

Make it clearer that use_umr_mtt_update() only returns true for cached MRs
and remove the redundant data.

Link: https://lore.kernel.org/r/20201026131936.1335664-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-02 14:31:40 -04:00
Joe Perches
1c7fd72687 RDMA: Convert sysfs device * show functions to use sysfs_emit()
Done with cocci script:

@@
identifier d_show;
identifier dev, attr, buf;
@@

ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
{
	<...
	return
-	sprintf(buf,
+	sysfs_emit(buf,
	...);
	...>
}

@@
identifier d_show;
identifier dev, attr, buf;
@@

ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
{
	<...
	return
-	snprintf(buf, PAGE_SIZE,
+	sysfs_emit(buf,
	...);
	...>
}

@@
identifier d_show;
identifier dev, attr, buf;
@@

ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
{
	<...
	return
-	scnprintf(buf, PAGE_SIZE,
+	sysfs_emit(buf,
	...);
	...>
}

@@
identifier d_show;
identifier dev, attr, buf;
expression chr;
@@

ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
{
	<...
	return
-	strcpy(buf, chr);
+	sysfs_emit(buf, chr);
	...>
}

@@
identifier d_show;
identifier dev, attr, buf;
identifier len;
@@

ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
{
	<...
	len =
-	sprintf(buf,
+	sysfs_emit(buf,
	...);
	...>
	return len;
}

@@
identifier d_show;
identifier dev, attr, buf;
identifier len;
@@

ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
{
	<...
	len =
-	snprintf(buf, PAGE_SIZE,
+	sysfs_emit(buf,
	...);
	...>
	return len;
}

@@
identifier d_show;
identifier dev, attr, buf;
identifier len;
@@

ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
{
	<...
	len =
-	scnprintf(buf, PAGE_SIZE,
+	sysfs_emit(buf,
	...);
	...>
	return len;
}

@@
identifier d_show;
identifier dev, attr, buf;
identifier len;
@@

ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
{
	<...
-	len += scnprintf(buf + len, PAGE_SIZE - len,
+	len += sysfs_emit_at(buf, len,
	...);
	...>
	return len;
}

@@
identifier d_show;
identifier dev, attr, buf;
expression chr;
@@

ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
{
	...
-	strcpy(buf, chr);
-	return strlen(buf);
+	return sysfs_emit(buf, chr);
}

Link: https://lore.kernel.org/r/7f406fa8e3aa2552c022bec680f621e38d1fe414.1602122879.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:53:21 -03:00
Jason Gunthorpe
676a80adba RDMA: Remove AH from uverbs_cmd_mask
Drivers that need a uverbs AH should instead set the create_user_ah() op
similar to reg_user_mr(). MODIFY_AH and QUERY_AH cmds were never
implemented so are just deleted.

Link: https://lore.kernel.org/r/11-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:28:00 -03:00
Jason Gunthorpe
1f11a7610e RDMA: Check create_flags during create_qp
Each driver should check that the QP attrs create_flags is supported.
Unfortuantely when create_flags was added to the QP attrs the drivers were
not updated. uverbs_ex_cmd_mask was used to block it - even though kernel
drivers use these flags too.

Check that flags is zero in all drivers that don't use it, remove
IB_USER_VERBS_EX_CMD_CREATE_QP from uverbs_ex_cmd_mask. Fix the error code
to be EOPNOTSUPP.

Link: https://lore.kernel.org/r/8-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:27:59 -03:00
Jason Gunthorpe
1c407cb5d7 RDMA: Check flags during create_cq
Each driver should check that the CQ attrs is supported. Unfortuantely
when flags was added to the CQ attrs the drivers were not updated,
uverbs_ex_cmd_mask was used to block it. This was missed when create CQ
was converted to ioctl, so non-zero flags could have been passed into
drivers.

Check that flags is zero in all drivers that don't use it, remove
IB_USER_VERBS_EX_CMD_CREATE_CQ from uverbs_ex_cmd_mask.

Fixes: 41b2a71fc8 ("IB/uverbs: Move ioctl path of create_cq and destroy_cq to a new file")
Link: https://lore.kernel.org/r/7-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:27:59 -03:00
Jason Gunthorpe
26e990badd RDMA: Check attr_mask during modify_qp
Each driver should check that it can support the provided attr_mask during
modify_qp. IB_USER_VERBS_EX_CMD_MODIFY_QP was being used to block
modify_qp_ex because the driver didn't check RATE_LIMIT.

Link: https://lore.kernel.org/r/6-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:27:58 -03:00
Jason Gunthorpe
652caba5b5 RDMA: Check srq_type during create_srq
uverbs was blocking srq_types the driver doesn't support based on the
CREATE_XSRQ cmd_mask. Fix all drivers to check for supported srq_types
during create_srq and move CREATE_XSRQ to the core code.

Link: https://lore.kernel.org/r/5-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:27:58 -03:00
Jason Gunthorpe
44ce37bc8b RDMA: Move more uverbs_cmd_mask settings to the core
These functions all depend on the driver providing a specific op:

- REREG_MR is rereg_user_mr(). bnxt_re set this without providing the op
- ATTACH/DEATCH_MCAST is attach_mcast()/detach_mcast(). usnic set this
  without providing the op
- OPEN_QP doesn't involve the driver but requires a XRCD. qedr provides
  xrcd but forgot to set it, usnic doesn't provide XRCD but set it anyhow.
- OPEN/CLOSE_XRCD are the ops alloc_xrcd()/dealloc_xrcd()
- CREATE_SRQ/DESTROY_SRQ are the ops create_srq()/destroy_srq()
- QUERY/MODIFY_SRQ is op query_srq()/modify_srq(). hns sets this but
  sometimes supplies a NULL op.
- RESIZE_CQ is op resize_cq(). bnxt_re sets this boes doesn't supply an op
- ALLOC/DEALLOC_MW is alloc_mw()/dealloc_mw(). cxgb4 provided an
  (now deleted) implementation but no userspace

All drivers were checked that no drivers provide the op without also
setting uverbs_cmd_mask so this should have no functional change.

Link: https://lore.kernel.org/r/4-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:27:58 -03:00
Jason Gunthorpe
c074bb1e30 RDMA: Remove elements in uverbs_cmd_mask that all drivers set
This is a step toward eliminating uverbs_cmd_mask. Preset this list in the
core code. Only the op reg_user_mr wasn't already being required from the
drivers.

Link: https://lore.kernel.org/r/3-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:27:57 -03:00
Jason Gunthorpe
b8e3130dd9 RDMA: Remove uverbs_ex_cmd_mask values that are linked to functions
Since a while now the uverbs layer checks if the driver implements a
function before allowing the ucmd to proceed. This largely obsoletes the
cmd_mask stuff, but there is some tricky bits in drivers preventing it
from being removed.

Remove the easy elements of uverbs_ex_cmd_mask by pre-setting them in the
core code. These are triggered soley based on the related ops function
pointer.

query_device_ex is not triggered based on an op, but all drivers already
implement something compatible with the extension, so enable it globally
too.

Link: https://lore.kernel.org/r/2-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:27:56 -03:00
Parav Pandit
fbdd0049d9 RDMA/mlx5: Fix devlink deadlock on net namespace deletion
When a mlx5 core devlink instance is reloaded in different net namespace,
its associated IB device is deleted and recreated.

Example sequence is:
$ ip netns add foo
$ devlink dev reload pci/0000:00:08.0 netns foo
$ ip netns del foo

mlx5 IB device needs to attach and detach the netdevice to it through the
netdev notifier chain during load and unload sequence.  A below call graph
of the unload flow.

cleanup_net()
   down_read(&pernet_ops_rwsem); <- first sem acquired
     ops_pre_exit_list()
       pre_exit()
         devlink_pernet_pre_exit()
           devlink_reload()
             mlx5_devlink_reload_down()
               mlx5_unload_one()
               [...]
                 mlx5_ib_remove()
                   mlx5_ib_unbind_slave_port()
                     mlx5_remove_netdev_notifier()
                       unregister_netdevice_notifier()
                         down_write(&pernet_ops_rwsem);<- recurrsive lock

Hence, when net namespace is deleted, mlx5 reload results in deadlock.

When deadlock occurs, devlink mutex is also held. This not only deadlocks
the mlx5 device under reload, but all the processes which attempt to
access unrelated devlink devices are deadlocked.

Hence, fix this by mlx5 ib driver to register for per net netdev notifier
instead of global one, which operats on the net namespace without holding
the pernet_ops_rwsem.

Fixes: 4383cfcc65 ("net/mlx5: Add devlink reload")
Link: https://lore.kernel.org/r/20201026134359.23150-1-parav@nvidia.com
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:18:19 -03:00
Jason Gunthorpe
e0477b34d9 RDMA: Explicitly pass in the dma_device to ib_register_device
The code in setup_dma_device has become rather convoluted, move all of
this to the drivers. Drives now pass in a DMA capable struct device which
will be used to setup DMA, or drivers must fully configure the ibdev for
DMA and pass in NULL.

Other than setting the masks in rvt all drivers were doing this already
anyhow.

mthca, mlx4 and mlx5 were already setting up maximum DMA segment size for
DMA based on their hardweare limits in:
__mthca_init_one()
  dma_set_max_seg_size (1G)

__mlx4_init_one()
  dma_set_max_seg_size (1G)

mlx5_pci_init()
  set_dma_caps()
    dma_set_max_seg_size (2G)

Other non software drivers (except usnic) were extended to UINT_MAX [1, 2]
instead of 2G as was before.

[1] https://lore.kernel.org/linux-rdma/20200924114940.GE9475@nvidia.com/
[2] https://lore.kernel.org/linux-rdma/20200924114940.GE9475@nvidia.com/

Link: https://lore.kernel.org/r/20201008082752.275846-1-leon@kernel.org
Link: https://lore.kernel.org/r/6b2ed339933d066622d5715903870676d8cc523a.1602590106.git.mchehab+huawei@kernel.org
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-16 13:53:46 -03:00
Avihai Horon
1c15b4f2a4 RDMA/core: Modify enum ib_gid_type and enum rdma_network_type
Separate IB_GID_TYPE_IB and IB_GID_TYPE_ROCE to two different values, so
enum ib_gid_type will match the gid types of the new query GID table API
which will be introduced in the following patches.

This change in enum ib_gid_type requires to change also enum
rdma_network_type by separating RDMA_NETWORK_IB and RDMA_NETWORK_ROCE_V1
values.

Link: https://lore.kernel.org/r/20200923165015.2491894-3-leon@kernel.org
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-01 21:20:11 -03:00
Yishai Hadas
a03bfc37d5 RDMA/mlx5: Sync device with CPU pages upon ODP MR registration
Sync device with CPU pages upon ODP MR registration. mlx5 already has to
zero the HW's version of the PAS list, may as well deliver a PAS list that
matches the current CPU page tables configuration.

Link: https://lore.kernel.org/r/20200930163828.1336747-5-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-01 16:44:44 -03:00
Yishai Hadas
677cf51f71 RDMA/mlx5: Extend advice MR to support non faulting mode
Extend advice MR to support non faulting mode, this can improve
performance by increasing the populated page tables in the device.

Link: https://lore.kernel.org/r/20200930163828.1336747-4-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-01 16:44:05 -03:00
Yishai Hadas
8bfafde086 IB/core: Enable ODP sync without faulting
Enable ODP sync without faulting, this improves performance by reducing
the number of page faults in the system.

The gain from this option is that the device page table can be aligned
with the presented pages in the CPU page table without causing page
faults.

As of that, the overhead on data path from hardware point of view to
trigger a fault which end-up by calling the driver to bring the pages
will be dropped.

Link: https://lore.kernel.org/r/20200930163828.1336747-3-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-01 16:44:05 -03:00
Yishai Hadas
36f30e486d IB/core: Improve ODP to use hmm_range_fault()
Move to use hmm_range_fault() instead of get_user_pags_remote() to improve
performance in a few aspects:

This includes:
- Dropping the need to allocate and free memory to hold its output

- No need any more to use put_page() to unpin the pages

- The logic to detect contiguous pages is done based on the returned
  order, no need to run per page and evaluate.

In addition, moving to use hmm_range_fault() enables to reduce page faults
in the system with it's snapshot mode, this will be introduced in next
patches from this series.

As part of this, cleanup some flows and use the required data structures
to work with hmm_range_fault().

Link: https://lore.kernel.org/r/20200930163828.1336747-2-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-01 16:39:54 -03:00
Leon Romanovsky
b925c555a1 RDMA/drivers: Remove udata check from special QP
GSI QP can't be created from the user space, hence the udata check is
always false (udata == NULL). Remove that check and simplify the flow.

Link: https://lore.kernel.org/r/20200926102450.2966017-9-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-29 13:11:06 -03:00
Leon Romanovsky
eebe580feb RDMA/mlx5: Delete not needed GSI QP signal QP type
GSI QP doesn't need signal QP type because it is initialized statically to
zero, which is IB_SIGNAL_ALL_WR also wr->send_flags isn't set too.  This
means that the GSI QP signal QP type can be removed.

Link: https://lore.kernel.org/r/20200926102450.2966017-5-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-29 13:09:49 -03:00
Leon Romanovsky
2dc4d6725b RDMA/mlx5: Change GSI QP to have same creation flow like other QPs
There is no reason to have separate create flow for the GSI QP, while
general create_qp routine has all needed checks and ability to allocate
and free the proper struct mlx5_ib_qp.

Link: https://lore.kernel.org/r/20200926102450.2966017-4-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-29 13:09:48 -03:00
Leon Romanovsky
f8225e3488 RDMA/mlx5: Reuse existing fields in parent QP storage object
Remove duplication of mlx5_ib_qp and mlx5_ib_gsi_qp fields.  This change
returns the memory footprint of mlx5_ib QP to be as it was before
embedding GSI QP.

Link: https://lore.kernel.org/r/20200926102450.2966017-3-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-29 13:09:48 -03:00
Leon Romanovsky
0d9aef8603 RDMA/mlx5: Embed GSI QP into general mlx5_ib QP
The GSI QPs have different create flow from the regular QPs, but it is not
really needed. Update the code to use mlx5_ib_qp as a storage class for
all outside of GSI calls.

Link: https://lore.kernel.org/r/20200926102450.2966017-2-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-29 13:09:48 -03:00
Liu Shixin
b942fc0319 RDMA/mlx5: Fix type warning of sizeof in __mlx5_ib_alloc_counters()
sizeof() when applied to a pointer typed expression should give the size
of the pointed data, even if the data is a pointer.

Fixes: e1f24a79f4 ("IB/mlx5: Support congestion related counters")
Link: https://lore.kernel.org/r/20200917081354.2083293-1-liushixin2@huawei.com
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-25 09:17:42 -03:00
Leon Romanovsky
d7ecab1e5f RDMA/mlx5: Don't call to restrack recursively
The restrack is going to manage memory of all IB objects and must be
called before object is created. GSI QP in the mlx5_ib separated between
creating dummy interface and HW object beneath. This was achieved by
double call to ib_create_qp().

In order to skip such reentry call to internal driver create_qp code.

Link: https://lore.kernel.org/r/20200922091106.2152715-3-leon@kernel.org
Reviewed-by: Mark Zhang <markz@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-22 19:47:35 -03:00
Jason Gunthorpe
8383da3e4a RDMA/mlx5: Clarify what the UMR is for when creating MRs
Once a mkey is created it can be modified using UMR. This is desirable for
performance reasons. However, different hardware has restrictions on what
modifications are possible using UMR. Make sense of these checks:

- mlx5_ib_can_reconfig_with_umr() returns true if the access flags can be
  altered. Most cases create MRs using 0 access flags (now made clear by
  consistent use of set_mkc_access_pd_addr_fields()), but the old logic
  here was tormented. Make it clear that this is checking if the current
  access_flags can be modified using UMR to different access_flags. It is
  always OK to use UMR to change flags that all HW supports.

- mlx5_ib_can_load_pas_with_umr() returns true if UMR can be used to
  enable and update the PAS/XLT. Enabling requires updating the entity
  size, so UMR ends up completely disabled on this old hardware. Make it
  clear why it is disabled. FRWR, ODP and cache always requires
  mlx5_ib_can_load_pas_with_umr().

- mlx5_ib_pas_fits_in_mr() is used to tell if an existing MR can be
  resized to hold a new PAS list. This only works for cached MR's because
  we don't store the PAS list size in other cases.

To be very clear, arrange things so any pre-created MR's in the cache
check the newly requested access_flags before allowing the MR to leave the
cache. If UMR cannot set the required access_flags the cache fails to
create the MR.

This in turn means relaxed ordering and atomic are now correctly blocked
early for implicit ODP on older HW.

Link: https://lore.kernel.org/r/20200914112653.345244-6-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-18 13:02:43 -03:00
Jason Gunthorpe
0ec52f0194 RDMA/mlx5: Disable IB_DEVICE_MEM_MGT_EXTENSIONS if IB_WR_REG_MR can't work
set_reg_wr() always fails if !umr_modify_entity_size_disabled because
mlx5_ib_can_use_umr() always fails. Without set_reg_wr() IB_WR_REG_MR
doesn't work and that means the device should not advertise
IB_DEVICE_MEM_MGT_EXTENSIONS.

Fixes: 841b07f99a ("IB/mlx5: Block MR WR if UMR is not possible")
Link: https://lore.kernel.org/r/20200914112653.345244-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-18 13:02:43 -03:00
Jason Gunthorpe
5eb29f0d13 RDMA/mlx5: Make mkeys always owned by the kernel's PD when not enabled
Any mkey that is not enabled and assigned to userspace should have the PD
set to a kernel owned PD.

When cache entries are created for the first time the PDN is set to 0,
which is probably a kernel PD, but be explicit.

When a MR is registered using the hybrid reg_create with UMR xlt & enable
the disabled mkey is pointing at the user PD, keep it pointing at the
kernel until a UMR enables it and sets the user PD.

Fixes: 9ec4483a3f ("IB/mlx5: Move MRs to a kernel PD when freeing them to the MR cache")
Link: https://lore.kernel.org/r/20200914112653.345244-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-18 13:02:43 -03:00
Jason Gunthorpe
1c97ca3da0 RDMA/mlx5: Use set_mkc_access_pd_addr_fields() in reg_create()
reg_create() open codes this helper, use the shared code.

Link: https://lore.kernel.org/r/20200914112653.345244-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-18 13:02:42 -03:00
Jason Gunthorpe
2e4e706e09 RDMA/mlx5: Remove dead check for EAGAIN after alloc_mr_from_cache()
alloc_mr_from_cache() no longer returns EAGAIN, this is just dead code
now.

Fixes: aad719dcf3 ("RDMA/mlx5: Allow MRs to be created in the cache synchronously")
Link: https://lore.kernel.org/r/20200914112653.345244-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-18 13:02:42 -03:00
Jason Gunthorpe
a1255fff5d Merge branch 'mlx_sw_owner_v2' into rdma.git for-next
Leon Romanovsky says:

====================
This series from Alex extends software steering interface to support
devices with extra capability "sw_owner_2" which will replace existing
"sw_owner".
====================

Based on the mlx5-next branch at
     git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
due to dependencies.

* branch 'mlx5_sw_owner_v2:
  RDMA/mlx5: Expose TIR and QP ICM address for sw_owner_v2 devices
  RDMA/mlx5: Allow DM allocation for sw_owner_v2 enabled devices
  RDMA/mlx5: Add sw_owner_v2 bit capability
2020-09-18 10:31:45 -03:00
Jason Gunthorpe
5dee5872f8 Merge branch 'mlx5_active_speed' into rdma.git for-next
Leon Romanovsky says:

====================
IBTA declares speed as 16 bits, but kernel stores it in u8. This series
fixes in-kernel declaration while keeping external interface intact.
====================

Based on the mlx5-next branch at
     git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
due to dependencies.

* branch 'mlx5_active_speed':
  RDMA: Fix link active_speed size
  RDMA/mlx5: Delete duplicated mlx5_ptys_width enum
  net/mlx5: Refactor query port speed functions
2020-09-18 10:31:45 -03:00
Aharon Landau
376ceb31ff RDMA: Fix link active_speed size
According to the IB spec active_speed size should be u16 and not u8 as
before. Changing it to allow further extensions in offered speeds.

Link: https://lore.kernel.org/r/20200917090223.1018224-4-leon@kernel.org
Signed-off-by: Aharon Landau <aharonl@mellanox.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-18 10:31:24 -03:00
Alex Vesker
54a38b6627 RDMA/mlx5: Expose TIR and QP ICM address for sw_owner_v2 devices
Expose the ICM address to access TIR and QP, this will allow sw_owned_v2
devices to steer traffic to TIRs and QPs same as done with sw_owner
capability.

Link: https://lore.kernel.org/r/20200903073857.1129166-4-leon@kernel.org
Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-18 09:40:41 -03:00
Alex Vesker
8310e32704 RDMA/mlx5: Allow DM allocation for sw_owner_v2 enabled devices
sw_owner_v2 will replace sw_owner for future devices, this means that if
sw_owner_v2 is set sw_owner should be ignored and DM allocation is
required for sw_owner_v2 devices to function.

Link: https://lore.kernel.org/r/20200903073857.1129166-3-leon@kernel.org
Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-18 09:40:41 -03:00
Leon Romanovsky
c0a6b5ecc5 RDMA: Convert RWQ table logic to ib_core allocation scheme
Move struct ib_rwq_ind_table allocation to ib_core.

Link: https://lore.kernel.org/r/20200902081623.746359-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-17 14:04:33 -03:00
Leon Romanovsky
d18bb3e152 RDMA: Clean MW allocation and free flows
Move allocation and destruction of memory windows under ib_core
responsibility and clean drivers to ensure that no updates to MW
ib_core structures are done in driver layer.

Link: https://lore.kernel.org/r/20200902081623.746359-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-17 14:04:32 -03:00
Aharon Landau
e27014bdb4 RDMA/mlx5: Delete duplicated mlx5_ptys_width enum
Combine two same enums to avoid duplication.

Signed-off-by: Aharon Landau <aharonl@mellanox.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2020-09-17 19:33:03 +03:00
Aharon Landau
639bf4415c net/mlx5: Refactor query port speed functions
The functions mlx5_query_port_link_width_oper and
mlx5_query_port_ib_proto_oper are always called together, so combine them
to a new function called mlx5_query_port_oper to avoid duplication.

And while the mlx5i_get_port_settings is the same as
mlx5_query_port_oper therefore let's remove it.

According to the IB spec link_width_oper and ib_proto_oper should be u16
and not as written u8, so perform casting as a preparation to cross-RDMA
patch which will fix that type for all drivers in the RDMA subsystem.

Fixes: ada68c31ba ("net/mlx5: Introduce a new header file for physical port functions")
Signed-off-by: Aharon Landau <aharonl@mellanox.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2020-09-17 19:33:02 +03:00
Jason Gunthorpe
a665aca89a RDMA/umem: Split ib_umem_num_pages() into ib_umem_num_dma_blocks()
ib_umem_num_pages() should only be used by things working with the SGL in
CPU pages directly.

Drivers building DMA lists should use the new ib_num_dma_blocks() which
returns the number of blocks rdma_umem_for_each_block() will return.

To make this general for DMA drivers requires a different implementation.
Computing DMA block count based on umem->address only works if the
requested page size is < PAGE_SIZE and/or the IOVA == umem->address.

Instead the number of DMA pages should be computed in the IOVA address
space, not umem->address. Thus the IOVA has to be stored inside the umem
so it can be used for these calculations.

For now set it to umem->address by default and fix it up if
ib_umem_find_best_pgsz() was called. This allows drivers to be converted
to ib_umem_num_dma_blocks() safely.

Link: https://lore.kernel.org/r/6-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-11 10:24:53 -03:00
Leon Romanovsky
71ff3f6268 RDMA: Make counters destroy symmetrical
Change counters to return failure like any other verbs destroy, however
this flow shouldn't return error at all.

Link: https://lore.kernel.org/r/20200907120921.476363-10-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 14:14:29 -03:00
Leon Romanovsky
add53535fb RDMA: Restore ability to return error for destroy WQ
Make this interface symmetrical to other destroy paths.

Fixes: a49b1dc7ae ("RDMA: Convert destroy_wq to be void")
Link: https://lore.kernel.org/r/20200907120921.476363-9-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 14:14:29 -03:00
Leon Romanovsky
d0c45c8556 RDMA: Change XRCD destroy return value
Update XRCD destroy flow to allow command failure.

Fixes: 28ad5f65c3 ("RDMA: Move XRCD to be under ib_core responsibility")
Link: https://lore.kernel.org/r/20200907120921.476363-8-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 14:14:29 -03:00
Leon Romanovsky
43d781b9fa RDMA: Allow fail of destroy CQ
Like any other verbs objects, CQ shouldn't fail during destroy, but
mlx5_ib didn't follow this contract with mixed IB verbs objects with
DEVX. Such mix causes to the situation where FW and kernel are fully
interdependent on the reference counting of each side.

Kernel verbs and drivers that don't have DEVX flows shouldn't fail.

Fixes: e39afe3d6d ("RDMA: Convert CQ allocations to be under core responsibility")
Link: https://lore.kernel.org/r/20200907120921.476363-7-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 14:14:29 -03:00
Leon Romanovsky
119181d1d4 RDMA: Restore ability to fail on SRQ destroy
In similar way to other IB objects, restore the ability to return error on
SRQ destroy. Strictly speaking, this change is not necessary, and provided
here to ensure a symmetrical interface like other destroy functions.

Fixes: 68e326dea1 ("RDMA: Handle SRQ allocations by IB/core")
Link: https://lore.kernel.org/r/20200907120921.476363-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 14:14:24 -03:00
Leon Romanovsky
fd89099d63 RDMA/mlx5: Issue FW command to destroy SRQ on reentry
The HW release can fail and leave the system in limbo state, where SRQ is
removed from the table, but can't be destroyed later.  In every reentry,
the initial xa_erase_irq() check will fail.

Rewrite the erase logic to keep index, but don't store the entry
itself. By doing it, we can safely reinsert entry back in the case of
destroy failure.

Link: https://lore.kernel.org/r/20200907120921.476363-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 14:04:13 -03:00
Leon Romanovsky
9a9ebf8cd7 RDMA: Restore ability to fail on AH destroy
Like any other IB verbs objects, AH are refcounted by ib_core. The release
of those objects are controlled by ib_core with promise that AH destroy
can't fail.

Being SW object for now, this change makes dealloc_ah() to behave like any
other destroy IB flows.

Fixes: d345691471 ("RDMA: Handle AH allocations by IB/core")
Link: https://lore.kernel.org/r/20200907120921.476363-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 13:57:22 -03:00
Leon Romanovsky
91a7c58fce RDMA: Restore ability to fail on PD deallocate
The IB verbs objects are counted by the kernel and ib_core ensures that
deallocate PD will success so it will be called once all other objects
that depends on PD will be released. This is achieved by managing various
reference counters on such objects.

The mlx5 driver didn't follow this standard flow when allowed DEVX objects
that are not managed by ib_core to be interleaved with the ones under
ib_core responsibility.

In such interleaved scenarios deallocate command can fail and ib_core will
leave uobject in internal DB and attempt to clean it later to free
resources anyway.

This change partially restores returned value from dealloc_pd() for all
drivers, but keeping in mind that non-DEVX devices and kernel verbs paths
shouldn't fail.

Fixes: 21a428a019 ("RDMA: Handle PD allocations by IB/core")
Link: https://lore.kernel.org/r/20200907120921.476363-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 13:57:22 -03:00
Leon Romanovsky
4b916ed9f9 RDMA/mlx5: Fix potential race between destroy and CQE poll
The SRQ can be destroyed right before mlx5_cmd_get_srq is called.
In such case the latter will return NULL instead of expected SRQ.

Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Link: https://lore.kernel.org/r/20200830084010.102381-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-03 10:12:57 -03:00
Jason Gunthorpe
6989aa62d3 Linux 5.9-rc3
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl9ML+IeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGA8EIAIy/kTbFS0yrE9yV
 hb98oX0z9+EU9YQg9vhaRWwPd+rJF/JMQZLqYcwbhjG9abaUL3T3fEcSAefMHw8E
 LAt+hYzA38dHt7tqhsFQX3vV1VorvDVICBVN0yRPRWKKikq4OPIHzaAR9tleGAF5
 8btQisl1PjN+obwYmLuNb6aX16OCwAF+uXOwehcoJs9dvMNhwtXRzfOflWzOvOo6
 tE0bHErlylLDfLv4ZzEfczTdks4QJZ7C0xLSf3oN9AAynW42Xnhct4hi8qZY/hCf
 CMaqeN4hdpub6TvQIqBdDqMMjEXGFgeNSnAEBQY9VpvUqz8NTu6sQxwgJEKDF5tg
 d81lv2c=
 =uW/F
 -----END PGP SIGNATURE-----

Merge tag 'v5.9-rc3' into rdma.git for-next

Required due to dependencies in following patches.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-31 12:28:12 -03:00
Mark Zhang
7c4b1ab9f1 IB/mlx5: Add DCT RoCE LAG support
When DCT QPs work in RoCE LAG mode:
 1. DCT creation is allowed only when it is supported
 2. The "port" of a DCT QP is assigned in a round-robin way

Link: https://lore.kernel.org/r/20200818115245.700581-3-leon@kernel.org
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 08:34:28 -03:00
Mark Zhang
8f3243a047 IB/mlx5: Add tx_affinity support for DCI QP
DCI QP supports tx_affinity as well.

Link: https://lore.kernel.org/r/20200818115245.700581-2-leon@kernel.org
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 08:34:28 -03:00