Commit Graph

501 Commits

Author SHA1 Message Date
Michael Guralnik
73d09b2fe8 RDMA/mlx5: Introduce mlx5r_cache_rb_key
Switch from using the mkey order to using the new struct as the key to
the RB tree of cache entries.

The key consists of all the mkey properties that UMR operations can't
modify. Use this key to define the cache entries and to search and
create cache mkeys.
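
A minimal sketch of the idea, with illustrative field names (the exact
layout lives in the driver headers):

  /* Every property UMR cannot rewrite becomes part of the key. */
  struct mlx5r_cache_rb_key {
          u8 ats:1;
          unsigned int access_mode;
          unsigned int access_flags;
          unsigned int ndescs;
  };

  /* Total order over keys so they can index an RB-tree. */
  static int cache_ent_key_cmp(struct mlx5r_cache_rb_key k1,
                               struct mlx5r_cache_rb_key k2)
  {
          int res;

          res = k1.ats - k2.ats;
          if (res)
                  return res;
          res = k1.access_mode - k2.access_mode;
          if (res)
                  return res;
          res = k1.access_flags - k2.access_flags;
          if (res)
                  return res;
          return k1.ndescs - k2.ndescs;
  }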

Link: https://lore.kernel.org/r/20230125222807.6921-5-michaelgur@nvidia.com
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-01-27 13:04:09 -04:00
Michael Guralnik
b958451783 RDMA/mlx5: Change the cache structure to an RB-tree
Currently, the cache structure is a static linear array. Therefore, its
size is limited to the number of entries in it and is not expandable. The
entries are dedicated to mkeys of size 2^x and no access_flags. Mkeys with
different properties are not cacheable.

In this patch, we change the cache structure to an RB-tree. This will
allow extending the cache to support more entries with different mkey
properties.
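
For reference, a hedged sketch of the standard kernel RB-tree insertion
walk such a cache can use ('struct cache_ent' and its integer key are
hypothetical stand-ins for the real entry and key):

  #include <linux/rbtree.h>
  #include <linux/errno.h>

  struct cache_ent {
          struct rb_node node;
          int key;        /* stands in for the real mkey-properties key */
  };

  static int ent_insert(struct rb_root *root, struct cache_ent *ent)
  {
          struct rb_node **new = &root->rb_node, *parent = NULL;

          while (*new) {
                  struct cache_ent *cur =
                          rb_entry(*new, struct cache_ent, node);

                  parent = *new;
                  if (ent->key < cur->key)
                          new = &(*new)->rb_left;
                  else if (ent->key > cur->key)
                          new = &(*new)->rb_right;
                  else
                          return -EEXIST; /* one entry per key */
          }
          rb_link_node(&ent->node, parent, new);
          rb_insert_color(&ent->node, root);
          return 0;
  }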

Link: https://lore.kernel.org/r/20230125222807.6921-4-michaelgur@nvidia.com
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-01-27 13:04:09 -04:00
Aharon Landau
a2a88b8e22 RDMA/mlx5: Don't keep umrable 'page_shift' in cache entries
mkc.log_page_size can be changed using UMR. Therefore, don't treat it as a
cache entry property.

Remove it from struct mlx5_cache_ent.

All cache mkeys will be created with default PAGE_SHIFT, and updated with
the needed page_shift using UMR when passing them to a user.
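
Roughly, the create path hard-codes the default shift and UMR rewrites it
later; a sketch using the generic mailbox setters (surrounding code
omitted):

  /* Cache mkeys are created with the default page shift... */
  MLX5_SET(mkc, mkc, log_page_size, PAGE_SHIFT);

  /* ...and a UMR later rewrites mkc.log_page_size to the page_shift
   * the user's MR actually needs.
   */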

Link: https://lore.kernel.org/r/20230125222807.6921-2-michaelgur@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-01-27 13:04:09 -04:00
Arumugam Kolappan
abef378c43 RDMA/mlx5: Change debug log level for remote access error syndromes
The mlx5 driver dumps the entire CQE buffer by default for a few
syndromes. Some syndromes are expected due to the application behavior
[ex: MLX5_CQE_SYNDROME_REMOTE_ACCESS_ERR, MLX5_CQE_SYNDROME_REMOTE_OP_ERR
and MLX5_CQE_SYNDROME_LOCAL_PROT_ERR]. Hence, for these syndromes, the
patch converts the log level from KERN_WARNING to KERN_DEBUG. This enables
the application to get the CQE buffer dump by changing to the KERN_DEBUG
level as and when needed.
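
A hedged sketch of the resulting dispatch (function name and message text
are illustrative, not the driver's exact code):

  #include <linux/printk.h>

  static void report_cqe_error(u8 syndrome)
  {
          switch (syndrome) {
          /* Expected, application-triggered errors: keep them quiet. */
          case MLX5_CQE_SYNDROME_REMOTE_ACCESS_ERR:
          case MLX5_CQE_SYNDROME_REMOTE_OP_ERR:
          case MLX5_CQE_SYNDROME_LOCAL_PROT_ERR:
                  pr_debug("CQE error, syndrome 0x%x\n", syndrome);
                  break;
          default:
                  pr_warn("CQE error, syndrome 0x%x\n", syndrome);
                  break;
          }
  }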

Suggested-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Arumugam Kolappan <aru.kolappan@oracle.com>
Link: https://lore.kernel.org/r/1667287664-19377-1-git-send-email-aru.kolappan@oracle.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-06 20:31:35 +02:00
Linus Torvalds
e08466a7c0 v6.1 merge window pull request
- Small bug fixes in mlx5, efa, rxe, hns, irdma, erdma, siw
 
 - rtrs tracing improvements
 
 - Code improvements: strscpy conversion, unused parameter, spelling
   mistakes, unused variables, flex arrays
 
 - restrack device details report for hns
 
 - Simplify struct device initialization in SRP
 
 - Eliminate the never-used service_mask support in IB CM
 
 - Make rxe not print to the console for some kinds of network packets
 
 - Asymmetric paths and router support in the CM through netlink messages
 
 - DMABUF importer support for mlx5 devx umems
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCYz9bgAAKCRCFwuHvBreF
 YevoAP47J/svlOFlFtBhTVF79Ddtf+MMeqeVvLoHHQbCU5rUpAD+KUpTXAvwNcM9
 dHwNXz9ctanP5397qusH0rxOKPo/EA4=
 =lgSv
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "Not a big list of changes this cycle, mostly small things. The new
  MANA rdma driver should come next cycle along with a bunch of work on
  rxe.

  Summary:

   - Small bug fixes in mlx5, efa, rxe, hns, irdma, erdma, siw

   - rtrs tracing improvements

   - Code improvements: strscpy conversion, unused parameter, spelling
     mistakes, unused variables, flex arrays

   - restrack device details report for hns

   - Simplify struct device initialization in SRP

   - Eliminate the never-used service_mask support in IB CM

   - Make rxe not print to the console for some kinds of network packets

   - Asymmetric paths and router support in the CM through netlink
     messages

   - DMABUF importer support for mlx5 devx umems"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (84 commits)
  RDMA/rxe: Remove error/warning messages from packet receiver path
  RDMA/usnic: fix set-but-not-unused variable 'flags' warning
  IB/hfi1: Use skb_put_data() instead of skb_put/memcpy pair
  RDMA/hns: Unified Log Printing Style
  RDMA/hns: Replacing magic number with macros in apply_func_caps()
  RDMA/hns: Replacing 'dseg_len' by macros in fill_ext_sge_inl_data()
  RDMA/hns: Remove redundant 'max_srq_desc_sz' in caps
  RDMA/hns: Remove redundant 'num_mtt_segs' and 'max_extend_sg'
  RDMA/hns: Remove redundant 'phy_addr' in hns_roce_hem_list_find_mtt()
  RDMA/hns: Remove redundant 'use_lowmem' argument from hns_roce_init_hem_table()
  RDMA/hns: Remove redundant 'bt_level' for hem_list_alloc_item()
  RDMA/hns: Remove redundant 'attr_mask' in modify_qp_init_to_init()
  RDMA/hns: Remove unnecessary brackets when getting point
  RDMA/hns: Remove unnecessary braces for single statement blocks
  RDMA/hns: Cleanup for a spelling error of Asynchronous
  IB/rdmavt: Add __init/__exit annotations to module init/exit funcs
  RDMA/rxe: Remove redundant num_sge fields
  RDMA/mlx5: Enable ATS support for MRs and umems
  RDMA/mlx5: Add support for dmabuf to devx umem
  RDMA/core: Add UVERBS_ATTR_RAW_FD
  ...
2022-10-07 12:05:29 -07:00
Jason Gunthorpe
33331a728c Linux 6.0
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmM5/fMeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG+Y0H/jG0HQjUHfIOGjPQ
 T33eaF5POoZKzZmsTNITbtya7B3/OTZZRGdSq9B8t5tElyPnZrdIfaXds17mCI8y
 5DJUEQGdv6fqRttYLGKf6rk1kzABhhaTS8n9BgDFEsmvdqwG4AV6dLQr3HL09gTV
 Lu+Is86RPwmpgH0Z9u7DyFCF8yLjefyu0vl6eFm/SXjCE8gQM/LZQSi9mv5/YDxa
 MVKlsZKIkQm6P8a6r8wbGedKYBped4+4gYedsf/IcS0lvKdLIs/P7YgR5wKklSbM
 tqECvBOmKq1Fwj/oxH+fx0KLX/ZD1xxQwoQd+a9DOSo+BuPBt6KGojYT9gQRyFJp
 R7gyUCo=
 =2UOD
 -----END PGP SIGNATURE-----

Merge tag 'v6.0' into rdma.git for-next

Trivial merge conflicts against rdma.git for-rc resolved matching
linux-next:
            drivers/infiniband/hw/hns/hns_roce_hw_v2.c
            drivers/infiniband/hw/hns/hns_roce_main.c

https://lore.kernel.org/r/20220929124005.105149-1-broonie@kernel.org

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-10-06 19:48:45 -03:00
Jakub Kicinski
0d5bfebf74 Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Saeed Mahameed says:

====================
updates from mlx5-next 2022-09-24

Updates from mlx5-next, including [1]:

1) HW definitions and support for NPPS clock settings.

2) various cleanups

3) Enable hash mode by default for all NICs

4) page tracker and advanced virtualization HW definitions for vfio

[1] https://lore.kernel.org/netdev/20220907233636.388475-1-saeed@kernel.org/

* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  net/mlx5: Remove from FPGA IFC file not-needed definitions
  net/mlx5: Remove unused structs
  net/mlx5: Remove unused functions
  net/mlx5: detect and enable bypass port select flow table
  net/mlx5: Lag, enable hash mode by default for all NICs
  net/mlx5: Lag, set active ports if support bypass port select flow table
  RDMA/mlx5: Don't set tx affinity when lag is in hash mode
  net/mlx5: add IFC bits for bypassing port select flow table
  net/mlx5: Add support for NPPS with real time mode
  net/mlx5: Expose NPPS related registers
  net/mlx5: Query ADV_VIRTUALIZATION capabilities
  net/mlx5: Introduce ifc bits for page tracker
  RDMA/mlx5: Move function mlx5_core_query_ib_ppcnt() to mlx5_ib
====================

Link: https://lore.kernel.org/all/20220927201906.234015-1-saeed@kernel.org/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-28 19:20:49 -07:00
Liu, Changcheng
a83bb5df2a RDMA/mlx5: Don't set tx affinity when lag is in hash mode
In hash mode, without setting tx affinity explicitly, the port select
flow table decides which port is used for the traffic.
If the port_select_flow_table_bypass capability is supported and tx
affinity is set explicitly for a QP/TIS, they will be added into the
explicit affinity table in FW to check which port is used for the traffic.
1. The overloaded explicit affinity table may affect performance.
   To avoid this, do not set tx affinity explicitly by default.
2. The packets of the same flow need to be transmitted on the same port.
   Because the packets of the same flow use different QPs in the slow &
   fast paths, tx affinity shouldn't be set explicitly for these QPs.

Signed-off-by: Liu, Changcheng <jerrliu@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-09-27 12:50:27 -07:00
Jason Gunthorpe
72b2f7608a RDMA/mlx5: Enable ATS support for MRs and umems
For mlx5, if ATS is enabled in the PCI config, the device will use ATS
requests for only certain DMA operations. This has to be opted into by
the SW side based on the mkey or umem settings.

ATS slows down the PCI performance, so it should only be set in cases when
it is needed. All of these cases revolve around optimizing PCI P2P
transfers and avoiding bad cases where the bus just doesn't work.

Link: https://lore.kernel.org/r/4-v1-bd147097458e+ede-umem_dmabuf_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-09-27 10:15:24 -03:00
Maor Gottlieb
9b7d4be967 RDMA/mlx5: Fix UMR cleanup on error flow of driver init
The cited commit removed the checks of whether the resources were created
from the UMR cleanup flow. This could lead to a null-ptr-deref in case of
a failure in the mlx5_ib_stage_ib_reg_init stage.

Fix it by adding a new state to the UMR that indicates whether the
resources were created, and checking it in the UMR cleanup flow before
destroying the resources.

Fixes: 04876c12c1 ("RDMA/mlx5: Move init and cleanup of UMR to umr.c")
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Link: https://lore.kernel.org/r/4cfa61386cf202e9ce330e8d228ce3b25a36326e.1661763459.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-09-05 14:49:57 +03:00
Aharon Landau
0113780870 RDMA/mlx5: Rename the mkey cache variables and functions
After replacing the MR cache with an Mkey cache, rename the variables and
functions to fit the new meaning.

Link: https://lore.kernel.org/r/20220726071911.122765-6-michaelgur@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-27 14:45:48 -03:00
Aharon Landau
6b75338695 RDMA/mlx5: Store in the cache mkeys instead of mrs
Currently, the driver stores the mlx5_ib_mr struct in the cache entries,
although the only use of the cached MR is the mkey. Store only the mkey
in the cache.

Link: https://lore.kernel.org/r/20220726071911.122765-5-michaelgur@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-27 14:45:48 -03:00
Aharon Landau
19591f134c RDMA/mlx5: Store the number of in_use cache mkeys instead of total_mrs
total_mrs is used only to calculate the number of mkeys currently in
use. To simplify things, replace it with a new member called "in_use" and
directly store the number of mkeys currently in use.

Link: https://lore.kernel.org/r/20220726071911.122765-4-michaelgur@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-27 14:45:48 -03:00
Aharon Landau
86457a92df RDMA/mlx5: Replace cache list with Xarray
The Xarray allows us to store the cached mkeys in a memory-efficient way.

Entries are reserved in the Xarray using xa_cmpxchg before calling the
upcoming callbacks to avoid allocations in interrupt context. The
xa_cmpxchg can sleep when using GFP_KERNEL, so we call it in a loop to
ensure one reserved entry for each process trying to reserve.
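
A minimal sketch of the reservation step (assuming an mkeys xarray per
cache entry; not the driver's exact code):

  #include <linux/xarray.h>

  /* XA_ZERO_ENTRY pins the index but reads back as NULL. */
  static int reserve_mkey_slot(struct xarray *xa, unsigned long index)
  {
          void *old;

          old = xa_cmpxchg(xa, index, NULL, XA_ZERO_ENTRY, GFP_KERNEL);
          if (xa_is_err(old))
                  return xa_err(old);
          /* Raced with another reserver: caller picks a new index. */
          return old == NULL ? 0 : -EBUSY;
  }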

Link: https://lore.kernel.org/r/20220726071911.122765-3-michaelgur@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-27 14:45:48 -03:00
Aharon Landau
17ae355926 RDMA/mlx5: Replace ent->lock with xa_lock
In the next patch, ent->list will be replaced with an xarray. The xarray
uses an internal lock to protect the indexes. Use it to protect all the
entry fields, and get rid of ent->lock.
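
Roughly (a sketch with illustrative field names; __xa_store() requires
the xa_lock, and a previously reserved slot means no allocation is
needed, hence the 0 gfp):

  xa_lock_irq(&ent->mkeys);
  ent->stored++;                  /* entry fields now ride the xa_lock */
  __xa_store(&ent->mkeys, index, mkey, 0);
  xa_unlock_irq(&ent->mkeys);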

Link: https://lore.kernel.org/r/20220726071911.122765-2-michaelgur@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-27 14:45:48 -03:00
Mark Bloch
0c6ab0ca9a RDMA/mlx5: Expose steering anchor to userspace
Expose a steering anchor per priority to allow users to re-inject
packets back into the default NIC pipeline for additional processing.

MLX5_IB_METHOD_STEERING_ANCHOR_CREATE returns a flow table ID which
a user can use to re-inject packets at a specific priority.

A FTE (flow table entry) can be created and the flow table ID
used as a destination.

When a packet is taken into a RDMA-controlled steering domain (like
software steering) there may be a need to insert the packet back into
the default NIC pipeline. This exposes a flow table ID to the user that can
be used as a destination in a flow table entry.

With this new method priorities that are exposed to users via
MLX5_IB_METHOD_FLOW_MATCHER_CREATE can be reached from a non-zero UID.

As user-created flow tables (via RDMA DEVX) are created with a non-zero
UID, it's impossible to point to a NIC core flow table (core driver flow
tables are created with a UID value of zero) from userspace.
Create flow tables that are exposed to users with the shared UID; this
allows users to point to default NIC flow tables.

Steering loops are prevented at FW level as FW enforces that no flow
table at level X can point to a table at level lower than X.
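
From the driver side, the returned ID is consumed as a flow-table-number
destination; a hedged sketch against the mlx5 fs core API ('anchor_ft_id',
'ft' and 'spec' are assumed from context):

  struct mlx5_flow_act flow_act = {
          .action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST,
  };
  struct mlx5_flow_destination dest = {
          .type = MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE_NUM,
          .ft_num = anchor_ft_id, /* from STEERING_ANCHOR_CREATE */
  };
  struct mlx5_flow_handle *rule;

  rule = mlx5_add_flow_rules(ft, spec, &flow_act, &dest, 1);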

Link: https://lore.kernel.org/all/20220703205407.110890-6-saeed@kernel.org/
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-19 20:50:34 +03:00
Aharon Landau
158e71bb69 RDMA/mlx5: Add a umr recovery flow
When a UMR fails, the UMR QP state changes to an error state. Therefore,
all further UMR operations will fail too.

Add a recovery flow to the UMR QP, and repost the flushed WQEs.
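
A hedged outline of the recovery idea in generic verbs terms (the two
helpers are hypothetical; the driver's actual flow is internal):

  struct ib_qp_attr attr = { .qp_state = IB_QPS_RESET };
  int err;

  /* The QP is in error; cycle it through RESET... */
  err = ib_modify_qp(umr_qp, &attr, IB_QP_STATE);
  if (!err)
          err = umr_qp_bring_up(umr_qp);  /* hypothetical INIT->RTR->RTS */
  if (!err)
          umr_repost_flushed(umr_qp);     /* hypothetical WQE repost */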

Link: https://lore.kernel.org/r/6cc24816cca049bd8541317f5e41d3ac659445d3.1652588303.git.leonro@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-06-07 12:57:41 +03:00
Linus Torvalds
780d8ce716 v5.19 pull request
Small collection of incremental improvement patches:
 
 - Minor code cleanup patches, comment improvements, etc from static tools
 
 - Clean up some of the kernel caps, reducing the historical stealth uAPI
   leftovers
 
 - Bug fixes and minor changes for rdmavt, hns, rxe, irdma
 
 - Remove unimplemented cruft from rxe
 
 - Reorganize UMR QP code in mlx5 to avoid going through the IB verbs layer
 
 - flush_workqueue(system_unbound_wq) removal
 
 - Ensure rxe waits for objects to be unused before allowing the core to
   free them
 
 - Several rc quality bug fixes for hfi1
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCYo+NxgAKCRCFwuHvBreF
 YbqSAQDJ+QolaATUvOQUPLbuLopUCJLe95VS15Kl3SNXiVUUFAEA8DLL1s6+WShd
 AgypUxGHipx5BAytrn45/WiwuDeEbQ8=
 =jgTl
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "Small collection of incremental improvement patches:

   - Minor code cleanup patches, comment improvements, etc from static
     tools

   - Clean up some of the kernel caps, reducing the historical stealth
     uAPI leftovers

   - Bug fixes and minor changes for rdmavt, hns, rxe, irdma

   - Remove unimplemented cruft from rxe

   - Reorganize UMR QP code in mlx5 to avoid going through the IB verbs
     layer

   - flush_workqueue(system_unbound_wq) removal

   - Ensure rxe waits for objects to be unused before allowing the core
     to free them

   - Several rc quality bug fixes for hfi1"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (67 commits)
  RDMA/rtrs-clt: Fix one kernel-doc comment
  RDMA/hfi1: Remove all traces of diagpkt support
  RDMA/hfi1: Consolidate software versions
  RDMA/hfi1: Remove pointless driver version
  RDMA/hfi1: Fix potential integer multiplication overflow errors
  RDMA/hfi1: Prevent panic when SDMA is disabled
  RDMA/hfi1: Prevent use of lock before it is initialized
  RDMA/rxe: Fix an error handling path in rxe_get_mcg()
  IB/core: Fix typo in comment
  RDMA/core: Fix typo in comment
  IB/hf1: Fix typo in comment
  IB/qib: Fix typo in comment
  IB/iser: Fix typo in comment
  RDMA/mlx4: Avoid flush_scheduled_work() usage
  IB/isert: Avoid flush_scheduled_work() usage
  RDMA/mlx5: Remove duplicate pointer assignment in mlx5_ib_alloc_implicit_mr()
  RDMA/qedr: Remove unnecessary synchronize_irq() before free_irq()
  RDMA/hns: Use hr_reg_read() instead of remaining roce_get_xxx()
  RDMA/hns: Use hr_reg_xxx() instead of remaining roce_set_xxx()
  RDMA/irdma: Add SW mechanism to generate completions on error
  ...
2022-05-26 21:08:40 -07:00
Mark Bloch
34a30d7635 net/mlx5: Lag, expose number of lag ports
Downstream patches will add support for hardware lag with
more than 2 ports. Add a way for users to query the number of lag ports.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-09 22:54:00 -07:00
Aharon Landau
c8a02e38f8 RDMA/mlx5: Clean UMR QP type flow from mlx5_ib_post_send()
No internal UMR operation is using mlx5_ib_post_send(), remove the UMR QP
type logic from this function.

Link: https://lore.kernel.org/r/0b2f368f14bc9266ebdf92a601ca4e1e5b1e1188.1649747695.git.leonro@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-04-25 12:00:10 -03:00
Aharon Landau
636bdbfc99 RDMA/mlx5: Use mlx5_umr_post_send_wait() to update xlt
Move mlx5_ib_update_xlt logic to umr.c, and use
mlx5_umr_post_send_wait() instead of mlx5_ib_post_send_wait().

Since it is the last use of mlx5_ib_post_send_wait(), remove it.

Link: https://lore.kernel.org/r/55a4972f156aba3592a2fc9bcb33e2059acf295f.1649747695.git.leonro@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-04-25 12:00:08 -03:00
Aharon Landau
b3d47ebd49 RDMA/mlx5: Use mlx5_umr_post_send_wait() to update MR pas
Move mlx5_ib_update_mr_pas logic to umr.c, and use
mlx5_umr_post_send_wait() instead of mlx5_ib_post_send_wait().

Link: https://lore.kernel.org/r/ed8f2ee6c64804072155d727149abf7105f92536.1649747695.git.leonro@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-04-25 11:59:54 -03:00
Aharon Landau
8a8a5d37c7 RDMA/mlx5: Move mkey ctrl segment logic to umr.c
Move set_reg_umr_segment() and its helpers to umr.c.

Link: https://lore.kernel.org/r/5a7fac8ae8543521d19d174663245ae84b910310.1649747695.git.leonro@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-04-25 11:52:59 -03:00
Aharon Landau
f49c856ac2 RDMA/mlx5: Move umr checks to umr.h
Move mlx5_ib_can_load_pas_with_umr() and mlx5_ib_can_reconfig_with_umr()
to umr.h and rename them accordingly.

Link: https://lore.kernel.org/r/1b799b0142534a63dfd5bacc5f8ad2256d7777ad.1649747695.git.leonro@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-04-25 11:52:59 -03:00
Aharon Landau
9ee2516c43 RDMA/mlx5: Store ndescs instead of the translation table size
Currently, ent->xlt stores the translation table size. This data should
not be stored in the cache entry but be written directly to the mailbox.
Store ndescs instead, and deduce the translation table size from it
according to the access mode.
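
The size then becomes derivable on demand; a sketch (assuming only the
MTT and KLM access modes carry a translation table):

  static size_t xlt_size_from_ndescs(unsigned int access_mode,
                                     unsigned int ndescs)
  {
          switch (access_mode) {
          case MLX5_MKC_ACCESS_MODE_MTT:
                  return ndescs * sizeof(struct mlx5_mtt);
          case MLX5_MKC_ACCESS_MODE_KLMS:
                  return ndescs * sizeof(struct mlx5_klm);
          default:
                  return 0;
          }
  }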

Link: https://lore.kernel.org/r/e9dbfaa1f279793a6bd28ee5a31cb4f0f0d70f05.1644947594.git.leonro@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-02-23 14:59:13 -04:00
Aharon Landau
56561ac6b2 RDMA/mlx5: Merge similar flows of allocating MR from the cache
When allocating an MR from the cache, the driver calls get_cache_mr()
and, in case of failure, retries with create_cache_mr(). This is the flow
of mlx5_mr_cache_alloc(), so use it instead.

Link: https://lore.kernel.org/r/53c85fcd4de6ec9de0b8e6cbb1bf5d5fe19900c3.1644947594.git.leonro@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-02-23 14:59:13 -04:00
Aharon Landau
185b982678 RDMA/mlx5: Remove redundant work in struct mlx5_cache_ent
delayed_cache_work_func() and cache_work_func() are both wrappers of
__cache_work_func(). Instead of having a special non-delayed work, use
the delayed work with delay = 0.
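
That is, the immediate case is just the delayed case with a zero delay
(field names illustrative):

  /* Immediate: runs as soon as a worker is available. */
  queue_delayed_work(cache->wq, &ent->dwork, 0);
  /* Deferred: the previously "delayed" flavor. */
  queue_delayed_work(cache->wq, &ent->dwork, msecs_to_jiffies(300));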

Link: https://lore.kernel.org/r/18b6ae205e75f087aa4a2a05c81ea8b66d8d88dc.1644947594.git.leonro@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-02-23 14:59:13 -04:00
Leon Romanovsky
84aa6c3963 RDMA/mlx5: Delete get_num_static_uars function
There is no need to keep get_num_static_uars in the header file
as it is not shared and is used only once.

Link: https://lore.kernel.org/r/11d78568c3c6ba588ee8465e0d10d96145fc825c.1642960830.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-01-28 13:01:54 -04:00
Jason Gunthorpe
c0fe82baae Linux 5.16
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmHbZ+YeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGDs4H/RgC8JOV3Dki1VtO
 6OwPxUKKojhVU9LJis7kyG5voB/zE7tK5nI+jC3gYGQUFKWaZ3YY8s3UcV1zvg/b
 a44b91boA+dKxEwOq4RZNQ9mU+QWnNoG5+UqBkmB8vewi3QC3T8xEmpWcERLbU7d
 KrI2T6i4ksJ9OYSYMEMyrvrpt7nt3n1tDX8b71faXjf1zbLeGo9zT53t6BJ/LknV
 AK406Eq/3bg36OZrKFuG7hCJfRE/cSlxF9bxK3sIfMBMQ2YPe1S5+pxl5iBD0nyl
 NaHOBYcLTxPAne3YgIvK0zDdsS+EtPSlaVdWfSmNjQhX2vqEixldgdrOCmwp37vd
 3gV9D28=
 =hrOo
 -----END PGP SIGNATURE-----

Merge tag 'v5.16' into rdma.git for-next

To resolve minor conflict in:
        drivers/infiniband/hw/mlx5/mlx5_ib.h

By merging both hunks.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-01-13 13:21:03 -04:00
Leon Romanovsky
d82e2b27ad RDMA/mad: Delete duplicated init_query_mad functions
Several drivers used the same function to initialize a query MAD,
so move that function to a global header file.

Link: https://lore.kernel.org/r/af6f35c590ff5ef56d0137351b8b295af0f7c13c.1641369858.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-01-05 15:18:36 -04:00
Maor Gottlieb
4163cb3d19 Revert "RDMA/mlx5: Fix releasing unallocated memory in dereg MR flow"
This patch is not the full fix and still causes call traces
during mlx5_ib_dereg_mr().

This reverts commit f0ae4afe3d.

Fixes: f0ae4afe3d ("RDMA/mlx5: Fix releasing unallocated memory in dereg MR flow")
Link: https://lore.kernel.org/r/20211222101312.1358616-1-maorg@nvidia.com
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-01-05 09:04:30 -04:00
Jason Gunthorpe
c8f476da84 Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Saeed Mahameed says:
====================
Currently, the driver ignores the user's priority for flow steering rules
in the FDB namespace. Change it and create the rule at the right priority.

It will allow creating FDB steering rules at up to 16 different
priorities.
====================

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

* mellanox/mlx5-next:
  RDMA/mlx5: Add support to multiple priorities for FDB rules
  net/mlx5: Create more priorities for FDB bypass namespace
  net/mlx5: Refactor mlx5_get_flow_namespace
  net/mlx5: Separate FDB namespace
2021-12-15 08:15:53 -04:00
Kees Cook
e517f76a3c RDMA/mlx5: Use memset_after() to zero struct mlx5_ib_mr
In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memset(), avoid intentionally writing across
neighboring fields.

Use memset_after() to zero the end of struct mlx5_ib_mr that should be
initialized.
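
memset_after() zeroes from the byte after the named member to the end of
the object; a small self-contained illustration (not the driver struct):

  #include <linux/string.h>

  struct demo {
          void *keep_me;          /* survives the reset */
          int counters[4];        /* zeroed */
          bool flag;              /* zeroed */
  };

  static void demo_reset(struct demo *d)
  {
          /* One bounded call instead of a field-spanning memset()
           * that FORTIFY_SOURCE would flag.
           */
          memset_after(d, 0, keep_me);
  }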

Link: https://lore.kernel.org/r/20211213223331.135412-10-keescook@chromium.org
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-12-14 20:21:22 -04:00
Maor Gottlieb
a973f86b41 RDMA/mlx5: Add support to multiple priorities for FDB rules
Currently, the driver ignores the user's priority for flow steering
rules in the FDB namespace. Change it and create the rule at the right
priority.
It will allow creating FDB steering rules at up to 16 different
priorities.

Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-13 16:03:00 -08:00
Alaa Hleihel
f0ae4afe3d RDMA/mlx5: Fix releasing unallocated memory in dereg MR flow
For the case of IB_MR_TYPE_DM the mr doesn't have a umem, even though it
is a user MR. This causes the function mlx5_free_priv_descs() to think
that it is a kernel MR, leading to wrongly accessing mr->descs, which
gets wrong values from the union, which leads to an attempt to release
resources that were not allocated in the first place.

For example:
 DMA-API: mlx5_core 0000:08:00.1: device driver tries to free DMA memory it has not allocated [device address=0x0000000000000000] [size=0 bytes]
 WARNING: CPU: 8 PID: 1021 at kernel/dma/debug.c:961 check_unmap+0x54f/0x8b0
 RIP: 0010:check_unmap+0x54f/0x8b0
 Call Trace:
  debug_dma_unmap_page+0x57/0x60
  mlx5_free_priv_descs+0x57/0x70 [mlx5_ib]
  mlx5_ib_dereg_mr+0x1fb/0x3d0 [mlx5_ib]
  ib_dereg_mr_user+0x60/0x140 [ib_core]
  uverbs_destroy_uobject+0x59/0x210 [ib_uverbs]
  uobj_destroy+0x3f/0x80 [ib_uverbs]
  ib_uverbs_cmd_verbs+0x435/0xd10 [ib_uverbs]
  ? uverbs_finalize_object+0x50/0x50 [ib_uverbs]
  ? lock_acquire+0xc4/0x2e0
  ? lock_acquired+0x12/0x380
  ? lock_acquire+0xc4/0x2e0
  ? lock_acquire+0xc4/0x2e0
  ? ib_uverbs_ioctl+0x7c/0x140 [ib_uverbs]
  ? lock_release+0x28a/0x400
  ib_uverbs_ioctl+0xc0/0x140 [ib_uverbs]
  ? ib_uverbs_ioctl+0x7c/0x140 [ib_uverbs]
  __x64_sys_ioctl+0x7f/0xb0
  do_syscall_64+0x38/0x90

Fix it by reorganizing the dereg flow and mlx5_ib_mr structure:
 - Move the ib_umem field into the user MRs structure in the union as it's
   applicable only there.
 - Function mlx5_ib_dereg_mr() will now call mlx5_free_priv_descs() only
   in case there isn't udata, which indicates that this isn't a user MR.

Fixes: f18ec42231 ("RDMA/mlx5: Use a union inside mlx5_ib_mr")
Link: https://lore.kernel.org/r/66bb1dd253c1fd7ceaa9fc411061eefa457b86fb.1637581144.git.leonro@nvidia.com
Signed-off-by: Alaa Hleihel <alaa@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-25 13:16:39 -04:00
Jason Gunthorpe
71ee1f1275 Merge branch 'mlx5_mkey' into rdma.git for-next
A small series to clean up the mlx5 mkey code across the mlx5_core and
InfiniBand.

* branch 'mlx5_mkey':
  RDMA/mlx5: Attach ndescs to mlx5_ib_mkey
  RDMA/mlx5: Move struct mlx5_core_mkey to mlx5_ib
  RDMA/mlx5: Replace struct mlx5_core_mkey by u32 key
  RDMA/mlx5: Remove pd from struct mlx5_core_mkey
  RDMA/mlx5: Remove size from struct mlx5_core_mkey
  RDMA/mlx5: Remove iova from struct mlx5_core_mkey

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-19 14:09:25 -03:00
Aharon Landau
ae0579acde RDMA/mlx5: Attach ndescs to mlx5_ib_mkey
Generalize the use of ndescs by adding it to mlx5_ib_mkey.

Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2021-10-19 14:42:53 +03:00
Aharon Landau
4123bfb0b2 RDMA/mlx5: Move struct mlx5_core_mkey to mlx5_ib
Move mlx5_core_mkey struct to mlx5_ib, as the mlx5_core doesn't use it
at this point.

Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2021-10-19 14:35:28 +03:00
Aharon Landau
a29b934ceb RDMA/mlx5: Add modify_op_stat() support
Add support for ib callback modify_op_stat() to add or remove an optional
counter. When adding, a steering flow table is created with a rule that
catches and counts all the matching packets. When removing, the table and
flow counter are destroyed.

Link: https://lore.kernel.org/r/20211008122439.166063-13-markzhang@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-12 12:48:06 -03:00
Aharon Landau
ffa501ef19 RDMA/mlx5: Add steering support in optional flow counters
Add steering infrastructure for adding and removing an optional counter.
This allows adding and removing the counters dynamically in order not to
hurt performance.

Link: https://lore.kernel.org/r/20211008122439.166063-12-markzhang@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-12 12:48:06 -03:00
Aharon Landau
886773d249 RDMA/mlx5: Support optional counters in hw_stats initialization
Add optional counter support when allocate and initialize hw_stats
structure. Optional counters have IB_STAT_FLAG_OPTIONAL flag set and are
disabled by default.

Link: https://lore.kernel.org/r/20211008122439.166063-11-markzhang@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-12 12:48:06 -03:00
Aharon Landau
13f30b0fa0 RDMA/counter: Add a descriptor in struct rdma_hw_stats
Add a counter statistic descriptor structure in rdma_hw_stats. In addition
to the counter name, more meta-information will be added.  This code
extension is needed for optional-counter support in the following patches.

Link: https://lore.kernel.org/r/20211008122439.166063-4-markzhang@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-12 12:48:04 -03:00
Leon Romanovsky
514aee660d RDMA: Globally allocate and release QP memory
Convert the QP object to follow the IB/core general allocation scheme.
That change allows us to make sure that restrack properly krefs the
memory.

Link: https://lore.kernel.org/r/48e767124758aeecc433360ddd85eaa6325b34d9.1627040189.git.leonro@nvidia.com
Reviewed-by: Gal Pressman <galpress@amazon.com> #efa
Tested-by: Gal Pressman <galpress@amazon.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> #rdma and core
Tested-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Tested-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-03 13:44:27 -03:00
Leon Romanovsky
8c9e7f0325 RDMA/mlx5: Delete device resource mutex that didn't protect anything
The dev->devr.mutex was intended to protect GSI QP pointer change in the
struct mlx5_ib_port_resources when it is accessed from the
pkey_change_work. However that pointer isn't changed during the runtime
and once IB/core adds MAD, it stays stable.

Link: https://lore.kernel.org/r/6e338c561033df20d92e1371fc6a7a0d93aad945.1627040189.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-03 13:44:27 -03:00
Jason Gunthorpe
2833c977c3 Merge branch 'mlx5_realtime_ts' into rdma.git for-next
Aharon Landau says:

====================
In case the device supports only a real-time timestamp, the kernel will
fail to create a QP even though rdma-core requested such a timestamp
type.

This is because the device returns a free-running timestamp, and the
conversion from free-running to real-time is performed in user space.

This series fixes it by returning a real-time timestamp.
====================

* mlx5_realtime_ts:
  RDMA/mlx5: Support real-time timestamp directly from the device
  RDMA/mlx5: Refactor get_ts_format functions to simplify code

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-22 15:08:39 -03:00
Aharon Landau
336529518e RDMA/mlx5: Support real-time timestamp directly from the device
Currently, if the user asks for a real-time timestamp, the device will
return a free-running one, and the timestamp will be translated to
real-time in user space.

When the device supports only a real-time timestamp and not a
free-running one, the creation of the QP will fail even though the user
needs the supported real-time one. To prevent this, we will return the
real-time timestamp directly from the device.

Link: https://lore.kernel.org/r/c6cfc8e6f038575c5c2de6505830f7e74e4de80d.1623829775.git.leonro@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-22 14:23:50 -03:00
Maor Gottlieb
9ecf6ac17c RDMA/mlx5: Take qp type from mlx5_ib_qp
Change all the places in the mlx5_ib driver to take the qp type from the
mlx5_ib_qp struct, except the QP initialization flow. It will ensure that
we check the right QP type also for vendor specific QPs.

Link: https://lore.kernel.org/r/b2e16cd65b59cd24fa81c01c7989248da44e58ea.1621413899.git.leonro@nvidia.com
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-26 16:49:42 -03:00
Lang Cheng
0bedd3d005 RDMA/mlx5: Remove unused parameter udata
The old version of ib_umem_get() needed udata as a parameter, but now
it is unnecessary.

Fixes: c320e527e1 ("IB: Allow calls to ib_umem_get from kernel ULPs")
Link: https://lore.kernel.org/r/1620807142-39157-4-git-send-email-liweihang@huawei.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-20 11:52:17 -03:00
Jason Gunthorpe
fe73f96e7b Merge branch 'mlx5_memic_ops' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Maor Gottlieb says:
====================
This series from Maor extends MEMIC to support atomic operations from
the host in addition to the already supported regular read/write.
====================

* 'memic_ops':
  RDMA/mlx5: Expose UAPI to query DM
  RDMA/mlx5: Add support in MEMIC operations
  RDMA/mlx5: Add support to MODIFY_MEMIC command
  RDMA/mlx5: Re-organize the DM code
  RDMA/mlx5: Move all DM logic to separate file
  RDMA/uverbs: Make UVERBS_OBJECT_METHODS to consider line number
  net/mlx5: Add MEMIC operations related bits
2021-04-13 19:37:17 -03:00
Maor Gottlieb
cea85fa5db RDMA/mlx5: Add support in MEMIC operations
MEMIC buffer, in addition to regular read and write operations, can
support atomic operations from the host.

Introduce and implement new UAPI to allocate address space for MEMIC
operations such as atomic. This includes:

1. Expose a new IOCTL for requesting a mapping of a MEMIC operation.
2. Hold the operation addresses in a list, so the same operation on the
   same DM will be allocated only once.
3. Manage a refcount on the mlx5_ib_dm object, so it is kept valid until
   all addresses are unmapped.

Link: https://lore.kernel.org/r/20210411122924.60230-7-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-13 19:36:36 -03:00
Maor Gottlieb
831df88381 RDMA/mlx5: Move all DM logic to separate file
Move all device memory related code to a separate file.

Link: https://lore.kernel.org/r/20210411122924.60230-4-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-13 19:36:36 -03:00
Shay Drory
e5dc370bd9 RDMA/mlx5: Set ODP caps only if device profile support ODP
Currently, ODP caps are set during the init stage of mlx5_ib_dev,
regardless of whether the device profile supports ODP or not.  There is no
point in setting ODP caps if the device profile doesn't support
ODP. Hence, move setting the ODP caps to the odp_init stage.

Link: https://lore.kernel.org/r/20210318135259.681264-1-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-26 11:57:38 -03:00
Mark Bloch
1fb7f8973f RDMA: Support more than 255 rdma ports
Current code uses many different types when dealing with a port of a RDMA
device: u8, unsigned int and u32. Switch to u32 to clean up the logic.

This allows us to make (at least) the core view consistent and use the
same type. Unfortunately not all places can be converted. Many uverbs
functions expect port to be u8 so keep those places in order not to break
UAPIs.  HW/Spec defined values must also not be changed.

With the switch to u32 we can now support devices with more than 255
ports. U32_MAX is reserved to make the control logic a bit easier to
deal with. As a device with U32_MAX ports probably isn't going to happen
any time soon, this seems like a non-issue.

When a device with more than 255 ports is created uverbs will report the
RDMA device as having 255 ports as this is the max currently supported.

The verbs interface is not changed yet because the IBTA spec limits the
port size in too many places to be u8, and all applications that rely on
verbs won't be able to cope with this change. At this stage, we are
extending solely the interfaces that use the vendor channel.

Once the limitation is lifted, mlx5 in switchdev mode will be able to
have thousands of SFs created by the device. As the only instance of an
RDMA device that reports more than 255 ports will be a representor
device, and it exposes itself as a RAW Ethernet only device, CM/MAD/IPoIB
and other ULPs aren't affected by this change and their sysfs/interfaces
that are exposed to userspace can remain unchanged.

While here, clean up some alignment issues and remove unneeded sanity
checks (mainly in rdmavt).

Link: https://lore.kernel.org/r/20210301070420.439400-1-leon@kernel.org
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-26 09:31:21 -03:00
Shay Drory
ad50294d4d RDMA/mlx5: Create ODP EQ only when ODP MR is created
There is no need to create the ODP EQ if the user doesn't use ODP MRs.
Hence, create it only when the first ODP MR is created. This EQ will be
destroyed only when the device is unloaded.
This will decrease the number of EQs created per device. For example, if
we create 1K devices (SF/VF/etc.), then we will decrease the number of
EQs by 1K.

Link: https://lore.kernel.org/r/20210314125418.179716-1-leon@kernel.org
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-23 17:00:14 -03:00
Jason Gunthorpe
e6fb246cca RDMA/mlx5: Consolidate MR destruction to mlx5_ib_dereg_mr()
Now that the SRCU stuff has been removed, the entire MR destroy logic can
be made a lot simpler. Currently there are many different ways to destroy
an MR, which makes it really hard to do this task correctly. Route all
destruction through mlx5_ib_dereg_mr() and make it work for all
situations.

Since it turns out all the different MR types do basically the same thing
this removes a lot of knowledge of MR internals from ODP and leaves ODP
just exporting an operation to clean up children.

This fixes a few weird corner-case bugs and firmly uses the correct
ordering of the MR destruction:
 - Stop parallel access to the mkey via the ODP xarray
 - Stop DMA
 - Release the umem
 - Clean up ODP children
 - Free/Recycle the MR

Link: https://lore.kernel.org/r/20210304120745.1090751-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-11 20:03:25 -04:00
Jason Gunthorpe
f18ec42231 RDMA/mlx5: Use a union inside mlx5_ib_mr
The struct mlx5_ib_mr can be used for three different things, but only one
at a time:

 - In the user MR cache
 - As a kernel MR
 - As a user MR

Overlay the three things into a single union with the following rules:

 - If the mr is found on the cache_ent->head list then it is a cache MR
   and umem == NULL. The entire union is zero after the MR is removed from
   the cache.

 - If umem != NULL or type == IB_MR_TYPE_USER then it is a user MR.

 - If umem == NULL then it is a kernel MR

This reduces the size of struct mlx5_ib_mr to 552 bytes from 702.

The only place the three flows overlap in the code is during dereg, so add
a few extra checks along there.
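
Schematically (a condensed sketch, not the full struct):

  struct mlx5_ib_mr {
          struct ib_mr ibmr;
          /* ...fields valid in every role... */
          union {
                  /* Cache MR: on cache_ent->head, umem == NULL. */
                  struct list_head list;
                  /* User MR: umem != NULL or type == IB_MR_TYPE_USER. */
                  struct {
                          struct ib_umem *umem;
                          /* ...ODP/rereg state... */
                  };
                  /* Kernel MR: umem == NULL. */
                  struct {
                          void *descs;
                          /* ...fast-reg state... */
                  };
          };
  };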

Link: https://lore.kernel.org/r/20210304120745.1090751-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-11 20:03:25 -04:00
Jason Gunthorpe
a639e66703 RDMA/mlx5: Zero out ODP related items in the mlx5_ib_mr
All of the ODP code assumes that when it calls mlx5_mr_cache_alloc() the
ODP-related fields are zero'd. This is true if the MR was just allocated,
but if the MR is recycled through the cache then the values are never
zero'd.

This causes a bug in the odp_stats: they don't reset when the MR is
reallocated, and is_odp_implicit is never 0'd.

So that we can use memset() on a block of the mlx5_ib_mr, reorganize the
structure to put all the data that can be zero'd by the cache at the end.

It is organized as an anonymous struct because the next patch will make
this a union.

Delete the unused smr_info. Don't set the kernel-only desc_size on the
user path. There is no longer any need to zero mr->parent before freeing
it; the memset() will get it now.

Fixes: a3de94e3d6 ("IB/mlx5: Introduce ODP diagnostic counters")
Link: https://lore.kernel.org/r/20210304120745.1090751-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-11 20:03:25 -04:00
Yishai Hadas
db72438c93 RDMA/mlx5: Cleanup the synchronize_srcu() from the ODP flow
Cleanup the synchronize_srcu() from the ODP flow as it was found to be a
very heavy time consumer as part of dereg_mr.

For example, de-registration of 10000 ODP MRs, each with a size of one 2M
hugepage, took 19.6 sec, compared to de-registration of the same number
of non-ODP MRs, which took 172 ms.

The new locking scheme uses the wait_event() mechanism which follows the
use count of the MR instead of using synchronize_srcu().

By that change, the time required for the above test took 95 ms which is
even better than the non ODP flow.

Once the SRCU usage was fully dropped, we had to come up with a lock to
protect the XA access.

As part of using the above mechanism we could also clean the
num_deferred_work stuff and follow the use count instead.
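
The scheme boils down to a use count paired with wait_event(); a sketch
with illustrative names:

  #include <linux/refcount.h>
  #include <linux/wait.h>

  struct mkey_ref {
          refcount_t usecount;
          wait_queue_head_t wait;
  };

  /* Each accessor drops its reference when done... */
  static void mkey_put(struct mkey_ref *ref)
  {
          if (refcount_dec_and_test(&ref->usecount))
                  wake_up(&ref->wait);
  }

  /* ...and dereg waits for the count to drain instead of paying for
   * synchronize_srcu().
   */
  static void mkey_wait_unused(struct mkey_ref *ref)
  {
          wait_event(ref->wait, refcount_read(&ref->usecount) == 0);
  }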

Link: https://lore.kernel.org/r/20210202071309.2057998-1-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-02-08 20:31:11 -04:00
Parav Pandit
131796524f IB/mlx5: Use rdma_for_each_port for port iteration
Instead of open coding the loop for port iteration, use rdma_for_each_port
macro provided by core.

To use such macro, early initialization of phys_port_cnt is needed.
Hence, initialize such constant early enough in the init stage.

Whichever functions are called with a port using rdma_for_each_port(),
convert their port type from u8 to unsigned int to match the core API.
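
Usage is the usual core iterator; ports are 1-based and the iterator must
be unsigned int ('init_port' is a hypothetical per-port hook):

  unsigned int port;

  rdma_for_each_port(&dev->ib_dev, port)
          init_port(dev, port);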

Link: https://lore.kernel.org/r/20210203130133.4057329-6-leon@kernel.org
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-02-05 12:06:01 -04:00
Parav Pandit
7416790e22 RDMA/core: Introduce and use API to read port immutable data
Currently mlx5 driver caches port GID table length for 2 ports.  It is
also cached by IB core as port immutable data.

When mlx5 representor ports are present, which are usually more than 2,
invalid access to port_caps array can happen while validating the GID
table length which is only for 2 ports.

To avoid this, take help of the IB cores port immutable data by exposing
an API to read the port immutable fields.

Remove mlx5 driver's internal cache, thereby reduce code and data.

Link: https://lore.kernel.org/r/20210203130133.4057329-5-leon@kernel.org
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-02-05 12:06:01 -04:00
Parav Pandit
2019d70e91 IB/mlx5: Avoid calling query device for reading pkey table length
The pkey table length for all the ports of the device is the same.
Currently get_ports_cap() reads and stores it for each port by querying
the device, which reads more than just the pkey table length.

For representor device ports, which can number in the hundreds, it
queries the device for each such port and ends up returning the same
value for all the ports.

When in representor mode, modify QP accesses pkey port caps for a port
index that can be outside of the port_caps table.

Hence, simplify the logic to query the max pkey table length only once
during device initialization sequence.

Link: https://lore.kernel.org/r/20210203130133.4057329-3-leon@kernel.org
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-02-05 12:06:00 -04:00
Parav Pandit
3ce60f443b IB/mlx5: Move mlx5_port_caps from mlx5_core_dev to mlx5_ib_dev
mlx5_port_caps are RDMA specific capabilities. These are not used by the
mlx5_core_device at all. Move them to mlx5_ib_dev where it is used and
reduce the scope of it to multiple drivers.

Link: https://lore.kernel.org/r/20210203130133.4057329-2-leon@kernel.org
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-02-05 12:06:00 -04:00
Jianxin Xiong
90da7dc820 RDMA/mlx5: Support dma-buf based userspace memory region
Implement the new driver method 'reg_user_mr_dmabuf'. Utilize the core
functions to import a dma-buf based memory region and update the
mappings.

Add code to handle dma-buf related page faults.
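
The import itself funnels through the core dma-buf umem helper; a hedged
sketch ('offset', 'length', 'fd', 'access_flags' and the attach ops are
assumed from the surrounding registration code):

  struct ib_umem_dmabuf *umem_dmabuf;

  umem_dmabuf = ib_umem_dmabuf_get(&dev->ib_dev, offset, length, fd,
                                   access_flags,
                                   &mlx5_ib_dmabuf_attach_ops);
  if (IS_ERR(umem_dmabuf))
          return ERR_CAST(umem_dmabuf);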

Link: https://lore.kernel.org/r/1608067636-98073-5-git-send-email-jianxin.xiong@intel.com
Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Acked-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Acked-by: Christian Koenig <christian.koenig@amd.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-20 16:49:25 -04:00
Parav Pandit
559a3eacc4 IB/mlx5: Make function static
mlx5_query_mad_ifc_smp_attr_node_info() is internal to mad.c Hence, make
it static.

Link: https://lore.kernel.org/r/20210113121703.559778-5-leon@kernel.org
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-19 20:14:14 -04:00
Linus Torvalds
009bd55dfc RDMA 5.11 pull request
A smaller set of patches, nothing stands out as being particularly major
 this cycle:
 
 - Driver bug fixes and updates: bnxt_re, cxgb4, rxe, hns, i40iw, cxgb4,
   mlx4 and mlx5
 
 - Bug fixes and polishing for the new rtrs ULP
 
 - Cleanup of uverbs checking for allowed driver operations
 
 - Use sysfs_emit all over the place
 
 - Lots of bug fixes and clarity improvements for hns
 
 - hip09 support for hns
 
 - NDR and 50/100Gb signaling rates
 
 - Remove dma_virt_ops and go back to using the IB DMA wrappers
 
 - mlx5 optimizations for contiguous DMA regions
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl/aNXUACgkQOG33FX4g
 mxqlMQ/+O6UhxKnDAnMB+HzDGvOm+KXNHOQBuzxz4ZWXqtUrW8WU5ca3PhXovc4z
 /QX0HhMhQmVsva5mjp1OGVATxQ2E+yasqFLg4QXAFWFR3N7s0u/sikE9i1DoPvOC
 lsmLTeRauCFaE4mJD5nvYwm+riECX0GmyVVW7v6V05xwAp0hwdhyU7Kb6Yh3lxsE
 umTz+onPNJcD6Tc4snziyC5QEp5ebEjAaj4dVI1YPR5X0c2RwC5E1CIDI6u4OQ2k
 j7/+Kvo8LNdYNERGiR169x6c1L7WS6dYnGMMeXRgyy0BVbVdRGDnvCV9VRmF66w5
 99fHfDjNMNmqbGNt/4/gwNdVrR9aI4jMZWCh7SmsguX6XwNOlhYldy3x3WnlkfkQ
 e4O0huJceJqcB2Uya70GqufnAetRXsbjzcvWxpR5YAwRmcRkm1f6aGK3BxPjWEbr
 BbYRpiKMxxT4yTe65BuuThzx6g4pNQHe0z3BM/dzMJQAX+PZcs1CPQR8F8PbCrZR
 Ad7qw4HJ587PoSxPi3toVMpYZRP6cISh1zx9q/JCj8cxH9Ri4MovUCS3cF63Ny3B
 1LJ2q0x8FuLLjgZJogKUyEkS8OO6q7NL8WumjvrYWWx19+jcYsV81jTRGSkH3bfY
 F7Esv5K2T1F2gVsCe1ZFFplQg6ja1afIcc+LEl8cMJSyTdoSub4=
 =9t8b
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "A smaller set of patches, nothing stands out as being particularly
  major this cycle. The biggest item would be the new HIP09 HW support
  from HNS, otherwise it was pretty quiet for new work here:

   - Driver bug fixes and updates: bnxt_re, cxgb4, rxe, hns, i40iw,
     cxgb4, mlx4 and mlx5

   - Bug fixes and polishing for the new rtrs ULP

   - Cleanup of uverbs checking for allowed driver operations

   - Use sysfs_emit all over the place

   - Lots of bug fixes and clarity improvements for hns

   - hip09 support for hns

   - NDR and 50/100Gb signaling rates

   - Remove dma_virt_ops and go back to using the IB DMA wrappers

   - mlx5 optimizations for contiguous DMA regions"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (147 commits)
  RDMA/cma: Don't overwrite sgid_attr after device is released
  RDMA/mlx5: Fix MR cache memory leak
  RDMA/rxe: Use acquire/release for memory ordering
  RDMA/hns: Simplify AEQE process for different types of queue
  RDMA/hns: Fix inaccurate prints
  RDMA/hns: Fix incorrect symbol types
  RDMA/hns: Clear redundant variable initialization
  RDMA/hns: Fix coding style issues
  RDMA/hns: Remove unnecessary access right set during INIT2INIT
  RDMA/hns: WARN_ON if get a reserved sl from users
  RDMA/hns: Avoid filling sl in high 3 bits of vlan_id
  RDMA/hns: Do shift on traffic class when using RoCEv2
  RDMA/hns: Normalization the judgment of some features
  RDMA/hns: Limit the length of data copied between kernel and userspace
  RDMA/mlx4: Remove bogus dev_base_lock usage
  RDMA/uverbs: Fix incorrect variable type
  RDMA/core: Do not indicate device ready when device enablement fails
  RDMA/core: Clean up cq pool mechanism
  RDMA/core: Update kernel documentation for ib_create_named_qp()
  MAINTAINERS: SOFT-ROCE: Change Zhu Yanjun's email address
  ...
2020-12-16 13:42:26 -08:00
Maor Gottlieb
ca991a7d14 RDMA/mlx5: Assign dev to DM MR
Currently, DM MR registration flow doesn't set the mlx5_ib_dev pointer and
can cause a NULL pointer dereference if userspace dumps the MR via rdma
tool.

Assign the IB device together with the other fields and remove the
redundant reference of mlx5_ib_dev from mlx5_ib_mr.

Cc: stable@vger.kernel.org
Fixes: 6c29f57ea4 ("IB/mlx5: Device memory mr registration support")
Link: https://lore.kernel.org/r/20201203190807.127189-1-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-12-07 15:52:54 -04:00
Jason Gunthorpe
38f8ff5b44 RDMA/mlx5: Reorganize mlx5_ib_reg_user_mr()
This function handles an ODP and regular MR flow all mushed together, even
though the two flows are quite different. Split them into two dedicated
functions.

Link: https://lore.kernel.org/r/20201130075839.278575-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-12-07 14:06:23 -04:00
Jason Gunthorpe
6e0954b11c RDMA/uverbs: Allow drivers to create a new HW object during rereg_mr
mlx5 has an ugly flow where it tries to allocate a new MR and replace the
existing MR in the same memory during rereg. This is very complicated and
buggy. Instead of trying to replace in-place inside the driver, provide
support from uverbs to change the entire HW object assigned to a handle
during rereg_mr.

Since destroying a MR is allowed to fail (ie if a MW is pointing at it)
and can't be detected in advance, the algorithm creates a completely new
uobject to hold the new MR and swaps the IDR entries of the two objects.

The old MR in the temporary IDR entry is destroyed, and if it fails
rereg_mr succeeds and destruction is deferred to FD release. This
complexity is why this cannot live in a driver safely.

Link: https://lore.kernel.org/r/20201130075839.278575-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-12-07 14:06:23 -04:00
Leon Romanovsky
93f8244431 RDMA/mlx5: Convert mlx5_ib to use auxiliary bus
The conversion to auxiliary bus solves a long-standing issue with the
existing mlx5_ib<->mlx5_core coupling. It was required to have both
modules in the initramfs if one of them was needed for boot.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2020-12-06 07:43:50 +02:00
Jason Gunthorpe
878f7b31c3 RDMA/mlx5: Use ib_umem_find_best_pgsz() for devx
Since devx uses the new rdma_for_each_block() to fill the PAS it can also
use ib_umem_find_best_pgsz().

However, the umem construction in devx is complicated; the umem must
still respect all the HW limits such as page_offset_quantized and the
IOVA alignment.

Since we don't know what the user intends to use the umem for we have to
limit it to PAGE_SIZE.

There are users trying to mix umems with mkeys, so this makes them work
reliably, at least for an identity IOVA, by ensuring the IOVA matches
the selected page size.

Last user of mlx5_ib_get_buf_offset() so it can also be removed.

Fixes: aeae94579c ("IB/mlx5: Add DEVX support for memory registration")
Link: https://lore.kernel.org/r/20201115114311.136250-7-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-16 16:53:30 -04:00
Jason Gunthorpe
c08fbdc577 RDMA/mlx5: mlx5_umem_find_best_quantized_pgoff() for CQ
This fixes a bug where the page_offset was not being considered when
building a CQ. The HW specification says it 'must be zero', so use
a variant of mlx5_umem_find_best_quantized_pgoff() with a 0 pgoff_bitmask
to force this result.

Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Link: https://lore.kernel.org/r/20201115114311.136250-6-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-16 16:53:30 -04:00
Jason Gunthorpe
a59b7b05ef RDMA/mlx5: Use mlx5_umem_find_best_quantized_pgoff() for QP
Delete custom logic in the QP in favor of more general variant.

Link: https://lore.kernel.org/r/20201115114311.136250-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-16 16:53:30 -04:00
Jason Gunthorpe
b045db62f6 RDMA/mlx5: Use ib_umem_find_best_pgoff() for SRQ
SRQ uses a quantized and scaled page_offset, which is another variation of
ib_umem_find_best_pgsz(). Add mlx5_umem_find_best_quantized_pgoff() to
perform this calculation for each mailbox. A macro shows how the
calculation is directly connected to the mailbox format.

This new routine replaces the limited mlx5_ib_cont_pages() and
mlx5_ib_get_buf_offset() pairing which would reject valid configurations
rather than adjust the page_size to make it work.

In turn this is much more aggressive about choosing large page sizes
for these objects, and when THP is enabled it will now often find a
single-page solution.
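
Roughly how a mailbox-specific invocation looks (a sketch; the srqc
field names come from mlx5_ifc.h, while the scale constant used here is
an assumption):

  page_size = mlx5_umem_find_best_quantized_pgoff(
          umem, srqc, log_page_size, MLX5_ADAPTER_PAGE_SHIFT,
          page_offset, 64, &page_offset_quantized);
  if (!page_size)
          return -EINVAL;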

Link: https://lore.kernel.org/r/20201115114311.136250-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-16 16:53:29 -04:00
Jason Gunthorpe
d5c7916fe4 RDMA/mlx5: Use ib_umem_find_best_pgsz() for mkc's
Now that all the PAS arrays or UMR XLT's for mkcs are filled using
rdma_for_each_block() we can use the common ib_umem_find_best_pgsz()
algorithm.

Link: https://lore.kernel.org/r/20201026132314.1336717-6-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-02 15:10:50 -04:00
Jason Gunthorpe
f1eaac37da RDMA/mlx5: Split mlx5_ib_update_xlt() into ODP and non-ODP cases
Mixing these together is just a mess; make a dedicated version,
mlx5_ib_update_mr_pas(), which directly loads the whole MTT for a
non-ODP MR.

The split out version can trivially use a simple loop with
rdma_for_each_block() which allows using the core code to compute the MR
pages and avoids seeking in the SGL list after each chunk as the
__mlx5_ib_populate_pas() call required.

Significantly speeds loading large MTTs.
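
The non-ODP fill loop becomes essentially this (a sketch; the real
function also handles UMR chunking and trailing alignment, and
MLX5_IB_MTT_PRESENT marks each entry valid):

  struct ib_block_iter biter;
  __be64 *pas = xlt;

  rdma_umem_for_each_dma_block(mr->umem, &biter, BIT(mr->page_shift))
          *pas++ = cpu_to_be64(rdma_block_iter_dma_address(&biter) |
                               MLX5_IB_MTT_PRESENT);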

Link: https://lore.kernel.org/r/20201026132314.1336717-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-02 15:10:50 -04:00
Jason Gunthorpe
8010d74b99 RDMA/mlx5: Split the WR setup out of mlx5_ib_update_xlt()
The memory allocation is quite complicated and makes this function hard
to understand. Refactor things so that a function call sets up the WR,
SG, DMA mapping and buffer, further splitting that into buffer and
DMA/wr.

This also slightly changes the buffer allocation logic to try an order 0
page allocation (with OOM warnings on) before going to the emergency page.
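
The fallback ladder, sketched (the helper name here is hypothetical;
xlt_emergency_page and its mutex are the existing last-resort buffer):

  static void *alloc_xlt_buf(size_t size)
  {
          void *xlt;

          /* try the full size quietly first */
          xlt = (void *)__get_free_pages(GFP_KERNEL | __GFP_NOWARN,
                                         get_order(size));
          if (xlt)
                  return xlt;
          /* then a single page, with OOM warnings on */
          xlt = (void *)__get_free_page(GFP_KERNEL);
          if (xlt)
                  return xlt;
          /* only then take the shared emergency page */
          mutex_lock(&xlt_emergency_page_mutex);
          return xlt_emergency_page;
  }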

Link: https://lore.kernel.org/r/20201026132314.1336717-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-02 14:53:55 -04:00
Jason Gunthorpe
f22c30aa6d RDMA/mlx5: Move xlt_emergency_page_mutex into mr.c
This is the only user, so remove the wrappers.

Link: https://lore.kernel.org/r/20201026132314.1336717-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-02 14:53:54 -04:00
Jason Gunthorpe
aab8d3966d RDMA/mlx5: Change mlx5_ib_populate_pas() to use rdma_for_each_block()
This routine converts the umem SGL into a list of fixed pages for DMA,
which is exactly what rdma_umem_for_each_dma_block() is for, use the
common code directly.

Link: https://lore.kernel.org/r/20201026132314.1336717-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-02 14:53:54 -04:00
Jason Gunthorpe
f8fb311063 RDMA/mlx5: Remove npages from mlx5_ib_cont_pages()
Most callers don't need this, and the few that do can get it as
ib_umem_num_pages(umem).

Link: https://lore.kernel.org/r/20201026131936.1335664-8-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-02 14:52:26 -04:00
Jason Gunthorpe
7db0eea916 RDMA/mlx5: Remove ncont from mlx5_ib_cont_pages()
This is the same as ib_umem_num_dma_blocks(umem, 1UL << page_shift), have
the callers compute it directly.

Link: https://lore.kernel.org/r/20201026131936.1335664-7-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-02 14:52:26 -04:00
Jason Gunthorpe
95741ee3f0 RDMA/mlx5: Remove order from mlx5_ib_cont_pages()
Only alloc_mr_from_cache() needs order and can trivially compute it, so
lift it to the one call site and remove the NULL arguments.

Link: https://lore.kernel.org/r/20201026131936.1335664-6-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-02 14:52:26 -04:00
Jason Gunthorpe
f0093fb1a7 RDMA/mlx5: Move mlx5_ib_cont_pages() to the creation of the mlx5_ib_mr
For the user MR path, instead of calling this after getting the umem, call
it as part of creating the struct mlx5_ib_mr and distill its output to a
single page_shift stored inside the mr.

This avoids passing around the tuple of its output. Based on the umem and
page_shift, the output arguments can be computed using:

  count == ib_umem_num_pages(mr->umem)
  shift == mr->page_shift
  ncont == ib_umem_num_dma_blocks(mr->umem, 1 << mr->page_shift)
  order == order_base_2(ncont)

And since mr->page_shift == umem_odp->page_shift then ncont ==
ib_umem_num_dma_blocks() == ib_umem_odp_num_pages() for ODP umems.

Link: https://lore.kernel.org/r/20201026131936.1335664-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-02 14:52:26 -04:00
Jason Gunthorpe
1c3d247eee RDMA/mlx5: Remove mlx5_ib_mr->npages
This is the same value as ib_umem_num_pages(mr->umem), use that instead.

Link: https://lore.kernel.org/r/20201026131936.1335664-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-02 14:52:26 -04:00
Jason Gunthorpe
b4d031cdae RDMA/mlx5: Remove mlx5_ib_mr->order
This is only ever set to non-zero if the MR is from the cache, and if
it is cached then the order is available in cached_ent->order.

Make it clearer that use_umr_mtt_update() only returns true for cached MRs
and remove the redundant data.

Link: https://lore.kernel.org/r/20201026131936.1335664-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-02 14:31:40 -04:00
Yishai Hadas
a03bfc37d5 RDMA/mlx5: Sync device with CPU pages upon ODP MR registration
Sync device with CPU pages upon ODP MR registration. Since mlx5 already
has to zero the HW's version of the PAS list, it may as well deliver a
PAS list that matches the current CPU page table configuration.

Link: https://lore.kernel.org/r/20200930163828.1336747-5-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-01 16:44:44 -03:00
Leon Romanovsky
eebe580feb RDMA/mlx5: Delete not needed GSI QP signal QP type
The GSI QP doesn't need a signal QP type because it is statically
initialized to zero, which is IB_SIGNAL_ALL_WR, and wr->send_flags is
never set either. This means that the GSI QP signal QP type can be
removed.

Link: https://lore.kernel.org/r/20200926102450.2966017-5-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-29 13:09:49 -03:00
Leon Romanovsky
2dc4d6725b RDMA/mlx5: Change GSI QP to have same creation flow like other QPs
There is no reason to have a separate create flow for the GSI QP, since
the general create_qp routine has all the needed checks and the ability
to allocate and free the proper struct mlx5_ib_qp.

Link: https://lore.kernel.org/r/20200926102450.2966017-4-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-29 13:09:48 -03:00
Leon Romanovsky
f8225e3488 RDMA/mlx5: Reuse existing fields in parent QP storage object
Remove the duplication between mlx5_ib_qp and mlx5_ib_gsi_qp fields.
This change restores the memory footprint of the mlx5_ib QP to what it
was before the GSI QP was embedded.

Link: https://lore.kernel.org/r/20200926102450.2966017-3-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-29 13:09:48 -03:00
Leon Romanovsky
0d9aef8603 RDMA/mlx5: Embed GSI QP into general mlx5_ib QP
The GSI QPs have a different create flow from the regular QPs, but that
is not really needed. Update the code to use mlx5_ib_qp as the storage
class for everything outside of the GSI calls.
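
The storage relationship, roughly (a sketch of the layout; placing the
GSI state inside the existing per-type union is an assumption here):

  struct mlx5_ib_qp {
          ...
          union {
                  struct mlx5_ib_raw_packet_qp raw_packet_qp;
                  struct mlx5_ib_gsi_qp gsi;  /* used when type == IB_QPT_GSI */
          };
          ...
  };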

Link: https://lore.kernel.org/r/20200926102450.2966017-2-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-29 13:09:48 -03:00
Jason Gunthorpe
8383da3e4a RDMA/mlx5: Clarify what the UMR is for when creating MRs
Once a mkey is created it can be modified using UMR. This is desirable for
performance reasons. However, different hardware has restrictions on what
modifications are possible using UMR. Make sense of these checks:

- mlx5_ib_can_reconfig_with_umr() returns true if the access flags can be
  altered. Most cases create MRs using 0 access flags (now made clear by
  consistent use of set_mkc_access_pd_addr_fields()), but the old logic
  here was tormented. Make it clear that this is checking if the current
  access_flags can be modified using UMR to different access_flags. It is
  always OK to use UMR to change flags that all HW supports.

- mlx5_ib_can_load_pas_with_umr() returns true if UMR can be used to
  enable and update the PAS/XLT. Enabling requires updating the entity
  size, so UMR ends up completely disabled on this old hardware. Make it
  clear why it is disabled. FRWR, ODP and cache always require
  mlx5_ib_can_load_pas_with_umr().

- mlx5_ib_pas_fits_in_mr() is used to tell if an existing MR can be
  resized to hold a new PAS list. This only works for cached MR's because
  we don't store the PAS list size in other cases.

To be very clear, arrange things so any pre-created MR's in the cache
check the newly requested access_flags before allowing the MR to leave the
cache. If UMR cannot set the required access_flags the cache fails to
create the MR.

This in turn means relaxed ordering and atomic are now correctly blocked
early for implicit ODP on older HW.
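
The cache-side gate amounts to something like this (a sketch; cache
mkeys are created with 0 access flags, so the check is whether UMR can
move from 0 to the requested flags):

  /* refuse to hand out a cache mkey UMR can't reconfigure */
  if (!mlx5_ib_can_reconfig_with_umr(dev, 0, access_flags))
          return ERR_PTR(-EOPNOTSUPP);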

Link: https://lore.kernel.org/r/20200914112653.345244-6-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-18 13:02:43 -03:00
Leon Romanovsky
c0a6b5ecc5 RDMA: Convert RWQ table logic to ib_core allocation scheme
Move struct ib_rwq_ind_table allocation to ib_core.

Link: https://lore.kernel.org/r/20200902081623.746359-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-17 14:04:33 -03:00
Leon Romanovsky
d18bb3e152 RDMA: Clean MW allocation and free flows
Move the allocation and destruction of memory windows under ib_core
responsibility and clean up the drivers to ensure that no updates to MW
ib_core structures are done in the driver layer.

Link: https://lore.kernel.org/r/20200902081623.746359-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-17 14:04:32 -03:00
Leon Romanovsky
add53535fb RDMA: Restore ability to return error for destroy WQ
Make this interface symmetrical to other destroy paths.
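
Concretely, the driver op regains a status return (sketch of the
post-patch ib_device_ops member):

  int (*destroy_wq)(struct ib_wq *wq, struct ib_udata *udata);

The same pattern repeats for the XRCD, CQ, SRQ and AH destroy paths in
the commits below.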

Fixes: a49b1dc7ae ("RDMA: Convert destroy_wq to be void")
Link: https://lore.kernel.org/r/20200907120921.476363-9-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 14:14:29 -03:00
Leon Romanovsky
d0c45c8556 RDMA: Change XRCD destroy return value
Update XRCD destroy flow to allow command failure.

Fixes: 28ad5f65c3 ("RDMA: Move XRCD to be under ib_core responsibility")
Link: https://lore.kernel.org/r/20200907120921.476363-8-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 14:14:29 -03:00
Leon Romanovsky
43d781b9fa RDMA: Allow fail of destroy CQ
Like any other verbs object, a CQ shouldn't fail during destroy, but
mlx5_ib didn't follow this contract when IB verbs objects are mixed
with DEVX. Such a mix leads to a situation where the FW and the kernel
are fully interdependent on each side's reference counting.

Kernel verbs and drivers that don't have DEVX flows shouldn't fail.

Fixes: e39afe3d6d ("RDMA: Convert CQ allocations to be under core responsibility")
Link: https://lore.kernel.org/r/20200907120921.476363-7-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 14:14:29 -03:00
Leon Romanovsky
119181d1d4 RDMA: Restore ability to fail on SRQ destroy
In a similar way to other IB objects, restore the ability to return an
error on SRQ destroy. Strictly speaking, this change is not necessary;
it is provided here to ensure a symmetrical interface like the other
destroy functions.

Fixes: 68e326dea1 ("RDMA: Handle SRQ allocations by IB/core")
Link: https://lore.kernel.org/r/20200907120921.476363-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 14:14:24 -03:00
Leon Romanovsky
9a9ebf8cd7 RDMA: Restore ability to fail on AH destroy
Like any other IB verbs object, AHs are refcounted by ib_core. The
release of those objects is controlled by ib_core with the promise that
AH destroy can't fail.

While an AH is a SW-only object for now, this change makes dealloc_ah()
behave like any other IB destroy flow.

Fixes: d345691471 ("RDMA: Handle AH allocations by IB/core")
Link: https://lore.kernel.org/r/20200907120921.476363-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 13:57:22 -03:00
Jason Gunthorpe
7923774368 Merge branch 'mlx5_uar' into rdma.git /for-next
Meir Lichtinger says:

====================
ConnectX-7 supports setting relaxed ordering read/write mkey attribute by
UMR, indicated by new HCA capabilities, so extend mlx5_ib driver to
configure UMR control segment
====================

Based on the mlx5-next branch at
      git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
due to dependencies.

* branch 'mlx5_uar':
  RDMA/mlx5: Set mkey relaxed ordering by UMR with ConnectX-7
  RDMA/mlx5: Use MLX5_SET macro instead of local structure
  RDMA/mlx5: ConnectX-7 new capabilities to set relaxed ordering by UMR
2020-07-27 11:44:36 -03:00
Meir Lichtinger
896ec97353 RDMA/mlx5: Set mkey relaxed ordering by UMR with ConnectX-7
Up to ConnectX-7, UMR is not used when the user passes the relaxed
ordering access flag. ConnectX-7 supports setting the relaxed ordering
read/write mkey attributes by UMR, indicated by new HCA capabilities.

With ConnectX-7 the driver uses UMR when the user sets the relaxed
ordering access flag, in contrast to previous silicon models.
Specifically, this includes setting the relevant flags of the mkey
context mask in the UMR control segment, and the relaxed ordering write
and read flags in the UMR mkey context segment.
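
The mkey context side of that amounts to (a sketch; the mkc field names
exist in mlx5_ifc.h, but the surrounding UMR segment setup and the
control-segment mask bits are elided):

  MLX5_SET(mkc, seg, relaxed_ordering_write,
           !!(access_flags & IB_ACCESS_RELAXED_ORDERING));
  MLX5_SET(mkc, seg, relaxed_ordering_read,
           !!(access_flags & IB_ACCESS_RELAXED_ORDERING));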

Link: https://lore.kernel.org/r/20200716105248.1423452-4-leon@kernel.org
Signed-off-by: Meir Lichtinger <meirl@mellanox.com>
Reviewed-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-27 11:19:00 -03:00
Meir Lichtinger
2224635938 RDMA/mlx5: Use MLX5_SET macro instead of local structure
Use the generic mlx5 structures defined in mlx5_ifc.h to represent the
ConnectX device data structures instead of a structure defined
specifically for the mlx5_ib module.
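
The pattern in question, generically (a sketch; the fields shown are
real mkc fields but serve only to illustrate the MLX5_SET style):

  u32 in[MLX5_ST_SZ_DW(create_mkey_in)] = {};
  void *mkc = MLX5_ADDR_OF(create_mkey_in, in, memory_key_mkey_entry);

  MLX5_SET(mkc, mkc, lw, 1);           /* local write access */
  MLX5_SET(mkc, mkc, qpn, 0xffffff);   /* not attached to a QP */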

Link: https://lore.kernel.org/r/20200716105248.1423452-3-leon@kernel.org
Signed-off-by: Meir Lichtinger <meirl@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-27 11:19:00 -03:00