Commit Graph

1916 Commits

Author SHA1 Message Date
Mark Bloch
979bf468fc {net, RDMA}/mlx5: Extend send to vport rules
In shared FDB there is only one eswitch which is active and it receives
traffic from all representors and all vports in the HCA.

While the Ethernet representor will always reside on its native PF
the IB representor will not. Extend send to vport rule creation to
support such flows. Need to account for source vport that sends the
traffic (on which the representors resides) and the target eswitch
the traffic which reach.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-05 13:49:24 -07:00
Mark Bloch
6aeb16a134 RDMA/mlx5: Fill port info based on the relevant eswitch
In shared FDB a single RDMA device can have representors that are
connected to two different eswitches. Use the right eswitch when
preparing the response to userspace.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-05 13:49:24 -07:00
Leon Romanovsky
5f6bb7e322 RDMA/mlx5: Delete not-available udata check
XRC_TGT QPs are created through kernel verbs and don't have udata at all.

Fixes: 6eefa839c4 ("RDMA/mlx5: Protect from kernel crash if XRC_TGT doesn't have udata")
Fixes: e383085c24 ("RDMA/mlx5: Set ECE options during QP create")
Link: https://lore.kernel.org/r/b68228597e730675020aa5162745390a2d39d3a2.1628014762.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-03 15:26:17 -03:00
Leon Romanovsky
20da44dfe8 RDMA/mlx5: Drop in-driver verbs object creations
There is no real value in bypassing IB/core APIs for creating standard
objects with standard types. The open-coded variant didn't have any
restrack task management calls and caused to such objects to be not
present when running rdmatoool.

Link: https://lore.kernel.org/r/f745590e5fb7d56f90fdb25f64ee3983ba17e1e4.1627040189.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-03 13:44:28 -03:00
Leon Romanovsky
514aee660d RDMA: Globally allocate and release QP memory
Convert QP object to follow IB/core general allocation scheme.  That
change allows us to make sure that restrack properly kref the memory.

Link: https://lore.kernel.org/r/48e767124758aeecc433360ddd85eaa6325b34d9.1627040189.git.leonro@nvidia.com
Reviewed-by: Gal Pressman <galpress@amazon.com> #efa
Tested-by: Gal Pressman <galpress@amazon.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> #rdma and core
Tested-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Tested-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-03 13:44:27 -03:00
Leon Romanovsky
0dc0da15ed RDMA/mlx5: Rework custom driver QP type creation
Starting from commit 2b1f747071 ("RDMA/core: Allow drivers to disable
restrack DB") the restrack is able to handle non-standard QP types either.

That change allows us to rewrite custom QP calls to their IB/core
counterparts, so we will use general QP creation flow even for the driver
QP types.

Link: https://lore.kernel.org/r/51682ab82298748941f38bd23ee3bf77ef1cab7b.1627040189.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-03 13:44:27 -03:00
Leon Romanovsky
8c9e7f0325 RDMA/mlx5: Delete device resource mutex that didn't protect anything
The dev->devr.mutex was intended to protect GSI QP pointer change in the
struct mlx5_ib_port_resources when it is accessed from the
pkey_change_work. However that pointer isn't changed during the runtime
and once IB/core adds MAD, it stays stable.

Link: https://lore.kernel.org/r/6e338c561033df20d92e1371fc6a7a0d93aad945.1627040189.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-03 13:44:27 -03:00
Leon Romanovsky
b0791dbf12 RDMA/mlx5: Cancel pkey work before destroying device resources
In the driver release flow, we are ensuring that notifier is disabled and
no new works can be added to pkey_change_handler. It means that we can
cancel that handler before destroying resources to make sure that our
unwind routine is symmetrical to the allocation one.

Link: https://lore.kernel.org/r/f2b1ea1bad952e4e7a48a6f731de9e0344986b29.1627040189.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-03 13:44:26 -03:00
Aharon Landau
d6793ca97b RDMA/mlx5: Delay emptying a cache entry when a new MR is added to it recently
Fixing a typo that causes a cache entry to shrink immediately after adding
to it new MRs if the entry size exceeds the high limit.  In doing so, the
cache misses its purpose to prevent the creation of new mkeys on the
runtime by using the cached ones.

Fixes: b9358bdbc7 ("RDMA/mlx5: Fix locking in MR cache work queue")
Link: https://lore.kernel.org/r/fcb546986be346684a016f5ca23a0567399145fa.1627370131.git.leonro@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-02 12:45:22 -03:00
Tal Gilboa
616d576934 IB/mlx5: Rename is_apu_thread_cq function to is_apu_cq
is_apu_thread_cq() used to detect CQs which are attached to APU
threads. This was extended to support other elements as well,
so the function was renamed to is_apu_cq().

c_eqn_or_apu_element was extended from 8 bits to 32 bits, which wan't
reflected when the APU support was first introduced.

Acked-by: Michael S. Tsirkin <mst@redhat.com> # vdpa
Signed-off-by: Tal Gilboa <talgi@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2021-07-25 11:39:04 +03:00
Lior Nahmanson
11656f593a RDMA/mlx5: Add DCS offload support
DCS is an offload to SW load balancing of DC initiator work requests.

A single DCI can be connected to only one target at the time and can't
start new connection until the previous work request is completed.  This
limitation will cause to delay when the initiator process needs to
transfer data to multiple targets at the same time.  The SW solution is to
use a process that handling and spreading the work request on many DCIs
according to destinations.

This feature is an offload to this process and coming to reduce the load
from the CPU and improve the performance.

Link: https://lore.kernel.org/r/491c2c2afdb5b07de7f03eab3f93cf0704549dbc.1624258894.git.leonro@nvidia.com
Reviewed-by: Meir Lichtinger <meirl@nvidia.com>
Signed-off-by: Lior Nahmanson <liorna@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-07-20 15:04:14 -03:00
Lior Nahmanson
2013b4d525 RDMA/mlx5: Separate DCI QP creation logic
This patch isolates DCI QP creation logic to separate function, so this
change will reduce complexity when adding new features to DCI QP without
interfering with other QP types.

The code was copied from create_user_qp() while taking only DCI relevant bits.

Link: https://lore.kernel.org/r/b4530bdd999349c59691224f016ff1efb5dc3b92.1624258894.git.leonro@nvidia.com
Reviewed-by: Meir Lichtinger <meirl@nvidia.com>
Signed-off-by: Lior Nahmanson <liorna@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-07-20 15:04:14 -03:00
Linus Torvalds
e04360a2ea RDMA v5.14 merge window Pull Request
This PR contains a replacement driver for Intel iWarp hardware. This new
 driver supports the old ethernet hardware and also newer chips that can do
 ROCE. Otherwise this contains the typical mix of patches:
 
 - Driver updates and cleanups for bnxt_re, cxgb4, mlx4, and mlx5
 
 - Many static checker driven code clean ups, including a wide refcount_t
   conversion
 
 - Several series for the hns driver, more HIP09 HW capabilities, migration
   to new HW register manipulators, and code cleanups
 
 - Minor fixes and improvements  in srp, rts, and cm
 
 - Improvements throughout for sysfs related code to use DEVICE_ATTR_*,
   make the ib_port sysfs first-class, and overall use sysfs APIs properly
 
 - Intel's new irdma driver replacing i40iw
 
 - rxe general clean ups and Memory Window support
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAmDcunQACgkQOG33FX4g
 mxqSBA//dsZi/UzpzgU+YqyMFmUp04wd2/iCYzOcCViNPQZCyCARbGaMXI4kMa4s
 8dM5xU76OnCuNSnXHaIwvHC3CdN9GUm08j9eWY7syvAiKtXCjzv7qmCVfBw35UyK
 IXKfXh57toTSSAIfxw8yKc97QgaDSJ2zQ34fXkoE0AvTlfyN6pHQe9ef/Ca0ejS4
 awUGYVG/oilLXrEHcSSAv5UoX6hOUje6jqqRgp5jmZTI3g7SlIPL8mWgXBkHAYmd
 kDX7lBd09CKo2bmR071/kF6xUzvbCg1tmeE6lZze7gE+aKlBkZcvCBe1RAh3sBzK
 ysLfON5GGw5qnkMaY8j5h3sgWvi3qTTEW+jCAmmVi/6z4PF47mvmVVn+/pZc3y2e
 PqH43cunhwS0KuoUJ5Sd48J/UvabrvdbCNZrjCGCpt45EF4VwKxYMh74Bf0ABEQS
 i9eKR/+wyHG6Uv1U37fIXsqa8yUttl9aV/s88s8irn4xhG8ygBLZgeVQNeGUfvdV
 1W0XLEjRmKFezC1FhiPOz7CLIgL3BfSU1V+S7p0Gvb6ijZqyZTfRUaWbaD3KJpRT
 1kwzE4qp6IbJMEqgQH/lq0xBzvzF48FPvBslX5kwlm0phQRrMCwMVIafutpu395q
 ySeStEvsTVfz/JUHL3ZaEJyTRjAvPL0lXLH80XUpgWk9GzsksOM=
 =wyqt
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "This contains a replacement driver for Intel iWarp hardware. This new
  driver supports the old ethernet hardware and also newer chips that
  can do ROCE.

  Other than that, this contains the typical mix of patches:

   - Driver updates and cleanups for bnxt_re, cxgb4, mlx4, and mlx5

   - Many static checker driven code clean ups, including a wide
     refcount_t conversion

   - Several series for the hns driver, more HIP09 HW capabilities,
     migration to new HW register manipulators, and code cleanups

   - Minor fixes and improvements in srp, rts, and cm

   - Improvements throughout for sysfs related code to use
     DEVICE_ATTR_*, make the ib_port sysfs first-class, and overall use
     sysfs APIs properly

   - Intel's new irdma driver replacing i40iw

   - rxe general clean ups and Memory Window support"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (211 commits)
  RDMA/core: Always release restrack object
  RDMA/mlx5: Don't access NULL-cleared mpi pointer
  RDMA/irdma: Fix potential overflow expression in irdma_prm_get_pbles
  RDMA/irdma: Check contents of user-space irdma_mem_reg_req object
  RDMA/rxe: Missing unlock on error in get_srq_wqe()
  RDMA/cma: Fix rdma_resolve_route() memory leak
  RDMA/core/sa_query: Remove unused argument
  RDMA/cma: Fix incorrect Packet Lifetime calculation
  RDMA/cma: Protect RMW with qp_mutex
  RDMA/cma: Remove unnecessary INIT->INIT transition
  RDMA/hns: Add window selection field of congestion control
  RDMA/hfi1: Remove use of kmap()
  RDMA/irdma: Remove use of kmap()
  RDMA/bnxt_re: Fix uninitialized struct bit field rsvd1
  IB/isert: Align target max I/O size to initiator size
  RDMA/hns: Fix incorrect vlan enable bit in QPC
  MAINTAINERS: Update Broadcom RDMA maintainers
  RDMA/irdma: Use the queried port attributes
  RDMA/rxe: Fix redundant skb_put_zero
  RDMA/rxe: Fix extra copy in prepare_ack_packet
  ...
2021-07-01 14:54:03 -07:00
Leon Romanovsky
4a754d7637 RDMA/mlx5: Don't access NULL-cleared mpi pointer
The "dev->port[i].mp.mpi" is set to NULL during mlx5_ib_unbind_slave_port()
execution, however that field is needed to add device to unaffiliated list.

Such flow causes to the following kernel panic while unloading mlx5_ib
module in multi-port mode, hence the device should be added to the list
prior to unbind call.

 RPC: Unregistered rdma transport module.
 RPC: Unregistered rdma backchannel transport module.
 BUG: kernel NULL pointer dereference, address: 0000000000000000
 #PF: supervisor write access in kernel mode
 #PF: error_code(0x0002) - not-present page
 PGD 0 P4D 0
 Oops: 0002 [#1] SMP NOPTI
 CPU: 4 PID: 1904 Comm: modprobe Not tainted 5.13.0-rc7_for_upstream_min_debug_2021_06_24_12_08 #1
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
 RIP: 0010:mlx5_ib_cleanup_multiport_master+0x18b/0x2d0 [mlx5_ib]
 Code: 00 04 0f 85 c4 00 00 00 48 89 df e8 ef fa ff ff 48 8b 83 40 0d 00 00 48 8b 15 b9 e8 05 00 4a 8b 44 28 20 48 89 05 ad e8 05 00 <48> c7 00 d0 57 c5 a0 48 89 50 08 48 89 02 39 ab 88 0a 00 00 0f 86
 RSP: 0018:ffff888116ee3df8 EFLAGS: 00010296
 RAX: 0000000000000000 RBX: ffff8881154f6000 RCX: 0000000000000080
 RDX: ffffffffa0c557d0 RSI: ffff88810b69d200 RDI: 000000000002d8a0
 RBP: 0000000000000002 R08: ffff888110780408 R09: 0000000000000000
 R10: ffff88812452e1c0 R11: fffffffffff7e028 R12: 0000000000000000
 R13: 0000000000000080 R14: ffff888102c58000 R15: 0000000000000000
 FS:  00007f884393a740(0000) GS:ffff8882f5a00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 00000001249f6004 CR4: 0000000000370ea0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
  mlx5_ib_stage_init_cleanup+0x16/0xd0 [mlx5_ib]
  __mlx5_ib_remove+0x33/0x90 [mlx5_ib]
  mlx5r_remove+0x22/0x30 [mlx5_ib]
  auxiliary_bus_remove+0x18/0x30
  __device_release_driver+0x177/0x220
  driver_detach+0xc4/0x100
  bus_remove_driver+0x58/0xd0
  auxiliary_driver_unregister+0x12/0x20
  mlx5_ib_cleanup+0x13/0x897 [mlx5_ib]
  __x64_sys_delete_module+0x154/0x230
  ? exit_to_user_mode_prepare+0x104/0x140
  do_syscall_64+0x3f/0x80
  entry_SYSCALL_64_after_hwframe+0x44/0xae
 RIP: 0033:0x7f8842e095c7
 Code: 73 01 c3 48 8b 0d d9 48 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a9 48 2c 00 f7 d8 64 89 01 48
 RSP: 002b:00007ffc68f6e758 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
 RAX: ffffffffffffffda RBX: 00005638207929c0 RCX: 00007f8842e095c7
 RDX: 0000000000000000 RSI: 0000000000000800 RDI: 0000563820792a28
 RBP: 00005638207929c0 R08: 00007ffc68f6d701 R09: 0000000000000000
 R10: 00007f8842e82880 R11: 0000000000000206 R12: 0000563820792a28
 R13: 0000000000000001 R14: 0000563820792a28 R15: 00007ffc68f6fb40
 Modules linked in: xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat xt_addrtype xt_conntrack nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter overlay rdma_ucm ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_ipoib ib_cm ib_umad mlx5_ib(-) mlx4_ib ib_uverbs ib_core mlx4_en mlx4_core mlx5_core ptp pps_core [last unloaded: rpcrdma]
 CR2: 0000000000000000
 ---[ end trace a0bb7e20804e9e9b ]---

Fixes: 7ce6095e3b ("RDMA/mlx5: Don't add slave port to unaffiliated list")
Link: https://lore.kernel.org/r/899ac1b33a995be5ec0e16a4765c4e43c2b1ba5b.1624956444.git.leonro@nvidia.com
Reviewed-by: Itay Aveksis <itayav@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-29 19:57:02 -03:00
Jason Gunthorpe
2833c977c3 Merge branch 'mlx5_realtime_ts' into rdma.git for-next
Aharon Landau says:

====================
In case device supports only real-time timestamp, the kernel will fail to
create QP despite rdma-core requested such timestamp type.

It is because device returns free-running timestamp, and the conversion
from free-running to real-time is performed in the user space.

This series fixes it, by returning real-time timestamp.
====================

* mlx5_realtime_ts:
  RDMA/mlx5: Support real-time timestamp directly from the device
  RDMA/mlx5: Refactor get_ts_format functions to simplify code

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-22 15:08:39 -03:00
Jason Gunthorpe
fdcebbc2ac Linux 5.13-rc7
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmDPuyMeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGvxgH/RKvSuRPwkJ2Jcp9
 VLi5kCbqtJlYLq6tB6peSJ8otKgxkcRwC0pIY4LlYIAWYboktLQ5RKp/9nB2h2FN
 aMZUMu6AI/lVJyFMI5MnKnJIUiUq+WXR3lSSlw68vwFLFdzqUZFNq+bqeiVvnIy1
 yqA6naj24Tu/RbYffQoPvdSJcU2SLXRMxwD8HRGiU2d51RaFsOvsZvF+P5TVcsEV
 ZmttJeER21CaI/A809eqaFmyGrUOcZZK9roZEbMwanTZOMw18biEsLu/UH4kBX01
 JC4+RlGxcWjQ5YNZgChsgoOK/CHzc6ITztTntdeDWAvwZjQFzV7pCy4/3BWne3O+
 5178yHM=
 =o8cN
 -----END PGP SIGNATURE-----

Merge tag 'v5.13-rc7' into rdma.git for-next

Linux 5.13-rc7

Needed for dependencies in following patches. Merge conflict in rxe_cmop.c
resolved by compining both patches.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-22 14:43:51 -03:00
Aharon Landau
336529518e RDMA/mlx5: Support real-time timestamp directly from the device
Currently, if the user asks for a real-time timestamp, the device will
return a free-running one, and the timestamp will be translated to
real-time in the user-space.

When the device supports only real-time timestamp and not free-running,
the creation of the QP will fail even though the user needs supported the
real-time one. To prevent this, we will return the real-time timestamp
directly from the device.

Link: https://lore.kernel.org/r/c6cfc8e6f038575c5c2de6505830f7e74e4de80d.1623829775.git.leonro@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-22 14:23:50 -03:00
Aharon Landau
9a1ac95a59 RDMA/mlx5: Refactor get_ts_format functions to simplify code
QPC, SQC and RQC timestamp formats and capabilities are always equal
because they represent general hardware support. So instead of code
duplication, let's merge them into general enum and logic.

Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2021-06-22 09:35:16 +03:00
Avihai Horon
1477d44ce4 RDMA/mlx5: Enable Relaxed Ordering by default for kernel ULPs
Relaxed Ordering is a capability that can only benefit users that support
it. All kernel ULPs should support Relaxed Ordering, as they are designed
to read data only after observing the CQE and use the DMA API correctly.

Hence, implicitly enable Relaxed Ordering by default for MR transfers in
kernel ULPs.

Link: https://lore.kernel.org/r/b7e820aab7402b8efa63605f4ea465831b3b1e5e.1623236426.git.leonro@nvidia.com
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-21 12:33:08 -03:00
Jakub Kicinski
adc2e56ebe Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Trivial conflicts in net/can/isotp.c and
tools/testing/selftests/net/mptcp/mptcp_connect.sh

scaled_ppm_to_ppb() was moved from drivers/ptp/ptp_clock.c
to include/linux/ptp_clock_kernel.h in -next so re-apply
the fix there.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-06-18 19:47:02 -07:00
Jason Gunthorpe
915e4af59f RDMA: Remove rdma_set_device_sysfs_group()
The driver's device group can be specified as part of the ops structure
like the device's port group. No need for the complicated API.

Link: https://lore.kernel.org/r/8964785a34fd3a29ff5b6693493f575b717e594d.1623427137.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-16 20:58:32 -03:00
Jason Gunthorpe
4b5f4d3fb4 RDMA: Split the alloc_hw_stats() ops to port and device variants
This is being used to implement both the port and device global stats,
which is causing some confusion in the drivers. For instance EFA and i40iw
both seem to be misusing the device stats.

Split it into two ops so drivers that don't support one or the other can
leave the op NULL'd, making the calling code a little simpler to
understand.

Link: https://lore.kernel.org/r/1955c154197b2a159adc2dc97266ddc74afe420c.1623427137.git.leonro@nvidia.com
Tested-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-16 20:58:29 -03:00
Leon Romanovsky
7ce6095e3b RDMA/mlx5: Don't add slave port to unaffiliated list
The mlx5_ib_bind_slave_port() doesn't remove multiport device from the
unaffiliated list, but mlx5_ib_unbind_slave_port() did it. This unbalanced
flow caused to the situation where mlx5_ib_unaffiliated_port_list was
changed during iteration.

Fixes: 32f69e4be2 ("{net, IB}/mlx5: Manage port association for multiport RoCE")
Link: https://lore.kernel.org/r/2726e6603b1e6ecfe76aa5a12a063af72173bcf7.1622477058.git.leonro@nvidia.com
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-16 15:26:01 -03:00
Shay Drory
c36326d38d net/mlx5: Round-Robin EQs over IRQs
Whenever users provided affinity for an EQ creation request, map the
EQ to a matching IRQ.
Matching IRQ=IRQ with the same affinity and type (completion/control) of
the EQ created.

This mapping is being done in agressive dedicated IRQ allocation scheme,
which described bellow.

First, we check whether there is a matching IRQ that his min threshold
is not exhausted.
   - min_eqs_threshold = 3 for control EQ.
   - min_eqs_threshold = 1 for completion EQ.
In case no matching IRQ was found, try to request a new IRQ.
In case we can't request a new IRQ, reuse least-used matching IRQ.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-14 20:58:00 -07:00
Leon Romanovsky
e4e3f24b82 net/mlx5: Provide cpumask at EQ creation phase
The users of EQ are running their code on different CPUs and with
various affinity patterns. Move the cpumask setting close to their
actual usage.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-14 20:57:57 -07:00
Alaa Hleihel
2ba0aa2fee IB/mlx5: Fix initializing CQ fragments buffer
The function init_cq_frag_buf() can be called to initialize the current CQ
fragments buffer cq->buf, or the temporary cq->resize_buf that is filled
during CQ resize operation.

However, the offending commit started to use function get_cqe() for
getting the CQEs, the issue with this change is that get_cqe() always
returns CQEs from cq->buf, which leads us to initialize the wrong buffer,
and in case of enlarging the CQ we try to access elements beyond the size
of the current cq->buf and eventually hit a kernel panic.

 [exception RIP: init_cq_frag_buf+103]
  [ffff9f799ddcbcd8] mlx5_ib_resize_cq at ffffffffc0835d60 [mlx5_ib]
  [ffff9f799ddcbdb0] ib_resize_cq at ffffffffc05270df [ib_core]
  [ffff9f799ddcbdc0] llt_rdma_setup_qp at ffffffffc0a6a712 [llt]
  [ffff9f799ddcbe10] llt_rdma_cc_event_action at ffffffffc0a6b411 [llt]
  [ffff9f799ddcbe98] llt_rdma_client_conn_thread at ffffffffc0a6bb75 [llt]
  [ffff9f799ddcbec8] kthread at ffffffffa66c5da1
  [ffff9f799ddcbf50] ret_from_fork_nospec_begin at ffffffffa6d95ddd

Fix it by getting the needed CQE by calling mlx5_frag_buf_get_wqe() that
takes the correct source buffer as a parameter.

Fixes: 388ca8be00 ("IB/mlx5: Implement fragmented completion queue (CQ)")
Link: https://lore.kernel.org/r/90a0e8c924093cfa50a482880ad7e7edb73dc19a.1623309971.git.leonro@nvidia.com
Signed-off-by: Alaa Hleihel <alaa@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-10 08:59:34 -03:00
Aharon Landau
6466f03fdf RDMA/mlx5: Delete right entry from MR signature database
The value mr->sig is stored in the entry upon mr allocation, however, ibmr
is wrongly entered here as "old", therefore, xa_cmpxchg() does not replace
the entry with NULL, which leads to the following trace:

 WARNING: CPU: 28 PID: 2078 at drivers/infiniband/hw/mlx5/main.c:3643 mlx5_ib_stage_init_cleanup+0x4d/0x60 [mlx5_ib]
 Modules linked in: nvme_rdma nvme_fabrics nvme_core 8021q garp mrp bonding bridge stp llc rfkill rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_tad
 CPU: 28 PID: 2078 Comm: reboot Tainted: G               X --------- ---  5.13.0-0.rc2.19.el9.x86_64 #1
 Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 2.9.1 12/07/2018
 RIP: 0010:mlx5_ib_stage_init_cleanup+0x4d/0x60 [mlx5_ib]
 Code: 8d bb 70 1f 00 00 be 00 01 00 00 e8 9d 94 ce da 48 3d 00 01 00 00 75 02 5b c3 0f 0b 5b c3 0f 0b 48 83 bb b0 20 00 00 00 74 d5 <0f> 0b eb d1 4
 RSP: 0018:ffffa8db06d33c90 EFLAGS: 00010282
 RAX: 0000000000000000 RBX: ffff97f890a44000 RCX: ffff97f900ec0160
 RDX: 0000000000000000 RSI: 0000000080080001 RDI: ffff97f890a44000
 RBP: ffffffffc0c189b8 R08: 0000000000000001 R09: 0000000000000000
 R10: 0000000000000001 R11: 0000000000000300 R12: ffff97f890a44000
 R13: ffffffffc0c36030 R14: 00000000fee1dead R15: 0000000000000000
 FS:  00007f0d5a8a3b40(0000) GS:ffff98077fb80000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000555acbf4f450 CR3: 00000002a6f56002 CR4: 00000000001706e0
 Call Trace:
  mlx5r_remove+0x39/0x60 [mlx5_ib]
  auxiliary_bus_remove+0x1b/0x30
  __device_release_driver+0x17a/0x230
  device_release_driver+0x24/0x30
  bus_remove_device+0xdb/0x140
  device_del+0x18b/0x3e0
  mlx5_detach_device+0x59/0x90 [mlx5_core]
  mlx5_unload_one+0x22/0x60 [mlx5_core]
  shutdown+0x31/0x3a [mlx5_core]
  pci_device_shutdown+0x34/0x60
  device_shutdown+0x15b/0x1c0
  __do_sys_reboot.cold+0x2f/0x5b
  ? vfs_writev+0xc7/0x140
  ? handle_mm_fault+0xc5/0x290
  ? do_writev+0x6b/0x110
  do_syscall_64+0x40/0x80
  entry_SYSCALL_64_after_hwframe+0x44/0xae

Fixes: e6fb246cca ("RDMA/mlx5: Consolidate MR destruction to mlx5_ib_dereg_mr()")
Link: https://lore.kernel.org/r/f3f585ea0db59c2a78f94f65eedeafc5a2374993.1623309971.git.leonro@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-10 08:59:34 -03:00
Maor Gottlieb
2adcb4c5a5 RDMA: Verify port when creating flow rule
Validate port value provided by the user and with that remove no longer
needed validation by the driver.  The missing check in the mlx5_ib driver
could cause to the below oops.

Call trace:
  _create_flow_rule+0x2d4/0xf28 [mlx5_ib]
  mlx5_ib_create_flow+0x2d0/0x5b0 [mlx5_ib]
  ib_uverbs_ex_create_flow+0x4cc/0x624 [ib_uverbs]
  ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0xd4/0x150 [ib_uverbs]
  ib_uverbs_cmd_verbs.isra.7+0xb28/0xc50 [ib_uverbs]
  ib_uverbs_ioctl+0x158/0x1d0 [ib_uverbs]
  do_vfs_ioctl+0xd0/0xaf0
  ksys_ioctl+0x84/0xb4
  __arm64_sys_ioctl+0x28/0xc4
  el0_svc_common.constprop.3+0xa4/0x254
  el0_svc_handler+0x84/0xa0
  el0_svc+0x10/0x26c
 Code: b9401260 f9615681 51000400 8b001c20 (f9403c1a)

Fixes: 436f2ad05a ("IB/core: Export ib_create/destroy_flow through uverbs")
Link: https://lore.kernel.org/r/faad30dc5219a01727f47db3dc2f029d07c82c00.1623309971.git.leonro@nvidia.com
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-10 08:59:33 -03:00
Yevgeny Kliteynik
3f3f05ab88 net/mlx5: Added new parameters to reformat context
Adding new reformat context type (INSERT_HEADER) requires adding two new
parameters to reformat context - reformat_param_0 and reformat_param_1.
As defined by HW spec, these parameters have different meaning for
different reformat context type.

The first parameter (reformat_param_0) is not new to HW spec, but it
wasn't used by any of the supported reformats. The second parameter
(reformat_param_1) is new to the HW spec - it was added to allow
supporting INSERT_HEADER.

For NSERT_HEADER, reformat_param_0 indicates the header used to
reference the location of the inserted header, and reformat_param_1
indicates the offset of the inserted header from the reference point
defined by reformat_param_0.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-09 18:36:07 -07:00
Mark Bloch
edc0b0bccc RDMA/mlx5: Block FDB rules when not in switchdev mode
Allow creating FDB steering rules only when in switchdev mode.

The only software model where a userspace application can manipulate
FDB entries is when it manages the eswitch. This is only possible in
switchdev mode where we expose a single RDMA device with representors
for all the vports that are connected to the eswitch.

Fixes: 52438be441 ("RDMA/mlx5: Allow inserting a steering rule to the FDB")
Link: https://lore.kernel.org/r/e928ae7c58d07f104716a2a8d730963d1bd01204.1623052923.git.leonro@nvidia.com
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-08 17:02:15 -03:00
Mark Zhang
a0ffb4c12f RDMA/mlx5: Use different doorbell memory for different processes
In a fork scenario, the parent and child can have same virtual address and
also share the uverbs fd.  That causes to the list_for_each_entry search
return same doorbell physical page for all processes, even though that
page has been COW' or copied.

This patch takes the mm_struct into consideration during search, to make
sure that VA's belonging to different processes are not intermixed.

Resolves the malfunction of uverbs after fork in some specific cases.

Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Link: https://lore.kernel.org/r/feacc23fe0bc6e1088c6824d5583798745e72405.1622726212.git.leonro@nvidia.com
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-03 14:19:53 -03:00
Leon Romanovsky
f974428872 RDMA/core: Sanitize WQ state received from the userspace
The mlx4 and mlx5 implemented differently the WQ input checks.  Instead of
duplicating mlx4 logic in the mlx5, let's prepare the input in the central
place.

The mlx5 implementation didn't check for validity of state input.  It is
not real bug because our FW checked that, but still worth to fix.

Fixes: f213c05272 ("IB/uverbs: Add WQ support")
Link: https://lore.kernel.org/r/ac41ad6a81b095b1a8ad453dcf62cf8d3c5da779.1621413310.git.leonro@nvidia.com
Reported-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-02 15:20:11 -03:00
Linus Torvalds
d7c5303fbc Networking fixes for 5.13-rc4, including fixes from bpf, netfilter,
can and wireless trees. Notably including fixes for the recently
 announced "FragAttacks" WiFi vulnerabilities. Rather large batch,
 touching some core parts of the stack, too, but nothing hair-raising.
 
 Current release - regressions:
 
  - tipc: make node link identity publish thread safe
 
  - dsa: felix: re-enable TAS guard band mode
 
  - stmmac: correct clocks enabled in stmmac_vlan_rx_kill_vid()
 
  - stmmac: fix system hang if change mac address after interface ifdown
 
 Current release - new code bugs:
 
  - mptcp: avoid OOB access in setsockopt()
 
  - bpf: Fix nested bpf_bprintf_prepare with more per-cpu buffers
 
  - ethtool: stats: fix a copy-paste error - init correct array size
 
 Previous releases - regressions:
 
  - sched: fix packet stuck problem for lockless qdisc
 
  - net: really orphan skbs tied to closing sk
 
  - mlx4: fix EEPROM dump support
 
  - bpf: fix alu32 const subreg bound tracking on bitwise operations
 
  - bpf: fix mask direction swap upon off reg sign change
 
  - bpf, offload: reorder offload callback 'prepare' in verifier
 
  - stmmac: Fix MAC WoL not working if PHY does not support WoL
 
  - packetmmap: fix only tx timestamp on request
 
  - tipc: skb_linearize the head skb when reassembling msgs
 
 Previous releases - always broken:
 
  - mac80211: address recent "FragAttacks" vulnerabilities
 
  - mac80211: do not accept/forward invalid EAPOL frames
 
  - mptcp: avoid potential error message floods
 
  - bpf, ringbuf: deny reserve of buffers larger than ringbuf to prevent
                  out of buffer writes
 
  - bpf: forbid trampoline attach for functions with variable arguments
 
  - bpf: add deny list of functions to prevent inf recursion of tracing
         programs
 
  - tls splice: check SPLICE_F_NONBLOCK instead of MSG_DONTWAIT
 
  - can: isotp: prevent race between isotp_bind() and isotp_setsockopt()
 
  - netfilter: nft_set_pipapo_avx2: Add irq_fpu_usable() check,
               fallback to non-AVX2 version
 
 Misc:
 
  - bpf: add kconfig knob for disabling unpriv bpf by default
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmCuy2gACgkQMUZtbf5S
 IruE5BAAhihia5EaiV71Bz/Cqr/d+osv5u283riKT8kBft0bWFVFFnT3iweWyR0/
 5X+bB6zmr80Cuqh45ZeYyq+zJtiAAlsbD5hqBIGdMriSWLxciNKjVJRzuEjuqnek
 USMW/LqGyf4NhmLogmQKpx8XcKSG7VYuK7vPrsH8us1dL5vIssceIXn8R9Dzj9NN
 P77K5Z+Oka8XQJgetNLxR3tDAM/92RwIshotkhJbRwgiUvzb+wbnrnSOAZCIPgku
 ydJyOxOklln1Sx07SejgzEl33ri0CkioDPThBWpOn7Mu0JrYKukXPKludoZcRYuJ
 2jNLYfbH0ZS5EkOfk89h7j7MDoAJMUK72M+S1w5DEYz6eH2EjhAq9noZ6E1iQH+U
 9vfoIvQjPh6Zhyk5QeM4dpt0cvR7rSElXkLVxo/x0dSBAi2rIng1bKeCUtv2J689
 CsoD0oghtEzvUTYVxY6iNr15OFGl6KsZv4tVQ709gGA36sDlK8ozGbJH5WReobBl
 f8H2WJlj2tVW5V75yUoio8TumDw34yk/5xlJFzm9GOwkqBrUcqOraHtHdUIsa4qr
 KbELQQ9QVt4zYdLAiWy5BL/QLycp0ibmA1IB8W1bxEVSK1JXzREHzPxv85KOfZkn
 8+vzNHmk2PEZYYsExiEykc5jXKOCPs8L0rJ6p4OverlbpDZcwIg=
 =peMK
 -----END PGP SIGNATURE-----

Merge tag 'net-5.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Networking fixes for 5.13-rc4, including fixes from bpf, netfilter,
  can and wireless trees. Notably including fixes for the recently
  announced "FragAttacks" WiFi vulnerabilities. Rather large batch,
  touching some core parts of the stack, too, but nothing hair-raising.

  Current release - regressions:

   - tipc: make node link identity publish thread safe

   - dsa: felix: re-enable TAS guard band mode

   - stmmac: correct clocks enabled in stmmac_vlan_rx_kill_vid()

   - stmmac: fix system hang if change mac address after interface
     ifdown

  Current release - new code bugs:

   - mptcp: avoid OOB access in setsockopt()

   - bpf: Fix nested bpf_bprintf_prepare with more per-cpu buffers

   - ethtool: stats: fix a copy-paste error - init correct array size

  Previous releases - regressions:

   - sched: fix packet stuck problem for lockless qdisc

   - net: really orphan skbs tied to closing sk

   - mlx4: fix EEPROM dump support

   - bpf: fix alu32 const subreg bound tracking on bitwise operations

   - bpf: fix mask direction swap upon off reg sign change

   - bpf, offload: reorder offload callback 'prepare' in verifier

   - stmmac: Fix MAC WoL not working if PHY does not support WoL

   - packetmmap: fix only tx timestamp on request

   - tipc: skb_linearize the head skb when reassembling msgs

  Previous releases - always broken:

   - mac80211: address recent "FragAttacks" vulnerabilities

   - mac80211: do not accept/forward invalid EAPOL frames

   - mptcp: avoid potential error message floods

   - bpf, ringbuf: deny reserve of buffers larger than ringbuf to
     prevent out of buffer writes

   - bpf: forbid trampoline attach for functions with variable arguments

   - bpf: add deny list of functions to prevent inf recursion of tracing
     programs

   - tls splice: check SPLICE_F_NONBLOCK instead of MSG_DONTWAIT

   - can: isotp: prevent race between isotp_bind() and
     isotp_setsockopt()

   - netfilter: nft_set_pipapo_avx2: Add irq_fpu_usable() check,
     fallback to non-AVX2 version

  Misc:

   - bpf: add kconfig knob for disabling unpriv bpf by default"

* tag 'net-5.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (172 commits)
  net: phy: Document phydev::dev_flags bits allocation
  mptcp: validate 'id' when stopping the ADD_ADDR retransmit timer
  mptcp: avoid error message on infinite mapping
  mptcp: drop unconditional pr_warn on bad opt
  mptcp: avoid OOB access in setsockopt()
  nfp: update maintainer and mailing list addresses
  net: mvpp2: add buffer header handling in RX
  bnx2x: Fix missing error code in bnx2x_iov_init_one()
  net: zero-initialize tc skb extension on allocation
  net: hns: Fix kernel-doc
  sctp: fix the proc_handler for sysctl encap_port
  sctp: add the missing setting for asoc encap_port
  bpf, selftests: Adjust few selftest result_unpriv outcomes
  bpf: No need to simulate speculative domain for immediates
  bpf: Fix mask direction swap upon off reg sign change
  bpf: Wrap aux data inside bpf_sanitize_info container
  bpf: Fix BPF_LSM kconfig symbol dependency
  selftests/bpf: Add test for l3 use of bpf_redirect_peer
  bpftool: Add sock_release help info for cgroup attach/prog load command
  net: dsa: microchip: enable phy errata workaround on 9567
  ...
2021-05-26 17:44:49 -10:00
Maor Gottlieb
9ecf6ac17c RDMA/mlx5: Take qp type from mlx5_ib_qp
Change all the places in the mlx5_ib driver to take the qp type from the
mlx5_ib_qp struct, except the QP initialization flow. It will ensure that
we check the right QP type also for vendor specific QPs.

Link: https://lore.kernel.org/r/b2e16cd65b59cd24fa81c01c7989248da44e58ea.1621413899.git.leonro@nvidia.com
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-26 16:49:42 -03:00
Lang Cheng
0bedd3d005 RDMA/mlx5: Remove unused parameter udata
The old version of ib_umem_get() need these udata as a parameter but now
they are unnecessary.

Fixes: c320e527e1 ("IB: Allow calls to ib_umem_get from kernel ULPs")
Link: https://lore.kernel.org/r/1620807142-39157-4-git-send-email-liweihang@huawei.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-20 11:52:17 -03:00
Sergey Gorenko
c906b86e9c RDMA/mlx5: Add SQD2RTS bit to the alloc ucontext response
The new bit in the comp_mask is needed to mark that kernel supports
SQD2RTS transition for the modify QP command.

Link: https://lore.kernel.org/r/7ce705fedac1b2b8e3a2f4013e04244dc5946344.1620641808.git.leonro@nvidia.com
Reviewed-by: Evgenii Kochetov <evgeniik@nvidia.com>
Signed-off-by: Sergey Gorenko <sergeygo@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-20 11:41:07 -03:00
Sergey Gorenko
021c1f24f0 RDMA/mlx5: Support SQD2RTS for modify QP
The transition of the QP state from SQD to RTS is allowed by the IB
specification. The hardware also supports that, but it is not
implemented in mlx5_ib.

This commit adds SQD2RTS command to the modify QP in mlx5_ib to support
the missing feature. The feature is required by the signature pipelining
API that will be added to rdma-core.

Link: https://lore.kernel.org/r/ab4876360bfba0e9d64a5e8599438e32e0cb351e.1620641808.git.leonro@nvidia.com
Reviewed-by: Evgenii Kochetov <evgeniik@nvidia.com>
Signed-off-by: Sergey Gorenko <sergeygo@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-20 11:41:07 -03:00
Maor Gottlieb
cfa3b79711 RDMA/mlx5: Fix query DCT via DEVX
When executing DEVX command to query QP object, we need to take the QP
type from the mlx5_ib_qp struct which hold the driver specific QP types as
well, such as DC.

Fixes: 34613eb1d2 ("IB/mlx5: Enable modify and query verbs objects via DEVX")
Link: https://lore.kernel.org/r/6eee15d63f09bb70787488e0cf96216e2957f5aa.1621413654.git.leonro@nvidia.com
Reviewed-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-19 13:27:54 -03:00
Maor Gottlieb
3410fbcd47 {net, RDMA}/mlx5: Fix override of log_max_qp by other device
mlx5_core_dev holds pointer to static profile, hence when the
log_max_qp of the profile is override by some device, then it
effect all other mlx5 devices that share the same profile.
Fix it by having a profile instance for every mlx5 device.

Fixes: 883371c453 ("net/mlx5: Check FW limitations on log_max_qp before setting it")
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-05-18 23:01:19 -07:00
Maor Gottlieb
97f30d324c RDMA/mlx5: Recover from fatal event in dual port mode
When there is fatal event on the slave port, the device is marked as not
active. We need to mark it as active again when the slave is recovered to
regain full functionality.

Fixes: d69a24e036 ("IB/mlx5: Move IB event processing onto a workqueue")
Link: https://lore.kernel.org/r/8906754455bb23019ef223c725d2c0d38acfb80b.1620711734.git.leonro@nvidia.com
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-17 13:52:47 -03:00
Maor Gottlieb
6863b4d7bf RDMA/mlx5: Verify that DM operation is reasonable
Fix the complaint from smatch by verifing that the user requested DM
operation is not greater than 31.

divers/infiniband/hw/mlx5/dm.c:220 mlx5_ib_handler_MLX5_IB_METHOD_DM_MAP_OP_ADDR()
error: undefined (user controlled) shift '(((1))) << op'

Fixes: cea85fa5db ("RDMA/mlx5: Add support in MEMIC operations")
Link: https://lore.kernel.org/r/458b1d7710c3cf01360c8771893f483665569786.1620711734.git.leonro@nvidia.com
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-17 13:52:47 -03:00
Yang Li
e6a1f8c696 RDMA/mlx5: Remove redundant assignment to ret
Variable 'ret' is set to the rerurn value of function
mlx5_mr_cache_alloc() but this value is never read as it is overwritten
with a new value later on, hence it is a redundant assignment and can be
removed

Clean up the following clang-analyzer warning:

 drivers/infiniband/hw/mlx5/odp.c:421:2: warning: Value stored to 'ret' is never read [clang-analyzer-deadcode.DeadStores]

Fixes: e6fb246cca ("RDMA/mlx5: Consolidate MR destruction to mlx5_ib_dereg_mr()")
Link: https://lore.kernel.org/r/1620296001-120406-1-git-send-email-yang.lee@linux.alibaba.com
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-11 13:41:03 -03:00
Linus Torvalds
f34b2cf178 RDMA merge window pull request
This is significantly bug fixes and general cleanups. The noteworthy new
 features are fairly small:
 
 - XRC support for HNS and improves RQ operations
 
 - Bug fixes and updates for hns, mlx5, bnxt_re, hfi1, i40iw, rxe, siw and
   qib
 
 - Quite a few general cleanups on spelling, error handling, static checker
   detections, etc
 
 - Increase the number of device ports supported beyond 255. High port
   count software switches now exist
 
 - Several bug fixes for rtrs
 
 - mlx5 Device Memory support for host controlled atomics
 
 - Report SRQ tables through to rdma-tool
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAmCMMHEACgkQOG33FX4g
 mxri3Q//RAgIExCGHebQ9xkptZHVyTLLJMpiMl2cqk3ZVRdDZ7QdiQjIqY2KqlUK
 nxBj7EXJeX6rV5a1xqCcOO1gBetB28TSwnCNE2ZqrXP5B59ISW8D052IWza3UkUz
 WmHLARxHQlyKBWA4+ZAgfoUGL0NmWA8QPf56t/RK/3/OsuYnGzcnWmmFbt8XKFcH
 NtO3KC45mKWDqqG0A0XRrLbEQz/ElO3OuPBqlBKgB3ZgGPzgsOUTOGkm1tCcZ89L
 /pvZGB7SklKZdCX8TxdpVGd9h0zHl8pqh1yEzvTA1ypNAYSUId2mvZXluU8J5yJl
 FLk7E1IxE5050FNEc7T5uZdUVntulYiqL2558coRI34l5w26pKGjIMxw/nTB8hg8
 4ZfBtKVemIG6yzW5Up6iBpK7qWYpvLWVShwYAWhbNsjN7JGzJuh1gJnjbmYgyz2P
 RTMU9wjFPLL2wZxg4LDHACVJNBb82j6KKuE+kZWpk11ro7INw9+7YwRuTo7/ezxC
 BwXKu8wF4igwSigV55jM+WnGXLhxdC3qmx/2cbtWyLM/PzdRL96tM0RWW5v8/Nv7
 teFhkt+f3RVqcfYH5K1qCXy3UFrxG6bxFSvcHHSBx2bdIrqhuTY5FqszAYImeW2j
 iHoyIsuSuGu79HQgOzAQZsEyksWi6OYDvA9Q9VBoPP4bJ3DOAa4=
 =vsXA
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "This is significantly bug fixes and general cleanups. The noteworthy
  new features are fairly small:

   - XRC support for HNS and improves RQ operations

   - Bug fixes and updates for hns, mlx5, bnxt_re, hfi1, i40iw, rxe, siw
     and qib

   - Quite a few general cleanups on spelling, error handling, static
     checker detections, etc

   - Increase the number of device ports supported beyond 255. High port
     count software switches now exist

   - Several bug fixes for rtrs

   - mlx5 Device Memory support for host controlled atomics

   - Report SRQ tables through to rdma-tool"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (145 commits)
  IB/qib: Remove redundant assignment to ret
  RDMA/nldev: Add copy-on-fork attribute to get sys command
  RDMA/bnxt_re: Fix a double free in bnxt_qplib_alloc_res
  RDMA/siw: Fix a use after free in siw_alloc_mr
  IB/hfi1: Remove redundant variable rcd
  RDMA/nldev: Add QP numbers to SRQ information
  RDMA/nldev: Return SRQ information
  RDMA/restrack: Add support to get resource tracking for SRQ
  RDMA/nldev: Return context information
  RDMA/core: Add CM to restrack after successful attachment to a device
  RDMA/cma: Skip device which doesn't support CM
  RDMA/rxe: Fix a bug in rxe_fill_ip_info()
  RDMA/mlx5: Expose private query port
  RDMA/mlx4: Remove an unused variable
  RDMA/mlx5: Fix type assignment for ICM DM
  IB/mlx5: Set right RoCE l3 type and roce version while deleting GID
  RDMA/i40iw: Fix error unwinding when i40iw_hmc_sd_one fails
  RDMA/cxgb4: add missing qpid increment
  IB/ipoib: Remove unnecessary struct declaration
  RDMA/bnxt_re: Get rid of custom module reference counting
  ...
2021-05-01 09:15:05 -07:00
Mark Bloch
9a89d3ad6d RDMA/mlx5: Expose private query port
Expose a non standard query port via IOCTL that will be used to expose
port attributes that are specific to mlx5 devices.

The new interface receives a port number to query and returns a structure
that contains the available attributes for that port.  This will be used
to fill the gap between pure DEVX use cases and use cases where a kernel
needs to inform userspace about various kernel driver configurations that
userspace must use in order to work correctly.

Flags is used to indicate which fields are valid on return.

MLX5_IB_UAPI_QUERY_PORT_VPORT:
	The vport number of the queered port.

MLX5_IB_UAPI_QUERY_PORT_VPORT_VHCA_ID:
	The VHCA ID of the vport of the queered port.

MLX5_IB_UAPI_QUERY_PORT_VPORT_STEERING_ICM_RX:
	The vport's RX ICM address used for sw steering.

MLX5_IB_UAPI_QUERY_PORT_VPORT_STEERING_ICM_TX:
	The vport's TX ICM address used for sw steering.

MLX5_IB_UAPI_QUERY_PORT_VPORT_REG_C0:
	The metadata used to tag egress packets of the vport.

MLX5_IB_UAPI_QUERY_PORT_ESW_OWNER_VHCA_ID:
	The E-Switch owner vhca id of the vport.

Link: https://lore.kernel.org/r/6e2ef13e5a266a6c037eb0105eb1564c7bb52f23.1618743394.git.leonro@nvidia.com
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-20 17:00:18 -03:00
Maor Gottlieb
982b7c140e RDMA/mlx5: Fix type assignment for ICM DM
We should hold the UAPI DM type in the base struct and not the internal
mlx5 type.

Fixes: 251b9d7887 ("RDMA/mlx5: Re-organize the DM code")
Link: https://lore.kernel.org/r/58dedbd5c132660f808e59166d434e2eaa6ecf7a.1618753425.git.leonro@nvidia.com
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-20 09:42:38 -03:00
Parav Pandit
dedbc2d358 IB/mlx5: Set right RoCE l3 type and roce version while deleting GID
Currently when GID is deleted, it zero out all the fields of the RoCE
address in the SET_ROCE_ADDRESS command for a specified index.

roce_version = 0 means RoCEv1 in the SET_ROCE_ADDRESS command.

This assumes that device has RoCEv1 always enabled which is not always
correct. For example Subfunction does not support RoCEv1.

Due to this assumption a previously added RoCEv2 GID is always deleted as
RoCEv1 GID. This results in a below syndrome:

   mlx5_core.sf mlx5_core.sf.4: mlx5_cmd_check:777:(pid 4256): SET_ROCE_ADDRESS(0x761) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x12822d)

Hence set the right RoCE version during GID deletion provided by the core.

Link: https://lore.kernel.org/r/d3f54129c90ca329caf438dbe31875d8ad08d91a.1618753425.git.leonro@nvidia.com
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-20 09:41:10 -03:00
Jason Gunthorpe
fe73f96e7b Merge branch 'mlx5_memic_ops' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Maor Gottlieb says:
====================
This series from Maor extends MEMIC to support atomic operations from the
host in addition to already supported regular read/write.
====================

* 'memic_ops':
  RDMA/mlx5: Expose UAPI to query DM
  RDMA/mlx5: Add support in MEMIC operations
  RDMA/mlx5: Add support to MODIFY_MEMIC command
  RDMA/mlx5: Re-organize the DM code
  RDMA/mlx5: Move all DM logic to separate file
  RDMA/uverbs: Make UVERBS_OBJECT_METHODS to consider line number
  net/mlx5: Add MEMIC operations related bits
2021-04-13 19:37:17 -03:00
Maor Gottlieb
18731642d4 RDMA/mlx5: Expose UAPI to query DM
Expose UAPI to query MEMIC DM, this will let user space application
that didn't allocate the DM but has access to by owning the matching
command FD to retrieve its information.

Link: https://lore.kernel.org/r/20210411122924.60230-8-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-13 19:36:37 -03:00
Maor Gottlieb
cea85fa5db RDMA/mlx5: Add support in MEMIC operations
MEMIC buffer, in addition to regular read and write operations, can
support atomic operations from the host.

Introduce and implement new UAPI to allocate address space for MEMIC
operations such as atomic. This includes:

1. Expose new IOCTL for request mapping of MEMIC operation.
2. Hold the operations address in a list, so same operation to same DM
   will be allocated only once.
3. Manage refcount on the mlx5_ib_dm object, so it would be keep valid
   until all addresses were unmapped.

Link: https://lore.kernel.org/r/20210411122924.60230-7-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-13 19:36:36 -03:00
Maor Gottlieb
39cc792ff2 RDMA/mlx5: Add support to MODIFY_MEMIC command
Add two functions to allocate and deallocate MEMIC operations
by using the MODIFY_MEMIC command.

Link: https://lore.kernel.org/r/20210411122924.60230-6-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-13 19:36:36 -03:00
Maor Gottlieb
251b9d7887 RDMA/mlx5: Re-organize the DM code
1. Inline the checks from check_dm_type_support() into their
   respective allocation functions.
2. Fix use after free when driver fails to copy the MEMIC address to the
   user by moving the allocation code into their respective functions,
   hence we avoid the explicit call to free the DM in the error flow.
3. Split mlx5_ib_dm struct to memic and icm proper typesafety
   throughout.

Fixes: dc2316eba7 ("IB/mlx5: Fix device memory flows")
Link: https://lore.kernel.org/r/20210411122924.60230-5-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-13 19:36:36 -03:00
Maor Gottlieb
831df88381 RDMA/mlx5: Move all DM logic to separate file
Move all device memory related code to a separate file.

Link: https://lore.kernel.org/r/20210411122924.60230-4-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-13 19:36:36 -03:00
Jason Gunthorpe
a0354d2308 Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Saeed Mahameed says:
====================

This pr contains changes from  mlx5-next branch,
already reviewed on netdev and rdma mailing lists, links below.

1) From Leon, Dynamically assign MSI-X vectors count
Already Acked by Bjorn Helgaas.
https://patchwork.kernel.org/project/netdevbpf/cover/20210314124256.70253-1-leon@kernel.org/

2) Cleanup series:
https://patchwork.kernel.org/project/netdevbpf/cover/20210311070915.321814-1-saeed@kernel.org/

From Mark, E-Switch cleanups and refactoring, and the addition
of single FDB mode needed HW bits.

From Mikhael, Remove unused struct field

From Saeed, Cleanup W=1 prototype warning

From Zheng, Esw related cleanup

From Tariq, User order-0 page allocation for EQs

====================

* mlx5-next:
  net/mlx5: Implement sriov_get_vf_total_msix/count() callbacks
  net/mlx5: Dynamically assign MSI-X vectors count
  net/mlx5: Add dynamic MSI-X capabilities bits
  PCI/IOV: Add sysfs MSI-X vector assignment interface
  net/mlx5: Use order-0 allocations for EQs
  net/mlx5: Add IFC bits needed for single FDB mode
  net/mlx5: E-Switch, Refactor send to vport to be more generic
  RDMA/mlx5: Use representor E-Switch when getting netdev and metadata
  net/mlx5: E-Switch, Add eswitch pointer to each representor
  net/mlx5: E-Switch, Add match on vhca id to default send rules
  net/mlx5: Remove unused mlx5_core_health member recover_work
  net/mlx5: simplify the return expression of mlx5_esw_offloads_pair()
  net/mlx5: Cleanup prototype warning

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-12 13:49:48 -03:00
Jakub Kicinski
95b5c29132 Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Saeed Mahameed says:

====================
mlx5-next 2021-04-09

This pr contains changes from  mlx5-next branch,
already reviewed on netdev and rdma mailing lists, links below.

1) From Leon, Dynamically assign MSI-X vectors count
Already Acked by Bjorn Helgaas.
https://patchwork.kernel.org/project/netdevbpf/cover/20210314124256.70253-1-leon@kernel.org/

2) Cleanup series:
https://patchwork.kernel.org/project/netdevbpf/cover/20210311070915.321814-1-saeed@kernel.org/

From Mark, E-Switch cleanups and refactoring, and the addition
of single FDB mode needed HW bits.

From Mikhael, Remove unused struct field

From Saeed, Cleanup W=1 prototype warning

From Zheng, Esw related cleanup

From Tariq, User order-0 page allocation for EQs

* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  net/mlx5: Implement sriov_get_vf_total_msix/count() callbacks
  net/mlx5: Dynamically assign MSI-X vectors count
  net/mlx5: Add dynamic MSI-X capabilities bits
  PCI/IOV: Add sysfs MSI-X vector assignment interface
  net/mlx5: Use order-0 allocations for EQs
  net/mlx5: Add IFC bits needed for single FDB mode
  net/mlx5: E-Switch, Refactor send to vport to be more generic
  RDMA/mlx5: Use representor E-Switch when getting netdev and metadata
  net/mlx5: E-Switch, Add eswitch pointer to each representor
  net/mlx5: E-Switch, Add match on vhca id to default send rules
  net/mlx5: Remove unused mlx5_core_health member recover_work
  net/mlx5: simplify the return expression of mlx5_esw_offloads_pair()
  net/mlx5: Cleanup prototype warning
====================

Link: https://lore.kernel.org/r/20210409200704.10886-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-04-09 18:07:21 -07:00
Praveen Kumar Kannoju
7e111bbff9 IB/mlx5: Reduce max order of memory allocated for xlt update
To update xlt (during mlx5_ib_reg_user_mr()), the driver can request up to
1 MB (order-8) memory, depending on the size of the MR. This costly
allocation can sometimes take very long to return (a few seconds). This
causes the calling application to hang for a long time, especially when
the system is fragmented.  To avoid these long latency spikes, the calls
the higher order allocations need to fail faster in case they are not
available.

In order to acheive this we need __GFP_NORETRY flag in the gfp_mask before
during fetching the free pages. Allow the algorithm to automatically fall
back to smaller page sizes.

Link: https://lore.kernel.org/r/1617425635-35631-1-git-send-email-praveen.kannoju@oracle.com
Signed-off-by: Praveen Kumar Kannoju <praveen.kannoju@oracle.com>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-07 20:25:17 -03:00
Shay Drory
e5dc370bd9 RDMA/mlx5: Set ODP caps only if device profile support ODP
Currently, ODP caps are set during the init stage of mlx5_ib_dev,
regardless of whether the device profile supports ODP or not.  There is no
point in setting ODP caps if the device profile doesn't support
ODP. Hence, move setting the ODP caps to the odp_init stage.

Link: https://lore.kernel.org/r/20210318135259.681264-1-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-26 11:57:38 -03:00
Maor Gottlieb
c73700806d RDMA/mlx5: Fix drop packet rule in egress table
Initial drop action support missed that drop action can be added to egress
flow tables as well. Add the missing support.

This requires making sure that dest_type isn't set to PORT which in turn
exposes a possibility of passing dst while indicating number of dsts as
zero. Explicitly check for number of dsts and pass the appropriate
pointer.

Fixes: f29de9eee7 ("RDMA/mlx5: Add support for drop action in DV steering")
Link: https://lore.kernel.org/r/20210318135123.680759-1-leon@kernel.org
Reviewed-by: Mark Bloch <markb@nvidia.com>
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-26 11:52:47 -03:00
Mark Bloch
1fb7f8973f RDMA: Support more than 255 rdma ports
Current code uses many different types when dealing with a port of a RDMA
device: u8, unsigned int and u32. Switch to u32 to clean up the logic.

This allows us to make (at least) the core view consistent and use the
same type. Unfortunately not all places can be converted. Many uverbs
functions expect port to be u8 so keep those places in order not to break
UAPIs.  HW/Spec defined values must also not be changed.

With the switch to u32 we now can support devices with more than 255
ports. U32_MAX is reserved to make control logic a bit easier to deal
with. As a device with U32_MAX ports probably isn't going to happen any
time soon this seems like a non issue.

When a device with more than 255 ports is created uverbs will report the
RDMA device as having 255 ports as this is the max currently supported.

The verbs interface is not changed yet because the IBTA spec limits the
port size in too many places to be u8 and all applications that relies in
verbs won't be able to cope with this change. At this stage, we are
extending the interfaces that are using vendor channel solely

Once the limitation is lifted mlx5 in switchdev mode will be able to have
thousands of SFs created by the device. As the only instance of an RDMA
device that reports more than 255 ports will be a representor device and
it exposes itself as a RAW Ethernet only device CM/MAD/IPoIB and other
ULPs aren't effected by this change and their sysfs/interfaces that are
exposes to userspace can remain unchanged.

While here cleanup some alignment issues and remove unneeded sanity
checks (mainly in rdmavt),

Link: https://lore.kernel.org/r/20210301070420.439400-1-leon@kernel.org
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-26 09:31:21 -03:00
Linus Torvalds
2ba9bea2d3 RDMA 5.12 second rc pull request
- Typo causing a regression in mlx5 devx
 
 - Regression in the recent hns rework causing the HW to get out of sync
 
 - Longstanding cxgb4 adaptor crash when destroying cm ids
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAmBcxe0ACgkQOG33FX4g
 mxruFQ/+JMfHtWowI9l32N+SI93zWAQVN6bcO5XiUeziCLEsktU2dh5Lu8ZWWVmB
 BX1US24oBuGKs+YNx1ayshlgQj7EgojnGZODGtL4O157IvvgMrm4gFN84EoN3gMA
 8QzPVgZrrvyjDc8tSEINAW5crkpmhqdeg6XYM9BTUrVLqy+rXWv6V5E8Gnvtexmq
 h/+UgbIxW00SVVxpNyAFeuu1IlbgSYU0DvU4xpha/XKX6Ifyl9SeKmn6y+1UEU4q
 AYd/6UuYvK26G8tA3Zteh3lR8cUiLeorIwB6B5WoMDZXhyBz+PhaBS2ypHsrrnyr
 IIEDres5/zEm355nT8j0hTMv40ZUlj4UFRf2eWqcdrLD54yb6n8g4X2rmsASoe1A
 Z3CrhkV39dPEsB7JXmGB9j6W47PWGYBYUGpYTMRr69K7eB3viDIC+i6fDK7eKS0e
 u+fO3K9kU2B/PSJvWVCPKn2GCOw3jRuqwbTFUt0kvo8E90yzV6yqjgyoTKt+PgBn
 t0SUyPEGue+KEKgJd/sp62OL8LPulFksx+E7ksvqqbmSTNAjH+bs/FrbUYoN0r3v
 kmkxgxLzipu1aYtLwcAtNJpEya3c6s4ERBFeJMk01xUkiiWPG9RLa66Zg2IzDqWW
 J55irlJIqBm792vB+vH7+FHU3YjEGh66ycGRqsBWiw2oUxmlxEc=
 =vjhp
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
 "Not much going on, just some small bug fixes:

   - Typo causing a regression in mlx5 devx

   - Regression in the recent hns rework causing the HW to get out of
     sync

   - Long-standing cxgb4 adaptor crash when destroying cm ids"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/cxgb4: Fix adapter LE hash errors while destroying ipv6 listening server
  RDMA/hns: Fix bug during CMDQ initialization
  RDMA/mlx5: Fix typo in destroy_mkey inbox
2021-03-25 11:23:35 -07:00
Shay Drory
ad50294d4d RDMA/mlx5: Create ODP EQ only when ODP MR is created
There is no need to create the ODP EQ if the user doesn't use ODP MRs.
Hence, create it only when the first ODP MR is created. This EQ will be
destroyed only when the device is unloaded.
This will decrease the number of EQs created per device. for example: If
we creates 1K devices (SF/VF/etc'), than we will decrease the num of EQs
by 1K.

Link: https://lore.kernel.org/r/20210314125418.179716-1-leon@kernel.org
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-23 17:00:14 -03:00
Leon Romanovsky
b5486430bb RDMA/mlx5: Add missing returned error check of mlx5_ib_dereg_mr
Fix the following smatch error:

drivers/infiniband/hw/mlx5/mr.c:1950 mlx5_ib_dereg_mr() error: uninitialized symbol 'rc'.

Fixes: e6fb246cca ("RDMA/mlx5: Consolidate MR destruction to mlx5_ib_dereg_mr()")
Link: https://lore.kernel.org/r/20210314082250.10143-1-leon@kernel.org
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-22 09:28:51 -03:00
Mark Bloch
3a46f4fb55 net/mlx5: E-Switch, Refactor send to vport to be more generic
Now that each representor stores a pointer to the managing E-Switch
use that information when creating the send-to-vport rules.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-12 13:07:46 -08:00
Mark Bloch
658cfceb62 RDMA/mlx5: Use representor E-Switch when getting netdev and metadata
Now that a pointer to the managing E-Switch is stored in the representor
use it.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Acked-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-12 13:07:23 -08:00
Jason Gunthorpe
7610ab57de RDMA/mlx5: Allow larger pages in DevX umem
The umem DMA list calculation was locked at 4k pages due to confusion
around how this API works and is used when larger pages are present.

The conclusion is:

 - umem's cannot extend past what is mapped into the process, so creating
   a lage page size and referring to a sub-range is not allowed

 - umem's must always have a page offset of zero, except for sub PAGE_SIZE
   umems

 - The feature of umem_offset to create multiple objects inside a umem
   is buggy and isn't used anyplace. Thus we can assume all users of the
   current API have umem_offset == 0 as well

Provide a new page size calculator that limits the DMA list to the VA
range and enforces umem_offset == 0.

Allow user space to specify the page sizes which it can accept, this
bitmap must be derived from the intended use of the umem, based on
per-usage HW limitations.

Link: https://lore.kernel.org/r/20210304130501.1102577-4-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-11 20:20:37 -04:00
Yishai Hadas
2904bb37b3 IB/core: Split uverbs_get_const/default to consider target type
Change uverbs_get_const/uverbs_get_const_default to work properly with
both signed/unsigned parameters.

Current APIs mix s64 and u64 which leads to incorrect check when u64
value was supplied and its upper bit was set. In that case
uverbs_get_const() / uverbs_get_const_default() lower bound check may
fail unexpectedly, target is unsigned (lower bound is 0) but value
became negative as of the s64 usage.

Split to have two different APIs, no change to callers as the required
API will be called internally according to the target type.

Link: https://lore.kernel.org/r/20210304130501.1102577-3-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-11 20:20:36 -04:00
Mark Zhang
6fe6e56863 RDMA/mlx5: Fix mlx5 rates to IB rates map
Correct the map between mlx5 rates and corresponding ib rates, as they
don't always have a fixed offset between them.

Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Link: https://lore.kernel.org/r/20210304124517.1100608-4-leon@kernel.org
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-11 20:12:39 -04:00
Maor Gottlieb
7852546f52 RDMA/mlx5: Fix query RoCE port
mlx5_is_roce_enabled returns the devlink RoCE init value, therefore it
should be used only when driver is loaded. Instead we just need to read
the roce_en field.

In addition, rename mlx5_is_roce_enabled to mlx5_is_roce_init_enabled.

Fixes: 7a58779edd ("IB/mlx5: Improve query port for representor port")
Link: https://lore.kernel.org/r/20210304124517.1100608-2-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-11 20:12:39 -04:00
Jason Gunthorpe
14d05b552b RDMA/mlx5: Rename mlx5_mr_cache_invalidate() to revoke_mr()
Now that this is only used in a few places in mr.c give it a sensible
name. It has nothing to do with the cache and can be invoked on any
MR. DMA is stopped and the user cannot touch the MR any further once it
completes.

Link: https://lore.kernel.org/r/20210304120745.1090751-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-11 20:03:26 -04:00
Jason Gunthorpe
e6fb246cca RDMA/mlx5: Consolidate MR destruction to mlx5_ib_dereg_mr()
Now that the SRCU stuff has been removed the entire MR destroy logic can
be made a lot simpler. Currently there are many different ways to destroy a
MR and it makes it really hard to do this task correctly. Route all
destruction through mlx5_ib_dereg_mr() and make it work for all
situations.

Since it turns out all the different MR types do basically the same thing
this removes a lot of knowledge of MR internals from ODP and leaves ODP
just exporting an operation to clean up children.

This fixes a few weird corner cases bugs and firmly uses the correct
ordering of the MR destruction:
 - Stop parallel access to the mkey via the ODP xarray
 - Stop DMA
 - Release the umem
 - Clean up ODP children
 - Free/Recycle the MR

Link: https://lore.kernel.org/r/20210304120745.1090751-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-11 20:03:25 -04:00
Jason Gunthorpe
f18ec42231 RDMA/mlx5: Use a union inside mlx5_ib_mr
The struct mlx5_ib_mr can be used for three different things, but only one
at a time:

 - In the user MR cache
 - As a kernel MR
 - As a user MR

Overlay the three things into a single union with the following rules:

 - If the mr is found on the cache_ent->head list then it is a cache MR
   and umem == NULL. The entire union is zero after the MR is removed from
   the cache.

 - If umem != NULL or type == IB_MR_TYPE_USER then it is a user MR.

 - If umem == NULL then it is a kernel MR

This reduces the size of struct mlx5_ib_mr to 552 bytes from 702.

The only place the three flows overlap in the code is during dereg, so add
a few extra checks along there.

Link: https://lore.kernel.org/r/20210304120745.1090751-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-11 20:03:25 -04:00
Jason Gunthorpe
a639e66703 RDMA/mlx5: Zero out ODP related items in the mlx5_ib_mr
All of the ODP code assumes when it calls mlx5_mr_cache_alloc() the ODP
related fields are zero'd. This is true if the MR was just allocated, but
if the MR is recycled through the cache then the values are never zero'd.

This causes a bug in the odp_stats, they don't reset when the MR is
reallocated, also is_odp_implicit is never 0'd.

So we can use memset on a block of the mlx5_ib_mr reorganize the structure
to put all the data that can be zero'd by the cache at the end.

It is organized as an anonymous struct because the next patch will make
this a union.

Delete the unused smr_info. Don't set the kernel only desc_size on the
user path. No longer any need to zero mr->parent before freeing it, the
memset() will get it now.

Fixes: a3de94e3d6 ("IB/mlx5: Introduce ODP diagnostic counters")
Link: https://lore.kernel.org/r/20210304120745.1090751-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-11 20:03:25 -04:00
Mark Zhang
22053df0a3 RDMA/mlx5: Fix typo in destroy_mkey inbox
Set mkey_index to the correct place when preparing the destroy_mkey inbox,
this will fix the following syndrome:

 mlx5_core 0000:08:00.0: mlx5_cmd_check:782:(pid 980716): DEALLOC_PD(0x801)
 op_mod(0x0) failed, status bad resource state(0x9), syndrome (0x31b635)

Link: https://lore.kernel.org/r/20210311142024.1282004-1-leon@kernel.org
Fixes: 1368ead04c ("RDMA/mlx5: Use strict get/set operations for obj_id")
Reviewed-by: Ido Kalir <idok@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-11 13:35:48 -04:00
Maor Gottlieb
8256c69b2d RDMA/mlx5: Fix timestamp default mode
1. Don't set the ts_format bit to default when it reserved - device is
   running in the old mode (free running).
2. XRC doesn't have a CQ therefore the ts format in the QP
   context should be default / free running.
3. Set ts_format to WQ.

Fixes: 2fe8d4b878 ("RDMA/mlx5: Fail QP creation if the device can not support the CQE TS")
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-10 11:01:57 -08:00
Leon Romanovsky
f91803998c RDMA/mlx5: Set correct kernel-doc identifier
The W=1 allmodconfig build produces the following warning:

drivers/infiniband/hw/mlx5/odp.c:1086: warning: wrong kernel-doc identifier on line:
  * Parse a series of data segments for page fault handling.

Fix it by changing /** to be /* as it is written in kernel-doc
documentation.

Fixes: 5e769e444d ("RDMA/hw/mlx5/odp: Fix formatting and add missing descriptions in 'pagefault_data_segments()'")
Link: https://lore.kernel.org/r/20210302074214.1054299-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-03 13:21:24 -04:00
YueHaibing
3a9b3d4536 IB/mlx5: Add missing error code
Set err to -ENOMEM if kzalloc fails instead of 0.

Fixes: 7597385371 ("IB/mlx5: Enable subscription for device events over DEVX")
Link: https://lore.kernel.org/r/20210222122343.19720-1-yuehaibing@huawei.com
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-01 14:49:09 -04:00
Jason Gunthorpe
7289e26f39 Linux 5.11
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmAppPgeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGeXYH/imZPBd4A1jIMehN
 5HV2A53Z+MXmmaMuGj9X1KV6vsf55/xB+IhOoFdtRAIsO8c2yYSCO8i4+4R0XfYA
 +/YFJeq672rojQnmh6XbpR8dugaAV7CUHy6n7KDsyvtT6EOCpwFSwkOb4X3tBRX6
 TlYgm2d/xgV/wRHSgLVugK0MdFCLMAnyb7mkPfar9QrMgG1BiDKLq07xmwnS23On
 TkqpJ9yZ/rJpUrrUqQYPShSO/FmA+fSfWs0CDv7EIrJ40LUScD6PZxSHWTIHtjLk
 E4jFda6wuqLRVWsBwaBzUIdD0zk7X5quHRzEpbC5ga16SK6yrWvE5YJJXCguIEuZ
 f3FMRYs=
 =CAjn
 -----END PGP SIGNATURE-----

Merge tag 'v5.11' into rdma.git for-next

Linux 5.11

Merged to resolve conflicts with RDMA rc commits

- drivers/infiniband/sw/rxe/rxe_net.c
  The final logic is to call rxe_get_dev_from_net() again with the master
  netdev if the packet was rx'd on a vlan. To keep the elimination of the
  local variables requires a trivial edit to the code in -rc

Link: https://lore.kernel.org/r/20210210131542.215ea67c@canb.auug.org.au
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-02-18 11:19:29 -04:00
Jason Gunthorpe
68ad4d1cc6 Merge branch 'mlx5_timestamp' into rdma.git for-next
Leon Romanovsky says:

====================
Add an extra timestamp format for mlx5_ib device.
====================

Based on the mlx5-next branch at
     git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
due to dependencies.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

* branch 'mlx5_timestamp':
  RDMA/mlx5: Fail QP creation if the device can not support the CQE TS
  net/mlx5: Add new timestamp mode bits
2021-02-16 14:49:36 -04:00
Aharon Landau
2fe8d4b878 RDMA/mlx5: Fail QP creation if the device can not support the CQE TS
In ConnectX6Dx device, HW can work in real time timestamp mode according
to the device capabilities per RQ/SQ/QP.

When the flag IB_UVERBS_CQ_FLAGS_TIMESTAMP_COMPLETION is set, the user
expect to get TS on the CQEs in free running format, so we need to fail
the QP creation if the current mode of the device doesn't support it.

Link: https://lore.kernel.org/r/20210209131107.698833-3-leon@kernel.org
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-02-16 14:48:47 -04:00
Tal Gilboa
7232c132d1 RDMA/mlx5: Allow CQ creation without attached EQs
The traditional DevX CQ creation flow goes through mlx5_core_create_cq()
which checks that the given EQN corresponds to an existing EQ and attaches
a devx handler to the EQN for the CQ.

In some cases the EQ will not be a kernel EQ, but will be controlled by
modify CQ, don't block creating these just because the EQN can't be found
in the kernel.

Link: https://lore.kernel.org/r/20210211085549.1277674-1-leon@kernel.org
Signed-off-by: Tal Gilboa <talgi@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-02-16 14:42:59 -04:00
Patrisious Haddad
c70f51de85 RDMA/mlx5: Support 400Gbps IB rate in mlx5 driver
Support 400Gbps IB rate in mlx5 driver.

Link: https://lore.kernel.org/r/20210209130429.698237-1-leon@kernel.org
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-02-09 14:00:35 -04:00
Yishai Hadas
db72438c93 RDMA/mlx5: Cleanup the synchronize_srcu() from the ODP flow
Cleanup the synchronize_srcu() from the ODP flow as it was found to be a
very heavy time consumer as part of dereg_mr.

For example de-registration of 10000 ODP MRs each with size of 2M hugepage
took 19.6 sec comparing de-registration of same number of non ODP MRs that
took 172 ms.

The new locking scheme uses the wait_event() mechanism which follows the
use count of the MR instead of using synchronize_srcu().

By that change, the time required for the above test took 95 ms which is
even better than the non ODP flow.

Once fully dropped the srcu usage, had to come with a lock to protect the
XA access.

As part of using the above mechanism we could also clean the
num_deferred_work stuff and follow the use count instead.

Link: https://lore.kernel.org/r/20210202071309.2057998-1-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-02-08 20:31:11 -04:00
Parav Pandit
131796524f IB/mlx5: Use rdma_for_each_port for port iteration
Instead of open coding the loop for port iteration, use rdma_for_each_port
macro provided by core.

To use such macro, early initialization of phys_port_cnt is needed.
Hence, initialize such constant early enough in the init stage.

Whichever functions are called with port using rdma_for_each_port(),
convert their port type from u8 to unsigned int to match the core API.

Link: https://lore.kernel.org/r/20210203130133.4057329-6-leon@kernel.org
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-02-05 12:06:01 -04:00
Parav Pandit
7416790e22 RDMA/core: Introduce and use API to read port immutable data
Currently mlx5 driver caches port GID table length for 2 ports.  It is
also cached by IB core as port immutable data.

When mlx5 representor ports are present, which are usually more than 2,
invalid access to port_caps array can happen while validating the GID
table length which is only for 2 ports.

To avoid this, take help of the IB cores port immutable data by exposing
an API to read the port immutable fields.

Remove mlx5 driver's internal cache, thereby reduce code and data.

Link: https://lore.kernel.org/r/20210203130133.4057329-5-leon@kernel.org
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-02-05 12:06:01 -04:00
Parav Pandit
7a58779edd IB/mlx5: Improve query port for representor port
Improve query port functionality for representor port as below.

1. RoCE Qkey violation counters are not applicable for representor port.
2. Avoid setting gid_tbl_len twice for representor port.
3. Avoid setting ip_gids and IB_PORT_CM_SUP property for representor port
   as GID table is empty and CM support is not present in representor
   mode.

Link: https://lore.kernel.org/r/20210203130133.4057329-4-leon@kernel.org
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-02-05 12:06:00 -04:00
Parav Pandit
2019d70e91 IB/mlx5: Avoid calling query device for reading pkey table length
Pkey table length for all the ports of the device is the same.  Currently
get_ports_cap() reads and stores it for each port by querying the device
which reads more than just pkey table length.

For representor device ports which can be in range of hundreds, it queries
is for each such port and end up returning same value for all the ports.

When in representor mode, modify QP accesses pkey port caps for a port
index that can be outside of the port_caps table.

Hence, simplify the logic to query the max pkey table length only once
during device initialization sequence.

Link: https://lore.kernel.org/r/20210203130133.4057329-3-leon@kernel.org
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-02-05 12:06:00 -04:00
Parav Pandit
3ce60f443b IB/mlx5: Move mlx5_port_caps from mlx5_core_dev to mlx5_ib_dev
mlx5_port_caps are RDMA specific capabilities. These are not used by the
mlx5_core_device at all. Move them to mlx5_ib_dev where it is used and
reduce the scope of it to multiple drivers.

Link: https://lore.kernel.org/r/20210203130133.4057329-2-leon@kernel.org
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-02-05 12:06:00 -04:00
Parav Pandit
d6fd59e14e IB/mlx5: Support default partition key for representor port
Representor port has only one default pkey.  Hence have simpler query pkey
callback or it.

Link: https://lore.kernel.org/r/20210127150010.1876121-4-leon@kernel.org
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-02-02 19:31:44 -04:00
Parav Pandit
d286ac1d05 IB/mlx5: Return appropriate error code instead of ENOMEM
When mlx5_ib_stage_init_init() fails, return the error code related to
failure instead of -ENOMEM.

Fixes: 16c1975f10 ("IB/mlx5: Create profile infrastructure to add and remove stages")
Link: https://lore.kernel.org/r/20210127150010.1876121-8-leon@kernel.org
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-02-02 13:21:18 -04:00
Mark Bloch
2614488d1f RDMA/mlx5: Allow creating all QPs even when non RDMA profile is used
The cited commit disallowed creating any QP which isn't raw ethernet, reg
umr or the special UD qp for testing WC, this proved too strict.

While modify can't be done (no GIDS/GID table for example) just creating a
QP is okay.

This patch partially reverts the bellow mentioned commit and places the
restriction at the modify QP stage and not at the creation.  DEVX commands
should be used to manipulate such QPs.

Fixes: 42caf9cb59 ("RDMA/mlx5: Allow only raw Ethernet QPs when RoCE isn't enabled")
Link: https://lore.kernel.org/r/20210125120709.836718-1-leon@kernel.org
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-28 15:19:43 -04:00
Lee Jones
30cd9fc5e7 RDMA/hw/mlx5/qp: Demote non-conformant kernel-doc header
Fixes the following W=1 kernel build warning(s):

 drivers/infiniband/hw/mlx5/qp.c:5384: warning: Function parameter or member 'qp' not described in 'mlx5_ib_qp_set_counter'
 drivers/infiniband/hw/mlx5/qp.c:5384: warning: Function parameter or member 'counter' not described in 'mlx5_ib_qp_set_counter'

Link: https://lore.kernel.org/r/20210121094519.2044049-3-lee.jones@linaro.org
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-22 14:37:29 -04:00
Lee Jones
5e769e444d RDMA/hw/mlx5/odp: Fix formatting and add missing descriptions in 'pagefault_data_segments()'
Fixes the following W=1 kernel build warning(s):

 drivers/infiniband/hw/mlx5/odp.c:1062: warning: Function parameter or member 'dev' not described in 'pagefault_data_segments'
 drivers/infiniband/hw/mlx5/odp.c:1062: warning: Function parameter or member 'pfault' not described in 'pagefault_data_segments'
 drivers/infiniband/hw/mlx5/odp.c:1062: warning: Function parameter or member 'wqe' not described in 'pagefault_data_segments'
 drivers/infiniband/hw/mlx5/odp.c:1062: warning: Function parameter or member 'wqe_end' not described in 'pagefault_data_segments'
 drivers/infiniband/hw/mlx5/odp.c:1062: warning: Function parameter or member 'bytes_mapped' not described in 'pagefault_data_segments'
 drivers/infiniband/hw/mlx5/odp.c:1062: warning: Function parameter or member 'total_wqe_bytes' not described in 'pagefault_data_segments'
 drivers/infiniband/hw/mlx5/odp.c:1062: warning: Function parameter or member 'receive_queue' not described in 'pagefault_data_segments'

Link: https://lore.kernel.org/r/20210121094519.2044049-2-lee.jones@linaro.org
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-22 14:37:29 -04:00
Jianxin Xiong
90da7dc820 RDMA/mlx5: Support dma-buf based userspace memory region
Implement the new driver method 'reg_user_mr_dmabuf'.  Utilize the core
functions to import dma-buf based memory region and update the mappings.

Add code to handle dma-buf related page fault.

Link: https://lore.kernel.org/r/1608067636-98073-5-git-send-email-jianxin.xiong@intel.com
Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Acked-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Acked-by: Christian Koenig <christian.koenig@amd.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-20 16:49:25 -04:00
Parav Pandit
de641d74fb Revert "RDMA/mlx5: Fix devlink deadlock on net namespace deletion"
This reverts commit fbdd0049d9.

Due to commit in fixes tag, netdevice events were received only in one net
namespace of mlx5_core_dev. Due to this when netdevice events arrive in
net namespace other than net namespace of mlx5_core_dev, they are missed.

This results in empty GID table due to RDMA device being detached from its
net device.

Hence, revert back to receive netdevice events in all net namespaces to
restore back RDMA functionality in non init_net net namespace. The
deadlock will have to be addressed in another patch.

Fixes: fbdd0049d9 ("RDMA/mlx5: Fix devlink deadlock on net namespace deletion")
Link: https://lore.kernel.org/r/20210117092633.10690-1-leon@kernel.org
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-19 20:22:59 -04:00
Parav Pandit
559a3eacc4 IB/mlx5: Make function static
mlx5_query_mad_ifc_smp_attr_node_info() is internal to mad.c Hence, make
it static.

Link: https://lore.kernel.org/r/20210113121703.559778-5-leon@kernel.org
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-19 20:14:14 -04:00
Parav Pandit
ab40530a2e IB/mlx5: Add mutex destroy call to cap_mask_mutex mutex
mutex_destroy() call for device's cap_mask_mutex mutex is missing, let's
add it to annotate destruction.

Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Link: https://lore.kernel.org/r/20210113121703.559778-4-leon@kernel.org
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-19 20:14:14 -04:00
Yishai Hadas
1368ead04c RDMA/mlx5: Use strict get/set operations for obj_id
Use strict get/set operations for obj_id based on the specific object
type. This comes to prevent any miss match between the general header to
the legacy header commands.

Link: https://lore.kernel.org/r/20201230130121.180350-4-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-19 10:40:54 -04:00
Yishai Hadas
8798e4ad0a RDMA/mlx5: Use the correct obj_id upon DEVX TIR creation
Use the correct obj_id upon DEVX TIR creation by strictly taking the tirn
24 bits and not the general obj_id which is 32 bits.

Fixes: 7efce3691d ("IB/mlx5: Add obj create and destroy functionality")
Link: https://lore.kernel.org/r/20201230130121.180350-2-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-19 10:40:54 -04:00
Mark Bloch
1c3aa6bd0b RDMA/mlx5: Fix wrong free of blue flame register on error
If the allocation of the fast path blue flame register fails, the driver
should free the regular blue flame register allocated a statement above,
not the one that it just failed to allocate.

Fixes: 16c1975f10 ("IB/mlx5: Create profile infrastructure to add and remove stages")
Link: https://lore.kernel.org/r/20210113121703.559778-6-leon@kernel.org
Reported-by: Hans Petter Selasky <hanss@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-14 12:51:29 -04:00
Parav Pandit
2cb091f629 IB/mlx5: Fix error unwinding when set_has_smi_cap fails
When set_has_smi_cap() fails, multiport master cleanup is missed. Fix it
by doing the correct error unwinding goto.

Fixes: a989ea01cb ("RDMA/mlx5: Move SMI caps logic")
Link: https://lore.kernel.org/r/20210113121703.559778-3-leon@kernel.org
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-14 12:51:18 -04:00
Linus Torvalds
009bd55dfc RDMA 5.11 pull request
A smaller set of patches, nothing stands out as being particularly major
 this cycle:
 
 - Driver bug fixes and updates: bnxt_re, cxgb4, rxe, hns, i40iw, cxgb4,
   mlx4 and mlx5
 
 - Bug fixes and polishing for the new rts ULP
 
 - Cleanup of uverbs checking for allowed driver operations
 
 - Use sysfs_emit all over the place
 
 - Lots of bug fixes and clarity improvements for hns
 
 - hip09 support for hns
 
 - NDR and 50/100Gb signaling rates
 
 - Remove dma_virt_ops and go back to using the IB DMA wrappers
 
 - mlx5 optimizations for contiguous DMA regions
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl/aNXUACgkQOG33FX4g
 mxqlMQ/+O6UhxKnDAnMB+HzDGvOm+KXNHOQBuzxz4ZWXqtUrW8WU5ca3PhXovc4z
 /QX0HhMhQmVsva5mjp1OGVATxQ2E+yasqFLg4QXAFWFR3N7s0u/sikE9i1DoPvOC
 lsmLTeRauCFaE4mJD5nvYwm+riECX0GmyVVW7v6V05xwAp0hwdhyU7Kb6Yh3lxsE
 umTz+onPNJcD6Tc4snziyC5QEp5ebEjAaj4dVI1YPR5X0c2RwC5E1CIDI6u4OQ2k
 j7/+Kvo8LNdYNERGiR169x6c1L7WS6dYnGMMeXRgyy0BVbVdRGDnvCV9VRmF66w5
 99fHfDjNMNmqbGNt/4/gwNdVrR9aI4jMZWCh7SmsguX6XwNOlhYldy3x3WnlkfkQ
 e4O0huJceJqcB2Uya70GqufnAetRXsbjzcvWxpR5YAwRmcRkm1f6aGK3BxPjWEbr
 BbYRpiKMxxT4yTe65BuuThzx6g4pNQHe0z3BM/dzMJQAX+PZcs1CPQR8F8PbCrZR
 Ad7qw4HJ587PoSxPi3toVMpYZRP6cISh1zx9q/JCj8cxH9Ri4MovUCS3cF63Ny3B
 1LJ2q0x8FuLLjgZJogKUyEkS8OO6q7NL8WumjvrYWWx19+jcYsV81jTRGSkH3bfY
 F7Esv5K2T1F2gVsCe1ZFFplQg6ja1afIcc+LEl8cMJSyTdoSub4=
 =9t8b
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "A smaller set of patches, nothing stands out as being particularly
  major this cycle. The biggest item would be the new HIP09 HW support
  from HNS, otherwise it was pretty quiet for new work here:

   - Driver bug fixes and updates: bnxt_re, cxgb4, rxe, hns, i40iw,
     cxgb4, mlx4 and mlx5

   - Bug fixes and polishing for the new rts ULP

   - Cleanup of uverbs checking for allowed driver operations

   - Use sysfs_emit all over the place

   - Lots of bug fixes and clarity improvements for hns

   - hip09 support for hns

   - NDR and 50/100Gb signaling rates

   - Remove dma_virt_ops and go back to using the IB DMA wrappers

   - mlx5 optimizations for contiguous DMA regions"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (147 commits)
  RDMA/cma: Don't overwrite sgid_attr after device is released
  RDMA/mlx5: Fix MR cache memory leak
  RDMA/rxe: Use acquire/release for memory ordering
  RDMA/hns: Simplify AEQE process for different types of queue
  RDMA/hns: Fix inaccurate prints
  RDMA/hns: Fix incorrect symbol types
  RDMA/hns: Clear redundant variable initialization
  RDMA/hns: Fix coding style issues
  RDMA/hns: Remove unnecessary access right set during INIT2INIT
  RDMA/hns: WARN_ON if get a reserved sl from users
  RDMA/hns: Avoid filling sl in high 3 bits of vlan_id
  RDMA/hns: Do shift on traffic class when using RoCEv2
  RDMA/hns: Normalization the judgment of some features
  RDMA/hns: Limit the length of data copied between kernel and userspace
  RDMA/mlx4: Remove bogus dev_base_lock usage
  RDMA/uverbs: Fix incorrect variable type
  RDMA/core: Do not indicate device ready when device enablement fails
  RDMA/core: Clean up cq pool mechanism
  RDMA/core: Update kernel documentation for ib_create_named_qp()
  MAINTAINERS: SOFT-ROCE: Change Zhu Yanjun's email address
  ...
2020-12-16 13:42:26 -08:00
Maor Gottlieb
e899389029 RDMA/mlx5: Fix MR cache memory leak
If the MR cache entry invalidation failed, then we detach this entry from
the cache, therefore we must to free the memory as well.

Allcation backtrace for the leaker:

    [<00000000d8e423b0>] alloc_cache_mr+0x23/0xc0 [mlx5_ib]
    [<000000001f21304c>] create_cache_mr+0x3f/0xf0 [mlx5_ib]
    [<000000009d6b45dc>] mlx5_ib_alloc_implicit_mr+0x41/0×210 [mlx5_ib]
    [<00000000879d0d68>] mlx5_ib_reg_user_mr+0x9e/0×6e0 [mlx5_ib]
    [<00000000be74bf89>] create_qp+0x2fc/0xf00 [ib_uverbs]
    [<000000001a532d22>] ib_uverbs_handler_UVERBS_METHOD_COUNTERS_READ+0x1d9/0×230 [ib_uverbs]
    [<0000000070f46001>] rdma_alloc_commit_uobject+0xb5/0×120 [ib_uverbs]
    [<000000006d8a0b38>] uverbs_alloc+0x2b/0xf0 [ib_uverbs]
    [<00000000075217c9>] ksysioctl+0x234/0×7d0
    [<00000000eb5c120b>] __x64_sys_ioctl+0x16/0×20
    [<00000000db135b48>] do_syscall_64+0x59/0×2e0

Fixes: 1769c4c575 ("RDMA/mlx5: Always remove MRs from the cache before destroying them")
Link: https://lore.kernel.org/r/20201213132940.345554-2-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-12-14 15:14:27 -04:00
Tom Rix
7f1d2dfa30 RDMA/mlx5: Remove unneeded semicolon
A semicolon is not needed after a switch statement.

Link: https://lore.kernel.org/r/20201031134638.2135060-1-trix@redhat.com
Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-12-10 12:00:27 -04:00
Maor Gottlieb
ca991a7d14 RDMA/mlx5: Assign dev to DM MR
Currently, DM MR registration flow doesn't set the mlx5_ib_dev pointer and
can cause a NULL pointer dereference if userspace dumps the MR via rdma
tool.

Assign the IB device together with the other fields and remove the
redundant reference of mlx5_ib_dev from mlx5_ib_mr.

Cc: stable@vger.kernel.org
Fixes: 6c29f57ea4 ("IB/mlx5: Device memory mr registration support")
Link: https://lore.kernel.org/r/20201203190807.127189-1-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-12-07 15:52:54 -04:00
Jason Gunthorpe
ef3642c4f5 RDMA/mlx5: Fix error unwinds for rereg_mr
This is all a giant train wreck of error handling, in many cases the MR is
left in some corrupted state where continuing on is going to lead to
chaos, or various unwinds/order is missed.

rereg had three possible completely different actions, depending on flags
and various details about the MR. Split the three actions into three
functions, and call the right action from the start.

For each action carefully design the error handling to fit the action:

- UMR access/PD update is a simple UMR, if it fails the MR isn't changed,
  so do nothing

- PAS update over UMR is multiple UMR operations. To keep everything sane
  revoke access to the MKey while it is being changed and restore it once
  the MR is correct.

- Recreating the mkey should completely build a parallel MR with a fully
  loaded PAS then swap and destroy the old one. If it fails the original
  should be left untouched. This is handled in the core code. Directly
  call the normal MR creation functions, possibly re-using the existing
  umem.

Add support for working with ODP MRs. The READ/WRITE access flags can be
changed by UMR and we can trivially convert to/from ODP MRs using the
logic to build a completely new MR.

This new logic also fixes various problems with MRs continuing to work
while their PAS lists are no longer valid, eg during a page size change.

Link: https://lore.kernel.org/r/20201130075839.278575-6-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-12-07 14:06:23 -04:00
Jason Gunthorpe
38f8ff5b44 RDMA/mlx5: Reorganize mlx5_ib_reg_user_mr()
This function handles an ODP and regular MR flow all mushed together, even
though the two flows are quite different. Split them into two dedicated
functions.

Link: https://lore.kernel.org/r/20201130075839.278575-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-12-07 14:06:23 -04:00
Jason Gunthorpe
6e0954b11c RDMA/uverbs: Allow drivers to create a new HW object during rereg_mr
mlx5 has an ugly flow where it tries to allocate a new MR and replace the
existing MR in the same memory during rereg. This is very complicated and
buggy. Instead of trying to replace in-place inside the driver, provide
support from uverbs to change the entire HW object assigned to a handle
during rereg_mr.

Since destroying a MR is allowed to fail (ie if a MW is pointing at it)
and can't be detected in advance, the algorithm creates a completely new
uobject to hold the new MR and swaps the IDR entries of the two objects.

The old MR in the temporary IDR entry is destroyed, and if it fails
rereg_mr succeeds and destruction is deferred to FD release. This
complexity is why this cannot live in a driver safely.

Link: https://lore.kernel.org/r/20201130075839.278575-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-12-07 14:06:23 -04:00
Jason Gunthorpe
adac4cb3c1 RDMA/uverbs: Check ODP in ib_check_mr_access() as well
No reason only one caller checks this. This properly blocks ODP
from the rereg flow if the device does not support ODP.

Link: https://lore.kernel.org/r/20201130075839.278575-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-12-07 14:06:23 -04:00
Leon Romanovsky
04b222f957 RDMA/mlx5: Remove IB representors dead code
Delete dead code.

Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2020-12-06 07:43:54 +02:00
Leon Romanovsky
e87114022e net/mlx5: Simplify eswitch mode check
Provide mlx5_core device instead of "priv" pointer while checking
eswith mode.

Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2020-12-06 07:43:54 +02:00
Leon Romanovsky
93f8244431 RDMA/mlx5: Convert mlx5_ib to use auxiliary bus
The conversion to auxiliary bus solves long standing issue with
existing mlx5_ib<->mlx5_core coupling. It required to have both
modules in initramfs if one of them needed for the boot.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2020-12-06 07:43:50 +02:00
Leon Romanovsky
2b1f747071 RDMA/core: Allow drivers to disable restrack DB
Driver QP types are special case with no IBTA restrictions. For example,
EFA implemented creation of this QP type as regular one, while mlx5
separated create to two step: create and modify. That separation causes to
the situation where DC QP (mlx5) is always added to the same xarray index
zero.

This change allows to drivers like mlx5 simply disable restrack DB
tracking, but it doesn't disable kref on the memory.

Fixes: 52e0a118a2 ("RDMA/restrack: Track driver QP types in resource tracker")
Link: https://lore.kernel.org/r/20201117070148.1974114-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-27 11:38:46 -04:00
Parav Pandit
7ec3df174f RDMA/mlx5: Use PCI device for dma mappings
DMA operation of the IB device is done using ib_device->dma_device.

Instead of accessing parent of the IB device, use the PCI dma device which
is setup to ib_device->dma_device during IB device registration.

Link: https://lore.kernel.org/r/20201125064628.8431-1-leon@kernel.org
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-26 15:49:05 -04:00
Leon Romanovsky
d4b2d19dc5 RDMA/mlx5: Silence the overflow warning while building offset mask
Coverity reports "Potentially overflowing expression ..." warning, which
is correct thing to complain from the compiler point of view, but this is
not possible in the current code. Still, this is a small error as there
are some future situations that might need to use a 32 bit offset. Use ULL
so the calculation works up to 63.

Fixes: b045db62f6 ("RDMA/mlx5: Use ib_umem_find_best_pgoff() for SRQ")
Link: https://lore.kernel.org/r/20201125061704.6580-1-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-26 15:49:05 -04:00
Jason Gunthorpe
d0b7721c5e RDMA/mlx5: Check for ERR_PTR from uverbs_zalloc()
The return code from uverbs_zalloc() was wrongly checked, it is ERR_PTR
not NULL like other allocators:

drivers/infiniband/hw/mlx5/devx.c:2110 devx_umem_reg_cmd_alloc() warn: passing zero to 'PTR_ERR'

Fixes: 878f7b31c3 ("RDMA/mlx5: Use ib_umem_find_best_pgsz() for devx")
Link: https://lore.kernel.org/r/0-v1-4d05ccc1c223+173-devx_err_ptr_jgg@nvidia.com
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-26 15:49:04 -04:00
Avihai Horon
f957d4d09a RDMA/mlx5: Enable querying AH for XRC QP types
Address handle is set for connected QP types such as RC and UC, and thus
can also be queried.

Since XRC QP types INI and TGT are connected, it should be possible to
query their address handle as well.

Until now it was not the case, and although the firmware supported it, the
driver allowed querying the address handle only for RC and UC.

Hence, we enable it now for INI and TGT QPs as well.

Link: https://lore.kernel.org/r/20201115121425.139833-2-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-26 11:58:19 -04:00
Gustavo A. R. Silva
808b2c925d IB/mlx5: Fix fall-through warnings for Clang
In preparation to enable -Wimplicit-fallthrough for Clang, fix a warning
by explicitly adding the new pseudo-keyword fallthrough; instead of
letting the code fall through to the next case.

Link: https://lore.kernel.org/r/2b0c87362bc86f6adfe56a5a6685837b71022bbf.1605896059.git.gustavoars@kernel.org
Link: https://github.com/KSPP/linux/issues/115
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-23 15:54:11 -04:00
Jason Gunthorpe
bf3b7b7ba9 Merge branch 'for-rc' into rdma.git
From https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git

The rc RDMA branch is needed due to dependencies on the next patches.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-17 15:20:26 -04:00
Jason Gunthorpe
8a7904a672 RDMA/mlx5: Lower setting the umem's PAS for SRQ
Some of the SRQ types are created using a WQ, and the WQ requires a
different parameter set to mlx5_umem_find_best_quantized_pgoff() as it has
a 5 bit page_offset.

Add the umem to the mlx5_srq_attr and defer computing the PAS data until
the code has figured out what kind of mailbox to use. Compute the PAS
directly from the umem for each of the four unique mailbox types.

This also avoids allocating memory to store the user PAS, instead it is
written directly to the mailbox as in most other cases.

Fixes: 01949d0109 ("net/mlx5_core: Enable XRCs and SRQs when using ISSI > 0")
Link: https://lore.kernel.org/r/20201115114311.136250-8-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-16 16:53:30 -04:00
Jason Gunthorpe
878f7b31c3 RDMA/mlx5: Use ib_umem_find_best_pgsz() for devx
Since devx uses the new rdma_for_each_block() to fill the PAS it can also
use ib_umem_find_best_pgsz().

However, the umem constructionin devx is complicated, the umem must still
respect all the HW limits such as page_offset_quantized and the IOVA
alignment.

Since we don't know what the user intends to use the umem for we have to
limit it to PAGE_SIZE.

There are users trying to mix umem's with mkeys so this makes them work
reliably, at least for an identity IOVA, by ensuring the IOVA matches the
selected page size.

Last user of mlx5_ib_get_buf_offset() so it can also be removed.

Fixes: aeae94579c ("IB/mlx5: Add DEVX support for memory registration")
Link: https://lore.kernel.org/r/20201115114311.136250-7-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-16 16:53:30 -04:00
Jason Gunthorpe
c08fbdc577 RDMA/mlx5: mlx5_umem_find_best_quantized_pgoff() for CQ
This fixes a bug where the page_offset was not being considered when
building a CQ. The HW specification says it 'must be zero', so use
a variant of mlx5_umem_find_best_quantized_pgoff() with a 0 pgoff_bitmask
to force this result.

Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Link: https://lore.kernel.org/r/20201115114311.136250-6-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-16 16:53:30 -04:00
Jason Gunthorpe
a59b7b05ef RDMA/mlx5: Use mlx5_umem_find_best_quantized_pgoff() for QP
Delete custom logic in the QP in favor of more general variant.

Link: https://lore.kernel.org/r/20201115114311.136250-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-16 16:53:30 -04:00
Jason Gunthorpe
7579dcdf73 RDMA/mlx5: Directly compute the PAS list for raw QP RQ's
The RQ WQ created when making a raw ethernet QP copies the PAS list from
a dummy QPC command created earlier in the flow. The WQC and QPC PAS lists
are not fully compatible as the page_offset is a different size.

Create the RQ WQ's PAS list directly and do not try to copy it from
another command structure.

Like the prior patch, this also means that badly aligned buffers were not
correctly rejected.

Link: https://lore.kernel.org/r/20201115114311.136250-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-16 16:53:29 -04:00
Jason Gunthorpe
ad480ea5d6 RDMA/mlx5: Use mlx5_umem_find_best_quantized_pgoff() for WQ
This fixes a subtle bug, the WQ mailbox has only 5 bits to describe the
page_offset, while mlx5_ib_get_buf_offset() is hard wired to only work
with 6 bit page_offsets.

Thus it did not properly reject badly aligned buffers.

Fixes: 79b20a6c30 ("IB/mlx5: Add receive Work Queue verbs")
Fixes: 0fb2ed66a1 ("IB/mlx5: Add create and destroy functionality for Raw Packet QP")
Link: https://lore.kernel.org/r/20201115114311.136250-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-16 16:53:29 -04:00
Jason Gunthorpe
b045db62f6 RDMA/mlx5: Use ib_umem_find_best_pgoff() for SRQ
SRQ uses a quantized and scaled page_offset, which is another variation of
ib_umem_find_best_pgsz(). Add mlx5_umem_find_best_quantized_pgoff() to
perform this calculation for each mailbox. A macro shows how the
calculation is directly connected to the mailbox format.

This new routine replaces the limited mlx5_ib_cont_pages() and
mlx5_ib_get_buf_offset() pairing which would reject valid configurations
rather than adjust the page_size to make it work.

In turn this is much more aggressive about choosing large page sizes for
these objects and when THP is enabled it will now often find a single page
solution.

Link: https://lore.kernel.org/r/20201115114311.136250-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-16 16:53:29 -04:00
Leon Romanovsky
c5633a72a1 RDMA/core: Make FD destroy callback void
All FD object destroy implementations return 0, so declare this callback
void.

Link: https://lore.kernel.org/r/20201104144556.3809085-3-leon@kernel.org
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-12 12:32:17 -04:00
Leon Romanovsky
efa968ee20 RDMA/core: Postpone uobject cleanup on failure till FD close
Remove the ib_is_destroyable_retryable() concept.

The idea here was to allow the drivers to forcibly clean the HW object
even if they otherwise didn't want to (eg because of usecnt). This was an
attempt to clean up in a world where drivers were not allowed to fail HW
object destruction.

Now that we are going back to allowing HW objects to fail destroy this
doesn't make sense. Instead if a uobject's HW object can't be destroyed it
is left on the uobject list and it is up to uverbs_destroy_ufile_hw() to
clean it. Multiple passes over the uobject list allow hidden dependencies
to be resolved. If that fails the HW driver is broken, throw a WARN_ON and
leak the HW object memory.

All the other tricky failure paths (eg on creation error unwind) have
already been updated to this new model.

Link: https://lore.kernel.org/r/20201104144556.3809085-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-12 12:32:17 -04:00
Meir Lichtinger
f946e45f59 IB/mlx5: Add support for NDR link speed
The IBTA specification has new speed - NDR. That speed supports signaling
rate of 100Gb. mlx5 IB driver translates link modes reported by ConnectX
device to IB speed and width. Added translation of new 100Gb, 200Gb and
400Gb link modes to NDR IB type and width of x1, x2 or x4 respectively.

Link: https://lore.kernel.org/r/20201026133738.1340432-3-leon@kernel.org
Signed-off-by: Meir Lichtinger <meirl@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-02 15:48:57 -04:00
Jason Gunthorpe
d5c7916fe4 RDMA/mlx5: Use ib_umem_find_best_pgsz() for mkc's
Now that all the PAS arrays or UMR XLT's for mkcs are filled using
rdma_for_each_block() we can use the common ib_umem_find_best_pgsz()
algorithm.

Link: https://lore.kernel.org/r/20201026132314.1336717-6-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-02 15:10:50 -04:00
Jason Gunthorpe
f1eaac37da RDMA/mlx5: Split mlx5_ib_update_xlt() into ODP and non-ODP cases
Mixing these together is just a mess, make a dedicated version,
mlx5_ib_update_mr_pas(), which directly loads the whole MTT for a non-ODP
MR.

The split out version can trivially use a simple loop with
rdma_for_each_block() which allows using the core code to compute the MR
pages and avoids seeking in the SGL list after each chunk as the
__mlx5_ib_populate_pas() call required.

Significantly speeds loading large MTTs.

Link: https://lore.kernel.org/r/20201026132314.1336717-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-02 15:10:50 -04:00
Jason Gunthorpe
8010d74b99 RDMA/mlx5: Split the WR setup out of mlx5_ib_update_xlt()
The memory allocation is quite complicated, and makes this function hard
to understand. Refactor things so that a function call sets up the WR, SG,
DMA mapping and buffer, further splitting that into buffer and DMA/wr.

This also slightly changes the buffer allocation logic to try an order 0
page allocation (with OOM warnings on) before going to the emergency page.

Link: https://lore.kernel.org/r/20201026132314.1336717-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-02 14:53:55 -04:00
Jason Gunthorpe
f22c30aa6d RDMA/mlx5: Move xlt_emergency_page_mutex into mr.c
This is the only user, so remove the wrappers.

Link: https://lore.kernel.org/r/20201026132314.1336717-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-02 14:53:54 -04:00
Jason Gunthorpe
aab8d3966d RDMA/mlx5: Change mlx5_ib_populate_pas() to use rdma_for_each_block()
This routine converts the umem SGL into a list of fixed pages for DMA,
which is exactly what rdma_umem_for_each_dma_block() is for, use the
common code directly.

Link: https://lore.kernel.org/r/20201026132314.1336717-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-02 14:53:54 -04:00
Jason Gunthorpe
f8fb311063 RDMA/mlx5: Remove npages from mlx5_ib_cont_pages()
Most callers don't need this, and the few that do can get it as
ib_umem_num_pages(umem).

Link: https://lore.kernel.org/r/20201026131936.1335664-8-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-02 14:52:26 -04:00
Jason Gunthorpe
7db0eea916 RDMA/mlx5: Remove ncont from mlx5_ib_cont_pages()
This is the same as ib_umem_num_dma_blocks(umem, 1UL << page_shift), have
the callers compute it directly.

Link: https://lore.kernel.org/r/20201026131936.1335664-7-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-02 14:52:26 -04:00
Jason Gunthorpe
95741ee3f0 RDMA/mlx5: Remove order from mlx5_ib_cont_pages()
Only alloc_mr_from_cache() needs order and can trivially compute it, so
lift it to the one call site and remove the NULL arguments.

Link: https://lore.kernel.org/r/20201026131936.1335664-6-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-02 14:52:26 -04:00
Jason Gunthorpe
f0093fb1a7 RDMA/mlx5: Move mlx5_ib_cont_pages() to the creation of the mlx5_ib_mr
For the user MR path, instead of calling this after getting the umem, call
it as part of creating the struct mlx5_ib_mr and distill its output to a
single page_shift stored inside the mr.

This avoids passing around the tuple of its output. Based on the umem and
page_shift, the output arguments can be computed using:

  count == ib_umem_num_pages(mr->umem)
  shift == mr->page_shift
  ncont == ib_umem_num_dma_blocks(mr->umem, 1 << mr->page_shift)
  order == order_base_2(ncont)

And since mr->page_shift == umem_odp->page_shift then ncont ==
ib_umem_num_dma_blocks() == ib_umem_odp_num_pages() for ODP umems.

Link: https://lore.kernel.org/r/20201026131936.1335664-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-02 14:52:26 -04:00
Jason Gunthorpe
1c3d247eee RDMA/mlx5: Remove mlx5_ib_mr->npages
This is the same value as ib_umem_num_pages(mr->umem), use that instead.

Link: https://lore.kernel.org/r/20201026131936.1335664-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-02 14:52:26 -04:00
Jason Gunthorpe
fc3325701a RDMA/mlx5: Fix corruption of reg_pages in mlx5_ib_rereg_user_mr()
reg_pages should always contain mr->npage since when the mr is finally
de-reg'd it is always subtracted out.

If there were any error exits then mlx5_ib_rereg_user_mr() would leave the
reg_pages adjusted and this will cause it to be double subtracted
eventually.

The manipulation of reg_pages is inherently connected to the umem, so lift
it out of set_mr_fields() and only adjust it around creating/destroying a
umem.

reg_pages is only used for diagnostics in sysfs.

Fixes: 7d0cc6edcc ("IB/mlx5: Add MR cache for large UMR regions")
Link: https://lore.kernel.org/r/20201026131936.1335664-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-02 14:31:40 -04:00
Jason Gunthorpe
b4d031cdae RDMA/mlx5: Remove mlx5_ib_mr->order
The is only ever set to non-zero if the MR is from the cache, and if it is
cached then the order is in cached_ent->order.

Make it clearer that use_umr_mtt_update() only returns true for cached MRs
and remove the redundant data.

Link: https://lore.kernel.org/r/20201026131936.1335664-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-02 14:31:40 -04:00
Joe Perches
1c7fd72687 RDMA: Convert sysfs device * show functions to use sysfs_emit()
Done with cocci script:

@@
identifier d_show;
identifier dev, attr, buf;
@@

ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
{
	<...
	return
-	sprintf(buf,
+	sysfs_emit(buf,
	...);
	...>
}

@@
identifier d_show;
identifier dev, attr, buf;
@@

ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
{
	<...
	return
-	snprintf(buf, PAGE_SIZE,
+	sysfs_emit(buf,
	...);
	...>
}

@@
identifier d_show;
identifier dev, attr, buf;
@@

ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
{
	<...
	return
-	scnprintf(buf, PAGE_SIZE,
+	sysfs_emit(buf,
	...);
	...>
}

@@
identifier d_show;
identifier dev, attr, buf;
expression chr;
@@

ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
{
	<...
	return
-	strcpy(buf, chr);
+	sysfs_emit(buf, chr);
	...>
}

@@
identifier d_show;
identifier dev, attr, buf;
identifier len;
@@

ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
{
	<...
	len =
-	sprintf(buf,
+	sysfs_emit(buf,
	...);
	...>
	return len;
}

@@
identifier d_show;
identifier dev, attr, buf;
identifier len;
@@

ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
{
	<...
	len =
-	snprintf(buf, PAGE_SIZE,
+	sysfs_emit(buf,
	...);
	...>
	return len;
}

@@
identifier d_show;
identifier dev, attr, buf;
identifier len;
@@

ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
{
	<...
	len =
-	scnprintf(buf, PAGE_SIZE,
+	sysfs_emit(buf,
	...);
	...>
	return len;
}

@@
identifier d_show;
identifier dev, attr, buf;
identifier len;
@@

ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
{
	<...
-	len += scnprintf(buf + len, PAGE_SIZE - len,
+	len += sysfs_emit_at(buf, len,
	...);
	...>
	return len;
}

@@
identifier d_show;
identifier dev, attr, buf;
expression chr;
@@

ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
{
	...
-	strcpy(buf, chr);
-	return strlen(buf);
+	return sysfs_emit(buf, chr);
}

Link: https://lore.kernel.org/r/7f406fa8e3aa2552c022bec680f621e38d1fe414.1602122879.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:53:21 -03:00
Jason Gunthorpe
676a80adba RDMA: Remove AH from uverbs_cmd_mask
Drivers that need a uverbs AH should instead set the create_user_ah() op
similar to reg_user_mr(). MODIFY_AH and QUERY_AH cmds were never
implemented so are just deleted.

Link: https://lore.kernel.org/r/11-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:28:00 -03:00
Jason Gunthorpe
1f11a7610e RDMA: Check create_flags during create_qp
Each driver should check that the QP attrs create_flags is supported.
Unfortuantely when create_flags was added to the QP attrs the drivers were
not updated. uverbs_ex_cmd_mask was used to block it - even though kernel
drivers use these flags too.

Check that flags is zero in all drivers that don't use it, remove
IB_USER_VERBS_EX_CMD_CREATE_QP from uverbs_ex_cmd_mask. Fix the error code
to be EOPNOTSUPP.

Link: https://lore.kernel.org/r/8-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:27:59 -03:00
Jason Gunthorpe
1c407cb5d7 RDMA: Check flags during create_cq
Each driver should check that the CQ attrs is supported. Unfortuantely
when flags was added to the CQ attrs the drivers were not updated,
uverbs_ex_cmd_mask was used to block it. This was missed when create CQ
was converted to ioctl, so non-zero flags could have been passed into
drivers.

Check that flags is zero in all drivers that don't use it, remove
IB_USER_VERBS_EX_CMD_CREATE_CQ from uverbs_ex_cmd_mask.

Fixes: 41b2a71fc8 ("IB/uverbs: Move ioctl path of create_cq and destroy_cq to a new file")
Link: https://lore.kernel.org/r/7-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:27:59 -03:00
Jason Gunthorpe
26e990badd RDMA: Check attr_mask during modify_qp
Each driver should check that it can support the provided attr_mask during
modify_qp. IB_USER_VERBS_EX_CMD_MODIFY_QP was being used to block
modify_qp_ex because the driver didn't check RATE_LIMIT.

Link: https://lore.kernel.org/r/6-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:27:58 -03:00
Jason Gunthorpe
652caba5b5 RDMA: Check srq_type during create_srq
uverbs was blocking srq_types the driver doesn't support based on the
CREATE_XSRQ cmd_mask. Fix all drivers to check for supported srq_types
during create_srq and move CREATE_XSRQ to the core code.

Link: https://lore.kernel.org/r/5-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:27:58 -03:00
Jason Gunthorpe
44ce37bc8b RDMA: Move more uverbs_cmd_mask settings to the core
These functions all depend on the driver providing a specific op:

- REREG_MR is rereg_user_mr(). bnxt_re set this without providing the op
- ATTACH/DEATCH_MCAST is attach_mcast()/detach_mcast(). usnic set this
  without providing the op
- OPEN_QP doesn't involve the driver but requires a XRCD. qedr provides
  xrcd but forgot to set it, usnic doesn't provide XRCD but set it anyhow.
- OPEN/CLOSE_XRCD are the ops alloc_xrcd()/dealloc_xrcd()
- CREATE_SRQ/DESTROY_SRQ are the ops create_srq()/destroy_srq()
- QUERY/MODIFY_SRQ is op query_srq()/modify_srq(). hns sets this but
  sometimes supplies a NULL op.
- RESIZE_CQ is op resize_cq(). bnxt_re sets this boes doesn't supply an op
- ALLOC/DEALLOC_MW is alloc_mw()/dealloc_mw(). cxgb4 provided an
  (now deleted) implementation but no userspace

All drivers were checked that no drivers provide the op without also
setting uverbs_cmd_mask so this should have no functional change.

Link: https://lore.kernel.org/r/4-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:27:58 -03:00
Jason Gunthorpe
c074bb1e30 RDMA: Remove elements in uverbs_cmd_mask that all drivers set
This is a step toward eliminating uverbs_cmd_mask. Preset this list in the
core code. Only the op reg_user_mr wasn't already being required from the
drivers.

Link: https://lore.kernel.org/r/3-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:27:57 -03:00
Jason Gunthorpe
b8e3130dd9 RDMA: Remove uverbs_ex_cmd_mask values that are linked to functions
Since a while now the uverbs layer checks if the driver implements a
function before allowing the ucmd to proceed. This largely obsoletes the
cmd_mask stuff, but there is some tricky bits in drivers preventing it
from being removed.

Remove the easy elements of uverbs_ex_cmd_mask by pre-setting them in the
core code. These are triggered soley based on the related ops function
pointer.

query_device_ex is not triggered based on an op, but all drivers already
implement something compatible with the extension, so enable it globally
too.

Link: https://lore.kernel.org/r/2-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:27:56 -03:00
Parav Pandit
fbdd0049d9 RDMA/mlx5: Fix devlink deadlock on net namespace deletion
When a mlx5 core devlink instance is reloaded in different net namespace,
its associated IB device is deleted and recreated.

Example sequence is:
$ ip netns add foo
$ devlink dev reload pci/0000:00:08.0 netns foo
$ ip netns del foo

mlx5 IB device needs to attach and detach the netdevice to it through the
netdev notifier chain during load and unload sequence.  A below call graph
of the unload flow.

cleanup_net()
   down_read(&pernet_ops_rwsem); <- first sem acquired
     ops_pre_exit_list()
       pre_exit()
         devlink_pernet_pre_exit()
           devlink_reload()
             mlx5_devlink_reload_down()
               mlx5_unload_one()
               [...]
                 mlx5_ib_remove()
                   mlx5_ib_unbind_slave_port()
                     mlx5_remove_netdev_notifier()
                       unregister_netdevice_notifier()
                         down_write(&pernet_ops_rwsem);<- recurrsive lock

Hence, when net namespace is deleted, mlx5 reload results in deadlock.

When deadlock occurs, devlink mutex is also held. This not only deadlocks
the mlx5 device under reload, but all the processes which attempt to
access unrelated devlink devices are deadlocked.

Hence, fix this by mlx5 ib driver to register for per net netdev notifier
instead of global one, which operats on the net namespace without holding
the pernet_ops_rwsem.

Fixes: 4383cfcc65 ("net/mlx5: Add devlink reload")
Link: https://lore.kernel.org/r/20201026134359.23150-1-parav@nvidia.com
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:18:19 -03:00
Jason Gunthorpe
e0477b34d9 RDMA: Explicitly pass in the dma_device to ib_register_device
The code in setup_dma_device has become rather convoluted, move all of
this to the drivers. Drives now pass in a DMA capable struct device which
will be used to setup DMA, or drivers must fully configure the ibdev for
DMA and pass in NULL.

Other than setting the masks in rvt all drivers were doing this already
anyhow.

mthca, mlx4 and mlx5 were already setting up maximum DMA segment size for
DMA based on their hardweare limits in:
__mthca_init_one()
  dma_set_max_seg_size (1G)

__mlx4_init_one()
  dma_set_max_seg_size (1G)

mlx5_pci_init()
  set_dma_caps()
    dma_set_max_seg_size (2G)

Other non software drivers (except usnic) were extended to UINT_MAX [1, 2]
instead of 2G as was before.

[1] https://lore.kernel.org/linux-rdma/20200924114940.GE9475@nvidia.com/
[2] https://lore.kernel.org/linux-rdma/20200924114940.GE9475@nvidia.com/

Link: https://lore.kernel.org/r/20201008082752.275846-1-leon@kernel.org
Link: https://lore.kernel.org/r/6b2ed339933d066622d5715903870676d8cc523a.1602590106.git.mchehab+huawei@kernel.org
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-16 13:53:46 -03:00
Avihai Horon
1c15b4f2a4 RDMA/core: Modify enum ib_gid_type and enum rdma_network_type
Separate IB_GID_TYPE_IB and IB_GID_TYPE_ROCE to two different values, so
enum ib_gid_type will match the gid types of the new query GID table API
which will be introduced in the following patches.

This change in enum ib_gid_type requires to change also enum
rdma_network_type by separating RDMA_NETWORK_IB and RDMA_NETWORK_ROCE_V1
values.

Link: https://lore.kernel.org/r/20200923165015.2491894-3-leon@kernel.org
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-01 21:20:11 -03:00
Yishai Hadas
a03bfc37d5 RDMA/mlx5: Sync device with CPU pages upon ODP MR registration
Sync device with CPU pages upon ODP MR registration. mlx5 already has to
zero the HW's version of the PAS list, may as well deliver a PAS list that
matches the current CPU page tables configuration.

Link: https://lore.kernel.org/r/20200930163828.1336747-5-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-01 16:44:44 -03:00
Yishai Hadas
677cf51f71 RDMA/mlx5: Extend advice MR to support non faulting mode
Extend advice MR to support non faulting mode, this can improve
performance by increasing the populated page tables in the device.

Link: https://lore.kernel.org/r/20200930163828.1336747-4-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-01 16:44:05 -03:00
Yishai Hadas
8bfafde086 IB/core: Enable ODP sync without faulting
Enable ODP sync without faulting, this improves performance by reducing
the number of page faults in the system.

The gain from this option is that the device page table can be aligned
with the presented pages in the CPU page table without causing page
faults.

As of that, the overhead on data path from hardware point of view to
trigger a fault which end-up by calling the driver to bring the pages
will be dropped.

Link: https://lore.kernel.org/r/20200930163828.1336747-3-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-01 16:44:05 -03:00
Yishai Hadas
36f30e486d IB/core: Improve ODP to use hmm_range_fault()
Move to use hmm_range_fault() instead of get_user_pags_remote() to improve
performance in a few aspects:

This includes:
- Dropping the need to allocate and free memory to hold its output

- No need any more to use put_page() to unpin the pages

- The logic to detect contiguous pages is done based on the returned
  order, no need to run per page and evaluate.

In addition, moving to use hmm_range_fault() enables to reduce page faults
in the system with it's snapshot mode, this will be introduced in next
patches from this series.

As part of this, cleanup some flows and use the required data structures
to work with hmm_range_fault().

Link: https://lore.kernel.org/r/20200930163828.1336747-2-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-01 16:39:54 -03:00
Leon Romanovsky
b925c555a1 RDMA/drivers: Remove udata check from special QP
GSI QP can't be created from the user space, hence the udata check is
always false (udata == NULL). Remove that check and simplify the flow.

Link: https://lore.kernel.org/r/20200926102450.2966017-9-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-29 13:11:06 -03:00
Leon Romanovsky
eebe580feb RDMA/mlx5: Delete not needed GSI QP signal QP type
GSI QP doesn't need signal QP type because it is initialized statically to
zero, which is IB_SIGNAL_ALL_WR also wr->send_flags isn't set too.  This
means that the GSI QP signal QP type can be removed.

Link: https://lore.kernel.org/r/20200926102450.2966017-5-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-29 13:09:49 -03:00
Leon Romanovsky
2dc4d6725b RDMA/mlx5: Change GSI QP to have same creation flow like other QPs
There is no reason to have separate create flow for the GSI QP, while
general create_qp routine has all needed checks and ability to allocate
and free the proper struct mlx5_ib_qp.

Link: https://lore.kernel.org/r/20200926102450.2966017-4-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-29 13:09:48 -03:00
Leon Romanovsky
f8225e3488 RDMA/mlx5: Reuse existing fields in parent QP storage object
Remove duplication of mlx5_ib_qp and mlx5_ib_gsi_qp fields.  This change
returns the memory footprint of mlx5_ib QP to be as it was before
embedding GSI QP.

Link: https://lore.kernel.org/r/20200926102450.2966017-3-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-29 13:09:48 -03:00
Leon Romanovsky
0d9aef8603 RDMA/mlx5: Embed GSI QP into general mlx5_ib QP
The GSI QPs have different create flow from the regular QPs, but it is not
really needed. Update the code to use mlx5_ib_qp as a storage class for
all outside of GSI calls.

Link: https://lore.kernel.org/r/20200926102450.2966017-2-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-29 13:09:48 -03:00
Liu Shixin
b942fc0319 RDMA/mlx5: Fix type warning of sizeof in __mlx5_ib_alloc_counters()
sizeof() when applied to a pointer typed expression should give the size
of the pointed data, even if the data is a pointer.

Fixes: e1f24a79f4 ("IB/mlx5: Support congestion related counters")
Link: https://lore.kernel.org/r/20200917081354.2083293-1-liushixin2@huawei.com
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-25 09:17:42 -03:00
Leon Romanovsky
d7ecab1e5f RDMA/mlx5: Don't call to restrack recursively
The restrack is going to manage memory of all IB objects and must be
called before object is created. GSI QP in the mlx5_ib separated between
creating dummy interface and HW object beneath. This was achieved by
double call to ib_create_qp().

In order to skip such reentry call to internal driver create_qp code.

Link: https://lore.kernel.org/r/20200922091106.2152715-3-leon@kernel.org
Reviewed-by: Mark Zhang <markz@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-22 19:47:35 -03:00
Jason Gunthorpe
8383da3e4a RDMA/mlx5: Clarify what the UMR is for when creating MRs
Once a mkey is created it can be modified using UMR. This is desirable for
performance reasons. However, different hardware has restrictions on what
modifications are possible using UMR. Make sense of these checks:

- mlx5_ib_can_reconfig_with_umr() returns true if the access flags can be
  altered. Most cases create MRs using 0 access flags (now made clear by
  consistent use of set_mkc_access_pd_addr_fields()), but the old logic
  here was tormented. Make it clear that this is checking if the current
  access_flags can be modified using UMR to different access_flags. It is
  always OK to use UMR to change flags that all HW supports.

- mlx5_ib_can_load_pas_with_umr() returns true if UMR can be used to
  enable and update the PAS/XLT. Enabling requires updating the entity
  size, so UMR ends up completely disabled on this old hardware. Make it
  clear why it is disabled. FRWR, ODP and cache always requires
  mlx5_ib_can_load_pas_with_umr().

- mlx5_ib_pas_fits_in_mr() is used to tell if an existing MR can be
  resized to hold a new PAS list. This only works for cached MR's because
  we don't store the PAS list size in other cases.

To be very clear, arrange things so any pre-created MR's in the cache
check the newly requested access_flags before allowing the MR to leave the
cache. If UMR cannot set the required access_flags the cache fails to
create the MR.

This in turn means relaxed ordering and atomic are now correctly blocked
early for implicit ODP on older HW.

Link: https://lore.kernel.org/r/20200914112653.345244-6-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-18 13:02:43 -03:00
Jason Gunthorpe
0ec52f0194 RDMA/mlx5: Disable IB_DEVICE_MEM_MGT_EXTENSIONS if IB_WR_REG_MR can't work
set_reg_wr() always fails if !umr_modify_entity_size_disabled because
mlx5_ib_can_use_umr() always fails. Without set_reg_wr() IB_WR_REG_MR
doesn't work and that means the device should not advertise
IB_DEVICE_MEM_MGT_EXTENSIONS.

Fixes: 841b07f99a ("IB/mlx5: Block MR WR if UMR is not possible")
Link: https://lore.kernel.org/r/20200914112653.345244-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-18 13:02:43 -03:00
Jason Gunthorpe
5eb29f0d13 RDMA/mlx5: Make mkeys always owned by the kernel's PD when not enabled
Any mkey that is not enabled and assigned to userspace should have the PD
set to a kernel owned PD.

When cache entries are created for the first time the PDN is set to 0,
which is probably a kernel PD, but be explicit.

When a MR is registered using the hybrid reg_create with UMR xlt & enable
the disabled mkey is pointing at the user PD, keep it pointing at the
kernel until a UMR enables it and sets the user PD.

Fixes: 9ec4483a3f ("IB/mlx5: Move MRs to a kernel PD when freeing them to the MR cache")
Link: https://lore.kernel.org/r/20200914112653.345244-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-18 13:02:43 -03:00
Jason Gunthorpe
1c97ca3da0 RDMA/mlx5: Use set_mkc_access_pd_addr_fields() in reg_create()
reg_create() open codes this helper, use the shared code.

Link: https://lore.kernel.org/r/20200914112653.345244-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-18 13:02:42 -03:00
Jason Gunthorpe
2e4e706e09 RDMA/mlx5: Remove dead check for EAGAIN after alloc_mr_from_cache()
alloc_mr_from_cache() no longer returns EAGAIN, this is just dead code
now.

Fixes: aad719dcf3 ("RDMA/mlx5: Allow MRs to be created in the cache synchronously")
Link: https://lore.kernel.org/r/20200914112653.345244-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-18 13:02:42 -03:00
Jason Gunthorpe
a1255fff5d Merge branch 'mlx_sw_owner_v2' into rdma.git for-next
Leon Romanovsky says:

====================
This series from Alex extends software steering interface to support
devices with extra capability "sw_owner_2" which will replace existing
"sw_owner".
====================

Based on the mlx5-next branch at
     git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
due to dependencies.

* branch 'mlx5_sw_owner_v2:
  RDMA/mlx5: Expose TIR and QP ICM address for sw_owner_v2 devices
  RDMA/mlx5: Allow DM allocation for sw_owner_v2 enabled devices
  RDMA/mlx5: Add sw_owner_v2 bit capability
2020-09-18 10:31:45 -03:00
Jason Gunthorpe
5dee5872f8 Merge branch 'mlx5_active_speed' into rdma.git for-next
Leon Romanovsky says:

====================
IBTA declares speed as 16 bits, but kernel stores it in u8. This series
fixes in-kernel declaration while keeping external interface intact.
====================

Based on the mlx5-next branch at
     git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
due to dependencies.

* branch 'mlx5_active_speed':
  RDMA: Fix link active_speed size
  RDMA/mlx5: Delete duplicated mlx5_ptys_width enum
  net/mlx5: Refactor query port speed functions
2020-09-18 10:31:45 -03:00
Aharon Landau
376ceb31ff RDMA: Fix link active_speed size
According to the IB spec active_speed size should be u16 and not u8 as
before. Changing it to allow further extensions in offered speeds.

Link: https://lore.kernel.org/r/20200917090223.1018224-4-leon@kernel.org
Signed-off-by: Aharon Landau <aharonl@mellanox.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-18 10:31:24 -03:00
Alex Vesker
54a38b6627 RDMA/mlx5: Expose TIR and QP ICM address for sw_owner_v2 devices
Expose the ICM address to access TIR and QP, this will allow sw_owned_v2
devices to steer traffic to TIRs and QPs same as done with sw_owner
capability.

Link: https://lore.kernel.org/r/20200903073857.1129166-4-leon@kernel.org
Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-18 09:40:41 -03:00
Alex Vesker
8310e32704 RDMA/mlx5: Allow DM allocation for sw_owner_v2 enabled devices
sw_owner_v2 will replace sw_owner for future devices, this means that if
sw_owner_v2 is set sw_owner should be ignored and DM allocation is
required for sw_owner_v2 devices to function.

Link: https://lore.kernel.org/r/20200903073857.1129166-3-leon@kernel.org
Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-18 09:40:41 -03:00
Leon Romanovsky
c0a6b5ecc5 RDMA: Convert RWQ table logic to ib_core allocation scheme
Move struct ib_rwq_ind_table allocation to ib_core.

Link: https://lore.kernel.org/r/20200902081623.746359-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-17 14:04:33 -03:00
Leon Romanovsky
d18bb3e152 RDMA: Clean MW allocation and free flows
Move allocation and destruction of memory windows under ib_core
responsibility and clean drivers to ensure that no updates to MW
ib_core structures are done in driver layer.

Link: https://lore.kernel.org/r/20200902081623.746359-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-17 14:04:32 -03:00
Aharon Landau
e27014bdb4 RDMA/mlx5: Delete duplicated mlx5_ptys_width enum
Combine two same enums to avoid duplication.

Signed-off-by: Aharon Landau <aharonl@mellanox.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2020-09-17 19:33:03 +03:00
Aharon Landau
639bf4415c net/mlx5: Refactor query port speed functions
The functions mlx5_query_port_link_width_oper and
mlx5_query_port_ib_proto_oper are always called together, so combine them
to a new function called mlx5_query_port_oper to avoid duplication.

And while the mlx5i_get_port_settings is the same as
mlx5_query_port_oper therefore let's remove it.

According to the IB spec link_width_oper and ib_proto_oper should be u16
and not as written u8, so perform casting as a preparation to cross-RDMA
patch which will fix that type for all drivers in the RDMA subsystem.

Fixes: ada68c31ba ("net/mlx5: Introduce a new header file for physical port functions")
Signed-off-by: Aharon Landau <aharonl@mellanox.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2020-09-17 19:33:02 +03:00
Jason Gunthorpe
a665aca89a RDMA/umem: Split ib_umem_num_pages() into ib_umem_num_dma_blocks()
ib_umem_num_pages() should only be used by things working with the SGL in
CPU pages directly.

Drivers building DMA lists should use the new ib_num_dma_blocks() which
returns the number of blocks rdma_umem_for_each_block() will return.

To make this general for DMA drivers requires a different implementation.
Computing DMA block count based on umem->address only works if the
requested page size is < PAGE_SIZE and/or the IOVA == umem->address.

Instead the number of DMA pages should be computed in the IOVA address
space, not umem->address. Thus the IOVA has to be stored inside the umem
so it can be used for these calculations.

For now set it to umem->address by default and fix it up if
ib_umem_find_best_pgsz() was called. This allows drivers to be converted
to ib_umem_num_dma_blocks() safely.

Link: https://lore.kernel.org/r/6-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-11 10:24:53 -03:00
Leon Romanovsky
71ff3f6268 RDMA: Make counters destroy symmetrical
Change counters to return failure like any other verbs destroy, however
this flow shouldn't return error at all.

Link: https://lore.kernel.org/r/20200907120921.476363-10-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 14:14:29 -03:00
Leon Romanovsky
add53535fb RDMA: Restore ability to return error for destroy WQ
Make this interface symmetrical to other destroy paths.

Fixes: a49b1dc7ae ("RDMA: Convert destroy_wq to be void")
Link: https://lore.kernel.org/r/20200907120921.476363-9-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 14:14:29 -03:00
Leon Romanovsky
d0c45c8556 RDMA: Change XRCD destroy return value
Update XRCD destroy flow to allow command failure.

Fixes: 28ad5f65c3 ("RDMA: Move XRCD to be under ib_core responsibility")
Link: https://lore.kernel.org/r/20200907120921.476363-8-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 14:14:29 -03:00
Leon Romanovsky
43d781b9fa RDMA: Allow fail of destroy CQ
Like any other verbs objects, CQ shouldn't fail during destroy, but
mlx5_ib didn't follow this contract with mixed IB verbs objects with
DEVX. Such mix causes to the situation where FW and kernel are fully
interdependent on the reference counting of each side.

Kernel verbs and drivers that don't have DEVX flows shouldn't fail.

Fixes: e39afe3d6d ("RDMA: Convert CQ allocations to be under core responsibility")
Link: https://lore.kernel.org/r/20200907120921.476363-7-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 14:14:29 -03:00
Leon Romanovsky
119181d1d4 RDMA: Restore ability to fail on SRQ destroy
In similar way to other IB objects, restore the ability to return error on
SRQ destroy. Strictly speaking, this change is not necessary, and provided
here to ensure a symmetrical interface like other destroy functions.

Fixes: 68e326dea1 ("RDMA: Handle SRQ allocations by IB/core")
Link: https://lore.kernel.org/r/20200907120921.476363-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 14:14:24 -03:00
Leon Romanovsky
fd89099d63 RDMA/mlx5: Issue FW command to destroy SRQ on reentry
The HW release can fail and leave the system in limbo state, where SRQ is
removed from the table, but can't be destroyed later.  In every reentry,
the initial xa_erase_irq() check will fail.

Rewrite the erase logic to keep index, but don't store the entry
itself. By doing it, we can safely reinsert entry back in the case of
destroy failure.

Link: https://lore.kernel.org/r/20200907120921.476363-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 14:04:13 -03:00
Leon Romanovsky
9a9ebf8cd7 RDMA: Restore ability to fail on AH destroy
Like any other IB verbs objects, AH are refcounted by ib_core. The release
of those objects are controlled by ib_core with promise that AH destroy
can't fail.

Being SW object for now, this change makes dealloc_ah() to behave like any
other destroy IB flows.

Fixes: d345691471 ("RDMA: Handle AH allocations by IB/core")
Link: https://lore.kernel.org/r/20200907120921.476363-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 13:57:22 -03:00
Leon Romanovsky
91a7c58fce RDMA: Restore ability to fail on PD deallocate
The IB verbs objects are counted by the kernel and ib_core ensures that
deallocate PD will success so it will be called once all other objects
that depends on PD will be released. This is achieved by managing various
reference counters on such objects.

The mlx5 driver didn't follow this standard flow when allowed DEVX objects
that are not managed by ib_core to be interleaved with the ones under
ib_core responsibility.

In such interleaved scenarios deallocate command can fail and ib_core will
leave uobject in internal DB and attempt to clean it later to free
resources anyway.

This change partially restores returned value from dealloc_pd() for all
drivers, but keeping in mind that non-DEVX devices and kernel verbs paths
shouldn't fail.

Fixes: 21a428a019 ("RDMA: Handle PD allocations by IB/core")
Link: https://lore.kernel.org/r/20200907120921.476363-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 13:57:22 -03:00
Leon Romanovsky
4b916ed9f9 RDMA/mlx5: Fix potential race between destroy and CQE poll
The SRQ can be destroyed right before mlx5_cmd_get_srq is called.
In such case the latter will return NULL instead of expected SRQ.

Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Link: https://lore.kernel.org/r/20200830084010.102381-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-03 10:12:57 -03:00
Jason Gunthorpe
6989aa62d3 Linux 5.9-rc3
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl9ML+IeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGA8EIAIy/kTbFS0yrE9yV
 hb98oX0z9+EU9YQg9vhaRWwPd+rJF/JMQZLqYcwbhjG9abaUL3T3fEcSAefMHw8E
 LAt+hYzA38dHt7tqhsFQX3vV1VorvDVICBVN0yRPRWKKikq4OPIHzaAR9tleGAF5
 8btQisl1PjN+obwYmLuNb6aX16OCwAF+uXOwehcoJs9dvMNhwtXRzfOflWzOvOo6
 tE0bHErlylLDfLv4ZzEfczTdks4QJZ7C0xLSf3oN9AAynW42Xnhct4hi8qZY/hCf
 CMaqeN4hdpub6TvQIqBdDqMMjEXGFgeNSnAEBQY9VpvUqz8NTu6sQxwgJEKDF5tg
 d81lv2c=
 =uW/F
 -----END PGP SIGNATURE-----

Merge tag 'v5.9-rc3' into rdma.git for-next

Required due to dependencies in following patches.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-31 12:28:12 -03:00
Mark Zhang
7c4b1ab9f1 IB/mlx5: Add DCT RoCE LAG support
When DCT QPs work in RoCE LAG mode:
 1. DCT creation is allowed only when it is supported
 2. The "port" of a DCT QP is assigned in a round-robin way

Link: https://lore.kernel.org/r/20200818115245.700581-3-leon@kernel.org
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 08:34:28 -03:00
Mark Zhang
8f3243a047 IB/mlx5: Add tx_affinity support for DCI QP
DCI QP supports tx_affinity as well.

Link: https://lore.kernel.org/r/20200818115245.700581-2-leon@kernel.org
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 08:34:28 -03:00
Gustavo A. R. Silva
df561f6688 treewide: Use fallthrough pseudo-keyword
Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-08-23 17:36:59 -05:00
Maor Gottlieb
e6ac9f6006 RDMA/mlx5: Enable sniffer when device is in switchdev mode
In order to allow sniffer when the RDMA device is in switchdev mode, we
don't need to set the source port when creating the sniffer rule.

Link: https://lore.kernel.org/r/20200803060214.15328-1-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-18 15:03:32 -03:00
Mark Zhang
c531024bb1 RDMA/mlx5: Add new IB rates support
Support 56, 25, 100, 200 and 50Gbps IB rates in mlx5 driver.

Link: https://lore.kernel.org/r/20200802081712.1993490-1-leon@kernel.org
Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-18 15:03:32 -03:00
Leon Romanovsky
d6673746d6 RDMA: Remove constant domain argument from flow creation call
The "domain" argument is constant and modern device (mlx5) doesn't support
anything except IB_FLOW_DOMAIN_USER, so delete this extra parameter and
simplify code.

Link: https://lore.kernel.org/r/20200730081235.1581127-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-18 14:47:34 -03:00
Leon Romanovsky
70c1430fba RDMA/mlx5: Replace open-coded offsetofend() macro
Clean mlx5_ib from open-coded implementations of offsetofend().

Link: https://lore.kernel.org/r/20200730081235.1581127-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-18 14:47:34 -03:00
Leon Romanovsky
156f378985 RDMA/mlx5: Simplify multiple else-if cases with switch keyword
Improve readability of fs.c by converting multiple else-if constructions
to be implemented with switch keyword.

Link: https://lore.kernel.org/r/20200730081235.1581127-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-18 14:47:34 -03:00
Linus Torvalds
d7806bbd22 RDMA 5.9 merge window pull request
Smaller set of RDMA updates. A smaller number of 'big topics' with the
 majority of changes being driver updates.
 
 - Driver updates for hfi1, rxe, mlx5, hns, qedr, usnic, bnxt_re
 
 - Removal of dead or redundant code across the drivers
 
 - RAW resource tracker dumps to include a device specific data blob for
   device objects to aide device debugging
 
 - Further advance the IOCTL interface, remove the ability to turn it off.
   Add QUERY_CONTEXT, QUERY_MR, and QUERY_PD commands
 
 - Remove stubs related to devices with no pkey table
 
 - A shared CQ scheme to allow multiple ULPs to share the CQ rings of a
   device to give higher performance
 
 - Several more static checker, syzkaller and rare crashers fixed
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl8sSA0ACgkQOG33FX4g
 mxpp1w/8Df/KIB38PVHpKraIW10bX03KsXwoskMYCA+ITYWM5ce+P7YF+yXXGs69
 Vh2vUYHlr1RvqXQkq3Y3LjzCPKTYFuNFVQRZF1LrfbfOpSS9aoQqoxwgKs08dibm
 YDeRwueWneksWhXeEZLA0QoKd4kEWrScA/n7VGYQ4YcWw8FLKa9t6OMSGivCrFLu
 QA+sA9nytrvMWC5uJUCdeVwlRnoaICPYHmM5yafOykPyEciRw2jU1kzTRVy5Z0Hu
 iCsXm2lJPcVoMgSjW6SgktY3oBkQeSu3ZZesT3eTM6FJsoDYkuSiKjNmWSZjW1zv
 x6CFGjVVin41rN4FMTeqqnwYoML9Q/obbyHvBHs5MTd5J8tLDhesQj3Ev7CUaUed
 b0s38v+oEL1w22nkOChfeyfh7eLcy3yiszqvkIU9ABk8mF0p1guGQYsfguzbsq0K
 3ZRw/361SxCUBvU6P8CdQbIJlhkH+Un7d81qyt+rhLgaZYm/N+d8auIKUxP1jCxh
 q9hss2Cj2U9eZsA/wGNqV1LNazfEAAj/5qjItMirbRd90FL8h+AP2LfJfC7p+id3
 3BfOui0JbZqNTTl4ftTxPuxtWDEdTPgwi7JvQd/be9HRlSV8DYCSMUzYFn8A+Zya
 cbxjxFuBJWmF+y9csDIVBTdFi+j9hO6notw+G89NznuB3QlPl50=
 =0z2L
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "A quiet cycle after the larger 5.8 effort. Substantially cleanup and
  driver work with a few smaller features this time.

   - Driver updates for hfi1, rxe, mlx5, hns, qedr, usnic, bnxt_re

   - Removal of dead or redundant code across the drivers

   - RAW resource tracker dumps to include a device specific data blob
     for device objects to aide device debugging

   - Further advance the IOCTL interface, remove the ability to turn it
     off. Add QUERY_CONTEXT, QUERY_MR, and QUERY_PD commands

   - Remove stubs related to devices with no pkey table

   - A shared CQ scheme to allow multiple ULPs to share the CQ rings of
     a device to give higher performance

   - Several more static checker, syzkaller and rare crashers fixed"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (121 commits)
  RDMA/mlx5: Fix flow destination setting for RDMA TX flow table
  RDMA/rxe: Remove pkey table
  RDMA/umem: Add a schedule point in ib_umem_get()
  RDMA/hns: Fix the unneeded process when getting a general type of CQE error
  RDMA/hns: Fix error during modify qp RTS2RTS
  RDMA/hns: Delete unnecessary memset when allocating VF resource
  RDMA/hns: Remove redundant parameters in set_rc_wqe()
  RDMA/hns: Remove support for HIP08_A
  RDMA/hns: Refactor hns_roce_v2_set_hem()
  RDMA/hns: Remove redundant hardware opcode definitions
  RDMA/netlink: Remove CAP_NET_RAW check when dump a raw QP
  RDMA/include: Replace license text with SPDX tags
  RDMA/rtrs: remove WQ_MEM_RECLAIM for rtrs_wq
  RDMA/rtrs-clt: add an additional random 8 seconds before reconnecting
  RDMA/cma: Execute rdma_cm destruction from a handler properly
  RDMA/cma: Remove unneeded locking for req paths
  RDMA/cma: Using the standard locking pattern when delivering the removal event
  RDMA/cma: Simplify DEVICE_REMOVAL for internal_id
  RDMA/efa: Add EFA 0xefa1 PCI ID
  RDMA/efa: User/kernel compatibility handshake mechanism
  ...
2020-08-06 16:43:36 -07:00
Michael Guralnik
23fcc7dee2 RDMA/mlx5: Fix flow destination setting for RDMA TX flow table
For RDMA TX flow table, set destination type to be 'port' and prevent
creation of flows with TIR destination.

As RDMA TX is an egress flow table the rules on this flow table should
not forward traffic back to the NIC and should set the destination to be
the port.

Without the setting of this destination type flow rules on the RDMA TX
flow tables are not created as FW invokes a syndrome for undefined
destination for the rule.

Fixes: 24670b1a31 ("net/mlx5: Add support for RDMA TX steering")
Link: https://lore.kernel.org/r/20200803055849.14947-1-leon@kernel.org
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-05 21:09:39 -03:00
Linus Torvalds
99ea1521a0 Remove uninitialized_var() macro for v5.9-rc1
- Clean up non-trivial uses of uninitialized_var()
 - Update documentation and checkpatch for uninitialized_var() removal
 - Treewide removal of uninitialized_var()
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAl8oYLQWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJsfjEACvf0D3WL3H7sLHtZ2HeMwOgAzq
 il08t6vUscINQwiIIK3Be43ok3uQ1Q+bj8sr2gSYTwunV2IYHFferzgzhyMMno3o
 XBIGd1E+v1E4DGBOiRXJvacBivKrfvrdZ7AWiGlVBKfg2E0fL1aQbe9AYJ6eJSbp
 UGqkBkE207dugS5SQcwrlk1tWKUL089lhDAPd7iy/5RK76OsLRCJFzIerLHF2ZK2
 BwvA+NWXVQI6pNZ0aRtEtbbxwEU4X+2J/uaXH5kJDszMwRrgBT2qoedVu5LXFPi8
 +B84IzM2lii1HAFbrFlRyL/EMueVFzieN40EOB6O8wt60Y4iCy5wOUzAdZwFuSTI
 h0xT3JI8BWtpB3W+ryas9cl9GoOHHtPA8dShuV+Y+Q2bWe1Fs6kTl2Z4m4zKq56z
 63wQCdveFOkqiCLZb8s6FhnS11wKtAX4czvXRXaUPgdVQS1Ibyba851CRHIEY+9I
 AbtogoPN8FXzLsJn7pIxHR4ADz+eZ0dQ18f2hhQpP6/co65bYizNP5H3h+t9hGHG
 k3r2k8T+jpFPaddpZMvRvIVD8O2HvJZQTyY6Vvneuv6pnQWtr2DqPFn2YooRnzoa
 dbBMtpon+vYz6OWokC5QNWLqHWqvY9TmMfcVFUXE4AFse8vh4wJ8jJCNOFVp8On+
 drhmmImUr1YylrtVOw==
 =xHmk
 -----END PGP SIGNATURE-----

Merge tag 'uninit-macro-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull uninitialized_var() macro removal from Kees Cook:
 "This is long overdue, and has hidden too many bugs over the years. The
  series has several "by hand" fixes, and then a trivial treewide
  replacement.

   - Clean up non-trivial uses of uninitialized_var()

   - Update documentation and checkpatch for uninitialized_var() removal

   - Treewide removal of uninitialized_var()"

* tag 'uninit-macro-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  compiler: Remove uninitialized_var() macro
  treewide: Remove uninitialized_var() usage
  checkpatch: Remove awareness of uninitialized_var() macro
  mm/debug_vm_pgtable: Remove uninitialized_var() usage
  f2fs: Eliminate usage of uninitialized_var() macro
  media: sur40: Remove uninitialized_var() usage
  KVM: PPC: Book3S PR: Remove uninitialized_var() usage
  clk: spear: Remove uninitialized_var() usage
  clk: st: Remove uninitialized_var() usage
  spi: davinci: Remove uninitialized_var() usage
  ide: Remove uninitialized_var() usage
  rtlwifi: rtl8192cu: Remove uninitialized_var() usage
  b43: Remove uninitialized_var() usage
  drbd: Remove uninitialized_var() usage
  x86/mm/numa: Remove uninitialized_var() usage
  docs: deprecated.rst: Add uninitialized_var()
2020-08-04 13:49:43 -07:00
Leon Romanovsky
7fa84b5708 RDMA/mlx5: Initialize QP mutex for the debug kernels
In DCT and RSS RAW QP creation flows, the QP mutex wasn't initialized and
the magic field inside lock was missing. This caused to the following
kernel warning for kernels build with CONFIG_DEBUG_MUTEXES.

 DEBUG_LOCKS_WARN_ON(lock->magic != lock)
 WARNING: CPU: 3 PID: 16261 at kernel/locking/mutex.c:938 __mutex_lock+0x60e/0x940
 Modules linked in: bonding nf_tables ipip tunnel4 geneve ip6_udp_tunnel udp_tunnel ip6_gre ip6_tunnel tunnel6 ip_gre gre ip_tunnel mlx5_ib mlx5_core mlxfw ptp pps_core rdma_ucm ib_uverbs ib_ipoib ib_umad openvswitch nsh xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat xt_addrtype xt_conntrack nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter overlay ib_srp scsi_transport_srp rpcrdma ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_cm ib_core [last unloaded: mlxfw]
 CPU: 3 PID: 16261 Comm: ib_send_bw Not tainted 5.8.0-rc4_for_upstream_min_debug_2020_07_08_22_04 #1
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
 RIP: 0010:__mutex_lock+0x60e/0x940
 Code: c0 0f 84 6d fa ff ff 44 8b 15 4e 9d ba 00 45 85 d2 0f 85 5d fa ff ff 48 c7 c6 f2 de 2b 82 48 c7 c7 f1 8a 2b 82 e8 d2 4d 72 ff <0f> 0b 4c 8b 4d 88 e9 3f fa ff ff f6 c2 04 0f 84 37 fe ff ff 48 89
 RSP: 0018:ffff88810bb8b870 EFLAGS: 00010286
 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
 RDX: ffff88829f1dd880 RSI: 0000000000000000 RDI: ffffffff81192afa
 RBP: ffff88810bb8b910 R08: 0000000000000000 R09: 0000000000000028
 R10: 0000000000000000 R11: 0000000000003f85 R12: 0000000000000002
 R13: ffff88827d8d3ce0 R14: ffffffffa059f615 R15: ffff8882a4d02610
 FS:  00007f3f6988e740(0000) GS:ffff8882f5b80000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000556556158000 CR3: 000000010a63c005 CR4: 0000000000360ea0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
  ? cmd_exec+0x947/0xe60 [mlx5_core]
  ? __mutex_lock+0x76/0x940
  ? mlx5_ib_qp_set_counter+0x25/0xa0 [mlx5_ib]
  mlx5_ib_qp_set_counter+0x25/0xa0 [mlx5_ib]
  mlx5_ib_counter_bind_qp+0x9b/0xe0 [mlx5_ib]
  __rdma_counter_bind_qp+0x6b/0xa0 [ib_core]
  rdma_counter_bind_qp_auto+0x363/0x520 [ib_core]
  _ib_modify_qp+0x316/0x580 [ib_core]
  ib_modify_qp_with_udata+0x19/0x30 [ib_core]
  modify_qp+0x4c4/0x600 [ib_uverbs]
  ib_uverbs_ex_modify_qp+0x87/0xe0 [ib_uverbs]
  ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x129/0x1c0 [ib_uverbs]
  ib_uverbs_cmd_verbs.isra.5+0x5d5/0x11f0 [ib_uverbs]
  ? ib_uverbs_handler_UVERBS_METHOD_QUERY_CONTEXT+0x120/0x120 [ib_uverbs]
  ? lock_acquire+0xb9/0x3a0
  ? ib_uverbs_ioctl+0xd0/0x210 [ib_uverbs]
  ? ib_uverbs_ioctl+0x175/0x210 [ib_uverbs]
  ib_uverbs_ioctl+0x14b/0x210 [ib_uverbs]
  ? ib_uverbs_ioctl+0xd0/0x210 [ib_uverbs]
  ksys_ioctl+0x234/0x7d0
  ? exc_page_fault+0x202/0x640
  ? do_syscall_64+0x1f/0x2e0
  __x64_sys_ioctl+0x16/0x20
  do_syscall_64+0x59/0x2e0
  ? asm_exc_page_fault+0x8/0x30
  ? rcu_read_lock_sched_held+0x52/0x60
  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: b4aaa1f0b4 ("IB/mlx5: Handle type IB_QPT_DRIVER when creating a QP")
Link: https://lore.kernel.org/r/20200730082719.1582397-2-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-30 11:03:33 -03:00
Leon Romanovsky
81530ab08e RDMA/mlx5: Allow providing extra scatter CQE QP flag
Scatter CQE feature relies on two flags MLX5_QP_FLAG_SCATTER_CQE and
MLX5_QP_FLAG_ALLOW_SCATTER_CQE, both of them can be provided without
relation to device capability.

Relax global validity check to allow MLX5_QP_FLAG_ALLOW_SCATTER_CQE QP
flag.

Existing user applications are failing on this new validity check.

Fixes: 90ecb37a75 ("RDMA/mlx5: Change scatter CQE flag to be set like other vendor flags")
Fixes: 37518fa49f ("RDMA/mlx5: Process all vendor flags in one place")
Link: https://lore.kernel.org/r/20200728120255.805733-1-leon@kernel.org
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-29 14:19:01 -03:00
Leon Romanovsky
71cab8ef5c RDMA/mlx5: Delete unreachable code
Delete two occurrences of unreachable code discovered by the Coverity.

Link: https://lore.kernel.org/r/20200727095746.495915-1-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-28 16:25:37 -03:00
Jason Gunthorpe
5351a56b1a RDMA/mlx5: Fix prefetch memory leak if get_prefetchable_mr fails
destroy_prefetch_work() must always be called if the work is not going
to be queued. The num_sge also should have been set to i, not i-1
which avoids the condition where it shouldn't have been called in the
first place.

Cc: stable@vger.kernel.org
Fixes: fb985e278a ("RDMA/mlx5: Use SRCU properly in ODP prefetch")
Link: https://lore.kernel.org/r/20200727095712.495652-1-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-27 11:50:20 -03:00
Jason Gunthorpe
7923774368 Merge branch 'mlx5_uar' into rdma.git /for-next
Meir Lichtinger says:

====================
ConnectX-7 supports setting relaxed ordering read/write mkey attribute by
UMR, indicated by new HCA capabilities, so extend mlx5_ib driver to
configure UMR control segment
====================

Based on the mlx5-next branch at
      git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
due to dependencies.

* branch 'mlx5_uar':
  RDMA/mlx5: Set mkey relaxed ordering by UMR with ConnectX-7
  RDMA/mlx5: Use MLX5_SET macro instead of local structure
  RDMA/mlx5: ConnectX-7 new capabilities to set relaxed ordering by UMR
2020-07-27 11:44:36 -03:00
Meir Lichtinger
896ec97353 RDMA/mlx5: Set mkey relaxed ordering by UMR with ConnectX-7
Up to ConnectX-7 UMR is not used when user passes relaxed ordering access
flag. ConnectX-7 supports setting relaxed ordering read/write mkey
attribute by UMR, indicated by new HCA capabilities.

With ConnectX-7 driver uses UMR when user set relaxed ordering access
flag, in contrast to previous silicon models. Specifically it includes
setting relvant flags of mkey context mask in UMR control segment, and
relaxed ordering write and read flags in UMR mkey context segment.

Link: https://lore.kernel.org/r/20200716105248.1423452-4-leon@kernel.org
Signed-off-by: Meir Lichtinger <meirl@mellanox.com>
Reviewed-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-27 11:19:00 -03:00
Meir Lichtinger
2224635938 RDMA/mlx5: Use MLX5_SET macro instead of local structure
Use generic mlx5 structure defined in mlx5_ifc.h to represent ConnectX
device data structures instead of using structure defined specifically for
mlx5_ib module.

Link: https://lore.kernel.org/r/20200716105248.1423452-3-leon@kernel.org
Signed-off-by: Meir Lichtinger <meirl@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-27 11:19:00 -03:00
Maor Gottlieb
d4d7f59643 RDMA/mlx5: Add missing srcu_read_lock in ODP implicit flow
According to the locking scheme, mlx5_ib_update_xlt() should be called
with srcu_read_lock(dev->odp->srcu). Prefetch missed this. This fixes the
below WARN from lockdep_assert_held():

  WARNING: CPU: 1 PID: 1130 at drivers/infiniband/hw/mlx5/odp.c:132 mlx5_odp_populate_xlt+0x175/0x180 [mlx5_ib]
  Modules linked in: xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter overlay ib_srp scsi_transport_srp rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_umad ib_ipoib ib_cm mlx5_ib ib_uverbs ib_core mlx5_core mlxfw ptp pps_core
  CPU: 1 PID: 1130 Comm: kworker/u16:11 Tainted: G        W 5.8.0-rc5_for_upstream_debug_2020_07_13_11_04 #1
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
  Workqueue: events_unbound mlx5_ib_prefetch_mr_work [mlx5_ib]
  RIP: 0010:mlx5_odp_populate_xlt+0x175/0x180 [mlx5_ib]
  Code: 08 e2 85 c0 0f 84 65 ff ff ff 49 8b 87 60 01 00 00 be ff ff ff ff 48 8d b8 b0 39 00 00 e8 93 e0 50 e1 85 c0 0f 85 45 ff ff ff <0f> 0b e9 3e ff ff ff 0f 0b eb c7 0f 1f 44 00 00 48 8b 87 98 0f 00
  RSP: 0018:ffff88840f44fc68 EFLAGS: 00010246
  RAX: 0000000000000000 RBX: ffff88840cc9d000 RCX: ffff88840efcd940
  RDX: 0000000000000000 RSI: ffff88844871b9b0 RDI: ffff88840efce100
  RBP: ffff88840cc9d040 R08: 0000000000000040 R09: 0000000000000001
  R10: ffff88846ced3068 R11: 0000000000000000 R12: 00000000000156ec
  R13: 0000000000000004 R14: 0000000000000004 R15: ffff888439941000
  FS:  0000000000000000(0000) GS:ffff88846fa80000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f8536d12430 CR3: 0000000437a5e006 CR4: 0000000000360ea0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   mlx5_ib_update_xlt+0x37c/0x7c0 [mlx5_ib]
   pagefault_mr+0x315/0x440 [mlx5_ib]
   mlx5_ib_prefetch_mr_work+0x56/0xa0 [mlx5_ib]
   process_one_work+0x215/0x5c0
   worker_thread+0x3c/0x380
   ? process_one_work+0x5c0/0x5c0
   kthread+0x133/0x150
   ? kthread_park+0x90/0x90
   ret_from_fork+0x1f/0x30

Hold the SRCU during prefetch, even though it strictly isn't needed since
prefetch is holding the num_deferred_work it does make it easier to reason
about.

Fixes: 5256edcb98 ("RDMA/mlx5: Rework implicit ODP destroy")
Link: https://lore.kernel.org/r/20200719065747.131157-1-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-24 16:44:06 -03:00
Maor Gottlieb
c94e272b57 RDMA/mlx5: Allow SQ modification
Currently the SQ is set to a ready state when the RAW QP is modified to
INIT.  When the TIS is modified, e.g. to change the lag_tx_affinity, then
SQs which are already in the ready state will not be affected.

Open a window to modify the SQ behavior by setting the SQ as ready only
when QP was modified to RTS.

Link: https://lore.kernel.org/r/20200716105416.1423826-1-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-24 15:49:19 -03:00
Jason Gunthorpe
a862192e92 RDMA/mlx5: Prevent prefetch from racing with implicit destruction
Prefetch work in mlx5_ib_prefetch_mr_work can be queued and able to run
concurrently with destruction of the implicit MR. The num_deferred_work
was intended to serialize this, but there is a race:

       CPU0                                          CPU1

    mlx5_ib_free_implicit_mr()
      xa_erase(odp_mkeys)
      synchronize_srcu()
      __xa_erase(implicit_children)
                                      mlx5_ib_prefetch_mr_work()
                                        pagefault_mr()
                                         pagefault_implicit_mr()
                                          implicit_get_child_mr()
                                           xa_cmpxchg()
                                        atomic_dec_and_test(num_deferred_mr)
      wait_event(imr->q_deferred_work)
      ib_umem_odp_release(odp_imr)
        kfree(odp_imr)

At this point in mlx5_ib_free_implicit_mr() the implicit_children list is
supposed to be empty forever so that destroy_unused_implicit_child_mr()
and related are not and will not be running.

Since it is not empty the destroy_unused_implicit_child_mr() flow ends up
touching deallocated memory as mlx5_ib_free_implicit_mr() already tore down the
imr parent.

The solution is to flush out the prefetch wq by driving num_deferred_work
to zero after creation of new prefetch work is blocked.

Fixes: 5256edcb98 ("RDMA/mlx5: Rework implicit ODP destroy")
Link: https://lore.kernel.org/r/20200719065435.130722-1-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-21 13:51:35 -03:00
Kees Cook
3f649ab728 treewide: Remove uninitialized_var() usage
Using uninitialized_var() is dangerous as it papers over real bugs[1]
(or can in the future), and suppresses unrelated compiler warnings
(e.g. "unused variable"). If the compiler thinks it is uninitialized,
either simply initialize the variable or make compiler changes.

In preparation for removing[2] the[3] macro[4], remove all remaining
needless uses with the following script:

git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \
	xargs perl -pi -e \
		's/\buninitialized_var\(([^\)]+)\)/\1/g;
		 s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;'

drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid
pathological white-space.

No outstanding warnings were found building allmodconfig with GCC 9.3.0
for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64,
alpha, and m68k.

[1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/
[2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/
[3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/
[4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/

Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5
Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB
Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers
Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs
Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-16 12:35:15 -07:00
Daria Velikovsky
0829d2da60 RDMA/mlx5: Init dest_type when create flow
When using action drop dest_type was never assigned to any value.  Add
initialization of dest_type to -1 since 0 is valid.

Fixes: f29de9eee7 ("RDMA/mlx5: Add support for drop action in DV steering")
Link: https://lore.kernel.org/r/20200707110259.882276-1-leon@kernel.org
Signed-off-by: Daria Velikovsky <daria@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-16 14:11:53 -03:00
Maor Gottlieb
c3d6057e07 RDMA/mlx5: Use xa_lock_irq when access to SRQ table
SRQ table is accessed both from interrupt and process context,
therefore we must use xa_lock_irq.

   inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
   kworker/u17:9/8573   takes:
   ffff8883e3503d30 (&xa->xa_lock#13){?...}-{2:2}, at: mlx5_cmd_get_srq+0x18/0x70 [mlx5_ib]
   {IN-HARDIRQ-W} state was registered at:
     lock_acquire+0xb9/0x3a0
     _raw_spin_lock+0x25/0x30
     srq_event_notifier+0x2b/0xc0 [mlx5_ib]
     notifier_call_chain+0x45/0x70
     __atomic_notifier_call_chain+0x69/0x100
     forward_event+0x36/0xc0 [mlx5_core]
     notifier_call_chain+0x45/0x70
     __atomic_notifier_call_chain+0x69/0x100
     mlx5_eq_async_int+0xc5/0x160 [mlx5_core]
     notifier_call_chain+0x45/0x70
     __atomic_notifier_call_chain+0x69/0x100
     mlx5_irq_int_handler+0x19/0x30 [mlx5_core]
     __handle_irq_event_percpu+0x43/0x2a0
     handle_irq_event_percpu+0x30/0x70
     handle_irq_event+0x34/0x60
     handle_edge_irq+0x7c/0x1b0
     do_IRQ+0x60/0x110
     ret_from_intr+0x0/0x2a
     default_idle+0x34/0x160
     do_idle+0x1ec/0x220
     cpu_startup_entry+0x19/0x20
     start_secondary+0x153/0x1a0
     secondary_startup_64+0xa4/0xb0
   irq event stamp: 20907
   hardirqs last  enabled at (20907):   _raw_spin_unlock_irq+0x24/0x30
   hardirqs last disabled at (20906):   _raw_spin_lock_irq+0xf/0x40
   softirqs last  enabled at (20746):   __do_softirq+0x2c9/0x436
   softirqs last disabled at (20681):   irq_exit+0xb3/0xc0

   other info that might help us debug this:
    Possible unsafe locking scenario:

          CPU0
          ----
     lock(&xa->xa_lock#13);
     <Interrupt>
       lock(&xa->xa_lock#13);

    *** DEADLOCK ***

   2 locks held by kworker/u17:9/8573:
    #0: ffff888295218d38 ((wq_completion)mlx5_ib_page_fault){+.+.}-{0:0}, at: process_one_work+0x1f1/0x5f0
    #1: ffff888401647e78 ((work_completion)(&pfault->work)){+.+.}-{0:0}, at: process_one_work+0x1f1/0x5f0

   stack backtrace:
   CPU: 0 PID: 8573 Comm: kworker/u17:9 Tainted: GO      5.7.0_for_upstream_min_debug_2020_06_14_11_31_46_41 #1
   Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
   Workqueue: mlx5_ib_page_fault mlx5_ib_eqe_pf_action [mlx5_ib]
   Call Trace:
    dump_stack+0x71/0x9b
    mark_lock+0x4f2/0x590
    ? print_shortest_lock_dependencies+0x200/0x200
    __lock_acquire+0xa00/0x1eb0
    lock_acquire+0xb9/0x3a0
    ? mlx5_cmd_get_srq+0x18/0x70 [mlx5_ib]
    _raw_spin_lock+0x25/0x30
    ? mlx5_cmd_get_srq+0x18/0x70 [mlx5_ib]
    mlx5_cmd_get_srq+0x18/0x70 [mlx5_ib]
    mlx5_ib_eqe_pf_action+0x257/0xa30 [mlx5_ib]
    ? process_one_work+0x209/0x5f0
    process_one_work+0x27b/0x5f0
    ? __schedule+0x280/0x7e0
    worker_thread+0x2d/0x3c0
    ? process_one_work+0x5f0/0x5f0
    kthread+0x111/0x130
    ? kthread_park+0x90/0x90
    ret_from_fork+0x24/0x30

Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Link: https://lore.kernel.org/r/20200712102641.15210-1-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-16 09:28:16 -03:00
Gal Pressman
6c72a038bf RDMA/mlx5: Remove unused to_mibmr function
The to_mibmr function is unused, remove it.

Link: https://lore.kernel.org/r/20200705141143.47303-1-galpress@amazon.com
Signed-off-by: Gal Pressman <galpress@amazon.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-10 16:40:39 -03:00
Leon Romanovsky
0a03715068 RDMA/mlx5: Set PD pointers for the error flow unwind
ib_pd is accessed internally during destroy of the TIR/TIS, but PD
can be not set yet. This leading to the following kernel panic.

  BUG: kernel NULL pointer dereference, address: 0000000000000074
  PGD 8000000079eaa067 P4D 8000000079eaa067 PUD 7ae81067 PMD 0 Oops: 0000 [#1] SMP PTI
  CPU: 1 PID: 709 Comm: syz-executor.0 Not tainted 5.8.0-rc3 #41 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
  RIP: 0010:destroy_raw_packet_qp_tis drivers/infiniband/hw/mlx5/qp.c:1189 [inline]
  RIP: 0010:destroy_raw_packet_qp drivers/infiniband/hw/mlx5/qp.c:1527 [inline]
  RIP: 0010:destroy_qp_common+0x2ca/0x4f0 drivers/infiniband/hw/mlx5/qp.c:2397
  Code: 00 85 c0 74 2e e8 56 18 55 ff 48 8d b3 28 01 00 00 48 89 ef e8 d7 d3 ff ff 48 8b 43 08 8b b3 c0 01 00 00 48 8b bd a8 0a 00 00 <0f> b7 50 74 e8 0d 6a fe ff e8 28 18 55 ff 49 8d 55 50 4c 89 f1 48
  RSP: 0018:ffffc900007bbac8 EFLAGS: 00010293
  RAX: 0000000000000000 RBX: ffff88807949e800 RCX: 0000000000000998
  RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88807c180140
  RBP: ffff88807b50c000 R08: 000000000002d379 R09: ffffc900007bba00
  R10: 0000000000000001 R11: 000000000002d358 R12: ffff888076f37000
  R13: ffff88807949e9c8 R14: ffffc900007bbe08 R15: ffff888076f37000
  FS:  00000000019bf940(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000074 CR3: 0000000076d68004 CR4: 0000000000360ee0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   mlx5_ib_create_qp+0xf36/0xf90 drivers/infiniband/hw/mlx5/qp.c:3014
   _ib_create_qp drivers/infiniband/core/core_priv.h:333 [inline]
   create_qp+0x57f/0xd20 drivers/infiniband/core/uverbs_cmd.c:1443
   ib_uverbs_create_qp+0xcf/0x100 drivers/infiniband/core/uverbs_cmd.c:1564
   ib_uverbs_write+0x5fa/0x780 drivers/infiniband/core/uverbs_main.c:664
   __vfs_write+0x3f/0x90 fs/read_write.c:495
   vfs_write+0xc7/0x1f0 fs/read_write.c:559
   ksys_write+0x5e/0x110 fs/read_write.c:612
   do_syscall_64+0x3e/0x70 arch/x86/entry/common.c:359
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x466479
  Code: Bad RIP value.
  RSP: 002b:00007ffd057b62b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
  RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000466479
  RDX: 0000000000000070 RSI: 0000000020000240 RDI: 0000000000000003
  RBP: 00000000019bf8fc R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
  R13: 0000000000000bf6 R14: 00000000004cb859 R15: 00000000006fefc0

Fixes: 6c41965d64 ("RDMA/mlx5: Don't access ib_qp fields in internal destroy QP path")
Link: https://lore.kernel.org/r/20200707110612.882962-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-08 20:15:59 -03:00
Aya Levin
530c8632b5 IB/mlx5: Fix 50G per lane indication
Some released FW versions mistakenly don't set the capability that 50G per
lane link-modes are supported for VFs (ptys_extended_ethernet capability
bit).

Use PTYS.ext_eth_proto_capability instead, as this indication is always
accurate. If PTYS.ext_eth_proto_capability is valid
(has a non-zero value) conclude that the HCA supports 50G per lane.

Otherwise, conclude that the HCA doesn't support 50G per lane.

Fixes: 08e8676f16 ("IB/mlx5: Add support for 50Gbps per lane link modes")
Link: https://lore.kernel.org/r/20200707110612.882962-3-leon@kernel.org
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-08 20:15:58 -03:00
Leon Romanovsky
1e2b5a90de RDMA/mlx5: Delete one-time used functions
Merge them into their callers, usually the only thing the caller did was
to call the one function, so this is clearer.

Link: https://lore.kernel.org/r/20200702081809.423482-7-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-07 14:08:03 -03:00
Leon Romanovsky
d8b7515e25 RDMA/mlx5: Cleanup DEVX initialization flow
Move DEVX initialization and cleanup flows to the devx.c instead of having
almost empty functions in main.c

Link: https://lore.kernel.org/r/20200702081809.423482-6-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-07 14:05:51 -03:00
Leon Romanovsky
f7c4ffda0c RDMA/mlx5: Separate flow steering logic from main.c
Move flow steering logic to be in separate file and rename flow.c to be
fs.c because it is better describe the content.

Link: https://lore.kernel.org/r/20200702081809.423482-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-07 14:05:51 -03:00
Leon Romanovsky
64825827ae RDMA/mlx5: Separate counters from main.c
There are number of counters types supported in mlx5_ib: HW counters,
congestion counters, Q-counters and flow counters. Almost all supporting
code was placed in main.c that made almost impossible to maintain the code
anymore. Let's create separate code namespace for the counters to easy
future generalization effort.

Link: https://lore.kernel.org/r/20200702081809.423482-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-07 14:05:51 -03:00
Leon Romanovsky
b572ebe667 RDMA/mlx5: Separate restrack callbacks initialization from main.c
The restrack code has separate .c, so move callbacks initialization to
that file to improve code locality.

Link: https://lore.kernel.org/r/20200702081809.423482-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-07 14:05:51 -03:00
Leon Romanovsky
ac47bf5ef1 RDMA/mlx5: Limit the scope of mlx5_ib_enable_driver function
The mlx5_ib_enable_driver() is local function and doesn't need to be
shared in mlx5_ib, so change it's signature to have static keyword in it.

Link: https://lore.kernel.org/r/20200702081809.423482-2-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-07 14:05:51 -03:00
Leon Romanovsky
28ad5f65c3 RDMA: Move XRCD to be under ib_core responsibility
Update the code to allocate and free ib_xrcd structure in the
ib_core instead of inside drivers.

Link: https://lore.kernel.org/r/20200630101855.368895-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-06 20:11:24 -03:00
Leon Romanovsky
3b023e1b68 RDMA/core: Create and destroy counters in the ib_core
Move allocation and destruction of counters under ib_core responsibility

Link: https://lore.kernel.org/r/20200630101855.368895-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-06 20:04:40 -03:00
Yishai Hadas
05f71ef979 RDMA/mlx5: Introduce UAPI to query PD attributes
Introduce UAPI to query PD attributes, this can be used to retrieve PD
attributes by having the PD handle of the created one and owning the
command FD for the ucontxet.

Link: https://lore.kernel.org/r/20200630093916.332097-7-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-06 19:50:34 -03:00
Yishai Hadas
0fb556b2b5 RDMA/mlx5: Implement the query ucontext functionality
Implement the query ucontext functionality by returning the original
ucontext data as part of an extra mlx5 attribute that holds the driver
UAPI response.

Link: https://lore.kernel.org/r/20200630093916.332097-6-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-06 19:50:33 -03:00
Yishai Hadas
45ec21c971 RDMA/mlx5: Refactor mlx5_ib_alloc_ucontext() response
Refactor mlx5_ib_alloc_ucontext() to set its response fields in a
cleaner way.

It includes,
- Move the relevant code to a self contained function.
- Calculate the response length once and drop redundant code all around.
- Reuse previously set ucontext fields once preparing the response.

The self contained function will be used in next patch as part of
implementing the query ucontext functionality.

Link: https://lore.kernel.org/r/20200630093916.332097-5-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-06 19:50:33 -03:00
Leon Romanovsky
f4375443b7 RDMA/mlx5: Get XRCD number directly for the internal use
The mlx5_ib creates XRC domain and uses for creating internal SRQ.
However all that is needed is XRCD number and not full blown ib_xrcd
objects.

Update the code to get and store the number only.

Link: https://lore.kernel.org/r/20200706122716.647338-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-06 19:32:23 -03:00
Gal Pressman
42a3b15396 RDMA: Remove the udata parameter from alloc_mr callback
Allocating an MR flow can only be initiated by kernel users, and not from
userspace so a udata parameter is redundant.

Link: https://lore.kernel.org/r/20200706120343.10816-4-galpress@amazon.com
Signed-off-by: Gal Pressman <galpress@amazon.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-06 19:25:53 -03:00
Maor Gottlieb
d473f4dc2f RDMA/mlx5: Introduce ODP prefetch counter
For debugging purpose it will be easier to understand if prefetch works
okay if it has its own counter. Introduce ODP prefetch counter and count
per MR the total number of prefetched pages.

In addition remove comment which is not relevant anymore and anyway not in
the correct place.

Link: https://lore.kernel.org/r/20200621104147.53795-1-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-03 09:16:25 -03:00
Leon Romanovsky
f81b4565c1 RDMA/mlx5: Fix legacy IPoIB QP initialization
Legacy IPoIB sets IB_QP_CREATE_NETIF_QP QP create flag and because mlx5
doesn't use this flag, the process_create_flags() failed to create IPoIB
QPs.

Fixes: 2978975ce7 ("RDMA/mlx5: Process create QP flags in one place")
Link: https://lore.kernel.org/r/20200630122147.445847-1-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-02 11:17:10 -03:00
Maor Gottlieb
28b5fa687f RDMA/mlx5: Add support to get MR resource in RAW format
Add support to get MR (mkey) resource dump in RAW format.

Link: https://lore.kernel.org/r/20200623113043.1228482-12-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-06-24 08:52:29 -03:00
Maor Gottlieb
1ccecc88af RDMA/mlx5: Add support to get CQ resource in RAW format
Add support to get CQ resource dump in RAW format.

Link: https://lore.kernel.org/r/20200623113043.1228482-11-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-06-24 08:52:29 -03:00
Maor Gottlieb
1776dd234a RDMA/mlx5: Add support to get QP resource in RAW format
Add a generic function to use the resource dump mechanism to get the
QP resource data.

Link: https://lore.kernel.org/r/20200623113043.1228482-10-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-06-24 08:52:29 -03:00
Maor Gottlieb
f443452900 RDMA: Add dedicated MR resource tracker function
In order to avoid double multiplexing of the resource when it is a MR, add
a dedicated callback function.

Link: https://lore.kernel.org/r/20200623113043.1228482-5-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-06-23 11:46:27 -03:00
Leon Romanovsky
6eefa839c4 RDMA/mlx5: Protect from kernel crash if XRC_TGT doesn't have udata
Don't deref udata if it is NULL

  BUG: kernel NULL pointer dereference, address: 0000000000000030
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 0 P4D 0
  Oops: 0000   SMP PTI
  CPU: 2 PID: 1592 Comm: python3 Not tainted 5.7.0-rc6+ #1
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
  RIP: 0010:create_qp+0x39e/0xae0 [mlx5_ib]
  Code: c0 0d 00 00 bf 10 01 00 00 e8 be a9 e4 e0 48 85 c0 49 89 c2 0f 84 0c 07 00 00 41 8b 85 74 63 01 00 0f c8 a9 00 00 00 10 74 0a <41> 8b 46 30 0f c8 41 89 42 14 41 8b 52 18 41 0f b6 4a 1c 0f ca 89
  RSP: 0018:ffffc9000067f8b0 EFLAGS: 00010206
  RAX: 0000000010170000 RBX: ffff888441313000 RCX: 0000000000000000
  RDX: 0000000000000200 RSI: 0000000000000000 RDI: ffff88845b1d4400
  RBP: ffffc9000067fa60 R08: 0000000000000200 R09: ffff88845b1d4200
  R10: ffff88845b1d4200 R11: ffff888441313000 R12: ffffc9000067f950
  R13: ffff88846ac00140 R14: 0000000000000000 R15: ffff88846c2bc000
  FS:  00007faa1a3c0540(0000) GS:ffff88846fd00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000030 CR3: 0000000446dca003 CR4: 0000000000760ea0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  PKRU: 55555554
  Call Trace:
   ? __switch_to_asm+0x40/0x70
   ? __switch_to_asm+0x34/0x70
   mlx5_ib_create_qp+0x897/0xfa0 [mlx5_ib]
   ib_create_qp+0x9e/0x300 [ib_core]
   create_qp+0x92d/0xb20 [ib_uverbs]
   ? ib_uverbs_cq_event_handler+0x30/0x30 [ib_uverbs]
   ? release_resource+0x30/0x30
   ib_uverbs_create_qp+0xc4/0xe0 [ib_uverbs]
   ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0xc8/0xf0 [ib_uverbs]
   ib_uverbs_run_method+0x223/0x770 [ib_uverbs]
   ? track_pfn_remap+0xa7/0x100
   ? uverbs_disassociate_api+0xd0/0xd0 [ib_uverbs]
   ? remap_pfn_range+0x358/0x490
   ib_uverbs_cmd_verbs.isra.6+0x19b/0x370 [ib_uverbs]
   ? rdma_umap_priv_init+0x82/0xe0 [ib_core]
   ? vm_mmap_pgoff+0xec/0x120
   ib_uverbs_ioctl+0xc0/0x120 [ib_uverbs]
   ksys_ioctl+0x92/0xb0
   __x64_sys_ioctl+0x16/0x20
   do_syscall_64+0x48/0x130
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: e383085c24 ("RDMA/mlx5: Set ECE options during QP create")
Link: https://lore.kernel.org/r/20200621115959.60126-1-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-22 14:40:53 -03:00
Max Gurtovoy
9e0dc7b9e1 RDMA/mlx5: Fix integrity enabled QP creation
create_flags checks was refactored and broke the creation on integrity
enabled QPs and actually broke the NVMe/RDMA and iSER ULP's when using
mlx5 driven devices.

Fixes: 2978975ce7 ("RDMA/mlx5: Process create QP flags in one place")
Link: https://lore.kernel.org/r/20200617130230.2846915-1-leon@kernel.org
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-18 15:14:57 -03:00
Leon Romanovsky
2c0f5292d5 RDMA/mlx5: Remove ECE limitation from the RAW_PACKET QPs
Like any other QP type, rely on FW for the RAW_PACKET QPs to decide if ECE
is supported or not. This fixes an inability to create RAW_PACKET QPs with
latest rdma-core with the ECE support.

Fixes: e383085c24 ("RDMA/mlx5: Set ECE options during QP create")
Link: https://lore.kernel.org/r/20200618112507.3453496-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-18 14:59:12 -03:00
Maor Gottlieb
d44335572f RDMA/mlx5: Fix remote gid value in query QP
Remote gid is not copied to the right address. Fix it by using
rdma_ah_set_dgid_raw to copy the remote gid value from the QP context on
query QP.

Fixes: 70bd7fb876 ("RDMA/mlx5: Remove manually crafted QP context the query call")
Link: https://lore.kernel.org/r/20200618112507.3453496-3-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-18 14:42:27 -03:00
Leon Romanovsky
6c41965d64 RDMA/mlx5: Don't access ib_qp fields in internal destroy QP path
destroy_qp_common is called for flows where QP is already created by
HW. While it is called from IB/core, the ibqp.* fields will be fully
initialized, but it is not the case if this function is called during QP
creation.

Don't rely on ibqp fields as much as possible and initialize
send_cq/recv_cq as temporal solution till all drivers will be converted to
IB/core QP allocation scheme.

refcount_t: underflow; use-after-free.
WARNING: CPU: 1 PID: 5372 at lib/refcount.c:28 refcount_warn_saturate+0xfe/0x1a0
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 5372 Comm: syz-executor.2 Not tainted 5.5.0-rc5 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
Call Trace:
 mlx5_core_put_rsc+0x70/0x80
 destroy_resource_common+0x8e/0xb0
 mlx5_core_destroy_qp+0xaf/0x1d0
 mlx5_ib_destroy_qp+0xeb0/0x1460
 ib_destroy_qp_user+0x2d5/0x7d0
 create_qp+0xed3/0x2130
 ib_uverbs_create_qp+0x13e/0x190
 ? ib_uverbs_ex_create_qp
 ib_uverbs_write+0xaa5/0xdf0
 __vfs_write+0x7c/0x100
 ksys_write+0xc8/0x200
 do_syscall_64+0x9c/0x390
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 08d5397660 ("RDMA/mlx5: Copy response to the user in one place")
Link: https://lore.kernel.org/r/20200617130148.2846643-1-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-18 14:26:04 -03:00
Leon Romanovsky
ab183d460d RDMA/mlx5: Add missed RST2INIT and INIT2INIT steps during ECE handshake
Missed steps during ECE handshake left userspace application with less
options for the ECE handshake. Pass ECE options in the additional
transitions.

Fixes: 50aec2c313 ("RDMA/mlx5: Return ECE data after modify QP")
Link: https://lore.kernel.org/r/20200616104536.2426384-1-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-18 09:52:29 -03:00
Tom Seewald
4f5747cf8e RDMA/mlx5: Fix -Wformat warning in check_ucmd_data()
Variables of type size_t should use %zu rather than %lu [1]. The variables
"inlen", "ucmd", "last", and "size" are all size_t, so use the correct
format specifiers.

[1] https://www.kernel.org/doc/html/latest/core-api/printk-formats.html

Fixes: e383085c24 ("RDMA/mlx5: Set ECE options during QP create")
Link: https://lore.kernel.org/r/20200605023012.9527-1-tseewald@gmail.com
Signed-off-by: Tom Seewald <tseewald@gmail.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-15 15:39:36 -03:00
Colin Ian King
2ef5612391 RDMA/mlx5: Remove duplicated assignment to resp.response_length
The assignment to resp.response_length is never read since it is being
updated again on the next statement. The assignment is redundant so
removed it.

Fixes: a645a89d9a ("RDMA/mlx5: Return ECE DC support")
Link: https://lore.kernel.org/r/20200604143902.56021-1-colin.king@canonical.com
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-15 10:48:38 -03:00
Masahiro Yamada
a7f7f6248d treewide: replace '---help---' in Kconfig files with 'help'
Since commit 84af7a6194 ("checkpatch: kconfig: prefer 'help' over
'---help---'"), the number of '---help---' has been gradually
decreasing, but there are still more than 2400 instances.

This commit finishes the conversion. While I touched the lines,
I also fixed the indentation.

There are a variety of indentation styles found.

  a) 4 spaces + '---help---'
  b) 7 spaces + '---help---'
  c) 8 spaces + '---help---'
  d) 1 space + 1 tab + '---help---'
  e) 1 tab + '---help---'    (correct indentation)
  f) 1 tab + 1 space + '---help---'
  g) 1 tab + 2 spaces + '---help---'

In order to convert all of them to 1 tab + 'help', I ran the
following commend:

  $ find . -name 'Kconfig*' | xargs sed -i 's/^[[:space:]]*---help---/\thelp/'

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-06-14 01:57:21 +09:00
Linus Torvalds
242b233198 RDMA 5.8 merge window pull request
A few large, long discussed works this time. The RNBD block driver has
 been posted for nearly two years now, and the removal of FMR has been a
 recurring discussion theme for a long time. The usual smattering of
 features and bug fixes.
 
 - Various small driver bugs fixes in rxe, mlx5, hfi1, and efa
 
 - Continuing driver cleanups in bnxt_re, hns
 
 - Big cleanup of mlx5 QP creation flows
 
 - More consistent use of src port and flow label when LAG is used and a
   mlx5 implementation
 
 - Additional set of cleanups for IB CM
 
 - 'RNBD' network block driver and target. This is a network block RDMA
   device specific to ionos's cloud environment. It brings strong multipath
   and resiliency capabilities.
 
 - Accelerated IPoIB for HFI1
 
 - QP/WQ/SRQ ioctl migration for uverbs, and support for multiple async fds
 
 - Support for exchanging the new IBTA defiend ECE data during RDMA CM
   exchanges
 
 - Removal of the very old and insecure FMR interface from all ULPs and
   drivers. FRWR should be preferred for at least a decade now.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl7X/IwACgkQOG33FX4g
 mxp2uw/+MI2S/aXqEBvZfTT8yrkAwqYezS0VeTDnwH/T6UlTMDhHVN/2Ji3tbbX3
 FEKT1i2mnAL5RqUAL1lr9g4sG/bVozrpN46Ws5Lu9dTbIPLKTNPWDuLFQDUShKY7
 OyMI/bRx6anGnsOy20iiBqnrQbrrZj5TECgnmrkAl62QFdcl7aBWe/yYjy4CT11N
 ub+aBXBREN1F1pc0HIjd2tI+8gnZc+mNm1LVVDRH9Capun/pI26qDNh7e6QwGyIo
 n8ItraC8znLwv/nsUoTE7/JRcsTEe6vJI26PQmczZfNJs/4O65G7fZg0eSBseZYi
 qKf7Uwtb3qW0R7jRUMEgFY4DKXVAA0G2ph40HXBuzOSsqlT6HqYMO2wgG8pJkrTc
 qAjoSJGzfAHIsjxzxKI8wKuufCddjCm30VWWU7EKeriI6h1J0uPVqKkQMfYBTkik
 696eZSBycAVgwayOng3XaehiTxOL7qGMTjUpDjUR6UscbiPG919vP+QsbIUuBXdb
 YoddBQJdyGJiaCXv32ciJjo9bjPRRi/bII7Q5qzCNI2mi4ZVbudF4ffzyQvdHtNJ
 nGnpRXoPi7kMvUrKTMPWkFjj0R5/UsPszsA51zbxPydfgBe0Dlc2PrrIG8dlzYAp
 wbV0Lec+iJucKlt7EZtrjz1xOiOOaQt/5/cW1bWqL+wk2t6gAuY=
 =9zTe
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "A more active cycle than most of the recent past, with a few large,
  long discussed works this time.

  The RNBD block driver has been posted for nearly two years now, and
  flowing through RDMA due to it also introducing a new ULP.

  The removal of FMR has been a recurring discussion theme for a long
  time.

  And the usual smattering of features and bug fixes.

  Summary:

   - Various small driver bugs fixes in rxe, mlx5, hfi1, and efa

   - Continuing driver cleanups in bnxt_re, hns

   - Big cleanup of mlx5 QP creation flows

   - More consistent use of src port and flow label when LAG is used and
     a mlx5 implementation

   - Additional set of cleanups for IB CM

   - 'RNBD' network block driver and target. This is a network block
     RDMA device specific to ionos's cloud environment. It brings strong
     multipath and resiliency capabilities.

   - Accelerated IPoIB for HFI1

   - QP/WQ/SRQ ioctl migration for uverbs, and support for multiple
     async fds

   - Support for exchanging the new IBTA defiend ECE data during RDMA CM
     exchanges

   - Removal of the very old and insecure FMR interface from all ULPs
     and drivers. FRWR should be preferred for at least a decade now"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (247 commits)
  RDMA/cm: Spurious WARNING triggered in cm_destroy_id()
  RDMA/mlx5: Return ECE DC support
  RDMA/mlx5: Don't rely on FW to set zeros in ECE response
  RDMA/mlx5: Return an error if copy_to_user fails
  IB/hfi1: Use free_netdev() in hfi1_netdev_free()
  RDMA/hns: Uninitialized variable in modify_qp_init_to_rtr()
  RDMA/core: Move and rename trace_cm_id_create()
  IB/hfi1: Fix hfi1_netdev_rx_init() error handling
  RDMA: Remove 'max_map_per_fmr'
  RDMA: Remove 'max_fmr'
  RDMA/core: Remove FMR device ops
  RDMA/rdmavt: Remove FMR memory registration
  RDMA/mthca: Remove FMR support for memory registration
  RDMA/mlx4: Remove FMR support for memory registration
  RDMA/i40iw: Remove FMR leftovers
  RDMA/bnxt_re: Remove FMR leftovers
  RDMA/mlx5: Remove FMR leftovers
  RDMA/core: Remove FMR pool API
  RDMA/rds: Remove FMR support for memory registration
  RDMA/srp: Remove support for FMR memory registration
  ...
2020-06-05 14:05:57 -07:00
Leon Romanovsky
a645a89d9a RDMA/mlx5: Return ECE DC support
The DC QPs are many-to-one QP types that means that first connection will
establish ECE options that coming connections should follow.  Due to this
property, the ECE code was removed between first [1] and second [2] ECE
submissions.

This patch returns the dropped code, because ECE is a property of a
connection and like any other connection users are needed to manage this
data. Allow them to set ECE parameter for DC too and avoid need of having
compatibility flag for the DC ECE.

[1]
https://lore.kernel.org/linux-rdma/20200523132243.817936-1-leon@kernel.org/
[2]
https://lore.kernel.org/linux-rdma/20200525174401.71152-1-leon@kernel.org/

Link: https://lore.kernel.org/r/20200602125548.172654-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-03 15:45:02 -03:00
Leon Romanovsky
92cd667c0e RDMA/mlx5: Don't rely on FW to set zeros in ECE response
The FW returns zeros in case feature is not enabled, but it is better to
have the capability check and ensure that returned result is cleared.

Fixes: 3e09a427ae ("RDMA/mlx5: Get ECE options from FW during create QP")
Link: https://lore.kernel.org/r/20200602125548.172654-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-03 15:45:02 -03:00
Leon Romanovsky
6512f11d38 RDMA/mlx5: Return an error if copy_to_user fails
In theoretical event, the ib_copy_to_udata() can fail, so return -EFAULT
error to the user, so he will destroy the QP.

Fixes: 50aec2c313 ("RDMA/mlx5: Return ECE data after modify QP")
Link: https://lore.kernel.org/r/20200602125548.172654-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-03 15:45:01 -03:00
Jason Gunthorpe
4d12c04caa RDMA: Remove 'max_map_per_fmr'
Now that FMR support is gone, this attribute can be deleted from all
places.

Link: https://lore.kernel.org/r/13-v3-f58e6669d5d3+2cf-fmr_removal_jgg@mellanox.com
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-02 20:32:54 -03:00
Gal Pressman
d29d58e772 RDMA/mlx5: Remove FMR leftovers
Remove a few leftovers from FMR functionality which are no longer used.

Link: https://lore.kernel.org/r/5-v3-f58e6669d5d3+2cf-fmr_removal_jgg@mellanox.com
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-02 20:32:53 -03:00
David S. Miller
1806c13dc2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
xdp_umem.c had overlapping changes between the 64-bit math fix
for the calculation of npgs and the removal of the zerocopy
memory type which got rid of the chunk_size_nohdr member.

The mlx5 Kconfig conflict is a case where we just take the
net-next copy of the Kconfig entry dependency as it takes on
the ESWITCH dependency by one level of indirection which is
what the 'net' conflicting change is trying to ensure.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-31 17:48:46 -07:00
Saeed Mahameed
971ae1ed03 Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
net/mlx5: Add ability to read and write ECE options
  net/mlx5: Add support for RDMA TX FT headers modifying
  net/mlx5: Move iseg access helper routines close to mlx5_core driver
  net/mlx5: Cleanup mlx5_ifc_fte_match_set_misc2_bits
  net/mlx5: Add support in forward to namespace
  {IB/net}/mlx5: Simplify don't trap code
  net/mlx5: Replace zero-length array with flexible-array

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-29 14:38:57 -07:00
Mark Zhang
802dcc7fc5 RDMA/mlx5: Support TX port affinity for VF drivers in LAG mode
The mlx5 VF driver doesn't set QP tx port affinity because it doesn't know
if the lag is active or not, since the "lag_active" works only for PF
interfaces. In this case for VF interfaces only one lag is used which
brings performance issue.

Add a lag_tx_port_affinity CAP bit; When it is enabled and
"num_lag_ports > 1", then driver always set QP tx affinity, regardless
of lag state.

Link: https://lore.kernel.org/r/20200527055014.355093-1-leon@kernel.org
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-29 15:15:30 -03:00
Leon Romanovsky
50aec2c313 RDMA/mlx5: Return ECE data after modify QP
After users sets the ECE option, FW will return the agreed/supported bits
through an output structures of modify QP stages for regular QPs or
through create QP for the DCT.

Link: https://lore.kernel.org/r/20200526115440.205922-9-leon@kernel.org
Reviewed-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-27 16:07:49 -03:00
Leon Romanovsky
5f62a521ff RDMA/mlx5: Set ECE options during modify QP
The most common way to set ECE option will be during modify QP command in
INIT2RTR, RTR2RTS and RTS2RTS stages, so update mlx5 to support it.

The new bit in the comp_mask is needed to mark that kernel supports ECE
and can receive data instead of "reserved" field in the struct
mlx5_ib_modify_qp.

Link: https://lore.kernel.org/r/20200526115440.205922-8-leon@kernel.org
Reviewed-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-27 16:07:49 -03:00
Leon Romanovsky
f18e26af6a RDMA/mlx5: Convert modify QP to use MLX5_SET macros
Instead of hand crafted mlx5_qp_context and mlx5_qp_path use common
MLX5_SET() macros.

Link: https://lore.kernel.org/r/20200526115440.205922-7-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-27 16:07:49 -03:00
Leon Romanovsky
70bd7fb876 RDMA/mlx5: Remove manually crafted QP context the query call
As a preparation to removal hand crafted mlx5_qp_context, convert
query_qp_attr() to use proper MLX5_GET() macros.

Link: https://lore.kernel.org/r/20200526115440.205922-6-leon@kernel.org
Reviewed-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-27 16:07:49 -03:00
Leon Romanovsky
64bae2d455 RDMA/mlx5: Use direct modify QP implementation
As a preparation to removal hand crafted mlx5_qp_context, convert counter
code to use mlx5_cmd_exec_in() directly.

Link: https://lore.kernel.org/r/20200526115440.205922-5-leon@kernel.org
Reviewed-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-27 16:07:49 -03:00
Leon Romanovsky
e383085c24 RDMA/mlx5: Set ECE options during QP create
Allow users to ask creation of QPs with specific ECE options.  Such early
set even before RDMA-CM connection is established is useful if user knows
exactly which option he needs.

Link: https://lore.kernel.org/r/20200526115440.205922-4-leon@kernel.org
Reviewed-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-27 16:07:49 -03:00
Leon Romanovsky
3e09a427ae RDMA/mlx5: Get ECE options from FW during create QP
Supported ECE options are returned from FW in the create_qp phase and zero
means that field is not valid. Such default value allows us to reuse
reserved field without worries about comp_mask.

Update create QP API to return ECE options.

Link: https://lore.kernel.org/r/20200526115440.205922-3-leon@kernel.org
Reviewed-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-27 16:07:49 -03:00
Jason Gunthorpe
e4fdf7625b Merge branch 'mellanox/mlx5-next' into rdma.git for/next
From the mlx5-next branch at
  git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

Required for dependencies in following patches

* branch 'mellanox/mlx5-next':
  net/mlx5: Add ability to read and write ECE options
  net/mlx5: Add support for RDMA TX FT headers modifying
  net/mlx5: Move iseg access helper routines close to mlx5_core driver
  net/mlx5: Cleanup mlx5_ifc_fte_match_set_misc2_bits

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-27 16:01:17 -03:00
Mark Zhang
d246a30615 IB/mlx5: Fix DEVX support for MLX5_CMD_OP_INIT2INIT_QP command
The commit citied in the Fixes line wasn't complete and solved
only part of the problems. Update the mlx5_ib to properly support
MLX5_CMD_OP_INIT2INIT_QP command in the DEVX, that is required when
modify the QP tx_port_affinity.

Fixes: 819f7427ba ("RDMA/mlx5: Add init2init as a modify command")
Link: https://lore.kernel.org/r/20200527135703.482501-1-leon@kernel.org
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-27 15:53:21 -03:00
Maor Gottlieb
189277f381 RDMA/mlx5: Fix NULL pointer dereference in destroy_prefetch_work
q_deferred_work isn't initialized when creating an explicit ODP memory
region. This can lead to a NULL pointer dereference when user performs
asynchronous prefetch MR. Fix it by initializing q_deferred_work for
explicit ODP.

  BUG: kernel NULL pointer dereference, address: 0000000000000000
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 0 P4D 0
  Oops: 0000 [#1] SMP PTI
  CPU: 4 PID: 6074 Comm: kworker/u16:6 Not tainted 5.7.0-rc1-for-upstream-perf-2020-04-17_07-03-39-64 #1
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
  Workqueue: events_unbound mlx5_ib_prefetch_mr_work [mlx5_ib]
  RIP: 0010:__wake_up_common+0x49/0x120
  Code: 04 89 54 24 0c 89 4c 24 08 74 0a 41 f6 01 04 0f 85 8e 00 00 00 48 8b 47 08 48 83 e8 18 4c 8d 67 08 48 8d 50 18 49 39 d4 74 66 <48> 8b 70 18 31 db 4c 8d 7e e8 eb 17 49 8b 47 18 48 8d 50 e8 49 8d
  RSP: 0000:ffffc9000097bd88 EFLAGS: 00010082
  RAX: ffffffffffffffe8 RBX: ffff888454cd9f90 RCX: 0000000000000000
  RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff888454cd9f90
  RBP: ffffc9000097bdd0 R08: 0000000000000000 R09: ffffc9000097bdd0
  R10: 0000000000000000 R11: 0000000000000001 R12: ffff888454cd9f98
  R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000003
  FS:  0000000000000000(0000) GS:ffff88846fd00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000000 CR3: 000000044c19e002 CR4: 0000000000760ee0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  PKRU: 55555554
  Call Trace:
   __wake_up_common_lock+0x7a/0xc0
   destroy_prefetch_work+0x5a/0x60 [mlx5_ib]
   mlx5_ib_prefetch_mr_work+0x64/0x80 [mlx5_ib]
   process_one_work+0x15b/0x360
   worker_thread+0x49/0x3d0
   kthread+0xf5/0x130
   ? rescuer_thread+0x310/0x310
   ? kthread_bind+0x10/0x10
   ret_from_fork+0x1f/0x30

Fixes: de5ed007a0 ("IB/mlx5: Fix implicit ODP race")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200521072504.567406-1-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-21 20:51:50 -03:00
Jason Gunthorpe
0ac8903cbb RDMA/core: Allow the ioctl layer to abort a fully created uobject
While creating a uobject every create reaches a point where the uobject is
fully initialized. For ioctls that go on to copy_to_user this means they
need to open code the destruction of a fully created uobject - ie the
RDMA_REMOVE_DESTROY sort of flow.

Open coding this creates bugs, eg the CQ does not properly flush the
events list when it does its error unwind.

Provide a uverbs_finalize_uobj_create() function which indicates that the
uobject is fully initialized and that abort should call to destroy_hw to
destroy the uobj->object and related.

Methods can call this function if they go on to have error cases after
setting uobj->object. Once done those error cases can simply do return,
without an error unwind.

Link: https://lore.kernel.org/r/20200519072711.257271-2-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-21 20:10:46 -03:00
Jason Gunthorpe
eafd47fc20 Linux 5.7-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl7BzV8eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGg8EH/A2pXMTxtc96RI4S
 sttEsUQqbakFS0Z/2tQPpMGr/qW2e5eHgsTX/a3SiUeZiIXk6f4lMFkMuctzBf7p
 X77cNEDwGOEdbtCXTsMcmKSde7sP2zCXsPB8xTWLyE6rnaFRgikwwkeqgkIKhp1h
 bvOQV0t9HNGvxGAM0iZeOvQAvFl4vd7nS123/MYbir9cugfQUSJRueQ4BiCiJqVE
 6cNA7/vFzDJuFGszzIrJ7HXn/IdQMMWHkvTDjgBw0GZw1mDbGFbfbZwOeTz1ojCt
 smUQ4tIFxBa/VA5zx7dOy2P2keHbSVf4VLkZRPcceT7OqVS65ETmFDp+qt5NdWM5
 vZ8+7/0=
 =CyYH
 -----END PGP SIGNATURE-----

Merge tag 'v5.7-rc6' into rdma.git for-next

Linux 5.7-rc6

Conflict in drivers/net/ethernet/mellanox/mlx5/core/steering/dr_send.c
resolved by deleting dr_cq_event, matching how netdev resolved it.

Required for dependencies in the following patches.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-21 17:08:27 -03:00
Aharon Landau
819f7427ba RDMA/mlx5: Add init2init as a modify command
Missing INIT2INIT entry in the list of modify commands caused DEVX
applications to be unable to modify_qp for this transition state. Add the
MLX5_CMD_OP_INIT2INIT_QP opcode to the list of allowed DEVX opcodes.

Fixes: e662e14d80 ("IB/mlx5: Add DEVX support for modify and query commands")
Link: https://lore.kernel.org/r/20200513095550.211345-1-leon@kernel.org
Signed-off-by: Aharon Landau <aharonl@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-19 21:02:12 -03:00
Michael Guralnik
ecf814e0e1 net/mlx5: Add support for RDMA TX FT headers modifying
Support adding header modifying actions to the RDMA TX flow table.

Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-18 09:21:46 -07:00
Shay Drory
daeee97690 RDMA/mlx5: Update mlx5_ib driver name
Current description doesn't include new devices, change it by updating to
have more generic description and remove DRIVER_NAME and DRIVER_VERSION
defines.

Link: https://lore.kernel.org/r/20200513095304.210240-1-leon@kernel.org
Signed-off-by: Shay Drory <shayd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-17 20:40:20 -03:00
Leon Romanovsky
59dde4d19c RDMA/mlx5: Fix query_srq_cmd() function
The output buffer used in mlx5_cmd_exec_inout() was wrongly changed from
pre-allocated srq_out pointer to an input "out" point. That leads to
unpredictable results in the get_srqc() call later.

Fixes: 31578defe4 ("RDMA/mlx5: Update mlx5_ib to use new cmd interface")
Link: https://lore.kernel.org/r/20200513100809.246315-1-leon@kernel.org
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-13 16:01:50 -03:00
Daria Velikovsky
f29de9eee7 RDMA/mlx5: Add support for drop action in DV steering
When drop action is used the matching packet will stop processing in
steering and will be dropped. This functionality will allow users to drop
matching packets.

Link: https://lore.kernel.org/r/20200504054227.271486-1-leon@kernel.org
Signed-off-by: Daria Velikovsky <daria@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-13 15:58:54 -03:00
Maor Gottlieb
8c112a5f29 RDMA/mlx5: Add support in steering default miss
User can configure default miss rule in order to skip matching in the user
domain and forward the packet to the kernel steering domain.  When user
requests a default miss rule, we add steering rule to forward the traffic
to the next namespace.

Link: https://lore.kernel.org/r/20200504053012.270689-5-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-13 15:55:41 -03:00
Maor Gottlieb
b9019507aa RDMA/mlx5: Refactor DV create flow
Move part of the code that get the destinations into function so the code
will be more readable.  In addition change the variables definition to be
in reversed christmas tree.

Link: https://lore.kernel.org/r/20200504053012.270689-4-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-13 15:55:40 -03:00
Jason Gunthorpe
10c2615513 Merge branch 'mellanox/mlx5-next' into rdma.git for/next
From the mlx5-next branch at
  git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

Required for dependencies in following patches

* branch 'mellanox/mlx5-next':
  net/mlx5: Add support in forward to namespace
  {IB/net}/mlx5: Simplify don't trap code
  net/mlx5: Replace zero-length array with flexible-array

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-13 15:54:19 -03:00
Maor Gottlieb
14c129e301 {IB/net}/mlx5: Simplify don't trap code
The fs_core already supports creation of rules with multiple
actions/destinations. Refactor fs_core to handle the case
when don't trap rule is created with destination. Adapt the
calling code in the driver.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-05-13 18:56:18 +03:00
Colin Ian King
52c81f47f0 RDMA/mlx5: Remove duplicated assignment to variable rcqe_sz
The variable rcqe_sz is being unnecessarily assigned twice, fix this by
removing one of the duplicates.

Fixes: 8bde2c509e ("RDMA/mlx5: Update all DRIVER QP places to use QP subtype")
Link: https://lore.kernel.org/r/20200507151610.52636-1-colin.king@canonical.com
Addresses-Coverity: ("Evaluation order violation")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-07 21:02:58 -03:00
David S. Miller
3793faad7b Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Conflicts were all overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06 22:10:13 -07:00
Mark Bloch
42caf9cb59 RDMA/mlx5: Allow only raw Ethernet QPs when RoCE isn't enabled
When operating in switchdev mode or using devlink to disable RoCE
only raw Ethernet QPs are allowed to be created.

When in switchdev mode this can lead to passing an invalid port number
as part of the modify qp firmware cmd and will lead to a syndrome
reported back to the user, such as:

 * mlx5_cmd_check:803:(pid 50148): RST2INIT_QP(0x502) op_mod(0x0) failed,
   status bad parameter(0x3), syndrome (0x177405).

Internal UD QP might be used to test for write combining support (even if
externally we report RoCE as disabled) check for that specific flag and
allow is specifically.

Fixes: b5ca15ad7e ("IB/mlx5: Add proper representors support")
Link: https://lore.kernel.org/r/20200506071602.7177-3-leon@kernel.org
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-06 17:52:01 -03:00
Mark Bloch
8d93efb8c5 RDMA/mlx5: Assign profile before calling stages
Assign the profile to the IB device before executing stages. This will
allow to check which profile is being used from within a stage.

Link: https://lore.kernel.org/r/20200506071602.7177-2-leon@kernel.org
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-06 17:52:01 -03:00
Leon Romanovsky
029e88fd1e RDMA/mlx5: Move all WR logic from qp.c to separate file
Split qp.c by removing all WR logic to separate file.

Link: https://lore.kernel.org/r/20200506065513.4668-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-06 17:42:45 -03:00
Max Gurtovoy
6671cde83d RDMA/mlx5: Refactor mlx5_post_send() to improve readability
Add small helpers in order to avoid code duplication and improve code
readability. Decrease the amount of code in the gigantic post_send
function and divide it to readable methods that will help in code
maintenance in the future.

Link: https://lore.kernel.org/r/20200506065513.4668-3-leon@kernel.org
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-06 17:42:45 -03:00
Leon Romanovsky
31578defe4 RDMA/mlx5: Update mlx5_ib to use new cmd interface
Reuse newly introduced mlx5_cmd_exec_in() and mlx5_cmd_exec_inout() to
reduce code duplication in mlx5_ib module.

Link: https://lore.kernel.org/r/20200506065513.4668-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-06 17:42:45 -03:00
Mark Zhang
5ac55dfc6d RDMA/mlx5: Set UDP source port based on the grh.flow_label
Calculate UDP source port based on the grh.flow_label. If grh.flow_label
is not valid, we will use minimal supported UDP source port.

Link: https://lore.kernel.org/r/20200504051935.269708-6-leon@kernel.org
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-06 16:51:44 -03:00
Mark Zhang
2b880b2e5e RDMA/mlx5: Define RoCEv2 udp source port when set path
Calculate and set UDP source port based on the flow label. If flow label
is not defined in GRH then calculate it based on lqpn/rqpn.

Link: https://lore.kernel.org/r/20200504051935.269708-4-leon@kernel.org
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-06 16:51:44 -03:00
Maor Gottlieb
cfc1a89e44 RDMA/mlx5: Set lag tx affinity according to slave
The patch sets the lag tx affinity of the data QPs and the GSI QPs
according to the LAG xmit slave.

For GSI QPs, in case the link layer is Ethenet (RoCE) we create two GSI
QPs, one for each physical port. When the driver selects the GSI QP, it
will consider the port affinity result.  For connected QPs, the driver
sets the affinity of the xmit slave.

The above, ensures that RC QP and it's corresponding GSI QP will transmit
from the same physical port.

Link: https://lore.kernel.org/r/20200430192146.12863-17-maorg@mellanox.com
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-02 20:19:54 -03:00
Maor Gottlieb
5163b2743a RDMA/mlx5: Refactor affinity related code
Move affinity related code in modify qp to function.  It's a preparation
for next patch the extend the affinity calculation to consider the xmit
slave.

Link: https://lore.kernel.org/r/20200430192146.12863-16-maorg@mellanox.com
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-02 20:19:54 -03:00
Maor Gottlieb
fa5d010c56 RDMA: Group create AH arguments in struct
Following patch adds additional argument to the create AH function, so it
make sense to group ah_attr and flags arguments in struct.

Link: https://lore.kernel.org/r/20200430192146.12863-13-maorg@mellanox.com
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Acked-by: Devesh Sharma <devesh.sharma@broadcom.com>
Acked-by: Gal Pressman <galpress@amazon.com>
Acked-by: Weihang Li <liweihang@huawei.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-02 20:19:53 -03:00
Aharon Landau
0eacc574aa RDMA/mlx5: Verify that QP is created with RQ or SQ
RAW packet QP and underlay QP must be created with either
RQ or SQ, check that.

Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Link: https://lore.kernel.org/r/20200427154636.381474-37-leon@kernel.org
Signed-off-by: Aharon Landau <aharonl@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-30 18:45:46 -03:00
Leon Romanovsky
968f0b6f9c RDMA/mlx5: Consolidate into special function all create QP calls
Finish separation to blocks of mlx5_ib_create_qp() functions,
so all internal create QP implementation are located in one place.

Link: https://lore.kernel.org/r/20200427154636.381474-36-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-30 18:45:46 -03:00
Leon Romanovsky
6367da46d3 RDMA/mlx5: Remove redundant destroy QP call
After major refactoring in create QP flow, it is no needed to call
to destroy QP in XRC_TGT flow.

Link: https://lore.kernel.org/r/20200427154636.381474-35-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-30 18:45:46 -03:00
Leon Romanovsky
08d5397660 RDMA/mlx5: Copy response to the user in one place
Update all the places in create QP flows to copy response
to the user in one place.

Link: https://lore.kernel.org/r/20200427154636.381474-34-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-30 18:45:46 -03:00
Leon Romanovsky
6f2cf76e6e RDMA/mlx5: Handle udate outlen checks in one place
Place in one function all udata size checks. This will allow
us move ib_copy_to_udata() in general place and ensure that
it will be performed after call to the FW.

Link: https://lore.kernel.org/r/20200427154636.381474-33-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-30 18:45:45 -03:00
Leon Romanovsky
5d6fffed1c RDMA/mlx5: Promote RSS RAW QP flags check to higher level
Move check that user didn't supplied RSS RAW QP unsupported
command flags to the function that checks all such flags.

Link: https://lore.kernel.org/r/20200427154636.381474-32-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-30 18:45:45 -03:00
Leon Romanovsky
f78d358cec RDMA/mlx5: Group all create QP parameters to simplify in-kernel interfaces
The amount of parameters passed in and out between internal mlx5
create QP functions is too large to easily follow the flow. Change
it by grouping all create QP parameter into one structure.

Link: https://lore.kernel.org/r/20200427154636.381474-31-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-30 18:45:45 -03:00
Leon Romanovsky
747c519cdb RDMA/mlx5: Reduce amount of duplication in QP destroy
Delete both PD argument and checks if udata was provided, in favour
of unified destroy QP functions.

Link: https://lore.kernel.org/r/20200427154636.381474-30-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-30 18:45:45 -03:00
Leon Romanovsky
98fc1126c4 RDMA/mlx5: Separate to user/kernel create QP flows
The kernel and user create QP flows have very little common code,
separate them to simplify the future work of creating per-type
create_*_qp() functions.

Link: https://lore.kernel.org/r/20200427154636.381474-29-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-30 18:45:44 -03:00
Leon Romanovsky
04bcc1c2d0 RDMA/mlx5: Separate XRC_TGT QP creation from common flow
XRC_TGT QP doesn't fail into kernel or user flow separation. It is
initiated by the user, but is created through in-kernel verbs flow
and doesn't have PD and udata in similar way to kernel QPs.

So let's separate creation of that QP type from the common flow.

Link: https://lore.kernel.org/r/20200427154636.381474-28-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-30 18:45:44 -03:00
Leon Romanovsky
21aad80b17 RDMA/mlx5: Globally parse DEVX UID
Remove duplication in parsing of DEVX UID.

Link: https://lore.kernel.org/r/20200427154636.381474-27-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-30 18:45:44 -03:00
Leon Romanovsky
0ce300b15a RDMA/mlx5: Delete impossible inlen check
The inlen is set to be above zero in all flows before
and can't be negative at this stage.

Link: https://lore.kernel.org/r/20200427154636.381474-26-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-30 18:45:44 -03:00
Leon Romanovsky
03c4077b28 RDMA/mlx5: Rely on existence of udata to separate kernel/user flows
Instead of keeping special field to separate kernel/user create/destroy
flows, rely on existence of udata pointer. All allocation flows are
using kzalloc() and leave uninitialized pointers as NULL which makes
MLX5_QP_EMPTY and MLX5_QP_KERNEL flows to be the same.

Link: https://lore.kernel.org/r/20200427154636.381474-25-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-30 18:45:43 -03:00
Leon Romanovsky
76883a6cc1 RDMA/mlx5: Remove second user copy in create_user_qp
Combine copy_from_user() from create_user_qp() and general code.

Link: https://lore.kernel.org/r/20200427154636.381474-24-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-30 18:45:43 -03:00
Leon Romanovsky
5ce0592b0e RDMA/mlx5: Combine copy of create QP command in RSS RAW QP
Change the create QP flow to handle all copy_from_user() operations in
one place.

Link: https://lore.kernel.org/r/20200427154636.381474-23-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-30 18:45:43 -03:00
Leon Romanovsky
266424eba6 RDMA/mlx5: Promote RSS RAW QP attribute check in higher level
Perform check of attributes of RAW PACKET QP in separate function.

Link: https://lore.kernel.org/r/20200427154636.381474-22-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-30 18:45:43 -03:00
Leon Romanovsky
7aede1a25f RDMA/mlx5: Store QP type in the vendor QP structure
QP type is stored in the IB/core QP struct, but it doesn't have all the
needed information, like internal QP type used in the driver itself.
Update mlx5_ib to have cached QP type which includes both IBTA and
Mellanox specific one.

Such change allows us to make even further cleanup of QP creation flow.

Link: https://lore.kernel.org/r/20200427154636.381474-21-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-30 18:45:42 -03:00
Leon Romanovsky
3ae7e66a01 RDMA/mlx5: Delete unsupported QP types
There is no need to explicitly check unsupported QP types,
rely on  "default" keyword in switch-case to catch them.

Link: https://lore.kernel.org/r/20200427154636.381474-20-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-30 18:45:42 -03:00
Jason Gunthorpe
dfb25edd97 Merge branch 'mlx5_ib_qp_refactor_1' into rdma.git for-next
Leon Romanovsky says:

====================
This is first part of series which tries to return some sanity to
mlx5_ib_create_qp() function. Such refactoring is required to make
extension of that function with less worries of breaking driver.

Extra goal of such refactoring is to ensure that QP is allocated at the
beginning of function and released at the end. It will allow us to move QP
allocation to be under IB/core responsibility.
====================

Based on the mlx5-next branch at
 git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Due to dependencies

* branch 'mlx5_ib_qp_refactor_1': (66 commits)
  RDMA/mlx5: Process all vendor flags in one place
  RDMA/mlx5: Return all configured create flags through query QP
  RDMA/mlx5: Change scatter CQE flag to be set like other vendor flags
  RDMA/mlx5: Use flags_en mechanism to mark QP created with WQE signature
  RDMA/mlx5: Process create QP flags in one place
  RDMA/mlx5: Delete create QP flags obfuscation
  RDMA/mlx5: Initial separation of RAW_PACKET QP from common flow
  RDMA/mlx5: Remove second copy from user for non RSS RAW QPs
  RDMA/mlx5: Move DRIVER QP flags check into separate function
  RDMA/mlx5: Update all DRIVER QP places to use QP subtype
  RDMA/mlx5: Split scatter CQE configuration for DCT QP
  RDMA/mlx5: Separate create QP flows to be based on type
  RDMA/mlx5: Set QP subtype immediately when it is known
  RDMA/mlx5: Avoid setting redundant NULL for XRC QPs
  RDMA/mlx5: Prepare QP allocation for future removal
  RDMA/mlx5: Perform check if QP creation flow is valid
  RDMA/mlx5: Delete impossible GSI port check
  RDMA/mlx5: Organize QP types checks in one place

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-28 21:44:51 -03:00
Leon Romanovsky
37518fa49f RDMA/mlx5: Process all vendor flags in one place
Check that vendor flags provided through ucmd are valid.

Link: https://lore.kernel.org/r/20200427154636.381474-19-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-28 20:42:24 -03:00
Leon Romanovsky
a8f3ea61e1 RDMA/mlx5: Return all configured create flags through query QP
The "flags" field in struct mlx5_ib_qp contains all UAPI flags
configured at the create QP stage. Return all the data as is
without masking.

Link: https://lore.kernel.org/r/20200427154636.381474-18-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-28 20:42:24 -03:00
Leon Romanovsky
90ecb37a75 RDMA/mlx5: Change scatter CQE flag to be set like other vendor flags
In similar way to wqe_sig, the scat_cqe was treated differently from
other create QP vendor flags. Change it to be similar to other flags
and use flags_en mechanism.

Link: https://lore.kernel.org/r/20200427154636.381474-17-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-28 20:42:23 -03:00
Leon Romanovsky
c95e6d5397 RDMA/mlx5: Use flags_en mechanism to mark QP created with WQE signature
MLX5_QP_FLAG_SIGNATURE is exposed to the users but in the kernel
the create_qp flow treated it differently from other MLX5_QP_FLAG_*s.
Fix it by ditching wq_sig boolean variable and use general flag_en
mechanism.

Link: https://lore.kernel.org/r/20200427154636.381474-16-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-28 20:42:23 -03:00
Leon Romanovsky
2978975ce7 RDMA/mlx5: Process create QP flags in one place
create_flags is checked in too many places and scattered across all
the code, consolidate all the checks inside one function, so we will
be easily see the flow. As part of such change, delete unreachable code,
because IB/core is responsible sanitize the input.

Link: https://lore.kernel.org/r/20200427154636.381474-15-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-28 20:42:23 -03:00
Leon Romanovsky
2be08c308f RDMA/mlx5: Delete create QP flags obfuscation
There is no point in redefinition of stable and exposed to users create
flags. Their values won't be changed and it is equal to used by the
mlx5. Delete the mlx5 definitions and use IB/core fields.

Link: https://lore.kernel.org/r/20200427154636.381474-14-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-28 20:42:23 -03:00
Leon Romanovsky
5d0dc3d96c RDMA/mlx5: Initial separation of RAW_PACKET QP from common flow
Create initial function for IB_QPT_RAW_PACKET flow.

Link: https://lore.kernel.org/r/20200427154636.381474-13-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-28 20:42:22 -03:00
Leon Romanovsky
2dfac92dbb RDMA/mlx5: Remove second copy from user for non RSS RAW QPs
Change the common code to use already copied user command buffer.

Link: https://lore.kernel.org/r/20200427154636.381474-12-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-28 20:42:22 -03:00
Leon Romanovsky
2fdddbd5c9 RDMA/mlx5: Move DRIVER QP flags check into separate function
Perform validation of DRIVER QP in relevant function.

Link: https://lore.kernel.org/r/20200427154636.381474-11-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-28 20:42:22 -03:00
Leon Romanovsky
8bde2c509e RDMA/mlx5: Update all DRIVER QP places to use QP subtype
Instead of overwriting QP init attributes with driver QP subtype,
use that subtype directly. This change will allow us to remove
logic which cached QP init attributes.

Link: https://lore.kernel.org/r/20200427154636.381474-10-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-28 20:42:22 -03:00
Leon Romanovsky
fd9dab7edc RDMA/mlx5: Split scatter CQE configuration for DCT QP
DCT QPs have separate creation flow and can be easily extracted
from configure_responder_scat_cqe(), this makes both updated
functions more clear.

Link: https://lore.kernel.org/r/20200427154636.381474-9-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-28 20:42:22 -03:00
Leon Romanovsky
47c806121a RDMA/mlx5: Separate create QP flows to be based on type
Move driver QP creation flow to separate functions to simplify
the create_qp() and allow future separation of create_qp_common()
to subtypes.

Link: https://lore.kernel.org/r/20200427154636.381474-8-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-28 20:42:21 -03:00
Leon Romanovsky
318d2b06fb RDMA/mlx5: Set QP subtype immediately when it is known
There is no need to delay QP subtype assignment to the end of the
create_qp() function and it is better to move it to be immediately
after it is checked so we would be able to rewrite later checks
to be based on it and not on over-written struct ib_qp_init_attr.

Link: https://lore.kernel.org/r/20200427154636.381474-7-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-28 20:42:21 -03:00
Leon Romanovsky
c86936e6eb RDMA/mlx5: Avoid setting redundant NULL for XRC QPs
There is no need to set NULL in recv_cq and send_cq, they are already
set to NULL by the IB/core logic.

Link: https://lore.kernel.org/r/20200427154636.381474-6-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-28 20:42:21 -03:00
Leon Romanovsky
9c2ba4ede4 RDMA/mlx5: Prepare QP allocation for future removal
Unify the QP memory allocation across different paths,
so it will be in one place.

Link: https://lore.kernel.org/r/20200427154636.381474-5-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-28 20:42:21 -03:00
Leon Romanovsky
2242cc25ce RDMA/mlx5: Perform check if QP creation flow is valid
Fast check that kernel and user flows provides enough
data to create QP.

Link: https://lore.kernel.org/r/20200427154636.381474-4-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-28 20:42:20 -03:00
Leon Romanovsky
1265d9f7a5 RDMA/mlx5: Delete impossible GSI port check
GSI QP is created in the kernel with very strict parameters,
there is no possible way that port number will be wrong in
such flow.

Link: https://lore.kernel.org/r/20200427154636.381474-3-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-28 20:42:20 -03:00
Leon Romanovsky
6eb7edffb2 RDMA/mlx5: Organize QP types checks in one place
Perform check if QP type is supported in one place at the beginning of
the create_qp function instead of current implementation with checks
buried inside of the code.

Link: https://lore.kernel.org/r/20200427154636.381474-2-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-28 20:42:20 -03:00
Raed Salem
244faedfd4 net/mlx5: Refactor imm_inval_pkey field in cqe struct
The imm_inval_pkey field can hold four different types of data,
depends on the usage, the data could be one of the below:
- Immediate field of the received message
- Invalidate rkey
- Pkey of the packet
- Flow table metadata

Current implementation doesn't reflect the intended usage of the
field at usage time.

Reflect the different types by replace this field with a union,
modify code where this field is used to reflect its intended
usage.

Signed-off-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-28 12:45:15 -07:00
Erez Shitrit
dff8e2d152 net/mlx5: Use aligned variable while allocating ICM memory
The alignment value is part of the input structure, so use it and spare
extra memory allocation when is not needed.
Now, using the new ability when allocating icm for Direct-Rule
insertion.
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-28 12:45:10 -07:00
Huy Nguyen
d65dbedfd2 net/mlx5: Add support for COPY steering action
Add COPY type to modify_header action. IPsec feature is the first
feature that needs COPY steering action.

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
2020-04-28 12:44:44 -07:00
Leon Romanovsky
e0b4b4722d net/mlx5: Update transobj.c new cmd interface
Do mass update of transobj.c to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-04-23 21:42:16 +03:00
Leon Romanovsky
5d1c9a114a net/mlx5: Update vport.c to new cmd interface
Do mass update of vport.c to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-04-23 21:42:02 +03:00
Aharon Landau
2d7e3ff7b6 RDMA/mlx5: Set GRH fields in query QP on RoCE
GRH fields such as sgid_index, hop limit, et. are set in the QP context
when QP is created/modified.

Currently, when query QP is performed, we fill the GRH fields only if the
GRH bit is set in the QP context, but this bit is not set for RoCE. Adjust
the check so we will set all relevant data for the RoCE too.

Since this data is returned to userspace, the below is an ABI regression.

Fixes: d8966fcd4c ("IB/core: Use rdma_ah_attr accessor functions")
Link: https://lore.kernel.org/r/20200413132028.930109-1-leon@kernel.org
Signed-off-by: Aharon Landau <aharonl@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-22 15:43:46 -03:00
Leon Romanovsky
333fbaa025 net/mlx5: Move QP logic to mlx5_ib
The mlx5_core doesn't need any functionality coded in qp.c, so move
that file to drivers/infiniband/ be under mlx5_ib responsibility.

Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-04-19 15:53:21 +03:00
Leon Romanovsky
42f9bbd112 RDMA/mlx5: Alphabetically sort build artifacts
Sort .o objects in makefile to make addition of new object
less cumbersome.

Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-04-19 15:53:20 +03:00
Leon Romanovsky
9c275ee4ad net/mlx5: Delete not-used cmd header
The structures defined in the cmd header are not used and can be safely
removed from the driver. This patch removes that file and deletes all
relevant includes.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-04-19 15:53:20 +03:00
Leon Romanovsky
bfd745f8f3 RDMA/mlx5: Delete Q counter allocations command
Remove mlx5_ib implementation of Q counter allocation logic
together with cleaning boolean which controlled validity of the
counter. It is not needed, because counter_id == 0 means that
counter is not valid.

Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-04-19 15:53:20 +03:00
Leon Romanovsky
66247fbb28 net/mlx5: Remove Q counter low level helper APIs
mlx5 core users are encouraged to use low level API (mlx5_cmd_exec)
without the need of helper functions, do this for q counters, remove
helper functions and call mlx5_cmd_exec directly from users.

This will help reduce the total amount of code and reduction of the
mlx5_core symbol table.

Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-04-19 15:53:20 +03:00
Yishai Hadas
cf26deff90 RDMA/mlx5: Fix udata response upon SRQ creation
Fix udata response upon SRQ creation to use the UAPI structure (i.e.
mlx5_ib_create_srq_resp). It did not zero the reserved field in userspace.

Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Link: https://lore.kernel.org/r/20200406173540.1466477-1-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-14 15:56:34 -03:00
Linus Torvalds
919dce2470 RDMA 5.7 pull request
The majority of the patches are cleanups, refactorings and clarity
 improvements
 
 - Various driver updates for siw, bnxt_re, rxe, efa, mlx5, hfi1
 
 - Lots of cleanup patches for hns
 
 - Convert more places to use refcount
 
 - Aggressively lock the RDMA CM code that syzkaller says isn't working
 
 - Work to clarify ib_cm
 
 - Use the new ib_device lifecycle model in bnxt_re
 
 - Fix mlx5's MR cache which seems to be failing more often with the new
   ODP code
 
 - mlx5 'dynamic uar' and 'tx steering' user interfaces
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl6CSr0ACgkQOG33FX4g
 mxrtKg//XovbOfYAO7nC05FtGz9iEkIUBiwQOjUojgNSi6RMNDqRW1bmqKUugm1o
 9nXA6tw+fueEvUNSD541SCcxkUZJzWvubO9wHB6N3Fgy68N3Vf2rKV3EBBTh99rK
 Cb7rnmTTN6izRyI1wdyP2sjDJGyF8zvsgIrG2sibzLnqlyScrnD98YS0FdPZUfOa
 a1mmXBN/T7eaQ4TbE3lLLzGnifRlGmZ5vxEvOQmAlOdqAfIKQdbbW7oCRLVIleso
 gfQlOOvIgzHRwQ3VrFa3i6ETYtywXq7EgmQxCjqPVJQjCA79n5TLBkP1iRhvn8xi
 3+LO4YCkiSJ/NjTA2d9KwT6K4djj3cYfcueuqo2MoXXr0YLiY6TLv1OffKcUIq7c
 LM3d4CSwIAG+C2FZwaQrdSGa2h/CNfLAEeKxv430zggeDNKlwHJPV5w3rUJ8lT56
 wlyT7Lzosl0O9Z/1BSLYckTvbBCtYcmanVyCfHG8EJKAM1/tXy5LS8btJ3e51rPu
 XekR9ELrTTA2CTuoSCQGP6J0dBD2U7qO4XRCQ9N5BYLrI6RdP7Z4xYzzey49Z3Cx
 JaF86eurM7nS5biUszTtwww8AJMyYicB+0VyjBfk+mhv90w8tS1vZ1aZKzaQ1L6Z
 jWn8WgIN4rWY0YGQs6PiovT1FplyGs3p1wNmjn92WO0wZZ3WsmQ=
 =ae+a
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "The majority of the patches are cleanups, refactorings and clarity
  improvements.

  This cycle saw some more activity from Syzkaller, I think we are now
  clean on all but one of those bugs, including the long standing and
  obnoxious rdma_cm locking design defect. Continue to see many drivers
  getting cleanups, with a few new user visible features.

  Summary:

   - Various driver updates for siw, bnxt_re, rxe, efa, mlx5, hfi1

   - Lots of cleanup patches for hns

   - Convert more places to use refcount

   - Aggressively lock the RDMA CM code that syzkaller says isn't
     working

   - Work to clarify ib_cm

   - Use the new ib_device lifecycle model in bnxt_re

   - Fix mlx5's MR cache which seems to be failing more often with the
     new ODP code

   - mlx5 'dynamic uar' and 'tx steering' user interfaces"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (144 commits)
  RDMA/bnxt_re: make bnxt_re_ib_init static
  IB/qib: Delete struct qib_ivdev.qp_rnd
  RDMA/hns: Fix uninitialized variable bug
  RDMA/hns: Modify the mask of QP number for CQE of hip08
  RDMA/hns: Reduce the maximum number of extend SGE per WQE
  RDMA/hns: Reduce PFC frames in congestion scenarios
  RDMA/mlx5: Add support for RDMA TX flow table
  net/mlx5: Add support for RDMA TX steering
  IB/hfi1: Call kobject_put() when kobject_init_and_add() fails
  IB/hfi1: Fix memory leaks in sysfs registration and unregistration
  IB/mlx5: Move to fully dynamic UAR mode once user space supports it
  IB/mlx5: Limit the scope of struct mlx5_bfreg_info to mlx5_ib
  IB/mlx5: Extend QP creation to get uar page index from user space
  IB/mlx5: Extend CQ creation to get uar page index from user space
  IB/mlx5: Expose UAR object and its alloc/destroy commands
  IB/hfi1: Get rid of a warning
  RDMA/hns: Remove redundant judgment of qp_type
  RDMA/hns: Remove redundant assignment of wc->smac when polling cq
  RDMA/hns: Remove redundant qpc setup operations
  RDMA/hns: Remove meaningless prints
  ...
2020-04-01 18:18:18 -07:00
Saeed Mahameed
e999a7343d Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  mlx5: Remove uninitialized use of key in mlx5_core_create_mkey
  {IB,net}/mlx5: Move asynchronous mkey creation to mlx5_ib
  {IB,net}/mlx5: Assign mkey variant in mlx5_ib only
  {IB,net}/mlx5: Setup mkey variant before mr create command invocation

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-29 23:42:11 -07:00
David S. Miller
f0b5989745 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Minor comment conflict in mac80211.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 21:25:29 -07:00
Jason Gunthorpe
dbdf8909d0 Merge branch 'mlx5_tx_steering' into rdma.git for-next
Leon Romanovsky says:

====================
Those two patches from Michael extends mlx5_core and mlx5_ib flow steering
to support RDMA TX in similar way to already supported RDMA RX.
====================

Based on the mlx5-next branch at
 git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Due to dependencies

* branch 'mlx5_tx_steering':
  RDMA/mlx5: Add support for RDMA TX flow table
  net/mlx5: Add support for RDMA TX steering
2020-03-27 13:26:59 -03:00
Michael Guralnik
af9c38411d RDMA/mlx5: Add support for RDMA TX flow table
Enable user application to add rules for RDMA TX steering table.
Rules in this steering table will allow to steer transmitted RDMA
traffic.

Link: https://lore.kernel.org/r/20200324061425.1570190-3-leon@kernel.org
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-27 13:24:48 -03:00
Yishai Hadas
0a2fd01c28 IB/mlx5: Move to fully dynamic UAR mode once user space supports it
Move to fully dynamic UAR mode once user space supports it.  In this case
we prevent any legacy mode of UARs on the allocated context and prevent
redundant allocation of the static ones.

Link: https://lore.kernel.org/r/20200324060143.1569116-6-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-27 12:59:05 -03:00
Leon Romanovsky
2152862298 IB/mlx5: Limit the scope of struct mlx5_bfreg_info to mlx5_ib
struct mlx5_bfreg_info is used by mlx5_ib only but is exposed to both RDMA
and netdev parts of mlx5 driver. Move that struct to mlx5_ib namespace,
clean vertical space alignment and convert lib_uar_4k from bool to
bitfield.

Link: https://lore.kernel.org/r/20200324060143.1569116-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-27 12:59:04 -03:00
Yishai Hadas
ac42a5ee92 IB/mlx5: Extend QP creation to get uar page index from user space
Extend QP creation to get uar page index from user space, this mode can be
used with the UAR dynamic mode APIs to allocate/destroy a UAR object.

As part of enabling this option blocked the weird/un-supported cross
channel option which uses index 0 hard-coded.

This QP flag wasn't exposed to user space as part of any formal upstream
release, the dynamic option can allow having valid UAR page index instead.

Link: https://lore.kernel.org/r/20200324060143.1569116-4-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-27 12:59:04 -03:00
Yishai Hadas
64d99f6a62 IB/mlx5: Extend CQ creation to get uar page index from user space
Extend CQ creation to get uar page index from user space, this mode can be
used with the UAR dynamic mode APIs to allocate/destroy a UAR object.

Link: https://lore.kernel.org/r/20200324060143.1569116-3-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-27 12:59:04 -03:00
Yishai Hadas
342ee59de9 IB/mlx5: Expose UAR object and its alloc/destroy commands
Expose UAR object and its alloc/destroy commands to be used over the ioctl
interface by user space applications.

This API supports both BF & NC modes and enables a dynamic allocation of
UARs once really needed.

As the number of driver objects were limited by the core ones when the
merged tree is prepared, had to decrease the number of core objects to
enable the new UAR object usage.

Link: https://lore.kernel.org/r/20200324060143.1569116-2-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-27 12:59:04 -03:00
Maor Gottlieb
ba80013fba RDMA/mlx5: Block delay drop to unprivileged users
It has been discovered that this feature can globally block the RX port,
so it should be allowed for highly privileged users only.

Fixes: 03404e8ae652("IB/mlx5: Add support to dropless RQ")
Link: https://lore.kernel.org/r/20200322124906.1173790-1-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-25 09:56:30 -03:00
Yishai Hadas
1f3db16188 IB/mlx5: Generally use the WC auto detection test result
Now that we have direct and reliable detection of WC support by the
system, use is broadly. The only case we have to worry about is when the
WC autodetector cannot run.

For this fringe case generally assume that that WC is available, except in
the well defined case of no PAT support on x86 which is tested by calling
arch_can_pci_mmap_wc().

If WC is wrongly assumed to be available then it causes a small
performance hit on paths in userspace that are tuned to the assumption
that WC is available. There is no functional loss.

It is very unlikely that any platforms exist that lack WC and also care
about the micro optimization of WC in the fringe case where autodetection
does not work.

By removing the fairly bogus CONFIG tests this makes WC work broadly on
all arches and all platforms.

Link: https://lore.kernel.org/r/20200318100323.46659-1-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-24 20:22:21 -03:00
Leon Romanovsky
950bf4f177 RDMA/mlx5: Fix access to wrong pointer while performing flush due to error
The main difference between send and receive SW completions is related to
separate treatment of WQ queue. For receive completions, the initial index
to be flushed is stored in "tail", while for send completions, it is in
deleted "last_poll".

  CPU: 54 PID: 53405 Comm: kworker/u161:0 Kdump: loaded Tainted: G           OE    --------- -t - 4.18.0-147.el8.ppc64le #1
  Workqueue: ib-comp-unb-wq ib_cq_poll_work [ib_core]
  NIP:  c000003c7c00a000 LR: c00800000e586af4 CTR: c000003c7c00a000
  REGS: c0000036cc9db940 TRAP: 0400   Tainted: G           OE    --------- -t -  (4.18.0-147.el8.ppc64le)
  MSR:  9000000010009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 24004488  XER: 20040000
  CFAR: c00800000e586af0 IRQMASK: 0
  GPR00: c00800000e586ab4 c0000036cc9dbbc0 c00800000e5f1a00 c0000037d8433800
  GPR04: c000003895a26800 c0000037293f2000 0000000000000201 0000000000000011
  GPR08: c000003895a26c80 c000003c7c00a000 0000000000000000 c00800000ed30438
  GPR12: c000003c7c00a000 c000003fff684b80 c00000000017c388 c00000396ec4be40
  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR20: c00000000151e498 0000000000000010 c000003895a26848 0000000000000010
  GPR24: 0000000000000010 0000000000010000 c000003895a26800 0000000000000000
  GPR28: 0000000000000010 c0000037d8433800 c000003895a26c80 c000003895a26800
  NIP [c000003c7c00a000] 0xc000003c7c00a000
  LR [c00800000e586af4] __ib_process_cq+0xec/0x1b0 [ib_core]
  Call Trace:
  [c0000036cc9dbbc0] [c00800000e586ab4] __ib_process_cq+0xac/0x1b0 [ib_core] (unreliable)
  [c0000036cc9dbc40] [c00800000e586c88] ib_cq_poll_work+0x40/0xb0 [ib_core]
  [c0000036cc9dbc70] [c000000000171f44] process_one_work+0x2f4/0x5c0
  [c0000036cc9dbd10] [c000000000172a0c] worker_thread+0xcc/0x760
  [c0000036cc9dbdc0] [c00000000017c52c] kthread+0x1ac/0x1c0
  [c0000036cc9dbe30] [c00000000000b75c] ret_from_kernel_thread+0x5c/0x80

Fixes: 8e3b688301 ("RDMA/mlx5: Delete unreachable handle_atomic code by simplifying SW completion")
Link: https://lore.kernel.org/r/20200318091640.44069-1-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-24 19:54:57 -03:00
Dan Carpenter
a766fa8473 IB/mlx5: Fix a NULL vs IS_ERR() check
The kzalloc() function returns NULL, not error pointers.

Fixes: 30f2fe40c7 ("IB/mlx5: Introduce UAPIs to manage packet pacing")
Link: https://lore.kernel.org/r/20200320132641.GF95012@mwanda
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-24 19:47:55 -03:00
Jason Gunthorpe
d613bd64c6 Merge branch 'mlx5_mr_cache' into rdma.git for-next
Leon Romanovsky says:

====================
This series fixes various corner cases in the mlx5_ib MR cache
implementation, see specific commit messages for more information.
====================

Based on the mlx5-next branch at
 git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Due to dependencies

* branch 'mlx5_mr-cache':
  RDMA/mlx5: Allow MRs to be created in the cache synchronously
  RDMA/mlx5: Revise how the hysteresis scheme works for cache filling
  RDMA/mlx5: Fix locking in MR cache work queue
  RDMA/mlx5: Lock access to ent->available_mrs/limit when doing queue_work
  RDMA/mlx5: Fix MR cache size and limit debugfs
  RDMA/mlx5: Always remove MRs from the cache before destroying them
  RDMA/mlx5: Simplify how the MR cache bucket is located
  RDMA/mlx5: Rename the tracking variables for the MR cache
  RDMA/mlx5: Replace spinlock protected write with atomic var
  {IB,net}/mlx5: Move asynchronous mkey creation to mlx5_ib
  {IB,net}/mlx5: Assign mkey variant in mlx5_ib only
  {IB,net}/mlx5: Setup mkey variant before mr create command invocation
2020-03-13 11:11:07 -03:00
Jason Gunthorpe
aad719dcf3 RDMA/mlx5: Allow MRs to be created in the cache synchronously
If the cache is completely out of MRs, and we are running in cache mode,
then directly, and synchronously, create an MR that is compatible with the
cache bucket using a sleeping mailbox command. This ensures that the
thread that is waiting for the MR absolutely will get one.

When a MR allocated in this way becomes freed then it is compatible with
the cache bucket and will be recycled back into it.

Deletes the very buggy ent->compl scheme to create a synchronous MR
allocation.

Link: https://lore.kernel.org/r/20200310082238.239865-13-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-13 11:08:02 -03:00
Jason Gunthorpe
1c78a21a0c RDMA/mlx5: Revise how the hysteresis scheme works for cache filling
Currently if the work queue is running then it is in 'hysteresis' mode and
will fill until the cache reaches the high water mark. This implicit state
is very tricky and doesn't interact with pending very well.

Instead of self re-scheduling the work queue after the add_keys() has
started to create the new MR, have the queue scheduled from
reg_mr_callback() only after the requested MR has been added.

This avoids the bad design of an in-rush of queue'd work doing back to
back add_keys() until EAGAIN then sleeping. The add_keys() will be paced
one at a time as they complete, slowly filling up the cache.

Also, fix pending to be only manipulated under lock.

Link: https://lore.kernel.org/r/20200310082238.239865-12-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-13 11:08:02 -03:00
Jason Gunthorpe
b9358bdbc7 RDMA/mlx5: Fix locking in MR cache work queue
All of the members of mlx5_cache_ent must be accessed while holding the
spinlock, add the missing spinlock in the __cache_work_func().

Using cache->stopped and flush_workqueue() is an inherently racy way to
shutdown self-scheduling work on a queue. Replace it with ent->disabled
under lock, and always check disabled before queuing any new work. Use
cancel_work_sync() to shutdown the queue.

Use READ_ONCE/WRITE_ONCE for dev->last_add to manage concurrency as
coherency is less important here.

Split fill_delay from the bitfield. C bitfield updates are not atomic and
this is just a mess. Use READ_ONCE/WRITE_ONCE, but this could also use
test_bit()/set_bit().

Link: https://lore.kernel.org/r/20200310082238.239865-11-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-13 11:08:02 -03:00
Jason Gunthorpe
ad2d3ef46d RDMA/mlx5: Lock access to ent->available_mrs/limit when doing queue_work
Accesses to these members needs to be locked. There is no reason not to
hold a spinlock while calling queue_work(), so move the tests into a
helper and always call it under lock.

The helper should be called when available_mrs is adjusted.

Link: https://lore.kernel.org/r/20200310082238.239865-10-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-13 11:08:01 -03:00
Jason Gunthorpe
a1d8854aae RDMA/mlx5: Fix MR cache size and limit debugfs
The size_write function is supposed to adjust the total_mr's to match the
user's request, but lacks locking and safety checking.

total_mrs can only be adjusted by at most available_mrs. mrs already
assigned to users cannot be revoked. Ensure that the user provides a
target value within the range of available_mrs and within the high/low
water mark.

limit_write has confusing and wrong sanity checking, and doesn't have the
ability to deallocate on limit reduction.

Since both functions use the same algorithm to adjust the available_mrs,
consolidate it into one function and write it correctly. Fix the locking
and by holding the spinlock for all accesses to ent->X.

Always fail if the user provides a malformed string.

Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Link: https://lore.kernel.org/r/20200310082238.239865-9-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-13 11:08:01 -03:00
Jason Gunthorpe
1769c4c575 RDMA/mlx5: Always remove MRs from the cache before destroying them
The cache bucket tracks the total number of MRs that exists, both inside
and outside of the cache. Removing a MR from the cache (by setting
cache_ent to NULL) without updating total_mrs will cause the tracking to
leak and be inflated.

Further fix the rereg_mr path to always destroy the MR. reg_create will
always overwrite all the MR data in mlx5_ib_mr, so the MR must be
completely destroyed, in all cases, before this function can be
called. Detach the MR from the cache and unconditionally destroy it to
avoid leaking HW mkeys.

Fixes: afd1417404 ("IB/mlx5: Use direct mkey destroy command upon UMR unreg failure")
Fixes: 56e11d628c ("IB/mlx5: Added support for re-registration of MRs")
Link: https://lore.kernel.org/r/20200310082238.239865-8-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-13 11:08:01 -03:00
Jason Gunthorpe
b91e1751fb RDMA/mlx5: Simplify how the MR cache bucket is located
There are many bad APIs here that are accepting a cache bucket index
instead of a bucket pointer. Many of the callers already have a bucket
pointer, so this results in a lot of confusing uses of order2idx().

Pass the struct mlx5_cache_ent into add_keys(), remove_keys(), and
alloc_cached_mr().

Once the MR is in the cache, store the cache bucket pointer directly in
the MR, replacing the 'bool allocated_from cache'.

In the end there is only one place that needs to form index from order,
alloc_mr_from_cache(). Increase the safety of this function by disallowing
it from accessing cache entries in the ODP special area.

Link: https://lore.kernel.org/r/20200310082238.239865-7-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-13 11:08:01 -03:00
Jason Gunthorpe
7c8691a396 RDMA/mlx5: Rename the tracking variables for the MR cache
The old names do not clearly indicate the intent.

Link: https://lore.kernel.org/r/20200310082238.239865-6-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-13 11:08:01 -03:00
Saeed Mahameed
f743ff3b37 RDMA/mlx5: Replace spinlock protected write with atomic var
mkey variant calculation was spinlock protected to make it atomic, replace
that with one atomic variable.

Link: https://lore.kernel.org/r/20200310082238.239865-4-leon@kernel.org
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-13 11:08:00 -03:00
Michael Guralnik
a3cfdd3928 {IB,net}/mlx5: Move asynchronous mkey creation to mlx5_ib
As mlx5_ib is the only user of the mlx5_core_create_mkey_cb, move the
logic inside mlx5_ib and cleanup the code in mlx5_core.

Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-03-13 15:48:10 +02:00
Saeed Mahameed
fc6a9f86f0 {IB,net}/mlx5: Assign mkey variant in mlx5_ib only
mkey variant is not required for mlx5_core use, move the mkey variant
counter to mlx5_ib.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-03-13 15:48:04 +02:00
Saeed Mahameed
54c62e13ad {IB,net}/mlx5: Setup mkey variant before mr create command invocation
On reg_mr_callback() mlx5_ib is recalculating the mkey variant which is
wrong and will lead to using a different key variant than the one
submitted to firmware on create mkey command invocation.

To fix this, we store the mkey variant before invoking the firmware
command and use it later on completion (reg_mr_callback).

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-03-13 15:48:00 +02:00
Leon Romanovsky
a762d460a0 RDMA/mlx5: Use offsetofend() instead of duplicated variant
Convert mlx5 driver to use offsetofend() instead of its duplicated
variant.

Link: https://lore.kernel.org/r/20200310091438.248429-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-13 10:45:12 -03:00
David S. Miller
1d34357931 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Minor overlapping changes, nothing serious.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12 22:34:48 -07:00
David S. Miller
bf3347c4d1 Merge branch 'ct-offload' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux 2020-03-12 12:34:23 -07:00
Alex Vesker
41e684ef3f IB/mlx5: Replace tunnel mpls capability bits for tunnel_offloads
Until now the flex parser capability was used in ib_query_device() to
indicate tunnel_offloads_caps support for mpls_over_gre/mpls_over_udp.

Newer devices and firmware will have configurations with the flexparser
but without mpls support.

Testing for the flex parser capability was a mistake, the tunnel_stateless
capability was intended for detecting mpls and was introduced at the same
time as the flex parser capability.

Otherwise userspace will be incorrectly informed that a future device
supports MPLS when it does not.

Link: https://lore.kernel.org/r/20200305123841.196086-1-leon@kernel.org
Cc: <stable@vger.kernel.org> # 4.17
Fixes: e818e255a5 ("IB/mlx5: Expose MPLS related tunneling offloads")
Signed-off-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-10 14:41:35 -03:00
Erez Shitrit
0897f301bc RDMA/mlx5: Remove duplicate definitions of SW_ICM macros
Those macros are already defined in include/linux/mlx5/driver.h, so delete
their duplicate variants.

Link: https://lore.kernel.org/r/20200310075706.238592-1-leon@kernel.org
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-10 14:39:09 -03:00
Mark Zhang
ec16b6bbda RDMA/mlx5: Fix the number of hwcounters of a dynamic counter
When we read the global counter and there's any dynamic counter allocated,
the value of a hwcounter is the sum of the default counter and all dynamic
counters. So the number of hwcounters of a dynamically allocated counter
must be same as of the default counter, otherwise there will be read
violations.

This fixes the KASAN slab-out-of-bounds bug:

  BUG: KASAN: slab-out-of-bounds in rdma_counter_get_hwstat_value+0x36d/0x390 [ib_core]
  Read of size 8 at addr ffff8884192a5778 by task rdma/10138

  CPU: 7 PID: 10138 Comm: rdma Not tainted 5.5.0-for-upstream-dbg-2020-02-06_18-30-19-27 #1
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
  Call Trace:
   dump_stack+0xb7/0x10b
   print_address_description.constprop.4+0x1e2/0x400
   ? rdma_counter_get_hwstat_value+0x36d/0x390 [ib_core]
   __kasan_report+0x15c/0x1e0
   ? mlx5_ib_query_q_counters+0x13f/0x270 [mlx5_ib]
   ? rdma_counter_get_hwstat_value+0x36d/0x390 [ib_core]
   kasan_report+0xe/0x20
   rdma_counter_get_hwstat_value+0x36d/0x390 [ib_core]
   ? rdma_counter_query_stats+0xd0/0xd0 [ib_core]
   ? memcpy+0x34/0x50
   ? nla_put+0xe2/0x170
   nldev_stat_get_doit+0x9c7/0x14f0 [ib_core]
   ...
   do_syscall_64+0x95/0x490
   entry_SYSCALL_64_after_hwframe+0x49/0xbe
  RIP: 0033:0x7fcc457fe65a
  Code: bb 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 8b 05 fa f1 2b 00 45 89 c9 4c 63 d1 48 63 ff 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 76 f3 c3 0f 1f 40 00 41 55 41 54 4d 89 c5 55
  RSP: 002b:00007ffc0586f868 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
  RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fcc457fe65a
  RDX: 0000000000000020 RSI: 00000000013db920 RDI: 0000000000000003
  RBP: 00007ffc0586fa90 R08: 00007fcc45ac10e0 R09: 000000000000000c
  R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004089c0
  R13: 0000000000000000 R14: 00007ffc0586fab0 R15: 00000000013dc9a0

  Allocated by task 9700:
   save_stack+0x19/0x80
   __kasan_kmalloc.constprop.7+0xa0/0xd0
   mlx5_ib_counter_alloc_stats+0xd1/0x1d0 [mlx5_ib]
   rdma_counter_alloc+0x16d/0x3f0 [ib_core]
   rdma_counter_bind_qpn_alloc+0x216/0x4e0 [ib_core]
   nldev_stat_set_doit+0x8c2/0xb10 [ib_core]
   rdma_nl_rcv_msg+0x3d2/0x730 [ib_core]
   rdma_nl_rcv+0x2a8/0x400 [ib_core]
   netlink_unicast+0x448/0x620
   netlink_sendmsg+0x731/0xd10
   sock_sendmsg+0xb1/0xf0
   __sys_sendto+0x25d/0x2c0
   __x64_sys_sendto+0xdd/0x1b0
   do_syscall_64+0x95/0x490
   entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 18d422ce8c ("IB/mlx5: Add counter_alloc_stats() and counter_update_stats() support")
Link: https://lore.kernel.org/r/20200305124052.196688-1-leon@kernel.org
Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-10 14:36:47 -03:00
Jason Gunthorpe
6f00a54c2c Linux 5.6-rc5
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl5lkYceHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGpHQH/RJrzcaZHo4lw88m
 Jf7vBZ9DYUlRgqE0pxTHWmodNObKRqpwOUGflUcWbb/7GD2LQUfeqhSECVQyTID9
 N9y7FcPvx321Qhc3EkZ24DBYk0+DQ0K2FVUrSa/PxO0n7czxxXWaLRDmlSULEd3R
 D4pVs3zEWOBXJHUAvUQ5R+lKfkeWKNeeepeh+rezuhpdWFBRNz4Jjr5QUJ8od5xI
 sIwobYmESJqTRVBHqW8g2T2/yIsFJ78GCXs8DZLe1wxh40UbxdYDTA0NDDTHKzK6
 lxzBgcmKzuge+1OVmzxLouNWMnPcjFlVgXWVerpSy3/SIFFkzzUWeMbqm6hKuhOn
 wAlcIgI=
 =VQUc
 -----END PGP SIGNATURE-----

Merge tag 'v5.6-rc5' into rdma.git for-next

Required due to dependencies in following patches.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-10 12:49:09 -03:00
Jason Gunthorpe
3e3cf2e82c Merge branch 'mlx5_packet_pacing' into rdma.git for-next
Yishai Hadas Says:

====================
Expose raw packet pacing APIs to be used by DEVX based applications.  The
existing code was refactored to have a single flow with the new raw APIs.
====================

Based on the mlx5-next branch at
 git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Due to dependencies

* branch 'mlx5_packet_pacing':
  IB/mlx5: Introduce UAPIs to manage packet pacing
  net/mlx5: Expose raw packet pacing APIs
2020-03-10 11:54:17 -03:00
Yishai Hadas
30f2fe40c7 IB/mlx5: Introduce UAPIs to manage packet pacing
Introduce packet pacing uobject and its alloc and destroy
methods.

This uobject holds mlx5 packet pacing context according to the device
specification and enables managing packet pacing device entries that are
needed by DEVX applications.

Link: https://lore.kernel.org/r/20200219190518.200912-3-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-10 11:53:52 -03:00
Parav Pandit
79db784e79 IB/mlx5: Fix missing congestion control debugfs on rep rdma device
Cited commit missed to include low level congestion control related
debugfs stage initialization.  This resulted in missing debugfs entries
for cc_params of a RDMA device.

Add them back.

Fixes: b5ca15ad7e ("IB/mlx5: Add proper representors support")
Link: https://lore.kernel.org/r/20200227125407.99803-1-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-04 14:20:14 -04:00
Parav Pandit
9e3aaf6883 IB/mlx5: Add np_min_time_between_cnps and rp_max_rate debug params
Add two debugfs parameters described below.

np_min_time_between_cnps - Minimum time between sending CNPs from the
                           port.
                           Unit = microseconds.
                           Default = 0 (no min wait time; generated
                           based on incoming ECN marked packets).

rp_max_rate - Maximum rate at which reaction point node can transmit.
              Once this limit is reached, RP is no longer rate limited.
              Unit = Mbits/sec
              Default = 0 (full speed)

Link: https://lore.kernel.org/r/20200227125246.99472-1-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-04 14:20:03 -04:00
Artemy Kovalyov
de5ed007a0 IB/mlx5: Fix implicit ODP race
Following race may occur because of the call_srcu and the placement of
the synchronize_srcu vs the xa_erase.

CPU0				   CPU1

mlx5_ib_free_implicit_mr:	   destroy_unused_implicit_child_mr:
 xa_erase(odp_mkeys)
 synchronize_srcu()
				    xa_lock(implicit_children)
				    if (still in xarray)
				       atomic_inc()
				       call_srcu()
				    xa_unlock(implicit_children)
 xa_erase(implicit_children):
   xa_lock(implicit_children)
   __xa_erase()
   xa_unlock(implicit_children)

 flush_workqueue()
				   [..]
				    free_implicit_child_mr_rcu:
				     (via call_srcu)
				      queue_work()

 WARN_ON(atomic_read())
				   [..]
				    free_implicit_child_mr_work:
				     (via wq)
				      free_implicit_child_mr()
 mlx5_mr_cache_invalidate()
				     mlx5_ib_update_xlt() <-- UMR QP fail
				     atomic_dec()

The wait_event() solves the race because it blocks until
free_implicit_child_mr_work() completes.

Fixes: 5256edcb98 ("RDMA/mlx5: Rework implicit ODP destroy")
Link: https://lore.kernel.org/r/20200227113918.94432-1-leon@kernel.org
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-04 13:25:00 -04:00
Alexander Lobakin
91b74bf531 IB/mlx5: Optimize u64 division on 32-bit arches
Commit f164be8c03 ("IB/mlx5: Extend caps stage to handle VAR
capabilities") introduced a straight "/" division of the u64 variable
"bar_size".

This was fixed with commit 685eff5131 ("IB/mlx5: Use div64_u64 for
num_var_hw_entries calculation"). However, div64_u64() is redundant here
as mlx5_var_table::stride_size is of type u32.  Make the actual code way
more optimized on 32-bit kernels using div_u64() and fix 80 chars
break-through by the way.

Fixes: 685eff5131 ("IB/mlx5: Use div64_u64 for num_var_hw_entries calculation")
Link: https://lore.kernel.org/r/20200217073629.8051-1-alobakin@dlink.ru
Signed-off-by: Alexander Lobakin <alobakin@dlink.ru>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-04 13:11:49 -04:00
Jason Gunthorpe
c13cac2a21 Linux 5.6-rc4
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl5cOX8eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGw0AH/0nex1uEpzUTm+Gw
 D8QPFr3y61sYLu7sIMVt39+Zl6OSxvOOX14QIM/mrNrzjjRI8EXGvYgES5gSO4on
 6NLS6/64c1oQThDHCsxusKoSWLZ9KqP2vRPt7tZjn7DZMzEsuLhlINKBeupcqALX
 FnOBr768P+if/j0WcDR2pBaMg3ch+XC5sfYav7kapjgWUqCx9BvrHKLXXdlEGUC0
 7Ku7PH+nF7CIHiTay+i89odvOd8aLGsa/SUf5XGauKkH65VgQkmksgPeZUPqTnyC
 MEsyLJLfn4AP3ySwqzfSLac8jqZG8FGBt4DgM2MQBHibctzfeMIznfcfh/A8+Edx
 jqLKLAs=
 =4075
 -----END PGP SIGNATURE-----

Merge tag 'v5.6-rc4' into rdma.git for-next

Required due to dependencies in following patches.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-04 13:11:06 -04:00
Kamal Heib
bb8865f435 RDMA/providers: Fix return value when QP type isn't supported
The proper return code is "-EOPNOTSUPP" when the requested QP type is
not supported by the provider.

Link: https://lore.kernel.org/r/20200130082049.463-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-04 12:13:42 -04:00
Michael Guralnik
5e29d1443c RDMA/mlx5: Prevent UMR usage with RO only when we have RO caps
Relaxed ordering is not supported in UMR so we are disabling UMR usage
when user passes relaxed ordering access flag.

Enable using UMR when user requested relaxed ordering but there are no
relaxed ordering capabilities.

This will prevent user from unnecessarily registering a new mkey.

Fixes: d6de0bb185 ("RDMA/mlx5: Set relaxed ordering when requested")
Link: https://lore.kernel.org/r/20200227113834.94233-1-leon@kernel.org
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-02 11:10:38 -04:00
Paul Blakey
0f0d3827c0 net/mlx5: E-Switch, Move source port on reg_c0 to the upper 16 bits
Multi chain support requires the miss path to continue the processing
from the last chain id, and for that we need to save the chain
miss tag (a mapping for 32bit chain id) on reg_c0 which will
come in a next patch.

Currently reg_c0 is exclusively used to store the source port
metadata, giving it 32bit, it is created from 16bits of vcha_id,
and 16bits of vport number.

We will move this source port metadata to upper 16bits, and leave the
lower bits for the chain miss tag. We compress the reg_c0 source port
metadata to 16bits by taking 8 bits from vhca_id, and 8bits from
the vport number.

Since we compress the vport number to 8bits statically, and leave two
top ids for special PF/ECPF numbers, we will only support a max of 254
vports with this strategy.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-02-19 17:49:48 -08:00
Jason Gunthorpe
685eff5131 IB/mlx5: Use div64_u64 for num_var_hw_entries calculation
On i386:

ERROR: "__udivdi3" [drivers/infiniband/hw/mlx5/mlx5_ib.ko] undefined!
ERROR: "__divdi3" [drivers/infiniband/hw/mlx5/mlx5_ib.ko] undefined!

Fixes: f164be8c03 ("IB/mlx5: Extend caps stage to handle VAR capabilities")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Reported-by: Alexander Lobakin <alobakin@dlink.ru>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-14 15:21:52 -04:00
Leon Romanovsky
9b6d3bbc13 RDMA/mlx5: Prevent overflow in mmap offset calculations
The cmd and index variables declared as u16 and the result is supposed to
be stored in u64. The C arithmetic rules doesn't promote "(index >> 8) <<
16" to be u64 and leaves the end result to be u16.

Fixes: 7be76bef32 ("IB/mlx5: Introduce VAR object and its alloc/destroy methods")
Link: https://lore.kernel.org/r/20200212072635.682689-10-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-13 10:39:23 -04:00
Yishai Hadas
a8af8694a5 RDMA/mlx5: Fix async events cleanup flows
As in the prior patch, the devx code is not fully cleaning up its
event_lists before finishing driver_destroy allowing a later read to
trigger user after free conditions.

Re-arrange things so that the event_list is always empty after destroy and
ensure it remains empty until the file is closed.

Fixes: f7c8416cce ("RDMA/core: Simplify destruction of FD uobjects")
Link: https://lore.kernel.org/r/20200212072635.682689-7-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-13 09:48:17 -04:00
Mark Zhang
10189e8e6f IB/mlx5: Return failure when rts2rts_qp_counters_set_id is not supported
When binding a QP with a counter and the QP state is not RESET, return
failure if the rts2rts_qp_counters_set_id is not supported by the
device.

This is to prevent cases like manual bind for Connect-IB devices from
returning success when the feature is not supported.

Fixes: d14133dd41 ("IB/mlx5: Support set qp counter")
Link: https://lore.kernel.org/r/20200126171708.5167-1-leon@kernel.org
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-11 14:23:01 -04:00
Linus Torvalds
8fdd4019bc RDMA subsystem updates for 5.6
- Driver updates and cleanup for qedr, bnxt_re, hns, siw, mlx5, mlx4, rxe,
   i40iw
 
 - Larger series doing cleanup and rework for hns and hfi1.
 
 - Some general reworking of the CM code to make it a little more
   understandable
 
 - Unify the different code paths connected to the uverbs FD scheme
 
 - New UAPI ioctls conversions for get context and get async fd
 
 - Trace points for CQ and CM portions of the RDMA stack
 
 - mlx5 driver support for virtio-net formatted rings as RDMA raw ethernet QPs
 
 - verbs support for setting the PCI-E relaxed ordering bit on DMA traffic
   connected to a MR
 
 - A couple of bug fixes that came too late to make rc7
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl4zPwQACgkQOG33FX4g
 mxoURw//fuQmuJ7aTMH+0qrhaZUmzXOcI/WKvY0YMyYLvxolRcIO+uCL239wxezR
 9iTHPO7HeYXUQ4W8Hi/fTyuQ9hzaPOP3wgOJfQhm4QT/XDpRW0H3Mb+hTLHTUAcA
 rgKc9suAn+5BbIDOz7hEfeOTssx1wYrLsaHDc11NZ42JuG6uvPR33lhXiKWG+5tH
 2MpfeTU6BjL035dm3YZXCo+ouobpdMuvzJItYIsB2E5Nl0s91SMzsymIYiD0gb3t
 yUJ3wqPW3pchfAl8VEn+W5AHTUYYgGjmEblL8WdVq5JRrkQgQzj8QtCRT9NOPAT0
 LivCvgBrm0kscaQS2TjtG56Ojbwz8z1QPE/4shf0pj/G2lZfacYDAeaUc/2VafxY
 y/KG+3dB1DxtYY3eXJUxbB7Vpk7kfr35p5b75NdMhd2t49oPgV7EKoZMLYGzfX4S
 PtyNyNSiwx8qsRTr4lznOMswmrDLfG4XiywWgYo6NGOWyKYlARWIYBAEQZ0DPTiE
 9mqJ19gusdSdAgm8LGDInPmH6/AojGOVzYonJFWdlOtwCXGNXL4Gx02x4WYHykDG
 w+oy5NMJbU3b6+MWEagkuQNcrwqv02MT1mB/Lgv4GPm6rS0UXR7zUPDeccE50fSL
 X36k28UlftlPlaD7PeJdTOAhyBv5DxfpL5rbB2TfpUTpNxjayuU=
 =hepK
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "A very quiet cycle with few notable changes. Mostly the usual list of
  one or two patches to drivers changing something that isn't quite rc
  worthy. The subsystem seems to be seeing a larger number of rework and
  cleanup style patches right now, I feel that several vendors are
  prepping their drivers for new silicon.

  Summary:

   - Driver updates and cleanup for qedr, bnxt_re, hns, siw, mlx5, mlx4,
     rxe, i40iw

   - Larger series doing cleanup and rework for hns and hfi1.

   - Some general reworking of the CM code to make it a little more
     understandable

   - Unify the different code paths connected to the uverbs FD scheme

   - New UAPI ioctls conversions for get context and get async fd

   - Trace points for CQ and CM portions of the RDMA stack

   - mlx5 driver support for virtio-net formatted rings as RDMA raw
     ethernet QPs

   - verbs support for setting the PCI-E relaxed ordering bit on DMA
     traffic connected to a MR

   - A couple of bug fixes that came too late to make rc7"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (108 commits)
  RDMA/core: Make the entire API tree static
  RDMA/efa: Mask access flags with the correct optional range
  RDMA/cma: Fix unbalanced cm_id reference count during address resolve
  RDMA/umem: Fix ib_umem_find_best_pgsz()
  IB/mlx4: Fix leak in id_map_find_del
  IB/opa_vnic: Spelling correction of 'erorr' to 'error'
  IB/hfi1: Fix logical condition in msix_request_irq
  RDMA/cm: Remove CM message structs
  RDMA/cm: Use IBA functions for complex structure members
  RDMA/cm: Use IBA functions for simple structure members
  RDMA/cm: Use IBA functions for swapping get/set acessors
  RDMA/cm: Use IBA functions for simple get/set acessors
  RDMA/cm: Add SET/GET implementations to hide IBA wire format
  RDMA/cm: Add accessors for CM_REQ transport_type
  IB/mlx5: Return the administrative GUID if exists
  RDMA/core: Ensure that rdma_user_mmap_entry_remove() is a fence
  IB/mlx4: Fix memory leak in add_gid error flow
  IB/mlx5: Expose RoCE accelerator counters
  RDMA/mlx5: Set relaxed ordering when requested
  RDMA/core: Add the core support field to METHOD_GET_CONTEXT
  ...
2020-01-31 14:40:36 -08:00
Jason Gunthorpe
8889f6fa35 RDMA/core: Make the entire API tree static
Compilation of mlx5 driver without CONFIG_INFINIBAND_USER_ACCESS generates
the following error.

on x86_64:

 ld: drivers/infiniband/hw/mlx5/main.o: in function `mlx5_ib_handler_MLX5_IB_METHOD_VAR_OBJ_ALLOC':
 main.c:(.text+0x186d): undefined reference to `ib_uverbs_get_ucontext_file'
 ld: drivers/infiniband/hw/mlx5/main.o:(.rodata+0x2480): undefined reference to `uverbs_idr_class'
 ld: drivers/infiniband/hw/mlx5/main.o:(.rodata+0x24d8): undefined reference to `uverbs_destroy_def_handler'

This is happening because some parts of the UAPI description are not
static. This is a hold over from earlier code that relied on struct
pointers to refer to object types, now object types are referenced by
number. Remove the unused globals and add statics to the remaining UAPI
description elements.

Remove the redundent #ifdefs around mlx5_ib_*defs and obsolete
mlx5_ib_get_devx_tree().

The compiler now trims alot more unused code, including the above
problematic definitions when !CONFIG_INFINIBAND_USER_ACCESS.

Fixes: 7be76bef32 ("IB/mlx5: Introduce VAR object and its alloc/destroy methods")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-30 16:28:52 -04:00
Linus Torvalds
bd2463ac7d Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from David Miller:

 1) Add WireGuard

 2) Add HE and TWT support to ath11k driver, from John Crispin.

 3) Add ESP in TCP encapsulation support, from Sabrina Dubroca.

 4) Add variable window congestion control to TIPC, from Jon Maloy.

 5) Add BCM84881 PHY driver, from Russell King.

 6) Start adding netlink support for ethtool operations, from Michal
    Kubecek.

 7) Add XDP drop and TX action support to ena driver, from Sameeh
    Jubran.

 8) Add new ipv4 route notifications so that mlxsw driver does not have
    to handle identical routes itself. From Ido Schimmel.

 9) Add BPF dynamic program extensions, from Alexei Starovoitov.

10) Support RX and TX timestamping in igc, from Vinicius Costa Gomes.

11) Add support for macsec HW offloading, from Antoine Tenart.

12) Add initial support for MPTCP protocol, from Christoph Paasch,
    Matthieu Baerts, Florian Westphal, Peter Krystad, and many others.

13) Add Octeontx2 PF support, from Sunil Goutham, Geetha sowjanya, Linu
    Cherian, and others.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1469 commits)
  net: phy: add default ARCH_BCM_IPROC for MDIO_BCM_IPROC
  udp: segment looped gso packets correctly
  netem: change mailing list
  qed: FW 8.42.2.0 debug features
  qed: rt init valid initialization changed
  qed: Debug feature: ilt and mdump
  qed: FW 8.42.2.0 Add fw overlay feature
  qed: FW 8.42.2.0 HSI changes
  qed: FW 8.42.2.0 iscsi/fcoe changes
  qed: Add abstraction for different hsi values per chip
  qed: FW 8.42.2.0 Additional ll2 type
  qed: Use dmae to write to widebus registers in fw_funcs
  qed: FW 8.42.2.0 Parser offsets modified
  qed: FW 8.42.2.0 Queue Manager changes
  qed: FW 8.42.2.0 Expose new registers and change windows
  qed: FW 8.42.2.0 Internal ram offsets modifications
  MAINTAINERS: Add entry for Marvell OcteonTX2 Physical Function driver
  Documentation: net: octeontx2: Add RVU HW and drivers overview
  octeontx2-pf: ethtool RSS config support
  octeontx2-pf: Add basic ethtool support
  ...
2020-01-28 16:02:33 -08:00
Danit Goldberg
4bbd4923d1 IB/mlx5: Return the administrative GUID if exists
A user can change the operational GUID (a.k.a affective GUID) through
link/infiniband. Therefore it is preferred to return the currently set
GUID if it exists instead of the operational.

This way the PF can query which VF GUID will be set in the next bind.  In
order to align with MAC address, zero is returned if administrative GUID
is not set.

For example, before setting administrative GUID:
 $ ip link show
 ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 4092 qdisc mq state UP mode DEFAULT group default qlen 256
 link/infiniband 00:00:00:08:fe:80:00:00:00:00:00:00:52:54:00:c0:fe:12:34:55 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
 vf 0     link/infiniband 00:00:00:08:fe:80:00:00:00:00:00:00:52:54:00:c0:fe:12:34:55 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff,
 spoof checking off, NODE_GUID 00:00:00:00:00:00:00:00, PORT_GUID 00:00:00:00:00:00:00:00, link-state auto, trust off, query_rss off

Then:

 $ ip link set ib0 vf 0 node_guid 11:00:af:21:cb:05:11:00
 $ ip link set ib0 vf 0 port_guid 22:11:af:21:cb:05:11:00

After setting administrative GUID:
 $ ip link show
 ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 4092 qdisc mq state UP mode DEFAULT group default qlen 256
 link/infiniband 00:00:00:08:fe:80:00:00:00:00:00:00:52:54:00:c0:fe:12:34:55 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
 vf 0     link/infiniband 00:00:00:08:fe:80:00:00:00:00:00:00:52:54:00:c0:fe:12:34:55 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff,
 spoof checking off, NODE_GUID 11:00:af:21:cb:05:11:00, PORT_GUID 22:11:af:21:cb:05:11:00, link-state auto, trust off, query_rss off

Fixes: 9c0015ef09 ("IB/mlx5: Implement callbacks for getting VFs GUID attributes")
Link: https://lore.kernel.org/r/20200116120048.12744-1-leon@kernel.org
Signed-off-by: Danit Goldberg <danitg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-25 14:54:39 -04:00
Jason Gunthorpe
e8b3a426fb Use ODP MRs for kernel ULPs
The following series extends MR creation routines to allow creation of
 user MRs through kernel ULPs as a proxy. The immediate use case is to
 allow RDS to work over FS-DAX, which requires ODP (on-demand-paging)
 MRs to be created and such MRs were not possible to create prior this
 series.
 
 The first part of this patchset extends RDMA to have special verb
 ib_reg_user_mr(). The common use case that uses this function is a
 userspace application that allocates memory for HCA access but the
 responsibility to register the memory at the HCA is on an kernel ULP.
 This ULP acts as an agent for the userspace application.
 
 The second part provides advise MR functionality for ULPs. This is
 integral part of ODP flows and used to trigger pagefaults in advance
 to prepare memory before running working set.
 
 The third part is actual user of those in-kernel APIs.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQT1m3YD37UfMCUQBNwp8NhrnBAZsQUCXiVO8AAKCRAp8NhrnBAZ
 scTrAP9gb0d3qv0IOtHw5aGI1DAgjTUn/SzUOnsjDEn7DIoh9gEA2+ZmaEyLXKrl
 +UcZb31auy5P8ueJYokRLhLAyRcOIAg=
 =yaHb
 -----END PGP SIGNATURE-----

Merge tag 'rds-odp-for-5.5' into rdma.git for-next

From https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma

Leon Romanovsky says:

====================
Use ODP MRs for kernel ULPs

The following series extends MR creation routines to allow creation of
user MRs through kernel ULPs as a proxy. The immediate use case is to
allow RDS to work over FS-DAX, which requires ODP (on-demand-paging)
MRs to be created and such MRs were not possible to create prior this
series.

The first part of this patchset extends RDMA to have special verb
ib_reg_user_mr(). The common use case that uses this function is a
userspace application that allocates memory for HCA access but the
responsibility to register the memory at the HCA is on an kernel ULP.
This ULP acts as an agent for the userspace application.

The second part provides advise MR functionality for ULPs. This is
integral part of ODP flows and used to trigger pagefaults in advance
to prepare memory before running working set.

The third part is actual user of those in-kernel APIs.
====================

* tag 'rds-odp-for-5.5':
  net/rds: Use prefetch for On-Demand-Paging MR
  net/rds: Handle ODP mr registration/unregistration
  net/rds: Detect need of On-Demand-Paging memory registration
  RDMA/mlx5: Fix handling of IOVA != user_va in ODP paths
  IB/mlx5: Mask out unsupported ODP capabilities for kernel QPs
  RDMA/mlx5: Don't fake udata for kernel path
  IB/mlx5: Add ODP WQE handlers for kernel QPs
  IB/core: Add interface to advise_mr for kernel users
  IB/core: Introduce ib_reg_user_mr
  IB: Allow calls to ib_umem_get from kernel ULPs

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-21 09:55:04 -04:00
David S. Miller
ad063075d4 Use ODP MRs for kernel ULPs
The following series extends MR creation routines to allow creation of
 user MRs through kernel ULPs as a proxy. The immediate use case is to
 allow RDS to work over FS-DAX, which requires ODP (on-demand-paging)
 MRs to be created and such MRs were not possible to create prior this
 series.
 
 The first part of this patchset extends RDMA to have special verb
 ib_reg_user_mr(). The common use case that uses this function is a
 userspace application that allocates memory for HCA access but the
 responsibility to register the memory at the HCA is on an kernel ULP.
 This ULP acts as an agent for the userspace application.
 
 The second part provides advise MR functionality for ULPs. This is
 integral part of ODP flows and used to trigger pagefaults in advance
 to prepare memory before running working set.
 
 The third part is actual user of those in-kernel APIs.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQT1m3YD37UfMCUQBNwp8NhrnBAZsQUCXiVO8AAKCRAp8NhrnBAZ
 scTrAP9gb0d3qv0IOtHw5aGI1DAgjTUn/SzUOnsjDEn7DIoh9gEA2+ZmaEyLXKrl
 +UcZb31auy5P8ueJYokRLhLAyRcOIAg=
 =yaHb
 -----END PGP SIGNATURE-----

Merge tag 'rds-odp-for-5.5' of https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma

Leon Romanovsky says:

====================
Use ODP MRs for kernel ULPs

The following series extends MR creation routines to allow creation of
user MRs through kernel ULPs as a proxy. The immediate use case is to
allow RDS to work over FS-DAX, which requires ODP (on-demand-paging)
MRs to be created and such MRs were not possible to create prior this
series.

The first part of this patchset extends RDMA to have special verb
ib_reg_user_mr(). The common use case that uses this function is a
userspace application that allocates memory for HCA access but the
responsibility to register the memory at the HCA is on an kernel ULP.
This ULP acts as an agent for the userspace application.

The second part provides advise MR functionality for ULPs. This is
integral part of ODP flows and used to trigger pagefaults in advance
to prepare memory before running working set.

The third part is actual user of those in-kernel APIs.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-21 10:22:51 +01:00
Saeed Mahameed
12e9e0d0d9 Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
This merge syncs with mlx5-next latest HW bits and layout updates for next
features, in addition one patch that improves
mlx5_create_auto_grouped_flow_table() API across all mlx5 users.

* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5: Refactor mlx5_create_auto_grouped_flow_table
  net/mlx5e: Add discard counters per priority
  net/mlx5e: Expose FEC feilds and related capability bit
  net/mlx5: Add mlx5_ifc definitions for connection tracking support
  net/mlx5: Add copy header action struct layout
  net/mlx5: Expose resource dump register mapping
  net/mlx5: Add structures and defines for MIRC register
  net/mlx5: Read MCAM register groups 1 and 2
  net/mlx5: Add structures layout for new MCAM access reg groups
  net/mlx5: Expose vDPA emulation device capabilities
  net/mlx5: Add Virtio Emulation related device capabilities

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-16 15:48:24 -08:00
Paul Blakey
61dc7b0141 net/mlx5: Refactor mlx5_create_auto_grouped_flow_table
Refactor mlx5_create_auto_grouped_flow_table() to use ft_attr param
which already carries the max_fte, prio and flags memebers, and is
used the same in similar mlx5_create_flow_table() function.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-16 15:41:59 -08:00
Avihai Horon
d7fab91637 IB/mlx5: Expose RoCE accelerator counters
Introduce the following RoCE accelerator counters:
* roce_adp_retrans - number of adaptive retransmission for RoCE traffic.
* roce_adp_retrans_to - number of times RoCE traffic reached time out
  due to adaptive retransmission.
* roce_slow_restart - number of times RoCE slow restart was used.
* roce_slow_restart_cnps - number of times RoCE slow restart
  generate CNP packets.
* roce_slow_restart_trans - number of times RoCE slow restart change
  state to slow restart.

Link: https://lore.kernel.org/r/20200115145459.83280-3-leon@kernel.org
Signed-off-by: Avihai Horon <avihaih@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-16 16:11:52 -04:00
Michael Guralnik
d6de0bb185 RDMA/mlx5: Set relaxed ordering when requested
Enable relaxed ordering in the mkey context when requested. As relaxed
ordering is not currently supported in UMR, disable UMR usage for relaxed
ordering MRs.

Link: https://lore.kernel.org/r/1578506740-22188-11-git-send-email-yishaih@mellanox.com
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-16 15:55:47 -04:00
Jason Gunthorpe
8ffc324851 RDMA/mlx5: Fix handling of IOVA != user_va in ODP paths
Till recently it was not possible for userspace to specify a different
IOVA, but with the new ibv_reg_mr_iova() library call this can be done.

To compute the user_va we must compute:
  user_va = (iova - iova_start) + user_va_start

while being cautious of overflow and other math problems.

The iova is not reliably stored in the mmkey when the MR is created. Only
the cached creation path (the common one) set it, so it must also be set
when creating uncached MRs.

Fix the weird use of iova when computing the starting page index in the
MR. In the normal case, when iova == umem.address:
  iova & (~(BIT(page_shift) - 1)) ==
  ALIGN_DOWN(umem.address, odp->page_size) ==
  ib_umem_start(odp)

And when iova is different using it in math with a user_va is wrong.

Finally, do not allow an implicit ODP to be created with a non-zero IOVA
as we have no support for that.

Fixes: 7bdf65d411 ("IB/mlx5: Handle page faults")
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-01-16 16:15:07 +02:00
Moni Shoua
a73a895588 IB/mlx5: Mask out unsupported ODP capabilities for kernel QPs
The ODP handler for WQEs in RQ or SRQ is not implented for kernel QPs.
Therefore don't report support in these if query comes from a kernel user.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-01-16 16:15:00 +02:00
Leon Romanovsky
4835709176 RDMA/mlx5: Don't fake udata for kernel path
Kernel paths must not set udata and provide NULL pointer,
instead of faking zeroed udata struct.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-01-16 16:14:53 +02:00
Moni Shoua
da9ee9d8a8 IB/mlx5: Add ODP WQE handlers for kernel QPs
One of the steps in ODP page fault handler for WQEs is to read a WQE
from a QP send queue or receive queue buffer at a specific index.

Since the implementation of this buffer is different between kernel and
user QP the implementation of the handler needs to be aware of that and
handle it in a different way.

ODP for kernel MRs is currently supported only for RDMA_READ
and RDMA_WRITE operations so change the handler to
- read a WQE from a kernel QP send queue
- fail if access to receive queue or shared receive queue is
  required for a kernel QP

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-01-16 16:14:47 +02:00
Moni Shoua
c320e527e1 IB: Allow calls to ib_umem_get from kernel ULPs
So far the assumption was that ib_umem_get() and ib_umem_odp_get()
are called from flows that start in UVERBS and therefore has a user
context. This assumption restricts flows that are initiated by ULPs
and need the service that ib_umem_get() provides.

This patch changes ib_umem_get() and ib_umem_odp_get() to get IB device
directly by relying on the fact that both UVERBS and ULPs sets that
field correctly.

Reviewed-by: Guy Levi <guyle@mellanox.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-01-16 16:14:28 +02:00
Jason Gunthorpe
93887e66ff RDMA/mlx5: Simplify devx async commands
With the new FD structure the async commands do not need to hold any
references while running. The existing mlx5_cmd_exec_cb() and
mlx5_cmd_cleanup_async_ctx() provide enough synchronization to ensure
that all outstanding commands are completed before the uobject can be
destructed.

Remove the now confusing get_file() and the type erasure of the
devx_async_cmd_event_file.

Link: https://lore.kernel.org/r/1578504126-9400-4-git-send-email-yishaih@mellanox.com
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13 16:17:19 -04:00
Jason Gunthorpe
f7c8416cce RDMA/core: Simplify destruction of FD uobjects
FD uobjects have a weird split between the struct file and uobject
world. Simplify this to make them pure uobjects and use a generic release
method for all struct file operations.

This fixes the control flow so that mlx5_cmd_cleanup_async_ctx() is always
called before erasing the linked list contents to make the concurrancy
simpler to understand.

For this to work the uobject destruction must fence anything that it is
cleaning up - the design must not rely on struct file lifetime.

Only deliver_event() relies on the struct file to when adding new events
to the queue, add a is_destroyed check under lock to block it.

Link: https://lore.kernel.org/r/1578504126-9400-3-git-send-email-yishaih@mellanox.com
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13 16:17:19 -04:00
Jason Gunthorpe
6898d1c661 RDMA/mlx5: Use RCU and direct refcounts to keep memory alive
dispatch_event_fd() runs from a notifier with minimal locking, and relies
on RCU and a file refcount to keep the uobject and eventfd alive.

As the next patch wants to remove the file_operations release function
from the drivers, re-organize things so that the devx_event_notifier()
path uses the existing RCU to manage the lifetime of the uobject and
eventfd.

Move the refcount puts to a call_rcu so that the objects are guaranteed to
exist and remove the indirect file refcount.

Link: https://lore.kernel.org/r/1578504126-9400-2-git-send-email-yishaih@mellanox.com
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13 16:17:19 -04:00
Yishai Hadas
3f59b6c3e6 IB/mlx5: Add mmap support for VAR
Add mmap support for VAR, it uses the 'offset' command mode with
involvement of IB core APIs to find the previously allocated mmap entry.

Link: https://lore.kernel.org/r/20191212110928.334995-6-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-12 19:49:13 -04:00