Commit Graph

202 Commits

Author SHA1 Message Date
Bob Pearson
1858d98b83 RDMA/rxe: Remove duplicate entries in struct rxe_mr
Struct rxe_mem had pd, lkey and rkey values both in itself and in the
struct ib_mr which is also included in rxe_mem.

Delete these entries and replace references with the ones in ibmr.Add
mr_pd, mr_lkey and mr_rkey macros which extract these values from mr.

Link: https://lore.kernel.org/r/20201008212818.265303-1-rpearson@hpe.com
Signed-off-by: Bob Pearson <rpearson@hpe.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-08 20:07:50 -03:00
Jason Gunthorpe
5dee5872f8 Merge branch 'mlx5_active_speed' into rdma.git for-next
Leon Romanovsky says:

====================
IBTA declares speed as 16 bits, but kernel stores it in u8. This series
fixes in-kernel declaration while keeping external interface intact.
====================

Based on the mlx5-next branch at
     git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
due to dependencies.

* branch 'mlx5_active_speed':
  RDMA: Fix link active_speed size
  RDMA/mlx5: Delete duplicated mlx5_ptys_width enum
  net/mlx5: Refactor query port speed functions
2020-09-18 10:31:45 -03:00
Linus Torvalds
b1df2a0783 RDMA second 5.9-rc pull request
A number of driver bug fixes and a few recent regressions:
 
 - Several bug fixes for bnxt_re. Crashing, incorrect data reported,
   and corruption on new HW
 
 - Memory leak and crash in rxe
 
 - Fix sysfs corruption in rxe if the netdev name is too long
 
 - Fix a crash on error unwind in the new cq_pool code
 
 - Fix kobject panics in rtrs by working device lifetime properly
 
 - Fix a data corruption bug in iser target related to misaligned buffers
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl9atAYACgkQOG33FX4g
 mxqSdw//Qi29dnxzVGpsaO4/krd/VmI6NT6eNpgK7Nqx80DaCYer0JhtwZOUxHqK
 KbHIV9XB/f6BSI67c9ydYj4PNX6FpFnoUWQLvqZwip5VM7R6ifIVjm0ap1jCAUSS
 axDLFZOySIOYNhcZ5I+MtY/kxykKBjMteuMXdpBe4FwZ+XSmsC5KkfRH/+FUhjVG
 peL6aRVDv9TByH8w+iZE1wSmVrOphOE1C/jN5TyotQTmKe7IHoXJtkalosYHXFWw
 KZaaz52e4IYKVFl4HIcl6+FfPExhxsyfDtRHluvn+vzY/wFy1RZw6F0BZt7mioy5
 J8R6w82xEe/SNugTGuvIzqXOymmy9H4CrG9pHy4NRMMzC28LGI7qHJgVhr/jZy8+
 GPxR26cywDhPsd4XA2K3mvs7DVSoBUPYlIUnHdYjfBZl/ghColg9+XGyNv6pdrke
 Q7Kog5blcpOAahBX+ElBLvIZXk5oEk5W+3H/M0OeuVMQ/DrMtALrCnwpp4wDKVvO
 9QuYfGgQ+25xbV9kwzckLGo5eedN3cRD/v4hcqvQUZo+9zLYZ/HZRMjpOdrscQ+I
 QL4FgpcLpOASKZ+bYjjpFxK3rNVTDT9CYJw4/hxEaOhxRhtAO1Q9mJdvJTK6dj09
 oR9LPyefQkyKCAt+heWHKKkEYDiwT8U1SlR8STotg24VHIj6Rb4=
 =2DDd
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
 "A number of driver bug fixes and a few recent regressions:

   - Several bug fixes for bnxt_re. Crashing, incorrect data reported,
     and corruption on new HW

   - Memory leak and crash in rxe

   - Fix sysfs corruption in rxe if the netdev name is too long

   - Fix a crash on error unwind in the new cq_pool code

   - Fix kobject panics in rtrs by working device lifetime properly

   - Fix a data corruption bug in iser target related to misaligned
     buffers"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  IB/isert: Fix unaligned immediate-data handling
  RDMA/rtrs-srv: Set .release function for rtrs srv device during device init
  RDMA/bnxt_re: Remove set but not used variable 'qplib_ctx'
  RDMA/core: Fix reported speed and width
  RDMA/core: Fix unsafe linked list traversal after failing to allocate CQ
  RDMA/bnxt_re: Remove the qp from list only if the qp destroy succeeds
  RDMA/bnxt_re: Fix driver crash on unaligned PSN entry address
  RDMA/bnxt_re: Restrict the max_gids to 256
  RDMA/bnxt_re: Static NQ depth allocation
  RDMA/bnxt_re: Fix the qp table indexing
  RDMA/bnxt_re: Do not report transparent vlan from QP1
  RDMA/mlx4: Read pkey table length instead of hardcoded value
  RDMA/rxe: Fix panic when calling kmem_cache_create()
  RDMA/rxe: Fix memleak in rxe_mem_init_user
  RDMA/rxe: Fix the parent sysfs read when the interface has 15 chars
  RDMA/rtrs-srv: Replace device_register with device_initialize and device_add
2020-09-11 10:02:36 -07:00
Leon Romanovsky
43d781b9fa RDMA: Allow fail of destroy CQ
Like any other verbs objects, CQ shouldn't fail during destroy, but
mlx5_ib didn't follow this contract with mixed IB verbs objects with
DEVX. Such mix causes to the situation where FW and kernel are fully
interdependent on the reference counting of each side.

Kernel verbs and drivers that don't have DEVX flows shouldn't fail.

Fixes: e39afe3d6d ("RDMA: Convert CQ allocations to be under core responsibility")
Link: https://lore.kernel.org/r/20200907120921.476363-7-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 14:14:29 -03:00
Leon Romanovsky
119181d1d4 RDMA: Restore ability to fail on SRQ destroy
In similar way to other IB objects, restore the ability to return error on
SRQ destroy. Strictly speaking, this change is not necessary, and provided
here to ensure a symmetrical interface like other destroy functions.

Fixes: 68e326dea1 ("RDMA: Handle SRQ allocations by IB/core")
Link: https://lore.kernel.org/r/20200907120921.476363-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 14:14:24 -03:00
Leon Romanovsky
9a9ebf8cd7 RDMA: Restore ability to fail on AH destroy
Like any other IB verbs objects, AH are refcounted by ib_core. The release
of those objects are controlled by ib_core with promise that AH destroy
can't fail.

Being SW object for now, this change makes dealloc_ah() to behave like any
other destroy IB flows.

Fixes: d345691471 ("RDMA: Handle AH allocations by IB/core")
Link: https://lore.kernel.org/r/20200907120921.476363-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 13:57:22 -03:00
Leon Romanovsky
91a7c58fce RDMA: Restore ability to fail on PD deallocate
The IB verbs objects are counted by the kernel and ib_core ensures that
deallocate PD will success so it will be called once all other objects
that depends on PD will be released. This is achieved by managing various
reference counters on such objects.

The mlx5 driver didn't follow this standard flow when allowed DEVX objects
that are not managed by ib_core to be interleaved with the ones under
ib_core responsibility.

In such interleaved scenarios deallocate command can fail and ib_core will
leave uobject in internal DB and attempt to clean it later to free
resources anyway.

This change partially restores returned value from dealloc_pd() for all
drivers, but keeping in mind that non-DEVX devices and kernel verbs paths
shouldn't fail.

Fixes: 21a428a019 ("RDMA: Handle PD allocations by IB/core")
Link: https://lore.kernel.org/r/20200907120921.476363-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 13:57:22 -03:00
Jason Gunthorpe
6989aa62d3 Linux 5.9-rc3
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl9ML+IeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGA8EIAIy/kTbFS0yrE9yV
 hb98oX0z9+EU9YQg9vhaRWwPd+rJF/JMQZLqYcwbhjG9abaUL3T3fEcSAefMHw8E
 LAt+hYzA38dHt7tqhsFQX3vV1VorvDVICBVN0yRPRWKKikq4OPIHzaAR9tleGAF5
 8btQisl1PjN+obwYmLuNb6aX16OCwAF+uXOwehcoJs9dvMNhwtXRzfOflWzOvOo6
 tE0bHErlylLDfLv4ZzEfczTdks4QJZ7C0xLSf3oN9AAynW42Xnhct4hi8qZY/hCf
 CMaqeN4hdpub6TvQIqBdDqMMjEXGFgeNSnAEBQY9VpvUqz8NTu6sQxwgJEKDF5tg
 d81lv2c=
 =uW/F
 -----END PGP SIGNATURE-----

Merge tag 'v5.9-rc3' into rdma.git for-next

Required due to dependencies in following patches.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-31 12:28:12 -03:00
Bob Pearson
63fa15dbd4 RDMA/rxe: Add SPDX hdrs to rxe source files
Add SPDX headers to all rxe .c and .h files.

Link: https://lore.kernel.org/r/20200827145439.2273-1-rpearson@hpe.com
Signed-off-by: Bob Pearson <rpearson@hpe.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-31 12:20:02 -03:00
Yi Zhang
60b1af64eb RDMA/rxe: Fix the parent sysfs read when the interface has 15 chars
'parent' sysfs reads will yield '\0' bytes when the interface name has 15
chars, and there will no "\n" output.

To reproduce, create one interface with 15 chars:

 [root@test ~]# ip a s enp0s29u1u7u3c2
 2: enp0s29u1u7u3c2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UNKNOWN group default qlen 1000
     link/ether 02:21:28:57:47:17 brd ff:ff:ff:ff:ff:ff
     inet6 fe80::ac41:338f:5bcd:c222/64 scope link noprefixroute
        valid_lft forever preferred_lft forever
 [root@test ~]# modprobe rdma_rxe
 [root@test ~]# echo enp0s29u1u7u3c2 > /sys/module/rdma_rxe/parameters/add
 [root@test ~]# cat /sys/class/infiniband/rxe0/parent
 enp0s29u1u7u3c2[root@test ~]#
 [root@test ~]# f="/sys/class/infiniband/rxe0/parent"
 [root@test ~]# echo "$(<"$f")"
 -bash: warning: command substitution: ignored null byte in input
 enp0s29u1u7u3c2

Use scnprintf and PAGE_SIZE to fill the sysfs output buffer.

Cc: stable@vger.kernel.org
Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20200820153646.31316-1-yi.zhang@redhat.com
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Yi Zhang <yi.zhang@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-24 13:59:58 -03:00
Gustavo A. R. Silva
df561f6688 treewide: Use fallthrough pseudo-keyword
Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-08-23 17:36:59 -05:00
Kamal Heib
76251e15ea RDMA/rxe: Remove pkey table
The RoCE spec requires RoCE devices to support only the default pkey.
However the rxe driver maintains a 64 enties pkey table and uses only the
first entry. Remove the pkey table and hard code a table of length one
hard wired with the default pkey. Replace all checks of the pkey_table
with a comparison to the default_pkey instead.

Link: https://lore.kernel.org/r/20200721101618.686110-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-31 16:17:56 -03:00
Mikhail Malygin
5f0b2a6093 RDMA/rxe: Prevent access to wr->next ptr afrer wr is posted to send queue
rxe_post_send_kernel() iterates over linked list of wr's, until the
wr->next ptr is NULL.  However if we've got an interrupt after last wr is
posted, control may be returned to the code after send completion callback
is executed and wr memory is freed.

As a result, wr->next pointer may contain incorrect value leading to
panic. Store the wr->next on the stack before posting it.

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20200716190340.23453-1-m.malygin@yadro.com
Signed-off-by: Mikhail Malygin <m.malygin@yadro.com>
Signed-off-by: Sergey Kojushev <s.kojushev@yadro.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-16 16:12:07 -03:00
Kamal Heib
420bd9e2d9 RDMA/rxe: Remove rxe_link_layer()
Instead of returning IB_LINK_LAYER_ETHERNET from rxe_link_layer, return it
directly from get_link_layer callback and remove rxe_link_layer().

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20200705104313.283034-5-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-16 13:57:21 -03:00
Kamal Heib
293d8440a0 RDMA/rxe: Return void from rxe_mem_init_dma()
The return value from rxe_mem_init_dma() is always 0 - change it to be
void and fix the callers accordingly.

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20200705104313.283034-4-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-16 13:57:21 -03:00
Gal Pressman
42a3b15396 RDMA: Remove the udata parameter from alloc_mr callback
Allocating an MR flow can only be initiated by kernel users, and not from
userspace so a udata parameter is redundant.

Link: https://lore.kernel.org/r/20200706120343.10816-4-galpress@amazon.com
Signed-off-by: Gal Pressman <galpress@amazon.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-06 19:25:53 -03:00
Maor Gottlieb
fa5d010c56 RDMA: Group create AH arguments in struct
Following patch adds additional argument to the create AH function, so it
make sense to group ah_attr and flags arguments in struct.

Link: https://lore.kernel.org/r/20200430192146.12863-13-maorg@mellanox.com
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Acked-by: Devesh Sharma <devesh.sharma@broadcom.com>
Acked-by: Gal Pressman <galpress@amazon.com>
Acked-by: Weihang Li <liweihang@huawei.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-02 20:19:53 -03:00
Bart Van Assche
97458fd510 RDMA/rxe: Increase DMA max_segment_size parameter
Increase the DMA max_segment_size parameter from 64 KB to 2 GB.

Link: https://lore.kernel.org/r/20191025225830.257535-3-bvanassche@acm.org
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 14:52:03 -03:00
Kamal Heib
f3fceba5da RDMA/rxe: Verify modify_device mask
Verify that the passed mask to rxe_modify_device() is supported.

Link: https://lore.kernel.org/r/20190923104158.5331-4-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-01 13:06:10 -03:00
Kamal Heib
72a7720fca RDMA: Introduce ib_port_phys_state enum
In order to improve readability, add ib_port_phys_state enum to replace
the use of magic numbers.

Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Andrew Boyer <aboyer@tobark.org>
Acked-by: Michal Kalderon <michal.kalderon@marvell.com>
Acked-by: Bernard Metzler <bmt@zurich.ibm.com>
Link: https://lore.kernel.org/r/20190807103138.17219-2-kamalheib1@gmail.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-12 10:18:52 -04:00
Leon Romanovsky
e39afe3d6d RDMA: Convert CQ allocations to be under core responsibility
Ensure that CQ is allocated and freed by IB/core and not by drivers.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Gal Pressman <galpress@amazon.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Tested-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-11 16:39:49 -04:00
Leon Romanovsky
a52c8e2469 RDMA: Clean destroy CQ in drivers do not return errors
Like all other destroy commands, .destroy_cq() call is not supposed
to fail. In all flows, the attempt to return earlier caused to memory
leaks.

This patch converts .destroy_cq() to do not return any errors.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Gal Pressman <galpress@amazon.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-11 16:17:10 -04:00
Jason Gunthorpe
7a15414252 RDMA: Move owner into struct ib_device_ops
This more closely follows how other subsytems work, with owner being a
member of the structure containing the function pointers.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-10 16:56:03 -03:00
Jason Gunthorpe
72c6ec18eb RDMA: Move uverbs_abi_ver into struct ib_device_ops
No reason for every driver to emit code to set this, just make it part of
the driver's existing static const ops structure.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-10 16:56:02 -03:00
Jason Gunthorpe
b9560a419b RDMA: Move driver_id into struct ib_device_ops
No reason for every driver to emit code to set this, just make it part of
the driver's existing static const ops structure.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-10 16:56:02 -03:00
Leon Romanovsky
68e326dea1 RDMA: Handle SRQ allocations by IB/core
Convert SRQ allocation from drivers to be in the IB/core

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-08 13:05:25 -03:00
Leon Romanovsky
d345691471 RDMA: Handle AH allocations by IB/core
Simplify drivers by ensuring lifetime of ib_ah object. The changes
in .create_ah() go hand in hand with relevant update in .destroy_ah().

We will use this opportunity and convert .destroy_ah() to don't fail, as
it was suggested a long time ago, because there is nothing to do in case
of failure during destroy.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-08 13:05:25 -03:00
Shamir Rabinovitch
ff23dfa134 IB: Pass only ib_udata in function prototypes
Now when ib_udata is passed to all the driver's object create/destroy APIs
the ib_udata will carry the ib_ucontext for every user command. There is
no need to also pass the ib_ucontext via the functions prototypes.

Make ib_udata the only argument psssed.

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-01 15:00:47 -03:00
Shamir Rabinovitch
c4367a2635 IB: Pass uverbs_attr_bundle down ib_x destroy path
The uverbs_attr_bundle with the ucontext is sent down to the drivers ib_x
destroy path as ib_udata. The next patch will use the ib_udata to free the
drivers destroy path from the dependency in 'uobject->context' as we
already did for the create path.

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-01 14:57:35 -03:00
Leon Romanovsky
a2a074ef39 RDMA: Handle ucontext allocations by IB/core
Following the PD conversion patch, do the same for ucontext allocations.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-22 14:11:37 -07:00
Steve Wise
66920e1b25 rdma_rxe: Use netlink messages to add/delete links
Add support for the RDMA_NLDEV_CMD_NEWLINK/DELLINK messages which allow
dynamically adding new RXE links.  Deprecate the old module options for
now.

Cc: Moni Shoua <monis@mellanox.com>
Reviewed-by: Yanjun Zhu <yanjun.zhu@oracle.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-19 20:52:19 -07:00
Jason Gunthorpe
ca22354b14 RDMA/rxe: Close a race after ib_register_device
Since rxe allows unregistration from other threads the rxe pointer can
become invalid any moment after ib_register_driver returns. This could
cause a user triggered use after free.

Add another driver callback to be called right after the device becomes
registered to complete any device setup required post-registration.  This
callback has enough core locking to prevent the device from becoming
unregistered.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-19 20:52:18 -07:00
Jason Gunthorpe
c367074b6c RDMA/rxe: Use driver_unregister and new unregistration API
rxe does not have correct locking for its registration/unregistration
paths, use the core code to handle it instead. In this mode
ib_unregister_device will also do the dealloc, so rxe is required to do
clean up from a callback.

The core code ensures that unregistration is done only once, and generally
takes care of locking and concurrency problems for rxe.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-19 20:52:18 -07:00
Jason Gunthorpe
4c173f596b RDMA/rxe: Use ib_device_get_by_netdev() instead of open coding
The core API handles the locking correctly and is faster if there are
multiple devices.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-19 20:52:18 -07:00
Shamir Rabinovitch
8994445054 IB/{hw,sw}: Remove 'uobject->context' dependency in object creation APIs
Now when we have the udata passed to all the ib_xxx object creation APIs
and the additional macro 'rdma_udata_to_drv_context' to get the
ib_ucontext from ib_udata stored in uverbs_attr_bundle, we can finally
start to remove the dependency of the drivers in the
ib_xxx->uobject->context.

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-15 15:38:38 -07:00
Leon Romanovsky
21a428a019 RDMA: Handle PD allocations by IB/core
The PD allocations in IB/core allows us to simplify drivers and their
error flows in their .alloc_pd() paths. The changes in .alloc_pd() go hand
in had with relevant update in .dealloc_pd().

We will use this opportunity and convert .dealloc_pd() to don't fail, as
it was suggested a long time ago, failures are not happening as we have
never seen a WARN_ON print.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-08 16:51:04 -07:00
Kamal Heib
fa40718804 RDMA/rxe: Move rxe_init_av() to rxe_av.c
Move the function rxe_init_av() to rxe_av.c file and use it instead of
calling rxe_av_from_attr() and rxe_av_fill_ip_info(), also remove the
unused rxe_dev parameter from rxe_init_av().

Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-04 15:57:49 -07:00
Parav Pandit
5474723115 RDMA: Introduce and use rdma_device_to_ibdev()
Introduce and use rdma_device_to_ibdev() API for those drivers which are
registering one sysfs group and also use in ib_core.

In subsequent patch, device->provider_ibdev one-to-one mapping is no
longer holds true during accessing sysfs entries.
Therefore, introduce an API rdma_device_to_ibdev() that provides such
information.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-14 13:12:03 -07:00
Parav Pandit
ea4baf7f11 RDMA: Rename port_callback to init_port
Most provider routines are callback routines which ib core invokes.
_callback suffix doesn't convey information about when such callback is
invoked. Therefore, rename port_callback to init_port.

Additionally, store the init_port function pointer in ib_device_ops, so
that it can be accessed in subsequent patches when binding rdma device to
net namespace.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-14 13:05:14 -07:00
Gal Pressman
2553ba217e RDMA: Mark if destroy address handle is in a sleepable context
Introduce a 'flags' field to destroy address handle callback and add a
flag that marks whether the callback is executed in an atomic context or
not.

This will allow drivers to wait for completion instead of polling for it
when it is allowed.

Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-19 16:28:03 -07:00
Gal Pressman
b090c4e3a0 RDMA: Mark if create address handle is in a sleepable context
Introduce a 'flags' field to create address handle callback and add a flag
that marks whether the callback is executed in an atomic context or not.

This will allow drivers to wait for completion instead of polling for it
when it is allowed.

Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-19 16:17:19 -07:00
Shamir Rabinovitch
e00b64f7c5 RDMA: Cleanup undesired pd->uobject usage
Drivers should be using udata to determine if a method is invoked from
user space or kernel space. A pd does not necessarily say a different
objects is kernel or user.

Transforming the tests to use udata eliminates a large number of uobject
references from the drivers.

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-18 19:15:48 -07:00
Kamal Heib
573efc4b3c RDMA/rxe: Initialize ib_device_ops struct
Initialize ib_device_ops with the supported operations using
ib_set_device_ops().

Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-12 07:40:15 -07:00
Yuval Shaia
59590b8ad2 IB/{mlx5,ocrdma,qedr,rxe}: Omit port validation from IB verbs
RDMA core layer already make sure port is valid, no need to check it here
again.

For the pkey validation this depends on commit b3ac5742fead ("RDMA/core:
Validate port number in query_pkey verb")

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-11 14:38:16 -07:00
Zhu Yanjun
8c9959689b IB/rxe: make rxe_unregister_device void
Since the function rxe_unregister_device always returns 0, it is changed
to void.

Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-11-08 14:22:54 -07:00
Andrew Boyer
5736c7c499 RDMA/rxe: Distinguish between down links and disabled links
In ib_query_port(), use the netdev's IFF_UP flag to determine phys_state
(flag set = down = POLLING, flag clear = disabled = DISABLED).

Callers can then use the phys_state field to distinguish between links
which have a dead partner, cable missing, etc., from links which are
turned off on the local node. This is useful for HA and supportability.

Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-11-08 14:22:53 -07:00
Parav Pandit
508a523f6b RDMA/drivers: Use core provided API for registering device attributes
Use rdma_set_device_sysfs_group() to register device attributes and
simplify the driver.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-17 03:45:01 -06:00
Jason Gunthorpe
e349f858d2 RDMA: Fully setup the device name in ib_register_device
The current code has two copies of the device name, ibdev->dev and
dev_name(&ibdev->dev), and they are setup at different times, which is
very confusing.

Set them both up at the same time and make dev_name() the lead name, which
is the proper use of the driver core APIs. To make it very clear that the
name is not valid until registration pass it in to the
ib_register_device() call rather than messing with ibdev->name directly.

Also the reorganization now checks that dev_name is unique even if it does
not contain a %.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Acked-by: Adit Ranadive <aditr@vmware.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Acked-by: Devesh Sharma <devesh.sharma@broadcom.com>
Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
2018-09-26 13:51:36 -06:00
Bart Van Assche
d34ac5cd3a RDMA, core and ULPs: Declare ib_post_send() and ib_post_recv() arguments const
Since neither ib_post_send() nor ib_post_recv() modify the data structure
their second argument points at, declare that argument const. This change
makes it necessary to declare the 'bad_wr' argument const too and also to
modify all ULPs that call ib_post_send(), ib_post_recv() or
ib_post_srq_recv(). This patch does not change any functionality but makes
it possible for the compiler to verify whether the
ib_post_(send|recv|srq_recv) really do not modify the posted work request.

To make this possible, only one cast had to be introduce that casts away
constness, namely in rpcrdma_post_recvs(). The only way I can think of to
avoid that cast is to introduce an additional loop in that function or to
change the data type of bad_wr from struct ib_recv_wr ** into int
(an index that refers to an element in the work request list). However,
both approaches would require even more extensive changes than this
patch.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-30 20:09:34 -06:00
Bart Van Assche
f696bf6d64 RDMA: Constify the argument of the work request conversion functions
When posting a send work request, the work request that is posted is not
modified by any of the RDMA drivers. Make this explicit by constifying
most ib_send_wr pointers in RDMA transport drivers.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-30 20:00:20 -06:00
Bart Van Assche
2f229bcf25 RDMA/rxe: Simplify the error handling code in rxe_create_ah()
This patch not only simplifies the error handling code in rxe_create_ah()
but also removes the dead code that was left behind by commit 47ec386662
("RDMA: Convert drivers to use sgid_attr instead of sgid_index").

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-09 12:55:28 -06:00
Parav Pandit
47ec386662 RDMA: Convert drivers to use sgid_attr instead of sgid_index
The core code now ensures that all driver callbacks that receive an
rdma_ah_attrs will have a sgid_attr's pointer if there is a GRH present.

Drivers can use this pointer instead of calling a query function with
sgid_index. This simplifies the drivers and also avoids races where a
gid_index lookup may return different data if it is changed.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-06-18 11:11:26 -06:00
Jason Gunthorpe
0394808d9e Merge branch 'mr_fix' into git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma for-next
Update mlx4 to support user MR creation against read-only memory, previously
it required the memory to be writable.

Based on rdma for-rc due to dependencies.

* mr_fix: (2 commits)
  IB/mlx4: Mark user MR as writable if actual virtual memory is writable
  IB/core: Make testing MR flags for writability a static inline function
2018-05-28 11:44:35 -06:00
Alexandru Moise
1661d3b0e2 nvmet,rxe: defer ip datagram sending to tasklet
This addresses 3 separate problems:

1. When using NVME over Fabrics we may end up sending IP
packets in interrupt context, we should defer this work
to a tasklet.

[   50.939957] WARNING: CPU: 3 PID: 0 at kernel/softirq.c:161 __local_bh_enable_ip+0x1f/0xa0
[   50.942602] CPU: 3 PID: 0 Comm: swapper/3 Kdump: loaded Tainted: G        W         4.17.0-rc3-ARCH+ #104
[   50.945466] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
[   50.948163] RIP: 0010:__local_bh_enable_ip+0x1f/0xa0
[   50.949631] RSP: 0018:ffff88009c183900 EFLAGS: 00010006
[   50.951029] RAX: 0000000080010403 RBX: 0000000000000200 RCX: 0000000000000001
[   50.952636] RDX: 0000000000000000 RSI: 0000000000000200 RDI: ffffffff817e04ec
[   50.954278] RBP: ffff88009c183910 R08: 0000000000000001 R09: 0000000000000614
[   50.956000] R10: ffffea00021d5500 R11: 0000000000000001 R12: ffffffff817e04ec
[   50.957779] R13: 0000000000000000 R14: ffff88009566f400 R15: ffff8800956c7000
[   50.959402] FS:  0000000000000000(0000) GS:ffff88009c180000(0000) knlGS:0000000000000000
[   50.961552] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   50.963798] CR2: 000055c4ec0ccac0 CR3: 0000000002209001 CR4: 00000000000606e0
[   50.966121] Call Trace:
[   50.966845]  <IRQ>
[   50.967497]  __dev_queue_xmit+0x62d/0x690
[   50.968722]  dev_queue_xmit+0x10/0x20
[   50.969894]  neigh_resolve_output+0x173/0x190
[   50.971244]  ip_finish_output2+0x2b8/0x370
[   50.972527]  ip_finish_output+0x1d2/0x220
[   50.973785]  ? ip_finish_output+0x1d2/0x220
[   50.975010]  ip_output+0xd4/0x100
[   50.975903]  ip_local_out+0x3b/0x50
[   50.976823]  rxe_send+0x74/0x120
[   50.977702]  rxe_requester+0xe3b/0x10b0
[   50.978881]  ? ip_local_deliver_finish+0xd1/0xe0
[   50.980260]  rxe_do_task+0x85/0x100
[   50.981386]  rxe_run_task+0x2f/0x40
[   50.982470]  rxe_post_send+0x51a/0x550
[   50.983591]  nvmet_rdma_queue_response+0x10a/0x170
[   50.985024]  __nvmet_req_complete+0x95/0xa0
[   50.986287]  nvmet_req_complete+0x15/0x60
[   50.987469]  nvmet_bio_done+0x2d/0x40
[   50.988564]  bio_endio+0x12c/0x140
[   50.989654]  blk_update_request+0x185/0x2a0
[   50.990947]  blk_mq_end_request+0x1e/0x80
[   50.991997]  nvme_complete_rq+0x1cc/0x1e0
[   50.993171]  nvme_pci_complete_rq+0x117/0x120
[   50.994355]  __blk_mq_complete_request+0x15e/0x180
[   50.995988]  blk_mq_complete_request+0x6f/0xa0
[   50.997304]  nvme_process_cq+0xe0/0x1b0
[   50.998494]  nvme_irq+0x28/0x50
[   50.999572]  __handle_irq_event_percpu+0xa2/0x1c0
[   51.000986]  handle_irq_event_percpu+0x32/0x80
[   51.002356]  handle_irq_event+0x3c/0x60
[   51.003463]  handle_edge_irq+0x1c9/0x200
[   51.004473]  handle_irq+0x23/0x30
[   51.005363]  do_IRQ+0x46/0xd0
[   51.006182]  common_interrupt+0xf/0xf
[   51.007129]  </IRQ>

2. Work must always be offloaded to tasklet for rxe_post_send_kernel()
when using NVMEoF in order to solve lock ordering between neigh->ha_lock
seqlock and the nvme queue lock:

[   77.833783]  Possible interrupt unsafe locking scenario:
[   77.833783]
[   77.835831]        CPU0                    CPU1
[   77.837129]        ----                    ----
[   77.838313]   lock(&(&n->ha_lock)->seqcount);
[   77.839550]                                local_irq_disable();
[   77.841377]                                lock(&(&nvmeq->q_lock)->rlock);
[   77.843222]                                lock(&(&n->ha_lock)->seqcount);
[   77.845178]   <Interrupt>
[   77.846298]     lock(&(&nvmeq->q_lock)->rlock);
[   77.847986]
[   77.847986]  *** DEADLOCK ***

3. Same goes for the lock ordering between sch->q.lock and nvme queue lock:

[   47.634271]  Possible interrupt unsafe locking scenario:
[   47.634271]
[   47.636452]        CPU0                    CPU1
[   47.637861]        ----                    ----
[   47.639285]   lock(&(&sch->q.lock)->rlock);
[   47.640654]                                local_irq_disable();
[   47.642451]                                lock(&(&nvmeq->q_lock)->rlock);
[   47.644521]                                lock(&(&sch->q.lock)->rlock);
[   47.646480]   <Interrupt>
[   47.647263]     lock(&(&nvmeq->q_lock)->rlock);
[   47.648492]
[   47.648492]  *** DEADLOCK ***

Using NVMEoF after this patch seems to finally be stable, without it,
rxe eventually deadlocks the whole system and causes RCU stalls.

Signed-off-by: Alexandru Moise <00moses.alexander00@gmail.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 10:39:51 -04:00
Zhu Yanjun
e12ee8ce51 IB/rxe: remove unused function variable
In the functions rxe_mem_init_dma, rxe_mem_init_user, rxe_mem_init_fast
and copy_data, the function variable rxe is not used. So this function
variable rxe is removed.

CC: Srinivas Eeda <srinivas.eeda@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-27 12:18:29 -04:00
Mikhail Malygin
efc365e729 IB/rxe: Fix for oops in rxe_register_device on ppc64le arch
On ppc64le arch rxe_add command causes oops in kernel log:

[   92.495140] Oops: Kernel access of bad area, sig: 11 [#1]
[   92.499710] SMP NR_CPUS=2048 NUMA pSeries
[   92.499792] Modules linked in: ipt_MASQUERADE(E) nf_nat_masquerade_ipv4(E) nf_conntrack_netlink(E) nfnetlink(E) xfrm_user(E) iptable
_nat(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) nf_nat_ipv4(E) xt_addrtype(E) iptable_filter(E) ip_tables(E) xt_conntrack(E) x_tables(E)
 nf_nat(E) nf_conntrack(E) br_netfilter(E) bridge(E) stp(E) llc(E) overlay(E) af_packet(E) rpcrdma(E) ib_isert(E) iscsi_target_mod(E) i
b_iser(E) libiscsi(E) ib_srpt(E) target_core_mod(E) ib_srp(E) ib_ipoib(E) rdma_ucm(E) ib_ucm(E) ib_uverbs(E) ib_umad(E) bochs_drm(E) tt
m(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) drm(E) agpgart(E) virtio_rng(E) virtio_console(E) rtc_
generic(E) dm_ec(OEN) ttln_rdma(OEN) rdma_cm(E) configfs(E) iw_cm(E) ib_cm(E) rdma_rxe(E) ip6_udp_tunnel(E) udp_tunnel(E) ib_core(E) ql
a2xxx(E)
[   92.499832]  scsi_transport_fc(E) nvme_fc(E) nvme_fabrics(E) nvme_core(E) ipmi_watchdog(E) ipmi_ssif(E) ipmi_poweroff(E) ipmi_powernv(EX) ipmi_devintf(E) ipmi_msghandler(E) dummy(E) ext4(E) crc16(E) jbd2(E) mbcache(E) dm_service_time(E) scsi_transport_iscsi(E) sd_mod(E) sr_mod(E) cdrom(E) hid_generic(E) usbhid(E) virtio_blk(E) virtio_scsi(E) virtio_net(E) ibmvscsi(EX) scsi_transport_srp(E) xhci_pci(E) xhci_hcd(E) usbcore(E) usb_common(E) virtio_pci(E) virtio_ring(E) virtio(E) sunrpc(E) dm_mirror(E) dm_region_hash(E) dm_log(E) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E) autofs4(E)
[   92.499834] Supported: No, Unsupported modules are loaded
[   92.499839] CPU: 3 PID: 5576 Comm: sh Tainted: G           OE   NX 4.4.120-ttln.17-default #1
[   92.499841] task: c0000000afe8a490 ti: c0000000beba8000 task.ti: c0000000beba8000
[   92.499842] NIP: c00000000008ba3c LR: c000000000027644 CTR: c00000000008ba10
[   92.499844] REGS: c0000000bebab750 TRAP: 0300   Tainted: G           OE   NX  (4.4.120-ttln.17-default)
[   92.499850] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 28424428  XER: 20000000
[   92.499871] CFAR: 0000000000002424 DAR: 0000000000000208 DSISR: 40000000 SOFTE: 1
               GPR00: c000000000027644 c0000000bebab9d0 c000000000f09700 0000000000000000
               GPR04: d0000000043d7192 0000000000000002 000000000000001a fffffffffffffffe
               GPR08: 000000000000009c c00000000008ba10 d0000000043e5848 d0000000043d3828
               GPR12: c00000000008ba10 c000000007a02400 0000000010062e38 0000010020388860
               GPR16: 0000000000000000 0000000000000000 00000100203885f0 00000000100f6c98
               GPR20: c0000000b3f1fcc0 c0000000b3f1fc48 c0000000b3f1fbd0 c0000000b3f1fb58
               GPR24: c0000000b3f1fae0 c0000000b3f1fa68 00000000000005dc c0000000b3f1f9f0
               GPR28: d0000000043e5848 c0000000b3f1f900 c0000000b3f1f320 c0000000b3f1f000
[   92.499881] NIP [c00000000008ba3c] dma_get_required_mask_pSeriesLP+0x2c/0x1a0
[   92.499885] LR [c000000000027644] dma_get_required_mask+0x44/0xac
[   92.499886] Call Trace:
[   92.499891] [c0000000bebab9d0] [c0000000bebaba30] 0xc0000000bebaba30 (unreliable)
[   92.499894] [c0000000bebaba10] [c000000000027644] dma_get_required_mask+0x44/0xac
[   92.499904] [c0000000bebaba30] [d0000000043cb4b4] rxe_register_device+0xc4/0x430 [rdma_rxe]
[   92.499910] [c0000000bebabab0] [d0000000043c06c8] rxe_add+0x448/0x4e0 [rdma_rxe]
[   92.499915] [c0000000bebabb30] [d0000000043d28dc] rxe_net_add+0x4c/0xf0 [rdma_rxe]
[   92.499921] [c0000000bebabb60] [d0000000043d305c] rxe_param_set_add+0x6c/0x1ac [rdma_rxe]
[   92.499924] [c0000000bebabbf0] [c0000000000e78c0] param_attr_store+0xa0/0x180
[   92.499927] [c0000000bebabc70] [c0000000000e6448] module_attr_store+0x48/0x70
[   92.499932] [c0000000bebabc90] [c000000000391f60] sysfs_kf_write+0x70/0xb0
[   92.499935] [c0000000bebabcb0] [c000000000390f1c] kernfs_fop_write+0x18c/0x1e0
[   92.499939] [c0000000bebabd00] [c0000000002e22ac] __vfs_write+0x4c/0x1d0
[   92.499942] [c0000000bebabd90] [c0000000002e2f94] vfs_write+0xc4/0x200
[   92.499945] [c0000000bebabde0] [c0000000002e488c] SyS_write+0x6c/0x110
[   92.499948] [c0000000bebabe30] [c000000000009384] system_call+0x38/0xe4
[   92.499949] Instruction dump:
[   92.499954] 4e800020 3c4c00e8 3842dcf0 7c0802a6 f8010010 60000000 7c0802a6 fba1ffe8
[   92.499958] fbc1fff0 fbe1fff8 f8010010 f821ffc1 <e9230208> 7c7e1b78 2fa90000 419e0078
[   92.499962] ---[ end trace bed077e15eb420cf ]---

It fails in dma_get_required_mask, that has ppc-specific implementation,
and fail if provided device argument is NULL

Signed-off-by: Mikhail Malygin <mikhail@malygin.me>
Reviewed-by: Yonatan Cohen <yonatanc@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-05 13:04:50 -06:00
Parav Pandit
39e00b6cf6 IB/rxe: Removed GID add/del dummy routines
rxe driver's add_gid() and del_gid() callbacks are doing simple
checks which are already done by the ib core before invoking these
callback routines.
Therefore, code is simplified to skip implementing add_gid() and
del_gid() callback functions.
They are only invoked by ib_core if they are implemented.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-05 10:15:33 -06:00
Parav Pandit
414448d249 RDMA: Use ib_gid_attr during GID modification
Now that ib_gid_attr contains device, port and index, simplify the
provider APIs add_gid() and del_gid() to use device, port and index
fields from the ib_gid_attr attributes structure.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03 21:34:16 -06:00
Parav Pandit
3e44e0ee08 IB/providers: Avoid null netdev check for RoCE
Now that IB core GID cache ensures that all RoCE entries have an
associated netdev remove null checks from the provider drivers for
clarity.

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03 21:33:51 -06:00
Parav Pandit
0e1f9b9244 RDMA/providers: Simplify query_gid callback of RoCE providers
ib_query_gid() fetches the GID from the software cache maintained in
ib_core for RoCE ports.

Therefore, simplify the provider drivers for RoCE to treat query_gid()
callback as never called for RoCE, and only require non-RoCE devices to
implement it.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03 21:33:47 -06:00
Matan Barak
0ede73bc01 IB/uverbs: Extend uverbs_ioctl header with driver_id
Extending uverbs_ioctl header with driver_id and another reserved
field. driver_id should be used in order to identify the driver.
Since every driver could have its own parsing tree, this is necessary
for strace support.
Downstream patches take off the EXPERIMENTAL flag from the ioctl() IB
support and thus we add some reserved fields for future usage.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19 14:45:17 -06:00
Jason Gunthorpe
0c43ab371b RDMA/rxe: Use structs to describe the uABI instead of opencoding
Open coding pointer math is not acceptable for describing the uABI in
RDMA. Provide structs for all the cases.

The udata is casted to the struct as close to the verbs entry point
as possible for maximum clarity. Function signatures and so forth
are revised to allow for this.

Tested-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15 15:58:02 -06:00
Jason Gunthorpe
b92ec0fe32 RDMA/rxe: Get rid of confusing udata parameter to rxe_cq_chk_attr
It isn't used and it couldn't possibly ever be used correctly.

Tested-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15 15:58:02 -06:00
Bart Van Assche
a6544a624c RDMA/rxe: Fix an out-of-bounds read
This patch avoids that KASAN reports the following when the SRP initiator
calls srp_post_send():

==================================================================
BUG: KASAN: stack-out-of-bounds in rxe_post_send+0x5c4/0x980 [rdma_rxe]
Read of size 8 at addr ffff880066606e30 by task 02-mq/1074

CPU: 2 PID: 1074 Comm: 02-mq Not tainted 4.16.0-rc3-dbg+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
Call Trace:
dump_stack+0x85/0xc7
print_address_description+0x65/0x270
kasan_report+0x231/0x350
rxe_post_send+0x5c4/0x980 [rdma_rxe]
srp_post_send.isra.16+0x149/0x190 [ib_srp]
srp_queuecommand+0x94d/0x1670 [ib_srp]
scsi_dispatch_cmd+0x1c2/0x550 [scsi_mod]
scsi_queue_rq+0x843/0xa70 [scsi_mod]
blk_mq_dispatch_rq_list+0x143/0xac0
blk_mq_do_dispatch_ctx+0x1c5/0x260
blk_mq_sched_dispatch_requests+0x2bf/0x2f0
__blk_mq_run_hw_queue+0xdb/0x160
__blk_mq_delay_run_hw_queue+0xba/0x100
blk_mq_run_hw_queue+0xf2/0x190
blk_mq_sched_insert_request+0x163/0x2f0
blk_execute_rq+0xb0/0x130
scsi_execute+0x14e/0x260 [scsi_mod]
scsi_probe_and_add_lun+0x366/0x13d0 [scsi_mod]
__scsi_scan_target+0x18a/0x810 [scsi_mod]
scsi_scan_target+0x11e/0x130 [scsi_mod]
srp_create_target+0x1522/0x19e0 [ib_srp]
kernfs_fop_write+0x180/0x210
__vfs_write+0xb1/0x2e0
vfs_write+0xf6/0x250
SyS_write+0x99/0x110
do_syscall_64+0xee/0x2b0
entry_SYSCALL_64_after_hwframe+0x42/0xb7

The buggy address belongs to the page:
page:ffffea0001998180 count:0 mapcount:0 mapping:0000000000000000 index:0x0
flags: 0x4000000000000000()
raw: 4000000000000000 0000000000000000 0000000000000000 00000000ffffffff
raw: dead000000000100 dead000000000200 0000000000000000 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff880066606d00: 00 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1
ffff880066606d80: f1 00 f2 f2 f2 f2 f2 f2 f2 00 00 f2 f2 f2 f2 f2
>ffff880066606e00: f2 00 00 00 00 00 f2 f2 f2 f3 f3 f3 f3 00 00 00
                                    ^
ffff880066606e80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff880066606f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Moni Shoua <monis@mellanox.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-06 16:00:51 -07:00
Zhu Yanjun
316663c6bf IB/rxe: remove redudant parameter in rxe_av_fill_ip_info
In the function rxe_av_fill_ip_info, the parameter rxe is not used.
So it is removed.

CC: Srinivas Eeda <srinivas.eeda@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-31 16:32:10 -05:00
Zhu Yanjun
45a290f8f6 IB/rxe: change the function rxe_av_fill_ip_info to void
The function rxe_av_fill_ip_info always returns 0. So the function
type is changed to void.

CC: Srinivas Eeda <srinivas.eeda@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-31 16:32:10 -05:00
Zhu Yanjun
9c96f3d4dd IB/rxe: remove unnecessary parameter in rxe_av_to_attr
In the function rxe_av_to_attr, the parameter rxe is
not used. So it is removed.

CC: Srinivas Eeda <srinivas.eeda@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-31 16:32:10 -05:00
Zhu Yanjun
ca3d9feee6 IB/rxe: change the function to void from int
The function rxe_av_from_attr always return 0. So change the
function to void.

CC: Srinivas Eeda <srinivas.eeda@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-31 16:32:10 -05:00
Zhu Yanjun
1a241db19c IB/rxe: remove redudant parameter in function
In the function rxe_av_from_attr, the parameter rxe
is not used. So it is removed.

CC: Srinivas Eeda <srinivas.eeda@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-31 16:32:10 -05:00
Bart Van Assche
6f301e06de RDMA/rxe: Fix a race condition related to the QP error state
The following sequence:
* Change queue pair state into IB_QPS_ERR.
* Post a work request on the queue pair.

Triggers the following race condition in the rdma_rxe driver:
* rxe_qp_error() triggers an asynchronous call of rxe_completer(), the function
  that examines the QP send queue.
* rxe_post_send() posts a work request on the QP send queue.

If rxe_completer() runs prior to rxe_post_send(), it will drain the send
queue and the driver will assume no further action is necessary.
However, once we post the send to the send queue, because the queue is
in error, no send completion will ever happen and the send will get
stuck.  In order to process the send, we need to make sure that
rxe_completer() gets run after a send is posted to a queue pair in an
error state.  This patch ensures that happens.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Moni Shoua <monis@mellanox.com>
Cc: <stable@vger.kernel.org> # v4.8
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-10 16:57:27 -05:00
Thomas Bogendoerfer
3192c53e5a IB/rxe: don't crash, if allocation of crc algorithm failed
Following crash happens, if crc algorithm couldn't be allocated:

[ 1087.989072] rdma_rxe: loaded
[ 1097.855397] PCLMULQDQ-NI instructions are not detected.
[ 1097.901220] rdma_rxe: failed to allocate crc algorithmi err:-2
[ 1097.901248] BUG: unable to handle kernel
[ 1097.901249] NULL pointer dereference
[ 1097.901250]  at 0000000000000046
[...]

Reason is that rxe->tfm is assigned the error return, which will then
be used for crypto_free_shash() in rxe_cleanup. Fix by using a
temporary variable and assigning it rxe->tfm after allocation succeeded.

Fixes: cee2688e3c ("IB/rxe: Offload CRC calculation when possible")
Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-10 13:43:50 -05:00
Bart Van Assche
ea6ee93b40 RDMA/rxe: Suppress gcc 7 fall-through complaints
Avoid that gcc 7 reports the following warning when building with W=1:

warning: this statement may fall through [-Wimplicit-fallthrough=]

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Cc: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-14 20:47:07 -04:00
Andrew Boyer
bfc3ae0566 IB/rxe: Disable completion upcalls when a CQ is destroyed
This prevents the stack from accessing userspace objects while they
are being torn down.

One possible sequence of events:
 - Userspace program exits
 - ib_uverbs_cleanup_ucontext() runs, calling ib_destroy_qp(),
   ib_destroy_cq(), etc. and releasing/freeing the UCQ
   - The QP still has tasklets running, so it isn't destroyed yet
   - The CQ is referenced by the QP, so the CQ isn't destroyed yet
   - The UCQ is kfree()'d anyway
 - A send work request completes
 - rxe_send_complete() calls cq->ibcq.comp_handler()
 - ib_uverbs_comp_handler() runs and crashes; the event queue is checked
   for is_closed, but it has no way to check the ib_ucq_object before
   accessing it

The reference counting on the CQ doesn't protect against this since the CQ
hasn't been destroyed yet.
There's no available interface to deregister the UCQ from the CQ, and it
didn't appear that attempting to add reference counting to the UCQ was
going to be a good way to go since this solution is much simpler.

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-28 19:12:32 -04:00
Doug Ledford
a5f66725c7 Merge branch 'misc' into k.o/for-next 2017-07-27 09:00:38 -04:00
Yuval Shaia
d41861942f IB/core: Add generic function to extract IB speed from netdev
Logic of retrieving netdev speed from net_device and translating it to
IB speed is implemented in rxe, in usnic and in bnxt drivers.

Define new function which merges all.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Christian Benvenuti <benve@cisco.com>
Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 08:45:11 -04:00
Kamal Heib
61013828f6 IB/rxe: Use __func__ to print function's name
Its better to use __func__ to print functions name instead of writing
the name in the print statement.

Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 08:43:12 -04:00
Kamal Heib
c05d26647b IB/rxe: Use DEVICE_ATTR_RO macro to show parent field
Use DEVICE_ATTR RO() macro and rename the show function accordingly.

Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 08:43:12 -04:00
Vijay Immanuel
1217197142 rxe: fix broken receive queue draining
If we modified the qp to ERROR state, and
drained the recieve queue, post_recv must
trigger the responder task to complete
the drain work request.

Cc: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Vijay Immanuel <vijayi@attalasystems.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>--
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-20 11:20:50 -04:00
yonatanc
56012e1cad IB/rxe: Set dma_mask and coherent_dma_mask
The RXE coupled with dummy device causes to the kernel panic attached
below.  The panic happens when ib_register_device tries to set dma_mask
by accessing a NULLed parent device.

The RXE does not actually use DMA, so we can set the dma_mask
to architecture value.

[16240.199689] RIP: 0010:ib_register_device+0x468/0x5a0 [ib_core]
[16240.205289] RSP: 0018:ffffc9000220fc10 EFLAGS: 00010246
[16240.209909] RAX: 0000000000000024 RBX: ffff880220d1a2a8 RCX: 0000000000000000
[16240.212244] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000009
[16240.214385] RBP: ffffc9000220fcb0 R08: 0000000000000000 R09: 000000000000023f
[16240.254465] R10: 0000000000000007 R11: 0000000000000000 R12: 0000000000000000
[16240.259467] R13: 0000000000000000 R14: 0000000000000000 R15: ffff880220d1a2a8
[16240.263314] FS:  00007fd8ecca0740(0000) GS:ffff8802364c0000(0000) knlGS:0000000000000000
[16240.267292] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[16240.273503] CR2: 0000000000000218 CR3: 00000002253ba000 CR4: 00000000000006e0
[16240.277066] Call Trace:
[16240.281836]  ? __kmalloc+0x26f/0x280
[16240.286596]  rxe_register_device+0x297/0x300 [rdma_rxe]
[16240.291377]  rxe_add+0x535/0x5b0 [rdma_rxe]
[16240.297586]  rxe_net_add+0x3e/0xc0 [rdma_rxe]
[16240.302375]  rxe_param_set_add+0x65/0x144 [rdma_rxe]
[16240.307769]  param_attr_store+0x68/0xd0
[16240.311640]  module_attr_store+0x1d/0x30
[16240.316421]  sysfs_kf_write+0x3a/0x50
[16240.317802]  kernfs_fop_write+0xff/0x180
[16240.322989]  __vfs_write+0x37/0x140
[16240.328164]  ? handle_mm_fault+0xce/0x240
[16240.333340]  vfs_write+0xb2/0x1b0
[16240.335013]  SyS_write+0x55/0xc0
[16240.340632]  entry_SYSCALL_64_fastpath+0x1a/0xa9

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-17 21:21:27 -04:00
Jia-Ju Bai
07d432bb97 rxe: Fix a sleep-in-atomic bug in post_one_send
The driver may sleep under a spin lock, and the function call path is:
post_one_send (acquire the lock by spin_lock_irqsave)
  init_send_wqe
    copy_from_user --> may sleep

There is no flow that makes "qp->is_user" true, and copy_from_user may
cause bug when a non-user pointer is used. So the lines of copy_from_user
and check of "qp->is_user" are removed.

Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-14 13:02:01 -04:00
Sagi Grimberg
67cf3623e0 rxe: expose num_possible_cpus() cnum_comp_vectors
They're completely logical, so don't impose an artificial limitation.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-04 19:33:02 -04:00
Dasaratharaman Chandramouli
44c58487d5 IB/core: Define 'ib' and 'roce' rdma_ah_attr types
rdma_ah_attr can now be either ib or roce allowing
core components to use one type or the other and also
to define attributes unique to a specific type. struct
ib_ah is also initialized with the type when its first
created. This ensures that calls such as modify_ah
dont modify the type of the address handle attribute.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01 14:32:43 -04:00
Dasaratharaman Chandramouli
d8966fcd4c IB/core: Use rdma_ah_attr accessor functions
Modify core and driver components to use accessor functions
introduced to access individual fields of rdma_ah_attr

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01 14:32:43 -04:00
Dasaratharaman Chandramouli
90898850ec IB/core: Rename struct ib_ah_attr to rdma_ah_attr
This patch simply renames struct ib_ah_attr to
rdma_ah_attr as these fields specify attributes that are
not necessarily specific to IB.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01 14:32:43 -04:00
Dasaratharaman Chandramouli
eca7ddf965 IB/rxe: Initialize ib_ah_attr during query_ah
Zero out ib_ah_attr before calling query_ah. Set ah_flags
appropriately.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01 14:32:43 -04:00
Colin Ian King
27b0b83233 IB/rxe: fix typo: "algorithmi" -> "algorithm"
trivial fix to typo in pr_err message

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28 13:07:58 -04:00
Yuval Shaia
4d6f28591f {net,IB}/{rxe,usnic}: Utilize generic mac to eui32 function
This logic seems to be duplicated in (at least) three separate files.
Move it to one place so code can be re-use.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
2017-04-25 14:21:34 -04:00
yonatanc
cee2688e3c IB/rxe: Offload CRC calculation when possible
Use CPU ability to perform CRC calculations, by
replacing direct calls to crc32_le() with crypto_shash_updata().

The overall performance gain measured with ib_send_bw tool is 10% and it
was tested on "Intel CPU ES-2660 v2 @ 2.20Ghz" CPU.

ib_send_bw -d rxe0  -x 1 -n 9000 -e  -s $((1024 * 1024 )) -l 100

---------------------------------------------------------------------------------------------
|             | bytes   | iterations | BW peak[MB/sec] | BW average[MB/sec] | MsgRate[Mpps] |
---------------------------------------------------------------------------------------------
| crc32_le    | 1048576 | 9000       | inf             | 497.60             | 0.000498      |
| CRC offload | 1048576 | 9000       | inf             | 546.70             | 0.000547      |
---------------------------------------------------------------------------------------------

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21 10:45:02 -04:00
Yonatan Cohen
0b1e5b99a4 IB/rxe: Add port protocol stats
Expose new counters using the get_hw_stats callback.
We expose the following counters:

+---------------------+----------------------------------------+
|      Name           |           Description                  |
|---------------------+----------------------------------------|
|sent_pkts            | number of sent pkts                    |
|---------------------+----------------------------------------|
|rcvd_pkts            | number of received packets             |
|---------------------+----------------------------------------|
|out_of_sequence      | number of errors due to packet         |
|                     | transport sequence number              |
|---------------------+----------------------------------------|
|duplicate_request    | number of received duplicated packets. |
|                     | A request that previously executed is  |
|                     | named duplicated.                      |
|---------------------+----------------------------------------|
|rcvd_rnr_err         | number of received RNR by completer    |
|---------------------+----------------------------------------|
|send_rnr_err         | number of sent RNR by responder        |
|---------------------+----------------------------------------|
|rcvd_seq_err         | number of out of sequence packets      |
|                     | received                               |
|---------------------+----------------------------------------|
|ack_deffered         | number of deferred handling of ack     |
|                     | packets.                               |
|---------------------+----------------------------------------|
|retry_exceeded_err   | number of times retry exceeded         |
|---------------------+----------------------------------------|
|completer_retry_err  | number of times completer decided to   |
|                     | retry                                  |
|---------------------+----------------------------------------|
|send_err             | number of failed send packet           |
+---------------------+----------------------------------------+

Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21 10:43:28 -04:00
Linus Torvalds
ac1820fb28 This is a tree wide change and has been kept separate for that reason.
Bart Van Assche noted that the ib DMA mapping code was significantly
 similar enough to the core DMA mapping code that with a few changes
 it was possible to remove the IB DMA mapping code entirely and
 switch the RDMA stack to use the core DMA mapping code.  This resulted
 in a nice set of cleanups, but touched the entire tree.  This branch
 will be submitted separately to Linus at the end of the merge window
 as per normal practice for tree wide changes like this.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJYo06oAAoJELgmozMOVy/d9Z8QALedWHdu98St1L0u2c8sxnR9
 2zo/4sF5Vb9u7FpmdIX32L4SQ9s9KhPE8Qp8NtZLf9v10zlDebIRJDpXknXtKooV
 CAXxX4sxBXV27/UrhbZEfXiPrmm6ccJFyIfRnMU6NlMqh2AtAsRa5AC2/RMp8oUD
 Med97PFiF0o6TD22/UH1VFbRpX1zjaKyqm7a3as5sJfzNA+UGIZAQ7Euz8000DKZ
 xCgVLTEwS0FmOujtBkCst7xa9TjuqR1HLOB4DdGvAhP6BHdz2yamM7Qmh9NN+NEX
 0BtjsuXomtn6j6AszGC+bpipCZh3NUigcwoFAARXCYFHibBvo4DPdFeGsraFgXdy
 1+KyR8CCeQG3Aly5Vwr264RFPGkGpwMj8PsBlXgQVtrlg4rriaCzOJNmIIbfdADw
 ftqhxBOzReZw77aH2s+9p2ILRfcAmPqhynLvFGFo9LBvsik8LVso7YgZN0xGxwcI
 IjI/XGC8UskPVsIZBIYA6sl2bYzgOjtBIHiXjRrPlW3uhduIXLrvKFfLPP/5XLAG
 ehLXK+J0bfsyY9ClmlNS8oH/WdLhXAyy/KNmnj5bRRm9qg6BRJR3bsOBhZJODuoC
 XgEXFfF6/7roNESWxowff7pK0rTkRg/m/Pa4VQpeO+6NWHE7kgZhL6kyIp5nKcwS
 3e7mgpcwC+3XfA/6vU3F
 =e0Si
 -----END PGP SIGNATURE-----

Merge tag 'for-next-dma_ops' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull rdma DMA mapping updates from Doug Ledford:
 "Drop IB DMA mapping code and use core DMA code instead.

  Bart Van Assche noted that the ib DMA mapping code was significantly
  similar enough to the core DMA mapping code that with a few changes it
  was possible to remove the IB DMA mapping code entirely and switch the
  RDMA stack to use the core DMA mapping code.

  This resulted in a nice set of cleanups, but touched the entire tree
  and has been kept separate for that reason."

* tag 'for-next-dma_ops' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (37 commits)
  IB/rxe, IB/rdmavt: Use dma_virt_ops instead of duplicating it
  IB/core: Remove ib_device.dma_device
  nvme-rdma: Switch from dma_device to dev.parent
  RDS: net: Switch from dma_device to dev.parent
  IB/srpt: Modify a debug statement
  IB/srp: Switch from dma_device to dev.parent
  IB/iser: Switch from dma_device to dev.parent
  IB/IPoIB: Switch from dma_device to dev.parent
  IB/rxe: Switch from dma_device to dev.parent
  IB/vmw_pvrdma: Switch from dma_device to dev.parent
  IB/usnic: Switch from dma_device to dev.parent
  IB/qib: Switch from dma_device to dev.parent
  IB/qedr: Switch from dma_device to dev.parent
  IB/ocrdma: Switch from dma_device to dev.parent
  IB/nes: Remove a superfluous assignment statement
  IB/mthca: Switch from dma_device to dev.parent
  IB/mlx5: Switch from dma_device to dev.parent
  IB/mlx4: Switch from dma_device to dev.parent
  IB/i40iw: Remove a superfluous assignment statement
  IB/hns: Switch from dma_device to dev.parent
  ...
2017-02-25 13:45:43 -08:00
Linus Torvalds
af17fe7a63 Mellanox specific updates for 4.11 merge window
Because the Mellanox code required being based on a net-next tree,
 I keept it separate from the remainder of the RDMA stack submission
 that is based on 4.10-rc3.
 
 This branch contains:
 
 - Various mlx4 and mlx5 fixes and minor changes
 - Support for adding a tag match rule to flow specs
 - Support for cvlan offload operation for raw ethernet QPs
 - A change to the core IB code to recognize raw eth capabilities and
   enumerate them (touches non-Mellanox code)
 - Implicit On-Demand Paging memory registration support
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJYrx+WAAoJELgmozMOVy/du70P/1kpW2xY9Le04c3K7na2XOYl
 AUVIDrW/8Go63tpOaM7jBT3k4GlwVFr3IOmBpS24KbW/THxjhyUeP5L5+z2x+go+
 jkQOgtPWWEHr5zP3MzsNyB8fDx1YQOnJwEXxybQRW/cbw4CLjnhP+ezd6FdV/3Yy
 pPEqDVlAErzvNweG+n2r1pjcUbR8uneC3inyMLnyzUBz4CHKmC8fgD3/qJIM+DNb
 gtFT5xHFIXKCigWdQ/EwsTDcHub43V8OXlI5sO7loG6vToOUATMkjI4oOUNhDmYS
 X7XLN3yRK9QHEfb5kutXIZEWzTGh7LiFtUYGaNNYqqzDfSiMRc9NC5kTOfplEXDV
 Uo+AGb6Fh1zYIOzNk7o+tazIv3LaLv6+Fcm+9bbe0VUIqasaylsePqaTwMuIzx/I
 xP5nitmd5lbYo8WdlasVdG6mH1DlJEUbU30v4DpmTpxCP6jGpog7lexyGyF3TgzS
 NhnG0IiIClWh3WQ2/GdsFK/obIdFkpLeASli1hwD81vzPfly9zc2YpgqydZI3WCr
 q6hTXYnANcP6+eciCpQPO7giRdXdiKey08Uoq/2jxb7Qbm4daG6UwopjvH9/lm1F
 m6UDaDvzNYm+Rx+bL/+KSx9JO9+fJB1L51yCmvLGpWi6yJI4ZTfanHNMBsCua46N
 Kev/DSpIAzX1WOBkte+a
 =rspQ
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull Mellanox rdma updates from Doug Ledford:
 "Mellanox specific updates for 4.11 merge window

  Because the Mellanox code required being based on a net-next tree, I
  keept it separate from the remainder of the RDMA stack submission that
  is based on 4.10-rc3.

  This branch contains:

   - Various mlx4 and mlx5 fixes and minor changes

   - Support for adding a tag match rule to flow specs

   - Support for cvlan offload operation for raw ethernet QPs

   - A change to the core IB code to recognize raw eth capabilities and
     enumerate them (touches non-Mellanox code)

   - Implicit On-Demand Paging memory registration support"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (40 commits)
  IB/mlx5: Fix configuration of port capabilities
  IB/mlx4: Take source GID by index from HW GID table
  IB/mlx5: Fix blue flame buffer size calculation
  IB/mlx4: Remove unused variable from function declaration
  IB: Query ports via the core instead of direct into the driver
  IB: Add protocol for USNIC
  IB/mlx4: Support raw packet protocol
  IB/mlx5: Support raw packet protocol
  IB/core: Add raw packet protocol
  IB/mlx5: Add implicit MR support
  IB/mlx5: Expose MR cache for mlx5_ib
  IB/mlx5: Add null_mkey access
  IB/umem: Indicate that process is being terminated
  IB/umem: Update on demand page (ODP) support
  IB/core: Add implicit MR flag
  IB/mlx5: Support creation of a WQ with scatter FCS offload
  IB/mlx5: Enable QP creation with cvlan offload
  IB/mlx5: Enable WQ creation and modification with cvlan offload
  IB/mlx5: Expose vlan offloads capabilities
  IB/uverbs: Enable QP creation with cvlan offload
  ...
2017-02-23 11:27:49 -08:00
Or Gerlitz
c4550c63b3 IB: Query ports via the core instead of direct into the driver
Change the drivers to call ib_query_port in their get port
immutable handler instead of their own query port handler.

Doing this required to set the core cap flags of this device
before the ib_query_port call is made, since the IB core might
need these caps to serve the port query.

Drivers are ensured by the IB core that the port attributes passed
to the port query verb implementation are zero, and hence we
removed the zeroing from the drivers.

This patch doesn't add any new functionality.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Acked-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 11:41:22 -05:00
Bart Van Assche
0bbb3b7496 IB/rxe, IB/rdmavt: Use dma_virt_ops instead of duplicating it
Make the rxe and rdmavt drivers use dma_virt_ops. Update the
comments that refer to the source files removed by this patch.
Remove struct ib_dma_mapping_ops. Remove ib_device.dma_ops.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Andrew Boyer <andrew.boyer@dell.com>
Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
Cc: Jonathan Toppins <jtoppins@redhat.com>
Cc: Alex Estrin <alex.estrin@intel.com>
Cc: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 12:31:32 -05:00
Bart Van Assche
85e9f1dbbd IB/rxe: Switch from dma_device to dev.parent
Prepare for removal of ib_device.dma_device.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 12:26:17 -05:00
Bart Van Assche
839f5ac0d8 IB/rxe: Remove a pointless indirection layer
Neither rxe->ifc_ops nor any of the function pointers in struct
struct rxe_ifc_ops ever change. Hence remove the rxe->ifc_ops
indirection mechanism.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Andrew Boyer <andrew.boyer@dell.com>
Cc: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-10 16:52:47 -05:00
Doug Ledford
9032ad78bb Merge branches 'misc', 'qedr', 'reject-helpers', 'rxe' and 'srp' into merge-test 2016-12-14 14:44:47 -05:00
Moni Shoua
477864c8fc IB/core: Let create_ah return extended response to user
Add struct ib_udata to the signature of create_ah callback that is
implemented by IB device drivers. This allows HW drivers to return extra
data to the userspace library.
This patch prepares the ground for mlx5 driver to resolve destination
mac address for a given GID and return it to userspace.
This patch was previously submitted by Knut Omang as a part of the
patch set to support Oracle's Infiniband HCA (SIF).

Signed-off-by: Knut Omang <knut.omang@oracle.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-13 13:38:27 -05:00
Andrew Boyer
5b9ea16c54 IB/rxe: Fix ref leak in rxe_create_qp()
The udata->inlen error path needs to clean up the ref
added by rxe_alloc().

Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-12 16:31:45 -05:00
Andrew Boyer
accacb8f51 IB/rxe: Add support for IB_CQ_REPORT_MISSED_EVENTS
Peek at the CQ after arming it so that we can return a hint.
This avoids missed completions due to a race between posting
CQEs and arming the CQ.

For example, CM teardown waits on MAD requests to complete with
ib_cq_poll_work(). Without this fix, the last completion might be
left on the CQ, hanging the kthread doing the teardown.

The console backtraces look like this:

[ 4199.911284] Call Trace:
[ 4199.911401]  [<ffffffff9657fe95>] schedule+0x35/0x80
[ 4199.911556]  [<ffffffff965830df>] schedule_timeout+0x22f/0x2c0
[ 4199.911727]  [<ffffffff9657f7a8>] ? __schedule+0x368/0xa20
[ 4199.911891]  [<ffffffff96580903>] wait_for_completion+0xb3/0x130
[ 4199.912067]  [<ffffffff960a17e0>] ? wake_up_q+0x70/0x70
[ 4199.912243]  [<ffffffffc074a06d>] cm_destroy_id+0x13d/0x450 [ib_cm]
[ 4199.912422]  [<ffffffff961615d5>] ? printk+0x57/0x73
[ 4199.912578]  [<ffffffffc074a390>] ib_destroy_cm_id+0x10/0x20 [ib_cm]
[ 4199.912759]  [<ffffffffc076098c>] rdma_destroy_id+0xac/0x340 [rdma_cm]
[ 4199.912941]  [<ffffffffc076f2cc>] 0xffffffffc076f2cc

Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-12 16:31:45 -05:00
Parav Pandit
e404f945a6 IB/rxe: improved debug prints & code cleanup
1. Debugging qp state transitions and qp errors in loopback and
multiple QP tests is difficult without qp numbers in debug logs.
This patch adds qp number to important debug logs.

2. Instead of having rxe: prefix in few logs and not having in
few logs, using uniform module name prefix using pr_fmt macro.

3. Code cleanup for various warnings reported by checkpatch for
incomplete unsigned data type, line over 80 characters, return
statements.

Signed-off-by: Parav Pandit <pandit.parav@gmail.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-06 13:50:04 -04:00
Parav Pandit
063af59597 IB/rxe: Avoid scheduling tasklet for userspace QP
This patch avoids scheduing tasklet for WQE and protocol processing
for user space QP. It performs the task in calling process context.

To improve code readability kernel specific post_send handling moved to
post_send_kernel() function.

Signed-off-by: Parav Pandit <pandit.parav@gmail.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-06 13:50:04 -04:00
Moni Shoua
8700e3e7c4 Soft RoCE driver
Soft RoCE (RXE) - The software RoCE driver

ib_rxe implements the RDMA transport and registers to the RDMA core
device as a kernel verbs provider. It also implements the packet IO
layer. On the other hand ib_rxe registers to the Linux netdev stack
as a udp encapsulating protocol, in that case RDMA, for sending and
receiving packets over any Ethernet device.  This yields a RDMA
transport over the UDP/Ethernet network layer forming a RoCEv2
compatible device.

The configuration procedure of the Soft RoCE drivers requires
binding to any existing Ethernet network device. This is done with
/sys interface.

A userspace Soft RoCE library (librxe) provides user applications
the ability to run with Soft RoCE devices.  The use of rxe verbs ins
user space requires the inclusion of librxe as a device specifics
plug-in to libibverbs. librxe is packaged separately.

Architecture:

     +-----------------------------------------------------------+
     |                          Application                      |
     +-----------------------------------------------------------+
                            +-----------------------------------+
                            |             libibverbs            |
User                        +-----------------------------------+
                            +----------------+ +----------------+
                            | librxe         | | HW RoCE lib    |
                            +----------------+ +----------------+
+---------------------------------------------------------------+
     +--------------+                           +------------+
     | Sockets      |                           | RDMA ULP   |
     +--------------+                           +------------+
     +--------------+                  +---------------------+
     | TCP/IP       |                  | ib_core             |
     +--------------+                  +---------------------+
                             +------------+ +----------------+
Kernel                       | ib_rxe     | | HW RoCE driver |
                             +------------+ +----------------+
     +------------------------------------+
     | NIC driver                         |
     +------------------------------------+

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     +-----------------------------------------------------------+
     |                          Application                      |
     +-----------------------------------------------------------+
                            +-----------------------------------+
                            |             libibverbs            |
User                        +-----------------------------------+
                            +----------------+ +----------------+
                            | librxe         | | HW RoCE lib    |
                            +----------------+ +----------------+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     +--------------+                           +------------+
     | Sockets      |                           | RDMA ULP   |
     +--------------+                           +------------+
     +--------------+                  +---------------------+
     | TCP/IP       |                  | ib_core             |
     +--------------+                  +---------------------+
                             +------------+ +----------------+
Kernel                       | ib_rxe     | | HW RoCE driver |
                             +------------+ +----------------+
     +------------------------------------+
     | NIC driver                         |
     +------------------------------------+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Soft RoCE resources:

[1[ https://github.com/SoftRoCE/librxe-dev librxe - source code in
Github
[2] https://github.com/SoftRoCE/rxe-dev/wiki/rxe-dev:-Home - Soft RoCE
Wiki page
[3] https://github.com/SoftRoCE/librxe-dev - Soft RoCE userspace library

Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-04 11:13:12 -04:00