Commit Graph

1916 Commits

Author SHA1 Message Date
Yishai Hadas
7be76bef32 IB/mlx5: Introduce VAR object and its alloc/destroy methods
Introduce VAR object and its alloc/destroy KABI methods. The internal
implementation uses the IB core API to manage mmap/munamp calls.

Link: https://lore.kernel.org/r/20191212110928.334995-5-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-12 19:49:13 -04:00
Yishai Hadas
f164be8c03 IB/mlx5: Extend caps stage to handle VAR capabilities
Extend caps stage to handle VAR capabilities.

Link: https://lore.kernel.org/r/20191212110928.334995-4-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-12 19:49:13 -04:00
Ingo Molnar
57ad87ddce Merge branch 'x86/mm' into efi/core, to pick up dependencies
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-01-10 18:53:14 +01:00
Parav Pandit
4cca96a8d9 IB/mlx5: Do reverse sequence during device removal
When IB device profile initialization completes, device is marked as
active.

However, IB device is not marked inactive, during device removal flow. It
should be the mirror of the add flow.

Hence, mark it inactive during remove sequence.

Link: https://lore.kernel.org/r/20191212113024.336702-2-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-07 20:18:10 -04:00
zhengbin
2ab367a70a RDMA/mlx5: use true,false for bool variable
Fixes coccicheck warning:

drivers/infiniband/hw/mlx5/mr.c:150:2-26: WARNING: Assignment of 0/1 to bool variable
drivers/infiniband/hw/mlx5/mr.c:1455:2-26: WARNING: Assignment of 0/1 to bool variable
drivers/infiniband/hw/mlx5/qp.c:1874:6-20: WARNING: Assignment of 0/1 to bool variable

Link: https://lore.kernel.org/r/1577176812-2238-6-git-send-email-zhengbin13@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-03 19:14:00 -04:00
Artemy Kovalyov
cbe4b8f0a5 IB/mlx5: Unify ODP MR code paths to allow extra flexibility
Building MR translation table in the ODP case requires additional
flexibility, namely random access to DMA addresses. Make both direct and
indirect ODP MR use same code path, separated from the non-ODP MR code
path.

With the restructuring the correct page_shift is now used around
__mlx5_ib_populate_pas().

Fixes: d2183c6f19 ("RDMA/umem: Move page_shift from ib_umem to ib_odp_umem")
Link: https://lore.kernel.org/r/20191222124649.52300-2-leon@kernel.org
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-03 17:00:13 -04:00
Prabhath Sajeepa
b5671afe5e IB/mlx5: Fix outstanding_pi index for GSI qps
Commit b0ffeb537f ("IB/mlx5: Fix iteration overrun in GSI qps") changed
the way outstanding WRs are tracked for the GSI QP. But the fix did not
cover the case when a call to ib_post_send() fails and updates index to
track outstanding.

Since the prior commmit outstanding_pi should not be bounded otherwise the
loop generate_completions() will fail.

Fixes: b0ffeb537f ("IB/mlx5: Fix iteration overrun in GSI qps")
Link: https://lore.kernel.org/r/1576195889-23527-1-git-send-email-psajeepa@purestorage.com
Signed-off-by: Prabhath Sajeepa <psajeepa@purestorage.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-03 15:56:41 -04:00
Yishai Hadas
dc2316eba7 IB/mlx5: Fix device memory flows
Fix device memory flows so that only once there will be no live mmaped
VA to a given allocation the matching object will be destroyed.

This prevents a potential scenario that existing VA that was mmaped by
one process might still be used post its deallocation despite that it's
owned now by other process.

The above is achieved by integrating with IB core APIs to manage
mmap/munmap. Only once the refcount will become 0 the DM object and its
underlay area will be freed.

Fixes: 3b113a1ec3 ("IB/mlx5: Support device memory type attribute")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20191212100237.330654-3-leon@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-12-12 16:55:36 -05:00
Maor Gottlieb
ed9085fed9 IB/mlx5: Fix steering rule of drop and count
There are two flow rule destinations: QP and packet. While users are
setting DROP packet rule, the QP should not be set as a destination.

Fixes: 3b3233fbf0 ("IB/mlx5: Add flow counters binding support")
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20191212091214.315005-4-leon@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-12-12 15:38:16 -05:00
Ingo Molnar
eb243d1d28 x86/mm/pat: Rename <asm/pat.h> => <asm/memtype.h>
pat.h is a file whose main purpose is to provide the memtype_*() APIs.

PAT is the low level hardware mechanism - but the high level abstraction
is memtype.

So name the header <memtype.h> as well - this goes hand in hand with memtype.c
and memtype_interval.c.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-10 10:12:55 +01:00
Linus Torvalds
aa32f11691 hmm related patches for 5.5
This is another round of bug fixing and cleanup. This time the focus is on
 the driver pattern to use mmu notifiers to monitor a VA range. This code
 is lifted out of many drivers and hmm_mirror directly into the
 mmu_notifier core and written using the best ideas from all the driver
 implementations.
 
 This removes many bugs from the drivers and has a very pleasing
 diffstat. More drivers can still be converted, but that is for another
 cycle.
 
 - A shared branch with RDMA reworking the RDMA ODP implementation
 
 - New mmu_interval_notifier API. This is focused on the use case of
   monitoring a VA and simplifies the process for drivers
 
 - A common seq-count locking scheme built into the mmu_interval_notifier
   API usable by drivers that call get_user_pages() or hmm_range_fault()
   with the VA range
 
 - Conversion of mlx5 ODP, hfi1, radeon, nouveau, AMD GPU, and Xen GntDev
   drivers to the new API. This deletes a lot of wonky driver code.
 
 - Two improvements for hmm_range_fault(), from testing done by Ralph
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl3cCjQACgkQOG33FX4g
 mxpp8xAAiR9iOdT28m/tx1GF31XludrMhRZVIiz0vmCIxIiAkWekWEfAEVm9PDnh
 wdrxTJohSs+B65AK3sfToOM3AIuNCuFVWmbbHI5qmOO76vaSvcZa905Z++pNsawO
 Bn8mgRCprYoFHcxWLvTvnA5U0g1S2BSSOwBSZI43CbEnVvHjYAR6MnvRqfGMk+NF
 bf8fTk/x+fl0DCemhynlBLuJkogzoE2Hgl0yPY5bFna4PktOxdpa1yPaQsiqZ7e6
 2s2NtM3pbMBJk0W42q5BU+aPhiqfxFFszasPSLBduXrD2xDsG76HJdHj5VydKmfL
 nelG4BvqJozXTEZWvTEePYhCqaZ41eJZ7Asw8BXtmacVqE5mDlTXo/Zdgbz7yEOR
 mI5MVyjD5rauZJldUOWXbwrPoWVFRvboauehiSgqvxvT9HvlFp9GKObSuu4gubBQ
 mzxs4t48tPhA7bswLmw0/pETSogFuVDfaB7hsyY0gi8EwxMFMpw2qFypm1PEEF+C
 BuUxCSShzvNKrraNe5PWaNNFd3AzIwAOWJHE+poH4bCoXQVr5nA+rq2gnHkdY5vq
 /xrBCyxkf0U05YoFGYembPVCInMehzp9Xjy8V+SueSvCg2/TYwGDCgGfsbe9dNOP
 Bc40JpS7BDn5w9nyLUJmOx7jfruNV6kx1QslA7NDDrB/rzOlsEc=
 =Hj8a
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull hmm updates from Jason Gunthorpe:
 "This is another round of bug fixing and cleanup. This time the focus
  is on the driver pattern to use mmu notifiers to monitor a VA range.
  This code is lifted out of many drivers and hmm_mirror directly into
  the mmu_notifier core and written using the best ideas from all the
  driver implementations.

  This removes many bugs from the drivers and has a very pleasing
  diffstat. More drivers can still be converted, but that is for another
  cycle.

   - A shared branch with RDMA reworking the RDMA ODP implementation

   - New mmu_interval_notifier API. This is focused on the use case of
     monitoring a VA and simplifies the process for drivers

   - A common seq-count locking scheme built into the
     mmu_interval_notifier API usable by drivers that call
     get_user_pages() or hmm_range_fault() with the VA range

   - Conversion of mlx5 ODP, hfi1, radeon, nouveau, AMD GPU, and Xen
     GntDev drivers to the new API. This deletes a lot of wonky driver
     code.

   - Two improvements for hmm_range_fault(), from testing done by Ralph"

* tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  mm/hmm: remove hmm_range_dma_map and hmm_range_dma_unmap
  mm/hmm: make full use of walk_page_range()
  xen/gntdev: use mmu_interval_notifier_insert
  mm/hmm: remove hmm_mirror and related
  drm/amdgpu: Use mmu_interval_notifier instead of hmm_mirror
  drm/amdgpu: Use mmu_interval_insert instead of hmm_mirror
  drm/amdgpu: Call find_vma under mmap_sem
  nouveau: use mmu_interval_notifier instead of hmm_mirror
  nouveau: use mmu_notifier directly for invalidate_range_start
  drm/radeon: use mmu_interval_notifier_insert
  RDMA/hfi1: Use mmu_interval_notifier_insert for user_exp_rcv
  RDMA/odp: Use mmu_interval_notifier_insert()
  mm/hmm: define the pre-processor related parts of hmm.h even if disabled
  mm/hmm: allow hmm_range to be used with a mmu_interval_notifier or hmm_mirror
  mm/mmu_notifier: add an interval tree notifier
  mm/mmu_notifier: define the header pre-processor parts even if disabled
  mm/hmm: allow snapshot of the special zero page
2019-11-30 10:33:14 -08:00
Linus Torvalds
9a3d7fd275 Driver core patches for 5.5-rc1
Here is the "big" set of driver core patches for 5.5-rc1
 
 There's a few minor cleanups and fixes in here, but the majority of the
 patches in here fall into two buckets:
   - debugfs api cleanups and fixes
   - driver core device link support for boot dependancy issues
 
 The debugfs api cleanups are working to slowly refactor the debugfs apis
 so that it is even harder to use incorrectly.  That work has been
 happening for the past few kernel releases and will continue over time,
 it's a long-term project/goal
 
 The driver core device link support missed 5.4 by just a bit, so it's
 been sitting and baking for many months now.  It's from Saravana Kannan
 to help resolve the problems that DT-based systems have at boot time
 with dependancy graphs and kernel modules.  Turns out that no one has
 actually tried to build a generic arm64 kernel with loads of modules and
 have it "just work" for a variety of platforms (like a distro kernel)
 The big problem turned out to be a lack of depandancy information
 between different areas of DT entries, and the work here resolves that
 problem and now allows devices to boot properly, and quicker than a
 monolith kernel.
 
 All of these patches have been in linux-next for a long time with no
 reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXd6m6Q8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+yntJQCcCqg6RQ7LTdHuZv1ETeefXlsfk00An1Jtean6
 42bWGx52bGFvAcpjWy8R
 =P7hq
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here is the "big" set of driver core patches for 5.5-rc1

  There's a few minor cleanups and fixes in here, but the majority of
  the patches in here fall into two buckets:

   - debugfs api cleanups and fixes

   - driver core device link support for boot dependancy issues

  The debugfs api cleanups are working to slowly refactor the debugfs
  apis so that it is even harder to use incorrectly. That work has been
  happening for the past few kernel releases and will continue over
  time, it's a long-term project/goal

  The driver core device link support missed 5.4 by just a bit, so it's
  been sitting and baking for many months now. It's from Saravana Kannan
  to help resolve the problems that DT-based systems have at boot time
  with dependancy graphs and kernel modules. Turns out that no one has
  actually tried to build a generic arm64 kernel with loads of modules
  and have it "just work" for a variety of platforms (like a distro
  kernel). The big problem turned out to be a lack of dependency
  information between different areas of DT entries, and the work here
  resolves that problem and now allows devices to boot properly, and
  quicker than a monolith kernel.

  All of these patches have been in linux-next for a long time with no
  reported issues"

* tag 'driver-core-5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (68 commits)
  tracing: Remove unnecessary DEBUG_FS dependency
  of: property: Add device link support for interrupt-parent, dmas and -gpio(s)
  debugfs: Fix !DEBUG_FS debugfs_create_automount
  of: property: Add device link support for "iommu-map"
  of: property: Fix the semantics of of_is_ancestor_of()
  i2c: of: Populate fwnode in of_i2c_get_board_info()
  drivers: base: Fix Kconfig indentation
  firmware_loader: Fix labels with comma for builtin firmware
  driver core: Allow device link operations inside sync_state()
  driver core: platform: Declare ret variable only once
  cpu-topology: declare parse_acpi_topology in <linux/arch_topology.h>
  crypto: hisilicon: no need to check return value of debugfs_create functions
  driver core: platform: use the correct callback type for bus_find_device
  firmware_class: make firmware caching configurable
  driver core: Clarify documentation for fwnode_operations.add_links()
  mailbox: tegra: Fix superfluous IRQ error message
  net: caif: Fix debugfs on 64-bit platforms
  mac80211: Use debugfs_create_xul() helper
  media: c8sectpfe: no need to check return value of debugfs_create functions
  of: property: Add device link support for iommus, mboxes and io-channels
  ...
2019-11-27 11:06:20 -08:00
Linus Torvalds
d768869728 RDMA subsystem updates for 5.5
Mainly a collection of smaller of driver updates this cycle.
 
 - Various driver updates and bug fixes for siw, bnxt_re, hns, qedr,
   iw_cxgb4, vmw_pvrdma, mlx5
 
 - Improvements in SRPT from working with iWarp
 
 - SRIOV VF support for bnxt_re
 
 - Skeleton kernel-doc files for drivers/infiniband
 
 - User visible counters for events related to ODP
 
 - Common code for tracking of mmap lifetimes so that drivers can link HW
   object liftime to a VMA
 
 - ODP bug fixes and rework
 
 - RDMA READ support for efa
 
 - Removal of the very old cxgb3 driver
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl3dwFYACgkQOG33FX4g
 mxohHRAAnkUr0SKh1uV7vuhA8sUlejkwAaw2V2Sm3E1y4GiPpfRAak/FcbEu9F8l
 E7Q7VI04DRKTTpTRmREVyKYjlKr5QOA4mSryNMfBZobjkp+t++U1GC9YKL3zh65U
 odyJedtv3zKCLzV9RBwX06C8Vi1PQNQp7RbQ42xH/IyKXJIZPHeCUeKv7PsfTWVo
 vhuXFc9pPwqHDfRVTbj6s1g+OwVuRHc+SWep6eTKLGYvt8CqdN9WEpA0sJrlwPet
 NLTZTrFDpuC12RPc4Lo5c7R5MeAzHgojZCeSFaL2DhJLOx3kfmU30wG+rV94Lvsq
 Y2yt6BwKeLleKzpR5ApkuIVHQt2KECPrJbmVLDFqi+rT7yvzsd2AB+uiCS50+LPG
 h2UpWJdKWtrnSzTqbJQieXd+oT253Dk+ciy7zbdPSGPwz1dc/Cna9TTn26X2SezR
 bmRhHOykrh7LCNrv/7jiSq/zWApGTlR9YZ86x9Li+lJ3kkG+7d2DBEofDU+oKE5F
 p0+1Cer1td5IkN3N8oDpvyXAbCIv7qdWwXB2Cm7dSfUETIpAKnXiBxFCHyvmsuus
 7gGyKZh2490N3SoIL1muu12dIvhPC2Obg9ckCNU3ldw9+uPJkCzrs+n+JtkIW0mi
 i1fMjRQ6FumVT6/KdavYgeCsHmItVqv03SGdFsHF3SGwvWo6y8Y=
 =UV2u
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "Again another fairly quiet cycle with few notable core code changes
  and the usual variety of driver bug fixes and small improvements.

   - Various driver updates and bug fixes for siw, bnxt_re, hns, qedr,
     iw_cxgb4, vmw_pvrdma, mlx5

   - Improvements in SRPT from working with iWarp

   - SRIOV VF support for bnxt_re

   - Skeleton kernel-doc files for drivers/infiniband

   - User visible counters for events related to ODP

   - Common code for tracking of mmap lifetimes so that drivers can link
     HW object liftime to a VMA

   - ODP bug fixes and rework

   - RDMA READ support for efa

   - Removal of the very old cxgb3 driver"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (168 commits)
  RDMA/hns: Delete unnecessary callback functions for cq
  RDMA/hns: Rename the functions used inside creating cq
  RDMA/hns: Redefine the member of hns_roce_cq struct
  RDMA/hns: Redefine interfaces used in creating cq
  RDMA/efa: Expose RDMA read related attributes
  RDMA/efa: Support remote read access in MR registration
  RDMA/efa: Store network attributes in device attributes
  IB/hfi1: remove redundant assignment to variable ret
  RDMA/bnxt_re: Fix missing le16_to_cpu
  RDMA/bnxt_re: Fix stat push into dma buffer on gen p5 devices
  RDMA/bnxt_re: Fix chip number validation Broadcom's Gen P5 series
  RDMA/bnxt_re: Fix Kconfig indentation
  IB/mlx5: Implement callbacks for getting VFs GUID attributes
  IB/ipoib: Add ndo operation for getting VFs GUID attributes
  IB/core: Add interfaces to get VF node and port GUIDs
  net/core: Add support for getting VF GUIDs
  RDMA/qedr: Fix null-pointer dereference when calling rdma_user_mmap_get_offset
  RDMA/cm: Use refcount_t type for refcount variable
  IB/mlx5: Support extended number of strides for Striding RQ
  IB/mlx4: Update HW GID table while adding vlan GID
  ...
2019-11-27 10:17:28 -08:00
Jason Gunthorpe
3694e41e41 Merge branch 'ib-guids' into rdma.git for-next
Danit Goldberg says:

====================
This series extends RTNETLINK to provide IB port and node GUIDs, which
were configured for Infiniband VFs.

The functionality to set VF GUIDs already existed for a long time, and
here we are adding the missing "get" so that netlink will be symmetric and
various cloud orchestration tools will be able to manage such VFs more
naturally.

The iproute2 was extended too to present those GUIDs.

- ip link show <device>

For example:
- ip link set ib4 vf 0 node_guid 22:44:33:00:33:11:00:33
- ip link set ib4 vf 0 port_guid 10:21:33:12:00:11:22:10
- ip link show ib4
    ib4: <BROADCAST,MULTICAST> mtu 4092 qdisc noop state DOWN mode DEFAULT group default qlen 256
    link/infiniband 00:00:0a:2d:fe:80:00:00:00:00:00:00:ec:0d:9a:03:00:44:36:8d brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    vf 0     link/infiniband 00:00:0a:2d:fe:80:00:00:00:00:00:00:ec:0d:9a:03:00:44:36:8d brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff,
    spoof checking off, NODE_GUID 22:44:33:00:33:11:00:33, PORT_GUID 10:21:33:12:00:11:22:10, link-state disable, trust off, query_rss off
====================

Based on the mlx5-next branch from
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux for
dependencies

* branch 'ib-guids': (35 commits)
  IB/mlx5: Implement callbacks for getting VFs GUID attributes
  IB/ipoib: Add ndo operation for getting VFs GUID attributes
  IB/core: Add interfaces to get VF node and port GUIDs
  net/core: Add support for getting VF GUIDs

  net/mlx5: Add new chain for netfilter flow table offload
  net/mlx5: Refactor creating fast path prio chains
  net/mlx5: Accumulate levels for chains prio namespaces
  net/mlx5: Define fdb tc levels per prio
  net/mlx5: Rename FDB_* tc related defines to FDB_TC_* defines
  net/mlx5: Simplify fdb chain and prio eswitch defines
  IB/mlx5: Load profile according to RoCE enablement state
  IB/mlx5: Rename profile and init methods
  net/mlx5: Handle "enable_roce" devlink param
  net/mlx5: Document flow_steering_mode devlink param
  devlink: Add new "enable_roce" generic device param
  net/mlx5: fix spelling mistake "metdata" -> "metadata"
  net/mlx5: fix kvfree of uninitialized pointer spec
  IB/mlx5: Introduce and use mlx5_core_is_vf()
  net/mlx5: E-switch, Enable metadata on own vport
  net/mlx5: Refactor ingress acl configuration
  ...

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-25 10:31:47 -04:00
Jason Gunthorpe
f25a546e65 RDMA/odp: Use mmu_interval_notifier_insert()
Replace the internal interval tree based mmu notifier with the new common
mmu_interval_notifier_insert() API. This removes a lot of code and fixes a
deadlock that can be triggered in ODP:

 zap_page_range()
  mmu_notifier_invalidate_range_start()
   [..]
    ib_umem_notifier_invalidate_range_start()
       down_read(&per_mm->umem_rwsem)
  unmap_single_vma()
    [..]
      __split_huge_page_pmd()
        mmu_notifier_invalidate_range_start()
        [..]
           ib_umem_notifier_invalidate_range_start()
              down_read(&per_mm->umem_rwsem)   // DEADLOCK

        mmu_notifier_invalidate_range_end()
           up_read(&per_mm->umem_rwsem)
  mmu_notifier_invalidate_range_end()
     up_read(&per_mm->umem_rwsem)

The umem_rwsem is held across the range_start/end as the ODP algorithm for
invalidate_range_end cannot tolerate changes to the interval
tree. However, due to the nested invalidation regions the second
down_read() can deadlock if there are competing writers. The new core code
provides an alternative scheme to solve this problem.

Fixes: ca748c39ea ("RDMA/umem: Get rid of per_mm->notifier_count")
Link: https://lore.kernel.org/r/20191112202231.3856-6-jgg@ziepe.ca
Tested-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-23 19:56:44 -04:00
Danit Goldberg
9c0015ef09 IB/mlx5: Implement callbacks for getting VFs GUID attributes
Implement the IB defined callback mlx5_ib_get_vf_guid used to query FW
for VFs attributes and return node and port GUIDs.

Signed-off-by: Danit Goldberg <danitg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-11-22 18:17:24 +02:00
Mark Zhang
c16339b69c IB/mlx5: Support extended number of strides for Striding RQ
Extends the minimum single WQE strides from 64 to 8, which is exposed
by the "min_single_wqe_log_num_of_strides" field of striding_rq_caps.
Choose right number of strides based on FW capability.

Link: https://lore.kernel.org/r/20191115154555.247856-1-leon@kernel.org
Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-19 16:05:41 -04:00
Christoph Hellwig
72b894b09a IB/umem: remove the dmasync argument to ib_umem_get
The argument is always ignored, so remove it.

Link: https://lore.kernel.org/r/20191113073214.9514-3-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Acked-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-17 10:37:00 -04:00
Saeed Mahameed
c94ef13b04 Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
1) New generic devlink param "enable_roce", for downstream devlink
   reload support

2) Do vport ACL configuration on per vport basis when
   enabling/disabling a vport. This enables to have vports enabled/disabled
   outside of eswitch config for future

3) Split the code for legacy vs offloads mode and make it clear

4) Tide up vport locking and workqueue usage

5) Fix metadata enablement for ECPF

6) Make explicit use of VF property to publish IB_DEVICE_VIRTUAL_FUNCTION

7) E-Switch and flow steering core low level support and refactoring for
   netfilter flowtables offload

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-13 14:24:58 -08:00
Yevgeny Kliteynik
208d70f562 IB/mlx5: Support flow counters offset for bulk counters
Add support for flow steering counters action with a non-base counter
ID (offset) for bulk counters.

When creating a flow counter object, save the bulk value.  This value is
used when a flow action with a non-base counter ID is requested - to
validate that the required offset is in the range of the allocated bulk.

Link: https://lore.kernel.org/r/20191103140723.77411-1-leon@kernel.org
Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-13 15:42:36 -04:00
Leon Romanovsky
e26e7b88f6 RDMA: Change MAD processing function to remove extra casting and parameter
All users of process_mad() converts input pointers from ib_mad_hdr to be
ib_mad, update the function declaration to use ib_mad directly.

Also remove not used input MAD size parameter.

Link: https://lore.kernel.org/r/20191029062745.7932-17-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Tested-By: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-12 20:20:15 -04:00
Michael Guralnik
94de879c28 IB/mlx5: Load profile according to RoCE enablement state
When RoCE is disabled load mlx5_ib in raw_eth profile.
Clean pf_profile roce capability checks as it will not be used without
roce capability.

Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-11 12:15:29 -08:00
Michael Guralnik
b5a498baf9 IB/mlx5: Rename profile and init methods
Rename uplink_rep_profile and its unique init and cleanup stages to
suit its upcoming use as the profile when RoCE is disabled.

Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-11 12:15:29 -08:00
Leon Romanovsky
ffa2fd1323 RDMA/mlx5: Rewrite MAD processing logic to be readable
Multiple "if"s and "||" make extension of process_mad() function
as a tedious task, rewrite that function to be more readable.

Link: https://lore.kernel.org/r/20191029062745.7932-14-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-06 16:00:06 -04:00
Leon Romanovsky
dd0b0159f7 RDMA/mad: Do not check MAD sizes in roce and ib drivers
All callers for process_mad allocate MAD structures with proper sizes,
there is no need to recheck it.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-06 15:55:22 -04:00
Leon Romanovsky
be4a8d4673 RDMA/mad: Allocate zeroed MAD buffer
Ensure that MAD output buffer is zero-based allocated in all the callers
of process_mad and remove the various memset()'s from the drivers.

Link: https://lore.kernel.org/r/20191029062745.7932-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-06 15:38:28 -04:00
Michal Kalderon
c043ff2cfb RDMA: Connect between the mmap entry and the umap_priv structure
The rdma_user_mmap_io interface created a common interface for drivers to
correctly map hw resources and zap them once the ucontext is destroyed
enabling the drivers to safely free the hw resources.

However, this meant the drivers need to delay freeing the resource to the
ucontext destroy phase to ensure they were no longer mapped.  The new
mechanism for a common way of handling user/driver address mapping enabled
notifying the driver if all umap_priv mappings were removed, and enabled
freeing the hw resources when they are done with and not delay it until
ucontext destroy.

Since not all drivers use the mechanism, NULL can be sent to the
rdma_user_mmap_io interface to continue working as before.  Drivers that
use the mmap_xa interface can pass the entry being mapped to the
rdma_user_mmap_io function to be linked together.

Link: https://lore.kernel.org/r/20191030094417.16866-4-michal.kalderon@marvell.com
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-06 13:08:01 -04:00
Greg Kroah-Hartman
09b0965ee8 IB: mlx5: no need to check return value of debugfs_create functions
When calling debugfs functions, there is no need to ever check the
return value.  The function can work or not, but the code logic should
never do something different based on this.

Cc: Leon Romanovsky <leon@kernel.org>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: linux-rdma@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-04 08:38:58 +01:00
Parav Pandit
e53a9d26cf IB/mlx5: Introduce and use mlx5_core_is_vf()
Instead of deciding a given device is virtual function or
not based on a device is PF or not, use already defined
MLX5_COREDEV_VF by introducing an helper API mlx5_core_is_vf().

This enables to clearly identify PF, VF and non virtual functions.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-01 14:40:27 -07:00
Michael Guralnik
11f552e217 IB/mlx5: Test write combining support
Linux can run in all sorts of physical machines and VMs where write
combining may or may not be supported. Currently there is no way to
reliably tell if the system supports WC, or not. The driver uses WC to
optimize posting work to the HCA, and getting this wrong in either
direction can cause a significant performance loss.

Add a test in mlx5_ib initialization process to test whether
write-combining is supported on the machine.  The test will run as part of
the enable_driver callback to ensure that the test runs after the device
is setup and can create and modify the QP needed, but runs before the
device is exposed to the users.

The test opens UD QP and posts NOP WQEs, the WQE written to the BlueFlame
is different from the WQE in memory, requesting CQE only on the BlueFlame
WQE. By checking whether we received a completion on one of these WQEs we
can know if BlueFlame succeeded and this write-combining must be
supported.

Change reporting of BlueFlame support to be dependent on write-combining
support instead of the FW's guess as to what the machine can do.

Link: https://lore.kernel.org/r/20191027062234.10993-1-leon@kernel.org
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-31 15:52:51 -03:00
Leon Romanovsky
546d30099e RDMA/mlx5: Return proper error value
Returned value from mlx5_mr_cache_alloc() is checked to be error or real
pointer. Return proper error code instead of NULL which is not checked
later.

Fixes: 81713d3788 ("IB/mlx5: Add implicit MR support")
Link: https://lore.kernel.org/r/20191029055721.7192-1-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-31 15:31:49 -03:00
Jason Gunthorpe
bb3dba3300 Merge branch 'odp_rework' into rdma.git for-next
Jason Gunthorpe says:

====================
In order to hoist the interval tree code out of the drivers and into the
mmu_notifiers it is necessary for the drivers to not use the interval tree
for other things.

This series replaces the interval tree with an xarray and along the way
re-aligns all the locking to use a sensible SRCU model where the 'update'
step is done by modifying an xarray.

The result is overall much simpler and with less locking in the critical
path. Many functions were reworked for clarity and small details like
using 'imr' to refer to the implicit MR make the entire code flow here
more readable.

This also squashes at least two race bugs on its own, and quite possibily
more that haven't been identified.
====================

Merge conflicts with the odp statistics patch resolved.

* branch 'odp_rework':
  RDMA/odp: Remove broken debugging call to invalidate_range
  RDMA/mlx5: Do not race with mlx5_ib_invalidate_range during create and destroy
  RDMA/mlx5: Do not store implicit children in the odp_mkeys xarray
  RDMA/mlx5: Rework implicit ODP destroy
  RDMA/mlx5: Avoid double lookups on the pagefault path
  RDMA/mlx5: Reduce locking in implicit_mr_get_data()
  RDMA/mlx5: Use an xarray for the children of an implicit ODP
  RDMA/mlx5: Split implicit handling from pagefault_mr
  RDMA/mlx5: Set the HW IOVA of the child MRs to their place in the tree
  RDMA/mlx5: Lift implicit_mr_alloc() into the two routines that call it
  RDMA/mlx5: Rework implicit_mr_get_data
  RDMA/mlx5: Delete struct mlx5_priv->mkey_table
  RDMA/mlx5: Use a dedicated mkey xarray for ODP
  RDMA/mlx5: Split sig_err MR data into its own xarray
  RDMA/mlx5: Use SRCU properly in ODP prefetch

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 16:47:52 -03:00
Jason Gunthorpe
09689703d2 RDMA/mlx5: Do not race with mlx5_ib_invalidate_range during create and destroy
For creation, as soon as the umem_odp is created the notifier can be
called, however the underlying MR may not have been setup yet. This would
cause problems if mlx5_ib_invalidate_range() runs. There is some
confusing/ulocked/racy code that might by trying to solve this, but
without locks it isn't going to work right.

Instead trivially solve the problem by short-circuiting the invalidation
if there are not yet any DMA mapped pages. By definition there is nothing
to invalidate in this case.

The create code will have the umem fully setup before anything is DMA
mapped, and npages is fully locked by the umem_mutex.

For destroy, invalidate the entire MR at the HW to stop DMA then DMA unmap
the pages before destroying the MR. This drives npages to zero and
prevents similar racing with invalidate while the MR is undergoing
destruction.

Arguably it would be better if the umem was created after the MR and
destroyed before, but that would require a big rework of the MR code.

Fixes: 6aec21f6a8 ("IB/mlx5: Page faults handling infrastructure")
Link: https://lore.kernel.org/r/20191009160934.3143-15-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 16:41:14 -03:00
Jason Gunthorpe
d561987f34 RDMA/mlx5: Do not store implicit children in the odp_mkeys xarray
These mkeys are entirely internal and are never used by the HW for
page fault. They should also never be used by userspace for prefetch.
Simplify & optimize things by not including them in the xarray.

Since the prefetch path can now never see a child mkey there is no need
for the second synchronize_srcu() during imr destroy.

Link: https://lore.kernel.org/r/20191009160934.3143-14-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 16:41:14 -03:00
Jason Gunthorpe
5256edcb98 RDMA/mlx5: Rework implicit ODP destroy
Use SRCU in a sensible way by removing all MRs in the implicit tree from
the two xarrays (the update operation), then a synchronize, followed by a
normal single threaded teardown.

This is only a little unusual from the normal pattern as there can still
be some work pending in the unbound wq that may also require a workqueue
flush. This is tracked with a single atomic, consolidating the redundant
existing atomics and wait queue.

For understand-ability the entire ODP implicit create/destroy flow now
largely exists in a single pair of functions within odp.c, with a few
support functions for tearing down an unused child.

Link: https://lore.kernel.org/r/20191009160934.3143-13-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 16:41:14 -03:00
Jason Gunthorpe
b70d785d23 RDMA/mlx5: Avoid double lookups on the pagefault path
Now that the locking is simplified combine pagefault_implicit_mr() with
implicit_mr_get_data() so that we sweep over the idx range only once,
and do the single xlt update at the end, after the child umems are
setup.

This avoids double iteration/xa_loads plus the sketchy failure path if the
xa_load() fails.

Link: https://lore.kernel.org/r/20191009160934.3143-12-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 16:41:14 -03:00
Jason Gunthorpe
3389baa831 RDMA/mlx5: Reduce locking in implicit_mr_get_data()
Now that the child MRs are stored in an xarray we can rely on the SRCU
lock to protect the xa_load and use xa_cmpxchg on the slow allocation path
to resolve races with concurrent page fault.

This reduces the scope of the critical section of umem_mutex for implicit
MRs to only cover mlx5_ib_update_xlt, and avoids taking a lock at all if
the child MR is already in the xarray. This makes it consistent with the
normal ODP MR critical section for umem_lock, and the locking approach
used for destroying an unusued implicit child MR.

The MLX5_IB_UPD_XLT_ATOMIC is no longer needed in implicit_get_child_mr()
since it is no longer called with any locks.

Link: https://lore.kernel.org/r/20191009160934.3143-11-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 16:41:14 -03:00
Jason Gunthorpe
423f52d650 RDMA/mlx5: Use an xarray for the children of an implicit ODP
Currently the child leaves are stored in the shared interval tree and
every lookup for a child must be done under the interval tree rwsem.

This is further complicated by dropping the rwsem during iteration (ie the
odp_lookup(), odp_next() pattern), which requires a very tricky an
difficult to understand locking scheme with SRCU.

Instead reserve the interval tree for the exclusive use of the mmu
notifier related code in umem_odp.c and give each implicit MR a xarray
containing all the child MRs.

Since the size of each child is 1GB of VA, a 1 level xarray will index 64G
of VA, and a 2 level will index 2TB, making xarray a much better
data structure choice than an interval tree.

The locking properties of xarray will be used in the next patches to
rework the implicit ODP locking scheme into something simpler.

At this point, the xarray is locked by the implicit MR's umem_mutex, and
read can also be locked by the odp_srcu.

Link: https://lore.kernel.org/r/20191009160934.3143-10-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 16:41:14 -03:00
Jason Gunthorpe
54375e7382 RDMA/mlx5: Split implicit handling from pagefault_mr
The single routine has a very confusing scheme to advance to the next
child MR when working on an implicit parent. This scheme can only be used
when working with an implicit parent and must not be triggered when
working on a normal MR.

Re-arrange things by directly putting all the single-MR stuff into one
function and calling it in a loop for the implicit case. Simplify some of
the error handling in the new pagefault_real_mr() to remove unneeded gotos.

Link: https://lore.kernel.org/r/20191009160934.3143-9-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 16:41:13 -03:00
Jason Gunthorpe
9162420dde RDMA/mlx5: Set the HW IOVA of the child MRs to their place in the tree
Instead of rewriting all the IOVA's to 0 as things progress down the tree
make the IOVA of the children equal to placement in the tree. This makes
things easier to understand by keeping mmkey.iova == HW configuration.

Link: https://lore.kernel.org/r/20191009160934.3143-8-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 16:41:13 -03:00
Jason Gunthorpe
c2edcd6935 RDMA/mlx5: Lift implicit_mr_alloc() into the two routines that call it
This makes the routines easier to understand, particularly with respect
the locking requirements of the entire sequence. The implicit_mr_alloc()
had a lot of ifs specializing it to each of the callers, and only a very
small amount of code was actually shared.

Following patches will cause the flow in the two functions to diverge
further.

Link: https://lore.kernel.org/r/20191009160934.3143-7-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 16:41:13 -03:00
Jason Gunthorpe
3d5f3c54e7 RDMA/mlx5: Rework implicit_mr_get_data
This function is intended to loop across each MTT chunk in the implicit
parent that intersects the range [io_virt, io_virt+bnct).  But it is has a
confusing construction, so:

- Consistently use imr and odp_imr to refer to the implicit parent
  to avoid confusion with the normal mr and odp of the child
- Directly compute the inclusive start/end indexes by shifting. This is
  clearer to understand the intent and avoids any errors from unaligned
  values of addr
- Iterate directly over the range of MTT indexes, do not make a loop
  out of goto
- Follow 'success oriented flow', with goto error unwind
- Directly calculate the range of idx's that need update_xlt
- Ensure that any leaf MR added to the interval tree always results in an
  update to the XLT

Link: https://lore.kernel.org/r/20191009160934.3143-6-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 16:41:13 -03:00
Jason Gunthorpe
74bddb3682 RDMA/mlx5: Delete struct mlx5_priv->mkey_table
No users are left, delete it.

Link: https://lore.kernel.org/r/20191009160934.3143-5-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 16:41:13 -03:00
Jason Gunthorpe
806b101b2b RDMA/mlx5: Use a dedicated mkey xarray for ODP
There is a per device xarray storing mkeys that is used to store every
mkey in the system. However, this xarray is now only read by ODP for
certain ODP designated MRs (ODP, implicit ODP, MW, DEVX_INDIRECT).

Create an xarray only for use by ODP, that only contains ODP related
MKeys. This xarray is protected by SRCU and all erases are protected by a
synchronize.

This improves performance:

 - All MRs in the odp_mkeys xarray are ODP MRs, so some tests for is_odp()
   can be deleted. The xarray will also consume fewer nodes.

 - normal MR's are never mixed with ODP MRs in a SRCU data structure so
   performance sucking synchronize_srcu() on every MR destruction is not
   needed.

 - No smp_load_acquire(live) and xa_load() double barrier on read

Due to the SRCU locking scheme care must be taken with the placement of
the xa_store(). Once it completes the MR is immediately visible to other
threads and only through a xa_erase() & synchronize_srcu() cycle could it
be destroyed.

Link: https://lore.kernel.org/r/20191009160934.3143-4-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 16:41:13 -03:00
Jason Gunthorpe
50211ec944 RDMA/mlx5: Split sig_err MR data into its own xarray
The locking model for signature is completely different than ODP, do not
share the same xarray that relies on SRCU locking to support ODP.

Simply store the active mlx5_core_sig_ctx's in an xarray when signature
MRs are created and rely on trivial xarray locking to serialize
everything.

The overhead of storing only a handful of SIG related MRs is going to be
much less than an xarray full of every mkey.

Link: https://lore.kernel.org/r/20191009160934.3143-3-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 16:41:13 -03:00
Jason Gunthorpe
fb985e278a RDMA/mlx5: Use SRCU properly in ODP prefetch
When working with SRCU protected xarrays the xarray itself should be the
SRCU 'update' point. Instead prefetch is using live as the SRCU update
point and this prevents switching the locking design to use the xarray
instead.

To solve this the prefetch must only read from the xarray once, and hold
on to the actual MR pointer for the duration of the async
operation. Incrementing num_pending_prefetch delays destruction of the MR,
so it is suitable.

Prefetch calls directly to the pagefault_mr using the MR pointer and only
does a single xarray lookup.

All the testing if a MR is prefetchable or not is now done only in the
prefetch code and removed from the pagefault critical path.

Link: https://lore.kernel.org/r/20191009160934.3143-2-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 16:41:13 -03:00
Jason Gunthorpe
036313316d Linux 5.4-rc5
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl210Z8eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGv+kIAKRpO7EuDokQL4qp
 hxEEaCMJA1T055EMlNU6FVAq/ZbmapzreUyNYiRMpPWKGTWNMkhIcZQfysYeGZz5
 y/KRxAiVxlcB+3v3yRmoZd/XoQmhgvJmqD4zhaGI2Utonow4f/SGSEFFZqqs9WND
 4HJROjZHgQ4JBxg9Z+QMo0FxbV/DCZpEOUq51N9WJywyyDRb18zotE83stpU+pE2
 fjqT7mk0NLrnYXuDRAbFC1Aau9ed4H6LlwLmxaqxq/Pt5Rz7wIKwKL9HIT4Dm/0a
 qpani6phhHWL7MwUpa2wkEonFCD03rJFl3DUVJo64Ijh4up5D/jyXQ+GKV2P4WKJ
 275Rb5Q=
 =WiZZ
 -----END PGP SIGNATURE-----

Merge tag 'v5.4-rc5' into rdma.git for-next

Linux 5.4-rc5

For dependencies in the next patches

Conflict resolved by keeping the delete of the unlock.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 16:36:29 -03:00
Jason Gunthorpe
1524b12a6e RDMA/mlx5: Use irq xarray locking for mkey_table
The mkey_table xarray is touched by the reg_mr_callback() function which
is called from a hard irq. Thus all other uses of xa_lock must use the
_irq variants.

  WARNING: inconsistent lock state
  5.4.0-rc1 #12 Not tainted
  --------------------------------
  inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
  python3/343 [HC0[0]:SC0[0]:HE1:SE1] takes:
  ffff888182be1d40 (&(&xa->xa_lock)->rlock#3){?.-.}, at: xa_erase+0x12/0x30
  {IN-HARDIRQ-W} state was registered at:
    lock_acquire+0xe1/0x200
    _raw_spin_lock_irqsave+0x35/0x50
    reg_mr_callback+0x2dd/0x450 [mlx5_ib]
    mlx5_cmd_exec_cb_handler+0x2c/0x70 [mlx5_core]
    mlx5_cmd_comp_handler+0x355/0x840 [mlx5_core]
   [..]

   Possible unsafe locking scenario:

         CPU0
         ----
    lock(&(&xa->xa_lock)->rlock#3);
    <Interrupt>
      lock(&(&xa->xa_lock)->rlock#3);

   *** DEADLOCK ***

  2 locks held by python3/343:
   #0: ffff88818eb4bd38 (&uverbs_dev->disassociate_srcu){....}, at: ib_uverbs_ioctl+0xe5/0x1e0 [ib_uverbs]
   #1: ffff888176c76d38 (&file->hw_destroy_rwsem){++++}, at: uobj_destroy+0x2d/0x90 [ib_uverbs]

  stack backtrace:
  CPU: 3 PID: 343 Comm: python3 Not tainted 5.4.0-rc1 #12
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
  Call Trace:
   dump_stack+0x86/0xca
   print_usage_bug.cold.50+0x2e5/0x355
   mark_lock+0x871/0xb50
   ? match_held_lock+0x20/0x250
   ? check_usage_forwards+0x240/0x240
   __lock_acquire+0x7de/0x23a0
   ? __kasan_check_read+0x11/0x20
   ? mark_lock+0xae/0xb50
   ? mark_held_locks+0xb0/0xb0
   ? find_held_lock+0xca/0xf0
   lock_acquire+0xe1/0x200
   ? xa_erase+0x12/0x30
   _raw_spin_lock+0x2a/0x40
   ? xa_erase+0x12/0x30
   xa_erase+0x12/0x30
   mlx5_ib_dealloc_mw+0x55/0xa0 [mlx5_ib]
   uverbs_dealloc_mw+0x3c/0x70 [ib_uverbs]
   uverbs_free_mw+0x1a/0x20 [ib_uverbs]
   destroy_hw_idr_uobject+0x49/0xa0 [ib_uverbs]
   [..]

Fixes: 0417791536 ("RDMA/mlx5: Add missing synchronize_srcu() for MW cases")
Link: https://lore.kernel.org/r/20191024234910.GA9038@ziepe.ca
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28 14:08:48 -03:00
Michael Guralnik
3f89b01f4b IB/mlx5: Align usage of QP1 create flags with rest of mlx5 defines
There is little value in keeping separate function for one flag, provide
it directly like any other mlx5 define.

Link: https://lore.kernel.org/r/20191020064400.8344-2-leon@kernel.org
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22 16:39:49 -03:00
Ran Rozenstein
68abaa765e IB/mlx5: Remove dead code
mlx5_ib_dc_atomic_is_supported function is not used anywhere. Remove the
dead code.

Fixes: a60109dc9a ("IB/mlx5: Add support for extended atomic operations")
Link: https://lore.kernel.org/r/20191020064454.8551-1-leon@kernel.org
Signed-off-by: Ran Rozenstein <ranro@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22 16:37:48 -03:00
Erez Alfasi
4061ff7aa3 RDMA/nldev: Provide MR statistics
Add RDMA nldev netlink interface for dumping MR statistics information.

Output example:

$ ./ibv_rc_pingpong -o -P -s 500000000
  local address:  LID 0x0001, QPN 0x00008a, PSN 0xf81096, GID ::

$ rdma stat show mr
dev mlx5_0 mrn 2 page_faults 122071 page_invalidations 0

Link: https://lore.kernel.org/r/20191016062308.11886-5-leon@kernel.org
Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22 15:33:31 -03:00
Erez Alfasi
e1b95ae0b0 RDMA/mlx5: Return ODP type per MR
Provide an ODP explicit/implicit type as part of 'rdma -dd resource show
mr' dump.

For example:

$ rdma -dd resource show mr
dev mlx5_0 mrn 1 rkey 0xa99a lkey 0xa99a mrlen 50000000
pdn 9 pid 7372 comm ibv_rc_pingpong drv_odp explicit

For non-ODP MRs, we won't print "drv_odp ..." at all.

Link: https://lore.kernel.org/r/20191016062308.11886-4-leon@kernel.org
Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22 15:22:47 -03:00
Erez Alfasi
a3de94e3d6 IB/mlx5: Introduce ODP diagnostic counters
Introduce ODP diagnostic counters and count the following
per MR within IB/mlx5 driver:
 1) Page faults:
	Total number of faulted pages.
 2) Page invalidations:
	Total number of pages invalidated by the OS during all
	invalidation events. The translations can be no longer
	valid due to either non-present pages or mapping changes.

Link: https://lore.kernel.org/r/20191016062308.11886-2-leon@kernel.org
Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22 15:22:47 -03:00
Jason Gunthorpe
a2aca4d7f0 Merge branch 'mlx5-rd-sgl' into rdma.git for-next
From Yamin Friedman:

====================
This series from Yamin implements long standing "TODO" existed in rw.c. It
allows the driver to specify a cut-over point where it is faster to build
a lkey MR rather than do a large SGL for RDMA READ operations.

mlx5 HW gets a notable performane boost by switching to MRs.
====================

Based on the mlx5-next branch from
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux for
dependencies

* branch 'mlx5-rd-sgl': (3 commits)
  RDMA/mlx5: Add capability for max sge to get optimized performance
  RDMA/rw: Support threshold for registration vs scattering to local pages
  net/mlx5: Expose optimal performance scatter entries capability
2019-10-22 14:27:25 -03:00
Yamin Friedman
366090564b RDMA/mlx5: Add capability for max sge to get optimized performance
Allows the IB device to provide a value of maximum scatter gather entries
per RDMA READ.

In certain cases it may be preferable for a device to perform UMR memory
registration rather than have many scatter entries in a single RDMA READ.
This provides a significant performance increase in devices capable of
using different memory registration schemes based on the number of scatter
gather entries. This general capability allows each device vendor to fine
tune when it is better to use memory registration.

Link: https://lore.kernel.org/r/20191007135933.12483-4-leon@kernel.org
Signed-off-by: Yamin Friedman <yaminf@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22 14:27:00 -03:00
Rafi Wiener
c8973df2da RDMA/mlx5: Clear old rate limit when closing QP
Before QP is closed it changes to ERROR state, when this happens
the QP was left with old rate limit that was already removed from
the table.

Fixes: 7d29f349a4 ("IB/mlx5: Properly adjust rate limit on QP state transitions")
Signed-off-by: Rafi Wiener <rafiw@mellanox.com>
Signed-off-by: Oleg Kuporosov <olegk@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20191002120243.16971-1-leon@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-10-17 16:07:25 -04:00
Parav Pandit
03232cc43c IB/mlx5: Introduce and use mkey context setting helper routine
Introduce and use set_mkc_access_pd_addr_fields() which sets mkey
context's access rights, PD, address fields.  Thereby avoid the code
duplication.

Link: https://lore.kernel.org/r/20191006155443.31068-1-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-08 16:45:07 -03:00
Jason Gunthorpe
0417791536 RDMA/mlx5: Add missing synchronize_srcu() for MW cases
While MR uses live as the SRCU 'update', the MW case uses the xarray
directly, xa_erase() causes the MW to become inaccessible to the pagefault
thread.

Thus whenever a MW is removed from the xarray we must synchronize_srcu()
before freeing it.

This must be done before freeing the mkey as re-use of the mkey while the
pagefault thread is using the stale mkey is undesirable.

Add the missing synchronizes to MW and DEVX indirect mkey and delete the
bogus protection against double destroy in mlx5_core_destroy_mkey()

Fixes: 534fd7aac5 ("IB/mlx5: Manage indirection mkey upon DEVX flow for ODP")
Fixes: 6aec21f6a8 ("IB/mlx5: Page faults handling infrastructure")
Link: https://lore.kernel.org/r/20191001153821.23621-7-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 15:54:22 -03:00
Jason Gunthorpe
aa603815c7 RDMA/mlx5: Put live in the correct place for ODP MRs
live is used to signal to the pagefault thread that the MR is initialized
and ready for use. It should be after the umem is assigned and all other
setup is completed. This prevents races (at least) of the form:

    CPU0                                     CPU1
mlx5_ib_alloc_implicit_mr()
 implicit_mr_alloc()
  live = 1
 imr->umem = umem
                                    num_pending_prefetch_inc()
                                      if (live)
				        atomic_inc(num_pending_prefetch)
 atomic_set(num_pending_prefetch,0) // Overwrites other thread's store

Further, live is being used with SRCU as the 'update' in an
acquire/release fashion, so it can not be read and written raw.

Move all live = 1's to after MR initialization is completed and use
smp_store_release/smp_load_acquire() for manipulating it.

Add a missing live = 0 when an implicit MR child is deleted, before
queuing work to do synchronize_srcu().

The barriers in update_odp_mr() were some broken attempt to create a
acquire/release, but were not even applied consistently and missed the
point, delete it as well.

Fixes: 6aec21f6a8 ("IB/mlx5: Page faults handling infrastructure")
Link: https://lore.kernel.org/r/20191001153821.23621-6-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 15:54:22 -03:00
Jason Gunthorpe
aa116b810a RDMA/mlx5: Order num_pending_prefetch properly with synchronize_srcu
During destroy setting live = 0 and then synchronize_srcu() prevents
num_pending_prefetch from incrementing, and also, ensures that all work
holding that count is queued on the WQ. Testing before causes races of the
form:

    CPU0                                         CPU1
  dereg_mr()
                                          mlx5_ib_advise_mr_prefetch()
            				   srcu_read_lock()
                                            num_pending_prefetch_inc()
					      if (!live)
   live = 0
   atomic_read() == 0
     // skip flush_workqueue()
                                              atomic_inc()
 					      queue_work();
            				   srcu_read_unlock()
   WARN_ON(atomic_read())  // Fails

Swap the order so that the synchronize_srcu() prevents this.

Fixes: a6bc3875f1 ("IB/mlx5: Protect against prefetch of invalid MR")
Link: https://lore.kernel.org/r/20191001153821.23621-5-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 15:54:22 -03:00
Jason Gunthorpe
9dc775e7f5 RDMA/odp: Lift umem_mutex out of ib_umem_odp_unmap_dma_pages()
This fixes a race of the form:
    CPU0                               CPU1
mlx5_ib_invalidate_range()     mlx5_ib_invalidate_range()
				 // This one actually makes npages == 0
				 ib_umem_odp_unmap_dma_pages()
				 if (npages == 0 && !dying)
  // This one does nothing
  ib_umem_odp_unmap_dma_pages()
  if (npages == 0 && !dying)
     dying = 1;
                                    dying = 1;
				    schedule_work(&umem_odp->work);
     // Double schedule of the same work
     schedule_work(&umem_odp->work);  // BOOM

npages and dying must be read and written under the umem_mutex lock.

Since whenever ib_umem_odp_unmap_dma_pages() is called mlx5 must also call
mlx5_ib_update_xlt, and both need to be done in the same locking region,
hoist the lock out of unmap.

This avoids an expensive double critical section in
mlx5_ib_invalidate_range().

Fixes: 81713d3788 ("IB/mlx5: Add implicit MR support")
Link: https://lore.kernel.org/r/20191001153821.23621-4-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 15:54:21 -03:00
Jason Gunthorpe
f28b1932ea RDMA/mlx5: Fix a race with mlx5_ib_update_xlt on an implicit MR
mlx5_ib_update_xlt() must be protected against parallel free of the MR it
is accessing, also it must be called single threaded while updating the
HW. Otherwise we can have races of the form:

    CPU0                               CPU1
  mlx5_ib_update_xlt()
   mlx5_odp_populate_klm()
     odp_lookup() == NULL
     pklm = ZAP
                                      implicit_mr_get_data()
 				        implicit_mr_alloc()
 					  <update interval tree>
					mlx5_ib_update_xlt
					  mlx5_odp_populate_klm()
					    odp_lookup() != NULL
					    pklm = VALID
					   mlx5_ib_post_send_wait()

    mlx5_ib_post_send_wait() // Replaces VALID with ZAP

This can be solved by putting both the SRCU and the umem_mutex lock around
every call to mlx5_ib_update_xlt(). This ensures that the content of the
interval tree relavent to mlx5_odp_populate_klm() (ie mr->parent == mr)
will not change while it is running, and thus the posted WRs to update the
KLM will always reflect the correct information.

The race above will resolve by either having CPU1 wait till CPU0 completes
the ZAP or CPU0 will run after the add and instead store VALID.

The pagefault path adding children already holds the umem_mutex and SRCU,
so the only missed lock is during MR destruction.

Fixes: 81713d3788 ("IB/mlx5: Add implicit MR support")
Link: https://lore.kernel.org/r/20191001153821.23621-3-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 15:54:21 -03:00
Jason Gunthorpe
880505cfef RDMA/mlx5: Do not allow rereg of a ODP MR
This code is completely broken, the umem of a ODP MR simply cannot be
discarded without a lot more locking, nor can an ODP mkey be blithely
destroyed via destroy_mkey().

Fixes: 6aec21f6a8 ("IB/mlx5: Page faults handling infrastructure")
Link: https://lore.kernel.org/r/20191001153821.23621-2-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 15:54:21 -03:00
Erez Alfasi
6f26b2ac69 IB/mlx5: Remove unnecessary else statement
'else' is not generally useful after a break or return. Remove this
unnecessary statement.

Link: https://lore.kernel.org/r/20191002122517.17721-4-leon@kernel.org
Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 15:38:16 -03:00
Erez Alfasi
2d67c07988 IB/mlx5: Remove unnecessary return statement
There is no reason to call return at the end of function which returns
void. Remove this unnecessary statement.

Link: https://lore.kernel.org/r/20191002122517.17721-3-leon@kernel.org
Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 15:38:16 -03:00
Leon Romanovsky
4b2a67362e RDMA/mlx5: Group boolean parameters to take less space
Clean the code to store all boolean parameters inside one variable.

Link: https://lore.kernel.org/r/20191002122517.17721-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04 15:38:16 -03:00
Linus Torvalds
018c6837f3 RDMA subsystem updates for 5.4
This cycle mainly saw lots of bug fixes and clean up code across the core
 code and several drivers, few new functional changes were made.
 
 - Many cleanup and bug fixes for hns
 
 - Various small bug fixes and cleanups in hfi1, mlx5, usnic, qed,
   bnxt_re, efa
 
 - Share the query_port code between all the iWarp drivers
 
 - General rework and cleanup of the ODP MR umem code to fit better with
   the mmu notifier get/put scheme
 
 - Support rdma netlink in non init_net name spaces
 
 - mlx5 support for XRC devx and DC ODP
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl2A1ugACgkQOG33FX4g
 mxp+EQ//Ug8CyyDs40SGZztItoGghuQ4TVA2pXtuZ9LkFVJRWhYPJGOadHIYXqGO
 KQJJpZPQ02HTZUPWBZNKmD5bwHfErm4cS73/mVmUpximnUqu/UsLRJp8SIGmBg1D
 W1Lz1BJX24MdV8aUnChYvdL5Hbl52q+BaE99Z0haOvW7I3YnKJC34mR8m/A5MiRf
 rsNIZNPHdq2U6pKLgCbOyXib8yBcJQqBb8x4WNAoB1h4MOx+ir5QLfr3ozrZs1an
 xXgqyiOBmtkUgCMIpXC4juPN/6gw3Y5nkk2VIWY+MAY1a7jZPbI+6LJZZ1Uk8R44
 Lf2KSzabFMMYGKJYE1Znxk+JWV8iE+m+n6wWEfRM9l0b4gXXbrtKgaouFbsLcsQA
 CvBEQuiFSO9Kq01JPaAN1XDmhqyTarG6lHjXnW7ifNlLXnPbR1RJlprycExOhp0c
 axum5K2fRNW2+uZJt+zynMjk2kyjT1lnlsr1Rbgc4Pyionaiydg7zwpiac7y/bdS
 F7/WqdmPiff78KMi187EF5pjFqMWhthvBtTbWDuuxaxc2nrXSdiCUN+96j1amQUH
 yU/7AZzlnKeKEQQCR4xddsSs2eTrXiLLFRLk9GCK2eh4cUN212eHTrPLKkQf1cN+
 ydYbR2pxw3B38LCCNBy+bL+u7e/Tyehs4ynATMpBuEgc5iocTwE=
 =zHXW
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull RDMA subsystem updates from Jason Gunthorpe:
 "This cycle mainly saw lots of bug fixes and clean up code across the
  core code and several drivers, few new functional changes were made.

   - Many cleanup and bug fixes for hns

   - Various small bug fixes and cleanups in hfi1, mlx5, usnic, qed,
     bnxt_re, efa

   - Share the query_port code between all the iWarp drivers

   - General rework and cleanup of the ODP MR umem code to fit better
     with the mmu notifier get/put scheme

   - Support rdma netlink in non init_net name spaces

   - mlx5 support for XRC devx and DC ODP"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (99 commits)
  RDMA: Fix double-free in srq creation error flow
  RDMA/efa: Fix incorrect error print
  IB/mlx5: Free mpi in mp_slave mode
  IB/mlx5: Use the original address for the page during free_pages
  RDMA/bnxt_re: Fix spelling mistake "missin_resp" -> "missing_resp"
  RDMA/hns: Package operations of rq inline buffer into separate functions
  RDMA/hns: Optimize cmd init and mode selection for hip08
  IB/hfi1: Define variables as unsigned long to fix KASAN warning
  IB/{rdmavt, hfi1, qib}: Add a counter for credit waits
  IB/hfi1: Add traces for TID RDMA READ
  RDMA/siw: Relax from kmap_atomic() use in TX path
  IB/iser: Support up to 16MB data transfer in a single command
  RDMA/siw: Fix page address mapping in TX path
  RDMA: Fix goto target to release the allocated memory
  RDMA/usnic: Avoid overly large buffers on stack
  RDMA/odp: Add missing cast for 32 bit
  RDMA/hns: Use devm_platform_ioremap_resource() to simplify code
  Documentation/infiniband: update name of some functions
  RDMA/cma: Fix false error message
  RDMA/hns: Fix wrong assignment of qp_access_flags
  ...
2019-09-21 10:26:24 -07:00
Linus Torvalds
84da111de0 hmm related patches for 5.4
This is more cleanup and consolidation of the hmm APIs and the very
 strongly related mmu_notifier interfaces. Many places across the tree
 using these interfaces are touched in the process. Beyond that a cleanup
 to the page walker API and a few memremap related changes round out the
 series:
 
 - General improvement of hmm_range_fault() and related APIs, more
   documentation, bug fixes from testing, API simplification &
   consolidation, and unused API removal
 
 - Simplify the hmm related kconfigs to HMM_MIRROR and DEVICE_PRIVATE, and
   make them internal kconfig selects
 
 - Hoist a lot of code related to mmu notifier attachment out of drivers by
   using a refcount get/put attachment idiom and remove the convoluted
   mmu_notifier_unregister_no_release() and related APIs.
 
 - General API improvement for the migrate_vma API and revision of its only
   user in nouveau
 
 - Annotate mmu_notifiers with lockdep and sleeping region debugging
 
 Two series unrelated to HMM or mmu_notifiers came along due to
 dependencies:
 
 - Allow pagemap's memremap_pages family of APIs to work without providing
   a struct device
 
 - Make walk_page_range() and related use a constant structure for function
   pointers
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl1/nnkACgkQOG33FX4g
 mxqaRg//c6FqowV1pQlLutvAOAgMdpzfZ9eaaDKngy9RVQxz+k/MmJrdRH/p/mMA
 Pq93A1XfwtraGKErHegFXGEDk4XhOustVAVFwvjyXO41dTUdoFVUkti6ftbrl/rS
 6CT+X90jlvrwdRY7QBeuo7lxx7z8Qkqbk1O1kc1IOracjKfNJS+y6LTamy6weM3g
 tIMHI65PkxpRzN36DV9uCN5dMwFzJ73DWHp1b0acnDIigkl6u5zp6orAJVWRjyQX
 nmEd3/IOvdxaubAoAvboNS5CyVb4yS9xshWWMbH6AulKJv3Glca1Aa7QuSpBoN8v
 wy4c9+umzqRgzgUJUe1xwN9P49oBNhJpgBSu8MUlgBA4IOc3rDl/Tw0b5KCFVfkH
 yHkp8n6MP8VsRrzXTC6Kx0vdjIkAO8SUeylVJczAcVSyHIo6/JUJCVDeFLSTVymh
 EGWJ7zX2iRhUbssJ6/izQTTQyCH3YIyZ5QtqByWuX2U7ZrfkqS3/EnBW1Q+j+gPF
 Z2yW8iT6k0iENw6s8psE9czexuywa/Lttz94IyNlOQ8rJTiQqB9wLaAvg9hvUk7a
 kuspL+JGIZkrL3ouCeO/VA6xnaP+Q7nR8geWBRb8zKGHmtWrb5Gwmt6t+vTnCC2l
 olIDebrnnxwfBQhEJ5219W+M1pBpjiTpqK/UdBd92A4+sOOhOD0=
 =FRGg
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull hmm updates from Jason Gunthorpe:
 "This is more cleanup and consolidation of the hmm APIs and the very
  strongly related mmu_notifier interfaces. Many places across the tree
  using these interfaces are touched in the process. Beyond that a
  cleanup to the page walker API and a few memremap related changes
  round out the series:

   - General improvement of hmm_range_fault() and related APIs, more
     documentation, bug fixes from testing, API simplification &
     consolidation, and unused API removal

   - Simplify the hmm related kconfigs to HMM_MIRROR and DEVICE_PRIVATE,
     and make them internal kconfig selects

   - Hoist a lot of code related to mmu notifier attachment out of
     drivers by using a refcount get/put attachment idiom and remove the
     convoluted mmu_notifier_unregister_no_release() and related APIs.

   - General API improvement for the migrate_vma API and revision of its
     only user in nouveau

   - Annotate mmu_notifiers with lockdep and sleeping region debugging

  Two series unrelated to HMM or mmu_notifiers came along due to
  dependencies:

   - Allow pagemap's memremap_pages family of APIs to work without
     providing a struct device

   - Make walk_page_range() and related use a constant structure for
     function pointers"

* tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (75 commits)
  libnvdimm: Enable unit test infrastructure compile checks
  mm, notifier: Catch sleeping/blocking for !blockable
  kernel.h: Add non_block_start/end()
  drm/radeon: guard against calling an unpaired radeon_mn_unregister()
  csky: add missing brackets in a macro for tlb.h
  pagewalk: use lockdep_assert_held for locking validation
  pagewalk: separate function pointers from iterator data
  mm: split out a new pagewalk.h header from mm.h
  mm/mmu_notifiers: annotate with might_sleep()
  mm/mmu_notifiers: prime lockdep
  mm/mmu_notifiers: add a lockdep map for invalidate_range_start/end
  mm/mmu_notifiers: remove the __mmu_notifier_invalidate_range_start/end exports
  mm/hmm: hmm_range_fault() infinite loop
  mm/hmm: hmm_range_fault() NULL pointer bug
  mm/hmm: fix hmm_range_fault()'s handling of swapped out pages
  mm/mmu_notifiers: remove unregister_no_release
  RDMA/odp: remove ib_ucontext from ib_umem
  RDMA/odp: use mmu_notifier_get/put for 'struct ib_ucontext_per_mm'
  RDMA/mlx5: Use odp instead of mr->umem in pagefault_mr
  RDMA/mlx5: Use ib_umem_start instead of umem.address
  ...
2019-09-21 10:07:42 -07:00
Danit Goldberg
5d44adebbb IB/mlx5: Free mpi in mp_slave mode
ib_add_slave_port() allocates a multiport struct but never frees it.
Don't leak memory, free the allocated mpi struct during driver unload.

Cc: <stable@vger.kernel.org>
Fixes: 32f69e4be2 ("{net, IB}/mlx5: Manage port association for multiport RoCE")
Link: https://lore.kernel.org/r/20190916064818.19823-3-leon@kernel.org
Signed-off-by: Danit Goldberg <danitg@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-09-16 14:08:18 -03:00
Danit Goldberg
130c2c576e IB/mlx5: Use the original address for the page during free_pages
The removal of 'buffer' in the patch below caused free_page() to use a
value that had been offset since the wqe pointer is adjusted while the
routine runs.

The current implementation of free_pages() rounds down to a pfn,
discarding the adjustment, but this is not the right way to use the
API. Preserve the initial value and use it for free_page().

Fixes: 0f51427bd0 ("RDMA/mlx5: Cleanup WQE page fault handler")
Link: https://lore.kernel.org/r/20190916064818.19823-2-leon@kernel.org
Signed-off-by: Danit Goldberg <danitg@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-09-16 13:39:56 -03:00
Jason Gunthorpe
75c66515e4 Merge tag 'v5.3-rc8' into rdma.git for-next
To resolve dependencies in following patches

mlx5_ib.h conflict resolved by keeing both hunks

Linux 5.3-rc8

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-09-13 16:59:51 -03:00
Maor Gottlieb
2b688ea5ef net/mlx5: Add flow steering actions to fs_cmd shim layer
Add flow steering actions: modify header and packet reformat
to the fs_cmd shim layer. This allows each namespace to define
possibly different functionality for alloc/dealloc action commands.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-09-03 12:54:19 -07:00
Saeed Mahameed
a06ebb8d95 Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Merge mlx5-next patches needed for upcoming mlx5 software steering.

1) Alex adds HW bits and definitions required for SW steering
2) Ariel moves device memory management to mlx5_core (From mlx5_ib)
3) Maor, Cleanups and fixups for eswitch mode and RoCE
4) Mark, Set only stag for match untagged packets

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-09-02 00:16:05 -07:00
Ariel Levkovich
c9b9dcb430 net/mlx5: Move device memory management to mlx5_core
Move the device memory allocation and deallocation commands
SW ICM memory to mlx5_core to expose this API for all
mlx5_core users.

This comes as preparation for supporting SW steering in kernel
where it will be required to allocate and register device
memory for direct rule insertion.

In addition, an API to register this device memory for future
remote access operations is introduced using the create_mkey
commands.

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-09-01 23:44:41 -07:00
Saeed Mahameed
537f321097 Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
mlx5 HW spec and bits updates:
1) Aya exposes IP-in-IP capability in mlx5_core.
2) Maxim exposes lag tx port affinity capabilities.
3) Moshe adds VNIC_ENV internal rq counter bits.
4) ODP capabilities for DC transport

Misc updates:
5) Saeed, two compiler warnings cleanups
6) Add XRQ legacy commands opcodes
7) Use refcount_t for refcount
8) fix a -Wstringop-truncation warning
2019-08-28 11:48:56 -07:00
Jason Gunthorpe
a0d8994b30 Merge branch 'mlx5-odp-dc' into rdma.git for-next
Michael Guralnik says:

====================
The series adds support for on-demand paging for DC transport.

As DC is a mlx-only transport, the capabilities are exposed to the user
using DEVX objects and later on through mlx5dv_query_device.
====================

Based on the mlx5-next branch from
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux for
dependencies

* branch 'mlx5-odp-dc':
  IB/mlx5: Add page fault handler for DC initiator WQE
  IB/mlx5: Remove check of FW capabilities in ODP page fault handling
  net/mlx5: Set ODP capabilities for DC transport to max
2019-08-28 11:25:37 -03:00
Michael Guralnik
75e46fc02c IB/mlx5: Add page fault handler for DC initiator WQE
Parsing DC initiator WQEs upon page fault requires skipping an address
vector segment, as in UD WQEs.

Link: https://lore.kernel.org/r/20190819120815.21225-4-leon@kernel.org
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-08-28 11:18:40 -03:00
Michael Guralnik
29af94987b IB/mlx5: Remove check of FW capabilities in ODP page fault handling
As page fault handling is initiated by FW, there is no need to check that
the ODP supports the operation and transport.

Link: https://lore.kernel.org/r/20190819120815.21225-3-leon@kernel.org
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-08-28 11:18:39 -03:00
Mark Zhang
d8abe88450 RDMA/mlx5: RDMA_RX flow type support for user applications
Currently user applications can only steer TCP/IP(NIC RX/RX) traffic.
This patch adds RDMA_RX as a new flow type to allow the user to insert
steering rules to control RDMA traffic.
Two destinations are supported(but not set at the same time): devx
flow table object and QP.

Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20190819113626.20284-4-leon@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-26 10:30:00 -04:00
Jason Gunthorpe
c571feca2d RDMA/odp: use mmu_notifier_get/put for 'struct ib_ucontext_per_mm'
This is a significant simplification, no extra list is kept per FD, and
the interval tree is now shared between all the ucontexts, reducing
overhead if there are multiple ucontexts active.

Link: https://lore.kernel.org/r/20190806231548.25242-7-jgg@ziepe.ca
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-08-21 20:58:18 -03:00
Jason Gunthorpe
868df536f5 Merge branch 'odp_fixes' into rdma.git for-next
Jason Gunthorpe says:

====================
This is a collection of general cleanups for ODP to clarify some of the
flows around umem creation and use of the interval tree.
====================

The branch is based on v5.3-rc5 due to dependencies

* odp_fixes:
  RDMA/mlx5: Use odp instead of mr->umem in pagefault_mr
  RDMA/mlx5: Use ib_umem_start instead of umem.address
  RDMA/core: Make invalidate_range a device operation
  RDMA/odp: Use kvcalloc for the dma_list and page_list
  RDMA/odp: Check for overflow when computing the umem_odp end
  RDMA/odp: Provide ib_umem_odp_release() to undo the allocs
  RDMA/odp: Split creating a umem_odp from ib_umem_get
  RDMA/odp: Make the three ways to create a umem_odp clear
  RMDA/odp: Consolidate umem_odp initialization
  RDMA/odp: Make it clearer when a umem is an implicit ODP umem
  RDMA/odp: Iterate over the whole rbtree directly
  RDMA/odp: Use the common interval tree library instead of generic
  RDMA/mlx5: Fix MR npages calculation for IB_ACCESS_HUGETLB

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-08-21 14:10:36 -03:00
Jason Gunthorpe
fba0e448a2 RDMA/mlx5: Use odp instead of mr->umem in pagefault_mr
These are the same thing since mr always comes from odp->private. It is
confusing to reference the same memory via two names.

Link: https://lore.kernel.org/r/20190819111710.18440-13-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-08-21 14:08:43 -03:00
Jason Gunthorpe
a705f3e3a1 RDMA/mlx5: Use ib_umem_start instead of umem.address
These are subtly different, the address is the original VA requested
during umem_get, while ib_umem_start() is the version that is rounded to
the proper page size, ie is the true start of the umem's dma map.

Link: https://lore.kernel.org/r/20190819111710.18440-12-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-08-21 14:08:43 -03:00
Moni Shoua
ce51346fee RDMA/core: Make invalidate_range a device operation
The callback function 'invalidate_range' is implemented in a driver so the
place for it is in the ib_device_ops structure and not in ib_ucontext.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Guy Levi <guyle@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Link: https://lore.kernel.org/r/20190819111710.18440-11-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-08-21 14:08:43 -03:00
Jason Gunthorpe
0446cad9ca RDMA/odp: Provide ib_umem_odp_release() to undo the allocs
Now that there are allocator APIs that return the ib_umem_odp directly
it should be freed through a umem_odp free'er as well.

Link: https://lore.kernel.org/r/20190819111710.18440-8-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-08-21 14:08:42 -03:00
Jason Gunthorpe
261dc53f8e RDMA/odp: Split creating a umem_odp from ib_umem_get
This is the last creation API that is overloaded for both, there is very
little code sharing and a driver has to be specifically ready for a
umem_odp to be created to use the odp version.

Link: https://lore.kernel.org/r/20190819111710.18440-7-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-08-21 14:08:42 -03:00
Jason Gunthorpe
f20bef6a95 RDMA/odp: Make the three ways to create a umem_odp clear
The three paths to build the umem_odps are kind of muddled, they are:
- As a normal ib_mr umem
- As a child in an implicit ODP umem tree
- As the root of an implicit ODP umem tree

Only the first two are actually umem's, the last is an abuse.

The implicit case can only be triggered by explicit driver request, it
should never be co-mingled with the normal case. While we are here, make
sensible function names and add some comments to make this clearer.

Link: https://lore.kernel.org/r/20190819111710.18440-6-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-08-21 14:08:42 -03:00
Jason Gunthorpe
fd7dbf035e RDMA/odp: Make it clearer when a umem is an implicit ODP umem
Implicit ODP umems are special, they don't have any page lists, they don't
exist in the interval tree and they are never DMA mapped.

Instead of trying to guess this based on a zero length use an explicit
flag.

Further, do not allow non-implicit umems to be 0 size.

Link: https://lore.kernel.org/r/20190819111710.18440-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-08-21 14:08:42 -03:00
Jason Gunthorpe
f993de88a5 RDMA/odp: Iterate over the whole rbtree directly
Instead of intersecting a full interval, just iterate over every element
directly. This is faster and clearer.

Link: https://lore.kernel.org/r/20190819111710.18440-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-08-21 14:08:24 -03:00
Moni Shoua
841b07f99a IB/mlx5: Block MR WR if UMR is not possible
Check conditions that are mandatory to post_send UMR WQEs.
1. Modifying page size.
2. Modifying remote atomic permissions if atomic access is required.

If either condition is not fulfilled then fail to post_send() flow.

Fixes: c8d75a980f ("IB/mlx5: Respect new UMR capabilities")
Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Guy Levi <guyle@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20190815083834.9245-9-leon@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-20 13:44:45 -04:00
Moni Shoua
25a4517214 IB/mlx5: Fix MR re-registration flow to use UMR properly
The UMR WQE in the MR re-registration flow requires that
modify_atomic and modify_entity_size capabilities are enabled.
Therefore, check that the these capabilities are present before going to
umr flow and go through slow path if not.

Fixes: c8d75a980f ("IB/mlx5: Respect new UMR capabilities")
Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Guy Levi <guyle@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20190815083834.9245-8-leon@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-20 13:44:45 -04:00
Moni Shoua
008157528a IB/mlx5: Report and handle ODP support properly
ODP depends on the several device capabilities, among them is the ability
to send UMR WQEs with that modify atomic and entity size of the MR.
Therefore, only if all conditions to send such a UMR WQE are met then
driver can report that ODP is supported. Use this check of conditions
in all places where driver needs to know about ODP support.

Also, implicit ODP support depends on ability of driver to send UMR WQEs
for an indirect mkey. Therefore, verify that all conditions to do so are
met when reporting support.

Fixes: c8d75a980f ("IB/mlx5: Respect new UMR capabilities")
Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Guy Levi <guyle@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20190815083834.9245-7-leon@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-20 13:44:45 -04:00
Moni Shoua
0e6613b41e IB/mlx5: Consolidate use_umr checks into single function
Introduce helper function to unify various use_umr checks.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20190815083834.9245-6-leon@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-20 13:44:44 -04:00
Jason Gunthorpe
27b7fb1ab7 RDMA/mlx5: Fix MR npages calculation for IB_ACCESS_HUGETLB
When ODP is enabled with IB_ACCESS_HUGETLB then the required pages
should be calculated based on the extent of the MR, which is rounded
to the nearest huge page alignment.

Fixes: d2183c6f19 ("RDMA/umem: Move page_shift from ib_umem to ib_odp_umem")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20190815083834.9245-5-leon@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-20 13:44:43 -04:00
Leon Romanovsky
9dc4cfff11 RDMA/mlx5: Annotate lock dependency in bind/unbind slave port
NULL-ing notifier_call is performed under protection
of mlx5_ib_multiport_mutex lock. Such protection is
not easily spotted and better to be guarded by lockdep
annotation.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20190813102814.22350-1-leon@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-13 12:26:06 -04:00
Yishai Hadas
8293a598fe IB/mlx5: Expose XRQ legacy commands over the DEVX interface
Expose missing XRQ legacy commands over the DEVX interface.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20190808084358.29517-5-leon@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-13 12:24:32 -04:00
Yishai Hadas
972d7560ee IB/mlx5: Add legacy events to DEVX list
Add two events that were defined in the device specification but were
not exposed in the driver list.

Post this patch those events can be read over the DEVX events interface
once be reported by the firmware.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Edward Srouji <edwards@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20190808084358.29517-4-leon@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-13 12:24:17 -04:00
Doug Ledford
749b9eef50 Merge remote-tracking branch 'mlx5-next/mlx5-next' into wip/dl-for-next
Merging tip of mlx5-next in order to get changes related to adding
XRQ support to the DEVX interface needed prior to the following two
patches.

Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-13 12:19:19 -04:00
Yishai Hadas
e9eec6a55c IB/mlx5: Fix use-after-free error while accessing ev_file pointer
Call to uverbs_close_fd() releases file pointer to 'ev_file' and
mlx5_ib_dev is going to be inaccessible. Cache pointer prior cleaning
resources to solve the KASAN warning below.

BUG: KASAN: use-after-free in devx_async_event_close+0x391/0x480 [mlx5_ib]
Read of size 8 at addr ffff888301e3cec0 by task devx_direct_tes/4631
CPU: 1 PID: 4631 Comm: devx_direct_tes Tainted: G OE 5.3.0-rc1-for-upstream-dbg-2019-07-26_01-19-56-93 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu2 04/01/2014
Call Trace:
dump_stack+0x9a/0xeb
print_address_description+0x1e2/0x400
? devx_async_event_close+0x391/0x480 [mlx5_ib]
__kasan_report+0x15c/0x1df
? devx_async_event_close+0x391/0x480 [mlx5_ib]
kasan_report+0xe/0x20
devx_async_event_close+0x391/0x480 [mlx5_ib]
__fput+0x26a/0x7b0
task_work_run+0x10d/0x180
exit_to_usermode_loop+0x137/0x160
do_syscall_64+0x3c7/0x490
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f5df907d664
Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f
80 00 00 00 00 8b 05 6a cd 20 00 48 63 ff 85 c0 75 13 b8
03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 44 f3 c3 66 90
48 83 ec 18 48 89 7c 24 08 e8
RSP: 002b:00007ffd353cb958 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 000056017a88c348 RCX: 00007f5df907d664
RDX: 00007f5df969d400 RSI: 00007f5de8f1ec90 RDI: 0000000000000006
RBP: 00007f5df9681dc0 R08: 00007f5de8736410 R09: 000056017a9d2dd0
R10: 000000000000000b R11: 0000000000000246 R12: 00007f5de899d7d0
R13: 00007f5df96c4248 R14: 00007f5de8f1ecb0 R15: 000056017ae41308

Allocated by task 4631:
save_stack+0x19/0x80
kasan_kmalloc.constprop.3+0xa0/0xd0
alloc_uobj+0x71/0x230 [ib_uverbs]
alloc_begin_fd_uobject+0x2e/0xc0 [ib_uverbs]
rdma_alloc_begin_uobject+0x96/0x140 [ib_uverbs]
ib_uverbs_run_method+0xdf0/0x1940 [ib_uverbs]
ib_uverbs_cmd_verbs+0x57e/0xdb0 [ib_uverbs]
ib_uverbs_ioctl+0x177/0x260 [ib_uverbs]
do_vfs_ioctl+0x18f/0x1010
ksys_ioctl+0x70/0x80
__x64_sys_ioctl+0x6f/0xb0
do_syscall_64+0x95/0x490
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 4631:
save_stack+0x19/0x80
__kasan_slab_free+0x11d/0x160
slab_free_freelist_hook+0x67/0x1a0
kfree+0xb9/0x2a0
uverbs_close_fd+0x118/0x1c0 [ib_uverbs]
devx_async_event_close+0x28a/0x480 [mlx5_ib]
__fput+0x26a/0x7b0
task_work_run+0x10d/0x180
exit_to_usermode_loop+0x137/0x160
do_syscall_64+0x3c7/0x490
entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff888301e3cda8
which belongs to the cache kmalloc-512 of size 512
The buggy address is located 280 bytes inside of 512-byte region
[ffff888301e3cda8, ffff888301e3cfa8)
The buggy address belongs to the page:
page:ffffea000c078e00 refcount:1 mapcount:0
mapping:ffff888352811300 index:0x0 compound_mapcount: 0
flags: 0x2fffff80010200(slab|head)
raw: 002fffff80010200 ffffea000d152608 ffffea000c077808 ffff888352811300
raw: 0000000000000000 0000000000250025 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff888301e3cd80: fc fc fc fc fc fb fb fb fb fb fb fb fb fb fb fb
ffff888301e3ce00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888301e3ce80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888301e3cf00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888301e3cf80: fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc
Disabling lock debugging due to kernel taint

Cc: <stable@vger.kernel.org> # 5.2
Fixes: 7597385371 ("IB/mlx5: Enable subscription for device events over DEVX")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Link: https://lore.kernel.org/r/20190808081538.28772-1-leon@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-12 10:46:30 -04:00
Kamal Heib
72a7720fca RDMA: Introduce ib_port_phys_state enum
In order to improve readability, add ib_port_phys_state enum to replace
the use of magic numbers.

Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Andrew Boyer <aboyer@tobark.org>
Acked-by: Michal Kalderon <michal.kalderon@marvell.com>
Acked-by: Bernard Metzler <bmt@zurich.ibm.com>
Link: https://lore.kernel.org/r/20190807103138.17219-2-kamalheib1@gmail.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-12 10:18:52 -04:00
Dan Carpenter
e7e6c6320c IB/mlx5: Check the correct variable in error handling code
The code accidentally checks "event_sub" instead of "event_sub->eventfd".

Fixes: 7597385371 ("IB/mlx5: Enable subscription for device events over DEVX")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20190807123236.GA11452@mwanda
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-07 16:46:41 -04:00
Yishai Hadas
f591822c3c IB/mlx5: Fix implicit MR release flow
Once implicit MR is being called to be released by
ib_umem_notifier_release() its leaves were marked as "dying".

However, when dereg_mr()->mlx5_ib_free_implicit_mr()->mr_leaf_free() is
called, it skips running the mr_leaf_free_action (i.e. umem_odp->work)
when those leaves were marked as "dying".

As such ib_umem_release() for the leaves won't be called and their MRs
will be leaked as well.

When an application exits/killed without calling dereg_mr we might hit the
above flow.

This fatal scenario is reported by WARN_ON() upon
mlx5_ib_dealloc_ucontext() as ibcontext->per_mm_list is not empty, the
call trace can be seen below.

Originally the "dying" mark as part of ib_umem_notifier_release() was
introduced to prevent pagefault_mr() from returning a success response
once this happened. However, we already have today the completion
mechanism so no need for that in those flows any more.  Even in case a
success response will be returned the firmware will not find the pages and
an error will be returned in the following call as a released mm will
cause ib_umem_odp_map_dma_pages() to permanently fail mmget_not_zero().

Fix the above issue by dropping the "dying" from the above flows.  The
other flows that are using "dying" are still needed it for their
synchronization purposes.

   WARNING: CPU: 1 PID: 7218 at
   drivers/infiniband/hw/mlx5/main.c:2004
		  mlx5_ib_dealloc_ucontext+0x84/0x90 [mlx5_ib]
   CPU: 1 PID: 7218 Comm: ibv_rc_pingpong Tainted: G     E
	       5.2.0-rc6+ #13
   Call Trace:
   uverbs_destroy_ufile_hw+0xb5/0x120 [ib_uverbs]
   ib_uverbs_close+0x1f/0x80 [ib_uverbs]
   __fput+0xbe/0x250
   task_work_run+0x88/0xa0
   do_exit+0x2cb/0xc30
   ? __fput+0x14b/0x250
   do_group_exit+0x39/0xb0
   get_signal+0x191/0x920
   ? _raw_spin_unlock_bh+0xa/0x20
   ? inet_csk_accept+0x229/0x2f0
   do_signal+0x36/0x5e0
   ? put_unused_fd+0x5b/0x70
   ? __sys_accept4+0x1a6/0x1e0
   ? inet_hash+0x35/0x40
   ? release_sock+0x43/0x90
   ? _raw_spin_unlock_bh+0xa/0x20
   ? inet_listen+0x9f/0x120
   exit_to_usermode_loop+0x5c/0xc6
   do_syscall_64+0x182/0x1b0
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 81713d3788 ("IB/mlx5: Add implicit MR support")
Link: https://lore.kernel.org/r/20190805083010.21777-1-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-08-07 15:36:43 -03:00
Chuhong Yuan
94f3e14e00 mlx5: Use refcount_t for refcount
Reference counters are preferred to use refcount_t instead of
atomic_t.
This is because the implementation of refcount_t can prevent
overflows and detect possible use-after-free.
So convert atomic_t ref counters to refcount_t.

Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-07 11:01:48 -07:00
Mark Zhang
7084ed30ae IB/mlx5: Support MLX5_CMD_OP_QUERY_LAG as a DEVX general command
The "MLX5_CMD_OP_QUERY_LAG" is one of the DEVX general commands, add it.

Fixes: 8aa8c95ce4 ("IB/mlx5: Add support for DEVX general command")
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-08-04 11:24:48 +03:00
Leon Romanovsky
23eaf3b5c1 RDMA/mlx5: Release locks during notifier unregister
The below kernel panic was observed when created bond mode LACP
with GRE tunnel on top. The reason to it was not released spinlock
during mlx5 notify unregsiter sequence.

[  234.562007] BUG: scheduling while atomic: sh/10900/0x00000002
[  234.563005] Preemption disabled at:
[  234.566864] ------------[ cut here ]------------
[  234.567120] DEBUG_LOCKS_WARN_ON(val > preempt_count())
[  234.567139] WARNING: CPU: 16 PID: 10900 at kernel/sched/core.c:3203 preempt_count_sub+0xca/0x170
[  234.569550] CPU: 16 PID: 10900 Comm: sh Tainted: G        W 5.2.0-rc1-for-linust-dbg-2019-05-25_04-57-33-60 #1
[  234.569886] Hardware name: Dell Inc. PowerEdge R720/0X3D66, BIOS 2.6.1 02/12/2018
[  234.570183] RIP: 0010:preempt_count_sub+0xca/0x170
[  234.570404] Code: 03 38
d0 7c 08 84 d2 0f 85 b0 00 00 00 8b 15 dd 02 03 04 85 d2 75 ba 48 c7 c6
00 e1 88 83 48 c7 c7 40 e1 88 83 e8 76 11 f7 ff <0f> 0b 5b c3 65 8b 05
d3 1f d8 7e 84 c0 75 82 e8 62 c3 c3 00 85 c0
[  234.570911] RSP: 0018:ffff888b94477b08 EFLAGS: 00010286
[  234.571133] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
[  234.571391] RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000246
[  234.571648] RBP: ffff888ba5560000 R08: fffffbfff08962d5 R09: fffffbfff08962d5
[  234.571902] R10: 0000000000000001 R11: fffffbfff08962d4 R12: ffff888bac6e9548
[  234.572157] R13: ffff888babfaf728 R14: ffff888bac6e9568 R15: ffff888babfaf750
[  234.572412] FS: 00007fcafa59b740(0000) GS:ffff888bed200000(0000) knlGS:0000000000000000
[  234.572686] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  234.572914] CR2: 00007f984f16b140 CR3: 0000000b2bf0a001 CR4: 00000000001606e0
[  234.573172] Call Trace:
[  234.573336] _raw_spin_unlock+0x2e/0x50
[  234.573542] mlx5_ib_unbind_slave_port+0x1bc/0x690 [mlx5_ib]
[  234.573793] mlx5_ib_cleanup_multiport_master+0x1d3/0x660 [mlx5_ib]
[  234.574039] mlx5_ib_stage_init_cleanup+0x4c/0x360 [mlx5_ib]
[  234.574271]  ? kfree+0xf5/0x2f0
[  234.574465] __mlx5_ib_remove+0x61/0xd0 [mlx5_ib]
[  234.574688]  ? __mlx5_ib_remove+0xd0/0xd0 [mlx5_ib]
[  234.574951] mlx5_remove_device+0x234/0x300 [mlx5_core]
[  234.575224] mlx5_unregister_device+0x4d/0x1e0 [mlx5_core]
[  234.575493] remove_one+0x4f/0x160 [mlx5_core]
[  234.575704] pci_device_remove+0xef/0x2a0
[  234.581407]  ? pcibios_free_irq+0x10/0x10
[  234.587143]  ? up_read+0xc1/0x260
[  234.592785] device_release_driver_internal+0x1ab/0x430
[  234.598442] unbind_store+0x152/0x200
[  234.604064]  ? sysfs_kf_write+0x3b/0x180
[  234.609441]  ? sysfs_file_ops+0x160/0x160
[  234.615021] kernfs_fop_write+0x277/0x440
[  234.620288]  ? __sb_start_write+0x1ef/0x2c0
[  234.625512] vfs_write+0x15e/0x460
[  234.630786] ksys_write+0x156/0x1e0
[  234.635988]  ? __ia32_sys_read+0xb0/0xb0
[  234.641120]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[  234.646163] do_syscall_64+0x95/0x470
[  234.651106] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  234.656004] RIP: 0033:0x7fcaf9c9cfd0
[  234.660686] Code: 73 01
c3 48 8b 0d c0 6e 2d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00
83 3d cd cf 2d 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73
31 c3 48 83 ec 08 e8 ee cb 01 00 48 89 04 24
[  234.670128] RSP: 002b:00007ffd3b01ddd8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  234.674811] RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007fcaf9c9cfd0
[  234.679387] RDX: 000000000000000d RSI: 00007fcafa5c1000 RDI: 0000000000000001
[  234.683848] RBP: 00007fcafa5c1000 R08: 000000000000000a R09: 00007fcafa59b740
[  234.688167] R10: 00007ffd3b01d8e0 R11: 0000000000000246 R12: 00007fcaf9f75400
[  234.692386] R13: 000000000000000d R14: 0000000000000001 R15: 0000000000000000
[  234.696495] irq event stamp: 153067
[  234.700525] hardirqs last enabled at (153067): [<ffffffff83258c39>] _raw_spin_unlock_irqrestore+0x59/0x70
[  234.704665] hardirqs last disabled at (153066): [<ffffffff83259382>] _raw_spin_lock_irqsave+0x22/0x90
[  234.708722] softirqs last enabled at (153058): [<ffffffff836006c5>] __do_softirq+0x6c5/0xb4e
[  234.712673] softirqs last disabled at (153051): [<ffffffff81227c1d>] irq_exit+0x17d/0x1d0
[  234.716601] ---[ end trace 5dbf096843ee9ce6 ]---

Fixes: df097a278c ("IB/mlx5: Use the new mlx5 core notifier API")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20190731083852.584-1-leon@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-01 12:41:41 -04:00
Guy Levi
e5366d309a IB/mlx5: Fix MR registration flow to use UMR properly
Driver shouldn't allow to use UMR to register a MR when
umr_modify_atomic_disabled is set. Otherwise it will always end up with a
failure in the post send flow which sets the UMR WQE to modify atomic access
right.

Fixes: c8d75a980f ("IB/mlx5: Respect new UMR capabilities")
Signed-off-by: Guy Levi <guyle@mellanox.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20190731081929.32559-1-leon@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-01 11:49:51 -04:00
Leon Romanovsky
d129e3f422 RDMA/mlx5: Remove DEBUG ODP code
Delete DEBUG ODP dead code which is leftover from development
stage and doesn't need to be part of the upstream kernel.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20190731115627.5433-1-leon@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-07-31 15:02:02 -04:00
Doug Ledford
525a2c651c Merge branch 'wip/dl-for-rc' into wip/dl-for-next
The fix for IB port statistics initialization ("IB/core: Fix querying
total rdma stats") is needed before we take a follow-on patch to
for-next.

Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-07-29 13:38:42 -04:00
Parav Pandit
3e1f000ff7 IB/mlx5: Support per device q counters in switchdev mode
When parent mlx5_core_dev is in switchdev mode, q_counters are not
applicable to multiple non uplink vports.
Hence, have make them limited to device level.

While at it, correct __mlx5_ib_qp_set_counter() and
__mlx5_ib_modify_qp() to use u16 set_id as defined by the device.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20190723073117.7175-3-leon@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-07-29 11:33:28 -04:00
Parav Pandit
5dcecbc967 IB/mlx5: Refactor code for counters allocation
To support per device counters in switchdev mode (instead of
per port counter), refactor query routines to work on mlx5_ib_counter
structure instead of mlx5_ib_port structure.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20190723073117.7175-2-leon@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-07-29 11:33:28 -04:00
Max Gurtovoy
8b38c538d4 IB/mlx5: Add CREATE_PSV/DESTROY_PSV for devx interface
Limit the number of PSV's created through devx to 1, to create a symmetry
between create/destroy cmds. In the kernel, one can create up to 4 PSV's
using CREATE_PSV cmd but the destruction is one by one. Add a protection
for this a-symmetric definition for devx.

Link: https://lore.kernel.org/r/20190723070412.6385-1-leon@kernel.org
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-25 14:24:06 -03:00
Parav Pandit
a5c9c299d1 IB/mlx5: Avoid unnecessary typecast
IB device pointer is already available while deallocating IB device,
Hence do not typecast it.

Link: https://lore.kernel.org/r/20190723065733.4899-9-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-25 12:07:14 -03:00
Yishai Hadas
b7165bd0d6 IB/mlx5: Fix RSS Toeplitz setup to be aligned with the HW specification
The specification for the Toeplitz function doesn't require to set the key
explicitly to be symmetric. In case a symmetric functionality is required
a symmetric key can be simply used.

Wrongly forcing the algorithm to symmetric causes the wrong packet
distribution and a performance degradation.

Link: https://lore.kernel.org/r/20190723065733.4899-7-leon@kernel.org
Cc: <stable@vger.kernel.org> # 4.7
Fixes: 28d6137008 ("IB/mlx5: Add RSS QP support")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Alex Vainman <alexv@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-25 11:45:48 -03:00
Moni Shoua
296e3a2aad IB/mlx5: Prevent concurrent MR updates during invalidation
The device requires that memory registration work requests that update the
address translation table of a MR will be fenced if posted together.  This
scenario can happen when address ranges are invalidated by the mmu in
separate concurrent calls to the invalidation callback.

We prefer to block concurrent address updates for a single MR over fencing
since making the decision if a WQE needs fencing will be more expensive
and fencing all WQEs is a too radical choice.

Further, it isn't clear that this code can even run safely concurrently,
so a lock is a safer choice.

Fixes: b4cfe447d4 ("IB/mlx5: Implement on demand paging by adding support for MMU notifiers")
Link: https://lore.kernel.org/r/20190723065733.4899-8-leon@kernel.org
Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-25 11:39:38 -03:00
Yishai Hadas
b9332dad98 IB/mlx5: Fix clean_mr() to work in the expected order
Any dma map underlying the MR should only be freed once the MR is fenced
at the hardware.

As of the above we first destroy the MKEY and just after that can safely
call to dma_unmap_single().

Link: https://lore.kernel.org/r/20190723065733.4899-6-leon@kernel.org
Cc: <stable@vger.kernel.org> # 4.3
Fixes: 8a187ee52b ("IB/mlx5: Support the new memory registration API")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-24 16:50:03 -03:00
Yishai Hadas
9ec4483a3f IB/mlx5: Move MRs to a kernel PD when freeing them to the MR cache
Fix unreg_umr to move the MR to a kernel owned PD (i.e. the UMR PD) which
can't be accessed by userspace.

This ensures that nothing can continue to access the MR once it has been
placed in the kernels cache for reuse.

MRs in the cache continue to have their HW state, including DMA tables,
present. Even though the MR has been invalidated, changing the PD provides
an additional layer of protection against use of the MR.

Link: https://lore.kernel.org/r/20190723065733.4899-5-leon@kernel.org
Cc: <stable@vger.kernel.org> # 3.10
Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-24 16:43:55 -03:00
Yishai Hadas
afd1417404 IB/mlx5: Use direct mkey destroy command upon UMR unreg failure
Use a direct firmware command to destroy the mkey in case the unreg UMR
operation has failed.

This prevents a case that a mkey will leak out from the cache post a
failure to be destroyed by a UMR WR.

In case the MR cache limit didn't reach a call to add another entry to the
cache instead of the destroyed one is issued.

In addition, replaced a warn message to WARN_ON() as this flow is fatal
and can't happen unless some bug around.

Link: https://lore.kernel.org/r/20190723065733.4899-4-leon@kernel.org
Cc: <stable@vger.kernel.org> # 4.10
Fixes: 49780d42df ("IB/mlx5: Expose MR cache for mlx5_ib")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-24 16:43:19 -03:00
Yishai Hadas
6a05395373 IB/mlx5: Fix unreg_umr to ignore the mkey state
Fix unreg_umr to ignore the mkey state and do not fail if was freed.  This
prevents a case that a user space application already changed the mkey
state to free and then the UMR operation will fail leaving the mkey in an
inappropriate state.

Link: https://lore.kernel.org/r/20190723065733.4899-3-leon@kernel.org
Cc: <stable@vger.kernel.org> # 3.19
Fixes: 968e78dd96 ("IB/mlx5: Enhance UMR support to allow partial page table update")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-24 16:42:27 -03:00
Chuhong Yuan
b7f406bb88 IB/mlx5: Replace kfree with kvfree
Memory allocated by kvzalloc should not be freed by kfree(), use kvfree()
instead.

Fixes: 813e90b1ae ("IB/mlx5: Add advise_mr() support")
Link: https://lore.kernel.org/r/20190717082101.14196-1-hslester96@gmail.com
Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-22 15:19:40 -03:00
Leon Romanovsky
96e2fd733b RDMA/mlx5: Set RDMA DIM to be enabled by default
Enable RDMA DIM by default for better user experience.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-08 16:37:22 -03:00
Danit Goldberg
89705e9270 IB/mlx5: Report correctly tag matching rendezvous capability
Userspace expects the IB_TM_CAP_RC bit to indicate that the device
supports RC transport tag matching with rendezvous offload. However the
firmware splits this into two capabilities for eager and rendezvous tag
matching.

Only if the FW supports both modes should userspace be told the tag
matching capability is available.

Cc: <stable@vger.kernel.org> # 4.13
Fixes: eb76189435 ("IB/mlx5: Fill XRQ capabilities")
Signed-off-by: Danit Goldberg <danitg@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-08 14:26:37 -03:00
Jason Gunthorpe
20893d9da7 Merge branch 'vhca-tunnel' into rdma.git for-next
Max Gurtovoy says:

====================
Those two patches introduce VHCA tunnel mechanism to DEVX interface
needed for Bluefield SOC. See extensive commit messages for more
information.
====================

Based on the mlx5-next branch from
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux for
dependencies

* branch 'vcha-tunnel':
  IB/mlx5: Implement VHCA tunnel mechanism in DEVX
  net/mlx5: Introduce VHCA tunnel device capability
2019-07-08 13:48:55 -03:00
Max Gurtovoy
b6142608e8 IB/mlx5: Implement VHCA tunnel mechanism in DEVX
This mechanism will allow function-A to perform operations "on behalf" of
function-B via tunnel object. Function-A will have privileges for creating
and using this tunnel object.

For example, in the device emulation feature presented in Bluefield-1 SoC,
using device emulation capability, one can present NVMe function to the
host OS.

Since the NVMe function doesn't have a normal command interface to the HCA
HW, here is a need to create a channel that will be able to issue commands
"on behalf" of this function.

This channel is the VHCA_TUNNEL general object. The emulation software
will create this tunnel for every managed function and issue commands via
devx general cmd interface using the appropriate tunnel ID. When devX
context will receive a command with non-zero vhca_tunnel_id, it will pass
the command as-is down to the HCA.

All the validation, security and resource tracking of the commands and the
created tunneled objects is in the responsibility of the HCA FW. When a
VHCA_TUNNEL object destroyed, the device will issue an internal
FLR (function level reset) to the emulated function associated with this
tunnel. This will destroy all the created resources using the tunnel
mechanism.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-08 13:43:49 -03:00
Mark Zhang
18d422ce8c IB/mlx5: Add counter_alloc_stats() and counter_update_stats() support
Add support for ib callback counter_alloc_stats() and
counter_update_stats().

Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-05 10:22:55 -03:00
Mark Zhang
45842fc627 IB/mlx5: Support statistic q counter configuration
Add support for ib callbacks counter_bind_qp(), counter_unbind_qp() and
counter_dealloc().

Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-05 10:22:55 -03:00
Mark Zhang
318d535cef IB/mlx5: Add counter set id as a parameter for mlx5_ib_query_q_counters()
Add counter set id as a parameter so that this API can be used for
querying any q counter.

Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-05 10:22:55 -03:00
Mark Zhang
d14133dd41 IB/mlx5: Support set qp counter
Support bind a qp with counter. If counter is null then bind the qp to the
default counter. Different QP state has different operation:

- RESET: Set the counter field so that it will take effective during
  RST2INIT change;
- RTS: Issue an RTS2RTS change to update the QP counter;
- Other: Set the counter field and mark the counter_pending flag, when QP
  is moved to RTS state and this flag is set, then issue an RTS2RTS
  modification to update the counter.

Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-05 10:22:55 -03:00
Jason Gunthorpe
5600a410ea Merge mlx5-next into rdma for-next
From git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

Required for dependencies in the next patches.

* mlx5-next:
  net/mlx5: Add rts2rts_qp_counters_set_id field in hca cap
  net/mlx5: Properly name the generic WQE control field
  net/mlx5: Introduce TLS TX offload hardware bits and structures
  net/mlx5: Refactor mlx5_esw_query_functions for modularity
  net/mlx5: E-Switch prepare functions change handler to be modular
  net/mlx5: Introduce and use mlx5_eswitch_get_total_vports()
2019-07-05 10:16:19 -03:00
Leon Romanovsky
50ba3c18a4 RDMA/mlx5: Use proper allocation API to get zeroed memory
There is no need in custom memory zeroing, because it can be done
by using kzalloc from the beginning.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-04 14:08:18 -03:00
Yishai Hadas
5832fdd35e IB/mlx5: DEVX cleanup mdev
No need any more to hold mlx5_core_dev on the devx_object, it can be
accessed from ib_dev.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-03 17:13:44 -03:00
Yishai Hadas
ef1659ade3 IB/mlx5: Add DEVX support for CQ events
Add DEVX support for CQ events by creating and destroying the CQ via
mlx5_core and set an handler to manage its completions.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-03 17:12:38 -03:00
Yishai Hadas
5ec9d8ee87 IB/mlx5: Implement DEVX dispatching event
Implement DEVX dispatching event by looking up for the applicable
subscriptions for the reported event and using their target fd to
signal/set the event.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-03 17:12:38 -03:00
Yishai Hadas
7597385371 IB/mlx5: Enable subscription for device events over DEVX
Enable subscription for device events over DEVX.

Each subscription is added to the two level xarray data structure
according to its event number and the DEVX object information in case was
given with the given target fd.

Those events will be reported over the given fd once will occur.
Downstream patches will mange the dispatching to any subscription.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-03 17:12:38 -03:00
Yishai Hadas
e337dd53ce IB/mlx5: Register DEVX with mlx5_core to get async events
Register DEVX with with mlx5_core to get async events.  This will enable
to dispatch the applicable events to its consumers in down stream patches.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-03 17:12:37 -03:00
Yishai Hadas
2afc5e1b9c IB/mlx5: Introduce MLX5_IB_OBJECT_DEVX_ASYNC_EVENT_FD
Introduce MLX5_IB_OBJECT_DEVX_ASYNC_EVENT_FD and its initial
implementation.

This object is from type class FD and will be used to read DEVX
async events.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-03 17:11:10 -03:00
Parav Pandit
2752b82316 net/mlx5: Introduce and use mlx5_eswitch_get_total_vports()
Instead MLX5_TOTAL_VPORTS, use mlx5_eswitch_get_total_vports().
mlx5_eswitch_get_total_vports() in subsequent patch accounts for SF
vports as well.
Expanding MLX5_TOTAL_VPORTS macro would require exposing SF internals to
more generic vport.h header file. Such exposure is not desired.
Hence a mlx5_eswitch_get_total_vports() is introduced.

Given that mlx5_eswitch_get_total_vports() API wants to work on const
mlx5_core_dev*, change its helper functions also to accept const *dev.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-03 12:50:42 -07:00
Jason Gunthorpe
69ea0582f3 Merge mlx5-next into rdma for-next
From git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

Required for dependencies in the next patches.

Resolved the conflicts:
 - esw_destroy_offloads_acl_tables() use the newer mlx5_esw_for_all_vports()
   version
 - esw_offloads_steering_init() drop the cap test
 - esw_offloads_init() drop the extra function arguments

* branch 'mlx5-next': (39 commits)
  net/mlx5: Expose device definitions for object events
  net/mlx5: Report EQE data upon CQ completion
  net/mlx5: Report a CQ error event only when a handler was set
  net/mlx5: mlx5_core_create_cq() enhancements
  net/mlx5: Expose the API to register for ANY event
  net/mlx5: Use event mask based on device capabilities
  net/mlx5: Fix mlx5_core_destroy_cq() error flow
  net/mlx5: E-Switch, Handle UC address change in switchdev mode
  net/mlx5: E-Switch, Consider host PF for inline mode and vlan pop
  net/mlx5: E-Switch, Use iterator for vlan and min-inline setups
  net/mlx5: E-Switch, Reg/unreg function changed event at correct stage
  net/mlx5: E-Switch, Consolidate eswitch function number of VFs
  net/mlx5: E-Switch, Refactor eswitch SR-IOV interface
  net/mlx5: Handle host PF vport mac/guid for ECPF
  net/mlx5: E-Switch, Use correct flags when configuring vlan
  net/mlx5: Reduce dependency on enabled_vfs counter and num_vfs
  net/mlx5: Don't handle VF func change if host PF is disabled
  net/mlx5: Limit scope of mlx5_get_next_phys_dev() to PCI PF devices
  net/mlx5: Move pci status reg access mutex to mlx5_pci_init
  net/mlx5: Rename mlx5_pci_dev_type to mlx5_coredev_type
  ...

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-03 16:50:26 -03:00
Parav Pandit
2f40cf30c8 IB/mlx5: Fixed reporting counters on 2nd port for Dual port RoCE
Currently during dual port IB device registration in below code flow,

ib_register_device()
  ib_device_register_sysfs()
    ib_setup_port_attrs()
      add_port()
        get_counter_table()
          get_perf_mad()
            process_mad()
              mlx5_ib_process_mad()

mlx5_ib_process_mad() fails on 2nd port when both the ports are not fully
setup at the device level (because 2nd port is unaffiliated).

As a result, get_perf_mad() registers different PMA counter group for 1st
and 2nd port, namely pma_counter_ext and pma_counter. However both ports
have the same capability and counter offsets.

Due to this when counters are read by the user via sysfs in below code
flow, counters are queried from wrong location from the device mainly from
PPCNT instead of VPORT counters.

show_pma_counter()
  get_perf_mad()
    process_mad()
      mlx5_ib_process_mad()
        process_pma_cmd()

This shows all zero counters for 2nd port.

To overcome this, process_pma_cmd() is invoked, and when unaffiliated port
is not yet setup during device registration phase, make the query on the
first port.  while at it, only process_pma_cmd() needs to work on the
native port number and underlying mdev, so shift the get, put calls to
where its needed inside process_pma_cmd().

Fixes: 212f2a87b7 ("IB/mlx5: Route MADs for dual port RoCE")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-03 15:08:54 -03:00
Yishai Hadas
4e0e2ea188 net/mlx5: Report EQE data upon CQ completion
Report EQE data upon CQ completion to let upper layers use this data.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-07-03 21:00:20 +03:00
Yishai Hadas
38164b7719 net/mlx5: mlx5_core_create_cq() enhancements
Enhance mlx5_core_create_cq() to get the command out buffer from the
callers to let them use the output.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-07-03 20:59:32 +03:00
Yishai Hadas
b9a7ba5562 net/mlx5: Use event mask based on device capabilities
Use the reported device capabilities for the supported user events (i.e.
affiliated and un-affiliated) to set the EQ mask.

As the event mask can be up to 256 defined by 4 entries of u64 change
the applicable code to work accordingly.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-07-03 20:55:45 +03:00
Bodong Wang
f6455de0b0 net/mlx5: E-Switch, Refactor eswitch SR-IOV interface
Devlink eswitch mode is not necessarily related to SR-IOV, e.g, ECPF
can be at offload mode when SR-IOV is not enabled.

Rename the interface and eswitch mode names to decouple from SR-IOV,
and cleanup eswitch messages accordingly.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01 16:40:30 -07:00
Bodong Wang
b8ca123860 RDMA/mlx5: Cleanup rep when doing unload
When an IB rep is loaded, netdev for the same vport is saved for later
reference. However, it's not cleaned up when doing unload. For ECPF,
kernel crashes when driver is referring to the already removed netdev.

Following steps lead to a shown call trace:
1. Create n VFs from host PF
2. Distroy the VFs
3. Run "rdma link" from ARM

Call trace:
  mlx5_ib_get_netdev+0x9c/0xe8 [mlx5_ib]
  mlx5_query_port_roce+0x268/0x558 [mlx5_ib]
  mlx5_ib_rep_query_port+0x14/0x34 [mlx5_ib]
  ib_query_port+0x9c/0xfc [ib_core]
  fill_port_info+0x74/0x28c [ib_core]
  nldev_port_get_doit+0x1a8/0x1e8 [ib_core]
  rdma_nl_rcv_msg+0x16c/0x1c0 [ib_core]
  rdma_nl_rcv+0xe8/0x144 [ib_core]
  netlink_unicast+0x184/0x214
  netlink_sendmsg+0x288/0x354
  sock_sendmsg+0x18/0x2c
  __sys_sendto+0xbc/0x138
  __arm64_sys_sendto+0x28/0x34
  el0_svc_common+0xb0/0x100
  el0_svc_handler+0x6c/0x84
  el0_svc+0x8/0xc

Cleanup the rep and netdev reference when unloading IB rep.

Fixes: 26628e2d58 ("RDMA/mlx5: Move to single device multiport ports in switchdev mode")
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01 16:40:30 -07:00
Bodong Wang
2f69e591e4 {IB, net}/mlx5: E-Switch, Use index of rep for vport to IB port mapping
In the single IB device mode, the mapping between vport number and
rep relies on a counter. However for dynamic vport allocation, it is
desired to keep consistent map of eswitch vport and IB port.

Hence, simplify code to remove the free running counter and instead
use the available vport index during load/unload sequence from the
eswitch.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Suggested-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01 16:40:30 -07:00
Jason Gunthorpe
371bb62158 Linux 5.2-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl0Os1seHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGtx4H/j6i482XzcGFKTBm
 A7mBoQpy+kLtoUov4EtBAR62OuwI8rsahW9di37QKndPoQrczWaKBmr3De6LCdPe
 v3pl3O6wBbvH5ru+qBPFX9PdNbDvimEChh7LHxmMxNQq3M+AjZAZVJyfpoiFnx35
 Fbge+LZaH/k8HMwZmkMr5t9Mpkip715qKg2o9Bua6dkH0AqlcpLlC8d9a+HIVw/z
 aAsyGSU8jRwhoAOJsE9bJf0acQ/pZSqmFp0rDKqeFTSDMsbDRKLGq/dgv4nW0RiW
 s7xqsjb/rdcvirRj3rv9+lcTVkOtEqwk0PVdL9WOf7g4iYrb3SOIZh8ZyViaDSeH
 VTS5zps=
 =huBY
 -----END PGP SIGNATURE-----

Merge tag 'v5.2-rc6' into rdma.git for-next

For dependencies in next patches.

Resolve conflicts:
- Use uverbs_get_cleared_udata() with new cq allocation flow
- Continue to delete nes despite SPDX conflict
- Resolve list appends in mlx5_command_str()
- Use u16 for vport_rule stuff
- Resolve list appends in struct ib_client

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28 21:18:23 -03:00
Jianbo Liu
669ff1e32f RDMA/mlx5: Add vport metadata matching for IB representors
If vport metadata matching is enabled in eswitch, the rule created
must be changed to match on the metadata, instead of source port.

Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-26 12:01:29 -07:00
Jianbo Liu
bb0ee7dcc4 net/mlx5: Add flow context for flow tag
Refactor the flow data structures, add new flow_context and move
flow_tag into it, as flow_tag doesn't belong to the rule action.

Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-26 12:01:28 -07:00
Matthew Wilcox
792c4e9d0b net/mlx5: Convert mkey_table to XArray
The lock protecting the data structure does not need to be an rwlock.  The
only read access to the lock is in an error path, and if that's limiting
your scalability, you have bigger performance problems.

Eliminate mlx5_mkey_table in favour of using the xarray directly.
reg_mr_callback must use GFP_ATOMIC for allocating XArray nodes as it may
be called in interrupt context.

This also fixes a minor bug where SRCU locking was being used on the radix
tree read side, when RCU was needed too.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-24 16:44:40 -07:00
Max Gurtovoy
7796d2a3bb RDMA/mlx5: Refactor MR descriptors allocation
Improve code readability using static helpers for each memory region
type. Re-use the common logic to get smaller functions that are easy
to maintain and reduce code duplication.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:28 -03:00
Max Gurtovoy
2563e2f30a RDMA/mlx5: Use PA mapping for PI handover
If possibe, avoid doing a UMR operation to register data and protection
buffers (via MTT/KLM mkeys). Instead, use the local DMA key and map the
SG lists using PA access. This is safe, since the internal key for data
and protection never exposed to the remote server (only signature key
might be exposed). If PA mappings are not possible, perform mapping
using MTT/KLM descriptors.

The setup of the tested benchmark (using iSER ULP):
 - 2 servers with 24 cores (1 initiator and 1 target)
 - ConnectX-4/ConnectX-5 adapters
 - 24 target sessions with 1 LUN each
 - ramdisk backstore
 - PI active

Performance results running fio (24 jobs, 128 iodepth) using
write_generate=1 and read_verify=1 (w/w.o patch):

bs      IOPS(read)        IOPS(write)
----    ----------        ----------
512   1266.4K/1262.4K    1720.1K/1732.1K
4k    793139/570902      1129.6K/773982
32k   72660/72086        97229/96164

Using write_generate=0 and read_verify=0 (w/w.o patch):
bs      IOPS(read)        IOPS(write)
----    ----------        ----------
512   1590.2K/1600.1K    1828.2K/1830.3K
4k    1078.1K/937272     1142.1K/815304
32k   77012/77369        98125/97435

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Suggested-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:28 -03:00
Israel Rukshin
de0ae958de RDMA/mlx5: Improve PI handover performance
In some loads, there is performance degradation when using KLM mkey
instead of MTT mkey. This is because KLM descriptor access is via
indirection that might require more HW resources and cycles.
Using KLM descriptor is not necessary when there are no gaps at the
data/metadata sg lists. As an optimization, use MTT mkey whenever it
is possible. For that matter, allocate internal MTT mkey and choose the
effective pi_mr for in transaction according to the required mapping
scheme.

The setup of the tested benchmark (using iSER ULP):
 - 2 servers with 24 cores (1 initiator and 1 target)
 - ConnectX-4/ConnectX-5 adapters
 - 24 target sessions with 1 LUN each
 - ramdisk backstore
 - PI active

Performance results running fio (24 jobs, 128 iodepth) using
write_generate=1 and read_verify=1 (w/w.o/baseline):

bs      IOPS(read)                IOPS(write)
----    ----------                ----------
512   1262.4K/1243.3K/1147.1K    1732.1K/1725.1K/1423.8K
4k    570902/571233/457874       773982/743293/642080
32k   72086/72388/71933          96164/71789/93249

Using write_generate=0 and read_verify=0 (w/w.o patch):
bs      IOPS(read)                IOPS(write)
----    ----------                ----------
512   1600.1K/1572.1K/1393.3K    1830.3K/1823.5K/1557.2K
4k    937272/921992/762934       815304/753772/646071
32k   77369/75052/72058          97435/73180/94612

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Suggested-by: Max Gurtovoy <maxg@mellanox.com>
Suggested-by: Idan Burstein <idanb@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:28 -03:00
Israel Rukshin
5c171cbe3a RDMA/mlx5: Remove unused IB_WR_REG_SIG_MR code
IB_WR_REG_SIG_MR is not needed after IB_WR_REG_MR_INTEGRITY
was used.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:28 -03:00
Max Gurtovoy
185eddc457 RDMA/core: Validate integrity handover device cap
Protect the case that a ULP tries to allocate a QP with signature
enabled flag while the LLD doesn't support this feature.
While we're here, also move integrity_en attribute from mlx5_qp to
ib_qp as a preparation for adding new integrity API to the rw-API
(that is part of ib_core module).

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:27 -03:00
Israel Rukshin
c0a6cbb9cb RDMA/core: Rename signature qp create flag and signature device capability
Rename IB_QP_CREATE_SIGNATURE_EN to IB_QP_CREATE_INTEGRITY_EN
and IB_DEVICE_SIGNATURE_HANDOVER to IB_DEVICE_INTEGRITY_HANDOVER.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:27 -03:00
Max Gurtovoy
38ca87c6f1 RDMA/mlx5: Introduce and implement new IB_WR_REG_MR_INTEGRITY work request
This new WR will be used to perform PI (protection information) handover
using the new API. Using the new API, the user will post a single WR that
will internally perform all the needed actions to complete PI operation.
This new WR will use a memory region that was allocated as
IB_MR_TYPE_INTEGRITY and was mapped using ib_map_mr_sg_pi to perform the
registration. In the old API, in order to perform a signature handover
operation, each ULP should perform the following:
1. Map and register the data buffers.
2. Map and register the protection buffers.
3. Post a special reg WR to configure the signature handover operation
   layout.
4. Invalidate the signature memory key.
5. Invalidate protection buffers memory key.
6. Invalidate data buffers memory key.

In the new API, the mapping of both data and protection buffers is
performed using a single call to ib_map_mr_sg_pi function. Also the
registration of the buffers and the configuration of the signature
operation layout is done by a single new work request called
IB_WR_REG_MR_INTEGRITY.
This patch implements this operation for mlx5 devices that are capable to
offload data integrity generation/validation while performing the actual
buffer transfer.
This patch will not remove the old signature API that is used by the iSER
initiator and target drivers. This will be done in the future.

In the internal implementation, for each IB_WR_REG_MR_INTEGRITY work
request, we are using a single UMR operation to register both data and
protection buffers using KLM's.
Afterwards, another UMR operation will describe the strided block format.
These will be followed by 2 SET_PSV operations to set the memory/wire
domains initial signature parameters passed by the user.
In the end of the whole transaction, only the signature memory key
(the one that exposed for the RDMA operation) will be invalidated.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:27 -03:00
Max Gurtovoy
22465bba39 RDMA/mlx5: Update set_sig_data_segment attribute for new signature API
Explicitly pass the sig_mr and the access flags for the mkey segment
configuration. This function will be used also in the new signature
API, so modify it in order to use it in both APIs. This is a preparation
commit before adding new signature API.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:27 -03:00
Max Gurtovoy
9ac7c4bcd3 RDMA/mlx5: Pass UMR segment flags instead of boolean
UMR ctrl segment flags can vary between UMR operations. for example,
using inline UMR or adding free/not-free checks for a memory key.
This is a preparation commit before adding new signature API that
will not need not-free checks for the internal memory key during the
UMR operation.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:26 -03:00
Max Gurtovoy
62e3c379d4 RDMA/mlx5: Add attr for max number page list length for PI operation
PI offload (protection information) is a feature that each RDMA provider
can implement differently. Thus, introduce new device attribute to define
the maximal length of the page list for PI fast registration operation. For
example, mlx5 driver uses a single internal MR to map both data and
protection SGL's, so it's equal to max_fast_reg_page_list_len / 2.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:26 -03:00
Max Gurtovoy
6c984472ba RDMA/mlx5: Implement mlx5_ib_map_mr_sg_pi and mlx5_ib_alloc_mr_integrity
mlx5_ib_map_mr_sg_pi() will map the PI and data dma mapped SG lists to the
mlx5 memory region prior to the registration operation. In the new
API, the mlx5 driver will allocate an internal memory region for the
UMR operation to register both PI and data SG lists. The internal MR
will use KLM mode in order to map 2 (possibly non-contiguous/non-align)
SG lists using 1 memory key. In the new API, each ULP will use 1 memory
region for the signature operation (instead of 3 in the old API). This
memory region will have a key that will be exposed to remote server to
perform RDMA operation. The internal memory key that will map the SG lists
will stay private.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:26 -03:00
Leon Romanovsky
836a0fbb3e RDMA: Check umem pointer validity prior to release
Update ib_umem_release() to behave similarly to kfree() and allow
submitting NULL pointer as safe input to this function.

Fixes: a52c8e2469 ("RDMA: Clean destroy CQ in drivers do not return errors")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-20 15:17:59 -04:00
Leon Romanovsky
a49b1dc7ae RDMA: Convert destroy_wq to be void
All callers of destroy WQ are always success and there is no need
to check their return value, so convert destroy_wq to be void.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-20 14:37:02 -04:00
Maor Gottlieb
09d985bea9 RDMA/mlx5: Enable decap and packet reformat on FDB
If FDB flow tables support decap operation, enable it on creation,
This allows to perform decapsulation of tunnelled packets by steering
rules. If FDB flow tables support reformat operation, enable it on
creation as well.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Petr Vorel <pvorel@suse.cz>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-18 22:44:52 -04:00
Maor Gottlieb
cecae747b6 RDMA/mlx5: Consider eswitch encap mode
When flow steering is created, then the encap support should
consider the eswitch encap mode. If the eswitch flow table (FDB)
supports encap then it shouldn't be supported on NIC RX flow tables.

Fixes: 4adda1122c ('RDMA/mlx5: Enable decap and packet reformat on flow tables')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Petr Vorel <pvorel@suse.cz>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-18 22:44:52 -04:00
Doug Ledford
12dbc04db0 Merge remote-tracking branch 'mlx5-next/mlx5-next' into HEAD
Take mlx5-next so we can take a dependent two patch series next.

Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-18 22:44:36 -04:00
Yuval Avnery
1f8a7bee27 net/mlx5: Add EQ enable/disable API
Previously, EQ joined the chain notifier on creation.
This forced the caller to be ready to handle events before creating
the EQ through eq_create_generic interface.

To help the caller control when the created EQ will be attached to the
IRQ, add enable/disable API.

Signed-off-by: Yuval Avnery <yuvalav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-13 10:59:49 -07:00
Ariel Levkovich
81bfa20603 net/mlx5: Use a single IRQ for all async EQs
The patch modifies the IRQ allocation so that all async EQs are
assigned to the same IRQ resulting in more available IRQs for
completion EQs.

The changes are using the support for IRQ sharing and EQ polling budget
that was introduced in previous patches so when the shared interrupt is
triggered, the kernel will serially call the handler of each of the
sharing EQs with a certain budget of EQEs to poll in order to prevent
starvation.

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-13 10:59:49 -07:00
Yuval Avnery
24163189da net/mlx5: Separate IRQ request/free from EQ life cycle
Instead of requesting IRQ with eq creation, IRQs will be requested
before EQ table creation.
Instead of freeing the IRQs after EQ destroy, free IRQs after eq
table destroy.

Signed-off-by: Yuval Avnery <yuvalav@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-13 10:59:49 -07:00
Yuval Avnery
ca390799c2 net/mlx5: Change interrupt handler to call chain notifier
Multiple EQs may share the same IRQ in subsequent patches.

Instead of calling the IRQ handler directly, the EQ will register
to an atomic chain notfier.

The Linux built-in shared IRQ is not used because it forces the caller
to disable the IRQ and clear affinity before free_irq() can be called.

This patch is the first step in the separation of IRQ and EQ logic.

Signed-off-by: Yuval Avnery <yuvalav@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-13 10:59:49 -07:00
Leon Romanovsky
e39afe3d6d RDMA: Convert CQ allocations to be under core responsibility
Ensure that CQ is allocated and freed by IB/core and not by drivers.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Gal Pressman <galpress@amazon.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Tested-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-11 16:39:49 -04:00
Leon Romanovsky
a52c8e2469 RDMA: Clean destroy CQ in drivers do not return errors
Like all other destroy commands, .destroy_cq() call is not supposed
to fail. In all flows, the attempt to return earlier caused to memory
leaks.

This patch converts .destroy_cq() to do not return any errors.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Gal Pressman <galpress@amazon.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-11 16:17:10 -04:00
Jason Gunthorpe
7a15414252 RDMA: Move owner into struct ib_device_ops
This more closely follows how other subsytems work, with owner being a
member of the structure containing the function pointers.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-10 16:56:03 -03:00
Jason Gunthorpe
72c6ec18eb RDMA: Move uverbs_abi_ver into struct ib_device_ops
No reason for every driver to emit code to set this, just make it part of
the driver's existing static const ops structure.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-10 16:56:02 -03:00
Jason Gunthorpe
b9560a419b RDMA: Move driver_id into struct ib_device_ops
No reason for every driver to emit code to set this, just make it part of
the driver's existing static const ops structure.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-10 16:56:02 -03:00
Linus Torvalds
6e38335dcc 5.2 First rc pull request
The usual driver bug fixes and fixes for a couple of regressions introduced in
 5.2:
 
 - Fix a race on bootup with RDMA device renaming and srp. SRP also needs to
   rename its internal sys files
 
 - Fix a memory leak in hns
 
 - Don't leak resources in efa on certain error unwinds
 
 - Don't panic in certain error unwinds in ib_register_device
 
 - Various small user visible bug fix patches for the hfi and efa drivers
 
 - Fix the 32 bit compilation break
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAlz5c5oACgkQOG33FX4g
 mxpEEhAAk64phaKUih9KVT0MpC1zezB1C0EGKg45GKuMOFUFJQ5tZ0g4s6aDEbG3
 374ZE9h82HMgKn4tQ95110AKvCI+VAbbKOS7kzk1rLWE1ruJxk5DNsvp1v5/S3FE
 GXBSws+HtZtdiRAMTYyEOfz0MqpvghFg0vor4PugrmOuqIe2a0bkYPEzYPjYbaNH
 jSctd/q4s/o02n6gfbCrFpXsW0Va3OIaDX5a+Fx5+lWW+GPr/Uzk/3kN95mFbDRp
 XsCE80V+n3ceKSQUp0lYtxU3tm2mT1JpiiZjXuKyjRV8IMUS+xkdJ8scEz0upGcg
 +Jr74mN/xKT3toHaMv7fZ3RGlYgFsSsZcAApm6LrIlTNQXKjJ8hl+2BWdi4nRfYZ
 X89RRWEl3j8i6URu65iH7y7IlfFEhjJGmATUQFdrfECR9hBMJ8VHzBfcz7aYgoac
 Ggi+2Vjm7GQlr9mzW/phXb25PWqP5yVTW6/3BUtMs3oY7kd6vE2n9XzGIy13uBpX
 fzY/tnIMrgZMjphYPPbBAbwl+tBKZCu4k6lpP7cLsVsIwY0NIWS26JCnCdO0efqR
 SnAUPjoAV7nkpG3mMO9Qv7h7yar3HrG7ED15hfmB4VowRNQMfDoTLc8jVWDvGk4/
 aFBSH8dEjszZ5tMO9HL+RXnvpkRcDyQpfVQJttY5adZFQlUOd+0=
 =RmxY
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
 "Things are looking pretty quiet here in RDMA, not too many bug fixes
  rolling in right now. The usual driver bug fixes and fixes for a
  couple of regressions introduced in 5.2:

   - Fix a race on bootup with RDMA device renaming and srp. SRP also
     needs to rename its internal sys files

   - Fix a memory leak in hns

   - Don't leak resources in efa on certain error unwinds

   - Don't panic in certain error unwinds in ib_register_device

   - Various small user visible bug fix patches for the hfi and efa
     drivers

   - Fix the 32 bit compilation break"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/efa: Remove MAYEXEC flag check from mmap flow
  mlx5: avoid 64-bit division
  IB/hfi1: Validate page aligned for a given virtual address
  IB/{qib, hfi1, rdmavt}: Correct ibv_devinfo max_mr value
  IB/hfi1: Insure freeze_work work_struct is canceled on shutdown
  IB/rdmavt: Fix alloc_qpn() WARN_ON()
  RDMA/core: Fix panic when port_data isn't initialized
  RDMA/uverbs: Pass udata on uverbs error unwind
  RDMA/core: Clear out the udata before error unwind
  RDMA/hns: Fix PD memory leak for internal allocation
  RDMA/srp: Rename SRP sysfs name after IB device rename trigger
2019-06-07 09:25:27 -07:00
Parav Pandit
8693115af4 {IB,net}/mlx5: Constify rep ops functions pointers
Currently for every representor type and for every single vport,
representer function pointers copy is stored even though they don't
change from one to other vport.

Additionally priv data entry for the rep is not passed during
registration, but its copied. It is used (set and cleared) by the user
of the reps.

As we want to scale vports, to simplify and also to split constants
from data,

1. Rename mlx5_eswitch_rep_if to mlx5_eswitch_rep_ops as to match _ops
prefix with other standard netdev, ibdev ops.
2. Constify the IB and Ethernet rep ops structure.
3. Instead of storing copy of all rep function pointers, store copy
per eswitch rep type.
4. Split data and function pointers to mlx5_eswitch_rep_ops and
mlx5_eswitch_rep_data.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-05-31 12:28:14 -07:00
Parav Pandit
c94ff74877 {IB, net}/mlx5: No need to typecast from void* to mlx5_ib_dev*
Avoid typecasting from void* to mlx5_ib_dev* or mlx5e_rep_priv*
as it is not needed.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-05-31 12:28:14 -07:00
Michal Kubecek
37eb86c450 mlx5: avoid 64-bit division
Commit 25c13324d0 ("IB/mlx5: Add steering SW ICM device memory type")
breaks i386 build by introducing three 64-bit divisions. As the divisor is
MLX5_SW_ICM_BLOCK_SIZE() which is always a power of 2, we can replace the
division with bit operations.

Fixes: 25c13324d0 ("IB/mlx5: Add steering SW ICM device memory type")
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-29 13:03:21 -03:00
Linus Torvalds
2c1212de6f SPDX update for 5.2-rc2, round 1
Here are series of patches that add SPDX tags to different kernel files,
 based on two different things:
   - SPDX entries are added to a bunch of files that we missed a year ago
     that do not have any license information at all.
 
     These were either missed because the tool saw the MODULE_LICENSE()
     tag, or some EXPORT_SYMBOL tags, and got confused and thought the
     file had a real license, or the files have been added since the last
     big sweep, or they were Makefile/Kconfig files, which we didn't
     touch last time.
 
   - Add GPL-2.0-only or GPL-2.0-or-later tags to files where our scan
     tools can determine the license text in the file itself.  Where this
     happens, the license text is removed, in order to cut down on the
     700+ different ways we have in the kernel today, in a quest to get
     rid of all of these.
 
 These patches have been out for review on the linux-spdx@vger mailing
 list, and while they were created by automatic tools, they were
 hand-verified by a bunch of different people, all whom names are on the
 patches are reviewers.
 
 The reason for these "large" patches is if we were to continue to
 progress at the current rate of change in the kernel, adding license
 tags to individual files in different subsystems, we would be finished
 in about 10 years at the earliest.
 
 There will be more series of these types of patches coming over the next
 few weeks as the tools and reviewers crunch through the more "odd"
 variants of how to say "GPLv2" that developers have come up with over
 the years, combined with other fun oddities (GPL + a BSD disclaimer?)
 that are being unearthed, with the goal for the whole kernel to be
 cleaned up.
 
 These diffstats are not small, 3840 files are touched, over 10k lines
 removed in just 24 patches.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXOP8uw8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ynmGQCgy3evqzleuOITDpuWaxewFdHqiJYAnA7KRw4H
 1KwtfRnMtG6dk/XaS7H7
 =O9lH
 -----END PGP SIGNATURE-----

Merge tag 'spdx-5.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull SPDX update from Greg KH:
 "Here is a series of patches that add SPDX tags to different kernel
  files, based on two different things:

   - SPDX entries are added to a bunch of files that we missed a year
     ago that do not have any license information at all.

     These were either missed because the tool saw the MODULE_LICENSE()
     tag, or some EXPORT_SYMBOL tags, and got confused and thought the
     file had a real license, or the files have been added since the
     last big sweep, or they were Makefile/Kconfig files, which we
     didn't touch last time.

   - Add GPL-2.0-only or GPL-2.0-or-later tags to files where our scan
     tools can determine the license text in the file itself. Where this
     happens, the license text is removed, in order to cut down on the
     700+ different ways we have in the kernel today, in a quest to get
     rid of all of these.

  These patches have been out for review on the linux-spdx@vger mailing
  list, and while they were created by automatic tools, they were
  hand-verified by a bunch of different people, all whom names are on
  the patches are reviewers.

  The reason for these "large" patches is if we were to continue to
  progress at the current rate of change in the kernel, adding license
  tags to individual files in different subsystems, we would be finished
  in about 10 years at the earliest.

  There will be more series of these types of patches coming over the
  next few weeks as the tools and reviewers crunch through the more
  "odd" variants of how to say "GPLv2" that developers have come up with
  over the years, combined with other fun oddities (GPL + a BSD
  disclaimer?) that are being unearthed, with the goal for the whole
  kernel to be cleaned up.

  These diffstats are not small, 3840 files are touched, over 10k lines
  removed in just 24 patches"

* tag 'spdx-5.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (24 commits)
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 25
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 24
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 23
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 22
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 21
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 20
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 19
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 18
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 17
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 15
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 14
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 13
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 12
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 11
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 10
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 9
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 7
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 5
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 4
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 3
  ...
2019-05-21 12:33:38 -07:00
Jason Gunthorpe
d2183c6f19 RDMA/umem: Move page_shift from ib_umem to ib_odp_umem
This value has always been set to PAGE_SHIFT in the core code, the only
thing that does differently was the ODP path. Move the value into the ODP
struct and still use it for ODP, but change all the non-ODP things to just
use PAGE_SHIFT/PAGE_SIZE/PAGE_MASK directly.

Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-05-21 15:23:24 -03:00
Thomas Gleixner
ec8f24b7fa treewide: Add SPDX license identifier - Makefile/Kconfig
Add SPDX license identifiers to all Make/Kconfig files which:

 - Have no license information of any form

These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:

  GPL-2.0-only

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-21 10:50:46 +02:00
Linus Torvalds
78e0365184 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:1) Use after free in __dev_map_entry_free(), from Eric Dumazet.

 1) Use after free in __dev_map_entry_free(), from Eric Dumazet.

 2) Fix TCP retransmission timestamps on passive Fast Open, from Yuchung
    Cheng.

 3) Orphan NFC, we'll take the patches directly into my tree. From
    Johannes Berg.

 4) We can't recycle cloned TCP skbs, from Eric Dumazet.

 5) Some flow dissector bpf test fixes, from Stanislav Fomichev.

 6) Fix RCU marking and warnings in rhashtable, from Herbert Xu.

 7) Fix some potential fib6 leaks, from Eric Dumazet.

 8) Fix a _decode_session4 uninitialized memory read bug fix that got
    lost in a merge. From Florian Westphal.

 9) Fix ipv6 source address routing wrt. exception route entries, from
    Wei Wang.

10) The netdev_xmit_more() conversion was not done %100 properly in mlx5
    driver, fix from Tariq Toukan.

11) Clean up botched merge on netfilter kselftest, from Florian
    Westphal.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (74 commits)
  of_net: fix of_get_mac_address retval if compiled without CONFIG_OF
  net: fix kernel-doc warnings for socket.c
  net: Treat sock->sk_drops as an unsigned int when printing
  kselftests: netfilter: fix leftover net/net-next merge conflict
  mlxsw: core: Prevent reading unsupported slave address from SFP EEPROM
  mlxsw: core: Prevent QSFP module initialization for old hardware
  vsock/virtio: Initialize core virtio vsock before registering the driver
  net/mlx5e: Fix possible modify header actions memory leak
  net/mlx5e: Fix no rewrite fields with the same match
  net/mlx5e: Additional check for flow destination comparison
  net/mlx5e: Add missing ethtool driver info for representors
  net/mlx5e: Fix number of vports for ingress ACL configuration
  net/mlx5e: Fix ethtool rxfh commands when CONFIG_MLX5_EN_RXNFC is disabled
  net/mlx5e: Fix wrong xmit_more application
  net/mlx5: Fix peer pf disable hca command
  net/mlx5: E-Switch, Correct type to u16 for vport_num and int for vport_index
  net/mlx5: Add meaningful return codes to status_to_err function
  net/mlx5: Imply MLXFW in mlx5_core
  Revert "tipc: fix modprobe tipc failed after switch order of device registration"
  vsock/virtio: free packets during the socket release
  ...
2019-05-20 08:21:07 -07:00
Parav Pandit
02f3afd975 net/mlx5: E-Switch, Correct type to u16 for vport_num and int for vport_index
To avoid any ambiguity between vport index and vport number,
rename functions that had vport, to vport_num or vport_index appropriately.

vport_num is u16 hence change mlx5_eswitch_index_to_vport_num() return
type to u16.

vport_index is an int in vport array. Hence change input type of vport
index in mlx5_eswitch_index_to_vport_num() to int.

Correct multiple eswitch representor interfaces use type u16 of
rep->vport as type int vport_index.

Send vport FW commands with correct eswitch u16 vport_num instead
host int vport_index.

Fixes: 5ae5162066 ("net/mlx5: E-Switch, Assign a different position for uplink rep and vport")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Reviewed-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-05-17 13:16:47 -07:00
Linus Torvalds
5ac9433224 5.2 Merge Window second pull request
This is being sent to get a fix for the gcc 9.1 build warnings, and I've
 also pulled in some bug fix patches that were posted in the last two
 weeks.
 
 - Avoid the gcc 9.1 warning about overflowing a union member
 
 - Fix the wrong callback type for a single response netlink to doit
 
 - Bug fixes from more usage of the mlx5 devx interface
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAlzbYAsACgkQOG33FX4g
 mxpzYw/9HxKMpU5QmHIpV17sVV5SSepfWVQ6YmrNMG5BTBI8by0zj58fJ9TLuNu+
 OYMD6dS/baLeiN6jszec6zWufjUVfMU5aw1ja+iwF78fS8NmVXlrLz/xWmkLu4fi
 pBN3PCt90ziCnVXOlsn55dKAcgmiaRws+TzGjGGvQP9IYpfO6kyj8HIrP6im910E
 j41HcGrD1fMLy0js9Aq6OzMswbop8uFTV/UBp5onKASNPwAGlnigvjTKqnSlt+Vo
 rswc/h8uIz1jnuH1s8EfggFY7nGqxNmq9G/UNBo/86JcLI97SaYN9pqQJ+HcEtDR
 tJYoDr8PFDJcDaFpm0gbNK5pO9cS7X/I/NWZrdePywZAPAMFKXWgnUejLXVcPKd9
 EdkWyg7sJxPHoo6CXrNECu7t/57q3E3qOG93HnXt64pJqv9C9lUmpGrvdv7PBVRK
 6nVBysrkV0/27sBeZzul0teRbEqRii/RJ/iphE3w3hPx696Bi5uFzN/8M3tfavj1
 pBX7eLAevA+yPlN7+sZiefPjeP0jsvwlzNdrP+9CmB5iIlj0yNlmTvT2rbv+hte0
 0JTQvDilmC0e/W0KqQ6fGGfmPFBbHm/UDLu0h24qdw1qQXGOaDH6RRMslrtgNYNw
 Mkc++uIC6/KdiehEzolht87FH4sMJrd0DS540WVqJqje7K3jyY8=
 =Lo/s
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull more rdma updates from Jason Gunthorpe:
 "This is being sent to get a fix for the gcc 9.1 build warnings, and
  I've also pulled in some bug fix patches that were posted in the last
  two weeks.

   - Avoid the gcc 9.1 warning about overflowing a union member

   - Fix the wrong callback type for a single response netlink to doit

   - Bug fixes from more usage of the mlx5 devx interface"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  net/mlx5: Set completion EQs as shared resources
  IB/mlx5: Verify DEVX general object type correctly
  RDMA/core: Change system parameters callback from dumpit to doit
  RDMA: Directly cast the sockaddr union to sockaddr
2019-05-14 20:56:31 -07:00
Yishai Hadas
cd5d20f13f IB/mlx5: Verify DEVX general object type correctly
As the obj_id in the firmware is not globally unique in general_object,
the object type must be considered upon checking for a valid object id.

Fixes: 2351776e87 ("IB/mlx5: Verify DEVX object type")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-14 10:22:09 -03:00
Linus Torvalds
dce45af5c2 5.2 Merge Window pull request
This has been a smaller cycle than normal. One new driver was accepted,
 which is unusual, and at least one more driver remains in review on the
 list.
 
 - Driver fixes for hns, hfi1, nes, rxe, i40iw, mlx5, cxgb4, vmw_pvrdma
 
 - Many patches from MatthewW converting radix tree and IDR users to use
   xarray
 
 - Introduction of tracepoints to the MAD layer
 
 - Build large SGLs at the start for DMA mapping and get the driver to
   split them
 
 - Generally clean SGL handling code throughout the subsystem
 
 - Support for restricting RDMA devices to net namespaces for containers
 
 - Progress to remove object allocation boilerplate code from drivers
 
 - Change in how the mlx5 driver shows representor ports linked to VFs
 
 - mlx5 uapi feature to access the on chip SW ICM memory
 
 - Add a new driver for 'EFA'. This is HW that supports user space packet
   processing through QPs in Amazon's cloud
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAlzTIU0ACgkQOG33FX4g
 mxrGKQ/8CqpyvuCyZDW5ovO4DI4YlzYSPXehWlwxA4CWhU1AYTujutnNOdZdngnz
 atTthOlJpZWJV26orvvzwIOi4qX/5UjLXEY3HYdn07JP1Z4iT7E3P4W2sdU3vdl3
 j8bU7xM7ZWmnGxrBZ6yQlVRadEhB8+HJIZWMw+wx66cIPnvU+g9NgwouH67HEEQ3
 PU8OCtGBwNNR508WPiZhjqMDfi/3BED4BfCihFhMbZEgFgObjRgtCV0M33SSXKcR
 IO2FGNVuDAUBlND3vU9guW1+M77xE6p1GvzkIgdCp6qTc724NuO5F2ngrpHKRyZT
 CxvBhAJI6tAZmjBVnmgVJex7rA8p+y/8M/2WD6GE3XSO89XVOkzNBiO2iTMeoxXr
 +CX6VvP2BWwCArxsfKMgW3j0h/WVE9w8Ciej1628m1NvvKEV4AGIJC1g93lIJkRN
 i3RkJ5PkIrdBrTEdKwDu1FdXQHaO7kGgKvwzJ7wBFhso8BRMrMfdULiMbaXs2Bw1
 WdL5zoSe/bLUpPZxcT9IjXRxY5qR0FpIOoo6925OmvyYe/oZo1zbitS5GGbvV90g
 tkq6Jb+aq8ZKtozwCo+oMcg9QPLYNibQsnkL3QirtURXWCG467xdgkaJLdF6s5Oh
 cp+YBqbR/8HNMG/KQlCfnNQKp1ci8mG3EdthQPhvdcZ4jtbqnSI=
 =TS64
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "This has been a smaller cycle than normal. One new driver was
  accepted, which is unusual, and at least one more driver remains in
  review on the list.

  Summary:

   - Driver fixes for hns, hfi1, nes, rxe, i40iw, mlx5, cxgb4,
     vmw_pvrdma

   - Many patches from MatthewW converting radix tree and IDR users to
     use xarray

   - Introduction of tracepoints to the MAD layer

   - Build large SGLs at the start for DMA mapping and get the driver to
     split them

   - Generally clean SGL handling code throughout the subsystem

   - Support for restricting RDMA devices to net namespaces for
     containers

   - Progress to remove object allocation boilerplate code from drivers

   - Change in how the mlx5 driver shows representor ports linked to VFs

   - mlx5 uapi feature to access the on chip SW ICM memory

   - Add a new driver for 'EFA'. This is HW that supports user space
     packet processing through QPs in Amazon's cloud"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (186 commits)
  RDMA/ipoib: Allow user space differentiate between valid dev_port
  IB/core, ipoib: Do not overreact to SM LID change event
  RDMA/device: Don't fire uevent before device is fully initialized
  lib/scatterlist: Remove leftover from sg_page_iter comment
  RDMA/efa: Add driver to Kconfig/Makefile
  RDMA/efa: Add the efa module
  RDMA/efa: Add EFA verbs implementation
  RDMA/efa: Add common command handlers
  RDMA/efa: Implement functions that submit and complete admin commands
  RDMA/efa: Add the ABI definitions
  RDMA/efa: Add the com service API definitions
  RDMA/efa: Add the efa_com.h file
  RDMA/efa: Add the efa.h header file
  RDMA/efa: Add EFA device definitions
  RDMA: Add EFA related definitions
  RDMA/umem: Remove hugetlb flag
  RDMA/bnxt_re: Use core helpers to get aligned DMA address
  RDMA/i40iw: Use core helpers to get aligned DMA address within a supported page size
  RDMA/verbs: Add a DMA iterator to return aligned contiguous memory blocks
  RDMA/umem: Add API to find best driver supported page size in an MR
  ...
2019-05-09 09:02:46 -07:00
Linus Torvalds
80f232121b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Highlights:

   1) Support AES128-CCM ciphers in kTLS, from Vakul Garg.

   2) Add fib_sync_mem to control the amount of dirty memory we allow to
      queue up between synchronize RCU calls, from David Ahern.

   3) Make flow classifier more lockless, from Vlad Buslov.

   4) Add PHY downshift support to aquantia driver, from Heiner
      Kallweit.

   5) Add SKB cache for TCP rx and tx, from Eric Dumazet. This reduces
      contention on SLAB spinlocks in heavy RPC workloads.

   6) Partial GSO offload support in XFRM, from Boris Pismenny.

   7) Add fast link down support to ethtool, from Heiner Kallweit.

   8) Use siphash for IP ID generator, from Eric Dumazet.

   9) Pull nexthops even further out from ipv4/ipv6 routes and FIB
      entries, from David Ahern.

  10) Move skb->xmit_more into a per-cpu variable, from Florian
      Westphal.

  11) Improve eBPF verifier speed and increase maximum program size,
      from Alexei Starovoitov.

  12) Eliminate per-bucket spinlocks in rhashtable, and instead use bit
      spinlocks. From Neil Brown.

  13) Allow tunneling with GUE encap in ipvs, from Jacky Hu.

  14) Improve link partner cap detection in generic PHY code, from
      Heiner Kallweit.

  15) Add layer 2 encap support to bpf_skb_adjust_room(), from Alan
      Maguire.

  16) Remove SKB list implementation assumptions in SCTP, your's truly.

  17) Various cleanups, optimizations, and simplifications in r8169
      driver. From Heiner Kallweit.

  18) Add memory accounting on TX and RX path of SCTP, from Xin Long.

  19) Switch PHY drivers over to use dynamic featue detection, from
      Heiner Kallweit.

  20) Support flow steering without masking in dpaa2-eth, from Ioana
      Ciocoi.

  21) Implement ndo_get_devlink_port in netdevsim driver, from Jiri
      Pirko.

  22) Increase the strict parsing of current and future netlink
      attributes, also export such policies to userspace. From Johannes
      Berg.

  23) Allow DSA tag drivers to be modular, from Andrew Lunn.

  24) Remove legacy DSA probing support, also from Andrew Lunn.

  25) Allow ll_temac driver to be used on non-x86 platforms, from Esben
      Haabendal.

  26) Add a generic tracepoint for TX queue timeouts to ease debugging,
      from Cong Wang.

  27) More indirect call optimizations, from Paolo Abeni"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1763 commits)
  cxgb4: Fix error path in cxgb4_init_module
  net: phy: improve pause mode reporting in phy_print_status
  dt-bindings: net: Fix a typo in the phy-mode list for ethernet bindings
  net: macb: Change interrupt and napi enable order in open
  net: ll_temac: Improve error message on error IRQ
  net/sched: remove block pointer from common offload structure
  net: ethernet: support of_get_mac_address new ERR_PTR error
  net: usb: smsc: fix warning reported by kbuild test robot
  staging: octeon-ethernet: Fix of_get_mac_address ERR_PTR check
  net: dsa: support of_get_mac_address new ERR_PTR error
  net: dsa: sja1105: Fix status initialization in sja1105_get_ethtool_stats
  vrf: sit mtu should not be updated when vrf netdev is the link
  net: dsa: Fix error cleanup path in dsa_init_module
  l2tp: Fix possible NULL pointer dereference
  taprio: add null check on sched_nest to avoid potential null pointer dereference
  net: mvpp2: cls: fix less than zero check on a u32 variable
  net_sched: sch_fq: handle non connected flows
  net_sched: sch_fq: do not assume EDT packets are ordered
  net: hns3: use devm_kcalloc when allocating desc_cb
  net: hns3: some cleanup for struct hns3_enet_ring
  ...
2019-05-07 22:03:58 -07:00
Linus Torvalds
dd4e5d6106 Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb())
Remove mmiowb() from the kernel memory barrier API and instead, for
 architectures that need it, hide the barrier inside spin_unlock() when
 MMIO has been performed inside the critical section.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAlzMFaUACgkQt6xw3ITB
 YzRICQgAiv7wF/yIbBhDOmCNCAKDO59chvFQWxXWdGk/aAB56kwKAMXJgLOvlMG/
 VRuuLyParTFQETC3jaxKgnO/1hb+PZLDt2Q2KqixtjIzBypKUPWvK2sf6THhSRF1
 GK0DBVUd1rCrWrR815+SPb8el4xXtdBzvAVB+Fx35PXVNpdRdqCkK+EQ6UnXGokm
 rXXHbnfsnquBDtmb4CR4r2beH+aNElXbdt0Kj8VcE5J7f7jTdW3z6Q9WFRvdKmK7
 yrsxXXB2w/EsWXOwFp0SLTV5+fgeGgTvv8uLjDw+SG6t0E0PebxjNAflT7dPrbYL
 WecjKC9WqBxrGY+4ew6YJP70ijLBCw==
 =aC8m
 -----END PGP SIGNATURE-----

Merge tag 'arm64-mmiowb' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull mmiowb removal from Will Deacon:
 "Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb())

  Remove mmiowb() from the kernel memory barrier API and instead, for
  architectures that need it, hide the barrier inside spin_unlock() when
  MMIO has been performed inside the critical section.

  The only relatively recent changes have been addressing review
  comments on the documentation, which is in a much better shape thanks
  to the efforts of Ben and Ingo.

  I was initially planning to split this into two pull requests so that
  you could run the coccinelle script yourself, however it's been plain
  sailing in linux-next so I've just included the whole lot here to keep
  things simple"

* tag 'arm64-mmiowb' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (23 commits)
  docs/memory-barriers.txt: Update I/O section to be clearer about CPU vs thread
  docs/memory-barriers.txt: Fix style, spacing and grammar in I/O section
  arch: Remove dummy mmiowb() definitions from arch code
  net/ethernet/silan/sc92031: Remove stale comment about mmiowb()
  i40iw: Redefine i40iw_mmiowb() to do nothing
  scsi/qla1280: Remove stale comment about mmiowb()
  drivers: Remove explicit invocations of mmiowb()
  drivers: Remove useless trailing comments from mmiowb() invocations
  Documentation: Kill all references to mmiowb()
  riscv/mmiowb: Hook up mmwiob() implementation to asm-generic code
  powerpc/mmiowb: Hook up mmwiob() implementation to asm-generic code
  ia64/mmiowb: Add unconditional mmiowb() to arch_spin_unlock()
  mips/mmiowb: Add unconditional mmiowb() to arch_spin_unlock()
  sh/mmiowb: Add unconditional mmiowb() to arch_spin_unlock()
  m68k/io: Remove useless definition of mmiowb()
  nds32/io: Remove useless definition of mmiowb()
  x86/io: Remove useless definition of mmiowb()
  arm64/io: Remove useless definition of mmiowb()
  ARM/io: Remove useless definition of mmiowb()
  mmiowb: Hook up mmiowb helpers to spinlocks and generic I/O accessors
  ...
2019-05-06 16:57:52 -07:00
Leon Romanovsky
10bf13c334 RDMA/mlx5: Remove MAYEXEC flag
MAYEXEC flag was mistakenly added in the commit cited in the fixes line.

Fixes: 4eb6ab13b9 ("RDMA: Remove rdma_user_mmap_page")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-06 12:56:55 -03:00
Ariel Levkovich
33cde96fb5 IB/mlx5: Device resource control for privileged DEVX user
For DEVX users who have SYS_RAWIO capability, we set the internal device
resources capability when creating the UCTX.  This will allow the device
to restrict the allocation of internal device resources such as SW ICM
memory to privileged DEVX users only.

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-06 12:51:51 -03:00
Ariel Levkovich
25c13324d0 IB/mlx5: Add steering SW ICM device memory type
This patch adds support for allocating, deallocating and registering a new
device memory type, STEERING_SW_ICM.  This memory can be allocated and
used by a privileged user for direct rule insertion and management of the
device's steering tables.

The type is provided by the user via the dedicated attribute in the
alloc_dm ioctl command.

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-06 12:51:51 -03:00
Ariel Levkovich
4056b12efd IB/mlx5: Warn on allocated MEMIC buffers during cleanup
Adding a warning on allocated MEMIC buffers that weren't freed prior to
driver tear down.

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-06 12:51:51 -03:00
Ariel Levkovich
3b113a1ec3 IB/mlx5: Support device memory type attribute
This patch intoruduces a new mlx5_ib driver attribute to the DM allocation
method - the DM type.

In order to allow addition of new types in downstream patches this patch
also refactors the allocation, deallocation and registration handlers to
consider the requested type and perform the necessary actions according to
it.

Since not all future device memory types will be such that are mapped to
user memory, the mandatory page index output attribute is modified to be
optional.

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-06 12:51:50 -03:00
David S. Miller
f3f050a4df mlx5-updates-2019-04-30
mlx5 misc updates:
 
 1) Bodong Wang and Parav Pandit (6):
    - Remove unused mlx5_query_nic_vport_vlans
    - vport macros refactoring
    - Fix vport access in E-Switch
    - Use atomic rep state to serialize state change
 
 2) Eli Britstein (2):
    - prio tag mode support, added ACLs and replace TC vlan pop with
      vlan 0 rewrite when prio tag mode is enabled.
 
 3) Erez Alfasi (2):
    - ethtool: Add SFF-8436 and SFF-8636 max EEPROM length definitions
    - mlx5e: ethtool, Add support for EEPROM high pages query
 
 4) Masahiro Yamada (1):
    - remove meaningless CFLAGS_tracepoint.o
 
 5) Maxim Mikityanskiy (1):
    - Put the common XDP code into a function
 
 6) Tariq Toukan (2):
    - Turn on HW tunnel offload in all TIRs
 
 7) Vlad Buslov (1):
    - Return error when trying to insert existing flower filter
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJcyhIFAAoJEEg/ir3gV/o+LgsH/idNT42AQewm2gn1NAt/njRx
 hA/ILH4ZmqYD8tgme5q3lByGrGRTweCPQ92+/tYP1i90PL8EJKNFbRPXuORp+hUk
 m+ywoeyBHx0ZyDlAIGNDCFprY//jZV/3XQKuJhLUliGfN77lUSkVtIz2UY+cDr2U
 XBn0B3Fy54+XP7EqVHXdxRkLiwDCsDwZBF6O9/1cw/rKsly6fIzw1b7UVjFaFA8f
 1g5Ca/+v4X0Rsky1KOGLv8HVB4bxbiSZspAjKwVGJagPUNJMRR6xZyL+VNHWX71R
 N68VMQQbwg7XDDFQNtYAFSpxOkAY+wilkRDe7+3A50cFE8ZYYskwVJunvb75fCA=
 =oqb8
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2019-04-30' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2019-04-30

mlx5 misc updates:

1) Bodong Wang and Parav Pandit (6):
   - Remove unused mlx5_query_nic_vport_vlans
   - vport macros refactoring
   - Fix vport access in E-Switch
   - Use atomic rep state to serialize state change

2) Eli Britstein (2):
   - prio tag mode support, added ACLs and replace TC vlan pop with
     vlan 0 rewrite when prio tag mode is enabled.

3) Erez Alfasi (2):
   - ethtool: Add SFF-8436 and SFF-8636 max EEPROM length definitions
   - mlx5e: ethtool, Add support for EEPROM high pages query

4) Masahiro Yamada (1):
   - remove meaningless CFLAGS_tracepoint.o

5) Maxim Mikityanskiy (1):
   - Put the common XDP code into a function

6) Tariq Toukan (2):
   - Turn on HW tunnel offload in all TIRs

7) Vlad Buslov (1):
   - Return error when trying to insert existing flower filter
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-04 00:25:02 -04:00
Parav Pandit
a70c07397f RDMA: Introduce and use GID attr helper to read RoCE L2 fields
Instead of RoCE drivers figuring out vlan, smac fields while working on
QP/AH, provide a helper routine to read the L2 fields such as vlan_id and
source mac address.

This moves logic from mlx5 driver to core for wider usage for RoCE ports.

This is a preparation patch to allow detaching netdev in subsequent patch.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-03 11:10:02 -03:00
Jack Morgenstein
8f4426aa19 IB/mlx5: Add missing XRC options to QP optional params mask
The QP transition optional parameters for the various transition for XRC
QPs are identical to those for RC QPs.

Many of the XRC QP transition optional parameter bits are missing from the
QP optional mask table.  These omissions caused failures when doing XRC QP
state transitions.

For example, when trying to change the response timer of an XRC receive QP
via the RTS2RTS transition, the new timer value was ignored because
MLX5_QP_OPTPAR_RNR_TIMEOUT bit was missing from the optional params mask
for XRC qps for the RTS2RTS transition.

Fix this by adding the missing XRC optional parameters for all QP
transitions to the opt_mask table.

Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Fixes: a4774e9095 ("IB/mlx5: Fix opt param mask according to firmware spec")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-03 10:15:13 -03:00
David S. Miller
ff24e4980a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Three trivial overlapping conflicts.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-02 22:14:21 -04:00
Saeed Mahameed
c515e70d67 Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
This merge commit includes some misc shared code updates from mlx5-next branch needed
for net-next.

1) From Aya: Enable general events on all physical link types and
   restrict general event handling of subtype DELAY_DROP_TIMEOUT in mlx5 rdma
   driver to ethernet links only as it was intended.

2) From Eli: Introduce low level bits for prio tag mode

3) From Maor: Low level steering updates to support RDMA RX flow
   steering and enables RoCE loopback traffic when switchdev is enabled.

4) From Vu and Parav: Two small mlx5 core cleanups

5) From Yevgeny add HW definitions of geneve offloads

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-05-01 13:57:48 -07:00
Aya Levin
6cfdc7e468 IB/mlx5: Restrict 'DELAY_DROP_TIMEOUT' subtype to Ethernet interfaces
Subtype 'DELAY_DROP_TIMEOUT' (under 'GENERAL' event) is restricted to
Ethernet interfaces. This patch doesn't change functionality or breaks
current flow. In the downstream patch, non Ethernet (like IB) interfaces
will receive 'GENERAL' event.

Fixes: 5d3c537f90 ("net/mlx5: Handle event of power detection in the PCIE slot")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-04-29 16:55:05 -07:00
Vu Pham
c42260f195 net/mlx5: Separate and generalize dma device from pci device
The mlx5 Sub-Function (SF) sub device will be introduced in
subsequent patches. It will be created as mediated device and
belong to mdev bus. It is necessary to treat dma operations on
PF, VF and SF in uniform way, hence reduce the dependency on
pdev pci dev struct and work directly out of newly introduced
'struct device' from previous patch.

This patch does not change any functionality.

Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-04-29 16:55:05 -07:00
Jason Gunthorpe
1d045aa76f Merge branch 'mlx5_tir_icm' into rdma.git for-next
Ariel Levkovich says:

====================
The series exposes the ICM address of the receive transport
interface (TIR) of Raw Packet and RSS QPs to the user since they are
required to properly create and insert steering rules that direct flows to
these QPs.
====================

For dependencies this branch is based on mlx5-next from
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

* branch 'mlx5_tir_icm':
  IB/mlx5: Expose TIR ICM address to user space
  net/mlx5: Introduce new TIR creation core API
  net/mlx5: Expose TIR ICM address in command outbox

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-25 10:33:00 -03:00
Ariel Levkovich
1f1d6abbf0 IB/mlx5: Expose TIR ICM address to user space
This patch exposes the TIR ICM address of raw packet and RSS
QPs to user space.

In order to pass the new field, the patch extends the mlx5
specific QP creation response structure and fills it with
the icm address returned by the FW command, if available.

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-25 10:03:19 -03:00
Jason Gunthorpe
449a224c10 Merge branch 'rdma_mmap' into rdma.git for-next
Jason Gunthorpe says:

====================
Upon review it turns out there are some long standing problems in BAR
mapping area:
 * BAR pages intended for read-only can be switched to writable via mprotect.
 * Missing use of rdma_user_mmap_io for the mlx5 clock BAR page.
 * Disassociate causes SIGBUS when touching the pages.
 * CPU pages are being mapped through to the process via remap_pfn_range
   instead of the more appropriate vm_insert_page, causing weird behaviors
   during disassociation.

This series adds the missing VM_* flag manipulation, adds faulting a zero
page for disassociation and revises the CPU page mappings to use
vm_insert_page.
====================

For dependencies this branch is based on for-rc from
git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git

* branch 'rdma_mmap':
  RDMA: Remove rdma_user_mmap_page
  RDMA/mlx5: Use get_zeroed_page() for clock_info
  RDMA/ucontext: Fix regression with disassociate
  RDMA/mlx5: Use rdma_user_map_io for mapping BAR pages
  RDMA/mlx5: Do not allow the user to write to the clock page

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-24 16:20:34 -03:00
Jason Gunthorpe
4eb6ab13b9 RDMA: Remove rdma_user_mmap_page
Upon further research drivers that want this should simply call the core
function vm_insert_page(). The VMA holds a reference on the page and it
will be automatically freed when the last reference drops. No need for
disassociate to sequence the cleanup.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-24 16:18:36 -03:00
Jason Gunthorpe
ddcdc368b1 RDMA/mlx5: Use get_zeroed_page() for clock_info
get_zeroed_page() returns a virtual address for the page which is better
than allocating a struct page and doing a permanent kmap on it.

Cc: stable@vger.kernel.org
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-24 13:40:50 -03:00
Jason Gunthorpe
d5e560d3f7 RDMA/mlx5: Use rdma_user_map_io for mapping BAR pages
Since mlx5 supports device disassociate it must use this API for all
BAR page mmaps, otherwise the pages can remain mapped after the device
is unplugged causing a system crash.

Cc: stable@vger.kernel.org
Fixes: 5f9794dc94 ("RDMA/ucontext: Add a core API for mmaping driver IO memory")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-04-24 13:06:40 -03:00
Jason Gunthorpe
c660133c33 RDMA/mlx5: Do not allow the user to write to the clock page
The intent of this VMA was to be read-only from user space, but the
VM_MAYWRITE masking was missed, so mprotect could make it writable.

Cc: stable@vger.kernel.org
Fixes: 5c99eaecb1 ("IB/mlx5: Mmap the HCA's clock info to user-space")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-04-24 13:06:24 -03:00
Saeed Mahameed
c3bdd5e651 Linux 5.1-rc1
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAlyOup0eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGHKoIAIKVuBSyD+m65TaM
 pjoAFa56weEc67Mmai2A84EOm0MVy9C6L7EOcOgVsJiLxDCYyWQ7xYwV2kceKJpW
 H5xauhb3+TxpxYeaeKdPPPHmBdejRwOPYvGAfnDMCqCCWQTad52sQUPCLI+yhF1t
 wgnuMi+SwNBWP9aYCXdFPK4fVhh27AcEAOEsRVCh4tIBH/wkf4GwrDr3IX1MFeMX
 jE/R43la4hu1swcWBsjkErWUasVPCgJSSQTfKDo9PQTVnoh0PHFp4fkOInVKLymQ
 7AGo+Knc+1he+sFsB2IbZwea0xqtJtjtr1oC+at8gNx66qVG+o7UZNi5LR1uPW4Z
 4+dwGBk=
 =pyXR
 -----END PGP SIGNATURE-----

Merge tag 'v5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into mlx5-next

Linux 5.1-rc1

We forgot to reset the branch last merge window thus mlx5-next is outdated
and still based on 5.0-rc2. This merge commit is needed to sync mlx5-next
branch with 5.1-rc1.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-04-22 15:25:39 -07:00
Mark Bloch
5fb58c9e2f RDMA/mlx5: Don't create IB representors when in multiport RoCE mode
Switchdev mode and mutiport RoCE mode aren't compatible at this point.
Don't create IB reps when a user switches to switchdev mode and the driver
operates in that mode.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-22 15:24:05 -03:00
Mark Bloch
d3b5cc1cd9 RDMA/mlx5: Initialize roce port info before multiport master init
When working in mutliport RoCE mode it is possible to attach a slave
before the master. In that case the slave is waiting for a master to be
attached.  When the master is attached it goes over the list of waiting
slaves, finds a slave that is compatible and tries to bind it to itself.

The call stack is:
mlx5_ib_init_multiport_master() -> mlx5_ib_bind_slave_port()

In the bind function we will create a netdev notifier, but this is done
before we initialize the RoCE structure (this is done at a later stage by
the master in the ROCE stage).

Once events are delivered to that notifier we will use
mlx5_ib_get_native_port_mdev() to get the actual port and as the native
port is zero we will access an invalid index in the port structure.

Move the RoCE structure initialization to an earlier stage.

Fixes: 32f69e4be2 ("{net, IB}/mlx5: Manage port association for multiport RoCE")
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-22 15:24:05 -03:00
Mark Bloch
7f575103b0 RDMA/mlx5: Allow DEVX and raw creation flow on reps
Remove the limitations that were in place and provide support for DEVX and
raw flow creation on reps.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-22 15:24:05 -03:00
Maor Gottlieb
56e5acd405 RDMA/mlx5: Add query e-switch vport context to devx white list
Add MLX5_OP_QUERY_ESW_VPORT_CONTEXT to devx white list. It will be allowed
only if HCA_CAP.eswitch_manager==1.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-22 15:24:05 -03:00
Mark Bloch
52438be441 RDMA/mlx5: Allow inserting a steering rule to the FDB
Allow this only via mlx5 raw create flow API, legacy verbs are not
supported. To accommodate that, we add a new attribute to matcher creation
to indicate the type of flow table to be used.
	MLX5_IB_ATTR_FLOW_MATCHER_FT_TYPE
With this new attribute MLX5_IB_ATTR_FLOW_MATCHER_FLOW_FLAGS is no longer
needed, we keep it for compatibility but at most only a single attribute can
be passed of the two.

When inserting a flow rule to the FDB we require that a DEVX FT is
provided as a destination, no other configuration is allowed.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-22 15:24:05 -03:00
Mark Bloch
3b70508a6b RDMA/mlx5: Create flow table with max size supported
Instead of failing the request, just use the supported number of flow
entries.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-22 15:24:05 -03:00
Mark Bloch
13a4376568 RDMA/mlx5: Access the prio bypass inside the FDB flow table namespace
Now that we have a specific prio inside the FDB namespace allow retrieving
it from the RDMA side.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-22 15:24:05 -03:00
Guy Levi
7249c8ea22 IB/mlx5: Fix scatter to CQE in DCT QP creation
When scatter to CQE is enabled on a DCT QP it corrupts the mailbox command
since it tried to treat it as as QP create mailbox command instead of a
DCT create command.

The corrupted mailbox command causes userspace to malfunction as the
device doesn't create the QP as expected.

A new mlx5 capability is exposed to user-space which ensures that it will
not enable the feature on DCT without this fix in the kernel.

Fixes: 5d6ff1babe ("IB/mlx5: Support scatter to CQE for DC transport type")
Signed-off-by: Guy Levi <guyle@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-18 03:13:41 -03:00
David S. Miller
6b0a7f84ea Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflict resolution of af_smc.c from Stephen Rothwell.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-17 11:26:25 -07:00
Colin Ian King
1db86318c4 RDMA/mlx5: Check for error return in flow_rule rather than err
Currently when the call to create_flow_rule_vport_sq fails, the error
check is being performed on err rather than on the return pointer
flow_rule.  The return flow_rule maybe NULL (which is not considered an
error) or an error code, so check for the error on flow_rule.

Addresses-Coverity: ("Uninitialized scalar variable")
Fixes: d5ed8ac34c ("RDMA/mlx5: Move default representors SQ steering to rule to modify QP")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-12 11:19:49 -03:00
Mark Bloch
fb652d3299 RDMA/mlx5: Remove VF representor profile
Now that we have a single IB device with multiple ports we can remove the
VF representor profile.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-10 15:05:40 -03:00
Mark Bloch
26628e2d58 RDMA/mlx5: Move to single device multiport ports in switchdev mode
Move from IB device (representor) per virtual function to single IB device
with port per virtual function (port 1 represents the uplink). As number
of ports is a static property of an IB device, declare the IB device with
as many port as the possible according to the PCI bus.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-10 15:05:39 -03:00
Mark Bloch
a989ea01cb RDMA/mlx5: Move SMI caps logic
We store the SMI information in the core device's struct, make sure we set
that information only once (and not per port), while here make the for
loop based on the actual size of the array.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-10 15:05:39 -03:00
Mark Bloch
35b0aa67b2 RDMA/mlx5: Refactor netdev affinity code
The design of representors is such that once an IB representor is created,
the netdev of representor already exists, we can use that fact to simplify
the netdev affinity code.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-10 15:05:39 -03:00
Mark Bloch
d5ed8ac34c RDMA/mlx5: Move default representors SQ steering to rule to modify QP
Currently the steering for SQs created on representors is done on
creation, once we move to representors as ports of an IB device we need
the port argument which is given only at the modify QP stage, adjust the
code appropriately.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-10 15:05:39 -03:00
Mark Bloch
6a4d00be08 RDMA/mlx5: Move rep into port struct
In preparation of moving into a model of single IB device multiple ports
move rep to be part of the port structure. We mark a representor device by
setting is_rep, no functional change with this patch.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-10 15:05:39 -03:00
Mark Bloch
5d8f6a0e92 RDMA/mlx5: Use correct size for device resources
On allocation we use the array size and on destruction num_ports, use the
array size of destruction as well, in this context the array corresponds
to the native/actual ports on the NIC so no need to adjust this logic for
representors.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-10 15:05:39 -03:00
Mark Bloch
da796ccb3e RDMA/mlx5: Move ports allocation to outside of INIT stage
In downstream patches we will need access to the ports before doing any
stages, in order to set net device per representor.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-10 15:05:39 -03:00
Mark Bloch
4a6dc8552a RDMA/mlx5: Free IB device on remove
Simplify the code and move the deallocation of the IB device into the
remove function.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-10 15:05:39 -03:00
Mark Bloch
95579e785a RDMA/mlx5: Move netdev info into the port struct
Netdev info is stored in a separate array and holds data relevant on a per
port basis, move it to be part of the port struct.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-10 15:05:39 -03:00
Jason Gunthorpe
5331fa0db7 Merge branch 'mlx5-next' into rdma.git for-next
From
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

Required for dependencies on the next series

* branch 'mlx5-next':
  net/mlx5: E-Switch, add a new prio to be used by the RDMA side
  net/mlx5: E-Switch, don't use hardcoded values for FDB prios
  net/mlx5: Fix false compilation warning
  net/mlx5: Expose MPEIN (Management PCIE INfo) register layout
  net/mlx5: Add rate limit print macros
  net/mlx5: Add explicit bar address field
  net/mlx5: Replace dev_err/warn/info by mlx5_core_err/warn/info
  net/mlx5: Use dev->priv.name instead of dev_name
  net/mlx5: Make mlx5_core messages independent from mdev->pdev
  net/mlx5: Break load_one into three stages
  net/mlx5: Function setup/teardown procedures
  net/mlx5: Move health and page alloc init to mdev_init
  net/mlx5: Split mdev init and pci init
  net/mlx5: Remove redundant init functions parameter
  net/mlx5: Remove spinlock support from mlx5_write64
  net/mlx5: Remove unused MLX5_*_DOORBELL_LOCK macros

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-10 14:59:27 -03:00
Leon Romanovsky
68e326dea1 RDMA: Handle SRQ allocations by IB/core
Convert SRQ allocation from drivers to be in the IB/core

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-08 13:05:25 -03:00
Leon Romanovsky
d345691471 RDMA: Handle AH allocations by IB/core
Simplify drivers by ensuring lifetime of ib_ah object. The changes
in .create_ah() go hand in hand with relevant update in .destroy_ah().

We will use this opportunity and convert .destroy_ah() to don't fail, as
it was suggested a long time ago, because there is nothing to do in case
of failure during destroy.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-08 13:05:25 -03:00
Jason Gunthorpe
e79c9c6062 IB/mlx5: Remove references to uboject->context
These should all go through udata now. Add mlx5_udata_to_mdev to convert
a udata into the struct mlx5_ib_dev as these call sites require.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-08 13:05:25 -03:00
Shiraz Saleem
d10bcf947a RDMA/umem: Combine contiguous PAGE_SIZE regions in SGEs
Combine contiguous regions of PAGE_SIZE pages into single scatter list
entry while building the scatter table for a umem. This minimizes the
number of the entries in the scatter list and reduces the DMA mapping
overhead, particularly with the IOMMU.

Set default max_seg_size in core for IB devices to 2G and do not combine
if we exceed this limit.

Also, purge npages in struct ib_umem as we now DMA map the umem SGL with
sg_nents and npage computation is not needed. Drivers should now be using
ib_umem_num_pages(), so fix the last stragglers.

Move npages tracking to ib_umem_odp as ODP drivers still need it.

Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Acked-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Tested-by: Gal Pressman <galpress@amazon.com>
Tested-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-08 13:05:24 -03:00
Will Deacon
fb24ea52f7 drivers: Remove explicit invocations of mmiowb()
mmiowb() is now implied by spin_unlock() on architectures that require
it, so there is no reason to call it from driver code. This patch was
generated using coccinelle:

	@mmiowb@
	@@
	- mmiowb();

and invoked as:

$ for d in drivers include/linux/qed sound; do \
spatch --include-headers --sp-file mmiowb.cocci --dir $d --in-place; done

NOTE: mmiowb() has only ever guaranteed ordering in conjunction with
spin_unlock(). However, pairing each mmiowb() removal in this patch with
the corresponding call to spin_unlock() is not at all trivial, so there
is a small chance that this change may regress any drivers incorrectly
relying on mmiowb() to order MMIO writes between CPUs using lock-free
synchronisation. If you've ended up bisecting to this commit, you can
reintroduce the mmiowb() calls using wmb() instead, which should restore
the old behaviour on all architectures other than some esoteric ia64
systems.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-04-08 12:01:02 +01:00
Saeed Mahameed
b6460c72c3 Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
This merge commit includes some misc shared code updates from mlx5-next branch needed
for net-next.

1) From Maxim, Remove un-used macros and spinlock from mlx5 code.

2) From Aya, Expose Management PCIE info register layout and add rate limit
print macros.

3) From Tariq, Compilation warning fix in fs_core.c

4) From Vu, Huy and Saeed, Improve mlx5 initialization flow:
The goal is to provide a better logical separation of mlx5 core
device initialization flow and will help to seamlessly support
creating different mlx5 device types such as PF, VF and SF
mlx5 sub-function virtual devices.

Mlx5_core driver needs to separate HCA resources from pci resources.
Its initialize/load/unload will be broken into stages:
1. Initialize common data structures
2. Setup function which initializes pci resources (for PF/VF)
   or some other specific resources for virtual device
3. Initialize software objects according to hardware capabilities
4. Load all mlx5_core components

It is also necessary to detach mlx5_core mdev name/message from pci
device mdev->pdev name/message for a clearer report/debug of
different mlx5 device types.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-04-05 14:10:16 -07:00
Leon Romanovsky
0f51427bd0 RDMA/mlx5: Cleanup WQE page fault handler
Refactor the page fault handler to be more readable and extensible, this
cleanup was triggered by the error reported below. The code structure made
it unclear to the automatic tools to identify that such a flow is not
possible in real life because "requestor != NULL" means that "qp != NULL"
too.

    drivers/infiniband/hw/mlx5/odp.c:1254 mlx5_ib_mr_wqe_pfault_handler()
    error: we previously assumed 'qp' could be null (see line 1230)

Fixes: 08100fad5c ("IB/mlx5: Add ODP SRQ support")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-04 08:28:59 -03:00
Jason Gunthorpe
1c726c4421 Merge HFI1 updates into k.o/for-next
Based on rdma.git for-rc for dependencies.

From Dennis Dalessandro:

====================

Here are some code improvement patches and fixes for less serious bugs to
TID RDMA than we sent for RC.

====================

* HFI1 updates:
  IB/hfi1: Implement CCA for TID RDMA protocol
  IB/hfi1: Remove WARN_ON when freeing expected receive groups
  IB/hfi1: Unify the software PSN check for TID RDMA READ/WRITE
  IB/hfi1: Add a function to read next expected psn from hardware flow
  IB/hfi1: Delay the release of destination mr for TID RDMA WRITE DATA

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-03 15:28:05 -03:00
Huy Nguyen
aa8106f137 net/mlx5: Add explicit bar address field
Add bar_addr field to store bar-0 address to avoid calling
pci_resource_start with hard-coded bar-0 as parameter.
Also note that different mlx5 device types will have bar_addr
on different bars.

This patch does not change any functionality.

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-04-02 12:49:38 -07:00
Maxim Mikityanskiy
bbf29f618e net/mlx5: Remove spinlock support from mlx5_write64
As there is no user of mlx5_write64 that passes a spinlock to
mlx5_write64, remove this functionality and simplify the function.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-04-02 12:49:37 -07:00
Shamir Rabinovitch
ff23dfa134 IB: Pass only ib_udata in function prototypes
Now when ib_udata is passed to all the driver's object create/destroy APIs
the ib_udata will carry the ib_ucontext for every user command. There is
no need to also pass the ib_ucontext via the functions prototypes.

Make ib_udata the only argument psssed.

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-01 15:00:47 -03:00
Shamir Rabinovitch
bdeacabd1a IB: Remove 'uobject->context' dependency in object destroy APIs
Now that we have the udata passed to all the ib_xxx object destroy APIs
and the additional macro 'rdma_udata_to_drv_context' to get the
ib_ucontext from ib_udata stored in uverbs_attr_bundle, we can finally
start to remove the dependency of the drivers in the
ib_xxx->uobject->context.

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-01 14:59:35 -03:00
Shamir Rabinovitch
c4367a2635 IB: Pass uverbs_attr_bundle down ib_x destroy path
The uverbs_attr_bundle with the ucontext is sent down to the drivers ib_x
destroy path as ib_udata. The next patch will use the ib_udata to free the
drivers destroy path from the dependency in 'uobject->context' as we
already did for the create path.

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-01 14:57:35 -03:00
Shamir Rabinovitch
a6a3797df2 IB: Pass uverbs_attr_bundle down uobject destroy path
Pass uverbs_attr_bundle down the uobject destroy path. The next patch will
use this to eliminate the dependecy of the drivers in ib_x->uobject
pointers.

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-01 14:55:36 -03:00
Bart Van Assche
1f687edee2 IB/mlx5: Declare devx_async_cmd_event_fops static
Avoid that sparse complains about a missing declaration.

Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Fixes: 6bf8f22aea ("IB/mlx5: Introduce MLX5_IB_OBJECT_DEVX_ASYNC_CMD_FD")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-28 10:22:48 -03:00
Artemy Kovalyov
d623dfd283 IB/mlx5: Compare only index part of a memory window rkey
The InfiniBand Architecture Specification section 10.6.7.2.4 TYPE 2 MEMORY
WINDOWS says that if the CI supports the Base Memory Management Extensions
defined in this specification, the R_Key format for a Type 2 Memory Window
must consist of:

* 24 bit index in the most significant bits of the R_Key, which is owned
  by the CI, and
* 8 bit key in the least significant bits of the R_Key, which is owned by
  the Consumer.

This means that the kernel should compare only the index part of a R_Key
to determine equality with another R_Key.

Fixes: db570d7dea ("IB/mlx5: Add ODP support to MW")
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-27 15:27:56 -03:00
Artemy Kovalyov
1e5887b700 IB/mlx5: WQE dump jumps over first 16 bytes
Move index increment after its is used or otherwise it will start the dump
of the WQE from second WQE BB.

Fixes: 34f4c9554d ("IB/mlx5: Use fragmented QP's buffer for in-kernel users")
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-27 15:27:33 -03:00
Moni Shoua
1abe186ed8 IB/mlx5: Reset access mask when looping inside page fault handler
If page-fault handler spans multiple MRs then the access mask needs to
be reset before each MR handling or otherwise write access will be
granted to mapped pages instead of read-only.

Cc: <stable@vger.kernel.org> # 3.19
Fixes: 7bdf65d411 ("IB/mlx5: Handle page faults")
Reported-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-27 15:27:11 -03:00
Matthew Wilcox
b02a29eb88 mlx5: Convert mlx5_srq_table to XArray
Remove the custom spinlock as the XArray handles its own locking.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-26 09:39:42 -03:00
Aya Levin
cd27287562 IB/mlx5: Fix mapping of link-mode to IB width and speed
Add mapping of link mode: CAUI4 100Gbps CR4/KR4 with 4 lines and 25Gbps.
Fix mapping of link mode: GAUI2 50Gbps CR2/KR2 to be 2 lines with 25Gbps.

Fixes: 08e8676f16 ("IB/mlx5: Add support for 50Gbps per lane link modes")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-17 21:40:39 -03:00
Yishai Hadas
c5ae1954c4 IB/mlx5: Use mlx5 core to create/destroy a DEVX DCT
To prevent a hardware memory leak when a DEVX DCT object is destroyed
without calling DRAIN DCT before, (e.g. under cleanup flow), need to
manage its creation and destruction via mlx5 core.

In that case the DRAIN DCT command will be called and only once that it
will be completed the DESTROY DCT command will be called.  Otherwise, the
DESTROY DCT may fail and a hardware leak may occur.

As of that change the DRAIN DCT command should not be exposed any more
from DEVX, it's managed internally by the driver to work as expected by
the device specification.

Fixes: 7efce3691d ("IB/mlx5: Add obj create and destroy functionality")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-17 21:40:39 -03:00
Linus Torvalds
a50243b1dd 5.1 Merge Window Pull Request
This has been a slightly more active cycle than normal with ongoing core
 changes and quite a lot of collected driver updates.
 
 - Various driver fixes for bnxt_re, cxgb4, hns, mlx5, pvrdma, rxe
 
 - A new data transfer mode for HFI1 giving higher performance
 
 - Significant functional and bug fix update to the mlx5 On-Demand-Paging MR
   feature
 
 - A chip hang reset recovery system for hns
 
 - Change mm->pinned_vm to an atomic64
 
 - Update bnxt_re to support a new 57500 chip
 
 - A sane netlink 'rdma link add' method for creating rxe devices and fixing
   the various unregistration race conditions in rxe's unregister flow
 
 - Allow lookup up objects by an ID over netlink
 
 - Various reworking of the core to driver interface:
   * Drivers should not assume umem SGLs are in PAGE_SIZE chunks
   * ucontext is accessed via udata not other means
   * Start to make the core code responsible for object memory
     allocation
   * Drivers should convert struct device to struct ib_device
     via a helper
   * Drivers have more tools to avoid use after unregister problems
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAlyAJYYACgkQOG33FX4g
 mxrWwQ/+OyAx4Moru7Aix0C6GWxTJp/wKgw21CS3reZxgLai6x81xNYG/s2wCNjo
 IccObVd7mvzyqPdxOeyHBsJBbQDqWvoD6O2duH8cqGMgBRgh3CSdUep2zLvPpSAx
 2W1SvWYCLDnCuarboFrCA8c4AN3eCZiqD7z9lHyFQGjy3nTUWzk1uBaOP46uaiMv
 w89N8EMdXJ/iY6ONzihvE05NEYbMA8fuvosKLLNdghRiHIjbMQU8SneY23pvyPDd
 ZziPu9NcO3Hw9OVbkwtJp47U3KCBgvKHmnixyZKkikjiD+HVoABw2IMwcYwyBZwP
 Bic/ddONJUvAxMHpKRnQaW7znAiHARk21nDG28UAI7FWXH/wMXgicMp6LRcNKqKF
 vqXdxHTKJb0QUR4xrYI+eA8ihstss7UUpgSgByuANJ0X729xHiJtlEvPb1DPo1Dz
 9CB4OHOVRl5O8sA5Jc6PSusZiKEpvWoyWbdmw0IiwDF5pe922VLl5Nv88ta+sJ38
 v2Ll5AgYcluk7F3599Uh9D7gwp5hxW2Ph3bNYyg2j3HP4/dKsL9XvIJPXqEthgCr
 3KQS9rOZfI/7URieT+H+Mlf+OWZhXsZilJG7No0fYgIVjgJ00h3SF1/299YIq6Qp
 9W7ZXBfVSwLYA2AEVSvGFeZPUxgBwHrSZ62wya4uFeB1jyoodPk=
 =p12E
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "This has been a slightly more active cycle than normal with ongoing
  core changes and quite a lot of collected driver updates.

   - Various driver fixes for bnxt_re, cxgb4, hns, mlx5, pvrdma, rxe

   - A new data transfer mode for HFI1 giving higher performance

   - Significant functional and bug fix update to the mlx5
     On-Demand-Paging MR feature

   - A chip hang reset recovery system for hns

   - Change mm->pinned_vm to an atomic64

   - Update bnxt_re to support a new 57500 chip

   - A sane netlink 'rdma link add' method for creating rxe devices and
     fixing the various unregistration race conditions in rxe's
     unregister flow

   - Allow lookup up objects by an ID over netlink

   - Various reworking of the core to driver interface:
       - drivers should not assume umem SGLs are in PAGE_SIZE chunks
       - ucontext is accessed via udata not other means
       - start to make the core code responsible for object memory
         allocation
       - drivers should convert struct device to struct ib_device via a
         helper
       - drivers have more tools to avoid use after unregister problems"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (280 commits)
  net/mlx5: ODP support for XRC transport is not enabled by default in FW
  IB/hfi1: Close race condition on user context disable and close
  RDMA/umem: Revert broken 'off by one' fix
  RDMA/umem: minor bug fix in error handling path
  RDMA/hns: Use GFP_ATOMIC in hns_roce_v2_modify_qp
  cxgb4: kfree mhp after the debug print
  IB/rdmavt: Fix concurrency panics in QP post_send and modify to error
  IB/rdmavt: Fix loopback send with invalidate ordering
  IB/iser: Fix dma_nents type definition
  IB/mlx5: Set correct write permissions for implicit ODP MR
  bnxt_re: Clean cq for kernel consumers only
  RDMA/uverbs: Don't do double free of allocated PD
  RDMA: Handle ucontext allocations by IB/core
  RDMA/core: Fix a WARN() message
  bnxt_re: fix the regression due to changes in alloc_pbl
  IB/mlx4: Increase the timeout for CM cache
  IB/core: Abort page fault handler silently during owning process exit
  IB/mlx5: Validate correct PD before prefetch MR
  IB/mlx5: Protect against prefetch of invalid MR
  RDMA/uverbs: Store PR pointer before it is overwritten
  ...
2019-03-09 15:53:03 -08:00
Moni Shoua
7095ec3ca0 IB/mlx5: Set correct write permissions for implicit ODP MR
The write access of an implicit MR is inherited to all of its children.
Therefore we must set the correct write access to the parent MR.

Pass full access_flags when creating umem to let it calculate write access
correctly.

Fixes: da6a496a34 ("IB/mlx5: Ranges in implicit ODP MR inherit its write access")
Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-04 10:57:19 -04:00
Leon Romanovsky
a2a074ef39 RDMA: Handle ucontext allocations by IB/core
Following the PD conversion patch, do the same for ucontext allocations.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-22 14:11:37 -07:00
Moni Shoua
81dd4c4be3 IB/mlx5: Validate correct PD before prefetch MR
When prefetching odp mr it is required to verify that pd of the mr is
identical to the pd for which the advise_mr request arrived with.

This check was missing from synchronous flow and is added now.

Fixes: 813e90b1ae ("IB/mlx5: Add advise_mr() support")
Reported-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-21 16:32:45 -07:00
Moni Shoua
a6bc3875f1 IB/mlx5: Protect against prefetch of invalid MR
When deferring a prefetch request we need to protect against MR or PD
being destroyed while the request is still enqueued.

The first step is to validate that PD owns the lkey that describes the MR
and that the MR that the lkey refers to is owned by that PD.

The second step is to dequeue all requests when MR is destroyed.

Since PD can't be destroyed while it owns MRs it is guaranteed that when a
worker wakes up the request it refers to is still valid.

Now, it is possible to refrain from taking a reference on the device since
it is assured to be present as pd.

While that, replace the dedicated ordered workqueue with the system
unbound workqueue to reuse an existing resource and improve
performance. This will also fix a bug of queueing to the wrong workqueue.

Fixes: 813e90b1ae ("IB/mlx5: Add advise_mr() support")
Reported-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-21 16:32:45 -07:00
Jason Gunthorpe
815f748037 Merge branch 'mlx5-next' into rdma.git for-next
From
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

To resolve conflicts with net-next and pick up the first patch.

* branch 'mlx5-next':
  net/mlx5: Factor out HCA capabilities functions
  IB/mlx5: Add support for 50Gbps per lane link modes
  net/mlx5: Add support to ext_* fields introduced in Port Type and Speed register
  net/mlx5: Add new fields to Port Type and Speed register
  net/mlx5: Refactor queries to speed fields in Port Type and Speed register
  net/mlx5: E-Switch, Avoid magic numbers when initializing offloads mode
  net/mlx5: Relocate vport macros to the vport header file
  net/mlx5: E-Switch, Normalize the name of uplink vport number
  net/mlx5: Provide an alternative VF upper bound for ECPF
  net/mlx5: Add host params change event
  net/mlx5: Add query host params command
  net/mlx5: Update enable HCA dependency
  net/mlx5: Introduce Mellanox SmartNIC and modify page management logic
  IB/mlx5: Use unified register/load function for uplink and VF vports
  net/mlx5: Use consistent vport num argument type
  net/mlx5: Use void pointer as the type in address_of macro
  net/mlx5: Align ODP capability function with netdev coding style
  mlx5: use RCU lock in mlx5_eq_cq_get()

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-21 12:40:18 -07:00
Bodong Wang
f8e8fa0262 net/mlx5: E-Switch, Centralize repersentor reg/unreg to eswitch driver
Eswitch has two users: IB and ETH. They both register repersentors
when mlx5 interface is added, and unregister the repersentors when
mlx5 interface is removed. Ideally, each driver should only deal with
the entities which are unique to itself. However, current IB and ETH
drivers have to perform the following eswitch operations:

1. When registering, specify how many vports to register. This number
   is the same for both drivers which is the total available vport
   numbers.
2. When unregistering, specify the number of registered vports to do
   unregister. Also, unload the repersentors which are already loaded.

It's unnecessary for eswitch driver to hands out the control of above
operations to individual driver users, as they're not unique to each
driver. Instead, such operations should be centralized to eswitch
driver. This consolidates eswitch control flow, and simplified IB and
ETH driver.

This patch doesn't change any functionality.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-15 17:25:58 -08:00
Saeed Mahameed
259fae5a2c Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Merge mlx5-next shared branched into net-next,

From Bodong Wang:
1) Introduction of ECPF (Embedded CPU Physical Function), and low level
bits for mlx5 SmartNic capabilities support.
2) Vport enumeration refactoring that affect mlx5_ib and mlx5_core

From Aya Levin,
3) Add support for 50Gbps per lane link modes in the Port Type and Speed
register (PTYS)
4) Refactor low level query functions for PTYS register
5) Add support for 50Gbps per lane link modes to mlx5_ib

Note: due to a change in API in mlx5/core and a later patch from net-next,
a fixup was squashed with this merge commit that replaces FDB_UPLINK_VPORT
with MLX5_VPORT_UPLINK which exists only in upstream net-next.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-15 16:45:31 -08:00
Shamir Rabinovitch
8994445054 IB/{hw,sw}: Remove 'uobject->context' dependency in object creation APIs
Now when we have the udata passed to all the ib_xxx object creation APIs
and the additional macro 'rdma_udata_to_drv_context' to get the
ib_ucontext from ib_udata stored in uverbs_attr_bundle, we can finally
start to remove the dependency of the drivers in the
ib_xxx->uobject->context.

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-15 15:38:38 -07:00
Aya Levin
08e8676f16 IB/mlx5: Add support for 50Gbps per lane link modes
Driver now supports new link modes: 50Gbps per lane support for
50G/100G/200G. This patch reads the correct field (legacy vs. extended)
based on a FW indication bit, and adds a translation function (link
modes to IB width and speed) to the new link modes.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-14 12:14:42 -08:00
Aya Levin
a08b4ed137 net/mlx5: Add support to ext_* fields introduced in Port Type and Speed register
This patch exposes new link modes (including 50Gbps per lane), and ext_*
fields which describes the new link modes in Port Type and Speed
register (PTYS).
Access functions, translation functions (speed <-> HW bits) and
link max speed function were modified.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-14 12:14:42 -08:00
Aya Levin
bc4e12ffef net/mlx5: Refactor queries to speed fields in Port Type and Speed register
This patch fascicles queries to speed related fields in Port Type and
Speed register (PTYS) into a single API. I addition, this patch
refactors functions which serves only Ethernet driver: remove the
protocol type as an input parameter, move code from 'core' directory
into 'en' directory and add 'eth' prefix to the function's name. The
patch also encapsulates functions that are not used outside the Ethernet
driver removes redundant include files.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-14 12:14:42 -08:00
Bodong Wang
b05af6aacd net/mlx5: E-Switch, Normalize the name of uplink vport number
Driver used to name uplink vport as FDB_UPLINK_VPORT, it's hard to
comply with the same naming convention along with the introduction of
other vports. Use MLX5_VPORT as the prefix for such vports and
relocate the uplink vport definition to public header file for the
benefits of both net and IB drivers.

This patch doesn't change any functionality.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-14 12:14:42 -08:00
Bodong Wang
f0666f1f22 IB/mlx5: Use unified register/load function for uplink and VF vports
IB driver maintains different registration and load function calls
for uplink and VF vports. This is not necessary as they only differ
with each other on their profiles.

This patch doesn't change any functionality.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-14 12:14:41 -08:00
Mark Bloch
fc9e4477f9 RDMA/mlx5: Fix memory leak in case we fail to add an IB device
Make sure the IB device is freed on failure.

Fixes: b5ca15ad7e ("IB/mlx5: Add proper representors support")
Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-11 15:32:29 -07:00
Yishai Hadas
0da4d48d99 IB/mlx5: Fix bad flow upon DEVX mkey creation
Fix bad flow upon DEVX mkey creation to prevent deleting the indirect mkey
from the radix tree in case there was a previous failure to insert it.

Fixes: 534fd7aac5 ("IB/mlx5: Manage indirection mkey upon DEVX flow for ODP")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-11 15:30:40 -07:00
Leon Romanovsky
21a428a019 RDMA: Handle PD allocations by IB/core
The PD allocations in IB/core allows us to simplify drivers and their
error flows in their .alloc_pd() paths. The changes in .alloc_pd() go hand
in had with relevant update in .dealloc_pd().

We will use this opportunity and convert .dealloc_pd() to don't fail, as
it was suggested a long time ago, failures are not happening as we have
never seen a WARN_ON print.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-08 16:51:04 -07:00
Gal Pressman
af8b38ed0b IB/mlx5: Simplify WQE count power of two check
Use is_power_of_2() instead of hard coding it in the driver. While at it,
fix the meaningless error print.

Signed-off-by: Gal Pressman <galpress@amazon.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-07 13:14:55 -07:00
Bart Van Assche
bf3b4f066d IB/mlx5: Do not use hw_access_flags for be and CPU data
Avoid that sparse reports the following for the mlx5 driver:

drivers/infiniband/hw/mlx5/qp.c:2671:34: warning: invalid assignment: |=
drivers/infiniband/hw/mlx5/qp.c:2671:34:    left side has type restricted __be32
drivers/infiniband/hw/mlx5/qp.c:2671:34:    right side has type int
drivers/infiniband/hw/mlx5/qp.c:2679:34: warning: invalid assignment: |=
drivers/infiniband/hw/mlx5/qp.c:2679:34:    left side has type restricted __be32
drivers/infiniband/hw/mlx5/qp.c:2679:34:    right side has type int
drivers/infiniband/hw/mlx5/qp.c:2680:34: warning: invalid assignment: |=
drivers/infiniband/hw/mlx5/qp.c:2680:34:    left side has type restricted __be32
drivers/infiniband/hw/mlx5/qp.c:2680:34:    right side has type int
drivers/infiniband/hw/mlx5/qp.c:2684:34: warning: invalid assignment: |=
drivers/infiniband/hw/mlx5/qp.c:2684:34:    left side has type restricted __be32
drivers/infiniband/hw/mlx5/qp.c:2684:34:    right side has type int
drivers/infiniband/hw/mlx5/qp.c:2686:28: warning: cast from restricted __be32
drivers/infiniband/hw/mlx5/qp.c:2686:28: warning: incorrect type in argument 1 (different base types)
drivers/infiniband/hw/mlx5/qp.c:2686:28:    expected unsigned int [usertype] val
drivers/infiniband/hw/mlx5/qp.c:2686:28:    got restricted __be32 [usertype]
drivers/infiniband/hw/mlx5/qp.c:2686:28: warning: cast from restricted __be32
drivers/infiniband/hw/mlx5/qp.c:2686:28: warning: cast from restricted __be32
drivers/infiniband/hw/mlx5/qp.c:2686:28: warning: cast from restricted __be32
drivers/infiniband/hw/mlx5/qp.c:2686:28: warning: cast from restricted __be32

This patch does not change any functionality.

Fixes: a60109dc9a ("IB/mlx5: Add support for extended atomic operations")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-05 15:22:45 -07:00
Jason Gunthorpe
6a8a2aa62d Linux 5.0-rc5
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAlxXYaEeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGkSQH/2yrfnviNPFYpZOR
 QQdc71Bfhkd8m85SmWIsSebkxmi3hKFVj15sGbWXd6+0/VxjEEGvQCZpvVwJceke
 LwDxtkKGg/74wAqJvlSAWxFNZ+Had4jDeoSoeQChddsBVXBBCxQx2v6ECg3o2x7W
 k8Z8t4+3RijDf8fYXY9ETyO2zW8R/wgT+dnl+DPgUH7u4dxh7FzAUfc4bgZIDg+i
 FzBQfbTJuz4BU7uRZ9IJiwhWKv0Iyi2DR3BY8Z1pqEpRaUMJMrCs2WGytHbTgt9e
 0EtO1airbVneU4eumU/ZaF9cyEbah9HousEPnP7J09WG4s/Odxc4zE+uK1QqS2im
 5Xv88is=
 =dVd1
 -----END PGP SIGNATURE-----

Merge tag 'v5.0-rc5' into rdma.git for-next

Linux 5.0-rc5

Needed to merge the include/uapi changes so we have an up to date
single-tree for these files. Patches already posted are also expected to
need this for dependencies.
2019-02-04 14:53:42 -07:00
Moni Shoua
6141f8fa5b IB/mlx5: Advertise XRC ODP support
Query all per transport caps for XRC and set the appropriate bits in the
per transport field of the advertised struct.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-04 14:34:07 -07:00
Moni Shoua
2e68daceac IB/mlx5: Advertise SRQ ODP support for supported transports
ODP support in SRQ is per transport capability. Based on device
capabilities set this flag in device structure for future queries.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-04 14:34:07 -07:00
Moni Shoua
08100fad5c IB/mlx5: Add ODP SRQ support
Add changes to the WQE page-fault handler to

1. Identify that the event is for a SRQ WQE
2. Pass SRQ object instead of a QP to the function that reads the WQE
3. Parse the SRQ WQE with respect to its structure

The rest is handled as for regular RQ WQE.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-04 14:34:07 -07:00
Moni Shoua
fbeb4075c6 IB/mlx5: Let read user wqe also from SRQ buffer
Reading a WQE from SRQ is almost identical to reading from regular RQ.
The differences are the size of the queue, the size of a WQE and buffer
location.

Make necessary changes to mlx5_ib_read_user_wqe() to let it read a WQE
from a SRQ or RQ by caller choice.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-04 14:34:07 -07:00
Moni Shoua
29917f4750 IB/mlx5: Add XRC initiator ODP support
Skip XRC segment in the beginning of a send WQE and fetch ODP XRC
capabilities when QP type is IB_QPT_XRC_INI. The rest of the handling is
the same as in RC QP.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-04 14:34:07 -07:00
Moni Shoua
6ff7414a17 IB/mlx5: Clean mlx5_ib_mr_responder_pfault_handler() signature
In the function mlx5_ib_mr_responder_pfault_handler()

1. The parameter wqe is used as read-only so there is no need to pass it
   by reference.
2. Remove the unused argument pfault from list of arguments.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-04 14:34:06 -07:00
Moni Shoua
586f4e95c7 IB/mlx5: Remove useless check in ODP handler
When handling an ODP event for a receive WQE in SRQ the target QP is
unknown. Therefore, it is wrong to ask if QP has a SRQ in the page-fault
handler.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-04 14:34:06 -07:00
Moni Shoua
10f56242e3 IB/mlx5: Fix the locking of SRQ objects in ODP events
QP and SRQ objects are stored in different containers so the action to get
and lock a common resource during ODP event needs to address that.

While here get rid of 'refcount' and 'free' fields in mlx5_core_srq struct
and use the fields with same semantics in common structure.

Fixes: 032080ab43 ("IB/mlx5: Lock QP during page fault handling")
Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-04 14:34:06 -07:00
Parav Pandit
cf34e1fe52 IB/mlx5: Consider vlan of lower netdev for macvlan GID entries
When a given netdev of the GID entry is macvlan netdevice, and if the
lower netdevice is vlan device, GID entry for macvlan based IP address
needs to inherit the vlan of the lower netdevice.

Therefore, attempt to find out if the lower device exist and if so, if
it is vlan device and setup the vlan tag correctly.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Yuval Avnery <yuvalav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-30 20:43:46 -07:00
Leon Romanovsky
459cc69fa4 RDMA: Provide safe ib_alloc_device() function
All callers to ib_alloc_device() provide a larger size than struct
ib_device and rely on the fact that struct ib_device is embedded in their
driver specific structure as the first member.

Provide a safer variant of ib_alloc_device() that checks and enforces this
approach to make sure the drivers are using it right.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-30 15:52:30 -07:00
Kamal Heib
e5c1bb47cc IB/mlx5: Remove set but not used variable
Remove 'del_mkey' variable that is set but not used.

Fixes: 534fd7aac5 ("IB/mlx5: Manage indirection mkey upon DEVX flow for ODP")
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-30 15:17:17 -07:00
Kamal Heib
f3ffed0ce4 IB/mlx5: Make mlx5_ib_stage_odp_cleanup() static
The function mlx5_ib_stage_odp_cleanup() is only used in main.c

Fixes: d5d284b829 ("{net,IB}/mlx5: Move Page fault EQ and ODP logic to RDMA")
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-30 15:17:17 -07:00
Jason Gunthorpe
55c293c38e Merge branch 'devx-async' into k.o/for-next
Yishai Hadas says:

Enable DEVX asynchronous query commands

This series enables querying a DEVX object in an asynchronous mode.

The userspace application won't block when calling the firmware and it will be
able to get the response back once that it will be ready.

To enable the above functionality:

- DEVX asynchronous command completion FD object was introduced.
- The applicable file operations were implemented to enable using it by
  the user application.
- Query asynchronous method was added to the DEVX object, it will call the
  firmware asynchronously and manages the response on the given input FD.
- Hot unplug support was added for the FD to work properly upon
  unbind/disassociate.
- mlx5 core fence for asynchronous commands was implemented and used to
  prevent racing upon unbind/disassociate.

This branch is based on mlx5-next & v5.0-rc2 due to dependencies, from
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

* branch 'devx-async':
  IB/mlx5: Implement DEVX hot unplug for async command FD
  IB/mlx5: Implement the file ops of DEVX async command FD
  IB/mlx5: Introduce async DEVX obj query API
  IB/mlx5: Introduce MLX5_IB_OBJECT_DEVX_ASYNC_CMD_FD

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-29 13:49:31 -07:00
Yishai Hadas
eaebaf77e7 IB/mlx5: Implement DEVX hot unplug for async command FD
Implement DEVX hot unplug for the async command FD.

This is done by managing a list of the inflight commands and wait until
all launched work is completed as part of
devx_hot_unplug_async_cmd_event_file.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-29 13:33:00 -07:00
Yishai Hadas
4accbb3fd2 IB/mlx5: Implement the file ops of DEVX async command FD
Implement the file ops of the DEVX async command FD, this enables using
the FD for reading the events and manage other options on the FD.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-29 13:33:00 -07:00
Yishai Hadas
a124edba26 IB/mlx5: Introduce async DEVX obj query API
Introduce async DEVX obj query API to get the command response back to
user space once it's ready without blocking when calling the firmware.

The event's data includes a header with some meta data then the firmware
output command data.

The header includes:
- The input 'wr_id' to let application recognizing the response.

The input FD attribute is used to have the event data ready on.
Downstream patches from this series will implement the file ops to let
application read it.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-29 13:33:00 -07:00
Yishai Hadas
6bf8f22aea IB/mlx5: Introduce MLX5_IB_OBJECT_DEVX_ASYNC_CMD_FD
Introduce MLX5_IB_OBJECT_DEVX_ASYNC_CMD_FD and its initial implementation.

This object is from type class FD and will be used to read DEVX async
commands completion.

The core layer should allow the driver to set object from type FD in a
safe mode, this option was added with a matching comment in place.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-29 13:32:43 -07:00
Mark Bloch
c1b03c25f5 RDMA/mlx5: Fix flow creation on representors
The intention of the flow_is_supported was to disable the entire tree and
methods that allow raw flow creation, but the grammar syntax has this
disable the entire UVERBS_FLOW object. Since the method requires a
MLX5_IB_OBJECT_FLOW_MATCHER there is no need to do anything, as it is
automatically disabled when matchers are disabled.

This restores the ability to create flow steering rules on representors
via regular verbs.

Fixes: a1462351b5 ("RDMA/mlx5: Fail early if user tries to create flows on IB representors")
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-25 11:58:06 -07:00
Moni Shoua
da6a496a34 IB/mlx5: Ranges in implicit ODP MR inherit its write access
A sub-range in ODP implicit MR should take its write permission from the
MR and not be set always to allow.

Fixes: d07d1d70ce ("IB/umem: Update on demand page (ODP) support")
Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-24 09:22:30 -07:00
Bart Van Assche
0a353c2e94 IB/mlx5: Declare local functions 'static'
This patch avoids that sparse complains about missing function
declarations.

Fixes: c9990ab39b ("RDMA/umem: Move all the ODP related stuff out of ucontext and into per_mm")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-24 09:22:30 -07:00
Greg Kroah-Hartman
73eb8f03f0 infiniband: mlx5: no need to check return value of debugfs_create functions
When calling debugfs functions, there is no need to ever check the
return value.  The function can work or not, but the code logic should
never do something different based on this.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-24 09:22:29 -07:00
Jason Gunthorpe
e355477ed9 net/mlx5: Make mlx5_cmd_exec_cb() a safe API
APIs that have deferred callbacks should have some kind of cleanup
function that callers can use to fence the callbacks. Otherwise things
like module unloading can lead to dangling function pointers, or worse.

The IB MR code is the only place that calls this function and had a
really poor attempt at creating this fence. Provide a good version in
the core code as future patches will add more places that need this
fence.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-01-24 14:25:26 +02:00
Maor Gottlieb
6113cc4401 IB/mlx5: Don't override existing ip_protocol
Two flow specifications can set the ip protocol field in
the flow table entry:

1) IB_FLOW_SPEC_TCP/UDP/GRE - set the ip protocol accordingly.
2) IB_FLOW_SPEC_IPV4/6 - has ip_protocol field for users
who want to receive specific L4 packets.

We need to avoid overriding of the ip_protocol with zeros,
in case that the user first put the L4 specification and
only then the L3.

Fixes: ca0d475385 ('IB/mlx5: Add support in TOS and protocol to flow steering')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-21 20:10:06 -07:00
Yishai Hadas
414556af5f IB/mlx5: Add support for ODP for DEVX indirection mkey
Add support for ODP for DEVX indirection mkey, it includes:
- Recognizing its type as part of the radix tree lookup.
- Use similar flow as done for the MW MKEY type.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-21 20:06:49 -07:00
Yishai Hadas
534fd7aac5 IB/mlx5: Manage indirection mkey upon DEVX flow for ODP
Manage indirection mkey upon DEVX flow to support ODP.

To support a page fault event on the indirection mkey it needs to be part
of the device mkey radix tree.

Both the creation and the deletion flows for a DEVX object which is
indirection mkey were adapted to handle that.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-21 20:06:49 -07:00
Yishai Hadas
fa31f14380 IB/mlx5: DEVX handling for indirection MKEY
Once an indirection MKEY is created umem valid bit shouldn't be set as
this MKEY doesn't really hold a umem.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-21 20:06:49 -07:00
Jason Gunthorpe
951d01b96f IB/mlx5: Fix how advise_mr() launches async work
Work must hold a kref on the ib_device otherwise the dev pointer can
become free before the work runs. This can happen because the work is
being pushed onto the system work queue which is not flushed during driver
unregister.

Remove the bogus use of 'reg_state':
 - While in uverbs the reg_state is guaranteed to always be
   REGISTERED
 - Testing reg_state with no locking is bogus. Use ib_device_try_get()
   to get back into a region that prevents unregistration.

For now continue with a flow that is similar to the existing code.

Fixes: 813e90b1ae ("IB/mlx5: Add advise_mr() support")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
2019-01-21 14:39:29 -07:00
Mark Bloch
8af526e035 RDMA/mlx5: Fix check for supported user flags when creating a QP
When the flags verification was added two flags were missed from the
check:
 * MLX5_QP_FLAG_TIR_ALLOW_SELF_LB_UC
 * MLX5_QP_FLAG_TIR_ALLOW_SELF_LB_MC

This causes user applications that were using these flags to break.

Fixes: 2e43bb31b8 ("IB/mlx5: Verify that driver supports user flags")
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-21 14:20:08 -07:00
Gustavo A. R. Silva
8e8aa14542 RDMA/mlx5: Replace kzalloc with kcalloc
Replace kzalloc() function with its 2-factor argument form, kcalloc().

This patch replaces cases of:

	kzalloc(a * b, gfp)

with:
	kcalloc(a, b, gfp)

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-15 15:53:08 -07:00
Leon Romanovsky
73f5a82bb3 RDMA/mad: Reduce MAD scope to mlx5_ib only
Management Datagram Interface (MAD) is applicable
only when physical port is Infiniband. It makes MAD
command logic to be completely unrelated to eth/core
parts of mlx5.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-15 10:02:29 +02:00
Parav Pandit
5474723115 RDMA: Introduce and use rdma_device_to_ibdev()
Introduce and use rdma_device_to_ibdev() API for those drivers which are
registering one sysfs group and also use in ib_core.

In subsequent patch, device->provider_ibdev one-to-one mapping is no
longer holds true during accessing sysfs entries.
Therefore, introduce an API rdma_device_to_ibdev() that provides such
information.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-14 13:12:03 -07:00
Parav Pandit
ea4baf7f11 RDMA: Rename port_callback to init_port
Most provider routines are callback routines which ib core invokes.
_callback suffix doesn't convey information about when such callback is
invoked. Therefore, rename port_callback to init_port.

Additionally, store the init_port function pointer in ib_device_ops, so
that it can be accessed in subsequent patches when binding rdma device to
net namespace.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-14 13:05:14 -07:00
Leon Romanovsky
8cbfaac3d0 RDMA: Clear PD objects during their allocation
As part of an audit process to update drivers to use rdma_restrack_add()
ensure that PD objects is cleared before access.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-10 17:08:52 -07:00
Jason Gunthorpe
b0ea0fa543 IB/{core,hw}: Have ib_umem_get extract the ib_ucontext from ib_udata
ib_umem_get() can only be called in a method callback, which always has a
udata parameter. This allows ib_umem_get() to derive the ucontext pointer
directly from the udata without requiring the drivers to find it in some
way or another.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
2019-01-10 17:07:45 -07:00
Shamir Rabinovitch
6fa8f1afd3 IB/{core,uverbs}: Move ib_umem_xxx functions from ib_core to ib_uverbs
The next patch will add dependency from ib_umem_get in to ib_uverbs so
move the required ib_umem_xxx functionality to it's correct module -
ib_uverbs - and avoid circular dependecy from the form of ib_core ->
ib_uverbs -> ib_core in depmod.

Since this now requires all drivers to be build modular if uverbs is
modular, hoist the test a couple drivers had into the main kconfig and
apply it to all drivers uniformly.

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-10 17:06:44 -07:00
Leon Romanovsky
13859d5df4 RDMA/mlx5: Embed into the code flow the ODP config option
Convert various places to more readable code, which embeds
CONFIG_INFINIBAND_ON_DEMAND_PAGING into the code flow.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-08 16:41:38 -07:00
Leon Romanovsky
8b4d5bc5cf RDMA/mlx5: Introduce and reuse helper to identify ODP MR
Consolidate various checks if MR is ODP backed to one simple helper and
update call sites to use it.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-08 16:41:38 -07:00
Leon Romanovsky
e502b8b011 RDMA/core: Don't depend device ODP capabilities on kconfig option
Device capability bits are exposing what specific device supports from HW
perspective. Those bits are not dependent on kernel configurations and
RDMA/core should ensure that proper interfaces to users will be disabled
if CONFIG_INFINIBAND_ON_DEMAND_PAGING is not set.

Fixes: f4056bfd8c ("IB/core: Add on demand paging caps to ib_uverbs_ex_query_device")
Fixes: 8cdd312cfe ("IB/mlx5: Implement the ODP capability query verb")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-08 16:41:38 -07:00
Leon Romanovsky
96f87ee181 RDMA: Clean structures from CONFIG_INFINIBAND_ON_DEMAND_PAGING
CONFIG_INFINIBAND_ON_DEMAND_PAGING is used in general structures to
micro-optimize the memory footprint. Remove it, so it will allow us to
simplify various ODP device flows.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-08 16:41:38 -07:00
Leon Romanovsky
ccffa54548 Revert "IB/mlx5: Fix long EEH recover time with NVMe offloads"
Longer term testing shows this patch didn't play well with MR cache and
caused to call traces during remove_mkeys().

This reverts commit bb7e22a8ab.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-02 09:40:34 -07:00
Yishai Hadas
7422edce73 IB/mlx5: Allow XRC INI usage via verbs in DEVX context
From device point of view both XRC target and initiator are XRC transport
type.

Fix to use the expected UID as was handled for the XRC target case to
allow its usage via verbs in DEVX context.

Fixes: 5aa3771ded ("IB/mlx5: Allow XRC usage via verbs in DEVX context")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-02 09:40:34 -07:00
Linus Torvalds
5d24ae67a9 4.21 merge window pull request
This has been a fairly typical cycle, with the usual sorts of driver
 updates. Several series continue to come through which improve and
 modernize various parts of the core code, and we finally are starting to
 get the uAPI command interface cleaned up.
 
 - Various driver fixes for bnxt_re, cxgb3/4, hfi1, hns, i40iw, mlx4, mlx5,
   qib, rxe, usnic
 
 - Rework the entire syscall flow for uverbs to be able to run over
   ioctl(). Finally getting past the historic bad choice to use write()
   for command execution
 
 - More functional coverage with the mlx5 'devx' user API
 
 - Start of the HFI1 series for 'TID RDMA'
 
 - SRQ support in the hns driver
 
 - Support for new IBTA defined 2x lane widths
 
 - A big series to consolidate all the driver function pointers into
   a big struct and have drivers provide a 'static const' version of the
   struct instead of open coding initialization
 
 - New 'advise_mr' uAPI to control device caching/loading of page tables
 
 - Support for inline data in SRPT
 
 - Modernize how umad uses the driver core and creates cdev's and sysfs
   files
 
 - First steps toward removing 'uobject' from the view of the drivers
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAlwhV2oACgkQOG33FX4g
 mxpF8A/9EkRCg6wCDC59maA53b5PjuNmD//9hXbycQPQSlxntI2PyYtxrzBqc0+2
 yIaFFMehL41XNN6y1zfkl7ndl62McCH2TpiidU8RyTxVw/e3KsDD5sU6++atfHRo
 M82RNfedDtxPG8TcCPKVLof6JHADApGSR1r4dCYfAnu7KFMyvlLmeYyx4r/2E6yC
 iQPmtKVOdbGkuWGeX+brGEA0vg7FUOAvaysnxddjyh9hyem4h0SUR3Af/Ik0N5ME
 PYzC+hMKbkPVBLoCWyg7QwUaqK37uWwguMQLtI2byF7FgbiK/lBQt6TsidR4Fw3p
 EalL7uqxgCTtLYh918vxLFjdYt6laka9j7xKCX8M8d06sy/Lo8iV4hWjiTESfMFG
 usqs7D6p09gA/y1KISji81j6BI7C92CPVK2drKIEnfyLgY5dBNFcv9m2H12lUCH2
 NGbfCNVaTQVX6bFWPpy2Bt2y/Litsfxw5RviehD7jlG0lQjsXGDkZzsDxrMSSlNU
 S79iiTJyK4kUZkXzrSSlN58pLBlbupJwm5MDjKmM+irsrsCHjGIULvc902qtnC3/
 8ImiTtW6XvqLbgWXyy2Th8/ZgRY234p1ybhog+DFaGKUch0XqB7VXTV2OZm0GjcN
 Fp4PUeBt+/gBgYqjpuffqQc1rI4uwXYSoz7wq9RBiOpw5zBFT1E=
 =T0p1
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "This has been a fairly typical cycle, with the usual sorts of driver
  updates. Several series continue to come through which improve and
  modernize various parts of the core code, and we finally are starting
  to get the uAPI command interface cleaned up.

   - Various driver fixes for bnxt_re, cxgb3/4, hfi1, hns, i40iw, mlx4,
     mlx5, qib, rxe, usnic

   - Rework the entire syscall flow for uverbs to be able to run over
     ioctl(). Finally getting past the historic bad choice to use
     write() for command execution

   - More functional coverage with the mlx5 'devx' user API

   - Start of the HFI1 series for 'TID RDMA'

   - SRQ support in the hns driver

   - Support for new IBTA defined 2x lane widths

   - A big series to consolidate all the driver function pointers into a
     big struct and have drivers provide a 'static const' version of the
     struct instead of open coding initialization

   - New 'advise_mr' uAPI to control device caching/loading of page
     tables

   - Support for inline data in SRPT

   - Modernize how umad uses the driver core and creates cdev's and
     sysfs files

   - First steps toward removing 'uobject' from the view of the drivers"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (193 commits)
  RDMA/srpt: Use kmem_cache_free() instead of kfree()
  RDMA/mlx5: Signedness bug in UVERBS_HANDLER()
  IB/uverbs: Signedness bug in UVERBS_HANDLER()
  IB/mlx5: Allocate the per-port Q counter shared when DEVX is supported
  IB/umad: Start using dev_groups of class
  IB/umad: Use class_groups and let core create class file
  IB/umad: Refactor code to use cdev_device_add()
  IB/umad: Avoid destroying device while it is accessed
  IB/umad: Simplify and avoid dynamic allocation of class
  IB/mlx5: Fix wrong error unwind
  IB/mlx4: Remove set but not used variable 'pd'
  RDMA/iwcm: Don't copy past the end of dev_name() string
  IB/mlx5: Fix long EEH recover time with NVMe offloads
  IB/mlx5: Simplify netdev unbinding
  IB/core: Move query port to ioctl
  RDMA/nldev: Expose port_cap_flags2
  IB/core: uverbs copy to struct or zero helper
  IB/rxe: Reuse code which sets port state
  IB/rxe: Make counters thread safe
  IB/mlx5: Use the correct commands for UMEM and UCTX allocation
  ...
2018-12-28 14:57:10 -08:00
Dan Carpenter
58f7c0bfb4 RDMA/mlx5: Signedness bug in UVERBS_HANDLER()
The "num_actions" variable needs to be signed for the error handling to
work.  The maximum number of actions is less than 256 so int type is large
enough for that.

Fixes: cbfdd442c4 ("IB/uverbs: Add helper to get array size from ptr attribute")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-22 16:07:13 -07:00
Yishai Hadas
aa74be6eea IB/mlx5: Allocate the per-port Q counter shared when DEVX is supported
The per-port Q counter is some kernel resource and as such may be used by
few UID(s) upon DEVX usage.

To enable using it for QP/RQ when DEVX context is used need to allocate it
with a sharing mode indication to let firmware allows its usage.

The UID = 0xffff was chosen to mark it.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-21 12:15:07 -07:00
Jason Gunthorpe
623d154305 IB/mlx5: Fix wrong error unwind
The destroy_workqueue on error unwind is missing, and the code jumps to
the wrong exit label.

Fixes: 813e90b1ae ("IB/mlx5: Add advise_mr() support")
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-20 20:49:31 -07:00
Huy Nguyen
bb7e22a8ab IB/mlx5: Fix long EEH recover time with NVMe offloads
On NVMe offloads connection with many IO queues, EEH takes long time to
recover. The culprit is the synchronize_srcu in the destroy_mkey. The
solution is to use synchronize_srcu only for ODP mkey.

Fixes: b4cfe447d4 ("IB/mlx5: Implement on demand paging by adding support for MMU notifiers")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-20 16:28:47 -07:00
Or Gerlitz
842a9c837e IB/mlx5: Simplify netdev unbinding
When dealing with netdev unregister events, we just need to know that this
is our currently bounded netdev. There's no need to do any further
checks/queries.

This patch doesn't change any functionality.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-20 15:18:24 -07:00
Yishai Hadas
6e3722baac IB/mlx5: Use the correct commands for UMEM and UCTX allocation
During testing the command format was changed to close a security
hole. Revise the driver to use the command format that will actually be
supported in GA firmware.

Both the UMEM and UCTX are intended only for use by the kernel and cannot
be executed using a general command.

Since the UMEM and CTX are not part of the general object the caps bits
were moved to be some log_xxx location in the general HCA caps.

The firmware code was adapted as well to match the above.

Fixes: a8b92ca1b0 ("IB/mlx5: Introduce DEVX")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-20 13:49:48 -07:00
Yishai Hadas
425518cc5e IB/mlx5: Use uid as part of alloc/dealloc transport domain
Use uid as part of alloc/dealloc transport domain to let firmware manages
the resources correctly.

Fixes: d2d19121ae ("IB/mlx5: Set uid as part of TD commands")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-20 13:39:41 -07:00
Jason Gunthorpe
ed50edfb72 Merge branch 'mlx5-next' into rdma.git
From git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

mlx5 updates taken for dependencies on following patches.

* branche 'mlx5-next': (23 commits)
  IB/mlx5: Introduce uid as part of alloc/dealloc transport domain
  net/mlx5: Add shared Q counter bits
  net/mlx5: Continue driver initialization despite debugfs failure
  net/mlx5: Fold the modify lag code into function
  net/mlx5: Add lag affinity info to log
  net/mlx5: Split the activate lag function into two routines
  net/mlx5: E-Switch, Introduce flow counter affinity
  IB/mlx5: Unify e-switch representors load approach between uplink and VFs
  net/mlx5: Use lowercase 'X' for hex values
  net/mlx5: Remove duplicated include from eswitch.c
  net/mlx5: Remove the get protocol device interface entry
  net/mlx5: Support extended destination format in flow steering command
  net/mlx5: E-Switch, Change vhca id valid bool field to bit flag
  net/mlx5: Introduce extended destination fields
  net/mlx5: Revise gre and nvgre key formats
  net/mlx5: Add monitor commands layout and event data
  net/mlx5: Add support for plugged-disabled cable status in PME
  net/mlx5: Add support for PCIe power slot exceeded error in PME
  net/mlx5: Rework handling of port module events
  net/mlx5: Move flow counters data structures from flow steering header
  ...
2018-12-20 13:24:50 -07:00
David S. Miller
2be09de7d6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Lots of conflicts, by happily all cases of overlapping
changes, parallel adds, things of that nature.

Thanks to Stephen Rothwell, Saeed Mahameed, and others
for their guidance in these resolutions.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-20 11:53:36 -08:00
Gal Pressman
2553ba217e RDMA: Mark if destroy address handle is in a sleepable context
Introduce a 'flags' field to destroy address handle callback and add a
flag that marks whether the callback is executed in an atomic context or
not.

This will allow drivers to wait for completion instead of polling for it
when it is allowed.

Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-19 16:28:03 -07:00
Gal Pressman
b090c4e3a0 RDMA: Mark if create address handle is in a sleepable context
Introduce a 'flags' field to create address handle callback and add a flag
that marks whether the callback is executed in an atomic context or not.

This will allow drivers to wait for completion instead of polling for it
when it is allowed.

Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-19 16:17:19 -07:00
Doug Ledford
c9e585ebdc IB/mlx5: Fix compile issue when ODP disabled
When CONFIG_INFINIBAND_ON_DEMAND_PAGING is not enabled, we were getting
build failures for defined but not used code.  Fix that.

Fixes: 813e90b1ae ("IB/mlx5: Add advise_mr() support")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-12-19 13:43:17 -05:00
Shamir Rabinovitch
e00b64f7c5 RDMA: Cleanup undesired pd->uobject usage
Drivers should be using udata to determine if a method is invoked from
user space or kernel space. A pd does not necessarily say a different
objects is kernel or user.

Transforming the tests to use udata eliminates a large number of uobject
references from the drivers.

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-18 19:15:48 -07:00
Moni Shoua
813e90b1ae IB/mlx5: Add advise_mr() support
The verb advise_mr() is used to give advice to the kernel about an address
range that belongs to a MR.  Implement the verb and register it on the
device. The current implementation supports the only known advice to date,
prefetch.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Guy Levi <guyle@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-18 15:26:19 -07:00
Moni Shoua
cbfdd442c4 IB/uverbs: Add helper to get array size from ptr attribute
When the parser of an ioctl command has the knowledge that a ptr attribute
in a bundle represents an array of structures, it is useful for it to know
the number of elements in the array. This is done by dividing the
attribute length with the element size.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Guy Levi <guyle@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-18 15:19:45 -07:00
Gal Pressman
ac2f7e623d RDMA/mlx5: Fix function name typo 'fileds' -> 'fields'
Fix typo in 'set_mr_fileds' -> 'set_mr_fields'.

Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-18 14:15:24 -07:00
Leon Romanovsky
8e3b688301 RDMA/mlx5: Delete unreachable handle_atomic code by simplifying SW completion
Handle atomic was left as unimplemented from 2013, remove the code till
this part will be developed.

Remove the dead code by simplifying SW completion logic which is supposed
to be the same for send and receive paths.

Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Tested-by: Stephen Rothwell <sfr@canb.auug.org.au> # compile tested
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-18 13:59:34 -07:00
Aviv Heller
7c34ec19e1 net/mlx5: Make RoCE and SR-IOV LAG modes explicit
With the introduction of SR-IOV LAG, checking whether LAG is active
is no longer good enough, since RoCE and SR-IOV LAG each entails
different behavior by both the core and infiniband drivers.

This patch introduces facilities to discern LAG type, in addition to
mlx5_lag_is_active(). These are implemented in such a way as to allow
more complex mode combinations in the future.

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 13:28:55 -08:00
Saeed Mahameed
64e4cf0dab Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
mlx5-next shared branch with rdma subtree to avoid mlx5 rdma v.s. netdev
conflicts.

Highlights:
1) Lag refactroing and flow counter affinity bits.
2) mlx5 core cleanups

By Roi Dayan (2) and others
* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  net/mlx5: Fold the modify lag code into function
  net/mlx5: Add lag affinity info to log
  net/mlx5: Split the activate lag function into two routines
  net/mlx5: E-Switch, Introduce flow counter affinity
  IB/mlx5: Unify e-switch representors load approach between uplink and VFs
  net/mlx5: Use lowercase 'X' for hex values
  net/mlx5: Remove duplicated include from eswitch.c

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 11:15:25 -08:00
Mark Bloch
06cc74af05 IB/mlx5: Unify e-switch representors load approach between uplink and VFs
When in switchdev mode and the add function is called by the core
level driver, make sure we only register the callbacks, but don't
create the mlx5 IB device or initialize anything. With this change
all the IB devices in switchdev mode are created only once the load
callback is invoked by the e-switch core sub-module. This follows
the design paradigm under which the all the Eth representors must
be loaded before any of IB reprs is loaded.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 09:58:57 -08:00
Kamal Heib
3023a1e936 RDMA: Start use ib_device_ops
Make all the required change to start use the ib_device_ops structure.

Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-12 07:40:16 -07:00
Kamal Heib
96458233ee RDMA/mlx5: Initialize ib_device_ops struct
Initialize ib_device_ops with the supported operations using
ib_set_device_ops().

Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-12 07:40:13 -07:00
Yuval Shaia
59590b8ad2 IB/{mlx5,ocrdma,qedr,rxe}: Omit port validation from IB verbs
RDMA core layer already make sure port is valid, no need to check it here
again.

For the pkey validation this depends on commit b3ac5742fead ("RDMA/core:
Validate port number in query_pkey verb")

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-11 14:38:16 -07:00
Leon Romanovsky
a1462351b5 RDMA/mlx5: Fail early if user tries to create flows on IB representors
IB representors don't support creation of RAW ethernet QP flows.  Disable
them by reusing existing RDMA/core support macros.  We do it for both
creation and matcher because latter is not usable if no flow creation is
available.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-11 14:38:16 -07:00
YueHaibing
f94e02ddfd IB/mlx5: Remove duplicated include from mlx5_ib.h
Remove duplicated include.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-11 14:38:16 -07:00
Michael Guralnik
d764970bce IB/mlx5: Add 2X width support to query_port
Report to the user 2x width over MAD interface.

Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-11 14:38:16 -07:00
Jason Gunthorpe
28ab1bb0e8 Linux 4.20-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAlwNpb0eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGwGwH/00UHnXfxww3ixxz
 zwTVDzptA6SPm6s84yJOWatM5fXhPiAltZaHSYF9lzRzNU71NCq7Frhq3fQUIXKM
 OxqDn9nfSTWcjWTk2q5n2keyRV/KIn67YX7UgqFc1bO/mqtVjEgNWaMyblhI+e9E
 giu1ZXayHr43jK1cDOmGExZubXUq7Vsc9TOlrd+d2SwIqeEP7TCMrPhnHDwCNvX2
 UU5dtANpVzGtHaBcr37wJj+L8kODCc0f+PQ3g2ar5jTHst5SLlHp2u0AMRnUmgdi
 VkGx+mu/uk8mtwUqMIMqhplklVoqK6LTeLqsY5Xt32SKruw9UqyJGdphLjW2QP/g
 MkmA1lI=
 =7kaD
 -----END PGP SIGNATURE-----

Merge tag 'v4.20-rc6' into rdma.git for-next

For dependencies in following patches.
2018-12-11 14:24:57 -07:00
Michael Guralnik
b874155a5f IB/mlx5: Add HDR speed support to query port
Report HDR speed when HDR is supported in CapabilityMask2 and the actual
speed is HDR.

Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-11 13:22:46 -07:00
Michael Guralnik
4106a758f7 IB/mlx5: Report CapabilityMask2 in ib_query_port
CapabilityMask2 exists when IB_PORT_CAP_MASK2_SUP is set in the original
capability mask. In such cases, query its value and report it in query
port.

Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-11 13:22:45 -07:00
Michael Guralnik
a5a5d19936 IB/core: Add new IB rates
Add the new rates that were added to Infiniband spec as part of HDR and 2x
support.

Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-11 13:22:45 -07:00
Saeed Mahameed
2f62747c77 Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
mlx5-next shared branch with rdma subtree to avoid mlx5 rdma v.s. netdev
conflicts.

Highlights:

1) RDMA ODP  (On Demand Paging) improvements and moving ODP logic to
mlx5 RDMA driver
2) Improved mlx5 core driver and device events handling and provided API
for upper layers to subscribe to device events.
3) RDMA only code cleanup from mlx5 core
4) Add helper to get CQE opcode
5) Rework handling of port module events
6) shared mlx5_ifc.h updates to avoid conflicts

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-10 15:50:50 -08:00
Oz Shlomo
5886a96ad1 net/mlx5: Revise gre and nvgre key formats
GRE RFC defines a 32 bit key field. NVGRE RFC splits the 32 bit
key field to 24 bit VSID (gre_key_h) and 8 bit flow entropy (gre_key_l).

Define the two key parsing alternatives in a union, thus enabling both
access methods.

Signed-off-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-10 14:00:08 -08:00
Tariq Toukan
bdefffd13b IB/mlx5: Use helper to get CQE opcode
Use the new helper that extracts the opcode
from a CQE (completion queue entry) structure.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
2018-12-09 18:16:16 -08:00
Jason Gunthorpe
fe15bcc6e2 Merge branch 'mlx5-packet-credit-fc' into rdma.git
Danit Goldberg says:

Packet based credit mode

Packet based credit mode is an alternative end-to-end credit mode for QPs
set during their creation. Credits are transported from the responder to
the requester to optimize the use of its receive resources.  In
packet-based credit mode, credits are issued on a per packet basis.

The advantage of this feature comes while sending large RDMA messages
through switches that are short in memory.

The first commit exposes QP creation flag and the HCA capability. The
second commit adds support for a new DV QP creation flag. The last commit
report packet based credit mode capability via the MLX5DV device
capabilities.

* branch 'mlx5-packet-credit-fc':
  IB/mlx5: Report packet based credit mode device capability
  IB/mlx5: Add packet based credit mode support
  net/mlx5: Expose packet based credit mode

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-07 13:25:12 -07:00
Danit Goldberg
7e11b911b5 IB/mlx5: Report packet based credit mode device capability
Report packet based credit mode capability via the mlx5 DV interface.

Signed-off-by: Danit Goldberg <danitg@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-07 13:22:09 -07:00
Danit Goldberg
569c665150 IB/mlx5: Add packet based credit mode support
The device can support two credit modes, message based (default) and
packet based. In order to enable packet based mode, the QP should be
created with special flag that indicates this.

This patch adds support for the new DV QP creation flag that can be used
for RC QPs in order to change the credit mode.

Signed-off-by: Danit Goldberg <danitg@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-07 13:22:09 -07:00
Alex Vesker
419822c8b8 IB/mlx5: Enable TX on a DEVX flow table
Flow table can be passed as a DEVX object which is a valid destination in
an EGRESS flow. Fix the original code to allow that.

Fixes: a7ee18bdee ("RDMA/mlx5: Allow creating a matcher for a NIC TX flow table")
Signed-off-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-06 20:33:09 -07:00
Yishai Hadas
47f07f03b5 IB/mlx5: Block DEVX umem from the non applicable cases
Blocks creating a DEVX UMEM with the non applicable access flags
as of ODP, MW_BIND, etc.

Specifically when an ODP flag is used below WARN call trace is issued.

[ 2510.404131] RIP: 0010:__mlx5_ib_populate_pas+0x207/0x220 [mlx5_ib]
...
[ 2510.404143] Call Trace:
[ 2510.404150]  ? __kmalloc_node+0x1b3/0x280
[ 2510.404156]  ? _uverbs_alloc+0x63/0x90 [ib_uverbs]
[ 2510.404158]  ? _uverbs_alloc+0x63/0x90 [ib_uverbs]
[ 2510.404162]  mlx5_ib_populate_pas+0x53/0x60 [mlx5_ib]
[ 2510.404167]  mlx5_ib_handler_MLX5_IB_METHOD_DEVX_UMEM_REG+0x273/0x3f0 [mlx5_ib]

Fixes: aeae94579c ("IB/mlx5: Add DEVX support for memory registration")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-12-06 11:38:14 -05:00
Yishai Hadas
5aa3771ded IB/mlx5: Allow XRC usage via verbs in DEVX context
Allows XRC usage from the verbs flow in a DEVX context.
As XRCD is some shared kernel resource between processes it should be
created with UID=0 to point on that.

As a result once XRC QP/SRQ are created they must be used as well with
UID=0 so that firmware will allow the XRCD usage.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-12-04 13:46:42 -05:00
Yishai Hadas
719598c98d IB/mlx5: Update the supported DEVX commands
Update the supported DEVX commands, it includes adding to the
query/modify command's list and to the encoding handling.

In addition, a valid range for general commands was added to be used for
future commands.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-12-04 13:46:42 -05:00
Yishai Hadas
fb98153bbf IB/mlx5: Enforce DEVX privilege by firmware
Enforce DEVX privilege by firmware, this enables future device
functionality without the need to make driver changes unless a new
privilege type will be introduced.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-12-04 13:46:42 -05:00
Yishai Hadas
34613eb1d2 IB/mlx5: Enable modify and query verbs objects via DEVX
Enables modify and query verbs objects via the DEVX interface.
To support this the above DEVX handlers were changed to get any
object type via the UVERBS_IDR_ANY_OBJECT mechanism.

The type checking and handling is done per object as part of the
driver code.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-12-04 13:46:42 -05:00
Doug Ledford
f33cb7e760 Merge 'mlx5-next' into mlx5-devx
The enhanced devx support series needs commit:
9d43faac02 ("net/mlx5: Update mlx5_ifc with DEVX UCTX capabilities bits")

Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-12-04 13:36:57 -05:00
Leon Romanovsky
36ff48805a RDMA/mlx5: Unfold modify RMP function
There is no need to perform modify_rmp in two separate function,
while one of them uses stack as a placeholder for data while other
allocates it dynamically. Combine those two functions to one call
instead of two.

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-12-04 09:26:00 +02:00
Leon Romanovsky
a1eb180238 RDMA/mlx5: Unfold create RMP function
There is no need to perform create_rmp in two separate function, while
one of them uses stack as a placeholder for data while other allocates
it dynamically. Combine those two functions to one instead of two.

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-12-04 09:25:55 +02:00
Leon Romanovsky
f3da6577da RDMA/mlx5: Initialize SRQ tables on mlx5_ib
Transfer initialization and cleanup from mlx5_priv struct of
mlx5_core_dev to be part of mlx5_ib_dev. This completes removal
of SRQ from mlx5_core.

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-12-04 09:25:50 +02:00
Leon Romanovsky
b4990804e1 RDMA/mlx5: Update SRQ functions signatures to mlx5_ib format
Reflect the change of moving SRQ code from mlx5_core to mlx5_ib by
updating function signatures do not require mlx5_core_dev as an input,
because all operations in mlx5_ib are supposed to use mlx5_ib_dev.

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-12-04 09:25:45 +02:00
Leon Romanovsky
81773ce5f0 RDMA/mlx5: Use stages for callback to setup and release DEVX
Reuse existing infrastructure to initialize and release DEVX uid.
The DevX interface is intended for user space access, so it is supposed
to be initialized before ib_register_device(). Also it isn't supported
in switchdev mode and don't need to initialize it in that mode.

Fixes: 76dc5a8406 ("IB/mlx5: Manage device uid for DEVX white list commands")
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-12-04 09:23:53 +02:00
Leon Romanovsky
c48d386b2b RDMA/mlx5: Remove SRQ signature global flag
SRQ signature is not supported, hence no need for special static
global variable to announce it.

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-12-04 09:14:37 +02:00
Leon Romanovsky
f02d0d6e53 net/mlx5: Move SRQ functions to RDMA part
There is no need to keep SRQ which is RDMA object in mlx5_core.
In this patch, we partially move the execution code, while next patches
will move table initialization/release logic too.

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-12-04 09:14:30 +02:00
Leon Romanovsky
6cd0014ab9 net/mlx5: Align SRQ licenses and copyright information
Ensure that both RDMA and netdev parts of SRQ implementation
has same copyright and license information annotated by SPDX
tags.

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-12-04 09:14:09 +02:00
Artemy Kovalyov
37b06e5078 IB/mlx5: Fix implicit ODP interrupted page fault
Since any page fault may be interrupted by a MMU invalidation and implicit
leaf MR may be released during this process. The check for parent value
is unreliable condition for an implicit MR.
Use other condition that we can rely on to determine if MR is implicit.

Fixes: b4cfe447d4 ("IB/mlx5: Implement on demand paging by adding support for MMU notifiers")
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-12-03 16:52:53 -05:00
Saeed Mahameed
09e574fa76 IB/mlx5: Handle raw delay drop general event
Handle FW general event rq delay drop as it was received from FW via mlx5
notifiers API, instead of handling the processed software version of that
event. After this patch we can safely remove all software processed FW
events types and definitions.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-29 16:40:32 -08:00
Saeed Mahameed
134e9349ec IB/mlx5: Handle raw port change event rather than the software version
Use the FW version of the port change event as forwarded via new mlx5
notifiers API.

After this patch, processed software version of the port change event
will become deprecated and will be totally removed in downstream
patches.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-29 16:40:31 -08:00
Saeed Mahameed
df097a278c IB/mlx5: Use the new mlx5 core notifier API
Remove the deprecated mlx5_interface->event mlx5_ib callback and use new
mlx5 notifier API to subscribe for mlx5 events.

For native mlx5_ib devices profiles pf_profile/nic_rep_profile register
the notifier callback mlx5_ib_handle_event which treats the notifier
context as mlx5_ib_dev.

For vport repesentors, don't register any notifier, same as before, they
didn't receive any mlx5 events.

For slave port (mlx5_ib_multiport_info) register a different notifier
callback mlx5_ib_event_slave_port, which knows that the event is coming
for mlx5_ib_multiport_info and prepares the event job accordingly.
Before this on the event handler work we had to ask mlx5_core if this is
a slave port mlx5_core_is_mp_slave(work->dev), now it is not needed
anymore.
mlx5_ib_multiport_info notifier registration is done on
mlx5_ib_bind_slave_port and de-registration is done on
mlx5_ib_unbind_slave_port.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-29 16:40:31 -08:00
Guy Levi
34f4c9554d IB/mlx5: Use fragmented QP's buffer for in-kernel users
The current implementation of create QP requires contiguous memory, such a
requirement is problematic once the memory is fragmented or the system is
low in memory, it causes failures in dma_zalloc_coherent().

This patch takes advantage of the new mlx5_core API which allocates a
fragmented buffer. This makes the QP creation much more resilient to
memory fragmentation. Data-path code was adapted to the fact that WQEs can
cross buffers.

We also use the opportunity to fix some cosmetic legacy coding convention
errors which were in the feature scope.

Signed-off-by: Guy Levi <guyle@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-11-29 17:12:13 -07:00
Guy Levi
20e5a59b2e IB/mlx5: Use fragmented SRQ's buffer for in-kernel users
The current implementation of create SRQ requires contiguous memory, such
a requirement is problematic once the memory is fragmented or the system
is low in memory, it causes failures in dma_zalloc_coherent().

This patch takes the advantage of the new mlx5_core API which allocates a
fragmented buffer, and makes the SRQ creation much more resilient to
memory fragmentation. Data-path code was adapted to the fact that WQEs can
cross buffers.

Signed-off-by: Guy Levi <guyle@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-11-29 17:12:13 -07:00
Mark Bloch
bfc5d83918 RDMA/mlx5: Attach a DEVX counter via raw flow creation
Allow a user to attach a DEVX counter via mlx5 raw flow creation. In order
to attach a counter we introduce a new attribute:

MLX5_IB_ATTR_CREATE_FLOW_ARR_COUNTERS_DEVX

A counter can be attached to multiple flow steering rules.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-11-29 16:51:33 -07:00
Leon Romanovsky
7bca603a69 RDMA/mlx5: Initialize return variable in case pagefault was skipped
Pagefaults occurred in non-ODP MR are completely valid events, so
initialize return variable to 0.

Fixes: 4d5422a309 ("IB/mlx5: Skip non-ODP MR when handling a page fault")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-11-29 15:16:45 -07:00
Jason Gunthorpe
15a1b4becb RDMA/uverbs: Do not pass ib_uverbs_file to ioctl methods
The uverbs_attr_bundle already contains this pointer, and most methods
don't actually need it. Get rid of the redundant function argument.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-26 16:48:07 -07:00
Jason Gunthorpe
8313c10fa8 RDMA/uverbs: Replace ib_uverbs_file with uverbs_attr_bundle for write
Now that we can add meta-data to the description of write() methods we
need to pass the uverbs_attr_bundle into all write based handlers so
future patches can use it as a container for any new data transferred out
of the core.

This is the first step to bringing the write() and ioctl() methods to a
common interface signature.

This is a simple search/replace, and we push the attr down into the uobj
and other APIs to keep changes minimal.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-26 16:48:07 -07:00
Artemy Kovalyov
75b7b86bdb IB/mlx5: Fix page fault handling for MW
Memory windows are implemented with an indirect MKey, when a page fault
event comes for a MW Mkey we need to find the MR at the end of the list of
the indirect MKeys by iterating on all items from the first to the last.

The offset calculated during this process has to be zeroed after the first
iteration or the next iteration will start from a wrong address, resulting
incorrect ODP faulting behavior.

Fixes: db570d7dea ("IB/mlx5: Add ODP support to MW")
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-11-26 16:28:36 -07:00
Artemy Kovalyov
4d5422a309 IB/mlx5: Skip non-ODP MR when handling a page fault
It is possible that we call pagefault_single_data_segment() with a MKey
that belongs to a memory region which is not on demand (i.e. pinned
pages). This can happen if, for instance, a WQE that points to multiple
MRs where some of them are ODP MRs and some are not.  In this case we
don't need to handle this MR in the ODP context besides reporting success.

Otherwise the code will call pagefault_mr() which will do to_ib_umem_odp()
on a non-ODP MR and thus access out of bounds.

Fixes: 7bdf65d411 ("IB/mlx5: Handle page faults")
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-11-26 16:27:29 -07:00
Jason Gunthorpe
36e235c882 RDMA/mlx5: Use the uapi disablement APIs instead of code
Rely on UAPI_DEF_IS_OBJ_SUPPORTED instead of manipulating the contents of
the driver's definition list.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-22 11:57:32 -07:00
Jason Gunthorpe
0cbf432db4 RDMA/uverbs: Use a linear list to describe the compiled-in uapi
The 'tree' data structure is very hard to build at compile time, and this
makes it very limited. The new radix tree based compiler can handle a more
complex input language that does not require the compiler to perfectly
group everything into a neat tree structure.

Instead use a simple list to describe to input, where the list elements
can be of various different 'opcodes' instructing the radix compiler what
to do. Start out with opcodes chaining to other definition lists and
chaining to the existing 'tree' definition.

Replace the very top level of the 'object tree' with this list type and
get rid of struct uverbs_object_tree_def and DECLARE_UVERBS_OBJECT_TREE.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-22 11:57:32 -07:00
Jason Gunthorpe
dfb631a187 RDMA/mlx5: Do not generate the uabi specs unconditionally
For DM there is no reason not to add the spec for the START_OFFSET, if DM
is not supported then ib_dev.alloc_dm is already set to NULL which ensures
we do not call the method.

For IPSEC, the core code should be setting ib_dev.create_flow_action_esp
to NULL to disable it, not relying on wonky manipulation of the specs.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-22 11:57:31 -07:00
Jason Gunthorpe
8742902475 Merge branch 'mlx5-next' into rdma.git
From git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

mlx5 updates taken for dependencies on later ODP patches.

Conflict resolved by deleting mlx5_ib_get_vector_affinity()

* branch 'mlx5-next': (21 commits)
  net/mlx5: EQ, Make EQE access methods inline
  {net,IB}/mlx5: Move Page fault EQ and ODP logic to RDMA
  net/mlx5: EQ, Generic EQ
  net/mlx5: EQ, Different EQ types
  net/mlx5: EQ, Privatize eq_table and friends
  net/mlx5: EQ, irq_info and rmap belong to eq_table
  net/mlx5: EQ, Create all EQs in one place
  net/mlx5: EQ, Move all EQ logic to eq.c
  net/mlx5: EQ, Remove redundant completion EQ list lock
  net/mlx5: EQ, No need to store eq index as a field
  net/mlx5: EQ, Remove unused fields and structures
  net/mlx5: EQ, Use the right place to store/read IRQ affinity hint
  IB/mlx5: Improve ODP debugging messages
  net/mlx5: Use multi threaded workqueue for page fault handling
  net/mlx5: Return success for PAGE_FAULT_RESUME in internal error state
  IB/mlx5: Lock QP during page fault handling
  net/mlx5: Enumerate page fault types
  net/mlx5: Add interface to hold and release core resources
  net/mlx5: Release resource on error flow
  net/mlx5: Fix offsets of ifc reserved fields
  ...

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-11-21 14:29:40 -07:00
Artemy Kovalyov
5ec0304cdc IB/mlx5: Allow modify AV in DCI QP to RTR
This is required so the user can set the SL on the DC QP.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Yossi Itigin <yosefe@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-11-21 14:27:23 -07:00
Michael Guralnik
db7a691a15 IB/mlx5: Avoid load failure due to unknown link width
If the firmware reports a connection width that is not 1x, 4x, 8x or 12x
it causes the driver to fail during initialization.

To prevent this failure every time a new width is introduced to the RDMA
stack, we will set a default 4x width for these widths which ar unknown to
the driver.

This is needed to allow to run old kernels with new firmware.

Cc: <stable@vger.kernel.org> # 4.1
Fixes: 1b5daf11b0 ("IB/mlx5: Avoid using the MAD_IFC command under ISSI > 0 mode")
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-11-21 14:18:02 -07:00
Yonatan Cohen
13f8d9c166 IB/mlx5: Fix XRC QP support after introducing extended atomic
Extended atomics are supported with RC and XRC QP types, but the commit
citied in the Fixes line added an unneeded check to
to_mlx5_access_flags. This broke XRC QPs.

The following ib_atomic_bw invocation over XRC reproduces the issue:
   ib_atomic_bw -d mlx5_1 --connection=XRC --atomic_type=FETCH_AND_ADD

It is safe to remove such checks because the QP type was already checked
in ib_modify_qp_is_ok(), which was previously called from
mlx5_ib_modify_qp.

Fixes: a60109dc9a ("IB/mlx5: Add support for extended atomic operations")
Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-11-21 14:15:14 -07:00
Majd Dibbiny
074fca3a18 RDMA/mlx5: Fix fence type for IB_WR_LOCAL_INV WR
Currently, for IB_WR_LOCAL_INV WR, when the next fence is None, the
current fence will be SMALL instead of Normal Fence.

Without this patch krping doesn't work on CX-5 devices and throws
following error:

The error messages are from CX5 driver are: (from server side)
[ 710.434014] mlx5_0:dump_cqe:278:(pid 2712): dump error cqe
[ 710.434016] 00000000 00000000 00000000 00000000
[ 710.434016] 00000000 00000000 00000000 00000000
[ 710.434017] 00000000 00000000 00000000 00000000
[ 710.434018] 00000000 93003204 100000b8 000524d2
[ 710.434019] krping: cq completion failed with wr_id 0 status 4 opcode 128 vender_err 32

Fixed the logic to set the correct fence type.

Fixes: 6e8484c5cf ("RDMA/mlx5: set UMR wqe fence according to HCA cap")
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-11-21 14:03:00 -07:00
Saeed Mahameed
d5d284b829 {net,IB}/mlx5: Move Page fault EQ and ODP logic to RDMA
Use the new generic EQ API to move all ODP RDMA data structures and logic
form mlx5 core driver into mlx5_ib driver.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-20 20:07:33 +02:00
Saeed Mahameed
f2f3df5501 net/mlx5: EQ, Privatize eq_table and friends
Move unnecessary EQ table structures and declaration from the
public include/linux/mlx5/driver.h into the private area of mlx5_core
and into eq.c/eq.h.

Introduce new mlx5 EQ APIs:

mlx5_comp_vectors_count(dev);
mlx5_comp_irq_get_affinity_mask(dev, vector);

And use them from mlx5_ib or mlx5e netdevice instead of direct access to
mlx5_core internal structures.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-20 20:06:54 +02:00
Moni Shoua
b02394aa75 IB/mlx5: Improve ODP debugging messages
Add and modify debug messages to ODP related error flows.
In that context, return code EAGAIN is considered less severe and print
level for it is set debug instead of warn.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-12 22:26:02 +02:00
Moni Shoua
032080ab43 IB/mlx5: Lock QP during page fault handling
When page fault event for a WQE arrives, the event data contains the
resource (e.g. QP) number which will later be used by the page fault
handler to retrieve the resource. Meanwhile, another context can destroy
the resource and cause use-after-free. To avoid that, take a reference on the
resource when handler starts and release it when it ends.

Page fault events for RDMA operations don't need to be protected because
the driver doesn't need to access the QP in the page fault handler.

Fixes: d9aaed8387 ("{net,IB}/mlx5: Refactor page fault handling")
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-12 22:21:01 +02:00
Sagi Grimberg
9afc97c29b mlx5: remove support for ib_get_vector_affinity
Devices that does not use managed affinity can not export a vector
affinity as the consumer relies on having a static mapping it can map to
upper layer affinity (e.g. sw queues). If the driver allows the user to
set the device irq affinity, then the affinitization of a long term
existing entites is not relevant.

For example, nvme-rdma controllers queue-irq affinitization is determined
at init time so if the irq affinity changes over time, we are no longer
aligned.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-11-08 14:22:53 -07:00
Linus Torvalds
da19a102ce First merge window pull request
This has been a smaller cycle with many of the commits being smallish code
 fixes and improvements across the drivers.
 
 - Driver updates for bnxt_re, cxgb4, hfi1, hns, mlx5, nes, qedr, and rxe
 
 - Memory window support in hns
 
 - mlx5 user API 'flow mutate/steering' allows accessing the full packet
   mangling and matching machinery from user space
 
 - Support inter-working with verbs API calls in the 'devx' mlx5 user API, and
   provide options to use devx with less privilege
 
 - Modernize the use of syfs and the device interface to use attribute groups
   and cdev properly for uverbs, and clean up some of the core code's device list
   management
 
 - More progress on net namespaces for RDMA devices
 
 - Consolidate driver BAR mmapping support into core code helpers and rework
   how RDMA holds poitners to mm_struct for get_user_pages cases
 
 - First pass to use 'dev_name' instead of ib_device->name
 
 - Device renaming for RDMA devices
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAlvR7dUACgkQOG33FX4g
 mxojiw//a9GU5kq4IZ3LNAEio/3Ql/NHRF0uie5tSzJgipRJA1Ln9zW0Cm1S/ms1
 VCmaSJ3l3q3GC4i3tIlsZSIIkN5qtjv/FsT/i+TZwSJYx9BDpPbzWtG6Mp4PSDj0
 v3xzklFCN5HMOmEcjkNmyZw3VjHOt2Iw2mKjqvGbI9imCPLOYnw+WQaZLmMWMH6p
 GL0HDbAopN5Lv8ireWd8pOhPLVbSb12cWM1crx+yHOS3q8YNWjIXGiZr/QkOPtPr
 cymSXB8yuITJ7gnjbs/GxZHg6rxU0knC/Ck8hE7FqqYYHgytTklOXDE2ef1J2lFe
 1VmotD+nTsCir0mZWSdcRrszEk7tzaZT7n1oWggKvWySDB6qaH0II8vWumJchQnN
 pElIQn/WDgpekIqplamNqXJnKnDXZJpEVA01OHHDN4MNSc+Ad08hQy4FyFzpB6/G
 jv9TnDMfGC6ma9pr1ipOXyCgCa2pHYEUCaYxUqRA0O/4ATVl7/PplqT0rqtJ6hKg
 o/hmaVCawIFOUKD87/bo7Em2HBs3xNwE/c5ggbsQElLYeydrgPrZfrPfjkshv5K3
 eIKDb+HPyis0is1aiF7m/bz1hSIYZp0bQhuKCdzLRjZobwCm5WDPhtuuAWb7vYVw
 GSLCJWyet+bLyZxynNOt67gKm9je9lt8YTr5nilz49KeDytspK0=
 =pacJ
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "This has been a smaller cycle with many of the commits being smallish
  code fixes and improvements across the drivers.

   - Driver updates for bnxt_re, cxgb4, hfi1, hns, mlx5, nes, qedr, and
     rxe

   - Memory window support in hns

   - mlx5 user API 'flow mutate/steering' allows accessing the full
     packet mangling and matching machinery from user space

   - Support inter-working with verbs API calls in the 'devx' mlx5 user
     API, and provide options to use devx with less privilege

   - Modernize the use of syfs and the device interface to use attribute
     groups and cdev properly for uverbs, and clean up some of the core
     code's device list management

   - More progress on net namespaces for RDMA devices

   - Consolidate driver BAR mmapping support into core code helpers and
     rework how RDMA holds poitners to mm_struct for get_user_pages
     cases

   - First pass to use 'dev_name' instead of ib_device->name

   - Device renaming for RDMA devices"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (242 commits)
  IB/mlx5: Add support for extended atomic operations
  RDMA/core: Fix comment for hw stats init for port == 0
  RDMA/core: Refactor ib_register_device() function
  RDMA/core: Fix unwinding flow in case of error to register device
  ib_srp: Remove WARN_ON in srp_terminate_io()
  IB/mlx5: Allow scatter to CQE without global signaled WRs
  IB/mlx5: Verify that driver supports user flags
  IB/mlx5: Support scatter to CQE for DC transport type
  RDMA/drivers: Use core provided API for registering device attributes
  RDMA/core: Allow existing drivers to set one sysfs group per device
  IB/rxe: Remove unnecessary enum values
  RDMA/umad: Use kernel API to allocate umad indexes
  RDMA/uverbs: Use kernel API to allocate uverbs indexes
  RDMA/core: Increase total number of RDMA ports across all devices
  IB/mlx4: Add port and TID to MAD debug print
  IB/mlx4: Enable debug print of SMPs
  RDMA/core: Rename ports_parent to ports_kobj
  RDMA/core: Do not expose unsupported counters
  IB/mlx4: Refer to the device kobject instead of ports_parent
  RDMA/nldev: Allow IB device rename through RDMA netlink
  ...
2018-10-26 07:38:19 -07:00
Tariq Toukan
4972e6fa3a net/mlx5: Refactor fragmented buffer struct fields and init flow
Take struct mlx5_frag_buf out of mlx5_frag_buf_ctrl, as it is not
needed to manage and control the datapath of the fragmented buffers API.

struct mlx5_frag_buf contains control info to manage the allocation
and de-allocation of the fragmented buffer.
Its fields are not relevant for datapath, so here I take them out of the
struct mlx5_frag_buf_ctrl, except for the fragments array itself.

In addition, modified mlx5_fill_fbc to initialise the frags pointers
as well. This implies that the buffer must be allocated before the
function is called.

A set of type-specific *_get_byte_size() functions are replaced by
a generic one.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-18 13:13:31 -07:00
Paul Blakey
d5634fee24 net/mlx5: Add a no-append flow insertion mode
If no-append flag is set, we will add a new FTE, instead of appending
the actions of the inserted rule when the same match already exists.

While here, move the has_flow_tag boolean indicator to be a flag too.

This patch doesn't change any functionality.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanmox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-17 14:18:50 -07:00
Mark Bloch
171c7625be net/mlx5: Use flow counter IDs and not the wrapping cache object
Currently, when a flow rule is created using the FS core layer, the caller
has to pass the entire flow counter object and not just the counter HW
handle (ID). This requires both the FS core and the caller to have
knowledge about the inner implementation of the FS layer flow counters
cache and limits the possible users.

Move to use the counter ID across the place when dealing with flows.

Doing this decoupling, now can we privatize the inner implementation
of the flow counters.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-17 14:15:48 -07:00
Saeed Mahameed
186daf0c20 Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux into net-next
mlx5 updates for both net-next and rdma-next

* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: (21 commits)
  net/mlx5: Expose DC scatter to CQE capability bit
  net/mlx5: Update mlx5_ifc with DEVX UID bits
  net/mlx5: Set uid as part of DCT commands
  net/mlx5: Set uid as part of SRQ commands
  net/mlx5: Set uid as part of SQ commands
  net/mlx5: Set uid as part of RQ commands
  net/mlx5: Set uid as part of QP commands
  net/mlx5: Set uid as part of CQ commands
  net/mlx5: Rename incorrect naming in IFC file
  net/mlx5: Export packet reformat alloc/dealloc functions
  net/mlx5: Pass a namespace for packet reformat ID allocation
  net/mlx5: Expose new packet reformat capabilities
  {net, RDMA}/mlx5: Rename encap to reformat packet
  net/mlx5: Move header encap type to IFC header file
  net/mlx5: Break encap/decap into two separated flow table creation flags
  net/mlx5: Add support for more namespaces when allocating modify header
  net/mlx5: Export modify header alloc/dealloc functions
  net/mlx5: Add proper NIC TX steering flow tables support
  net/mlx5: Cleanup flow namespace getter switch logic
  net/mlx5: Add memic command opcode to command checker
  ...

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-17 14:13:36 -07:00
Yonatan Cohen
a60109dc9a IB/mlx5: Add support for extended atomic operations
Extended atomic operations cmp&swp and fetch&add is a Mellanox
feature extending the standard atomic operation to use, varied
operand sizes, as apposed to normal atomic operation that use
an 8 byte operand only.
Extended atomics allows masking the results and arguments.

This patch configures QP to support extended atomic operation
with the maximum size possible, as exposed by HCA capabilities.

Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Reviewed-by: Guy Levi <guyle@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-10-17 11:53:23 -04:00
Yonatan Cohen
6f4bc0ea68 IB/mlx5: Allow scatter to CQE without global signaled WRs
Requester scatter to CQE is restricted to QPs configured to signal
all WRs.

This patch adds ability to enable scatter to cqe (force enable)
in the requester without sig_all, for users who do not want all WRs
signaled but rather just the ones whose data found in the CQE.

Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Reviewed-by: Guy Levi <guyle@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-10-17 11:25:41 -04:00
Yonatan Cohen
2e43bb31b8 IB/mlx5: Verify that driver supports user flags
Flags sent down from user might not be supported by
running driver.
This might lead to unwanted bugs.
To solve this, added macro to test for unsupported flags.

Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-10-17 11:25:41 -04:00
Yonatan Cohen
5d6ff1babe IB/mlx5: Support scatter to CQE for DC transport type
Scatter to CQE is a HW offload that saves PCI writes by scattering the
payload to the CQE.
This patch extends already existing functionality to support DC
transport type.

Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Reviewed-by: Guy Levi <guyle@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-10-17 11:25:41 -04:00
Parav Pandit
508a523f6b RDMA/drivers: Use core provided API for registering device attributes
Use rdma_set_device_sysfs_group() to register device attributes and
simplify the driver.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-17 03:45:01 -06:00
Artemy Kovalyov
013c2403bf IB/mlx5: Fix MR cache initialization
Schedule MR cache work only after bucket was initialized.

Cc: <stable@vger.kernel.org> # 4.10
Fixes: 49780d42df ("IB/mlx5: Expose MR cache for mlx5_ib")
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-16 08:30:37 -06:00
Mark Bloch
ba4a411983 RDMA/mlx5: Add support for flow tag to raw create flow
A user can provide a hint which will be attached to the packet and written
to the CQE on receive. This can be used as a way to offload operations
into the HW, for example parsing a packet which is a tunneled packet, and
if so, pass 0x1 as the hint. The software can use that hint to decapsulate
the packet and parse only the inner headers thus saving CPU cycles.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-16 00:24:45 -06:00
Gal Pressman
645ba5970c RDMA/mlx5: Remove extraneous error check
Remove double error check from create user RQ error flow.

Fixes: 79b20a6c30 ("IB/mlx5: Add receive Work Queue verbs")
Signed-off-by: Gal Pressman <pressmangal@gmail.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-16 00:21:38 -06:00
Yishai Hadas
2351776e87 IB/mlx5: Verify DEVX object type
Verify that the input DEVX object type matches the created object.

As the obj_id in the firmware is not globally unique the object type must
be considered upon checking for a valid object id.

Once both the type and the id match we know that the lock was taken on the
correct object by the uverbs layer.

Fixes: e662e14d80 ("IB/mlx5: Add DEVX support for modify and query commands")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-16 00:19:59 -06:00
Jason Gunthorpe
59bfc59a68 Merge branch 'for-rc' into rdma.git for-next
From git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git

This is required to resolve dependencies of the next series of RDMA
patches.

The code motion conflicts in drivers/infiniband/core/cache.c were
resolved.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-16 00:01:02 -06:00
David S. Miller
1986647c2f mlx5e-updates-2018-10-10
IPoIB netlink support and mlx5e pre-allocated netdevice initialization
 
 IP link was broken due to the changes in IPoIB for the rdma_netdev
 support after commit cd565b4b51
 ("IB/IPoIB: Support acceleration options callbacks").
 
 This patchset fixes IPoIB pkey creation and removal using rtnetlink by
 adding support in both IPoIB ULP layer and mlx5 layer:
 
 From Jason and Denis:
 1) Introduces changes in the RDMA netdev code in order to
    allow allocation of the netdev to be done by the rtnl netdev code.
 2) Reworks IPoIB initialization to use the two step rdma_netdev
    creation.
 
 From Feras and Saeed, mlx5e netdev layer refactoring to allow accepting
 pre-allocated netdevs:
 3) Adds support to initialize/cleanup netdevs that are not created
    by mlx5 driver.
 4) Change mlx5e netdevice layer to accept the pre-allocated netdevice
    queue number.
 5) Initialize mlx5e generic structures in one place to be used for all
    netdevs types NIC/representors/IPoIB (both mlx5 allocated and
    pre-allocted).
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJbvqHhAAoJEEg/ir3gV/o+TRQH+wYWgvi5AbqYlGpT8xSho8Dg
 5tuFE9AbYDGLLWptZgtYXsBsTaltGVqKnwWy//dAuuSI7LvqtjeKqq0G1JKau70L
 /49+vv8NTQ6vov2yY95yzzEo5B1/zVcYEl9dmJl4y6Xlwc+YIteewReVqRj+tucR
 xO7ghLWc1o4Pl7gWBAcOULyOsZc+CtG8ElwEaBROXTtYRpsR7FKn7vN+WdWZ2VOG
 MjtCUaG0vZlK1vaF76J3P9bL2V3y6o6i6S8RZwg1kPxO4jnrzyGrkxWMOX6f32KB
 JAUcLVlJW5wckcRAKfTwLxsFMrc9vLBPHyj4XneXVWGKh8/dGUGY5gdGIJV5Jlc=
 =m0tM
 -----END PGP SIGNATURE-----

Merge tag 'mlx5e-updates-2018-10-10' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5e-updates-2018-10-10

IPoIB netlink support and mlx5e pre-allocated netdevice initialization

IP link was broken due to the changes in IPoIB for the rdma_netdev
support after commit cd565b4b51
("IB/IPoIB: Support acceleration options callbacks").

This patchset fixes IPoIB pkey creation and removal using rtnetlink by
adding support in both IPoIB ULP layer and mlx5 layer:

From Jason and Denis:
1) Introduces changes in the RDMA netdev code in order to
   allow allocation of the netdev to be done by the rtnl netdev code.
2) Reworks IPoIB initialization to use the two step rdma_netdev
   creation.

From Feras and Saeed, mlx5e netdev layer refactoring to allow accepting
pre-allocated netdevs:
3) Adds support to initialize/cleanup netdevs that are not created
   by mlx5 driver.
4) Change mlx5e netdevice layer to accept the pre-allocated netdevice
   queue number.
5) Initialize mlx5e generic structures in one place to be used for all
   netdevs types NIC/representors/IPoIB (both mlx5 allocated and
   pre-allocted).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 21:49:56 -07:00
Denis Drozdov
f6a8a19bb1 RDMA/netdev: Hoist alloc_netdev_mqs out of the driver
netdev has several interfaces that expect to call alloc_netdev_mqs from
the core code, with the driver only providing the arguments.  This is
incompatible with the rdma_netdev interface that returns the netdev
directly.

Thus re-organize the API used by ipoib so that the verbs core code calls
alloc_netdev_mqs for the driver. This is done by allowing the drivers to
provide the allocation parameters via a 'get_params' callback and then
initializing an allocated netdev as a second step.

Fixes: cd565b4b51 ("IB/IPoIB: Support acceleration options callbacks")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Denis Drozdov <denisd@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-10 17:58:11 -07:00
Valentine Fatiev
dd9a403495 IB/mlx5: Unmap DMA addr from HCA before IOMMU
The function that puts back the MR in cache also removes the DMA address
from the HCA. Therefore we need to call this function before we remove
the DMA mapping from MMU. Otherwise the HCA may access a memory that
is no longer DMA mapped.

Call trace:
NMI: IOCK error (debug interrupt?) for reason 71 on CPU 0.
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.0-rc6+ #4
Hardware name: HP ProLiant DL360p Gen8, BIOS P71 08/20/2012
RIP: 0010:intel_idle+0x73/0x120
Code: 80 5c 01 00 0f ae 38 0f ae f0 31 d2 65 48 8b 04 25 80 5c 01 00 48 89 d1 0f 60 02
RSP: 0018:ffffffff9a403e38 EFLAGS: 00000046
RAX: 0000000000000030 RBX: 0000000000000005 RCX: 0000000000000001
RDX: 0000000000000000 RSI: ffffffff9a5790c0 RDI: 0000000000000000
RBP: 0000000000000030 R08: 0000000000000000 R09: 0000000000007cf9
R10: 000000000000030a R11: 0000000000000018 R12: 0000000000000000
R13: ffffffff9a5792b8 R14: ffffffff9a5790c0 R15: 0000002b48471e4d
FS:  0000000000000000(0000) GS:ffff9c6caf400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f5737185000 CR3: 0000000590c0a002 CR4: 00000000000606f0
Call Trace:
 cpuidle_enter_state+0x7e/0x2e0
 do_idle+0x1ed/0x290
 cpu_startup_entry+0x6f/0x80
 start_kernel+0x524/0x544
 ? set_init_arg+0x55/0x55
 secondary_startup_64+0xa4/0xb0
DMAR: DRHD: handling fault status reg 2
DMAR: [DMA Read] Request device [04:00.0] fault addr b34d2000 [fault reason 06] PTE Read access is not set
DMAR: [DMA Read] Request device [01:00.2] fault addr bff8b000 [fault reason 06] PTE Read access is not set

Fixes: f3f134f526 ("RDMA/mlx5: Fix crash while accessing garbage pointer and freed memory")
Signed-off-by: Valentine Fatiev <valentinef@mellanox.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-10-10 14:52:43 -04:00
Kamal Heib
d31131bba5 RDMA: Remove unused parameter from ib_modify_qp_is_ok()
The ll parameter is not used in ib_modify_qp_is_ok(), so remove it.

Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-03 16:05:46 -06:00
Yishai Hadas
3df6e0234a IB/mlx5: Enable DEVX on IB
IB has additional protections with SELinux that cannot be extended to the
DEVX domain. SELinux can restrict access to pkeys. The first version of
DEVX blocked IB entirely until this could be understood.

Since DEVX requires CAP_NET_RAW, it supersedes the SELinux restriction and
allows userspace to form arbitrary packets with arbitrary pkeys.

Thus we enable IB for DEVX when CAP_NET_RAW is given.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-27 13:01:33 -06:00
Yishai Hadas
7e1335a736 IB/mlx5: Enable DEVX white list commands
Enable DEVX white list commands without the need for CAP_NET_RAW.

DEVX uid must exist from the ucontext or the device so that the firmware
will mask unprivileged capabilities.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-27 13:01:33 -06:00
Yishai Hadas
76dc5a8406 IB/mlx5: Manage device uid for DEVX white list commands
Manage device uid for DEVX white list commands.  The created device uid
will be used on white list commands if the user didn't supply its own uid.

This will enable the firmware to filter out non privileged functionality
as of the recognition of the uid.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-27 13:01:33 -06:00
Yishai Hadas
7f72052cb4 IB/mlx5: Expose RAW QP device handles to user space
Expose RAW QP device handles to user space by extending the UHW part of
mlx5_ib_create_qp_resp.

This data is returned only when DEVX context is used where it may be
applicable.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-27 13:01:33 -06:00
Jason Gunthorpe
5a738b5d47 RDMA/drivers: Use dev_err/dbg/etc instead of pr_* + ibdev->name
Kernel convention is that a driver for a subsystem will print using
dev_* on the subsystem's struct device, or with dev_* on the physical
device. Drivers should rarely use a pr_* function.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-26 13:51:48 -06:00
Jason Gunthorpe
e349f858d2 RDMA: Fully setup the device name in ib_register_device
The current code has two copies of the device name, ibdev->dev and
dev_name(&ibdev->dev), and they are setup at different times, which is
very confusing.

Set them both up at the same time and make dev_name() the lead name, which
is the proper use of the driver core APIs. To make it very clear that the
name is not valid until registration pass it in to the
ib_register_device() call rather than messing with ibdev->name directly.

Also the reorganization now checks that dev_name is unique even if it does
not contain a %.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Acked-by: Adit Ranadive <aditr@vmware.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Acked-by: Devesh Sharma <devesh.sharma@broadcom.com>
Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
2018-09-26 13:51:36 -06:00
Yishai Hadas
e8ef090a61 IB/mlx5: Destroy the DEVX object upon error flow
Upon DEVX object creation the object must be destroyed upon a follows
error flow.

Fixes: 7efce3691d ("IB/mlx5: Add obj create and destroy functionality")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-25 14:49:17 -06:00
Mark Bloch
0430e74f9f RDMA/mlx5: Remove superfluous version print
When profiles were introduced to MLX5 IB an unneeded version print when
creating an MLX5 IB device was added. Remove the print, we still have a
printk for driver version in mlx5_ib_add().

Fixes: 16c1975f10 ("IB/mlx5: Create profile infrastructure to add and remove stages")
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-25 14:36:50 -06:00
Yishai Hadas
ba1a057da2 IB/mlx5: Set valid umem bit on DEVX
Set valid umem bit on DEVX commands that use umem.
This will enforce the umem usage by the firmware and not the 'pas' info.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-25 14:06:38 -06:00
Yishai Hadas
d2d19121ae IB/mlx5: Set uid as part of TD commands
Set uid as part of TD commands so that the firmware can
manage the TD object in a secured way.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-25 14:06:38 -06:00
Yishai Hadas
d00614c057 IB/mlx5: Set uid as part of XRCD commands
Set uid as part of XRCD commands so that the firmware can manage the
XRCD object in a secured way.

That will enable using an XRCD that was created by verbs application
to be used by the DEVX flow in case the uid is equal.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-25 14:06:04 -06:00
Yishai Hadas
cf50a7863b IB/mlx5: Set uid as part of CQ creation
Set uid as part of CQ creation so that the firmware can manage the
CQ object in a secured way.

The uid for the destroy and the modify commands is set by mlx5_core.

This will enable using a CQ that was created by verbs application to
be used by the DEVX flow in case the uid is equal.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-25 14:06:04 -06:00
Yishai Hadas
58895f0d18 IB/mlx5: Set uid upon PD allocation
Set uid as part of PD allocation, this uid is used for other mlx5
objects upon calling the firmware.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-25 14:06:04 -06:00
Yishai Hadas
5deba86ee2 IB/mlx5: Set uid as part of RQT commands
Set uid as part of RQT commands so that the firmware can manage the
RQT object in a secured way.

That will enable using an RQT that was created by verbs application
to be used by the DEVX flow in case the uid is equal.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-25 14:06:04 -06:00
Yishai Hadas
1cd6dbd32f IB/mlx5: Set uid as part of TIS commands
Set uid as part of TIS commands so that the firmware can manage the
TIS object in a secured way.

That will enable using a TIS that was created by verbs application
to be used by the DEVX flow in case the uid is equal.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-25 14:06:04 -06:00
Yishai Hadas
443c1cf9d6 IB/mlx5: Set uid as part of TIR commands
Set uid as part of TIR commands so that the firmware can manage the
TIR object in a secured way.

That will enable using a TIR that was created by verbs application to
be used by the DEVX flow in case the uid is equal.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-25 14:06:04 -06:00
Yishai Hadas
539ec98276 IB/mlx5: Set uid as part of MCG commands
Set uid as part of MCG commands so that the firmware can manage the
MCG object in a secured way.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-25 14:06:04 -06:00
Yishai Hadas
a01a5860b2 IB/mlx5: Set uid as part of DCT commands
Set uid as part of DCT create command so that the firmware can
manage the DCT object in a secured way.

The uid for the destroy and drain commands are set by mlx5_core.

That will enable using a DCT that was created by verbs application
to be used by the DEVX flow in case the uid is equal.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-25 14:06:04 -06:00
Yishai Hadas
9f33ec03bc IB/mlx5: Set uid as part of SRQ commands
Set uid as part of SRQ create command so that the firmware can manage
the SRQ object in a secured way.

The uid for the destroy and modify commands are set by mlx5_core.

That will enable using a SRQ that was created by verbs application
to be used by the DEVX flow in case the uid is equal.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-25 14:06:04 -06:00
Yishai Hadas
c14003f090 IB/mlx5: Set uid as part of SQ commands
Set uid as part of SQ commands so that the firmware can manage the
SQ object in a secured way.

The uid for the destroy command is set by mlx5_core.

This will enable using an SQ that was created by verbs application
to be used by the DEVX flow in case the uid is equal.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-25 14:06:04 -06:00
Yishai Hadas
34d57585f9 IB/mlx5: Set uid as part of RQ commands
Set uid as part of RQ commands so that the firmware can manage the
RQ object in a secured way.

The uid for the destroy command is set by mlx5_core.

This will enable using an RQ that was created by verbs application to
be used by the DEVX flow in case the uid is equal.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-25 14:06:04 -06:00
Yishai Hadas
991d219829 IB/mlx5: Set uid as part of QP creation
Set uid as part of QP creation so that the firmware can manage the
QP object in a secured way.

The uid for the destroy and the modify commands is set by mlx5_core.

This will enable using a QP that was created by verbs application to
be used by the DEVX flow in case the uid is equal.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-25 14:06:04 -06:00
Yishai Hadas
a1069c1c75 IB/mlx5: Use uid as part of PD commands
Use uid as part of PD commands so that the firmware can manage the
PD object in a secured way.

For example when a QP is created its uid must match the CQ uid which it
uses.

Next patches in this series will use the uid from the PD, then will come
a patch to set the uid on the PD so that all objects will be properly
work in one change.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-25 14:06:04 -06:00
Doug Ledford
f9882bb506 Merge branch 'mlx5-vport-loopback' into rdma.get
For dependencies, branch based on 'mlx5-next' of
    git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git

mlx5 mcast/ucast loopback control enhancements from Leon Romanovsky:

====================
This is short series from Mark which extends handling of loopback
traffic. Originally mlx5 IB dynamically enabled/disabled both unicast
and multicast based on number of users. However RAW ethernet QPs need
more granular access.
====================

Fixed failed automerge in mlx5_ib.h (minor context conflict issue)

mlx5-vport-loopback branch:
    RDMA/mlx5: Enable vport loopback when user context or QP mandate
    RDMA/mlx5: Allow creating RAW ethernet QP with loopback support
    RDMA/mlx5: Refactor transport domain bookkeeping logic
    net/mlx5: Rename incorrect naming in IFC file

Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-21 20:41:58 -04:00
Mark Bloch
0042f9e458 RDMA/mlx5: Enable vport loopback when user context or QP mandate
A user can create a QP which can accept loopback traffic, but that's not
enough. We need to enable loopback on the vport as well. Currently vport
loopback is enabled only when more than 1 users are using the IB device,
update the logic to consider whatever a QP which supports loopback was
created, if so enable vport loopback even if there is only a single user.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-21 20:20:59 -04:00
Mark Bloch
175edba856 RDMA/mlx5: Allow creating RAW ethernet QP with loopback support
Expose two new flags:
MLX5_QP_FLAG_TIR_ALLOW_SELF_LB_UC
MLX5_QP_FLAG_TIR_ALLOW_SELF_LB_MC

Those flags can be used at creation time in order to allow a QP
to be able to receive loopback traffic (unicast and multicast).
We store the state in the QP to be used on the destroy path
to indicate with which flags the QP was created with.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-21 20:20:59 -04:00
Mark Bloch
a560f1d9af RDMA/mlx5: Refactor transport domain bookkeeping logic
In preparation to enable loopback on a single user context move the logic
that enables/disables loopback to separate functions and group variables
under a single struct.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-21 20:20:59 -04:00
Mark Bloch
5d773ff41a net/mlx5: Rename incorrect naming in IFC file
Remove a trailing underscore from the multicast/unicast names.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-09-22 00:38:39 +03:00
Jason Gunthorpe
f27a0d50a4 RDMA/umem: Use umem->owning_mm inside ODP
Since ODP had a single struct mmu_notifier located in the ucontext it
could only handle a single MM at a time, and this prevented it from using
the new owning_mm system.

With the prior rework it is now simple to let ODP track multiple MMs per
ucontext, finish the job so that the per_mm is allocated on a mm by mm
basis, and freed when the last umem is dropped from the ucontext.

As a side effect the new saner locking removes the lockdep splat about
nesting the umem_rwsem between mmu_notifier_unregister and
ib_umem_odp_release.

It also makes ODP work with multiple processes, across, fork, etc.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-21 11:58:36 -04:00
Jason Gunthorpe
c9990ab39b RDMA/umem: Move all the ODP related stuff out of ucontext and into per_mm
This is the first step to make ODP use the owning_mm that is now part of
struct ib_umem.

Each ODP umem is linked to a single per_mm structure, which in turn, is
linked to a single mm, via the embedded mmu_notifier. This first patch
introduces the structure and reworks eveything to use it.

This also needs to introduce tgid into the ib_ucontext_per_mm, as
get_user_pages_remote() requires the originating task for statistics
tracking.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-21 11:54:46 -04:00
Jason Gunthorpe
597ecc5a09 RDMA/umem: Get rid of struct ib_umem.odp_data
This no longer has any use, we can use container_of to get to the
umem_odp, and a simple flag to indicate if this is an odp MR. Remove the
few remaining references to it.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-21 11:54:46 -04:00
Jason Gunthorpe
41b4deeaa1 RDMA/umem: Make ib_umem_odp into a sub structure of ib_umem
These two structures are linked together, use the container_of pattern
instead of a double allocation to make the code simpler and easier to
follow.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-21 11:54:46 -04:00
Jason Gunthorpe
b5231b019d RDMA/umem: Use ib_umem_odp in all function signatures connected to ODP
All of these functions already require the ODP version of the umem struct,
make this very clear by having the signature require it. This paves the
way to using the container_of() pattern to link umem_odp and umem
together.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-21 11:54:46 -04:00
Jason Gunthorpe
e2cd1d1ad2 RDMA/mlx5: Use rdma_user_mmap_io
Rely on the new core code helper to map BAR memory from the driver.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-20 16:19:30 -04:00
Moni Shoua
99ed748e87 IB/mlx5: Allow transition of DCI QP to reset
The transition is allowed from any state and the atrribute mask must be
IB_QP_STATE.

Fixes: c32a4f296e ("IB/mlx5: Add support for DC Initiator QP")
Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-12 14:58:17 -06:00
Mark Bloch
a7ee18bdee RDMA/mlx5: Allow creating a matcher for a NIC TX flow table
Currently a matcher can only be created and attached to a NIC RX flow
table. Extend it to allow it on NIC TX flow tables as well.

In order to achieve that, we:

1) Expose a new attribute: MLX5_IB_ATTR_FLOW_MATCHER_FLOW_FLAGS.
   enum ib_flow_flags is used as valid flags. Only
   IB_FLOW_ATTR_FLAGS_EGRESS is supported.

2) Remove the requirement to have a DEVX or QP destination when creating a
   flow. A flow added to NIC TX flow table will forward the packet outside
   of the vport (Wire or E-Switch in the SR-iOV case).

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-11 09:28:07 -06:00
Mark Bloch
b47fd4ffe2 RDMA/mlx5: Add NIC TX namespace when getting a flow table
Add the ability to get a NIC TX flow table when using _get_flow_table().
This will allow to create a matcher and a flow rule on the NIC TX path.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-11 09:28:07 -06:00
Mark Bloch
fa76d24ee0 RDMA/mlx5: Add flow actions support to raw create flow
Support attaching flow actions to a flow rule via raw create flow.
For now only NIC RX path is supported. This change requires to export
flow resources management functions so we can maintain proper bookkeeping
of flow actions.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-11 09:28:07 -06:00
Mark Bloch
b823dd6d86 RDMA/mlx5: Refactor raw flow creation
Move struct mlx5_flow_act to be passed from the method entry point,
this will allow to add support for flow action for the raw create flow
path.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-11 09:28:07 -06:00
Mark Bloch
501f14e37b RDMA/mlx5: Don't overwrite action if already set
We support only a single action type per flow rule, in case the user passes
the same type of flow actions fail the flow creation.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-11 09:28:06 -06:00
Mark Bloch
2ea2620390 RDMA/mlx5: Refactor flow action parsing to be more generic
Make the parsing of flow actions more generic so it could be used by
mlx5 raw create flow.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-11 09:28:06 -06:00
Mark Bloch
86e1d464a8 RDMA/uverbs: Move flow resources initialization
Use ib_set_flow() when initializing flow related resources.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-11 09:28:06 -06:00
Mark Bloch
e806f9328b RDMA/mlx5: Enable attaching packet reformat action to steering flows
Any matching rules will be mutated based on the packet reformat context
which is attached to that given flow rule.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-11 09:28:06 -06:00
Mark Bloch
5c2db53f62 RDMA/mlx5: Enable reformat on NIC RX if supported
A L3_TUNNEL_TO_L2 decap flow action requires to enable the encap bit on
the flow table, enable it if supported. This will allow to attach those
flow actions to NIC RX steering. We don't enable if running on a
representor.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-11 09:28:06 -06:00
Mark Bloch
10a308964e RDMA/mlx5: Enable attaching DECAP action to steering flows
Any matching packet will be stripped of it's VXLAN tunnel, only the inner
L2 onward is left. The user will receive the decapsulated packet.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-11 09:28:06 -06:00
Mark Bloch
4adda1122c RDMA/mlx5: Enable decap and packet reformat on flow tables
If NIC RX flow tables support decap operation, enable it on creation,
This allows to perform decapsulation of tunnelled packets by steering
rules. If NIC TX flow tables support reformat operation, enable it on
creation.

We don't enable those capabilities on representors as the E-Switch should
handle packet modification (can be configured via TC) and as current
hardware can't handle both FDB and NIC flow tables with decap/packet
reformat support.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-11 09:28:06 -06:00
Mark Bloch
b1085be3f4 RDMA/mlx5: Enable attaching modify header to steering flows
When creating a flow steering rule, allow the user to attach a modify
header action.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-11 09:28:06 -06:00
Mark Bloch
78dd0c430f RDMA/mlx5: Add NIC TX steering support
Just like ingress steering, allow a user to create steering rules that
match egress vport traffic. We expose the same number of priorities as
the bypass (NIC RX) steering.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-11 09:28:06 -06:00
Parav Pandit
6c75520f7e IB/mlx5: Don't hold spin lock while checking device state
mdev->state device state is not protected by the QP for which WRs are
being processed. Therefore, there is no need to hold spin lock while
checking mdev state.

Given that device fatal error is unlikely situation, wrap the condition
check with unlikely().

Additionally, kernel QP1 is also a kernel ULP for which soft CQEs needs
to be generated. Therefore, check for device fatal error before
processing QP1 work requests.

Fixes: 89ea94a7b6 ("IB/mlx5: Reset flow support for IB kernel ULPs")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-06 13:35:15 -06:00
Jason Gunthorpe
af68ccbc11 Merge branch 'mlx5-flow-mutate' into rdma.git for-next
For dependencies, branch based on 'mellanox/mlx5-next' of
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git

Pull Flow actions to mutate packets from Leon Romanovsky:

====================
This series exposes the ability to create flow actions which can
mutate packet headers. We do that by exposing two new verbs:
 * modify header - can change existing packet headers. packet
 * reformat - can encapsulate or decapsulate a packet.
              Once created a flow action must be attached to a steering
              rule for it to take effect.

The first 10 patches refactor mlx5_core code, rename internal structures
to better reflect their operation and export needed functions so the RDMA
side can allocate the action.

The last 5 patches expose via the IOCTL infrastructure mlx5_ib methods
which do the actual allocation of resources and return an handle to the
user. A user of this API is expected to know how to work with the device's
spec as the input to those function is HW depended.

An example usage of the modify header action is routing, A user can create
an action which edits the L2 header and decrease the TTL.

An example usage of the packet reformat action is VXLAN encap/decap which
is done by the HW.
====================

* branch 'mlx5-flow-mutate':
  RDMA/mlx5: Extend packet reformat verbs
  RDMA/mlx5: Add new flow action verb - packet reformat
  RDMA/uverbs: Add generic function to fill in flow action object
  RDMA/mlx5: Add a new flow action verb - modify header
  RDMA/uverbs: Add UVERBS_ATTR_CONST_IN to the specs language
  net/mlx5: Export packet reformat alloc/dealloc functions
  net/mlx5: Pass a namespace for packet reformat ID allocation
  net/mlx5: Expose new packet reformat capabilities
  {net, RDMA}/mlx5: Rename encap to reformat packet
  net/mlx5: Move header encap type to IFC header file
  net/mlx5: Break encap/decap into two separated flow table creation flags
  net/mlx5: Add support for more namespaces when allocating modify header
  net/mlx5: Export modify header alloc/dealloc functions
  net/mlx5: Add proper NIC TX steering flow tables support
  net/mlx5: Cleanup flow namespace getter switch logic

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-05 15:24:58 -06:00
Mark Bloch
a090d0d859 RDMA/mlx5: Extend packet reformat verbs
We expose new actions:

L2_TO_L2_TUNNEL - A generic encap from L2 to L2, the data passed should
		  be the encapsulating headers.

L3_TUNNEL_TO_L2 - Will do decap where the inner packet starts from L3,
		  the data should be mac or mac + vlan (14 or 18 bytes).

L2_TO_L3_TUNNEL - Will do encap where is L2 of the original packet will
		  not be included, the data should be the encapsulating
		  header.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-05 15:23:59 -06:00
Mark Bloch
08aeb97cb8 RDMA/mlx5: Add new flow action verb - packet reformat
For now, only add L2_TUNNEL_TO_L2 option. This will allow to perform
generic decap operation if the encapsulating protocol is L2 based, and the
inner packet is also L2 based. For example this can be used to decap VXLAN
packets.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-05 15:23:59 -06:00
Mark Bloch
841eefc5cb RDMA/uverbs: Add generic function to fill in flow action object
Refactor the initialization of a flow action object to a common function.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-05 15:23:59 -06:00
Mark Bloch
b4749bf256 RDMA/mlx5: Add a new flow action verb - modify header
Expose the ability to create a flow action which changes packet
headers. The data passed from userspace should be modify header actions as
defined by HW specification.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-05 15:23:58 -06:00
Mark Bloch
60786f0987 {net, RDMA}/mlx5: Rename encap to reformat packet
Renames all encap mlx5_{core,ib} code to use the new naming of packet
reformat. This change doesn't introduce any function change and is
needed to properly reflect the operation being done by this action.
For example not only can we encapsulate a packet, but also decapsulate it.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-09-05 08:10:59 +03:00
Majd Dibbiny
c6a21c3864 IB/mlx5: Change TX affinity assignment in RoCE LAG mode
In the current code, the TX affinity is per RoCE device, which can cause
unfairness between different contexts. e.g. if we open two contexts, and
each open 10 QPs concurrently, all of the QPs of the first context might
end up on the first port instead of distributed on the two ports as
expected

To overcome this unfairness between processes, we maintain per device TX
affinity, and per process TX affinity.

The allocation algorithm is as follow:

1. Hold two tx_port_affinity atomic variables, one per RoCE device and one
   per ucontext. Both initialized to 0.

2. In mlx5_ib_alloc_ucontext do:
 2.1. ucontext.tx_port_affinity = device.tx_port_affinity
 2.2. device.tx_port_affinity += 1

3. In modify QP INIT2RST:
 3.1. qp.tx_port_affinity = ucontext.tx_port_affinity % MLX5_PORT_NUM
 3.2. ucontext.tx_port_affinity += 1

Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-04 16:26:14 -06:00
Michal Hocko
93065ac753 mm, oom: distinguish blockable mode for mmu notifiers
There are several blockable mmu notifiers which might sleep in
mmu_notifier_invalidate_range_start and that is a problem for the
oom_reaper because it needs to guarantee a forward progress so it cannot
depend on any sleepable locks.

Currently we simply back off and mark an oom victim with blockable mmu
notifiers as done after a short sleep.  That can result in selecting a new
oom victim prematurely because the previous one still hasn't torn its
memory down yet.

We can do much better though.  Even if mmu notifiers use sleepable locks
there is no reason to automatically assume those locks are held.  Moreover
majority of notifiers only care about a portion of the address space and
there is absolutely zero reason to fail when we are unmapping an unrelated
range.  Many notifiers do really block and wait for HW which is harder to
handle and we have to bail out though.

This patch handles the low hanging fruit.
__mmu_notifier_invalidate_range_start gets a blockable flag and callbacks
are not allowed to sleep if the flag is set to false.  This is achieved by
using trylock instead of the sleepable lock for most callbacks and
continue as long as we do not block down the call chain.

I think we can improve that even further because there is a common pattern
to do a range lookup first and then do something about that.  The first
part can be done without a sleeping lock in most cases AFAICS.

The oom_reaper end then simply retries if there is at least one notifier
which couldn't make any progress in !blockable mode.  A retry loop is
already implemented to wait for the mmap_sem and this is basically the
same thing.

The simplest way for driver developers to test this code path is to wrap
userspace code which uses these notifiers into a memcg and set the hard
limit to hit the oom.  This can be done e.g.  after the test faults in all
the mmu notifier managed memory and set the hard limit to something really
small.  Then we are looking for a proper process tear down.

[akpm@linux-foundation.org: coding style fixes]
[akpm@linux-foundation.org: minor code simplification]
Link: http://lkml.kernel.org/r/20180716115058.5559-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Christian König <christian.koenig@amd.com> # AMD notifiers
Acked-by: Leon Romanovsky <leonro@mellanox.com> # mlx and umem_odp
Reported-by: David Rientjes <rientjes@google.com>
Cc: "David (ChunMing) Zhou" <David1.Zhou@amd.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
Cc: Sudeep Dutt <sudeep.dutt@intel.com>
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22 10:52:44 -07:00
Jason Gunthorpe
89982f7cce Linux 4.18
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAltwm2geHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGITkH/iSzkVhT2OxHoir0
 mLVzTi7/Z17L0e/ELl7TvAC0iLFlWZKdlGR0g3b4/QpXLPmNK4HxiDRTQuWn8ke0
 qDZyDq89HqLt+mpeFZ43PCd9oqV8CH2xxK3iCWReqv6bNnowGnRpSStlks4rDqWn
 zURC/5sUh7TzEG4s997RrrpnyPeQWUlf/Mhtzg2/WvK2btoLWgu5qzjX1uFh3s7u
 vaF2NXVJ3X03gPktyxZzwtO1SwLFS1jhwUXWBZ5AnoJ99ywkghQnkqS/2YpekNTm
 wFk80/78sU+d91aAqO8kkhHj8VRrd+9SGnZ4mB2aZHwjZjGcics4RRtxukSfOQ+6
 L47IdXo=
 =sJkt
 -----END PGP SIGNATURE-----

Merge tag 'v4.18' into rdma.git for-next

Resolve merge conflicts from the -rc cycle against the rdma.git tree:

Conflicts:
 drivers/infiniband/core/uverbs_cmd.c
  - New ifs added to ib_uverbs_ex_create_flow in -rc and for-next
  - Merge removal of file->ucontext in for-next with new code in -rc
 drivers/infiniband/core/uverbs_main.c
  - for-next removed code from ib_uverbs_write() that was modified
    in for-rc

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-16 13:12:00 -06:00
Jason Gunthorpe
0625b4ba1a IB/mlx5: Fix leaking stack memory to userspace
mlx5_ib_create_qp_resp was never initialized and only the first 4 bytes
were written.

Fixes: 41d902cb7c ("RDMA/mlx5: Fix definition of mlx5_ib_create_qp_resp")
Cc: <stable@vger.kernel.org>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-14 15:35:12 -06:00
Jason Gunthorpe
b61815e241 IB/uverbs: Use uverbs_alloc for allocations
Several handlers need temporary allocations for the life of the method,
switch them to use the uverbs_alloc allocator.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
2018-08-13 09:16:13 -06:00
Jason Gunthorpe
7d96c9b176 IB/uverbs: Have the core code create the uverbs_root_spec
There is no reason for drivers to do this, the core code should take of
everything. The drivers will provide their information from rodata to
describe their modifications to the core's base uapi specification.

The core uses this to build up the runtime uapi for each device.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
2018-08-10 16:06:24 -06:00
Leon Romanovsky
0dfe452241 RDMA/mlx5: Fix shift overflow in mlx5_ib_create_wq
[   61.182439] UBSAN: Undefined behaviour in drivers/infiniband/hw/mlx5/qp.c:5366:34
[   61.183673] shift exponent 4294967288 is too large for 32-bit type 'unsigned int'
[   61.185530] CPU: 0 PID: 639 Comm: qp Not tainted 4.18.0-rc1-00037-g4aa1d69a9c60-dirty #96
[   61.186981] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
[   61.188315] Call Trace:
[   61.188661]  dump_stack+0xc7/0x13b
[   61.190427]  ubsan_epilogue+0x9/0x49
[   61.190899]  __ubsan_handle_shift_out_of_bounds+0x1ea/0x22f
[   61.197040]  mlx5_ib_create_wq+0x1c99/0x1d50
[   61.206632]  ib_uverbs_ex_create_wq+0x499/0x820
[   61.213892]  ib_uverbs_write+0x77e/0xae0
[   61.248018]  vfs_write+0x121/0x3b0
[   61.249831]  ksys_write+0xa1/0x120
[   61.254024]  do_syscall_64+0x7c/0x2a0
[   61.256178]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   61.259211] RIP: 0033:0x7f54bab70e99
[   61.262125] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89
[   61.268678] RSP: 002b:00007ffe1541c318 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[   61.271076] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f54bab70e99
[   61.273795] RDX: 0000000000000070 RSI: 0000000020000240 RDI: 0000000000000003
[   61.276982] RBP: 00007ffe1541c330 R08: 00000000200078e0 R09: 0000000000000002
[   61.280035] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004005c0
[   61.283279] R13: 00007ffe1541c420 R14: 0000000000000000 R15: 0000000000000000

Cc: <stable@vger.kernel.org> # 4.7
Fixes: 79b20a6c30 ("IB/mlx5: Add receive Work Queue verbs")
Cc: syzkaller <syzkaller@googlegroups.com>
Reported-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-08 09:47:26 -06:00
Jason Gunthorpe
9f49a5b5c2 RDMA/netdev: Use priv_destructor for netdev cleanup
Now that the unregister_netdev flow for IPoIB no longer relies on external
code we can now introduce the use of priv_destructor and
needs_free_netdev.

The rdma_netdev flow is switched to use the netdev common priv_destructor
instead of the special free_rdma_netdev and the IPOIB ULP adjusted:
 - priv_destructor needs to switch to point to the ULP's destructor
   which will then call the rdma_ndev's in the right order
 - We need to be careful around the error unwind of register_netdev
   as it sometimes calls priv_destructor on failure
 - ULPs need to use ndo_init/uninit to ensure proper ordering
   of failures around register_netdev

Switching to priv_destructor is a necessary pre-requisite to using
the rtnl new_link mechanism.

The VNIC user for rdma_netdev should also be revised, but that is left for
another patch.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Denis Drozdov <denisd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-08-02 20:27:43 -06:00
Jason Gunthorpe
e83f0ecdc4 IB/uverbs: Do not pass struct ib_device to the ioctl methods
This does the same as the patch before, except for ioctl. The rules are
the same, but for the ioctl methods the core code handles setting up the
uobject.

- Retrieve the ib_dev from the uobject->context->device. This is
  safe under ioctl as the core has already done rdma_alloc_begin_uobject
  and so CREATE calls are entirely protected by the rwsem.
- Retrieve the ib_dev from uobject->object
- Call ib_uverbs_get_ucontext()

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-01 14:55:48 -06:00
Kamal Heib
26e551c5ae RDMA: Fix return code check in rdma_set_cq_moderation
The proper return code is "-EOPNOTSUPP" when the modify_cq() callback is
not supported, all drivers should generate this and all users should check
for it when detecting not supported functionality.

Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com> (for mlx5)
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-31 17:03:46 -06:00
Jason Gunthorpe
bccd06223f IB/uverbs: Add UVERBS_ATTR_FLAGS_IN to the specs language
This clearly indicates that the input is a bitwise combination of values
in an enum, and identifies which enum contains the definition of the bits.

Special accessors are provided that handle the mandatory validation of the
allowed bits and enforce the correct type for bitwise flags.

If we had introduced this at the start then the kabi would have uniformly
used u64 data to pass flags, however today there is a mixture of u64 and
u32 flags. All places are converted to accept both sizes and the accessor
fixes it. This allows all existing flags to grow to u64 in future without
any hassle.

Finally all flags are, by definition, optional. If flags are not passed
the accessor does not fail, but provides a value of zero.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
2018-07-30 20:23:29 -06:00
Bart Van Assche
d34ac5cd3a RDMA, core and ULPs: Declare ib_post_send() and ib_post_recv() arguments const
Since neither ib_post_send() nor ib_post_recv() modify the data structure
their second argument points at, declare that argument const. This change
makes it necessary to declare the 'bad_wr' argument const too and also to
modify all ULPs that call ib_post_send(), ib_post_recv() or
ib_post_srq_recv(). This patch does not change any functionality but makes
it possible for the compiler to verify whether the
ib_post_(send|recv|srq_recv) really do not modify the posted work request.

To make this possible, only one cast had to be introduce that casts away
constness, namely in rpcrdma_post_recvs(). The only way I can think of to
avoid that cast is to introduce an additional loop in that function or to
change the data type of bad_wr from struct ib_recv_wr ** into int
(an index that refers to an element in the work request list). However,
both approaches would require even more extensive changes than this
patch.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-30 20:09:34 -06:00
Bart Van Assche
7bb1fafc2f IB/mlx5, ib_post_send(), IB_WR_REG_SIG_MR: Do not modify the 'wr' argument
Since the next patch will constify the wr pointer, do not modify the data
that pointer points at.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-30 20:00:20 -06:00
Bart Van Assche
f696bf6d64 RDMA: Constify the argument of the work request conversion functions
When posting a send work request, the work request that is posted is not
modified by any of the RDMA drivers. Make this explicit by constifying
most ib_send_wr pointers in RDMA transport drivers.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-30 20:00:20 -06:00
Qing Huang
2577188edc IB/mlx5: avoid excessive warning msgs when creating VFs on 2nd port
When a CX5 device is configured in dual-port RoCE mode, after creating
many VFs against port 1, creating the same number of VFs against port 2
will flood kernel/syslog with something like
"mlx5_*:mlx5_ib_bind_slave_port:4266:(pid 5269): port 2 already
affiliated."

So basically, when traversing mlx5_ib_dev_list, mlx5_ib_add_slave_port()
repeatedly attempts to bind the new mpi structure to every device on the
list until it finds an unbound device.

Change the log level from warn to dbg to avoid log flooding as the warning
should be harmless.

Signed-off-by: Qing Huang <qing.huang@oracle.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-26 16:02:28 -06:00
Jason Gunthorpe
22fa27fbc6 IB/uverbs: Fix locking around struct ib_uverbs_file ucontext
We have a parallel unlocked reader and writer with ib_uverbs_get_context()
vs everything else, and nothing guarantees this works properly.

Audit and fix all of the places that access ucontext to use one of the
following locking schemes:
- Call ib_uverbs_get_ucontext() under SRCU and check for failure
- Access the ucontext through an struct ib_uobject context member
  while holding a READ or WRITE lock on the uobject.
  This value cannot be NULL and has no race.
- Hold the ucontext_lock and check for ufile->ucontext !NULL

This also re-implements ib_uverbs_get_ucontext() in a way that is safe
against concurrent ib_uverbs_get_context() and disassociation.

As a side effect, every access to ucontext in the commands is via
ib_uverbs_get_context() with an error check, or via the uobject, so there
is no longer any need for the core code to check ucontext on every command
call. These checks are also removed.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-25 14:21:46 -06:00
Jason Gunthorpe
c36ee46daf IB/mlx5: Use the ucontext from the uobj, not the file
This approach matches the standard flow of the typical write method that
relies on the HW object to store the device and the uobject to access the
ucontext.  Avoids the use of the devx_ufile2uctx in several places will
make revising the semantics of ib_uverbs_get_ucontext() in the next patch
simpler.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-25 14:21:22 -06:00
Yishai Hadas
cb80fb1892 IB/mlx5: Enable driver uapi commands for flow steering
Expose the mlx5 flow steering parsing trees, exposing the functionality to
user space.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-24 14:33:52 -06:00
Yishai Hadas
6346f0bfa0 IB/mlx5: Add support for a flow table destination for driver flow steering
Add support to set a destination that is a flow table, this can come from
the DEVX destination.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-24 14:03:56 -06:00
Yishai Hadas
d4be3f4466 IB/mlx5: Support adding flow steering rule by raw description
Add support to set a public flow steering rule when its destination is a
TIR by using raw specification data.

The logic follows the verbs API but instead of using ib_spec(s) the raw,
device specific, description is used.

This allows supporting specialty matchers without having to define new
matches in the verbs struct based language.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-24 14:03:56 -06:00
Yishai Hadas
3226944124 IB/mlx5: Introduce driver create and destroy flow methods
Introduce driver create and destroy flow methods on the uverbs flow
object.

This allows the driver to get its specific device attributes to match the
underlay specification while still using the generic ib_flow object for
cleanup and code sharing.

The IB object's attributes are set via the ib_set_flow() helper function.

The specific implementation for the given specification is added in
downstream patches.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-24 14:03:49 -06:00
Yishai Hadas
fd44e3853c IB/mlx5: Introduce flow steering matcher uapi object
Introduce flow steering matcher object and its create and destroy methods.

This matcher object holds some mlx5 specific driver properties that
matches the underlay device specification when an mlx5 flow steering group
is created.

It will be used in downstream patches to be part of mlx5 specific create
flow method.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-24 13:34:37 -06:00
Jason Gunthorpe
eda98779f7 Merge branch 'mellanox/mlx5-next' into rdma.git for-next
From git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git

This is required to resolve dependencies of the next series of RDMA
patches.

* branch 'mellanox/mlx5-next':
  net/mlx5: Add support for flow table destination number
  net/mlx5: Add forward compatible support for the FTE match data
  net/mlx5: Fix tristate and description for MLX5 module
  net/mlx5: Better return types for CQE API
  net/mlx5: Use ERR_CAST() instead of coding it
  net/mlx5: Add missing SET_DRIVER_VERSION command translation
  net/mlx5: Add XRQ commands definitions
  net/mlx5: Add core support for double vlan push/pop steering action
  net/mlx5: Expose MPEGC (Management PCIe General Configuration) structures
  net/mlx5: FW tracer, add hardware structures
  net/mlx5: fix uaccess beyond "count" in debugfs read/write handlers

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-24 13:10:23 -06:00
Kamal Heib
aa09ea6e6b RDMA/mlx5: Remove set but not used variables
Remove "uctx" and "pa" variables that were set but not used.

Fixes: a8b92ca1b0 ("IB/mlx5: Introduce DEVX")
Fixes: 8f06228733 ("RDMA/mlx5: Remove debug prints of VMA pointers")
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-23 15:34:54 -06:00
Eran Ben Elisha
048f31437a net/mlx5: Fix tristate and description for MLX5 module
Current description did not include new devices. Fix that by proving the
correct generic description.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-07-18 14:33:25 -07:00
Linus Torvalds
4659fc8484 Regression, user visible bugs, and crashing fixes:
- cxgb4 could wrongly fail MR creation due to a typo
 
 - Various crashes if the wrong QP type is mixed in with APIs that expect
   other types
 
 - Syzkaller oops
 
 - Using ERR_PTR and NULL together cases HFI1 to crash in some cases
 
 - mlx5 memory leak in error unwind
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCgAGBQJbSNzSAAoJEDht9xV+IJsarfQP/38i9pbDqWdniEhv42CisS1D
 aZYygJ6yAsoEPmEI1oIGtJURte44wxHWmWO2Jbz8aTFl19tSh2cZgFfsNKImSU5A
 5ON19gxOeQ/KiZPmfbgP8cgunU41DpYq3twW8NvW9u5JTl1nbFKtpWfJxKVvjlsu
 PwiFrQxG43/9BrooHJc4eogJHmB77iypR3NmkagAM3oSx/d35zt+Wnw45bybIl8e
 T6OvyEvNHlGLQoqE8j4JYDN6whLwr7uqtcJXv/ukjhkD4WMb8ti9QZH6FPA+8pGG
 oRO5AlbWpcSHThu4tYYoThEdVMLjS5RnzOOUyMiHr7CS+MlEPrKtR5D23f3egSyU
 lUETkyPfkVRSrfD9LnOrj+W6xjwNMJm2Rq/zexVnoKjRrl5asfmDTTAhWJxu7CMK
 Oeb39ue6r7A3wEpaLWlqqrVnzIS+rKKKFCxqc+ni/JUTX0EPr+OcVJav+JmfjpKd
 Q24sbiSkoAp7HRJnsenCX4Kv2x55ClhLktGnltgoJtzsMbT/EOO38d2LrYpWkCkX
 aoi4Xpg1rOvY9biy4dDtGmawsKgEKwJ2j/xr6RQdKZZAifeabnoro/ekSj5/SquQ
 vUfATYG+JEKGethWgO2A7/tgdo7artrN77w0eM0yKtNuJwUGLHCuXbDFUgFxapvm
 RIXvnUEp9Q27bc2JjX3z
 =WDR8
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
 "Things have been quite slow, only 6 RC patches have been sent to the
  list. Regression, user visible bugs, and crashing fixes:

   - cxgb4 could wrongly fail MR creation due to a typo

   - various crashes if the wrong QP type is mixed in with APIs that
     expect other types

   - syzkaller oops

   - using ERR_PTR and NULL together cases HFI1 to crash in some cases

   - mlx5 memory leak in error unwind"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/mlx5: Fix memory leak in mlx5_ib_create_srq() error path
  RDMA/uverbs: Don't fail in creation of multiple flows
  IB/hfi1: Fix incorrect mixing of ERR_PTR and NULL return values
  RDMA/uverbs: Fix slab-out-of-bounds in ib_uverbs_ex_create_flow
  RDMA/uverbs: Protect from attempts to create flows on unsupported QP
  iw_cxgb4: correctly enforce the max reg_mr depth
2018-07-13 12:42:14 -07:00
Leon Romanovsky
05f58ceba1 RDMA/mlx5: Check that supplied blue flame index doesn't overflow
User's supplied index is checked again total number of system pages, but
this number already includes num_static_sys_pages, so addition of that
value to supplied index causes to below error while trying to access
sys_pages[].

BUG: KASAN: slab-out-of-bounds in bfregn_to_uar_index+0x34f/0x400
Read of size 4 at addr ffff880065561904 by task syz-executor446/314

CPU: 0 PID: 314 Comm: syz-executor446 Not tainted 4.18.0-rc1+ #256
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
Call Trace:
 dump_stack+0xef/0x17e
 print_address_description+0x83/0x3b0
 kasan_report+0x18d/0x4d0
 bfregn_to_uar_index+0x34f/0x400
 create_user_qp+0x272/0x227d
 create_qp_common+0x32eb/0x43e0
 mlx5_ib_create_qp+0x379/0x1ca0
 create_qp.isra.5+0xc94/0x22d0
 ib_uverbs_create_qp+0x21b/0x2a0
 ib_uverbs_write+0xc2c/0x1010
 vfs_write+0x1b0/0x550
 ksys_write+0xc6/0x1a0
 do_syscall_64+0xa7/0x590
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x433679
Code: fd ff 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 3b 91 fd ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fff2b3d8e48 EFLAGS: 00000217 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00000000004002f8 RCX: 0000000000433679
RDX: 0000000000000040 RSI: 0000000020000240 RDI: 0000000000000003
RBP: 00000000006d4018 R08: 00000000004002f8 R09: 00000000004002f8
R10: 00000000004002f8 R11: 0000000000000217 R12: 0000000000000000
R13: 000000000040cb00 R14: 000000000040cb90 R15: 0000000000000006

Allocated by task 314:
 kasan_kmalloc+0xa0/0xd0
 __kmalloc+0x1a9/0x510
 mlx5_ib_alloc_ucontext+0x966/0x2620
 ib_uverbs_get_context+0x23f/0xa60
 ib_uverbs_write+0xc2c/0x1010
 __vfs_write+0x10d/0x720
 vfs_write+0x1b0/0x550
 ksys_write+0xc6/0x1a0
 do_syscall_64+0xa7/0x590
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 1:
 __kasan_slab_free+0x12e/0x180
 kfree+0x159/0x630
 kvfree+0x37/0x50
 single_release+0x8e/0xf0
 __fput+0x2d8/0x900
 task_work_run+0x102/0x1f0
 exit_to_usermode_loop+0x159/0x1c0
 do_syscall_64+0x408/0x590
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff880065561100
 which belongs to the cache kmalloc-4096 of size 4096
The buggy address is located 2052 bytes inside of
 4096-byte region [ffff880065561100, ffff880065562100)
The buggy address belongs to the page:
page:ffffea0001955800 count:1 mapcount:0 mapping:ffff88006c402480 index:0x0 compound_mapcount: 0
flags: 0x4000000000008100(slab|head)
raw: 4000000000008100 ffffea0001a7c000 0000000200000002 ffff88006c402480
raw: 0000000000000000 0000000080070007 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff880065561800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff880065561880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff880065561900: 04 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                   ^
 ffff880065561980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff880065561a00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc

Cc: <stable@vger.kernel.org> # 4.15
Fixes: 1ee47ab3e8 ("IB/mlx5: Enable QP creation with a given blue flame index")
Reported-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-13 11:54:50 -06:00
Leon Romanovsky
ffaf58def0 RDMA/mlx5: Melt consecutive calls to alloc_bfreg() in one call
There is no need for three consecutive calls to alloc_bfreg(). It can be
implemented with one function.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-13 11:54:50 -06:00
Kamal Heib
d63c46734c RDMA/mlx5: Fix memory leak in mlx5_ib_create_srq() error path
Fix memory leak in the error path of mlx5_ib_create_srq() by making sure
to free the allocated srq.

Fixes: c2b37f7648 ("IB/mlx5: Fix integer overflows in mlx5_ib_create_srq")
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-11 14:16:13 -06:00
Yishai Hadas
528922afd4 IB: Enable uverbs_destroy_def_handler to be used by drivers
Enable uverbs_destroy_def_handler to be used by drivers and replace
current code to use it.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-10 11:52:06 -06:00
Artemy Kovalyov
b02289b3d6 RDMA: Validate grh_required when handling AVs
Extend the existing grh_required flag to check when AV's are handled that
a GRH is present.

Since we don't want to do query_port during the AV checks for performance
reasons move the flag into the immutable_data.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-10 11:13:04 -06:00
Jason Gunthorpe
2f944c0fbf RDMA: Fix storage of PortInfo CapabilityMask in the kernel
The internal flag IP_BASED_GIDS was added to a field that was being used
to hold the port Info CapabilityMask without considering the effects this
will have. Since most drivers just use the value from the HW MAD it means
IP_BASED_GIDS will also become set on any HW that sets the IBA flag
IsOtherLocalChangesNoticeSupported - which is not intended.

Fix this by keeping port_cap_flags only for the IBA CapabilityMask value
and store unrelated flags externally. Move the bit definitions for this to
ib_mad.h to make it clear what is happening.

To keep the uAPI unchanged define a new set of flags in the uapi header
that are only used by ib_uverbs_query_port_resp.port_cap_flags which match
the current flags supported in rdma-core, and the values exposed by the
current kernel.

Fixes: b4a26a2728 ("IB: Report using RoCE IP based gids in port caps")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-07-10 11:06:45 -06:00
Parav Pandit
921c0f5ba5 IB/mlx5: Honor cnt_set_id_valid flag instead of set_id
It is incorrect to depend on set_id value to know if counters were
allocated or not. set_id_valid field is set to true when counters
were allocated. Therefore, use set_id_valid while deciding to
free counters.

Cc: <stable@vger.kernel.org> # 4.15
Fixes: aac4492ef2 ("IB/mlx5: Update counter implementation for dual port RoCE")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-09 13:30:25 -06:00
Leon Romanovsky
e3f1ed1f5a RDMA/mlx5: Remove unused port number parameter
Clean up a little bit code to drop unused port_num parameter.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-09 13:24:21 -06:00
Jann Horn
60e6627f12 IB/mlx5: fix uaccess beyond "count" in debugfs read/write handlers
In general, accessing userspace memory beyond the length of the supplied
buffer in VFS read/write handlers can lead to both kernel memory corruption
(via kernel_read()/kernel_write(), which can e.g. be triggered via
sys_splice()) and privilege escalation inside userspace.

In this case, the affected files are in debugfs (and should therefore only
be accessible to root), and the read handlers check that *pos is zero
(meaning that at least sys_splice() can't trigger kernel memory
corruption). Because of the root requirement, this is not a security fix,
but rather a cleanup.

For the read handlers, fix it by using simple_read_from_buffer() instead
of custom logic. Add min() calls to the write handlers.

Fixes: 4a2da0b8c0 ("IB/mlx5: Add debug control parameters for congestion control")
Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-09 13:15:12 -06:00
Jason Gunthorpe
540cd69209 RDMA/uverbs: Use UVERBS_ATTR_MIN_SIZE correctly and uniformly
This newer macro allows specifying a lower bound on the accepted size, and
has an 'unlimited' upper bound. Due to this it never checks for trailing
zeroing so it doesn't make any sense to combine it with MIN_SZ_OR_ZERO, so
drop MIN_SZ_OR_ZERO when they are used together

There were a couple of places that open coded this pattern, switch them to
use the clearer UVERBS_ATTR_MIN_SIZE for clarity.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-07-04 13:47:01 -06:00
Jason Gunthorpe
83bb444233 RDMA/uverbs: Remove UA_FLAGS
This bit of boilerplate isn't really necessary, we can use bitfields
instead of a flags enum and the macros can then individually initialize
them through the __VA_ARGS__ like everything else.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-07-04 13:47:01 -06:00
Jason Gunthorpe
9a119cd597 RDMA/uverbs: Get rid of the & in method specifications
Hide it inside the macros. The & is confusing and interferes with using
this as a generic DSL in later patches.

Since this also touches almost every line, also run the specs through
clang-format (with 'BinPackParameters: false') to make the maintenance
easier.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-07-04 13:47:01 -06:00
Jason Gunthorpe
6c61d2a55c RDMA/uverbs: Simplify UVERBS_OBJECT and _TREE family of macros
Instead of the large set of indirecting macros, define the few needed
macros to directly instantiate the struct uverbs_oject_tree_def and
associated objects list.

This is small amount of code duplication but the readability is far
better.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-07-04 13:47:01 -06:00
Jason Gunthorpe
595c7736d4 RDMA/uverbs: Simplify method definition macros
Instead of the large set of indirecting macros, define the few needed
macros to directly instantiate the struct uverbs_method_def and associated
attributes list.

This is small amount of code duplication but the readability is far
better.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-07-04 13:47:01 -06:00
Jason Gunthorpe
d108dac080 RDMA/uverbs: Simplify UVERBS_ATTR family of macros
Instead of using a complex cascade of macros, just directly provide the
initializer list each of the declarations is trying to create.

Now that the macros are simplified this also reworks the uverbs_attr_spec
to be friendly to older compilers by eliminating any unnamed
structures/unions inside, and removing the duplication of some fields. The
structure size remains at 16 bytes which was the original motivation for
some of this oddness.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-07-04 13:47:01 -06:00
Jason Gunthorpe
87fc2a620a RDMA/uverbs: Store the specs_root in the struct ib_uverbs_device
The specs are required to operate the uverbs file, so they belong inside
the ib_uverbs_device, not inside the ib_device. The spec passed in the
ib_device is just a communication from the driver and should not be used
during runtime.

This also changes the lifetime of the spec memory to match the
ib_uverbs_device, however at this time the spec_root can still contain
driver pointers after disassociation, so it cannot be used if ib_dev is
NULL. This is preparation for another series.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-07-04 13:47:01 -06:00
Jason Gunthorpe
8193abb6a8 Merge branch 'mlx5-dump-fill-mkey' into rdma.git for-next
For dependencies, branch based on 'mellanox/mlx5-next' of
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git

Pull Dump and fill MKEY from Leon Romanovsky:

====================
MLX5 IB HCA offers the memory key, dump_fill_mkey to increase performance,
when used in a send or receive operations.

It is used to force local HCA operations to skip the PCI bus access, while
keeping track of the processed length in the ibv_sge handling.

In this three patch series, we expose various bits in our HW spec
file (mlx5_ifc.h), move unneeded for mlx5_core FW command and export such
memory key to user space thought our mlx5-abi header file.
====================

Botched auto-merge in mlx5_ib_alloc_ucontext() resolved by hand.

* branch 'mlx5-dump-fill-mkey':
  IB/mlx5: Expose dump and fill memory key
  net/mlx5: Add hardware definitions for dump_fill_mkey
  net/mlx5: Limit scope of dump_fill_mkey function
  net/mlx5: Rate limit errors in command interface

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-04 13:23:46 -06:00
Yonatan Cohen
25bb36e75d IB/mlx5: Expose dump and fill memory key
MLX5 IB HCA offers the memory key, dump_fill_mkey to boost
performance, when used in a send or receive operations.

It is used to force local HCA operations to skip the PCI bus access,
while keeping track of the processed length in the ibv_sge handling.

Meaning, instead of a PCI write access the HCA leaves the target
memory untouched, and skips filling that packet section. Similar
behavior is done upon send, the HCA skips data in memory relevant
to this key and saves PCI bus access.

This functionality saves PCI read/write operations.

Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Guy Levi <guyle@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-04 13:16:04 -06:00
Yonatan Cohen
4d4fb5dc98 net/mlx5: Limit scope of dump_fill_mkey function
mlx5_core_dump_fill_mkey() is going to be used in next
patch in IB and doesn't need to be visible to whole
mlx5_core. Move that command to mlx5_ib.

Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-07-04 21:51:07 +03:00
Maor Gottlieb
a93b632c45 IB/mlx5: Fix GRE flow specification
Currently the driver sets the mask of the gre_protocol to 0xffff
without consideration in the user request.

Fix it by copy the mask from the verbs spec.

Fixes: da2f22ae77 ("IB/mlx5: Add support for GRE flow specification")
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-03 15:46:07 -06:00
Bart Van Assche
7496a511a0 IB/mlx5: Remove set-but-not-used variables
Avoid that the compiler complains about set-but-not-used variables when
building with W=1. This patch does not change any functionality.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-03 14:14:21 -06:00
Yishai Hadas
1c77483e4c IB: Improve uverbs_cleanup_ucontext algorithm
Improve uverbs_cleanup_ucontext algorithm to work properly when the
topology graph of the objects cannot be determined at compile time.  This
is the case with objects created via the devx interface in mlx5.

Typically uverbs objects must be created in a strict topologically sorted
order, so that LIFO ordering will generally cause them to be freed
properly. There are only a few cases (eg memory windows) where objects can
point to things out of the strict LIFO order.

Instead of using an explicit ordering scheme where the HW destroy is not
allowed to fail, go over the list multiple times and allow the destroy
function to fail. If progress halts then a final, desperate, cleanup is
done before leaking the memory. This indicates a driver bug.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-29 14:35:46 -06:00
Leon Romanovsky
1517799965 RDMA/mlx5: Don't leak UARs in case of free fails
The failure in releasing one UAR doesn't mean that we can't continue to
release rest of system pages, so don't return too early.

As part of cleanup, there is no need to print warning if
mlx5_cmd_free_uar() fails because such warning will be printed as part of
mlx5_cmd_exec().

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-29 11:45:50 -06:00
David S. Miller
04c6faa175 mlx5-fixes-2018-06-26
Fixes for mlx5 core and netdev driver:
 
 Two fixes from Alex Vesker to address command interface issues
  - Race in command interface polling mode
  - Incorrect raw command length parsing
 
 From Shay Agroskin, Fix wrong size allocation for QoS ETC TC regitster.
 
 From Or Gerlitz and Eli Cohin, Address backward compatability issues for when
 Eswitch capability is not advertised for the PF host driver
     - Fix required capability for manipulating MPFS
     - E-Switch, Disallow vlan/spoofcheck setup if not being esw manager
     - Avoid dealing with vport IB/eth representors if not being e-switch manager
     - E-Switch, Avoid setup attempt if not being e-switch manager
     - Don't attempt to dereference the ppriv struct if not being eswitch manager
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJbMtMEAAoJEEg/ir3gV/o+NjAIAMGrerpwg8ADBj+b9tSWm4WV
 2yAJ561kBObwhA+uDJtH7mGUO3+AnkcWz9vynGqFdmkOikUcbpPkBb9D+rmFbkX2
 E585pwR3pH7lEzYEG4xO6SwuQcQ4OytFNxz94AT6CgNEXqrmbrD7A5Vsgk265yZq
 pJzL1OVfkXKOtb2x5PpCOh19/28OxAzyMQfoklsE2Wn7j8/2RWX0UUDxuF8jS+He
 9loaurT4Fsfo5JYE+o+k38knHFBkdTUZBD9/bZrtaMcrD68bZdJTpZm6eYwRXW3S
 7J88SmH/xTy74f1KY4qf0JOTxnaWtm/r4YaCXf1QD05W2/U9FQpIW1ipMKH51vk=
 =te2H
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-fixes-2018-06-26' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-fixes-2018-06-26

Fixes for mlx5 core and netdev driver:

Two fixes from Alex Vesker to address command interface issues
 - Race in command interface polling mode
 - Incorrect raw command length parsing

From Shay Agroskin, Fix wrong size allocation for QoS ETC TC regitster.

From Or Gerlitz and Eli Cohin, Address backward compatability issues for when
Eswitch capability is not advertised for the PF host driver
    - Fix required capability for manipulating MPFS
    - E-Switch, Disallow vlan/spoofcheck setup if not being esw manager
    - Avoid dealing with vport IB/eth representors if not being e-switch manager
    - E-Switch, Avoid setup attempt if not being e-switch manager
    - Don't attempt to dereference the ppriv struct if not being eswitch manager
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-28 16:21:35 +09:00
Or Gerlitz
aff2252a2a IB/mlx5: Avoid dealing with vport representors if not being e-switch manager
In smartnic env, the host (PF) driver might not be an e-switch
manager, hence the switchdev mode representors are running on
the embedded cpu (EC) and not at the host.

As such, we should avoid dealing with vport representors if
not being esw manager.

Fixes: b5ca15ad7e ('IB/mlx5: Add proper representors support')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-26 15:26:30 -07:00
Yishai Hadas
d0e84c0ad3 IB/mlx5: Add support for drain SQ & RQ
This patch follows the logic from ib_core but considers the internal
device state upon executing the involved commands.

Specifically,
Upon internal error state modify QP to an error state can be assumed to
be success as each in-progress WR going to be flushed in error in any
case as expected by that modify command.

In addition,
As the drain should never fail the driver makes sure that post_send/recv
will succeed even if the device is already in an internal error state.
As such once the driver will supply the simulated/SW CQEs the CQE for
the drain WR will be handled as well.

In case of an internal error state the CQE for the drain WR may be
completed as part of the main task that handled the error state or by
the task that issued the drain WR.

As the above depends on scheduling the code takes the relevant locks and
actions to make sure that the completion handler for that WR will always
be called after that the post_send/recv were issued but not in parallel
to the other task that handles the error flow.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-25 14:32:36 -06:00
Jason Gunthorpe
4d7dff2b8b Merge branch 'icrc-counter' into rdma.git for-next
For dependencies, branch based on 'mellanox/mlx5-next' of
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git

Pull RoCE ICRC counters from Leon Romanovsky:

====================
This series exposes RoCE ICRC counter through existing RDMA hw_counters
sysfs interface.

The first patch has all HW definitions in mlx5_ifc.h file and second patch
is the actual counter implementation.
====================

* branch 'icrc-counter':
  IB/mlx5: Support RoCE ICRC encapsulated error counter
  net/mlx5: Add RoCE RX ICRC encapsulated counter
2018-06-22 08:53:27 -06:00
Talat Batheesh
9f876f3de6 IB/mlx5: Support RoCE ICRC encapsulated error counter
This patch adds support to query the counter that counts the
RoCE packets with corrupted ICRC (Invariant Cyclic Redundancy Code).

This counter will be under
/sys/class/infiniband/<mlx5-dev>/ports/<port>/hw_counters/

rx_icrc_encapsulated - The number of RoCE packets with ICRC
error.

Signed-off-by: Talat Batheesh <talatb@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-22 08:51:14 -06:00
Linus Torvalds
1abd8a8f39 4.18-rc
Regression and crashing bug fixes:
 
 - mlx4/5: Fixes for issues found from various checkers
 - A resource tracking and uverbs regression in the core code
 - qedr: NULL pointer regression found during testing
 - rxe: Various small bugs
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCgAGBQJbKr/pAAoJEDht9xV+IJsasIoP/2yyHUHjBp3vVNJ3A2qRnzAJ
 Yt4DHVo+lWfAhtEY+1rqRQx432aa+gv7e9TUA/Y9Llj0+C2nrOIsNniJvyjF7UrF
 djtAua66p5L+TxmeQPbQP+RsE8pUoczxtPWvpTP6dJ5pkp+/0IJl4P7aZNG+WlYT
 t/4pW1zBejhA9nXfHCFej4A3HM3/6oW3narmIldrNhW1EH7+5jeidyyLKueY6c1Q
 MJ8zfLQM/ZdP1hFwrzfZPMsFmGI4WD7P0F4jWVa+JvpeedV/jOTVVBLKrjHfF1JS
 7JMEeVlK/Mqsu4hCu/BJqHsh8kpFs4aTGfHUOyusZ1xsOx92X1QWCTtGEwi/ZKZh
 PvZMkbWU6Syd1IFwtMRHrKMxGQYrErwXf9V3xHxVn4bIFEAWTT8qn/T1w+tiUcJY
 gBtfqpLuIdzjZ4JtNGBRtfxOvhzqBkHdZO7sd1ARmuIf6Euzvas9AEz9qH893Oun
 rfeLOL70hoz2TrJIpnDApndo9LFEGUB+ypUpax9e99nVHVdbPh/PSdRze/2khoj3
 oJ8z8oh6KAimiW1sMkJ89fefDfUnkkOFOYrxH3nTYfkdrOHyiEtpLuE424pZwVKM
 uWqQ+yoXRuab4X58Gw2ezYq2/UIILn4hJEJ/VdTgJomb41nd0iZtKNlgw2uk8G8M
 WhOCed7yvYsp6hDi8pSq
 =Gjuy
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
 "Here are eight fairly small fixes collected over the last two weeks.

  Regression and crashing bug fixes:

   - mlx4/5: Fixes for issues found from various checkers

   - A resource tracking and uverbs regression in the core code

   - qedr: NULL pointer regression found during testing

   - rxe: Various small bugs"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  IB/rxe: Fix missing completion for mem_reg work requests
  RDMA/core: Save kernel caller name when creating CQ using ib_create_cq()
  IB/uverbs: Fix ordering of ucontext check in ib_uverbs_write
  IB/mlx4: Fix an error handling path in 'mlx4_ib_rereg_user_mr()'
  RDMA/qedr: Fix NULL pointer dereference when running over iWARP without RDMA-CM
  IB/mlx5: Fix return value check in flow_counters_set_data()
  IB/mlx5: Fix memory leak in mlx5_ib_create_flow
  IB/rxe: avoid double kfree skb
2018-06-21 07:22:30 +09:00
Leon Romanovsky
cfdeb8934b RDMA/mlx5: Refactor transport domain checks
Put all relevant checks for transport domain in the
mlx5_ib_alloc/dealloc_transport_domain functions.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-19 13:32:17 -06:00
Yishai Hadas
c59450c463 IB/mlx5: Expose DEVX tree
Expose DEVX tree to be used by upper layers.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-19 10:53:02 -06:00
Yishai Hadas
f6fe01b718 IB/mlx5: Add DEVX query EQN support
Return the matching device EQN for a given user vector number via the
DEVX interface.

Note:
EQs are owned by the kernel and shared by all user processes.
Basically, a user CQ can point to any EQ.
The kernel doesn't enforce any such limitation today either.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-19 10:53:02 -06:00
Yishai Hadas
aeae94579c IB/mlx5: Add DEVX support for memory registration
Add support to register a memory with the firmware via the DEVX
interface.

The driver translates a given user address to ib_umem then it will
register the physical addresses with the firmware and get a unique id
for this registration to be used for this virtual address.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-19 10:53:02 -06:00
Yishai Hadas
7c043e908a IB/mlx5: Add support for DEVX query UAR
Return a device UAR index for a given user index via the DEVX interface.

Security note:
The hardware protection mechanism works like this: Each device object that
is subject to UAR doorbells (QP/SQ/CQ) gets a UAR ID (called uar_page in
the device specification manual) upon its creation. Then upon doorbell,
hardware fetches the object context for which the doorbell was rang, and
validates that the UAR through which the DB was rang matches the UAR ID
of the object.

If no match the doorbell is silently ignored by the hardware.  Of
course, the user cannot ring a doorbell on a UAR that was not mapped to
it.

Now in devx, as the devx kernel does not manipulate the QP/SQ/CQ command
mailboxes (except tagging them with UID), we expose to the user its UAR
ID, so it can embed it in these objects in the expected specification
format. So the only thing the user can do is hurt itself by creating a
QP/SQ/CQ with a UAR ID other than his, and then in this case other users
may ring a doorbell on its objects.

The consequence of that will be that another user can schedule a QP/SQ
of the buggy user for execution (just insert it to the hardware schedule
queue or arm its CQ for event generation), no further harm is expected.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-19 10:53:02 -06:00
Yishai Hadas
e662e14d80 IB/mlx5: Add DEVX support for modify and query commands
Add support in DEVX for modify and query commands, the required lock is
taken (i.e. READ/WRITE) by the KABI infrastructure accordingly.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-19 10:53:02 -06:00
Yishai Hadas
7efce3691d IB/mlx5: Add obj create and destroy functionality
Add support to create and destroy firmware objects via the DEVX
interface.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-19 10:53:02 -06:00
Yishai Hadas
8aa8c95ce4 IB/mlx5: Add support for DEVX general command
Add support to run general firmware command via the DEVX interface.

A command that works on some object (e.g. CQ, WQ, etc.) will be added
in next patches while maintaining the required object lock.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-19 10:53:02 -06:00
Yishai Hadas
a8b92ca1b0 IB/mlx5: Introduce DEVX
Introduce DEVX to enable direct device commands in downstream patches
from this series.

In that mode of work the firmware manages the isolation between
processes' resources and as such a DEVX user id is created and assigned
to the given user context upon allocation request.

A capability check is done to make sure that this feature is really
supported by the firmware prior to creating the DEVX user id.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-19 10:53:02 -06:00
Steve Wise
33023fb85a IB/core: add max_send_sge and max_recv_sge attributes
This patch replaces the ib_device_attr.max_sge with max_send_sge and
max_recv_sge. It allows ulps to take advantage of devices that have very
different send and recv sge depths.  For example cxgb4 has a max_recv_sge
of 4, yet a max_send_sge of 16.  Splitting out these attributes allows
much more efficient use of the SQ for cxgb4 with ulps that use the RDMA_RW
API. Consider a large RDMA WRITE that has 16 scattergather entries.
With max_sge of 4, the ulp would send 4 WRITE WRs, but with max_sge of
16, it can be done with 1 WRITE WR.

Acked-by: Sagi Grimberg <sagi@grimberg.me>
Acked-by: Christoph Hellwig <hch@lst.de>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-18 13:17:28 -06:00
Parav Pandit
47ec386662 RDMA: Convert drivers to use sgid_attr instead of sgid_index
The core code now ensures that all driver callbacks that receive an
rdma_ah_attrs will have a sgid_attr's pointer if there is a GRH present.

Drivers can use this pointer instead of calling a query function with
sgid_index. This simplifies the drivers and also avoids races where a
gid_index lookup may return different data if it is changed.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-06-18 11:11:26 -06:00
Parav Pandit
f4df9a7c34 RDMA: Use GID from the ib_gid_attr during the add_gid() callback
Now that ib_gid_attr contains the GID, make use of that in the add_gid()
callback functions for the provider drivers to simplify the add_gid()
implementations.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-18 11:09:05 -06:00
Kees Cook
778e1cdd81 treewide: kvzalloc() -> kvcalloc()
The kvzalloc() function has a 2-factor argument form, kvcalloc(). This
patch replaces cases of:

        kvzalloc(a * b, gfp)

with:
        kvcalloc(a * b, gfp)

as well as handling cases of:

        kvzalloc(a * b * c, gfp)

with:

        kvzalloc(array3_size(a, b, c), gfp)

as it's slightly less ugly than:

        kvcalloc(array_size(a, b), c, gfp)

This does, however, attempt to ignore constant size factors like:

        kvzalloc(4 * 1024, gfp)

though any constants defined via macros get caught up in the conversion.

Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.

The Coccinelle script used for this was:

// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@

(
  kvzalloc(
-	(sizeof(TYPE)) * E
+	sizeof(TYPE) * E
  , ...)
|
  kvzalloc(
-	(sizeof(THING)) * E
+	sizeof(THING) * E
  , ...)
)

// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@

(
  kvzalloc(
-	sizeof(u8) * (COUNT)
+	COUNT
  , ...)
|
  kvzalloc(
-	sizeof(__u8) * (COUNT)
+	COUNT
  , ...)
|
  kvzalloc(
-	sizeof(char) * (COUNT)
+	COUNT
  , ...)
|
  kvzalloc(
-	sizeof(unsigned char) * (COUNT)
+	COUNT
  , ...)
|
  kvzalloc(
-	sizeof(u8) * COUNT
+	COUNT
  , ...)
|
  kvzalloc(
-	sizeof(__u8) * COUNT
+	COUNT
  , ...)
|
  kvzalloc(
-	sizeof(char) * COUNT
+	COUNT
  , ...)
|
  kvzalloc(
-	sizeof(unsigned char) * COUNT
+	COUNT
  , ...)
)

// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@

(
- kvzalloc
+ kvcalloc
  (
-	sizeof(TYPE) * (COUNT_ID)
+	COUNT_ID, sizeof(TYPE)
  , ...)
|
- kvzalloc
+ kvcalloc
  (
-	sizeof(TYPE) * COUNT_ID
+	COUNT_ID, sizeof(TYPE)
  , ...)
|
- kvzalloc
+ kvcalloc
  (
-	sizeof(TYPE) * (COUNT_CONST)
+	COUNT_CONST, sizeof(TYPE)
  , ...)
|
- kvzalloc
+ kvcalloc
  (
-	sizeof(TYPE) * COUNT_CONST
+	COUNT_CONST, sizeof(TYPE)
  , ...)
|
- kvzalloc
+ kvcalloc
  (
-	sizeof(THING) * (COUNT_ID)
+	COUNT_ID, sizeof(THING)
  , ...)
|
- kvzalloc
+ kvcalloc
  (
-	sizeof(THING) * COUNT_ID
+	COUNT_ID, sizeof(THING)
  , ...)
|
- kvzalloc
+ kvcalloc
  (
-	sizeof(THING) * (COUNT_CONST)
+	COUNT_CONST, sizeof(THING)
  , ...)
|
- kvzalloc
+ kvcalloc
  (
-	sizeof(THING) * COUNT_CONST
+	COUNT_CONST, sizeof(THING)
  , ...)
)

// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@

- kvzalloc
+ kvcalloc
  (
-	SIZE * COUNT
+	COUNT, SIZE
  , ...)

// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@

(
  kvzalloc(
-	sizeof(TYPE) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kvzalloc(
-	sizeof(TYPE) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kvzalloc(
-	sizeof(TYPE) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kvzalloc(
-	sizeof(TYPE) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kvzalloc(
-	sizeof(THING) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kvzalloc(
-	sizeof(THING) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kvzalloc(
-	sizeof(THING) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kvzalloc(
-	sizeof(THING) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
)

// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@

(
  kvzalloc(
-	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  kvzalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  kvzalloc(
-	sizeof(THING1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  kvzalloc(
-	sizeof(THING1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  kvzalloc(
-	sizeof(TYPE1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
|
  kvzalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
)

// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@

(
  kvzalloc(
-	(COUNT) * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kvzalloc(
-	COUNT * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kvzalloc(
-	COUNT * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kvzalloc(
-	(COUNT) * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kvzalloc(
-	COUNT * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kvzalloc(
-	(COUNT) * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kvzalloc(
-	(COUNT) * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kvzalloc(
-	COUNT * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
)

// Any remaining multi-factor products, first at least 3-factor products,
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@

(
  kvzalloc(C1 * C2 * C3, ...)
|
  kvzalloc(
-	(E1) * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
|
  kvzalloc(
-	(E1) * (E2) * E3
+	array3_size(E1, E2, E3)
  , ...)
|
  kvzalloc(
-	(E1) * (E2) * (E3)
+	array3_size(E1, E2, E3)
  , ...)
|
  kvzalloc(
-	E1 * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
)

// And then all remaining 2 factors products when they're not all constants,
// keeping sizeof() as the second factor argument.
@@
expression THING, E1, E2;
type TYPE;
constant C1, C2, C3;
@@

(
  kvzalloc(sizeof(THING) * C2, ...)
|
  kvzalloc(sizeof(TYPE) * C2, ...)
|
  kvzalloc(C1 * C2 * C3, ...)
|
  kvzalloc(C1 * C2, ...)
|
- kvzalloc
+ kvcalloc
  (
-	sizeof(TYPE) * (E2)
+	E2, sizeof(TYPE)
  , ...)
|
- kvzalloc
+ kvcalloc
  (
-	sizeof(TYPE) * E2
+	E2, sizeof(TYPE)
  , ...)
|
- kvzalloc
+ kvcalloc
  (
-	sizeof(THING) * (E2)
+	E2, sizeof(THING)
  , ...)
|
- kvzalloc
+ kvcalloc
  (
-	sizeof(THING) * E2
+	E2, sizeof(THING)
  , ...)
|
- kvzalloc
+ kvcalloc
  (
-	(E1) * E2
+	E1, E2
  , ...)
|
- kvzalloc
+ kvcalloc
  (
-	(E1) * (E2)
+	E1, E2
  , ...)
|
- kvzalloc
+ kvcalloc
  (
-	E1 * E2
+	E1, E2
  , ...)
)

Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-12 16:19:22 -07:00
weiyongjun (A)
e31abf76f4 IB/mlx5: Fix return value check in flow_counters_set_data()
In case of error, the function mlx5_fc_create() returns ERR_PTR() and
never returns NULL. The NULL test in the return value check should be
replaced with IS_ERR().

Fixes: 3b3233fbf0 ("IB/mlx5: Add flow counters binding support")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-11 11:02:27 -06:00
Gustavo A. R. Silva
299eafee39 IB/mlx5: Fix memory leak in mlx5_ib_create_flow
In case memory resources for *ucmd* were allocated, release them
before return.

Addresses-Coverity-ID: 1469857 ("Resource leak")
Fixes: 3b3233fbf0 ("IB/mlx5: Add flow counters binding support")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-11 11:02:27 -06:00
Linus Torvalds
a1cdde8c41 4.18 Merge window pull request
This has been a quiet cycle for RDMA, the big bulk is the usual smallish
 driver updates and bug fixes. About four new uAPI related things. Not as much
 Szykaller patches this time, the bugs it finds are getting harder to fix.
 
 - More work cleaning up the RDMA CM code
 - Usual driver bug fixes and cleanups for qedr, qib, hfi1, hns, i40iw, iw_cxgb4, mlx5, rxe
 - Driver specific resource tracking and reporting via netlink
 - Continued work for name space support from Parav
 - MPLS support for the verbs flow steering uAPI
 - A few tricky IPoIB fixes improving robustness
 - HFI1 driver support for the '16B' management packet format
 - Some auditing to not print kernel pointers via %llx or similar
 - Mark the entire 'UCM' user-space interface as BROKEN with the intent to remove it
   entirely. The user space side of this was long ago replaced with RDMA-CM and
   syzkaller is finding bugs in the residual UCM interface nobody wishes to fix because
   nobody uses it.
 - Purge more bogus BUG_ON's from Leon
 - 'flow counters' verbs uAPI
 - T10 fixups for iser/isert, these are Acked by Martin but going through the RDMA
   tree due to dependencies
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCgAGBQJbGEcPAAoJEDht9xV+IJsarBMQAIsAFOizycF0kQfDtvz1yHyV
 YjkT3NA71379DsDsCOezVKqZ6RtXdQncJoqqEG1FuNKiXh/rShR3rk9XmdBwUCTq
 mIY0ySiQggdeSIJclROiBuzLE3F/KIIkY3jwM80DzT9GUEbnVuvAMt4M56X48Xo8
 RpFc13/1tY09ZLBVjInlfmCpRWyNgNccDBDywB/5hF5KCFR/BG/vkp4W0yzksKiU
 7M/rZYyxQbtwSfe/ZXp7NrtwOpkpn7vmhED59YgKRZWhqnHF9KKmV+K1FN+BKdXJ
 V1KKJ2RQINg9bbLJ7H2JPdQ9EipvgAjUJKKBoD+XWnoVJahp6X2PjX351R/h4Lo5
 TH+0XwuCZ2EdjRxhnm3YE+rU10mDY9/UUi1xkJf9vf0r25h6Fgt6sMnN0QBpqkTh
 euRZnPyiFeo1b+hCXJfKqkQ6An+F3zes5zvVf59l0yfVNLVmHdlz0lzKLf/RPk+t
 U+YZKxfmHA+mwNhMXtKx7rKVDrko+uRHjaX2rPTEvZ0PXE7lMzFMdBWYgzP6sx/b
 4c55NiJMDAGTyLCxSc7ziGgdL9Lpo/pRZJtFOHqzkDg8jd7fb07ID7bMPbSa05y0
 BU5VpC8yEOYRpOEFbkJSPtHc0Q8cMCv/q1VcMuuhKXYnfSho3TWvtOSQIjUoU/q0
 8T6TXYi2yF+f+vZBTFlV
 =Mb8m
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "This has been a quiet cycle for RDMA, the big bulk is the usual
  smallish driver updates and bug fixes. About four new uAPI related
  things. Not as much Szykaller patches this time, the bugs it finds are
  getting harder to fix.

  Summary:

   - More work cleaning up the RDMA CM code

   - Usual driver bug fixes and cleanups for qedr, qib, hfi1, hns,
     i40iw, iw_cxgb4, mlx5, rxe

   - Driver specific resource tracking and reporting via netlink

   - Continued work for name space support from Parav

   - MPLS support for the verbs flow steering uAPI

   - A few tricky IPoIB fixes improving robustness

   - HFI1 driver support for the '16B' management packet format

   - Some auditing to not print kernel pointers via %llx or similar

   - Mark the entire 'UCM' user-space interface as BROKEN with the
     intent to remove it entirely. The user space side of this was long
     ago replaced with RDMA-CM and syzkaller is finding bugs in the
     residual UCM interface nobody wishes to fix because nobody uses it.

   - Purge more bogus BUG_ON's from Leon

   - 'flow counters' verbs uAPI

   - T10 fixups for iser/isert, these are Acked by Martin but going
     through the RDMA tree due to dependencies"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (138 commits)
  RDMA/mlx5: Update SPDX tags to show proper license
  RDMA/restrack: Change SPDX tag to properly reflect license
  IB/hfi1: Fix comment on default hdr entry size
  IB/hfi1: Rename exp_lock to exp_mutex
  IB/hfi1: Add bypass register defines and replace blind constants
  IB/hfi1: Remove unused variable
  IB/hfi1: Ensure VL index is within bounds
  IB/hfi1: Fix user context tail allocation for DMA_RTAIL
  IB/hns: Use zeroing memory allocator instead of allocator/memset
  infiniband: fix a possible use-after-free bug
  iw_cxgb4: add INFINIBAND_ADDR_TRANS dependency
  IB/isert: use T10-PI check mask definitions from core layer
  IB/iser: use T10-PI check mask definitions from core layer
  RDMA/core: introduce check masks for T10-PI offload
  IB/isert: fix T10-pi check mask setting
  IB/mlx5: Add counters read support
  IB/mlx5: Add flow counters read support
  IB/mlx5: Add flow counters binding support
  IB/mlx5: Add counters create and destroy support
  IB/uverbs: Add support for flow counters
  ...
2018-06-07 13:04:07 -07:00
Leon Romanovsky
c1191a19fe RDMA/mlx5: Update SPDX tags to show proper license
Mellanox code is supposed to be OpenIB compliant code,
so let's update SPDX tags to show it.

Fixes: fc385b7ac4 ("IB/mlx5: Add basic regiser/unregister representors code")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-05 14:04:20 -06:00
Jason Gunthorpe
0f45e69d62 Verbs flow counters support
This series comes to allow user space applications to monitor real time
 traffic activity and events of the verbs objects it manages, e.g.:
 ibv_qp, ibv_wq, ibv_flow.
 
 This API enables generic counters creation and define mapping
 to association with a verbs object, current mlx5 driver using
 this API for flow counters.
 
 With this API, an application can monitor the entire life cycle of
 object activity, defined here as a static counters attachment.
 This API also allows dynamic counters monitoring of measurement points
 for a partial period in the verbs object life cycle.
 
 In addition it presents the implementation of the generic counters interface.
 
 This will be achieved by extending flow creation by adding a new flow count
 specification type which allows the user to associate a previously created
 flow counters using the generic verbs counters interface to the created flow,
 once associated the user could read statistics by using the read function of
 the generic counters interface.
 
 The API includes:
 1. create and destroyed API of a new counters objects
 2. read the counters values from HW
 
 Note:
 Attaching API to allow application to define the measurement points per objects
 is a user space only API and this data is passed to kernel when the counted
 object (e.g. flow) is created with the counters object.
 -----BEGIN PGP SIGNATURE-----
 
 iHQEABYIAB0WIQT1m3YD37UfMCUQBNwp8NhrnBAZsQUCWxIiqQAKCRAp8NhrnBAZ
 sWJRAPYl06nEfQjRlW//ZE/pO2oKXbfEevg7nnbpe80ERlxLAQDA2LHAcU7ma/NC
 hS5yxIq1gLSA27N+5qAoFVK8vJ5ZCg==
 =EiAV
 -----END PGP SIGNATURE-----

Merge tag 'verbs_flow_counters' of git://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git into for-next

Pull verbs counters series from Leon Romanovsky:

====================
Verbs flow counters support

This series comes to allow user space applications to monitor real time
traffic activity and events of the verbs objects it manages, e.g.: ibv_qp,
ibv_wq, ibv_flow.

The API enables generic counters creation and define mapping to
association with a verbs object, the current mlx5 driver is using this API
for flow counters.

With this API, an application can monitor the entire life cycle of object
activity, defined here as a static counters attachment.  This API also
allows dynamic counters monitoring of measurement points for a partial
period in the verbs object life cycle.

In addition it presents the implementation of the generic counters
interface.

This will be achieved by extending flow creation by adding a new flow
count specification type which allows the user to associate a previously
created flow counters using the generic verbs counters interface to the
created flow, once associated the user could read statistics by using the
read function of the generic counters interface.

The API includes:
1. create and destroyed API of a new counters objects
2. read the counters values from HW

Note:
Attaching API to allow application to define the measurement points per
objects is a user space only API and this data is passed to kernel when
the counted object (e.g. flow) is created with the counters object.
===================

* tag 'verbs_flow_counters':
  IB/mlx5: Add counters read support
  IB/mlx5: Add flow counters read support
  IB/mlx5: Add flow counters binding support
  IB/mlx5: Add counters create and destroy support
  IB/uverbs: Add support for flow counters
  IB/core: Add support for flow counters
  IB/core: Support passing uhw for create_flow
  IB/uverbs: Add read counters support
  IB/core: Introduce counters read verb
  IB/uverbs: Add create/destroy counters support
  IB/core: Introduce counters object and its create/destroy
  IB/uverbs: Add an ib_uobject getter to ioctl() infrastructure
  net/mlx5: Export flow counter related API
  net/mlx5: Use flow counter pointer as input to the query function
2018-06-04 08:48:11 -06:00
Raed Salem
1a1e03dc15 IB/mlx5: Add counters read support
This patch implements the uverbs counters read API, it will use the
specific read counters function to the given type to accomplish its
task.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-02 07:35:37 +03:00
Raed Salem
5e95af5f7b IB/mlx5: Add flow counters read support
Implements the flow counters read wrapper.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-02 07:35:37 +03:00
Raed Salem
3b3233fbf0 IB/mlx5: Add flow counters binding support
Associates a counters with a flow when IB_FLOW_SPEC_ACTION_COUNT is part
of the flow specifications.

The counters user space placements of location and description (index,
description) pairs are passed as private data of the counters flow
specification.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-02 07:35:32 +03:00
Raed Salem
b29e2a1309 IB/mlx5: Add counters create and destroy support
This patch implements the device counters create and destroy APIs and
introducing some internal management structures.

Downstream patches in this series will add the functionality to support
flow counters binding and reading.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-02 07:33:57 +03:00
Matan Barak
59082a327d IB/core: Support passing uhw for create_flow
This is required when user-space drivers need to pass extra information
regarding how to handle this flow steering specification.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-02 07:33:55 +03:00
Leon Romanovsky
2cb4079188 RDMA/mlx5: Don't check return value of zap_vma_ptes()
There is no need to check return value of zap_vma_ptes()
because there is nothing to do with this knowledge.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-06-01 11:16:24 -04:00
Wei Hu(Xavier)
a0976f418d RDMA/uverbs: Hoist the common process of disassociate_ucontext into ib core
This patch hoisted the common process of disassociate_ucontext
callback function into ib core code, and these code are common
to ervery ib_device driver.

Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-30 20:45:03 -04:00
Jason Gunthorpe
f3ca0ab114 Merge branch 'mini_cqe' into git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma for-next
Leon Romanovsky says:

====================
Introduce new internal to mlx5 CQE format - mini-CQE. It is a CQE in
compressed form that holds data needed to extra a single full CQE.

It is a stride index, byte count and packet checksum.
====================

* mini_cqe:
  IB/mlx5: Introduce a new mini-CQE format
  IB/mlx5: Refactor CQE compression response
  net/mlx5: Exposing a new mini-CQE format

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-29 15:23:18 -06:00
Yonatan Cohen
6f1006a438 IB/mlx5: Introduce a new mini-CQE format
The new mini-CQE format includes the stride index, byte count and
packet checksum.
Stride index is needed for striding WQ feature.
This patch exposes this capability and enables its setting
via mlx5 UHW data as part of query device and cq creation.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Guy Levi <guyle@mellanox.com>
Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-29 15:18:38 -06:00
Yonatan Cohen
572f46bf94 IB/mlx5: Refactor CQE compression response
Refactor CQE compression response to be fully set only
when it`s really supported. There is no change from user
perspective because anyway resp.cqe_comp_caps.max_num was
set to zero.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>W
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-29 15:18:38 -06:00
Jason Gunthorpe
0394808d9e Merge branch 'mr_fix' into git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma for-next
Update mlx4 to support user MR creation against read-only memory, previously
it required the memory to be writable.

Based on rdma for-rc due to dependencies.

* mr_fix: (2 commits)
  IB/mlx4: Mark user MR as writable if actual virtual memory is writable
  IB/core: Make testing MR flags for writability a static inline function
2018-05-28 11:44:35 -06:00
David S. Miller
5b79c2af66 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Lots of easy overlapping changes in the confict
resolutions here.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-26 19:46:15 -04:00
Linus Torvalds
34b48b8789 Merge candidates for 4.17-rc
- Remove bouncing addresses from the MAINTAINERS file
 - Kernel oops and bad error handling fixes for hfi, i40iw, cxgb4, and hns drivers
 - Various small LOC behavioral/operational bugs in mlx5, hns, qedr and i40iw drivers
 - Two fixes for patches already sent during the merge window
 - A long standing bug related to not decreasing the pinned pages count in the right
   MM was found and fixed
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCgAGBQJbByPQAAoJEDht9xV+IJsa164P/AihB/vbn9MBdK3pe1OSUGTm
 tKZJ/Y6nY/Q/XTJSeM2wNECk8fOrZbKuLBz2XlPRsB2djp4ugC5WWfK9YbwWMGXG
 I5B/lB8VTorQr8E5i9lqqMDQc8aF8VcGJtdqVE3nD4JsVTrQSGiSnw45/BARDUm3
 OycJJMDOWhDj2wnNSa+JfjPemIMDM1jse7DnsJfDsGfTMS/G+6nyzjKIlEnnFZ8/
 PBxhq0q7C5viNDwwn2GsAVUrATTlW48SY0WYhkgMdSl20d2th9wMZqNMqtniz8NP
 lg87SrhzsAPOTlbSWlYYkAnzE7nEhfJyIfYUp2piNJeYuOohYPtO6w99Tqjl/GmU
 uLIYIXtZCxAK1Zb/znc49HkRVL5YFDsQGXdtYy7tvRZPwwR32kowUtpKIWaZFz8O
 BA/x+Zgqu9AlwqSWwQwxmMbUX42RRwhNJDVyTYlXQSSzhfgFaLIZARqb4K6HxeNN
 vZN0BK+x6pX6FI7hpdsqNRtH1oo4SNUBxiuUsrZ7cy7GqYNdUJ6piygDgmERaJxU
 svIUJof/+OoU1QyErQ0JgUEK/3jOHbjxSPb/rjQeqxAnCqhaGOuNGMtdfsGqgvBU
 x/u3eDcbfi/LBErXR46gYtxnOQ8I2BB+m8erUc/GVvCzWrX+R7ELZYpBrP5Pcu/6
 mr2D7hDqgZHbeU8aB8+D
 =uFZh
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
 "This is pretty much just the usual array of smallish driver bugs.

   - remove bouncing addresses from the MAINTAINERS file

   - kernel oops and bad error handling fixes for hfi, i40iw, cxgb4, and
     hns drivers

   - various small LOC behavioral/operational bugs in mlx5, hns, qedr
     and i40iw drivers

   - two fixes for patches already sent during the merge window

   - a long-standing bug related to not decreasing the pinned pages
     count in the right MM was found and fixed"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (28 commits)
  RDMA/hns: Move the location for initializing tmp_len
  RDMA/hns: Bugfix for cq record db for kernel
  IB/uverbs: Fix uverbs_attr_get_obj
  RDMA/qedr: Fix doorbell bar mapping for dpi > 1
  IB/umem: Use the correct mm during ib_umem_release
  iw_cxgb4: Fix an error handling path in 'c4iw_get_dma_mr()'
  RDMA/i40iw: Avoid panic when reading back the IRQ affinity hint
  RDMA/i40iw: Avoid reference leaks when processing the AEQ
  RDMA/i40iw: Avoid panic when objects are being created and destroyed
  RDMA/hns: Fix the bug with NULL pointer
  RDMA/hns: Set NULL for __internal_mr
  RDMA/hns: Enable inner_pa_vld filed of mpt
  RDMA/hns: Set desc_dma_addr for zero when free cmq desc
  RDMA/hns: Fix the bug with rq sge
  RDMA/hns: Not support qp transition from reset to reset for hip06
  RDMA/hns: Add return operation when configured global param fail
  RDMA/hns: Update convert function of endian format
  RDMA/hns: Load the RoCE dirver automatically
  RDMA/hns: Bugfix for rq record db for kernel
  RDMA/hns: Add rq inline flags judgement
  ...
2018-05-24 14:12:05 -07:00
Jason Gunthorpe
c62091bcd9 mlx5-updates-2018-05-17
mlx5 core dirver updates for both net-next and rdma-next branches.
 
 From Christophe JAILLET, first three patche to use kvfree where needed.
 
 From: Or Gerlitz <ogerlitz@mellanox.com>
 
 Next six patches from Roi and Co adds support for merged
 sriov e-switch which comes to serve cases where both PFs, VFs set
 on them and both uplinks are to be used in single v-switch SW model.
 When merged e-switch is supported, the per-port e-switch is logically
 merged into one e-switch that spans both physical ports and all the VFs.
 
 This model allows to offload TC eswitch rules between VFs belonging
 to different PFs (and hence have different eswitch affinity), it also
 sets the some of the foundations needed for uplink LAG support.
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJa/fLEAAoJEEg/ir3gV/o+7jUH/3n5/Uw1LLt3TfeKArx6i0F1
 3G4U5B0ha03qiDqXprwhyQ3I6lgYmRBmjcxnqmvcqOAqO4/hSsjtTR+A/mgbEDhJ
 YtdekFNEX+72h/N2GIpZwChIWSE3EcMPaLYnV8TwLUgh9YSust2sCLSBbJCjxOKc
 j78M8ept/bXZwTm/iJhEjtmqw0xl91rl011chCAua0iEpH3wxteDARmKABFHMQxl
 I3N/x/e/astgcSCNgpO4uDf9zEIRkNdzcHPzSMJ6C2Oo5W9XiZEekfw7WKj9nXfa
 G+eGckkAyCOQ/r2lZ9nA0ZUvQ2X6JISvxgohuaCNwTgsz3acTxbLnQK4YWHzQCQ=
 =iHi6
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2018-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux into for-next

mlx5-updates-2018-05-17

mlx5 core dirver updates for both net-next and rdma-next branches.

From Christophe JAILLET, first three patche to use kvfree where needed.

From: Or Gerlitz <ogerlitz@mellanox.com>

Next six patches from Roi and Co adds support for merged
sriov e-switch which comes to serve cases where both PFs, VFs set
on them and both uplinks are to be used in single v-switch SW model.
When merged e-switch is supported, the per-port e-switch is logically
merged into one e-switch that spans both physical ports and all the VFs.

This model allows to offload TC eswitch rules between VFs belonging
to different PFs (and hence have different eswitch affinity), it also
sets the some of the foundations needed for uplink LAG support.

* tag 'mlx5-updates-2018-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  net/mlx5e: Explicitly set source e-switch in offloaded TC rules
  net/mlx5: Add source e-switch owner
  net/mlx5e: Explicitly set destination e-switch in FDB rules
  net/mlx5: Add destination e-switch owner
  net/mlx5: Properly handle a vport destination when setting FTE
  net/mlx5: Add merged e-switch cap
  IB/mlx5: Use 'kvfree()' for memory allocated by 'kvzalloc()'
  net/mlx5: Eswitch, Use 'kvfree()' for memory allocated by 'kvzalloc()'
  net/mlx5: Vport, Use 'kvfree()' for memory allocated by 'kvzalloc()'
2018-05-24 09:40:43 -06:00
Erez Shitrit
7b74a83cf5 IB/mlx5: Fetch soft WQE's on fatal error state
On fatal error the driver simulates CQE's for ULPs that rely on
completion of all their posted work-request.

For the GSI traffic, the mlx5 has its own mechanism that sends the
completions via software CQE's directly to the relevant CQ.

This should be kept in fatal error too, so the driver should simulate
such CQE's with the specified error state in order to complete GSI QP
work requests.

Without the fix the next deadlock might appears:
        schedule_timeout+0x274/0x350
        wait_for_common+0xec/0x240
        mcast_remove_one+0xd0/0x120 [ib_core]
        ib_unregister_device+0x12c/0x230 [ib_core]
        mlx5_ib_remove+0xc4/0x270 [mlx5_ib]
        mlx5_detach_device+0x184/0x1a0 [mlx5_core]
        mlx5_unload_one+0x308/0x340 [mlx5_core]
        mlx5_pci_err_detected+0x74/0xe0 [mlx5_core]

Cc: <stable@vger.kernel.org> # 4.7
Fixes: 89ea94a7b6 ("IB/mlx5: Reset flow support for IB kernel ULPs")
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-24 09:39:25 -06:00
Leon Romanovsky
8f06228733 RDMA/mlx5: Remove debug prints of VMA pointers
Remove various prints of VMA pointers.

Reported-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Michal Kalderon <michal.kalderon@cavium.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-24 09:39:25 -06:00
David S. Miller
3888ea4e2f mlx5-updates-2018-05-17
mlx5 core dirver updates for both net-next and rdma-next branches.
 
 From Christophe JAILLET, first three patche to use kvfree where needed.
 
 From: Or Gerlitz <ogerlitz@mellanox.com>
 
 Next six patches from Roi and Co adds support for merged
 sriov e-switch which comes to serve cases where both PFs, VFs set
 on them and both uplinks are to be used in single v-switch SW model.
 When merged e-switch is supported, the per-port e-switch is logically
 merged into one e-switch that spans both physical ports and all the VFs.
 
 This model allows to offload TC eswitch rules between VFs belonging
 to different PFs (and hence have different eswitch affinity), it also
 sets the some of the foundations needed for uplink LAG support.
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJa/fLEAAoJEEg/ir3gV/o+7jUH/3n5/Uw1LLt3TfeKArx6i0F1
 3G4U5B0ha03qiDqXprwhyQ3I6lgYmRBmjcxnqmvcqOAqO4/hSsjtTR+A/mgbEDhJ
 YtdekFNEX+72h/N2GIpZwChIWSE3EcMPaLYnV8TwLUgh9YSust2sCLSBbJCjxOKc
 j78M8ept/bXZwTm/iJhEjtmqw0xl91rl011chCAua0iEpH3wxteDARmKABFHMQxl
 I3N/x/e/astgcSCNgpO4uDf9zEIRkNdzcHPzSMJ6C2Oo5W9XiZEekfw7WKj9nXfa
 G+eGckkAyCOQ/r2lZ9nA0ZUvQ2X6JISvxgohuaCNwTgsz3acTxbLnQK4YWHzQCQ=
 =iHi6
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2018-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

Saeed Mahameed says:

====================
mlx5-updates-2018-05-17

mlx5 core dirver updates for both net-next and rdma-next branches.

From Christophe JAILLET, first three patche to use kvfree where needed.

From: Or Gerlitz <ogerlitz@mellanox.com>

Next six patches from Roi and Co adds support for merged
sriov e-switch which comes to serve cases where both PFs, VFs set
on them and both uplinks are to be used in single v-switch SW model.
When merged e-switch is supported, the per-port e-switch is logically
merged into one e-switch that spans both physical ports and all the VFs.

This model allows to offload TC eswitch rules between VFs belonging
to different PFs (and hence have different eswitch affinity), it also
sets the some of the foundations needed for uplink LAG support.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-18 13:00:08 -04:00
Ariel Levkovich
e818e255a5 IB/mlx5: Expose MPLS related tunneling offloads
This patch reports the device's capbilities to offload
encapsulated MPLS tunnel protocols to user-space:
- Capability to offload MPLS over GRE.
- Capability to offload MPLS over UDP.

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-16 21:32:55 -06:00
Ariel Levkovich
71c6e8638c IB/mlx5: Add support for MPLS flow specification
This patch introduces support for the MPLS flow spec and
allows the creation of rules that are matching on the
MPLS label.

Applying the rule matching depends on the flow specs order and
the location of the MPLS in the spec list as there are different
configurations to be made in the device in the cases of MPLSoGRE
and MPLSoUDP vs. non-encapsulated MPLS.

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-16 21:32:55 -06:00
Ariel Levkovich
da2f22ae77 IB/mlx5: Add support for GRE flow specification
This patch introduces support for the GRE flow spec and
allowing the creation of rules based on the protocol and
key fields that are part of GRE protocol header.

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-16 21:32:55 -06:00
Christophe JAILLET
909d434406 IB/mlx5: Use 'kvfree()' for memory allocated by 'kvzalloc()'
When 'kvzalloc()' is used to allocate memory, 'kvfree()' must be used to
free it.

Fixes: 1cbe6fc86c ("IB/mlx5: Add support for CQE compressing")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-16 17:50:19 -07:00
Doug Ledford
f5e27a203f Merge branch 'k.o/for-rc' into k.o/wip/dl-for-next
Several items of conflict have arisen between the RDMA stack's for-rc
branch and upcoming for-next work:

9fd4350ba8 ("IB/rxe: avoid double kfree_skb") directly conflicts with
2e47350789 ("IB/rxe: optimize the function duplicate_request")

Patches already submitted by Intel for the hfi1 driver will fail to
apply cleanly without this merge

Other people on the mailing list have notified that their upcoming
patches also fail to apply cleanly without this merge

Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 15:48:48 -04:00
Idan Burstein
064e526247 IB/mlx5: posting klm/mtt list inline in the send queue for reg_wr
As most kernel RDMA ULPs, (e.g. NVMe over Fabrics in its default
"register_always=Y" mode) registers and invalidates user buffer
upon each IO.

Today the mlx5 driver is posting the registration work
request using scatter/gather entry for the MTT/KLM list.
The fetch of the MTT/KLM list becomes the bottleneck in
number of IO operation could be done by NVMe over Fabrics
host driver on a single adapter as shown below.

This patch is adding the support for inline registration
work request upon MTT/KLM list of size <=64B.

The result for NVMe over Fabrics is increase of > x3.5 for small
IOs as shown below, I expect other ULPs (e.g iSER, SRP, NFS over RDMA)
performance to be enhanced as well.

The following results were taken against a single NVMe-oF (RoCE link layer)
subsystem with a single namespace backed by null_blk using fio benchmark
(with rw=randread, numjobs=48, iodepth={16,64}, ioengine=libaio direct=1):

ConnectX-5 (pci Width x16)
---------------------------

Block Size       s/g reg_wr            inline reg_wr
++++++++++     +++++++++++++++        ++++++++++++++++
512B            1302.8K/34.82%         4951.9K/99.02%
1KB             1284.3K/33.86%         4232.7K/98.09%
2KB             1238.6K/34.1%          2797.5K/80.04%
4KB             1169.3K/32.46%         1941.3K/61.35%
8KB             1013.4K/30.08%         1236.6K/39.47%
16KB            695.7K/20.19%          696.9K/20.59%
32KB            350.3K/9.64%           350.6K/10.3%
64KB            175.86K/5.27%          175.9K/5.28%

ConnectX-4 (pci Width x8)
---------------------------

Block Size       s/g reg_wr            inline reg_wr
++++++++++     +++++++++++++++        ++++++++++++++++
512B            1285.8K/42.66%          4242.7K/98.18%
1KB             1254.1K/41.74%          3569.2K/96.00%
2KB             1185.9K/39.83%          2173.9K/75.58%
4KB             1069.4K/36.46%          1343.3K/47.47%
8KB             755.1K/27.77%           748.7K/29.14%

Tested-by: Nitzan Carmi <nitzanc@mellanox.com>
Signed-off-by: Idan Burstein <idanb@mellanox.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>

Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 12:08:21 -04:00
Daria Velikovsky
37da2a03c0 RDMA/mlx5: Use proper spec flow label type
Flow label is defined as u32 in the in ipv6 flow spec, but
used internally in the flow specs parsing as u8. That was
causing loss of part of flow_label value.

Fixes: 2d1e697e9b ('IB/mlx5: Add support to match inner packet fields')
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Daria Velikovsky <daria@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 10:39:50 -04:00
Yishai Hadas
18b0362e87 RDMA/mlx5: Don't assume that medium blueFlame register exists
User can leave system without medium BlueFlames registers,
however the code assumed that at least one such register exists.

This patch fixes that assumption.

Fixes: c1be5232d2 ("IB/mlx5: Fix micro UAR allocator")
Reported-by: Rohit Zambre <rzambre@uci.edu>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 10:39:50 -04:00
Linus Torvalds
eb4f959b26 First pull request for 4.17-rc
- Various build fixes (USER_ACCESS=m and ADDR_TRANS turned off)
 - SPDX license tag cleanups (new tag Linux-OpenIB)
 - RoCE GID fixes related to default GIDs
 - Various fixes to: cxgb4, uverbs, cma, iwpm, rxe, hns (big batch),
   mlx4, mlx5, and hfi1 (medium batch)
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJa7JXPAAoJELgmozMOVy/dc0AP/0i7EajAmgl1ihka6BYVj2pa
 DV8iSrVMDPulh9AVnAtwLJSbdwmgN/HeVzLzcutHyMYk6tAf8RCs6TsyoB36XiOL
 oUh5+V2GyNnyh9veWPwyGTgZKCpPJc3uQFV6502lZVDYwArMfGatumApBgQVKiJ+
 YdPEXEQZPNIs6YZB1WXkNYV/ra9u0aBByQvUrxwVZ2AND+srJYO82tqZit2wBtjK
 UXrhmZbWXGWMFg8K3/lpfUkQhkG3Arj+tMKkCfqsVzC7wUPhlTKBHR9NmvdLIiy9
 5Vhv7Xp78udcxZKtUeTFsbhaMqqK7x7sKHnpKAs7hOZNZ/Eg47BrMwMrZVLOFuDF
 nBLUL1H+nJ1mASZoMWH5xzOpVew+e9X0cot09pVDBIvsOIh97wCG7hgptQ2Z5xig
 fcDiMmg6tuakMsaiD0dzC9JI5HR6Z7+6oR1tBkQFDxQ+XkkcoFabdmkJaIRRwOj7
 CUhXRgcm0UgVd03Jdta6CtYXsjSODirWg4AvSSMt9lUFpjYf9WZP00/YojcBbBEH
 UlVrPbsKGyncgrm3FUP6kXmScESfdTljTPDLiY9cO9+bhhPGo1OHf005EfAp178B
 jGp6hbKlt+rNs9cdXrPSPhjds+QF8HyfSlwyYVWKw8VWlh/5DG8uyGYjF05hYO0q
 xhjIS6/EZjcTAh5e4LzR
 =PI8v
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Doug Ledford:
 "This is our first pull request of the rc cycle. It's not that it's
  been overly quiet, we were just waiting on a few things before sending
  this off.

  For instance, the 6 patch series from Intel for the hfi1 driver had
  actually been pulled in on Tuesday for a Wednesday pull request, only
  to have Jason notice something I missed, so we held off for some
  testing, and then on Thursday had to respin the series because the
  very first patch needed a minor fix (unnecessary cast is all).

  There is a sizable hns patch series in here, as well as a reasonably
  largish hfi1 patch series, then all of the lines of uapi updates are
  just the change to the new official Linux-OpenIB SPDX tag (a bunch of
  our files had what amounts to a BSD-2-Clause + MIT Warranty statement
  as their license as a result of the initial code submission years ago,
  and the SPDX folks decided it was unique enough to warrant a unique
  tag), then the typical mlx4 and mlx5 updates, and finally some cxgb4
  and core/cache/cma updates to round out the bunch.

  None of it was overly large by itself, but in the 2 1/2 weeks we've
  been collecting patches, it has added up :-/.

  As best I can tell, it's been through 0day (I got a notice about my
  last for-next push, but not for my for-rc push, but Jason seems to
  think that failure messages are prioritized and success messages not
  so much). It's also been through linux-next. And yes, we did notice in
  the context portion of the CMA query gid fix patch that there is a
  dubious BUG_ON() in the code, and have plans to audit our BUG_ON usage
  and remove it anywhere we can.

  Summary:

   - Various build fixes (USER_ACCESS=m and ADDR_TRANS turned off)

   - SPDX license tag cleanups (new tag Linux-OpenIB)

   - RoCE GID fixes related to default GIDs

   - Various fixes to: cxgb4, uverbs, cma, iwpm, rxe, hns (big batch),
     mlx4, mlx5, and hfi1 (medium batch)"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (52 commits)
  RDMA/cma: Do not query GID during QP state transition to RTR
  IB/mlx4: Fix integer overflow when calculating optimal MTT size
  IB/hfi1: Fix memory leak in exception path in get_irq_affinity()
  IB/{hfi1, rdmavt}: Fix memory leak in hfi1_alloc_devdata() upon failure
  IB/hfi1: Fix NULL pointer dereference when invalid num_vls is used
  IB/hfi1: Fix loss of BECN with AHG
  IB/hfi1 Use correct type for num_user_context
  IB/hfi1: Fix handling of FECN marked multicast packet
  IB/core: Make ib_mad_client_id atomic
  iw_cxgb4: Atomically flush per QP HW CQEs
  IB/uverbs: Fix kernel crash during MR deregistration flow
  IB/uverbs: Prevent reregistration of DM_MR to regular MR
  RDMA/mlx4: Add missed RSS hash inner header flag
  RDMA/hns: Fix a couple misspellings
  RDMA/hns: Submit bad wr
  RDMA/hns: Update assignment method for owner field of send wqe
  RDMA/hns: Adjust the order of cleanup hem table
  RDMA/hns: Only assign dqpn if IB_QP_PATH_DEST_QPN bit is set
  RDMA/hns: Remove some unnecessary attr_mask judgement
  RDMA/hns: Only assign mtu if IB_QP_PATH_MTU bit is set
  ...
2018-05-04 20:51:10 -10:00
Leon Romanovsky
444261ca6f RDMA/mlx5: Properly check return value of mlx5_get_uars_page
Starting from commit 72f36be061 ("net/mlx5: Fix mlx5_get_uars_page to
return error code") the mlx5_get_uars_page() call returns error in case
of failure, but it was mistakenly overlooked in the merge commit.

Fixes: e7996a9a77 ("Merge tag v4.15 of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git")
Reported-by: Alaa Hleihel <alaa@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-27 11:03:15 -04:00
Parav Pandit
84a6a7a99c IB/mlx5: Fix represent correct netdevice in dual port RoCE
In commit bcf87f1dbb ("IB/mlx5: Listen to netdev register/unresiter events in switchdev mode")
incorrectly mapped primary device's netdevice to 2nd port netdevice.
It always represented primary port's netdevice for 2nd port netdevice
when ib representors were not used.

This results into failing to process CM request arriving on 2nd port due
to incorrect mapping of netdevice.

This fix corrects it by considering the right mdev.

Cc: <stable@vger.kernel.org> # 4.16
Fixes: bcf87f1dbb ("IB/mlx5: Listen to netdev register/unresiter events in switchdev mode")
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-27 11:03:15 -04:00
Danit Goldberg
4f32ac2e45 IB/mlx5: Use unlimited rate when static rate is not supported
Before the change, if the user passed a static rate value different
than zero and the FW doesn't support static rate,
it would end up configuring rate of 2.5 GBps.

Fix this by using rate 0; unlimited, in cases where FW
doesn't support static rate configuration.

Cc: <stable@vger.kernel.org> # 3.10
Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Danit Goldberg <danitg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-27 11:03:15 -04:00
Leon Romanovsky
002bf2282b RDMA/mlx5: Protect from shift operand overflow
Ensure that user didn't supply values too large that can cause overflow.

UBSAN: Undefined behaviour in drivers/infiniband/hw/mlx5/qp.c:263:23
shift exponent -2147483648 is negative
CPU: 0 PID: 292 Comm: syzkaller612609 Not tainted 4.16.0-rc1+ #131
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014 Call
Trace:
dump_stack+0xde/0x164
ubsan_epilogue+0xe/0x81
set_rq_size+0x7c2/0xa90
create_qp_common+0xc18/0x43c0
mlx5_ib_create_qp+0x379/0x1ca0
create_qp.isra.5+0xc94/0x2260
ib_uverbs_create_qp+0x21b/0x2a0
ib_uverbs_write+0xc2c/0x1010
vfs_write+0x1b0/0x550
SyS_write+0xc7/0x1a0
do_syscall_64+0x1aa/0x740
entry_SYSCALL_64_after_hwframe+0x26/0x9b
RIP: 0033:0x433569
RSP: 002b:00007ffc6e62f448 EFLAGS: 00000217 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00000000004002f8 RCX: 0000000000433569
RDX: 0000000000000070 RSI: 00000000200042c0 RDI: 0000000000000003
RBP: 00000000006d5018 R08: 00000000004002f8 R09: 00000000004002f8
R10: 00000000004002f8 R11: 0000000000000217 R12: 0000000000000000
R13: 000000000040c9f0 R14: 000000000040ca80 R15: 0000000000000006

Cc: <stable@vger.kernel.org> # 3.10
Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Cc: syzkaller <syzkaller@googlegroups.com>
Reported-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-27 11:03:15 -04:00
Leon Romanovsky
b4bd701ac4 RDMA/mlx5: Fix multiple NULL-ptr deref errors in rereg_mr flow
Failure in rereg MR releases UMEM but leaves the MR to be destroyed
by the user. As a result the following scenario may happen:
"create MR -> rereg MR with failure -> call to rereg MR again" and
hit "NULL-ptr deref or user memory access" errors.

Ensure that rereg MR is only performed on a non-dead MR.

Cc: syzkaller <syzkaller@googlegroups.com>
Cc: <stable@vger.kernel.org> # 4.5
Fixes: 395a8e4c32 ("IB/mlx5: Refactoring register MR code")
Reported-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-27 11:03:15 -04:00
Israel Rukshin
6082d9c9c9 net/mlx5: Fix mlx5_get_vector_affinity function
Adding the vector offset when calling to mlx5_vector2eqn() is wrong.
This is because mlx5_vector2eqn() checks if EQ index is equal to vector number
and the fact that the internal completion vectors that mlx5 allocates
don't get an EQ index.

The second problem here is that using effective_affinity_mask gives the same
CPU for different vectors.
This leads to unmapped queues when calling it from blk_mq_rdma_map_queues().
This doesn't happen when using affinity_hint mask.

Fixes: 2572cf57d7 ("mlx5: fix mlx5_get_vector_affinity to start from completion vector 0")
Fixes: 05e0cc84e0 ("net/mlx5: Fix get vector affinity helper function")
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
2018-04-26 12:43:19 -07:00
Randy Dunlap
b3fe6c62bc infiniband: mlx5: fix build errors when INFINIBAND_USER_ACCESS=m
Fix build errors when INFINIBAND_USER_ACCESS=m and MLX5_INFINIBAND=y.
The build error occurs when the mlx5 driver code attempts to use
USER_ACCESS interfaces, which are built as a loadable module.

Fixes these build errors:

drivers/infiniband/hw/mlx5/main.o: In function `populate_specs_root':
../drivers/infiniband/hw/mlx5/main.c:4982: undefined reference to `uverbs_default_get_objects'
../drivers/infiniband/hw/mlx5/main.c:4994: undefined reference to `uverbs_alloc_spec_tree'
drivers/infiniband/hw/mlx5/main.o: In function `depopulate_specs_root':
../drivers/infiniband/hw/mlx5/main.c:5001: undefined reference to `uverbs_free_spec_tree'

Build-tested with multiple config combinations.

Fixes: 8c84660bb4 ("IB/mlx5: Initialize the parsing tree root without the help of uverbs")
Cc: stable@vger.kernel.org # reported against 4.16
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-17 19:26:43 -06:00
Zhu Yanjun
25d0e2db3d IB/mlx5: remove duplicate header file
The header file fs_helpers.h is included twice. So it should be removed.

Fixes: 802c212568 ("IB/mlx5: Add IPsec support for egress and ingress")
CC: Srinivas Eeda <srinivas.eeda@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-16 16:23:02 -06:00
Linus Torvalds
19fd08b85b Merge candidates for 4.17 merge window
- Fix RDMA uapi headers to actually compile in userspace and be more
   complete
 
 - Three shared with netdev pull requests from Mellanox:
 
    * 7 patches, mostly to net with 1 IB related one at the back). This
      series addresses an IRQ performance issue (patch 1), cleanups related to
      the fix for the IRQ performance problem (patches 2-6), and then extends
      the fragmented completion queue support that already exists in the net
      side of the driver to the ib side of the driver (patch 7).
 
    * Mostly IB, with 5 patches to net that are needed to support the remaining
      10 patches to the IB subsystem. This series extends the current
      'representor' framework when the mlx5 driver is in switchdev mode from
      being a netdev only construct to being a netdev/IB dev construct. The IB
      dev is limited to raw Eth queue pairs only, but by having an IB dev of
      this type attached to the representor for a switchdev port, it enables
      DPDK to work on the switchdev device.
 
    * All net related, but needed as infrastructure for the rdma driver
 
 - Updates for the hns, i40iw, bnxt_re, cxgb3, cxgb4, hns drivers
 
 - SRP performance updates
 
 - IB uverbs write path cleanup patch series from Leon
 
 - Add RDMA_CM support to ib_srpt. This is disabled by default.  Users need to
   set the port for ib_srpt to listen on in configfs in order for it to be
   enabled (/sys/kernel/config/target/srpt/discovery_auth/rdma_cm_port)
 
 - TSO and Scatter FCS support in mlx4
 
 - Refactor of modify_qp routine to resolve problems seen while working on new
   code that is forthcoming
 
 - More refactoring and updates of RDMA CM for containers support from Parav
 
 - mlx5 'fine grained packet pacing', 'ipsec offload' and 'device memory'
   user API features
 
 - Infrastructure updates for the new IOCTL interface, based on increased usage
 
 - ABI compatibility bug fixes to fully support 32 bit userspace on 64 bit
   kernel as was originally intended. See the commit messages for
   extensive details
 
 - Syzkaller bugs and code cleanups motivated by them
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCgAGBQJax5Z0AAoJEDht9xV+IJsacCwQAJBIgmLCvVp5fBu2kJcXMMVI
 y3l2YNzAUJvDDKv1r5yTC9ugBXEkDtgzi/W/C2/5es2yUG/QeT/zzQ3YPrtsnN68
 5FkiXQ35Tt7+PBHMr0cacGRmF4M3Td3MeW0X5aJaBKhqlNKwA+aF18pjGWBmpVYx
 URYCwLb5BZBKVh4+1Leebsk4i0/7jSauAqE5M+9notuAUfBCoY1/Eve3DipEIBBp
 EyrEnMDIdujYRsg4KHlxFKKJ1EFGItknLQbNL1+SEa0Oe0SnEl5Bd53Yxfz7ekNP
 oOWQe5csTcs3Yr4Ob0TC+69CzI71zKbz6qPDILTwXmsPFZJ9ipJs4S8D6F7ra8tb
 D5aT1EdRzh/vAORPC9T3DQ3VsHdvhwpUMG7knnKrVT9X/g7E+gSji1BqaQaTr/xs
 i40GepHT7lM/TWEuee/6LRpqdhuOhud7vfaRFwn2JGRX9suqTcvwhkBkPUDGV5XX
 5RkHcWOb/7KvmpG7S1gaRGK5kO208LgmAZi7REaJFoZB74FqSneMR6NHIH07ha41
 Zou7rnxV68CT2bgu27m+72EsprgmBkVDeEzXgKxVI/+PZ1oadUFpgcZ3pRLOPWVx
 rEqjHu65rlA/YPog4iXQaMfSwt/oRD3cVJS/n8EdJKXi4Qt2RDDGdyOmt74w4prM
 QuLEdvJIFmwrND1KDoqn
 =Ku8g
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-unmerged' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "Doug and I are at a conference next week so if another PR is sent I
  expect it to only be bug fixes. Parav noted yesterday that there are
  some fringe case behavior changes in his work that he would like to
  fix, and I see that Intel has a number of rc looking patches for HFI1
  they posted yesterday.

  Parav is again the biggest contributor by patch count with his ongoing
  work to enable container support in the RDMA stack, followed by Leon
  doing syzkaller inspired cleanups, though most of the actual fixing
  went to RC.

  There is one uncomfortable series here fixing the user ABI to actually
  work as intended in 32 bit mode. There are lots of notes in the commit
  messages, but the basic summary is we don't think there is an actual
  32 bit kernel user of drivers/infiniband for several good reasons.

  However we are seeing people want to use a 32 bit user space with 64
  bit kernel, which didn't completely work today. So in fixing it we
  required a 32 bit rxe user to upgrade their userspace. rxe users are
  still already quite rare and we think a 32 bit one is non-existing.

   - Fix RDMA uapi headers to actually compile in userspace and be more
     complete

   - Three shared with netdev pull requests from Mellanox:

      * 7 patches, mostly to net with 1 IB related one at the back).
        This series addresses an IRQ performance issue (patch 1),
        cleanups related to the fix for the IRQ performance problem
        (patches 2-6), and then extends the fragmented completion queue
        support that already exists in the net side of the driver to the
        ib side of the driver (patch 7).

      * Mostly IB, with 5 patches to net that are needed to support the
        remaining 10 patches to the IB subsystem. This series extends
        the current 'representor' framework when the mlx5 driver is in
        switchdev mode from being a netdev only construct to being a
        netdev/IB dev construct. The IB dev is limited to raw Eth queue
        pairs only, but by having an IB dev of this type attached to the
        representor for a switchdev port, it enables DPDK to work on the
        switchdev device.

      * All net related, but needed as infrastructure for the rdma
        driver

   - Updates for the hns, i40iw, bnxt_re, cxgb3, cxgb4, hns drivers

   - SRP performance updates

   - IB uverbs write path cleanup patch series from Leon

   - Add RDMA_CM support to ib_srpt. This is disabled by default. Users
     need to set the port for ib_srpt to listen on in configfs in order
     for it to be enabled
     (/sys/kernel/config/target/srpt/discovery_auth/rdma_cm_port)

   - TSO and Scatter FCS support in mlx4

   - Refactor of modify_qp routine to resolve problems seen while
     working on new code that is forthcoming

   - More refactoring and updates of RDMA CM for containers support from
     Parav

   - mlx5 'fine grained packet pacing', 'ipsec offload' and 'device
     memory' user API features

   - Infrastructure updates for the new IOCTL interface, based on
     increased usage

   - ABI compatibility bug fixes to fully support 32 bit userspace on 64
     bit kernel as was originally intended. See the commit messages for
     extensive details

   - Syzkaller bugs and code cleanups motivated by them"

* tag 'for-linus-unmerged' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (199 commits)
  IB/rxe: Fix for oops in rxe_register_device on ppc64le arch
  IB/mlx5: Device memory mr registration support
  net/mlx5: Mkey creation command adjustments
  IB/mlx5: Device memory support in mlx5_ib
  net/mlx5: Query device memory capabilities
  IB/uverbs: Add device memory registration ioctl support
  IB/uverbs: Add alloc/free dm uverbs ioctl support
  IB/uverbs: Add device memory capabilities reporting
  IB/uverbs: Expose device memory capabilities to user
  RDMA/qedr: Fix wmb usage in qedr
  IB/rxe: Removed GID add/del dummy routines
  RDMA/qedr: Zero stack memory before copying to user space
  IB/mlx5: Add ability to hash by IPSEC_SPI when creating a TIR
  IB/mlx5: Add information for querying IPsec capabilities
  IB/mlx5: Add IPsec support for egress and ingress
  {net,IB}/mlx5: Add ipsec helper
  IB/mlx5: Add modify_flow_action_esp verb
  IB/mlx5: Add implementation for create and destroy action_xfrm
  IB/uverbs: Introduce ESP steering match filter
  IB/uverbs: Add modify ESP flow_action
  ...
2018-04-06 17:35:43 -07:00
Ariel Levkovich
6c29f57ea4 IB/mlx5: Device memory mr registration support
Adding mlx5_ib driver implementation for reg_dm_mr callback
which allows registering device memory (DM) as an MR for
local and remote access.

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-05 13:04:49 -06:00
Ariel Levkovich
cdbd0d2bae net/mlx5: Mkey creation command adjustments
This change updates the mlx5 interface to create mkey
on the device.

The updates in the command mailbox include increasing the
access mode type field to 5 bits in order to support additional
types such as MLX5_MKC_ACCESS_MODE_MEMIC which represents device
memory access type and will be used when registering MR on allocated
device memory.

All the places that use the old access mode format are adjusted as
well.

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-05 13:04:49 -06:00
Ariel Levkovich
24da00164f IB/mlx5: Device memory support in mlx5_ib
This patch adds the mlx5_ib driver implementation for the device
memory allocation API.
It implements the ib_device callbacks for allocation and deallocation
operations as well as a new mmap command support which allows mapping
an allocated device memory to a VMA.

The change also adds reporting of device memory maximum size and
alignment parameters reported in device capabilities.

The allocation/deallocation operations are using new firmware
commands to allocate MEMIC memory on the device.

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-05 13:04:49 -06:00
Matan Barak
2d93fc8569 IB/mlx5: Add ability to hash by IPSEC_SPI when creating a TIR
When a Raw Ethernet QP is created, we actually create a few objects.
One of these objects is a TIR. Currently, a TIR could hash (and spread
the traffic) by IP or port only. Adding a hashing by IPSec SPI to TIR
creation with the required UAPI bit.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04 12:06:28 -06:00
Matan Barak
c03faa562d IB/mlx5: Add information for querying IPsec capabilities
Users should be able to query for IPSec support. Adding a few
capabilities bits as part of the driver specific part in
alloc_ucontext:
MLX5_USER_ALLOC_UCONTEXT_FLOW_ACTION_FLAGS_ESP_AES_GCM_REQ_METADATA
	Payload's header is returned with metadata representing the
	IPSec decryption state.
MLX5_USER_ALLOC_UCONTEXT_FLOW_ACTION_FLAGS_ESP_AES_GCM_RX
	Support ESP_AES_GCM in ingress path.
MLX5_USER_ALLOC_UCONTEXT_FLOW_ACTION_FLAGS_ESP_AES_GCM_TX
	Support ESP_AES_GCM in egress path.
MLX5_USER_ALLOC_UCONTEXT_FLOW_ACTION_FLAGS_ESP_AES_GCM_SPI_RSS_ONLY
	Hardware doesn't support matching SPI in flow steering rules
	but just hashing and spreading the traffic accordingly.

Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04 12:06:28 -06:00
Aviad Yehezkel
802c212568 IB/mlx5: Add IPsec support for egress and ingress
This commit introduces support for the esp_aes_gcm flow
specification for the Innova device. To that end we add
support for egress steering and some validations that an
IPsec rule is indeed valid.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04 12:06:27 -06:00
Matan Barak
349705c193 IB/mlx5: Add modify_flow_action_esp verb
Adding implementation in mlx5 driver to modify action_xfrm object. This
merely call the accel layer. Currently a user can modify only the
ESN parameters.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04 12:06:27 -06:00
Aviad Yehezkel
c6475a0bca IB/mlx5: Add implementation for create and destroy action_xfrm
Adding implementation in mlx5 driver to create and destroy action_xfrm
object. This merely call the accel layer.

A user may pass MLX5_IB_XFRM_FLAGS_REQUIRE_METADATA flag which states
that [s]he expects a metadata header to be added to the payload. This
header represents information regarding the transformation's state.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04 12:06:26 -06:00
Matan Barak
8c84660bb4 IB/mlx5: Initialize the parsing tree root without the help of uverbs
In order to have a custom parsing tree, a provider driver needs to
assign its parsing tree to ib_device specs_tree field. Otherwise, the
uverbs client assigns a common default parsing tree for it.
In downstream patches, the mlx5_ib driver gains a custom parsing tree,
which contains both the common objects and a new flags field for the
UVERBS_FLOW_ACTION_ESP_CREATE command.
This patch makes mlx5_ib assign its own tree to specs_root, which
later on will be extended.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04 12:06:23 -06:00
Parav Pandit
414448d249 RDMA: Use ib_gid_attr during GID modification
Now that ib_gid_attr contains device, port and index, simplify the
provider APIs add_gid() and del_gid() to use device, port and index
fields from the ib_gid_attr attributes structure.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03 21:34:16 -06:00
Parav Pandit
3e44e0ee08 IB/providers: Avoid null netdev check for RoCE
Now that IB core GID cache ensures that all RoCE entries have an
associated netdev remove null checks from the provider drivers for
clarity.

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03 21:33:51 -06:00
Jason Gunthorpe
41d902cb7c RDMA/mlx5: Fix definition of mlx5_ib_create_qp_resp
This structure is pushed down the ex and the non-ex path, so it needs to be
aligned to 8 bytes to go through ex without implicit padding.

Old user space will provide 4 bytes of resp on !ex and 8 bytes on ex, so
take the approach of just copying the minimum length.

New user space will consistently provide 8 bytes in both cases.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03 13:38:40 -06:00
David S. Miller
c0b458a946 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Minor conflicts in drivers/net/ethernet/mellanox/mlx5/core/en_rep.c,
we had some overlapping changes:

1) In 'net' MLX5E_PARAMS_LOG_{SQ,RQ}_SIZE -->
   MLX5E_REP_PARAMS_LOG_{SQ,RQ}_SIZE

2) In 'net-next' params->log_rq_size is renamed to be
   params->log_rq_mtu_frames.

3) In 'net-next' params->hard_mtu is added.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-01 19:49:34 -04:00
Eran Ben Elisha
1acae6b030 mlx5: Move dump error CQE function out of mlx5_ib for code sharing
Move mlx5_ib dump error CQE implementation to mlx5 CQ header file in
order to use it in a downstream patch from mlx5e.

In addition, use print_hex_dump instead of manual dumping of the buffer.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-27 17:17:28 -07:00
Eran Ben Elisha
2816077127 mlx5_{ib,core}: Add query SQ state helper function
Move query SQ state function from mlx5_ib to mlx5_core in order to
have it in shared code.

It will be used in a downstream patch from mlx5e.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-27 17:17:28 -07:00
Majd Dibbiny
c8d75a980f IB/mlx5: Respect new UMR capabilities
In some firmware configuration, UMR usage from Virtual Functions is restricted.
This information is published to the driver using new capability bits.

Avoid using UMRs in these cases and use the Firmware slow-path flow to create
mkeys and populate them with Virtual to Physical address translation.

Older drivers that do not have this patch, will end up using memory keys that
aren't populated with Virtual to Physical address translation that is done
part of the UMR work.

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27 14:43:10 -06:00
Majd Dibbiny
ea8af0d2f2 IB/mlx5: Enable ECN capable bits for UD RoCE v2 QPs
When working with RC QPs, the FW sets the ECN capable bits for all
the RoCE v2 packets. On the other hand, for UD QPs, the driver needs
to set the the ECN capable bits in the Address Handler since the HW
generates each packet according to the Address Handler and not
the QP context.

If ECN is not enabled in NIC or switch, these bits are ignored.

Fixes: 2811ba51b0 ("IB/mlx5: Add RoCE fields to Address Vector")
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27 14:43:10 -06:00
David S. Miller
03fe2debbb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Fun set of conflict resolutions here...

For the mac80211 stuff, these were fortunately just parallel
adds.  Trivially resolved.

In drivers/net/phy/phy.c we had a bug fix in 'net' that moved the
function phy_disable_interrupts() earlier in the file, whilst in
'net-next' the phy_error() call from this function was removed.

In net/ipv4/xfrm4_policy.c, David Ahern's changes to remove the
'rt_table_id' member of rtable collided with a bug fix in 'net' that
added a new struct member "rt_mtu_locked" which needs to be copied
over here.

The mlxsw driver conflict consisted of net-next separating
the span code and definitions into separate files, whilst
a 'net' bug fix made some changes to that moved code.

The mlx5 infiniband conflict resolution was quite non-trivial,
the RDMA tree's merge commit was used as a guide here, and
here are their notes:

====================

    Due to bug fixes found by the syzkaller bot and taken into the for-rc
    branch after development for the 4.17 merge window had already started
    being taken into the for-next branch, there were fairly non-trivial
    merge issues that would need to be resolved between the for-rc branch
    and the for-next branch.  This merge resolves those conflicts and
    provides a unified base upon which ongoing development for 4.17 can
    be based.

    Conflicts:
            drivers/infiniband/hw/mlx5/main.c - Commit 42cea83f95
            (IB/mlx5: Fix cleanup order on unload) added to for-rc and
            commit b5ca15ad7e (IB/mlx5: Add proper representors support)
            add as part of the devel cycle both needed to modify the
            init/de-init functions used by mlx5.  To support the new
            representors, the new functions added by the cleanup patch
            needed to be made non-static, and the init/de-init list
            added by the representors patch needed to be modified to
            match the init/de-init list changes made by the cleanup
            patch.
    Updates:
            drivers/infiniband/hw/mlx5/mlx5_ib.h - Update function
            prototypes added by representors patch to reflect new function
            names as changed by cleanup patch
            drivers/infiniband/hw/mlx5/ib_rep.c - Update init/de-init
            stage list to match new order from cleanup patch
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23 11:31:58 -04:00
Mark Bloch
32927e281c IB/mlx5: Don't clean uninitialized UMR resources
In case we failed to create UMR resources, mark them as invalid so we
won't try to destroy them on the unwind path.

Add the relevant checks to destroy_umrc_res(), this is done for the
unlikely event ib_register_device() or create_umr_res() err out and we
try to destroy invalid objects.

Fixes: 42cea83f95 ("IB/mlx5: Fix cleanup order on unload")
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-21 14:22:22 -06:00
Matan Barak
0ede73bc01 IB/uverbs: Extend uverbs_ioctl header with driver_id
Extending uverbs_ioctl header with driver_id and another reserved
field. driver_id should be used in order to identify the driver.
Since every driver could have its own parsing tree, this is necessary
for strace support.
Downstream patches take off the EXPERIMENTAL flag from the ioctl() IB
support and thus we add some reserved fields for future usage.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19 14:45:17 -06:00
Bodong Wang
61147f391a IB/mlx5: Packet packing enhancement for RAW QP
Enable RAW QP to be able to configure burst control by modify_qp. By
using burst control with rate limiting, user can achieve best
performance and accuracy. The burst control information is passed by
user through udata.

This patch also reports burst control capability for mlx5 related
hardwares, burst control is only marked as supported when both
packet_pacing_burst_bound and packet_pacing_typical_size are
supported.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19 11:55:13 -06:00
Honggang Li
7672ed33c4 IB/mlx5: Set the default active rate and width to QDR and 4X
Before commit f1b65df5a2 ("IB/mlx5: Add support for active_width and
active_speed in RoCE"), the mlx5_ib driver set the default active_width
and active_speed to IB_WIDTH_4X and IB_SPEED_QDR.

When the RoCE port is down, the RoCE port does not negotiate the active
width with the remote side, causing the active width to be zero. When
running userspace ibstat to view the port status, ibstat will panic as it
reads an invalid width from sys file.

This patch restores the original behavior.

Fixes: f1b65df5a2 ("IB/mlx5: Add support for active_width and active_speed in RoCE").
Signed-off-by: Honggang Li <honli@redhat.com>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Reviewed-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19 11:39:47 -06:00
Leon Romanovsky
eeea6953c4 RDMA/mlx5: Simplify clean and destroy MR calls
The failure to destroy the MRs is printed on mlx5_core layer
as error and it makes warning prints useless.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-15 10:59:58 -04:00
Leon Romanovsky
c985bd0ed7 RDMA/mlx5: Guard ODP specific assignments with specific CONFIG
"live" is needed for ODP only and is better to be guarded
by appropriate CONFIG.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-15 10:59:58 -04:00
Leon Romanovsky
4638a3b242 RDMA/mlx5: Unify error flows in rereg MR failure paths
According to the IBTA spec 1.3, the driver failure in
MR reregister shall release old and new MRs.

 C11-20: If the CI returns any other error, the CI shall
 invalidate both "old" and "new" registrations, and release
 any associated resources.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-15 10:59:58 -04:00
Leon Romanovsky
ea30f01376 RDMA/mlx5: Return proper value for not-supported command
Return -EOPNOTSUPP value to the user for unsupported reg_user_mr.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-15 10:59:58 -04:00
Leon Romanovsky
4289861d88 RDMA/mlx5: Protect from NULL pointer derefence
The mlx5_ib_alloc_implicit_mr() can fail to acquire pages
and the returned mr pointer won't be valid. Ensure that it
is not error prior to access.

Cc: <stable@vger.kernel.org> # 4.10
Fixes: 81713d3788 ("IB/mlx5: Add implicit MR support")
Reported-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-15 10:59:58 -04:00
Doug Ledford
2d873449a2 Merge branch 'k.o/wip/dl-for-rc' into k.o/wip/dl-for-next
Due to bug fixes found by the syzkaller bot and taken into the for-rc
branch after development for the 4.17 merge window had already started
being taken into the for-next branch, there were fairly non-trivial
merge issues that would need to be resolved between the for-rc branch
and the for-next branch.  This merge resolves those conflicts and
provides a unified base upon which ongoing development for 4.17 can
be based.

Conflicts:
	drivers/infiniband/hw/mlx5/main.c - Commit 42cea83f95
	(IB/mlx5: Fix cleanup order on unload) added to for-rc and
	commit b5ca15ad7e (IB/mlx5: Add proper representors support)
	add as part of the devel cycle both needed to modify the
	init/de-init functions used by mlx5.  To support the new
	representors, the new functions added by the cleanup patch
	needed to be made non-static, and the init/de-init list
	added by the representors patch needed to be modified to
	match the init/de-init list changes made by the cleanup
	patch.
Updates:
	drivers/infiniband/hw/mlx5/mlx5_ib.h - Update function
	prototypes added by representors patch to reflect new function
	names as changed by cleanup patch
	drivers/infiniband/hw/mlx5/ib_rep.c - Update init/de-init
	stage list to match new order from cleanup patch

Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-14 19:28:58 -04:00
Mark Bloch
42cea83f95 IB/mlx5: Fix cleanup order on unload
On load we create private CQ/QP/PD in order to be used by UMR, we create
those resources after we register ourself as an IB device, and we destroy
them after we unregister as an IB device. This was changed by commit
16c1975f10 ("IB/mlx5: Create profile infrastructure to add and remove
stages") which moved the destruction before we unregistration. This
allowed to trigger an invalid memory access when unloading mlx5_ib while
there are open resources:

BUG: unable to handle kernel paging request at 00000001002c012c
...
Call Trace:
 mlx5_ib_post_send_wait+0x75/0x110 [mlx5_ib]
 __slab_free+0x9a/0x2d0
 delay_time_func+0x10/0x10 [mlx5_ib]
 unreg_umr.isra.15+0x4b/0x50 [mlx5_ib]
 mlx5_mr_cache_free+0x46/0x150 [mlx5_ib]
 clean_mr+0xc9/0x190 [mlx5_ib]
 dereg_mr+0xba/0xf0 [mlx5_ib]
 ib_dereg_mr+0x13/0x20 [ib_core]
 remove_commit_idr_uobject+0x16/0x70 [ib_uverbs]
 uverbs_cleanup_ucontext+0xe8/0x1a0 [ib_uverbs]
 ib_uverbs_cleanup_ucontext.isra.9+0x19/0x40 [ib_uverbs]
 ib_uverbs_remove_one+0x162/0x2e0 [ib_uverbs]
 ib_unregister_device+0xd4/0x190 [ib_core]
 __mlx5_ib_remove+0x2e/0x40 [mlx5_ib]
 mlx5_remove_device+0xf5/0x120 [mlx5_core]
 mlx5_unregister_interface+0x37/0x90 [mlx5_core]
 mlx5_ib_cleanup+0xc/0x225 [mlx5_ib]
 SyS_delete_module+0x153/0x230
 do_syscall_64+0x62/0x110
 entry_SYSCALL_64_after_hwframe+0x21/0x86
...

We restore the original behavior by breaking the UMR stage into two parts,
pre and post IB registration stages, this way we can restore the original
functionality and maintain clean separation of logic between stages.

Fixes: 16c1975f10 ("IB/mlx5: Create profile infrastructure to add and remove stages")
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-14 16:44:02 -04:00
Ilya Lesokhin
c44ef998f2 IB/mlx5: Maintain a single emergency page
The mlx5 driver needs to be able to issue invalidation to ODP MRs
even if it cannot allocate memory. To this end it preallocates
emergency pages to use when the situation arises.

This flow should be extremely rare enough, that we don't need
to worry about contention and therefore a single emergency page
is good enough.

Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-14 16:05:16 -04:00
Daniel Jurgens
65edd0e758 IB/mlx5: Only synchronize RCU once when removing mkeys
Instead synchronizing RCU in a loop when removing mkeys in a batch do it
once at the end before freeing them. The result is only waiting for one
RCU grace period instead of many serially.

Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-14 16:05:16 -04:00
Leon Romanovsky
f3f134f526 RDMA/mlx5: Fix crash while accessing garbage pointer and freed memory
The failure in rereg_mr flow caused to set garbage value (error value)
into mr->umem pointer. This pointer is accessed at the release stage
and it causes to the following crash.

There is not enough to simply change umem to point to NULL, because the
MR struct is needed to be accessed during MR deregistration phase, so
delay kfree too.

[    6.237617] BUG: unable to handle kernel NULL pointer dereference a 0000000000000228
[    6.238756] IP: ib_dereg_mr+0xd/0x30
[    6.239264] PGD 80000000167eb067 P4D 80000000167eb067 PUD 167f9067 PMD 0
[    6.240320] Oops: 0000 [#1] SMP PTI
[    6.240782] CPU: 0 PID: 367 Comm: dereg Not tainted 4.16.0-rc1-00029-gc198fafe0453 #183
[    6.242120] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[    6.244504] RIP: 0010:ib_dereg_mr+0xd/0x30
[    6.245253] RSP: 0018:ffffaf5d001d7d68 EFLAGS: 00010246
[    6.246100] RAX: 0000000000000000 RBX: ffff95d4172daf00 RCX: 0000000000000000
[    6.247414] RDX: 00000000ffffffff RSI: 0000000000000001 RDI: ffff95d41a317600
[    6.248591] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
[    6.249810] R10: ffff95d417033c10 R11: 0000000000000000 R12: ffff95d4172c3a80
[    6.251121] R13: ffff95d4172c3720 R14: ffff95d4172c3a98 R15: 00000000ffffffff
[    6.252437] FS:  0000000000000000(0000) GS:ffff95d41fc00000(0000) knlGS:0000000000000000
[    6.253887] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    6.254814] CR2: 0000000000000228 CR3: 00000000172b4000 CR4: 00000000000006b0
[    6.255943] Call Trace:
[    6.256368]  remove_commit_idr_uobject+0x1b/0x80
[    6.257118]  uverbs_cleanup_ucontext+0xe4/0x190
[    6.257855]  ib_uverbs_cleanup_ucontext.constprop.14+0x19/0x40
[    6.258857]  ib_uverbs_close+0x2a/0x100
[    6.259494]  __fput+0xca/0x1c0
[    6.259938]  task_work_run+0x84/0xa0
[    6.260519]  do_exit+0x312/0xb40
[    6.261023]  ? __do_page_fault+0x24d/0x490
[    6.261707]  do_group_exit+0x3a/0xa0
[    6.262267]  SyS_exit_group+0x10/0x10
[    6.262802]  do_syscall_64+0x75/0x180
[    6.263391]  entry_SYSCALL_64_after_hwframe+0x21/0x86
[    6.264253] RIP: 0033:0x7f1b39c49488
[    6.264827] RSP: 002b:00007ffe2de05b68 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
[    6.266049] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1b39c49488
[    6.267187] RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
[    6.268377] RBP: 00007f1b39f258e0 R08: 00000000000000e7 R09: ffffffffffffff98
[    6.269640] R10: 00007f1b3a147260 R11: 0000000000000246 R12: 00007f1b39f258e0
[    6.270783] R13: 00007f1b39f2ac20 R14: 0000000000000000 R15: 0000000000000000
[    6.271943] Code: 74 07 31 d2 e9 25 d8 6c 00 b8 da ff ff ff c3 0f 1f
44 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8b 07 53 48 8b
5f 08 <48> 8b 80 28 02 00 00 e8 f7 d7 6c 00 85 c0 75 04 3e ff 4b 18 5b
[    6.274927] RIP: ib_dereg_mr+0xd/0x30 RSP: ffffaf5d001d7d68
[    6.275760] CR2: 0000000000000228
[    6.276200] ---[ end trace a35641f1c474bd20 ]---

Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Cc: syzkaller <syzkaller@googlegroups.com>
Cc: <stable@vger.kernel.org>
Reported-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-14 15:37:53 -04:00
Leon Romanovsky
75a4598209 RDMA/mlx5: Fix NULL dereference while accessing XRC_TGT QPs
mlx5 modify_qp() relies on FW that the error will be thrown if wrong
state is supplied. The missing check in FW causes the following crash
while using XRC_TGT QPs.

[   14.769632] BUG: unable to handle kernel NULL pointer dereference at (null)
[   14.771085] IP: mlx5_ib_modify_qp+0xf60/0x13f0
[   14.771894] PGD 800000001472e067 P4D 800000001472e067 PUD 14529067 PMD 0
[   14.773126] Oops: 0002 [#1] SMP PTI
[   14.773763] CPU: 0 PID: 365 Comm: ubsan Not tainted 4.16.0-rc1-00038-g8151138c0793 #119
[   14.775192] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[   14.777522] RIP: 0010:mlx5_ib_modify_qp+0xf60/0x13f0
[   14.778417] RSP: 0018:ffffbf48001c7bd8 EFLAGS: 00010246
[   14.779346] RAX: 0000000000000000 RBX: ffff9a8f9447d400 RCX: 0000000000000000
[   14.780643] RDX: 0000000000000000 RSI: 000000000000000a RDI: 0000000000000000
[   14.781930] RBP: 0000000000000000 R08: 00000000000217b0 R09: ffffffffbc9c1504
[   14.783214] R10: fffff4a180519480 R11: ffff9a8f94523600 R12: ffff9a8f9493e240
[   14.784507] R13: ffff9a8f9447d738 R14: 000000000000050a R15: 0000000000000000
[   14.785800] FS:  00007f545b466700(0000) GS:ffff9a8f9fc00000(0000) knlGS:0000000000000000
[   14.787073] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   14.787792] CR2: 0000000000000000 CR3: 00000000144be000 CR4: 00000000000006b0
[   14.788689] Call Trace:
[   14.789007]  _ib_modify_qp+0x71/0x120
[   14.789475]  modify_qp.isra.20+0x207/0x2f0
[   14.790010]  ib_uverbs_modify_qp+0x90/0xe0
[   14.790532]  ib_uverbs_write+0x1d2/0x3c0
[   14.791049]  ? __handle_mm_fault+0x93c/0xe40
[   14.791644]  __vfs_write+0x36/0x180
[   14.792096]  ? handle_mm_fault+0xc1/0x210
[   14.792601]  vfs_write+0xad/0x1e0
[   14.793018]  SyS_write+0x52/0xc0
[   14.793422]  do_syscall_64+0x75/0x180
[   14.793888]  entry_SYSCALL_64_after_hwframe+0x21/0x86
[   14.794527] RIP: 0033:0x7f545ad76099
[   14.794975] RSP: 002b:00007ffd78787468 EFLAGS: 00000287 ORIG_RAX: 0000000000000001
[   14.795958] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f545ad76099
[   14.797075] RDX: 0000000000000078 RSI: 0000000020009000 RDI: 0000000000000003
[   14.798140] RBP: 00007ffd78787470 R08: 00007ffd78787480 R09: 00007ffd78787480
[   14.799207] R10: 00007ffd78787480 R11: 0000000000000287 R12: 00005599ada98760
[   14.800277] R13: 00007ffd78787560 R14: 0000000000000000 R15: 0000000000000000
[   14.801341] Code: 4c 8b 1c 24 48 8b 83 70 02 00 00 48 c7 83 cc 02 00
00 00 00 00 00 48 c7 83 24 03 00 00 00 00 00 00 c7 83 2c 03 00 00 00 00
00 00 <c7> 00 00 00 00 00 48 8b 83 70 02 00 00 c7 40 04 00 00 00 00 4c
[   14.804012] RIP: mlx5_ib_modify_qp+0xf60/0x13f0 RSP: ffffbf48001c7bd8
[   14.804838] CR2: 0000000000000000
[   14.805288] ---[ end trace 3f1da0df5c8b7c37 ]---

Cc: syzkaller <syzkaller@googlegroups.com>
Reported-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-14 15:34:25 -04:00
Boris Pismenny
c2b37f7648 IB/mlx5: Fix integer overflows in mlx5_ib_create_srq
This patch validates user provided input to prevent integer overflow due
to integer manipulation in the mlx5_ib_create_srq function.

Cc: syzkaller <syzkaller@googlegroups.com>
Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-13 16:31:21 -04:00
Boris Pismenny
2c292dbb39 IB/mlx5: Fix out-of-bounds read in create_raw_packet_qp_rq
Add a check for the length of the qpin structure to prevent out-of-bounds reads

BUG: KASAN: slab-out-of-bounds in create_raw_packet_qp+0x114c/0x15e2
Read of size 8192 at addr ffff880066b99290 by task syz-executor3/549

CPU: 3 PID: 549 Comm: syz-executor3 Not tainted 4.15.0-rc2+ #27 Hardware
name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
Call Trace:
 dump_stack+0x8d/0xd4
 print_address_description+0x73/0x290
 kasan_report+0x25c/0x370
 ? create_raw_packet_qp+0x114c/0x15e2
 memcpy+0x1f/0x50
 create_raw_packet_qp+0x114c/0x15e2
 ? create_raw_packet_qp_tis.isra.28+0x13d/0x13d
 ? lock_acquire+0x370/0x370
 create_qp_common+0x2245/0x3b50
 ? destroy_qp_user.isra.47+0x100/0x100
 ? kasan_kmalloc+0x13d/0x170
 ? sched_clock_cpu+0x18/0x180
 ? fs_reclaim_acquire.part.15+0x5/0x30
 ? __lock_acquire+0xa11/0x1da0
 ? sched_clock_cpu+0x18/0x180
 ? kmem_cache_alloc_trace+0x17e/0x310
 ? mlx5_ib_create_qp+0x30e/0x17b0
 mlx5_ib_create_qp+0x33d/0x17b0
 ? sched_clock_cpu+0x18/0x180
 ? create_qp_common+0x3b50/0x3b50
 ? lock_acquire+0x370/0x370
 ? __radix_tree_lookup+0x180/0x220
 ? uverbs_try_lock_object+0x68/0xc0
 ? rdma_lookup_get_uobject+0x114/0x240
 create_qp.isra.5+0xce4/0x1e20
 ? ib_uverbs_ex_create_cq_cb+0xa0/0xa0
 ? copy_ah_attr_from_uverbs.isra.2+0xa00/0xa00
 ? ib_uverbs_cq_event_handler+0x160/0x160
 ? __might_fault+0x17c/0x1c0
 ib_uverbs_create_qp+0x21b/0x2a0
 ? ib_uverbs_destroy_cq+0x2e0/0x2e0
 ib_uverbs_write+0x55a/0xad0
 ? ib_uverbs_destroy_cq+0x2e0/0x2e0
 ? ib_uverbs_destroy_cq+0x2e0/0x2e0
 ? ib_uverbs_open+0x760/0x760
 ? futex_wake+0x147/0x410
 ? check_prev_add+0x1680/0x1680
 ? do_futex+0x3d3/0xa60
 ? sched_clock_cpu+0x18/0x180
 __vfs_write+0xf7/0x5c0
 ? ib_uverbs_open+0x760/0x760
 ? kernel_read+0x110/0x110
 ? lock_acquire+0x370/0x370
 ? __fget+0x264/0x3b0
 vfs_write+0x18a/0x460
 SyS_write+0xc7/0x1a0
 ? SyS_read+0x1a0/0x1a0
 ? trace_hardirqs_on_thunk+0x1a/0x1c
 entry_SYSCALL_64_fastpath+0x18/0x85
RIP: 0033:0x4477b9
RSP: 002b:00007f1822cadc18 EFLAGS: 00000292 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00000000004477b9
RDX: 0000000000000070 RSI: 000000002000a000 RDI: 0000000000000005
RBP: 0000000000708000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000292 R12: 00000000ffffffff
R13: 0000000000005d70 R14: 00000000006e6e30 R15: 0000000020010ff0

Allocated by task 549:
 __kmalloc+0x15e/0x340
 kvmalloc_node+0xa1/0xd0
 create_user_qp.isra.46+0xd42/0x1610
 create_qp_common+0x2e63/0x3b50
 mlx5_ib_create_qp+0x33d/0x17b0
 create_qp.isra.5+0xce4/0x1e20
 ib_uverbs_create_qp+0x21b/0x2a0
 ib_uverbs_write+0x55a/0xad0
 __vfs_write+0xf7/0x5c0
 vfs_write+0x18a/0x460
 SyS_write+0xc7/0x1a0
 entry_SYSCALL_64_fastpath+0x18/0x85

Freed by task 368:
 kfree+0xeb/0x2f0
 kernfs_fop_release+0x140/0x180
 __fput+0x266/0x700
 task_work_run+0x104/0x180
 exit_to_usermode_loop+0xf7/0x110
 syscall_return_slowpath+0x298/0x370
 entry_SYSCALL_64_fastpath+0x83/0x85

The buggy address belongs to the object at ffff880066b99180  which
belongs to the cache kmalloc-512 of size 512 The buggy address is
located 272 bytes inside of  512-byte region [ffff880066b99180,
ffff880066b99380) The buggy address belongs to the page:
page:000000006040eedd count:1 mapcount:0 mapping:          (null)
index:0x0 compound_mapcount: 0
flags: 0x4000000000008100(slab|head)
raw: 4000000000008100 0000000000000000 0000000000000000 0000000180190019
raw: ffffea00019a7500 0000000b0000000b ffff88006c403080 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff880066b99180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff880066b99200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff880066b99280: 00 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                         ^
 ffff880066b99300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff880066b99380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc

Cc: syzkaller <syzkaller@googlegroups.com>
Fixes: 0fb2ed66a1 ("IB/mlx5: Add create and destroy functionality for Raw Packet QP")
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-13 16:30:21 -04:00
Leon Romanovsky
28e9091e31 RDMA/mlx5: Fix integer overflow while resizing CQ
The user can provide very large cqe_size which will cause to integer
overflow as it can be seen in the following UBSAN warning:

=======================================================================
UBSAN: Undefined behaviour in drivers/infiniband/hw/mlx5/cq.c:1192:53
signed integer overflow:
64870 * 65536 cannot be represented in type 'int'
CPU: 0 PID: 267 Comm: syzkaller605279 Not tainted 4.15.0+ #90 Hardware
name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
Call Trace:
 dump_stack+0xde/0x164
 ? dma_virt_map_sg+0x22c/0x22c
 ubsan_epilogue+0xe/0x81
 handle_overflow+0x1f3/0x251
 ? __ubsan_handle_negate_overflow+0x19b/0x19b
 ? lock_acquire+0x440/0x440
 mlx5_ib_resize_cq+0x17e7/0x1e40
 ? cyc2ns_read_end+0x10/0x10
 ? native_read_msr_safe+0x6c/0x9b
 ? cyc2ns_read_end+0x10/0x10
 ? mlx5_ib_modify_cq+0x220/0x220
 ? sched_clock_cpu+0x18/0x200
 ? lookup_get_idr_uobject+0x200/0x200
 ? rdma_lookup_get_uobject+0x145/0x2f0
 ib_uverbs_resize_cq+0x207/0x3e0
 ? ib_uverbs_ex_create_cq+0x250/0x250
 ib_uverbs_write+0x7f9/0xef0
 ? cyc2ns_read_end+0x10/0x10
 ? print_irqtrace_events+0x280/0x280
 ? ib_uverbs_ex_create_cq+0x250/0x250
 ? uverbs_devnode+0x110/0x110
 ? sched_clock_cpu+0x18/0x200
 ? do_raw_spin_trylock+0x100/0x100
 ? __lru_cache_add+0x16e/0x290
 __vfs_write+0x10d/0x700
 ? uverbs_devnode+0x110/0x110
 ? kernel_read+0x170/0x170
 ? sched_clock_cpu+0x18/0x200
 ? security_file_permission+0x93/0x260
 vfs_write+0x1b0/0x550
 SyS_write+0xc7/0x1a0
 ? SyS_read+0x1a0/0x1a0
 ? trace_hardirqs_on_thunk+0x1a/0x1c
 entry_SYSCALL_64_fastpath+0x1e/0x8b
RIP: 0033:0x433549
RSP: 002b:00007ffe63bd1ea8 EFLAGS: 00000217
=======================================================================

Cc: syzkaller <syzkaller@googlegroups.com>
Cc: <stable@vger.kernel.org> # 3.13
Fixes: bde51583f4 ("IB/mlx5: Add support for resize CQ")
Reported-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-09 18:10:48 -05:00
Doug Ledford
212a0cbc56 Revert "RDMA/mlx5: Fix integer overflow while resizing CQ"
The original commit of this patch has a munged log message that is
missing several of the tags the original author intended to be on the
patch.  This was due to patchworks misinterpreting a cut-n-paste
separator line as an end of message line and munging the mbox that was
used to import the patch:

https://patchwork.kernel.org/patch/10264089/

The original patch will be reapplied with a fixed commit message so the
proper tags are applied.

This reverts commit aa0de36a40.

Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-09 18:07:46 -05:00
Doug Ledford
1abb791fcd Merge tag 'mlx5-updates-2018-02-28-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux into k.o/wip/dl-for-next
mlx5-updates-2018-02-28-1 (IPSec-1)

This series consists of some fixes and refactors for the mlx5 drivers,
especially around the FPGA and flow steering. Most of them are trivial
fixes and are the foundation of allowing IPSec acceleration from user-space.

We use flow steering abstraction in order to accelerate IPSec packets.
When a user creates a steering rule, [s]he states that we'll carry an
encrypt/decrypt flow action (using a specific configuration) for every
packet which conforms to a certain match. Since currently offloading these
packets is done via FPGA, we'll add another set of flow steering ops.
These ops will execute the required FPGA commands and then call the
standard steering ops.

In order to achieve this, we need that the commands will get all the
required information. Therefore, we pass the fte object and embed the
flow_action struct inside the fte. In addition, we add the shim layer
that will later be used for alternating between the standard and the
FPGA steering commands.

Some fixes, like " net/mlx5e: Wait for FPGA command responses with a timeout"
are very relevant for user-space applications, as these applications could
be killed, but we still want to wait for the FPGA and update the kernel's
database.

Regards,
Aviad and Matan

Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-07 15:56:39 -07:00
Leon Romanovsky
aa0de36a40 RDMA/mlx5: Fix integer overflow while resizing CQ
The user can provide very large cqe_size which will cause to integer
overflow as it can be seen in the following UBSAN warning:

Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-07 15:23:43 -05:00
Boris Pismenny
3346c48737 {net,IB}/mlx5: Add flow steering helpers
Add helper functions that check if a protocol is
part of a flow steering match criteria.

Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-06 22:20:14 -08:00
Matan Barak
a9db0ecf15 {net,IB}/mlx5: Add has_tag to mlx5_flow_act
The has_tag member will indicate whether a tag action was specified
in flow specification.

A flow tag 0 = MLX5_FS_DEFAULT_FLOW_TAG is assumed a valid flow tag
that is currently used by mlx5 RDMA driver, whereas in HW flow_tag = 0
means that the user doesn't care about flow_tag.  HW always provide
a flow_tag = 0 if all flow tags requested on a specific flow are 0.

So we need a way (in the driver) to differentiate between a user really
requesting flow_tag = 0 and a user who does not care, in order to be
able to report conflicting flow tags on a specific flow.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-06 22:06:33 -08:00
Boris Pismenny
075572d4b7 IB/mlx5: Pass mlx5_flow_act struct instead of multiple arguments
Group and pass all function arguments of parse_flow_attr call in one
common struct mlx5_flow_act.

This patch passes all the action arguments of parse_flow_attr in one common
struct mlx5_flow_act. It allows us to scale the number of actions without adding
new arguments to the function.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-06 22:06:11 -08:00
Aviad Yehezkel
c33251a3c6 IB/mlx5: Removed not used parameters
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-06 22:05:36 -08:00
Dan Carpenter
5d414b178e IB/mlx5: Fix an error code in __mlx5_ib_modify_qp()
"err" is either zero or possibly uninitialized here.  It should be
-EINVAL.

Fixes: 427c1e7bcd ("{IB, net}/mlx5: Move the modify QP operation table to mlx5_ib")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-06 20:08:39 -07:00
Mark Bloch
210b1f7807 IB/mlx5: When not in dual port RoCE mode, use provided port as native
The series that introduced dual port RoCE mode assumed that we don't have
a dual port HCA that use the mlx5 driver, this is not the case for
Connect-IB HCAs. This reasoning led to assigning 1 as the native port
index which causes issue when the second port is used.

For example query_pkey() when called on the second port will return values
of the first port. Make sure that we assign the right port index as the
native port index.

Fixes: 32f69e4be2 ("{net, IB}/mlx5: Manage port association for multiport RoCE")
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-06 20:08:38 -07:00
Leon Romanovsky
55de9a77da RDMA/mlx5: Refactor QP type check to be as early as possible
Perform QP type check in one place and fail as early as possible.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-28 13:16:36 -07:00
Moni Shoua
65389322b2 IB/mlx: Set slid to zero in Ethernet completion struct
IB spec says that a lid should be ignored when link layer is Ethernet,
for example when building or parsing a CM request message (CA17-34).
However, since ib_lid_be16() and ib_lid_cpu16()  validates the slid,
not only when link layer is IB, we set the slid to zero to prevent
false warnings in the kernel log.

Fixes: 62ede77799 ("Add OPA extended LID support")
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-28 12:10:32 -07:00
Daniel Jurgens
aba4621346 {net, IB}/mlx5: Raise fatal IB event when sys error occurs
All other mlx5_events report the port number as 1 based, which is how FW
reports it in the port event EQE. Reporting 0 for this event causes
mlx5_ib to not raise a fatal event notification to registered clients
due to a seemingly invalid port.

All switch cases in mlx5_ib_event that go through the port check are
supposed to set the port now, so just do it once at variable
declaration.

Fixes: 89d44f0a6c73("net/mlx5_core: Add pci error handlers to mlx5_core driver")
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-28 12:10:32 -07:00
Noa Osherovich
e7b169f344 IB/mlx5: Avoid passing an invalid QP type to firmware
During QP creation, the mlx5 driver translates the QP type to an
internal value which is passed on to FW. There was no check to make
sure that the translated value is valid, and -EINVAL was coerced into
the mailbox command.

Current firmware refuses this as an invalid QP type, but future/past
firmware may do something else.

Fixes: 09a7d9eca1 ('{net,IB}/mlx5: QP/XRCD commands via mlx5 ifc')
Reviewed-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-28 12:10:32 -07:00
Sergey Gorenko
da343b6d90 IB/mlx5: Fix incorrect size of klms in the memory region
The value of mr->ndescs greater than mr->max_descs is set in the
function mlx5_ib_sg_to_klms() if sg_nents is greater than
mr->max_descs. This is an invalid value and it causes the
following error when registering mr:

mlx5_0:dump_cqe:276:(pid 193): dump error cqe
00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000030: 00 00 00 00 0f 00 78 06 25 00 00 8b 08 1e 8f d3

Cc: <stable@vger.kernel.org> # 4.5
Fixes: b005d31647 ("mlx5: Add arbitrary sg list support")
Signed-off-by: Sergey Gorenko <sergeygo@mellanox.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-28 12:10:32 -07:00
Mark Bloch
ec9c2fb8ce IB/mlx5: Disable self loopback check when in switchdev mode
When in switchdev mode, there is no need to do self loopback checks
as we can't receive those packets, we insert steering rules to the
eswitch that make sure packets can't be looped back.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-23 12:36:39 -08:00
Mark Bloch
b5ca15ad7e IB/mlx5: Add proper representors support
This commit adds full support for IB representor:

1) Representors profile, We add two new profiles:
   nic_rep_profile - This profile will be used to create an IB device that
   represents the PF/UPLINK.
   rep_profile - This profile will be used to create an IB device that
   represents VFs. Each VF will be its own representor.
2) Proper load/unload callbacks, Those are called by the E-Switch when
   moving to/from switchdev mode.
3) Different flow DB handling for when we in switchdev mode.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-23 12:36:39 -08:00
Mark Bloch
b96c9dde17 IB/mlx5: E-Switch, Add rule to forward traffic to vport
In order to forward traffic from representor's SQ to the right virtual
function, every time an SQ is created also add the corresponding flow rule
to the FDB.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-23 12:36:39 -08:00
Mark Bloch
72afcf8247 IB/mlx5: Don't expose MR cache in switchdev mode
When enabling many VFs and switching to switchdev mode, the total amount
of mkeys we try to allocate when loading representors is very large and
may cause timeouts on allocations, the same issues was observed on VFs
and we employ the same fix that was done for them. We avoid allocating
the full MR cache on load but still allow it to be manipulated once the
IB device is loaded.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-23 12:36:39 -08:00
Mark Bloch
8e6efa3a31 IB/mlx5: When in switchdev mode, expose only raw packet capabilities
Currently in switchdev mode we allow only for raw packet QPs.
Expose the right capabilities and set the gid table length to 0, also
make sure we don't try to enable RoCE, so split the function
to enable RoCE so representors can enable only the notifier needed for
net device events.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-23 12:36:39 -08:00
Mark Bloch
bcf87f1dbb IB/mlx5: Listen to netdev register/unresiter events in switchdev mode
Currently we listen to netdev register/unregister event based on PCI
device. When in switchdev mode PF and representors share the same PCI
device, so in order to pair ib device and netdev in switchdev mode
compare the netdev that triggered the event to that of the representor.

Expose a function that lets you receive the netdev associated what
a given representor.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-23 12:36:39 -08:00
Mark Bloch
018a94eec0 IB/mlx5: Add match on vport when in switchdev mode
When we point to a representor, it means we are in switchdev mode.
The flow db is shared between PF and virtual function representors
so each rule created needs to have a match on its specific source port.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-23 12:36:39 -08:00
Mark Bloch
9a4ca38d77 IB/mlx5: Allocate flow DB only on PF IB device
A flow DB is a shared resource between PF and representors,
need to allocate it only when creating the PF IB device.
Once we add IB representors, they will use the flow db which was
created by the PF.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-23 12:36:39 -08:00
Mark Bloch
fc385b7ac4 IB/mlx5: Add basic regiser/unregister representors code
Create the basic infrastructure of registering and unregistering
IB representors. The load/unload callbacks are left empty and
proper implementation will be introduced in following patches.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-23 12:36:39 -08:00
Yonatan Cohen
388ca8be00 IB/mlx5: Implement fragmented completion queue (CQ)
The current implementation of create CQ requires contiguous
memory, such requirement is problematic once the memory is
fragmented or the system is low in memory, it causes for
failures in dma_zalloc_coherent().

This patch implements new scheme of fragmented CQ to overcome
this issue by introducing new type: 'struct mlx5_frag_buf_ctrl'
to allocate fragmented buffers, rather than contiguous ones.

Base the Completion Queues (CQs) on this new fragmented buffer.

It fixes following crashes:
kworker/29:0: page allocation failure: order:6, mode:0x80d0
CPU: 29 PID: 8374 Comm: kworker/29:0 Tainted: G OE 3.10.0
Workqueue: ib_cm cm_work_handler [ib_cm]
Call Trace:
[<>] dump_stack+0x19/0x1b
[<>] warn_alloc_failed+0x110/0x180
[<>] __alloc_pages_slowpath+0x6b7/0x725
[<>] __alloc_pages_nodemask+0x405/0x420
[<>] dma_generic_alloc_coherent+0x8f/0x140
[<>] x86_swiotlb_alloc_coherent+0x21/0x50
[<>] mlx5_dma_zalloc_coherent_node+0xad/0x110 [mlx5_core]
[<>] ? mlx5_db_alloc_node+0x69/0x1b0 [mlx5_core]
[<>] mlx5_buf_alloc_node+0x3e/0xa0 [mlx5_core]
[<>] mlx5_buf_alloc+0x14/0x20 [mlx5_core]
[<>] create_cq_kernel+0x90/0x1f0 [mlx5_ib]
[<>] mlx5_ib_create_cq+0x3b0/0x4e0 [mlx5_ib]

Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-15 00:30:03 -08:00
Jason Gunthorpe
e7996a9a77 Linux 4.15
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJabj6pAAoJEHm+PkMAQRiGs8cIAJQFkCWnbz86e3vG4DuWhyA8
 CMGHCQdUOxxFGa/ixhIiuetbC0x+JVHAjV2FwVYbAQfaZB3pfw2iR1ncQxpAP1AI
 oLU9vBEqTmwKMPc9CM5rRfnLFWpGcGwUNzgPdxD5yYqGDtcM8K840mF6NdkYe5AN
 xU8rv1wlcFPF4A5pvHCH0pvVmK4VxlVFk/2H67TFdxBs4PyJOnSBnf+bcGWgsKO6
 hC8XIVtcKCH2GfFxt5d0Vgc5QXJEpX1zn2mtCa1MwYRjN2plgYfD84ha0xE7J0B0
 oqV/wnjKXDsmrgVpncr3txd4+zKJFNkdNRE4eLAIupHo2XHTG4HvDJ5dBY2NhGU=
 =sOml
 -----END PGP SIGNATURE-----

Merge tag v4.15 of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git

To resolve conflicts in:
 drivers/infiniband/hw/mlx5/main.c
 drivers/infiniband/hw/mlx5/qp.c

From patches merged into the -rc cycle. The conflict resolution matches
what linux-next has been carrying.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-30 09:30:00 -07:00
Jason Gunthorpe
beb801ac51 RDMA: Move enum ib_cq_creation_flags to uapi headers
The flags field the enum is used with comes directly from the uapi
so it belongs in the uapi headers for clarity and so userspace can
use it.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-29 12:58:34 -07:00
Leon Romanovsky
b081808a66 RDMA/mlx5: Avoid memory leak in case of XRCD dealloc failure
Failure in XRCD FW deallocation command leaves memory leaked and
returns error to the user which he can't do anything about it.

This patch changes behavior to always free memory and always return
success to the user.

Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-28 14:07:16 -07:00
Leon Romanovsky
10bea9c873 RDMA/mlx5: Remove redundant allocation warning print
The kmalloc() failure to allocate memory generates enough information
and doesn't need to be accompanied by another driver print.

Fixes: d69a24e036 ("IB/mlx5: Move IB event processing onto a workqueue")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-19 13:05:39 -07:00
Feras Daoud
5c99eaecb1 IB/mlx5: Mmap the HCA's clock info to user-space
This patch maps the new page to user space applications to
allow converting a user space completion timestamp to system wall
time at the lowest possible latency cost.
By using a versioning scheme we allow compatibility between current
and future userspace libraries.
The change moves mlx5_ib_mmap_cmd enum from mlx5_ib.h to the
abi header file mlx5-abi.h.

Reviewed-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Eitan Rabin <rabin@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-18 14:49:21 -05:00
Linus Torvalds
8cbab92dff Fifth pull request for 4.15-rc
- Oops fix in hfi1 driver
 - Use-after-free issue in iser-target
 - Use of user supplied array index without proper checking
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJaXnN2AAoJELgmozMOVy/dyYAP/3YBi6ofVHrIjKovrsA6fh+y
 O1+YuG++2glgKF1QuZkZnrKqrJUIueHhGuKce2rRgWWMioAzOfd1mbXbwha3xKWh
 4mH/BK/fv1k/TZy+eu6w4WMnjtExVinPp4qeSwvvfHh3BpJfnBOtPbYNdPVt94MV
 QYPZuLarvSIJe8MFDbv6yitAIQWnPQLfGMWPW8OZb0ut7bsMNPyZ8LvcAXhfXA4a
 tVuxAfCqUb90BHWOnCBcaWGNpp9Ov2w6yyNCDkiIRXvgJxhjvJwxpHze6TMa4fCo
 mv8e4QDdQ8KK2vQaINQAJTl68Y1/fPxtI8um5tpIhJ5YjQE3rrVzTaQJRstBcXXs
 ViiDmxm/4KTUTeFtWHRj7oVJTZNSOBu+FfBVmjqtcUl4k+t9tDfj2Lw/ZVXT/3KQ
 +dsqUV+1PrQspCeFlKwg+e1mgXa3miyOYvMPacAHzRYBLv5oU6MPpb+WHbZMlTMt
 duT50JfjseIB0s6F2aGLNRRXPWb9jVUwHwzOXaDGKwX1+5u6O2IKeUP7+15TgZsS
 IFeLPodNN8W1N1z0TGcGL8zRXtY9GYhlYwrdMUe/4EWfxrrRtawLLCGGVhWKk/Rv
 iPl/Qpxj/8f2DS0A5+N5iHTAZPoOXjNx6JtZf9AZhvI9blvwZVuGdr63mKZMUnpb
 a0xeypv1bnYbv7k70zS8
 =MHNK
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Doug Ledford:
 "We had a few more items creep up over the last week. Given we are in
  -rc8, these are obviously limited to bugs that have a big downside and
  for which we are certain of the fix.

  The first is a straight up oops bug that all you have to do is read
  the code to see it's a guaranteed 100% oops bug.

  The second is a use-after-free issue. We get away lucky if the queue
  we are shutting down is empty, but if it isn't, we can end up oopsing.
  We really need to drain the queue before destroying it.

  The final one is an issue with bad user input causing us to access our
  port array out of bounds. While fixing the array out of bounds issue,
  it was noticed that the original code did the same thing twice (the
  call to rdma_ah_set_port_num()), so its removal is not balanced by a
  readd elsewhere, it was already where it needed to be in addition to
  where it didn't need to be.

  Summary:

   - Oops fix in hfi1 driver

   - use-after-free issue in iser-target

   - use of user supplied array index without proper checking"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/mlx5: Fix out-of-bound access while querying AH
  IB/hfi1: Prevent a NULL dereference
  iser-target: Fix possible use-after-free in connection establishment error
2018-01-16 16:47:40 -08:00
Leon Romanovsky
ae59c3f0b6 RDMA/mlx5: Fix out-of-bound access while querying AH
The rdma_ah_find_type() accesses the port array based on an index
controlled by userspace. The existing bounds check is after the first use
of the index, so userspace can generate an out of bounds access, as shown
by the KASN report below.

==================================================================
BUG: KASAN: slab-out-of-bounds in to_rdma_ah_attr+0xa8/0x3b0
Read of size 4 at addr ffff880019ae2268 by task ibv_rc_pingpong/409

CPU: 0 PID: 409 Comm: ibv_rc_pingpong Not tainted 4.15.0-rc2-00031-gb60a3faf5b83-dirty #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
Call Trace:
 dump_stack+0xe9/0x18f
 print_address_description+0xa2/0x350
 kasan_report+0x3a5/0x400
 to_rdma_ah_attr+0xa8/0x3b0
 mlx5_ib_query_qp+0xd35/0x1330
 ib_query_qp+0x8a/0xb0
 ib_uverbs_query_qp+0x237/0x7f0
 ib_uverbs_write+0x617/0xd80
 __vfs_write+0xf7/0x500
 vfs_write+0x149/0x310
 SyS_write+0xca/0x190
 entry_SYSCALL_64_fastpath+0x18/0x85
RIP: 0033:0x7fe9c7a275a0
RSP: 002b:00007ffee5498738 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00007fe9c7ce4b00 RCX: 00007fe9c7a275a0
RDX: 0000000000000018 RSI: 00007ffee5498800 RDI: 0000000000000003
RBP: 000055d0c8d3f010 R08: 00007ffee5498800 R09: 0000000000000018
R10: 00000000000000ba R11: 0000000000000246 R12: 0000000000008000
R13: 0000000000004fb0 R14: 000055d0c8d3f050 R15: 00007ffee5498560

Allocated by task 1:
 __kmalloc+0x3f9/0x430
 alloc_mad_private+0x25/0x50
 ib_mad_post_receive_mads+0x204/0xa60
 ib_mad_init_device+0xa59/0x1020
 ib_register_device+0x83a/0xbc0
 mlx5_ib_add+0x50e/0x5c0
 mlx5_add_device+0x142/0x410
 mlx5_register_interface+0x18f/0x210
 mlx5_ib_init+0x56/0x63
 do_one_initcall+0x15b/0x270
 kernel_init_freeable+0x2d8/0x3d0
 kernel_init+0x14/0x190
 ret_from_fork+0x24/0x30

Freed by task 0:
(stack is not available)

The buggy address belongs to the object at ffff880019ae2000
 which belongs to the cache kmalloc-512 of size 512
The buggy address is located 104 bytes to the right of
 512-byte region [ffff880019ae2000, ffff880019ae2200)
The buggy address belongs to the page:
page:000000005d674e18 count:1 mapcount:0 mapping:          (null) index:0x0 compound_mapcount: 0
flags: 0x4000000000008100(slab|head)
raw: 4000000000008100 0000000000000000 0000000000000000 00000001000c000c
raw: dead000000000100 dead000000000200 ffff88001a402000 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff880019ae2100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff880019ae2180: 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc
>ffff880019ae2200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                                                          ^
 ffff880019ae2280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff880019ae2300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================
Disabling lock debugging due to kernel taint

Cc: <stable@vger.kernel.org>
Fixes: 44c58487d5 ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-15 14:19:55 -07:00
Eran Ben Elisha
72f36be061 net/mlx5: Fix mlx5_get_uars_page to return error code
Change mlx5_get_uars_page to return ERR_PTR in case of
allocation failure. Change all callers accordingly to
check the IS_ERR(ptr) instead of NULL.

Fixes: 59211bd3b6 ("net/mlx5: Split the load/unload flow into hardware and software flows")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-12 02:01:47 +02:00
Eran Ben Elisha
8978cc921f {net,ib}/mlx5: Don't disable local loopback multicast traffic when needed
There are systems platform information management interfaces (such as
HOST2BMC) for which we cannot disable local loopback multicast traffic.

Separate disable_local_lb_mc and disable_local_lb_uc capability bits so
driver will not disable multicast loopback traffic if not supported.
(It is expected that Firmware will not set disable_local_lb_mc if
HOST2BMC is running for example.)

Function mlx5_nic_vport_update_local_lb will do best effort to
disable/enable UC/MC loopback traffic and return success only in case it
succeeded to changed all allowed by Firmware.

Adapt mlx5_ib and mlx5e to support the new cap bits.

Fixes: 2c43c5a036 ("net/mlx5e: Enable local loopback in loopback selftest")
Fixes: c85023e153 ("IB/mlx5: Add raw ethernet local loopback support")
Fixes: bded747bb4 ("net/mlx5: Add raw ethernet local loopback firmware command")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Cc: kernel-team@fb.com
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-12 00:52:42 +02:00
Colin Ian King
da005f9fab IB/mlx5: remove redundant assignment of mdev
The initial assignment to mdev is redundant as mdev is re-assigned
later and the first assigned value is never read. Remove this
redundant assignment.

Cleans up clang warning:
drivers/infiniband/hw/mlx5/main.c:359:24: warning: Value stored
to 'mdev' during its initialization is never read

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-10 16:52:52 -05:00
Daniel Jurgens
85c7c01499 IB/mlx5: Don't advertise RAW QP support in dual port mode
When operating in dual port RoCE mode FW doesn't support steering for
raw QPs on the slave port. They still work on the master port, but
the user has no way of knowing which port is the master. The
capability is reported per device, not per port.

Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-08 11:42:24 -07:00
Daniel Jurgens
212f2a87b7 IB/mlx5: Route MADs for dual port RoCE
Route performance query MADs to the correct mlx5_core_dev when using
dual port RoCE mode.

Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-08 11:42:23 -07:00
Daniel Jurgens
cfe4e37fdc {net, IB}/mlx5: Change set_roce_gid to take a port number
When in dual port mode setting a RoCE GID for any port flows through the
master ports mlx5_core_dev. Provide an interface to set the port when
sending this command.

Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-08 11:42:23 -07:00
Daniel Jurgens
aac4492ef2 IB/mlx5: Update counter implementation for dual port RoCE
Update the counter interface for multiple ports. Some counter sets
always comes from the primary device.

Port specific counters should be accessed per mlx5_core_dev not always
through the IB master mdev.

Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-08 11:42:23 -07:00
Parav Pandit
a9e546e73a IB/mlx5: Change debugfs to have per port contents
When there are multiple ports for single IB(RoCE) device, support
debugfs entries to be available for each port.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-08 11:42:22 -07:00
Daniel Jurgens
b3cbd6f080 IB/mlx5: Implement dual port functionality in query routines
Port operations must be routed to their native mlx5_core_dev. A
multiport RoCE device registers itself as having 2 ports even before a
2nd port is affiliated. If an unaffilated port is queried use capability
information from the master port, these values are the same.

Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-08 11:42:22 -07:00
Daniel Jurgens
d69a24e036 IB/mlx5: Move IB event processing onto a workqueue
Because mlx5_ib_event can be called from atomic context move event
handling onto a workqueue. A mutex lock is required to get the IB device
for slave ports, so move event processing onto a work queue. When an IB
event is received, check if the mlx5_core_dev  is a slave port, if so
attempt to get the IB device it's affiliated with. If found process the
event for that device, otherwise return.

Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-08 11:42:22 -07:00
Daniel Jurgens
32f69e4be2 {net, IB}/mlx5: Manage port association for multiport RoCE
When mlx5_ib_add is called determine if the mlx5 core device being
added is capable of dual port RoCE operation. If it is, determine
whether it is a master device or a slave device using the
num_vhca_ports and affiliate_nic_vport_criteria capabilities.

If the device is a slave, attempt to find a master device to affiliate it
with. Devices that can be affiliated will share a system image guid. If
none are found place it on a list of unaffiliated ports. If a master is
found bind the port to it by configuring the port affiliation in the NIC
vport context.

Similarly when mlx5_ib_remove is called determine the port type. If it's
a slave port, unaffiliate it from the master device, otherwise just
remove it from the unaffiliated port list.

The IB device is registered as a multiport device, even if a 2nd port is
not available for affiliation. When the 2nd port is affiliated later the
GID cache must be refreshed in order to get the default GIDs for the 2nd
port in the cache. Export roce_rescan_device to provide a mechanism to
refresh the cache after a new port is bound.

In a multiport configuration all IB object (QP, MR, PD, etc) related
commands should flow through the master mlx5_core_dev, other commands
must be sent to the slave port mlx5_core_mdev, an interface is provide
to get the correct mdev for non IB object commands.

Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-08 11:42:22 -07:00
Daniel Jurgens
7fd8aefb7c IB/mlx5: Make netdev notifications multiport capable
When multiple RoCE ports are supported registration for events on
multiple netdevs is required. Refactor the event registration and
handling to support multiple ports.

Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-08 11:42:21 -07:00
Daniel Jurgens
508562d6f7 IB/mlx5: Reduce the use of num_port capability
Remove use of the num_ports general capability throughout. The number of
ports will be variable in the future, and reported in a different way.

Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-08 11:42:21 -07:00
Moni Shoua
776a3906b6 IB/mlx5: Add support for DC target QP
A DC Target (DCT) QP is represented in the hardware as a unique object.
This object is created by CREATE_DCT command and destroyed by DESTROY_DCT
command. However, in the driver we describe it as a QP.

The hardware command that creates a DCT needs parameters that the verb
create_qp() does not provide. Those remaining parameters are provided
with the call to the verb modify_qp(). Therefore we delay the actual
creation of a DCT in the hardware until the stage of modify_qp() to RTR.

A support for query_qp() was added as well. It uses QUERY_DCT command to
retrieve the applicable fields.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-08 11:38:51 -07:00
Moni Shoua
c32a4f296e IB/mlx5: Add support for DC Initiator QP
DC Initiator (DCI) QP is represented like any other QP in the hardware.
However, like any other transport QP there are attributes and settings
that are special to DCI QP and needs specific attention and care.
Make necessary changes to configure DCI QP.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-08 11:38:50 -07:00
Moni Shoua
b4aaa1f0b4 IB/mlx5: Handle type IB_QPT_DRIVER when creating a QP
The QP type IB_QPT_DRIVER doesn't describe the transport or the service
that the QP provides but those are known only to the hardware driver.
The actual type of the QP is stored in the hardware driver context (i.e.
mlx5_qp) under the field qp_sub_type.

Take the real QP type and any extra data that is required to create the QP
from the driver channel and modify the QP initial attributes before continuing
with create_qp().

Downstream patches from this series will add support for both DCI and
DCT driver QPs.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-08 11:38:50 -07:00
Mark Bloch
3cc297db97 IB/mlx5: Move locks initialization to the corresponding stage
Unconditional locks/list and ODP srcu initialization should be done in
the INIT stage. Remove those from the CAPS stage and move them to the
proper stage.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-03 17:26:59 -07:00
Mark Bloch
c8b8992446 IB/mlx5: Move loopback initialization to the corresponding stage
The loopback stage only initializes a lock, move it to be in
the CAPS initialization phase and get rid loopback step completely.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-03 17:26:59 -07:00
Mark Bloch
5e1e761251 IB/mlx5: Move hardware counters initialization to the corresponding stage
Now that we have a stage just for hardware counters, move all relevant
initialization logic into one place.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-03 17:26:59 -07:00
Mark Bloch
07321b3c67 IB/mlx5: Move ODP initialization to the corresponding stage
Now that we have a stage just for ODP, move all relevant
initialization logic into one place.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-03 17:26:59 -07:00
Mark Bloch
c11a226a1e IB/mlx5: Move RoCE/ETH initialization to the corresponding stage
Now that we have a stage just for RoCE/ETH, move all relevant
initialization logic into one place.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-03 17:26:59 -07:00
Mark Bloch
16c1975f10 IB/mlx5: Create profile infrastructure to add and remove stages
Today we have single function which is used when we add an IB interface,
break this function into multiple functions.

Create stages and a generic mechanism to execute each stage.
This is in preparation for RDMA/IB representors which might not need
all stages or will do things differently in some of the stages.

This patch doesn't change any functionality.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-03 17:26:58 -07:00
Yishai Hadas
1ee47ab3e8 IB/mlx5: Enable QP creation with a given blue flame index
This patch enables QP creation with a given BF index, this allows the
user space driver to share same BF between few QPs or alternatively have
a dedicated BF per QP.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-28 11:37:46 -07:00
Yishai Hadas
4ed131d0bb IB/mlx5: Expose dynamic mmap allocation
This patch exposes the option to dynamic allocates a UAR, this
functionality will be used in downstream patch in this series as
part of QP creation.

Specifically, the user space driver asks for a UAR allocation in a given
page index, upon success this UAR and its bfregs can be used as part of
QP creation by the user space driver.

To enable allocating more than 256 UARs the page index is encoded in an
extra one byte just after the command byte.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-28 11:37:46 -07:00
Yishai Hadas
31a78a5a79 IB/mlx5: Extend UAR stuff to support dynamic allocation
This patch extends the alloc context flow to be prepared for working
with dynamic UAR allocations.

Currently upon alloc context there is some fix size of UARs that are
allocated (named 'static allocation') and there is no option to user
application to ask for more or control which UAR will be used by which
QP.

In this patch the driver prepares its data structures to manage both the
static and the dynamic allocations and let the user driver knows about
the max value of dynamic blue-flame registers that are allowed.

Downstream patches from this series will enable the dynamic allocation
and the association as part of QP creation.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-28 11:33:14 -07:00
Maor Gottlieb
4e2b53a5cb IB/mlx5: Report inner RSS capability
Add missing inner RSS support capability as part of
the RSS supported fields.

In addition change MLX5_RX_HASH_INNER to 1UL << 31 in
order to define it as unsigned.

Fixes: 309fa3470f ("IB/mlx5: Add support for RSS on the inner packet")
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-28 11:33:14 -07:00
Jason Gunthorpe
76a895d9e1 Merge branch 'from-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git
Patches for 4.16 that are dependent on patches sent to 4.15-rc.

These are small clean ups for the vmw_pvrdma and i40iw drivers.

* 'from-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git:
  RDMA/vmw_pvrdma: Remove usage of BIT() from UAPI header
  RDMA/vmw_pvrdma: Use refcount_t instead of atomic_t
  RDMA/vmw_pvrdma: Use more specific sizeof in kcalloc
  RDMA/vmw_pvrdma: Clarify QP and CQ is_kernel logic
  RDMA/vmw_pvrdma: Add UAR SRQ macros in ABI header file
  i40iw: Change accelerated flag to bool
2017-12-27 21:50:46 -07:00
Nitzan Carmi
45e6ae7ef2 IB/mlx5: Fix mlx5_ib_alloc_mr error flow
ibmr.device is being set only after ib_alloc_mr() is
(successfully) complete. Therefore, in case mlx5_core_create_mkey()
return with error, the error flow calls mlx5_free_priv_descs()
which uses ibmr.device (which doesn't exist yet), causing
a NULL dereference oops.

To fix this, the IB device should be set in the mr struct earlier
stage (e.g. prior to calling mlx5_core_create_mkey()).

Fixes: 8a187ee52b ("IB/mlx5: Support the new memory registration API")
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Nitzan Carmi <nitzanc@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-27 15:24:41 -07:00
Majd Dibbiny
ad9a3668a4 IB/mlx5: Serialize access to the VMA list
User-space applications can do mmap and munmap directly at
any time.

Since the VMA list is not protected with a mutex, concurrent
accesses to the VMA list from the mmap and munmap can cause
data corruption. Add a mutex around the list.

Cc: <stable@vger.kernel.org> # v4.7
Fixes: 7c2344c3bb ("IB/mlx5: Implements disassociate_ucontext API")
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-27 15:24:40 -07:00
Pravin Shedge
72c7fe90ee drivers: infiniband: remove duplicate includes
These duplicate includes have been found with scripts/checkincludes.pl but
they have been removed manually to avoid removing false positives.

Signed-off-by: Pravin Shedge <pravin.shedge4linux@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-22 09:39:35 -07:00
Majd Dibbiny
71a0ff65a2 IB/mlx5: Fix congestion counters in LAG mode
Congestion counters are counted and queried per physical function.
When working in LAG mode, CNP packets can be sent or received on both
of the functions, thus congestion counters should be aggregated from
the two physical functions.

Fixes: e1f24a79f4 ("IB/mlx5: Support congestion related counters")
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-21 16:06:07 -07:00
Arnd Bergmann
1b19b95169 IB/mlx5: revisit -Wmaybe-uninitialized warning
A warning that I thought I had fixed before occasionally comes
back in rare randconfig builds (I found 7 instances in the last
100000 builds, originally it was much more frequent):

drivers/infiniband/hw/mlx5/mr.c: In function 'mlx5_ib_reg_user_mr':
drivers/infiniband/hw/mlx5/mr.c:1229:5: error: 'order' may be used uninitialized in this function [-Werror=maybe-uninitialized]
  if (order <= mr_cache_max_order(dev)) {
     ^
drivers/infiniband/hw/mlx5/mr.c:1247:8: error: 'ncont' may be used uninitialized in this function [-Werror=maybe-uninitialized]
drivers/infiniband/hw/mlx5/mr.c:1247:8: error: 'page_shift' may be used uninitialized in this function [-Werror=maybe-uninitialized]
drivers/infiniband/hw/mlx5/mr.c:1260:2: error: 'npages' may be used uninitialized in this function [-Werror=maybe-uninitialized]

I've looked at all those findings again and noticed that they are all
with CONFIG_INFINIBAND_USER_MEM=n, which means ib_umem_get() returns
an error unconditionally and we never initialize or use those variables.
This triggers a condition in gcc iff mr_umem_get() is partially but not
entirely inlined, which in turn depends on the exact combination of
optimization settings. This is a known problem with gcc, with no
easy solution in the compiler, so this adds another workaround that
should be more reliable than my previous attempt.

Returning an error from mlx5_ib_reg_user_mr() earlier means that we
can completely bypass the logic that caused the warning, the compiler
can now see that the variable is never accessed.

Fixes: 14ab8896f5 ("IB/mlx5: avoid bogus -Wmaybe-uninitialized warning")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-11 16:19:44 -07:00
Kees Cook
e99e88a9d2 treewide: setup_timer() -> timer_setup()
This converts all remaining cases of the old setup_timer() API into using
timer_setup(), where the callback argument is the structure already
holding the struct timer_list. These should have no behavioral changes,
since they just change which pointer is passed into the callback with
the same available pointers after conversion. It handles the following
examples, in addition to some other variations.

Casting from unsigned long:

    void my_callback(unsigned long data)
    {
        struct something *ptr = (struct something *)data;
    ...
    }
    ...
    setup_timer(&ptr->my_timer, my_callback, ptr);

and forced object casts:

    void my_callback(struct something *ptr)
    {
    ...
    }
    ...
    setup_timer(&ptr->my_timer, my_callback, (unsigned long)ptr);

become:

    void my_callback(struct timer_list *t)
    {
        struct something *ptr = from_timer(ptr, t, my_timer);
    ...
    }
    ...
    timer_setup(&ptr->my_timer, my_callback, 0);

Direct function assignments:

    void my_callback(unsigned long data)
    {
        struct something *ptr = (struct something *)data;
    ...
    }
    ...
    ptr->my_timer.function = my_callback;

have a temporary cast added, along with converting the args:

    void my_callback(struct timer_list *t)
    {
        struct something *ptr = from_timer(ptr, t, my_timer);
    ...
    }
    ...
    ptr->my_timer.function = (TIMER_FUNC_TYPE)my_callback;

And finally, callbacks without a data assignment:

    void my_callback(unsigned long data)
    {
    ...
    }
    ...
    setup_timer(&ptr->my_timer, my_callback, 0);

have their argument renamed to verify they're unused during conversion:

    void my_callback(struct timer_list *unused)
    {
    ...
    }
    ...
    timer_setup(&ptr->my_timer, my_callback, 0);

The conversion is done with the following Coccinelle script:

spatch --very-quiet --all-includes --include-headers \
	-I ./arch/x86/include -I ./arch/x86/include/generated \
	-I ./include -I ./arch/x86/include/uapi \
	-I ./arch/x86/include/generated/uapi -I ./include/uapi \
	-I ./include/generated/uapi --include ./include/linux/kconfig.h \
	--dir . \
	--cocci-file ~/src/data/timer_setup.cocci

@fix_address_of@
expression e;
@@

 setup_timer(
-&(e)
+&e
 , ...)

// Update any raw setup_timer() usages that have a NULL callback, but
// would otherwise match change_timer_function_usage, since the latter
// will update all function assignments done in the face of a NULL
// function initialization in setup_timer().
@change_timer_function_usage_NULL@
expression _E;
identifier _timer;
type _cast_data;
@@

(
-setup_timer(&_E->_timer, NULL, _E);
+timer_setup(&_E->_timer, NULL, 0);
|
-setup_timer(&_E->_timer, NULL, (_cast_data)_E);
+timer_setup(&_E->_timer, NULL, 0);
|
-setup_timer(&_E._timer, NULL, &_E);
+timer_setup(&_E._timer, NULL, 0);
|
-setup_timer(&_E._timer, NULL, (_cast_data)&_E);
+timer_setup(&_E._timer, NULL, 0);
)

@change_timer_function_usage@
expression _E;
identifier _timer;
struct timer_list _stl;
identifier _callback;
type _cast_func, _cast_data;
@@

(
-setup_timer(&_E->_timer, _callback, _E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, &_callback, _E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, _callback, (_cast_data)_E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, &_callback, (_cast_data)_E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, (_cast_func)_callback, _E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, (_cast_func)&_callback, _E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, (_cast_func)_callback, (_cast_data)_E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, (_cast_func)&_callback, (_cast_data)_E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E._timer, _callback, (_cast_data)_E);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, _callback, (_cast_data)&_E);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, &_callback, (_cast_data)_E);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, &_callback, (_cast_data)&_E);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)_E);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)&_E);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)_E);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)&_E);
+timer_setup(&_E._timer, _callback, 0);
|
 _E->_timer@_stl.function = _callback;
|
 _E->_timer@_stl.function = &_callback;
|
 _E->_timer@_stl.function = (_cast_func)_callback;
|
 _E->_timer@_stl.function = (_cast_func)&_callback;
|
 _E._timer@_stl.function = _callback;
|
 _E._timer@_stl.function = &_callback;
|
 _E._timer@_stl.function = (_cast_func)_callback;
|
 _E._timer@_stl.function = (_cast_func)&_callback;
)

// callback(unsigned long arg)
@change_callback_handle_cast
 depends on change_timer_function_usage@
identifier change_timer_function_usage._callback;
identifier change_timer_function_usage._timer;
type _origtype;
identifier _origarg;
type _handletype;
identifier _handle;
@@

 void _callback(
-_origtype _origarg
+struct timer_list *t
 )
 {
(
	... when != _origarg
	_handletype *_handle =
-(_handletype *)_origarg;
+from_timer(_handle, t, _timer);
	... when != _origarg
|
	... when != _origarg
	_handletype *_handle =
-(void *)_origarg;
+from_timer(_handle, t, _timer);
	... when != _origarg
|
	... when != _origarg
	_handletype *_handle;
	... when != _handle
	_handle =
-(_handletype *)_origarg;
+from_timer(_handle, t, _timer);
	... when != _origarg
|
	... when != _origarg
	_handletype *_handle;
	... when != _handle
	_handle =
-(void *)_origarg;
+from_timer(_handle, t, _timer);
	... when != _origarg
)
 }

// callback(unsigned long arg) without existing variable
@change_callback_handle_cast_no_arg
 depends on change_timer_function_usage &&
                     !change_callback_handle_cast@
identifier change_timer_function_usage._callback;
identifier change_timer_function_usage._timer;
type _origtype;
identifier _origarg;
type _handletype;
@@

 void _callback(
-_origtype _origarg
+struct timer_list *t
 )
 {
+	_handletype *_origarg = from_timer(_origarg, t, _timer);
+
	... when != _origarg
-	(_handletype *)_origarg
+	_origarg
	... when != _origarg
 }

// Avoid already converted callbacks.
@match_callback_converted
 depends on change_timer_function_usage &&
            !change_callback_handle_cast &&
	    !change_callback_handle_cast_no_arg@
identifier change_timer_function_usage._callback;
identifier t;
@@

 void _callback(struct timer_list *t)
 { ... }

// callback(struct something *handle)
@change_callback_handle_arg
 depends on change_timer_function_usage &&
	    !match_callback_converted &&
            !change_callback_handle_cast &&
            !change_callback_handle_cast_no_arg@
identifier change_timer_function_usage._callback;
identifier change_timer_function_usage._timer;
type _handletype;
identifier _handle;
@@

 void _callback(
-_handletype *_handle
+struct timer_list *t
 )
 {
+	_handletype *_handle = from_timer(_handle, t, _timer);
	...
 }

// If change_callback_handle_arg ran on an empty function, remove
// the added handler.
@unchange_callback_handle_arg
 depends on change_timer_function_usage &&
	    change_callback_handle_arg@
identifier change_timer_function_usage._callback;
identifier change_timer_function_usage._timer;
type _handletype;
identifier _handle;
identifier t;
@@

 void _callback(struct timer_list *t)
 {
-	_handletype *_handle = from_timer(_handle, t, _timer);
 }

// We only want to refactor the setup_timer() data argument if we've found
// the matching callback. This undoes changes in change_timer_function_usage.
@unchange_timer_function_usage
 depends on change_timer_function_usage &&
            !change_callback_handle_cast &&
            !change_callback_handle_cast_no_arg &&
	    !change_callback_handle_arg@
expression change_timer_function_usage._E;
identifier change_timer_function_usage._timer;
identifier change_timer_function_usage._callback;
type change_timer_function_usage._cast_data;
@@

(
-timer_setup(&_E->_timer, _callback, 0);
+setup_timer(&_E->_timer, _callback, (_cast_data)_E);
|
-timer_setup(&_E._timer, _callback, 0);
+setup_timer(&_E._timer, _callback, (_cast_data)&_E);
)

// If we fixed a callback from a .function assignment, fix the
// assignment cast now.
@change_timer_function_assignment
 depends on change_timer_function_usage &&
            (change_callback_handle_cast ||
             change_callback_handle_cast_no_arg ||
             change_callback_handle_arg)@
expression change_timer_function_usage._E;
identifier change_timer_function_usage._timer;
identifier change_timer_function_usage._callback;
type _cast_func;
typedef TIMER_FUNC_TYPE;
@@

(
 _E->_timer.function =
-_callback
+(TIMER_FUNC_TYPE)_callback
 ;
|
 _E->_timer.function =
-&_callback
+(TIMER_FUNC_TYPE)_callback
 ;
|
 _E->_timer.function =
-(_cast_func)_callback;
+(TIMER_FUNC_TYPE)_callback
 ;
|
 _E->_timer.function =
-(_cast_func)&_callback
+(TIMER_FUNC_TYPE)_callback
 ;
|
 _E._timer.function =
-_callback
+(TIMER_FUNC_TYPE)_callback
 ;
|
 _E._timer.function =
-&_callback;
+(TIMER_FUNC_TYPE)_callback
 ;
|
 _E._timer.function =
-(_cast_func)_callback
+(TIMER_FUNC_TYPE)_callback
 ;
|
 _E._timer.function =
-(_cast_func)&_callback
+(TIMER_FUNC_TYPE)_callback
 ;
)

// Sometimes timer functions are called directly. Replace matched args.
@change_timer_function_calls
 depends on change_timer_function_usage &&
            (change_callback_handle_cast ||
             change_callback_handle_cast_no_arg ||
             change_callback_handle_arg)@
expression _E;
identifier change_timer_function_usage._timer;
identifier change_timer_function_usage._callback;
type _cast_data;
@@

 _callback(
(
-(_cast_data)_E
+&_E->_timer
|
-(_cast_data)&_E
+&_E._timer
|
-_E
+&_E->_timer
)
 )

// If a timer has been configured without a data argument, it can be
// converted without regard to the callback argument, since it is unused.
@match_timer_function_unused_data@
expression _E;
identifier _timer;
identifier _callback;
@@

(
-setup_timer(&_E->_timer, _callback, 0);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, _callback, 0L);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, _callback, 0UL);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E._timer, _callback, 0);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, _callback, 0L);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, _callback, 0UL);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_timer, _callback, 0);
+timer_setup(&_timer, _callback, 0);
|
-setup_timer(&_timer, _callback, 0L);
+timer_setup(&_timer, _callback, 0);
|
-setup_timer(&_timer, _callback, 0UL);
+timer_setup(&_timer, _callback, 0);
|
-setup_timer(_timer, _callback, 0);
+timer_setup(_timer, _callback, 0);
|
-setup_timer(_timer, _callback, 0L);
+timer_setup(_timer, _callback, 0);
|
-setup_timer(_timer, _callback, 0UL);
+timer_setup(_timer, _callback, 0);
)

@change_callback_unused_data
 depends on match_timer_function_unused_data@
identifier match_timer_function_unused_data._callback;
type _origtype;
identifier _origarg;
@@

 void _callback(
-_origtype _origarg
+struct timer_list *unused
 )
 {
	... when != _origarg
 }

Signed-off-by: Kees Cook <keescook@chromium.org>
2017-11-21 15:57:07 -08:00
Yonatan Cohen
87ab3f524e IB/mlx5: Add CQ moderation capability to query_device
query_device can now obtain the maximum values for
cq_max_count and cq_period, needed for cq moderation.

Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-13 16:59:22 -05:00
Yonatan Cohen
b0e9df6da2 IB/mlx5: Exposing modify CQ callback to uverbs layer
Exposed mlx5_ib_modify_cq to be called from ib device
verb list.

Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-13 16:59:22 -05:00
Majd Dibbiny
2b621851ac IB/mlx5: Fix RoCE Address Path fields
When working over a RoCE network, the UDP source port should be set only
for statically connected QPs (RC, UC and XRC).

Fixes: 2811ba51b0 ("IB/mlx5: Add RoCE fields to Address Vector")
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-13 13:53:22 -05:00
Majd Dibbiny
31fde034a8 IB/mlx5: Assign send CQ and recv CQ of UMR QP
The UMR's QP is created by calling mlx5_ib_create_qp directly, and
therefore the send CQ and the recv CQ on the ibqp weren't assigned.

Assign them right after calling the mlx5_ib_create_qp to assure
that any access to those pointers will work as expected and won't
crash the system as might happen as part of reset flow.

Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-13 13:53:22 -05:00
Noa Osherovich
b1383aa641 IB/mlx5: Add PCI write end padding support
Add the PCI write end padding flag to device_cap_flags enum and set it
during mlx5_ib_query_device so it will be reported to user-space.

During WQ/QP creation, set that capability for WQ/QP if user requested
it and HW supports it.

PCI write end padding modification is not supported for now. There's no
such flag for a QP but for a WQ, create and modify use the same flag.
Return an error if PCI write end padding flag is set during modify_wq.

Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-10 13:50:27 -05:00
Maor Gottlieb
309fa3470f IB/mlx5: Add support for RSS on the inner packet
Some user space application would like to do RSS on the inner
packet fields instead on the outer.
When MLX5_RX_HASH_INNER is set with one or more of the other
hash fields, then the RSS will be done using the inner packet.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-25 14:19:32 -04:00
Maor Gottlieb
f95ef6cbae IB/mlx5: Add tunneling offloads support
The device can support receive Stateless Offloads for the inner
packet's fields only when the packet is processed by TIR which is
enabled to support tunneling. Otherwise, the device treats the
packet as an ordinary non-tunneling packet and receive offloads
can be done only for the outer packet's field.
In order to enable receive Stateless Offloading support for incoming
tunneling traffic the TIR should be created with tunneled_offload_en.
Tunneling offloads is supported only be raw ethernet QP.

This patch includes:
* New QP creation flag for tunneling offloads.
* Reports device capabilities.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-25 14:19:31 -04:00
Guy Levi
7a0c8f4244 IB/mlx5: Support padded 128B CQE feature
In some benchmarks and some CPU architectures, writing the CQE on a full
cache line size improves performance by saving memory access operations
(read-modify-write) relative to partial cache line change. This patch
lets the user to configure the device to pad the CQE up to 128B in case
its content is less than 128B. Currently the driver supports only padding
for a CQE size of 128B.

Signed-off-by: Guy Levi <guyle@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-25 14:17:06 -04:00
Guy Levi
de57f2ad06 IB/mlx5: Support 128B CQE compression feature
In commit 1cbe6fc86c ("IB/mlx5: Add support for CQE compressing") the
concept of CQE compression was introduced and added a support for 64B
CQE size. This change update the code to support 128B CQE size as well.

Signed-off-by: Guy Levi <guyle@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-25 14:17:06 -04:00
Noa Osherovich
ccc8708790 IB/mlx5: Allow creation of a multi-packet RQ
Allow creation of a multi-packet receive queue.

In order to create a multi-packet RQ, the following fields in
the mlx5_ib_rwq should be set:
- log_num_strides: Log of number of strides per WQE
- single_stride_log_num_of_bytes: Log of a single stride size
- two_byte_shift_en: When enabled, hardware pads 2 bytes of zeros
  before writing the message to memory (e.g. for the IP alignment).

Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-25 14:03:44 -04:00
Noa Osherovich
b4f34597a5 IB/mlx5: Expose multi-packet RQ capabilities
This patch reports the device's striding RQ capabilities to
the user-space:
- min/max_single_stride_log_num_of_bytes: Log of min/max number of
  bytes in a single stride.
- min/max_single_wqe_log_num_of_strides: Log of min/max number of
  strides in a single WQE.
- supported_qpts: A bit mask to know which QP types support multi-
  packet RQ, for now only Raw Packet QPs.

Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-25 14:03:44 -04:00
Doug Ledford
754137a769 Merge branch 'for-next-early' into for-next
The early for-next branch was based on v4.14-rc2, while the shared pull
request I got from Mellanox used a v4.14-rc4 base.  I'm making the
branch that was the shared Mellanox pull request the new for-next branch
and merging the early for-next branch into it.

Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-18 13:07:13 -04:00
Jérémy Lefaure
e980b44134 IB/mlx5: Use ARRAY_SIZE
Using the ARRAY_SIZE macro improves the readability of the code.

Found with Coccinelle with the following semantic patch:
@r depends on (org || report)@
type T;
T[] E;
position p;
@@
(
 (sizeof(E)@p /sizeof(*E))
|
 (sizeof(E)@p /sizeof(E[...]))
|
 (sizeof(E)@p /sizeof(T))
)

Signed-off-by: Jérémy Lefaure <jeremy.lefaure@lse.epita.fr>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-18 12:16:14 -04:00
Parav Pandit
c0348eb069 IB: Let ib_core resolve destination mac address
Since IB/core resolves the destination mac address for user and kernel
consumers, avoid resolving in multiple provider drivers.

Only ib_core resolves DMAC now, therefore resolve_eth_dmac is removed as
exported symbol.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-18 12:10:36 -04:00
Bart Van Assche
3d1f236dcc IB/mlx5: Remove a set-but-not-used variable
References: commit 5fe9dec0d0 ("IB/mlx5: Use blue flame register allocator in mlx5_ib")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-14 20:47:06 -04:00
Bart Van Assche
f6b1ee349d IB/mlx5: Suppress gcc 7 fall-through complaints
Avoid that gcc 7 reports the following warning when building with W=1:

warning: this statement may fall through [-Wimplicit-fallthrough=]

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-14 20:47:06 -04:00
Parav Pandit
e19cd282eb IB/mlx5: Fix label order in error path handling
When UAR get_page fails, it needs to continue to cleanup debugfs for
congestion control parameters. Labels for error path were incorrectly
ordered.

This patch fixes to do correct cleanup on debugfs init failure and uar
get page failure.

Fixes: 4a2da0b8c0 ("IB/mlx5: Add debug control parameters for congestion control")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-04 14:59:15 -04:00
Arvind Yadav
d23a8bafa9 IB/mlx5:: pr_err() and mlx5_ib_dbg() strings should end with newlines
pr_err() and mlx5_ib_dbg( messages should terminated with a new-line to
avoid other messages being concatenated.

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-27 09:17:52 -04:00
Ilya Lesokhin
fbcd49838d IB/mlx5: Fix NULL deference on mlx5_ib_update_xlt failure
mlx5_ib_reg_user_mr called mlx5_ib_dereg_mr in case of MR population
failure. This resulted in a NULL dereference as ibmr->device wasn't
initialized yet.

We address this by adding an internal dereg_mr function that can handle
partially initialized MRs, and fixing clean_mr to work on partially
initialized MRs.

Fixes: ff740aefec ("IB/mlx5: Decouple MR allocation and population flows")
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-25 11:47:24 -04:00
Ilya Lesokhin
d67bc5d4e3 IB/mlx5: Simplify mlx5_ib_cont_pages
The patch simplifies mlx5_ib_cont_pages and fixes the following
issues in the original implementation:

First issues is related to alignment of the PFNs. After the check
base + p != PFN, the alignment of the PFN wasn't checked. So the PFN
sequence 0, 1, 1, 2 would result in a page_shift of 13 even though
the 3rd PFN is not 8KB aligned.

This wasn't actually a bug because it was supported by all the
existing mlx5 compatible device, but we don't want to require
this support in all future devices.

Another issue is because the inner loop didn't advance PFN so
the test "if (base + p != pfn)" always failed for SGE with
len > (1<<page_shift).

Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-25 11:47:24 -04:00
Leon Romanovsky
78b1beb099 IB/core: Fix typo in the name of the tag-matching cap struct
The tag matching functionality is implemented by mlx5 driver
by extending XRQ, however this internal kernel information was
exposed to user space applications with *xrq* name instead of *tm*.

This patch renames *xrq* to *tm* to handle that.

Fixes: 8d50505ada ("IB/uverbs: Expose XRQ capabilities")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-25 11:47:23 -04:00
Sudip Mukherjee
cbafad87e1 IB/mlx5: fix debugfs cleanup
If delay_drop_debugfs_init() fails in any of the operations to create
debugfs, it is calling delay_drop_debugfs_cleanup() as part of its
cleanup. But delay_drop_debugfs_cleanup() checks for 'dbg' and since
we have not yet pointed 'dbg' to the debugfs we need to cleanup, the
cleanup fails and we are left with stray debugfs elements and also a
memory leak.

Fixes: 4a5fd5d296 ("IB/mlx5: Add necessary delay drop assignment")
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-22 13:17:32 -04:00
Artemy Kovalyov
3fd3307ef3 IB/mlx5: Support IB_SRQT_TM
Pass to mlx5_core flag to enable rendezvous offload, list_size and CQ
when SRQ created with IB_SRQT_TM.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Yossi Itigin <yosefe@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-29 08:30:20 -04:00
Artemy Kovalyov
eb76189435 IB/mlx5: Fill XRQ capabilities
Provide driver specific values for XRQ capabilities.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Yossi Itigin <yosefe@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-29 08:30:19 -04:00
Artemy Kovalyov
1a56ff6daa IB/core: Separate CQ handle in SRQ context
Before this change CQ attached to SRQ was part of XRC specific extension.
Moving CQ handle out makes it available to other types extending SRQ
functionality.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Yossi Itigin <yosefe@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-29 08:30:16 -04:00
Bodong Wang
050da902ad IB/mlx5: Report mlx5 enhanced multi packet WQE capability
Expose enhanced multi packet WQE capability to user space through
query_device by uhw.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24 17:47:35 -04:00
Bodong Wang
795b609c8b IB/mlx5: Allow posting multi packet send WQEs if hardware supports
Set the field to allow posting multi packet send WQEs if hardware
supports this feature. This doesn't mean the send WQEs will be for
multi packet unless the send WQE was prepared according to multi
packet send WQE format.

User space shall use flag MLX5_IB_ALLOW_MPW to check if hardware
supports MPW and allows MPW in SQ context.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24 17:47:35 -04:00
Yishai Hadas
a550ddfc54 IB/mlx5: Add support for multi underlay QP
Set underlay QPN as part of flow rule when it's applicable.

There is one root flow table in the NIC RX namespace and all the
underlay QPs steer the traffic to this flow table.
In order to prevent QP to get traffic which is not target to its
underlay QP, we need to set the underlay QP number as part of
the steering matching.

Note:
When multicast traffic is sent the QPN filtering is done by the firmware
as some early step. Adding the QPN match on the flow table entry is
wrong as by that time the target QPN holds the multicast address (e.g.
FF(s)) and it won't match.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24 17:47:34 -04:00
Ilya Lesokhin
7b4cdaae73 IB/mlx5: Fix integer overflow when page_shift == 31
Fix a bug where MR registration fails when mlx5_ib_cont_pages
indicates that the MR can be mapped using 2GB pages (page_shift == 31).

Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24 17:47:34 -04:00
Kamal Heib
5942d8ae41 IB/mlx5: Fix memory leak in clean_mr error path
In clean_mr error path the 'mr' should be freed.

Fixes: e126ba97db ('mlx5: Add driver for Mellanox Connect-IB adapters')
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24 17:47:34 -04:00
Ilya Lesokhin
ff740aefec IB/mlx5: Decouple MR allocation and population flows
mlx5 compatible devices have two ways of populating the MTT
table of an MKEY: using a FW command and using a UMR WQE.

A UMR is much faster, so it should be used whenever possible.
Unfortunately the code today uses UMR only if the MKEY was allocated
from the MR cache.

Fix the code to use UMR even for MKEYs that were allocated using
a FW command.

Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24 17:47:34 -04:00
Ilya Lesokhin
8b7ff7f3b3 IB/mlx5: Enable UMR for MRs created with reg_create
This patch is the first step in decoupling UMR usage and
allocation from the MR cache. The only functional change
in this patch is to enables UMR for MRs created with
reg_create.

This change fixes a bug where ODP memory regions that
were not allocated from the MR cache did not have UMR
enabled.

Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24 17:47:34 -04:00
Noa Osherovich
96dc3fc5f1 IB/mlx5: Expose software parsing for Raw Ethernet QP
Software parsing (SWP) is a feature that can be used to instruct the
device to stop using its internal parser and to parse packets on the
transmit path according to offsets set for each packets.

Through this feature, the device allows the handling of checksum and
LSO by the hardware according to the location of IP and TCP/UDP
headers.

Enable SW parsing on Raw Ethernet send queue by default if firmware
supports it and report these capabilities to user space.

Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24 17:47:34 -04:00
Leon Romanovsky
84305d71d6 RDMA/mlx5: Limit scope of get vector affinity local function
The mlx5_ib_get_vector_affinity() call is local to main.c file and there
is no need to be declared globally visible.

Fixes: 40b24403f3 ("mlx5: support ->get_vector_affinity")
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24 16:44:48 -04:00
Maor Gottlieb
4a5fd5d296 IB/mlx5: Add necessary delay drop assignment
Assign the statistics and configuration structure pointer on success.

Fixes: fe248c3a58 ('IB/mlx5: Add delay drop configuration and statistics')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24 16:27:11 -04:00
Talat Batheesh
4edf8d5caf IB/mlx5: Fix some spelling mistakes
Fix spelling mistakes in remarks
    "retrun"->"return"
    "Decalring"->"Declaring"

Signed-off-by: Talat Batheesh <talatb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24 16:27:11 -04:00
Doug Ledford
732912c738 Merge branch 'k.o/for-4.13-rc' into k.o/for-next
Pick up -rc fixes.

Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24 15:58:26 -04:00
Majd Dibbiny
ec2558796d IB/mlx5: Always return success for RoCE modify port
CM layer calls ib_modify_port() regardless of the link layer.

For the Ethernet ports, qkey violation and Port capabilities
are meaningless. Therefore, always return success for ib_modify_port
calls on the Ethernet ports.

Cc: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24 15:33:33 -04:00
Majd Dibbiny
1d31e9c09f IB/mlx5: Fix Raw Packet QP event handler assignment
In case we have SQ and RQ for Raw Packet QP, the SQ's event handler
wasn't assigned.

Fixing this by assigning event handler for each WQ after creation.

[ 1877.145243] Call Trace:
[ 1877.148644] <IRQ>
[ 1877.150580] [<ffffffffa07987c5>] ? mlx5_rsc_event+0x105/0x210 [mlx5_core]
[ 1877.159581] [<ffffffffa0795bd7>] ? mlx5_cq_event+0x57/0xd0 [mlx5_core]
[ 1877.167137] [<ffffffffa079208e>] mlx5_eq_int+0x53e/0x6c0 [mlx5_core]
[ 1877.174526] [<ffffffff8101a679>] ? sched_clock+0x9/0x10
[ 1877.180753] [<ffffffff810f717e>] handle_irq_event_percpu+0x3e/0x1e0
[ 1877.188014] [<ffffffff810f735d>] handle_irq_event+0x3d/0x60
[ 1877.194567] [<ffffffff810f9fe7>] handle_edge_irq+0x77/0x130
[ 1877.201129] [<ffffffff81014c3f>] handle_irq+0xbf/0x150
[ 1877.207244] [<ffffffff815ed78a>] ? atomic_notifier_call_chain+0x1a/0x20
[ 1877.214829] [<ffffffff815f434f>] do_IRQ+0x4f/0xf0
[ 1877.220498] [<ffffffff815e94ad>] common_interrupt+0x6d/0x6d
[ 1877.227025] <EOI>
[ 1877.228967] [<ffffffff814834e2>] ? cpuidle_enter_state+0x52/0xc0
[ 1877.236990] [<ffffffff81483615>] cpuidle_idle_call+0xc5/0x200
[ 1877.243676] [<ffffffff8101bc7e>] arch_cpu_idle+0xe/0x30
[ 1877.249831] [<ffffffff810b4725>] cpu_startup_entry+0xf5/0x290
[ 1877.256513] [<ffffffff815cfee1>] start_secondary+0x265/0x27b
[ 1877.263111] Code: Bad RIP value.
[ 1877.267296] RIP [< (null)>] (null)
[ 1877.273264] RSP <ffff88046fd63df8>
[ 1877.277531] CR2: 0000000000000000

Fixes: 19098df2da ("IB/mlx5: Refactor mlx5_ib_qp to accommodate other QP types")
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24 15:33:33 -04:00
Li Dongyang
b588300801 IB/mlx5: use kvmalloc_array for mlx5_ib_wq
We observed multiple times on our Lustre OSS servers that when
the system memory is fragmented, kmalloc() in create_kernel_qp()
could fail order 4/5 allocations while we still have many free pages.

Switch to kvmalloc_array() to allow the operation to contine.

Signed-off-by: Li Dongyang <dongyang.li@anu.edu.au>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22 16:48:23 -04:00
Amrani, Ram
e093111ddb IB/core: Fix input len in multiple user verbs
Most user verbs pass user data to the kernel with the inclusion of the
ib_uverbs_cmd_hdr structure. This is problematic because the vendor has
no ideas if the verb was called by a legacy verb or an extended verb.
Also, the incosistency between the verbs is confusing.

Fixes: 565197dd8f ("IB/core: Extend ib_uverbs_create_cq")
Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22 14:02:29 -04:00
Hiatt, Don
62ede77799 Add OPA extended LID support
This patch series primarily increases sizes of variables that hold
lid values from 16 to 32 bits. Additionally, it adds a check in
the IB mad stack to verify a properly formatted MAD when OPA
extended LIDs are used.

Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-18 14:47:37 -04:00
Doug Ledford
d0d62c34fb Merge branch 'rdma-netlink' into k.o/merge-test
Conflicts:
	include/rdma/ib_verbs.h - Modified a function signature adjacent
	to a newly added function signature from a previous merge

Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-10 14:34:18 -04:00
Doug Ledford
320438301b Merge branches '32bit_lid' and 'irq_affinity' into k.o/merge-test
Conflicts:
	drivers/infiniband/hw/mlx5/main.c - Both add new code
	include/rdma/ib_verbs.h - Both add new code

Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-10 14:31:29 -04:00
Leon Romanovsky
9abb0d1bbd RDMA: Simplify get firmware interface
There is a need to forward FW version to user space
application through RDMA netlink. In order to make it safe, there
is need to declare nla_policy and limit the size of FW string.

The new define IB_FW_VERSION_NAME_MAX will limit the size of
FW version string. That define was chosen to be equal to
ETHTOOL_FWVERS_LEN, because many drivers anyway are limited
by that value indirectly.

The introduction of this define allows us to remove the string size
from get_fw_str function signature.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2017-08-10 13:28:10 +03:00
Sagi Grimberg
40b24403f3 mlx5: support ->get_vector_affinity
Simply refer to the generic affinity mask helper.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-08 14:58:03 -04:00
Hiatt, Don
7db20ecd1d IB/core: Change wc.slid from 16 to 32 bits
slid field in struct ib_wc is increased to 32 bits.
This enables core components to use larger LIDs if needed.
The user ABI is unchanged and return 16 bit values when queried.

Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-08 14:50:25 -04:00
Leon Romanovsky
931b3c1a83 RDMA/mlx5: Fix existence check for extended address vector
The extended address vector is the highest bit in be32 variable,
but it was compared with the lowest. This patch fixes the endianness
of that check and removes already declared define.

Fixes: 17d2f88f92 ("IB/mlx5: Add ODP atomics support")
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-04 14:24:05 -04:00
Doug Ledford
a5f66725c7 Merge branch 'misc' into k.o/for-next 2017-07-27 09:00:38 -04:00
Colin Ian King
9c2d33d44d net/mlx5: fix spelling mistake: "alloated" -> "allocated"
Trivial fix to spelling mistake in mlx5_ib_dbg message

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 10:46:46 -04:00
Parav Pandit
58dcb60a22 IB/mlx5: Expose extended error counters
This patch adds below requester and responder side error counters,
which will be exposed by hardware counters interface and are supported
as part of query Q counters command extension.

 +---------------------------+-------------------------------------+
 |      Name                 |           Description               |
 |---------------------------+-------------------------------------|
 |resp_local_length_error    | Number of times responder detected  |
 |                           | local length errors                 |
 |---------------------------+-------------------------------------|
 |resp_cqe_error             | Number of CQEs completed with error |
 |                           | at responder                        |
 |---------------------------+-------------------------------------|
 |req_cqe_error              | Number of CQEs completed with error |
 |                           | at requester                        |
 |---------------------------+-------------------------------------|
 |req_remote_invalid_request | Number of times requester detected  |
 |                           | remote invalid request error        |
 |---------------------------+-------------------------------------|
 |req_remote_access_error    | Number of times requester detected  |
 |                           | remote access error                 |
 |---------------------------+-------------------------------------|
 |resp_remote_access_error   | Number of times responder detected  |
 |                           | remote access error                 |
 |---------------------------+-------------------------------------|
 |resp_cqe_flush_error       | Number of CQEs completed with       |
 |                           | flushed with error at responder     |
 |---------------------------+-------------------------------------|
 |req_cqe_flush_error        | Number of CQEs completed with       |
 |                           | flushed with error at requester     |
 +---------------------------+-------------------------------------+

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 10:41:01 -04:00
Leon Romanovsky
3fffc82ad6 IB/mlx5: Fix existence check for extended address vector
The extended address vector is the highest bit in be32 variable,
but it was compared with the lowest. This patch fixes the endianness
of that check and removes already declared define.

Fixes: 17d2f88f92 ("IB/mlx5: Add ODP atomics support")
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 10:41:01 -04:00
Majd Dibbiny
4c25b7a390 IB/mlx5: Fix cached MR allocation flow
When we have a miss in one order of the mkey cache, we try to get
an mkey from a higher order.

We still need to check that the higher order can be used with UMR
before using it. Otherwise, we will get an mkey with 0 entries and
the post send operation that is used to fill it will complete with
the following error:

mlx5_0:dump_cqe:275:(pid 0): dump error cqe
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 0f007806 25000025 49ce59d2

Fixes: 49780d42df ("IB/mlx5: Expose MR cache for mlx5_ib")
Cc: <stable@vger.kernel.org> # v4.10+
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 10:41:01 -04:00
Yishai Hadas
1d54f89094 IB/mlx5: Report RX checksum capabilities for IPoIB
Report RX checksum capabilities when 'ipoib_enhanced_offloads'
capabilities are set and support checksum.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 10:41:01 -04:00
Yishai Hadas
81e308804b IB/mlx5: Add multicast flow steering support for underlay QP
In order to add multicast flow steering support, there is need
to block the attaching of mcg flow for underlay QP, recognize
multicast IB_FLOW_SPEC_IPV4 based on its IP and enable
creating/destroying flow for IB layer.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 10:40:46 -04:00
Yishai Hadas
c2e53b2ce1 IB/mlx5: Add support for QP with a given source QPN
Allow user space applications to accelerate send and receive
traffic which is typically handled by IPoIB ULP by creating
a UD QP with a given source QPN of the IPoIB UD QP.

UD QP with a given source QPN should basically be similar to
RAW QP from point of view of its created resources.

However,
- Its TIS should point to the source QPN.
- Modify can be done only on its state as the transport attributes
  are managed by its source QP.

This patch manages below:
- Creating/destroying/modifying UD QP with a given source QPN.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 10:40:46 -04:00
Maor Gottlieb
fe248c3a58 IB/mlx5: Add delay drop configuration and statistics
Add debugfs interface for monitor the number of delay drop timeout
events and the number of existing dropless RQs in the system.

In addition add debugfs interface for configuring the global timeout value
which is used in the SET_DELAY_DROP command.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 10:40:22 -04:00
Maor Gottlieb
03404e8ae6 IB/mlx5: Add support to dropless RQ
RQs that were configured for "delay drop" will prevent packet drops
when their WQEs are depleted.
Marking an RQ to be drop-less is done by setting delay_drop_en in RQ
context using CREATE_RQ command.

Since this feature is globally activated/deactivated by using the
SET_DELAY_DROP command on all the marked RQs, we activated/deactivated
it according to the number of RQs with 'delay_drop' enabled.

When timeout is expired, then the feature is deactivated. Therefore
the driver handles the delay drop timeout event and reactivate it.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 10:39:53 -04:00
Bodong Wang
7ecf6d8ff1 IB/mlx5: Restore IB guid/policy for virtual functions
When a user sets port_guid, node_guid or policy of an IB virtual
function, save this information in "struct mlx5_vf_context".

This information will be restored later when pci_resume is called.
To make sure this works, one can use aer-inject to generate PCI
errors on mlx5 devices and verify if relevant fields are restored
after PCI resume.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 10:34:28 -04:00
Parav Pandit
4a2da0b8c0 IB/mlx5: Add debug control parameters for congestion control
This patch adds debug control parameters for congestion control which
can be read or written through debugfs. They are for reaction point and
notification point nodes.

These control parameters are as below:
 +------------------------------+-----------------------------------------+
 |      Name                    |           Description                   |
 |------------------------------+-----------------------------------------|
 |rp_clamp_tgt_rate             | When set target rate is updated to      |
 |                              | current rate                            |
 |------------------------------+-----------------------------------------|
 |rp_clamp_tgt_rate_ati         | When set update target rate based on    |
 |                              | timer as well                           |
 |------------------------------+-----------------------------------------|
 |rp_time_reset                 | time between rate increase if no        |
 |                              | CNP is received unit in usec            |
 |------------------------------+-----------------------------------------|
 |rp_byte_reset                 | Number of bytes between rate inease if  |
 |                              | no CNP is received                      |
 |------------------------------+-----------------------------------------|
 |rp_threshold                  | Threshold for reaction point rate       |
 |                              | control                                 |
 |------------------------------+-----------------------------------------|
 |rp_ai_rate                    | Rate for target rate, unit in Mbps      |
 |------------------------------+-----------------------------------------|
 |rp_hai_rate                   | Rate for hyper increase state           |
 |                              | unit in Mbps                            |
 |------------------------------+-----------------------------------------|
 |rp_min_dec_fac                | Minimum factor by which the current     |
 |                              | transmit rate can be changed when       |
 |                              | processing a CNP, unit is percerntage   |
 |------------------------------+-----------------------------------------|
 |rp_min_rate                   | Minimum value for rate limit,           |
 |                              | unit in Mbps                            |
 |------------------------------+-----------------------------------------|
 |rp_rate_to_set_on_first_cnp   | Rate that is set when first CNP is      |
 |                              | received, unit is Mbps                  |
 |------------------------------+-----------------------------------------|
 |rp_dce_tcp_g                  | Used to calculate alpha                 |
 |------------------------------+-----------------------------------------|
 |rp_dce_tcp_rtt                | Time between updates of alpha value,    |
 |                              | unit is usec                            |
 |------------------------------+-----------------------------------------|
 |rp_rate_reduce_monitor_period | Minimum time between consecutive rate   |
 |                              | reductions                              |
 |------------------------------+-----------------------------------------|
 |rp_initial_alpha_value        | Initial value of alpha                  |
 |------------------------------+-----------------------------------------|
 |rp_gd                         | When CNP is received, flow rate is      |
 |                              | reduced based on gd, rp_gd is given as  |
 |                              | log2(rp_gd)                             |
 |------------------------------+-----------------------------------------|
 |np_cnp_dscp                   | dscp code point for generated cnp       |
 |------------------------------+-----------------------------------------|
 |np_cnp_prio_mode              | 802.1p priority for generated cnp       |
 |------------------------------+-----------------------------------------|
 |np_cnp_prio                   | cnp priority mode                       |
 +------------------------------+-----------------------------------------+

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 10:34:28 -04:00
Moni Shoua
fd65f1b8ee IB/mlx5: Change logic for dispatching IB events for port state
The old logic ignored link state. This led to missing IB events like
when link goes down on the switch while admin state is up or to redundant
events like when admin state goes up while link is down.
To fix that, probe the port state on NETDEV events and compare to last
known state to decide if IB events needs to be dispatched.

FIxes: 5ec8c83e3a ("IB/mlx5: Port events in RoCE now rely on netdev events")
Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 10:29:18 -04:00
Huy Nguyen
c85023e153 IB/mlx5: Add raw ethernet local loopback support
Currently, unicast/multicast loopback raw ethernet
(non-RDMA) packets are sent back to the vport.
A unicast loopback packet is the packet with destination
MAC address the same as the source MAC address.
For multicast, the destination MAC address is in the
vport's multicast filter list.

Moreover, the local loopback is not needed if
there is one or none user space context.

After this patch, the raw ethernet unicast and multicast
local loopback are disabled by default. When there is more
than one user space context, the local loopback is enabled.

Note that when local loopback is disabled, raw ethernet
packets are not looped back to the vport and are forwarded
to the next routing level (eswitch, or multihost switch,
or out to the wire depending on the configuration).

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 10:29:18 -04:00
Leon Romanovsky
e1267b0124 RDMA: Remove useless MODULE_VERSION
All modules in drivers/infiniband defined and used MODULE_VERSION, which
was pointless because the kernel version describes their state more accurate
then those arbitrary numbers.

Signed-off-by: Leon Romanovsky <leon@kernel.org>
Acked-by: Sagi Grimbrg <sagi@grimberg.me>
Reviewed-by: Sagi Grimberg <sagi@grimbeg.me>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Acked-by: Ram Amrani <Ram.Amrani@cavium.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Acked-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 08:45:11 -04:00
Dan Carpenter
396551eb00 IB/mlx5: Fix a warning message
"umem" is a valid pointer.  We intended to print "*umem" or even just
"err" instead.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-20 11:20:49 -04:00
Leon Romanovsky
12cc1a0273 IB/mlx5: Clean mr_cache debugfs in case of failure
The failure in creation of debugfs entries for mr_cache left entries,
which were already created.

It caused to mismatch and misguiding for the end users. The solution
is to clean mr_cache debugfs root, so no leftovers will be in the
system. In addition, let's document why the error is not needed to be
forwarded to user in case of failure.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-17 21:21:25 -04:00
Bart Van Assche
99975cd4fd mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
ib_map_mr_sg() can pass an SG-list to .map_mr_sg() that is larger
than what fits into a single MR. .map_mr_sg() must not attempt to
map more SG-list elements than what fits into a single MR.
Hence make sure that mlx5_ib_sg_to_klms() does not write outside
the MR klms[] array.

Fixes: b005d31647 ("mlx5: Add arbitrary sg list support")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: Israel Rukshin <israelr@mellanox.com>
Cc: <stable@vger.kernel.org>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-17 11:45:07 -04:00
Linus Torvalds
9871ab22f2 Fixes #3 for 4.12-rc
- 2 Fixes for OPA found by debug kernel
 - 1 Fix for user supplied input causing kernel problems
 - 1 Fix for the IPoIB fixes submitted around -rc4
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJZXVfPAAoJELgmozMOVy/dRRcP/AiN4wyEQ897se1fKXAktL1g
 a17tiSkK2MukAVHbM++9Ea/YXK66e2s7Ls8Pd230E85N3V48rSUhWZUIUQLOm+gS
 b98z53uNs6KkdBCezXABsHIi4PB6u1CfzaFaUfN5WI3ymAgsYqpQWMtNyO6GNe/R
 Dur3vDieXPNJ2x+F1jiNxHFBXLKofCG0y1FX88zqsQI5vVVq7ASKgaaSX3T1emQY
 18l4Dd7pesrWj4QD9jaqQiYkruF5VC1NE8/he8Zzy6XjSgnUZZfjbjuMptbW4y3y
 Tvvd5bjMAkJhCbK1mhe1dZHPlYJhAguUBZfThjVSKtiMGwRhGA4SYkRtek3nZOga
 /OLhERgj0VomHx7o+Pwp74DWnsSv08EMoc4hXKHZPPyxok83r9czejqm7mC2VbGd
 Sa8LmVeLQp79e9MbGAj+PbNRHf9CE9dnLeFUmbj+qptXUVGvT8j9U1a9iTjTz0+2
 NX/O4iWjtnt/CIkH9dhN9aWolswbmO2jSmmzb/x2EuCLv94GNtTyZLSifvxSYMnN
 IWO86aGQmuUkWJ3RI/5tzq+gVzI6bdKB9hG5DOPWN/uJVF9nWkq3c69Bv9djvUoM
 xi/rI0grxTqYHelRx3ja4ZqaI43R6YwL928XdtZJKQ/uNanq65Lyd6KKz3W7hT0l
 emCoqb2MjuzsNWIPkSgg
 =JEor
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull rdma update from Doug Ledford:
 "This includes two bugs against the newly added opa vnic that were
  found by turning on the debug kernel options:

   - sleeping while holding a lock, so a one line fix where they
     switched it from GFP_KERNEL allocation to a GFP_ATOMIC allocation

   - a case where they had an isolated caller of their code that could
     call them in an atomic context so they had to switch their use of a
     mutex to a spinlock to be safe, so this was considerably more lines
     of diff because all uses of that lock had to be switched

  In addition, the bug that was discussed with you already about an out
  of bounds array access in ib_uverbs_modify_qp and ib_uverbs_create_ah
  and is only seven lines of diff.

  And finally, one fix to an earlier fix in the -rc cycle that broke
  hfi1 and qib in regards to IPoIB (this one is, unfortunately, larger
  than I would like for a -rc7 submission, but fixing the problem
  required that we not treat all devices as though they had allocated a
  netdev universally because it isn't true, and it took 70 lines of diff
  to resolve the issue, but the final patch has been vetted by Intel and
  Mellanox and they've both given their approval to the fix).

  Summary:

   - Two fixes for OPA found by debug kernel
   - Fix for user supplied input causing kernel problems
   - Fix for the IPoIB fixes submitted around -rc4"

[ Doug sent this having not noticed the 4.12 release, so I guess I'll be
  getting another rdma pull request with the actuakl merge window
  updates and not just fixes.

  Oh well - it would have been nice if this small update had been the
  merge window one.     - Linus ]

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
  IB/core, opa_vnic, hfi1, mlx5: Properly free rdma_netdev
  RDMA/uverbs: Check port number supplied by user verbs cmds
  IB/opa_vnic: Use spinlock instead of mutex for stats_lock
  IB/opa_vnic: Use GFP_ATOMIC while sending trap
2017-07-06 11:45:08 -07:00
Niranjana Vishwanathapura
8e95960199 IB/core, opa_vnic, hfi1, mlx5: Properly free rdma_netdev
IPOIB is calling free_rdma_netdev even though alloc_rdma_netdev has
returned -EOPNOTSUPP.
Move free_rdma_netdev from ib_device structure to rdma_netdev structure
thus ensuring proper cleanup function is called for the rdma net device.

Fix the following trace:

ib0: Failed to modify QP to ERROR state
BUG: unable to handle kernel paging request at 0000000000001d20
IP: hfi1_vnic_free_rn+0x26/0xb0 [hfi1]
Call Trace:
 ipoib_remove_one+0xbe/0x160 [ib_ipoib]
 ib_unregister_device+0xd0/0x170 [ib_core]
 rvt_unregister_device+0x29/0x90 [rdmavt]
 hfi1_unregister_ib_device+0x1a/0x100 [hfi1]
 remove_one+0x4b/0x220 [hfi1]
 pci_device_remove+0x39/0xc0
 device_release_driver_internal+0x141/0x200
 driver_detach+0x3f/0x80
 bus_remove_driver+0x55/0xd0
 driver_unregister+0x2c/0x50
 pci_unregister_driver+0x2a/0xa0
 hfi1_mod_cleanup+0x10/0xf65 [hfi1]
 SyS_delete_module+0x171/0x250
 do_syscall_64+0x67/0x150
 entry_SYSCALL64_slow_path+0x25/0x25

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-05 17:11:00 -04:00
Ilan Tayari
095b0927f0 IB/mlx5: Respect mlx5_core reserved GIDs
Reserved gids are taken by the mlx5_core, report smaller GID table
size to IB core.

Set mlx5_query_roce_port's return value back to int. In case of
error, return an indication. This rolls back some of the change
in commit 50f22fd8ec ("IB/mlx5: Set mlx5_query_roce_port's return value to void")

Change set_roce_addr to use gid_set function, instead of directly
sending the command.

Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-27 16:36:47 +03:00
David S. Miller
3d09198243 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Two entries being added at the same time to the IFLA
policy table, whilst parallel bug fixes to decnet
routing dst handling overlapping with the dst gc removal
in net-next.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-21 17:35:22 -04:00
Or Gerlitz
bd10838af2 net/mlx5: Fix some spelling mistakes
Fixed few places where endianness was misspelled and
one spot whwere output was:

CHECK: 'endianess' may be misspelled - perhaps 'endianness'?
CHECK: 'ouput' may be misspelled - perhaps 'output'?

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-16 00:12:40 +03:00
Alex Vesker
022d038a16 IB/ipoib: Limit call to free rdma_netdev for capable devices
Limit calls to free_rdma_netdev() for capable devices only.

Fixes: cd565b4b51 ('IB/IPoIB: Support acceleration options callbacks')
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-14 15:16:23 -04:00
David S. Miller
216fe8f021 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Just some simple overlapping changes in marvell PHY driver
and the DSA core code.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-06 22:20:08 -04:00
Max Gurtovoy
6e8484c5cf RDMA/mlx5: set UMR wqe fence according to HCA cap
Cache the needed umr_fence and set the wqe ctrl segmennt
accordingly.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Acked-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-01 17:19:57 -04:00
Tariq Toukan
b359911d66 IB/mlx5: Bump driver version
Remove date and bump version for mlx5_ib driver.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-05-14 14:24:19 +03:00
Leon Romanovsky
1b9a07ee25 {net, IB}/mlx5: Replace mlx5_vzalloc with kvzalloc
Commit a7c3e901a4 ("mm: introduce kv[mz]alloc helpers") added
proper implementation of mlx5_vzalloc function to the MM core.

This made the mlx5_vzalloc function useless, so let's remove it.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-05-14 13:53:18 +03:00
Erez Shitrit
693dfd5a3f IB/mlx5: Enable IPoIB acceleration
Enable mlx5 IPoIB acceleration by declaring
mlx5_ib_{alloc,free}_rdma_netdev and assigning the mlx5
IPoIB rdma_netdev callbacks.

In addition, this patch brings in sync mlx5's IPoIB parts for net and IB
trees. As a precaution, we disabled IPoIB acceleration by default (in
the mlx5_core Kconfig file).

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-04 16:22:08 -04:00
Tim Wright
133bea04ff IB/mlx5: Add port_xmit_wait to counter registers read
Add port_xmit_wait to the error counters read by mlx5_ib_process_mad to
ensure sysfs port counter provides correct value for PortXmitWait.
Otherwise the sysfs port_xmit_wait file always contains zero.

The previous MAD_IFC implementation populated this counter, but it was
removed during the migration to PPCNT for error counters (32-bit only).

Signed-off-by: Tim Wright <tim@binbash.co.uk>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01 15:04:23 -04:00
Sagi Grimberg
0a49f2c31c mlx5: Fix mlx5_ib_map_mr_sg mr length
In case we got an initial sg_offset, we need to
account for it in the mr length.

Cc: stable@vger.kernel.org
Fixes: ff2ba99365 ("IB/core: Add passing an offset into the SG to
ib_map_mr_sg")
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Tested-by: Israel Rukshin <israelr@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01 14:56:07 -04:00
Dasaratharaman Chandramouli
44c58487d5 IB/core: Define 'ib' and 'roce' rdma_ah_attr types
rdma_ah_attr can now be either ib or roce allowing
core components to use one type or the other and also
to define attributes unique to a specific type. struct
ib_ah is also initialized with the type when its first
created. This ensures that calls such as modify_ah
dont modify the type of the address handle attribute.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01 14:32:43 -04:00
Dasaratharaman Chandramouli
d8966fcd4c IB/core: Use rdma_ah_attr accessor functions
Modify core and driver components to use accessor functions
introduced to access individual fields of rdma_ah_attr

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01 14:32:43 -04:00
Dasaratharaman Chandramouli
38349389fe IB/mlx5: Rename to_ib_ah_attr to to_rdma_ah_attr
local function  to_ib_ah_attr is renamed so it in
sync with the rename of the ib_ah_attr structure

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01 14:32:43 -04:00
Dasaratharaman Chandramouli
90898850ec IB/core: Rename struct ib_ah_attr to rdma_ah_attr
This patch simply renames struct ib_ah_attr to
rdma_ah_attr as these fields specify attributes that are
not necessarily specific to IB.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01 14:32:43 -04:00
Artemy Kovalyov
db570d7dea IB/mlx5: Add ODP support to MW
Internally MW implemented as KLM MKey and filled by userspace UMR
postsends.  Handle pagefault trigered by operations on this MKeys.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-25 15:40:28 -04:00
Artemy Kovalyov
1b7dbc26fc IB/mlx5: Extract page fault code
To make page fault handling code more flexible
split pagefault_single_data_segment() function.
Keep MR resolution in pagefault_single_data_segment() and
move actual updates into pagefault_single_mr().

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-25 15:40:28 -04:00
Artemy Kovalyov
b2ac91885b IB/mlx5: Add contiguous ODP support
Currenlty ODP supports only regular MMU pages.
Add ODP support for regions consisting of physically contiguous chunks
of arbitrary order (huge pages for instance) to improve performance.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-25 15:40:28 -04:00
Artemy Kovalyov
4df4a5bac3 IB/mlx5: Decrease verbosity level of ODP errors
Decrease verbosity level of ODP error flows messages to debug level.
Remove one redundant print since debug level message already exists in
this flow.

Fixes: d9aaed8387 ('{net,IB}/mlx5: Refactor page fault handling')
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-25 15:40:28 -04:00
Artemy Kovalyov
523791d7c5 IB/mlx5: Fix implicit MR GC
When implicit MR's leaf MKey becomes unused, i.e. when it's
last page being released my MMU invalidation it is marked as "dying"
and scheduled for release by garbage collector.
Currentle consequent page fault may remove "dying" flag.
Treat leaf MKey as non-existent once it was scheduled to removal
by GC.

Fixes: 81713d3788 ('IB/mlx5: Add implicit MR support')
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-25 15:40:28 -04:00
Artemy Kovalyov
438b228e03 IB/mlx5: Fix UMR size calculation
Translation table updates of large UMR may require multiple post send
operations. The last operations can be in various lengths, but current
code set them to be the same length.

Fixes: 7d0cc6edcc ('IB/mlx5: Add MR cache for large UMR regions')
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-25 15:40:28 -04:00
Artemy Kovalyov
bd174fc2ca IB/mlx5: Fix function updating xlt emergency path
In memory shortage path we fall back to use spare buffer.
mlx5_ib_update_xlt() called from ib_uverbs_reg_mr when ibmr.ucontext
not initialized yet.

Scenario how to test it:
1. trigger memory exhaustion so __get_free_pages(GFP_KERNEL, 4) will fail
2. register MR
3. there should be no kernel oops

Fixes: 7d0cc6edcc ('IB/mlx5: Add MR cache for large UMR regions')
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-25 15:40:28 -04:00
Artemy Kovalyov
3e7e1193e2 IB: Replace ib_umem page_size by page_shift
Size of pages are held by struct ib_umem in page_size field.

It is better to store it as an exponent, because page size by nature
is always power-of-two and used as a factor, divisor or ilog2's argument.

The conversion of page_size to be page_shift allows to have portable
code and avoid following error while compiling on ARM:

  ERROR: "__aeabi_uldivmod" [drivers/infiniband/core/ib_core.ko] undefined!

CC: Selvin Xavier <selvin.xavier@broadcom.com>
CC: Steve Wise <swise@chelsio.com>
CC: Lijun Ou <oulijun@huawei.com>
CC: Shiraz Saleem <shiraz.saleem@intel.com>
CC: Adit Ranadive <aditr@vmware.com>
CC: Dennis Dalessandro <dennis.dalessandro@intel.com>
CC: Ram Amrani <Ram.Amrani@Cavium.com>
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Acked-by: Ram Amrani <Ram.Amrani@cavium.com>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Acked-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-25 15:40:28 -04:00
Noa Osherovich
f1b65df5a2 IB/mlx5: Add support for active_width and active_speed in RoCE
Add missing calculation and translation of active_width and
active_speed for RoCE.

Fixes: 3f89a643eb ('IB/mlx5: Extend query_device/port to ...')
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21 12:58:41 -04:00
Noa Osherovich
50f22fd8ec IB/mlx5: Set mlx5_query_roce_port's return value to void
In case of an error, the properties reported to user
are zeroed out, so no need for a return value.

Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21 12:54:51 -04:00
Moni Shoua
12f8fedef2 IB/mlx5: Set correct SL in completion for RoCE
There is a difference when parsing a completion entry between Ethernet
and IB ports. When link layer is Ethernet the bits describe the type of
L3 header in the packet. In the case when link layer is Ethernet and VLAN
header is present the value of SL is equal to the 3 UP bits in the VLAN
header. If VLAN header is not present then the SL is undefined and consumer
of the completion should check if IB_WC_WITH_VLAN is set.

While that, this patch also fills the vlan_id field in the completion if
present.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21 12:29:31 -04:00
Parav Pandit
e1f24a79f4 IB/mlx5: Support congestion related counters
This patch adds support to query the congestion related hardware counters
through new command and links them with other hw counters being available
in hw_counters sysfs location.

In order to reuse existing infrastructure it renames related q_counter
data structures to more generic counters to reflect q_counters and
congestion counters and maybe some other counters in the future.

New hardware counters:
 * rp_cnp_handled - CNP packets handled by the reaction point
 * rp_cnp_ignored - CNP packets ignored by the reaction point
 * np_cnp_sent    - CNP packets sent by notification point to respond to
                     CE marked RoCE packets
 * np_ecn_marked_roce_packets - CE marked RoCE packets received by
                                notification point

It also avoids returning ENOSYS which is specific for invalid
system call and produces the following checkpatch.pl warning.

WARNING: ENOSYS means 'invalid syscall nr' and nothing else
+		return -ENOSYS;

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21 12:29:31 -04:00
Slava Shwartsman
a22ed86cff IB/mlx5: Add drop flow steering rule support
A drop rule is described by an action drop and no destination.
If a user specified IB_FLOW_SPEC_ACTION_DROP then set the action
to MLX5_FLOW_CONTEXT_ACTION_DROP and clear the destination.

Signed-off-by: Slava Shwartsman <slavash@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21 12:29:31 -04:00
Ariel Levkovich
19cc75249a IB/mlx5: Use IP version matching to classify IP traffic
This change adds the ability for flow steering to classify IPv4/6
packets with MPLS tag (Ethertype 0x8847 and 0x8848) as standard IP
packets and hit IPv4/6 classifed steering rules.

When user added a flow rule with IP classification, driver was
implicitly adding ethertype matching to the created rule in order
to distinguish between IPv4 and IPv6 protocols.
Since IP packets with MPLS tag header have MPLS ethertype, they missed
the rule and ended up hitting the default filters.
Such behavior prevented from MPLS packets to undergo inbound traffic
load balancing flows (if such were defined by configuring RSS) to
achieve higher throughput - the way that non-MPLS IP packets performed.

Since our device is able to look past the MPLS tag and identify the
next protocol we introduce this solution which replaces Ethertype
matching by the device's capability to perform IP version parsing
and matching in order to distinguish between IPv4 and IPv6.
Therefore, whenever a flow with IP spec is added and device support IP
version matching, driver will implicitly add IP version matching to the
rule (Based on the IP spec type) without Ethertype matching which will
cause relevant MPLS tagged packets to hit this rule as well.
Otherwise (device doesn't support IP version matching), we fall back to
setting Ethertype matching.

If the user's filters specify an L2 ethertype and an IP spec
the rule will then match both the ethertype and the IP version.

The device's support for IP version matching is reported by the
device via dedicated capability bit in query_device_cap and named
outer/inner_ip_version.

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21 12:26:05 -04:00
Ariel Levkovich
0f750966dc IB/mlx5: Add inner spec and IPv6 validation in user's flow attribute list
This change fixes an incomplete validation of the user's
flow attributes list.

Previous implementation validated only matching of IPv4 Ethertype
to IPv4 spec of outer headers (in case both Ethernet with specified
Ethertype and IP specs were present) and lacked the validation of:
1. Matching of IPv6 Ethertype in Ethernet spec (if such exists) to an
   IPv6 protocol spec (if such exists).
2. Validation of Ethertype to IP protocol matching on inner headers specs.
Which could cause some combinations of unmatching Ethernet and IP
protocols to pass validation and apply on the device.

The fix adds validation of IPv6 Ethertype and IP spec as well as
performing the scan on both outer and inner attributes.

Fixes: 038d2ef875 ("Add flow steering support")
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21 12:26:05 -04:00
Bodong Wang
44f2e99ecd IB/mlx5: Fix wrong use of kfree at bad flow in create_cq_user
The kfree was called to free cqb, while it should free *cqb.

Fixes: 1cbe6fc86c ("IB/mlx5: Add support for CQE compressing")
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21 12:26:05 -04:00
Maor Gottlieb
00b7c2abb6 IB/mlx5: Enlarge autogroup flow table
In order to enlarge the flow group size to 8k, we decrease
the number of flow group types to 6 and increase the flow
table size to 64k.

Flow group size is calculated as follow:
  group_size = table_size / (#group_types + 1)

Fixes: 038d2ef875 ('IB/mlx5: Add flow steering support')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21 12:26:05 -04:00
Maor Gottlieb
dac388ef4c IB/mlx5: Check supported flow table size
Check that the required flow table size is supported
by device. Return ENOMEM error if no space left.

In addition change the create flow table routine
to return ENOMEM instead of ENOSPC.

Fixes: 038d2ef875 ('IB/mlx5: Add flow steering support')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21 12:26:05 -04:00
Maor Gottlieb
1377661298 IB/mlx5: Change vma from shared to private
Anonymous VMA (->vm_ops == NULL) cannot be shared, otherwise
it would lead to SIGBUS.

Remove the shared flags from the vma after we change it to be
anonymous.

This is easily reproduced by doing modprobe -r while running a
user-space application such as raw_ethernet_bw.

Fixes: 7c2344c3bb ('IB/mlx5: Implements disassociate_ucontext API')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21 12:26:05 -04:00
Maor Gottlieb
ecc7d83be3 IB/mlx5: Take write semaphore when changing the vma struct
When the driver disassociate user context, it changes the vma to
anonymous by setting the vm_ops to null and zap the vma ptes.

In order to avoid race in the kernel, we need to take write lock
before we change the vma entries.

Fixes: 7c2344c3bb ('IB/mlx5: Implements disassociate_ucontext API')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21 12:26:05 -04:00
Erez Shitrit
93d576af3c hw/mlx5: Add New bit to check over QP creation
Add check for bit IB_QP_CREATE_NETIF_QP while creating QP.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17 11:08:32 -04:00
Saeed Mahameed
258545449b net/mlx5e: IPoIB, Xmit flow
Implement mlx5e's IPoIB SKB transmit using the helper functions provided
by mlx5e ethernet tx flow, the only difference in the code between
mlx5e_xmit and mlx5i_xmit is that IPoIB has some extra fields to fill
(UD datagram segment) in the TX descriptor (WQE) and it doesn't need to
have any vlan handling.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17 11:08:31 -04:00
Ingo Molnar
0881e7bd34 sched/headers: Prepare to move the get_task_struct()/put_task_struct() and related APIs from <linux/sched.h> to <linux/sched/task.h>
But first update usage sites with the new header dependency.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:40 +01:00
Ingo Molnar
6e84f31522 sched/headers: Prepare for new header dependencies before moving code to <linux/sched/mm.h>
We are going to split <linux/sched/mm.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/mm.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

The APIs that are going to be moved first are:

   mm_alloc()
   __mmdrop()
   mmdrop()
   mmdrop_async_fn()
   mmdrop_async()
   mmget_not_zero()
   mmput()
   mmput_async()
   get_task_mm()
   mm_access()
   mm_release()

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:28 +01:00
Linus Torvalds
ac1820fb28 This is a tree wide change and has been kept separate for that reason.
Bart Van Assche noted that the ib DMA mapping code was significantly
 similar enough to the core DMA mapping code that with a few changes
 it was possible to remove the IB DMA mapping code entirely and
 switch the RDMA stack to use the core DMA mapping code.  This resulted
 in a nice set of cleanups, but touched the entire tree.  This branch
 will be submitted separately to Linus at the end of the merge window
 as per normal practice for tree wide changes like this.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJYo06oAAoJELgmozMOVy/d9Z8QALedWHdu98St1L0u2c8sxnR9
 2zo/4sF5Vb9u7FpmdIX32L4SQ9s9KhPE8Qp8NtZLf9v10zlDebIRJDpXknXtKooV
 CAXxX4sxBXV27/UrhbZEfXiPrmm6ccJFyIfRnMU6NlMqh2AtAsRa5AC2/RMp8oUD
 Med97PFiF0o6TD22/UH1VFbRpX1zjaKyqm7a3as5sJfzNA+UGIZAQ7Euz8000DKZ
 xCgVLTEwS0FmOujtBkCst7xa9TjuqR1HLOB4DdGvAhP6BHdz2yamM7Qmh9NN+NEX
 0BtjsuXomtn6j6AszGC+bpipCZh3NUigcwoFAARXCYFHibBvo4DPdFeGsraFgXdy
 1+KyR8CCeQG3Aly5Vwr264RFPGkGpwMj8PsBlXgQVtrlg4rriaCzOJNmIIbfdADw
 ftqhxBOzReZw77aH2s+9p2ILRfcAmPqhynLvFGFo9LBvsik8LVso7YgZN0xGxwcI
 IjI/XGC8UskPVsIZBIYA6sl2bYzgOjtBIHiXjRrPlW3uhduIXLrvKFfLPP/5XLAG
 ehLXK+J0bfsyY9ClmlNS8oH/WdLhXAyy/KNmnj5bRRm9qg6BRJR3bsOBhZJODuoC
 XgEXFfF6/7roNESWxowff7pK0rTkRg/m/Pa4VQpeO+6NWHE7kgZhL6kyIp5nKcwS
 3e7mgpcwC+3XfA/6vU3F
 =e0Si
 -----END PGP SIGNATURE-----

Merge tag 'for-next-dma_ops' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull rdma DMA mapping updates from Doug Ledford:
 "Drop IB DMA mapping code and use core DMA code instead.

  Bart Van Assche noted that the ib DMA mapping code was significantly
  similar enough to the core DMA mapping code that with a few changes it
  was possible to remove the IB DMA mapping code entirely and switch the
  RDMA stack to use the core DMA mapping code.

  This resulted in a nice set of cleanups, but touched the entire tree
  and has been kept separate for that reason."

* tag 'for-next-dma_ops' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (37 commits)
  IB/rxe, IB/rdmavt: Use dma_virt_ops instead of duplicating it
  IB/core: Remove ib_device.dma_device
  nvme-rdma: Switch from dma_device to dev.parent
  RDS: net: Switch from dma_device to dev.parent
  IB/srpt: Modify a debug statement
  IB/srp: Switch from dma_device to dev.parent
  IB/iser: Switch from dma_device to dev.parent
  IB/IPoIB: Switch from dma_device to dev.parent
  IB/rxe: Switch from dma_device to dev.parent
  IB/vmw_pvrdma: Switch from dma_device to dev.parent
  IB/usnic: Switch from dma_device to dev.parent
  IB/qib: Switch from dma_device to dev.parent
  IB/qedr: Switch from dma_device to dev.parent
  IB/ocrdma: Switch from dma_device to dev.parent
  IB/nes: Remove a superfluous assignment statement
  IB/mthca: Switch from dma_device to dev.parent
  IB/mlx5: Switch from dma_device to dev.parent
  IB/mlx4: Switch from dma_device to dev.parent
  IB/i40iw: Remove a superfluous assignment statement
  IB/hns: Switch from dma_device to dev.parent
  ...
2017-02-25 13:45:43 -08:00
Linus Torvalds
af17fe7a63 Mellanox specific updates for 4.11 merge window
Because the Mellanox code required being based on a net-next tree,
 I keept it separate from the remainder of the RDMA stack submission
 that is based on 4.10-rc3.
 
 This branch contains:
 
 - Various mlx4 and mlx5 fixes and minor changes
 - Support for adding a tag match rule to flow specs
 - Support for cvlan offload operation for raw ethernet QPs
 - A change to the core IB code to recognize raw eth capabilities and
   enumerate them (touches non-Mellanox code)
 - Implicit On-Demand Paging memory registration support
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJYrx+WAAoJELgmozMOVy/du70P/1kpW2xY9Le04c3K7na2XOYl
 AUVIDrW/8Go63tpOaM7jBT3k4GlwVFr3IOmBpS24KbW/THxjhyUeP5L5+z2x+go+
 jkQOgtPWWEHr5zP3MzsNyB8fDx1YQOnJwEXxybQRW/cbw4CLjnhP+ezd6FdV/3Yy
 pPEqDVlAErzvNweG+n2r1pjcUbR8uneC3inyMLnyzUBz4CHKmC8fgD3/qJIM+DNb
 gtFT5xHFIXKCigWdQ/EwsTDcHub43V8OXlI5sO7loG6vToOUATMkjI4oOUNhDmYS
 X7XLN3yRK9QHEfb5kutXIZEWzTGh7LiFtUYGaNNYqqzDfSiMRc9NC5kTOfplEXDV
 Uo+AGb6Fh1zYIOzNk7o+tazIv3LaLv6+Fcm+9bbe0VUIqasaylsePqaTwMuIzx/I
 xP5nitmd5lbYo8WdlasVdG6mH1DlJEUbU30v4DpmTpxCP6jGpog7lexyGyF3TgzS
 NhnG0IiIClWh3WQ2/GdsFK/obIdFkpLeASli1hwD81vzPfly9zc2YpgqydZI3WCr
 q6hTXYnANcP6+eciCpQPO7giRdXdiKey08Uoq/2jxb7Qbm4daG6UwopjvH9/lm1F
 m6UDaDvzNYm+Rx+bL/+KSx9JO9+fJB1L51yCmvLGpWi6yJI4ZTfanHNMBsCua46N
 Kev/DSpIAzX1WOBkte+a
 =rspQ
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull Mellanox rdma updates from Doug Ledford:
 "Mellanox specific updates for 4.11 merge window

  Because the Mellanox code required being based on a net-next tree, I
  keept it separate from the remainder of the RDMA stack submission that
  is based on 4.10-rc3.

  This branch contains:

   - Various mlx4 and mlx5 fixes and minor changes

   - Support for adding a tag match rule to flow specs

   - Support for cvlan offload operation for raw ethernet QPs

   - A change to the core IB code to recognize raw eth capabilities and
     enumerate them (touches non-Mellanox code)

   - Implicit On-Demand Paging memory registration support"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (40 commits)
  IB/mlx5: Fix configuration of port capabilities
  IB/mlx4: Take source GID by index from HW GID table
  IB/mlx5: Fix blue flame buffer size calculation
  IB/mlx4: Remove unused variable from function declaration
  IB: Query ports via the core instead of direct into the driver
  IB: Add protocol for USNIC
  IB/mlx4: Support raw packet protocol
  IB/mlx5: Support raw packet protocol
  IB/core: Add raw packet protocol
  IB/mlx5: Add implicit MR support
  IB/mlx5: Expose MR cache for mlx5_ib
  IB/mlx5: Add null_mkey access
  IB/umem: Indicate that process is being terminated
  IB/umem: Update on demand page (ODP) support
  IB/core: Add implicit MR flag
  IB/mlx5: Support creation of a WQ with scatter FCS offload
  IB/mlx5: Enable QP creation with cvlan offload
  IB/mlx5: Enable WQ creation and modification with cvlan offload
  IB/mlx5: Expose vlan offloads capabilities
  IB/uverbs: Enable QP creation with cvlan offload
  ...
2017-02-23 11:27:49 -08:00
Eli Cohen
cdbe33d0f8 IB/mlx5: Fix configuration of port capabilities
When the "ib_virt" cap is set, configuration of port capabilities need
to be done through mlx5_core_modify_hca_vport_context.
Since modify_hca_vport_context accepts mask and value, there is no need
to read the port capabilities and calculate the new cap values so we
avoid the mutex when ib_virt is set.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-15 09:29:37 -05:00
Eli Cohen
d8030b0de0 IB/mlx5: Fix blue flame buffer size calculation
A blue flame register is comprised of two buffers of equal size.

Fixes: 5fe9dec0d0 ("IB/mlx5: Use blue flame register allocator in mlx5_ib")
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 11:44:42 -05:00
Or Gerlitz
c4550c63b3 IB: Query ports via the core instead of direct into the driver
Change the drivers to call ib_query_port in their get port
immutable handler instead of their own query port handler.

Doing this required to set the core cap flags of this device
before the ib_query_port call is made, since the IB core might
need these caps to serve the port query.

Drivers are ensured by the IB core that the port attributes passed
to the port query verb implementation are zero, and hence we
removed the zeroing from the drivers.

This patch doesn't add any new functionality.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Acked-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 11:41:22 -05:00
Or Gerlitz
72cd57178f IB/mlx5: Support raw packet protocol
Mark support for the new raw packet protocol on Eth ports.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 11:41:20 -05:00
Artemy Kovalyov
81713d3788 IB/mlx5: Add implicit MR support
Add implicit MR, covering entire user address space.
The MR is implemented as an indirect KSM MR consisting of
1GB direct MRs.
Pages and direct MRs are added/removed to MR by ODP.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 11:41:19 -05:00
Artemy Kovalyov
49780d42df IB/mlx5: Expose MR cache for mlx5_ib
Allow other parts of mlx5_ib to use MR cache mechanism.
* Add new functions mlx5_mr_cache_alloc and mlx5_mr_cache_free
* Traditional MTT MKey buckets are limited by MAX_UMR_CACHE_ENTRY
  Additinal buckets may be added above.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 11:41:18 -05:00
Artemy Kovalyov
94990b4989 IB/mlx5: Add null_mkey access
Add mlx5_cmd_null_mkey() function to access null_mkey information
from firmware.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 11:41:18 -05:00
Noa Osherovich
4be6da1e5b IB/mlx5: Support creation of a WQ with scatter FCS offload
Add support for creation of a WQ with scatter FCS capability, if
this capability is supported by the hardware.

Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 11:41:15 -05:00
Noa Osherovich
e4cc4fa7cc IB/mlx5: Enable QP creation with cvlan offload
Enable creating a RAW Ethernet QP with cvlan stripping offload when
it's supported by the hardware.

Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 11:41:15 -05:00
Noa Osherovich
b1f74a8437 IB/mlx5: Enable WQ creation and modification with cvlan offload
Allow creating a WQ with cvlan stripping considering device's
capabilities. The default value was fixed to disable vlan stripping
till was asked explicitly.

In addition, allow modification of a WQ to turn on/off this property.

Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 11:41:14 -05:00
Noa Osherovich
e816133440 IB/mlx5: Expose vlan offloads capabilities
Check device's capabilities and report which raw packet capabilities
are supported.

Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 11:41:14 -05:00
Majd Dibbiny
23a6964e3a IB/mlx5: Add port counter support for Receive WQs
Counters weren't updated due to Receive WQs' traffic since the
counter-id was not associated with the RQ.

Added support for associating the q-counter-id with the Receive WQ.
The attachment is done only when changing WQ's state from RESET to
READY in modify-WQ command.

FW support is required for the above, without this support
Receive WQ counters will not count.

Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 11:41:09 -05:00
Kamal Heib
7c16f47779 IB/mlx5: Expose Q counters groups only if they are supported by FW
This patch modify the Q counters implementation, so each one of the
three Q counters groups will be exposed by the driver only if they are
supported by the firmware.

Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 11:40:56 -05:00
Leon Romanovsky
1ffd3a26f8 IB/mlx5: Replace ENOTSUPP usage with EOPNOTSUPP
Flow steering is supposed to return EOPNOTSUPP error
for unsupported fields and not ENOTSUPP error.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 10:21:01 -05:00
Moses Reuben
2ac693f995 IB/mlx5: Add flow tag support
Set flow tag in flow table entry, when IB_FLOW_SPEC_ACTION_TAG
is part of the flow specifications.

Flow tag doesn't support multicast flows, so it's passing to
hardware only when used.

Signed-off-by: Moses Reuben <mosesr@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 10:21:01 -05:00
Leon Romanovsky
5abb0da9cd IB/mlx5: Remove deprecated module parameter
Commit 9603b61de1 ("mlx5: Move pci device handling from mlx5_ib
to mlx5_core") moved prof_sel module parameter from mlx5_ib to mlx5_core
and marked it as deprecated in 2014.

Three years after deprecation, it is time to remove the deprecated
module parameter.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jack Morgenstein <jackm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 10:14:25 -05:00
Majd Dibbiny
ed88451e1f IB/mlx5: Assign DSCP for R-RoCE QPs Address Path
For Routable RoCE QPs, the DSCP should be set in the QP's
address path.

The DSCP's value is derived from the traffic class.

Fixes: 2811ba51b0 ("IB/mlx5: Add RoCE fields to Address Vector")
Cc: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 10:14:25 -05:00
Maor Gottlieb
1e0e50b617 IB/mlx5: Avoid SMP MADs from VFs
According to the device specification, we need to check that the
has_smi bit is set in vport context before allowing send SMP
MADs from VF.

Fixes: e126ba97db ('mlx5: Add driver for Mellanox Connect-IB adapters')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 10:14:25 -05:00
Maor Gottlieb
c43f1112c0 IB/mlx5: Add additional checks before processing MADs
Check the has_smi bit in vport context and class version of MADs
before allowing MADs processing to take place.
MAD_IFC SMI commands can be executed only if smi bit is set.

Fixes: e126ba97db ('mlx5: Add driver for Mellanox Connect-IB adapters')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Parvi Kaustubhi <parvik@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 10:14:25 -05:00
Kamal Heib
45bded2c21 IB/mlx5: Verify that Q counters are supported
Make sure that the Q counters are supported by the FW before trying
to allocate/deallocte them, this will avoid driver load failure when
they aren't supported by the FW.

Fixes: 0837e86a7a ('IB/mlx5: Add per port counters')
Cc: <stable@vger.kernel.org> # v4.7+
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 10:14:25 -05:00
Leon Romanovsky
12bbf1ea7e IB/mlx5: Return error for unsupported signature type
In case of unsupported singature, we returned positive
value, while the better approach is to return -EINVAL.

In addition, in this change, the error print is enriched
to provide an actual supplied signature type.

Fixes: e6631814fb ("IB/mlx5: Support IB_WR_REG_SIG_MR")
Cc: Sagi Grimberg <sagi@grimberg.me>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 10:14:25 -05:00
Leon Romanovsky
0fd27a88c2 IB/mlx5: Fix out-of-bound access
When we initialize buffer to create SRQ in kernel,
the number of pages was less than actually used in
following mlx5_fill_page_array().

Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Cc: <stable@vger.kernel.org> # v3.10+
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 10:14:25 -05:00