Commit Graph

192 Commits

Author SHA1 Message Date
Kaike Wan
d22a207d74 IB/hfi1: Add OPFN helper functions for TID RDMA feature
This patch adds the OPFN helper functions to initialize, encode, decode,
and reset OPFN parameters for the TID RDMA feature.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-01-31 11:36:05 -05:00
Linus Torvalds
5d24ae67a9 4.21 merge window pull request
This has been a fairly typical cycle, with the usual sorts of driver
 updates. Several series continue to come through which improve and
 modernize various parts of the core code, and we finally are starting to
 get the uAPI command interface cleaned up.
 
 - Various driver fixes for bnxt_re, cxgb3/4, hfi1, hns, i40iw, mlx4, mlx5,
   qib, rxe, usnic
 
 - Rework the entire syscall flow for uverbs to be able to run over
   ioctl(). Finally getting past the historic bad choice to use write()
   for command execution
 
 - More functional coverage with the mlx5 'devx' user API
 
 - Start of the HFI1 series for 'TID RDMA'
 
 - SRQ support in the hns driver
 
 - Support for new IBTA defined 2x lane widths
 
 - A big series to consolidate all the driver function pointers into
   a big struct and have drivers provide a 'static const' version of the
   struct instead of open coding initialization
 
 - New 'advise_mr' uAPI to control device caching/loading of page tables
 
 - Support for inline data in SRPT
 
 - Modernize how umad uses the driver core and creates cdev's and sysfs
   files
 
 - First steps toward removing 'uobject' from the view of the drivers
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAlwhV2oACgkQOG33FX4g
 mxpF8A/9EkRCg6wCDC59maA53b5PjuNmD//9hXbycQPQSlxntI2PyYtxrzBqc0+2
 yIaFFMehL41XNN6y1zfkl7ndl62McCH2TpiidU8RyTxVw/e3KsDD5sU6++atfHRo
 M82RNfedDtxPG8TcCPKVLof6JHADApGSR1r4dCYfAnu7KFMyvlLmeYyx4r/2E6yC
 iQPmtKVOdbGkuWGeX+brGEA0vg7FUOAvaysnxddjyh9hyem4h0SUR3Af/Ik0N5ME
 PYzC+hMKbkPVBLoCWyg7QwUaqK37uWwguMQLtI2byF7FgbiK/lBQt6TsidR4Fw3p
 EalL7uqxgCTtLYh918vxLFjdYt6laka9j7xKCX8M8d06sy/Lo8iV4hWjiTESfMFG
 usqs7D6p09gA/y1KISji81j6BI7C92CPVK2drKIEnfyLgY5dBNFcv9m2H12lUCH2
 NGbfCNVaTQVX6bFWPpy2Bt2y/Litsfxw5RviehD7jlG0lQjsXGDkZzsDxrMSSlNU
 S79iiTJyK4kUZkXzrSSlN58pLBlbupJwm5MDjKmM+irsrsCHjGIULvc902qtnC3/
 8ImiTtW6XvqLbgWXyy2Th8/ZgRY234p1ybhog+DFaGKUch0XqB7VXTV2OZm0GjcN
 Fp4PUeBt+/gBgYqjpuffqQc1rI4uwXYSoz7wq9RBiOpw5zBFT1E=
 =T0p1
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "This has been a fairly typical cycle, with the usual sorts of driver
  updates. Several series continue to come through which improve and
  modernize various parts of the core code, and we finally are starting
  to get the uAPI command interface cleaned up.

   - Various driver fixes for bnxt_re, cxgb3/4, hfi1, hns, i40iw, mlx4,
     mlx5, qib, rxe, usnic

   - Rework the entire syscall flow for uverbs to be able to run over
     ioctl(). Finally getting past the historic bad choice to use
     write() for command execution

   - More functional coverage with the mlx5 'devx' user API

   - Start of the HFI1 series for 'TID RDMA'

   - SRQ support in the hns driver

   - Support for new IBTA defined 2x lane widths

   - A big series to consolidate all the driver function pointers into a
     big struct and have drivers provide a 'static const' version of the
     struct instead of open coding initialization

   - New 'advise_mr' uAPI to control device caching/loading of page
     tables

   - Support for inline data in SRPT

   - Modernize how umad uses the driver core and creates cdev's and
     sysfs files

   - First steps toward removing 'uobject' from the view of the drivers"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (193 commits)
  RDMA/srpt: Use kmem_cache_free() instead of kfree()
  RDMA/mlx5: Signedness bug in UVERBS_HANDLER()
  IB/uverbs: Signedness bug in UVERBS_HANDLER()
  IB/mlx5: Allocate the per-port Q counter shared when DEVX is supported
  IB/umad: Start using dev_groups of class
  IB/umad: Use class_groups and let core create class file
  IB/umad: Refactor code to use cdev_device_add()
  IB/umad: Avoid destroying device while it is accessed
  IB/umad: Simplify and avoid dynamic allocation of class
  IB/mlx5: Fix wrong error unwind
  IB/mlx4: Remove set but not used variable 'pd'
  RDMA/iwcm: Don't copy past the end of dev_name() string
  IB/mlx5: Fix long EEH recover time with NVMe offloads
  IB/mlx5: Simplify netdev unbinding
  IB/core: Move query port to ioctl
  RDMA/nldev: Expose port_cap_flags2
  IB/core: uverbs copy to struct or zero helper
  IB/rxe: Reuse code which sets port state
  IB/rxe: Make counters thread safe
  IB/mlx5: Use the correct commands for UMEM and UCTX allocation
  ...
2018-12-28 14:57:10 -08:00
Kaike Wan
c1a797c081 IB/hfi1: Ignore LNI errors before DC8051 transitions to Polling state
When it is requested to change its physical state back to Offline while in
the process to go up, DC8051 will set the ERROR field in the
DC8051_DBG_ERR_INFO_SET_BY_8051 register. This ERROR field will remain
until the next time when DC8051 transitions from Offline to Polling.
Subsequently, when the host requests DC8051 to change its physical state
to Polling again, it may receive a DC8051 interrupt with the stale ERROR
field still in DC8051_DBG_ERR_INFO_SET_BY_8051. If the host link state has
been changed to Polling, this stale ERROR will force the host to
transition to Offline state, resulting in a vicious cycle of Polling
->Offline->Polling->Offline. On the other hand, if the host link state is
still Offline when the stale ERROR is received, the stale ERROR will be
ignored, and the link will come up correctly.  This patch implements the
correct behavior by changing host link state to Polling only after DC8051
changes its physical state to Polling.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Krzysztof Goreczny <krzysztof.goreczny@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-06 19:50:08 -07:00
Piotr Stankiewicz
36d842194a IB/hfi1: Fix an out-of-bounds access in get_hw_stats
When running with KASAN, the following trace is produced:

[   62.535888]

==================================================================
[   62.544930] BUG: KASAN: slab-out-of-bounds in
gut_hw_stats+0x122/0x230 [hfi1]
[   62.553856] Write of size 8 at addr ffff88080e8d6330 by task
kworker/0:1/14

[   62.565333] CPU: 0 PID: 14 Comm: kworker/0:1 Not tainted
4.19.0-test-build-kasan+ #8
[   62.575087] Hardware name: Intel Corporation S2600KPR/S2600KPR, BIOS
SE5C610.86B.01.01.0019.101220160604 10/12/2016
[   62.587951] Workqueue: events work_for_cpu_fn
[   62.594050] Call Trace:
[   62.598023]  dump_stack+0xc6/0x14c
[   62.603089]  ? dump_stack_print_info.cold.1+0x2f/0x2f
[   62.610041]  ? kmsg_dump_rewind_nolock+0x59/0x59
[   62.616615]  ? get_hw_stats+0x122/0x230 [hfi1]
[   62.622985]  print_address_description+0x6c/0x23c
[   62.629744]  ? get_hw_stats+0x122/0x230 [hfi1]
[   62.636108]  kasan_report.cold.6+0x241/0x308
[   62.642365]  get_hw_stats+0x122/0x230 [hfi1]
[   62.648703]  ? hfi1_alloc_rn+0x40/0x40 [hfi1]
[   62.655088]  ? __kmalloc+0x110/0x240
[   62.660695]  ? hfi1_alloc_rn+0x40/0x40 [hfi1]
[   62.667142]  setup_hw_stats+0xd8/0x430 [ib_core]
[   62.673972]  ? show_hfi+0x50/0x50 [hfi1]
[   62.680026]  ib_device_register_sysfs+0x165/0x180 [ib_core]
[   62.687995]  ib_register_device+0x5a2/0xa10 [ib_core]
[   62.695340]  ? show_hfi+0x50/0x50 [hfi1]
[   62.701421]  ? ib_unregister_device+0x2e0/0x2e0 [ib_core]
[   62.709222]  ? __vmalloc_node_range+0x2d0/0x380
[   62.716131]  ? rvt_driver_mr_init+0x11f/0x2d0 [rdmavt]
[   62.723735]  ? vmalloc_node+0x5c/0x70
[   62.729697]  ? rvt_driver_mr_init+0x11f/0x2d0 [rdmavt]
[   62.737347]  ? rvt_driver_mr_init+0x1f5/0x2d0 [rdmavt]
[   62.744998]  ? __rvt_alloc_mr+0x110/0x110 [rdmavt]
[   62.752315]  ? rvt_rc_error+0x140/0x140 [rdmavt]
[   62.759434]  ? rvt_vma_open+0x30/0x30 [rdmavt]
[   62.766364]  ? mutex_unlock+0x1d/0x40
[   62.772445]  ? kmem_cache_create_usercopy+0x15d/0x230
[   62.780115]  rvt_register_device+0x1f6/0x360 [rdmavt]
[   62.787823]  ? rvt_get_port_immutable+0x180/0x180 [rdmavt]
[   62.796058]  ? __get_txreq+0x400/0x400 [hfi1]
[   62.802969]  ? memcpy+0x34/0x50
[   62.808611]  hfi1_register_ib_device+0xde6/0xeb0 [hfi1]
[   62.816601]  ? hfi1_get_npkeys+0x10/0x10 [hfi1]
[   62.823760]  ? hfi1_init+0x89f/0x9a0 [hfi1]
[   62.830469]  ? hfi1_setup_eagerbufs+0xad0/0xad0 [hfi1]
[   62.838204]  ? pcie_capability_clear_and_set_word+0xcd/0xe0
[   62.846429]  ? pcie_capability_read_word+0xd0/0xd0
[   62.853791]  ? hfi1_pcie_init+0x187/0x4b0 [hfi1]
[   62.860958]  init_one+0x67f/0xae0 [hfi1]
[   62.867301]  ? hfi1_init+0x9a0/0x9a0 [hfi1]
[   62.873876]  ? wait_woken+0x130/0x130
[   62.879860]  ? read_word_at_a_time+0xe/0x20
[   62.886329]  ? strscpy+0x14b/0x280
[   62.891998]  ? hfi1_init+0x9a0/0x9a0 [hfi1]
[   62.898405]  local_pci_probe+0x70/0xd0
[   62.904295]  ? pci_device_shutdown+0x90/0x90
[   62.910833]  work_for_cpu_fn+0x29/0x40
[   62.916750]  process_one_work+0x584/0x960
[   62.922974]  ? rcu_work_rcufn+0x40/0x40
[   62.928991]  ? __schedule+0x396/0xdc0
[   62.934806]  ? __sched_text_start+0x8/0x8
[   62.941020]  ? pick_next_task_fair+0x68b/0xc60
[   62.947674]  ? run_rebalance_domains+0x260/0x260
[   62.954471]  ? __list_add_valid+0x29/0xa0
[   62.960607]  ? move_linked_works+0x1c7/0x230
[   62.967077]  ?
trace_event_raw_event_workqueue_execute_start+0x140/0x140
[   62.976248]  ? mutex_lock+0xa6/0x100
[   62.982029]  ? __mutex_lock_slowpath+0x10/0x10
[   62.988795]  ? __switch_to+0x37a/0x710
[   62.994731]  worker_thread+0x62e/0x9d0
[   63.000602]  ? max_active_store+0xf0/0xf0
[   63.006828]  ? __switch_to_asm+0x40/0x70
[   63.012932]  ? __switch_to_asm+0x34/0x70
[   63.019013]  ? __switch_to_asm+0x40/0x70
[   63.025042]  ? __switch_to_asm+0x34/0x70
[   63.031030]  ? __switch_to_asm+0x40/0x70
[   63.037006]  ? __schedule+0x396/0xdc0
[   63.042660]  ? kmem_cache_alloc_trace+0xf3/0x1f0
[   63.049323]  ? kthread+0x59/0x1d0
[   63.054594]  ? ret_from_fork+0x35/0x40
[   63.060257]  ? __sched_text_start+0x8/0x8
[   63.066212]  ? schedule+0xcf/0x250
[   63.071529]  ? __wake_up_common+0x110/0x350
[   63.077794]  ? __schedule+0xdc0/0xdc0
[   63.083348]  ? wait_woken+0x130/0x130
[   63.088963]  ? finish_task_switch+0x1f1/0x520
[   63.095258]  ? kasan_unpoison_shadow+0x30/0x40
[   63.101792]  ? __init_waitqueue_head+0xa0/0xd0
[   63.108183]  ? replenish_dl_entity.cold.60+0x18/0x18
[   63.115151]  ? _raw_spin_lock_irqsave+0x25/0x50
[   63.121754]  ? max_active_store+0xf0/0xf0
[   63.127753]  kthread+0x1ae/0x1d0
[   63.132894]  ? kthread_bind+0x30/0x30
[   63.138422]  ret_from_fork+0x35/0x40

[   63.146973] Allocated by task 14:
[   63.152077]  kasan_kmalloc+0xbf/0xe0
[   63.157471]  __kmalloc+0x110/0x240
[   63.162804]  init_cntrs+0x34d/0xdf0 [hfi1]
[   63.168883]  hfi1_init_dd+0x29a3/0x2f90 [hfi1]
[   63.175244]  init_one+0x551/0xae0 [hfi1]
[   63.181065]  local_pci_probe+0x70/0xd0
[   63.186759]  work_for_cpu_fn+0x29/0x40
[   63.192310]  process_one_work+0x584/0x960
[   63.198163]  worker_thread+0x62e/0x9d0
[   63.203843]  kthread+0x1ae/0x1d0
[   63.208874]  ret_from_fork+0x35/0x40

[   63.217203] Freed by task 1:
[   63.221844]  __kasan_slab_free+0x12e/0x180
[   63.227844]  kfree+0x92/0x1a0
[   63.232570]  single_release+0x3a/0x60
[   63.238024]  __fput+0x1d9/0x480
[   63.242911]  task_work_run+0x139/0x190
[   63.248440]  exit_to_usermode_loop+0x191/0x1a0
[   63.254814]  do_syscall_64+0x301/0x330
[   63.260283]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

[   63.270199] The buggy address belongs to the object at
ffff88080e8d5500
 which belongs to the cache kmalloc-4096 of size 4096
[   63.287247] The buggy address is located 3632 bytes inside of
 4096-byte region [ffff88080e8d5500, ffff88080e8d6500)
[   63.303564] The buggy address belongs to the page:
[   63.310447] page:ffffea00203a3400 count:1 mapcount:0
mapping:ffff88081380e840 index:0x0 compound_mapcount: 0
[   63.323102] flags: 0x2fffff80008100(slab|head)
[   63.329775] raw: 002fffff80008100 0000000000000000 0000000100000001
ffff88081380e840
[   63.340175] raw: 0000000000000000 0000000000070007 00000001ffffffff
0000000000000000
[   63.350564] page dumped because: kasan: bad access detected

[   63.361974] Memory state around the buggy address:
[   63.369137]  ffff88080e8d6200: 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00
[   63.379082]  ffff88080e8d6280: 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00
[   63.389032] >ffff88080e8d6300: 00 00 00 00 00 00 fc fc fc fc fc fc fc
fc fc fc
[   63.398944]                                      ^
[   63.406141]  ffff88080e8d6380: fc fc fc fc fc fc fc fc fc fc fc fc fc
fc fc fc
[   63.416109]  ffff88080e8d6400: fc fc fc fc fc fc fc fc fc fc fc fc fc
fc fc fc
[   63.426099]
==================================================================

The trace happens because get_hw_stats() assumes there is room in the
memory allocated in init_cntrs() to accommodate the driver counters.
Unfortunately, that routine only allocated space for the device
counters.

Fix by insuring the allocation has room for the additional driver
counters.

Cc: <Stable@vger.kernel.org> # v4.14+
Fixes: b7481944b0 ("IB/hfi1: Show statistics counters under IB stats interface")
Reviewed-by: Mike Marciniczyn <mike.marciniszyn@intel.com>
Reviewed-by: Mike Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-12-03 16:05:19 -05:00
Jason Gunthorpe
59bfc59a68 Merge branch 'for-rc' into rdma.git for-next
From git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git

This is required to resolve dependencies of the next series of RDMA
patches.

The code motion conflicts in drivers/infiniband/core/cache.c were
resolved.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-16 00:01:02 -06:00
Alex Estrin
eb50130964 IB/hfi1: Add mtu check for operational data VLs
Since Virtual Lanes BCT credits and MTU are set through separate MADs, we
have to ensure both are valid, and data VLs are ready for transmission
before we allow port transition to Armed state.

Fixes: 5e2d6764a7 ("IB/hfi1: Verify port data VLs credits on transition to Armed")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-30 19:21:12 -06:00
Michael J. Ruhl
b4a4957d3d IB/hfi1: Fix destroy_qp hang after a link down
rvt_destroy_qp() cannot complete until all in process packets have
been released from the underlying hardware.  If a link down event
occurs, an application can hang with a kernel stack similar to:

cat /proc/<app PID>/stack
 quiesce_qp+0x178/0x250 [hfi1]
 rvt_reset_qp+0x23d/0x400 [rdmavt]
 rvt_destroy_qp+0x69/0x210 [rdmavt]
 ib_destroy_qp+0xba/0x1c0 [ib_core]
 nvme_rdma_destroy_queue_ib+0x46/0x80 [nvme_rdma]
 nvme_rdma_free_queue+0x3c/0xd0 [nvme_rdma]
 nvme_rdma_destroy_io_queues+0x88/0xd0 [nvme_rdma]
 nvme_rdma_error_recovery_work+0x52/0xf0 [nvme_rdma]
 process_one_work+0x17a/0x440
 worker_thread+0x126/0x3c0
 kthread+0xcf/0xe0
 ret_from_fork+0x58/0x90
 0xffffffffffffffff

quiesce_qp() waits until all outstanding packets have been freed.
This wait should be momentary.  During a link down event, the cleanup
handling does not ensure that all packets caught by the link down are
flushed properly.

This is caused by the fact that the freeze path and the link down
event is handled the same.  This is not correct.  The freeze path
waits until the HFI is unfrozen and then restarts PIO.  A link down
is not a freeze event.  The link down path cannot restart the PIO
until link is restored.  If the PIO path is restarted before the link
comes up, the application (QP) using the PIO path will hang (until
link is restored).

Fix by separating the linkdown path from the freeze path and use the
link down path for link down events.

Close a race condition sc_disable() by acquiring both the progress
and release locks.

Close a race condition in sc_stop() by moving the setting of the flag
bits under the alloc lock.

Cc: <stable@vger.kernel.org> # 4.9.x+
Fixes: 7724105686 ("IB/hfi1: add driver files")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-20 19:24:51 -06:00
Michael J. Ruhl
b53ae6bc7e IB/hfi1: set_intr_bits uses incorrect source for register modification
HFI IRQ enable bits are not being set correctly.  Send context error and
DC IRQs are not being enabled correctly.  In addition, send context error
IRQs are not being delivered.

Because of this, send context errors are not being handled correctly when
they occur.

When setting the IRQ bits, if an IRQ range is used, and the last bit is on
a register boundary (bit 63), the calculated index for the final register
modification is incorrect (index + 1 vs. index).

The incorrect index calculation causes incorrect IRQ bits to be set.  In
this case the send context error IRQ is NOT enabled.

Fix by using the 'last' value rather than the counted 'src' value to
determine the final index to use.  This satisfies all cases.

Fixes: a2f7bbdc2d ("IB/hfi1: Rework the IRQ API to be more flexible")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-11 11:33:13 -06:00
Michael J. Ruhl
dc9f5d0f84 IB/hfi1: Move URGENT IRQ enable to hfi1_rcvctrl()
User contexts use the receive URGENT interrupt.  However, enabling
the IRQ SRC in the file_ops module is not as clean as it could be.

Augment the _rcvctl() function to be able to enable/disable the IRQ
source.

Use the new interface from file_ops to enable/disable the IRQ.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-01 08:13:38 -04:00
Michael J. Ruhl
a2f7bbdc2d IB/hfi1: Rework the IRQ API to be more flexible
The current IRQ API is an all or nothing interface.  This has two
problems:

  1. All IRQs are enabled regardless of use
  2. Moving from general interrupt to MSIx handling is difficult

Introduce a new API to enable/disable specific IRQs or a range of IRQs.

Do not enable and disable all IRQs in one step.

Rework various modules to enable/disable IRQs when needed.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-01 08:13:38 -04:00
Michael J. Ruhl
6eb4eb10fb IB/hfi1: Make the MSIx resource allocation a bit more flexible
The current method of allocating MSIx resources is a bit cumbersome,
and not very easily added to.

Refactor and re-order the code paths into a more consistent interface.

Update the interface so that allocations are not order dependent.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-01 08:13:38 -04:00
Michael J. Ruhl
09e71899b9 IB/hfi1: Prepare for new HFI1 MSIx API
The current HFI1 MSIx API is difficult to follow, change, or add to.

In anticipation of moving to an more flexible API, move the current
MSIx functionality to the new msix.c module.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-01 08:11:35 -04:00
Michael J. Ruhl
57f97e9662 IB/hfi1: Get the hfi1_devdata structure as early as possible
Currently several things occur before the hfi1_devdata structure is
allocated.  This leads to an inconsistent logging ability and makes
it more difficult to restructure some code paths.

Allocate (and do a minimal init) the structure as soon as possible.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-01 08:11:34 -04:00
Michael J. Ruhl
6a516bc9d7 IB/hfi1: tune_pcie_caps is arbitrarily placed, poorly
The tune_pcie_caps needs to occur sometime after PCI is enabled, but
before the HFI is enabled.  Currently it is placed in the MSIx
allocation code which doesn't really fit. Moving it to just after
the gen3 bump.

Clean up the associated code (modules, etc.).

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-01 08:11:34 -04:00
Michael J. Ruhl
e3091644bf IB/hfi1: Remove incorrect call to do_interrupt callback
The general interrupt handler is_rcv_avail_int() has two paths,
do_interrupt() (callback) and handle_user_interrupt().  The
do_interrupt() callback is for the threaded receive handling.
is_rcv_avail_int() cannot handle threaded IRQs.

If the do_interrupt() path is taken, and the IRQ returns
IRQ_WAKE_THREAD, the IRQ behavior will be indeterminate.

Remove incorrect call to do_interrupt() from is_rcv_avail_int(),
leaving the un-threaded (handle_user_interrupt()) path.

Fixes: f4f30031c3 ("staging/rdma/hfi1: Thread the receive interrupt.")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-03 14:29:12 -06:00
Michael J. Ruhl
d108c60d3d IB/hfi1: Set in_use_ctxts bits for user ctxts only
The in_use_ctxts bitmask is for user receive contexts only.  Setting it for
any other type of receive context is incorrect.

Move initial set of in_use_ctxts bits from the general context init to the
user context specific init. Having this bit set can allow contexts to be
incorrectly identified by some IRQ handlers. This will allow
handle_user_interrupt() will now filter user contexts correctly.

Clean up redundant is_rcv_urgent_int() user context check.

A follow on patch will clean up an incorrect code path in the
is_rcv_avail_int().

Fixes: 8737ce95c4 ("IB/hfi1: Fix an assign/ordering issue with shared context IDs")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-03 14:29:12 -06:00
Michael J. Ruhl
70324739ac IB/hfi1: Remove INTx support and simplify MSIx usage
The INTx IRQ support does not work for all HF1 IRQ handlers
(specifically the receive data IRQs).

Remove all supporting code for the INTx IRQ.

If the requested MSIx vector request is unsuccessful, do not allow the
driver to continue.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
Reviewed-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-22 09:12:17 -06:00
Mike Marciniszyn
06e81e3e92 IB/hfi1: Remove caches of chip CSRs
Remove the sizeable cache of the chip sizing CSRs and replace with CSR
reads as needed.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-22 09:12:17 -06:00
Mike Marciniszyn
b67bbc5923 IB/hfi1: Remove rcvctrl from ctxtdata
It is only ever written.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-22 09:12:17 -06:00
Mike Marciniszyn
b257843128 IB/hfi1: Remove rcvhdrq_size
The usage of this ctxt data field is not hot path and the value can be
computed on demand to cut down the ctxtdata bloat.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-22 09:12:17 -06:00
Mike Marciniszyn
32e3d97079 IB/hfi1: Remove rcvhdrsize
The field is based on a constant that can never change.

Use the define to assign the register instead.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-19 11:49:46 -06:00
Mike Marciniszyn
40442b30aa IB/hfi1: Move rhf_offset from devdata to ctxtdata
This field should be in ctxtdata to allow for better locality of access by
eliminating a dd dereference.

The new field is now side-by-side with rcvhdrqentsize since the rhf_offset
is a function of the rcvhdrqentsize.

Both fields are now correctly sized as u8.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-19 11:49:45 -06:00
Mike Marciniszyn
dc2b2a917c IB/hfi1: Add bypass register defines and replace blind constants
These registers were not added in the 16B work.

Add them and replace blind constants with the correct defines.

Fixes: 72c07e2b67 ("IB/hfi1: Add support to receive 16B bypass packets")
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-04 14:59:21 -06:00
Mike Marciniszyn
1bc0299d97 IB/hfi1: Fix user context tail allocation for DMA_RTAIL
The following code fails to allocate a buffer for the
tail address that the hardware DMAs into when the user
context DMA_RTAIL is set.

if (HFI1_CAP_KGET_MASK(rcd->flags, DMA_RTAIL)) {
	rcd->rcvhdrtail_kvaddr = dma_zalloc_coherent(
		&dd->pcidev->dev, PAGE_SIZE, &dma_hdrqtail,
                gfp_flags);
	if (!rcd->rcvhdrtail_kvaddr)
		goto bail_free;
	rcd->rcvhdrqtailaddr_dma = dma_hdrqtail;
}

So the rcvhdrtail_kvaddr would then be NULL.

The mmap logic fails to check for a NULL rcvhdrtail_kvaddr.

The fix is to test for both user and kernel DMA_TAIL options
during the allocation as well as testing for a NULL
rcvhdrtail_kvaddr during the mmap processing.

Additionally, all downstream testing of the capmask for DMA_RTAIL
have been eliminated in favor of testing rcvhdrtail_kvaddr.

Cc: <stable@vger.kernel.org> # 4.9.x
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-04 13:33:15 -06:00
Jason Gunthorpe
0394808d9e Merge branch 'mr_fix' into git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma for-next
Update mlx4 to support user MR creation against read-only memory, previously
it required the memory to be writable.

Based on rdma for-rc due to dependencies.

* mr_fix: (2 commits)
  IB/mlx4: Mark user MR as writable if actual virtual memory is writable
  IB/core: Make testing MR flags for writability a static inline function
2018-05-28 11:44:35 -06:00
Sebastian Sanchez
5d18ee67d4 IB/{hfi1, rdmavt, qib}: Implement CQ completion vector support
Currently the driver doesn't support completion vectors. These
are used to indicate which sets of CQs should be grouped together
into the same vector. A vector is a CQ processing thread that
runs on a specific CPU.

If an application has several CQs bound to different completion
vectors, and each completion vector runs on different CPUs, then
the completion queue workload is balanced. This helps scale as more
nodes are used.

Implement CQ completion vector support using a global workqueue
where a CQ entry is queued to the CPU corresponding to the CQ's
completion vector. Since the workqueue is global, it's guaranteed
to always be there when queueing CQ entries; Therefore, the RCU
locking for cq->rdi->worker in the hot path is superfluous.

Each completion vector is assigned to a different CPU. The number of
completion vectors available is computed by taking the number of
online, physical CPUs from the local NUMA node and subtracting the
CPUs used for kernel receive queues and the general interrupt.
Special use cases:

  * If there are no CPUs left for completion vectors, the same CPU
    for the general interrupt is used; Therefore, there would only
    be one completion vector available.

  * For multi-HFI systems, the number of completion vectors available
    for each device is the total number of completion vectors in
    the local NUMA node divided by the number of devices in the same
    NUMA node. If there's a division remainder, the first device to
    get initialized gets an extra completion vector.

Upon a CQ creation, an invalid completion vector could be specified.
Handle it as follows:

  * If the completion vector is less than 0, set it to 0.

  * Set the completion vector to the result of the passed completion
    vector moded with the number of device completion vectors
    available.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 15:53:30 -04:00
Kamenee Arumugam
c872a1f9e3 IB/Hfi1: Read CCE Revision register to verify the device is responsive
When Hfi1 device is unresponsive, reading the RcvArrayCnt register
will return all 1's. This value is then used to remap chip's RcvArray.
The incorrect all ones value used in remapping RcvArray
will cause warn on as shown by trace below:

[<ffffffff81685eac>] dump_stack+0x19/0x1b
[<ffffffff81085820>] warn_slowpath_common+0x70/0xb0
[<ffffffff810858bc>] warn_slowpath_fmt+0x5c/0x80
[<ffffffff81065c29>] __ioremap_caller+0x279/0x320
[<ffffffff8142873c>] ? _dev_info+0x6c/0x90
[<ffffffffa021d155>] ? hfi1_pcie_ddinit+0x1d5/0x330 [hfi1]
[<ffffffff81065d62>] ioremap_wc+0x32/0x40
[<ffffffffa021d155>] hfi1_pcie_ddinit+0x1d5/0x330 [hfi1]
[<ffffffffa0204851>] hfi1_init_dd+0x1d1/0x2440 [hfi1]
[<ffffffff813503dc>] ? pci_write_config_word+0x1c/0x20

Read CCE revision register first to verify that WFR device is
responsive. If the read return "all ones", bail out from init
and fail the driver load.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 15:53:30 -04:00
Mitko Haralanov
a74d5307ca IB/hfi1: Rework fault injection machinery
The packet fault injection code present in the HFI1 driver had some
issues which not only fragment the code but also created user
confusion. Furthermore, it suffered from the following issues:

  1. The fault_packet method only worked for received packets. This
     meant that the only fault injection mode available for sent
     packets is fault_opcode, which did not allow for random packet
     drops on all egressing packets.
  2. The mask available for the fault_opcode mode did not really work
     due to the fact that the opcode values are not bits in a bitmask but
     rather sequential integer values. Creating a opcode/mask pair that
     would successfully capture a set of packets was nearly impossible.
  3. The code was fragmented and used too many debugfs entries to
     operate and control. This was confusing to users.
  4. It did not allow filtering fault injection on a per direction basis -
     egress vs. ingress.

In order to improve or fix the above issues, the following changes have
been made:

   1. The fault injection methods have been combined into a single fault
      injection facility. As such, the fault injection has been plugged
      into both the send and receive code paths. Regardless of method used
      the fault injection will operate on both egress and ingress packets.
   2. The type of fault injection - by packet or by opcode - is now controlled
      by changing the boolean value of the file "opcode_mode". When the value
      is set to True, fault injection is done by opcode. Otherwise, by
      packet.
   2. The masking ability has been removed in favor of a bitmap that holds
      opcodes of interest (one bit per opcode, a total of 256 bits). This
      works in tandem with the "opcode_mode" value. When the value of
      "opcode_mode" is False, this bitmap is ignored. When the value is
      True, the bitmap lists all opcodes to be considered for fault injection.
      By default, the bitmap is empty. When the user wants to filter by opcode,
      the user sets the corresponding bit in the bitmap by echo'ing the bit
      position into the 'opcodes' file. This gets around the issue that the set
      of opcodes does not lend itself to effective masks and allow for extremely
      fine-grained filtering by opcode.
   4. fault_packet and fault_opcode methods have been combined. Hence, there
      is only one debugfs directory controlling the entire operation of the
      fault injection machinery. This reduces the number of debugfs entries
      and provides a more unified user experience.
   5. A new control files - "direction" - is provided to allow the user to
      control the direction of packets, which are subject to fault injection.
   6. A new control file - "skip_usec" - is added that would allow the user
      to specify a "timeout" during which no fault injection will occur.

In addition, the following bug fixes have been applied:

   1. The fault injection code has been split into its own header and source
      files. This was done to better organize the code and support conditional
      compilation without littering the code with #ifdef's.
   2. The method by which the TX PIO packets were being marked for drop
      conflicted with the way send contexts were being setup. As a result,
      the send context was repeatedly being reset.
   3. The fault injection only makes sense when the user can control it
      through the debugfs entries. However, a kernel configuration can
      enable fault injection but keep fault injection debugfs entries
      disabled. Therefore, it makes sense that the HFI fault injection
      code depends on both.
   4. Error suppression did not take into account the method by which PIO
      packets were being dropped. Therefore, even with error suppression
      turned on, errors would still be displayed to the screen. A larger
      enough packet drop percentage would case the kernel to crash because
      the driver would be stuck printing errors.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 15:53:30 -04:00
Michael J. Ruhl
e4607073ff IB/hfi1: Return correct value for device state
The driver_pstate() function is used to map internal driver state
information to externally defined states.

The VERIFY_CAP and GOING_UP states are config/training states, but
the mapping routing returns the POLLING value.

Update the return values for VERIFY_CAP and GOING_UP to return the
correct value: TRAINING.

Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 15:53:29 -04:00
Sebastian Sanchez
254361c189 IB/hfi1: Prevent LNI hang when LCB can't obtain lanes
When the LCB isn't able to get any lanes operational on the
first transition into mission mode, the link transfer active
never happens and the LNI stays in the polling state indefinitely.

Reset LCB upon receiving an 8051 interrupt for LCB to try to obtain
lanes with firmware version 1.25.0 or later. Also, update the LCB
reset value in other parts of the code with a macro defined to make
the code more maintainable and rename functions with the link_width
label to link_mode to reflect the fact that those functions set and
read link related data not just the link width.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 15:53:29 -04:00
Michael J. Ruhl
f9e76ca377 IB/hfi1: Use after free race condition in send context error path
A pio send egress error can occur when the PSM library attempts to
to send a bad packet.  That issue is still being investigated.

The pio error interrupt handler then attempts to progress the recovery
of the errored pio send context.

Code inspection reveals that the handling lacks the necessary locking
if that recovery interleaves with a PSM close of the "context" object
contains the pio send context.

The lack of the locking can cause the recovery to access the already
freed pio send context object and incorrectly deduce that the pio
send context is actually a kernel pio send context as shown by the
NULL deref stack below:

[<ffffffff8143d78c>] _dev_info+0x6c/0x90
[<ffffffffc0613230>] sc_restart+0x70/0x1f0 [hfi1]
[<ffffffff816ab124>] ? __schedule+0x424/0x9b0
[<ffffffffc06133c5>] sc_halted+0x15/0x20 [hfi1]
[<ffffffff810aa3ba>] process_one_work+0x17a/0x440
[<ffffffff810ab086>] worker_thread+0x126/0x3c0
[<ffffffff810aaf60>] ? manage_workers.isra.24+0x2a0/0x2a0
[<ffffffff810b252f>] kthread+0xcf/0xe0
[<ffffffff810b2460>] ? insert_kthread_work+0x40/0x40
[<ffffffff816b8798>] ret_from_fork+0x58/0x90
[<ffffffff810b2460>] ? insert_kthread_work+0x40/0x40

This is the best case scenario and other scenarios can corrupt the
already freed memory.

Fix by adding the necessary locking in the pio send context error
handler.

Cc: <stable@vger.kernel.org> # 4.9.x
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 10:39:50 -04:00
Kamenee Arumugam
0719007663 IB/hfi1: Convert PortXmitWait/PortVLXmitWait counters to flit times
HFI's counters SendWaitCnt and SendWaitVlCnt are in units
of TXE cycle time (at 805MHz). OPA counters PortXmitWait and
PortVLXmtWait are in units of flit times.
Convert the counter values to flit units using following
conversion formula:

PortXmitWait =
	SendWaitCnt * 2 * (4 /link_width) * (25 Gbps /link_speed)
PortVLXmitWait =
	SendWaitVLCnt * 2 * (4 /link_width) * (25 Gbps /link_speed)

At link up or downgrade events, the link width can change. To ensure
accurate counter calculations, sample the counters after the events,
during counter requests, and then aggregate the OPA counters.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-01 15:43:30 -07:00
Michael J. Ruhl
82a9792656 IB/hfi1: Re-order IRQ cleanup to address driver cleanup race
The pci_request_irq() interfaces always adds the IRQF_SHARED bit to
all IRQ requests.

When the kernel is built with CONFIG_DEBUG_SHIRQ config flag, if the
IRQF_SHARED bit is set, a call to the IRQ handler is made from the
__free_irq() function. This is testing a race condition between the
IRQ cleanup and an IRQ racing the cleanup.  The HFI driver should be
able to handle this race, but does not.

This race can cause traces that start with this footprint:

BUG: unable to handle kernel NULL pointer dereference at   (null)
Call Trace:
 <hfi1 irq handler>
 ...
 __free_irq+0x1b3/0x2d0
 free_irq+0x35/0x70
 pci_free_irq+0x1c/0x30
 clean_up_interrupts+0x53/0xf0 [hfi1]
 hfi1_start_cleanup+0x122/0x190 [hfi1]
 postinit_cleanup+0x1d/0x280 [hfi1]
 remove_one+0x233/0x250 [hfi1]
 pci_device_remove+0x39/0xc0

Export IRQ cleanup function so it can be called from other modules.

Using the exported cleanup function:

  Re-order the driver cleanup code to clean up IRQ resources before
  other resources, eliminating the race.

  Re-order error path for init so that the race does not occur.

Reduce severity on spurious error message for SDMA IRQs to info.

Reviewed-by: Alex Estrin <alex.estrin@intel.com>
Reviewed-by: Patel Jay P <jay.p.patel@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-01 15:24:32 -07:00
Michael J. Ruhl
11f0e89710 IB/{hfi1, qib}: Fix a concurrency issue with device name in logging
The get_unit_name() function crafts a string based on the device name
and the device unit number.  It then stores this in a static variable.

This has concurrency issues as can be seen with this log:

hfi1 0000:02:00.0: hfi1_1: read_idle_message: read idle message 0x203
hfi1 0000:01:00.0: hfi1_1: read_idle_message: read idle message 0x203

The PCI device ID (0000:02:00.0 vs. 0000:01:00.0) is correct for the
message, but the device string hfi1_1 is incorrect (it should be
hfi1_0 for the second log message).

Remove get_unit_name() function.

Instead, use the rvt accessor rvt_get_ibdev_name() to get the IB name
string.

Clean up any hfi1_early_xx calls that can now use the new path.

QIB has the same (qib_get_unit_name()) issue.  Updating as necessary.

Remove qib_get_unit_name() function.

Update log message that has redundant device name.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-05 13:34:55 -05:00
Sebastian Sanchez
9996b049f6 IB/hfi1: Fix infinite loop in 8051 command error path
When an 8051 command times out, the entire DC block is restarted. During
the restart, the host interface version bit is set, which calls
do_8051_command() recursively. The host version bit needs to be set
before the link moves into polling, so the host version bit can be set
in set_local_link_attributes() instead. Thus, the 8051 command functions
can be simplied as a non-locking version (dd->dc8051_lock) of those
functions are no longer needed.

Fixes: 9be6a5d788 ("IB/hfi1: Prevent LNI out of sync by resetting host interface version")
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-05 13:34:55 -05:00
Jan Sokolowski
e8d5aff650 IB/hfi1: Send 'reboot' as planned down remote reason
On host shutdown, driver sends 'SMA_Disabled' as a reason
for link down. This is incorrect.

Send 'reboot' as a linkdown reason.

Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Reviewed-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-13 15:53:57 -05:00
Niranjana Vishwanathapura
cc9a97ea2c IB/hfi1: Do not allocate PIO send contexts for VNIC
OPA VNIC does not use PIO contexts and instead only uses SDMA
engines. Do not allocate PIO contexts for VNIC ports.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-13 15:53:57 -05:00
Jan Sokolowski
641f348bbd IB/hfi1: Allow MgmtAllowed on B2B setups
HFI's are hard-wired to send Device Info frames with
MgmtAllowed bit set to 0. This means in B2B setups,
MgmtAllowed would never be allowed, which prevents
remote opa management tools from working properly.

Assume MgmtAllowed if a neighbor is also an HFI.

Fixes: 98b9ee2002 ("IB/hfi1: Cache neighbor secure data after link up")
Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-13 15:53:56 -05:00
Kamenee Arumugam
45a041cce7 IB/hfi1: Don't modify num_user_contexts module parameter
The driver parameter num_user_contexts controls global behavior and
should not be modified by the driver.
This patch eliminates modification of num_user_contexts by using a
local variable to keep track of the value.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-30 14:51:36 -04:00
Mike Marciniszyn
2d9544aacf IB/hfi1: Insure int mask for in-kernel receive contexts is clear
The only use for the urg interrupt is for priority PSM packets.

There is no reason for this interrupt to be enabled for kernel
contexts.

Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-30 14:51:36 -04:00
Patel Jay P
00f9203119 Ib/hfi1: Return actual operational VLs in port info query
__subn_get_opa_portinfo stores value returned by hfi1_get_ib_cfg() as
operational vls. hfi1_get_ib_cfg() returns vls_operational field in
hfi1_pportdata. The problem with this is that the value is always equal
to vls_supported field in hfi1_pportdata.

The logic to calculate operational_vls is to set value passed by FM
(in  __subn_set_opa_portinfo routine). If no value is passed then
default value is stored in operational_vls.

Field actual_vls_operational is calculated on the basis of buffer
control table. Hence, modifying hfi1_get_ib_cfg() to return
actual_operational_vls when used with HFI1_IB_CFG_OP_VLS parameter

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Patel Jay P <jay.p.patel@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-30 14:51:36 -04:00
Michael J. Ruhl
4061f3a4da IB/hfi1: Race condition between user notification and driver state
The handler for link init state (HLS_UP_INIT) notifies userspace
(update_statusp()) before enabling the device
(RCV_CTRL_RCV_PORT_ENABLE_SMASK) or setting the device state
(ppd->host_link_state).  This causes a race condition where the
userspace thinks the interface is in the INIT state before the driver
has set that state.

Rework the code path to eliminate the race.

Delay setting the init state until after a HW settling period.

Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-30 14:51:36 -04:00
Doug Ledford
894b82c427 Merge branch 'timer_setup' into for-next
Conflicts:
	drivers/infiniband/hw/cxgb4/cm.c
	drivers/infiniband/hw/qib/qib_driver.c
	drivers/infiniband/hw/qib/qib_mad.c

There were minor fixups needed in these files.  Just minor context diffs
due to patches from independent sources touching the same basic area.

Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-18 13:12:09 -04:00
Doug Ledford
754137a769 Merge branch 'for-next-early' into for-next
The early for-next branch was based on v4.14-rc2, while the shared pull
request I got from Mellanox used a v4.14-rc4 base.  I'm making the
branch that was the shared Mellanox pull request the new for-next branch
and merging the early for-next branch into it.

Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-18 13:07:13 -04:00
Kees Cook
8064135e8a IB/hfi1: Convert timers to use timer_setup()
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly. Switches test of .data field to
.function, since .data will be going away.

Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-18 11:48:19 -04:00
Doug Ledford
e527ff92b6 Merge branch 'hfi1' into k.o/for-next
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-18 10:15:14 -04:00
Jan Sokolowski
242b494bf2 IB/hfi1: Fix serdes loopback set-up
Change serdes mode setting to use MISC_CONFIG_BITS in
VERIFY_CAP_LOCAL_LINK_WIDTH register. This method of
setting up serdes loopback is universally compatible
across all firmware versions.

Reviewed-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-18 10:12:59 -04:00
Sebastian Sanchez
7ebfc93edc IB/rdmavt: Correct issues with read-mostly and send size cache lines
The s_ahgpsn was incorrectly placed in the read-mostly section of the QP
and the s_curr_size and s_hdrwords are oversized. The misplaced
s_ahgpsn will cause the read-mostly cachelines to thrash.

Place s_ahgpsn in the send side cache lines and correctly size and
s_hdrwords and s_cur_size to keep the send side cachelines at the same
size.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-04 15:39:45 -04:00
Sebastian Sanchez
9be6a5d788 IB/hfi1: Prevent LNI out of sync by resetting host interface version
When the link is disabled and re-enabled, the host version bit is not
set again, so the firmware behaves as though it’s interacting with an
old driver. This causes LNI to get out of sync. The host version bit
needs to be set at load_8051_firmware() and _dc_start(). Currently, it's
only set at load_8051_firmware().

Create a common function to set the bit with the intent to make the code
more maintainable in the future, set the host version bit at _dc_start()
and modify the 8051 command API to prevent a deadlock as _dc_start() is
already holding the dc8051 lock.

Fixes: 913cc67159 ("IB/hfi1: Always perform offline transition")
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-04 15:39:45 -04:00
Michael J. Ruhl
d7d626179f IB/hfi1: Fix incorrect available receive user context count
The addition of the VNIC contexts to num_rcv_contexts changes the
meaning of the sysfs value nctxts from available user contexts, to
user contexts + reserved VNIC contexts.

User applications that use nctxts are now broken.

Update the calculation so that VNIC contexts are used only if there are
hardware contexts available, and do not silently affect nctxts.

Update code to use the calculated VNIC context number.

Update the sysfs value nctxts to be available user contexts only.

Fixes: 2280740f01 ("IB/hfi1: Virtual Network Interface Controller (VNIC) HW support")
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Niranjana Vishwanathapura <Niranjana.Vishwanathapura@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: <Stable@vger.kernel.org> #v4.12+
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-04 15:39:44 -04:00
Michael J. Ruhl
d59075ad1e IB/hfi1: Add a safe wrapper for _rcd_get_by_index
hfi1_rcd_get_by_index assumes that the given index is in the correct
range.  In most cases this is correct because the index is bounded by
a loop.  For these cases, adding a range check to the function is
redundant.

For the use case that is not bounded by the loop range, a _safe wrapper
function is needed to validate the index before accessing the rcd array.

Add a _safe wrapper to _get_by_index to validate the index range.

Update appropriate call sites with the new _safe function.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-27 11:34:13 -04:00
Ira Weiny
156d24d700 IB/hfi1: Remove unused link_default variable
devdata->link_default is no longer variable

Maintain number of holes by moving dc_shutdown

Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-27 11:34:13 -04:00
Michael J. Ruhl
05cb18fda9 IB/hfi1: Update HFI to use the latest PCI API
The HFI PCI IRQ code uses an obsolete PCI API.  Update the code to use
the new PCI IRQ API and any necessary changes because of the new API.

Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-27 11:34:13 -04:00
Jakub Byczkowski
e870b4a1f5 IB/hfi1: Add new state complete decodes for LNI failures
Add state decodes for link width negotiation, verify cap time out
and secure data resolution failures.

Reviewed-by: Jan Sokolowski <jan.sokolowski@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-27 11:34:13 -04:00
Kamenee Arumugam
09592af5fd IB/hfi1: Return correct value in general interrupt handler
The general interrupt handler returns IRQ_HANDLED whether an IRQ
was handled or not.
Determine if an IRQ was handled and return the correct value.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-27 11:10:36 -04:00
Sebastian Sanchez
30e10527bc IB/hfi1: Only reset QSFP after link up and turn off AOC TX
QSFP reset enables AOC transmitters by default. They should be off
before moving to high power mode to complete the setup. There is no
need to reset the QSFP during LNI failure as it was reset at link down.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-27 11:10:36 -04:00
Sebastian Sanchez
df5efdd970 IB/hfi1: Turn off AOC TX after offline substates
Offline.quietDuration was added in the 8051 firmware, and the driver
only turns off the AOC transmitters when offline.quiet is reached.
However, the AOC transmitters need to be turned off at the new state.
Therefore, turn off the AOC transmitters at any offline substates
including offline.quiet and offline.quietDuration, then recheck we
reached offline.quiet to support backwards compatibility.

Reviewed-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-27 11:10:36 -04:00
Grzegorz Morys
de42de80d7 IB/hfi1: Ratelimit prints from sdma_interrupt
Ratelimit error prints from sdma_interrupt function
that could swarm dmesg otherwise.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Grzegorz Morys <grzegorz.morys@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-28 19:12:18 -04:00
Jakub Byczkowski
d392a673e7 IB/hfi1: Remove pstate from hfi1_pportdata
Do not track physical state separately from host_link_state.
Deduce physical state from host_link_state when required.
Change cache_physical_state to log_physical_state to make
sure host_link_state reflects hardwares physical state properly.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22 14:22:38 -04:00
Sebastian Sanchez
b6422bc012 IB/hfi1: Check xchg returned value for queuing link down entry
Check xchg returned value for queuing link down entry
to guarantee proper atomic value reads.

Fixes: 626c077c02 ("IB/hfi1: Prevent link down request double queuing")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22 14:22:37 -04:00
Dasaratharaman Chandramouli
51e658f5dd IB/rdmavt, hfi1, qib: Enhance rdmavt and hfi1 to use 32 bit lids
Increase lid used in hfi1 driver to 32 bits. qib continues
to use 16 bit lids.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22 14:22:37 -04:00
Don Hiatt
72c07e2b67 IB/hfi1: Add support to receive 16B bypass packets
We introduce a struct hfi1_16b_header to support 16B headers.
16B bypass packets are received by the driver and processed
similar to 9B packets. Add basic support to handle 16B packets.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22 14:22:37 -04:00
Michael J. Ruhl
d295dbeb2a IB/hf1: User context locking is inconsistent
There is a mixture of mutex and spinlocks to protect receive context
(rcd/uctxt) information.  This is not used consistently.

Use the mutex to protect device receive context information only.
Use the spinlock to protect sub context information only.

Protect access to items in the rcd array with a spinlock and
reference count.

Remove spinlock around dd->rcd array cleanup.  Since interrupts are
disabled and cleaned up before this point, this lock is not useful.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22 14:22:36 -04:00
Michael J. Ruhl
f2a3bc00a0 IB/hfi1: Protect context array set/clear with spinlock
The rcd array can be accessed from user context or during interrupts.
Protecting this with a mutex isn't a good idea because the mutex should
not be used from an IRQ.

Protect the allocation and freeing of rcd array elements with a
spinlock.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22 14:22:36 -04:00
Byczkowski, Jakub
02a222c7f6 IB/hfi1: Remove lstate from hfi1_pportdata
Do not track logical state separately from host_link_state. Deduce
logical state from host_link_state when required. Transitions in
set_link_state and goto_offline already make sure host_link_state
reflects hardware's logical state properly.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22 14:22:36 -04:00
Doug Ledford
d3cf4d9915 Merge branch 'misc' into k.o/for-next
Conflicts:
	drivers/infiniband/core/iwcm.c - The rdma_netlink patches in
	HEAD and the iwarp cm workqueue fix (don't use WQ_MEM_RECLAIM,
	we aren't safe for that context) touched the same code.

Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-18 14:10:23 -04:00
Colin Ian King
a63aa5dbed IB/hfi1: fix spelling mistake in variable name continious
Trivial fix to spelling mistake, rename variable 'continious'
to the correct spelling 'continuous'

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-18 11:29:22 -04:00
Sebastian Sanchez
913cc67159 IB/hfi1: Always perform offline transition
Always initiate an offline transition request
when a link down occurs. The firmware will
use this request to confirm that the driver
has seen the link down message. A host version
is set to indicate this driver behavior to the
firmware.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31 15:18:38 -04:00
Sebastian Sanchez
626c077c02 IB/hfi1: Prevent link down request double queuing
When link interrupts occur, multiple link down requests
could be queued up when only one is needed. This could get
the hfi1 out of sync with its link partner during LNI.

Only allow one link down request to be queued at any one time.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31 15:18:38 -04:00
Sebastian Sanchez
71d47008ca IB/hfi1: Create workqueue for link events
Currently, link down interrupts queue link entries
on a workqueue intended for sending events only.
Create a workqueue for queuing link events.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31 15:18:38 -04:00
Jan Sokolowski
96603ed865 IB/hfi1: Do not enable disabled port on cable insert
Fix issue where a disabled port can be enabled by
inserting a cable. The port should be explicitly
enabled instead.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31 15:18:37 -04:00
Alex Estrin
5efd40cad4 IB/hfi1: Harden state transition to Armed and Active
There is a window that allows other threads to read state of
'host_link_state' as a new, before the hardware actual state is set.
This patch closes the window by indicating a new state only after
hardware transition is complete.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31 15:18:37 -04:00
Alex Estrin
5e2d6764a7 IB/hfi1: Verify port data VLs credits on transition to Armed
There is a window where the FM can read the buffer control table
and decide not to program buffers. When a port goes down, the code
clears the table and if it is not programmed, posted SDMA descriptors
will never complete due to no buffer credits.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31 15:18:37 -04:00
Bartlomiej Dudek
a618b7e40a IB/hfi1: Move saving PCI values to a separate function
During PCIe initialization some registers' values from
PCI config space are saved in order to restore them later
(i.e. after reset). Restoring those value is done by a
function called restore_pci_variables, while saving them
is put directly into function hfi1_pcie_ddinit.
Move saving values to a separate function in the image
of restoring functionality.

Reviewed-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Bartlomiej Dudek <bartlomiej.dudek@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31 15:18:37 -04:00
Jan Sokolowski
59ec873695 IB/hfi1: Fix code consistency for if/else blocks in chip.c
Code structure is not consistent for if/else blocks and break
instructions in set_link_state for case HLS_UP_INIT. Physical
state uses break in case of an error and if/else blocks for
logical use cases. These blocks should be implemented consistently.

Reviewed-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31 15:17:55 -04:00
Michael J. Ruhl
2250563e2c IB/hfi1: Pass the context pointer rather than the index
The hfi1_rcvctrl() function receives an index which it then converts
to an rcd.  Since most functions have the rcd, use that instead.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31 15:17:55 -04:00
Michael J. Ruhl
17573972f4 IB/hfi1: Use context pointer rather than context index
The hfi1_<set|clear>_ctxt_<j|p>key functions take a context index and
look up the context based on that index.

Since the context index is being retrieved from the context, this
doesn't seem optimal.

Pass the context pointer for use, rather than the context index.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31 15:17:55 -04:00
Michael J. Ruhl
e6f7622df1 IB/hfi1: Size rcd array index correctly and consistently
The array index for the rcd array is sized several different ways
throughout the code.

Use the user interface size (u16) as the standard size and update the
necessary code to reflect this.

u16 is large enough for the largest amount of supported contexts.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31 15:17:55 -04:00
Mike Marciniszyn
cb51c5d2cd IB/hfi1: Fix bar0 mapping to use write combining
When the debugpat kernel boot flag is turned on the following
traces are printed:

[ 1884.793168] x86/PAT: Overlap at 0x90000000-0x92000000
[ 1884.803510] x86/PAT: reserve_memtype added [mem 0x91200000-0x9127ffff],
track uncached-minus, req write-combining, ret uncached-minus
[ 1884.818167] hfi1 0000:05:00.0: hfi1_0: WC Remapped RcvArray:
ffffc9000a980000

The ioremap_wc() clearly is not returning a write combining mapping due
to an overlap where the RcvArray is mapped in a uncached mapping prior
to creating the proposed write combining mapping.

The patch replaces the single base register for uncached CSRs that
used to overlap the RcvArray with two mappings.   One, kregbase1, from the
bar0 up to the RcvArray and another, kregbase2, from the end of the
RcvArray to the pio send buffer space.  A new dd field, base2_start,
is used to convert the zero-based offset in the CSR routines to the
correct kregbase1/kregbase2 mapping.  A single direct write of the
RcvArray CSRs is replaced with hfi1_put_tid() to insure correct access
using the new disjoint mapping.

Additionally, the kregend field is deleted since it is only ever written.

patdebug now shows the RcvArray as write combining:
[   35.688990] x86/PAT: reserve_memtype added [mem 0x91200000-0x9127ffff],
track write-combining, req write-combining, ret write-combining

To insulate from any potential issues with write combining, all
writeq are now flushed in hfi1_put_tid() and rcv_array_wc_fill().

Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31 15:17:54 -04:00
Bartlomiej Dudek
c53df62c7a IB/hfi1: Check return values from PCI config API calls
Ensure that return values from kernel PCI config access
API calls in HFI driver are checked and react properly if
they are not expected (i.e. not successful).

Reviewed-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Signed-off-by: Bartlomiej Dudek <bartlomiej.dudek@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31 15:01:36 -04:00
Dennis Dalessandro
91647f4c2d IB/hfi1: Ensure dd->gi_mask can not be overflowed
As the code stands today the array access in remap_intr() is OK. To
future proof the code though we should explicitly check to ensure the
index value is not outside of the valid range. This is not a straight
forward calculation so err on the side of caution.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-17 11:39:45 -04:00
Mike Marciniszyn
8cb1021b80 IB/hfi1: Add traces for TID operations
This patch adds a trace for putting a TID and
for writing the RcvArray CSR.

The CSR access template can be easily extended for additional
CSR readq/writeq calls.

Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-27 16:58:13 -04:00
Jan Sokolowski
702265fc00 IB/hfi1: Set proper logging levels on QSFP cable error events
Change QSFP cable error events logging levels from info to error.

Reviewed-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-27 16:58:12 -04:00
Bartlomiej Dudek
ddbf2efff4 IB/hfi1: Fix DC 8051 host info flag array
Fix info array of host message flags by adding entry
for link width downgrade and reverse values for
BC SMA and BC PWR_MSG messages

Reviewed-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Signed-off-by: Bartlomiej Dudek <bartlomiej.dudek@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-27 16:58:12 -04:00
Byczkowski, Jakub
bec7c79cd8 IB/hfi1: Modify handling of physical link state by Host Driver
Ensure states returned to the Fabric Manager are consistent with
the OPA specification by caching the physical state along with the
logical state.

Reviewed-by: Stuart Summers <john.s.summers@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Andrzej Kotlowski <andrzej.kotlowski@intel.com>
Signed-off-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-27 16:58:12 -04:00
Dennis Dalessandro
67838e64fa IB/hfi1: Fix spelling mistake in linkdown reason
Spell receive correctly in OPA_LINKDOWN_REASON_RCV_ERROR

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-27 16:58:12 -04:00
Dennis Dalessandro
bc54f6714c IB/hfi1: Ensure dd->gi_mask can not be overflowed
As the code stands today the array access in remap_intr() is OK. To
future proof the code though we should explicitly check to ensure the
index value is not outside of the valid range. This is not a straight
forward calculation so err on the side of caution.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-27 16:58:12 -04:00
Michael J. Ruhl
bb7dde8784 IB/hfi1: Replace deprecated pci functions with new API
pci_enable_msix_range() and pci_disable_msix() have been deprecated.
Updating to the new pci_alloc_irq_vectors() interface.

Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-27 16:58:11 -04:00
Don Hiatt
9039746cdf IB/hfi1: Setup common IB fields in hfi1_packet struct
We move many common IB fields into the hfi1_packet structure and
set them up in a single function. This allows us to set the fields
in a single place and not deal with them throughout the driver.

Reviewed-by: Brian Welty <brian.welty@intel.com>
Reviewed-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-27 16:56:33 -04:00
Byczkowski, Jakub
b3e6b4bdbb RDMA/hfi1: Defer setting VL15 credits to link-up interrupt
Keep VL15 credits at 0 during LNI, before link-up. Store
VL15 credits value during verify cap interrupt and set
in after link-up. This addresses an issue where VL15 MAD
packets could be sent by one side of the link before
the other side is ready to receive them.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-01 17:04:20 -04:00
Linus Torvalds
3341713c67 Updates #2 for 4.12 kernel merge window
- mlx5/IPoIB fixup patch
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJZC7idAAoJELgmozMOVy/d8bEP/31hq3Vr1unNJYjU8yEaWI0l
 D/k3ZSJcnLaeoUE5Ts69pYVkncAM/g55bmS80bKyNONRJp0BMU0MxmmzQC/iMqMM
 LAvDf3Qyv8rTrEgjjmjWVQrP/v7K0+teD6YMmTkfB8fWqhi/vR+84Sv3c50GUXpL
 OZMrn6p/IXptWa24kvi5/mKNSmeOBQkp/1lRxyi6bhAcFhjwQZoO4tDkOL6Aj1oA
 FFeUcBDwMsOjbqhH59sM9ryOnGXPEwJUnLOPfq36pIrFzyUtbbcmFfMmefjJWMLK
 XW4VBXywYLzkCvr2tGeXMukJzroUPQhD6G/4uwnFlwad0tovlBCwPmOmXHbnhRbh
 JmriZaQEGj43o3yH/HHSeRvbwJ7e71xbH0tW21K9cB/CuYTixwlvYAMd218qXQ+k
 Q6qBRueWlgsaDatlkuRV0ezrILvGQhn6QgEbCUIluDjKp/d6aRtI4IM60Y5G02U/
 WLvnMQJ/MaGQXPVR1gWqWpy1ojvj/mAeTQf6aKw1Icjd3oSPgAI06B/uThXoWxh/
 FyvDl/+HNW2StxPudQPKWkMjamaD/cToSa4HjLjdPER9HiCHmUru6nRZOufhwtwh
 q2NEPx5kraRQNEMqjpopzNIWPpGBNBm+htkqXmTRpDdfvKgQ29vL49GCcV1VHtj9
 TpYkqu/6ZVPItAAfyklh
 =Ihe9
 -----END PGP SIGNATURE-----
mergetag object 67cf3623e0
 type commit
 tag for-next
 tagger Doug Ledford <dledford@redhat.com> 1493940800 -0400
 
 Updates #3 for 4.12 kernel merge window
 
 - The hfi1 15 patch set that landed late
 - IPoIB get_link_ksettings which landed late because I asked for a
   respin
 - One late rxe change
 - One -rc worthy fix that's in early
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJZC7q5AAoJELgmozMOVy/dungQAKpWG7sq+23PhsXDTOGtKSRg
 6jrgt5lG+V4twAJogXazxYsanPLQOE6ItKUFaBqEh2fULGyzXyPv1PsLNAVi4MYR
 6vbsMh2qCW8+dy+uEBveirQzdwdJr8U7g1mvJYPDr1Rmaw852Sq+PbId3nEy/cFK
 boMn/zhSSJuF8VsTfFCHx1gVdRN7n7gclqBP7Phq4bzdZ8E0u5GIPjmPDePUTpwH
 r3yFSC3VV5JGGgOSUc66oPPoIgraAG9EQezn01Llk5n+HPJ3eM91spV1Y9tZ7EgA
 toNW+bxEmanFYtuR+qzd0ctozUt2Ke6AUdTgcZcSCFkIan2ZOssJ0y2UN3CsNxlQ
 rYcOBe5KBNcFVb5E3NoeG+iHo9jT86fBqzYheHKmQJy38xalAlOoX6mLOvn4pPzo
 DFYDd6PpEU9SQSmhpPj1UbhvCOzecBaV3cRF/TihnqlOrRiKtd/YCccqT5GguPCB
 RVW6h1qQ94rmr/k6gobT7lNDWn2bZAgUupUD19Nw+7YwauxjGhSFsXW357NVcs4B
 3a/uLNVvnoyIjm62Sm8YFz9Bz3pRutfP9b6pSJ5V4P7dBgXYFZU35buViwcAj5fL
 DNGx4a8Poi3VQR8lQjpK6MRCWAibUY27zW1nbWQrQ9TdcHhK4sJC9Mqe41W4FXpU
 qQLkYBChz+mkCzE44bN/
 =H9f5
 -----END PGP SIGNATURE-----

Merge tags 'for-linus' and 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull more rdma updates from Doug Ledford:
 "As mentioned in my first pull request, this is the subsequent pull
  requests I had. This is all I have, and in fact this cleans out the
  RDMA subsystem's entire patchworks queue of kernel changes that are
  ready to go (well, it did for the weekend anyway, a few new patches
  are in, but they'll be coming during the -rc cycle).

  The first tag contains a single patch that would have conflicted if
  taken from my tree or DaveM's tree as it needed our trees merged to
  come cleanly.

  The second tag contains the patch series from Intel plus three other
  stragllers that came in late last week. I took them because it allowed
  me to legitimately claim that the RDMA patchworks queue was, for a
  short time, 100% cleared of all waiting kernel patches, woohoo! :-).

  I have it under my for-next tag, so it did get 0day and linux- next
  over the end of last week, and linux-next did show one minor conflict.

  Summary:

  'for-linus' tag:
   - mlx5/IPoIB fixup patch

  'for-next' tag:
   - the hfi1 15 patch set that landed late
   - IPoIB get_link_ksettings which landed late because I asked for a
     respin
   - one late rxe change
   - one -rc worthy fix that's in early"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
  IB/mlx5: Enable IPoIB acceleration

* tag 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
  rxe: expose num_possible_cpus() cnum_comp_vectors
  IB/rxe: Update caller's CRC for RXE_MEM_TYPE_DMA memory type
  IB/hfi1: Clean up on context initialization failure
  IB/hfi1: Fix an assign/ordering issue with shared context IDs
  IB/hfi1: Clean up context initialization
  IB/hfi1: Correctly clear the pkey
  IB/hfi1: Search shared contexts on the opened device, not all devices
  IB/hfi1: Remove atomic operations for SDMA_REQ_HAVE_AHG bit
  IB/hfi1: Use filedata rather than filepointer
  IB/hfi1: Name function prototype parameters
  IB/hfi1: Fix a subcontext memory leak
  IB/hfi1: Return an error on memory allocation failure
  IB/hfi1: Adjust default eager_buffer_size to 8MB
  IB/hfi1: Get rid of divide when setting the tx request header
  IB/hfi1: Fix yield logic in send engine
  IB/hfi1, IB/rdmavt: Move r_adefered to r_lock cache line
  IB/hfi1: Fix checks for Offline transient state
  IB/ipoib: add get_link_ksettings in ethtool
2017-05-08 20:07:29 -07:00
Linus Torvalds
857f864014 pci-v4.12-changes
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJZEHmsAAoJEFmIoMA60/r88SgQAJbFddueb0+DfJ+USDud4b/Z
 akfS+G1UAm+TgtMyh1wM49dHzFssp36uWJxtWI+bPqBzuy94PMCbz7JVUV28gX9G
 tFhFuc5YH94I/3y85rbZnolb6uZN9MhLjzTFqDC9ilW6HFqmwK4t4wlHSCjQN1St
 svLYvs2G6n6/VK3Fre7/wOvdZ1erG4Qod+kn5Tx3K5TQydmRlaSBfK+DRANuDBkM
 KzGO7Bkc/Cx8hb9pHmaey/wxmNrrgmVjTtWrEnb2tEq833zP4h6GhUIJEKodMSi5
 gXPNZgKlu3n5L592M0UCh4EoHejzkv9wrcsoDm+djmsc5Zg2Howq4kAdHP8k4hUG
 0gt8n0ni9vhJN56jikrGi7cAdHCKSNnx2Ue/qTCbX0ncB3XUMuJxJwCsgW/6wa9f
 oU7tRtTS03UltnKoFAcyYclS4TaSY4SA4ySaK6Hi+cRkdVFDdyHQYbHHNSU7MsA+
 IS2tXvGoIdSYyrZMHSRcl2rRTfYQUkmPEvBF3LvqZr32M4mJMmUNAPLZaly373ZE
 iwq0ZJlrLeM0cqdFIG3S60RtJyQk/HBN1NMqrYHArWOxvWIgNd5F8NCsTTxY3wU3
 IxgBIuUFcbVwVkqEHGs8K5AvB3oghqdnA3eGOV79799eMtLn3LOvyIlpHMSw9WUq
 ags00JtMLitfNPBH3eSl
 =eE4D
 -----END PGP SIGNATURE-----

Merge tag 'pci-v4.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:

 - add framework for supporting PCIe devices in Endpoint mode (Kishon
   Vijay Abraham I)

 - use non-postable PCI config space mappings when possible (Lorenzo
   Pieralisi)

 - clean up and unify mmap of PCI BARs (David Woodhouse)

 - export and unify Function Level Reset support (Christoph Hellwig)

 - avoid FLR for Intel 82579 NICs (Sasha Neftin)

 - add pci_request_irq() and pci_free_irq() helpers (Christoph Hellwig)

 - short-circuit config access failures for disconnected devices (Keith
   Busch)

 - remove D3 sleep delay when possible (Adrian Hunter)

 - freeze PME scan before suspending devices (Lukas Wunner)

 - stop disabling MSI/MSI-X in pci_device_shutdown() (Prarit Bhargava)

 - disable boot interrupt quirk for ASUS M2N-LR (Stefan Assmann)

 - add arch-specific alignment control to improve device passthrough by
   avoiding multiple BARs in a page (Yongji Xie)

 - add sysfs sriov_drivers_autoprobe to control VF driver binding
   (Bodong Wang)

 - allow slots below PCI-to-PCIe "reverse bridges" (Bjorn Helgaas)

 - fix crashes when unbinding host controllers that don't support
   removal (Brian Norris)

 - add driver for MicroSemi Switchtec management interface (Logan
   Gunthorpe)

 - add driver for Faraday Technology FTPCI100 host bridge (Linus
   Walleij)

 - add i.MX7D support (Andrey Smirnov)

 - use generic MSI support for Aardvark (Thomas Petazzoni)

 - make Rockchip driver modular (Brian Norris)

 - advertise 128-byte Read Completion Boundary support for Rockchip
   (Shawn Lin)

 - advertise PCI_EXP_LNKSTA_SLC for Rockchip root port (Shawn Lin)

 - convert atomic_t to refcount_t in HV driver (Elena Reshetova)

 - add CPU IRQ affinity in HV driver (K. Y. Srinivasan)

 - fix PCI bus removal in HV driver (Long Li)

 - add support for ThunderX2 DMA alias topology (Jayachandran C)

 - add ThunderX pass2.x 2nd node MCFG quirk (Tomasz Nowicki)

 - add ITE 8893 bridge DMA alias quirk (Jarod Wilson)

 - restrict Cavium ACS quirk only to CN81xx/CN83xx/CN88xx devices
   (Manish Jaggi)

* tag 'pci-v4.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (146 commits)
  PCI: Don't allow unbinding host controllers that aren't prepared
  ARM: DRA7: clockdomain: Change the CLKTRCTRL of CM_PCIE_CLKSTCTRL to SW_WKUP
  MAINTAINERS: Add PCI Endpoint maintainer
  Documentation: PCI: Add userguide for PCI endpoint test function
  tools: PCI: Add sample test script to invoke pcitest
  tools: PCI: Add a userspace tool to test PCI endpoint
  Documentation: misc-devices: Add Documentation for pci-endpoint-test driver
  misc: Add host side PCI driver for PCI test function device
  PCI: Add device IDs for DRA74x and DRA72x
  dt-bindings: PCI: dra7xx: Add DT bindings to enable unaligned access
  PCI: dwc: dra7xx: Workaround for errata id i870
  dt-bindings: PCI: dra7xx: Add DT bindings for PCI dra7xx EP mode
  PCI: dwc: dra7xx: Add EP mode support
  PCI: dwc: dra7xx: Facilitate wrapper and MSI interrupts to be enabled independently
  dt-bindings: PCI: Add DT bindings for PCI designware EP mode
  PCI: dwc: designware: Add EP mode support
  Documentation: PCI: Add binding documentation for pci-test endpoint function
  ixgbe: Use pcie_flr() instead of duplicating it
  IB/hfi1: Use pcie_flr() instead of duplicating it
  PCI: imx6: Fix spelling mistake: "contol" -> "control"
  ...
2017-05-08 19:03:25 -07:00
Michael J. Ruhl
9b60d2cbe0 IB/hfi1: Clean up context initialization
Context initialization mixes base context init with sub context init.
This is bad because contexts can be reused, and on reuse, reinit things
that should not re-initialized.

Normalize comments and function names to refer to base context and
sub context (not main, shared or slaves).

Separate the base context initialization from sub context initialization.

hfi1_init_ctxt() cannot return an error so changed to a void and remove
error message.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-04 19:31:46 -04:00
Michael J. Ruhl
637a9a7feb IB/hfi1: Correctly clear the pkey
In the close path the context is removed from the device array, and then
the clear pkey function is called.  The pkey function trys to get the
context from the device array, but because it was removed the clearing
does not occur.

Rework pkey clear function to work as expected.  Update the function
variable to reflect the correct size and name of the hw_context.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-04 19:31:46 -04:00
Michael J. Ruhl
f4cd876529 IB/hfi1: Name function prototype parameters
To improve the readability of function prototypes, give the parameters
names.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-04 19:31:46 -04:00
Jakub Byczkowski
02d1008bcf IB/hfi1: Fix checks for Offline transient state
In goto_offline() function pstate is masked by 0xff when compared
to PLS_OFFLINE state. Mask should be 0xf0, since upper 4 bits
specify the "major" state.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-04 19:31:46 -04:00
Dennis Dalessandro
ee495ada5c IB/hfi1: Fix unbalanced braces around else
Add missing braces around else blocks in a few places to make checkpatch
happy.

Fixes: 7724105686 ("IB/hfi1: add driver files")
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28 13:56:23 -04:00
Dennis Dalessandro
a498fbcd87 IB/hfi1: Fix misspelling in comment
Checkpatch flagged a misspelled word. Fix it.

Fixes: 8764522e52 ("staging/rdma/hfi1: Unexpected link up pkey values are not an error")
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28 13:56:22 -04:00
Stuart Summers
98b9ee2002 IB/hfi1: Cache neighbor secure data after link up
Secure data is transferred across the link during verify
cap. This includes Neighbor Guid, Type, and Port Number.
This transfer is not guaranteed to complete until the 8051
firmware has completed processing of the state_complete
frame. Move the consumption of this data from verify cap
handling to link up handling to ensure the data is finalized.

Additionally, do not notify the SM that the link is up until
after this data is actually available.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Stuart Summers <john.s.summers@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28 13:56:20 -04:00
Neel Desai
03e80e9a57 IB/hfi1: Adjust high temperature warning for QSFP cable
When we receive a QSFP_HIGH_TEMP_ALARM or QSFP_HIGH_TEMP_WARNING
interrupt, print a "QSFP cable temperature too high" message.

Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Neel Desai <neel.desai@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28 13:56:20 -04:00