linux-loongson

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson synced 2025-08-29 02:59:13 +00:00

Author	SHA1	Message	Date
Bob Pearson	b00683422f	RDMA/rxe: Fix ref count error in check_rkey() There is a reference count error in error path code and a potential race in check_rkey() in rxe_resp.c. When looking up the rkey for a memory window the reference to the mw from rxe_lookup_mw() is dropped before a reference is taken on the mr referenced by the mw. If the mr is destroyed immediately after the call to rxe_put(mw) the mr pointer is unprotected and may end up pointing at freed memory. The rxe_get(mr) call should take place before the rxe_put(mw) call. All errors in check_rkey() call rxe_put(mw) if mw is not NULL but it was already called after the above. The mw pointer should be set to NULL after the rxe_put(mw) call to prevent this from happening. Fixes: `cdd0b85675` ("RDMA/rxe: Implement memory access through MWs") Link: https://lore.kernel.org/r/20230517211509.1819998-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-06-01 14:27:36 -03:00
Bob Pearson	9a3763e873	RDMA/rxe: Fix packet length checks In rxe_net.c a received packet, from udp or loopback, is passed to rxe_rcv() in rxe_recv.c as a udp packet. I.e. skb->data is pointing at the udp header. But rxe_rcv() makes length checks to verify the packet is long enough to hold the roce headers as if it were a roce packet. I.e. skb->data pointing at the bth header. A runt packet would appear to have 8 more bytes than it actually does which may lead to incorrect behavior. This patch calls skb_pull() to adjust the skb to point at the bth header before calling rxe_rcv() which fixes this error. Fixes: `8700e3e7c4` ("Soft RoCE driver") Link: https://lore.kernel.org/r/20230517172242.1806340-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-06-01 14:27:25 -03:00
Nicolas Morey	84510a61ef	RDMA/rxe: Remove dangling declaration of rxe_cq_disable() rxe_cq_disable() has been removed but not its declaration. Fixes: `78b26a3353` ("RDMA/rxe: Remove tasklet call from rxe_cq.c") Link: https://lore.kernel.org/r/4f20ffc5-b2c4-0c11-2883-a835caf01a94@suse.com Signed-off-by: Nicolas Morey <nmorey@suse.com> Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-06-01 12:59:32 -03:00
David Howells	c2ff29e99a	siw: Inline do_tcp_sendpages() do_tcp_sendpages() is now just a small wrapper around tcp_sendmsg_locked(), so inline it, allowing do_tcp_sendpages() to be removed. This is part of replacing ->sendpage() with a call to sendmsg() with MSG_SPLICE_PAGES set. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com> Reviewed-by: Tom Talpey <tom@talpey.com> cc: Jason Gunthorpe <jgg@ziepe.ca> cc: Leon Romanovsky <leon@kernel.org> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-23 20:48:27 -07:00
Daisuke Matsuda	42b0a5e691	RDMA/rxe: Fix comments about removed tasklets The commit `9b4b7c1f9f` ("RDMA/rxe: Add workqueue support for rxe tasks") removed tasklets and replaced them with a workqueue, but relevant comments are still remaining in the source code. Fixes: `9b4b7c1f9f` ("RDMA/rxe: Add workqueue support for rxe tasks") Link: https://lore.kernel.org/r/20230518070027.942715-1-matsuda-daisuke@fujitsu.com Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com> Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-05-19 12:02:26 -03:00
Bob Pearson	9b4b7c1f9f	RDMA/rxe: Add workqueue support for rxe tasks Replace tasklets by work queues for the three main rxe tasklets: rxe_requester, rxe_completer and rxe_responder. work queues are a more modern way to process work from an IRQ and provide more control over how that work is run for future patches. Link: https://lore.kernel.org/r/20230428171321.5774-1-rpearsonhpe@gmail.com Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Reviewed-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Tested-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-05-17 15:34:25 -03:00
Guoqing Jiang	b5f3fe27c5	RDMA/rxe: Convert spin_{lock_bh,unlock_bh} to spin_{lock_irqsave,unlock_irqrestore} We need to call spin_lock_irqsave()/spin_unlock_irqrestore() for state_lock in rxe, otherwsie the callchain: ib_post_send_mad -> spin_lock_irqsave -> ib_post_send -> rxe_post_send -> spin_lock_bh -> spin_unlock_bh -> spin_unlock_irqrestore Causes below traces during run block nvmeof-mp/001 test due to mismatched spinlock nesting: WARNING: CPU: 0 PID: 94794 at kernel/softirq.c:376 __local_bh_enable_ip+0xc2/0x140 [ ... ] CPU: 0 PID: 94794 Comm: kworker/u4:1 Tainted: G E 6.4.0-rc1 #9 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014 Workqueue: rdma_cm cma_work_handler [rdma_cm] RIP: 0010:__local_bh_enable_ip+0xc2/0x140 Code: 48 85 c0 74 72 5b 41 5c 5d 31 c0 89 c2 89 c1 89 c6 89 c7 41 89 c0 e9 bd 0e 11 01 65 8b 05 f2 65 72 48 85 c0 0f 85 76 ff ff ff <0f> 0b e9 6f ff ff ff e8 d2 39 1c 00 eb 80 4c 89 e7 e8 68 ad 0a 00 RSP: 0018:ffffb7cf818539f0 EFLAGS: 00010046 RAX: 0000000000000000 RBX: 0000000000000201 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000201 RDI: ffffffffc0f25f79 RBP: ffffb7cf81853a00 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffc0f25f79 R13: ffff8db1f0fa6000 R14: ffff8db2c63ff000 R15: 00000000000000e8 FS: 0000000000000000(0000) GS:ffff8db33bc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000559758db0f20 CR3: 0000000105124000 CR4: 00000000003506f0 Call Trace: <TASK> _raw_spin_unlock_bh+0x31/0x40 rxe_post_send+0x59/0x8b0 [rdma_rxe] ib_send_mad+0x26b/0x470 [ib_core] ib_post_send_mad+0x150/0xb40 [ib_core] ? cm_form_tid+0x5b/0x90 [ib_cm] ib_send_cm_req+0x7c8/0xb70 [ib_cm] rdma_connect_locked+0x433/0x940 [rdma_cm] nvme_rdma_cm_handler+0x5d7/0x9c0 [nvme_rdma] cma_cm_event_handler+0x4f/0x170 [rdma_cm] cma_work_handler+0x6a/0xe0 [rdma_cm] process_one_work+0x2a9/0x580 worker_thread+0x52/0x3f0 ? __pfx_worker_thread+0x10/0x10 kthread+0x109/0x140 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2c/0x50 </TASK> raw_local_irq_restore() called with IRQs enabled WARNING: CPU: 0 PID: 94794 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x37/0x60 [ ... ] CPU: 0 PID: 94794 Comm: kworker/u4:1 Tainted: G W E 6.4.0-rc1 #9 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014 Workqueue: rdma_cm cma_work_handler [rdma_cm] RIP: 0010:warn_bogus_irq_restore+0x37/0x60 Code: fb 01 77 36 83 e3 01 74 0e 48 8b 5d f8 c9 31 f6 89 f7 e9 ac ea 01 00 48 c7 c7 e0 52 33 b9 c6 05 bb 1c 69 01 01 e8 39 24 f0 fe <0f> 0b 48 8b 5d f8 c9 31 f6 89 f7 e9 89 ea 01 00 0f b6 f3 48 c7 c7 RSP: 0018:ffffb7cf81853a58 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffffb7cf81853a60 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff8db2cfb1a9e8 R13: ffff8db2cfb1a9d8 R14: ffff8db2c63ff000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff8db33bc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000559758db0f20 CR3: 0000000105124000 CR4: 00000000003506f0 Call Trace: <TASK> _raw_spin_unlock_irqrestore+0x91/0xa0 ib_send_mad+0x1e3/0x470 [ib_core] ib_post_send_mad+0x150/0xb40 [ib_core] ? cm_form_tid+0x5b/0x90 [ib_cm] ib_send_cm_req+0x7c8/0xb70 [ib_cm] rdma_connect_locked+0x433/0x940 [rdma_cm] nvme_rdma_cm_handler+0x5d7/0x9c0 [nvme_rdma] cma_cm_event_handler+0x4f/0x170 [rdma_cm] cma_work_handler+0x6a/0xe0 [rdma_cm] process_one_work+0x2a9/0x580 worker_thread+0x52/0x3f0 ? __pfx_worker_thread+0x10/0x10 kthread+0x109/0x140 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2c/0x50 </TASK> Fixes: `f605f26ea1` ("RDMA/rxe: Protect QP state with qp->state_lock") Link: https://lore.kernel.org/r/20230510035056.881196-1-guoqing.jiang@linux.dev Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-05-16 21:07:33 -03:00
Bob Pearson	17eabd6a04	RDMA/rxe: Fix double unlock in rxe_qp.c A recent patch can cause a double spin_unlock_bh() in rxe_qp_to_attr() at line 715 in rxe_qp.c. Move the 2nd unlock into the if statement. Fixes: `f605f26ea1` ("RDMA/rxe: Protect QP state with qp->state_lock") Link: https://lore.kernel.org/r/20230515201056.1591140-1-rpearsonhpe@gmail.com Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/27773078-40ce-414f-8b97-781954da9f25@kili.mountain Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-05-16 16:52:45 -03:00
Linus Torvalds	af3877265d	v6.4 merge window RDMA pull request Usual wide collection of unrelated items in drivers: - Driver bug fixes and treewide cleanups in hfi1, siw, qib, mlx5, rxe, usnic, usnic, bnxt_re, ocrdma, iser * Unnecessary NULL checks * kmap obsolescence * pci_enable_pcie_error_reporting() obsolescence * Unused variables and macros * trace event related warnings * casting warnings - Code cleanups for irdm and erdma - EFA reporting of 128 byte PCIe TLP support - mlx5 more agressively uses the out of order HW feature - Big rework of how state machines and tasks work in rxe - Fix a syzkaller found crash netdev refcount leak in siw - bnxt_re revises their HW description header - Congestion control for bnxt_re - Use mmu_notifiers more safely in hfi1 - mlx5 gets better support for PCIe relaxed ordering inside VMs -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZEva5wAKCRCFwuHvBreF YZFmAQC9T3b/XQ3bRknYciuzbatC98o9xB0FTqmEFYGj+Y2lVAD9EEVe3HKfHfi3 t/GxXYB5r22oxg5bgsblZfEdEdTVCg8= =akMm -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull rdma updates from Jason Gunthorpe: "Usual wide collection of unrelated items in drivers: - Driver bug fixes and treewide cleanups in hfi1, siw, qib, mlx5, rxe, usnic, usnic, bnxt_re, ocrdma, iser: - remove unnecessary NULL checks - kmap obsolescence - pci_enable_pcie_error_reporting() obsolescence - unused variables and macros - trace event related warnings - casting warnings - Code cleanups for irdm and erdma - EFA reporting of 128 byte PCIe TLP support - mlx5 more agressively uses the out of order HW feature - Big rework of how state machines and tasks work in rxe - Fix a syzkaller found crash netdev refcount leak in siw - bnxt_re revises their HW description header - Congestion control for bnxt_re - Use mmu_notifiers more safely in hfi1 - mlx5 gets better support for PCIe relaxed ordering inside VMs" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (81 commits) RDMA/efa: Add rdma write capability to device caps RDMA/mlx5: Use correct device num_ports when modify DC RDMA/irdma: Drop spurious WQ_UNBOUND from alloc_ordered_workqueue() call RDMA/rxe: Fix spinlock recursion deadlock on requester RDMA/mlx5: Fix flow counter query via DEVX RDMA/rxe: Protect QP state with qp->state_lock RDMA/rxe: Move code to check if drained to subroutine RDMA/rxe: Remove qp->req.state RDMA/rxe: Remove qp->comp.state RDMA/rxe: Remove qp->resp.state RDMA/mlx5: Allow relaxed ordering read in VFs and VMs net/mlx5: Update relaxed ordering read HCA capabilities RDMA/mlx5: Check pcie_relaxed_ordering_enabled() in UMR RDMA/mlx5: Remove pcie_relaxed_ordering_enabled() check for RO write RDMA: Add ib_virt_dma_to_page() RDMA/rxe: Fix the error "trying to register non-static key in rxe_cleanup_task" RDMA/irdma: Slightly optimize irdma_form_ah_cm_frame() RDMA/rxe: Fix incorrect TASKLET_STATE_SCHED check in rxe_task.c IB/hfi1: Place struct mmu_rb_handler on cache line start IB/hfi1: Fix bugs with non-PAGE_SIZE-end multi-iovec user SDMA requests ...	2023-04-29 17:21:24 -07:00
Daisuke Matsuda	10af303192	RDMA/rxe: Fix spinlock recursion deadlock on requester The following deadlock is observed: Call Trace: <IRQ> _raw_spin_lock_bh+0x29/0x30 check_type_state.constprop.0+0x4e/0xc0 [rdma_rxe] rxe_rcv+0x173/0x3d0 [rdma_rxe] rxe_udp_encap_recv+0x69/0xd0 [rdma_rxe] ? __pfx_rxe_udp_encap_recv+0x10/0x10 [rdma_rxe] udp_queue_rcv_one_skb+0x258/0x520 udp_unicast_rcv_skb+0x75/0x90 __udp4_lib_rcv+0x364/0x5c0 ip_protocol_deliver_rcu+0xa7/0x160 ip_local_deliver_finish+0x73/0xa0 ip_sublist_rcv_finish+0x80/0x90 ip_sublist_rcv+0x191/0x220 ip_list_rcv+0x132/0x160 __netif_receive_skb_list_core+0x297/0x2c0 netif_receive_skb_list_internal+0x1c5/0x300 napi_complete_done+0x6f/0x1b0 virtnet_poll+0x1f4/0x2d0 [virtio_net] __napi_poll+0x2c/0x1b0 net_rx_action+0x293/0x350 ? __napi_schedule+0x79/0x90 __do_softirq+0xcb/0x2ab __irq_exit_rcu+0xb9/0xf0 common_interrupt+0x80/0xa0 </IRQ> <TASK> asm_common_interrupt+0x22/0x40 RIP: 0010:_raw_spin_lock+0x17/0x30 rxe_requester+0xe4/0x8f0 [rdma_rxe] ? xas_load+0x9/0xa0 ? xa_load+0x70/0xb0 do_task+0x64/0x1f0 [rdma_rxe] rxe_post_send+0x54/0x110 [rdma_rxe] ib_uverbs_post_send+0x5f8/0x680 [ib_uverbs] ? netif_receive_skb_list_internal+0x1e3/0x300 ib_uverbs_write+0x3c8/0x500 [ib_uverbs] vfs_write+0xc5/0x3b0 ksys_write+0xab/0xe0 ? syscall_trace_enter.constprop.0+0x126/0x1a0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc </TASK> The deadlock is easily reproducible with perftest. Fix it by disabling softirq when acquiring the lock in process context. Fixes: `f605f26ea1` ("RDMA/rxe: Protect QP state with qp->state_lock") Link: https://lore.kernel.org/r/20230418090642.1849358-1-matsuda-daisuke@fujitsu.com Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-04-21 12:33:00 -03:00
Linus Torvalds	e1f2750edc	x86: remove 'zerorest' argument from __copy_user_nocache() Every caller passes in zero, meaning they don't want any partial copy to zero the remainder of the destination buffer. Which is just as well, because the implementation of that function didn't actually even look at that argument, and wasn't even aware it existed, although some misleading comments did mention it still. The 'zerorest' thing is a historical artifact of how "copy_from_user()" worked, in that it would zero the rest of the kernel buffer that it copied into. That zeroing still exists, but it's long since been moved to generic code, and the raw architecture-specific code doesn't do it. See _copy_from_user() in lib/usercopy.c for this all. However, while __copy_user_nocache() shares some history and superficial other similarities with copy_from_user(), it is in many ways also very different. In particular, while the code makes it look similar to the generic user copy functions that can copy both to and from user space, and take faults on both reads and writes as a result, __copy_user_nocache() does no such thing at all. __copy_user_nocache() always copies to kernel space, and will never take a page fault on the destination. What can happen, though, is that the non-temporal stores take a machine check because one of the use cases is for writing to stable memory, and any memory errors would then take synchronous faults. So __copy_user_nocache() does look a lot like copy_from_user(), but has faulting behavior that is more akin to our old copy_in_user() (which no longer exists, but copied from user space to user space and could fault on both source and destination). And it very much does not have the "zero the end of the destination buffer", since a problem with the destination buffer is very possibly the very source of the partial copy. So this whole thing was just a confusing historical artifact from having shared some code with a completely different function with completely different use cases. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2023-04-19 19:09:52 -07:00
Bob Pearson	f605f26ea1	RDMA/rxe: Protect QP state with qp->state_lock Currently the rxe driver makes little effort to make the changes to qp state (which includes qp->attr.qp_state, qp->attr.sq_draining and qp->valid) atomic between different client threads and IO threads. In particular a common template is for an RDMA application to call ib_modify_qp() to move a qp to ERR state and then wait until all the packet and work queues have drained before calling ib_destroy_qp(). None of these state changes are protected by locks to assure that the changes are executed atomically and that memory barriers are included. This has been observed to lead to incorrect behavior around qp cleanup. This patch continues the work of the previous patches in this series and adds locking code around qp state changes and lookups. Link: https://lore.kernel.org/r/20230405042611.6467-5-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-04-17 16:34:04 -03:00
Bob Pearson	7b560b89a0	RDMA/rxe: Move code to check if drained to subroutine Move two blocks of code in rxe_comp.c and rxe_req.c to subroutines that check if draining is complete in the SQD state and, if so, generate a SQ_DRAINED event. Link: https://lore.kernel.org/r/20230405042611.6467-4-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-04-17 16:01:44 -03:00
Bob Pearson	98e891b5e4	RDMA/rxe: Remove qp->req.state The rxe driver has four different QP state variables, qp->attr.qp_state, qp->req.state, qp->comp.state, and qp->resp.state. All of these basically carry the same information. This patch replaces uses of qp->req.state by qp->attr.qp_state and enum rxe_qp_state. This is the third of three patches which will remove all but the qp->attr.qp_state variable. This will bring the driver closer to the IBA description. Link: https://lore.kernel.org/r/20230405042611.6467-3-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-04-17 16:01:44 -03:00
Bob Pearson	f55efc2ed2	RDMA/rxe: Remove qp->comp.state The rxe driver has four different QP state variables, qp->attr.qp_state, qp->req.state, qp->comp.state, and qp->resp.state. All of these basically carry the same information. This patch replaces uses of qp->comp.state by qp->attr.qp_state. This is the second of three patches which will remove all but the qp->attr.qp_state variable. This will bring the driver closer to the IBA description. Link: https://lore.kernel.org/r/20230405042611.6467-2-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-04-17 16:01:44 -03:00
Bob Pearson	a588429a66	RDMA/rxe: Remove qp->resp.state The rxe driver has four different QP state variables, qp->attr.qp_state, qp->req.state, qp->comp.state, and qp->resp.state. All of these basically carry the same information. This patch replaces uses of qp->resp.state by qp->attr.qp_state. This is the first of three patches which will remove all but the qp->attr.qp_state variable. This will bring the driver closer to the IBA description. Link: https://lore.kernel.org/r/20230405042611.6467-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-04-17 16:01:44 -03:00
Jason Gunthorpe	8d7c7c0eeb	RDMA: Add ib_virt_dma_to_page() Make it clearer what is going on by adding a function to go back from the "virtual" dma_addr to a kva and another to a struct page. This is used in the ib_uses_virt_dma() style drivers (siw, rxe, hfi, qib). Call them instead of a naked casting and virt_to_page() when working with dma_addr values encoded by the various ib_map functions. This also fixes the virt_to_page() casting problem Linus Walleij has been chasing. Cc: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/0-v2-05ea785520ed+10-ib_virt_page_jgg@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-04-16 11:08:07 +03:00
Zhu Yanjun	b2b1ddc457	RDMA/rxe: Fix the error "trying to register non-static key in rxe_cleanup_task" In the function rxe_create_qp(), rxe_qp_from_init() is called to initialize qp, internally things like rxe_init_task are not setup until rxe_qp_init_req(). If an error occurred before this point then the unwind will call rxe_cleanup() and eventually to rxe_qp_do_cleanup()/rxe_cleanup_task() which will oops when trying to access the uninitialized spinlock. If rxe_init_task is not executed, rxe_cleanup_task will not be called. Reported-by: syzbot+cfcc1a3c85be15a40cba@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?id=fd85757b74b3eb59f904138486f755f71e090df8 Fixes: `8700e3e7c4` ("Soft RoCE driver") Fixes: `2d4b21e0a2` ("IB/rxe: Prevent from completer to operate on non valid QP") Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Link: https://lore.kernel.org/r/20230413101115.1366068-1-yanjun.zhu@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-04-16 10:51:33 +03:00
Bob Pearson	67a00d29c3	RDMA/rxe: Fix incorrect TASKLET_STATE_SCHED check in rxe_task.c In a previous patch TASKLET_STATE_SCHED was used as a mask but it is a bit position instead. Add the missing shift. Link: https://lore.kernel.org/r/20230329193308.7489-1-rpearsonhpe@gmail.com Reported-by: Dan Carpenter <error27@gmail.com> Link: https://lore.kernel.org/linux-rdma/8a054b78-6d50-4bc6-8d8a-83f85fbdb82f@kili.mountain/ Fixes: `d946716325` ("RDMA/rxe: Rewrite rxe_task.c") Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-04-12 13:11:51 -03:00
Tetsuo Handa	266e9b3475	RDMA/siw: Remove namespace check from siw_netdev_event() syzbot is reporting that siw_netdev_event(NETDEV_UNREGISTER) cannot destroy siw_device created after unshare(CLONE_NEWNET) due to net namespace check. It seems that this check was by error there and should be removed. Reported-by: syzbot <syzbot+5e70d01ee8985ae62a3b@syzkaller.appspotmail.com> Link: https://syzkaller.appspot.com/bug?extid=5e70d01ee8985ae62a3b Suggested-by: Jason Gunthorpe <jgg@ziepe.ca> Suggested-by: Leon Romanovsky <leon@kernel.org> Fixes: `bdcf26bf9b` ("rdma/siw: network and RDMA core interface") Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Link: https://lore.kernel.org/r/a44e9ac5-44e2-d575-9e30-02483cc7ffd1@I-love.SAKURA.ne.jp Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-04-03 21:24:21 +03:00
Leon Romanovsky	b6ba68555d	RDMA/rxe: Clean kzalloc failure paths There is no need to print any debug messages after failure to allocate memory, because kernel will print OOM dumps anyway. Together with removal of these messages, remove useless goto jumps. Fixes: `5bf944f241` ("RDMA/rxe: Add error messages") Reported-by: Dan Carpenter <error27@gmail.com> Link: https://lore.kernel.org/all/ea43486f-43dd-4054-b1d5-3a0d202be621@kili.mountain Link: https://lore.kernel.org/r/d3cedf723b84e73e8062a67b7489d33802bafba2.1680113597.git.leon@kernel.org Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2023-03-30 09:54:32 +03:00
Bob Pearson	78b26a3353	RDMA/rxe: Remove tasklet call from rxe_cq.c Remove the tasklet call in rxe_cq.c and also the is_dying in the cq struct. There is no reason for the rxe driver to defer the call to the cq completion handler by scheduling a tasklet. rxe_cq_post() is not called in a hard irq context. The rxe driver currently is incorrect because the tasklet call is made without protecting the cq pointer with a reference from having the underlying memory freed before the deferred routine is called. Executing the comp_handler inline fixes this problem. Fixes: `8700e3e7c4` ("Soft RoCE driver") Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Link: https://lore.kernel.org/r/20230327215643.10410-1-rpearsonhpe@gmail.com Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-03-29 14:25:11 +03:00
Bob Pearson	d946716325	RDMA/rxe: Rewrite rxe_task.c This patch is a major rewrite of the tasklet routines in rxe_task.c. The main motivation for this is the realization that the code violates the safety of the qp pointer by correct reference counting. When a tasklet is scheduled from a verbs API the calling thread has a valid reference to the qp and schedules the tasklet to run at a later time carrying a pointer to the qp. Once the calling code returns however the qp can be destroyed at any time. In order to correct this a reference to the qp must be taken when the task is scheduled and held until it finishes running. This is complicated by the tasklet library not alwys running a task that is scheduled depending on whether someone else has scheduled it. This patch moves the logic for deciding whether to run or schedule a task outside of do_task() and guarantees that there is only one copy of the task scheduled or running at a time. Secondly the separate flags controlling teardown and draining of the task are included in the task state machine and all references to the state are protected by spinlocks to avoid consistency and memory barrier issues. Link: https://lore.kernel.org/r/20230304174533.11296-9-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-03-24 11:21:36 -03:00
Bob Pearson	f455a1bc97	RDMA/rxe: Make tasks schedule each other Replace rxe_run_task() by rxe_sched_task() when tasks call each other. These are not performance critical and mainly involve error paths but they run the risk of causing deadlocks. Link: https://lore.kernel.org/r/20230304174533.11296-8-rpearsonhpe@gmail.com Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-03-24 11:21:36 -03:00
Bob Pearson	960ebe97e5	RDMA/rxe: Remove __rxe_do_task() The subroutine __rxe_do_task is not thread safe and it has no way to guarantee that the tasks, which are designed with the assumption that they are non-reentrant, are not reentered. All of its uses are non-performance critical. This patch replaces calls to __rxe_do_task with calls to rxe_sched_task. It also removes irrelevant or unneeded if tests. Instead of calling the task machinery a single call to the tasklet function (rxe_requester, etc.) is sufficient to draing the queues if task execution has been disabled or stopped. Together these changes allow the removal of __rxe_do_task. Link: https://lore.kernel.org/r/20230304174533.11296-7-rpearsonhpe@gmail.com Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-03-24 11:21:36 -03:00
Bob Pearson	a246aa2e8a	RDMA/rxe: Remove qp reference counting in tasks Currently each of the three tasklets requester, completer and responder in the rxe driver take and release a reference to the qp argument at the beginning and end of the subroutines. The caller passing in the qp argument should be responsible for holding a reference to qp so these are not required. Further doing so breaks the qp cleanup code in rxe_qp_do_cleanup which calls these routines after all the references have been dropped so they cannot drain the packet and work request queues as intended. In fact if these routines are deferred by calling tasklet_schedule there is no guarantee that the calling code does have a qp reference. That is a bug in rxe_task.c which will be fixed later in this series. Link: https://lore.kernel.org/r/20230304174533.11296-6-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-03-24 11:21:36 -03:00
Bob Pearson	fbdeb828a2	RDMA/rxe: Cleanup error state handling in rxe_comp.c Cleanup the handling of qp in the error state, reset state and during rxe_qp_do_cleanup. Make the same as rxe_resp.c Link: https://lore.kernel.org/r/20230304174533.11296-5-rpearsonhpe@gmail.com Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-03-24 11:21:36 -03:00
Bob Pearson	49dc9c1f0c	RDMA/rxe: Cleanup reset state handling in rxe_resp.c Cleanup the handling of qp in the error state, reset state and during rxe_qp_do_cleanup. The error state does about the same thing as the others but has code spread all over. This patch combines them in a cleaner way. Link: https://lore.kernel.org/r/20230304174533.11296-4-rpearsonhpe@gmail.com Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-03-24 11:21:36 -03:00
Bob Pearson	3946fc2a42	RDMA/rxe: Convert tasklet args to queue pairs Originally is was thought that the tasklet machinery in rxe_task.c would be used in other applications but that has not happened for years. This patch replaces the 'void arg' by struct 'rxe_qp qp' in the parameters to the tasklet calls. This change will have no affect on performance but may make the code a little clearer. Link: https://lore.kernel.org/r/20230304174533.11296-2-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-03-24 11:14:38 -03:00
Bob Pearson	5bf944f241	RDMA/rxe: Add error messages This patch adds error and debug messages so that every interaction with rdma-core through a verbs API call or a completion error return will generate at least one error message backed up by debug messages with more detail. With dynamic debugging one can follow up after seeing an error message by turning on the appropriate debug messages. Link: https://lore.kernel.org/r/20230303221623.8053-5-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-03-24 10:41:49 -03:00
Bob Pearson	9ac01f434a	RDMA/rxe: Extend dbg log messages to err and info Extend the dbg log messages (e.g. rxe_dbg_xxx) to include err and info types. rxe.c is modified to use these new log messages as examples. Link: https://lore.kernel.org/r/20230303221623.8053-4-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-03-24 10:41:49 -03:00
Bob Pearson	a9fb328721	RDMA/rxe: Change rxe_dbg to rxe_dbg_dev Replace the name rxe_dbg with rxe_dbg_dev which better matches the remaining rxe_dbg_xxx macros for debug messages with a rxe device parameter. Reuse the name rxe_dbg for debug messages which do not have a rxe device parameter. Link: https://lore.kernel.org/r/20230303221623.8053-3-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-03-24 10:41:49 -03:00
Bob Pearson	9168d125ea	RDMA/rxe: Replace exists by rxe in rxe.c 'exists' looks like a boolean. This patch replaces it by the normal name used for the rxe device, 'rxe', which should be a little less confusing. The second rxe_dbg() message is incorrect since rxe is known to be NULL and this will cause a seg fault if this message were ever sent. Replace it by pr_debug for the moment. Fixes: `c6aba5ea00` ("RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe.c") Link: https://lore.kernel.org/r/20230303221623.8053-2-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-03-24 10:41:48 -03:00
Natalia Petrova	b73a0b80c6	RDMA/rdmavt: Delete unnecessary NULL check There is no need to check 'rdi->qp_dev' for NULL. The field 'qp_dev' is created in rvt_register_device() which will fail if the 'qp_dev' allocation fails in rvt_driver_qp_init(). Overwise this pointer doesn't changed and passed to rvt_qp_exit() by the next step. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: `0acb0cc7ec` ("IB/rdmavt: Initialize and teardown of qpn table") Signed-off-by: Natalia Petrova <n.petrova@fintech.ru> Link: https://lore.kernel.org/r/20230303124408.16685-1-n.petrova@fintech.ru Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-03-14 11:57:01 +02:00
Kees Cook	f2f6e1661d	IB/rdmavt: Fix target union member for rvt_post_one_wr() The "cplen" result used by the memcpy() into struct rvt_swqe "wqe" may be sized to 80 for struct rvt_ud_wr (which is member "ud_wr", not "wr" which is only 40 bytes in size). Change the destination union member so the compiler can use the correct bounds check. struct rvt_swqe { union { struct ib_send_wr wr; /* don't use wr.sg_list */ struct rvt_ud_wr ud_wr; ... }; ... }; Silences false positive memcpy() run-time warning: memcpy: detected field-spanning write (size 80) of single field "&wqe->wr" at drivers/infiniband/sw/rdmavt/qp.c:2043 (size 40) Reported-by: Zhang Yi <yizhan@redhat.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=216561 Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Leon Romanovsky <leon@kernel.org> Cc: linux-rdma@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230218185701.never.779-kees@kernel.org Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-03-13 13:56:31 +02:00
Daniil Dulov	271bfcfb83	RDMA/siw: Fix potential page_array out of range access When seg is equal to MAX_ARRAY, the loop should break, otherwise it will result in out of range access. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: `b9be6f18cf` ("rdma/siw: transmit path") Signed-off-by: Daniil Dulov <d.dulov@aladdin.ru> Link: https://lore.kernel.org/r/20230227091751.589612-1-d.dulov@aladdin.ru Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-03-13 13:54:49 +02:00
Linus Torvalds	8cbd92339d	v6.3 RDMA pull request Small cycle this time: - Minor driver updates for hfi1, cxgb4, erdma, hns, irdma, mlx5, siw, mana - inline CQE support for hns - Have mlx5 display device error codes - Pinned DMABUF support for irdma - Continued rxe cleanups, particularly converting the MRs to use xarray - Improvements to what can be cached in the mlx5 mkey cache -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCY/gPmgAKCRCFwuHvBreF YW5IAP4xOAiTif4f87vD1twRU/ebq4VEX0r+C2NX5x5fwlCJrAEA7RLV8uG9Uii2 ez0BuWNxfajuvFHntnZ1E+7UDP0S8gk= =CgUH -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull rdma updates from Jason Gunthorpe: "Quite a small cycle this time, even with the rc8. I suppose everyone went to sleep over xmas. - Minor driver updates for hfi1, cxgb4, erdma, hns, irdma, mlx5, siw, mana - inline CQE support for hns - Have mlx5 display device error codes - Pinned DMABUF support for irdma - Continued rxe cleanups, particularly converting the MRs to use xarray - Improvements to what can be cached in the mlx5 mkey cache" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (61 commits) IB/mlx5: Extend debug control for CC parameters IB/hfi1: Fix sdma.h tx->num_descs off-by-one errors IB/hfi1: Fix math bugs in hfi1_can_pin_pages() RDMA/irdma: Add support for dmabuf pin memory regions RDMA/mlx5: Use query_special_contexts for mkeys net/mlx5e: Use query_special_contexts for mkeys net/mlx5: Change define name for 0x100 lkey value net/mlx5: Expose bits for querying special mkeys RDMA/rxe: Fix missing memory barriers in rxe_queue.h RDMA/mana_ib: Fix a bug when the PF indicates more entries for registering memory on first packet RDMA/rxe: Remove rxe_alloc() RDMA/cma: Distinguish between sockaddr_in and sockaddr_in6 by size Subject: RDMA/rxe: Handle zero length rdma iw_cxgb4: Fix potential NULL dereference in c4iw_fill_res_cm_id_entry() RDMA/mlx5: Use rdma_umem_for_each_dma_block() RDMA/umem: Remove unused 'work' member from struct ib_umem RDMA/irdma: Cap MSIX used to online CPUs + 1 RDMA/mlx5: Check reg_create() create for errors RDMA/restrack: Correct spelling RDMA/cxgb4: Fix potential null-ptr-deref in pass_establish() ...	2023-02-24 15:11:03 -08:00
Bob Pearson	a77a52385e	RDMA/rxe: Fix missing memory barriers in rxe_queue.h An earlier patch which introduced smp_load_acquire/smp_store_release into rxe_queue.h incorrectly assumed that surrounding spin-locks in rxe_verbs.c around queue updates for kernel ulps was sufficient to protect the passing of data through the queues between the ulp and the rxe tasklets. But this was incorrect. The typical sequence was ulp rxe requester tasklet ------------------------ --------------------- spin_lock_irqsave() wqe = queue_head(queue) if (!queue_full(q)) { if (!wqe) spin_unlock_irqrestore return; return -ENOMEM } <process wqe> wqe = queue_producer_addr(q) <fill in wqe> queue_advance_consumer(queue) queue_advance_producer(q) spin_unlock_irqrestore() queue_head() calls queue_empty() which calls smp_load_acquire() For user space apps queue_advance_producer() calls smp_store_release() so that there is a memory barrier between the producer and the consumer but for kernel ulps queue_advance_produce() just incremented the producer index because the lock function is a release function. But to work the barrier has to come between filling in the wqe and updating the producer index. This patch adds the missing barriers. It also changes the enum names for the ulp queue types to QUEUE_TYPE_FROM/TO_ULP instead of QUEUE_TYPE_TO/FROM_DRIVER which is very ambiguous. This bug is suspected as the cause of very rare lockups in a very high scale storage application. It is a bug in any case and should be corrected. Fixes: `0a67c46d2e` ("RDMA/rxe: Protect user space index loads/stores") Link: https://lore.kernel.org/r/20230214071053.5395-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-02-16 12:07:05 -04:00
Bob Pearson	72a0362744	RDMA/rxe: Remove rxe_alloc() Currently all the object types in the rxe driver are allocated in rdma-core except for MRs. By moving tha kzalloc() call outside of the pool code the rxe_alloc() subroutine can be eliminated and code checking for MR as a special case can be removed. This patch moves the kzalloc() and kfree_rcu() calls into the mr registration and destruction verbs. It removes that code from rxe_pool.c including the rxe_alloc() subroutine which is no longer used. Link: https://lore.kernel.org/r/20230213225551.12437-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Reviewed-by: Devesh Sharma <devesh.s.sharma@oracle.com> Reviewed-by: Devesh Sharma <devesh.s.sharma@oracle.com> Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-02-16 11:30:11 -04:00
Bob Pearson	5ff31dfcd6	Subject: RDMA/rxe: Handle zero length rdma Currently the rxe driver does not handle all cases of zero length rdma operations correctly. The client does not have to provide an rkey for zero length RDMA read or write operations so the rkey provided may be invalid and should not be used to lookup an mr. This patch corrects the driver to ignore the provided rkey if the reth length is zero for read or write operations and make sure to set the mr to NULL. In read_reply() if length is zero rxe_recheck_mr() is not called. Warnings are added in the routines in rxe_mr.c to catch NULL MRs when the length is non-zero. Fixes: `8700e3e7c4` ("Soft RoCE driver") Link: https://lore.kernel.org/r/20230202044240.6304-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Reviewed-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-02-16 11:13:52 -04:00
Bernard Metzler	65a8fc30fb	RDMA/siw: Fix user page pinning accounting To avoid racing with other user memory reservations, immediately account full amount of pages to be pinned. Fixes: `2251334dca` ("rdma/siw: application buffer management") Reported-by: Jason Gunthorpe <jgg@nvidia.com> Suggested-by: Alistair Popple <apopple@nvidia.com> Reviewed-by: Alistair Popple <apopple@nvidia.com> Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Link: https://lore.kernel.org/r/20230202101000.402990-1-bmt@zurich.ibm.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-02-06 14:46:50 +02:00
Jakub Kicinski	b568d3072a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Conflicts: drivers/net/ethernet/intel/ice/ice_main.c `418e53401e` ("ice: move devlink port creation/deletion") `643ef23bd9` ("ice: Introduce local var for readability") https://lore.kernel.org/all/20230127124025.0dacef40@canb.auug.org.au/ https://lore.kernel.org/all/20230124005714.3996270-1-anthony.l.nguyen@intel.com/ drivers/net/ethernet/engleder/tsnep_main.c `3d53aaef43` ("tsnep: Fix TX queue stop/wake for multiple queues") `25faa6a4c5` ("tsnep: Replace TX spin_lock with __netif_tx_lock") https://lore.kernel.org/all/20230127123604.36bb3e99@canb.auug.org.au/ net/netfilter/nf_conntrack_proto_sctp.c `13bd9b31a9` ("Revert "netfilter: conntrack: add sctp DATA_SENT state"") `a44b765148` ("netfilter: conntrack: unify established states for SCTP paths") `f71cb8f45d` ("netfilter: conntrack: sctp: use nf log infrastructure for invalid packets") https://lore.kernel.org/all/20230127125052.674281f9@canb.auug.org.au/ https://lore.kernel.org/all/d36076f3-6add-a442-6d4b-ead9f7ffff86@tessares.net/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-01-27 22:56:18 -08:00
Bob Pearson	592627ccbd	RDMA/rxe: Replace rxe_map and rxe_phys_buf by xarray Replace struct rxe-phys_buf and struct rxe_map by struct xarray in rxe_verbs.h. This allows using rcu locking on reads for the memory maps stored in each mr. This is based off of a sketch of a patch from Jason Gunthorpe in the link below. Some changes were needed to make this work. It applies cleanly to the current for-next and passes the pyverbs, perftest and the same blktests test cases which run today. Link: https://lore.kernel.org/r/20230119235936.19728-7-rpearsonhpe@gmail.com Link: https://lore.kernel.org/linux-rdma/Y3gvZr6%2FNCii9Avy@nvidia.com/ Co-developed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-01-27 12:14:14 -04:00
Bob Pearson	325a7eb851	RDMA/rxe: Cleanup page variables in rxe_mr.c Cleanup usage of mr->page_shift and mr->page_mask and introduce an extractor for mr->ibmr.page_size. Normal usage in the kernel has page_mask masking out offset in page rather than masking out the page number. The rxe driver had reversed that which was confusing. Implicitly there can be a per mr page_size which was not uniformly supported. Link: https://lore.kernel.org/r/20230119235936.19728-6-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-01-26 15:04:46 -04:00
Bob Pearson	d8bdb0ebca	RDMA-rxe: Isolate mr code from atomic_write_reply() Isolate mr specific code from atomic_write_reply() in rxe_resp.c into a subroutine rxe_mr_do_atomic_write() in rxe_mr.c. Check length for atomic write operation. Make iova_to_vaddr() static. Link: https://lore.kernel.org/r/20230119235936.19728-5-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-01-26 15:04:45 -04:00
Bob Pearson	f04d5b3d91	RDMA-rxe: Isolate mr code from atomic_reply() Isolate mr specific code from atomic_reply() in rxe_resp.c into a subroutine rxe_mr_do_atomic_op() in rxe_mr.c. Minor cleanups to rxe_check_range() and iova_to_vaddr(). Move enum resp_state to rxe.h Link: https://lore.kernel.org/r/20230119235936.19728-4-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-01-26 15:04:45 -04:00
Bob Pearson	db4729a525	RDMA/rxe: Move rxe_map_mr_sg to rxe_mr.c Move rxe_map_mr_sg() to rxe_mr.c where it makes a little more sense. Link: https://lore.kernel.org/r/20230119235936.19728-3-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-01-26 15:04:45 -04:00
Bob Pearson	ade58da2a7	RDMA/rxe: Cleanup mr_check_range Remove blank lines and replace EFAULT by EINVAL when an invalid mr type is used. Link: https://lore.kernel.org/r/20230119235936.19728-2-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-01-26 15:04:44 -04:00
Peilin Ye	40e0b09081	net/sock: Introduce trace_sk_data_ready() As suggested by Cong, introduce a tracepoint for all ->sk_data_ready() callback implementations. For example: <...> iperf-609 [002] ..... 70.660425: sk_data_ready: family=2 protocol=6 func=sock_def_readable iperf-609 [002] ..... 70.660436: sk_data_ready: family=2 protocol=6 func=sock_def_readable <...> Suggested-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-01-23 11:26:50 +00:00
Daisuke Matsuda	1aefe5c177	RDMA/rxe: Prevent faulty rkey generation If you create MRs more than 0x10000 times after loading the module, responder starts to reply NAKs for RDMA/Atomic operations because of rkey violation detected in check_rkey(). The root cause is that rkeys are incremented each time a new MR is created and the value overflows into the range reserved for MWs. This commit also increases the value of RXE_MAX_MW that has been limited unlike other parameters. Fixes: `0994a1bcd5` ("RDMA/rxe: Bump up default maximum values used via uverbs") Link: https://lore.kernel.org/r/20221220080848.253785-2-matsuda-daisuke@fujitsu.com Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Tested-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-01-09 10:48:16 -04:00
Daisuke Matsuda	3a73746b26	RDMA/rxe: Fix inaccurate constants in rxe_type_info ibv_query_device() has reported incorrect device attributes, which are actually not used by the device. Make the constants correspond with the attributes shown to users. Fixes: `3ccffe8abf` ("RDMA/rxe: Move max_elem into rxe_type_info") Fixes: `3225717f6d` ("RDMA/rxe: Replace red-black trees by xarrays") Link: https://lore.kernel.org/r/20221220080848.253785-1-matsuda-daisuke@fujitsu.com Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-01-09 10:48:16 -04:00
Linus Torvalds	ed56954cf5	v6.2 merge window 2nd pull request Fix two build warnings on 32 bit platforms -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCY50UewAKCRCFwuHvBreF YazuAQCHY/yOiziPBqNvI9jSb/MVGh1oaZQZRTkt5fOvLSGt8gD/Rv30/h6EcKfG S5A6eQ3qrVzEhp7P8mST2Q/MkGmQ+AQ= =Qw7m -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull rdma fixes from Jason Gunthorpe: "Fix two build warnings on 32 bit platforms It seems the linux-next CI and 0-day bot are not testing enough 32 bit configurations, as soon as you merged the rdma pull request there were two instant reports of warnings on these sytems that I would have thought should have been covered by time in linux-next Anyhow, here are the fixes so people don't hit problems with -Werror" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/siw: Fix pointer cast warning RDMA/rxe: Fix compile warnings on 32-bit	2022-12-17 08:23:42 -06:00
Arnd Bergmann	5244ca8867	RDMA/siw: Fix pointer cast warning The previous build fix left a remaining issue in configurations with 64-bit dma_addr_t on 32-bit architectures: drivers/infiniband/sw/siw/siw_qp_tx.c: In function 'siw_get_pblpage': drivers/infiniband/sw/siw/siw_qp_tx.c:32:37: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast] 32 \| return virt_to_page((void )paddr); \| ^ Use the same double cast here that the driver uses elsewhere to convert between dma_addr_t and void. Fixes: `0d1b756acf` ("RDMA/siw: Pass a pointer to virt_to_page()") Link: https://lore.kernel.org/r/20221215170347.2612403-1-arnd@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-12-16 16:07:38 -04:00
Jason Gunthorpe	5fc24e6022	RDMA/rxe: Fix compile warnings on 32-bit Move the conditional code into a function, with two varients so it is harder to make these kinds of mistakes. drivers/infiniband/sw/rxe/rxe_resp.c: In function 'atomic_write_reply': drivers/infiniband/sw/rxe/rxe_resp.c:794:13: error: unused variable 'payload' [-Werror=unused-variable] 794 \| int payload = payload_size(pkt); \| ^~~~~~~ drivers/infiniband/sw/rxe/rxe_resp.c:793:24: error: unused variable 'mr' [-Werror=unused-variable] 793 \| struct rxe_mr mr = qp->resp.mr; \| ^~ drivers/infiniband/sw/rxe/rxe_resp.c:791:19: error: unused variable 'dst' [-Werror=unused-variable] 791 \| u64 src, dst; \| ^~~ drivers/infiniband/sw/rxe/rxe_resp.c:791:13: error: unused variable 'src' [-Werror=unused-variable] 791 \| u64 src, *dst; Fixes: `034e285f8b` ("RDMA/rxe: Make responder support atomic write on RC service") Link: https://lore.kernel.org/linux-rdma/Y5s+EVE7eLWQqOwv@nvidia.com/ Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-12-15 11:32:06 -04:00
Linus Torvalds	ab425febda	v6.2 merge window pull request Usual size of updates, a new driver a most of the bulk focusing on rxe: - Usual typos, style, and language updates - Driver updates for mlx5, irdma, siw, rts, srp, hfi1, hns, erdma, mlx4, srp - Lots of RXE updates * Improve reply error handling for bad MR operations * Code tidying * Debug printing uses common loggers * Remove half implemented RD related stuff * Support IBA's recently defined Atomic Write and Flush operations - erdma support for atomic operations - New driver "mana" for Ethernet HW available in Azure VMs. This driver only supports DPDK -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCY5eIggAKCRCFwuHvBreF YeX7AP9+l5Y9J48OmK7y/YgADNo9g05agXp3E8EuUDmBU+PREgEAigdWaJVf2oea IctVja0ApLW5W+wsFt8Qh+V4PMiYTAM= =Q5V+ -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull rdma updates from Jason Gunthorpe: "Usual size of updates, a new driver, and most of the bulk focusing on rxe: - Usual typos, style, and language updates - Driver updates for mlx5, irdma, siw, rts, srp, hfi1, hns, erdma, mlx4, srp - Lots of RXE updates: * Improve reply error handling for bad MR operations * Code tidying * Debug printing uses common loggers * Remove half implemented RD related stuff * Support IBA's recently defined Atomic Write and Flush operations - erdma support for atomic operations - New driver 'mana' for Ethernet HW available in Azure VMs. This driver only supports DPDK" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (122 commits) IB/IPoIB: Fix queue count inconsistency for PKEY child interfaces RDMA: Add missed netdev_put() for the netdevice_tracker RDMA/rxe: Enable RDMA FLUSH capability for rxe device RDMA/cm: Make QP FLUSHABLE for supported device RDMA/rxe: Implement flush completion RDMA/rxe: Implement flush execution in responder side RDMA/rxe: Implement RC RDMA FLUSH service in requester side RDMA/rxe: Extend rxe packet format to support flush RDMA/rxe: Allow registering persistent flag for pmem MR only RDMA/rxe: Extend rxe user ABI to support flush RDMA: Extend RDMA kernel verbs ABI to support flush RDMA: Extend RDMA user ABI to support flush RDMA/rxe: Fix incorrect responder length checking RDMA/rxe: Fix oops with zero length reads RDMA/mlx5: Remove not-used IB_FLOW_SPEC_IB define RDMA/hns: Fix XRC caps on HIP08 RDMA/hns: Fix error code of CMD RDMA/hns: Fix page size cap from firmware RDMA/hns: Fix PBL page MTR find RDMA/hns: Fix AH attr queried by query_qp ...	2022-12-14 09:27:13 -08:00
Li Zhijian	124011e6e9	RDMA/rxe: Enable RDMA FLUSH capability for rxe device Now we are ready to enable RDMA FLUSH capability for RXE. It can support Global Visibility and Persistence placement types. Link: https://lore.kernel.org/r/20221206130201.30986-11-lizhijian@fujitsu.com Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-12-09 19:36:02 -04:00
Li Zhijian	70aad902ce	RDMA/rxe: Implement flush completion Per IBA SPEC, FLUSH will ack in rdma read response with 0 length. Use IB_WC_FLUSH (aka IB_UVERBS_WC_FLUSH) code to tell userspace a FLUSH completion. Link: https://lore.kernel.org/r/20221206130201.30986-9-lizhijian@fujitsu.com Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-12-09 19:36:02 -04:00
Li Zhijian	ea1bb00ee9	RDMA/rxe: Implement flush execution in responder side Only the requested placement types that also registered in the destination memory region are acceptable. Otherwise, responder will also reply NAK "Remote Access Error" if it found a placement type violation. We will persist data via arch_wb_cache_pmem(), which could be architecture specific. This commit also adds 2 helpers to update qp.resp from the incoming packet. Link: https://lore.kernel.org/r/20221206130201.30986-8-lizhijian@fujitsu.com Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-12-09 19:36:02 -04:00
Li Zhijian	fa1fd682ad	RDMA/rxe: Implement RC RDMA FLUSH service in requester side Implement FLUSH request operation in the requester. Link: https://lore.kernel.org/r/20221206130201.30986-7-lizhijian@fujitsu.com Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-12-09 19:36:02 -04:00
Li Zhijian	02e9a31c89	RDMA/rxe: Extend rxe packet format to support flush Extend rxe opcode tables, headers, helper and constants to support flush operations. Refer to the IBA A19.4.1 for more FETH definition details Link: https://lore.kernel.org/r/20221206130201.30986-6-lizhijian@fujitsu.com Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-12-09 19:36:02 -04:00
Li Zhijian	02ea0a5115	RDMA/rxe: Allow registering persistent flag for pmem MR only Memory region could support at most 2 flush access flags: IB_ACCESS_FLUSH_PERSISTENT and IB_ACCESS_FLUSH_GLOBAL But we only allow user to register persistent flush flags to the pmem MR where it has the ability of persisting data across power cycles. So registering a persistent access flag to a non-pmem MR will be rejected. Link: https://lore.kernel.org/r/20221206130201.30986-5-lizhijian@fujitsu.com CC: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-12-09 19:36:02 -04:00
Bob Pearson	689c5421bf	RDMA/rxe: Fix incorrect responder length checking The code in rxe_resp.c at check_length() is incorrect as it compares pkt->opcode an 8 bit value against various mask bits which are all higher than 256 so nothing is ever reported. This patch rewrites this to compare against pkt->mask which is correct. However this now exposes another error. For UD send packets the value of the pmtu cannot be determined from qp->mtu. All that is required here is to later check if the payload fits into the posted receive buffer in that case. Fixes: `837a55847e` ("RDMA/rxe: Implement packet length validation on responder") Link: https://lore.kernel.org/r/20221208210945.28607-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Reviewed-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-12-09 19:26:27 -04:00
Daisuke Matsuda	3282a549cf	RDMA/rxe: Fix oops with zero length reads The commit `686d348476` ("RDMA/rxe: Remove unnecessary mr testing") causes a kernel crash. If responder get a zero-byte RDMA Read request, qp->resp.mr is not set in check_rkey() (see IBA C9-88). The mr is NULL in this case, and a NULL pointer dereference occurs as shown below. BUG: kernel NULL pointer dereference, address: 0000000000000010 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 0 P4D 0 Oops: 0002 [#1] PREEMPT SMP PTI CPU: 2 PID: 3622 Comm: python3 Kdump: loaded Not tainted 6.1.0-rc3+ #34 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 RIP: 0010:__rxe_put+0xc/0x60 [rdma_rxe] Code: cc cc cc 31 f6 e8 64 36 1b d3 41 b8 01 00 00 00 44 89 c0 c3 cc cc cc cc 41 89 c0 eb c1 90 0f 1f 44 00 00 41 54 b8 ff ff ff ff <f0> 0f c1 47 10 83 f8 01 74 11 45 31 e4 85 c0 7e 20 44 89 e0 41 5c RSP: 0018:ffffb27bc012ce78 EFLAGS: 00010246 RAX: 00000000ffffffff RBX: ffff9790857b0580 RCX: 0000000000000000 RDX: ffff979080fe145a RSI: 000055560e3e0000 RDI: 0000000000000000 RBP: ffff97909c7dd800 R08: 0000000000000001 R09: e7ce43d97f7bed0f R10: ffff97908b29c300 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: ffff97908b29c300 R15: 0000000000000000 FS: 00007f276f7bd740(0000) GS:ffff9792b5c80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000010 CR3: 0000000114230002 CR4: 0000000000060ee0 Call Trace: <IRQ> read_reply+0xda/0x310 [rdma_rxe] rxe_responder+0x82d/0xe50 [rdma_rxe] do_task+0x84/0x170 [rdma_rxe] tasklet_action_common.constprop.0+0xa7/0x120 __do_softirq+0xcb/0x2ac do_softirq+0x63/0x90 </IRQ> Support a NULL mr during read_reply() Fixes: `686d348476` ("RDMA/rxe: Remove unnecessary mr testing") Fixes: `b5f9a01fae` ("RDMA/rxe: Fix mr leak in RESPST_ERR_RNR") Link: https://lore.kernel.org/r/20221209045926.531689-1-matsuda-daisuke@fujitsu.com Link: https://lore.kernel.org/r/20221202145713.13152-1-lizhijian@fujitsu.com Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-12-09 15:57:51 -04:00
Jason Gunthorpe	d69e8c63fc	Linux 6.1-rc8 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmONI6weHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG9xgH/jqXGuMoO1ikfmGb 7oY0W/f69G9V/e0DxFLvnIjhFgCUzdnNsmD4jQJA4x6QsxwLWuvpI282Ez+bHV5T U4RPsxJZIIMsXE2lKM9BRgeLzDdCt0aK4Pj+3x2x7NZC5cWFSQ8PyQJkCwg+0PQo u8Ly+GO8c4RUMf4/rrAZQq16qZUqGDaGm1EJhtSoa+KiR81LmUUmbDIK9Mr53rmQ wou+95XhibwMWr17WgXA28bTgYqn9UGr67V3qvTH2LC7GW8BCoKvn+3wh6TVhlWj dsWplXgcOP0/OHvSC5Sb1Uibk5Gx3DlIzYa6OfNZQuZ5xmQqm9kXjW8lmYpWFHy/ 38/5HWc= =EuoA -----END PGP SIGNATURE----- Merge tag 'v6.1-rc8' into rdma.git for-next For dependencies in following patches Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-12-09 15:52:17 -04:00
Xiao Yang	4cd9f1d320	RDMA/rxe: Enable atomic write capability for rxe device The capability shows that rxe device supports atomic write operation. Link: https://lore.kernel.org/r/1669905568-62-4-git-send-email-yangx.jy@fujitsu.com Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-12-01 19:51:10 -04:00
Xiao Yang	3aec427bb1	RDMA/rxe: Implement atomic write completion Generate an atomic write completion when the atomic write request has been finished. Link: https://lore.kernel.org/r/1669905568-62-3-git-send-email-yangx.jy@fujitsu.com Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-12-01 19:51:10 -04:00
Xiao Yang	034e285f8b	RDMA/rxe: Make responder support atomic write on RC service Make responder process an atomic write request and send a read response on RC service. Link: https://lore.kernel.org/r/1669905568-62-2-git-send-email-yangx.jy@fujitsu.com Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-12-01 19:51:09 -04:00
Xiao Yang	abb633cf28	RDMA/rxe: Make requester support atomic write on RC service Make requester process and send an atomic write request on RC service. Link: https://lore.kernel.org/r/1669905568-62-1-git-send-email-yangx.jy@fujitsu.com Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-12-01 19:51:09 -04:00
Xiao Yang	5c7af6c793	RDMA/rxe: Extend rxe packet format to support atomic write Extend rxe_wr_opcode_info[] and rxe_opcode[] for new atomic write opcode. Link: https://lore.kernel.org/r/1669905432-14-5-git-send-email-yangx.jy@fujitsu.com Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-12-01 19:51:09 -04:00
David Hildenbrand	129e636fe9	RDMA/siw: remove FOLL_FORCE usage GUP now supports reliable R/O long-term pinning in COW mappings, such that we break COW early. MAP_SHARED VMAs only use the shared zeropage so far in one corner case (DAXFS file with holes), which can be ignored because GUP does not support long-term pinning in fsdax (see check_vma_flags()). Consequently, FOLL_FORCE \| FOLL_WRITE \| FOLL_LONGTERM is no longer required for reliable R/O long-term pinning: FOLL_LONGTERM is sufficient. So stop using FOLL_FORCE, which is really only for ptrace access. Link: https://lkml.kernel.org/r/20221116102659.70287-13-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Cc: Bernard Metzler <bmt@zurich.ibm.com> Cc: Leon Romanovsky <leon@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-11-30 15:58:59 -08:00
Zhang Xiaoxu	f67376d801	RDMA/rxe: Fix NULL-ptr-deref in rxe_qp_do_cleanup() when socket create failed There is a null-ptr-deref when mount.cifs over rdma: BUG: KASAN: null-ptr-deref in rxe_qp_do_cleanup+0x2f3/0x360 [rdma_rxe] Read of size 8 at addr 0000000000000018 by task mount.cifs/3046 CPU: 2 PID: 3046 Comm: mount.cifs Not tainted 6.1.0-rc5+ #62 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc3 Call Trace: <TASK> dump_stack_lvl+0x34/0x44 kasan_report+0xad/0x130 rxe_qp_do_cleanup+0x2f3/0x360 [rdma_rxe] execute_in_process_context+0x25/0x90 __rxe_cleanup+0x101/0x1d0 [rdma_rxe] rxe_create_qp+0x16a/0x180 [rdma_rxe] create_qp.part.0+0x27d/0x340 ib_create_qp_kernel+0x73/0x160 rdma_create_qp+0x100/0x230 _smbd_get_connection+0x752/0x20f0 smbd_get_connection+0x21/0x40 cifs_get_tcp_session+0x8ef/0xda0 mount_get_conns+0x60/0x750 cifs_mount+0x103/0xd00 cifs_smb3_do_mount+0x1dd/0xcb0 smb3_get_tree+0x1d5/0x300 vfs_get_tree+0x41/0xf0 path_mount+0x9b3/0xdd0 __x64_sys_mount+0x190/0x1d0 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 The root cause of the issue is the socket create failed in rxe_qp_init_req(). So move the reset rxe_qp_do_cleanup() after the NULL ptr check. Fixes: `8700e3e7c4` ("Soft RoCE driver") Link: https://lore.kernel.org/r/20221122151437.1057671-1-zhangxiaoxu5@huawei.com Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-11-22 15:55:54 -04:00
Jason Gunthorpe	cb6562c380	RDMA/rxe: Do not NULL deref on debugging failure path Correct the mistake, mr is obviously NULL in this code path. Fixes: `2778b72b1d` ("RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_mr.c") Link: https://lore.kernel.org/r/Y3eeJW0AdyJYhYyQ@kili Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-11-22 15:53:02 -04:00
Li Zhijian	7d984dac8f	RDMA/rxe: Fix mr->map double free rxe_mr_cleanup() which tries to free mr->map again will be called when rxe_mr_init_user() fails: CPU: 0 PID: 4917 Comm: rdma_flush_serv Kdump: loaded Not tainted 6.1.0-rc1-roce-flush+ #25 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x45/0x5d panic+0x19e/0x349 end_report.part.0+0x54/0x7c kasan_report.cold+0xa/0xf rxe_mr_cleanup+0x9d/0xf0 [rdma_rxe] __rxe_cleanup+0x10a/0x1e0 [rdma_rxe] rxe_reg_user_mr+0xb7/0xd0 [rdma_rxe] ib_uverbs_reg_mr+0x26a/0x480 [ib_uverbs] ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x1a2/0x250 [ib_uverbs] ib_uverbs_cmd_verbs+0x1397/0x15a0 [ib_uverbs] This issue was firstly exposed since commit `b18c7da63f` ("RDMA/rxe: Fix memory leak in error path code") and then we fixed it in commit `8ff5f5d9d8` ("RDMA/rxe: Prevent double freeing rxe_map_set()") but this fix was reverted together at last by commit `1e75550648` (Revert "RDMA/rxe: Create duplicate mapping tables for FMRs") Simply let rxe_mr_cleanup() always handle freeing the mr->map once it is successfully allocated. Fixes: `1e75550648` ("Revert "RDMA/rxe: Create duplicate mapping tables for FMRs"") Link: https://lore.kernel.org/r/1667099073-2-1-git-send-email-lizhijian@fujitsu.com Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-11-18 20:15:51 -04:00
Zhu Yanjun	8e1a76493b	RDMA/rxe: Remove reliable datagram support The rdma_rxe driver does not actually support the reliable datagram transport but contains a variable with RD opcodes in driver code. And this variable is never used. So remove it. Link: https://lore.kernel.org/r/20221112023537.432912-1-yanjun.zhu@intel.com Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-11-18 19:57:46 -04:00
Bernard Metzler	60da2d11fc	RDMA/siw: Set defined status for work completion with undefined status A malicious user may write undefined values into memory mapped completion queue elements status or opcode. Undefined status or opcode values will result in out-of-bounds access to an array mapping siw internal representation of opcode and status to RDMA core representation when reaping CQ elements. While siw detects those undefined values, it did not correctly set completion status to a defined value, thus defeating the whole purpose of the check. This bug leads to the following Smatch static checker warning: drivers/infiniband/sw/siw/siw_cq.c:96 siw_reap_cqe() error: buffer overflow 'map_cqe_status' 10 <= 21 Fixes: `bdf1da5df9` ("RDMA/siw: Fix immediate work request flush to completion queue") Link: https://lore.kernel.org/r/20221115170747.1263298-1-bmt@zurich.ibm.com Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-11-15 16:47:00 -04:00
Bob Pearson	5de087250f	RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_mmap.c Replace calls to pr_xxx() in rxe_mmap.c with rxe_dbg_xxx(). Link: https://lore.kernel.org/r/20221103171013.20659-17-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-11-10 15:33:07 -04:00
Bob Pearson	813728043b	RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_icrc.c Replace calls to pr_xxx() in rxe_icrc.c with rxe_dbg_xxx(). Link: https://lore.kernel.org/r/20221103171013.20659-16-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-11-10 15:33:06 -04:00
Bob Pearson	c6aba5ea00	RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe.c Replace calls to pr_xxx() in rxe.c with rxe_dbg_xxx(). Calls with a rxe device not yet in scope are left as is. Link: https://lore.kernel.org/r/20221103171013.20659-15-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-11-10 15:33:06 -04:00
Bob Pearson	fc50597934	RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_task.c Replace calls to pr_xxx() in rxe_task.c with rxe_dbg_xxx(). Link: https://lore.kernel.org/r/20221103171013.20659-14-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-11-10 15:33:06 -04:00
Bob Pearson	25fd735a4c	RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_av.c Replace calls to pr_xxx() in rxe_av.c with rxe_dbg_xxx(). Link: https://lore.kernel.org/r/20221103171013.20659-13-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-11-10 15:33:05 -04:00
Bob Pearson	14e501fdb0	RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_verbs.c Replace calls to pr_xxx() in rxe_verbs.c with rxe_dbg_xxx(). Link: https://lore.kernel.org/r/20221103171013.20659-12-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-11-10 15:33:05 -04:00
Bob Pearson	0e6090024b	RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_srq.c Replace calls to pr_xxx() in rxe_srq.c with rxe_dbg_xxx(). Link: https://lore.kernel.org/r/20221103171013.20659-11-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-11-10 15:33:05 -04:00
Bob Pearson	74ddf7233c	RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_resp.c Replace calls to pr_xxx() in rxe_resp.c with rxe_dbg_xxx(). Link: https://lore.kernel.org/r/20221103171013.20659-10-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-11-10 15:33:05 -04:00
Bob Pearson	0edfb15e30	RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_req.c Replace calls to pr_xxx() in rxe_req.c with rxe_dbg_xxx(). Link: https://lore.kernel.org/r/20221103171013.20659-9-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-11-10 15:33:04 -04:00
Bob Pearson	6af70060d2	RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_qp.c Replace calls to pr_xxx() in rxe_qp.c with rxe_dbg_xxx(). Link: https://lore.kernel.org/r/20221103171013.20659-8-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-11-10 15:33:04 -04:00
Bob Pearson	34549e88e0	RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_net.c Replace (some) calls to pr_xxx() in rxe_net.c with rxe_dbg_xxx(). Calls with a rxe device not yet in scope are left as is. Link: https://lore.kernel.org/r/20221103171013.20659-7-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-11-10 15:33:04 -04:00
Bob Pearson	e8a87efdf8	RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_mw.c Replace calls to pr_xxx() int rxe_mw.c with rxe_dbg_xxx(). Link: https://lore.kernel.org/r/20221103171013.20659-6-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-11-10 15:33:03 -04:00
Bob Pearson	2778b72b1d	RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_mr.c Replace calls to pr_xxx() in rxe_mr.c by rxe_dbg_mr(). Link: https://lore.kernel.org/r/20221103171013.20659-5-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-11-10 15:33:03 -04:00
Bob Pearson	52920f537a	RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_cq.c Replace calls to pr_xxx() in rxe_cq.c with rxe_dbg_xxx(). Link: https://lore.kernel.org/r/20221103171013.20659-4-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-11-10 15:33:03 -04:00
Bob Pearson	27c4c520bd	RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_comp.c Replace calls to pr_xxx() in rxe_comp.c with rxe_dbg_xxx(). Link: https://lore.kernel.org/r/20221103171013.20659-3-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-11-10 15:33:02 -04:00
Bob Pearson	4554bac48a	RDMA/rxe: Add ibdev_dbg macros for rxe Add macros borrowed from siw to call dynamic debug macro ibdev_dbg. Link: https://lore.kernel.org/r/20221103171013.20659-2-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-11-10 15:33:02 -04:00
Daisuke Matsuda	837a55847e	RDMA/rxe: Implement packet length validation on responder The function check_length() is supposed to check the length of inbound packets on responder, but it actually has been a stub since the driver was born. Let it check the payload length and the DMA length. Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Link: https://lore.kernel.org/r/20221107055338.357184-1-matsuda-daisuke@fujitsu.com Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2022-11-09 19:54:57 +02:00
Bernard Metzler	bdf1da5df9	RDMA/siw: Fix immediate work request flush to completion queue Correctly set send queue element opcode during immediate work request flushing in post sendqueue operation, if the QP is in ERROR state. An undefined ocode value results in out-of-bounds access to an array for mapping the opcode between siw internal and RDMA core representation in work completion generation. It resulted in a KASAN BUG report of type 'global-out-of-bounds' during NFSoRDMA testing. This patch further fixes a potential case of a malicious user which may write undefined values for completion queue elements status or opcode, if the CQ is memory mapped to user land. It avoids the same out-of-bounds access to arrays for status and opcode mapping as described above. Fixes: `303ae1cdfd` ("rdma/siw: application interface") Fixes: `b0fff7317b` ("rdma/siw: completion queue methods") Reported-by: Olga Kornievskaia <kolga@netapp.com> Reviewed-by: Tom Talpey <tom@talpey.com> Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Link: https://lore.kernel.org/r/20221107145057.895747-1-bmt@zurich.ibm.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2022-11-09 15:26:49 +02:00
Yunsheng Lin	692373d186	RDMA/rxe: cleanup some error handling in rxe_verbs.c Instead of 'goto and return', just return directly to simplify the error handling, and avoid some unnecessary return value check. Link: https://lore.kernel.org/r/20221028075053.3990467-1-xuhaoyue1@hisilicon.com Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-10-28 15:11:44 -03:00
Xiao Yang	b071850ef6	RDMA/rxe: Remove the duplicate assignment of mr->map_shift mr->map_shift is set to ilog2(RXE_BUF_PER_MAP) in both rxe_mr_init() and rxe_mr_alloc() so remove the duplicate one in rxe_mr_init(). Link: https://lore.kernel.org/r/1666855893-145-1-git-send-email-yangx.jy@fujitsu.com Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-10-28 15:08:41 -03:00
Li Zhijian	875ab4a8d9	RDMA/rxe: Make sure requested access is a subset of {mr,mw}->access We should reject the requests with access flags that is not registered by MR/MW. For example, lookup_mr() should return NULL when requested access is 0x03 and mr->access is 0x01. Link: https://lore.kernel.org/r/20220927055337.22630-2-lizhijian@fujitsu.com Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-10-28 14:39:47 -03:00
Bob Pearson	63a18baef2	RDMA/rxe: Rename task->state_lock to task->lock Rename task-state_lock to task->lock Link: https://lore.kernel.org/r/20221021200118.2163-7-rpearsonhpe@gmail.com Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-10-28 13:47:16 -03:00
Bob Pearson	dcef28528c	RDMA/rxe: Make rxe_do_task static The subroutine rxe_do_task() is only called in rxe_task.c. This patch makes it static and renames it do_task(). Link: https://lore.kernel.org/r/20221021200118.2163-6-rpearsonhpe@gmail.com Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-10-28 13:47:15 -03:00
Bob Pearson	dccb23f6c3	RDMA/rxe: Split rxe_run_task() into two subroutines Split rxe_run_task(task, sched) into rxe_run_task(task) and rxe_sched_task(task). Link: https://lore.kernel.org/r/20221021200118.2163-5-rpearsonhpe@gmail.com Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-10-28 13:47:15 -03:00
Bob Pearson	de669ae8af	RDMA/rxe: Removed unused name from rxe_task struct The name field in struct rxe_task is never used. This patch removes it. Link: https://lore.kernel.org/r/20221021200118.2163-4-rpearsonhpe@gmail.com Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-10-28 13:47:15 -03:00
Bob Pearson	98a54f1706	RDMA/rxe: Remove init of task locks from rxe_qp.c The calls to spin_lock_init() for the tasklet spinlocks in rxe_qp_init_misc() are redundant since they are intiialized in rxe_init_task(). This patch removes them. Link: https://lore.kernel.org/r/20221021200118.2163-3-rpearsonhpe@gmail.com Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-10-28 13:47:15 -03:00
Bob Pearson	05e88ebb9e	RDMA/rxe: Remove redundant header files Remove unneeded include files. Link: https://lore.kernel.org/r/20221021200118.2163-2-rpearsonhpe@gmail.com Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-10-28 13:47:15 -03:00
Li Zhijian	b5f9a01fae	RDMA/rxe: Fix mr leak in RESPST_ERR_RNR rxe_recheck_mr() will increase mr's ref_cnt, so we should call rxe_put(mr) to drop mr's ref_cnt in RESPST_ERR_RNR to avoid below warning: WARNING: CPU: 0 PID: 4156 at drivers/infiniband/sw/rxe/rxe_pool.c:259 __rxe_cleanup+0x1df/0x240 [rdma_rxe] ... Call Trace: rxe_dereg_mr+0x4c/0x60 [rdma_rxe] ib_dereg_mr_user+0xa8/0x200 [ib_core] ib_mr_pool_destroy+0x77/0xb0 [ib_core] nvme_rdma_destroy_queue_ib+0x89/0x240 [nvme_rdma] nvme_rdma_free_queue+0x40/0x50 [nvme_rdma] nvme_rdma_teardown_io_queues.part.0+0xc3/0x120 [nvme_rdma] nvme_rdma_error_recovery_work+0x4d/0xf0 [nvme_rdma] process_one_work+0x582/0xa40 ? pwq_dec_nr_in_flight+0x100/0x100 ? rwlock_bug.part.0+0x60/0x60 worker_thread+0x2a9/0x700 ? process_one_work+0xa40/0xa40 kthread+0x168/0x1a0 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x22/0x30 Link: https://lore.kernel.org/r/20221024052049.20577-1-lizhijian@fujitsu.com Fixes: `8a1a0be894` ("RDMA/rxe: Replace mr by rkey in responder resources") Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2022-10-25 08:59:29 +03:00
Li Zhijian	686d348476	RDMA/rxe: Remove unnecessary mr testing Before the testing, we already passed it to rxe_mr_copy() where mr could be dereferenced. so this checking is not needed. The only way that mr is NULL is when it reaches below line 780 with 'qp->resp.mr = NULL', which is not possible in Bob's explanation[1]. 778 if (res->state == rdatm_res_state_new) { 779 if (!res->replay) { 780 mr = qp->resp.mr; 781 qp->resp.mr = NULL; 782 } else { [1] https://lore.kernel.org/lkml/30ff25c4-ce66-eac4-eaa2-64c0db203a19@gmail.com/ Link: https://lore.kernel.org/r/1666582315-2-1-git-send-email-lizhijian@fujitsu.com CC: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2022-10-25 08:56:32 +03:00
Daisuke Matsuda	5ac814e02e	RDMA/rxe: Handle remote errors in the midst of a Read reply sequence Requesting nodes do not handle a reported error correctly if it is generated in the middle of multi-packet Read responses, and the node tries to resend the request endlessly. Let completer terminate the connection in that case. Link: https://lore.kernel.org/r/20221013014724.3786212-2-matsuda-daisuke@fujitsu.com Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2022-10-25 08:56:32 +03:00
Daisuke Matsuda	5ebc548f4f	RDMA/rxe: Make responder handle RDMA Read failures Currently, responder can reply packets with invalid payloads if it fails to copy messages to the packets. Add an error handling in read_reply() to inform a requesting node of the failure. Link: https://lore.kernel.org/r/20221013014724.3786212-1-matsuda-daisuke@fujitsu.com Suggested-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2022-10-25 08:56:32 +03:00
yangx.jy@fujitsu.com	71d2363991	RDMA/rxe: Remove the member 'type' of struct rxe_mr The member 'type' is included in both struct rxe_mr and struct ib_mr so remove the duplicate one of struct rxe_mr. Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com> Link: https://lore.kernel.org/r/20221021134513.17730-1-yangx.jy@fujitsu.com Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2022-10-24 14:50:14 +03:00
Jason Gunthorpe	33331a728c	Linux 6.0 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmM5/fMeHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG+Y0H/jG0HQjUHfIOGjPQ T33eaF5POoZKzZmsTNITbtya7B3/OTZZRGdSq9B8t5tElyPnZrdIfaXds17mCI8y 5DJUEQGdv6fqRttYLGKf6rk1kzABhhaTS8n9BgDFEsmvdqwG4AV6dLQr3HL09gTV Lu+Is86RPwmpgH0Z9u7DyFCF8yLjefyu0vl6eFm/SXjCE8gQM/LZQSi9mv5/YDxa MVKlsZKIkQm6P8a6r8wbGedKYBped4+4gYedsf/IcS0lvKdLIs/P7YgR5wKklSbM tqECvBOmKq1Fwj/oxH+fx0KLX/ZD1xxQwoQd+a9DOSo+BuPBt6KGojYT9gQRyFJp R7gyUCo= =2UOD -----END PGP SIGNATURE----- Merge tag 'v6.0' into rdma.git for-next Trvial merge conflicts against rdma.git for-rc resolved matching linux-next: drivers/infiniband/hw/hns/hns_roce_hw_v2.c drivers/infiniband/hw/hns/hns_roce_main.c https://lore.kernel.org/r/20220929124005.105149-1-broonie@kernel.org Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-10-06 19:48:45 -03:00
Daisuke Matsuda	8ad891ed43	RDMA/rxe: Remove error/warning messages from packet receiver path Incoming packets to rxe are passed from UDP layer using an encapsulation socket. If there are any clients reachable to a node, they can invoke the encapsulation handler arbitrarily by sending malicious or irrelevant packets. This can potentially cause a message overflow and a subsequent slowdown on the node. Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Link: https://lore.kernel.org/r/20220929080023.304242-1-matsuda-daisuke@fujitsu.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2022-09-29 12:57:56 +03:00
Xiu Jianfeng	78657a445c	IB/rdmavt: Add __init/__exit annotations to module init/exit funcs Add missing __init/__exit annotations to module init/exit funcs. Fixes: `0194621b22` ("IB/rdmavt: Create module framework and handle driver registration") Link: https://lore.kernel.org/r/20220924091457.52446-1-xiujianfeng@huawei.com Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-09-27 10:15:25 -03:00
Bob Pearson	6c5e683925	RDMA/rxe: Remove redundant num_sge fields In include/uapi/rdma/rdma_user_rxe.h there are redundant copies of num_sge in the rxe_send_wr, rxe_recv_wqe, and rxe_dma_info. Only the ones in rxe_dma_info are actually used by the rxe kernel driver. The userspace would set these values, but the kernel never read them. This change has no affect on the current ABI and new or old versions of rdma-core operate correctly with new or old versions of the kernel rxe driver. Link: https://lore.kernel.org/r/20220913222716.18335-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-09-27 10:15:24 -03:00
Bob Pearson	fda5d0cf8a	RDMA/rxe: Fix resize_finish() in rxe_queue.c Currently in resize_finish() in rxe_queue.c there is a loop which copies the entries in the original queue into a newly allocated queue. The termination logic for this loop is incorrect. The call to queue_next_index() updates cons but has no effect on whether the queue is empty. So if the queue starts out empty nothing is copied but if it is not then the loop will run forever. This patch changes the loop to compare the value of cons to the original producer index. Fixes: `ae6e843fe0` ("RDMA/rxe: Add memory barriers to kernel queues") Link: https://lore.kernel.org/r/20220825221446.6512-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-09-27 10:15:24 -03:00
Bob Pearson	58651bbb30	RDMA/rxe: Set pd early in mr alloc routines Move setting of pd in mr objects ahead of any possible errors so that it will always be set in rxe_mr_cleanup() to avoid seg faults when rxe_put(mr_pd(mr)) is called. Fixes: `cf40367961` ("RDMA/rxe: Move mr cleanup code to rxe_mr_cleanup()") Link: https://lore.kernel.org/r/20220805183153.32007-2-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-09-27 10:15:24 -03:00
Li Zhijian	f994ae0a14	RDMA/rxe: Add send_common_ack() helper Most code in send_ack() and send_atomic_ack() are duplicate, move them to a new helper send_common_ack(). In newer IBA spec, some opcodes require acknowledge with a zero-length read response, with this new helper, we can easily implement it later. Link: https://lore.kernel.org/r/1659335010-2-1-git-send-email-lizhijian@fujitsu.com Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-09-26 14:14:25 -03:00
Daisuke Matsuda	954afc5a8f	RDMA/rxe: Use members of generic struct in rxe_mr rxe_mr and ib_mr have interchangeable members. Remove device specific members and use ones in the generic struct. Both 'iova' and 'length' are filled in ib_uverbs or ib_core layer after MR registration. Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Link: https://lore.kernel.org/r/20220921080844.1616883-2-matsuda-daisuke@fujitsu.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2022-09-22 12:46:39 +03:00
Bernard Metzler	a3c278807a	RDMA/siw: Fix QP destroy to wait for all references dropped. Delay QP destroy completion until all siw references to QP are dropped. The calling RDMA core will free QP structure after successful return from siw_qp_destroy() call, so siw must not hold any remaining reference to the QP upon return. A use-after-free was encountered in xfstest generic/460, while testing NFSoRDMA. Here, after a TCP connection drop by peer, the triggered siw_cm_work_handler got delayed until after QP destroy call, referencing a QP which has already freed. Fixes: `303ae1cdfd` ("rdma/siw: application interface") Reported-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Link: https://lore.kernel.org/r/20220920082503.224189-1-bmt@zurich.ibm.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2022-09-20 21:23:52 +03:00
Bernard Metzler	754209850d	RDMA/siw: Always consume all skbuf data in sk_data_ready() upcall. For header and trailer/padding processing, siw did not consume new skb data until minimum amount present to fill current header or trailer structure, including potential payload padding. Not consuming any data during upcall may cause a receive stall, since tcp_read_sock() is not upcalling again if no new data arrive. A NFSoRDMA client got stuck at RDMA Write reception of unaligned payload, if the current skb did contain only the expected 3 padding bytes, but not the 4 bytes CRC trailer. Expecting 4 more bytes already arrived in another skb, and not consuming those 3 bytes in the current upcall left the Write incomplete, waiting for the CRC forever. Fixes: `8b6a361b8c` ("rdma/siw: receive path") Reported-by: Olga Kornievskaia <kolga@netapp.com> Tested-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Link: https://lore.kernel.org/r/20220920081202.223629-1-bmt@zurich.ibm.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2022-09-20 21:21:18 +03:00
Li Zhijian	415a04844a	RDMA/rxe: convert pr_warn to pr_debug They could be triggered by user APIs with invalid parameters. Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Link: https://lore.kernel.org/r/1662518901-2-2-git-send-email-lizhijian@fujitsu.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2022-09-08 11:03:15 +03:00
Li Zhijian	e2edba67fc	RDMA/rxe: use %u to print u32 variables struct ib_qp_cap { u32 max_send_wr; u32 max_recv_wr; u32 max_send_sge; u32 max_recv_sge; u32 max_inline_data; ... To avoid getting a negative value from dmesg: [410580.579965] rdma_rxe: invalid send sge = 65535 > 32 [410580.583818] rdma_rxe: invalid send wr = -1 > 1048576 [410582.771323] rdma_rxe: invalid recv sge = 65535 > 32 [410582.775310] rdma_rxe: invalid recv wr = -1 > 1048576 Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Link: https://lore.kernel.org/r/1662518901-2-1-git-send-email-lizhijian@fujitsu.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2022-09-08 11:03:15 +03:00
Linus Walleij	0d1b756acf	RDMA/siw: Pass a pointer to virt_to_page() Functions that work on a pointer to virtual memory such as virt_to_pfn() and users of that function such as virt_to_page() are supposed to pass a pointer to virtual memory, ideally a (void ) or other pointer. However since many architectures implement virt_to_pfn() as a macro, this function becomes polymorphic and accepts both a (unsigned long) and a (void ). If we instead implement a proper virt_to_pfn(void addr) function the following happens (occurred on arch/arm): drivers/infiniband/sw/siw/siw_qp_tx.c:32:23: warning: incompatible integer to pointer conversion passing 'dma_addr_t' (aka 'unsigned int') to parameter of type 'const void ' [-Wint-conversion] drivers/infiniband/sw/siw/siw_qp_tx.c:32:37: warning: passing argument 1 of 'virt_to_pfn' makes pointer from integer without a cast [-Wint-conversion] drivers/infiniband/sw/siw/siw_qp_tx.c:538:36: warning: incompatible integer to pointer conversion passing 'unsigned long long' to parameter of type 'const void ' [-Wint-conversion] Fix this with an explicit cast. In one case where the SIW SGE uses an unaligned u64 we need a double cast modifying the virtual address (va) to a platform-specific uintptr_t before casting to a (void ). Fixes: `b9be6f18cf` ("rdma/siw: transmit path") Cc: linux-rdma@vger.kernel.org Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20220902215918.603761-1-linus.walleij@linaro.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2022-09-04 10:21:59 +03:00
Tom Talpey	fc5e1acf6a	RDMA/siw: Add missing Kconfig selections The SoftiWARP Kconfig is missing "select" for CRYPTO and CRYPTO_CRC32C. In addition, it improperly "depends on" LIBCRC32C, this should be a "select", similar to net/sctp and others. As a dependency, SIW fails to appear in generic configurations. Link: https://lore.kernel.org/r/d366bf02-3271-754f-fc68-1a84016d0e19@talpey.com Signed-off-by: Tom Talpey <tom@talpey.com> Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2022-09-01 10:12:01 +03:00
Daisuke Matsuda	2c02249fcb	RDMA/rxe: Delete error messages triggered by incoming Read requests An incoming Read request causes multiple Read responses. If a user MR to copy data from is unavailable or responder cannot send a reply, then the error messages can be printed for each response attempt, resulting in message overflow. Link: https://lore.kernel.org/r/20220829071218.1639065-1-matsuda-daisuke@fujitsu.com Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2022-08-31 09:57:09 +03:00
Zhu Yanjun	f07853582d	RDMA/rxe: Remove the unused variable obj The member variable obj in struct rxe_task is not needed. So remove it to save memory. Link: https://lore.kernel.org/r/20220822011615.805603-4-yanjun.zhu@linux.dev Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2022-08-31 09:53:13 +03:00
Zhu Yanjun	548ce2e667	RDMA/rxe: Fix the error caused by qp->sk When sock_create_kern in the function rxe_qp_init_req fails, qp->sk is set to NULL. Then the function rxe_create_qp will call rxe_qp_do_cleanup to handle allocated resource. Before handling qp->sk, this variable should be checked. Fixes: `8700e3e7c4` ("Soft RoCE driver") Link: https://lore.kernel.org/r/20220822011615.805603-3-yanjun.zhu@linux.dev Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2022-08-31 09:53:13 +03:00
Zhu Yanjun	a625ca30ef	RDMA/rxe: Fix "kernel NULL pointer dereference" error When rxe_queue_init in the function rxe_qp_init_req fails, both qp->req.task.func and qp->req.task.arg are not initialized. Because of creation of qp fails, the function rxe_create_qp will call rxe_qp_do_cleanup to handle allocated resource. Before calling __rxe_do_task, both qp->req.task.func and qp->req.task.arg should be checked. Fixes: `8700e3e7c4` ("Soft RoCE driver") Link: https://lore.kernel.org/r/20220822011615.805603-2-yanjun.zhu@linux.dev Reported-by: syzbot+ab99dc4c6e961eed8b8e@syzkaller.appspotmail.com Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2022-08-31 09:53:12 +03:00
Daisuke Matsuda	d4ecb56e86	RDMA/rxe: Remove an unused member from struct rxe_mr Commit `1e75550648` ("Revert "RDMA/rxe: Create duplicate mapping tables for FMRs"") brought back the member 'va' to struct rxe_mr. However, it is actually used by nobody and thus can be removed. Fixes: `1e75550648` ("Revert "RDMA/rxe: Create duplicate mapping tables for FMRs"") Link: https://lore.kernel.org/r/20220829012335.1212697-1-matsuda-daisuke@fujitsu.com Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2022-08-29 09:44:07 +03:00
Zhu Yanjun	fd5382c580	RDMA/rxe: Fix error unwind in rxe_create_qp() In the function rxe_create_qp(), rxe_qp_from_init() is called to initialize qp, internally things like the spin locks are not setup until rxe_qp_init_req(). If an error occures before this point then the unwind will call rxe_cleanup() and eventually to rxe_qp_do_cleanup()/rxe_cleanup_task() which will oops when trying to access the uninitialized spinlock. Move the spinlock initializations earlier before any failures. Fixes: `8700e3e7c4` ("Soft RoCE driver") Link: https://lore.kernel.org/r/20220731063621.298405-1-yanjun.zhu@linux.dev Reported-by: syzbot+833061116fa28df97f3b@syzkaller.appspotmail.com Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-08-02 14:29:41 -03:00
Bob Pearson	62494ec7fb	RDMA/rxe: Split qp state for requester and completer Currently the requester can continue to process send wqes after an local qp operation error is detected because the setting of the qp state to the error state is deferred until later. This patch splits the qp state for the completer and requester into two separate states and sets qp->req.state = QP_STATE_ERROR as soon as the error is detected before another wqe can be executed. Link: https://lore.kernel.org/r/1658307368-1851-4-git-send-email-lizhijian@fujitsu.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-08-02 13:53:36 -03:00
Li Zhijian	ae720bdb70	RDMA/rxe: Generate error completion for error requester QP state As per IBTA specification, all subsequent WQEs while QP is in error state should be completed with a flush error. Here we check QP_STATE_ERROR after req_next_wqe() so that rxe_completer() has chance to be called where it will set CQ state to FLUSH ERROR and the completion can associate with its WQE. Link: https://lore.kernel.org/r/1658307368-1851-3-git-send-email-lizhijian@fujitsu.com Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-08-02 13:45:12 -03:00
Li Zhijian	dea4266f7b	RDMA/rxe: Update wqe_index for each wqe error completion Previously, if user space keeps sending abnormal wqe, queue.index will keep increasing while qp->req.wqe_index doesn't. Once qp->req.wqe_index==queue.index in next round, req_next_wqe() will treat queue as empty. In such case, no new completion would be generated. Update wqe_index for each wqe completion so that req_next_wqe() can get next wqe properly. Link: https://lore.kernel.org/r/1658307368-1851-2-git-send-email-lizhijian@fujitsu.com Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-08-02 13:41:36 -03:00
Li Zhijian	1e75550648	Revert "RDMA/rxe: Create duplicate mapping tables for FMRs" Below 2 commits will be reverted: commit `8ff5f5d9d8` ("RDMA/rxe: Prevent double freeing rxe_map_set()") commit `647bf13ce9` ("RDMA/rxe: Create duplicate mapping tables for FMRs") The community has a few bug reports which pointed this commit at last. Some proposals are raised up in the meantime but all of them have no follow-up operation. The previous commit led the map_set of FMR to be not available any more if the MR is registered again after invalidating. Although the mentioned patch try to fix a potential race in building/accessing the same table for fast memory regions, it broke rtrs etc ULPs. Since the latter could be worse, revert this patch. With previous commit, it's observed that a same MR in rnbd server will trigger below code path: -> rxe_mr_init_fast() \|-> alloc map_set() # map_set is uninitialized \|...-> rxe_map_mr_sg() # build the map_set \|-> rxe_mr_set_page() \|...-> rxe_reg_fast_mr() # mr->state change to VALID from FREE that means # we can access host memory(such rxe_mr_copy) \|...-> rxe_invalidate_mr() # mr->state change to FREE from VALID \|...-> rxe_reg_fast_mr() # mr->state change to VALID from FREE, # but map_set was not built again \|...-> rxe_mr_copy() # kernel crash due to access wild addresses # that lookup from the map_set The backtraces are not always identical. [1st]---------- RIP: 0010:lookup_iova+0x66/0xa0 [rdma_rxe] Code: 00 00 00 48 d3 ee 89 32 c3 4c 8b 18 49 8b 3b 48 8b 47 08 48 39 c6 72 38 48 29 c6 45 31 d2 b8 01 00 00 00 48 63 c8 48 c1 e1 04 <48> 8b 4c 0f 08 48 39 f1 77 21 83 c0 01 48 29 ce 3d 00 01 00 00 75 RSP: 0018:ffffb7ff80063bf0 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff9b9949d86800 RCX: 0000000000000000 RDX: ffffb7ff80063c00 RSI: 0000000049f6b378 RDI: 002818da00000004 RBP: 0000000000000120 R08: ffffb7ff80063c08 R09: ffffb7ff80063c04 R10: 0000000000000002 R11: ffff9b9916f7eef8 R12: ffff9b99488a0038 R13: ffff9b99488a0038 R14: ffff9b9914fb346a R15: ffff9b990ab27000 FS: 0000000000000000(0000) GS:ffff9b997dc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007efc33a98ed0 CR3: 0000000014f32004 CR4: 00000000001706f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> rxe_mr_copy.part.0+0x6f/0x140 [rdma_rxe] rxe_responder+0x12ee/0x1b60 [rdma_rxe] ? rxe_icrc_check+0x7e/0x100 [rdma_rxe] ? rxe_rcv+0x1d0/0x780 [rdma_rxe] ? rxe_icrc_hdr.isra.0+0xf6/0x160 [rdma_rxe] rxe_do_task+0x67/0xb0 [rdma_rxe] rxe_xmit_packet+0xc7/0x210 [rdma_rxe] rxe_requester+0x680/0xee0 [rdma_rxe] ? update_load_avg+0x5f/0x690 ? update_load_avg+0x5f/0x690 ? rtrs_clt_recv_done+0x1b/0x30 [rtrs_client] [2nd]---------- RIP: 0010:rxe_mr_copy.part.0+0xa8/0x140 [rdma_rxe] Code: 00 00 49 c1 e7 04 48 8b 00 4c 8d 2c d0 48 8b 44 24 10 4d 03 7d 00 85 ed 7f 10 eb 6c 89 54 24 0c 49 83 c7 10 31 c0 85 ed 7e 5e <49> 8b 3f 8b 14 24 4c 89 f6 48 01 c7 85 d2 74 06 48 89 fe 4c 89 f7 RSP: 0018:ffffae3580063bf8 EFLAGS: 00010202 RAX: 0000000000018978 RBX: ffff9d7ef7a03600 RCX: 0000000000000008 RDX: 000000000000007c RSI: 000000000000007c RDI: ffff9d7ef7a03600 RBP: 0000000000000120 R08: ffffae3580063c08 R09: ffffae3580063c04 R10: ffff9d7efece0038 R11: ffff9d7ec4b1db00 R12: ffff9d7efece0038 R13: ffff9d7ef4098260 R14: ffff9d7f11e23c6a R15: 4c79500065708144 FS: 0000000000000000(0000) GS:ffff9d7f3dc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fce47276c60 CR3: 0000000003f66004 CR4: 00000000001706f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> rxe_responder+0x12ee/0x1b60 [rdma_rxe] ? rxe_icrc_check+0x7e/0x100 [rdma_rxe] ? rxe_rcv+0x1d0/0x780 [rdma_rxe] ? rxe_icrc_hdr.isra.0+0xf6/0x160 [rdma_rxe] rxe_do_task+0x67/0xb0 [rdma_rxe] rxe_xmit_packet+0xc7/0x210 [rdma_rxe] rxe_requester+0x680/0xee0 [rdma_rxe] ? update_load_avg+0x5f/0x690 ? update_load_avg+0x5f/0x690 ? rtrs_clt_recv_done+0x1b/0x30 [rtrs_client] rxe_do_task+0x67/0xb0 [rdma_rxe] tasklet_action_common.constprop.0+0x92/0xc0 __do_softirq+0xe1/0x2d8 run_ksoftirqd+0x21/0x30 smpboot_thread_fn+0x183/0x220 ? sort_range+0x20/0x20 kthread+0xe2/0x110 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x22/0x30 Link: https://lore.kernel.org/r/1658805386-2-1-git-send-email-lizhijian@fujitsu.com Link: https://lore.kernel.org/all/20220210073655.42281-1-guoqing.jiang@linux.dev/T/ Link: https://www.spinics.net/lists/linux-rdma/msg110836.html Link: https://lore.kernel.org/lkml/94a5ea93-b8bb-3a01-9497-e2021f29598a@linux.dev/t/ Tested-by: Md Haris Iqbal <haris.iqbal@ionos.com> Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2022-07-27 12:22:57 +03:00
Bob Pearson	c2ea08ca5e	RDMA/rxe: Replace __rxe_do_task by rxe_run_task In rxe_req.c replace calls to __rxe_do_task() by calls to rxe_run_task(.., 0). Using __rxe_do_task is an error because the completer tasklet is not designed to be re-entrant and __rxe_do_task() should only be called when it is clear that no one else could be calling the completer tasklet as is the case in rxe_qp.c where this call is used in safe environments. Link: https://lore.kernel.org/r/20220630190425.2251-10-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-07-22 17:43:00 -03:00
Bob Pearson	eff6d998ca	RDMA/rxe: Limit the number of calls to each tasklet Limit the maximum number of calls to each tasklet from rxe_do_task() before yielding the cpu. When the limit is reached reschedule the tasklet and exit the calling loop. This patch prevents one tasklet from consuming 100% of a cpu core and causing a deadlock or soft lockup. Link: https://lore.kernel.org/r/20220630190425.2251-9-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-07-22 17:43:00 -03:00
Bob Pearson	8bb143c534	RDMA/rxe: Make the tasklet exits the same Make changes to the three tasklets so that the exit logic from each is the same. This makes the code easier to understand. Link: https://lore.kernel.org/r/20220630190425.2251-8-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-07-22 17:43:00 -03:00
Bob Pearson	445fd4f4fb	RDMA/rxe: Fix rnr retry behavior Currently the completer tasklet when retransmit timer or the rnr timer fires the same flag (qp->req.need_retry) is set so that if either timer fires it will attempt to perform a retry flow on the send queue. This has the effect of responding to an RNR NAK at the first retransmit timer event which might not allow the requested rnr timeout. This patch adds a new flag (qp->req.wait_for_rnr_timer) which, if set, prevents a retry flow until the rnr nak timer fires. This patch fixes rnr retry errors which can be observed by running the pyverbs test_rdmacm_async_traffic_external_qp multiple times. With this patch applied they do not occur. Link: https://lore.kernel.org/linux-rdma/a8287823-1408-4273-bc22-99a0678db640@gmail.com/ Link: https://lore.kernel.org/linux-rdma/2bafda9e-2bb6-186d-12a1-179e8f6a2678@talpey.com/ Fixes: `8700e3e7c4` ("Soft RoCE driver") Link: https://lore.kernel.org/r/20220630190425.2251-6-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-07-22 17:43:00 -03:00
Bob Pearson	930119a172	RDMA/rxe: Add rxe_is_fenced() subroutine The code thc that decides whether to defer execution of a wqe in rxe_requester.c is isolated into a subroutine rxe_is_fenced() and removed from the call to req_next_wqe(). The condition whether a wqe should be fenced is changed to comply with the IBA. Currently an operation is fenced if the fence bit is set in the wqe flags and the last wqe has not completed. For normal operations the IBA actually only requires that the last read or atomic operation is complete. Link: https://lore.kernel.org/r/20220630190425.2251-2-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-07-22 17:43:00 -03:00
Md Haris Iqbal	174e7b1370	RDMA/rxe: For invalidate compare according to set keys in mr The 'rkey' input can be an lkey or rkey, and in rxe the lkey or rkey have the same value, including the variant bits. So, if mr->rkey is set, compare the invalidate key with it, otherwise compare with the mr->lkey. Since we already did a lookup on the non-varient bits to get this far, the check's only purpose is to confirm that the wqe has the correct variant bits. Fixes: `001345339f` ("RDMA/rxe: Separate HW and SW l/rkeys") Link: https://lore.kernel.org/r/20220707073006.328737-1-haris.phnx@gmail.com Signed-off-by: Md Haris Iqbal <haris.phnx@gmail.com> Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-07-22 17:35:53 -03:00
Bob Pearson	1603f89935	RDMA/rxe: Fix mw bind to allow any consumer key portion The current implementation of rxe_check_bind_mw() in rxe_mw.c is incorrect since it requires the new key portion provided by the mw consumer to be different than the previous key portion. This is not required by the IBA. Remove the test. Link: https://lore.kernel.org/linux-rdma/fb4614e7-4cac-0dc7-3ef7-766dfd10e8f2@gmail.com/ Fixes: `32a577b4c3` ("Add support for bind MW work requests") Link: https://lore.kernel.org/r/20220714204619.13396-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-07-21 11:35:31 -03:00
Zhang Jiaming	5abb71b47c	RDMA/rxe: Fix spelling mistake in error print There is a spelling mistake (writeable) in function rxe_check_bind_mw. Fix it. Signed-off-by: Zhang Jiaming <jiaming@nfschina.com> Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2022-07-21 09:59:29 +03:00
Xiao Yang	68691bad98	RDMA/rxe: Remove unused qp parameter The qp parameter in free_rd_atomic_resource() has become unused so remove it directly. Fixes: `15ae1375ea` ("RDMA/rxe: Fix qp reference counting for atomic ops") Link: https://lore.kernel.org/all/20220708035547.6592-1-yangx.jy@fujitsu.com/ Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2022-07-19 11:31:09 +03:00
lizhijian@fujitsu.com	03905ac285	RDMA/rxe: Remove unused mask parameter This parameter had been deprecated since below commit: `1a7085b342` ("RDMA/rxe: Skip adjusting remote addr for write in retry operation") Link: https://lore.kernel.org/r/20220715035340.1900168-1-lizhijian@fujitsu.com Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2022-07-18 14:50:35 +03:00
Xiao Yang	548c56dd2e	RDMA/rxe: Rename rxe_atomic_reply to atomic_reply It's better to use the unified naming format. Link: https://lore.kernel.org/r/20220705145212.12014-2-yangx.jy@fujitsu.com Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2022-07-18 14:36:18 +03:00
Xiao Yang	882736fb3b	RDMA/rxe: Add common rxe_prepare_res() It's redundant to prepare resources for Read and Atomic requests by different functions. Replace them by a common rxe_prepare_res() with different parameters. In addition, the common rxe_prepare_res() can also be used by new Flush and Atomic Write requests in the future. Link: https://lore.kernel.org/r/20220705145212.12014-1-yangx.jy@fujitsu.com Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com> Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2022-07-18 14:36:11 +03:00
Zhu Yanjun	37da51efe6	RDMA/rxe: Fix BUG: KASAN: null-ptr-deref in rxe_qp_do_cleanup The function rxe_create_qp calls rxe_qp_from_init. If some error occurs, the error handler of function rxe_qp_from_init will set both scq and rcq to NULL. Then rxe_create_qp calls rxe_put to handle qp. In the end, rxe_qp_do_cleanup is called by rxe_put. rxe_qp_do_cleanup directly accesses scq and rcq before checking them. This will cause null-ptr-deref error. The call graph is as below: rxe_create_qp { ... rxe_qp_from_init { ... err1: ... qp->rcq = NULL; <---rcq is set to NULL qp->scq = NULL; <---scq is set to NULL ... } qp_init: rxe_put{ ... rxe_qp_do_cleanup { ... atomic_dec(&qp->scq->num_wq); <--- scq is accessed ... atomic_dec(&qp->rcq->num_wq); <--- rcq is accessed } } Fixes: `4703b4f0d9` ("RDMA/rxe: Enforce IBA C11-17") Link: https://lore.kernel.org/r/20220705225414.315478-1-yanjun.zhu@linux.dev Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com> Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2022-07-18 14:32:39 +03:00
Cheng Xu	3056fc6c32	RDMA/siw: Fix duplicated reported IW_CM_EVENT_CONNECT_REPLY event If siw_recv_mpa_rr returns -EAGAIN, it means that the MPA reply hasn't been received completely, and should not report IW_CM_EVENT_CONNECT_REPLY in this case. This may trigger a call trace in iw_cm. A simple way to trigger this: server: ib_send_lat client: ib_send_lat -R <server_ip> The call trace looks like this: kernel BUG at drivers/infiniband/core/iwcm.c:894! invalid opcode: 0000 [#1] PREEMPT SMP NOPTI <...> Workqueue: iw_cm_wq cm_work_handler [iw_cm] Call Trace: <TASK> cm_work_handler+0x1dd/0x370 [iw_cm] process_one_work+0x1e2/0x3b0 worker_thread+0x49/0x2e0 ? rescuer_thread+0x370/0x370 kthread+0xe5/0x110 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30 </TASK> Fixes: `6c52fdc244` ("rdma/siw: connection management") Link: https://lore.kernel.org/r/dae34b5fd5c2ea2bd9744812c1d2653a34a94c67.1657706960.git.chengyou@linux.alibaba.com Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2022-07-18 14:20:52 +03:00
Andrey Strachuk	aeea6cc067	RDMA: remove useless condition in siw_create_cq() Comparison of 'cq' with NULL is useless since 'cq' is a result of container_of and cannot be NULL in any reasonable scenario. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: `303ae1cdfd` ("rdma/siw: application interface") Link: https://lore.kernel.org/r/20220711151251.17089-1-strochuk@ispras.ru Signed-off-by: Andrey Strachuk <strochuk@ispras.ru> Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2022-07-18 12:27:28 +03:00
Zhang Jiaming	2635d2a8d4	IB: Fix spelling of 'writable' There is a typo (writeable) in qib_file_ops.c, qib_sd7220.c's comments, and in rxe_check_bind_mw() Link: https://lore.kernel.org/r/20220701074812.12615-1-jiaming@nfschina.com Link: https://lore.kernel.org/r/20220701080019.13329-1-jiaming@nfschina.com Signed-off-by: Zhang Jiaming <jiaming@nfschina.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-07-04 10:14:57 -03:00
Bob Pearson	96938258b1	RDMA/rxe: Remove unnecessary include statement rxe_verbs.h includes the file <rdma/rdma_user_rxe.h>. It should have been <uapi/rdma/rdma_user_rxe.h>, however, it is not used and not required in this file. This patch removes the include statement. Link: https://lore.kernel.org/r/20220630190425.2251-4-rpearsonhpe@gmail.com Reported-by: Frank Zago <frank.zago@hpe.com> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-07-04 10:04:09 -03:00
Bob Pearson	f5d1f6d63c	RDMA/rxe: Replace include statement rxe_queue.h currently includes <uapi/rdma/rdma_user_rxe.h> for a definition of struct rxe_queue_buf. But it is only used as a pointer so the definition is not needed. This patch replaces the include statement with the declaration struct rxe_queue_buf; Link: https://lore.kernel.org/r/20220630190425.2251-5-rpearsonhpe@gmail.com Reported-by: Frank Zago <frank.zago@hpe.com> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-06-30 20:45:00 -03:00
Bob Pearson	cae3fa541e	RDMA/rxe: Convert pr_warn/err to pr_debug in pyverbs The pyverbs test suite generates a few dmesg traces from intentional error tests. This patch replaces those messages with pr_debug() calls which improves the usefullness of the tests. Link: https://lore.kernel.org/r/20220630190425.2251-3-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-06-30 20:45:00 -03:00
Bob Pearson	7cb33d1bc1	RDMA/rxe: Fix deadlock in rxe_do_local_ops() When a local operation (invalidate mr, reg mr, bind mw) is finished there will be no ack packet coming from a responder to cause the wqe to be completed. This may happen anyway if a subsequent wqe performs IO. Currently if the wqe is signalled the completer tasklet is scheduled immediately but not otherwise. This leads to a deadlock if the next wqe has the fence bit set in send flags and the operation is not signalled. This patch removes the condition that the wqe must be signalled in order to schedule the completer tasklet which is the simplest fix for this deadlock and is fairly low cost. This is the analog for local operations of always setting the ackreq bit in all last or only request packets even if the operation is not signalled. Link: https://lore.kernel.org/r/20220523223251.15350-1-rpearsonhpe@gmail.com Reported-by: Jenny Hack <jhack@hpe.com> Fixes: `c1a411268a` ("RDMA/rxe: Move local ops to subroutine") Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-06-30 17:00:23 -03:00
Bob Pearson	dc18483881	RDMA/rxe: Merge normal and retry atomic flows Make the execution of the atomic operation in rxe_atomic_reply() conditional on res->replay and make duplicate_request() call into rxe_atomic_reply() to merge the two flows. This is modeled on the behavior of read reply. Delete the skb from the atomic responder resource since it is no longer used. Adjust the reference counting of the qp in send_atomic_ack() for this flow. Fixes: `8700e3e7c4` ("Soft RoCE driver") Link: https://lore.kernel.org/r/20220606143836.3323-6-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-06-30 14:00:21 -03:00
Bob Pearson	8264411595	RDMA/rxe: Move atomic original value to res Move the saved original value to the atomic responder resource. This replaces saving it in the qp. In preparation for merging the normal and retry atomic responder flows. Link: https://lore.kernel.org/r/20220606143836.3323-5-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-06-30 14:00:21 -03:00
Bob Pearson	220e842815	RDMA/rxe: Move atomic responder res to atomic_reply Move the allocation of the atomic responder resource up into rxe_atomic_reply() from send_atomic_ack(). In preparation for merging the normal and retry atomic responder flows. Link: https://lore.kernel.org/r/20220606143836.3323-4-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-06-30 14:00:21 -03:00
Bob Pearson	0ed5493e43	RDMA/rxe: Add a responder state for atomic reply Add a responder state for atomic reply similar to read reply and rename process_atomic() rxe_atomic_reply(). In preparation for merging the normal and retry atomic responder flows. Link: https://lore.kernel.org/r/20220606143836.3323-3-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-06-30 13:54:04 -03:00
Bob Pearson	24f0ab0102	RDMA/rxe: Move code to rxe_prepare_atomic_res() Separate the code that prepares the atomic responder resource into a subroutine. This is preparation for merging the normal and retry atomic responder flows. Link: https://lore.kernel.org/r/20220606143836.3323-2-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-06-30 13:54:03 -03:00
Bob Pearson	b54c2a25ac	RDMA/rxe: Convert read side locking to rcu Use rcu_read_lock() for protecting read side operations in rxe_pool.c. Link: https://lore.kernel.org/r/20220612223434.31462-3-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-06-30 10:56:05 -03:00
Bob Pearson	215d0a755e	RDMA/rxe: Stop lookup of partially built objects Currently the rdma_rxe driver has a security weakness due to giving objects which are partially initialized indices allowing external actors to gain access to them by sending packets which refer to their index (e.g. qpn, rkey, etc) causing unpredictable results. This patch adds a new API rxe_finalize(obj) which enables looking up pool objects from indices using rxe_pool_get_index() for AH, QP, MR, and MW. They are added in create verbs only after the objects are fully initialized. It also adds wait for completion to destroy/dealloc verbs to assure that all references have been dropped before returning to rdma_core by implementing a new rxe_pool API rxe_cleanup() which drops a reference to the object and then waits for all other references to be dropped. When the last reference is dropped the object is completed by kref. After that it cleans up the object and if locally allocated frees the memory. In the special case of address handle objects the delay is implemented separately if the destroy_ah call is not sleepable. Combined with deferring cleanup code to type specific cleanup routines this allows all pending activity referring to objects to complete before returning to rdma_core. Link: https://lore.kernel.org/r/20220612223434.31462-2-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-06-30 10:56:01 -03:00
Xiao Yang	80a14dd4c3	RDMA/rxe: Remove useless pkt parameters The pkt parameters in prepare_ack_packet(), send_ack() and send_atomic_ack() have become useless by the following commits. So remove them directly. Fixes: `bf139b58af` ("RDMA/rxe: Remove unused pkt->offset") Fixes: `3896bde92d` ("RDMA/rxe: Fix extra copy in prepare_ack_packet") Link: https://lore.kernel.org/r/20220623131627.18903-1-yangx.jy@fujitsu.com Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-06-24 19:35:21 -03:00
Dongliang Mu	1a685940e6	RDMA/rxe: fix xa_alloc_cycle() error return value check again Currently rxe_alloc checks ret to indicate error, but 1 is also a valid return and just indicates that the allocation succeeded with a wrap. Fix this by modifying the check to be < 0. Link: https://lore.kernel.org/r/20220609070656.1446121-1-dzm91@hust.edu.cn Fixes: `3225717f6d` ("RDMA/rxe: Replace red-black trees by xarrays") Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com> Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2022-06-16 20:07:07 +03:00
Christophe JAILLET	7f60951ff4	RDMA/rxe: Fix an error handling path in rxe_get_mcg() The commit in the Fixes tag has shuffled some code. Now 'mcg_num' is incremented before the kzalloc(). So if the memory allocation fails, this increment must be undone. Fixes: `a926a903b7` ("RDMA/rxe: Do not call dev_mc_add/del() under a spinlock") Link: https://lore.kernel.org/r/fe137cd8b1f17593243aa73d59c18ea71ab9ee36.1653225896.git.christophe.jaillet@wanadoo.fr Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-05-24 12:55:12 -03:00
Jason Gunthorpe	a6f844da39	Linux 5.18 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmKKlIAeHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGC3oH/iPm/fLG2sJut8My sU0RC9K+6ESV5h2Qy6k00/lqKstlu4EvBjw4V8vYpx3Q2+hbSFMn2SeWqqqT3Lkk Zb8KINCFuuyMtdCBb42PV0zhUf5pCQF7ocm/Ae4jllDHtPmqk3WJ6IGtZBK5JBlw z6RR/wKt0y0MRj9eZyPyYjOee2L2vuVh4tgnexK/4L8g2ZtMMRThhvUzSMWG4zxR STYYNp0uFcfT1Vt85+ODevFH4TvdECAj+SqAegN+seHLM17YY7M0/WiIYpxGRv8P lIpDQl4PBU8EBkpI5hkpJ/3qPincbuVOMLsYfxFtpcjjG12vGjFp2krGpS3TedZQ 3mvaJ7c= =vLke -----END PGP SIGNATURE----- Merge tag 'v5.18' into rdma.git for-next Following patches have dependencies. Resolve the merge conflict in drivers/net/ethernet/mellanox/mlx5/core/main.c by keeping the new names for the fs functions following linux-next: https://lore.kernel.org/r/20220519113529.226bc3e2@canb.auug.org.au/ Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-05-24 12:40:28 -03:00
Bernard Metzler	a2d36b02c1	RDMA/siw: Enable siw on tunnel devices Enable siw to attach to tunnel devices, there is no reason not to, siw properly generates all packets already. Link: https://lore.kernel.org/r/20220510143917.23735-1-bmt@zurich.ibm.com Tested-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-05-11 13:52:38 -03:00
Bob Pearson	4703b4f0d9	RDMA/rxe: Enforce IBA C11-17 Add a counter to keep track of the number of WQs connected to a CQ and return an error if destroy_cq() is called while the counter is non zero. Link: https://lore.kernel.org/r/20220421014042.26985-8-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-05-09 09:03:45 -03:00
Bob Pearson	cde3f5d682	RDMA/rxe: Move mw cleanup code to rxe_mw_cleanup() Move code from rxe_dealloc_mw() to rxe_mw_cleanup() to allow flows which hold a reference to mw to complete. Link: https://lore.kernel.org/r/20220421014042.26985-7-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-05-09 09:03:45 -03:00
Bob Pearson	cf40367961	RDMA/rxe: Move mr cleanup code to rxe_mr_cleanup() Move the code which tears down an mr to rxe_mr_cleanup to allow operations holding a reference to the mr to complete. Link: https://lore.kernel.org/r/20220421014042.26985-6-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-05-09 09:03:45 -03:00
Bob Pearson	ed2b5dd0f8	RDMA/rxe: Move qp cleanup code to rxe_qp_do_cleanup() Move the code from rxe_qp_destroy() to rxe_qp_do_cleanup(). This allows flows holding references to qp to complete before the qp object is torn down. Link: https://lore.kernel.org/r/20220421014042.26985-5-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-05-09 09:03:45 -03:00
Bob Pearson	4e05a4b329	RDMA/rxe: Check rxe_get() return value In the tasklets (completer, responder, and requester) check the return value from rxe_get() to detect failures to get a reference. This only occurs if the qp has had its reference count drop to zero which indicates that it no longer should be used. The ref is never 0 today because the tasklets are flushed before the ref is dropped. The next patch changes this so that the ref is dropped then the tasklets are flushed. Link: https://lore.kernel.org/r/20220421014042.26985-4-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-05-09 09:03:45 -03:00
Bob Pearson	b2a41678fc	RDMA/rxe: Add rxe_srq_cleanup() Move cleanup code from rxe_destroy_srq() to rxe_srq_cleanup() which is called after all references are dropped to allow code depending on the srq object to complete. Link: https://lore.kernel.org/r/20220421014042.26985-3-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-05-09 09:03:39 -03:00
Bob Pearson	0b1fbfb9e9	RDMA/rxe: Remove IB_SRQ_INIT_MASK Currently the #define IB_SRQ_INIT_MASK is used to distinguish the rxe_create_srq verb from the rxe_modify_srq verb so that some code can be shared between these two subroutines. This commit splits rxe_srq_chk_attr into two subroutines: rxe_srq_chk_init and rxe_srq_chk_attr which handle the create_srq and modify_srq verbs separately. Link: https://lore.kernel.org/r/20220421014042.26985-2-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-05-06 15:33:31 -03:00
Chengguang Xu	1a7085b342	RDMA/rxe: Skip adjusting remote addr for write in retry operation For write request the remote addr will be sent only with first packet so we don't have to adjust wqe->iova in retry operation. Link: https://lore.kernel.org/r/20220502053907.6388-1-cgxu519@mykernel.net Signed-off-by: Chengguang Xu <cgxu519@mykernel.net> Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-05-06 13:12:56 -03:00
Zhu Yanjun	08d709d5e1	RDMA/rxe: Optimize the mr pool struct Based on the commit `c9f4c69583` ("RDMA/rxe: Reverse the sense of RXE_POOL_NO_ALLOC"), only the mr pool uses the RXE_POOL_ALLOC, As such, replace this flags with pool type to save memory. Link: https://lore.kernel.org/r/20220428041028.1363139-1-yanjun.zhu@linux.dev Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-05-04 22:02:58 -03:00
Bob Pearson	bfdc0edd11	RDMA/rxe: Change mcg_lock to a _bh lock rxe_mcast.c currently uses _irqsave spinlocks for rxe->mcg_lock while rxe_recv.c uses _bh spinlocks for the same lock. As there is no case where the mcg_lock can be taken from an IRQ, change these all to bh locks so we don't have confusing mismatched lock types on the same spinlock. Fixes: `6090a0c4c7` ("RDMA/rxe: Cleanup rxe_mcast.c") Link: https://lore.kernel.org/r/20220504202817.98247-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-05-04 21:29:25 -03:00
Bob Pearson	a926a903b7	RDMA/rxe: Do not call dev_mc_add/del() under a spinlock These routines were not intended to be called under a spinlock and will throw debugging warnings: raw_local_irq_restore() called with IRQs enabled WARNING: CPU: 13 PID: 3107 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x2f/0x50 CPU: 13 PID: 3107 Comm: python3 Tainted: G E 5.18.0-rc1+ #7 Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 RIP: 0010:warn_bogus_irq_restore+0x2f/0x50 Call Trace: <TASK> _raw_spin_unlock_irqrestore+0x75/0x80 rxe_attach_mcast+0x304/0x480 [rdma_rxe] ib_attach_mcast+0x88/0xa0 [ib_core] ib_uverbs_attach_mcast+0x186/0x1e0 [ib_uverbs] ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0xcd/0x140 [ib_uverbs] ib_uverbs_cmd_verbs+0xdb0/0xea0 [ib_uverbs] ib_uverbs_ioctl+0xd2/0x160 [ib_uverbs] do_syscall_64+0x5c/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae Move them out of the spinlock, it is OK if there is some races setting up the MC reception at the ethernet layer with rbtree lookups. Fixes: `6090a0c4c7` ("RDMA/rxe: Cleanup rxe_mcast.c") Link: https://lore.kernel.org/r/20220504202817.98247-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-05-04 21:16:57 -03:00
Cheng Xu	ef91271c65	RDMA/siw: Fix a condition race issue in MPA request processing The calling of siw_cm_upcall and detaching new_cep with its listen_cep should be atomistic semantics. Otherwise siw_reject may be called in a temporary state, e,g, siw_cm_upcall is called but the new_cep->listen_cep has not being cleared. This fixes a WARN: WARNING: CPU: 7 PID: 201 at drivers/infiniband/sw/siw/siw_cm.c:255 siw_cep_put+0x125/0x130 [siw] CPU: 2 PID: 201 Comm: kworker/u16:22 Kdump: loaded Tainted: G E 5.17.0-rc7 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 Workqueue: iw_cm_wq cm_work_handler [iw_cm] RIP: 0010:siw_cep_put+0x125/0x130 [siw] Call Trace: <TASK> siw_reject+0xac/0x180 [siw] iw_cm_reject+0x68/0xc0 [iw_cm] cm_work_handler+0x59d/0xe20 [iw_cm] process_one_work+0x1e2/0x3b0 worker_thread+0x50/0x3a0 ? rescuer_thread+0x390/0x390 kthread+0xe5/0x110 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30 </TASK> Fixes: `6c52fdc244` ("rdma/siw: connection management") Link: https://lore.kernel.org/r/d528d83466c44687f3872eadcb8c184528b2e2d4.1650526554.git.chengyou@linux.alibaba.com Reported-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-05-04 20:59:54 -03:00
Bob Pearson	e7734156b0	RDMA/rxe: Replace paylen by payload In finish_packet() in rxe_req.c a variable was incorrectly called paylen instead of payload. Elsewhere in the rxe source payload is always used for the RoCE payload length and paylen is always used for the UDP payload length. This will cause unnecessary confusion. Replace paylen by payload in finish_packet(). Link: https://lore.kernel.org/r/20220420172316.5465-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-05-04 11:21:10 -03:00
Li Zhijian	0f328c7034	RDMA/rxe: Remove useless parameters for update_state() wqe was not used by update_state() so far. Commit `aaaf62e066` ("RDMA/rxe: Remove useless argument for update_state()") just did a partial fixes. Link: https://lore.kernel.org/r/20220412022903.574238-1-lizhijian@fujitsu.com Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-04-25 15:51:32 -03:00
Bob Pearson	570a4bf744	RDMA/rxe: Recheck the MR in when generating a READ reply The rping benchmark fails on long runs. The root cause of this failure has been traced to a failure to compute a nonzero value of mr in rare situations. Fix this failure by correctly handling the computation of mr in read_reply() in rxe_resp.c in the replay flow. Fixes: `8a1a0be894` ("RDMA/rxe: Replace mr by rkey in responder resources") Link: https://lore.kernel.org/r/20220418174103.3040-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-04-20 11:21:24 -03:00
Bob Pearson	290c4a902b	RDMA/rxe: Fix "Replace mr by rkey in responder resources" The referenced commit generates a reference counting error if the rkey has the same index but the wrong key. In this case the reference taken by rxe_pool_get_index() is not dropped. Drop the reference if the keys don't match in rxe_recheck_mr(). Check that the mw and mr are still valid. Fixes: `8a1a0be894` ("RDMA/rxe: Replace mr by rkey in responder resources") Link: https://lore.kernel.org/r/20220411030647.20011-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-04-12 11:17:52 -03:00
Xiao Yang	2f917af777	RDMA/rxe: Generate a completion for unsupported/invalid opcode Current rxe_requester() doesn't generate a completion when processing an unsupported/invalid opcode. If rxe driver doesn't support a new opcode (e.g. RDMA Atomic Write) and RDMA library supports it, an application using the new opcode can reproduce this issue. Fix the issue by calling "goto err;". Fixes: `8700e3e7c4` ("Soft RoCE driver") Link: https://lore.kernel.org/r/20220410113513.27537-1-yangx.jy@fujitsu.com Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-04-12 11:16:24 -03:00
Bob Pearson	98c8026331	RDMA/rxe: Remove reliable datagram support The rdma_rxe driver does not actually support the reliable datagram transport but contains two references to RD opcodes in driver code. This commit removes these references to RD transport opcodes which are never used. Link: https://lore.kernel.org/r/cce0f07d-25fc-5880-69e7-001d951750b7@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-04-08 14:38:50 -03:00
Bob Pearson	409baed5d7	RDMA/rxe: Remove support for SMI QPs from rdma_rxe Currently the rdma_rxe driver supports SMI type QPs in a few places which is incorrect. RoCE devices never should support SMI QPs. This commit removes SMI QP support from the driver. Link: https://lore.kernel.org/r/20220407185416.16372-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-04-08 14:38:33 -03:00
Bob Pearson	5c477ee768	RDMA/rxe: Remove mc_grp_pool from struct rxe_dev Remove struct rxe_dev mc_grp_pool field. This field is no longer used. Link: https://lore.kernel.org/r/20220407184849.14359-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-04-08 14:38:18 -03:00
Bob Pearson	9227b6cec5	RDMA/rxe: Remove type 2A memory window capability Currently the rdma_rxe driver claims to support both 2A and 2B type memory windows. But the IBA requires 010-37.2.31: If an HCA supports the Base Memory Management extensions, the HCA shall support either Type 2A or Type 2B MWs, but not both. This commit removes the device capability bit for type 2A memory windows and adds a clarifying comment to rxe_mw.c. Link: https://lore.kernel.org/r/20220407184321.14207-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-04-08 14:37:17 -03:00
Jason Gunthorpe	e945c653c8	RDMA: Split kernel-only global device caps from uverbs device caps Split out flags from ib_device::device_cap_flags that are only used internally to the kernel into kernel_cap_flags that is not part of the uapi. This limits the device_cap_flags to being the same bitmap that will be copied to userspace. This cleanly splits out the uverbs flags from the kernel flags to avoid confusion in the flags bitmap. Add some short comments describing which each of the kernel flags is connected to. Remove unused kernel flags. Link: https://lore.kernel.org/r/0-v2-22c19e565eef+139a-kern_caps_jgg@nvidia.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-04-06 15:02:13 -03:00
Niels Dossche	22cbc6c268	IB/rdmavt: add missing locks in rvt_ruc_loopback The documentation of the function rvt_error_qp says both r_lock and s_lock need to be held when calling that function. It also asserts using lockdep that both of those locks are held. rvt_error_qp is called form rvt_send_cq, which is called from rvt_qp_complete_swqe, which is called from rvt_send_complete, which is called from rvt_ruc_loopback in two places. Both of these places do not hold r_lock. Fix this by acquiring a spin_lock of r_lock in both of these places. The r_lock acquiring cannot be added in rvt_qp_complete_swqe because some of its other callers already have r_lock acquired. Link: https://lore.kernel.org/r/20220228195144.71946-1-dossche.niels@gmail.com Signed-off-by: Niels Dossche <dossche.niels@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-04-04 14:45:51 -03:00
Niels Dossche	4d809f6969	IB/rdmavt: add lock to call to rvt_error_qp to prevent a race condition The documentation of the function rvt_error_qp says both r_lock and s_lock need to be held when calling that function. It also asserts using lockdep that both of those locks are held. However, the commit I referenced in Fixes accidentally makes the call to rvt_error_qp in rvt_ruc_loopback no longer covered by r_lock. This results in the lockdep assertion failing and also possibly in a race condition. Fixes: `d757c60eca` ("IB/rdmavt: Fix concurrency panics in QP post_send and modify to error") Link: https://lore.kernel.org/r/20220228165330.41546-1-dossche.niels@gmail.com Signed-off-by: Niels Dossche <dossche.niels@gmail.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-04-04 14:45:02 -03:00
Linus Torvalds	2dacc1e57b	v5.18 merge window pull request Patchces for the merge window: - Minor bug fixes in mlx5, mthca, pvrdma, rtrs, mlx4, hfi1, hns - Minor cleanups: coding style, useless includes and documentation - Reorganize how multicast processing works in rxe - Replace a red/black tree with xarray in rxe which improves performance - DSCP support and HW address handle re-use in irdma - Simplify the mailbox command handling in hns - Simplify iser now that FMR is eliminated -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAmI7d8gACgkQOG33FX4g mxq9TRAAm/3rEBtDyVan4Dm3mzq8xeJOTdYtQ29QNWZcwvhStNTo4c29jQJITZ9X Xd+m8VEPWzjkkZ4+hDFM9HhWjOzqDcCCFcbF88WDO0+P1M3HYYUE6AbIcSilAaT2 GIby3+kAnsTOAryrSzHLehYJKUYJuw5NRGsVAPY3r6hz3UECjNb/X1KTWhXaIfNy 2OTqFx5wMQ6hZO8e4a2Wz4bBYAYm95UfK+TgfRqUwhDLCDnkDELvHMPh9pXxJecG xzVg7W7Nv+6GWpUrqtM6W6YpriyjUOfbrtCJmBV3T/EdpiD6lZhKpkX23aLgu2Bi XoMM67aZ0i1Ft2trCIF8GbLaejG6epBkeKkUMMSozKqFP5DjhNbD2f3WAlMI15iW CcdyPS5re1ymPp66UlycTDSkN19wD8LfgQPLJCNM3EV8g8qs9LaKzEPWunBoZmw1 a+QX2zK8J07S21F8iq2sf/Qe1EDdmgOEOf7pmB/A5/PKqgXZPG7lYNYuhIKyFsdL mO+BqWWp0a953JJjsiutxyUCyUd8jtwwxRa0B66oqSQ1jO+xj6XDxjEL8ACuT8Iq 9wFJluTCN6lfxyV8mrSfweT0fQh/W/zVF7EJZHWdKXEzuADxMp9X06wtMQ82ea+k /wgN7DUpBZczRtlqaxt1VSkER75QwAMy/oSpA974+O120QCwLMQ= =aux1 -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull rdma updates from Jason Gunthorpe: - Minor bug fixes in mlx5, mthca, pvrdma, rtrs, mlx4, hfi1, hns - Minor cleanups: coding style, useless includes and documentation - Reorganize how multicast processing works in rxe - Replace a red/black tree with xarray in rxe which improves performance - DSCP support and HW address handle re-use in irdma - Simplify the mailbox command handling in hns - Simplify iser now that FMR is eliminated * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (93 commits) RDMA/nldev: Prevent underflow in nldev_stat_set_counter_dynamic_doit() IB/iser: Fix error flow in case of registration failure IB/iser: Generalize map/unmap dma tasks IB/iser: Use iser_fr_desc as registration context IB/iser: Remove iser_reg_data_sg helper function RDMA/rxe: Use standard names for ref counting RDMA/rxe: Replace red-black trees by xarrays RDMA/rxe: Shorten pool names in rxe_pool.c RDMA/rxe: Move max_elem into rxe_type_info RDMA/rxe: Replace obj by elem in declaration RDMA/rxe: Delete _locked() APIs for pool objects RDMA/rxe: Reverse the sense of RXE_POOL_NO_ALLOC RDMA/rxe: Replace mr by rkey in responder resources RDMA/rxe: Fix ref error in rxe_av.c RDMA/hns: Use the reserved loopback QPs to free MR before destroying MPT RDMA/irdma: Add support for address handle re-use RDMA/qib: Fix typos in comments RDMA/mlx5: Fix memory leak in error flow for subscribe event routine Revert "RDMA/core: Fix ib_qp_usecnt_dec() called when error" RDMA/rxe: Remove useless argument for update_state() ...	2022-03-24 19:17:39 -07:00
Bob Pearson	3197706abd	RDMA/rxe: Use standard names for ref counting Rename rxe_add_ref() to rxe_get() and rxe_drop_ref() to rxe_put(). Significantly improves readability for new readers. Link: https://lore.kernel.org/r/20220304000808.225811-10-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-03-16 10:34:42 -03:00
Bob Pearson	3225717f6d	RDMA/rxe: Replace red-black trees by xarrays Currently the rxe driver uses red-black trees to add indices to the rxe object pools. Linux xarrays provide a better way to implement the same functionality for indices. This patch replaces red-black trees by xarrays for pool objects. Since xarrays already have a spinlock use that in place of the pool rwlock. Make sure that all changes in the xarray(index) and kref(ref counnt) occur atomically. Link: https://lore.kernel.org/r/20220304000808.225811-9-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-03-16 10:34:42 -03:00
Bob Pearson	df34dc9e03	RDMA/rxe: Shorten pool names in rxe_pool.c Replace pool names like "rxe-xx" with "xx". Just reduces clutter. Link: https://lore.kernel.org/r/20220304000808.225811-8-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-03-15 20:49:57 -03:00
Bob Pearson	3ccffe8abf	RDMA/rxe: Move max_elem into rxe_type_info Move the maximum number of elements from a parameter in rxe_pool_init to a member of the rxe_type_info array. Link: https://lore.kernel.org/r/20220304000808.225811-7-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-03-15 20:49:57 -03:00
Bob Pearson	b4a47f6836	RDMA/rxe: Replace obj by elem in declaration Fix a harmless typo replacing obj by elem in the cleanup fields. This has no effect but is confusing. Link: https://lore.kernel.org/r/20220304000808.225811-6-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-03-15 20:49:57 -03:00
Bob Pearson	3c3e4d582b	RDMA/rxe: Delete _locked() APIs for pool objects Since caller managed locks for indexed objects are no longer used these APIs are deleted. Link: https://lore.kernel.org/r/20220304000808.225811-5-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-03-15 20:49:57 -03:00
Bob Pearson	c9f4c69583	RDMA/rxe: Reverse the sense of RXE_POOL_NO_ALLOC There is only one remaining object type that allocates its own memory, that is mr. So the sense of RXE_POOL_NO_ALLOC is changed to RXE_POOL_ALLOC. Add checks to rxe_alloc() and rxe_add_to_pool() to make sure the correct call is used for the setting of this flag. Link: https://lore.kernel.org/r/20220304000808.225811-4-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-03-15 20:49:56 -03:00
Bob Pearson	8a1a0be894	RDMA/rxe: Replace mr by rkey in responder resources Currently rxe saves a copy of MR in responder resources for RDMA reads. Since the responder resources are never freed just over written if more are needed this MR may not have a reference freed until the QP is destroyed. This patch uses the rkey instead of the MR and on subsequent packets of a multipacket read reply message it looks up the MR from the rkey for each packet. This makes it possible for a user to deregister an MR or unbind a MW on the fly and get correct behaviour. Link: https://lore.kernel.org/r/20220304000808.225811-3-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-03-15 20:49:56 -03:00
Bob Pearson	63221acb0c	RDMA/rxe: Fix ref error in rxe_av.c The commit referenced below can take a reference to the AH which is never dropped. This only happens in the UD request path. This patch optionally passes that AH back to the caller so that it can hold the reference while the AV is being accessed and then drop it. Code to do this is added to rxe_req.c. The AV is also passed to rxe_prepare in rxe_net.c as an optimization. Fixes: `e2fe06c908` ("RDMA/rxe: Lookup kernel AH from ah index in UD WQEs") Link: https://lore.kernel.org/r/20220304000808.225811-2-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-03-15 20:49:56 -03:00
Chengguang Xu	aaaf62e066	RDMA/rxe: Remove useless argument for update_state() The argument 'payload' is not used in update_state(), so just remove it. Link: https://lore.kernel.org/r/20220307145047.3235675-2-cgxu519@mykernel.net Signed-off-by: Chengguang Xu <cgxu519@mykernel.net> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-03-14 20:36:01 -03:00
Chengguang Xu	7e8e611d6a	RDMA/rxe: Change variable and function argument to proper type The type of wqe length is u32 so in order to avoid overflow and shadow casting change variable and relevant function argument to proper type. Link: https://lore.kernel.org/r/20220307145047.3235675-1-cgxu519@mykernel.net Signed-off-by: Chengguang Xu <cgxu519@mykernel.net> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-03-14 20:36:01 -03:00
Bob Pearson	6090a0c4c7	RDMA/rxe: Cleanup rxe_mcast.c Finish adding subroutine comment headers to subroutines in rxe_mcast.c. Make minor api change cleanups. Link: https://lore.kernel.org/r/20220223230706.50332-5-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-02-23 20:29:15 -04:00

... 2 3 4 5 6 ...

1196 Commits