linux-loongson

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson synced 2025-09-01 23:33:06 +00:00

Author	SHA1	Message	Date
Michael Guralnik	cef7dde883	net/mlx5: Expand mkey page size to support 6 bits Protect the usage of the 6th bit with the relevant capability to ensure we are using the new page sizes with FW that supports the bit extension. Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://patch.msgid.link/20240909100504.29797-2-michaelgur@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-09-11 14:56:07 +03:00
Junxian Huang	f4ccc0a2a0	RDMA/hns: Fix restricted __le16 degrades to integer issue Fix sparse warnings: restricted __le16 degrades to integer. Fixes: `5a87279591` ("RDMA/hns: Support hns HW stats") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202409080508.g4mNSLwy-lkp@intel.com/ Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20240909065331.3950268-1-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-09-10 16:42:53 +03:00
Zhang Zekun	9cd30319bb	IB/qib: Remove unused declarations in header file The definition of qib_rc_rnr_retry() has been removed since commit `b4238e7057` ("IB/qib: Use new rdmavt timers"). Also, the definition of mr_rcu_callback() has been remove since commit `7c2e11fe2d` ("IB/qib: Remove qp and mr functionality from qib"). So, let's remove the unused declartions. Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com> Link: https://patch.msgid.link/20240909121408.80079-3-zhangzekun11@huawei.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-09-10 16:41:03 +03:00
Zhang Zekun	e4ed570122	IB/iser: Remove unused declaration in header file The definition of iser_finalize_rdma_unaligned_sg() has been removed since commit `dd0107a089` ("IB/iser: set block queue_virt_boundary"). Let's remove the unused declaration in header file. Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com> Link: https://patch.msgid.link/20240909121408.80079-2-zhangzekun11@huawei.com Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Acked-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-09-10 16:41:03 +03:00
Junxian Huang	fe51f6254d	RDMA/hns: Optimize hem allocation performance When allocating MTT hem, for each hop level of each hem that is being allocated, the driver iterates the hem list to find out whether the bt page has been allocated in this hop level. If not, allocate a new one and splice it to the list. The time complexity is O(n^2) in worst cases. Currently the allocation for-loop uses 'unit' as the step size. This actually has taken into account the reuse of last-hop-level MTT bt pages by multiple buffer pages. Thus pages of last hop level will never have been allocated, so there is no need to iterate the hem list in last hop level. Removing this unnecessary iteration can reduce the time complexity to O(n). Fixes: `38389eaa4d` ("RDMA/hns: Add mtr support for mixed multihop addressing") Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20240906093444.3571619-9-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-09-10 16:06:39 +03:00
Chengchang Tang	ce196f6297	RDMA/hns: Fix 1bit-ECC recovery address in non-4K OS The 1bit-ECC recovery address read from HW only contain bits 64:12, so it should be fixed left-shifted 12 bits when used. Currently, the driver will shift the address left by PAGE_SHIFT when used, which is wrong in non-4K OS. Fixes: `2de949abd6` ("RDMA/hns: Recover 1bit-ECC error of RAM on chip") Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20240906093444.3571619-8-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-09-10 16:06:39 +03:00
Junxian Huang	4321feefa5	RDMA/hns: Fix VF triggering PF reset in abnormal interrupt handler In abnormal interrupt handler, a PF reset will be triggered even if the device is a VF. It should be a VF reset. Fixes: `2b9acb9a97` ("RDMA/hns: Add the process of AEQ overflow for hip08") Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20240906093444.3571619-7-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-09-10 16:06:39 +03:00
Chengchang Tang	74d315b5af	RDMA/hns: Fix spin_unlock_irqrestore() called with IRQs enabled Fix missuse of spin_lock_irq()/spin_unlock_irq() when spin_lock_irqsave()/spin_lock_irqrestore() was hold. This was discovered through the lock debugging, and the corresponding log is as follows: raw_local_irq_restore() called with IRQs enabled WARNING: CPU: 96 PID: 2074 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x30/0x40 ... Call trace: warn_bogus_irq_restore+0x30/0x40 _raw_spin_unlock_irqrestore+0x84/0xc8 add_qp_to_list+0x11c/0x148 [hns_roce_hw_v2] hns_roce_create_qp_common.constprop.0+0x240/0x780 [hns_roce_hw_v2] hns_roce_create_qp+0x98/0x160 [hns_roce_hw_v2] create_qp+0x138/0x258 ib_create_qp_kernel+0x50/0xe8 create_mad_qp+0xa8/0x128 ib_mad_port_open+0x218/0x448 ib_mad_init_device+0x70/0x1f8 add_client_context+0xfc/0x220 enable_device_and_get+0xd0/0x140 ib_register_device.part.0+0xf4/0x1c8 ib_register_device+0x34/0x50 hns_roce_register_device+0x174/0x3d0 [hns_roce_hw_v2] hns_roce_init+0xfc/0x2c0 [hns_roce_hw_v2] __hns_roce_hw_v2_init_instance+0x7c/0x1d0 [hns_roce_hw_v2] hns_roce_hw_v2_init_instance+0x9c/0x180 [hns_roce_hw_v2] Fixes: `9a4435375c` ("IB/hns: Add driver files for hns RoCE driver") Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20240906093444.3571619-6-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-09-10 16:06:39 +03:00
wenglianfa	d586628b16	RDMA/hns: Fix the overflow risk of hem_list_calc_ba_range() The max value of 'unit' and 'hop_num' is 2^24 and 2, so the value of 'step' may exceed the range of u32. Change the type of 'step' to u64. Fixes: `38389eaa4d` ("RDMA/hns: Add mtr support for mixed multihop addressing") Signed-off-by: wenglianfa <wenglianfa@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20240906093444.3571619-5-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-09-10 16:06:39 +03:00
wenglianfa	fd8489294d	RDMA/hns: Fix Use-After-Free of rsv_qp on HIP08 Currently rsv_qp is freed before ib_unregister_device() is called on HIP08. During the time interval, users can still dereg MR and rsv_qp will be used in this process, leading to a UAF. Move the release of rsv_qp after calling ib_unregister_device() to fix it. Fixes: `70f9252158` ("RDMA/hns: Use the reserved loopback QPs to free MR before destroying MPT") Signed-off-by: wenglianfa <wenglianfa@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20240906093444.3571619-3-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-09-10 16:06:39 +03:00
Junxian Huang	6928d264e3	RDMA/hns: Don't modify rq next block addr in HIP09 QPC The field 'rq next block addr' in QPC can be updated by driver only on HIP08. On HIP09 HW updates this field while driver is not allowed. Fixes: `926a01dc00` ("RDMA/hns: Add QP operations support for hip08 SoC") Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20240906093444.3571619-2-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-09-10 16:06:39 +03:00
WangYuli	dab2214fec	treewide: correct the typo 'retun' There are some spelling mistakes of 'retun' in comments which should be instead of 'return'. Link: https://lkml.kernel.org/r/63D0F870EE8E87A0+20240906054008.390188-1-wangyuli@uniontech.com Signed-off-by: WangYuli <wangyuli@uniontech.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-09-09 16:47:43 -07:00
Selvin Xavier	227f51743b	RDMA/bnxt_re: Fix the max WQE size for static WQE support When variable size WQE is supported, max_qp_sges reported is more than 6. For devices that supports variable size WQE, the Send WQE size calculation is wrong when an an older library that doesn't support variable size WQE is used. Set the WQE size to 128 when static WQE is supported. Fixes: `de1d364c38` ("RDMA/bnxt_re: Add support for Variable WQE in Genp7 adapters") Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1725444253-13221-3-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-09-09 21:17:09 +03:00
Selvin Xavier	c6b2b5c86d	RDMA/bnxt_re: Fix the compatibility flag for variable size WQE For older adapters that doesn't support variable size WQE, driver is wrongly reporting that variable WQE is supported, when the latest library is used. Report the variable WQE capability only if the driver supports it. Fixes: `10a104c0de` ("RDMA/bnxt_re: Enable variable size WQEs for user space applications") Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1725444253-13221-2-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-09-09 21:17:09 +03:00
Michael Guralnik	7ebb00cea4	RDMA/mlx5: Fix MR cache temp entries cleanup Fix the cleanup of the temp cache entries that are dynamically created in the MR cache. The cleanup of the temp cache entries is currently scheduled only when a new entry is created. Since in the cleanup of the entries only the mkeys are destroyed and the cache entry stays in the cache, subsequent registrations might reuse the entry and it will eventually be filled with new mkeys without cleanup ever getting scheduled again. On workloads that register and deregister MRs with a wide range of properties we see the cache ends up holding many cache entries, each holding the max number of mkeys that were ever used through it. Additionally, as the cleanup work is scheduled to run over the whole cache, any mkey that is returned to the cache after the cleanup was scheduled will be held for less than the intended 30 seconds timeout. Solve both issues by dropping the existing remove_ent_work and reusing the existing per-entry work to also handle the temp entries cleanup. Schedule the work to run with a 30 seconds delay every time we push an mkey to a clean temp entry. This ensures the cleanup runs on each entry only 30 seconds after the first mkey was pushed to an empty entry. As we have already been distinguishing between persistent and temp entries when scheduling the cache_work_func, it is not being scheduled in any other flows for the temp entries. Another benefit from moving to a per-entry cleanup is we now not required to hold the rb_tree mutex, thus enabling other flow to run concurrently. Fixes: `dd1b913fb0` ("RDMA/mlx5: Cache all user cacheable mkeys on dereg MR flow") Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://patch.msgid.link/e4fa4bb03bebf20dceae320f26816cd2dde23a26.1725362530.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-09-09 21:17:09 +03:00
Michael Guralnik	ee6d57a2e1	RDMA/mlx5: Limit usage of over-sized mkeys from the MR cache When searching the MR cache for suitable cache entries, don't use mkeys larger than twice the size required for the MR. This should ensure the usage of mkeys closer to the minimal required size and reduce memory waste. On driver init we create entries for mkeys with clear attributes and powers of 2 sizes from 4 to the max supported size. This solves the issue for anyone using mkeys that fit these requirements. In the use case where an MR is registered with different attributes, like an access flag we can't UMR, we'll create a new cache entry to store it upon dereg. Without this fix, any later registration with same attributes and smaller size will use the newly created cache entry and it's mkeys, disregarding the memory waste of using mkeys larger than required. For example, one worst-case scenario can be when registering and deregistering a 1GB mkey with ATS enabled which will cause the creation of a new cache entry to hold those type of mkeys. A user registering a 4k MR with ATS will end up using the new cache entry and an mkey that can support a 1GB MR, thus wasting x250k memory than actually needed in the HW. Additionally, allow all small registration to use the smallest size cache entry that is initialized on driver load even if size is larger than twice the required size. Fixes: `73d09b2fe8` ("RDMA/mlx5: Introduce mlx5r_cache_rb_key") Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://patch.msgid.link/8ba3a6e3748aace2026de8b83da03aba084f78f4.1725362530.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-09-09 21:17:09 +03:00
Michael Guralnik	6f5cd6ac9a	RDMA/mlx5: Fix counter update on MR cache mkey creation After an mkey is created, update the counter for pending mkeys before reshceduling the work that is filling the cache. Rescheduling the work with a full MR cache entry and a wrong 'pending' counter will cause us to miss disabling the fill_to_high_water flag. Thus leaving the cache full but with an indication that it's still needs to be filled up to it's full size (2 * limit). Next time an mkey will be taken from the cache, we'll unnecessarily continue the process of filling the cache to it's full size. Fixes: `57e7071683` ("RDMA/mlx5: Implement mkeys management via LIFO queue") Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://patch.msgid.link/0f44f462ba22e45f72cb3d0ec6a748634086b8d0.1725362530.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-09-09 21:17:09 +03:00
Michael Guralnik	30e6bd8d3b	RDMA/mlx5: Drop redundant work canceling from clean_keys() The canceling of dealyed work in clean_keys() is a leftover from years back and was added to prevent races in the cleanup process of MR cache. The cleanup process was rewritten a few years ago and the canceling of delayed work and flushing of workqueue was added before the call to clean_keys(). Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://patch.msgid.link/943d21f5a9dba7b98a3e1d531e3561ffe9745d71.1725362530.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-09-09 21:17:09 +03:00
Cheng Xu	e77127ff64	RDMA/erdma: Return QP state in erdma_query_qp Fix qp_state and cur_qp_state to return correct values in struct ib_qp_attr. Fixes: `1550557717` ("RDMA/erdma: Add verbs implementation") Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Link: https://patch.msgid.link/20240902112920.58749-4-chengyou@linux.alibaba.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-09-09 21:17:09 +03:00
Cheng Xu	b80330f105	RDMA/erdma: Add disassociate ucontext support All IO pages mapped to user space are handled by rdma_user_mmap_io, so add empty stub for disassociate ucontext. Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Link: https://patch.msgid.link/20240902112920.58749-3-chengyou@linux.alibaba.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-09-09 21:17:09 +03:00
Cheng Xu	b24506f1c3	RDMA/erdma: Refactor the initialization and destruction of EQ We extracted the common parts of the initialization/destruction process to make the code cleaner. Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Link: https://patch.msgid.link/20240902112920.58749-2-chengyou@linux.alibaba.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-09-09 21:17:08 +03:00
Maher Sanalla	34efda1735	RDMA/mlx5: Enable ATS when allocating kernel MRs When creating kernel MRs, it is not definitive whether they will be used for peer-to-peer transactions or for other usecases, since address mapping is performed only after the MR is created. Since peer-to-peer transactions benefit significantly from ATS performance-wise, enable ATS on newly-allocated kernel MRs when supported. Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Gal Shalom <galshalom@nvidia.com> Link: https://patch.msgid.link/fafd4c9f14cf438d2882d88649c2947e1d05d0b4.1725273403.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-09-09 21:16:40 +03:00
Patrisious Haddad	1403c8b147	IB/core: Fix ib_cache_setup_one error flow cleanup When ib_cache_update return an error, we exit ib_cache_setup_one instantly with no proper cleanup, even though before this we had already successfully done gid_table_setup_one, that results in the kernel WARN below. Do proper cleanup using gid_table_cleanup_one before returning the err in order to fix the issue. WARNING: CPU: 4 PID: 922 at drivers/infiniband/core/cache.c:806 gid_table_release_one+0x181/0x1a0 Modules linked in: CPU: 4 UID: 0 PID: 922 Comm: c_repro Not tainted 6.11.0-rc1+ #3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:gid_table_release_one+0x181/0x1a0 Code: 44 8b 38 75 0c e8 2f cb 34 ff 4d 8b b5 28 05 00 00 e8 23 cb 34 ff 44 89 f9 89 da 4c 89 f6 48 c7 c7 d0 58 14 83 e8 4f de 21 ff <0f> 0b 4c 8b 75 30 e9 54 ff ff ff 48 8 3 c4 10 5b 5d 41 5c 41 5d 41 RSP: 0018:ffffc90002b835b0 EFLAGS: 00010286 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff811c8527 RDX: 0000000000000000 RSI: ffffffff811c8534 RDI: 0000000000000001 RBP: ffff8881011b3d00 R08: ffff88810b3abe00 R09: 205d303839303631 R10: 666572207972746e R11: 72746e6520444947 R12: 0000000000000001 R13: ffff888106390000 R14: ffff8881011f2110 R15: 0000000000000001 FS: 00007fecc3b70800(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020000340 CR3: 000000010435a001 CR4: 00000000003706b0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ? show_regs+0x94/0xa0 ? __warn+0x9e/0x1c0 ? gid_table_release_one+0x181/0x1a0 ? report_bug+0x1f9/0x340 ? gid_table_release_one+0x181/0x1a0 ? handle_bug+0xa2/0x110 ? exc_invalid_op+0x31/0xa0 ? asm_exc_invalid_op+0x16/0x20 ? __warn_printk+0xc7/0x180 ? __warn_printk+0xd4/0x180 ? gid_table_release_one+0x181/0x1a0 ib_device_release+0x71/0xe0 ? __pfx_ib_device_release+0x10/0x10 device_release+0x44/0xd0 kobject_put+0x135/0x3d0 put_device+0x20/0x30 rxe_net_add+0x7d/0xa0 rxe_newlink+0xd7/0x190 nldev_newlink+0x1b0/0x2a0 ? __pfx_nldev_newlink+0x10/0x10 rdma_nl_rcv_msg+0x1ad/0x2e0 rdma_nl_rcv_skb.constprop.0+0x176/0x210 netlink_unicast+0x2de/0x400 netlink_sendmsg+0x306/0x660 __sock_sendmsg+0x110/0x120 ____sys_sendmsg+0x30e/0x390 ___sys_sendmsg+0x9b/0xf0 ? kstrtouint+0x6e/0xa0 ? kstrtouint_from_user+0x7c/0xb0 ? get_pid_task+0xb0/0xd0 ? proc_fail_nth_write+0x5b/0x140 ? __fget_light+0x9a/0x200 ? preempt_count_add+0x47/0xa0 __sys_sendmsg+0x61/0xd0 do_syscall_64+0x50/0x110 entry_SYSCALL_64_after_hwframe+0x76/0x7e Fixes: `1901b91f99` ("IB/core: Fix potential NULL pointer dereference in pkey cache") Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Maher Sanalla <msanalla@nvidia.com> Link: https://patch.msgid.link/79137687d829899b0b1c9835fcb4b258004c439a.1725273354.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-09-04 10:33:09 +03:00
Chris Mi	112e6e83a8	IB/mlx5: Fix UMR pd cleanup on error flow of driver init The cited commit moves the pd allocation from function mlx5r_umr_resource_cleanup() to a new function mlx5r_umr_cleanup(). So the fix in commit [1] is broken. In error flow, will hit panic [2]. Fix it by checking pd pointer to avoid panic if it is NULL; [1] RDMA/mlx5: Fix UMR cleanup on error flow of driver init [2] [ 347.567063] infiniband mlx5_0: Couldn't register device with driver model [ 347.591382] BUG: kernel NULL pointer dereference, address: 0000000000000020 [ 347.593438] #PF: supervisor read access in kernel mode [ 347.595176] #PF: error_code(0x0000) - not-present page [ 347.596962] PGD 0 P4D 0 [ 347.601361] RIP: 0010:ib_dealloc_pd_user+0x12/0xc0 [ib_core] [ 347.604171] RSP: 0018:ffff888106293b10 EFLAGS: 00010282 [ 347.604834] RAX: 0000000000000000 RBX: 000000000000000e RCX: 0000000000000000 [ 347.605672] RDX: ffff888106293ad0 RSI: 0000000000000000 RDI: 0000000000000000 [ 347.606529] RBP: 0000000000000000 R08: ffff888106293ae0 R09: ffff888106293ae0 [ 347.607379] R10: 0000000000000a06 R11: 0000000000000000 R12: 0000000000000000 [ 347.608224] R13: ffffffffa0704dc0 R14: 0000000000000001 R15: 0000000000000001 [ 347.609067] FS: 00007fdc720cd9c0(0000) GS:ffff88852c880000(0000) knlGS:0000000000000000 [ 347.610094] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 347.610727] CR2: 0000000000000020 CR3: 0000000103012003 CR4: 0000000000370eb0 [ 347.611421] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 347.612113] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 347.612804] Call Trace: [ 347.613130] <TASK> [ 347.613417] ? __die+0x20/0x60 [ 347.613793] ? page_fault_oops+0x150/0x3e0 [ 347.614243] ? free_msg+0x68/0x80 [mlx5_core] [ 347.614840] ? cmd_exec+0x48f/0x11d0 [mlx5_core] [ 347.615359] ? exc_page_fault+0x74/0x130 [ 347.615808] ? asm_exc_page_fault+0x22/0x30 [ 347.616273] ? ib_dealloc_pd_user+0x12/0xc0 [ib_core] [ 347.616801] mlx5r_umr_cleanup+0x23/0x90 [mlx5_ib] [ 347.617365] mlx5_ib_stage_pre_ib_reg_umr_cleanup+0x36/0x40 [mlx5_ib] [ 347.618025] __mlx5_ib_add+0x96/0xd0 [mlx5_ib] [ 347.618539] mlx5r_probe+0xe9/0x310 [mlx5_ib] [ 347.619032] ? kernfs_add_one+0x107/0x150 [ 347.619478] ? __mlx5_ib_add+0xd0/0xd0 [mlx5_ib] [ 347.619984] auxiliary_bus_probe+0x3e/0x90 [ 347.620448] really_probe+0xc5/0x3a0 [ 347.620857] __driver_probe_device+0x80/0x160 [ 347.621325] driver_probe_device+0x1e/0x90 [ 347.621770] __driver_attach+0xec/0x1c0 [ 347.622213] ? __device_attach_driver+0x100/0x100 [ 347.622724] bus_for_each_dev+0x71/0xc0 [ 347.623151] bus_add_driver+0xed/0x240 [ 347.623570] driver_register+0x58/0x100 [ 347.623998] __auxiliary_driver_register+0x6a/0xc0 [ 347.624499] ? driver_register+0xae/0x100 [ 347.624940] ? 0xffffffffa0893000 [ 347.625329] mlx5_ib_init+0x16a/0x1e0 [mlx5_ib] [ 347.625845] do_one_initcall+0x4a/0x2a0 [ 347.626273] ? gcov_event+0x2e2/0x3a0 [ 347.626706] do_init_module+0x8a/0x260 [ 347.627126] init_module_from_file+0x8b/0xd0 [ 347.627596] __x64_sys_finit_module+0x1ca/0x2f0 [ 347.628089] do_syscall_64+0x4c/0x100 Fixes: `638420115c` ("IB/mlx5: Create UMR QP just before first reg_mr occurs") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Jianbo Liu <jianbol@nvidia.com> Link: https://patch.msgid.link/778c40c60287992da5d6ec92bb07b67f7bb5e6ef.1725273295.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-09-04 10:28:49 +03:00
Kalesh AP	dc116b7fdd	RDMA/bnxt_re: Add support for MR Relaxed Ordering Some of the adapters support Relaxed Ordering for the MRs. Driver queries support for Memory region relax ordering support from firmware and set relax ordering bit in REGISTER_MR request, if the users request for the support. Also, this is supported only if the PCIe device has enabled relaxed ordering attribute. Reviewed-by: Chandramohan Akula <chandramohan.akula@broadcom.com> Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com> Reviewed-by: Vijay Kumar Mandadapu <vijaykumar.mandadapu@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1725256351-12751-5-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-09-02 12:33:47 +03:00
Kalesh AP	f786eebbbe	RDMA/bnxt_re: Avoid an extra hwrm per MR creation Firmware now have a new mr registration command where both MR allocation and registration can be done in a single hwrm command. Driver has to issue this new hwrm command whenever the support flag is set. This reduces the number of hwrm issued per MR creation and speed up the MR creation. Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1725256351-12751-4-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-09-02 12:33:47 +03:00
Kalesh AP	b98d969719	RDMA/bnxt_re: Rename a variable Renaming flags to access_flags for clarity. Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1725256351-12751-3-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-09-02 12:33:47 +03:00
Kalesh AP	543b455c6e	RDMA/bnxt_re: Update HW interface headers Updating the HW structures for the pcie relax ordering support. Newly added interface structures will be used in the followup patch. Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1725256351-12751-2-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-09-02 12:33:47 +03:00
Long Li	4a3b99bc04	RDMA/mana_ib: use the correct page size for mapping user-mode doorbell page When mapping doorbell page from user-mode, the driver should use the system page size as this memory is allocated via mmap() from user-mode. Cc: stable@vger.kernel.org Fixes: `0266a17763` ("RDMA/mana_ib: Add a driver for Microsoft Azure Network Adapter") Signed-off-by: Long Li <longli@microsoft.com> Link: https://patch.msgid.link/1725030993-16213-2-git-send-email-longli@linuxonhyperv.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-09-02 10:32:08 +03:00
Long Li	9e517a8e9d	RDMA/mana_ib: use the correct page table index based on hardware page size MANA hardware uses 4k page size. When calculating the page table index, it should use the hardware page size, not the system page size. Cc: stable@vger.kernel.org Fixes: `0266a17763` ("RDMA/mana_ib: Add a driver for Microsoft Azure Network Adapter") Signed-off-by: Long Li <longli@microsoft.com> Link: https://patch.msgid.link/1725030993-16213-1-git-send-email-longli@linuxonhyperv.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-09-02 10:32:08 +03:00
Chandramohan Akula	181028a0d8	RDMA/bnxt_re: Share a page to expose per SRQ info with userspace Gen P7 adapters needs to share a toggle bits information received in kernel driver with the user space. User space needs this info to arm the SRQ. User space application can get this page using the UAPI routines. Library will mmap this page and get the toggle bits to be used in the next ARM Doorbell. Uses a hash list to map the SRQ structure from the SRQ ID. SRQ structure is retrieved from the hash list while the library calls the UAPI routine to get the toggle page mapping. Currently the full page is mapped per SRQ. This can be optimized to enable multiple SRQs from the same application share the same page and different offsets in the page Signed-off-by: Chandramohan Akula <chandramohan.akula@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1724945645-14989-4-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-09-02 10:10:36 +03:00
Chandramohan Akula	b4207630e0	RDMA/bnxt_re: Refactor the BNXT_RE_METHOD_GET_TOGGLE_MEM method Refactor the code in this function to have common code. This is used in subsequent patches. Signed-off-by: Chandramohan Akula <chandramohan.akula@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1724945645-14989-3-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-09-02 10:10:36 +03:00
Hongguang Gao	640c2cf84e	RDMA/bnxt_re: Get the toggle bits from SRQ events SRQ arming requires the toggle bits received from hardware. Get the toggle bits from SRQ notification for the gen p7 adapters. This value will be zero for the older adapters. Signed-off-by: Hongguang Gao <hongguang.gao@broadcom.com> Signed-off-by: Chandramohan Akula <chandramohan.akula@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1724945645-14989-2-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-09-02 10:10:36 +03:00
Shen Lichuan	e012316d83	RDMA/rdmavt: Convert to use ERR_CAST() As opposed to open-code, using the ERR_CAST macro clearly indicates that this is a pointer to an error value and a type conversion was performed. Signed-off-by: Shen Lichuan <shenlichuan@vivo.com> Link: https://patch.msgid.link/20240828082720.33231-1-shenlichuan@vivo.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-09-02 10:09:42 +03:00
Yue Haibing	2d10b05bce	RDMA/cxgb4: Remove unused declarations Since commit `be4c9bad9d` ("MAINTAINERS: Add cxgb4 and iw_cxgb4 entries") c4iw_post_terminate() declaration is not used anymore. And other declarations were never implemented since introduction in commit `cfdda9d764` ("RDMA/cxgb4: Add driver for Chelsio T4 RNIC"). Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Link: https://patch.msgid.link/20240824091629.3659565-1-yuehaibing@huawei.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-08-28 15:24:35 +03:00
Jack Wang	e5bba9e027	RDMA/rtrs-clt: Remove an extra space No functional changes. Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Alexei Pastuchov <alexei.pastuchov@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Link: https://patch.msgid.link/20240821112217.41827-12-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-08-28 15:24:35 +03:00
Jack Wang	bab9f8db42	RDMA/rtrs-clt: Do local invalidate after write io completion Switch local invalidate after write io completion avoid the chain usage of WR, this fixed the local protection error on LOCAL INVALIDATE WR. Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Link: https://patch.msgid.link/20240821112217.41827-11-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-08-28 15:24:35 +03:00
Grzegorz Prajsner	667db86bcb	RDMA/rtrs: Register ib event handler Use ib_register_event_handler() to register event handlers for both client and server side. For now, all those handlers do, is to print type of incoming event. Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Link: https://patch.msgid.link/20240821112217.41827-10-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-08-28 15:24:35 +03:00
Md Haris Iqbal	d0e62bf7b5	RDMA/rtrs-srv: Avoid null pointer deref during path establishment For RTRS path establishment, RTRS client initiates and completes con_num of connections. After establishing all its connections, the information is exchanged between the client and server through the info_req message. During this exchange, it is essential that all connections have been established, and the state of the RTRS srv path is CONNECTED. So add these sanity checks, to make sure we detect and abort process in error scenarios to avoid null pointer deref. Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Link: https://patch.msgid.link/20240821112217.41827-9-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-08-28 15:24:35 +03:00
Jack Wang	ff73958905	RDMA/rtrs-clt: Print request type for errors Extend the output to print also the request type. Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Link: https://patch.msgid.link/20240821112217.41827-8-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-08-28 15:24:35 +03:00
Md Haris Iqbal	3e4289b29e	RDMA/rtrs-clt: Reset cid to con_num - 1 to stay in bounds In the function init_conns(), after the create_con() and create_cm() for loop if something fails. In the cleanup for loop after the destroy tag, we access out of bound memory because cid is set to clt_path->s.con_num. This commits resets the cid to clt_path->s.con_num - 1, to stay in bounds in the cleanup loop later. Fixes: `6a98d71dae` ("RDMA/rtrs: client: main functionality") Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Link: https://patch.msgid.link/20240821112217.41827-7-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-08-28 15:24:35 +03:00
Jack Wang	6793f9581f	RDMA/rtrs-clt: Reuse need_inval from mr mr has a member need_inval, which can be used to indicate if local invalidate is needed, switch to it and remove need_inv from rtrs_clt_io_req. Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Link: https://patch.msgid.link/20240821112217.41827-6-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-08-28 15:24:35 +03:00
Jack Wang	3258cbbd86	RDMA/rtrs: Reset hb_missed_cnt after receiving other traffic from peer Reset hb_missed_cnt after receiving traffic from other peer, so hb is more robust again high load on host or network. Fixes: `6a98d71dae` ("RDMA/rtrs: client: main functionality") Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Link: https://patch.msgid.link/20240821112217.41827-5-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-08-28 15:24:35 +03:00
Jack Wang	53c26f3ecd	RDMA/rtrs-clt: Rate limit errors in IO path On network errors, a large number of these logs are printed due to all the inflight IOs, rate limit them so they do not clutter kernel log. Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Link: https://patch.msgid.link/20240821112217.41827-4-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-08-28 15:24:35 +03:00
Jack Wang	8c8dd4e13b	RDMA/rtrs-clt: Fix need_inv setting in error case In some cases need_inv can be missed for write requests, additionally driver has to handle missing invalidates for write requests. While at it, remove the else case from write invalidate path as it is possible to reach there. Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Link: https://patch.msgid.link/20240821112217.41827-3-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-08-28 15:24:35 +03:00
Md Haris Iqbal	4842cfb07a	RDMA/rtrs: For HB error add additional clt/srv specific logging In case of HB error, we need to know the specific path on which it happened, for better debugging. Since the clt/srv path structures are not available in rtrs.c, it needs to be done in the individual HB error handler. This commit add those loging. A sample kernel log output after this commit: rtrs_core L357: <blya>: HB missed max reached. rtrs_server L717: <blya>: HB err handler for path=ip:x.x.x.x@ip:x.x.x.x . . rtrs_core L357: <blya>: HB missed max reached. rtrs_client L1519: <blya>: HB err handler for path=ip:x.x.x.x@ip:x.x.x.x Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Reviewed-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Link: https://patch.msgid.link/20240821112217.41827-2-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-08-28 15:24:34 +03:00
Jason Gunthorpe	34cd192881	Merge branch 'bnxt_re_variable_wqes' into rdma.git for-next Selvin Xavier says: ============= Enable the Variable size Work Queue entry support for Gen P7 adapters. This would help in the better utilization of the queue memory and pci bandwidth due to the smaller send queue Work entries. ============= Based on v6.11-rc5 for dependencies. * bnxt_re_variable_wqes: (829 commits) RDMA/bnxt_re: Enable variable size WQEs for user space applications RDMA/bnxt_re: Handle variable WQE support for user applications RDMA/bnxt_re: Fix the table size for PSN/MSN entries RDMA/bnxt_re: Get the WQE index from slot index while completing the WQEs RDMA/bnxt_re: Add support for Variable WQE in Genp7 adapters Linux 6.11-rc5 ... Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-08-28 09:11:29 -03:00
Selvin Xavier	10a104c0de	RDMA/bnxt_re: Enable variable size WQEs for user space applications Add backward compatibility code to enable variable size WQEs only if the user lib supports it. Link: https://patch.msgid.link/r/1724042847-1481-6-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Hongguang Gao <hongguang.gao@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-08-27 10:15:57 -03:00
Selvin Xavier	d8ea645d69	RDMA/bnxt_re: Handle variable WQE support for user applications User library calculates the number of slots required for user applications and it can pass that information to the driver. Driver can use this value and update the HW directly. This mechanism is currently used only for the newly introduced variable size WQEs. Extend the bnxt_re_qp_req structure to pass the Send Queue slot count. Reorganize the code to get the sq_slots before initializing the Send Queue attributes. Link: https://patch.msgid.link/r/1724042847-1481-5-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Hongguang Gao <hongguang.gao@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-08-27 10:15:57 -03:00
Selvin Xavier	b930d0bac9	RDMA/bnxt_re: Fix the table size for PSN/MSN entries HW MSN table size is always a power of 2. So the pages should be mapped accordingly. Use the power of two calculation while get the number of PSN/MSN entries. Fixes: `6f6bfbc595` ("RDMA/bnxt_re: Expose the MSN table capability for user library") Link: https://patch.msgid.link/r/1724042847-1481-4-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-08-27 10:15:57 -03:00
Selvin Xavier	51edebb734	RDMA/bnxt_re: Get the WQE index from slot index while completing the WQEs While reporting the completions, SQ Work Queue index is required to identify the WQE that generated the completions. In variable WQE mode, FW returns the slot index for Error completions. Driver need to walk through the shadow queue between the consumer index and producer index and matches the slot index returned by FW. If a match is found, the next index of the shadow queue is the WQE index to be considered for remaining poll_cq loop. Link: https://patch.msgid.link/r/1724042847-1481-3-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Hongguang Gao <hongguang.gao@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-08-27 10:15:57 -03:00
Selvin Xavier	de1d364c38	RDMA/bnxt_re: Add support for Variable WQE in Genp7 adapters Variable size WQE means that each send Work Queue Entry to HW can use different WQE sizes as opposed to the static WQE size on the current devices. Set variable WQE mode for Gen P7 devices. Depth of the Queue will be a multiple of slot which is 16 bytes. The number of slots should be a multiple of 256 as per the HW requirement. Initialize the Software shadow queue to hold requests equal to the number of slots. Also, do not expose the variable size WQE capability until the last patch in the series. Link: https://patch.msgid.link/r/1724042847-1481-2-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Hongguang Gao <hongguang.gao@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-08-27 10:15:57 -03:00
Yehuda Yitschak	04e36fd27a	RDMA/efa: Add support for node guid Propagate the unique, per device, ID in the device attributes to the standard node_guid value in IB device. Link: https://patch.msgid.link/r/20240822171143.2800-1-mrgolin@amazon.com Reviewed-by: Yonatan Nachum <ynachum@amazon.com> Signed-off-by: Yehuda Yitschak <yehuday@amazon.com> Signed-off-by: Michael Margolin <mrgolin@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-08-23 11:52:45 -03:00
Zhu Yanjun	86dfdd8288	RDMA/iwcm: Fix WARNING:at_kernel/workqueue.c:#check_flush_dependency In the commit `aee2424246` ("RDMA/iwcm: Fix a use-after-free related to destroying CM IDs"), the function flush_workqueue is invoked to flush the work queue iwcm_wq. But at that time, the work queue iwcm_wq was created via the function alloc_ordered_workqueue without the flag WQ_MEM_RECLAIM. Because the current process is trying to flush the whole iwcm_wq, if iwcm_wq doesn't have the flag WQ_MEM_RECLAIM, verify that the current process is not reclaiming memory or running on a workqueue which doesn't have the flag WQ_MEM_RECLAIM as that can break forward-progress guarantee leading to a deadlock. The call trace is as below: [ 125.350876][ T1430] Call Trace: [ 125.356281][ T1430] <TASK> [ 125.361285][ T1430] ? __warn (kernel/panic.c:693) [ 125.367640][ T1430] ? check_flush_dependency (kernel/workqueue.c:3706 (discriminator 9)) [ 125.375689][ T1430] ? report_bug (lib/bug.c:180 lib/bug.c:219) [ 125.382505][ T1430] ? handle_bug (arch/x86/kernel/traps.c:239) [ 125.388987][ T1430] ? exc_invalid_op (arch/x86/kernel/traps.c:260 (discriminator 1)) [ 125.395831][ T1430] ? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:621) [ 125.403125][ T1430] ? check_flush_dependency (kernel/workqueue.c:3706 (discriminator 9)) [ 125.410984][ T1430] ? check_flush_dependency (kernel/workqueue.c:3706 (discriminator 9)) [ 125.418764][ T1430] __flush_workqueue (kernel/workqueue.c:3970) [ 125.426021][ T1430] ? __pfx___might_resched (kernel/sched/core.c:10151) [ 125.433431][ T1430] ? destroy_cm_id (drivers/infiniband/core/iwcm.c:375) iw_cm [ 125.441209][ T1430] ? __pfx___flush_workqueue (kernel/workqueue.c:3910) [ 125.473900][ T1430] ? _raw_spin_lock_irqsave (arch/x86/include/asm/atomic.h:107 include/linux/atomic/atomic-arch-fallback.h:2170 include/linux/atomic/atomic-instrumented.h:1302 include/asm-generic/qspinlock.h:111 include/linux/spinlock.h:187 include/linux/spinlock_api_smp.h:111 kernel/locking/spinlock.c:162) [ 125.473909][ T1430] ? __pfx__raw_spin_lock_irqsave (kernel/locking/spinlock.c:161) [ 125.482537][ T1430] _destroy_id (drivers/infiniband/core/cma.c:2044) rdma_cm [ 125.495072][ T1430] nvme_rdma_free_queue (drivers/nvme/host/rdma.c:656 drivers/nvme/host/rdma.c:650) nvme_rdma [ 125.505827][ T1430] nvme_rdma_reset_ctrl_work (drivers/nvme/host/rdma.c:2180) nvme_rdma [ 125.505831][ T1430] process_one_work (kernel/workqueue.c:3231) [ 125.515122][ T1430] worker_thread (kernel/workqueue.c:3306 kernel/workqueue.c:3393) [ 125.515127][ T1430] ? __pfx_worker_thread (kernel/workqueue.c:3339) [ 125.531837][ T1430] kthread (kernel/kthread.c:389) [ 125.539864][ T1430] ? __pfx_kthread (kernel/kthread.c:342) [ 125.550628][ T1430] ret_from_fork (arch/x86/kernel/process.c:147) [ 125.558840][ T1430] ? __pfx_kthread (kernel/kthread.c:342) [ 125.558844][ T1430] ret_from_fork_asm (arch/x86/entry/entry_64.S:257) [ 125.566487][ T1430] </TASK> [ 125.566488][ T1430] ---[ end trace 0000000000000000 ]--- Fixes: `aee2424246` ("RDMA/iwcm: Fix a use-after-free related to destroying CM IDs") Link: https://patch.msgid.link/r/20240820113336.19860-1-yanjun.zhu@linux.dev Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202408151633.fc01893c-oliver.sang@intel.com Tested-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-08-23 11:44:43 -03:00
zhenwei pi	444948ee12	RDMA/rxe: Fix __bth_set_resv6a __bth_set_resv6a is used to clear BIT [24, 29] of rxe_bth::qpn, the wrong expression leads other BITs into 1. Link: https://patch.msgid.link/r/20240822065223.1117056-4-pizhenwei@bytedance.com Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-08-23 11:42:38 -03:00
zhenwei pi	938aa9a333	RDMA/rxe: Fix misspelling of 'rmda' Fix 'rmda' into 'RDMA'. Link: https://patch.msgid.link/r/20240822065223.1117056-3-pizhenwei@bytedance.com Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-08-23 11:42:38 -03:00
zhenwei pi	c87c5f47ff	RDMA/rxe: Use sizeof instead of hard code number Use 'sizeof(union rdma_network_hdr)' instead of hard code GRH length for GSI and UD. Link: https://patch.msgid.link/r/20240822065223.1117056-2-pizhenwei@bytedance.com Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-08-23 11:42:38 -03:00
Jinjie Ruan	ae46d3fc17	RDMA/mlx4: Simplify an alloc_ordered_workqueue() invocation Let alloc_ordered_workqueue() format the workqueue name instead of calling snprintf() explicitly. Link: https://patch.msgid.link/r/20240823101840.515398-5-ruanjinjie@huawei.com Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-08-23 11:37:49 -03:00
Jinjie Ruan	87a55daa67	RDMA/mlx4: Simplify an alloc_ordered_workqueue() invocation Let alloc_ordered_workqueue() format the workqueue name instead of calling snprintf() explicitly. Link: https://patch.msgid.link/r/20240823101840.515398-4-ruanjinjie@huawei.com Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-08-23 11:37:49 -03:00
Jinjie Ruan	7229d7b64e	RDMA/mad: Simplify an alloc_ordered_workqueue() invocation Let alloc_ordered_workqueue() format the workqueue name instead of calling snprintf() explicitly. Link: https://patch.msgid.link/r/20240823101840.515398-3-ruanjinjie@huawei.com Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-08-23 11:37:49 -03:00
Jinjie Ruan	92c7ad8364	RDMA/qib: Simplify an alloc_ordered_workqueue() invocation Let alloc_ordered_workqueue() format the workqueue name instead of calling snprintf() explicitly. Link: https://patch.msgid.link/r/20240823101840.515398-2-ruanjinjie@huawei.com Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-08-23 11:37:49 -03:00
Zhang Zekun	e2e641fe1c	RDMA/ipoib: Remove unused declarations There are some declarations without function definition, which are listed as below: 1. ipoib_ib_tx_timer_func() has also been removed since commit `8966e28d2e` ("IB/ipoib: Use NAPI in UD/TX flows") 2. ipoib_pkey_event() has been removed since commit `ee1e2c82c2` ("IPoIB: Refresh paths instead of flushing them on SM change events") 3. ipoib_mcast_dev_down() has been removed since commit `988bd50300` ("IPoIB: Fix memory leak of multicast group structures") 4. ipoib_pkey_open() has been removed since commit `dd57c9308a` ("IB/ipoib: Avoid multicast join attempts with invalid P_key") Remove these unused declarations. Link: https://patch.msgid.link/r/20240818055702.79547-3-zhangzekun11@huawei.com Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-08-19 15:19:52 -03:00
Zhang Zekun	1fb797af8a	RDMA/core: Remove unused declaration rdma_resolve_ip_route() The definition of rdma_resolve_ip_route() has been removed. Remove the unused declaration. Fixes: `6aaecd3856` ("RDMA/core: Simplify roce_resolve_route_from_path()") Link: https://patch.msgid.link/r/20240818055702.79547-2-zhangzekun11@huawei.com Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-08-19 15:19:52 -03:00
Yue Haibing	53ffc09a3e	RDMA/mlx5: Remove two unused declarations Commit `e6fb246cca` ("RDMA/mlx5: Consolidate MR destruction to mlx5_ib_dereg_mr()") removed mlx5_ib_free_implicit_mr() but left the declaration. Commit `d98995b4bf` ("net/mlx5: Reimplement write combining test") left mlx5_ib_test_wc(). Remove the unused declarations. Link: https://patch.msgid.link/r/20240816101358.881247-1-yuehaibing@huawei.com Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-08-19 15:10:36 -03:00
Al Viro	88a2f6468d	struct fd: representation change We want the compiler to see that fdput() on empty instance is a no-op. The emptiness check is that file reference is NULL, while fdput() is "fput() if FDPUT_FPUT is present in flags". The reason why fdput() on empty instance is a no-op is something compiler can't see - it's that we never generate instances with NULL file reference combined with non-zero flags. It's not that hard to deal with - the real primitives behind fdget() et.al. are returning an unsigned long value, unpacked by (inlined) __to_fd() into the current struct file * + int. The lower bits are used to store flags, while the rest encodes the pointer. Linus suggested that keeping this unsigned long around with the extractions done by inlined accessors should generate a sane code and that turns out to be the case. Namely, turning struct fd into a struct-wrapped unsinged long, with fd_empty(f) => unlikely(f.word == 0) fd_file(f) => (struct file *)(f.word & ~3) fdput(f) => if (f.word & 1) fput(fd_file(f)) ends up with compiler doing the right thing. The cost is the patch footprint, of course - we need to switch f.file to fd_file(f) all over the tree, and it's not doable with simple search and replace; there are false positives, etc. Note that the sole member of that structure is an opaque unsigned long - all accesses should be done via wrappers and I don't want to use a name that would invite manual casts to file pointers, etc. The value of that member is equal either to (unsigned long)p \| flags, p being an address of some struct file instance, or to 0 for an empty fd. For now the new predicate (fd_empty(f)) has no users; all the existing checks have form (!fd_file(f)). We will convert to fd_empty() use later; here we only define it (and tell the compiler that it's unlikely to return true). This commit only deals with representation change; there will be followups. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2024-08-12 22:01:05 -04:00
Al Viro	1da91ea87a	introduce fd_file(), convert all accessors to it. For any changes of struct fd representation we need to turn existing accesses to fields into calls of wrappers. Accesses to struct fd::flags are very few (3 in linux/file.h, 1 in net/socket.c, 3 in fs/overlayfs/file.c and 3 more in explicit initializers). Those can be dealt with in the commit converting to new layout; accesses to struct fd::file are too many for that. This commit converts (almost) all of f.file to fd_file(f). It's not entirely mechanical ('file' is used as a member name more than just in struct fd) and it does not even attempt to distinguish the uses in pointer context from those in boolean context; the latter will be eventually turned into a separate helper (fd_empty()). NOTE: mass conversion to fd_empty(), tempting as it might be, is a bad idea; better do that piecewise in commit that convert from fdget...() to CLASS(...). [conflicts in fs/fhandle.c, kernel/bpf/syscall.c, mm/memcontrol.c caught by git; fs/stat.c one got caught by git grep] [fs/xattr.c conflict] Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2024-08-12 22:00:43 -04:00
Yishai Hadas	ec7ad65309	RDMA/mlx5: Introduce GET_DATA_DIRECT_SYSFS_PATH ioctl Introduce the 'GET_DATA_DIRECT_SYSFS_PATH' ioctl to return the sysfs path of the affiliated 'data direct' device for a given device. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://patch.msgid.link/403745463e0ef52adbef681ff09aa6a29a756352.1722512548.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-08-11 11:12:50 +03:00
Yishai Hadas	de8f847a51	RDMA/mlx5: Add support for DMABUF MR registrations with Data-direct Add support for DMABUF MR registrations with Data-direct device. Upon userspace calling to register a DMABUF MR with the data direct bit set, the below algorithm will be followed. 1) Obtain a pinned DMABUF umem from the IB core using the user input parameters (FD, offset, length) and the DMA PF device. The DMA PF device is needed to allow the IOMMU to enable the DMA PF to access the user buffer over PCI. 2) Create a KSM MKEY by setting its entries according to the user buffer VA to IOVA mapping, with the MKEY being the data direct device-crossed MKEY. This KSM MKEY is umrable and will be used as part of the MR cache. The PD for creating it is the internal device 'data direct' kernel one. 3) Create a crossing MKEY that points to the KSM MKEY using the crossing access mode. 4) Manage the KSM MKEY by adding it to a list of 'data direct' MKEYs managed on the mlx5_ib device. 5) Return the crossing MKEY to the user, created with its supplied PD. Upon DMA PF unbind flow, the driver will revoke the KSM entries. The final deregistration will occur under the hood once the application deregisters its MKEY. Notes: - This version supports only the PINNED UMEM mode, so there is no dependency on ODP. - The IOVA supplied by the application must be system page aligned due to HW translations of KSM. - The crossing MKEY will not be umrable or part of the MR cache, as we cannot change its crossed (i.e. KSM) MKEY over UMR. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://patch.msgid.link/1f99d8020ed540d9702b9e2252a145a439609ba6.1722512548.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-08-11 11:12:50 +03:00
Yishai Hadas	3aa73c6b79	RDMA: Pass uverbs_attr_bundle as part of '.reg_user_mr_dmabuf' API Pass uverbs_attr_bundle as part of '.reg_user_mr_dmabuf' API instead of udata. This enables passing some new ioctl attributes to the drivers, as will be introduced in the next patches for mlx5 driver. Change the involved drivers accordingly. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://patch.msgid.link/9a25b2fc02443f7c36c2d93499ae25252b6afd40.1722512548.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-08-11 11:12:50 +03:00
Yishai Hadas	253c61dc25	RDMA/umem: Introduce an option to revoke DMABUF umem Introduce an option to revoke DMABUF umem. This option will retain the umem allocation while revoking its DMA mapping. Furthermore, any subsequent attempts to map the pages should fail once the umem has been revoked. This functionality will be utilized in the upcoming patches in the series, where we aim to delay umem deallocation until the mkey deregistration. However, we must unmap its pages immediately. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://patch.msgid.link/a38270f2fe4a194868ca2312f4c1c760e51bcbff.1722512548.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-08-11 11:12:49 +03:00
Yishai Hadas	682358fd35	RDMA/umem: Add support for creating pinned DMABUF umem with a given dma device Add support for creating pinned DMABUF umem with a specified DMA device instead of the DMA device of the given IB device. This API will be utilized in the upcoming patches of the series when multiple path DMAs are implemented. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://patch.msgid.link/038aad36a43797e5591b20ba81051fc5758124f9.1722512548.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-08-11 11:12:49 +03:00
Yishai Hadas	2e8e631d7a	RDMA/mlx5: Add the initialization flow to utilize the 'data direct' device Add the NET device initialization flow to utilize the 'data direct' device. When a NET mlx5_ib device is capable of 'data direct', the following sequence of actions will occur: - Find its affiliated 'data direct' VUID via a firmware command. - Create its own private PD and 'data direct' mkey. - Register to be notified when its 'data direct' driver is probed or removed. The DMA device of the affiliated 'data direct' device, including the private PD and the 'data direct' mkey, will be used later during MR registrations that request the data direct functionality. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://patch.msgid.link/b11fa87b2a65bce4db8d40341bb6cee490fa4d06.1722512548.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-08-11 11:12:49 +03:00
Yishai Hadas	6910e3660d	RDMA/mlx5: Introduce the 'data direct' driver Introduce the 'data direct' driver for a ConnectX-8 Data Direct device. The 'data direct' driver functions as the affiliated DMA device for one or more capable mlx5_ib devices. This DMA device, as the name suggests, is used exclusively for DMA operations. It can be considered a DMA engine managed by a PF/VF, lacking network capabilities and having minimal overall capabilities. Consequently, the DMA NIC PF will not be exposed to or directly used by software applications. The driver will not have any direct interface or interaction with the firmware (no command interface, no capabilities, etc.). It will operate solely over PCI to enable its DMA functionality. Registration and un-registration of the driver are handled as part of the mlx5_ib initialization and exit processes, as the mlx5_ib devices will effectively be its clients. The driver will serve as the DMA device for accessing another PCI device to achieve optimal performance (both on the same NUMA node, P2P access, etc.). Upon probing, it will read its VUID over PCI to handle mlx5_ib device registrations with the same VUID. Upon removal, it will notify its clients to allow them to clean up the resources that were mmaped with its DMA device. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://patch.msgid.link/b77edecfd476c3f445da96ab6aef499ae47b2829.1722512548.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-08-11 11:12:49 +03:00
Mark Bloch	0ea4ffb2bc	RDMA/mlx5: Expose vhca id for all ports in multiport mode In multiport mode, RDMA devices make it impossible for userspace to use DEVX to discover vhca id values for ports beyond port 1. This patch addresses the issue by exposing the vhca id of all ports. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Link: https://patch.msgid.link/41dea83aa51843aa4c067b4f73f28d64e51bd53c.1722331101.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-08-11 11:12:49 +03:00
Chiara Meiohas	df6d27a309	RDMA/nldev: Enhance netlink message parsing and validation Use strict parsing validation for set commands, and liberal validation for get commands. Additionally, remove all usage of nlmsg_parse_depricate(). Strict parsing validation fails when encountering unrecognized attributes in the Netlink message, while liberal parsing validation ignores them. In 57d7a8fd904c ("rdma: Add an option to display driver-specific QPs in the rdma tool") in iproute2, the attribute RDMA_NLDEV_ATTR_DRIVER_DETAILS was added. This cause backwards compatibility issues when using the rdma tool with the new attribute and an older kernel which does recognize this attribute. In this case, the command "rdma stat show mr" would fail, because the new rdma tool would fill the netlink message with the new attribute and the older kernel would fail as it used strict parsing and did not recognize the new attribute. In general, strict validation is appropriate for set commands as they modify the system, while liberal validation is suitable for get commands which only query system information. Replace all uses of nlmsg_parse_deprecated() with __nlmsg_parse(), using the NL_VALIDATE_LIBERAL flag. The nlmsg_parse_deprecated() function internally calls __nlmsg_parse() with the NL_VALIDATE_LIBERAL flag, but its name is confusing. Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://lore.kernel.org/r/f633a979a49db090d05c24a3ba83d30727bb777b.1722331020.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-08-11 11:12:49 +03:00
Saravanan Vajravel	2a777679b8	RDMA/mad: Improve handling of timed out WRs of mad agent Current timeout handler of mad agent acquires/releases mad_agent_priv lock for every timed out WRs. This causes heavy locking contention when higher no. of WRs are to be handled inside timeout handler. This leads to softlockup with below trace in some use cases where rdma-cm path is used to establish connection between peer nodes Trace: ----- BUG: soft lockup - CPU#4 stuck for 26s! [kworker/u128:3:19767] CPU: 4 PID: 19767 Comm: kworker/u128:3 Kdump: loaded Tainted: G OE ------- --- 5.14.0-427.13.1.el9_4.x86_64 #1 Hardware name: Dell Inc. PowerEdge R740/01YM03, BIOS 2.4.8 11/26/2019 Workqueue: ib_mad1 timeout_sends [ib_core] RIP: 0010:__do_softirq+0x78/0x2ac RSP: 0018:ffffb253449e4f98 EFLAGS: 00000246 RAX: 00000000ffffffff RBX: 0000000000000000 RCX: 000000000000001f RDX: 000000000000001d RSI: 000000003d1879ab RDI: fff363b66fd3a86b RBP: ffffb253604cbcd8 R08: 0000009065635f3b R09: 0000000000000000 R10: 0000000000000040 R11: ffffb253449e4ff8 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000040 FS: 0000000000000000(0000) GS:ffff8caa1fc80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fd9ec9db900 CR3: 0000000891934006 CR4: 00000000007706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <IRQ> ? show_trace_log_lvl+0x1c4/0x2df ? show_trace_log_lvl+0x1c4/0x2df ? __irq_exit_rcu+0xa1/0xc0 ? watchdog_timer_fn+0x1b2/0x210 ? __pfx_watchdog_timer_fn+0x10/0x10 ? __hrtimer_run_queues+0x127/0x2c0 ? hrtimer_interrupt+0xfc/0x210 ? __sysvec_apic_timer_interrupt+0x5c/0x110 ? sysvec_apic_timer_interrupt+0x37/0x90 ? asm_sysvec_apic_timer_interrupt+0x16/0x20 ? __do_softirq+0x78/0x2ac ? __do_softirq+0x60/0x2ac __irq_exit_rcu+0xa1/0xc0 sysvec_call_function_single+0x72/0x90 </IRQ> <TASK> asm_sysvec_call_function_single+0x16/0x20 RIP: 0010:_raw_spin_unlock_irq+0x14/0x30 RSP: 0018:ffffb253604cbd88 EFLAGS: 00000247 RAX: 000000000001960d RBX: 0000000000000002 RCX: ffff8cad2a064800 RDX: 000000008020001b RSI: 0000000000000001 RDI: ffff8cad5d39f66c RBP: ffff8cad5d39f600 R08: 0000000000000001 R09: 0000000000000000 R10: ffff8caa443e0c00 R11: ffffb253604cbcd8 R12: ffff8cacb8682538 R13: 0000000000000005 R14: ffffb253604cbd90 R15: ffff8cad5d39f66c cm_process_send_error+0x122/0x1d0 [ib_cm] timeout_sends+0x1dd/0x270 [ib_core] process_one_work+0x1e2/0x3b0 ? __pfx_worker_thread+0x10/0x10 worker_thread+0x50/0x3a0 ? __pfx_worker_thread+0x10/0x10 kthread+0xdd/0x100 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x29/0x50 </TASK> Simplified timeout handler by creating local list of timed out WRs and invoke send handler post creating the list. The new method acquires/ releases lock once to fetch the list and hence helps to reduce locking contetiong when processing higher no. of WRs Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com> Link: https://lore.kernel.org/r/20240722110325.195085-1-saravanan.vajravel@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-08-01 13:07:29 +03:00
Showrya M N	60dc7fcafe	RDMA/siw: Remove NETDEV_GOING_DOWN event handler Toggling link while running NVME-oF over siw hits a kernel panic due to race condition within siw_handler and ib_destroy_qp(). The IB_EVENT_PORT_ERR event can alone handle destroying qps. therefore remove unwanted processing in siw. Suggested-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Showrya M N <showrya@chelsio.com> Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com> Link: https://lore.kernel.org/r/20240724085428.3813-1-showrya@chelsio.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-07-29 11:43:47 +03:00
Anumula Murali Mohan Reddy	75ab1533d7	RDMA/cxgb4: use dma_mmap_coherent() for mapping non-contiguous memory dma_alloc_coherent() allocates contiguous memory irrespective of iommu mode, but after commit `f5ff79fddf` ("dma-mapping: remove CONFIG_DMA_REMAP") if iommu is enabled in translate mode, dma_alloc_coherent() may allocate non-contiguous memory. Attempt to map this memory results in panic. This patch fixes the issue by using dma_mmap_coherent() to map each page to user space. Signed-off-by: Anumula Murali Mohan Reddy <anumula@chelsio.com> Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com> Link: https://lore.kernel.org/r/20240716142532.97423-1-anumula@chelsio.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-07-29 11:41:16 +03:00
Linus Torvalds	ebcfbf02ab	IOMMU Updates for Linux v6.11 - Core: * Support for the "ats-supported" device-tree property. * Removal of the 'ops' field from 'struct iommu_fwspec'. * Introduction of iommu_paging_domain_alloc() and partial conversion of existing users. * Introduce 'struct iommu_attach_handle' and provide corresponding IOMMU interfaces which will be used by the IOMMUFD subsystem. * Remove stale documentation. * Add missing MODULE_DESCRIPTION() macro. * Misc cleanups. - Allwinner Sun50i: * Ensure bypass mode is disabled on H616 SoCs. * Ensure page-tables are allocated below 4GiB for the 32-bit page-table walker. * Add new device-tree compatible strings. - AMD Vi: * Use try_cmpxchg64() instead of cmpxchg64() when updating pte. - Arm SMMUv2: * Print much more useful information on context faults. * Fix Qualcomm TBU probing when CONFIG_ARM_SMMU_QCOM_DEBUG=n. * Add new Qualcomm device-tree bindings. - Arm SMMUv3: * Support for hardware update of access/dirty bits and reporting via IOMMUFD. * More driver rework from Jason, this time updating the PASID/SVA support to prepare for full IOMMUFD support. * Add missing MODULE_DESCRIPTION() macro. * Minor fixes and cleanups. - NVIDIA Tegra: * Fix for benign fwspec initialisation issue exposed by rework on the core branch. - Intel VT-d: * Use try_cmpxchg64() instead of cmpxchg64() when updating pte. * Use READ_ONCE() to read volatile descriptor status. * Remove support for handling Execute-Requested requests. * Avoid calling iommu_domain_alloc(). * Minor fixes and refactoring. - Qualcomm MSM: * Updates to the device-tree bindings. -----BEGIN PGP SIGNATURE----- iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmaZTqMQHHdpbGxAa2Vy bmVsLm9yZwAKCRC3rHDchMFjNApdB/wL2gW7ANJN3KDrOiWdq06P9fuzxbuiAegI aKGH+aT05kJjLBXpAE5K9Bas0RbgN8iIB4TITDR9jyLnMOlTP3poy0fvB8y27q00 /WkQ7yVPkZc58ySdEOGH/EbuQkiXcD1YTjTGWP9071xzbWTDbsYN0smfbvvB9LgI 56KhdcUtB0QsqhqBzyyznHJLFdpVvDpbkiAFDXJfor7SNOOtV9a4Ect6IYteaYKz S6+DWDEfUs+fHTEKEZ9sZVA745f2zPkT/YHY8vjLOEukWN07+3/2AKTra19DIgqF HCGitRyZjOut1fg8sLn0SUliCKe/G/bHlwSbHnxJQ73b91YDvpzD =xvLD -----END PGP SIGNATURE----- Merge tag 'iommu-updates-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux Pull iommu updates from Will Deacon: "Core: - Support for the "ats-supported" device-tree property - Removal of the 'ops' field from 'struct iommu_fwspec' - Introduction of iommu_paging_domain_alloc() and partial conversion of existing users - Introduce 'struct iommu_attach_handle' and provide corresponding IOMMU interfaces which will be used by the IOMMUFD subsystem - Remove stale documentation - Add missing MODULE_DESCRIPTION() macro - Misc cleanups Allwinner Sun50i: - Ensure bypass mode is disabled on H616 SoCs - Ensure page-tables are allocated below 4GiB for the 32-bit page-table walker - Add new device-tree compatible strings AMD Vi: - Use try_cmpxchg64() instead of cmpxchg64() when updating pte Arm SMMUv2: - Print much more useful information on context faults - Fix Qualcomm TBU probing when CONFIG_ARM_SMMU_QCOM_DEBUG=n - Add new Qualcomm device-tree bindings Arm SMMUv3: - Support for hardware update of access/dirty bits and reporting via IOMMUFD - More driver rework from Jason, this time updating the PASID/SVA support to prepare for full IOMMUFD support - Add missing MODULE_DESCRIPTION() macro - Minor fixes and cleanups NVIDIA Tegra: - Fix for benign fwspec initialisation issue exposed by rework on the core branch Intel VT-d: - Use try_cmpxchg64() instead of cmpxchg64() when updating pte - Use READ_ONCE() to read volatile descriptor status - Remove support for handling Execute-Requested requests - Avoid calling iommu_domain_alloc() - Minor fixes and refactoring Qualcomm MSM: - Updates to the device-tree bindings" * tag 'iommu-updates-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: (72 commits) iommu/tegra-smmu: Pass correct fwnode to iommu_fwspec_init() iommu/vt-d: Fix identity map bounds in si_domain_init() iommu: Move IOMMU_DIRTY_NO_CLEAR define dt-bindings: iommu: Convert msm,iommu-v0 to yaml iommu/vt-d: Fix aligned pages in calculate_psi_aligned_address() iommu/vt-d: Limit max address mask to MAX_AGAW_PFN_WIDTH docs: iommu: Remove outdated Documentation/userspace-api/iommu.rst arm64: dts: fvp: Enable PCIe ATS for Base RevC FVP iommu/of: Support ats-supported device-tree property dt-bindings: PCI: generic: Add ats-supported property iommu: Remove iommu_fwspec ops OF: Simplify of_iommu_configure() ACPI: Retire acpi_iommu_fwspec_ops() iommu: Resolve fwspec ops automatically iommu/mediatek-v1: Clean up redundant fwspec checks RDMA/usnic: Use iommu_paging_domain_alloc() wifi: ath11k: Use iommu_paging_domain_alloc() wifi: ath10k: Use iommu_paging_domain_alloc() drm/msm: Use iommu_paging_domain_alloc() vhost-vdpa: Use iommu_paging_domain_alloc() ...	2024-07-19 09:59:58 -07:00
Linus Torvalds	3d51520954	RDMA v6.11 merge window Usual collection of small improvements and fixes: - Bug fixes and minor improvments in efa, irdma, mlx4, mlx5, rxe, hf1, qib, ocrdma - bnxt_re support for MSN, which is a new retransmit logic - Initial mana support for RC qps - Use after free bug and cleanups in iwcm - Reduce resource usage in mlx5 when RDMA verbs features are not used - New verb to drain shared recieve queues, similar to normal recieve queues. This is necessary to allow ULPs a clean shutdown. Used in the iscsi rdma target - mlx5 support for more than 16 bits of doorbell indexes - Doorbell moderation support for bnxt_re - IB multi-plane support for mlx5 - New EFA adaptor PCI IDs - RDMA_NAME_ASSIGN_TYPE_USER to hint to userspace that it shouldn't rename the device - A collection of hns bugs - Fix long standing bug in bnxt_re with incorrect endian handling of immediate data -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZpfvKQAKCRCFwuHvBreF YXomAP46gZpGv5mlMOAXePRuKq6glNZWl3pVuwuycnlmjQcEUQD/dhQbJz0rZKBr swuibPo83bFacfXJL7Wxd48m4G3EfgI= =1eXu -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull rdma updates from Jason Gunthorpe: "Usual collection of small improvements and fixes: - Bug fixes and minor improvments in efa, irdma, mlx4, mlx5, rxe, hf1, qib, ocrdma - bnxt_re support for MSN, which is a new retransmit logic - Initial mana support for RC qps - Use after free bug and cleanups in iwcm - Reduce resource usage in mlx5 when RDMA verbs features are not used - New verb to drain shared recieve queues, similar to normal recieve queues. This is necessary to allow ULPs a clean shutdown. Used in the iscsi rdma target - mlx5 support for more than 16 bits of doorbell indexes - Doorbell moderation support for bnxt_re - IB multi-plane support for mlx5 - New EFA adaptor PCI IDs - RDMA_NAME_ASSIGN_TYPE_USER to hint to userspace that it shouldn't rename the device - A collection of hns bugs - Fix long standing bug in bnxt_re with incorrect endian handling of immediate data" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (65 commits) IB/hfi1: Constify struct flag_table RDMA/mana_ib: Set correct device into ib bnxt_re: Fix imm_data endianness RDMA: Fix netdev tracker in ib_device_set_netdev RDMA/hns: Fix mbx timing out before CMD execution is completed RDMA/hns: Fix insufficient extend DB for VFs. RDMA/hns: Fix undifined behavior caused by invalid max_sge RDMA/hns: Fix shift-out-bounds when max_inline_data is 0 RDMA/hns: Fix missing pagesize and alignment check in FRMR RDMA/hns: Fix unmatch exception handling when init eq table fails RDMA/hns: Fix soft lockup under heavy CEQE load RDMA/hns: Check atomic wr length RDMA/ocrdma: Don't inline statistics functions RDMA/core: Introduce "name_assign_type" for an IB device RDMA/qib: Fix truncation compilation warnings in qib_verbs.c RDMA/qib: Fix truncation compilation warnings in qib_init.c RDMA/efa: Add EFA 0xefa3 PCI ID RDMA/mlx5: Support per-plane port IB counters by querying PPCNT register net/mlx5: mlx5_ifc update for accessing ppcnt register of plane ports RDMA/mlx5: Add plane index support when querying PTYS registers ...	2024-07-19 09:51:33 -07:00
Jakub Kicinski	dd3cd3ca69	aux-sysfs-irqs Shay Says: ========== Introduce auxiliary bus IRQs sysfs Today, PCI PFs and VFs, which are anchored on the PCI bus, display their IRQ information in the <pci_device>/msi_irqs/<irq_num> sysfs files. PCI subfunctions (SFs) are similar to PFs and VFs and these SFs are anchored on the auxiliary bus. However, these PCI SFs lack such IRQ information on the auxiliary bus, leaving users without visibility into which IRQs are used by the SFs. This absence makes it impossible to debug situations and to understand the source of interrupts/SFs for performance tuning and debug. Additionally, the SFs are multifunctional devices supporting RDMA, network devices, clocks, and more, similar to their peer PCI PFs and VFs. Therefore, it is desirable to have SFs' IRQ information available at the bus/device level. To overcome the above limitations, this short series extends the auxiliary bus to display IRQ information in sysfs, similar to that of PFs and VFs. It adds an 'irqs' directory under the auxiliary device and includes an <irq_num> sysfs file within it. For example: $ ls /sys/bus/auxiliary/devices/mlx5_core.sf.1/irqs/ 50 51 52 53 54 55 56 57 58 Patch summary: patch-1 adds auxiliary bus to support irqs used by auxiliary device patch-2 mlx5 driver using exposing irqs for PCI SF devices via auxiliary bus ========== -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmaQTCYACgkQSD+KveBX +j7nRAgAhyi8mD93AjpoXX8onbK3ZyPnGwLToCs0NT3EzT0BIwNvDovQp4rhcs16 3zVwvW+twVsbMuPYTpPVgcynpL6N0K/CoW+ubDGZaRIaf0nDmh4MY1wY/EUsVj8R FbeTi5L+9MyKvFbtO5d4cW1q7M0XVD3uR8Wle6PwvXZ1gcM59vsR1eml25NLTC8B Z9F9WKG+dFAni0ll/IL837Se3QQapRXtJQ3g6XbIcpXiMqgIrHZ9FyY0LvuWlQq4 LsIPKh7RySATmAYXwwpsnfdrilvvMHsyjlAoeNHEJBsAUY+kpOIFFi6J5EB+/oyo jhBhlc4Al0vUXis9jGTysO7mVYVOUQ== =DTxS -----END PGP SIGNATURE----- Merge tag 'aux-sysfs-irqs' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Saeed Mahameed says: ==================== aux-sysfs-irqs Shay Says: ========== Introduce auxiliary bus IRQs sysfs Today, PCI PFs and VFs, which are anchored on the PCI bus, display their IRQ information in the <pci_device>/msi_irqs/<irq_num> sysfs files. PCI subfunctions (SFs) are similar to PFs and VFs and these SFs are anchored on the auxiliary bus. However, these PCI SFs lack such IRQ information on the auxiliary bus, leaving users without visibility into which IRQs are used by the SFs. This absence makes it impossible to debug situations and to understand the source of interrupts/SFs for performance tuning and debug. Additionally, the SFs are multifunctional devices supporting RDMA, network devices, clocks, and more, similar to their peer PCI PFs and VFs. Therefore, it is desirable to have SFs' IRQ information available at the bus/device level. To overcome the above limitations, this short series extends the auxiliary bus to display IRQ information in sysfs, similar to that of PFs and VFs. It adds an 'irqs' directory under the auxiliary device and includes an <irq_num> sysfs file within it. For example: $ ls /sys/bus/auxiliary/devices/mlx5_core.sf.1/irqs/ 50 51 52 53 54 55 56 57 58 Patch summary: patch-1 adds auxiliary bus to support irqs used by auxiliary device patch-2 mlx5 driver using exposing irqs for PCI SF devices via auxiliary bus ========== * tag 'aux-sysfs-irqs' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: net/mlx5: Expose SFs IRQs driver core: auxiliary bus: show auxiliary device IRQs RDMA/mlx5: Add Qcounters req_transport_retries_exceeded/req_rnr_retries_exceeded net/mlx5: Reimplement write combining test ==================== Link: https://patch.msgid.link/20240711213140.256997-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-15 12:42:45 -07:00
Christophe JAILLET	887cd308fd	IB/hfi1: Constify struct flag_table 'struct flag_table' are not modified in this driver. Constifying this structure moves some data to a read-only section, so increase overall security. On a x86_64, with allmodconfig: Before: ====== text data bss dec hex filename 302932 40271 112 343315 53d13 drivers/infiniband/hw/hfi1/chip.o After: ===== text data bss dec hex filename 311636 31567 112 343315 53d13 drivers/infiniband/hw/hfi1/chip.o Link: https://lore.kernel.org/r/782b6a648bfbbf2bb83f81a73c0460b5bb7642a1.1720959310.git.christophe.jaillet@wanadoo.fr Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-07-15 10:24:21 -03:00
Konstantin Taranov	1df03a4b44	RDMA/mana_ib: Set correct device into ib Add mana_get_primary_netdev_rcu helper to get a primary netdevice for a given port. When mana is used with netvsc, the VF netdev is controlled by an upper netvsc device. In a baremetal case, the VF netdev is the primary device. Use the mana_get_primary_netdev_rcu() helper in the mana_ib to get the correct device for querying network states. Fixes: `8b184e4f1c` ("RDMA/mana_ib: Enable RoCE on port 1") Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://lore.kernel.org/r/1720705077-322-1-git-send-email-kotaranov@linux.microsoft.com Reviewed-by: Long Li <longli@microsoft.com> Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-07-14 10:49:53 +03:00
Jack Wang	95b087f87b	bnxt_re: Fix imm_data endianness When map a device between servers with MLX and BCM RoCE nics, RTRS server complain about unknown imm type, and can't map the device, After more debug, it seems bnxt_re wrongly handle the imm_data, this patch fixed the compat issue with MLX for us. In off list discussion, Selvin confirmed HW is working in little endian format and all data needs to be converted to LE while providing. This patch fix the endianness for imm_data Fixes: `1ac5a40479` ("RDMA/bnxt_re: Add bnxt_re RoCE driver") Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Link: https://lore.kernel.org/r/20240710122102.37569-1-jinpu.wang@ionos.com Acked-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-07-14 10:33:02 +03:00
David Ahern	2043a14fb3	RDMA: Fix netdev tracker in ib_device_set_netdev If a netdev has already been assigned, ib_device_set_netdev needs to release the reference on the older netdev but it is mistakenly being called for the new netdev. Fix it and in the process use netdev_put to be symmetrical with the netdev_hold. Fixes: `09f530f0c6` ("RDMA: Add netdevice_tracker to ib_device_set_netdev()") Signed-off-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240710203310.19317-1-dsahern@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-07-14 10:32:57 +03:00
Chengchang Tang	bbddfa2255	RDMA/hns: Fix mbx timing out before CMD execution is completed When a large number of tasks are issued, the speed of HW processing mbx will slow down. The standard for judging mbx timeout in the current firmware is 30ms, and the current timeout standard for the driver is also 30ms. Considering that firmware scheduling in multi-function scenarios takes a certain amount of time, this will cause the driver to time out too early and report a failure before mbx execution times out. This patch introduces a new mechanism that can set different timeouts for different cmds and extends the timeout of mbx to 35ms. Fixes: `a04ff739f2` ("RDMA/hns: Add command queue support for hip08 RoCE driver") Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20240710133705.896445-9-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-07-11 13:25:12 +03:00
Chengchang Tang	0b8e658f70	RDMA/hns: Fix insufficient extend DB for VFs. VFs and its PF will share the memory of the extend DB. Currently, the number of extend DB allocated by driver is only enough for PF. This leads to a probability of DB loss and some other problems in scenarios where both PF and VFs use a large number of QPs. Fixes: `6b63597d35` ("RDMA/hns: Add TSQ link table support") Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20240710133705.896445-8-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-07-11 13:25:12 +03:00
Chengchang Tang	36397b9073	RDMA/hns: Fix undifined behavior caused by invalid max_sge If max_sge has been set to 0, roundup_pow_of_two() in set_srq_basic_param() may have undefined behavior. Fixes: `9dd052474a` ("RDMA/hns: Allocate one more recv SGE for HIP08") Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20240710133705.896445-7-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-07-11 13:25:12 +03:00
Chengchang Tang	24c6291346	RDMA/hns: Fix shift-out-bounds when max_inline_data is 0 A shift-out-bounds may occur, if the max_inline_data has not been set. The related log: UBSAN: shift-out-of-bounds in kernel/include/linux/log2.h:57:13 shift exponent 64 is too large for 64-bit type 'long unsigned int' Call trace: dump_backtrace+0xb0/0x118 show_stack+0x20/0x38 dump_stack_lvl+0xbc/0x120 dump_stack+0x1c/0x28 __ubsan_handle_shift_out_of_bounds+0x104/0x240 set_ext_sge_param+0x40c/0x420 [hns_roce_hw_v2] hns_roce_create_qp+0xf48/0x1c40 [hns_roce_hw_v2] create_qp.part.0+0x294/0x3c0 ib_create_qp_kernel+0x7c/0x150 create_mad_qp+0x11c/0x1e0 ib_mad_init_device+0x834/0xc88 add_client_context+0x248/0x318 enable_device_and_get+0x158/0x280 ib_register_device+0x4ac/0x610 hns_roce_init+0x890/0xf98 [hns_roce_hw_v2] __hns_roce_hw_v2_init_instance+0x398/0x720 [hns_roce_hw_v2] hns_roce_hw_v2_init_instance+0x108/0x1e0 [hns_roce_hw_v2] hclge_init_roce_client_instance+0x1a0/0x358 [hclge] hclge_init_client_instance+0xa0/0x508 [hclge] hnae3_register_client+0x18c/0x210 [hnae3] hns_roce_hw_v2_init+0x28/0xff8 [hns_roce_hw_v2] do_one_initcall+0xe0/0x510 do_init_module+0x110/0x370 load_module+0x2c6c/0x2f20 init_module_from_file+0xe0/0x140 idempotent_init_module+0x24c/0x350 __arm64_sys_finit_module+0x88/0xf8 invoke_syscall+0x68/0x1a0 el0_svc_common.constprop.0+0x11c/0x150 do_el0_svc+0x38/0x50 el0_svc+0x50/0xa0 el0t_64_sync_handler+0xc0/0xc8 el0t_64_sync+0x1a4/0x1a8 Fixes: `0c5e259b06` ("RDMA/hns: Fix incorrect sge nums calculation") Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20240710133705.896445-6-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-07-11 13:25:12 +03:00
Chengchang Tang	d387d4b54e	RDMA/hns: Fix missing pagesize and alignment check in FRMR The offset requires 128B alignment and the page size ranges from 4K to 128M. Fixes: `68a997c5d2` ("RDMA/hns: Add FRMR support for hip08") Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20240710133705.896445-5-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-07-11 13:25:11 +03:00
Junxian Huang	543fb987bd	RDMA/hns: Fix unmatch exception handling when init eq table fails The hw ctx should be destroyed when init eq table fails. Fixes: `a5073d6054` ("RDMA/hns: Add eq support of hip08") Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20240710133705.896445-4-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-07-11 13:25:11 +03:00
Junxian Huang	2fdf340383	RDMA/hns: Fix soft lockup under heavy CEQE load CEQEs are handled in interrupt handler currently. This may cause the CPU core staying in interrupt context too long and lead to soft lockup under heavy load. Handle CEQEs in BH workqueue and set an upper limit for the number of CEQE handled by a single call of work handler. Fixes: `a5073d6054` ("RDMA/hns: Add eq support of hip08") Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20240710133705.896445-3-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-07-11 13:25:11 +03:00
Junxian Huang	6afa2c0bfb	RDMA/hns: Check atomic wr length 8 bytes is the only supported length of atomic. Add this check in set_rc_wqe(). Besides, stop processing WQEs and return from set_rc_wqe() if there is any error. Fixes: `384f881851` ("RDMA/hns: Add atomic support") Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20240710133705.896445-2-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-07-11 13:25:11 +03:00
Peng Hao	b851268018	RDMA/ocrdma: Don't inline statistics functions Fix the problem of KASAN causing the stack frame size to increase drivers/infiniband/hw/ocrdma/ocrdma_stats.c:686:16: error: stack frame size (20664) exceeds limit (8192) in 'ocrdma_dbgfs_ops_read' [-Werror,-Wframe-larger-than] static ssize_t ocrdma_dbgfs_ops_read(struct file filp, char __user buffer, ^ Some functions called by ocrdma_dbgfs_ops_read occupy a lot of stack space. Mark these functions as noinline_for_stack to prevent them from accumulating in ocrdma_dbgfs_ops_read. Signed-off-by: Peng Hao <flyingpeng@tencent.com> Link: https://lore.kernel.org/r/20240710091657.26291-1-flyingpeng@tencent.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-07-11 13:25:11 +03:00
Lu Baolu	3b10f25704	RDMA/usnic: Use iommu_paging_domain_alloc() usnic_uiom_alloc_pd() allocates a paging domain for a given device. In this case, iommu_domain_alloc(dev->bus) is equivalent to iommu_paging_domain_alloc(dev). Replace it as iommu_domain_alloc() has been deprecated. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Acked-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20240610085555.88197-15-baolu.lu@linux.intel.com Signed-off-by: Will Deacon <will@kernel.org>	2024-07-04 14:09:33 +01:00
Mark Zhang	af48f95492	RDMA/core: Introduce "name_assign_type" for an IB device The name_assign_type indicates how the name is provided. Currently these types are supported: - RDMA_NAME_ASSIGN_TYPE_UNKNOWN: Unknown or not set; - RDMA_NAME_ASSIGN_TYPE_USER: Name is provided by the user; The user-created sub device, rxe and siw device has this type. When filling nl device info, it is set in the new attribute RDMA_NLDEV_ATTR_NAME_ASSIGN_TYPE. User-space tools like udev "rdma_rename" could check this attribute to determine if this device needs to be renamed or not. Signed-off-by: Mark Zhang <markzhang@nvidia.com> Link: https://lore.kernel.org/r/522591bef9a369cc8e5dcb77787e017bffee37fe.1719837610.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-07-04 07:59:53 +03:00
Leon Romanovsky	f802078d3c	RDMA/qib: Fix truncation compilation warnings in qib_verbs.c Reduce nodename string size to fit IB_DEVICE_NODE_DESC_MAX. drivers/infiniband/hw/qib/qib_verbs.c: In function ‘qib_register_ib_device’: drivers/infiniband/hw/qib/qib_verbs.c:1554:40: error: ‘%s’ directive output may be truncated writing up to 64 bytes into a region of size 43 [-Werror=format-truncation=] 1554 \| "Intel Infiniband HCA %s", init_utsname()->nodename); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/infiniband/hw/qib/qib_verbs.c:1553:9: note: ‘snprintf’ output between 22 and 86 bytes into a destination of size 64 1553 \| snprintf(ibdev->node_desc, sizeof(ibdev->node_desc), \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1554 \| "Intel Infiniband HCA %s", init_utsname()->nodename); \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors Link: https://lore.kernel.org/r/1fb6393fa2e0702fef995834c3c7db972bbc4d06.1719837715.git.leon@kernel.org Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2024-07-04 07:56:56 +03:00
Leon Romanovsky	1b8ca05469	RDMA/qib: Fix truncation compilation warnings in qib_init.c drivers/infiniband/hw/qib/qib_init.c: In function ‘qib_init_one’: drivers/infiniband/hw/qib/qib_init.c:586:67: error: ‘%d’ directive output may be truncated writing between 1 and 11 bytes into a region of size between 0 and 3 [-Werror=format-truncation=] 586 \| snprintf(wq_name, sizeof(wq_name), "qib%d_%d", \| ^~ In function ‘qib_create_workqueues’, inlined from ‘qib_init_one’ at drivers/infiniband/hw/qib/qib_init.c:1438:8: drivers/infiniband/hw/qib/qib_init.c:586:60: note: directive argument in the range [-2147483643, 254] 586 \| snprintf(wq_name, sizeof(wq_name), "qib%d_%d", \| ^~~~~~~~~~ drivers/infiniband/hw/qib/qib_init.c:586:25: note: ‘snprintf’ output between 7 and 27 bytes into a destination of size 8 586 \| snprintf(wq_name, sizeof(wq_name), "qib%d_%d", \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 587 \| dd->unit, pidx); \| ~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors Link: https://lore.kernel.org/r/ab5222c414a01e9d2c5129ef26836aace9ee2aa5.1719837715.git.leon@kernel.org Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2024-07-04 07:56:47 +03:00
Michael Margolin	346d2fc606	RDMA/efa: Add EFA 0xefa3 PCI ID Add support for 0xefa3 devices. Reviewed-by: Yonatan Nachum <ynachum@amazon.com> Reviewed-by: Yossi Leybovich <sleybo@amazon.com> Signed-off-by: Michael Margolin <mrgolin@amazon.com> Link: https://lore.kernel.org/r/20240701095752.20246-1-mrgolin@amazon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-07-01 15:38:05 +03:00
Mark Zhang	7a2210a57d	RDMA/mlx5: Support per-plane port IB counters by querying PPCNT register Supports per-plane port counters by querying PPCNT register with the "extended port counters" group, as the query_vport_counter command doesn't support plane ports. Signed-off-by: Mark Zhang <markzhang@nvidia.com> Link: https://lore.kernel.org/r/06ffb582d67159b7def4654c8272d3d6e8bd2f2f.1718553901.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2024-07-01 15:38:05 +03:00
Mark Zhang	3b43399b29	RDMA/mlx5: Add plane index support when querying PTYS registers Support the new "plane_ind" field when querying port PTYS registers. This is needed when querying the rate of a plane port. Signed-off-by: Mark Zhang <markzhang@nvidia.com> Link: https://lore.kernel.org/r/1f703c36306aa46917fcd88eadbb23b3e380d526.1718553901.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2024-07-01 15:38:05 +03:00
Mark Zhang	294424839b	RDMA/nldev: Add support to dump device type and parent device if exists If a device has a specific type or a parent device, dump them as well. Example: $ rdma dev show smi1 3: smi1: node_type ca fw 20.38.1002 node_guid 9803:9b03:009f:d5ef sys_image_guid 9803:9b03:009f:d5ee type smi parent ibp8s0f1 Signed-off-by: Mark Zhang <markzhang@nvidia.com> Link: https://lore.kernel.org/r/4c022e3e34b5de1254a3b367d502a362cdd0c53a.1718553901.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2024-07-01 15:38:05 +03:00
Mark Zhang	060c642b2a	RDMA/nldev: Add support to add/delete a sub IB device through netlink Add new netlink commands and attributes to support adding and deleting a sub IB device with admin privilege. Examples: $ rdma dev add smi1 type SMI parent ibp8s0f1 $ rdma dev del smi1 Signed-off-by: Mark Zhang <markzhang@nvidia.com> Link: https://lore.kernel.org/r/77cbf1b36359642be8a8d8c5c2f4e585b544282f.1718553901.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2024-07-01 15:38:05 +03:00
Mark Zhang	026a425990	RDMA/mlx5: Support plane device and driver APIs to add and delete it This patch supports driver APIs "add_sub_dev" and "del_sub_dev", to add and delete a plane device respectively. A mlx5 plane device is a rdma SMI device; It provides the SMI capability through user MAD for it's parent, the logical multi-plane aggregated device. For a plane port: - It supports QP0 only; - When adding a plane device, all plane ports are added; - For some commands like mad_ifc, both plane_index and native portnum is needed; - When querying or modifying a plane port context, the native portnum must be used, as the query/modify_hca_vport_context command doesn't support plane port. Signed-off-by: Mark Zhang <markzhang@nvidia.com> Link: https://lore.kernel.org/r/e933cd0562aece181f8657af2ca0f5b387d0f14e.1718553901.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2024-07-01 15:38:05 +03:00
Mark Zhang	a9e0facacf	RDMA/core: Create GSI QP only when CM is supported GSI QP is not needed if the port doesn't support connection management. In following patches mlx5 is going to support IB ports that doesn't support CM. Signed-off-by: Mark Zhang <markzhang@nvidia.com> Link: https://lore.kernel.org/r/c449ebd955923b0e54c58832fd322f9d461b37a0.1718553901.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2024-07-01 15:38:05 +03:00
Mark Zhang	bca5119762	RDMA/core: Support IB sub device with type "SMI" This patch adds 2 APIs, as well as driver operations to support adding and deleting an IB sub device, which provides part of functionalities of it's parent. A sub device has a type; for a sub device with type "SMI", it provides the smi capability through umad for its parent, meaning uverb is not supported. A sub device cannot live without a parent. So when a parent is released, all it's sub devices are released as well. Signed-off-by: Mark Zhang <markzhang@nvidia.com> Link: https://lore.kernel.org/r/44253f7508b21eb2caefea3980c2bc072869116c.1718553901.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2024-07-01 15:38:04 +03:00
Mark Zhang	2a5db20fa5	RDMA/mlx5: Add support to multi-plane device and port When multi-plane is supported, a logical port, which is aggregation of multiple physical plane ports, is exposed for data transmission. Compared with a normal mlx5 IB port, this logical port supports all functionalities except Subnet Management. Signed-off-by: Mark Zhang <markzhang@nvidia.com> Link: https://lore.kernel.org/r/7e37c06c9cb243be9ac79930cd17053903785b95.1718553901.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2024-07-01 15:10:15 +03:00
Mark Zhang	50660c5197	RDMA/core: Create "issm*" device nodes only when SMI is supported For an IB port create it's issm device node only when it has SMI capability. In following patches mlx5 is going to support IB devices without this cap. Signed-off-by: Mark Zhang <markzhang@nvidia.com> Link: https://lore.kernel.org/r/359f73c9a388d5e3ae971e40d8507888b1ba6f93.1718553901.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2024-07-01 15:10:15 +03:00
Selvin Xavier	24943dcdc1	RDMA/bnxt_re: Disable doorbell moderation if hardware register read fails If the HW register read fails, the FIFO will be always shown as full. DB moderation doesn't work in that case and the traffic fails. So disable this feature and log a message. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1719456065-27394-4-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-07-01 14:36:50 +03:00
Selvin Xavier	f2f4dc9124	RDMA/bnxt_re: Enable DB moderation for genP7 adapters Enable DB moderation support for GenP7 adapters also. Query from FW and update the status. Signed-off-by: Chandramohan Akula <chandramohan.akula@broadcom.com> Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1719456065-27394-3-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-07-01 14:36:50 +03:00
Selvin Xavier	8e6e5ac7c4	RDMA/bnxt_re: Update the correct DB FIFO depth and mask for GenP7 GenP5 and P7 devices have different DB FIFO depth. Use different values based on the chip context. Instead of hardcoding doorbell FIFO related values, get it from the HWRM interface. Maintain backward compatibility by having default values when FW is not providing the doorbell FIFO related values. Signed-off-by: Chandramohan Akula <chandramohan.akula@broadcom.com> Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1719456065-27394-2-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-07-01 14:36:50 +03:00
Leon Romanovsky	917918f57a	RDMA/device: Return error earlier if port in not valid There is no need to allocate port data if port provided is not valid. Fixes: `c2261dd76b` ("RDMA/device: Add ib_device_set_netdev() as an alternative to get_netdev") Link: https://lore.kernel.org/r/022047a8b16988fc88d4426da50bf60a4833311b.1719235449.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2024-07-01 14:31:57 +03:00
Akiva Goldberger	589b844f1b	RDMA/mlx5: Send UAR page index as ioctl attribute Add UAR page index as a driver ioctl attribute to increase the number of supported indices, previously limited to 16 bits by mlx5_ib_create_cq struct. Link: https://lore.kernel.org/r/0e18b34d7ec3b1ae02d694b0d545aed7413c0ef7.1719512393.git.leon@kernel.org Signed-off-by: Akiva Goldberger <agoldberger@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-06-27 16:28:22 -03:00
Akiva Goldberger	dd6d7f8574	RDMA: Pass entire uverbs attr bundle to create cq function Changes the create_cq verb signature by sending the entire uverbs attr bundle as a parameter. This allows drivers to send driver specific attrs through ioctl for the create_cq verb and access them in their driver specific code. Also adds a new enum value for driver specific ioctl attributes for methods already supporting UHW. Link: https://lore.kernel.org/r/ed147343987c0d43fd391c1b2f85e2f425747387.1719512393.git.leon@kernel.org Signed-off-by: Akiva Goldberger <agoldberger@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-06-27 16:28:21 -03:00
Jakub Kicinski	193b9b2002	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: `e3f02f32a0` ("ionic: fix kernel panic due to multi-buffer handling") `d9c0420999` ("ionic: Mark error paths in the data path as unlikely") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-27 12:14:11 -07:00
Yonatan Nachum	fe0812e4bc	RDMA/efa: Remove duplicate aenq enable macro We have the same macro in main and verbs files and we don't use the macro in the verbs file, remove it. Link: https://lore.kernel.org/r/20240624160918.27060-3-mrgolin@amazon.com Reviewed-by: Yossi Leybovich <sleybo@amazon.com> Signed-off-by: Yonatan Nachum <ynachum@amazon.com> Signed-off-by: Michael Margolin <mrgolin@amazon.com> Reviewed-by: Gal Pressman <gal.pressman@linux.dev> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-06-26 12:35:00 -03:00
Gal Pressman	47f9b4190a	RDMA/efa: Use offset_in_page() function Use offset_in_page() instead of open-coding it. Link: https://lore.kernel.org/r/20240624160918.27060-2-mrgolin@amazon.com Reviewed-by: Yossi Leybovich <sleybo@amazon.com> Reviewed-by: Firas Jahjah <firasj@amazon.com> Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Michael Margolin <mrgolin@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-06-26 12:34:43 -03:00
Christophe JAILLET	5a905e33b2	RDMA/hfi1: Constify struct mmu_rb_ops 'struct mmu_rb_ops' is not modified in this driver. Constifying this structure moves some data to a read-only section, so increase overall security. On a x86_64, with allmodconfig, as an example: Before: ====== text data bss dec hex filename 10879 164 0 11043 2b23 drivers/infiniband/hw/hfi1/pin_system.o After: ===== text data bss dec hex filename 10907 140 0 11047 2b27 drivers/infiniband/hw/hfi1/pin_system.o Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/b826dd05eefa5f4d6a7a1b4d191eaf37c714ed04.1719259997.git.christophe.jaillet@wanadoo.fr Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-06-26 10:53:29 -03:00
Honggang LI	4adcaf969d	RDMA/rxe: Don't set BTH_ACK_MASK for UC or UD QPs BTH_ACK_MASK bit is used to indicate that an acknowledge (for this packet) should be scheduled by the responder. Both UC and UD QPs are unacknowledged, so don't set BTH_ACK_MASK for UC or UD QPs. Fixes: `8700e3e7c4` ("Soft RoCE driver") Signed-off-by: Honggang LI <honggangli@163.com> Link: https://lore.kernel.org/r/20240624020348.494338-1-honggangli@163.com Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-06-26 10:53:29 -03:00
Max Gurtovoy	58945ddd71	IB/isert: remove the handling of last WQE reached event This event is raised for QPs that are associated with a Shared RQ (SRQ). The iSER target does not support SRQ. Remove this dead code. Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Link: https://lore.kernel.org/r/20240619171153.34631-3-mgurtovoy@nvidia.com Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-06-26 10:53:29 -03:00
Max Gurtovoy	844bc12e6d	IB/core: add support for draining Shared receive queues To avoid leakage for QPs assocoated with SRQ, according to IB spec (section 10.3.1): "Note, for QPs that are associated with an SRQ, the Consumer should take the QP through the Error State before invoking a Destroy QP or a Modify QP to the Reset State. The Consumer may invoke the Destroy QP without first performing a Modify QP to the Error State and waiting for the Affiliated Asynchronous Last WQE Reached Event. However, if the Consumer does not wait for the Affiliated Asynchronous Last WQE Reached Event, then WQE and Data Segment leakage may occur. Therefore, it is good programming practice to teardown a QP that is associated with an SRQ by using the following process: - Put the QP in the Error State; - wait for the Affiliated Asynchronous Last WQE Reached Event; - either: - drain the CQ by invoking the Poll CQ verb and either wait for CQ to be empty or the number of Poll CQ operations has exceeded CQ capacity size; or - post another WR that completes on the same CQ and wait for this WR to return as a WC; - and then invoke a Destroy QP or Reset QP." Catch the Last WQE Reached Event in the core layer during drain QP flow. Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Link: https://lore.kernel.org/r/20240619171153.34631-2-mgurtovoy@nvidia.com Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-06-26 10:53:29 -03:00
Leon Romanovsky	5953e0647c	RDMA/mlx4: Fix truncated output warning in alias_GUID.c drivers/infiniband/hw/mlx4/alias_GUID.c: In function ‘mlx4_ib_init_alias_guid_service’: drivers/infiniband/hw/mlx4/alias_GUID.c:878:74: error: ‘%d’ directive output may be truncated writing between 1 and 11 bytes into a region of size 5 [-Werror=format-truncation=] 878 \| snprintf(alias_wq_name, sizeof alias_wq_name, "alias_guid%d", i); \| ^~ drivers/infiniband/hw/mlx4/alias_GUID.c:878:63: note: directive argument in the range [-2147483641, 2147483646] 878 \| snprintf(alias_wq_name, sizeof alias_wq_name, "alias_guid%d", i); \| ^~~~~~~~~~~~~~ drivers/infiniband/hw/mlx4/alias_GUID.c:878:17: note: ‘snprintf’ output between 12 and 22 bytes into a destination of size 15 878 \| snprintf(alias_wq_name, sizeof alias_wq_name, "alias_guid%d", i); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors Fixes: `a0c64a17ab` ("mlx4: Add alias_guid mechanism") Link: https://lore.kernel.org/r/1951c9500109ca7e36dcd523f8a5f2d0d2a608d1.1718554641.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-06-26 10:53:29 -03:00
Leon Romanovsky	0d2e6992fc	RDMA/mlx4: Fix truncated output warning in mad.c Increase size of the name array to avoid truncated output warning. drivers/infiniband/hw/mlx4/mad.c: In function ‘mlx4_ib_alloc_demux_ctx’: drivers/infiniband/hw/mlx4/mad.c:2197:47: error: ‘%d’ directive output may be truncated writing between 1 and 11 bytes into a region of size 4 [-Werror=format-truncation=] 2197 \| snprintf(name, sizeof(name), "mlx4_ibt%d", port); \| ^~ drivers/infiniband/hw/mlx4/mad.c:2197:38: note: directive argument in the range [-2147483645, 2147483647] 2197 \| snprintf(name, sizeof(name), "mlx4_ibt%d", port); \| ^~~~~~~~~~~~ drivers/infiniband/hw/mlx4/mad.c:2197:9: note: ‘snprintf’ output between 10 and 20 bytes into a destination of size 12 2197 \| snprintf(name, sizeof(name), "mlx4_ibt%d", port); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/infiniband/hw/mlx4/mad.c:2205:48: error: ‘%d’ directive output may be truncated writing between 1 and 11 bytes into a region of size 3 [-Werror=format-truncation=] 2205 \| snprintf(name, sizeof(name), "mlx4_ibwi%d", port); \| ^~ drivers/infiniband/hw/mlx4/mad.c:2205:38: note: directive argument in the range [-2147483645, 2147483647] 2205 \| snprintf(name, sizeof(name), "mlx4_ibwi%d", port); \| ^~~~~~~~~~~~~ drivers/infiniband/hw/mlx4/mad.c:2205:9: note: ‘snprintf’ output between 11 and 21 bytes into a destination of size 12 2205 \| snprintf(name, sizeof(name), "mlx4_ibwi%d", port); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/infiniband/hw/mlx4/mad.c:2213:48: error: ‘%d’ directive output may be truncated writing between 1 and 11 bytes into a region of size 3 [-Werror=format-truncation=] 2213 \| snprintf(name, sizeof(name), "mlx4_ibud%d", port); \| ^~ drivers/infiniband/hw/mlx4/mad.c:2213:38: note: directive argument in the range [-2147483645, 2147483647] 2213 \| snprintf(name, sizeof(name), "mlx4_ibud%d", port); \| ^~~~~~~~~~~~~ drivers/infiniband/hw/mlx4/mad.c:2213:9: note: ‘snprintf’ output between 11 and 21 bytes into a destination of size 12 2213 \| snprintf(name, sizeof(name), "mlx4_ibud%d", port); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors make[6]: *** [scripts/Makefile.build:244: drivers/infiniband/hw/mlx4/mad.o] Error 1 Fixes: `fc06573dfa` ("IB/mlx4: Initialize SR-IOV IB support for slaves in master context") Link: https://lore.kernel.org/r/f3798b3ce9a410257d7e1ec7c9e285f1352e256a.1718554569.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-06-26 10:53:29 -03:00
Konstantin Taranov	82a5cc783d	RDMA/mana_ib: Ignore optional access flags for MRs Ignore optional ib_access_flags when an MR is created. Fixes: `0266a17763` ("RDMA/mana_ib: Add a driver for Microsoft Azure Network Adapter") Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://lore.kernel.org/r/1717575368-14879-1-git-send-email-kotaranov@linux.microsoft.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-06-21 10:19:36 -03:00
Patrisious Haddad	36ab7ada64	RDMA/mlx5: Add check for srq max_sge attribute max_sge attribute is passed by the user, and is inserted and used unchecked, so verify that the value doesn't exceed maximum allowed value before using it. Fixes: `e126ba97db` ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Link: https://lore.kernel.org/r/277ccc29e8d57bfd53ddeb2ac633f2760cf8cdd0.1716900410.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-06-21 10:19:36 -03:00
Yishai Hadas	81497c148b	RDMA/mlx5: Fix unwind flow as part of mlx5_ib_stage_init_init Fix unwind flow as part of mlx5_ib_stage_init_init to use the correct goto upon an error. Fixes: `758ce14aee` ("RDMA/mlx5: Implement MACsec gid addition and deletion") Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Patrisious Haddad <phaddad@nvidia.com> Link: https://lore.kernel.org/r/aa40615116eda14ec9eca21d52017d632ea89188.1716900410.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-06-21 10:19:36 -03:00
Jason Gunthorpe	2e4c02fdec	RDMA/mlx5: Ensure created mkeys always have a populated rb_key cachable and mmkey.rb_key together are used by mlx5_revoke_mr() to put the MR/mkey back into the cache. In all cases they should be set correctly. alloc_cacheable_mr() was setting cachable but not filling rb_key, resulting in cache_ent_find_and_store() bucketing them all into a 0 length entry. implicit_get_child_mr()/mlx5_ib_alloc_implicit_mr() failed to set cachable or rb_key at all, so the cache was not working at all for implicit ODP. Cc: stable@vger.kernel.org Fixes: `8c1185fef6` ("RDMA/mlx5: Change check for cacheable mkeys") Fixes: `dd1b913fb0` ("RDMA/mlx5: Cache all user cacheable mkeys on dereg MR flow") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/7778c02dfa0999a30d6746c79a23dd7140a9c729.1716900410.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-06-21 10:19:36 -03:00
Jason Gunthorpe	f637040c33	RDMA/mlx5: Follow rb_key.ats when creating new mkeys When a cache ent already exists but doesn't have any mkeys in it the cache will automatically create a new one based on the specification in the ent->rb_key. ent->ats was missed when creating the new key and so ma_translation_mode was not being set even though the ent requires it. Cc: stable@vger.kernel.org Fixes: `73d09b2fe8` ("RDMA/mlx5: Introduce mlx5r_cache_rb_key") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://lore.kernel.org/r/7c5613458ecb89fbe5606b7aa4c8d990bdea5b9a.1716900410.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-06-21 10:19:36 -03:00
Jason Gunthorpe	c1eb251259	RDMA/mlx5: Remove extra unlock on error path The below commit lifted the locking out of this function but left this error path unlock behind resulting in unbalanced locking. Remove the missed unlock too. Cc: stable@vger.kernel.org Fixes: `627122280c` ("RDMA/mlx5: Add work to remove temporary entries from the cache") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://lore.kernel.org/r/78090c210c750f47219b95248f9f782f34548bb1.1716900410.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-06-21 10:19:36 -03:00
Leon Romanovsky	a92fbeac7e	RDMA/cache: Release GID table even if leak is detected When the table is released, we nullify pointer to GID table, it means that in case GID entry leak is detected, we will leak table too. Delete code that prevents table destruction. Fixes: `b150c3862d` ("IB/core: Introduce GID entry reference counts") Link: https://lore.kernel.org/r/a62560af06ba82c88ef9194982bfa63d14768ff9.1716900410.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-06-21 10:17:44 -03:00
Leon Romanovsky	b7161db2d9	Merge branch 'mlx5-next' into wip/leon-for-next The req_transport_retries_exceeded counter shows the number of times requester detected transport retries exceed error. The req_rnr_retries_exceeded counter show the number of times the requester detected RNR NAKs retries exceed error. Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-06-16 18:54:04 +03:00
Patrisious Haddad	b339e0a39d	RDMA/mlx5: Add Qcounters req_transport_retries_exceeded/req_rnr_retries_exceeded The req_transport_retries_exceeded counter shows the number of times requester detected transport retries exceed error. The req_rnr_retries_exceeded counter show the number of times the requester detected RNR NAKs retries exceed error. Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Link: https://lore.kernel.org/r/250466af94f4989d638fab168e246035530e912f.1718301543.git.leon@kernel.org Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-06-16 18:53:23 +03:00
Chiara Meiohas	a4e540119b	RDMA/mlx5: Set mkeys for dmabuf at PAGE_SIZE Set the mkey for dmabuf at PAGE_SIZE to support any SGL after a move operation. ib_umem_find_best_pgsz returns 0 on error, so it is incorrect to check the returned page_size against PAGE_SIZE Fixes: `90da7dc820` ("RDMA/mlx5: Support dma-buf based userspace memory region") Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://lore.kernel.org/r/1e2289b9133e89f273a4e68d459057d032cbc2ce.1718301631.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-06-16 18:39:29 +03:00
Jianbo Liu	5895e70f2e	IB/mlx5: Allocate resources just before first QP/SRQ is created Previously, all IB dev resources are initialized on driver load. As they are not always used, move the initialization to the time when they are needed. To be more specific, move PD (p0) and CQ (c0) initialization to the time when the first SRQ is created. and move SRQs（s0 and s1) initialization to the time first QP is created. To avoid concurrent creations, two new mutexes are also added. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Link: https://lore.kernel.org/r/98c3e53a8cc0bdfeb6dec6e5bb8b037d78ab00d8.1717409369.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-06-16 18:37:57 +03:00
Jianbo Liu	638420115c	IB/mlx5: Create UMR QP just before first reg_mr occurs UMR QP is not used in some cases, so move QP and its CQ creations from driver load flow to the time first reg_mr occurs, that is when MR interfaces are first called. The initialization of dev->umrc.pd and dev->umrc.lock is still done in driver load because pd is needed for mlx5_mkey_cache_init and the lock is reused to protect against the concurrent creation. When testing 4G bytes memory registration latency with rtool [1] and 8 threads in parallel, there is minor performance degradation (<5% for the max latency) is seen for the first reg_mr with this change. Link: https://github.com/paravmellanox/rtool [1] Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Link: https://lore.kernel.org/r/55d3c4f8a542fd974d8a4c5816eccfb318a59b38.1717409369.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-06-16 18:37:50 +03:00
Leon Romanovsky	ae6f6dd5fd	Delay mlx5_ib internal resources allocations From: Leon Romanovsky <leonro@nvidia.com> Internal mlx5_ib resources are created during mlx5_ib module load. This behavior is not optimal because it consumes resources that are not needed when SFs are created. This patch series delays the creation of mlx5_ib internal resources to the stage when they actually used. Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-06-16 18:35:47 +03:00
Jianbo Liu	d98995b4bf	net/mlx5: Reimplement write combining test The test of write combining was added before in mlx5_ib driver. It opens UD QP and posts NOP WQEs, and uses BlueFlame doorbell. When BlueFlame is used, WQEs get written directly to a PCI BAR of the device (in addition to memory) so that the device handles them without having to access memory. In this test, the WQEs written in memory are different from the ones written to the BlueFlame which request CQE update. By checking the completion reports posted on CQ, we can know if BlueFlame succeeds or not. The write combining must be supported if BlueFlame succeeds as its register is written using write combining. This patch reimplements the test in the same way, but using a pair of SQ and CQ only. It is moved to mlx5_core as a general feature used by both mlx5_core and mlx5_ib. Besides, save write combine test result of the PCI function, so that its thousands of child functions such as SF can query without paying the time and resource penalty by itself. The test function is called only after failing to get the cached result. With this enhancement, all thousands of SFs of the PF attached to same driver no longer need to perform WC check explicitly, which is already done in the system. This saves several commands per SF, thereby speeds up SF creation and also saves completion EQ creation. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/4ff5a8cc4c5b5b0d98397baa45a5019bcdbf096e.1717409369.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-06-16 18:33:59 +03:00
Leon Romanovsky	ef5513526b	Merge branch 'mana-shared' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Leon Romanovsky says: ==================== net: mana: Allow variable size indirection table Like we talked, I created new shared branch for this patch: https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/log/?h=mana-shared * 'mana-shared' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: net: mana: Allow variable size indirection table ==================== Link: https://lore.kernel.org/all/20240612183051.GE4966@unreal Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-06-16 10:51:25 +03:00
Shradha Gupta	7fc45cb686	net: mana: Allow variable size indirection table Allow variable size indirection table allocation in MANA instead of using a constant value MANA_INDIRECT_TABLE_SIZE. The size is now derived from the MANA_QUERY_VPORT_CONFIG and the indirection table is allocated dynamically. Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com> Link: https://lore.kernel.org/r/1718015319-9609-1-git-send-email-shradhagupta@linux.microsoft.com Reviewed-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-06-12 21:08:42 +03:00
Konstantin Taranov	2a1251e3db	RDMA/mana_ib: Process QP error events in mana_ib Process QP fatal events from the error event queue. For that, find the QP, using QPN from the event, and then call its event_handler. To find the QPs, store created RC QPs in an xarray. Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://lore.kernel.org/r/1717754897-19858-1-git-send-email-kotaranov@linux.microsoft.com Reviewed-by: Wei Hu <weh@microsoft.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-06-09 13:46:44 +03:00
Honggang LI	f67ac0061c	RDMA/rxe: Fix responder length checking for UD request packets According to the IBA specification: If a UD request packet is detected with an invalid length, the request shall be an invalid request and it shall be silently dropped by the responder. The responder then waits for a new request packet. commit `689c5421bf` ("RDMA/rxe: Fix incorrect responder length checking") defers responder length check for UD QPs in function `copy_data`. But it introduces a regression issue for UD QPs. When the packet size is too large to fit in the receive buffer. `copy_data` will return error code -EINVAL. Then `send_data_in` will return RESPST_ERR_MALFORMED_WQE. UD QP will transfer into ERROR state. Fixes: `689c5421bf` ("RDMA/rxe: Fix incorrect responder length checking") Signed-off-by: Honggang LI <honggangli@163.com> Link: https://lore.kernel.org/r/20240523094617.141148-1-honggangli@163.com Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-06-09 11:51:32 +03:00
Bart Van Assche	aee2424246	RDMA/iwcm: Fix a use-after-free related to destroying CM IDs iw_conn_req_handler() associates a new struct rdma_id_private (conn_id) with an existing struct iw_cm_id (cm_id) as follows: conn_id->cm_id.iw = cm_id; cm_id->context = conn_id; cm_id->cm_handler = cma_iw_handler; rdma_destroy_id() frees both the cm_id and the struct rdma_id_private. Make sure that cm_work_handler() does not trigger a use-after-free by only freeing of the struct rdma_id_private after all pending work has finished. Cc: stable@vger.kernel.org Fixes: `59c68ac31e` ("iw_cm: free cm_id resources on the last deref") Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240605145117.397751-6-bvanassche@acm.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-06-09 11:25:10 +03:00
Bart Van Assche	a1babdb5b6	RDMA/iwcm: Simplify cm_work_handler() Instead of complicating the code to avoid a spin_lock_irqsave() / spin_lock_irqrestore() pair before returning, simplify the code by removing the local variable 'empty'. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240605145117.397751-5-bvanassche@acm.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-06-09 11:15:27 +03:00
Bart Van Assche	e1168f09b3	RDMA/iwcm: Simplify cm_event_handler() queue_work() can test efficiently whether or not work is pending. Hence, simplify cm_event_handler() by always calling queue_work() instead of only if the list with pending work is empty. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240605145117.397751-4-bvanassche@acm.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-06-09 11:15:27 +03:00
Bart Van Assche	fc772e38bc	RDMA/iwcm: Change the return type of iwcm_deref_id() Since iwcm_deref_id() returns either 0 or 1, change its return type from 'int' into 'bool'. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240605145117.397751-3-bvanassche@acm.org Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-06-09 11:15:27 +03:00
Bart Van Assche	b1bc15f8fb	RDMA/iwcm: Use list_first_entry() where appropriate Improve source code readability by using list_first_entry() where appropriate. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240605145117.397751-2-bvanassche@acm.org Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-06-09 11:15:27 +03:00
Konstantin Taranov	c8683b995d	RDMA/mana_ib: extend query device Fill in properties of the ib device. Order the assignment in the order of fields in the struct ib_device_attr. Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://lore.kernel.org/r/1717070117-1234-3-git-send-email-kotaranov@linux.microsoft.com Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-06-02 11:50:15 +03:00
Konstantin Taranov	65357e2c16	RDMA/mana_ib: set node_guid Use the mac address for the node_guid of the IB device. Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://lore.kernel.org/r/1717070117-1234-2-git-send-email-kotaranov@linux.microsoft.com Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-06-02 11:50:15 +03:00
Honggang LI	03fa18a992	RDMA/rxe: Fix data copy for IB_SEND_INLINE For RDMA Send and Write with IB_SEND_INLINE, the memory buffers specified in sge list will be placed inline in the Send Request. The data should be copied by CPU from the virtual addresses of corresponding sge list DMA addresses. Cc: stable@kernel.org Fixes: `8d7c7c0eeb` ("RDMA: Add ib_virt_dma_to_page()") Signed-off-by: Honggang LI <honggangli@163.com> Link: https://lore.kernel.org/r/20240516095052.542767-1-honggangli@163.com Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-05-30 16:20:18 +03:00
Michael Margolin	2d0e7ba468	RDMA/efa: Properly handle unexpected AQ completions Do not try to handle admin command completion if it has an unexpected command id and print a relevant error message. Reviewed-by: Firas Jahjah <firasj@amazon.com> Reviewed-by: Yehuda Yitschak <yehuday@amazon.com> Signed-off-by: Michael Margolin <mrgolin@amazon.com> Link: https://lore.kernel.org/r/20240513064630.6247-1-mrgolin@amazon.com Reviewed-by: Gal Pressman <gal.pressman@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-05-30 15:48:13 +03:00
Konstantin Taranov	e095405b45	RDMA/mana_ib: Modify QP state Implement modify QP state for RC QPs. Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://lore.kernel.org/r/1716366242-558-4-git-send-email-kotaranov@linux.microsoft.com Reviewed-by: Long Li <longli@microsoft.com> Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-05-30 15:26:57 +03:00
Konstantin Taranov	fdefb91849	RDMA/mana_ib: Implement uapi to create and destroy RC QP Implement user requests to create and destroy an RC QP. As the user does not have an FMR queue, it is skipped and NO_FMR flag is used. Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://lore.kernel.org/r/1716366242-558-3-git-send-email-kotaranov@linux.microsoft.com Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-05-30 15:26:57 +03:00
Konstantin Taranov	53657a0419	RDMA/mana_ib: Create and destroy RC QP Implement HW requests to create and destroy an RC QP. An RC QP may have 5 queues. Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://lore.kernel.org/r/1716366242-558-2-git-send-email-kotaranov@linux.microsoft.com Reviewed-by: Long Li <longli@microsoft.com> Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-05-30 15:26:57 +03:00
Christophe JAILLET	38c02d813a	RDMA/irdma: Annotate flexible array with __counted_by() in struct irdma_qvlist_info 'num_vectors' is used to count the number of elements in the 'qv_info' flexible array in "struct irdma_qvlist_info". So annotate it with __counted_by() to make it explicit and enable some additional checks. This allocation is done in irdma_save_msix_info(). Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/2ca8b14adf79c4795d7aa95bbfc79253a6bfed82.1716102112.git.christophe.jaillet@wanadoo.fr Acked-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-05-30 15:14:19 +03:00
Michael Margolin	435cdbe9f7	RDMA/efa: Fail probe on missing BARs In case any of PCI BARs is missing during device probe we would like to fail as early as possible. Fail if any of the required BARs isn't listed as a memory BAR. Reviewed-by: Daniel Kranzdorf <dkkranzd@amazon.com> Reviewed-by: Firas Jahjah <firasj@amazon.com> Signed-off-by: Michael Margolin <mrgolin@amazon.com> Link: https://lore.kernel.org/r/20240513081019.26998-1-mrgolin@amazon.com Reviewed-by: Gal Pressman <gal.pressman@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-05-30 15:07:55 +03:00
Selvin Xavier	056620da89	RDMA/bnxt_re: Fix the max msix vectors macro bnxt_re no longer decide the number of MSI-x vectors used by itself. Its decided by bnxt_en now. So when bnxt_en changes this value, system crash is seen. Depend on the max value reported by bnxt_en instead of using the its own macros. Fixes: `3034322113` ("bnxt_en: Remove runtime interrupt vector allocation") Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1716195418-11767-1-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-05-30 15:00:07 +03:00
Selvin Xavier	6f6bfbc595	RDMA/bnxt_re: Expose the MSN table capability for user library BNXT_RE_COMP_MASK_UCNTX_HW_RETX_ENABLED was introduced to share the HW retransmit capability between driver and lib. The main difference in implementation for HW Retransmit support is the usage of MSN table or PSN table . When HW retrans is enabled, HW expects MSN table to be allocated by driver/lib, else PSN table (for older adapters). FW expose a new field which gives MSN capability. Drivers and libs can depend on the new field instead of HW Retrasns capability. For adapters which support HW_RETX feature, MSN table capability will be set. For older adapters, this value will be 0(to maintain backward compatibility with older FW). Rename UAPI just to capture the correct name of the HW capability that driver/library is interested in. No functional impact even if older rdma-core is used. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1716876697-25970-3-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-05-30 13:28:02 +03:00
Selvin Xavier	8d310ba845	RDMA/bnxt_re: Allow MSN table capability check FW reports the HW capability to use PSN table or MSN table and driver/library need to select it based on this capability. Use the new capability instead of the older capability check for HW retransmission while handling the MSN/PSN table. FW report zero (PSN table) for older adapters to maintain backward compatibility. Also, Updated the FW interface structures to handle the new fields. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1716876697-25970-2-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-05-30 13:28:02 +03:00
Steven Rostedt (Google)	2c92ca849f	tracing/treewide: Remove second parameter of __assign_str() With the rework of how the __string() handles dynamic strings where it saves off the source string in field in the helper structure[1], the assignment of that value to the trace event field is stored in the helper value and does not need to be passed in again. This means that with: __string(field, mystring) Which use to be assigned with __assign_str(field, mystring), no longer needs the second parameter and it is unused. With this, __assign_str() will now only get a single parameter. There's over 700 users of __assign_str() and because coccinelle does not handle the TRACE_EVENT() macro I ended up using the following sed script: git grep -l __assign_str \| while read a ; do sed -e 's/$__assign_str([^,][^ ,]$ ,[^;]*/\1)/' $a > /tmp/test-file; mv /tmp/test-file $a; done I then searched for __assign_str() that did not end with ';' as those were multi line assignments that the sed script above would fail to catch. Note, the same updates will need to be done for: __assign_str_len() __assign_rel_str() __assign_rel_str_len() I tested this with both an allmodconfig and an allyesconfig (build only for both). [1] https://lore.kernel.org/linux-trace-kernel/20240222211442.634192653@goodmis.org/ Link: https://lore.kernel.org/linux-trace-kernel/20240516133454.681ba6a0@rorschach.local.home Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Jani Nikula <jani.nikula@intel.com> Acked-by: Christian König <christian.koenig@amd.com> for the amdgpu parts. Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> #for Acked-by: Rafael J. Wysocki <rafael@kernel.org> # for thermal Acked-by: Takashi Iwai <tiwai@suse.de> Acked-by: Darrick J. Wong <djwong@kernel.org> # xfs Tested-by: Guenter Roeck <linux@roeck-us.net>	2024-05-22 20:14:47 -04:00
Linus Torvalds	d90be6e4aa	Driver core changes for 6.10-rc1 Here is the small set of driver core and kernfs changes for 6.10-rc1. Nothing major here at all, just a small set of changes for some driver core apis, and minor fixups. Included in here are: - sysfs_bin_attr_simple_read() helper added and used - device_show_string() helper added and used All usages of these were acked by the various maintainers. Also in here are: - kernfs minor cleanup - removed unused functions - typo fix in documentation - pay attention to sysfs_create_link() failures in module.c finally. All of these have been in linux-next for a very long time with no reported problems. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZk3+hQ8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+ylfTwCfUyHWkDZuZ7ehdtjzfmcd4EKZBK8An3AAV99G ox8PXMxuFTaUEdT/69FQ =2sEo -----END PGP SIGNATURE----- Merge tag 'driver-core-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the small set of driver core and kernfs changes for 6.10-rc1. Nothing major here at all, just a small set of changes for some driver core apis, and minor fixups. Included in here are: - sysfs_bin_attr_simple_read() helper added and used - device_show_string() helper added and used All usages of these were acked by the various maintainers. Also in here are: - kernfs minor cleanup - removed unused functions - typo fix in documentation - pay attention to sysfs_create_link() failures in module.c finally All of these have been in linux-next for a very long time with no reported problems" * tag 'driver-core-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: device property: Fix a typo in the description of device_get_child_node_count() kernfs: mount: Remove unnecessary ‘NULL’ values from knparent scsi: Use device_show_string() helper for sysfs attributes platform/x86: Use device_show_string() helper for sysfs attributes perf: Use device_show_string() helper for sysfs attributes IB/qib: Use device_show_string() helper for sysfs attributes hwmon: Use device_show_string() helper for sysfs attributes driver core: Add device_show_string() helper for sysfs attributes treewide: Use sysfs_bin_attr_simple_read() helper sysfs: Add sysfs_bin_attr_simple_read() helper module: don't ignore sysfs_create_link() failures driver core: Remove unused platform_notify, platform_notify_remove	2024-05-22 12:13:40 -07:00
Linus Torvalds	f0bae243b2	pci-v6.10-changes -----BEGIN PGP SIGNATURE----- iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmZLzNIUHGJoZWxnYWFz QGdvb2dsZS5jb20ACgkQWYigwDrT+vwr/Q//STe2XGKI8bAKqP2wbbkzm+ISnK4A Lqf3FEAIXunxDRspszfXKKV2p4vaIkmOFiwIdtp/kWvd0DQn5+ATXJ/iQtp8aFX/ R+6BQ7EZc2G7fN5fbQuK54+CvmWEpkKEMbXYbd6ivQ14Cijdb3Nbu+w+DYFjS+6C k2a9lS1bTW7Xcy0fyiO1w6GQiWqtmOH8U3OlQtIrI0EVkDG9OG1LsLuc92/FgkOo REN+sU+hX1K5fHrvm2CtjYDn/9/B6bJ/It22H1dPgUL9nKvKC67fYzosMtUCOX1M 6XSPjZIuXOmQGeZXHhpSlVwaidxoUjYO98I7nMquxKdCy6yct3geK7ULG/xeQCgD ML7MGQB4+sTiSWalXUQaziKqF1FIDEvU3HMGXFWnoBL5l56eRp8KS1EI9Eqk9pU3 pk9fJaCkcFnkzPtMFzqPOm5q9zUZ6bGbfYb0hs72TUKplmVDhFo2T1YsW2AOyHZ7 mjuDzUYZX0H7uM1tntA56IgZX+oNOrLvhBt5L5M/BQeCsZFBBUfIcAEaYoL9LwXO AYgIG3jdqzHHyAUzutJF+XHKinJLMHm0XVYbFmO6saPhFzrUJSNHqT7NzW1DGGTl OnO8e1WNMX1EcnKvnc6fXyGmM3SgVwy45FsbG/zRnhn4uBKqKtjrh6uX/myA22LK CSeqSUK9XmXxFNA= =xjoS -----END PGP SIGNATURE----- Merge tag 'pci-v6.10-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull pci updates from Bjorn Helgaas: "Enumeration: - Skip E820 checks for MCFG ECAM regions for new (2016+) machines, since there's no requirement to describe them in E820 and some platforms require ECAM to work (Bjorn Helgaas) - Rename PCI_IRQ_LEGACY to PCI_IRQ_INTX to be more specific (Damien Le Moal) - Remove last user and pci_enable_device_io() (Heiner Kallweit) - Wait for Link Training==0 to avoid possible race (Ilpo Järvinen) - Skip waiting for devices that have been disconnected while suspended (Ilpo Järvinen) - Clear Secondary Status errors after enumeration since Master Aborts and Unsupported Request errors are an expected part of enumeration (Vidya Sagar) MSI: - Remove unused IMS (Interrupt Message Store) support (Bjorn Helgaas) Error handling: - Mask Genesys GL975x SD host controller Replay Timer Timeout correctable errors caused by a hardware defect; the errors cause interrupts that prevent system suspend (Kai-Heng Feng) - Fix EDR-related _DSM support, which previously evaluated revision 5 but assumed revision 6 behavior (Kuppuswamy Sathyanarayanan) ASPM: - Simplify link state definitions and mask calculation (Ilpo Järvinen) Power management: - Avoid D3cold for HP Pavilion 17 PC/1972 PCIe Ports, where BIOS apparently doesn't know how to put them back in D0 (Mario Limonciello) CXL: - Support resetting CXL devices; special handling required because CXL Ports mask Secondary Bus Reset by default (Dave Jiang) DOE: - Support DOE Discovery Version 2 (Alexey Kardashevskiy) Endpoint framework: - Set endpoint BAR to be 64-bit if the driver says that's all the device supports, in addition to doing so if the size is >2GB (Niklas Cassel) - Simplify endpoint BAR allocation and setting interfaces (Niklas Cassel) Cadence PCIe controller driver: - Drop DT binding redundant msi-parent and pci-bus.yaml (Krzysztof Kozlowski) Cadence PCIe endpoint driver: - Configure endpoint BARs to be 64-bit based on the BAR type, not the BAR value (Niklas Cassel) Freescale Layerscape PCIe controller driver: - Convert DT binding to YAML (Frank Li) MediaTek MT7621 PCIe controller driver: - Add DT binding missing 'reg' property for child Root Ports (Krzysztof Kozlowski) - Fix theoretical string truncation in PHY name (Sergio Paracuellos) NVIDIA Tegra194 PCIe controller driver: - Return success for endpoint probe instead of falling through to the failure path (Vidya Sagar) Renesas R-Car PCIe controller driver: - Add DT binding missing IOMMU properties (Geert Uytterhoeven) - Add DT binding R-Car V4H compatible for host and endpoint mode (Yoshihiro Shimoda) Rockchip PCIe controller driver: - Configure endpoint BARs to be 64-bit based on the BAR type, not the BAR value (Niklas Cassel) - Add DT binding missing maxItems to ep-gpios (Krzysztof Kozlowski) - Set the Subsystem Vendor ID, which was previously zero because it was masked incorrectly (Rick Wertenbroek) Synopsys DesignWare PCIe controller driver: - Restructure DBI register access to accommodate devices where this requires Refclk to be active (Manivannan Sadhasivam) - Remove the deinit() callback, which was only need by the pcie-rcar-gen4, and do it directly in that driver (Manivannan Sadhasivam) - Add dw_pcie_ep_cleanup() so drivers that support PERST# can clean up things like eDMA (Manivannan Sadhasivam) - Rename dw_pcie_ep_exit() to dw_pcie_ep_deinit() to make it parallel to dw_pcie_ep_init() (Manivannan Sadhasivam) - Rename dw_pcie_ep_init_complete() to dw_pcie_ep_init_registers() to reflect the actual functionality (Manivannan Sadhasivam) - Call dw_pcie_ep_init_registers() directly from all the glue drivers, not just those that require active Refclk from the host (Manivannan Sadhasivam) - Remove the "core_init_notifier" flag, which was an obscure way for glue drivers to indicate that they depend on Refclk from the host (Manivannan Sadhasivam) TI J721E PCIe driver: - Add DT binding J784S4 SoC Device ID (Siddharth Vadapalli) - Add DT binding J722S SoC support (Siddharth Vadapalli) TI Keystone PCIe controller driver: - Add DT binding missing num-viewport, phys and phy-name properties (Jan Kiszka) Miscellaneous: - Constify and annotate with __ro_after_init (Heiner Kallweit) - Convert DT bindings to YAML (Krzysztof Kozlowski) - Check for kcalloc() failure in of_pci_prop_intr_map() (Duoming Zhou)" * tag 'pci-v6.10-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (97 commits) PCI: Do not wait for disconnected devices when resuming x86/pci: Skip early E820 check for ECAM region PCI: Remove unused pci_enable_device_io() ata: pata_cs5520: Remove unnecessary call to pci_enable_device_io() PCI: Update pci_find_capability() stub return types PCI: Remove PCI_IRQ_LEGACY scsi: vmw_pvscsi: Do not use PCI_IRQ_LEGACY instead of PCI_IRQ_LEGACY scsi: pmcraid: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY scsi: mpt3sas: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY scsi: megaraid_sas: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY scsi: ipr: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY scsi: hpsa: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY scsi: arcmsr: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY wifi: rtw89: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY dt-bindings: PCI: rockchip,rk3399-pcie: Add missing maxItems to ep-gpios Revert "genirq/msi: Provide constants for PCI/IMS support" Revert "x86/apic/msi: Enable PCI/IMS" Revert "iommu/vt-d: Enable PCI/IMS" Revert "iommu/amd: Enable PCI/IMS" Revert "PCI/MSI: Provide IMS (Interrupt Message Store) support" ...	2024-05-21 10:09:28 -07:00
Linus Torvalds	25f4874662	RDMA v6.10 merge window Normal set of driver updates and small fixes: - Small improvements and fixes for erdma, efa, hfi1, bnxt_re - Fix a UAF crash after module unload on leaking restrack entry - Continue adding full RDMA support in mana with support for EQs, GID's and CQs - Improvements to the mkey cache in mlx5 - DSCP traffic class support in hns and several bug fixes - Cap the maximum number of MADs in the receive queue to avoid OOM - Another batch of rxe bug fixes from large scale testing - __iowrite64_copy() optimizations for write combining MMIO memory - Remove NULL checks before dev_put/hold() - EFA support for receive with immediate - Fix a recent memleaking regression in a cma error path -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZkeo2gAKCRCFwuHvBreF YbuNAQChzGmS4F0JAn5Wj0CDvkZghELqtvzEb92SzqcgdyQafAD/fC7f23LJ4OsO 1ZIaQEZu7j9DVg5PKFZ7WfdXjGTKqwA= =QRXg -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull rdma updates from Jason Gunthorpe: "Aside from the usual things this has an arch update for __iowrite64_copy() used by the RDMA drivers. This API was intended to generate large 64 byte MemWr TLPs on PCI. These days most processors had done this by just repeating writel() in a loop. S390 and some new ARM64 designs require a special helper to get this to generate. - Small improvements and fixes for erdma, efa, hfi1, bnxt_re - Fix a UAF crash after module unload on leaking restrack entry - Continue adding full RDMA support in mana with support for EQs, GID's and CQs - Improvements to the mkey cache in mlx5 - DSCP traffic class support in hns and several bug fixes - Cap the maximum number of MADs in the receive queue to avoid OOM - Another batch of rxe bug fixes from large scale testing - __iowrite64_copy() optimizations for write combining MMIO memory - Remove NULL checks before dev_put/hold() - EFA support for receive with immediate - Fix a recent memleaking regression in a cma error path" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (70 commits) RDMA/cma: Fix kmemleak in rdma_core observed during blktests nvme/rdma use siw RDMA/IPoIB: Fix format truncation compilation errors bnxt_re: avoid shift undefined behavior in bnxt_qplib_alloc_init_hwq RDMA/efa: Support QP with unsolicited write w/ imm. receive IB/hfi1: Remove generic .ndo_get_stats64 IB/hfi1: Do not use custom stat allocator RDMA/hfi1: Use RMW accessors for changing LNKCTL2 RDMA/mana_ib: implement uapi for creation of rnic cq RDMA/mana_ib: boundary check before installing cq callbacks RDMA/mana_ib: introduce a helper to remove cq callbacks RDMA/mana_ib: create and destroy RNIC cqs RDMA/mana_ib: create EQs for RNIC CQs RDMA/core: Remove NULL check before dev_{put, hold} RDMA/ipoib: Remove NULL check before dev_{put, hold} RDMA/mlx5: Remove NULL check before dev_{put, hold} RDMA/mlx5: Track DCT, DCI and REG_UMR QPs as diver_detail resources. RDMA/core: Add an option to display driver-specific QPs in the rdmatool RDMA/efa: Add shutdown notifier RDMA/mana_ib: Fix missing ret value IB/mlx5: Use __iowrite64_copy() for write combining stores ...	2024-05-18 13:04:15 -07:00
Zhu Yanjun	9c0731832d	RDMA/cma: Fix kmemleak in rdma_core observed during blktests nvme/rdma use siw When running blktests nvme/rdma, the following kmemleak issue will appear. kmemleak: Kernel memory leak detector initialized (mempool available:36041) kmemleak: Automatic memory scanning thread started kmemleak: 2 new suspected memory leaks (see /sys/kernel/debug/kmemleak) kmemleak: 8 new suspected memory leaks (see /sys/kernel/debug/kmemleak) kmemleak: 17 new suspected memory leaks (see /sys/kernel/debug/kmemleak) kmemleak: 4 new suspected memory leaks (see /sys/kernel/debug/kmemleak) unreferenced object 0xffff88855da53400 (size 192): comm "rdma", pid 10630, jiffies 4296575922 hex dump (first 32 bytes): 37 00 00 00 00 00 00 00 c0 ff ff ff 1f 00 00 00 7............... 10 34 a5 5d 85 88 ff ff 10 34 a5 5d 85 88 ff ff .4.].....4.].... backtrace (crc 47f66721): [<ffffffff911251bd>] kmalloc_trace+0x30d/0x3b0 [<ffffffffc2640ff7>] alloc_gid_entry+0x47/0x380 [ib_core] [<ffffffffc2642206>] add_modify_gid+0x166/0x930 [ib_core] [<ffffffffc2643468>] ib_cache_update.part.0+0x6d8/0x910 [ib_core] [<ffffffffc2644e1a>] ib_cache_setup_one+0x24a/0x350 [ib_core] [<ffffffffc263949e>] ib_register_device+0x9e/0x3a0 [ib_core] [<ffffffffc2a3d389>] 0xffffffffc2a3d389 [<ffffffffc2688cd8>] nldev_newlink+0x2b8/0x520 [ib_core] [<ffffffffc2645fe3>] rdma_nl_rcv_msg+0x2c3/0x520 [ib_core] [<ffffffffc264648c>] rdma_nl_rcv_skb.constprop.0.isra.0+0x23c/0x3a0 [ib_core] [<ffffffff9270e7b5>] netlink_unicast+0x445/0x710 [<ffffffff9270f1f1>] netlink_sendmsg+0x761/0xc40 [<ffffffff9249db29>] __sys_sendto+0x3a9/0x420 [<ffffffff9249dc8c>] __x64_sys_sendto+0xdc/0x1b0 [<ffffffff92db0ad3>] do_syscall_64+0x93/0x180 [<ffffffff92e00126>] entry_SYSCALL_64_after_hwframe+0x71/0x79 The root cause: rdma_put_gid_attr is not called when sgid_attr is set to ERR_PTR(-ENODEV). Reported-and-tested-by: Yi Zhang <yi.zhang@redhat.com> Closes: https://lore.kernel.org/all/19bf5745-1b3b-4b8a-81c2-20d945943aaf@linux.dev/T/ Fixes: `f8ef1be816` ("RDMA/cma: Avoid GID lookups on iWARP devices") Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Link: https://lore.kernel.org/r/20240510211247.31345-1-yanjun.zhu@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-05-12 13:04:11 +03:00
Leon Romanovsky	49ca2b2ef3	RDMA/IPoIB: Fix format truncation compilation errors Truncate the device name to store IPoIB VLAN name. [leonro@5b4e8fba4ddd kernel]$ make -s -j 20 allmodconfig [leonro@5b4e8fba4ddd kernel]$ make -s -j 20 W=1 drivers/infiniband/ulp/ipoib/ drivers/infiniband/ulp/ipoib/ipoib_vlan.c: In function ‘ipoib_vlan_add’: drivers/infiniband/ulp/ipoib/ipoib_vlan.c:187:52: error: ‘%04x’ directive output may be truncated writing 4 bytes into a region of size between 0 and 15 [-Werror=format-truncation=] 187 \| snprintf(intf_name, sizeof(intf_name), "%s.%04x", \| ^~~~ drivers/infiniband/ulp/ipoib/ipoib_vlan.c:187:48: note: directive argument in the range [0, 65535] 187 \| snprintf(intf_name, sizeof(intf_name), "%s.%04x", \| ^~~~~~~~~ drivers/infiniband/ulp/ipoib/ipoib_vlan.c:187:9: note: ‘snprintf’ output between 6 and 21 bytes into a destination of size 16 187 \| snprintf(intf_name, sizeof(intf_name), "%s.%04x", \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 188 \| ppriv->dev->name, pkey); \| ~~~~~~~~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors make[6]: * [scripts/Makefile.build:244: drivers/infiniband/ulp/ipoib/ipoib_vlan.o] Error 1 make[6]: * Waiting for unfinished jobs.... Fixes: `9baa0b0364` ("IB/ipoib: Add rtnl_link_ops support") Link: https://lore.kernel.org/r/e9d3e1fef69df4c9beaf402cc3ac342bad680791.1715240029.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2024-05-12 12:36:46 +03:00
Jakub Kicinski	e7073830cc	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c `35d92abfba` ("net: hns3: fix kernel crash when devlink reload during initialization") `2a1a1a7b5f` ("net: hns3: add command queue trace for hns3") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-05-09 10:01:01 -07:00
Linus Torvalds	1bbc991585	qibfs leak fix -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZjwicQAKCRBZ7Krx/gZQ 67WgAP4iDssAVApazVvOUecQBtOd6xBvQUf1k3acMKa03hcdwAEAmM8qN0MpYUkz MyVA6oRvC8selaWh62eXamWBgjc1Rwo= =jLRy -----END PGP SIGNATURE----- Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull dentry leak fix from Al Viro: "Dentry leak fix in the qibfs driver that I forgot to send a pull request for ;-/ My apologies - it actually sat in vfs.git#fixes for more than two months..." * tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: qibfs: fix dentry leak	2024-05-09 08:39:10 -07:00
Michal Schmidt	78cfd17142	bnxt_re: avoid shift undefined behavior in bnxt_qplib_alloc_init_hwq Undefined behavior is triggered when bnxt_qplib_alloc_init_hwq is called with hwq_attr->aux_depth != 0 and hwq_attr->aux_stride == 0. In that case, "roundup_pow_of_two(hwq_attr->aux_stride)" gets called. roundup_pow_of_two is documented as undefined for 0. Fix it in the one caller that had this combination. The undefined behavior was detected by UBSAN: UBSAN: shift-out-of-bounds in ./include/linux/log2.h:57:13 shift exponent 64 is too large for 64-bit type 'long unsigned int' CPU: 24 PID: 1075 Comm: (udev-worker) Not tainted 6.9.0-rc6+ #4 Hardware name: Abacus electric, s.r.o. - servis@abacus.cz Super Server/H12SSW-iN, BIOS 2.7 10/25/2023 Call Trace: <TASK> dump_stack_lvl+0x5d/0x80 ubsan_epilogue+0x5/0x30 __ubsan_handle_shift_out_of_bounds.cold+0x61/0xec __roundup_pow_of_two+0x25/0x35 [bnxt_re] bnxt_qplib_alloc_init_hwq+0xa1/0x470 [bnxt_re] bnxt_qplib_create_qp+0x19e/0x840 [bnxt_re] bnxt_re_create_qp+0x9b1/0xcd0 [bnxt_re] ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 ? __kmalloc+0x1b6/0x4f0 ? create_qp.part.0+0x128/0x1c0 [ib_core] ? __pfx_bnxt_re_create_qp+0x10/0x10 [bnxt_re] create_qp.part.0+0x128/0x1c0 [ib_core] ib_create_qp_kernel+0x50/0xd0 [ib_core] create_mad_qp+0x8e/0xe0 [ib_core] ? __pfx_qp_event_handler+0x10/0x10 [ib_core] ib_mad_init_device+0x2be/0x680 [ib_core] add_client_context+0x10d/0x1a0 [ib_core] enable_device_and_get+0xe0/0x1d0 [ib_core] ib_register_device+0x53c/0x630 [ib_core] ? srso_alias_return_thunk+0x5/0xfbef5 bnxt_re_probe+0xbd8/0xe50 [bnxt_re] ? __pfx_bnxt_re_probe+0x10/0x10 [bnxt_re] auxiliary_bus_probe+0x49/0x80 ? driver_sysfs_add+0x57/0xc0 really_probe+0xde/0x340 ? pm_runtime_barrier+0x54/0x90 ? __pfx___driver_attach+0x10/0x10 __driver_probe_device+0x78/0x110 driver_probe_device+0x1f/0xa0 __driver_attach+0xba/0x1c0 bus_for_each_dev+0x8f/0xe0 bus_add_driver+0x146/0x220 driver_register+0x72/0xd0 __auxiliary_driver_register+0x6e/0xd0 ? __pfx_bnxt_re_mod_init+0x10/0x10 [bnxt_re] bnxt_re_mod_init+0x3e/0xff0 [bnxt_re] ? __pfx_bnxt_re_mod_init+0x10/0x10 [bnxt_re] do_one_initcall+0x5b/0x310 do_init_module+0x90/0x250 init_module_from_file+0x86/0xc0 idempotent_init_module+0x121/0x2b0 __x64_sys_finit_module+0x5e/0xb0 do_syscall_64+0x82/0x160 ? srso_alias_return_thunk+0x5/0xfbef5 ? syscall_exit_to_user_mode_prepare+0x149/0x170 ? srso_alias_return_thunk+0x5/0xfbef5 ? syscall_exit_to_user_mode+0x75/0x230 ? srso_alias_return_thunk+0x5/0xfbef5 ? do_syscall_64+0x8e/0x160 ? srso_alias_return_thunk+0x5/0xfbef5 ? __count_memcg_events+0x69/0x100 ? srso_alias_return_thunk+0x5/0xfbef5 ? count_memcg_events.constprop.0+0x1a/0x30 ? srso_alias_return_thunk+0x5/0xfbef5 ? handle_mm_fault+0x1f0/0x300 ? srso_alias_return_thunk+0x5/0xfbef5 ? do_user_addr_fault+0x34e/0x640 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f4e5132821d Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e3 db 0c 00 f7 d8 64 89 01 48 RSP: 002b:00007ffca9c906a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 RAX: ffffffffffffffda RBX: 0000563ec8a8f130 RCX: 00007f4e5132821d RDX: 0000000000000000 RSI: 00007f4e518fa07d RDI: 000000000000003b RBP: 00007ffca9c90760 R08: 00007f4e513f6b20 R09: 00007ffca9c906f0 R10: 0000563ec8a8faa0 R11: 0000000000000246 R12: 00007f4e518fa07d R13: 0000000000020000 R14: 0000563ec8409e90 R15: 0000563ec8a8fa60 </TASK> ---[ end trace ]--- Fixes: `0c4dcd6028` ("RDMA/bnxt_re: Refactor hardware queue memory allocation") Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Link: https://lore.kernel.org/r/20240507103929.30003-1-mschmidt@redhat.com Acked-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-05-09 11:59:31 +03:00
Michael Margolin	2b8af5001a	RDMA/efa: Support QP with unsolicited write w/ imm. receive Add a new EFA flags attribute for QP creation, and support unsolicited write with immediate flag. QPs created with this flag set will not consume receive work requests for incoming RDMA write with immediate. Expose device capability bit for this feature support. Reviewed-by: Daniel Kranzdorf <dkkranzd@amazon.com> Reviewed-by: Firas Jahjah <firasj@amazon.com> Signed-off-by: Michael Margolin <mrgolin@amazon.com> Link: https://lore.kernel.org/r/20240506151829.6475-1-mrgolin@amazon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-05-08 11:18:26 +03:00
Eric Dumazet	1eb2cded45	net: annotate writes on dev->mtu from ndo_change_mtu() Simon reported that ndo_change_mtu() methods were never updated to use WRITE_ONCE(dev->mtu, new_mtu) as hinted in commit `501a90c945` ("inet: protect against too small mtu values.") We read dev->mtu without holding RTNL in many places, with READ_ONCE() annotations. It is time to take care of ndo_change_mtu() methods to use corresponding WRITE_ONCE() Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Simon Horman <horms@kernel.org> Closes: https://lore.kernel.org/netdev/20240505144608.GB67882@kernel.org/ Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Shannon Nelson <shannon.nelson@amd.com> Link: https://lore.kernel.org/r/20240506102812.3025432-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-05-07 16:19:14 -07:00
Breno Leitao	f483f6a29d	IB/hfi1: Remove generic .ndo_get_stats64 Commit `3e2f544dd8` ("net: get stats64 if device if driver is configured") moved the callback to dev_get_tstats64() to net core, so, unless the driver is doing some custom stats collection, it does not need to set .ndo_get_stats64. Since this driver is now relying in NETDEV_PCPU_STAT_TSTATS, then, it doesn't need to set the dev_get_tstats64() generic .ndo_get_stats64 function pointer. Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/20240503111333.552360-2-leitao@debian.org Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-05-05 17:10:38 +03:00
Breno Leitao	5194947e6a	IB/hfi1: Do not use custom stat allocator With commit `34d21de99c` ("net: Move {l,t,d}stats allocation to core and convert veth & vrf"), stats allocation could be done on net core instead of in this driver. With this new approach, the driver doesn't have to bother with error handling (allocation failure checking, making sure free happens in the right spot, etc). This is core responsibility now. Remove the allocation in the hfi1 driver and leverage the network core allocation instead. Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/20240503111333.552360-1-leitao@debian.org Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-05-05 17:10:38 +03:00
Ilpo Järvinen	8f3b7103b4	RDMA/hfi1: Use RMW accessors for changing LNKCTL2 Convert open coded RMW accesses for LNKCTL2 to use pcie_capability_clear_and_set_word() which makes its easier to understand what the code tries to do. In addition, this futureproofs the code. LNKCTL2 is not really owned by any driver because it is a collection of control bits that PCI core might need to touch. RMW accessors already have support for proper locking for a selected set of registers to avoid losing concurrent updates (LNKCTL2 is not yet among the registers that need protection but likely will be in the future). Suggested-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/r/20240503133640.15899-1-ilpo.jarvinen@linux.intel.com Reviewed-by: Dean Luick <dean.luick@cornelisnetworks.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-05-05 16:11:40 +03:00
Konstantin Taranov	44b607ad4c	RDMA/mana_ib: implement uapi for creation of rnic cq Enable users to create RNIC CQs using a corresponding flag. With the previous request size, an ethernet CQ is created. As a response, return ID of the created CQ. Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://lore.kernel.org/r/1714137160-5222-6-git-send-email-kotaranov@linux.microsoft.com Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-05-05 15:27:23 +03:00
Konstantin Taranov	f79edef79b	RDMA/mana_ib: boundary check before installing cq callbacks Add a boundary check inside mana_ib_install_cq_cb to prevent index overflow. Fixes: `2a31c5a7e0` ("RDMA/mana_ib: Introduce mana_ib_install_cq_cb helper function") Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://lore.kernel.org/r/1714137160-5222-5-git-send-email-kotaranov@linux.microsoft.com Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-05-05 15:27:23 +03:00
Konstantin Taranov	3e41105263	RDMA/mana_ib: introduce a helper to remove cq callbacks Intoduce the mana_ib_remove_cq_cb helper to remove cq callbacks. The helper removes code duplicates. Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://lore.kernel.org/r/1714137160-5222-4-git-send-email-kotaranov@linux.microsoft.com Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-05-05 15:27:23 +03:00
Konstantin Taranov	5843415916	RDMA/mana_ib: create and destroy RNIC cqs Implement RNIC requests for creation and destruction of RNIC CQs. Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://lore.kernel.org/r/1714137160-5222-3-git-send-email-kotaranov@linux.microsoft.com Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-05-05 15:27:23 +03:00
Konstantin Taranov	e73c882f0a	RDMA/mana_ib: create EQs for RNIC CQs Create EQs within mana_ib device. Such EQs are required for creation of RNIC CQs. Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://lore.kernel.org/r/1714137160-5222-2-git-send-email-kotaranov@linux.microsoft.com Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-05-05 15:27:23 +03:00
Jules Irenge	48d80b4844	RDMA/core: Remove NULL check before dev_{put, hold} Coccinelle reports a warning WARNING: NULL check before dev_{put, hold} functions is not needed The reason is the call netdev_{put, hold} of dev_{put,hold} will check NULL There is no need to check before using dev_{put, hold} Signed-off-by: Jules Irenge <jbi.octave@gmail.com> Link: https://lore.kernel.org/r/ZjF1Eedxwhn4JSkz@octinomon.home Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-05-05 15:12:35 +03:00
Lukas Wunner	7b53e926bf	IB/qib: Use device_show_string() helper for sysfs attributes Deduplicate sysfs ->show() callbacks which expose a string at a static memory location. Use the newly introduced device_show_string() helper in the driver core instead by declaring those sysfs attributes with DEVICE_STRING_ATTR_RO(). No functional change intended. Signed-off-by: Lukas Wunner <lukas@wunner.de> Link: https://lore.kernel.org/r/185a88dd3e6e18bf508a875d87c95ea2a5c3fa13.1713608122.git.lukas@wunner.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-05-04 17:37:03 +02:00
Breno Leitao	1c8f43f477	IB/hfi1: allocate dummy net_device dynamically Embedding net_device into structures prohibits the usage of flexible arrays in the net_device structure. For more details, see the discussion at [1]. Un-embed the net_device from struct hfi1_netdev_rx by converting it into a pointer. Then use the leverage alloc_netdev() to allocate the net_device object at hfi1_alloc_rx(). [1] https://lore.kernel.org/all/20240229225910.79e224cf@kernel.org/ Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Breno Leitao <leitao@debian.org> Acked-by: Leon Romanovsky <leon@kernel.org> Link: https://lore.kernel.org/r/20240430162213.746492-1-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-05-02 18:20:17 -07:00
Jules Irenge	e4e40a8702	RDMA/ipoib: Remove NULL check before dev_{put, hold} Coccinelle reports a warning WARNING: NULL check before dev_{put, hold} functions is not needed The reason is the call netdev_{put, hold} of dev_{put,hold} will check NULL There is no need to check before using dev_{put, hold} Signed-off-by: Jules Irenge <jbi.octave@gmail.com> Link: https://lore.kernel.org/r/ZjGDFatHRMI6Eg7M@octinomon.home Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-05-02 17:46:38 +03:00
Jules Irenge	82e966130d	RDMA/mlx5: Remove NULL check before dev_{put, hold} Coccinelle reports a warning WARNING: NULL check before dev_{put, hold} functions is not needed The reason is the call netdev_{put, hold} of dev_{put,hold} will check NULL There is no need to check before using dev_{put, hold} Signed-off-by: Jules Irenge <jbi.octave@gmail.com> Link: https://lore.kernel.org/r/ZjGC4qXrOwZE0aHi@octinomon.home Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-05-02 17:45:59 +03:00
Eric Dumazet	05d6d49209	inet: introduce dst_rtable() helper I added dst_rt6_info() in commit `e8dfd42c17` ("ipv6: introduce dst_rt6_info() helper") This patch does a similar change for IPv4. Instead of (struct rtable *)dst casts, we can use : #define dst_rtable(_ptr) \ container_of_const(_ptr, struct rtable, dst) Patch is smaller than IPv6 one, because IPv4 has skb_rtable() helper. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/r/20240429133009.1227754-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-04-30 18:32:38 -07:00
Chiara Meiohas	fd3af5e218	RDMA/mlx5: Track DCT, DCI and REG_UMR QPs as diver_detail resources. Allow user to see driver-specific QPs (the "driver_detail" QPs) through the rdmatool, when requested. When creating DCT, DCI and REG_UMR QPs, we designate them as driver_detail resources. When filling the QP info for the rdma tool, for the driver_detail QPs: -the QP type is IB_QPT_DRIVER -the subtype is a string with the QP name ("DCT", "DCI", "REG_UMR") Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com> Link: https://lore.kernel.org/r/452432d7d0917f053a80a893a614169857fe3b10.1713268997.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-04-30 11:19:37 +03:00
Chiara Meiohas	e18fa0bbce	RDMA/core: Add an option to display driver-specific QPs in the rdmatool Utilize the -dd flag (driver-specific details) in the rdmatool to view driver-specific QPs which are not exposed yet. Add the netlink attribute to mark request to convey driver details and use it to return QP subtype as a string. $ rdma resource show qp link ibp8s0f1 link ibp8s0f1/1 lqpn 360 type UD state RTS sq-psn 0 comm [mlx5_ib] link ibp8s0f1/1 lqpn 0 type SMI state RTS sq-psn 0 comm [ib_core] link ibp8s0f1/1 lqpn 1 type GSI state RTS sq-psn 0 comm [ib_core] $ rdma resource show qp link ibp8s0f1 -dd link ibp8s0f1/1 lqpn 360 type UD state RTS sq-psn 0 comm [mlx5_ib] link ibp8s0f1/1 lqpn 465 type DRIVER subtype REG_UMR state RTS sq-psn 0 comm [mlx5_ib] link ibp8s0f1/1 lqpn 0 type SMI state RTS sq-psn 0 comm [ib_core] link ibp8s0f1/1 lqpn 1 type GSI state RTS sq-psn 0 comm [ib_core] $ rdma resource show 0: ibp8s0f0: pd 3 cq 4 qp 3 cm_id 0 mr 0 ctx 0 srq 2 1: ibp8s0f1: pd 3 cq 4 qp 3 cm_id 0 mr 0 ctx 0 srq 2 $ rdma resource show -dd 0: ibp8s0f0: pd 3 cq 4 qp 4 cm_id 0 mr 0 ctx 0 srq 2 1: ibp8s0f1: pd 3 cq 4 qp 4 cm_id 0 mr 0 ctx 0 srq 2 Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com> Link: https://lore.kernel.org/r/2607bb3ddec3cae3443c2ea19e9f700825d20a98.1713268997.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-04-30 11:19:29 +03:00
Eric Dumazet	e8dfd42c17	ipv6: introduce dst_rt6_info() helper Instead of (struct rt6_info *)dst casts, we can use : #define dst_rt6_info(_ptr) \ container_of_const(_ptr, struct rt6_info, dst) Some places needed missing const qualifiers : ip6_confirm_neigh(), ipv6_anycast_destination(), ipv6_unicast_destination(), has_gateway() v2: added missing parts (David Ahern) Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-04-29 13:32:01 +01:00
Michael Margolin	f847e84015	RDMA/efa: Add shutdown notifier Add driver function to stop the device and release any active IRQs as preparation for shutdown. This should fix issues caused by unexpected AQ interrupts when booting kernel using kexec and possible data integrity issues when the system is being shutdown during traffic. Link: https://lore.kernel.org/r/20240425171814.25216-1-mrgolin@amazon.com Reviewed-by: Firas Jahjah <firasj@amazon.com> Reviewed-by: Yonatan Nachum <ynachum@amazon.com> Signed-off-by: Michael Margolin <mrgolin@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-04-29 09:02:58 -03:00
Jakub Kicinski	2bd87951de	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/ethernet/ti/icssg/icssg_prueth.c net/mac80211/chan.c `89884459a0` ("wifi: mac80211: fix idle calculation with multi-link") `87f5500285` ("wifi: mac80211: simplify ieee80211_assign_link_chanctx()") https://lore.kernel.org/all/20240422105623.7b1fbda2@canb.auug.org.au/ net/unix/garbage.c `1971d13ffa` ("af_unix: Suppress false-positive lockdep splat for spin_lock() in __unix_gc().") `4090fa373f` ("af_unix: Replace garbage collection algorithm.") drivers/net/ethernet/ti/icssg/icssg_prueth.c drivers/net/ethernet/ti/icssg/icssg_common.c `4dcd0e83ea` ("net: ti: icssg-prueth: Fix signedness bug in prueth_init_rx_chns()") `e2dc7bfd67` ("net: ti: icssg-prueth: Move common functions into a separate file") No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-04-25 12:41:37 -07:00
Damien Le Moal	d6f0bbcd2f	RDMA/vmw_pvrdma: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY Use the macro PCI_IRQ_INTX instead of the deprecated PCI_IRQ_LEGACY macro. Link: https://lore.kernel.org/r/20240325070944.3600338-13-dlemoal@kernel.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2024-04-25 12:53:31 -05:00
Damien Le Moal	91d343de20	IB/qib: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY Use the macro PCI_IRQ_INTX instead of the deprecated PCI_IRQ_LEGACY macro. Link: https://lore.kernel.org/r/20240325070944.3600338-12-dlemoal@kernel.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>	2024-04-25 12:53:31 -05:00
Konstantin Taranov	f88320b698	RDMA/mana_ib: Fix missing ret value Set ret to -ENODEV when netdev_master_upper_dev_get_rcu returns NULL. Fixes: `8b184e4f1c` ("RDMA/mana_ib: Enable RoCE on port 1") Link: https://lore.kernel.org/r/1713881751-21621-1-git-send-email-kotaranov@linux.microsoft.com Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-04-23 12:02:44 -03:00
Jason Gunthorpe	ef302283dd	IB/mlx5: Use __iowrite64_copy() for write combining stores mlx5 has a built in self-test at driver startup to evaluate if the platform supports write combining to generate a 64 byte PCIe TLP or not. This has proven necessary because a lot of common scenarios end up with broken write combining (especially inside virtual machines) and there is other way to learn this information. This self test has been consistently failing on new ARM64 CPU designs (specifically with NVIDIA Grace's implementation of Neoverse V2). The C loop around writeq() generates some pretty terrible ARM64 assembly, but historically this has worked on a lot of existing ARM64 CPUs till now. We see it succeed about 1 time in 10,000 on the worst effected systems. The CPU architects speculate that the load instructions interspersed with the stores makes the WC buffers statistically flush too often and thus the generation of large TLPs becomes infrequent. This makes the boot up test unreliable in that it indicates no write-combining, however userspace would be fine since it uses a ST4 instruction. Further, S390 has similar issues where only the special zpci_memcpy_toio() will actually generate large TLPs, and the open coded loop does not trigger it at all. Fix both ARM64 and S390 by switching to __iowrite64_copy() which now provides architecture specific variants that have a high change of generating a large TLP with write combining. x86 continues to use a similar writeq loop in the generate __iowrite64_copy(). Fixes: `11f552e217` ("IB/mlx5: Test write combining support") Link: https://lore.kernel.org/r/6-v3-1893cd8b9369+1925-mlx5_arm_wc_jgg@nvidia.com Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> Acked-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-04-22 17:11:20 -03:00
Bob Pearson	1a633bdc8f	RDMA/rxe: Let destroy qp succeed with stuck packet In some situations a sent packet may get queued in the NIC longer than than timeout of a ULP. Currently if this happens the ULP may try to reset the link by destroying the qp and setting up an alternate connection but will fail because the rxe driver is waiting for the packet to finish getting sent and be returned to the skb destructor function where the qp reference holding things up will be dropped. This patch modifies the way that the qp is passed to the destructor to pass the qp index and not a qp pointer. Then the destructor will attempt to lookup the qp from its index and if it fails exit early. This requires taking a reference on the struct sock rather than the qp allowing the qp to be destroyed while the sk is still around waiting for the packet to finish. Link: https://lore.kernel.org/r/20240329145513.35381-15-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-04-22 16:55:57 -03:00
Bob Pearson	9cc6290991	RDMA/rxe: Get rid of pkt resend on err Currently the rxe_driver detects packet drops by ip_local_out() which occur before the packet is sent on the wire and attempts to resend them. This is redundant with the usual retry mechanism which covers packets that get dropped in transit to or from the remote node. The way this is implemented is not robust since it sets need_req_skb and waits for the number of local skbs outstanding for this qp to drop below a low water mark. This is racy since the skb may be sent to the destructor before the requester can set the need_req_skb flag. This will cause a deadlock in the send path for that qp. This patch removes this mechanism since the normal retry path will correct the error and resend the packet and it makes no difference if the packet is dropped locally or later. Link: https://lore.kernel.org/r/20240329145513.35381-14-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-04-22 16:54:34 -03:00
Bob Pearson	55bec1c440	RDMA/rxe: Make rxe_loopback match rxe_send behavior The rxe send path currently counts the number of skbs outstanding between the rxe driver and the ethernet driver to prevent too many packets to accumulate waiting to send. This patch makes the local loopback path behave the same way. The loopback path forwards the packets to the receive path which will eventually call kfree_skb on all packets and drop the qp references. This makes the loopback path more useful for software testing. Link: https://lore.kernel.org/r/20240329145513.35381-13-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-04-22 16:54:33 -03:00
Bob Pearson	8776618dbb	RDMA/rxe: Fix incorrect rxe_put in error path In rxe_send() a ref is taken on the qp to keep it alive until the kfree_skb() has a chance to call the skb destructor rxe_skb_tx_dtor() which drops the reference. If the packet has an incorrect protocol the error path just calls kfree_skb() which will call the destructor which will drop the ref. Currently the driver also calls rxe_put() which is incorrect. Additionally since the packets sent to rxe_send() are under the control of the driver and it only ever produces IPV4 or IPV6 packets the simplest fix is to remove all the code in this block. Link: https://lore.kernel.org/r/20240329145513.35381-12-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Fixes: `9eb7f8e44d` ("IB/rxe: Move refcounting earlier in rxe_send()") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-04-22 16:54:33 -03:00
Bob Pearson	23bc06af54	RDMA/rxe: Don't call direct between tasks Replace calls to rxe_run_task() with rxe_sched_task(). This prevents the tasks from all running on the same cpu. This change slightly reduces performance for single qp send and write benchmarks in loopback mode but greatly improves the performance with multiple qps because if run task is used all the work tends to be performed on one cpu. For actual on the wire benchmarks there is no noticeable performance change. Link: https://lore.kernel.org/r/20240329145513.35381-11-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-04-22 16:54:33 -03:00
Bob Pearson	3d807a3ebc	RDMA/rxe: Don't call rxe_requester from rxe_completer Instead of rescheduling rxe_requester from rxe_completer() just extend the duration of rxe_sender() by one pass. Setting run_requester_again forces rxe_completer() to return 0 which will cause rxe_sender() to be called at least one more time. Link: https://lore.kernel.org/r/20240329145513.35381-10-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-04-22 16:54:33 -03:00
Bob Pearson	4891f4fed0	RDMA/rxe: Don't schedule rxe_completer from rxe_requester Now that rxe_completer() is always called serially after rxe_requester() there is no reason to schedule rxe_completer() from rxe_requester(). Link: https://lore.kernel.org/r/20240329145513.35381-9-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-04-22 16:54:33 -03:00
Bob Pearson	cd8aaddf0d	RDMA/rxe: Remove save/rollback_state in rxe_requester Now that req.task and comp.task are merged it is no longer necessary to call save_state() before calling rxe_xmit_pkt() and rollback_state() if rxe_xmit_pkt() fails. This was done originally to prevent races between rxe_completer() and rxe_requester() which now cannot happen. Link: https://lore.kernel.org/r/20240329145513.35381-8-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-04-22 16:54:33 -03:00
Bob Pearson	67f57892f9	RDMA/rxe: Merge request and complete tasks Currently the rxe driver has three work queue tasks per qp. These are the req.task, comp.task and resp.task which call rxe_requester(), rxe_completer() and rxe_responder() respectively directly or on work queues. Each of these subroutines checks to see if there is work to be performed on the send queue or on the response packet queue or the request packet queue and will run until there is no work remaining or yield the cpu and reschedule itself until there is no work remaining. This commit combines the req.task and comp.task into a single send.task and renames the resp.task to the recv.task. The combined send.task calls rxe_requester() and rxe_completer() serially and continues until all work on both the send queue and the response packet queue are done. In various benchmarks the performance is either improved or left the same. At high scale there is a significant reduction in the load on the cpu. This is the first step in combining these two tasks. Once they are serialized cross rescheduling of req.task and comp.task can be more efficiently handled by just letting the send.task continue to run. This will be done in the next several patches. Link: https://lore.kernel.org/r/20240329145513.35381-7-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-04-22 16:54:33 -03:00
Bob Pearson	ff30e45376	RDMA/rxe: Remove redundant scheduling of rxe_completer In rxe_post_send_kernel() if the qp is in the error state after posting the work requests the rxe_completer() task is scheduled. But, the only way to move the qp into the error state is to call rxe_qp_error() which also schedules the rxe_completer() task to drain the queues. Calling it a second time has no effect. This commit removes the redundant call. Link: https://lore.kernel.org/r/20240329145513.35381-6-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-04-22 16:54:33 -03:00
Bob Pearson	b703374837	RDMA/rxe: Allow good work requests to be executed A previous commit incorrectly added an 'if(!err)' before scheduling the requester task in rxe_post_send_kernel(). But if there were send wrs successfully added to the send queue before a bad wr they might never get executed. This commit fixes this by scheduling the requester task if any wqes were successfully posted in rxe_post_send_kernel() in rxe_verbs.c. Link: https://lore.kernel.org/r/20240329145513.35381-5-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Fixes: `5bf944f241` ("RDMA/rxe: Add error messages") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-04-22 16:54:32 -03:00
Bob Pearson	2b23b60973	RDMA/rxe: Fix seg fault in rxe_comp_queue_pkt In rxe_comp_queue_pkt() an incoming response packet skb is enqueued to the resp_pkts queue and then a decision is made whether to run the completer task inline or schedule it. Finally the skb is dereferenced to bump a 'hw' performance counter. This is wrong because if the completer task is already running in a separate thread it may have already processed the skb and freed it which can cause a seg fault. This has been observed infrequently in testing at high scale. This patch fixes this by changing the order of enqueuing the packet until after the counter is accessed. Link: https://lore.kernel.org/r/20240329145513.35381-4-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Fixes: `0b1e5b99a4` ("IB/rxe: Add port protocol stats") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-04-22 16:54:32 -03:00
Michael Guralnik	ca0b44e20a	IB/core: Implement a limit on UMAD receive List The existing behavior of ib_umad, which maintains received MAD packets in an unbounded list, poses a risk of uncontrolled growth. As user-space applications extract packets from this list, the rate of extraction may not match the rate of incoming packets, leading to potential list overflow. To address this, we introduce a limit to the size of the list. After considering typical scenarios, such as OpenSM processing, which can handle approximately 100k packets per second, and the 1-second retry timeout for most packets, we set the list size limit to 200k. Packets received beyond this limit are dropped, assuming they are likely timed out by the time they are handled by user-space. Notably, packets queued on the receive list due to reasons like timed-out sends are preserved even when the list is full. Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Reviewed-by: Mark Zhang <markzhang@nvidia.com> Link: https://lore.kernel.org/r/7197cb58a7d9e78399008f25036205ceab07fbd5.1713268818.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-04-21 14:00:35 +03:00
Chengchang Tang	349e859952	RDMA/hns: Modify the print level of CQE error Too much print may lead to a panic in kernel. Change ibdev_err() to ibdev_err_ratelimited(), and change the printing level of cqe dump to debug level. Fixes: `7c044adca2` ("RDMA/hns: Simplify the cqe code of poll cq") Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20240412091616.370789-11-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-04-16 15:06:47 +03:00
Chengchang Tang	4125269bb9	RDMA/hns: Use complete parentheses in macros Use complete parentheses to ensure that macro expansion does not produce unexpected results. Fixes: `a25d13cbe8` ("RDMA/hns: Add the interfaces to support multi hop addressing for the contexts in hip08") Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20240412091616.370789-10-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-04-16 15:06:47 +03:00
wenglianfa	9a84848dce	RDMA/hns: Add mutex_destroy() Add mutex_destroy(). Signed-off-by: wenglianfa <wenglianfa@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20240412091616.370789-9-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-04-16 15:06:47 +03:00
Chengchang Tang	ee04549328	RDMA/hns: Fix GMV table pagesize GMV's BA table only supports 4K pages. Currently, PAGESIZE is used to calculate gmv_bt_num, which will cause an abnormal number of gmv_bt_num in a 64K OS. Fixes: `d6d91e4621` ("RDMA/hns: Add support for configuring GMV table") Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20240412091616.370789-8-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-04-16 15:06:47 +03:00
wenglianfa	dc3bda6e56	RDMA/hns: Fix mismatch exception rollback When dma_alloc_coherent() fails in hns_roce_alloc_hem(), just call kfree() to release hem instead of hns_roce_free_hem(). Fixes: `c00743cbf2` ("RDMA/hns: Simplify 'struct hns_roce_hem' allocation") Signed-off-by: wenglianfa <wenglianfa@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20240412091616.370789-7-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-04-16 15:06:47 +03:00
Chengchang Tang	a942ec2745	RDMA/hns: Fix UAF for cq async event The refcount of CQ is not protected by locks. When CQ asynchronous events and CQ destruction are concurrent, CQ may have been released, which will cause UAF. Use the xa_lock() to protect the CQ refcount. Fixes: `9a4435375c` ("IB/hns: Add driver files for hns RoCE driver") Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20240412091616.370789-6-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-04-16 15:06:47 +03:00
Chengchang Tang	b46494b6f9	RDMA/hns: Fix deadlock on SRQ async events. xa_lock for SRQ table may be required in AEQ. Use xa_store_irq()/ xa_erase_irq() to avoid deadlock. Fixes: `81fce6291d` ("RDMA/hns: Add SRQ asynchronous event support") Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20240412091616.370789-5-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-04-16 15:06:47 +03:00
Chengchang Tang	2ce384307f	RDMA/hns: Add max_ah and cq moderation capacities in query_device() Add max_ah and cq moderation capacities to hns_roce_query_device(). Fixes: `9a4435375c` ("IB/hns: Add driver files for hns RoCE driver") Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20240412091616.370789-4-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-04-16 15:06:47 +03:00
Chengchang Tang	f4caa864af	RDMA/hns: Remove unused parameters and variables Remove unused parameters and variables. Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20240412091616.370789-3-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-04-16 15:06:47 +03:00
Yangyang Li	bfb6be4014	RDMA/hns: Use macro instead of magic number Use macro instead of magic number. Signed-off-by: Yangyang Li <liyangyang20@huawei.com> Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20240412091616.370789-2-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-04-16 15:06:47 +03:00
Zhengchao Shao	203b70fda6	RDMA/hns: Fix return value in hns_roce_map_mr_sg As described in the ib_map_mr_sg function comment, it returns the number of sg elements that were mapped to the memory region. However, hns_roce_map_mr_sg returns the number of pages required for mapping the DMA area. Fix it. Fixes: `9b2cf76c9f` ("RDMA/hns: Optimize PBL buffer allocation process") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Link: https://lore.kernel.org/r/20240411033851.2884771-1-shaozhengchao@huawei.com Reviewed-by: Junxian Huang <huangjunxian6@hisilicon.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-04-16 14:59:08 +03:00
Konstantin Taranov	8859f009ac	RDMA/mana_ib: Configure mac address in RNIC Set local mac address in RNIC, which is required by the HW. Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://lore.kernel.org/r/1712738551-22075-7-git-send-email-kotaranov@linux.microsoft.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-04-16 14:28:26 +03:00
Konstantin Taranov	faafb8b126	RDMA/mana_ib: Adding and deleting GIDs Implement add_gid and del_gid for RNIC. IPv4 and IPv6 addresses are supported. Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://lore.kernel.org/r/1712738551-22075-6-git-send-email-kotaranov@linux.microsoft.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-04-16 14:28:26 +03:00
Konstantin Taranov	8b184e4f1c	RDMA/mana_ib: Enable RoCE on port 1 Set netdev and RoCEv2 flag to enable GID population on port 1. Use GIDs of the master netdev. As mc->ports[] stores slave devices, use a helper to get the master netdev. Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://lore.kernel.org/r/1712738551-22075-5-git-send-email-kotaranov@linux.microsoft.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-04-16 14:28:26 +03:00
Konstantin Taranov	4bda1d5332	RDMA/mana_ib: Implement port parameters Implement port parameters for RNIC: 1) extend query_port() method 2) implement get_link_layer() 3) implement query_pkey() Only port 1 can store GIDs. Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://lore.kernel.org/r/1712738551-22075-4-git-send-email-kotaranov@linux.microsoft.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-04-16 14:28:26 +03:00
Konstantin Taranov	1a79c2b9d4	RDMA/mana_ib: Create and destroy rnic adapter Add functions for RNIC creation and destruction. If creation fails, the ib_probe fails as well. Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://lore.kernel.org/r/1712738551-22075-3-git-send-email-kotaranov@linux.microsoft.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-04-16 14:28:26 +03:00
Konstantin Taranov	98b889c439	RDMA/mana_ib: Add EQ creation for rnic adapter Create an error EQ for the RNIC adapter. Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://lore.kernel.org/r/1712738551-22075-2-git-send-email-kotaranov@linux.microsoft.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-04-16 14:28:26 +03:00
Konstantin Taranov	23f59f4e83	RDMA/mana_ib: Use num_comp_vectors of ib_device Use num_comp_vectors of struct ib_device instead of max_num_queues from gdma_context. Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://lore.kernel.org/r/1712911656-17352-1-git-send-email-kotaranov@linux.microsoft.com Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-04-16 13:29:47 +03:00
Jakub Kicinski	0e36c21d76	Merge branch mana-ib-flex of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git Erick Archer says: ==================== mana: Add flex array to struct mana_cfg_rx_steer_req_v2 (part) The "struct mana_cfg_rx_steer_req_v2" uses a dynamically sized set of trailing elements. Specifically, it uses a "mana_handle_t" array. So, use the preferred way in the kernel declaring a flexible array [1]. At the same time, prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). Also, avoid the open-coded arithmetic in the memory allocator functions [2] using the "struct_size" macro. Moreover, use the "offsetof" helper to get the indirect table offset instead of the "sizeof" operator and avoid the open-coded arithmetic in pointers using the new flex member. This new structure member also allow us to remove the "req_indir_tab" variable since it is no longer needed. Now, it is also possible to use the "flex_array_size" helper to compute the size of these trailing elements in the "memcpy" function. Specifically, the first commit adds the flex member and the patches 2 and 3 refactor the consumers of the "struct mana_cfg_rx_steer_req_v2". This code was detected with the help of Coccinelle, and audited and modified manually. The Coccinelle script used to detect this code pattern is the following: virtual report @rule1@ type t1; type t2; identifier i0; identifier i1; identifier i2; identifier ALLOC =~ "kmalloc\|kzalloc\|kmalloc_node\|kzalloc_node\|vmalloc\|vzalloc\|kvmalloc\|kvzalloc"; position p1; @@ i0 = sizeof(t1) + sizeof(t2) * i1; ... i2 = ALLOC@p1(..., i0, ...); @script:python depends on report@ p1 << rule1.p1; @@ msg = "WARNING: verify allocation on line %s" % (p1[0].line) coccilib.report.print_report(p1[0],msg) Link: https://www.kernel.org/doc/html/next/process/deprecated.html#zero-length-and-one-element-arrays [1] Link: https://www.kernel.org/doc/html/next/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments [2] v1: https://lore.kernel.org/linux-hardening/AS8PR02MB7237974EF1B9BAFA618166C38B382@AS8PR02MB7237.eurprd02.prod.outlook.com/ v2: https://lore.kernel.org/linux-hardening/AS8PR02MB723729C5A63F24C312FC9CD18B3F2@AS8PR02MB7237.eurprd02.prod.outlook.com/ ==================== Link: https://lore.kernel.org/r/AS8PR02MB72374BD1B23728F2E3C3B1A18B022@AS8PR02MB7237.eurprd02.prod.outlook.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-04-11 07:38:28 -07:00
Zhu Yanjun	dfcdb38b21	RDMA/rxe: Return the correct errno In the function __rxe_add_to_pool, the function xa_alloc_cyclic is called. The return value of the function xa_alloc_cyclic is as below: " Return: 0 if the allocation succeeded without wrapping. 1 if the allocation succeeded after wrapping, -ENOMEM if memory could not be allocated or -EBUSY if there are no free entries in @limit. " But now the function __rxe_add_to_pool only returns -EINVAL. All the returned error value should be returned to the caller. Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Link: https://lore.kernel.org/r/20240408142142.792413-1-yanjun.zhu@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-04-11 14:47:26 +03:00
Konstantin Taranov	c8fc935f4b	RDMA/mana_ib: remove useless return values from dbg prints Remove printing ret value on success as it was always 0. Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://lore.kernel.org/r/1712672465-29960-1-git-send-email-kotaranov@linux.microsoft.com Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-04-11 14:30:24 +03:00
Leon Romanovsky	e537deecda	RDMA/mana_ib: Add flex array to struct mana_cfg_rx_steer_req_v2 The "struct mana_cfg_rx_steer_req_v2" uses a dynamically sized set of trailing elements. Specifically, it uses a "mana_handle_t" array. So, use the preferred way in the kernel declaring a flexible array [1]. At the same time, prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). Also, avoid the open-coded arithmetic in the memory allocator functions [2] using the "struct_size" macro. Moreover, use the "offsetof" helper to get the indirect table offset instead of the "sizeof" operator and avoid the open-coded arithmetic in pointers using the new flex member. This new structure member also allow us to remove the "req_indir_tab" variable since it is no longer needed. Now, it is also possible to use the "flex_array_size" helper to compute the size of these trailing elements in the "memcpy" function. Specifically, the first commit adds the flex member and the patches 2 and 3 refactor the consumers of the "struct mana_cfg_rx_steer_req_v2". This code was detected with the help of Coccinelle, and audited and modified manually. The Coccinelle script used to detect this code pattern is the following: virtual report @rule1@ type t1; type t2; identifier i0; identifier i1; identifier i2; identifier ALLOC =~ "kmalloc\|kzalloc\|kmalloc_node\|kzalloc_node\|vmalloc\|vzalloc\|kvmalloc\|kvzalloc"; position p1; @@ i0 = sizeof(t1) + sizeof(t2) * i1; ... i2 = ALLOC@p1(..., i0, ...); @script:python depends on report@ p1 << rule1.p1; @@ msg = "WARNING: verify allocation on line %s" % (p1[0].line) coccilib.report.print_report(p1[0],msg) [1] https://www.kernel.org/doc/html/next/process/deprecated.html#zero-length-and-one-element-arrays [2] https://www.kernel.org/doc/html/next/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments Link: https://lore.kernel.org/all/AS8PR02MB72374BD1B23728F2E3C3B1A18B022@AS8PR02MB7237.eurprd02.prod.outlook.com Signed-off-by: Erick Archer <erick.archer@outlook.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> * mana-ib-flex: net: mana: Avoid open coded arithmetic RDMA/mana_ib: Prefer struct_size over open coded arithmetic net: mana: Add flex array to struct mana_cfg_rx_steer_req_v2	2024-04-11 13:48:53 +03:00
Erick Archer	29b8e13a8b	RDMA/mana_ib: Prefer struct_size over open coded arithmetic This is an effort to get rid of all multiplications from allocation functions in order to prevent integer overflows [1][2]. As the "req" variable is a pointer to "struct mana_cfg_rx_steer_req_v2" and this structure ends in a flexible array: struct mana_cfg_rx_steer_req_v2 { [...] mana_handle_t indir_tab[] __counted_by(num_indir_entries); }; the preferred way in the kernel is to use the struct_size() helper to do the arithmetic instead of the calculation "size + size * count" in the kzalloc() function. Moreover, use the "offsetof" helper to get the indirect table offset instead of the "sizeof" operator and avoid the open-coded arithmetic in pointers using the new flex member. This new structure member also allow us to remove the "req_indir_tab" variable since it is no longer needed. This way, the code is more readable and safer. This code was detected with the help of Coccinelle, and audited and modified manually. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments [1] Link: https://github.com/KSPP/linux/issues/160 [2] Signed-off-by: Erick Archer <erick.archer@outlook.com> Link: https://lore.kernel.org/r/AS8PR02MB72375EB06EE1A84A67BE722E8B022@AS8PR02MB7237.eurprd02.prod.outlook.com Reviewed-by: Long Li <longli@microsoft.com> Reviewed-by: Justin Stitt <justinstitt@google.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-04-11 13:46:47 +03:00
Junxian Huang	ee20cc17e9	RDMA/hns: Support DSCP Add support for DSCP configuration. For DSCP, get dscp-prio mapping via hns3 nic driver api .get_dscp_prio() and fill the SL (in WQE for UD or in QPC for RC) with the priority value. The prio-tc mapping is configured to HW by hns3 nic driver. HW will select a corresponding TC according to SL and the prio-tc mapping. Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20240315093551.1650088-1-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-04-09 10:27:41 +03:00
Guillaume Nault	ec20b28300	ipv4: Set scope explicitly in ip_route_output(). Add a "scope" parameter to ip_route_output() so that callers don't have to override the tos parameter with the RTO_ONLINK flag if they want a local scope. This will allow converting flowi4_tos to dscp_t in the future, thus allowing static analysers to flag invalid interactions between "tos" (the DSCP bits) and ECN. Only three users ask for local scope (bonding, arp and atm). The others continue to use RT_SCOPE_UNIVERSE. While there, add a comment to warn users about the limitations of ip_route_output(). Signed-off-by: Guillaume Nault <gnault@redhat.com> Acked-by: Leon Romanovsky <leonro@nvidia.com> # infiniband Signed-off-by: David S. Miller <davem@davemloft.net>	2024-04-08 13:20:51 +01:00
Or Har-Toov	2ca7e93bc9	RDMA/mlx5: Adding remote atomic access flag to updatable flags Currently IB_ACCESS_REMOTE_ATOMIC is blocked from being updated via UMR although in some cases it should be possible. These cases are checked in mlx5r_umr_can_reconfig function. Fixes: `ef3642c4f5` ("RDMA/mlx5: Fix error unwinds for rereg_mr") Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Link: https://lore.kernel.org/r/24dac73e2fa48cb806f33a932d97f3e402a5ea2c.1712140377.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-04-08 13:34:54 +03:00
Or Har-Toov	8c1185fef6	RDMA/mlx5: Change check for cacheable mkeys umem can be NULL for user application mkeys in some cases. Therefore umem can't be used for checking if the mkey is cacheable and it is changed for checking a flag that indicates it. Also make sure that all mkeys which are not returned to the cache will be destroyed. Fixes: `dd1b913fb0` ("RDMA/mlx5: Cache all user cacheable mkeys on dereg MR flow") Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Link: https://lore.kernel.org/r/2690bc5c6896bcb937f89af16a1ff0343a7ab3d0.1712140377.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-04-08 13:34:51 +03:00
Or Har-Toov	0611a8e8b4	RDMA/mlx5: Uncacheable mkey has neither rb_key or cache_ent As some mkeys can't be modified with UMR due to some UMR limitations, like the size of translation that can be updated, not all user mkeys can be cached. Fixes: `dd1b913fb0` ("RDMA/mlx5: Cache all user cacheable mkeys on dereg MR flow") Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Link: https://lore.kernel.org/r/f2742dd934ed73b2d32c66afb8e91b823063880c.1712140377.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-04-08 13:34:47 +03:00
Michael Guralnik	be121ffb38	RDMA/mlx5: Fix port number for counter query in multi-port configuration Set the correct port when querying PPCNT in multi-port configuration. Distinguish between cases where switchdev mode was enabled to multi-port configuration and don't overwrite the queried port to 1 in multi-port case. Fixes: `74b30b3ad5` ("RDMA/mlx5: Set local port to one when accessing counters") Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://lore.kernel.org/r/9bfcc8ade958b760a51408c3ad654a01b11f7d76.1712134988.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-04-08 13:33:10 +03:00
Konstantin Taranov	f10242b3da	RDMA/mana_ib: Use struct mana_ib_queue for RAW QPs Use struct mana_ib_queue and its helpers for RAW QPs Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://lore.kernel.org/r/1711483688-24358-5-git-send-email-kotaranov@linux.microsoft.com Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-04-02 11:30:23 +03:00
Konstantin Taranov	688bac28e3	RDMA/mana_ib: Use struct mana_ib_queue for WQs Use struct mana_ib_queue and its helpers for WQs Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://lore.kernel.org/r/1711483688-24358-4-git-send-email-kotaranov@linux.microsoft.com Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-04-02 11:30:23 +03:00
Konstantin Taranov	60a7ac0b8b	RDMA/mana_ib: Use struct mana_ib_queue for CQs Use struct mana_ib_queue and its helpers for CQs Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://lore.kernel.org/r/1711483688-24358-3-git-send-email-kotaranov@linux.microsoft.com Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-04-02 11:30:23 +03:00
Konstantin Taranov	46f5be7cd4	RDMA/mana_ib: Introduce helpers to create and destroy mana queues Intoduce helpers to work with mana ib queues (struct mana_ib_queue). A queue always consists of umem, gdma_region, and id. A queue can become a WQ or a CQ. Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://lore.kernel.org/r/1711483688-24358-2-git-send-email-kotaranov@linux.microsoft.com Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-04-02 11:30:22 +03:00
Mark Zhang	b68e1acb58	RDMA/cm: Print the old state when cm_destroy_id gets timeout The old state is helpful for debugging, as the current state is always IB_CM_IDLE when timeout happens. Fixes: `96d9cbe2f2` ("RDMA/cm: add timeout to cm_destroy_id wait") Signed-off-by: Mark Zhang <markzhang@nvidia.com> Link: https://lore.kernel.org/r/20240322112049.2022994-1-markzhang@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-04-01 15:16:36 +03:00
Wenchao Hao	ca537a3477	RDMA/restrack: Fix potential invalid address access struct rdma_restrack_entry's kern_name was set to KBUILD_MODNAME in ib_create_cq(), while if the module exited but forgot del this rdma_restrack_entry, it would cause a invalid address access in rdma_restrack_clean() when print the owner of this rdma_restrack_entry. These code is used to help find one forgotten PD release in one of the ULPs. But it is not needed anymore, so delete them. Signed-off-by: Wenchao Hao <haowenchao2@huawei.com> Link: https://lore.kernel.org/r/20240318092320.1215235-1-haowenchao2@huawei.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-04-01 14:49:27 +03:00
Boshi Yu	df0e16bab5	RDMA/erdma: Remove unnecessary __GFP_ZERO flag The dma_alloc_coherent() interface automatically zero the memory returned. Thus, we do not need to specify the __GFP_ZERO flag explicitly when we call dma_alloc_coherent(). Reviewed-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Boshi Yu <boshiyu@linux.alibaba.com> Link: https://lore.kernel.org/r/20240311113821.22482-4-boshiyu@alibaba-inc.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-04-01 14:46:01 +03:00
Boshi Yu	fdb09ed15f	RDMA/erdma: Unify the names related to doorbell records There exist two different names for the doorbell records: db_info and db_record. We use dbrec for cpu address of the doorbell record and dbrec_dma for dma address of the doorbell recordi uniformly. Reviewed-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Boshi Yu <boshiyu@linux.alibaba.com> Link: https://lore.kernel.org/r/20240311113821.22482-3-boshiyu@alibaba-inc.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-04-01 14:46:01 +03:00
Boshi Yu	f0697bf078	RDMA/erdma: Allocate doorbell records from dma pool Currently, the 8 byte doorbell record is allocated along with the queue buffer, which may result in waste of dma space when the queue buffer is page aligned. To address this issue, we introduce a dma pool named db_pool and allocate doorbell record from it. Reviewed-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Boshi Yu <boshiyu@linux.alibaba.com> Link: https://lore.kernel.org/r/20240311113821.22482-2-boshiyu@alibaba-inc.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-04-01 14:46:01 +03:00
Yanjun.Zhu	481047d7e8	RDMA/rxe: Fix the problem "mutex_destroy missing" When a mutex lock is not used any more, the function mutex_destroy should be called to mark the mutex lock uninitialized. Fixes: `8700e3e7c4` ("Soft RoCE driver") Signed-off-by: Yanjun.Zhu <yanjun.zhu@linux.dev> Link: https://lore.kernel.org/r/20240314065140.27468-1-yanjun.zhu@linux.dev Reviewed-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-04-01 14:36:41 +03:00
Linus Torvalds	6207b37eb5	RDMA v6.9 Very small update this cycle: - Minor code improvements in fi, rxe, ipoib, mana, cxgb4, mlx5, irdma, rxe, rtrs, mana - Simplify the hns hem mechanism - Fix EFA's MSI-X allocation in resource constrained configurations - Fix a KASN splat in srpt - Narrow hns's congestion control selection to QPs granularity and allow userspace to select it - Solve a parallel module loading race between the CM module and a driver module - Flexible array cleanup - Dump hns's SCC Conext to 'rdma res' for debugging - Make mana build page lists for HW objects that require a 0 offset correctly - Stuck CM ID debugging -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZfgzdQAKCRCFwuHvBreF YbS7AQDLy6uJ/1dgrZQ4efcyQDs6H93LG4jWZKoA7F9Oho+MFQEAsQM/UL4nj18O T6vHl30N0Ee0aOCqET7HBbnFGKEADAE= =KxUj -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull rdma updates from Jason Gunthorpe: "Very small update this cycle: - Minor code improvements in fi, rxe, ipoib, mana, cxgb4, mlx5, irdma, rxe, rtrs, mana - Simplify the hns hem mechanism - Fix EFA's MSI-X allocation in resource constrained configurations - Fix a KASN splat in srpt - Narrow hns's congestion control selection to QPs granularity and allow userspace to select it - Solve a parallel module loading race between the CM module and a driver module - Flexible array cleanup - Dump hns's SCC Conext to 'rdma res' for debugging - Make mana build page lists for HW objects that require a 0 offset correctly - Stuck CM ID debugging" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (29 commits) RDMA/cm: add timeout to cm_destroy_id wait RDMA/mana_ib: Use virtual address in dma regions for MRs RDMA/mana_ib: Fix bug in creation of dma regions RDMA/hns: Append SCC context to the raw dump of QPC RDMA/uverbs: Avoid -Wflex-array-member-not-at-end warnings RDMA/hns: Support userspace configuring congestion control algorithm with QP granularity RDMA/rtrs-clt: Check strnlen return len in sysfs mpath_policy_store() RDMA/uverbs: Remove flexible arrays from struct *_filter RDMA/device: Fix a race between mad_client and cm_client init RDMA/hns: Fix mis-modifying default congestion control algorithm RDMA/rxe: Remove unused 'iova' parameter from rxe_mr_init_user RDMA/srpt: Do not register event handler until srpt device is fully setup RDMA/irdma: Remove duplicate assignment RDMA/efa: Limit EQs to available MSI-X vectors RDMA/mlx5: Delete unused mlx5_ib_copy_pas prototype RDMA/cxgb4: Delete unused c4iw_ep_redirect prototype RDMA/mana_ib: Introduce mana_ib_install_cq_cb helper function RDMA/mana_ib: Introduce mana_ib_get_netdev helper function RDMA/mana_ib: Introduce mdev_to_gc helper function RDMA/hns: Simplify 'struct hns_roce_hem' allocation ...	2024-03-18 15:34:03 -07:00
Manjunath Patil	96d9cbe2f2	RDMA/cm: add timeout to cm_destroy_id wait Add timeout to cm_destroy_id, so that userspace can trigger any data collection that would help in analyzing the cause of delay in destroying the cm_id. New noinline function helps dtrace/ebpf programs to hook on to it. Existing functionality isn't changed except triggering a probe-able new function at every timeout interval. We have seen cases where CM messages stuck with MAD layer (either due to software bug or faulty HCA), leading to cm_id getting stuck in the following call stack. This patch helps in resolving such issues faster. kernel: ... INFO: task XXXX:56778 blocked for more than 120 seconds. ... Call Trace: __schedule+0x2bc/0x895 schedule+0x36/0x7c schedule_timeout+0x1f6/0x31f ? __slab_free+0x19c/0x2ba wait_for_completion+0x12b/0x18a ? wake_up_q+0x80/0x73 cm_destroy_id+0x345/0x610 [ib_cm] ib_destroy_cm_id+0x10/0x20 [ib_cm] rdma_destroy_id+0xa8/0x300 [rdma_cm] ucma_destroy_id+0x13e/0x190 [rdma_ucm] ucma_write+0xe0/0x160 [rdma_ucm] __vfs_write+0x3a/0x16d vfs_write+0xb2/0x1a1 ? syscall_trace_enter+0x1ce/0x2b8 SyS_write+0x5c/0xd3 do_syscall_64+0x79/0x1b9 entry_SYSCALL_64_after_hwframe+0x16d/0x0 Signed-off-by: Manjunath Patil <manjunath.b.patil@oracle.com> Link: https://lore.kernel.org/r/20240309063323.458102-1-manjunath.b.patil@oracle.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-03-10 13:17:54 +02:00
Konstantin Taranov	2d5c008157	RDMA/mana_ib: Use virtual address in dma regions for MRs Introduce mana_ib_create_dma_region() to create dma regions with iova for MRs. It allows creating MRs with any page offset. Previously, only page-aligned addresses worked. For dma regions that must have a zero dma offset (e.g., for queues), mana_ib_create_zero_offset_dma_region() is added. To get the zero offset, ib_umem_find_best_pgoff() is used with zero pgoff_bitmask. Fixes: `0266a17763` ("RDMA/mana_ib: Add a driver for Microsoft Azure Network Adapter") Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://lore.kernel.org/r/1709560361-26393-3-git-send-email-kotaranov@linux.microsoft.com Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-03-07 11:32:33 +02:00
Konstantin Taranov	e02497fb65	RDMA/mana_ib: Fix bug in creation of dma regions Use ib_umem_dma_offset() helper to calculate correct dma offset. Fixes: `0266a17763` ("RDMA/mana_ib: Add a driver for Microsoft Azure Network Adapter") Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://lore.kernel.org/r/1709560361-26393-2-git-send-email-kotaranov@linux.microsoft.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-03-07 11:32:33 +02:00
wenglianfa	124a9fbe43	RDMA/hns: Append SCC context to the raw dump of QPC SCCC (SCC Context) is a context with QP granularity that contains information about congestion control. Dump SCCC and QPC together to improve troubleshooting. When dumping raw QPC with rdmatool, there will be a total of 576 bytes data output, where the first 512 bytes is QPC and the last 64 bytes is SCCC. When congestion control is disabled, the 64 byte SCCC will be all 0. Example: $rdma res show qp -jpr [ { "ifindex": 0, "ifname": "hns_0", "data": [ 67,0,0,0... 512bytes 4,0,2... 64bytes] },... } ] Signed-off-by: wenglianfa <wenglianfa@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20240305055257.823513-1-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-03-07 11:26:10 +02:00
Gustavo A. R. Silva	155f04366e	RDMA/uverbs: Avoid -Wflex-array-member-not-at-end warnings -Wflex-array-member-not-at-end is coming in GCC-14, and we are getting ready to enable it globally. There are currently a couple of objects (`alloc_head` and `bundle`) in `struct bundle_priv` that contain a couple of flexible structures: struct bundle_priv { /* Must be first / struct bundle_alloc_head alloc_head; ... / * Must be last. bundle ends in a flex array which overlaps * internal_buffer. / struct uverbs_attr_bundle bundle; u64 internal_buffer[32]; }; So, in order to avoid ending up with a couple of flexible-array members in the middle of a struct, we use the `struct_group_tagged()` helper to separate the flexible array from the rest of the members in the flexible structures: struct uverbs_attr_bundle { struct_group_tagged(uverbs_attr_bundle_hdr, hdr, ... the rest of the members ); struct uverbs_attr attrs[]; }; With the change described above, we now declare objects of the type of the tagged struct without embedding flexible arrays in the middle of another struct: struct bundle_priv { / Must be first */ struct bundle_alloc_head_hdr alloc_head; ... struct uverbs_attr_bundle_hdr bundle; u64 internal_buffer[32]; }; We also use `container_of()` whenever we need to retrieve a pointer to the flexible structures. Notice that the `bundle_size` computed in `uapi_compute_bundle_size()` remains the same. So, with these changes, fix the following warnings: drivers/infiniband/core/uverbs_ioctl.c:45:34: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] 45 \| struct bundle_alloc_head alloc_head; \| ^~~~~~~~~~ drivers/infiniband/core/uverbs_ioctl.c:67:35: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] 67 \| struct uverbs_attr_bundle bundle; \| ^~~~~~ Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/ZeIgeZ5Sb0IZTOyt@neat Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-03-03 15:38:44 +02:00
Junxian Huang	6ec429d588	RDMA/hns: Support userspace configuring congestion control algorithm with QP granularity Currently, congestion control algorithm is statically configured in FW, and all QPs use the same algorithm(except UD which has a fixed configuration of DCQCN). This is not flexible enough. Support userspace configuring congestion control algorithm with QP granularity while creating QPs. If the algorithm is not specified in userspace, use the default one. Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20240301104845.1141083-1-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-03-03 15:01:33 +02:00
Eric Dumazet	e353ea9ce4	rtnetlink: prepare nla_put_iflink() to run under RCU We want to be able to run rtnl_fill_ifinfo() under RCU protection instead of RTNL in the future. This patch prepares dev_get_iflink() and nla_put_iflink() to run either with RTNL or RCU held. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-02-26 11:46:12 +00:00
Al Viro	aa23317d02	qibfs: fix dentry leak simple_recursive_removal() drops the pinning references to all positives in subtree. For the cases when its argument has been kept alive by the pinning alone that's exactly the right thing to do, but here the argument comes from dcache lookup, that needs to be balanced by explicit dput(). Fixes: `e41d237818` "qib_fs: switch to simple_recursive_removal()" Fucked-up-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2024-02-25 23:58:42 -05:00
Alexey Kodanev	7a7b7f575a	RDMA/rtrs-clt: Check strnlen return len in sysfs mpath_policy_store() strnlen() may return 0 (e.g. for "\0\n" string), it's better to check the result of strnlen() before using 'len - 1' expression for the 'buf' array index. Detected using the static analysis tool - Svace. Fixes: `dc3b66a0ce` ("RDMA/rtrs-clt: Add a minimum latency multipath policy") Signed-off-by: Alexey Kodanev <aleksei.kodanev@bell-sw.com> Link: https://lore.kernel.org/r/20240221113204.147478-1-aleksei.kodanev@bell-sw.com Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-02-25 18:21:42 +02:00
Erick Archer	14b526f55b	RDMA/uverbs: Remove flexible arrays from struct _filter When a struct containing a flexible array is included in another struct, and there is a member after the struct-with-flex-array, there is a possibility of memory overlap. These cases must be audited [1]. See: struct inner { ... int flex[]; }; struct outer { ... struct inner header; int overlap; ... }; This is the scenario for all the "struct _filter" structures that are included in the following "struct ib_flow_spec_" structures: struct ib_flow_spec_eth struct ib_flow_spec_ib struct ib_flow_spec_ipv4 struct ib_flow_spec_ipv6 struct ib_flow_spec_tcp_udp struct ib_flow_spec_tunnel struct ib_flow_spec_esp struct ib_flow_spec_gre struct ib_flow_spec_mpls The pattern is like the one shown below: struct _filter { ... u8 real_sz[]; }; struct ib_flow_spec_* { ... struct _filter val; struct _filter mask; }; In this case, the trailing flexible array "real_sz" is never allocated and is only used to calculate the size of the structures. Here the use of the "offsetof" helper can be changed by the "sizeof" operator because the goal is to get the size of these structures. Therefore, the trailing flexible arrays can also be removed. However, due to the trailing padding that can be induced in structs it is possible that the: offsetof(struct _filter, real_sz) != sizeof(struct _filter) This situation happens with the "struct ib_flow_ipv6_filter" and to avoid it the "__packed" macro is used in this structure. But now, the "sizeof(struct ib_flow_ipv6_filter)" has changed. This is not a problem since this size is not used in the code. The situation now is that "sizeof(struct ib_flow_spec_ipv6)" has also changed (this struct contains the struct ib_flow_ipv6_filter). This is also not a problem since it is only used to set the size of the "union ib_flow_spec", which can store all the "ib_flow_spec_*" structures. Link: https://lore.kernel.org/r/20240217142913.4285-1-erick.archer@gmx.com Signed-off-by: Erick Archer <erick.archer@gmx.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-02-21 13:28:52 -04:00
Shifeng Li	7a8bccd8b2	RDMA/device: Fix a race between mad_client and cm_client init The mad_client will be initialized in enable_device_and_get(), while the devices_rwsem will be downgraded to a read semaphore. There is a window that leads to the failed initialization for cm_client, since it can not get matched mad port from ib_mad_port_list, and the matched mad port will be added to the list after that. mad_client \| cm_client ------------------\|-------------------------------------------------------- ib_register_device\| enable_device_and_get down_write(&devices_rwsem) xa_set_mark(&devices, DEVICE_REGISTERED) downgrade_write(&devices_rwsem) \| \|ib_cm_init \|ib_register_client(&cm_client) \|down_read(&devices_rwsem) \|xa_for_each_marked (&devices, DEVICE_REGISTERED) \|add_client_context \|cm_add_one \|ib_register_mad_agent \|ib_get_mad_port \|__ib_get_mad_port \|list_for_each_entry(entry, &ib_mad_port_list, port_list) \|return NULL \|up_read(&devices_rwsem) \| add_client_context\| ib_mad_init_device\| ib_mad_port_open \| list_add_tail(&port_priv->port_list, &ib_mad_port_list) up_read(&devices_rwsem) \| Fix it by using down_write(&devices_rwsem) in ib_register_client(). Fixes: `d0899892ed` ("RDMA/device: Provide APIs from the core code to help unregistration") Link: https://lore.kernel.org/r/20240203035313.98991-1-lishifeng@sangfor.com.cn Suggested-by: Jason Gunthorpe <jgg@ziepe.ca> Signed-off-by: Shifeng Li <lishifeng@sangfor.com.cn> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-02-21 13:15:50 -04:00
Luoyouming	d20a7cf9f7	RDMA/hns: Fix mis-modifying default congestion control algorithm Commit `27c5fd271d` ("RDMA/hns: The UD mode can only be configured with DCQCN") adds a check of congest control alorithm for UD. But that patch causes a problem: hr_dev->caps.congest_type is global, used by all QPs, so modifying this field to DCQCN for UD QPs causes other QPs unable to use any other algorithm except DCQCN. Revert the modification in commit `27c5fd271d` ("RDMA/hns: The UD mode can only be configured with DCQCN"). Add a new field cong_type to struct hns_roce_qp and configure DCQCN for UD QPs. Fixes: `27c5fd271d` ("RDMA/hns: The UD mode can only be configured with DCQCN") Fixes: `f91696f2f0` ("RDMA/hns: Support congestion control type selection according to the FW") Signed-off-by: Luoyouming <luoyouming@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20240219061805.668170-1-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-02-19 09:50:31 +02:00
Arnd Bergmann	eb5c7465c3	RDMA/srpt: fix function pointer cast warnings clang-16 notices that srpt_qp_event() gets called through an incompatible pointer here: drivers/infiniband/ulp/srpt/ib_srpt.c:1815:5: error: cast from 'void ()(struct ib_event , struct srpt_rdma_ch )' to 'void ()(struct ib_event , void )' converts to incompatible function type [-Werror,-Wcast-function-type-strict] 1815 \| = (void()(struct ib_event , void*))srpt_qp_event; Change srpt_qp_event() to use the correct prototype and adjust the argument inside of it. Fixes: `a42d985bd5` ("ib_srpt: Initial SRP Target merge for v3.3-rc1") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20240213100728.458348-1-arnd@kernel.org Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-02-14 11:15:54 +02:00
Kamal Heib	5ba4e6d586	RDMA/qedr: Fix qedr_create_user_qp error flow Avoid the following warning by making sure to free the allocated resources in case that qedr_init_user_queue() fail. -----------[ cut here ]----------- WARNING: CPU: 0 PID: 143192 at drivers/infiniband/core/rdma_core.c:874 uverbs_destroy_ufile_hw+0xcf/0xf0 [ib_uverbs] Modules linked in: tls target_core_user uio target_core_pscsi target_core_file target_core_iblock ib_srpt ib_srp scsi_transport_srp nfsd nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs 8021q garp mrp stp llc ext4 mbcache jbd2 opa_vnic ib_umad ib_ipoib sunrpc rdma_ucm ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_cm hfi1 intel_rapl_msr intel_rapl_common mgag200 qedr sb_edac drm_shmem_helper rdmavt x86_pkg_temp_thermal drm_kms_helper intel_powerclamp ib_uverbs coretemp i2c_algo_bit kvm_intel dell_wmi_descriptor ipmi_ssif sparse_keymap kvm ib_core rfkill syscopyarea sysfillrect video sysimgblt irqbypass ipmi_si ipmi_devintf fb_sys_fops rapl iTCO_wdt mxm_wmi iTCO_vendor_support intel_cstate pcspkr dcdbas intel_uncore ipmi_msghandler lpc_ich acpi_power_meter mei_me mei fuse drm xfs libcrc32c qede sd_mod ahci libahci t10_pi sg crct10dif_pclmul crc32_pclmul crc32c_intel qed libata tg3 ghash_clmulni_intel megaraid_sas crc8 wmi [last unloaded: ib_srpt] CPU: 0 PID: 143192 Comm: fi_rdm_tagged_p Kdump: loaded Not tainted 5.14.0-408.el9.x86_64 #1 Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 2.14.0 01/25/2022 RIP: 0010:uverbs_destroy_ufile_hw+0xcf/0xf0 [ib_uverbs] Code: 5d 41 5c 41 5d 41 5e e9 0f 26 1b dd 48 89 df e8 67 6a ff ff 49 8b 86 10 01 00 00 48 85 c0 74 9c 4c 89 e7 e8 83 c0 cb dd eb 92 <0f> 0b eb be 0f 0b be 04 00 00 00 48 89 df e8 8e f5 ff ff e9 6d ff RSP: 0018:ffffb7c6cadfbc60 EFLAGS: 00010286 RAX: ffff8f0889ee3f60 RBX: ffff8f088c1a5200 RCX: 00000000802a0016 RDX: 00000000802a0017 RSI: 0000000000000001 RDI: ffff8f0880042600 RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000 R10: ffff8f11fffd5000 R11: 0000000000039000 R12: ffff8f0d5b36cd80 R13: ffff8f088c1a5250 R14: ffff8f1206d91000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff8f11d7c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000147069200e20 CR3: 00000001c7210002 CR4: 00000000001706f0 Call Trace: <TASK> ? show_trace_log_lvl+0x1c4/0x2df ? show_trace_log_lvl+0x1c4/0x2df ? ib_uverbs_close+0x1f/0xb0 [ib_uverbs] ? uverbs_destroy_ufile_hw+0xcf/0xf0 [ib_uverbs] ? __warn+0x81/0x110 ? uverbs_destroy_ufile_hw+0xcf/0xf0 [ib_uverbs] ? report_bug+0x10a/0x140 ? handle_bug+0x3c/0x70 ? exc_invalid_op+0x14/0x70 ? asm_exc_invalid_op+0x16/0x20 ? uverbs_destroy_ufile_hw+0xcf/0xf0 [ib_uverbs] ib_uverbs_close+0x1f/0xb0 [ib_uverbs] __fput+0x94/0x250 task_work_run+0x5c/0x90 do_exit+0x270/0x4a0 do_group_exit+0x2d/0x90 get_signal+0x87c/0x8c0 arch_do_signal_or_restart+0x25/0x100 ? ib_uverbs_ioctl+0xc2/0x110 [ib_uverbs] exit_to_user_mode_loop+0x9c/0x130 exit_to_user_mode_prepare+0xb6/0x100 syscall_exit_to_user_mode+0x12/0x40 do_syscall_64+0x69/0x90 ? syscall_exit_work+0x103/0x130 ? syscall_exit_to_user_mode+0x22/0x40 ? do_syscall_64+0x69/0x90 ? syscall_exit_work+0x103/0x130 ? syscall_exit_to_user_mode+0x22/0x40 ? do_syscall_64+0x69/0x90 ? do_syscall_64+0x69/0x90 ? common_interrupt+0x43/0xa0 entry_SYSCALL_64_after_hwframe+0x72/0xdc RIP: 0033:0x1470abe3ec6b Code: Unable to access opcode bytes at RIP 0x1470abe3ec41. RSP: 002b:00007fff13ce9108 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: fffffffffffffffc RBX: 00007fff13ce9218 RCX: 00001470abe3ec6b RDX: 00007fff13ce9200 RSI: 00000000c0181b01 RDI: 0000000000000004 RBP: 00007fff13ce91e0 R08: 0000558d9655da10 R09: 0000558d9655dd00 R10: 00007fff13ce95c0 R11: 0000000000000246 R12: 00007fff13ce9358 R13: 0000000000000013 R14: 0000558d9655db50 R15: 00007fff13ce9470 </TASK> --[ end trace 888a9b92e04c5c97 ]-- Fixes: `df15856132` ("RDMA/qedr: restructure functions that create/destroy QPs") Signed-off-by: Kamal Heib <kheib@redhat.com> Link: https://lore.kernel.org/r/20240208223628.2040841-1-kheib@redhat.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-02-12 13:49:04 +02:00
Bart Van Assche	fdfa083549	RDMA/srpt: Support specifying the srpt_service_guid parameter Make loading ib_srpt with this parameter set work. The current behavior is that setting that parameter while loading the ib_srpt kernel module triggers the following kernel crash: BUG: kernel NULL pointer dereference, address: 0000000000000000 Call Trace: <TASK> parse_one+0x18c/0x1d0 parse_args+0xe1/0x230 load_module+0x8de/0xa60 init_module_from_file+0x8b/0xd0 idempotent_init_module+0x181/0x240 __x64_sys_finit_module+0x5a/0xb0 do_syscall_64+0x5f/0xe0 entry_SYSCALL_64_after_hwframe+0x6e/0x76 Cc: LiHonggang <honggangli@163.com> Reported-by: LiHonggang <honggangli@163.com> Fixes: `a42d985bd5` ("ib_srpt: Initial SRP Target merge for v3.3-rc1") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240205004207.17031-1-bvanassche@acm.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-02-05 09:57:53 +02:00
Guoqing Jiang	aafe4cc509	RDMA/rxe: Remove unused 'iova' parameter from rxe_mr_init_user This one is not needed since commit `954afc5a8f` ("RDMA/rxe: Use members of generic struct in rxe_mr"). Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20240202124144.16033-1-guoqing.jiang@linux.dev Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-02-04 13:05:43 +02:00
William Kucharski	c21a8870c9	RDMA/srpt: Do not register event handler until srpt device is fully setup Upon rare occasions, KASAN reports a use-after-free Write in srpt_refresh_port(). This seems to be because an event handler is registered before the srpt device is fully setup and a race condition upon error may leave a partially setup event handler in place. Instead, only register the event handler after srpt device initialization is complete. Fixes: `a42d985bd5` ("ib_srpt: Initial SRP Target merge for v3.3-rc1") Signed-off-by: William Kucharski <william.kucharski@oracle.com> Link: https://lore.kernel.org/r/20240202091549.991784-2-william.kucharski@oracle.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-02-04 11:44:50 +02:00
Daniel Vacek	e6f57c6881	IB/hfi1: Fix sdma.h tx->num_descs off-by-one error Unfortunately the commit `fd8958efe877` introduced another error causing the `descs` array to overflow. This reults in further crashes easily reproducible by `sendmsg` system call. [ 1080.836473] general protection fault, probably for non-canonical address 0x400300015528b00a: 0000 [#1] PREEMPT SMP PTI [ 1080.869326] RIP: 0010:hfi1_ipoib_build_ib_tx_headers.constprop.0+0xe1/0x2b0 [hfi1] -- [ 1080.974535] Call Trace: [ 1080.976990] <TASK> [ 1081.021929] hfi1_ipoib_send_dma_common+0x7a/0x2e0 [hfi1] [ 1081.027364] hfi1_ipoib_send_dma_list+0x62/0x270 [hfi1] [ 1081.032633] hfi1_ipoib_send+0x112/0x300 [hfi1] [ 1081.042001] ipoib_start_xmit+0x2a9/0x2d0 [ib_ipoib] [ 1081.046978] dev_hard_start_xmit+0xc4/0x210 -- [ 1081.148347] __sys_sendmsg+0x59/0xa0 crash> ipoib_txreq 0xffff9cfeba229f00 struct ipoib_txreq { txreq = { list = { next = 0xffff9cfeba229f00, prev = 0xffff9cfeba229f00 }, descp = 0xffff9cfeba229f40, coalesce_buf = 0x0, wait = 0xffff9cfea4e69a48, complete = 0xffffffffc0fe0760 <hfi1_ipoib_sdma_complete>, packet_len = 0x46d, tlen = 0x0, num_desc = 0x0, desc_limit = 0x6, next_descq_idx = 0x45c, coalesce_idx = 0x0, flags = 0x0, descs = {{ qw = {0x8024000120dffb00, 0x4} # SDMA_DESC0_FIRST_DESC_FLAG (bit 63) }, { qw = { 0x3800014231b108, 0x4} }, { qw = { 0x310000e4ee0fcf0, 0x8} }, { qw = { 0x3000012e9f8000, 0x8} }, { qw = { 0x59000dfb9d0000, 0x8} }, { qw = { 0x78000e02e40000, 0x8} }} }, sdma_hdr = 0x400300015528b000, <<< invalid pointer in the tx request structure sdma_status = 0x0, SDMA_DESC0_LAST_DESC_FLAG (bit 62) complete = 0x0, priv = 0x0, txq = 0xffff9cfea4e69880, skb = 0xffff9d099809f400 } If an SDMA send consists of exactly 6 descriptors and requires dword padding (in the 7th descriptor), the sdma_txreq descriptor array is not properly expanded and the packet will overflow into the container structure. This results in a panic when the send completion runs. The exact panic varies depending on what elements of the container structure get corrupted. The fix is to use the correct expression in _pad_sdma_tx_descs() to test the need to expand the descriptor array. With this patch the crashes are no longer reproducible and the machine is stable. Fixes: `fd8958efe8` ("IB/hfi1: Fix sdma.h tx->num_descs off-by-one errors") Cc: stable@vger.kernel.org Reported-by: Mats Kronberg <kronberg@nsc.liu.se> Tested-by: Mats Kronberg <kronberg@nsc.liu.se> Signed-off-by: Daniel Vacek <neelx@redhat.com> Link: https://lore.kernel.org/r/20240201081009.1109442-1-neelx@redhat.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-02-04 11:40:06 +02:00
Mustafa Ismail	926e8ea4b8	RDMA/irdma: Remove duplicate assignment Remove the unneeded assignment of the qp_num which is already set in irdma_create_qp(). Fixes: `b48c24c2d7` ("RDMA/irdma: Implement device supported verb APIs") Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Sindhu Devale <sindhu.devale@intel.com> Link: https://lore.kernel.org/r/20240131233953.400483-1-sindhu.devale@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-02-04 11:37:58 +02:00
Mustafa Ismail	630bdb6f28	RDMA/irdma: Add AE for too many RNRS Add IRDMA_AE_LLP_TOO_MANY_RNRS to the list of AE's processed as an abnormal asyncronous event. Fixes: `b48c24c2d7` ("RDMA/irdma: Implement device supported verb APIs") Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Sindhu Devale <sindhu.devale@gmail.com> Link: https://lore.kernel.org/r/20240131233849.400285-5-sindhu.devale@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-02-04 11:36:26 +02:00
Mustafa Ismail	666047f3ec	RDMA/irdma: Set the CQ read threshold for GEN 1 The CQ shadow read threshold is currently not set for GEN 2. This could cause an invalid CQ overflow condition, so remove the GEN check that exclused GEN 1. Fixes: `b48c24c2d7` ("RDMA/irdma: Implement device supported verb APIs") Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Sindhu Devale <sindhu.devale@intel.com> Link: https://lore.kernel.org/r/20240131233849.400285-4-sindhu.devale@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-02-04 11:36:26 +02:00
Shiraz Saleem	ee107186bc	RDMA/irdma: Validate max_send_wr and max_recv_wr Validate that max_send_wr and max_recv_wr is within the supported range. Fixes: `b48c24c2d7` ("RDMA/irdma: Implement device supported verb APIs") Change-Id: I2fc8b10292b641fddd20b36986a9dae90a93f4be Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Sindhu Devale <sindhu.devale@intel.com> Link: https://lore.kernel.org/r/20240131233849.400285-3-sindhu.devale@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-02-04 11:36:26 +02:00
Mike Marciniszyn	bd97cea7b1	RDMA/irdma: Fix KASAN issue with tasklet KASAN testing revealed the following issue assocated with freeing an IRQ. [50006.466686] Call Trace: [50006.466691] <IRQ> [50006.489538] dump_stack+0x5c/0x80 [50006.493475] print_address_description.constprop.6+0x1a/0x150 [50006.499872] ? irdma_sc_process_ceq+0x483/0x790 [irdma] [50006.505742] ? irdma_sc_process_ceq+0x483/0x790 [irdma] [50006.511644] kasan_report.cold.11+0x7f/0x118 [50006.516572] ? irdma_sc_process_ceq+0x483/0x790 [irdma] [50006.522473] irdma_sc_process_ceq+0x483/0x790 [irdma] [50006.528232] irdma_process_ceq+0xb2/0x400 [irdma] [50006.533601] ? irdma_hw_flush_wqes_callback+0x370/0x370 [irdma] [50006.540298] irdma_ceq_dpc+0x44/0x100 [irdma] [50006.545306] tasklet_action_common.isra.14+0x148/0x2c0 [50006.551096] __do_softirq+0x1d0/0xaf8 [50006.555396] irq_exit_rcu+0x219/0x260 [50006.559670] irq_exit+0xa/0x20 [50006.563320] smp_apic_timer_interrupt+0x1bf/0x690 [50006.568645] apic_timer_interrupt+0xf/0x20 [50006.573341] </IRQ> The issue is that a tasklet could be pending on another core racing the delete of the irq. Fix by insuring any scheduled tasklet is killed after deleting the irq. Fixes: `44d9e52977` ("RDMA/irdma: Implement device initialization definitions") Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Sindhu Devale <sindhu.devale@intel.com> Link: https://lore.kernel.org/r/20240131233849.400285-2-sindhu.devale@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-02-04 11:36:25 +02:00
Yonatan Nachum	809c9c3bd6	RDMA/efa: Limit EQs to available MSI-X vectors When creating EQs we take into consideration the max number of EQs the device reported it can support and the number of available CPUs. There are situations where the number of EQs the device reported it can support and the PCI configuration of MSI-X is different, take it in account as well when creating EQs. Also request at least 1 MSI-X vector for the management queue and allow the kernel to return a number of vectors in a range between 1 and the max supported MSI-X vectors according to the PCI config. Reviewed-by: Michael Margolin <mrgolin@amazon.com> Reviewed-by: Yonatan Goldhirsh <ygold@amazon.com> Signed-off-by: Yonatan Nachum <ynachum@amazon.com> Link: https://lore.kernel.org/r/20240131093403.18624-1-ynachum@amazon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-01-31 14:32:26 +02:00
Yishai Hadas	be551ee157	RDMA/mlx5: Relax DEVX access upon modify commands Relax DEVX access upon modify commands to be UVERBS_ACCESS_READ. The kernel doesn't need to protect what firmware protects, or what causes no damage to anyone but the user. As firmware needs to protect itself from parallel access to the same object, don't block parallel modify/query commands on the same object in the kernel side. This change will allow user space application to run parallel updates to different entries in the same bulk object. Tested-by: Tamar Mashiah <tmashiah@nvidia.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://lore.kernel.org/r/7407d5ed35dc427c1097699e12b49c01e1073406.1706433934.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-01-31 11:15:39 +02:00
Mark Zhang	43fdbd1402	IB/mlx5: Don't expose debugfs entries for RRoCE general parameters if not supported debugfs entries for RRoCE general CC parameters must be exposed only when they are supported, otherwise when accessing them there may be a syndrome error in kernel log, for example: $ cat /sys/kernel/debug/mlx5/0000:08:00.1/cc_params/rtt_resp_dscp cat: '/sys/kernel/debug/mlx5/0000:08:00.1/cc_params/rtt_resp_dscp': Invalid argument $ dmesg mlx5_core 0000:08:00.1: mlx5_cmd_out_err:805:(pid 1253): QUERY_CONG_PARAMS(0x824) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x325a82), err(-22) Fixes: `66fb1d5df6` ("IB/mlx5: Extend debug control for CC parameters") Reviewed-by: Edward Srouji <edwards@nvidia.com> Signed-off-by: Mark Zhang <markzhang@nvidia.com> Link: https://lore.kernel.org/r/e7ade70bad52b7468bdb1de4d41d5fad70c8b71c.1706433934.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-01-31 11:15:29 +02:00
Leon Romanovsky	4d5e86a566	RDMA/mlx5: Fix fortify source warning while accessing Eth segment ------------[ cut here ]------------ memcpy: detected field-spanning write (size 56) of single field "eseg->inline_hdr.start" at /var/lib/dkms/mlnx-ofed-kernel/5.8/build/drivers/infiniband/hw/mlx5/wr.c:131 (size 2) WARNING: CPU: 0 PID: 293779 at /var/lib/dkms/mlnx-ofed-kernel/5.8/build/drivers/infiniband/hw/mlx5/wr.c:131 mlx5_ib_post_send+0x191b/0x1a60 [mlx5_ib] Modules linked in: 8021q garp mrp stp llc rdma_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) mlx5_core(OE) pci_hyperv_intf mlxdevm(OE) mlx_compat(OE) tls mlxfw(OE) psample nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables libcrc32c nfnetlink mst_pciconf(OE) knem(OE) vfio_pci vfio_pci_core vfio_iommu_type1 vfio iommufd irqbypass cuse nfsv3 nfs fscache netfs xfrm_user xfrm_algo ipmi_devintf ipmi_msghandler binfmt_misc crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 snd_pcsp aesni_intel crypto_simd cryptd snd_pcm snd_timer joydev snd soundcore input_leds serio_raw evbug nfsd auth_rpcgss nfs_acl lockd grace sch_fq_codel sunrpc drm efi_pstore ip_tables x_tables autofs4 psmouse virtio_net net_failover failover floppy [last unloaded: mlx_compat(OE)] CPU: 0 PID: 293779 Comm: ssh Tainted: G OE 6.2.0-32-generic #32~22.04.1-Ubuntu Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:mlx5_ib_post_send+0x191b/0x1a60 [mlx5_ib] Code: 0c 01 00 a8 01 75 25 48 8b 75 a0 b9 02 00 00 00 48 c7 c2 10 5b fd c0 48 c7 c7 80 5b fd c0 c6 05 57 0c 03 00 01 e8 95 4d 93 da <0f> 0b 44 8b 4d b0 4c 8b 45 c8 48 8b 4d c0 e9 49 fb ff ff 41 0f b7 RSP: 0018:ffffb5b48478b570 EFLAGS: 00010046 RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffffb5b48478b628 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffffb5b48478b5e8 R13: ffff963a3c609b5e R14: ffff9639c3fbd800 R15: ffffb5b480475a80 FS: 00007fc03b444c80(0000) GS:ffff963a3dc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000556f46bdf000 CR3: 0000000006ac6003 CR4: 00000000003706f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ? show_regs+0x72/0x90 ? mlx5_ib_post_send+0x191b/0x1a60 [mlx5_ib] ? __warn+0x8d/0x160 ? mlx5_ib_post_send+0x191b/0x1a60 [mlx5_ib] ? report_bug+0x1bb/0x1d0 ? handle_bug+0x46/0x90 ? exc_invalid_op+0x19/0x80 ? asm_exc_invalid_op+0x1b/0x20 ? mlx5_ib_post_send+0x191b/0x1a60 [mlx5_ib] mlx5_ib_post_send_nodrain+0xb/0x20 [mlx5_ib] ipoib_send+0x2ec/0x770 [ib_ipoib] ipoib_start_xmit+0x5a0/0x770 [ib_ipoib] dev_hard_start_xmit+0x8e/0x1e0 ? validate_xmit_skb_list+0x4d/0x80 sch_direct_xmit+0x116/0x3a0 __dev_xmit_skb+0x1fd/0x580 __dev_queue_xmit+0x284/0x6b0 ? _raw_spin_unlock_irq+0xe/0x50 ? __flush_work.isra.0+0x20d/0x370 ? push_pseudo_header+0x17/0x40 [ib_ipoib] neigh_connected_output+0xcd/0x110 ip_finish_output2+0x179/0x480 ? __smp_call_single_queue+0x61/0xa0 __ip_finish_output+0xc3/0x190 ip_finish_output+0x2e/0xf0 ip_output+0x78/0x110 ? __pfx_ip_finish_output+0x10/0x10 ip_local_out+0x64/0x70 __ip_queue_xmit+0x18a/0x460 ip_queue_xmit+0x15/0x30 __tcp_transmit_skb+0x914/0x9c0 tcp_write_xmit+0x334/0x8d0 tcp_push_one+0x3c/0x60 tcp_sendmsg_locked+0x2e1/0xac0 tcp_sendmsg+0x2d/0x50 inet_sendmsg+0x43/0x90 sock_sendmsg+0x68/0x80 sock_write_iter+0x93/0x100 vfs_write+0x326/0x3c0 ksys_write+0xbd/0xf0 ? do_syscall_64+0x69/0x90 __x64_sys_write+0x19/0x30 do_syscall_64+0x59/0x90 ? do_user_addr_fault+0x1d0/0x640 ? exit_to_user_mode_prepare+0x3b/0xd0 ? irqentry_exit_to_user_mode+0x9/0x20 ? irqentry_exit+0x43/0x50 ? exc_page_fault+0x92/0x1b0 entry_SYSCALL_64_after_hwframe+0x72/0xdc RIP: 0033:0x7fc03ad14a37 Code: 10 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24 RSP: 002b:00007ffdf8697fe8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000008024 RCX: 00007fc03ad14a37 RDX: 0000000000008024 RSI: 0000556f46bd8270 RDI: 0000000000000003 RBP: 0000556f46bb1800 R08: 0000000000007fe3 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002 R13: 0000556f46bc66b0 R14: 000000000000000a R15: 0000556f46bb2f50 </TASK> ---[ end trace 0000000000000000 ]--- Link: https://lore.kernel.org/r/8228ad34bd1a25047586270f7b1fb4ddcd046282.1706433934.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2024-01-31 11:15:17 +02:00
Alexey Dobriyan	a400073ce3	RDMA/mlx5: Delete unused mlx5_ib_copy_pas prototype mlx5_ib_copy_pas() doesn't exist anymore. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Link: https://lore.kernel.org/r/a2cb861e-d11e-4567-8a73-73763d1dc199@p183 Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-01-25 12:06:57 +02:00
Alexey Dobriyan	9bd5653f76	RDMA/cxgb4: Delete unused c4iw_ep_redirect prototype c4iw_ep_redirect() doesn't exist. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Link: https://lore.kernel.org/r/a67881e7-469b-43d9-9973-ad7579eb2c4b@p183 Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-01-25 12:05:30 +02:00
Kalesh AP	80dde187f7	RDMA/bnxt_re: Add a missing check in bnxt_qplib_query_srq Before populating the response, driver has to check the status of HWRM command. Fixes: `37cb11acf1` ("RDMA/bnxt_re: Add SRQ support for Broadcom adapters") Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1705985677-15551-6-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-01-25 12:04:39 +02:00
Kalesh AP	3687b450c5	RDMA/bnxt_re: Return error for SRQ resize SRQ resize is not supported in the driver. But driver is not returning error from bnxt_re_modify_srq() for SRQ resize. Fixes: `37cb11acf1` ("RDMA/bnxt_re: Add SRQ support for Broadcom adapters") Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1705985677-15551-5-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-01-25 12:04:39 +02:00
Kalesh AP	8eaca6b599	RDMA/bnxt_re: Fix unconditional fence for newer adapters Older adapters required an unconditional fence for non-wire memory operations. Newer adapters doesn't require this and therefore, disabling the unconditional fence. Fixes: `1801d87b35` ("RDMA/bnxt_re: Support new 5760X P7 devices") Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1705985677-15551-4-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-01-25 12:04:39 +02:00
Kalesh AP	8fcbf0a55f	RDMA/bnxt_re: Remove a redundant check inside bnxt_re_vf_res_config After the cited commit, there is no possibility that this check can return true. Remove it. Fixes: `a43c26fa2e` ("RDMA/bnxt_re: Remove the sriov config callback") Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1705985677-15551-3-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-01-25 12:04:39 +02:00
Kalesh AP	282fd66e2e	RDMA/bnxt_re: Avoid creating fence MR for newer adapters Limit the usage of fence MR to adapters older than Gen P5 products. Fixes: `1801d87b35` ("RDMA/bnxt_re: Support new 5760X P7 devices") Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1705985677-15551-2-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-01-25 12:04:39 +02:00
Konstantin Taranov	2a31c5a7e0	RDMA/mana_ib: Introduce mana_ib_install_cq_cb helper function Use a helper function to install callbacks to CQs. This patch removes code repetition. Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://lore.kernel.org/r/1705965781-3235-4-git-send-email-kotaranov@linux.microsoft.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-01-25 12:03:14 +02:00
Konstantin Taranov	3b73eb3a4a	RDMA/mana_ib: Introduce mana_ib_get_netdev helper function Use a helper function to access netdevs using a port number. This patch removes code repetitions as well as removes the need to explicitly use gdma_dev, which was error-prone. Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://lore.kernel.org/r/1705965781-3235-3-git-send-email-kotaranov@linux.microsoft.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-01-25 12:03:07 +02:00
Konstantin Taranov	71c8cbfcdc	RDMA/mana_ib: Introduce mdev_to_gc helper function Use a helper function to access gdma_context from mana_ib_dev. This patch removes code repetitions as well as removes the need to explicitly use gdma_dev, which was error-prone. Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://lore.kernel.org/r/1705965781-3235-2-git-send-email-kotaranov@linux.microsoft.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-01-25 12:03:00 +02:00
Zhipeng Lu	809aa64ebf	IB/hfi1: Fix a memleak in init_credit_return When dma_alloc_coherent fails to allocate dd->cr_base[i].va, init_credit_return should deallocate dd->cr_base and dd->cr_base[i] that allocated before. Or those resources would be never freed and a memleak is triggered. Fixes: `7724105686` ("IB/hfi1: add driver files") Signed-off-by: Zhipeng Lu <alexious@zju.edu.cn> Link: https://lore.kernel.org/r/20240112085523.3731720-1-alexious@zju.edu.cn Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-01-25 11:56:27 +02:00
Yunsheng Lin	c00743cbf2	RDMA/hns: Simplify 'struct hns_roce_hem' allocation 'struct hns_roce_hem' is used to refer to the last level of dma buffer managed by the hw, pointed by a single BA(base address) in the previous level of BT(base table), so the dma buffer in 'struct hns_roce_hem' must be contiguous. Right now the size of dma buffer in 'struct hns_roce_hem' is decided by mhop->buf_chunk_size in get_hem_table_config(), which ensure the mhop->buf_chunk_size is power of two of PAGE_SIZE, so there will be only one contiguous dma buffer allocated in hns_roce_alloc_hem(), which means hem->chunk_list and chunk->mem for linking multi dma buffers is unnecessary. This patch removes the hem->chunk_list and chunk->mem and other related macro and function accordingly. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20240113085935.2838701-7-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-01-25 11:54:38 +02:00
Chengchang Tang	2eb999b3d4	RDMA/hns: Support adaptive PBL hopnum In the current implementation, a fixed addressing level is used for PBL. But in fact, the necessary addressing level is related to page size and the size of MR. This patch calculates the addressing level according to page size and the size of MR, and uses the addressing level to configure the PBL. Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20240113085935.2838701-6-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-01-25 11:54:38 +02:00
Chengchang Tang	0ff6c9779a	RDMA/hns: Support flexible umem page size In the current implementation, a fixed page size is used to configure the umem PBL, which is not flexible enough and is not conducive to the performance of the HW. Find a best page size to get better performance. Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20240113085935.2838701-5-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-01-25 11:54:38 +02:00
Chengchang Tang	6afc859518	RDMA/hns: Alloc MTR memory before alloc_mtt() MTR memory allocation do not depend on allocation of mtt. This patch moves the allocation of mtr before mtt in preparation for the following optimization. Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20240113085935.2838701-4-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-01-25 11:54:38 +02:00
Chengchang Tang	4f5731b1fb	RDMA/hns: Refactor mtr_init_buf_cfg() page_shift and page_cnt is only used in mtr_map_bufs(). And these parameter could be calculated indepedently. Strip the computation of page_shift and page_cnt from mtr_init_buf_cfg(), reducing the number of parameters of it. This helps reducing coupling between mtr_init_buf_cfg() and mtr_map_bufs(). Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20240113085935.2838701-3-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-01-25 11:54:38 +02:00
Chengchang Tang	a4ca341080	RDMA/hns: Refactor mtr find hns_roce_mtr_find() is a collection of multiple functions, and the return value is also difficult to understand, which is not conducive to modification and maintenance. Separate the function of obtaining MTR root BA from this function. And some adjustments has been made to improve readability. Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20240113085935.2838701-2-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-01-25 11:54:38 +02:00
Christian Heusel	64854534ff	RDMA/ipoib: Print symbolic error name instead of error code Utilize the %pe print specifier to get the symbolic error name as a string (i.e "-ENOMEM") in the log message instead of the error code to increase its readability. This change was suggested in https://lore.kernel.org/all/92972476-0b1f-4d0a-9951-af3fc8bc6e65@suswa.mountain/ Signed-off-by: Christian Heusel <christian@heusel.eu> Link: https://lore.kernel.org/r/20240111141311.987098-1-christian@heusel.eu Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-01-25 11:50:58 +02:00
Li Zhijian	190a7affee	RDMA/rxe: Remove rxe_info from rxe_set_mtu commit `9ac01f434a` ("RDMA/rxe: Extend dbg log messages to err and info") newly added this info. But it did only show null device when the rdma_rxe is being loaded because dev_name(rxe->ib_dev->dev) has not yet been assigned at the moment: "(null): rxe_set_mtu: Set mtu to 1024" Remove it to silent this message, check the mtu from it backend link instead if needed. CC: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Link: https://lore.kernel.org/r/20240109083253.3629967-2-lizhijian@fujitsu.com Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-01-25 11:49:50 +02:00
Li Zhijian	6482718086	RDMA/rxe: Improve newline in printing messages Previously rxe_{dbg,info,err}() macros are appended built-in newline, but some users will add redundant newline sometimes. So remove the built-in newline for these macros. In terms of rxe_{dbg,info,err}_xxx() macros, because they don't have built-in newline, append newline when using them. CC: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Reviewed-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Link: https://lore.kernel.org/r/20240109083253.3629967-1-lizhijian@fujitsu.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-01-25 11:49:50 +02:00
Randy Dunlap	4c09f7405e	IB/hfi1: fix spellos and kernel-doc Fix spelling mistakes as reported by codespell. Fix kernel-doc warnings. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Leon Romanovsky <leonro@nvidia.com> Cc: linux-rdma@vger.kernel.org Link: https://lore.kernel.org/r/20240111042754.17530-1-rdunlap@infradead.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-01-25 11:48:50 +02:00
Linus Torvalds	bf9ca811bb	RDMA v6.8 merge window Small cycle, with some typical driver updates - General code tidying in siw, hfi1, idrdma, usnic, hns rtrs and bnxt_re - Many small siw cleanups without an overeaching theme - Debugfs stats for hns - Fix a TX queue timeout in IPoIB and missed locking of the mcast list - Support more features of P7 devices in bnxt_re including a new work submission protocol - CQ interrupts for MANA - netlink stats for erdma - EFA multipath PCI support - Fix Incorrect MR invalidation in iser -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZaCQDQAKCRCFwuHvBreF YemEAQCTGebv0k2hbocDOmKml5awt8j9aDJX3aO7Zpfi0AYUtwEAzk+kgN4yAo+B Vinvpu171zry+QvmGJsXv2mtZkXH6QY= =HT3p -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull rdma updates from Jason Gunthorpe: "Small cycle, with some typical driver updates: - General code tidying in siw, hfi1, idrdma, usnic, hns rtrs and bnxt_re - Many small siw cleanups without an overeaching theme - Debugfs stats for hns - Fix a TX queue timeout in IPoIB and missed locking of the mcast list - Support more features of P7 devices in bnxt_re including a new work submission protocol - CQ interrupts for MANA - netlink stats for erdma - EFA multipath PCI support - Fix Incorrect MR invalidation in iser" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (66 commits) RDMA/bnxt_re: Fix error code in bnxt_re_create_cq() RDMA/efa: Add EFA query MR support IB/iser: Prevent invalidating wrong MR RDMA/erdma: Add hardware statistics support RDMA/erdma: Introduce dma pool for hardware responses of CMDQ requests IB/iser: iscsi_iser.h: fix kernel-doc warning and spellos RDMA/mana_ib: Add CQ interrupt support for RAW QP RDMA/mana_ib: query device capabilities RDMA/mana_ib: register RDMA device with GDMA RDMA/bnxt_re: Fix the sparse warnings RDMA/bnxt_re: Fix the offset for GenP7 adapters for user applications RDMA/bnxt_re: Share a page to expose per CQ info with userspace RDMA/bnxt_re: Add UAPI to share a page with user space IB/ipoib: Fix mcast list locking RDMA/mlx5: Expose register c0 for RDMA device net/mlx5: E-Switch, expose eswitch manager vport net/mlx5: Manage ICM type of SW encap RDMA/mlx5: Support handling of SW encap ICM area net/mlx5: Introduce indirect-sw-encap ICM properties RDMA/bnxt_re: Adds MSN table capability for Gen P7 adapters ...	2024-01-12 13:52:21 -08:00
Linus Torvalds	ab5f3fcb7c	arm64 updates for 6.8 * for-next/cpufeature - Remove ARM64_HAS_NO_HW_PREFETCH copy_page() optimisation for ye olde Thunder-X machines. - Avoid mapping KPTI trampoline when it is not required. - Make CPU capability API more robust during early initialisation. * for-next/early-idreg-overrides - Remove dependencies on core kernel helpers from the early command-line parsing logic in preparation for moving this code before the kernel is mapped. * for-next/fpsimd - Restore kernel-mode fpsimd context lazily, allowing us to run fpsimd code sequences in the kernel with pre-emption enabled. * for-next/kbuild - Install 'vmlinuz.efi' when CONFIG_EFI_ZBOOT=y. - Makefile cleanups. * for-next/lpa2-prep - Preparatory work for enabling the 'LPA2' extension, which will introduce 52-bit virtual and physical addressing even with 4KiB pages (including for KVM guests). * for-next/misc - Remove dead code and fix a typo. * for-next/mm - Pass NUMA node information for IRQ stack allocations. * for-next/perf - Add perf support for the Synopsys DesignWare PCIe PMU. - Add support for event counting thresholds (FEAT_PMUv3_TH) introduced in Armv8.8. - Add support for i.MX8DXL SoCs to the IMX DDR PMU driver. - Minor PMU driver fixes and optimisations. * for-next/rip-vpipt - Remove what support we had for the obsolete VPIPT I-cache policy. * for-next/selftests - Improvements to the SVE and SME selftests. * for-next/stacktrace - Refactor kernel unwind logic so that it can used by BPF unwinding and, eventually, reliable backtracing. * for-next/sysregs - Update a bunch of register definitions based on the latest XML drop from Arm. -----BEGIN PGP SIGNATURE----- iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmWWvKYQHHdpbGxAa2Vy bmVsLm9yZwAKCRC3rHDchMFjNIiTB/9agZBkEhZjP2sNDGyE4UFwawweWHkt2r8h WyvdwP91Z/AIsYSsGYu36J0l4pOnMKp/i6t+rt031SK4j+Q8hJYhSfDt3RvVbc0/ Pz9D18V6cLrfq+Yxycqq9ufVdjs+m+CQ5WeLaRGmNIyEzJ/Jv/qrAN+2r603EeLP nq08qMZhDIQd2ZzbigCnGaNrTsVSafFfBFv1GsgDvnMZAjs1G6457A6zu+NatNUc +TMSG+3EawutHZZ2noXl0Ra7VOfIbVZFiUssxRPenKQByHHHR+QB2c/O1blri+dm XLMutvqO2/WvYGIfXO5koqZqvpVeR3zXxPwmGi5hQBsmOjtXzKd+ =U4mo -----END PGP SIGNATURE----- Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Will Deacon: "CPU features: - Remove ARM64_HAS_NO_HW_PREFETCH copy_page() optimisation for ye olde Thunder-X machines - Avoid mapping KPTI trampoline when it is not required - Make CPU capability API more robust during early initialisation Early idreg overrides: - Remove dependencies on core kernel helpers from the early command-line parsing logic in preparation for moving this code before the kernel is mapped FPsimd: - Restore kernel-mode fpsimd context lazily, allowing us to run fpsimd code sequences in the kernel with pre-emption enabled KBuild: - Install 'vmlinuz.efi' when CONFIG_EFI_ZBOOT=y - Makefile cleanups LPA2 prep: - Preparatory work for enabling the 'LPA2' extension, which will introduce 52-bit virtual and physical addressing even with 4KiB pages (including for KVM guests). Misc: - Remove dead code and fix a typo MM: - Pass NUMA node information for IRQ stack allocations Perf: - Add perf support for the Synopsys DesignWare PCIe PMU - Add support for event counting thresholds (FEAT_PMUv3_TH) introduced in Armv8.8 - Add support for i.MX8DXL SoCs to the IMX DDR PMU driver. - Minor PMU driver fixes and optimisations RIP VPIPT: - Remove what support we had for the obsolete VPIPT I-cache policy Selftests: - Improvements to the SVE and SME selftests Stacktrace: - Refactor kernel unwind logic so that it can used by BPF unwinding and, eventually, reliable backtracing Sysregs: - Update a bunch of register definitions based on the latest XML drop from Arm" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (87 commits) kselftest/arm64: Don't probe the current VL for unsupported vector types efi/libstub: zboot: do not use $(shell ...) in cmd_copy_and_pad arm64: properly install vmlinuz.efi arm64/sysreg: Add missing system instruction definitions for FGT arm64/sysreg: Add missing system register definitions for FGT arm64/sysreg: Add missing ExtTrcBuff field definition to ID_AA64DFR0_EL1 arm64/sysreg: Add missing Pauth_LR field definitions to ID_AA64ISAR1_EL1 arm64: memory: remove duplicated include arm: perf: Fix ARCH=arm build with GCC arm64: Align boot cpucap handling with system cpucap handling arm64: Cleanup system cpucap handling MAINTAINERS: add maintainers for DesignWare PCIe PMU driver drivers/perf: add DesignWare PCIe PMU driver PCI: Move pci_clear_and_set_dword() helper to PCI header PCI: Add Alibaba Vendor ID to linux/pci_ids.h docs: perf: Add description for Synopsys DesignWare PCIe PMU driver arm64: irq: set the correct node for shadow call stack Revert "perf/arm_dmc620: Remove duplicate format attribute #defines" arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD arm64: fpsimd: Preserve/restore kernel mode NEON at context switch ...	2024-01-08 16:32:09 -08:00
Linus Torvalds	c604110e66	vfs-6.8.misc -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZZUxRQAKCRCRxhvAZXjc ov/QAQDzvge3oQ9MEymmOiyzzcF+HhAXBr+9oEsYJjFc1p0TsgEA61gXjZo7F1jY KBqd6znOZCR+Waj0kIVJRAo/ISRBqQc= =0bRl -----END PGP SIGNATURE----- Merge tag 'vfs-6.8.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "This contains the usual miscellaneous features, cleanups, and fixes for vfs and individual fses. Features: - Add Jan Kara as VFS reviewer - Show correct device and inode numbers in proc/<pid>/maps for vma files on stacked filesystems. This is now easily doable thanks to the backing file work from the last cycles. This comes with selftests Cleanups: - Remove a redundant might_sleep() from wait_on_inode() - Initialize pointer with NULL, not 0 - Clarify comment on access_override_creds() - Rework and simplify eventfd_signal() and eventfd_signal_mask() helpers - Process aio completions in batches to avoid needless wakeups - Completely decouple struct mnt_idmap from namespaces. We now only keep the actual idmapping around and don't stash references to namespaces - Reformat maintainer entries to indicate that a given subsystem belongs to fs/ - Simplify fput() for files that were never opened - Get rid of various pointless file helpers - Rename various file helpers - Rename struct file members after SLAB_TYPESAFE_BY_RCU switch from last cycle - Make relatime_need_update() return bool - Use GFP_KERNEL instead of GFP_USER when allocating superblocks - Replace deprecated ida_simple_() calls with their current ida_() counterparts Fixes: - Fix comments on user namespace id mapping helpers. They aren't kernel doc comments so they shouldn't be using /** - s/Retuns/Returns/g in various places - Add missing parameter documentation on can_move_mount_beneath() - Rename i_mapping->private_data to i_mapping->i_private_data - Fix a false-positive lockdep warning in pipe_write() for watch queues - Improve __fget_files_rcu() code generation to improve performance - Only notify writer that pipe resizing has finished after setting pipe->max_usage otherwise writers are never notified that the pipe has been resized and hang - Fix some kernel docs in hfsplus - s/passs/pass/g in various places - Fix kernel docs in ntfs - Fix kcalloc() arguments order reported by gcc 14 - Fix uninitialized value in reiserfs" * tag 'vfs-6.8.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (36 commits) reiserfs: fix uninit-value in comp_keys watch_queue: fix kcalloc() arguments order ntfs: dir.c: fix kernel-doc function parameter warnings fs: fix doc comment typo fs tree wide selftests/overlayfs: verify device and inode numbers in /proc/pid/maps fs/proc: show correct device and inode numbers in /proc/pid/maps eventfd: Remove usage of the deprecated ida_simple_xx() API fs: super: use GFP_KERNEL instead of GFP_USER for super block allocation fs/hfsplus: wrapper.c: fix kernel-doc warnings fs: add Jan Kara as reviewer fs/inode: Make relatime_need_update return bool pipe: wakeup wr_wait after setting max_usage file: remove __receive_fd() file: stop exposing receive_fd_user() fs: replace f_rcuhead with f_task_work file: remove pointless wrapper file: s/close_fd_get_file()/file_close_fd()/g Improve __fget_files_rcu() code generation (and thus __fget_light()) file: massage cleanup of files that failed to open fs/pipe: Fix lockdep false-positive in watchqueue pipe_write() ...	2024-01-08 10:26:08 -08:00
Dan Carpenter	d24b923f1d	RDMA/bnxt_re: Fix error code in bnxt_re_create_cq() Return -ENOMEM if get_zeroed_page() fails. Don't return success. Fixes: `e275919d96` ("RDMA/bnxt_re: Share a page to expose per CQ info with userspace") Link: https://lore.kernel.org/r/d714306e-b7d7-4e89-b973-a9ff0f260c78@moroto.mountain Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Acked-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-01-08 11:41:49 +02:00
Michael Margolin	2307157c85	RDMA/efa: Add EFA query MR support Add EFA driver uapi definitions and register a new query MR method that currently returns the physical interconnects the device is using to reach the MR. Update admin definitions and efa-abi accordingly. Reviewed-by: Anas Mousa <anasmous@amazon.com> Reviewed-by: Firas Jahjah <firasj@amazon.com> Signed-off-by: Michael Margolin <mrgolin@amazon.com> Link: https://lore.kernel.org/r/20240104095155.10676-1-mrgolin@amazon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2024-01-07 12:02:27 +02:00
Sergey Gorenko	2f1888281e	IB/iser: Prevent invalidating wrong MR The iser_reg_resources structure has two pointers to MR but only one mr_valid field. The implementation assumes that we use only sig_mr when pi_enable is true. Otherwise, we use only mr. However, it is only sometimes correct. Read commands without protection information occur even when pi_enble is true. For example, the following SCSI commands have a Data-In buffer but never have protection information: READ CAPACITY (16), INQUIRY, MODE SENSE(6), MAINTENANCE IN. So, we use sig_mr for some SCSI commands and mr for the other SCSI commands. In most cases, it works fine because the remote invalidation is applied. However, there are two cases when the remote invalidation is not applicable. 1. Small write commands when all data is sent as an immediate. 2. The target does not support the remote invalidation feature. The lazy invalidation is used if the remote invalidation is impossible. Since, at the lazy invalidation, we always invalidate the MR we want to use, the wrong MR may be invalidated. To fix the issue, we need a field per MR that indicates the MR needs invalidation. Since the ib_mr structure already has such a field, let's use ib_mr.need_inval instead of iser_reg_resources.mr_valid. Fixes: `b76a439982` ("IB/iser: Use IB_WR_REG_MR_INTEGRITY for PI handover") Link: https://lore.kernel.org/r/20231219072311.40989-1-sergeygo@nvidia.com Acked-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Sergey Gorenko <sergeygo@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2024-01-04 16:37:03 -04:00
Cheng Xu	63a43a675c	RDMA/erdma: Add hardware statistics support First, we add a new command to query hardware statistics, and then implement two functions: ib_device_ops.alloc_hw_port_stats and ib_device_ops.get_hw_stats to allow rdma tool can get the statistics of erdma device. Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Link: https://lore.kernel.org/r/20231227084800.99091-3-chengyou@linux.alibaba.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-12-30 17:23:17 +02:00
Cheng Xu	68cf9d82f7	RDMA/erdma: Introduce dma pool for hardware responses of CMDQ requests Hardware response, such as the result of query statistics, may be too long to be directly accommodated within the CQE structure. To address this, we introduce a DMA pool to hold the hardware's responses of CMDQ requests. Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Link: https://lore.kernel.org/r/20231227084800.99091-2-chengyou@linux.alibaba.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-12-30 17:23:17 +02:00
Randy Dunlap	d42fafb895	IB/iser: iscsi_iser.h: fix kernel-doc warning and spellos Drop one kernel-doc comment to prevent a warning: iscsi_iser.h:313: warning: Excess struct member 'mr' description in 'iser_device' and spell 2 words correctly (buffer and deferred). Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Max Gurtovoy <mgurtovoy@nvidia.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Leon Romanovsky <leonro@nvidia.com> Cc: linux-rdma@vger.kernel.org Link: https://lore.kernel.org/r/20231222234623.25231-1-rdunlap@infradead.org Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Acked-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-12-26 10:40:46 +02:00
Long Li	c15d7802a4	RDMA/mana_ib: Add CQ interrupt support for RAW QP At probing time, the MANA core code allocates EQs for supporting interrupts on Ethernet queues. The same interrupt mechanisum is used by RAW QP. Use the same EQs for delivering interrupts on the CQ for the RAW QP. Signed-off-by: Long Li <longli@microsoft.com> Link: https://lore.kernel.org/r/1702692255-23640-4-git-send-email-longli@linuxonhyperv.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-12-20 10:29:03 +02:00
Long Li	2c20e20b22	RDMA/mana_ib: query device capabilities With RDMA device registered, use it to query on hardware capabilities and cache this information for future query requests to the driver. Signed-off-by: Long Li <longli@microsoft.com> Link: https://lore.kernel.org/r/1702692255-23640-3-git-send-email-longli@linuxonhyperv.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-12-20 10:25:40 +02:00
Long Li	a7f0636d22	RDMA/mana_ib: register RDMA device with GDMA Software client needs to register with the RDMA management interface on the SoC to access more features, including querying device capabilities and RC queue pair. Signed-off-by: Long Li <longli@microsoft.com> Link: https://lore.kernel.org/r/1702692255-23640-2-git-send-email-longli@linuxonhyperv.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-12-20 10:25:40 +02:00
Selvin Xavier	82a8903a9f	RDMA/bnxt_re: Fix the sparse warnings Fix the following warnings reported drivers/infiniband/hw/bnxt_re/qplib_rcfw.c:909:27: warning: invalid assignment: \|= drivers/infiniband/hw/bnxt_re/qplib_rcfw.c:909:27: left side has type restricted __le16 drivers/infiniband/hw/bnxt_re/qplib_rcfw.c:909:27: right side has type unsigned long ... drivers/infiniband/hw/bnxt_re/qplib_fp.c:1620:44: warning: invalid assignment: \|= drivers/infiniband/hw/bnxt_re/qplib_fp.c:1620:44: left side has type restricted __le64 drivers/infiniband/hw/bnxt_re/qplib_fp.c:1620:44: right side has type unsigned long long Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202312200537.HoNqPL5L-lkp@intel.com/ Fixes: `07f830ae49` ("RDMA/bnxt_re: Adds MSN table capability for Gen P7 adapters") Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1703046717-8914-1-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-12-20 10:10:56 +02:00
Selvin Xavier	9248f363d0	RDMA/bnxt_re: Fix the offset for GenP7 adapters for user applications User Doorbell page indexes start at an offset for GenP7 adapters. Fix the offset that will be used for user doorbell page indexes. Fixes: `a62d685814` ("RDMA/bnxt_re: Update the BAR offsets") Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1702987900-5363-1-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-12-20 09:45:45 +02:00
Selvin Xavier	e275919d96	RDMA/bnxt_re: Share a page to expose per CQ info with userspace Gen P7 adapters needs to share a toggle bits information received in kernel driver with the user space. User space needs this info during the request notify call back to arm the CQ. User space application can get this page using the UAPI routines. Library will mmap this page and get the toggle bits to be used in the next ARM Doorbell. Uses a hash list to map the CQ structure from the CQ ID. CQ structure is retrieved from the hash list while the library calls the UAPI routine to get the toggle page mapping. Currently the full page is mapped per CQ. This can be optimized to enable multiple CQs from the same application share the same page and different offsets in the page. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1702535484-26844-3-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-12-17 15:35:58 +02:00
Selvin Xavier	9b0a7a2cb8	RDMA/bnxt_re: Add UAPI to share a page with user space Gen P7 adapters require to share a toggle value for CQ and SRQ. This is received by the driver as part of interrupt notifications and needs to be shared with the user space. Add a new UAPI infrastructure to get the shared page for CQ and SRQ. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1702535484-26844-2-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-12-17 15:35:58 +02:00
Shuai Xue	ad6534c626	PCI: Add Alibaba Vendor ID to linux/pci_ids.h The Alibaba Vendor ID (0x1ded) is now used by Alibaba elasticRDMA ("erdma") and will be shared with the upcoming PCIe PMU ("dwc_pcie_pmu"). Move the Vendor ID to linux/pci_ids.h so that it can shared by several drivers later. Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> # pci_ids.h Tested-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Link: https://lore.kernel.org/r/20231208025652.87192-3-xueshuai@linux.alibaba.com Signed-off-by: Will Deacon <will@kernel.org>	2023-12-13 13:35:41 +00:00
Daniel Vacek	4f973e211b	IB/ipoib: Fix mcast list locking Releasing the `priv->lock` while iterating the `priv->multicast_list` in `ipoib_mcast_join_task()` opens a window for `ipoib_mcast_dev_flush()` to remove the items while in the middle of iteration. If the mcast is removed while the lock was dropped, the for loop spins forever resulting in a hard lockup (as was reported on RHEL 4.18.0-372.75.1.el8_6 kernel): Task A (kworker/u72:2 below) \| Task B (kworker/u72:0 below) -----------------------------------+----------------------------------- ipoib_mcast_join_task(work) \| ipoib_ib_dev_flush_light(work) spin_lock_irq(&priv->lock) \| __ipoib_ib_dev_flush(priv, ...) list_for_each_entry(mcast, \| ipoib_mcast_dev_flush(dev = priv->dev) &priv->multicast_list, list) \| ipoib_mcast_join(dev, mcast) \| spin_unlock_irq(&priv->lock) \| \| spin_lock_irqsave(&priv->lock, flags) \| list_for_each_entry_safe(mcast, tmcast, \| &priv->multicast_list, list) \| list_del(&mcast->list); \| list_add_tail(&mcast->list, &remove_list) \| spin_unlock_irqrestore(&priv->lock, flags) spin_lock_irq(&priv->lock) \| \| ipoib_mcast_remove_list(&remove_list) (Here, `mcast` is no longer on the \| list_for_each_entry_safe(mcast, tmcast, `priv->multicast_list` and we keep \| remove_list, list) spinning on the `remove_list` of \| >>> wait_for_completion(&mcast->done) the other thread which is blocked \| and the list is still valid on \| it's stack.) Fix this by keeping the lock held and changing to GFP_ATOMIC to prevent eventual sleeps. Unfortunately we could not reproduce the lockup and confirm this fix but based on the code review I think this fix should address such lockups. crash> bc 31 PID: 747 TASK: ff1c6a1a007e8000 CPU: 31 COMMAND: "kworker/u72:2" -- [exception RIP: ipoib_mcast_join_task+0x1b1] RIP: ffffffffc0944ac1 RSP: ff646f199a8c7e00 RFLAGS: 00000002 RAX: 0000000000000000 RBX: ff1c6a1a04dc82f8 RCX: 0000000000000000 work (&priv->mcast_task{,.work}) RDX: ff1c6a192d60ac68 RSI: 0000000000000286 RDI: ff1c6a1a04dc8000 &mcast->list RBP: ff646f199a8c7e90 R8: ff1c699980019420 R9: ff1c6a1920c9a000 R10: ff646f199a8c7e00 R11: ff1c6a191a7d9800 R12: ff1c6a192d60ac00 mcast R13: ff1c6a1d82200000 R14: ff1c6a1a04dc8000 R15: ff1c6a1a04dc82d8 dev priv (&priv->lock) &priv->multicast_list (aka head) ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 --- <NMI exception stack> --- #5 [ff646f199a8c7e00] ipoib_mcast_join_task+0x1b1 at ffffffffc0944ac1 [ib_ipoib] #6 [ff646f199a8c7e98] process_one_work+0x1a7 at ffffffff9bf10967 crash> rx ff646f199a8c7e68 ff646f199a8c7e68: ff1c6a1a04dc82f8 <<< work = &priv->mcast_task.work crash> list -hO ipoib_dev_priv.multicast_list ff1c6a1a04dc8000 (empty) crash> ipoib_dev_priv.mcast_task.work.func,mcast_mutex.owner.counter ff1c6a1a04dc8000 mcast_task.work.func = 0xffffffffc0944910 <ipoib_mcast_join_task>, mcast_mutex.owner.counter = 0xff1c69998efec000 crash> b 8 PID: 8 TASK: ff1c69998efec000 CPU: 33 COMMAND: "kworker/u72:0" -- #3 [ff646f1980153d50] wait_for_completion+0x96 at ffffffff9c7d7646 #4 [ff646f1980153d90] ipoib_mcast_remove_list+0x56 at ffffffffc0944dc6 [ib_ipoib] #5 [ff646f1980153de8] ipoib_mcast_dev_flush+0x1a7 at ffffffffc09455a7 [ib_ipoib] #6 [ff646f1980153e58] __ipoib_ib_dev_flush+0x1a4 at ffffffffc09431a4 [ib_ipoib] #7 [ff646f1980153e98] process_one_work+0x1a7 at ffffffff9bf10967 crash> rx ff646f1980153e68 ff646f1980153e68: ff1c6a1a04dc83f0 <<< work = &priv->flush_light crash> ipoib_dev_priv.flush_light.func,broadcast ff1c6a1a04dc8000 flush_light.func = 0xffffffffc0943820 <ipoib_ib_dev_flush_light>, broadcast = 0x0, The mcast(s) on the `remove_list` (the remaining part of the ex `priv->multicast_list`): crash> list -s ipoib_mcast.done.done ipoib_mcast.list -H ff646f1980153e10 \| paste - - ff1c6a192bd0c200 done.done = 0x0, ff1c6a192d60ac00 done.done = 0x0, Reported-by: Yuya Fujita-bishamonten <fj-lsoft-rh-driver@dl.jp.fujitsu.com> Signed-off-by: Daniel Vacek <neelx@redhat.com> Link: https://lore.kernel.org/all/20231212080746.1528802-1-neelx@redhat.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-12-12 10:28:39 +02:00
Leon Romanovsky	afcda192db	Expose c0 and SW encap ICM for RDMA These two series from Mark and Shun extend RDMA mlx5 API. Mark's series provides c0 register used to match egress traffic sent by local device. Shun's series adds new type for ICM area. Link: https://lore.kernel.org/all/cover.1701871118.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-12-12 09:04:59 +02:00
Mark Bloch	d727d27db5	RDMA/mlx5: Expose register c0 for RDMA device This patch introduces improvements for matching egress traffic sent by the local device. When applicable, all egress traffic from the local vport is now tagged with the provided value. This enhancement is particularly useful for FDB steering purposes. The primary focus of this update is facilitating the transmission of traffic from the hypervisor to a VF. To achieve this, one must initiate an SQ on the hypervisor and subsequently create a rule in the FDB that matches on the eswitch manager vport and the SQN of the aforementioned SQ. Obtaining the SQN can be had from SQ opened, and the eswitch manager vport match can be substituted with the register c0 value exposed by this patch. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://lore.kernel.org/r/aa4120a91c98ff1c44f1213388c744d4cb0324d6.1701871118.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-12-12 09:04:07 +02:00
Shun Hao	a429ec96c0	RDMA/mlx5: Support handling of SW encap ICM area New type for this ICM area, now the user can allocate/deallocate the new type of SW encap ICM memory, to store the encap header data which are managed by SW. Signed-off-by: Shun Hao <shunh@nvidia.com> Link: https://lore.kernel.org/r/546fe43fc700240709e30acf7713ec6834d652bd.1701871118.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-12-12 09:03:57 +02:00
Selvin Xavier	07f830ae49	RDMA/bnxt_re: Adds MSN table capability for Gen P7 adapters GenP7 HW expects an MSN table instead of PSN table. Check for the HW retransmission capability and populate the MSN table if HW retansmission is supported. Signed-off-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1701946060-13931-7-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-12-11 09:56:29 +02:00
Selvin Xavier	cdae3936b2	RDMA/bnxt_re: Doorbell changes Update the Doorbell routines to support the latest HW definitions. Use common routine to prepare the Doorbell key. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1701946060-13931-6-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-12-11 09:56:29 +02:00
Selvin Xavier	6027c20dad	RDMA/bnxt_re: Get the toggle bits from CQ completions Get the toggle bits from CQ completions. For older adapters these values are 0. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1701946060-13931-5-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-12-11 09:56:28 +02:00
Selvin Xavier	880a5dd188	RDMA/bnxt_re: Update the HW interface definitions Adds HW interface definitions to support the new chip revision. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1701946060-13931-4-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-12-11 09:56:28 +02:00
Selvin Xavier	a62d685814	RDMA/bnxt_re: Update the BAR offsets Update the BAR offsets for handling GenP7 adapters. Use the values populated by L2 driver for getting the Doorbell offsets. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1701946060-13931-3-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-12-11 09:56:28 +02:00
Selvin Xavier	1801d87b35	RDMA/bnxt_re: Support new 5760X P7 devices Add basic support for 5760X P7 devices. Add new chip revisions. The first version support is similar to the existing P5 adapters. Extend the current support for P5 adapters to P7 also. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1701946060-13931-2-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-12-11 09:56:28 +02:00
Chengchang Tang	288f535951	RDMA/hns: Fix memory leak in free_mr_init() When a reserved QP fails to be created, the memory of the remaining created reserved QPs is leaked. Fixes: `70f9252158` ("RDMA/hns: Use the reserved loopback QPs to free MR before destroying MPT") Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20231207114231.2872104-6-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-12-07 15:09:16 +02:00
Chengchang Tang	f31683a522	RDMA/hns: Remove unnecessary checks for NULL in mtr_alloc_bufs() ib_umem_get() never return NULL. Fixes: `3c873161a0` ("RDMA/hns: Add support for addressing when hopnum is 0") Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20231207114231.2872104-5-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-12-07 15:09:16 +02:00
Junxian Huang	7243396aaf	RDMA/hns: Add a max length of gid table IB-core and rdma-core restrict the sgid_index specified by users, which is uint8_t/u8 data type, to only be within the range of 0-255, so it's meaningless to support excessively large gid_table_len. On the other hand, ib-core creates as many sysfs gid files as gid_table_len, most of which are not only useless because of the reason above, but also greatly increase the traversal time of the sysfs gid files for applications. This patch limits the maximum length of gid table to 256. Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20231207114231.2872104-4-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-12-07 15:09:16 +02:00
Junxian Huang	d3f4020a21	RDMA/hns: Response dmac to userspace While creating AH, dmac is already resolved in kernel. Response dmac to userspace so that userspace doesn't need to resolve dmac repeatedly. Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20231207114231.2872104-3-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-12-07 15:09:16 +02:00
Chengchang Tang	95f6b40082	RDMA/hns: Rename the interrupts Now, different devices may have the same interrupt name, which makes it difficult for users to distinguish between these interrupts. Modify the naming style to be consistent with our network devices. Before: "hns-aeq-0" "hns-ceq-0" ... Now: "hns-0000:35:00.0-aeq-0" "hns-0000:35:00.0-ceq-0" ... Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20231207114231.2872104-2-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-12-07 15:09:16 +02:00
Guoqing Jiang	b7a2768a1c	RDMA/siw: Call orq_get_current if possible Use orq_get_current() in siw_orq_empty(). Link: https://lore.kernel.org/r/20231203092655.28102-5-guoqing.jiang@linux.dev Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-12-04 20:13:30 -04:00
Guoqing Jiang	0b988c1bee	RDMA/siw: Set qp_state in siw_query_qp Run test_query_rc_qp against siw failed since siw didn't set qp_state accordingly. To address it, introduce siw_qp_state_to_ib_qp_state which convert SIW_QP_STATE_IDLE to IB_QPS_INIT which is similar as in cxgb4. rdma-core# ./build/bin/run_tests.py --dev siw0 tests.test_qp.QPTest.test_query_rc_qp -v test_query_rc_qp (tests.test_qp.QPTest) Queries an RC QP after creation. Verifies that its properties are as ... FAIL ====================================================================== FAIL: test_query_rc_qp (tests.test_qp.QPTest) Queries an RC QP after creation. Verifies that its properties are as ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/gjiang/rdma-core/tests/test_qp.py", line 284, in test_query_rc_qp self.query_qp_common_test(e.IBV_QPT_RC) File "/home/gjiang/rdma-core/tests/test_qp.py", line 265, in query_qp_common_test self.verify_qp_attrs(caps, e.IBV_QPS_INIT, qp_init_attr, qp_attr) File "/home/gjiang/rdma-core/tests/test_qp.py", line 239, in verify_qp_attrs self.assertEqual(state, attr.qp_state) AssertionError: <ibv_qp_state.IBV_QPS_INIT: 1> != 0 ---------------------------------------------------------------------- Ran 1 test in 0.057s FAILED (failures=1) Link: https://lore.kernel.org/r/20231203092655.28102-4-guoqing.jiang@linux.dev Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-12-04 20:13:30 -04:00
Guoqing Jiang	51ac45a663	RDMA/siw: Reduce memory usage of struct siw_rx_stream We can reduce the memory of the struct by move some of it's member. Before, /* size: 144, cachelines: 3, members: 17 / / sum members: 124, holes: 3, sum holes: 12 / / sum bitfield members: 7 bits (0 bytes) / / padding: 7 / / bit_padding: 1 bits / After / size: 128, cachelines: 2, members: 17 / / padding: 3 / / bit_padding: 1 bits */ Link: https://lore.kernel.org/r/20231203092655.28102-3-guoqing.jiang@linux.dev Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-12-04 20:13:30 -04:00
Guoqing Jiang	84de14baf8	RDMA/siw: Move tx_cpu ahead We can reduce one cacheline for the usage of struct siw_qp. Before, /* size: 1928, cachelines: 31, members: 38 / / sum members: 1920, holes: 2, sum holes: 8 / / paddings: 4, sum paddings: 13 / / forced alignments: 3 / after / size: 1920, cachelines: 30, members: 38 / / paddings: 4, sum paddings: 13 / / forced alignments: 3 */ Link: https://lore.kernel.org/r/20231203092655.28102-2-guoqing.jiang@linux.dev Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-12-04 20:13:30 -04:00
Shifeng Li	e3e82fcb79	RDMA/irdma: Avoid free the non-cqp_request scratch When creating ceq_0 during probing irdma, cqp.sc_cqp will be sent as a cqp_request to cqp->sc_cqp.sq_ring. If the request is pending when removing the irdma driver or unplugging its aux device, cqp.sc_cqp will be dereferenced as wrong struct in irdma_free_pending_cqp_request(). PID: 3669 TASK: ffff88aef892c000 CPU: 28 COMMAND: "kworker/28:0" #0 [fffffe0000549e38] crash_nmi_callback at ffffffff810e3a34 #1 [fffffe0000549e40] nmi_handle at ffffffff810788b2 #2 [fffffe0000549ea0] default_do_nmi at ffffffff8107938f #3 [fffffe0000549eb8] do_nmi at ffffffff81079582 #4 [fffffe0000549ef0] end_repeat_nmi at ffffffff82e016b4 [exception RIP: native_queued_spin_lock_slowpath+1291] RIP: ffffffff8127e72b RSP: ffff88aa841ef778 RFLAGS: 00000046 RAX: 0000000000000000 RBX: ffff88b01f849700 RCX: ffffffff8127e47e RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffff83857ec0 RBP: ffff88afe3e4efc8 R8: ffffed15fc7c9dfa R9: ffffed15fc7c9dfa R10: 0000000000000001 R11: ffffed15fc7c9df9 R12: 0000000000740000 R13: ffff88b01f849708 R14: 0000000000000003 R15: ffffed1603f092e1 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000 -- <NMI exception stack> -- #5 [ffff88aa841ef778] native_queued_spin_lock_slowpath at ffffffff8127e72b #6 [ffff88aa841ef7b0] _raw_spin_lock_irqsave at ffffffff82c22aa4 #7 [ffff88aa841ef7c8] __wake_up_common_lock at ffffffff81257363 #8 [ffff88aa841ef888] irdma_free_pending_cqp_request at ffffffffa0ba12cc [irdma] #9 [ffff88aa841ef958] irdma_cleanup_pending_cqp_op at ffffffffa0ba1469 [irdma] #10 [ffff88aa841ef9c0] irdma_ctrl_deinit_hw at ffffffffa0b2989f [irdma] #11 [ffff88aa841efa28] irdma_remove at ffffffffa0b252df [irdma] #12 [ffff88aa841efae8] auxiliary_bus_remove at ffffffff8219afdb #13 [ffff88aa841efb00] device_release_driver_internal at ffffffff821882e6 #14 [ffff88aa841efb38] bus_remove_device at ffffffff82184278 #15 [ffff88aa841efb88] device_del at ffffffff82179d23 #16 [ffff88aa841efc48] ice_unplug_aux_dev at ffffffffa0eb1c14 [ice] #17 [ffff88aa841efc68] ice_service_task at ffffffffa0d88201 [ice] #18 [ffff88aa841efde8] process_one_work at ffffffff811c589a #19 [ffff88aa841efe60] worker_thread at ffffffff811c71ff #20 [ffff88aa841eff10] kthread at ffffffff811d87a0 #21 [ffff88aa841eff50] ret_from_fork at ffffffff82e0022f Fixes: `44d9e52977` ("RDMA/irdma: Implement device initialization definitions") Link: https://lore.kernel.org/r/20231130081415.891006-1-lishifeng@sangfor.com.cn Suggested-by: "Ismail, Mustafa" <mustafa.ismail@intel.com> Signed-off-by: Shifeng Li <lishifeng@sangfor.com.cn> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-12-04 20:02:41 -04:00
Mike Marciniszyn	03769f72d6	RDMA/irdma: Fix support for 64k pages Virtual QP and CQ require a 4K HW page size but the driver passes PAGE_SIZE to ib_umem_find_best_pgsz() instead. Fix this by using the appropriate 4k value in the bitmap passed to ib_umem_find_best_pgsz(). Fixes: `693a5386ef` ("RDMA/irdma: Split mr alloc and free into new functions") Link: https://lore.kernel.org/r/20231129202143.1434-4-shiraz.saleem@intel.com Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-12-04 20:02:41 -04:00
Mike Marciniszyn	0a5ec366de	RDMA/irdma: Ensure iWarp QP queue memory is OS paged aligned The SQ is shared for between kernel and used by storing the kernel page pointer and passing that to a kmap_atomic(). This then requires that the alignment is PAGE_SIZE aligned. Fix by adding an iWarp specific alignment check. Fixes: `e965ef0e7b` ("RDMA/irdma: Split QP handler into irdma_reg_user_mr_type_qp") Link: https://lore.kernel.org/r/20231129202143.1434-3-shiraz.saleem@intel.com Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-12-04 20:02:41 -04:00
Mike Marciniszyn	4fbc3a52cd	RDMA/core: Fix umem iterator when PAGE_SIZE is greater then HCA pgsz 64k pages introduce the situation in this diagram when the HCA 4k page size is being used: +-------------------------------------------+ <--- 64k aligned VA \| \| \| HCA 4k page \| \| \| +-------------------------------------------+ \| o \| \| \| \| o \| \| \| \| o \| +-------------------------------------------+ \| \| \| HCA 4k page \| \| \| +-------------------------------------------+ <--- Live HCA page \|OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO\| <--- offset \| \| <--- VA \| MR data \| +-------------------------------------------+ \| \| \| HCA 4k page \| \| \| +-------------------------------------------+ \| o \| \| \| \| o \| \| \| \| o \| +-------------------------------------------+ \| \| \| HCA 4k page \| \| \| +-------------------------------------------+ The VA addresses are coming from rdma-core in this diagram can be arbitrary, but for 64k pages, the VA may be offset by some number of HCA 4k pages and followed by some number of HCA 4k pages. The current iterator doesn't account for either the preceding 4k pages or the following 4k pages. Fix the issue by extending the ib_block_iter to contain the number of DMA pages like comment [1] says and by using __sg_advance to start the iterator at the first live HCA page. The changes are contained in a parallel set of iterator start and next functions that are umem aware and specific to umem since there is one user of the rdma_for_each_block() without umem. These two fixes prevents the extra pages before and after the user MR data. Fix the preceding pages by using the __sq_advance field to start at the first 4k page containing MR data. Fix the following pages by saving the number of pgsz blocks in the iterator state and downcounting on each next. This fix allows for the elimination of the small page crutch noted in the Fixes. Fixes: `10c75ccb54` ("RDMA/umem: Prevent small pages from being returned by ib_umem_find_best_pgsz()") Link: https://lore.kernel.org/r/20231129202143.1434-2-shiraz.saleem@intel.com Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-12-04 20:02:41 -04:00
Christian Brauner	3652117f85	eventfd: simplify eventfd_signal() Ever since the eventfd type was introduced back in 2007 in commit `e1ad7468c7` ("signal/timer/event: eventfd core") the eventfd_signal() function only ever passed 1 as a value for @n. There's no point in keeping that additional argument. Link: https://lore.kernel.org/r/20231122-vfs-eventfd-signal-v2-2-bd549b14ce0c@kernel.org Acked-by: Xu Yilun <yilun.xu@intel.com> Acked-by: Andrew Donnellan <ajd@linux.ibm.com> # ocxl Acked-by: Eric Farman <farman@linux.ibm.com> # s390 Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Christian Brauner <brauner@kernel.org>	2023-11-28 14:08:38 +01:00
Jack Wang	50af5d12f7	RDMA/IPoIB: Add tx timeout work to recover queue stop situation As we sometime run into TX timeout from IPoIB, queue seems stopped and can't recover. Diff with Mellanox OFED show Mellanox driver has timeout work to recover in such case. Add TX timeout work/NAPI work to recover such case. Also increase the watchdog_timeo to 10 seconds, so more tolerant to error. Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Link: https://lore.kernel.org/r/20231121130316.126364-3-jinpu.wang@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-11-26 11:32:00 +02:00
Jack Wang	753fff78f4	RDMA/IPoIB: Fix error code return in ipoib_mcast_join Return the error code in case of ib_sa_join_multicast fail. Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Link: https://lore.kernel.org/r/20231121130316.126364-2-jinpu.wang@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-11-26 11:31:36 +02:00
Shifeng Li	2b78832f50	RDMA/irdma: Fix UAF in irdma_sc_ccq_get_cqe_info() When removing the irdma driver or unplugging its aux device, the ccq queue is released before destorying the cqp_cmpl_wq queue. But in the window, there may still be completion events for wqes. That will cause a UAF in irdma_sc_ccq_get_cqe_info(). [34693.333191] BUG: KASAN: use-after-free in irdma_sc_ccq_get_cqe_info+0x82f/0x8c0 [irdma] [34693.333194] Read of size 8 at addr ffff889097f80818 by task kworker/u67:1/26327 [34693.333194] [34693.333199] CPU: 9 PID: 26327 Comm: kworker/u67:1 Kdump: loaded Tainted: G O --------- -t - 4.18.0 #1 [34693.333200] Hardware name: SANGFOR Inspur/NULL, BIOS 4.1.13 08/01/2016 [34693.333211] Workqueue: cqp_cmpl_wq cqp_compl_worker [irdma] [34693.333213] Call Trace: [34693.333220] dump_stack+0x71/0xab [34693.333226] print_address_description+0x6b/0x290 [34693.333238] ? irdma_sc_ccq_get_cqe_info+0x82f/0x8c0 [irdma] [34693.333240] kasan_report+0x14a/0x2b0 [34693.333251] irdma_sc_ccq_get_cqe_info+0x82f/0x8c0 [irdma] [34693.333264] ? irdma_free_cqp_request+0x151/0x1e0 [irdma] [34693.333274] irdma_cqp_ce_handler+0x1fb/0x3b0 [irdma] [34693.333285] ? irdma_ctrl_init_hw+0x2c20/0x2c20 [irdma] [34693.333290] ? __schedule+0x836/0x1570 [34693.333293] ? strscpy+0x83/0x180 [34693.333296] process_one_work+0x56a/0x11f0 [34693.333298] worker_thread+0x8f/0xf40 [34693.333301] ? __kthread_parkme+0x78/0xf0 [34693.333303] ? rescuer_thread+0xc50/0xc50 [34693.333305] kthread+0x2a0/0x390 [34693.333308] ? kthread_destroy_worker+0x90/0x90 [34693.333310] ret_from_fork+0x1f/0x40 Fixes: `44d9e52977` ("RDMA/irdma: Implement device initialization definitions") Signed-off-by: Shifeng Li <lishifeng1992@126.com> Link: https://lore.kernel.org/r/20231121101236.581694-1-lishifeng1992@126.com Acked-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-11-22 13:52:52 +02:00
Kalesh AP	422b19f7f0	RDMA/bnxt_re: Correct module description string The word "Driver" is repeated twice in the "modinfo bnxt_re" output description. Fix it. Fixes: `1ac5a40479` ("RDMA/bnxt_re: Add bnxt_re RoCE driver") Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1700555387-6277-1-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-11-22 13:51:48 +02:00
Supriti Singh	640233258e	RDMA/rtrs: Use %pe to print errors While printing error, replace %ld by %pe. %pe prints a string whereas %ld would print an error code. Signed-off-by: Supriti Singh <supriti.singh@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Link: https://lore.kernel.org/r/20231120154146.920486-10-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-11-22 13:43:34 +02:00
Supriti Singh	e76f514dc9	RDMA/rtrs-clt: Use %pe to print errors While printing error, replace %ld by %pe. %pe prints a string whereas %ld would print an error code. Signed-off-by: Supriti Singh <supriti.singh@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Link: https://lore.kernel.org/r/20231120154146.920486-9-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-11-22 13:42:59 +02:00
Jack Wang	0c8bb6eb70	RDMA/rtrs-clt: Remove the warnings for req in_use check As we chain the WR during write request: memory registration, rdma write, local invalidate, if only the last WR fail to send due to send queue overrun, the server can send back the reply, while client mark the req->in_use to false in case of error in rtrs_clt_req when error out from rtrs_post_rdma_write_sg. Fixes: `6a98d71dae` ("RDMA/rtrs: client: main functionality") Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Link: https://lore.kernel.org/r/20231120154146.920486-8-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-11-22 13:40:55 +02:00
Jack Wang	6d09f6f7d7	RDMA/rtrs-clt: Fix the max_send_wr setting For each write request, we need Request, Response Memory Registration, Local Invalidate. Fixes: `6a98d71dae` ("RDMA/rtrs: client: main functionality") Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Link: https://lore.kernel.org/r/20231120154146.920486-7-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-11-22 13:40:55 +02:00
Md Haris Iqbal	c4d32e77fc	RDMA/rtrs-srv: Destroy path files after making sure no IOs in-flight Destroying path files may lead to the freeing of rdma_stats. This creates the following race. An IO is in-flight, or has just passed the session state check in process_read/process_write. The close_work gets triggered and the function rtrs_srv_close_work() starts and does destroy path which frees the rdma_stats. After this the function process_read/process_write resumes and tries to update the stats through the function rtrs_srv_update_rdma_stats This commit solves the problem by moving the destroy path function to a later point. This point makes sure any inflights are completed. This is done by qp drain, and waiting for all in-flights through ops_id. Fixes: `9cb8374804` ("RDMA/rtrs: server: main functionality") Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Santosh Kumar Pradhan <santosh.pradhan@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Link: https://lore.kernel.org/r/20231120154146.920486-6-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-11-22 13:40:55 +02:00
Md Haris Iqbal	3a71cd6ca0	RDMA/rtrs-srv: Free srv_mr iu only when always_invalidate is true Since srv_mr->iu is allocated and used only when always_invalidate is true, free it only when always_invalidate is true. Fixes: `9cb8374804` ("RDMA/rtrs: server: main functionality") Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Link: https://lore.kernel.org/r/20231120154146.920486-5-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-11-22 13:40:55 +02:00
Md Haris Iqbal	ed1e52aefa	RDMA/rtrs-srv: Check return values while processing info request While processing info request, it could so happen that the srv_path goes to CLOSING state, cause of any of the error events from RDMA. That state change should be picked up while trying to change the state in process_info_req, by checking the return value. In case the state change call in process_info_req fails, we fail the processing. We should also check the return value for rtrs_srv_path_up, since it sends a link event to the client above, and the client can fail for any reason. Fixes: `9cb8374804` ("RDMA/rtrs: server: main functionality") Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Link: https://lore.kernel.org/r/20231120154146.920486-4-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-11-22 13:40:55 +02:00
Jack Wang	3e44a61b5d	RDMA/rtrs-clt: Start hb after path_up If we start hb too early, it will confuse server side to close the session. Fixes: `6a98d71dae` ("RDMA/rtrs: client: main functionality") Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Link: https://lore.kernel.org/r/20231120154146.920486-3-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-11-22 13:40:55 +02:00
Jack Wang	3ee7ecd712	RDMA/rtrs-srv: Do not unconditionally enable irq When IO is completed, rtrs can be called in softirq context, unconditionally enabling irq could cause panic. To be on safe side, use spin_lock_irqsave and spin_unlock_irqrestore instread. Fixes: `9cb8374804` ("RDMA/rtrs: server: main functionality") Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Florian-Ewald Mueller <florian-ewald.mueller@ionos.com> Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Link: https://lore.kernel.org/r/20231120154146.920486-2-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-11-22 13:40:55 +02:00
Md Haris Iqbal	0529e26d8b	RDMA/rtrs-clt: Add warning logs for RDMA events Some RDMA CM events can trigger connection close or recovery for a certain rtrs_clt_path. Such close/recovery triggers should happen after an appropriate log message, since they can lead to IO failures. Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Link: https://lore.kernel.org/r/20231115152749.424301-2-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-11-19 15:00:45 +02:00
Junxian Huang	eb7854d63d	RDMA/hns: Support SW stats with debugfs Support SW stats with debugfs. Query output: $ cat /sys/kernel/debug/hns_roce/hns_0/sw_stat/sw_stat aeqe --- 3341 ceqe --- 0 cmds --- 6764 cmds_err --- 0 posted_mbx --- 3344 polled_mbx --- 3 mbx_event --- 3341 qp_create_err --- 0 qp_modify_err --- 0 cq_create_err --- 0 cq_modify_err --- 0 srq_create_err --- 0 srq_modify_err --- 0 xrcd_alloc_err --- 0 mr_reg_err --- 0 mr_rereg_err --- 0 ah_create_err --- 0 mmap_err --- 0 uctx_alloc_err --- 0 Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Link: https://lore.kernel.org/r/20231114123449.1106162-4-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-11-19 14:55:43 +02:00
Junxian Huang	ca7ad04cd5	RDMA/hns: Add debugfs to hns RoCE Add debugfs to hns RoCE. This patch only adds an empty directory "hns_roce" to debugs root directory. Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20231114123449.1106162-3-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-11-19 14:55:43 +02:00
Junxian Huang	f45b83ad39	RDMA/hns: Fix inappropriate err code for unsupported operations EOPNOTSUPP is more situable than EINVAL for allocating XRCD while XRC is not supported and unsupported resizing SRQ. Fixes: `32548870d4` ("RDMA/hns: Add support for XRC on HIP09") Fixes: `221109e643` ("RDMA/hns: Add interception for resizing SRQs") Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20231114123449.1106162-2-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-11-19 14:55:43 +02:00
Mustafa Ismail	bd6da690c2	RDMA/irdma: Add wait for suspend on SQD Currently, there is no wait for the QP suspend to complete on a modify to SQD state. Add a wait, after the modify to SQD state, for the Suspend Complete AE. While we are at it, update the suspend timeout value in irdma_prep_tc_change to use IRDMA_EVENT_TIMEOUT_MS too. Fixes: `b48c24c2d7` ("RDMA/irdma: Implement device supported verb APIs") Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Link: https://lore.kernel.org/r/20231114170246.238-3-shiraz.saleem@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-11-15 16:31:42 +02:00
Mustafa Ismail	ba12ab66aa	RDMA/irdma: Do not modify to SQD on error Remove the modify to SQD before going to ERROR state. It is not needed. Fixes: `b48c24c2d7` ("RDMA/irdma: Implement device supported verb APIs") Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Link: https://lore.kernel.org/r/20231114170246.238-2-shiraz.saleem@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-11-15 16:31:42 +02:00
Guoqing Jiang	79844118d6	RDMA/siw: Update comments for siw_qp_sq_process There is no siw_sq_work_handler in code, change it with siw_tx_thread since siw_run_sq -> siw_sq_resume -> siw_qp_sq_process. Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20231113115726.12762-18-guoqing.jiang@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-11-15 15:58:14 +02:00
Guoqing Jiang	d9a5b48681	RDMA/siw: Introduce siw_destroy_cep_sock Add one helper to simplify code a bit. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202310091735.oG7bTvLR-lkp@intel.com/` Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20231113115726.12762-17-guoqing.jiang@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-11-15 15:58:14 +02:00
Guoqing Jiang	788bbf4c2f	RDMA/siw: Only check attrs->cap.max_send_wr in siw_create_qp We can just check max_send_wr here given both max_send_wr and max_recv_wr are defined as u32 type, and we also need to ensure num_sqe (derived from max_send_wr) shouldn't be zero. Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20231113115726.12762-16-guoqing.jiang@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-11-15 15:58:14 +02:00
Guoqing Jiang	3beced14d1	RDMA/siw: Fix typo Replace ORRQ with ORQ. Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20231113115726.12762-15-guoqing.jiang@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-11-15 15:58:14 +02:00
Guoqing Jiang	a410a73278	RDMA/siw: Remove siw_sk_save_upcalls Let's move it into siw_sk_assign_cm_upcalls, then we only need to get sk_callback_lock once. Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20231113115726.12762-14-guoqing.jiang@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-11-15 15:58:14 +02:00
Guoqing Jiang	77b59bd932	RDMA/siw: Cleanup siw_accept With the initialization of rv and the two added label, we can simplifiy code a bit. Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20231113115726.12762-13-guoqing.jiang@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-11-15 15:58:14 +02:00
Guoqing Jiang	08456d4db7	RDMA/siw: Introduce siw_free_cm_id Factor out a helper to simplify code. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202310091656.JlrmcNXB-lkp@intel.com/ Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20231113115726.12762-12-guoqing.jiang@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-11-15 15:58:14 +02:00
Guoqing Jiang	b5c9154320	RDMA/siw: Introduce siw_cep_set_free_and_put Add the helper which can be used in some places. Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20231113115726.12762-11-guoqing.jiang@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-11-15 15:58:14 +02:00
Guoqing Jiang	25680c1f26	RDMA/siw: Add one parameter to siw_destroy_cpulist With that we can reuse it in siw_init_cpulist. Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20231113115726.12762-10-guoqing.jiang@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-11-15 15:58:13 +02:00
Guoqing Jiang	6a343cc3bf	RDMA/siw: Introduce SIW_STAG_MAX_INDEX Add the macro to remove magic number in the code. Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20231113115726.12762-9-guoqing.jiang@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-11-15 15:58:13 +02:00
Guoqing Jiang	60d2136db8	RDMA/siw: Factor out siw_rx_data helper Remove the redundant code given they share the same logic. Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20231113115726.12762-8-guoqing.jiang@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-11-15 15:58:13 +02:00
Guoqing Jiang	065186d228	RDMA/siw: No need to check term_info.valid before call siw_send_terminate Remove the redundate checking since siw_send_terminate check it inside. Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20231113115726.12762-7-guoqing.jiang@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-11-15 15:58:13 +02:00
Guoqing Jiang	659da08ed8	RDMA/siw: Remove rcu from siw_qp Remove it since it is not used. Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20231113115726.12762-6-guoqing.jiang@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-11-15 15:58:13 +02:00
Guoqing Jiang	d248960941	RDMA/siw: Remove goto lable in siw_mmap Let's remove it since the failure case only falls through to the useless label. Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20231113115726.12762-5-guoqing.jiang@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-11-15 15:58:13 +02:00
Guoqing Jiang	2109ddf032	RDMA/siw: Use iov.iov_len in kernel_sendmsg We can pass iov.iov_len here. Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20231113115726.12762-4-guoqing.jiang@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-11-15 15:58:13 +02:00
Guoqing Jiang	a2b64565e8	RDMA/siw: Introduce siw_update_skb_rcvd There are some places share the same logic, factor a common helper for it. Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20231113115726.12762-3-guoqing.jiang@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-11-15 15:58:13 +02:00
Guoqing Jiang	3a179fe34a	RDMA/siw: Introduce siw_get_page Add the wrapper function to get either pbl page or umem page. Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20231113115726.12762-2-guoqing.jiang@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-11-15 15:58:13 +02:00
Leon Romanovsky	b9a85e5eec	RDMA/usnic: Silence uninitialized symbol smatch warnings The patch `1da177e4c3`: "Linux-2.6.12-rc2" from Apr 16, 2005 (linux-next), leads to the following Smatch static checker warning: drivers/infiniband/hw/mthca/mthca_cmd.c:644 mthca_SYS_EN() error: uninitialized symbol 'out'. drivers/infiniband/hw/mthca/mthca_cmd.c 636 int mthca_SYS_EN(struct mthca_dev dev) 637 { 638 u64 out; 639 int ret; 640 641 ret = mthca_cmd_imm(dev, 0, &out, 0, 0, CMD_SYS_EN, CMD_TIME_CLASS_D); We pass out here and it gets used without being initialized. err = mthca_cmd_post(dev, in_param, out_param ? out_param : 0, ^^^^^^^^^^ in_modifier, op_modifier, op, context->token, 1); It's the same in mthca_cmd_wait() and mthca_cmd_poll(). Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/all/533bc3df-8078-4397-b93d-d1f6cec9b636@moroto.mountain Link: https://lore.kernel.org/r/c559cb7113158c02d75401ac162652072ef1b5f0.1699867650.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2023-11-15 15:57:39 +02:00
Eric Biggers	057a301681	RDMA/irdma: Use crypto_shash_digest() in irdma_ieq_check_mpacrc() Simplify irdma_ieq_check_mpacrc() by using crypto_shash_digest() instead of an init+update+final sequence. This should also improve performance. Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20231029045756.153943-1-ebiggers@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-11-13 10:46:04 +02:00
Eric Biggers	9aac6c05a5	RDMA/siw: Use crypto_shash_digest() in siw_qp_prepare_tx() Simplify siw_qp_prepare_tx() by using crypto_shash_digest() instead of an init+update+final sequence. This should also improve performance. Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20231029045839.154071-1-ebiggers@kernel.org Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-11-13 10:35:12 +02:00
Chandramohan Akula	48f996d4ad	RDMA/bnxt_re: Remove roundup_pow_of_two depth for all hardware queue resources Rounding up the queue depth to power of two is not a hardware requirement. In order to optimize the per connection memory usage, removing drivers implementation which round up to the queue depths to the power of 2. Implements a mask to maintain backward compatibility with older library. Signed-off-by: Chandramohan Akula <chandramohan.akula@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1698069803-1787-3-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-11-13 10:27:14 +02:00
Chandramohan Akula	3a4304d826	RDMA/bnxt_re: Refactor the queue index update The queue index wrap around logic is based on power of 2 size depth. All queues are created with power of 2 depth. This increases the memory usage by the driver. This change is required for the next patches that avoids the power of 2 depth requirement for each of the queues. Update the function that increments producer index and consumer index during wrap around. Also, changes the index handling across multiple functions. Signed-off-by: Chandramohan Akula <chandramohan.akula@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1698069803-1787-2-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-11-13 10:26:41 +02:00
Junxian Huang	efb9cbf664	RDMA/hns: Fix unnecessary err return when using invalid congest control algorithm Add a default congest control algorithm so that driver won't return an error when the configured algorithm is invalid. Fixes: `f91696f2f0` ("RDMA/hns: Support congestion control type selection according to the FW") Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20231028093242.670325-1-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-11-13 10:25:29 +02:00
Philipp Stanner	c170d4ff21	RDMA/hfi1: Copy userspace arrays safely Currently, memdup_user() is utilized at two positions to copy userspace arrays. This is done without overflow checks. Use the new wrapper memdup_array_user() to copy the arrays more safely. Suggested-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Philipp Stanner <pstanner@redhat.com> Link: https://lore.kernel.org/r/20231102191308.52046-2-pstanner@redhat.com Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-11-13 10:21:33 +02:00
Shigeru Yoshida	0550d4604e	RDMA/core: Fix uninit-value access in ib_get_eth_speed() KMSAN reported the following uninit-value access issue: lo speed is unknown, defaulting to 1000 ===================================================== BUG: KMSAN: uninit-value in ib_get_width_and_speed drivers/infiniband/core/verbs.c:1889 [inline] BUG: KMSAN: uninit-value in ib_get_eth_speed+0x546/0xaf0 drivers/infiniband/core/verbs.c:1998 ib_get_width_and_speed drivers/infiniband/core/verbs.c:1889 [inline] ib_get_eth_speed+0x546/0xaf0 drivers/infiniband/core/verbs.c:1998 siw_query_port drivers/infiniband/sw/siw/siw_verbs.c:173 [inline] siw_get_port_immutable+0x6f/0x120 drivers/infiniband/sw/siw/siw_verbs.c:203 setup_port_data drivers/infiniband/core/device.c:848 [inline] setup_device drivers/infiniband/core/device.c:1244 [inline] ib_register_device+0x1589/0x1df0 drivers/infiniband/core/device.c:1383 siw_device_register drivers/infiniband/sw/siw/siw_main.c:72 [inline] siw_newlink+0x129e/0x13d0 drivers/infiniband/sw/siw/siw_main.c:490 nldev_newlink+0x8fd/0xa60 drivers/infiniband/core/nldev.c:1763 rdma_nl_rcv_skb drivers/infiniband/core/netlink.c:239 [inline] rdma_nl_rcv+0xe8a/0x1120 drivers/infiniband/core/netlink.c:259 netlink_unicast_kernel net/netlink/af_netlink.c:1342 [inline] netlink_unicast+0xf4b/0x1230 net/netlink/af_netlink.c:1368 netlink_sendmsg+0x1242/0x1420 net/netlink/af_netlink.c:1910 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg net/socket.c:745 [inline] ____sys_sendmsg+0x997/0xd60 net/socket.c:2588 ___sys_sendmsg+0x271/0x3b0 net/socket.c:2642 __sys_sendmsg net/socket.c:2671 [inline] __do_sys_sendmsg net/socket.c:2680 [inline] __se_sys_sendmsg net/socket.c:2678 [inline] __x64_sys_sendmsg+0x2fa/0x4a0 net/socket.c:2678 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x44/0x110 arch/x86/entry/common.c:82 entry_SYSCALL_64_after_hwframe+0x63/0x6b Local variable lksettings created at: ib_get_eth_speed+0x4b/0xaf0 drivers/infiniband/core/verbs.c:1974 siw_query_port drivers/infiniband/sw/siw/siw_verbs.c:173 [inline] siw_get_port_immutable+0x6f/0x120 drivers/infiniband/sw/siw/siw_verbs.c:203 CPU: 0 PID: 11257 Comm: syz-executor.1 Not tainted 6.6.0-14500-g1c41041124bd #10 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc38 04/01/2014 ===================================================== If __ethtool_get_link_ksettings() fails, `netdev_speed` is set to the default value, SPEED_1000. In this case, if `lanes` field of struct ethtool_link_ksettings is not initialized, an uninitialized value is passed to ib_get_width_and_speed(). This causes the above issue. This patch resolves the issue by initializing `lanes` to 0. Fixes: `cb06b6b3f6` ("RDMA/core: Get IB width and speed from netdev") Signed-off-by: Shigeru Yoshida <syoshida@redhat.com> Link: https://lore.kernel.org/r/20231108143113.1360567-1-syoshida@redhat.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-11-13 10:19:07 +02:00
Bernard Metzler	476b7c7e00	RDMA/siw: Use ib_umem_get() to pin user pages Abandon siw private code to pin user pages during user memory registration, but use ib_umem_get() instead. This will help maintaining the driver in case of changes to the memory subsystem. Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Link: https://lore.kernel.org/r/20231104075643.195186-1-bmt@zurich.ibm.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-11-13 10:14:00 +02:00
Linus Torvalds	43468456c9	RDMA for v6.7 The usual collection of patches: - Bug fixes for hns, mlx5, and hfi1 - Hardening patches for size_, counted_by, strscpy - rts fixes from static analysis - Dump SRQ objects in rdma netlink, with hns support - Fix a performance regression in mlx5 MR deregistration - New XDR (200Gb/lane) link speed - SRQ record doorbell latency optimization for hns - IPSEC support for mlx5 multi-port mode - ibv_rereg_mr() support for irdma - Affiliated event support for bnxt_re - Opt out for the spec compliant qkey security enforcement as we discovered SW that breaks under enforcement - Comment and trivial updates -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZUEvXwAKCRCFwuHvBreF YQ/iAPkBl7vCH+oFGJKnh9/pUjKrWwRYBkGNgNesH7lX4Q/cIAEA+R3bAI2d4O4i tMrRRowTnxohbVRebuy4Pfsoo6m9dA4= =TxMJ -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull rdma updates from Jason Gunthorpe: "Nothing exciting this cycle, most of the diffstat is changing SPDX 'or' to 'OR'. Summary: - Bugfixes for hns, mlx5, and hfi1 - Hardening patches for size_, counted_by, strscpy - rts fixes from static analysis - Dump SRQ objects in rdma netlink, with hns support - Fix a performance regression in mlx5 MR deregistration - New XDR (200Gb/lane) link speed - SRQ record doorbell latency optimization for hns - IPSEC support for mlx5 multi-port mode - ibv_rereg_mr() support for irdma - Affiliated event support for bnxt_re - Opt out for the spec compliant qkey security enforcement as we discovered SW that breaks under enforcement - Comment and trivial updates" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (50 commits) IB/mlx5: Fix init stage error handling to avoid double free of same QP and UAF RDMA/mlx5: Fix mkey cache WQ flush RDMA/hfi1: Workaround truncation compilation error IB/hfi1: Fix potential deadlock on &irq_src_lock and &dd->uctxt_lock RDMA/core: Remove NULL check before dev_{put, hold} RDMA/hfi1: Remove redundant assignment to pointer ppd RDMA/mlx5: Change the key being sent for MPV device affiliation RDMA/bnxt_re: Fix clang -Wimplicit-fallthrough in bnxt_re_handle_cq_async_error() RDMA/hns: Fix init failure of RoCE VF and HIP08 RDMA/hns: Fix unnecessary port_num transition in HW stats allocation RDMA/hns: The UD mode can only be configured with DCQCN RDMA/hns: Add check for SL RDMA/hns: Fix signed-unsigned mixed comparisons RDMA/hns: Fix uninitialized ucmd in hns_roce_create_qp_common() RDMA/hns: Fix printing level of asynchronous events RDMA/core: Add support to set privileged QKEY parameter RDMA/bnxt_re: Do not report SRQ error in srq notification RDMA/bnxt_re: Report async events and errors RDMA/bnxt_re: Update HW interface headers IB/mlx5: Fix rdma counter binding for RAW QP ...	2023-11-02 15:20:30 -10:00
Linus Torvalds	6ed92e559a	SCSI misc on 20231102 Updates to the usual drivers (ufs, megaraid_sas, lpfc, target, ibmvfc, scsi_debug) plus the usual assorted minor fixes and updates. The major change this time around is a prep patch for rethreading of the driver reset handler API not to take a scsi_cmd structure which starts to reduce various drivers' dependence on scsi_cmd in error handling. Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com> -----BEGIN PGP SIGNATURE----- iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCZUORLiYcamFtZXMuYm90 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishQ4WAQDDIhzp /PiJBBtt0U9ii/lYqRLrOVnN0extKEgEGO+FbwEAssKgs+5Jn/7XCgdpSrx8Co3/ 0cPXrZGxs7tFpFWLZjM= =AlRU -----END PGP SIGNATURE----- Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI updates from James Bottomley: "Updates to the usual drivers (ufs, megaraid_sas, lpfc, target, ibmvfc, scsi_debug) plus the usual assorted minor fixes and updates. The major change this time around is a prep patch for rethreading of the driver reset handler API not to take a scsi_cmd structure which starts to reduce various drivers' dependence on scsi_cmd in error handling" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (132 commits) scsi: ufs: core: Leave space for '\0' in utf8 desc string scsi: ufs: core: Conversion to bool not necessary scsi: ufs: core: Fix race between force complete and ISR scsi: megaraid: Fix up debug message in megaraid_abort_and_reset() scsi: aic79xx: Fix up NULL command in ahd_done() scsi: message: fusion: Initialize return value in mptfc_bus_reset() scsi: mpt3sas: Fix loop logic scsi: snic: Remove useless code in snic_dr_clean_pending_req() scsi: core: Add comment to target_destroy in scsi_host_template scsi: core: Clean up scsi_dev_queue_ready() scsi: pmcraid: Add missing scsi_device_put() in pmcraid_eh_target_reset_handler() scsi: target: core: Fix kernel-doc comment scsi: pmcraid: Fix kernel-doc comment scsi: core: Handle depopulation and restoration in progress scsi: ufs: core: Add support for parsing OPP scsi: ufs: core: Add OPP support for scaling clocks and regulators scsi: ufs: dt-bindings: common: Add OPP table scsi: scsi_debug: Add param to control sdev's allow_restart scsi: scsi_debug: Add debugfs interface to fail target reset scsi: scsi_debug: Add new error injection type: Reset LUN failed ...	2023-11-02 15:13:50 -10:00
Linus Torvalds	426ee5196d	sysctl-6.7-rc1 To help make the move of sysctls out of kernel/sysctl.c not incur a size penalty sysctl has been changed to allow us to not require the sentinel, the final empty element on the sysctl array. Joel Granados has been doing all this work. On the v6.6 kernel we got the major infrastructure changes required to support this. For v6.7-rc1 we have all arch/ and drivers/ modified to remove the sentinel. Both arch and driver changes have been on linux-next for a bit less than a month. It is worth re-iterating the value: - this helps reduce the overall build time size of the kernel and run time memory consumed by the kernel by about ~64 bytes per array - the extra 64-byte penalty is no longer inncurred now when we move sysctls out from kernel/sysctl.c to their own files For v6.8-rc1 expect removal of all the sentinels and also then the unneeded check for procname == NULL. The last 2 patches are fixes recently merged by Krister Johansen which allow us again to use softlockup_panic early on boot. This used to work but the alias work broke it. This is useful for folks who want to detect softlockups super early rather than wait and spend money on cloud solutions with nothing but an eventual hung kernel. Although this hadn't gone through linux-next it's also a stable fix, so we might as well roll through the fixes now. -----BEGIN PGP SIGNATURE----- iQJGBAABCgAwFiEENnNq2KuOejlQLZofziMdCjCSiKcFAmVCqKsSHG1jZ3JvZkBr ZXJuZWwub3JnAAoJEM4jHQowkoinEgYQAIpkqRL85DBwems19Uk9A27lkctwZ6Fc HdslQCObQTsbuKVimZFP4IL2beUfUE0cfLZCXlzp+4nRDOf6vyhyf3w19jPQtI0Q YdqwTk9y6G5VjDsb35QK0+UBloY/kZ1H3/LW4uCwjXTuksUGmWW2Qvey35696Scv hDMLADqKQmdpYxLUaNi9QyYbEAjYtOai2ezg3+i7hTG168t1k/Ab2BxIFrPVsCR2 FAiq05L4ugWjNskdsWBjck05JZsx9SK/qcAxpIPoUm4nGiFNHApXE0E0hs3vsnmn WIHIbxCQw8ZlUDlmw4S+0YH3NFFzFbWfmW8k2b0f2qZTJm/rU4KiJfcJVknkAUVF raFox6XDW0AUQ9L/NOUJ9ip5rup57GcFrMYocdJ3PPAvvmHKOb1D1O741p75RRcc 9j7zwfIRrzjPUqzhsQS/GFjdJu3lJNmEBK1AcgrVry6WoItrAzJHKPPDC7TwaNmD eXpjxMl1sYzzHqtVh4hn+xkUYphj/6gTGMV8zdo+/FopFswgeJW9G8kHtlEWKDPk MRIKwACmfetP6f3ngHunBg+BOipbjCANL7JI0nOhVOQoaULxCCPx+IPJ6GfSyiuH AbcjH8DGI7fJbUkBFoF0dsRFZ2gH8ds1PYMbWUJ6x3FtuCuv5iIuvQYoaWU6itm7 6f0KvCogg0fU =Qf50 -----END PGP SIGNATURE----- Merge tag 'sysctl-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux Pull sysctl updates from Luis Chamberlain: "To help make the move of sysctls out of kernel/sysctl.c not incur a size penalty sysctl has been changed to allow us to not require the sentinel, the final empty element on the sysctl array. Joel Granados has been doing all this work. On the v6.6 kernel we got the major infrastructure changes required to support this. For v6.7-rc1 we have all arch/ and drivers/ modified to remove the sentinel. Both arch and driver changes have been on linux-next for a bit less than a month. It is worth re-iterating the value: - this helps reduce the overall build time size of the kernel and run time memory consumed by the kernel by about ~64 bytes per array - the extra 64-byte penalty is no longer inncurred now when we move sysctls out from kernel/sysctl.c to their own files For v6.8-rc1 expect removal of all the sentinels and also then the unneeded check for procname == NULL. The last two patches are fixes recently merged by Krister Johansen which allow us again to use softlockup_panic early on boot. This used to work but the alias work broke it. This is useful for folks who want to detect softlockups super early rather than wait and spend money on cloud solutions with nothing but an eventual hung kernel. Although this hadn't gone through linux-next it's also a stable fix, so we might as well roll through the fixes now" * tag 'sysctl-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: (23 commits) watchdog: move softlockup_panic back to early_param proc: sysctl: prevent aliased sysctls from getting passed to init intel drm: Remove now superfluous sentinel element from ctl_table array Drivers: hv: Remove now superfluous sentinel element from ctl_table array raid: Remove now superfluous sentinel element from ctl_table array fw loader: Remove the now superfluous sentinel element from ctl_table array sgi-xp: Remove the now superfluous sentinel element from ctl_table array vrf: Remove the now superfluous sentinel element from ctl_table array char-misc: Remove the now superfluous sentinel element from ctl_table array infiniband: Remove the now superfluous sentinel element from ctl_table array macintosh: Remove the now superfluous sentinel element from ctl_table array parport: Remove the now superfluous sentinel element from ctl_table array scsi: Remove now superfluous sentinel element from ctl_table array tty: Remove now superfluous sentinel element from ctl_table array xen: Remove now superfluous sentinel element from ctl_table array hpet: Remove now superfluous sentinel element from ctl_table array c-sky: Remove now superfluous sentinel element from ctl_talbe array powerpc: Remove now superfluous sentinel element from ctl_table arrays riscv: Remove now superfluous sentinel element from ctl_table array x86/vdso: Remove now superfluous sentinel element from ctl_table array ...	2023-11-01 20:51:41 -10:00
Linus Torvalds	89ed67ef12	Networking changes for 6.7. Core & protocols ---------------- - Support usec resolution of TCP timestamps, enabled selectively by a route attribute. - Defer regular TCP ACK while processing socket backlog, try to send a cumulative ACK at the end. Increase single TCP flow performance on a 200Gbit NIC by 20% (100Gbit -> 120Gbit). - The Fair Queuing (FQ) packet scheduler: - add built-in 3 band prio / WRR scheduling - support bypass if the qdisc is mostly idle (5% speed up for TCP RR) - improve inactive flow reporting - optimize the layout of structures for better cache locality - Support TCP Authentication Option (RFC 5925, TCP-AO), a more modern replacement for the old MD5 option. - Add more retransmission timeout (RTO) related statistics to TCP_INFO. - Support sending fragmented skbs over vsock sockets. - Make sure we send SIGPIPE for vsock sockets if socket was shutdown(). - Add sysctl for ignoring lower limit on lifetime in Router Advertisement PIO, based on an in-progress IETF draft. - Add sysctl to control activation of TCP ping-pong mode. - Add sysctl to make connection timeout in MPTCP configurable. - Support rcvlowat and notsent_lowat on MPTCP sockets, to help apps limit the number of wakeups. - Support netlink GET for MDB (multicast forwarding), allowing user space to request a single MDB entry instead of dumping the entire table. - Support selective FDB flushing in the VXLAN tunnel driver. - Allow limiting learned FDB entries in bridges, prevent OOM attacks. - Allow controlling via configfs netconsole targets which were created via the kernel cmdline at boot, rather than via configfs at runtime. - Support multiple PTP timestamp event queue readers with different filters. - MCTP over I3C. BPF --- - Add new veth-like netdevice where BPF program defines the logic of the xmit routine. It can operate in L3 and L2 mode. - Support exceptions - allow asserting conditions which should never be true but are hard for the verifier to infer. With some extra flexibility around handling of the exit / failure. https://lwn.net/Articles/938435/ - Add support for local per-cpu kptr, allow allocating and storing per-cpu objects in maps. Access to those objects operates on the value for the current CPU. This allows to deprecate local one-off implementations of per-CPU storage like BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE maps. - Extend cgroup BPF sockaddr hooks for UNIX sockets. The use case is for systemd to re-implement the LogNamespace feature which allows running multiple instances of systemd-journald to process the logs of different services. - Enable open-coded task_vma iteration, after maple tree conversion made it hard to directly walk VMAs in tracing programs. - Add open-coded task, css_task and css iterator support. One of the use cases is customizable OOM victim selection via BPF. - Allow source address selection with bpf__fib_lookup(). - Add ability to pin BPF timer to the current CPU. - Prevent creation of infinite loops by combining tail calls and fentry/fexit programs. - Add missed stats for kprobes to retrieve the number of missed kprobe executions and subsequent executions of BPF programs. - Inherit system settings for CPU security mitigations. - Add BPF v4 CPU instruction support for arm32 and s390x. Changes to common code ---------------------- - overflow: add DEFINE_FLEX() for on-stack definition of structs with flexible array members. - Process doc update with more guidance for reviewers. Driver API ---------- - Simplify locking in WiFi (cfg80211 and mac80211 layers), use wiphy mutex in most places and remove a lot of smaller locks. - Create a common DPLL configuration API. Allow configuring and querying state of PLL circuits used for clock syntonization, in network time distribution. - Unify fragmented and full page allocation APIs in page pool code. Let drivers be ignorant of PAGE_SIZE. - Rework PHY state machine to avoid races with calls to phy_stop(). - Notify DSA drivers of MAC address changes on user ports, improve correctness of offloads which depend on matching port MAC addresses. - Allow antenna control on injected WiFi frames. - Reduce the number of variants of napi_schedule(). - Simplify error handling when composing devlink health messages. Misc ---- - A lot of KCSAN data race "fixes", from Eric. - A lot of __counted_by() annotations, from Kees. - A lot of strncpy -> strscpy and printf format fixes. - Replace master/slave terminology with conduit/user in DSA drivers. - Handful of KUnit tests for netdev and WiFi core. Removed ------- - AppleTalk COPS. - AppleTalk ipddp. - TI AR7 CPMAC Ethernet driver. Drivers ------- - Ethernet high-speed NICs: - Intel (100G, ice, idpf): - add a driver for the Intel E2000 IPUs - make CRC/FCS stripping configurable - cross-timestamping for E823 devices - basic support for E830 devices - use aux-bus for managing client drivers - i40e: report firmware versions via devlink - nVidia/Mellanox: - support 4-port NICs - increase max number of channels to 256 - optimize / parallelize SF creation flow - Broadcom (bnxt): - enhance NIC temperature reporting - support PAM4 speeds and lane configuration - Marvell OcteonTX2: - PTP pulse-per-second output support - enable hardware timestamping for VFs - Solarflare/AMD: - conntrack NAT offload and offload for tunnels - Wangxun (ngbe/txgbe): - expose HW statistics - Pensando/AMD: - support PCI level reset - narrow down the condition under which skbs are linearized - Netronome/Corigine (nfp): - support CHACHA20-POLY1305 crypto in IPsec offload - Ethernet NICs embedded, slower, virtual: - Synopsys (stmmac): - add Loongson-1 SoC support - enable use of HW queues with no offload capabilities - enable PPS input support on all 5 channels - increase TX coalesce timer to 5ms - RealTek USB (r8152): improve efficiency of Rx by using GRO frags - xen: support SW packet timestamping - add drivers for implementations based on TI's PRUSS (AM64x EVM) - nVidia/Mellanox Ethernet datacenter switches: - avoid poor HW resource use on Spectrum-4 by better block selection for IPv6 multicast forwarding and ordering of blocks in ACL region - Ethernet embedded switches: - Microchip: - support configuring the drive strength for EMI compliance - ksz9477: partial ACL support - ksz9477: HSR offload - ksz9477: Wake on LAN - Realtek: - rtl8366rb: respect device tree config of the CPU port - Ethernet PHYs: - support Broadcom BCM5221 PHYs - TI dp83867: support hardware LED blinking - CAN: - add support for Linux-PHY based CAN transceivers - at91_can: clean up and use rx-offload helpers - WiFi: - MediaTek (mt76): - new sub-driver for mt7925 USB/PCIe devices - HW wireless <> Ethernet bridging in MT7988 chips - mt7603/mt7628 stability improvements - Qualcomm (ath12k): - WCN7850: - enable 320 MHz channels in 6 GHz band - hardware rfkill support - enable IEEE80211_HW_SINGLE_SCAN_ON_ALL_BANDS to make scan faster - read board data variant name from SMBIOS - QCN9274: mesh support - RealTek (rtw89): - TDMA-based multi-channel concurrency (MCC) - Silicon Labs (wfx): - Remain-On-Channel (ROC) support - Bluetooth: - ISO: many improvements for broadcast support - mark BCM4378/BCM4387 as BROKEN_LE_CODED - add support for QCA2066 - btmtksdio: enable Bluetooth wakeup from suspend Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmU8XsYACgkQMUZtbf5S Irv19RAAnud/24OOF5XMEJkIcYlnfqximh4XO6PujRSYkSkOUJdZTF6iJPgf3pSP YpwoHYbYKHYfeOf8+3bTNESiQNSnoVmvmvwiS6/7lZ3behHUrGLQzW9Htc3EZyWH 2h6QkDZ5OOjfg0bwYSfp3vXkmMH2k8WE9Y0NvCkhcohqZi13Rmp14RnyPmNb2d1V yZRYDMSM133KqE6gnBr1Ct65IEvnKeGlCUN2mTGqOJgdn6DZMsyxvtt0y4rmN7Ab 41+CgPU5SfxfbYpW+Dl2HJpgfte3WrC57KC6AM0PAPJzPmQWgeB/m9mjz/apj6Bg bhsEIo7FdvbCnQm3yWPhK2OgCAcSwLr8jfGMU+Q+W4VnL5SRRR3Rm0zjsze+kHNP OfqJgxzl3DpvoJqVBy1h5FGcZt0XHwhksm4cTxWqIahsF+veY0ECBXbuBBQx9XTF Y7INfI8ulg7wISJs+CJfIClYkgOibTw2u8taBS5ikbtgxNqp5D4QqODn7UefQap1 PR/IDYODF+zRgmMJLeBqSa6fij6BkfOEDiOWak5kggBoZdtbtmeKI6tzze06CNdW lWv1WEhRufxnwK+IuWsEkjhiMbs2WGLvkJ5JbgQV9BfqHfIfiqBCrcWtT/WbQnGt lmU46CXh1t/FZEqbmK9h+8vsIIfrcDl6jb5npEiKPRG00vDKRTM= =46nS -----END PGP SIGNATURE----- Merge tag 'net-next-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core & protocols: - Support usec resolution of TCP timestamps, enabled selectively by a route attribute. - Defer regular TCP ACK while processing socket backlog, try to send a cumulative ACK at the end. Increase single TCP flow performance on a 200Gbit NIC by 20% (100Gbit -> 120Gbit). - The Fair Queuing (FQ) packet scheduler: - add built-in 3 band prio / WRR scheduling - support bypass if the qdisc is mostly idle (5% speed up for TCP RR) - improve inactive flow reporting - optimize the layout of structures for better cache locality - Support TCP Authentication Option (RFC 5925, TCP-AO), a more modern replacement for the old MD5 option. - Add more retransmission timeout (RTO) related statistics to TCP_INFO. - Support sending fragmented skbs over vsock sockets. - Make sure we send SIGPIPE for vsock sockets if socket was shutdown(). - Add sysctl for ignoring lower limit on lifetime in Router Advertisement PIO, based on an in-progress IETF draft. - Add sysctl to control activation of TCP ping-pong mode. - Add sysctl to make connection timeout in MPTCP configurable. - Support rcvlowat and notsent_lowat on MPTCP sockets, to help apps limit the number of wakeups. - Support netlink GET for MDB (multicast forwarding), allowing user space to request a single MDB entry instead of dumping the entire table. - Support selective FDB flushing in the VXLAN tunnel driver. - Allow limiting learned FDB entries in bridges, prevent OOM attacks. - Allow controlling via configfs netconsole targets which were created via the kernel cmdline at boot, rather than via configfs at runtime. - Support multiple PTP timestamp event queue readers with different filters. - MCTP over I3C. BPF: - Add new veth-like netdevice where BPF program defines the logic of the xmit routine. It can operate in L3 and L2 mode. - Support exceptions - allow asserting conditions which should never be true but are hard for the verifier to infer. With some extra flexibility around handling of the exit / failure: https://lwn.net/Articles/938435/ - Add support for local per-cpu kptr, allow allocating and storing per-cpu objects in maps. Access to those objects operates on the value for the current CPU. This allows to deprecate local one-off implementations of per-CPU storage like BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE maps. - Extend cgroup BPF sockaddr hooks for UNIX sockets. The use case is for systemd to re-implement the LogNamespace feature which allows running multiple instances of systemd-journald to process the logs of different services. - Enable open-coded task_vma iteration, after maple tree conversion made it hard to directly walk VMAs in tracing programs. - Add open-coded task, css_task and css iterator support. One of the use cases is customizable OOM victim selection via BPF. - Allow source address selection with bpf__fib_lookup(). - Add ability to pin BPF timer to the current CPU. - Prevent creation of infinite loops by combining tail calls and fentry/fexit programs. - Add missed stats for kprobes to retrieve the number of missed kprobe executions and subsequent executions of BPF programs. - Inherit system settings for CPU security mitigations. - Add BPF v4 CPU instruction support for arm32 and s390x. Changes to common code: - overflow: add DEFINE_FLEX() for on-stack definition of structs with flexible array members. - Process doc update with more guidance for reviewers. Driver API: - Simplify locking in WiFi (cfg80211 and mac80211 layers), use wiphy mutex in most places and remove a lot of smaller locks. - Create a common DPLL configuration API. Allow configuring and querying state of PLL circuits used for clock syntonization, in network time distribution. - Unify fragmented and full page allocation APIs in page pool code. Let drivers be ignorant of PAGE_SIZE. - Rework PHY state machine to avoid races with calls to phy_stop(). - Notify DSA drivers of MAC address changes on user ports, improve correctness of offloads which depend on matching port MAC addresses. - Allow antenna control on injected WiFi frames. - Reduce the number of variants of napi_schedule(). - Simplify error handling when composing devlink health messages. Misc: - A lot of KCSAN data race "fixes", from Eric. - A lot of __counted_by() annotations, from Kees. - A lot of strncpy -> strscpy and printf format fixes. - Replace master/slave terminology with conduit/user in DSA drivers. - Handful of KUnit tests for netdev and WiFi core. Removed: - AppleTalk COPS. - AppleTalk ipddp. - TI AR7 CPMAC Ethernet driver. Drivers: - Ethernet high-speed NICs: - Intel (100G, ice, idpf): - add a driver for the Intel E2000 IPUs - make CRC/FCS stripping configurable - cross-timestamping for E823 devices - basic support for E830 devices - use aux-bus for managing client drivers - i40e: report firmware versions via devlink - nVidia/Mellanox: - support 4-port NICs - increase max number of channels to 256 - optimize / parallelize SF creation flow - Broadcom (bnxt): - enhance NIC temperature reporting - support PAM4 speeds and lane configuration - Marvell OcteonTX2: - PTP pulse-per-second output support - enable hardware timestamping for VFs - Solarflare/AMD: - conntrack NAT offload and offload for tunnels - Wangxun (ngbe/txgbe): - expose HW statistics - Pensando/AMD: - support PCI level reset - narrow down the condition under which skbs are linearized - Netronome/Corigine (nfp): - support CHACHA20-POLY1305 crypto in IPsec offload - Ethernet NICs embedded, slower, virtual: - Synopsys (stmmac): - add Loongson-1 SoC support - enable use of HW queues with no offload capabilities - enable PPS input support on all 5 channels - increase TX coalesce timer to 5ms - RealTek USB (r8152): improve efficiency of Rx by using GRO frags - xen: support SW packet timestamping - add drivers for implementations based on TI's PRUSS (AM64x EVM) - nVidia/Mellanox Ethernet datacenter switches: - avoid poor HW resource use on Spectrum-4 by better block selection for IPv6 multicast forwarding and ordering of blocks in ACL region - Ethernet embedded switches: - Microchip: - support configuring the drive strength for EMI compliance - ksz9477: partial ACL support - ksz9477: HSR offload - ksz9477: Wake on LAN - Realtek: - rtl8366rb: respect device tree config of the CPU port - Ethernet PHYs: - support Broadcom BCM5221 PHYs - TI dp83867: support hardware LED blinking - CAN: - add support for Linux-PHY based CAN transceivers - at91_can: clean up and use rx-offload helpers - WiFi: - MediaTek (mt76): - new sub-driver for mt7925 USB/PCIe devices - HW wireless <> Ethernet bridging in MT7988 chips - mt7603/mt7628 stability improvements - Qualcomm (ath12k): - WCN7850: - enable 320 MHz channels in 6 GHz band - hardware rfkill support - enable IEEE80211_HW_SINGLE_SCAN_ON_ALL_BANDS to make scan faster - read board data variant name from SMBIOS - QCN9274: mesh support - RealTek (rtw89): - TDMA-based multi-channel concurrency (MCC) - Silicon Labs (wfx): - Remain-On-Channel (ROC) support - Bluetooth: - ISO: many improvements for broadcast support - mark BCM4378/BCM4387 as BROKEN_LE_CODED - add support for QCA2066 - btmtksdio: enable Bluetooth wakeup from suspend" * tag 'net-next-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1816 commits) net: pcs: xpcs: Add 2500BASE-X case in get state for XPCS drivers net: bpf: Use sockopt_lock_sock() in ip_sock_set_tos() net: mana: Use xdp_set_features_flag instead of direct assignment vxlan: Cleanup IFLA_VXLAN_PORT_RANGE entry in vxlan_get_size() iavf: delete the iavf client interface iavf: add a common function for undoing the interrupt scheme iavf: use unregister_netdev iavf: rely on netdev's own registered state iavf: fix the waiting time for initial reset iavf: in iavf_down, don't queue watchdog_task if comms failed iavf: simplify mutex_trylock+sleep loops iavf: fix comments about old bit locks doc/netlink: Update schema to support cmd-cnt-name and cmd-max-name tools: ynl: introduce option to process unknown attributes or types ipvlan: properly track tx_errors netdevsim: Block until all devices are released nfp: using napi_build_skb() to replace build_skb() net: dsa: microchip: ksz9477: Fix spelling mistake "Enery" -> "Energy" net: dsa: microchip: Ensure Stable PME Pin State for Wake-on-LAN net: dsa: microchip: Refactor switch shutdown routine for WoL preparation ...	2023-10-31 05:10:11 -10:00
George Kennedy	2ef422f063	IB/mlx5: Fix init stage error handling to avoid double free of same QP and UAF In the unlikely event that workqueue allocation fails and returns NULL in mlx5_mkey_cache_init(), delete the call to mlx5r_umr_resource_cleanup() (which frees the QP) in mlx5_ib_stage_post_ib_reg_umr_init(). This will avoid attempted double free of the same QP when __mlx5_ib_add() does its cleanup. Resolves a splat: Syzkaller reported a UAF in ib_destroy_qp_user workqueue: Failed to create a rescuer kthread for wq "mkey_cache": -EINTR infiniband mlx5_0: mlx5_mkey_cache_init:981:(pid 1642): failed to create work queue infiniband mlx5_0: mlx5_ib_stage_post_ib_reg_umr_init:4075:(pid 1642): mr cache init failed -12 ================================================================== BUG: KASAN: slab-use-after-free in ib_destroy_qp_user (drivers/infiniband/core/verbs.c:2073) Read of size 8 at addr ffff88810da310a8 by task repro_upstream/1642 Call Trace: <TASK> kasan_report (mm/kasan/report.c:590) ib_destroy_qp_user (drivers/infiniband/core/verbs.c:2073) mlx5r_umr_resource_cleanup (drivers/infiniband/hw/mlx5/umr.c:198) __mlx5_ib_add (drivers/infiniband/hw/mlx5/main.c:4178) mlx5r_probe (drivers/infiniband/hw/mlx5/main.c:4402) ... </TASK> Allocated by task 1642: __kmalloc (./include/linux/kasan.h:198 mm/slab_common.c:1026 mm/slab_common.c:1039) create_qp (./include/linux/slab.h:603 ./include/linux/slab.h:720 ./include/rdma/ib_verbs.h:2795 drivers/infiniband/core/verbs.c:1209) ib_create_qp_kernel (drivers/infiniband/core/verbs.c:1347) mlx5r_umr_resource_init (drivers/infiniband/hw/mlx5/umr.c:164) mlx5_ib_stage_post_ib_reg_umr_init (drivers/infiniband/hw/mlx5/main.c:4070) __mlx5_ib_add (drivers/infiniband/hw/mlx5/main.c:4168) mlx5r_probe (drivers/infiniband/hw/mlx5/main.c:4402) ... Freed by task 1642: __kmem_cache_free (mm/slub.c:1826 mm/slub.c:3809 mm/slub.c:3822) ib_destroy_qp_user (drivers/infiniband/core/verbs.c:2112) mlx5r_umr_resource_cleanup (drivers/infiniband/hw/mlx5/umr.c:198) mlx5_ib_stage_post_ib_reg_umr_init (drivers/infiniband/hw/mlx5/main.c:4076 drivers/infiniband/hw/mlx5/main.c:4065) __mlx5_ib_add (drivers/infiniband/hw/mlx5/main.c:4168) mlx5r_probe (drivers/infiniband/hw/mlx5/main.c:4402) ... Fixes: `04876c12c1` ("RDMA/mlx5: Move init and cleanup of UMR to umr.c") Link: https://lore.kernel.org/r/1698170518-4006-1-git-send-email-george.kennedy@oracle.com Suggested-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: George Kennedy <george.kennedy@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-10-31 11:16:05 -03:00
Moshe Shemesh	a53e215f90	RDMA/mlx5: Fix mkey cache WQ flush The cited patch tries to ensure no pending works on the mkey cache workqueue by disabling adding new works and call flush_workqueue(). But this workqueue also has delayed works which might still be pending the delay time to be queued. Add cancel_delayed_work() for the delayed works which waits to be queued and then the flush_workqueue() will flush all works which are already queued and running. Fixes: `374012b004` ("RDMA/mlx5: Fix mkey cache possible deadlock on cleanup") Link: https://lore.kernel.org/r/b8722f14e7ed81452f791764a26d2ed4cfa11478.1698256179.git.leon@kernel.org Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-10-31 10:57:49 -03:00
Jason Gunthorpe	162e348024	Merge tag 'v6.6' into rdma.git for-next Resolve conflict by taking the spin_lock hunk from for-next: https://lore.kernel.org/r/20230928113851.5197a1ec@canb.auug.org.au Required for the next patch. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-10-31 10:54:48 -03:00
Linus Torvalds	14ab6d425e	vfs-6.7.ctime -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZTppYgAKCRCRxhvAZXjc okIHAP9anLz1QDyMLH12ASuHjgBc0Of3jcB6NB97IWGpL4O21gEA46ohaD+vcJuC YkBLU3lXqQ87nfu28ExFAzh10hG2jwM= =m4pB -----END PGP SIGNATURE----- Merge tag 'vfs-6.7.ctime' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Pull vfs inode time accessor updates from Christian Brauner: "This finishes the conversion of all inode time fields to accessor functions as discussed on list. Changing timestamps manually as we used to do before is error prone. Using accessors function makes this robust. It does not contain the switch of the time fields to discrete 64 bit integers to replace struct timespec and free up space in struct inode. But after this, the switch can be trivially made and the patch should only affect the vfs if we decide to do it" * tag 'vfs-6.7.ctime' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: (86 commits) fs: rename inode i_atime and i_mtime fields security: convert to new timestamp accessors selinux: convert to new timestamp accessors apparmor: convert to new timestamp accessors sunrpc: convert to new timestamp accessors mm: convert to new timestamp accessors bpf: convert to new timestamp accessors ipc: convert to new timestamp accessors linux: convert to new timestamp accessors zonefs: convert to new timestamp accessors xfs: convert to new timestamp accessors vboxsf: convert to new timestamp accessors ufs: convert to new timestamp accessors udf: convert to new timestamp accessors ubifs: convert to new timestamp accessors tracefs: convert to new timestamp accessors sysv: convert to new timestamp accessors squashfs: convert to new timestamp accessors server: convert to new timestamp accessors client: convert to new timestamp accessors ...	2023-10-30 09:47:13 -10:00
Linus Torvalds	df9c65b5fc	vfs-6.7.iov_iter -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZTppQwAKCRCRxhvAZXjc om2kAP4u+eLsrhJHfUPUttGUEkSkZE+5/s/f1A/1GcV51usLSgEAu8urxAnP49GW INaDABXaFfKx8/KI/H2YFZPKGwlNEwY= =PDDE -----END PGP SIGNATURE----- Merge tag 'vfs-6.7.iov_iter' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Pull iov_iter updates from Christian Brauner: "This contain's David's iov_iter cleanup work to convert the iov_iter iteration macros to inline functions: - Remove last_offset from iov_iter as it was only used by ITER_PIPE - Add a __user tag on copy_mc_to_user()'s dst argument on x86 to match that on powerpc and get rid of a sparse warning - Convert iter->user_backed to user_backed_iter() in the sound PCM driver - Convert iter->user_backed to user_backed_iter() in a couple of infiniband drivers - Renumber the type enum so that the ITER_* constants match the order in iterate_and_advance() - Since the preceding patch puts UBUF and IOVEC at 0 and 1, change user_backed_iter() to just use the type value and get rid of the extra flag - Convert the iov_iter iteration macros to always-inline functions to make the code easier to follow. It uses function pointers, but they get optimised away - Move the check for ->copy_mc to _copy_from_iter() and copy_page_from_iter_atomic() rather than in memcpy_from_iter_mc() where it gets repeated for every segment. Instead, we check once and invoke a side function that can use iterate_bvec() rather than iterate_and_advance() and supply a different step function - Move the copy-and-csum code to net/ where it can be in proximity with the code that uses it - Fold memcpy_and_csum() in to its two users - Move csum_and_copy_from_iter_full() out of line and merge in csum_and_copy_from_iter() since the former is the only caller of the latter - Move hash_and_copy_to_iter() to net/ where it can be with its only caller" tag 'vfs-6.7.iov_iter' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: iov_iter, net: Move hash_and_copy_to_iter() to net/ iov_iter, net: Merge csum_and_copy_from_iter{,_full}() together iov_iter, net: Fold in csum_and_memcpy() iov_iter, net: Move csum_and_copy_to/from_iter() to net/ iov_iter: Don't deal with iter->copy_mc in memcpy_from_iter_mc() iov_iter: Convert iterate() to inline funcs iov_iter: Derive user-backedness from the iterator type iov_iter: Renumber ITER_ constants infiniband: Use user_backed_iter() to see if iterator is UBUF/IOVEC sound: Fix snd_pcm_readv()/writev() to use iov access functions iov_iter, x86: Be consistent about the __user tag on copy_mc_to_user() iov_iter: Remove last_offset from iov_iter as it was for ITER_PIPE	2023-10-30 09:24:21 -10:00
Leon Romanovsky	d4b2d16571	RDMA/hfi1: Workaround truncation compilation error Increase name array to be large enough to overcome the following compilation error. drivers/infiniband/hw/hfi1/efivar.c: In function ‘read_hfi1_efi_var’: drivers/infiniband/hw/hfi1/efivar.c:124:44: error: ‘snprintf’ output may be truncated before the last format character [-Werror=format-truncation=] 124 \| snprintf(name, sizeof(name), "%s-%s", prefix_name, kind); \| ^ drivers/infiniband/hw/hfi1/efivar.c:124:9: note: ‘snprintf’ output 2 or more bytes (assuming 65) into a destination of size 64 124 \| snprintf(name, sizeof(name), "%s-%s", prefix_name, kind); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/infiniband/hw/hfi1/efivar.c:133:52: error: ‘snprintf’ output may be truncated before the last format character [-Werror=format-truncation=] 133 \| snprintf(name, sizeof(name), "%s-%s", prefix_name, kind); \| ^ drivers/infiniband/hw/hfi1/efivar.c:133:17: note: ‘snprintf’ output 2 or more bytes (assuming 65) into a destination of size 64 133 \| snprintf(name, sizeof(name), "%s-%s", prefix_name, kind); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors make[6]: *** [scripts/Makefile.build:243: drivers/infiniband/hw/hfi1/efivar.o] Error 1 Fixes: `c03c08d50b` ("IB/hfi1: Check upper-case EFI variables") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/238fa39a8fd60e87a5ad7e1ca6584fcdf32e9519.1698159993.git.leonro@nvidia.com Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-10-25 11:13:57 +03:00
Chengfeng Ye	2f19c4b839	IB/hfi1: Fix potential deadlock on &irq_src_lock and &dd->uctxt_lock handle_receive_interrupt_napi_sp() running inside interrupt handler could introduce inverse lock ordering between &dd->irq_src_lock and &dd->uctxt_lock, if read_mod_write() is preempted by the isr. [CPU0] \| [CPU1] hfi1_ipoib_dev_open() \| --> hfi1_netdev_enable_queues() \| --> enable_queues(rx) \| --> hfi1_rcvctrl() \| --> set_intr_bits() \| --> read_mod_write() \| --> spin_lock(&dd->irq_src_lock) \| \| hfi1_poll() \| --> poll_next() \| --> spin_lock_irq(&dd->uctxt_lock) \| \| --> hfi1_rcvctrl() \| --> set_intr_bits() \| --> read_mod_write() \| --> spin_lock(&dd->irq_src_lock) <interrupt> \| --> handle_receive_interrupt_napi_sp() \| --> set_all_fastpath() \| --> hfi1_rcd_get_by_index() \| --> spin_lock_irqsave(&dd->uctxt_lock) \| This flaw was found by an experimental static analysis tool I am developing for irq-related deadlock. To prevent the potential deadlock, the patch use spin_lock_irqsave() on &dd->irq_src_lock inside read_mod_write() to prevent the possible deadlock scenario. Signed-off-by: Chengfeng Ye <dg573847474@gmail.com> Link: https://lore.kernel.org/r/20230926101116.2797-1-dg573847474@gmail.com Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-10-25 11:12:43 +03:00
Yang Li	7a1c2abf9a	RDMA/core: Remove NULL check before dev_{put, hold} The call netdev_{put, hold} of dev_{put, hold} will check NULL, so there is no need to check before using dev_{put, hold}, remove it to silence the warning: ./drivers/infiniband/core/nldev.c:375:2-9: WARNING: NULL check before dev_{put, hold} functions is not needed. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=7047 Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Link: https://lore.kernel.org/r/20231024003815.89742-1-yang.lee@linux.alibaba.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-10-24 18:16:04 +03:00
Colin Ian King	b55706366c	RDMA/hfi1: Remove redundant assignment to pointer ppd Pointer ppd is being assigned a value in a for-loop however it is never read. The assignment is redundant and can be removed. Cleans up clang scan build warning: drivers/infiniband/hw/hfi1/init.c:1030:3: warning: Value stored to 'ppd' is never read [deadcode.DeadStores] Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Link: https://lore.kernel.org/r/20231023141733.667807-1-colin.i.king@gmail.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-10-24 17:33:25 +03:00
Patrisious Haddad	02e7d139e5	RDMA/mlx5: Change the key being sent for MPV device affiliation Change the key that we send from IB driver to EN driver regarding the MPV device affiliation, since at that stage the IB device is not yet initialized, so its index would be zero for different IB devices and cause wrong associations between unrelated master and slave devices. Instead use a unique value from inside the core device which is already initialized at this stage. Fixes: `0d293714ac` ("RDMA/mlx5: Send events from IB driver about device affiliation state") Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Link: https://lore.kernel.org/r/ac7e66357d963fc68d7a419515180212c96d137d.1697705185.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-10-24 17:10:58 +03:00
Nathan Chancellor	9040c0d96f	RDMA/bnxt_re: Fix clang -Wimplicit-fallthrough in bnxt_re_handle_cq_async_error() Clang warns (or errors with CONFIG_WERROR=y): drivers/infiniband/hw/bnxt_re/main.c:1114:2: error: unannotated fall-through between switch labels [-Werror,-Wimplicit-fallthrough] 1114 \| default: \| ^ drivers/infiniband/hw/bnxt_re/main.c:1114:2: note: insert 'break;' to avoid fall-through 1114 \| default: \| ^ \| break; 1 error generated. Clang is a little more pedantic than GCC, which does not warn when falling through to a case that is just break or return. Clang's version is more in line with the kernel's own stance in deprecated.rst, which states that all switch/case blocks must end in either break, fallthrough, continue, goto, or return. Add the missing break to silence the warning. Closes: https://github.com/ClangBuiltLinux/linux/issues/1950 Fixes: `b02fd3f79e` ("RDMA/bnxt_re: Report async events and errors") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20231020-bnxt_re-implicit-fallthrough-v1-1-b5c19534a6c9@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-10-22 10:31:01 +03:00
Junxian Huang	07f06e0e5c	RDMA/hns: Fix init failure of RoCE VF and HIP08 During device init, a struct for HW stats will be allocated. As HW stats are not supported for VF and HIP08, currently hns_roce_alloc_hw_port_stats() returns NULL in this case. However, ib-core considers the returned NULL pointer as memory allocation failure and returns ENOMEM, eventually leading to the failure of VF and HIP08 init. In the case where the driver does not support the .alloc_hw_port_stats() ops, ib-core will return EOPNOTSUPP and ignore this error code in the upper layer function. So for VF and HIP08, just don't set the HW stats ops to ib-core. Fixes: `5a87279591` ("RDMA/hns: Support hns HW stats") Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20231017125239.164455-8-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-10-19 09:49:40 +03:00
Junxian Huang	b4a797b894	RDMA/hns: Fix unnecessary port_num transition in HW stats allocation The num_ports capability of devices should be compared with the number of port(i.e. the input param "port_num") but not the port index(i.e. port_num - 1). Fixes: `5a87279591` ("RDMA/hns: Support hns HW stats") Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20231017125239.164455-7-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-10-19 09:49:40 +03:00
Luoyouming	27c5fd271d	RDMA/hns: The UD mode can only be configured with DCQCN Due to hardware limitations, only DCQCN is supported for UD. Therefore, the default algorithm for UD is set to DCQCN. Fixes: `f91696f2f0` ("RDMA/hns: Support congestion control type selection according to the FW") Signed-off-by: Luoyouming <luoyouming@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20231017125239.164455-6-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-10-19 09:49:40 +03:00
Luoyouming	5e617c18b1	RDMA/hns: Add check for SL SL set by users may exceed the capability of devices. So add check for this situation. Fixes: `fba429fcf9` ("RDMA/hns: Fix missing fields in address vector") Fixes: `70f9252158` ("RDMA/hns: Use the reserved loopback QPs to free MR before destroying MPT") Fixes: `f0cb411aad` ("RDMA/hns: Use new interface to modify QP context") Signed-off-by: Luoyouming <luoyouming@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20231017125239.164455-5-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-10-19 09:49:40 +03:00
Chengchang Tang	b5f9efff10	RDMA/hns: Fix signed-unsigned mixed comparisons The ib_mtu_enum_to_int() and uverbs_attr_get_len() may returns a negative value. In this case, mixed comparisons of signed and unsigned types will throw wrong results. This patch adds judgement for this situation. Fixes: `30b707886a` ("RDMA/hns: Support inline data in extented sge space for RC") Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20231017125239.164455-4-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-10-19 09:49:40 +03:00
Chengchang Tang	c64e9710f9	RDMA/hns: Fix uninitialized ucmd in hns_roce_create_qp_common() ucmd in hns_roce_create_qp_common() are not initialized. But it works fine until new member sdb_addr is added to struct hns_roce_ib_create_qp. If the user-mode driver uses an old version ABI, then the value of the new member will be undefined after ib_copy_from_udata(). This patch fixes it by initialize this variable to 0. And the default value of the new member sdb_addr will be 0 which is invalid. Fixes: `0425e3e6e0` ("RDMA/hns: Support flush cqe for hip08 in kernel space") Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20231017125239.164455-3-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-10-19 09:49:40 +03:00
Chengchang Tang	9faef73ef4	RDMA/hns: Fix printing level of asynchronous events The current driver will print all asynchronous events. Some of the print levels are set improperly, e.g. SRQ limit reach and SRQ last wqe reach, which may also occur during normal operation of the software. Currently, the information of these event is printed as a warning, which causes a large amount of printing even during normal use of the application. As a result, the service performance deteriorates. This patch fixes the printing storms by modifying the print level. Fixes: `b00a92c8f2` ("RDMA/hns: Move all prints out of irq handle") Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20231017125239.164455-2-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-10-19 09:49:40 +03:00
Patrisious Haddad	465d6b42f1	RDMA/core: Add support to set privileged QKEY parameter Add netlink command that enables/disables privileged QKEY by default. It is disabled by default, since according to IB spec only privileged users are allowed to use privileged QKEY. According to the IB specification rel-1.6, section 3.5.3: "QKEYs with the most significant bit set are considered controlled QKEYs, and a HCA does not allow a consumer to arbitrarily specify a controlled QKEY." Using rdma tool, $rdma system set privileged-qkey on When enabled non-privileged users would be able to use controlled QKEYs which are considered privileged. Using rdma tool, $rdma system set privileged-qkey off When disabled only privileged users would be able to use controlled QKEYs. You can also use the command below to check the parameter state: $rdma system show netns shared privileged-qkey off copy-on-fork on Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Link: https://lore.kernel.org/r/90398be70a9d23d2aa9d0f9fd11d2c264c1be534.1696848201.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-10-19 09:31:00 +03:00
Jeff Layton	7e6481cebd	qib: convert to new timestamp accessors Convert to using the new inode timestamp accessor functions. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20231004185347.80880-5-jlayton@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2023-10-18 13:26:16 +02:00
Chandramohan Akula	45cfa8864c	RDMA/bnxt_re: Do not report SRQ error in srq notification In the SRQ notification handler, do not report the SRQ_ERROR in the default event case, as there was no error. Signed-off-by: Chandramohan Akula <chandramohan.akula@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1697049097-31992-4-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-10-15 11:48:32 +03:00
Chandramohan Akula	b02fd3f79e	RDMA/bnxt_re: Report async events and errors Report QP, SRQ and CQ async events and errors. Signed-off-by: Chandramohan Akula <chandramohan.akula@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1697049097-31992-3-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-10-15 11:48:32 +03:00
Chandramohan Akula	d60a779673	RDMA/bnxt_re: Update HW interface headers Updating the HW structures for the affiliated event and error reporting. Newly added interface structures will be used in the followup patch. Signed-off-by: Chandramohan Akula <chandramohan.akula@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1697049097-31992-2-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-10-15 11:48:32 +03:00
Patrisious Haddad	c1336bb4aa	IB/mlx5: Fix rdma counter binding for RAW QP Previously when we had a RAW QP, we bound a counter to it when it moved to INIT state, using the counter context inside RQC. But when we try to modify that counter later in RTS state we used modify QP which tries to change the counter inside QPC instead of RQC. Now we correctly modify the counter set_id inside of RQC instead of QPC for the RAW QP. Fixes: `d14133dd41` ("IB/mlx5: Support set qp counter") Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Mark Zhang <markzhang@nvidia.com> Link: https://lore.kernel.org/r/2e5ab6713784a8fe997d19c508187a0dfecf2dfc.1696847964.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-10-15 11:04:01 +03:00
Mike Christie	194605d45d	scsi: target: Have drivers report if they support direct submissions In some cases, like with multiple LUN targets or where the target has to respond to transport level requests from the receiving context it can be better to defer cmd submission to a helper thread. If the backend driver blocks on something like request/tag allocation it can block the entire target submission path and other LUs and transport IO on that session. In other cases like single LUN targets with storage that can support all the commands that the target can queue, then it's best to submit the cmd to the backend from the target's cmd receiving context. Subsequent commits will allow the user to config what they prefer, but drivers like loop can't directly submit because they can be called from a context that can't sleep. And, drivers like vhost-scsi can support direct submission, but need to keep their default behavior of deferring execution to avoid possible regressions where the backend can block. Make the drivers tell LIO core if they support direct submissions and their current default, so we can prevent users from misconfiguring the system and initialize devices correctly. Signed-off-by: Mike Christie <michael.christie@oracle.com> Link: https://lore.kernel.org/r/20230928020907.5730-2-michael.christie@oracle.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2023-10-13 15:53:57 -04:00
Jakub Kicinski	1bc60524ca	Merge branch 'mlx5-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Leon Romanovsky says: ==================== This PR is collected from https://lore.kernel.org/all/cover.1695296682.git.leon@kernel.org This series from Patrisious extends mlx5 to support IPsec packet offload in multiport devices (MPV, see [1] for more details). These devices have single flow steering logic and two netdev interfaces, which require extra logic to manage IPsec configurations as they performed on netdevs. [1] https://lore.kernel.org/linux-rdma/20180104152544.28919-1-leon@kernel.org/ * 'mlx5-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: net/mlx5: Handle IPsec steering upon master unbind/bind net/mlx5: Configure IPsec steering for ingress RoCEv2 MPV traffic net/mlx5: Configure IPsec steering for egress RoCEv2 MPV traffic net/mlx5: Add create alias flow table function to ipsec roce net/mlx5: Implement alias object allow and create functions net/mlx5: Add alias flow table bits net/mlx5: Store devcom pointer inside IPsec RoCE net/mlx5: Register mlx5e priv to devcom in MPV mode RDMA/mlx5: Send events from IB driver about device affiliation state net/mlx5: Introduce ifc bits for migration in a chunk mode ==================== Link: https://lore.kernel.org/r/20231002083832.19746-1-leon@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-10-13 09:35:34 -07:00
Jakub Kicinski	0e6bb5b7f4	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: kernel/bpf/verifier.c `829955981c` ("bpf: Fix verifier log for async callback return values") `a923819fb2` ("bpf: Treat first argument as return value for bpf_throw") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-10-12 17:07:34 -07:00
Christian Marangi	73382e919f	netdev: replace napi_reschedule with napi_schedule Now that napi_schedule return a bool, we can drop napi_reschedule that does the same exact function. The function comes from a very old commit `bfe13f54f5` ("ibm_emac: Convert to use napi_struct independent of struct net_device") and the purpose is actually deprecated in favour of different logic. Convert every user of napi_reschedule to napi_schedule. Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Acked-by: Jeff Johnson <quic_jjohnson@quicinc.com> # ath10k Acked-by: Nick Child <nnac123@linux.ibm.com> # ibm Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> # for can/dev/rx-offload.c Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20231009133754.9834-3-ansuelsmth@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-10-11 17:28:06 -07:00
Joel Granados	6d07cc269b	infiniband: Remove the now superfluous sentinel element from ctl_table array This commit comes at the tail end of a greater effort to remove the empty elements at the end of the ctl_table arrays (sentinels) which will reduce the overall build time size of the kernel and run time memory bloat by ~64 bytes per sentinel (further information Link : https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@bombadil.infradead.org/) Remove sentinel from iwcm_ctl_table and ucma_ctl_table Signed-off-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>	2023-10-11 12:16:13 -07:00
Sindhu Devale	5ac388db27	RDMA/irdma: Add support to re-register a memory region Add support for reregister MR verb API by doing a de-register followed by a register MR with the new attributes. Reuse resources like iwmr handle and HW stag where possible. Signed-off-by: Sindhu Devale <sindhu.devale@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Link: https://lore.kernel.org/r/20231004151306.228-1-shiraz.saleem@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-10-09 08:39:49 +03:00
Leon Romanovsky	c38d23a544	RDMA/core: Require admin capabilities to set system parameters Like any other set command, require admin permissions to do it. Cc: stable@vger.kernel.org Fixes: `2b34c55802` ("RDMA/core: Add command to set ib_core device net namspace sharing mode") Link: https://lore.kernel.org/r/75d329fdd7381b52cbdf87910bef16c9965abb1f.1696443438.git.leon@kernel.org Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2023-10-05 20:01:47 +03:00
Chuck Lever	0aa44595d6	RDMA/core: Fix a couple of obvious typos in comments Fix typos. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Link: https://lore.kernel.org/r/169643338101.8035.6826446669479247727.stgit@manet.1015granger.net Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-10-04 21:55:44 +03:00
Leon Romanovsky	16419098e8	IPsec packet offload support in multiport RoCE devices This series from Patrisious extends mlx5 to support IPsec packet offload in multiport devices (MPV, see [1] for more details). These devices have single flow steering logic and two netdev interfaces, which require extra logic to manage IPsec configurations as they performed on netdevs. Thanks [1] https://lore.kernel.org/linux-rdma/20180104152544.28919-1-leon@kernel.org/ Link: https://lore.kernel.org/all/20231002083832.19746-1-leon@kernel.org Signed-of-by: Leon Romanovsky <leon@kernel.org> * mlx5-next: (576 commits) net/mlx5: Handle IPsec steering upon master unbind/bind net/mlx5: Configure IPsec steering for ingress RoCEv2 MPV traffic net/mlx5: Configure IPsec steering for egress RoCEv2 MPV traffic net/mlx5: Add create alias flow table function to ipsec roce net/mlx5: Implement alias object allow and create functions net/mlx5: Add alias flow table bits net/mlx5: Store devcom pointer inside IPsec RoCE net/mlx5: Register mlx5e priv to devcom in MPV mode RDMA/mlx5: Send events from IB driver about device affiliation state net/mlx5: Introduce ifc bits for migration in a chunk mode Linux 6.6-rc3 ...	2023-10-04 21:21:49 +03:00
Kees Cook	964168970c	IB/hfi1: Annotate struct tid_rb_node with __counted_by Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct tid_rb_node. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Leon Romanovsky <leon@kernel.org> Cc: linux-rdma@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230929180431.3005464-7-keescook@chromium.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-10-02 14:44:54 +03:00
Kees Cook	2aba54a9e0	IB/mthca: Annotate struct mthca_icm_table with __counted_by Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct mthca_icm_table. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Leon Romanovsky <leon@kernel.org> Cc: linux-rdma@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230929180431.3005464-6-keescook@chromium.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-10-02 14:44:54 +03:00
Kees Cook	bd8eec5bfa	IB/srp: Annotate struct srp_fr_pool with __counted_by Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct srp_fr_pool. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Cc: Bart Van Assche <bvanassche@acm.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Leon Romanovsky <leon@kernel.org> Cc: linux-rdma@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230929180431.3005464-5-keescook@chromium.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-10-02 14:44:54 +03:00
Kees Cook	0bc018b7a7	RDMA/siw: Annotate struct siw_pbl with __counted_by Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct siw_pbl. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Cc: Bernard Metzler <bmt@zurich.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Leon Romanovsky <leon@kernel.org> Cc: linux-rdma@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230929180431.3005464-4-keescook@chromium.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-10-02 14:44:54 +03:00
Kees Cook	ed7c64de62	RDMA/usnic: Annotate struct usnic_uiom_chunk with __counted_by Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct usnic_uiom_chunk. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Cc: Christian Benvenuti <benve@cisco.com> Cc: Nelson Escobar <neescoba@cisco.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Leon Romanovsky <leon@kernel.org> Cc: linux-rdma@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230929180431.3005464-3-keescook@chromium.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-10-02 14:44:54 +03:00
Kees Cook	fc424078f5	RDMA/core: Annotate struct ib_pkey_cache with __counted_by Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct ib_pkey_cache. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Leon Romanovsky <leon@kernel.org> Cc: "Håkon Bugge" <haakon.bugge@oracle.com> Cc: Avihai Horon <avihaih@nvidia.com> Cc: Anand Khoje <anand.a.khoje@oracle.com> Cc: Mark Bloch <mbloch@nvidia.com> Cc: linux-rdma@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230929180431.3005464-2-keescook@chromium.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-10-02 14:44:54 +03:00
Leon Romanovsky	c99a7457e5	RDMA/mlx5: Remove not-used cache disable flag During execution of mlx5_mkey_cache_cleanup(), there is a guarantee that MR are not registered and/or destroyed. It means that we don't need newly introduced cache disable flag. Fixes: `374012b004` ("RDMA/mlx5: Fix mkey cache possible deadlock on cleanup") Link: https://lore.kernel.org/r/c7e9c9f98c8ae4a7413d97d9349b29f5b0a23dbe.1695921626.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2023-10-02 14:32:44 +03:00
Mark Zhang	e0fe97efdb	RDMA/cma: Initialize ib_sa_multicast structure to 0 when join Initialize the structure to 0 so that it's fields won't have random values. For example fields like rec.traffic_class (as well as rec.flow_label and rec.sl) is used to generate the user AH through: cma_iboe_join_multicast cma_make_mc_event ib_init_ah_from_mcmember And a random traffic_class causes a random IP DSCP in RoCEv2. Fixes: `b5de0c60cc` ("RDMA/cma: Fix use after free race in roce multicast join") Signed-off-by: Mark Zhang <markzhang@nvidia.com> Link: https://lore.kernel.org/r/20230927090511.603595-1-markzhang@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-10-02 13:10:40 +03:00
Yangyang Li	c9813b0b99	RDMA/hns: Support SRQ record doorbell Compared with normal doorbell, using record doorbell can shorten the process of ringing the doorbell and reduce the latency. Add a flag HNS_ROCE_CAP_FLAG_SRQ_RECORD_DB to allow FW to enable/disable SRQ record doorbell. If the flag above is set, allocate the dma buffer for SRQ record doorbell and write the buffer address into SRQC during SRQ creation. For userspace SRQ, add a flag HNS_ROCE_RSP_SRQ_CAP_RECORD_DB to notify userspace whether the SRQ record doorbell is enabled. Signed-off-by: Yangyang Li <liyangyang20@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20230926130026.583088-1-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-10-02 11:47:08 +03:00
Patrisious Haddad	0d293714ac	RDMA/mlx5: Send events from IB driver about device affiliation state Send blocking events from IB driver whenever the device is done being affiliated or if it is removed from an affiliation. This is useful since now the EN driver can register to those event and know when a device is affiliated or not. Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Link: https://lore.kernel.org/r/a7491c3e483cfd8d962f5f75b9a25f253043384a.1695296682.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-10-02 11:20:59 +03:00
Patrisious Haddad	8dc0fd2f56	RDMA/ipoib: Add support for XDR speed in ethtool The IBTA specification 1.7 has new speed - XDR, supporting signaling rate of 200Gb. Ethtool support of IPoIB driver translates IB speed to signaling rate. Added translation of XDR IB type to rate of 200Gb Ethernet speed. Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Mark Zhang <markzhang@nvidia.com> Link: https://lore.kernel.org/r/ca252b79b7114af967de3d65f9a38992d4d87a14.1695204156.git.leon@kernel.org Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-09-26 12:38:57 +03:00
Patrisious Haddad	4f4db19089	IB/mlx5: Adjust mlx5 rate mapping to support 800Gb Adjust mlx5 function which maps the speed rate from IB spec values to internal driver values to be able to handle speeds up to 800Gb. Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Mark Zhang <markzhang@nvidia.com> Link: https://lore.kernel.org/r/301c803d8486b0df8aefad3fb3cc10dc58671985.1695204156.git.leon@kernel.org Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-09-26 12:38:54 +03:00
Patrisious Haddad	b28ad32442	IB/mlx5: Rename 400G_8X speed to comply to naming convention Rename 400G_8X speed to comply to naming convention. Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Mark Zhang <markzhang@nvidia.com> Link: https://lore.kernel.org/r/ac98447cac8379a43fbdb36d56e5fb2b741a97ff.1695204156.git.leon@kernel.org Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-09-26 12:38:50 +03:00
Patrisious Haddad	948f0bf5ad	IB/mlx5: Add support for 800G_8X lane speed Add a check for 800G_8X speed when querying PTYS and report it back correctly when needed. Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Mark Zhang <markzhang@nvidia.com> Link: https://lore.kernel.org/r/26fd0b6e1fac071c3eb779657bb3d8ba47f47c4f.1695204156.git.leon@kernel.org Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-09-26 12:38:46 +03:00
Or Har-Toov	561b4a3ac6	IB/mlx5: Expose XDR speed through MAD Under MAD query port, Report NDR speed when NDR is supported in the port capability mask. Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Mark Zhang <markzhang@nvidia.com> Link: https://lore.kernel.org/r/d30bdec2a66a8a2edd1d84ee61453c58cf346b43.1695204156.git.leon@kernel.org Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-09-26 12:38:43 +03:00
Or Har-Toov	703289ce43	IB/core: Add support for XDR link speed Add new IBTA speed XDR, the new rate that was added to Infiniband spec as part of XDR and supporting signaling rate of 200Gb. In order to report that value to rdma-core, add new u32 field to query_port response. Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Mark Zhang <markzhang@nvidia.com> Link: https://lore.kernel.org/r/9d235fc600a999e8274010f0e18b40fa60540e6c.1695204156.git.leon@kernel.org Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-09-26 12:38:39 +03:00
Shay Drory	57e7071683	RDMA/mlx5: Implement mkeys management via LIFO queue Currently, mkeys are managed via xarray. This implementation leads to a degradation in cases many MRs are unregistered in parallel, due to xarray internal implementation, for example: deregistration 1M MRs via 64 threads is taking ~15% more time[1]. Hence, implement mkeys management via LIFO queue, which solved the degradation. [1] 2.8us in kernel v5.19 compare to 3.2us in kernel v6.4 Signed-off-by: Shay Drory <shayd@nvidia.com> Link: https://lore.kernel.org/r/fde3d4cfab0f32f0ccb231cd113298256e1502c5.1695283384.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-09-26 12:36:18 +03:00
Shay Drory	374012b004	RDMA/mlx5: Fix mkey cache possible deadlock on cleanup Fix the deadlock by refactoring the MR cache cleanup flow to flush the workqueue without holding the rb_lock. This adds a race between cache cleanup and creation of new entries which we solve by denied creation of new entries after cache cleanup started. Lockdep: WARNING: possible circular locking dependency detected [ 2785.326074 ] 6.2.0-rc6_for_upstream_debug_2023_01_31_14_02 #1 Not tainted [ 2785.339778 ] ------------------------------------------------------ [ 2785.340848 ] devlink/53872 is trying to acquire lock: [ 2785.341701 ] ffff888124f8c0c8 ((work_completion)(&(&ent->dwork)->work)){+.+.}-{0:0}, at: __flush_work+0xc8/0x900 [ 2785.343403 ] [ 2785.343403 ] but task is already holding lock: [ 2785.344464 ] ffff88817e8f1260 (&dev->cache.rb_lock){+.+.}-{3:3}, at: mlx5_mkey_cache_cleanup+0x77/0x250 [mlx5_ib] [ 2785.346273 ] [ 2785.346273 ] which lock already depends on the new lock. [ 2785.346273 ] [ 2785.347720 ] [ 2785.347720 ] the existing dependency chain (in reverse order) is: [ 2785.349003 ] [ 2785.349003 ] -> #1 (&dev->cache.rb_lock){+.+.}-{3:3}: [ 2785.350160 ] __mutex_lock+0x14c/0x15c0 [ 2785.350962 ] delayed_cache_work_func+0x2d1/0x610 [mlx5_ib] [ 2785.352044 ] process_one_work+0x7c2/0x1310 [ 2785.352879 ] worker_thread+0x59d/0xec0 [ 2785.353636 ] kthread+0x28f/0x330 [ 2785.354370 ] ret_from_fork+0x1f/0x30 [ 2785.355135 ] [ 2785.355135 ] -> #0 ((work_completion)(&(&ent->dwork)->work)){+.+.}-{0:0}: [ 2785.356515 ] __lock_acquire+0x2d8a/0x5fe0 [ 2785.357349 ] lock_acquire+0x1c1/0x540 [ 2785.358121 ] __flush_work+0xe8/0x900 [ 2785.358852 ] __cancel_work_timer+0x2c7/0x3f0 [ 2785.359711 ] mlx5_mkey_cache_cleanup+0xfb/0x250 [mlx5_ib] [ 2785.360781 ] mlx5_ib_stage_pre_ib_reg_umr_cleanup+0x16/0x30 [mlx5_ib] [ 2785.361969 ] __mlx5_ib_remove+0x68/0x120 [mlx5_ib] [ 2785.362960 ] mlx5r_remove+0x63/0x80 [mlx5_ib] [ 2785.363870 ] auxiliary_bus_remove+0x52/0x70 [ 2785.364715 ] device_release_driver_internal+0x3c1/0x600 [ 2785.365695 ] bus_remove_device+0x2a5/0x560 [ 2785.366525 ] device_del+0x492/0xb80 [ 2785.367276 ] mlx5_detach_device+0x1a9/0x360 [mlx5_core] [ 2785.368615 ] mlx5_unload_one_devl_locked+0x5a/0x110 [mlx5_core] [ 2785.369934 ] mlx5_devlink_reload_down+0x292/0x580 [mlx5_core] [ 2785.371292 ] devlink_reload+0x439/0x590 [ 2785.372075 ] devlink_nl_cmd_reload+0xaef/0xff0 [ 2785.372973 ] genl_family_rcv_msg_doit.isra.0+0x1bd/0x290 [ 2785.374011 ] genl_rcv_msg+0x3ca/0x6c0 [ 2785.374798 ] netlink_rcv_skb+0x12c/0x360 [ 2785.375612 ] genl_rcv+0x24/0x40 [ 2785.376295 ] netlink_unicast+0x438/0x710 [ 2785.377121 ] netlink_sendmsg+0x7a1/0xca0 [ 2785.377926 ] sock_sendmsg+0xc5/0x190 [ 2785.378668 ] __sys_sendto+0x1bc/0x290 [ 2785.379440 ] __x64_sys_sendto+0xdc/0x1b0 [ 2785.380255 ] do_syscall_64+0x3d/0x90 [ 2785.381031 ] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 2785.381967 ] [ 2785.381967 ] other info that might help us debug this: [ 2785.381967 ] [ 2785.383448 ] Possible unsafe locking scenario: [ 2785.383448 ] [ 2785.384544 ] CPU0 CPU1 [ 2785.385383 ] ---- ---- [ 2785.386193 ] lock(&dev->cache.rb_lock); [ 2785.386940 ] lock((work_completion)(&(&ent->dwork)->work)); [ 2785.388327 ] lock(&dev->cache.rb_lock); [ 2785.389425 ] lock((work_completion)(&(&ent->dwork)->work)); [ 2785.390414 ] [ 2785.390414 ] * DEADLOCK * [ 2785.390414 ] [ 2785.391579 ] 6 locks held by devlink/53872: [ 2785.392341 ] #0: ffffffff84c17a50 (cb_lock){++++}-{3:3}, at: genl_rcv+0x15/0x40 [ 2785.393630 ] #1: ffff888142280218 (&devlink->lock_key){+.+.}-{3:3}, at: devlink_get_from_attrs_lock+0x12d/0x2d0 [ 2785.395324 ] #2: ffff8881422d3c38 (&dev->lock_key){+.+.}-{3:3}, at: mlx5_unload_one_devl_locked+0x4a/0x110 [mlx5_core] [ 2785.397322 ] #3: ffffffffa0e59068 (mlx5_intf_mutex){+.+.}-{3:3}, at: mlx5_detach_device+0x60/0x360 [mlx5_core] [ 2785.399231 ] #4: ffff88810e3cb0e8 (&dev->mutex){....}-{3:3}, at: device_release_driver_internal+0x8d/0x600 [ 2785.400864 ] #5: ffff88817e8f1260 (&dev->cache.rb_lock){+.+.}-{3:3}, at: mlx5_mkey_cache_cleanup+0x77/0x250 [mlx5_ib] Fixes: `b958451783` ("RDMA/mlx5: Change the cache structure to an RB-tree") Signed-off-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2023-09-26 12:33:53 +03:00
Shay Drory	dab994bcc6	RDMA/mlx5: Fix NULL string error checkpath is complaining about NULL string, change it to 'Unknown'. Fixes: `37aa5c36aa` ("IB/mlx5: Add UARs write-combining and non-cached mapping") Signed-off-by: Shay Drory <shayd@nvidia.com> Link: https://lore.kernel.org/r/8638e5c14fadbde5fa9961874feae917073af920.1695203958.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-09-26 12:29:44 +03:00
Hamdan Igbaria	2fad8f06a5	RDMA/mlx5: Fix mutex unlocking on error flow for steering anchor creation The mutex was not unlocked on some of the error flows. Moved the unlock location to include all the error flow scenarios. Fixes: `e1f4a52ac1` ("RDMA/mlx5: Create an indirect flow table for steering anchor") Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Hamdan Igbaria <hamdani@nvidia.com> Link: https://lore.kernel.org/r/1244a69d783da997c0af0b827c622eb00495492e.1695203958.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-09-26 12:29:40 +03:00
Michael Guralnik	4f14c6c021	RDMA/mlx5: Fix assigning access flags to cache mkeys After the change to use dynamic cache structure, new cache entries can be added and the mkey allocation can no longer assume that all mkeys created for the cache have access_flags equal to zero. Example of a flow that exposes the issue: A user registers MR with RO on a HCA that cannot UMR RO and the mkey is created outside of the cache. When the user deregisters the MR, a new cache entry is created to store mkeys with RO. Later, the user registers 2 MRs with RO. The first MR is reused from the new cache entry. When we try to get the second mkey from the cache we see the entry is empty so we go to the MR cache mkey allocation flow which would have allocated a mkey with no access flags, resulting the user getting a MR without RO. Fixes: `dd1b913fb0` ("RDMA/mlx5: Cache all user cacheable mkeys on dereg MR flow") Reviewed-by: Edward Srouji <edwards@nvidia.com> Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://lore.kernel.org/r/8a802700b82def3ace3f77cd7a9ad9d734af87e7.1695203958.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-09-26 12:29:33 +03:00
David Howells	7ebc540b35	infiniband: Use user_backed_iter() to see if iterator is UBUF/IOVEC Use user_backed_iter() to see if iterator is UBUF/IOVEC rather than poking inside the iterator. Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20230925120309.1731676-5-dhowells@redhat.com cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> cc: Jason Gunthorpe <jgg@ziepe.ca> cc: Leon Romanovsky <leon@kernel.org> cc: linux-rdma@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2023-09-25 14:30:28 +02:00
Christophe JAILLET	d7f393430a	IB/mlx4: Fix the size of a buffer in add_port_entries() In order to be sure that 'buff' is never truncated, its size should be 12, not 11. When building with W=1, this fixes the following warnings: drivers/infiniband/hw/mlx4/sysfs.c: In function ‘add_port_entries’: drivers/infiniband/hw/mlx4/sysfs.c:268:34: error: ‘sprintf’ may write a terminating nul past the end of the destination [-Werror=format-overflow=] 268 \| sprintf(buff, "%d", i); \| ^ drivers/infiniband/hw/mlx4/sysfs.c:268:17: note: ‘sprintf’ output between 2 and 12 bytes into a destination of size 11 268 \| sprintf(buff, "%d", i); \| ^~~~~~~~~~~~~~~~~~~~~~ drivers/infiniband/hw/mlx4/sysfs.c:286:34: error: ‘sprintf’ may write a terminating nul past the end of the destination [-Werror=format-overflow=] 286 \| sprintf(buff, "%d", i); \| ^ drivers/infiniband/hw/mlx4/sysfs.c:286:17: note: ‘sprintf’ output between 2 and 12 bytes into a destination of size 11 286 \| sprintf(buff, "%d", i); \| ^~~~~~~~~~~~~~~~~~~~~~ Fixes: `c1e7e46612` ("IB/mlx4: Add iov directory in sysfs under the ib device") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/0bb1443eb47308bc9be30232cc23004c4d4cf43e.1695448530.git.christophe.jaillet@wanadoo.fr Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-09-23 21:53:24 +03:00
Justin Stitt	cb7ab7854b	IB/qib: Replace deprecated strncpy `strncpy` is deprecated for use on NUL-terminated destination strings [1] and as such we should prefer more robust and less ambiguous string interfaces. We know `txselect_list` is expected to be NUL-terminated based on its use in `param_get_string()`: \| int param_get_string(char buffer, const struct kernel_param kp) \| { \| const struct kparam_string *kps = kp->str; \| return scnprintf(buffer, PAGE_SIZE, "%s\n", kps->string); \| } Note that `txselect_list` is assigned to `kp_txselect`'s string field: \| static struct kparam_string kp_txselect = { \| .string = txselect_list, \| .maxlen = MAX_ATTEN_LEN \| }; Wherein it is then assigned the set and get methods: \| module_param_call(txselect, setup_txselect, param_get_string, \| &kp_txselect, S_IWUSR \| S_IRUGO); Considering the above, a suitable replacement is `strscpy` [2] due to the fact that it guarantees NUL-termination on the destination buffer without unnecessarily NUL-padding. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1] Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2] Link: https://github.com/KSPP/linux/issues/90 Cc: linux-hardening@vger.kernel.org Signed-off-by: Justin Stitt <justinstitt@google.com> Link: https://lore.kernel.org/r/20230921-strncpy-drivers-infiniband-hw-qib-qib_iba7322-c-v1-1-373727763f5b@google.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-09-22 13:29:19 +03:00
Justin Stitt	c2d0c5b28a	IB/hfi1: Replace deprecated strncpy `strncpy` is deprecated for use on NUL-terminated destination strings [1] and as such we should prefer more robust and less ambiguous string interfaces. We see that `buf` is expected to be NUL-terminated based on it's use within a trace event wherein `is_misc_err_name` and `is_various_name` map to `is_name` through `is_table`: \| TRACE_EVENT(hfi1_interrupt, \| TP_PROTO(struct hfi1_devdata dd, const struct is_table is_entry, \| int src), \| TP_ARGS(dd, is_entry, src), \| TP_STRUCT__entry(DD_DEV_ENTRY(dd) \| __array(char, buf, 64) \| __field(int, src) \| ), \| TP_fast_assign(DD_DEV_ASSIGN(dd); \| is_entry->is_name(__entry->buf, 64, \| src - is_entry->start); \| __entry->src = src; \| ), \| TP_printk("[%s] source: %s [%d]", __get_str(dev), __entry->buf, \| __entry->src) \| ); Considering the above, a suitable replacement is `strscpy_pad` due to the fact that it guarantees NUL-termination on the destination buffer while maintaining the NUL-padding behavior that strncpy provides. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1] Link: https://github.com/KSPP/linux/issues/90 Cc: linux-hardening@vger.kernel.org Signed-off-by: Justin Stitt <justinstitt@google.com> Link: https://lore.kernel.org/r/20230921-strncpy-drivers-infiniband-hw-hfi1-chip-c-v1-1-37afcf4964d9@google.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-09-22 13:28:54 +03:00
Justin Stitt	f0cc82ca11	RDMA/irdma: Replace deprecated strncpy `strncpy` is deprecated for use on NUL-terminated destination strings [1] and as such we should prefer more robust and less ambiguous string interfaces. A suitable replacement is `strscpy_pad` due to the fact that it guarantees NUL-termination on the destination buffer. It is unclear to me whether `i40iw_client.name` requires NUL-padding but have opted to keep the NUL-padding behavior that strncpy provides to ensure no functional change. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1] Link: https://github.com/KSPP/linux/issues/90 Cc: linux-hardening@vger.kernel.org Signed-off-by: Justin Stitt <justinstitt@google.com> Link: https://lore.kernel.org/r/20230921-strncpy-drivers-infiniband-hw-irdma-i40iw_if-c-v1-1-22d87aef7186@google.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-09-22 13:27:27 +03:00
Selvin Xavier	a83c692789	RDMA/bnxt_re: Decrement resource stats correctly rc_qp_count and ud_qp_count is not decremented during qp destroy. Fix this. Fixes: `cb95709e0d` ("bnxt_re: Update the hw counters for resource stats") Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1695199280-13520-3-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-09-21 11:56:23 +03:00
Selvin Xavier	9fc5f9a92f	RDMA/bnxt_re: Fix the handling of control path response data Flag that indicate control path command completion should be cleared only after copying the command response data. As soon as the is_in_used flag is clear, the waiting thread can proceed with wrong response data. This wrong data is causing multiple issues like wrong lkey used in data traffic and wrong AH Id etc. Use a memory barrier to ensure that the response data is copied and visible to the process waiting on a different cpu core before clearing the is_in_used flag. Clear the is_in_used after copying the command response. Fixes: `bcfee4ce3e` ("RDMA/bnxt_re: remove redundant cmdq_bitmap") Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1695199280-13520-2-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-09-21 11:56:23 +03:00
wenglianfa	58c49c097f	RDMA/hns: Support SRQ restrack ops for hns driver The SRQ restrack attributes come from the context maintained by ROCEE. Example: $ rdma res show srq -jp -dd [ { "ifindex": 0, "ifname": "hns_0", "srqn": 0, "type": "BASIC", "lqpn": [ "14-15","22-23" ], "pdn": 2, "pid": 1224, "comm": "ib_send_bw",{ "drv_srqn": 0, "drv_wqe_cnt": 512, "drv_max_gs": 2, "drv_xrcdn": 0 } } ] $ rdma res show srq link hns_0 -jpr [ { "ifindex": 0, "ifname": "hns_0", "data": [ 149,0,0,0,0,0,0,0,0,0,0,0,119,101,120,99,0, 46,62,31,0,0,0,0,3,0,0,1,0,58,62,31,0,0,0,0, 30,159,15,0,0,0,64,5,0,0,0,0,0,0,0,0,0,0,0, 9,0,0,0,0,0,0,0,0 ] } ] Signed-off-by: wenglianfa <wenglianfa@huawei.com> Link: https://lore.kernel.org/r/20230918131110.3987498-4-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-09-20 10:51:45 +03:00
wenglianfa	aebf8145e1	RDMA/core: Add support to dump SRQ resource in RAW format Add support to dump SRQ resource in raw format. It enable drivers to return the entire device specific SRQ context without setting each field separately. Example: $ rdma res show srq -r dev hns3 149000... $ rdma res show srq -j -r [{"ifindex":0,"ifname":"hns3","data":[149,0,0,...]}] Signed-off-by: wenglianfa <wenglianfa@huawei.com> Link: https://lore.kernel.org/r/20230918131110.3987498-3-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-09-20 10:50:54 +03:00
wenglianfa	0e32d7d43b	RDMA/core: Add dedicated SRQ resource tracker function Add a dedicated callback function for SRQ resource tracker. Signed-off-by: wenglianfa <wenglianfa@huawei.com> Link: https://lore.kernel.org/r/20230918131110.3987498-2-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-09-20 10:50:54 +03:00
Ilpo Järvinen	8bf7187d97	RDMA/hfi1: Use FIELD_GET() to extract Link Width Use FIELD_GET() to extract PCIe Negotiated Link Width field instead of custom masking and shifting, and remove extract_width() which only wraps that FIELD_GET(). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/r/20230919125648.1920-2-ilpo.jarvinen@linux.intel.com Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dean Luick <dean.luick@cornelisnetworks.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-09-20 10:30:38 +03:00
Zhu Yanjun	c5930a1aa0	RDMA/rtrs: Fix the problem of variable not initialized fully No functionality change. The variable which is not initialized fully will introduce potential risks. Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Link: https://lore.kernel.org/r/20230919020806.534183-1-yanjun.zhu@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-09-19 12:30:33 +03:00
Zhu Yanjun	20a02837fb	RDMA/rtrs: Require holding rcu_read_lock explicitly No functional change. The function get_next_path_rr needs to hold rcu_read_lock. As such, if no rcu read lock, warnings will pop out. Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Link: https://lore.kernel.org/r/20230919073727.540207-1-yanjun.zhu@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-09-19 11:23:17 +03:00
Gustavo A. R. Silva	81760bedc6	RDMA/core: Use size_{add,sub,mul}() in calls to struct_size() If, for any reason, the open-coded arithmetic causes a wraparound, the protection that `struct_size()` provides against potential integer overflows is defeated. Fix this by hardening calls to `struct_size()` with `size_add()`, `size_sub()` and `size_mul()`. Fixes: `467f432a52` ("RDMA/core: Split port and device counter sysfs attributes") Fixes: `a4676388e2` ("RDMA/core: Simplify how the gid_attrs sysfs is created") Fixes: `e9dd5daf88` ("IB/umad: Refactor code to use cdev_device_add()") Fixes: `324e227ea7` ("RDMA/device: Add ib_device_get_by_netdev()") Fixes: `5aad26a7ea` ("IB/core: Use struct_size() in kzalloc()") Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/ZQdt4NsJFwwOYxUR@work Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-09-19 10:33:45 +03:00
David Ahern	4ececeb839	IB/hfi1: Remove open coded reference to skb frag offset frag offset should be obtained from skb_frag_off; update the code. Signed-off-by: David Ahern <dsahern@kernel.org> Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Link: https://lore.kernel.org/r/20230911215753.12325-1-dsahern@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-09-18 14:24:15 +03:00
Leon Romanovsky	18126c7676	RDMA/cma: Fix truncation compilation warning in make_cma_ports The following compilation error is false alarm as RDMA devices don't have such large amount of ports to actually cause to format truncation. drivers/infiniband/core/cma_configfs.c: In function ‘make_cma_ports’: drivers/infiniband/core/cma_configfs.c:223:57: error: ‘snprintf’ output may be truncated before the last format character [-Werror=format-truncation=] 223 \| snprintf(port_str, sizeof(port_str), "%u", i + 1); \| ^ drivers/infiniband/core/cma_configfs.c:223:17: note: ‘snprintf’ output between 2 and 11 bytes into a destination of size 10 223 \| snprintf(port_str, sizeof(port_str), "%u", i + 1); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors make[5]: *** [scripts/Makefile.build:243: drivers/infiniband/core/cma_configfs.o] Error 1 Fixes: `045959db65` ("IB/cma: Add configfs for rdma_cm") Link: https://lore.kernel.org/r/a7e3b347ee134167fa6a3787c56ef231a04bc8c2.1694434639.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2023-09-18 11:52:10 +03:00
Cheng Xu	b2abdffb50	RDMA/erdma: Fix NULL pointer access in regmr_cmd Fix the crash of regmr_cmd called by erdma_ib_alloc_mr. The reason is that mr->mem.mtt is not initialized but it is accessed in regmr_cmd. The call trace information: BUG: kernel NULL pointer dereference, address: 0000000000000000 <...> RIP: 0010:regmr_cmd+0x170/0x1c0 [erdma] <...> Call Trace: ? __die+0x20/0x70 ? page_fault_oops+0x66/0x150 ? do_user_addr_fault+0x61/0x660 ? exc_page_fault+0x65/0x140 ? asm_exc_page_fault+0x22/0x30 ? regmr_cmd+0x170/0x1c0 [erdma] ? preempt_count_add+0x70/0xa0 ? _raw_spin_lock_irqsave+0x19/0x50 ? _raw_spin_unlock_irqrestore+0x1b/0x40 ? erdma_alloc_idx+0x51/0x90 [erdma] erdma_get_dma_mr+0xa3/0x120 [erdma] __ib_alloc_pd+0xeb/0x1c0 [ib_core] Fixes: `7244b4aa42` ("RDMA/erdma: Refactor the storage structure of MTT entries") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/all/3d140c1d-524a-4dbe-a51c-aee4f7ecafdb@moroto.mountain/ Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Link: https://lore.kernel.org/r/20230908060559.80203-1-chengyou@linux.alibaba.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-09-18 10:42:19 +03:00
Dan Carpenter	6b5f0749ce	RDMA/erdma: Fix error code in erdma_create_scatter_mtt() The erdma_create_scatter_mtt() function is supposed to return error pointers. Returning NULL will lead to an Oops. Fixes: `ed10435d35` ("RDMA/erdma: Implement hierarchical MTT") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://lore.kernel.org/r/1eb400d5-d8a3-4a8e-b3da-c43c6c377f86@moroto.mountain Acked-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-09-18 09:12:00 +03:00
Konstantin Meskhidze	c489800e0d	RDMA/uverbs: Fix typo of sizeof argument Since size of 'hdr' pointer and '*hdr' structure is equal on 64-bit machines issue probably didn't cause any wrong behavior. But anyway, fixing of typo is required. Fixes: `da0f60df7b` ("RDMA/uverbs: Prohibit write() calls with too small buffers") Co-developed-by: Ivanov Mikhail <ivanov.mikhail1@huawei-partners.com> Signed-off-by: Ivanov Mikhail <ivanov.mikhail1@huawei-partners.com> Signed-off-by: Konstantin Meskhidze <konstantin.meskhidze@huawei.com> Link: https://lore.kernel.org/r/20230905103258.1738246-1-konstantin.meskhidze@huawei.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-09-11 15:01:09 +03:00
Artem Chernyshev	8fb8a82086	RDMA/cxgb4: Check skb value for failure to allocate get_skb() can fail to allocate skb, so check it. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: `5be78ee924` ("RDMA/cxgb4: Fix LE hash collision bug for active open connection") Signed-off-by: Artem Chernyshev <artem.chernyshev@red-soft.ru> Link: https://lore.kernel.org/r/20230905124048.284165-1-artem.chernyshev@red-soft.ru Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-09-11 14:55:53 +03:00
Bernard Metzler	53a3f77704	RDMA/siw: Fix connection failure handling In case immediate MPA request processing fails, the newly created endpoint unlinks the listening endpoint and is ready to be dropped. This special case was not handled correctly by the code handling the later TCP socket close, causing a NULL dereference crash in siw_cm_work_handler() when dereferencing a NULL listener. We now also cancel the useless MPA timeout, if immediate MPA request processing fails. This patch furthermore simplifies MPA processing in general: Scheduling a useless TCP socket read in sk_data_ready() upcall is now surpressed, if the socket is already moved out of TCP_ESTABLISHED state. Fixes: `6c52fdc244` ("rdma/siw: connection management") Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Link: https://lore.kernel.org/r/20230905145822.446263-1-bmt@zurich.ibm.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-09-11 14:53:23 +03:00
Bart Van Assche	e193b7955d	RDMA/srp: Do not call scsi_done() from srp_abort() After scmd_eh_abort_handler() has called the SCSI LLD eh_abort_handler callback, it performs one of the following actions: * Call scsi_queue_insert(). * Call scsi_finish_command(). * Call scsi_eh_scmd_add(). Hence, SCSI abort handlers must not call scsi_done(). Otherwise all the above actions would trigger a use-after-free. Hence remove the scsi_done() call from srp_abort(). Keep the srp_free_req() call before returning SUCCESS because we may not see the command again if SUCCESS is returned. Cc: Bob Pearson <rpearsonhpe@gmail.com> Cc: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com> Fixes: `d853667091` ("IB/srp: Avoid having aborted requests hang") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20230823205727.505681-1-bvanassche@acm.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-09-11 14:15:58 +03:00
Rohit Chavan	e57b0eef66	RDMA/core: Fix repeated words in comments Corrected the repeated words in the documentation of rdma_replace_ah_attr() and ib_resolve_unicast_gid_dmac() functions. Signed-off-by: Rohit Chavan <roheetchavan@gmail.com> Link: https://lore.kernel.org/r/20230824081304.408-1-roheetchavan@gmail.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-09-11 14:14:59 +03:00
Krzysztof Kozlowski	3ec648c631	IB: Use capital "OR" for multiple licenses in SPDX Documentation/process/license-rules.rst and checkpatch expect the SPDX identifier syntax for multiple licenses to use capital "OR". Correct it to keep consistent format and avoid copy-paste issues. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20230823092912.122674-1-krzysztof.kozlowski@linaro.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-09-11 14:14:00 +03:00
Linus Torvalds	b89b029377	SCSI misc on 20230902 Updates to the usual drivers (ufs, lpfc, qla2xxx, mpi3mr, libsas) and the usual minor updates and bug fixes but no significant core changes. Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com> -----BEGIN PGP SIGNATURE----- iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCZPLlcCYcamFtZXMuYm90 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishXH4AP9Mm8Hk ICCySnXJ/VcxrAU2ekUlPxGT8CNT2WqRhIqKegEAu0jagPyNcTjX+oBCNVOM33zU p5uU6IEHDA3d36qu04w= =8gks -----END PGP SIGNATURE----- Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI updates from James Bottomley: "Updates to the usual drivers (ufs, lpfc, qla2xxx, mpi3mr, libsas) and the usual minor updates and bug fixes but no significant core changes" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (116 commits) scsi: storvsc: Handle additional SRB status values scsi: libsas: Delete sas_ata_task.retry_count scsi: libsas: Delete sas_ata_task.stp_affil_pol scsi: libsas: Delete sas_ata_task.set_affil_pol scsi: libsas: Delete sas_ssp_task.task_prio scsi: libsas: Delete sas_ssp_task.enable_first_burst scsi: libsas: Delete sas_ssp_task.retry_count scsi: libsas: Delete struct scsi_core scsi: libsas: Delete enum sas_phy_type scsi: libsas: Delete enum sas_class scsi: libsas: Delete sas_ha_struct.lldd_module scsi: target: Fix write perf due to unneeded throttling scsi: lpfc: Do not abuse UUID APIs and LPFC_COMPRESS_VMID_SIZE scsi: pm8001: Remove unused declarations scsi: fcoe: Fix potential deadlock on &fip->ctlr_lock scsi: elx: sli4: Remove code duplication scsi: bfa: Replace one-element array with flexible-array member in struct fc_rscn_pl_s scsi: qla2xxx: Remove unused declarations scsi: pmcraid: Use pci_dev_id() to simplify the code scsi: pm80xx: Set RETFIS when requested by libsas ...	2023-09-02 12:02:41 -07:00
Linus Torvalds	f7e97ce269	v6.6 merge window RDMA pull request Many small changes across the subystem, some highlights: - Usual driver cleanups in qedr, siw, erdma, hfi1, mlx4/5, irdma, mthca, hns, and bnxt_re - siw now works over tunnel and other netdevs with a MAC address by removing assumptions about a MAC/GID from the connection manager - "Doorbell Pacing" for bnxt_re - this is a best effort scheme to allow userspace to slow down the doorbell rings if the HW gets full - irdma egress VLAN priority, better QP/WQ sizing - rxe bug fixes in queue draining and srq resizing - Support more ethernet speed options in the core layer - DMABUF support for bnxt_re - Multi-stage MTT support for erdma to allow much bigger MR registrations - A irdma fix with a CVE that came in too late to go to -rc, missing bounds checking for 0 length MRs -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZPEqkAAKCRCFwuHvBreF YZrNAPoCBfU+VjCKNr2yqF7s52os5ZdBV7Uuh4txHcXWW9H7GAD/f19i2u62fzNu C27jj4cztemMBb8mgwyxPw/wLg7NLwY= =pC6k -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull rdma updates from Jason Gunthorpe: "Many small changes across the subystem, some highlights: - Usual driver cleanups in qedr, siw, erdma, hfi1, mlx4/5, irdma, mthca, hns, and bnxt_re - siw now works over tunnel and other netdevs with a MAC address by removing assumptions about a MAC/GID from the connection manager - "Doorbell Pacing" for bnxt_re - this is a best effort scheme to allow userspace to slow down the doorbell rings if the HW gets full - irdma egress VLAN priority, better QP/WQ sizing - rxe bug fixes in queue draining and srq resizing - Support more ethernet speed options in the core layer - DMABUF support for bnxt_re - Multi-stage MTT support for erdma to allow much bigger MR registrations - A irdma fix with a CVE that came in too late to go to -rc, missing bounds checking for 0 length MRs" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (87 commits) IB/hfi1: Reduce printing of errors during driver shut down RDMA/hfi1: Move user SDMA system memory pinning code to its own file RDMA/hfi1: Use list_for_each_entry() helper RDMA/mlx5: Fix trailing */ formatting in block comment RDMA/rxe: Fix redundant break statement in switch-case. RDMA/efa: Fix wrong resources deallocation order RDMA/siw: Call llist_reverse_order in siw_run_sq RDMA/siw: Correct wrong debug message RDMA/siw: Balance the reference of cep->kref in the error path Revert "IB/isert: Fix incorrect release of isert connection" RDMA/bnxt_re: Fix kernel doc errors RDMA/irdma: Prevent zero-length STAG registration RDMA/erdma: Implement hierarchical MTT RDMA/erdma: Refactor the storage structure of MTT entries RDMA/erdma: Renaming variable names and field names of struct erdma_mem RDMA/hns: Support hns HW stats RDMA/hns: Dump whole QP/CQ/MR resource in raw RDMA/irdma: Add missing kernel-doc in irdma_setup_umode_qp() RDMA/mlx4: Copy union directly RDMA/irdma: Drop unused kernel push code ...	2023-09-01 16:49:33 -07:00
Linus Torvalds	bd6c11bc43	Networking changes for 6.6. Core ---- - Increase size limits for to-be-sent skb frag allocations. This allows tun, tap devices and packet sockets to better cope with large writes operations. - Store netdevs in an xarray, to simplify iterating over netdevs. - Refactor nexthop selection for multipath routes. - Improve sched class lifetime handling. - Add backup nexthop ID support for bridge. - Implement drop reasons support in openvswitch. - Several data races annotations and fixes. - Constify the sk parameter of routing functions. - Prepend kernel version to netconsole message. Protocols --------- - Implement support for TCP probing the peer being under memory pressure. - Remove hard coded limitation on IPv6 specific info placement inside the socket struct. - Get rid of sysctl_tcp_adv_win_scale and use an auto-estimated per socket scaling factor. - Scaling-up the IPv6 expired route GC via a separated list of expiring routes. - In-kernel support for the TLS alert protocol. - Better support for UDP reuseport with connected sockets. - Add NEXT-C-SID support for SRv6 End.X behavior, reducing the SR header size. - Get rid of additional ancillary per MPTCP connection struct socket. - Implement support for BPF-based MPTCP packet schedulers. - Format MPTCP subtests selftests results in TAP. - Several new SMC 2.1 features including unique experimental options, max connections per lgr negotiation, max links per lgr negotiation. BPF --- - Multi-buffer support in AF_XDP. - Add multi uprobe BPF links for attaching multiple uprobes and usdt probes, which is significantly faster and saves extra fds. - Implement an fd-based tc BPF attach API (TCX) and BPF link support on top of it. - Add SO_REUSEPORT support for TC bpf_sk_assign. - Support new instructions from cpu v4 to simplify the generated code and feature completeness, for x86, arm64, riscv64. - Support defragmenting IPv(4\|6) packets in BPF. - Teach verifier actual bounds of bpf_get_smp_processor_id() and fix perf+libbpf issue related to custom section handling. - Introduce bpf map element count and enable it for all program types. - Add a BPF hook in sys_socket() to change the protocol ID from IPPROTO_TCP to IPPROTO_MPTCP to cover migration for legacy. - Introduce bpf_me_mcache_free_rcu() and fix OOM under stress. - Add uprobe support for the bpf_get_func_ip helper. - Check skb ownership against full socket. - Support for up to 12 arguments in BPF trampoline. - Extend link_info for kprobe_multi and perf_event links. Netfilter --------- - Speed-up process exit by aborting ruleset validation if a fatal signal is pending. - Allow NLA_POLICY_MASK to be used with BE16/BE32 types. Driver API ---------- - Page pool optimizations, to improve data locality and cache usage. - Introduce ndo_hwtstamp_get() and ndo_hwtstamp_set() to avoid the need for raw ioctl() handling in drivers. - Simplify genetlink dump operations (doit/dumpit) providing them the common information already populated in struct genl_info. - Extend and use the yaml devlink specs to [re]generate the split ops. - Introduce devlink selective dumps, to allow SF filtering SF based on handle and other attributes. - Add yaml netlink spec for netlink-raw families, allow route, link and address related queries via the ynl tool. - Remove phylink legacy mode support. - Support offload LED blinking to phy. - Add devlink port function attributes for IPsec. New hardware / drivers ---------------------- - Ethernet: - Broadcom ASP 2.0 (72165) ethernet controller - MediaTek MT7988 SoC - Texas Instruments AM654 SoC - Texas Instruments IEP driver - Atheros qca8081 phy - Marvell 88Q2110 phy - NXP TJA1120 phy - WiFi: - MediaTek mt7981 support - Can: - Kvaser SmartFusion2 PCI Express devices - Allwinner T113 controllers - Texas Instruments tcan4552/4553 chips - Bluetooth: - Intel Gale Peak - Qualcomm WCN3988 and WCN7850 - NXP AW693 and IW624 - Mediatek MT2925 Drivers ------- - Ethernet NICs: - nVidia/Mellanox: - mlx5: - support UDP encapsulation in packet offload mode - IPsec packet offload support in eswitch mode - improve aRFS observability by adding new set of counters - extends MACsec offload support to cover RoCE traffic - dynamic completion EQs - mlx4: - convert to use auxiliary bus instead of custom interface logic - Intel - ice: - implement switchdev bridge offload, even for LAG interfaces - implement SRIOV support for LAG interfaces - igc: - add support for multiple in-flight TX timestamps - Broadcom: - bnxt: - use the unified RX page pool buffers for XDP and non-XDP - use the NAPI skb allocation cache - OcteonTX2: - support Round Robin scheduling HTB offload - TC flower offload support for SPI field - Freescale: - add XDP_TX feature support - AMD: - ionic: add support for PCI FLR event - sfc: - basic conntrack offload - introduce eth, ipv4 and ipv6 pedit offloads - ST Microelectronics: - stmmac: maximze PTP timestamping resolution - Virtual NICs: - Microsoft vNIC: - batch ringing RX queue doorbell on receiving packets - add page pool for RX buffers - Virtio vNIC: - add per queue interrupt coalescing support - Google vNIC: - add queue-page-list mode support - Ethernet high-speed switches: - nVidia/Mellanox (mlxsw): - add port range matching tc-flower offload - permit enslavement to netdevices with uppers - Ethernet embedded switches: - Marvell (mv88e6xxx): - convert to phylink_pcs - Renesas: - r8A779fx: add speed change support - rzn1: enables vlan support - Ethernet PHYs: - convert mv88e6xxx to phylink_pcs - WiFi: - Qualcomm Wi-Fi 7 (ath12k): - extremely High Throughput (EHT) PHY support - RealTek (rtl8xxxu): - enable AP mode for: RTL8192FU, RTL8710BU (RTL8188GU), RTL8192EU and RTL8723BU - RealTek (rtw89): - Introduce Time Averaged SAR (TAS) support - Connector: - support for event filtering Signed-off-by: Paolo Abeni <pabeni@redhat.com> -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmTt1ZoSHHBhYmVuaUBy ZWRoYXQuY29tAAoJECkkeY3MjxOkgFUP/REFaYWdWUvAzmWeezyx9dqgZMfSOjWq 9QvySiA94OAOcjIYkb7wfzQ5BBAZqaBQ/f8XqWwS1EDDDEBs8sP1cxmABKwW7Hsr qFRu2sOqLzKBk223d0jIgEocfQaFpGbF71gXoTlDivBjBi5UxWm9bF0XnbYWcKgO /QEvzNosi9uNdi85Fzmv62J6YzAdidEpwGsM7X2CfejwNRmStxAEg/NwvRR0Hyiq OJCo97omEgTRaUle8nc64PDx33u4h5kQ1BkaeHEv0rbE3hftFC2YPKn/InmqSFGz 6ew2xnrGPR37LCuAiCcIIv6yR7K0eu0iYJ7jXwZxBDqxGavEPuwWGBoCP6qFiitH ZLWhIrAUrdmSbySkTOCONhJ475qFAuQoYHYpZnX/bJZUHlSsb/9lwDJYJQGpVfd1 /daqJVSb7lhaifmNO1iNd/ibCIXq9zapwtkRwA897M8GkZBTsnVvazFld1Em+Se3 Bx6DSDUVBqVQ9fpZG2IAGD6odDwOzC1lF2IoceFvK9Ff6oE0psI+A0qNLMkHxZbW Qlo7LsNe53hpoCC+yHTfXX7e/X8eNt0EnCGOQJDusZ0Nr3K7H4LKFA0i8UBUK05n 4lKnnaSQW7GQgdofLWt103OMDR9GoDxpFsm7b1X9+AEk6Fz6tq50wWYeMZETUKYP DCW8VGFOZjZM =9CsR -----END PGP SIGNATURE----- Merge tag 'net-next-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Paolo Abeni: "Core: - Increase size limits for to-be-sent skb frag allocations. This allows tun, tap devices and packet sockets to better cope with large writes operations - Store netdevs in an xarray, to simplify iterating over netdevs - Refactor nexthop selection for multipath routes - Improve sched class lifetime handling - Add backup nexthop ID support for bridge - Implement drop reasons support in openvswitch - Several data races annotations and fixes - Constify the sk parameter of routing functions - Prepend kernel version to netconsole message Protocols: - Implement support for TCP probing the peer being under memory pressure - Remove hard coded limitation on IPv6 specific info placement inside the socket struct - Get rid of sysctl_tcp_adv_win_scale and use an auto-estimated per socket scaling factor - Scaling-up the IPv6 expired route GC via a separated list of expiring routes - In-kernel support for the TLS alert protocol - Better support for UDP reuseport with connected sockets - Add NEXT-C-SID support for SRv6 End.X behavior, reducing the SR header size - Get rid of additional ancillary per MPTCP connection struct socket - Implement support for BPF-based MPTCP packet schedulers - Format MPTCP subtests selftests results in TAP - Several new SMC 2.1 features including unique experimental options, max connections per lgr negotiation, max links per lgr negotiation BPF: - Multi-buffer support in AF_XDP - Add multi uprobe BPF links for attaching multiple uprobes and usdt probes, which is significantly faster and saves extra fds - Implement an fd-based tc BPF attach API (TCX) and BPF link support on top of it - Add SO_REUSEPORT support for TC bpf_sk_assign - Support new instructions from cpu v4 to simplify the generated code and feature completeness, for x86, arm64, riscv64 - Support defragmenting IPv(4\|6) packets in BPF - Teach verifier actual bounds of bpf_get_smp_processor_id() and fix perf+libbpf issue related to custom section handling - Introduce bpf map element count and enable it for all program types - Add a BPF hook in sys_socket() to change the protocol ID from IPPROTO_TCP to IPPROTO_MPTCP to cover migration for legacy - Introduce bpf_me_mcache_free_rcu() and fix OOM under stress - Add uprobe support for the bpf_get_func_ip helper - Check skb ownership against full socket - Support for up to 12 arguments in BPF trampoline - Extend link_info for kprobe_multi and perf_event links Netfilter: - Speed-up process exit by aborting ruleset validation if a fatal signal is pending - Allow NLA_POLICY_MASK to be used with BE16/BE32 types Driver API: - Page pool optimizations, to improve data locality and cache usage - Introduce ndo_hwtstamp_get() and ndo_hwtstamp_set() to avoid the need for raw ioctl() handling in drivers - Simplify genetlink dump operations (doit/dumpit) providing them the common information already populated in struct genl_info - Extend and use the yaml devlink specs to [re]generate the split ops - Introduce devlink selective dumps, to allow SF filtering SF based on handle and other attributes - Add yaml netlink spec for netlink-raw families, allow route, link and address related queries via the ynl tool - Remove phylink legacy mode support - Support offload LED blinking to phy - Add devlink port function attributes for IPsec New hardware / drivers: - Ethernet: - Broadcom ASP 2.0 (72165) ethernet controller - MediaTek MT7988 SoC - Texas Instruments AM654 SoC - Texas Instruments IEP driver - Atheros qca8081 phy - Marvell 88Q2110 phy - NXP TJA1120 phy - WiFi: - MediaTek mt7981 support - Can: - Kvaser SmartFusion2 PCI Express devices - Allwinner T113 controllers - Texas Instruments tcan4552/4553 chips - Bluetooth: - Intel Gale Peak - Qualcomm WCN3988 and WCN7850 - NXP AW693 and IW624 - Mediatek MT2925 Drivers: - Ethernet NICs: - nVidia/Mellanox: - mlx5: - support UDP encapsulation in packet offload mode - IPsec packet offload support in eswitch mode - improve aRFS observability by adding new set of counters - extends MACsec offload support to cover RoCE traffic - dynamic completion EQs - mlx4: - convert to use auxiliary bus instead of custom interface logic - Intel - ice: - implement switchdev bridge offload, even for LAG interfaces - implement SRIOV support for LAG interfaces - igc: - add support for multiple in-flight TX timestamps - Broadcom: - bnxt: - use the unified RX page pool buffers for XDP and non-XDP - use the NAPI skb allocation cache - OcteonTX2: - support Round Robin scheduling HTB offload - TC flower offload support for SPI field - Freescale: - add XDP_TX feature support - AMD: - ionic: add support for PCI FLR event - sfc: - basic conntrack offload - introduce eth, ipv4 and ipv6 pedit offloads - ST Microelectronics: - stmmac: maximze PTP timestamping resolution - Virtual NICs: - Microsoft vNIC: - batch ringing RX queue doorbell on receiving packets - add page pool for RX buffers - Virtio vNIC: - add per queue interrupt coalescing support - Google vNIC: - add queue-page-list mode support - Ethernet high-speed switches: - nVidia/Mellanox (mlxsw): - add port range matching tc-flower offload - permit enslavement to netdevices with uppers - Ethernet embedded switches: - Marvell (mv88e6xxx): - convert to phylink_pcs - Renesas: - r8A779fx: add speed change support - rzn1: enables vlan support - Ethernet PHYs: - convert mv88e6xxx to phylink_pcs - WiFi: - Qualcomm Wi-Fi 7 (ath12k): - extremely High Throughput (EHT) PHY support - RealTek (rtl8xxxu): - enable AP mode for: RTL8192FU, RTL8710BU (RTL8188GU), RTL8192EU and RTL8723BU - RealTek (rtw89): - Introduce Time Averaged SAR (TAS) support - Connector: - support for event filtering" * tag 'net-next-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1806 commits) net: ethernet: mtk_wed: minor change in wed_{tx,rx}info_show net: ethernet: mtk_wed: add some more info in wed_txinfo_show handler net: stmmac: clarify difference between "interface" and "phy_interface" r8152: add vendor/device ID pair for D-Link DUB-E250 devlink: move devlink_notify_register/unregister() to dev.c devlink: move small_ops definition into netlink.c devlink: move tracepoint definitions into core.c devlink: push linecard related code into separate file devlink: push rate related code into separate file devlink: push trap related code into separate file devlink: use tracepoint_enabled() helper devlink: push region related code into separate file devlink: push param related code into separate file devlink: push resource related code into separate file devlink: push dpipe related code into separate file devlink: move and rename devlink_dpipe_send_and_alloc_skb() helper devlink: push shared buffer related code into separate file devlink: push port related code into separate file devlink: push object register/unregister notifications into separate helpers inet: fix IP_TRANSPARENT error handling ...	2023-08-29 11:33:01 -07:00
Linus Torvalds	615e95831e	v6.6-vfs.ctime -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZOXTKAAKCRCRxhvAZXjc oifJAQCzi/p+AdQu8LA/0XvR7fTwaq64ZDCibU4BISuLGT2kEgEAuGbuoFZa0rs2 XYD/s4+gi64p9Z01MmXm2XO1pu3GPg0= =eJz5 -----END PGP SIGNATURE----- Merge tag 'v6.6-vfs.ctime' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs timestamp updates from Christian Brauner: "This adds VFS support for multi-grain timestamps and converts tmpfs, xfs, ext4, and btrfs to use them. This carries acks from all relevant filesystems. The VFS always uses coarse-grained timestamps when updating the ctime and mtime after a change. This has the benefit of allowing filesystems to optimize away a lot of metadata updates, down to around 1 per jiffy, even when a file is under heavy writes. Unfortunately, this has always been an issue when we're exporting via NFSv3, which relies on timestamps to validate caches. A lot of changes can happen in a jiffy, so timestamps aren't sufficient to help the client decide to invalidate the cache. Even with NFSv4, a lot of exported filesystems don't properly support a change attribute and are subject to the same problems with timestamp granularity. Other applications have similar issues with timestamps (e.g., backup applications). If we were to always use fine-grained timestamps, that would improve the situation, but that becomes rather expensive, as the underlying filesystem would have to log a lot more metadata updates. This introduces fine-grained timestamps that are used when they are actively queried. This uses the 31st bit of the ctime tv_nsec field to indicate that something has queried the inode for the mtime or ctime. When this flag is set, on the next mtime or ctime update, the kernel will fetch a fine-grained timestamp instead of the usual coarse-grained one. As POSIX generally mandates that when the mtime changes, the ctime must also change the kernel always stores normalized ctime values, so only the first 30 bits of the tv_nsec field are ever used. Filesytems can opt into this behavior by setting the FS_MGTIME flag in the fstype. Filesystems that don't set this flag will continue to use coarse-grained timestamps. Various preparatory changes, fixes and cleanups are included: - Fixup all relevant places where POSIX requires updating ctime together with mtime. This is a wide-range of places and all maintainers provided necessary Acks. - Add new accessors for inode->i_ctime directly and change all callers to rely on them. Plain accesses to inode->i_ctime are now gone and it is accordingly rename to inode->__i_ctime and commented as requiring accessors. - Extend generic_fillattr() to pass in a request mask mirroring in a sense the statx() uapi. This allows callers to pass in a request mask to only get a subset of attributes filled in. - Rework timestamp updates so it's possible to drop the @now parameter the update_time() inode operation and associated helpers. - Add inode_update_timestamps() and convert all filesystems to it removing a bunch of open-coding" * tag 'v6.6-vfs.ctime' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (107 commits) btrfs: convert to multigrain timestamps ext4: switch to multigrain timestamps xfs: switch to multigrain timestamps tmpfs: add support for multigrain timestamps fs: add infrastructure for multigrain timestamps fs: drop the timespec64 argument from update_time xfs: have xfs_vn_update_time gets its own timestamp fat: make fat_update_time get its own timestamp fat: remove i_version handling from fat_update_time ubifs: have ubifs_update_time use inode_update_timestamps btrfs: have it use inode_update_timestamps fs: drop the timespec64 arg from generic_update_time fs: pass the request_mask to generic_fillattr fs: remove silly warning from current_time gfs2: fix timestamp handling on quota inodes fs: rename i_ctime field to __i_ctime selinux: convert to ctime accessor functions security: convert to ctime accessor functions apparmor: convert to ctime accessor functions sunrpc: convert to ctime accessor functions ...	2023-08-28 09:31:32 -07:00
Jakub Kicinski	3c5066c6b0	Merge branch 'mlx5-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Leon Romanovsky says: ==================== mlx5 MACsec RoCEv2 support From Patrisious: This series extends previously added MACsec offload support to cover RoCE traffic either. In order to achieve that, we need configure MACsec with offload between the two endpoints, like below: REMOTE_MAC=10:70:fd:43:71:c0 * ip addr add 1.1.1.1/16 dev eth2 * ip link set dev eth2 up * ip link add link eth2 macsec0 type macsec encrypt on * ip macsec offload macsec0 mac * ip macsec add macsec0 tx sa 0 pn 1 on key 00 dffafc8d7b9a43d5b9a3dfbbf6a30c16 * ip macsec add macsec0 rx port 1 address $REMOTE_MAC * ip macsec add macsec0 rx port 1 address $REMOTE_MAC sa 0 pn 1 on key 01 ead3664f508eb06c40ac7104cdae4ce5 * ip addr add 10.1.0.1/16 dev macsec0 * ip link set dev macsec0 up And in a similar manner on the other machine, while noting the keys order would be reversed and the MAC address of the other machine. RDMA traffic is separated through relevant GID entries and in case of IP ambiguity issue - meaning we have a physical GIDs and a MACsec GIDs with the same IP/GID, we disable our physical GID in order to force the user to only use the MACsec GID. v0: https://lore.kernel.org/netdev/20230813064703.574082-1-leon@kernel.org/ * 'mlx5-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: RDMA/mlx5: Handles RoCE MACsec steering rules addition and deletion net/mlx5: Add RoCE MACsec steering infrastructure in core net/mlx5: Configure MACsec steering for ingress RoCEv2 traffic net/mlx5: Configure MACsec steering for egress RoCEv2 traffic IB/core: Reorder GID delete code for RoCE net/mlx5: Add MACsec priorities in RDMA namespaces RDMA/mlx5: Implement MACsec gid addition and deletion net/mlx5: Maintain fs_id xarray per MACsec device inside macsec steering net/mlx5: Remove netdevice from MACsec steering net/mlx5e: Move MACsec flow steering and statistics database from ethernet to core net/mlx5e: Rename MACsec flow steering functions/parameters to suit core naming style net/mlx5: Remove dependency of macsec flow steering on ethernet net/mlx5e: Move MACsec flow steering operations to be used as core library macsec: add functions to get macsec real netdevice and check offload ==================== Link: https://lore.kernel.org/r/20230821073833.59042-1-leon@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-24 11:32:18 -07:00
Petr Pavlu	7d22b1cb9d	mlx4: Connect the infiniband part to the auxiliary bus Use the auxiliary bus to perform device management of the infiniband part of the mlx4 driver. Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Tested-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-08-23 08:25:28 +01:00
Petr Pavlu	73d68002a0	mlx4: Replace the mlx4_interface.event callback with a notifier Use a notifier to implement mlx4_dispatch_event() in preparation to switch mlx4_en and mlx4_ib to be an auxiliary device. A problem is that if the mlx4_interface.event callback was replaced with something as mlx4_adrv.event then the implementation of mlx4_dispatch_event() would need to acquire a lock on a given device before executing this callback. That is necessary because otherwise there is no guarantee that the associated driver cannot get unbound when the callback is running. However, taking this lock is not possible because mlx4_dispatch_event() can be invoked from the hardirq context. Using an atomic notifier allows the driver to accurately record when it wants to receive these events and solves this problem. A handler registration is done by both mlx4_en and mlx4_ib at the end of their mlx4_interface.add callback. This matches the current situation when mlx4_add_device() would enable events for a given device immediately after this callback, by adding the device on the mlx4_priv.list. Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Tested-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-08-23 08:25:27 +01:00
Petr Pavlu	7ba189ac52	mlx4: Use 'void ' as the event param of mlx4_dispatch_event() Function mlx4_dispatch_event() takes an 'unsigned long' as its event parameter. The actual value is none (MLX4_DEV_EVENT_CATASTROPHIC_ERROR), a pointer to mlx4_eqe (MLX4_DEV_EVENT_PORT_MGMT_CHANGE), or a 32-bit integer (remaining events). In preparation to switch mlx4_en and mlx4_ib to be an auxiliary device, the mlx4_interface.event callback is replaced with a notifier and function mlx4_dispatch_event() gets updated to invoke atomic_notifier_call_chain(). This requires forwarding the input 'param' value from the former function to the latter. A problem is that the notifier call takes 'void ' as its 'param' value, compared to 'unsigned long' used by mlx4_dispatch_event(). Re-passing the value would need either punning it to 'void ' or passing down the address of the input 'param'. Both approaches create a number of unnecessary casts. Change instead the input 'param' of mlx4_dispatch_event() from 'unsigned long' to 'void '. A mlx4_eqe pointer can be passed directly, callers using an int value are adjusted to pass its address. Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-08-23 08:25:27 +01:00
Petr Pavlu	71ab55a9af	mlx4: Get rid of the mlx4_interface.get_dev callback Simplify the mlx4 driver interface by removing mlx4_get_protocol_dev() and the associated mlx4_interface.get_dev callbacks. This is done in preparation to use an auxiliary bus to model the mlx4 driver structure. The change is motivated by the following situation: * The mlx4_en interface is being initialized by mlx4_en_add() and mlx4_en_activate(). * The latter activate function calls mlx4_en_init_netdev() -> register_netdev() to register a new net_device. * A netdev event NETDEV_REGISTER is raised for the device. * The netdev notififier mlx4_ib_netdev_event() is called and it invokes mlx4_ib_scan_netdevs() -> mlx4_get_protocol_dev() -> mlx4_en_get_netdev() [via mlx4_interface.get_dev]. This chain creates a problem when mlx4_en gets switched to be an auxiliary driver. It contains two device calls which would both need to take a respective device lock. Avoid this situation by updating mlx4_ib_scan_netdevs() to no longer call mlx4_get_protocol_dev() but instead to utilize the information passed in net_device.parent and net_device.dev_port. This data is sufficient to determine that an updated port is one that the mlx4_ib driver should take care of and to keep mlx4_ib_dev.iboe.netdevs up to date. Following that, update mlx4_ib_get_netdev() to also not call mlx4_get_protocol_dev() and instead scan all current netdevs to find find a matching one. Note that mlx4_ib_get_netdev() is called early from ib_register_device() and cannot use data tracked in mlx4_ib_dev.iboe.netdevs which is not at that point yet set. Finally, remove function mlx4_get_protocol_dev() and the mlx4_interface.get_dev callbacks (only mlx4_en_get_netdev()) as they became unused. Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Tested-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-08-23 08:25:27 +01:00
Douglas Miller	f5acc36b07	IB/hfi1: Reduce printing of errors during driver shut down The driver prints unnecessary prints for error conditions on shutdown remove them to quiet it down. Signed-off-by: Douglas Miller <doug.miller@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Link: https://lore.kernel.org/r/169271327832.1855761.3756156924805531643.stgit@awfm-02.cornelisnetworks.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-08-22 17:31:45 +03:00
Brendan Cunningham	d2c0234634	RDMA/hfi1: Move user SDMA system memory pinning code to its own file Move user SDMA system memory page-pinning code from user_sdma.c to pin_system.c. Put declarations for non-static functions in pinning.h. System memory pinning is necessary for processing user SDMA requests but actual steps are invisible to user SDMA request-processing code. Moving system memory pinning code for user SDMA to its own file makes this distinction apparent. These changes have no effect on userspace. Signed-off-by: Patrick Kelsey <pat.kelsey@cornelisnetworks.com> Signed-off-by: Brendan Cunningham <bcunningham@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Link: https://lore.kernel.org/r/169271327311.1855761.4736714053318724062.stgit@awfm-02.cornelisnetworks.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-08-22 17:31:45 +03:00
Jinjie Ruan	3d91dfe72a	RDMA/hfi1: Use list_for_each_entry() helper Convert list_for_each() to list_for_each_entry() so that the pos list_head pointer and list_entry() call are no longer needed, which can reduce a few lines of code. No functional changed. Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Link: https://lore.kernel.org/r/20230822033539.3692453-1-ruanjinjie@huawei.com Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-08-22 17:28:27 +03:00
Rohit Chavan	d3c2245754	RDMA/mlx5: Fix trailing / formatting in block comment Resolved a formatting issue where the trailing / in a block comment was placed on a same line instead of separate line. Signed-off-by: Rohit Chavan <roheetchavan@gmail.com> Link: https://lore.kernel.org/r/20230822120451.8215-1-roheetchavan@gmail.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-08-22 17:27:16 +03:00
Rohit Chavan	6812e06999	RDMA/rxe: Fix redundant break statement in switch-case. Removed unreachable break statement after return. Signed-off-by: Rohit Chavan <roheetchavan@gmail.com> Link: https://lore.kernel.org/r/20230822091304.7312-1-roheetchavan@gmail.com Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-08-22 17:23:00 +03:00
Yonatan Nachum	dc202c57e9	RDMA/efa: Fix wrong resources deallocation order When trying to destroy QP or CQ, we first decrease the refcount and potentially free memory regions allocated for the object and then request the device to destroy the object. If the device fails, the object isn't fully destroyed so the user/IB core can try to destroy the object again which will lead to underflow when trying to decrease an already zeroed refcount. Deallocate resources in reverse order of allocating them to safely free them. Fixes: `ff6629f88c` ("RDMA/efa: Do not delay freeing of DMA pages") Reviewed-by: Michael Margolin <mrgolin@amazon.com> Reviewed-by: Yossi Leybovich <sleybo@amazon.com> Signed-off-by: Yonatan Nachum <ynachum@amazon.com> Link: https://lore.kernel.org/r/20230822082725.31719-1-ynachum@amazon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-08-22 17:21:53 +03:00
Guoqing Jiang	9dfccb6d0d	RDMA/siw: Call llist_reverse_order in siw_run_sq We can call the function to get fifo list. Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20230821133255.31111-4-guoqing.jiang@linux.dev Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-08-22 17:05:12 +03:00
Guoqing Jiang	bee024d204	RDMA/siw: Correct wrong debug message We need to print num_sle first then pbl->max_buf per the condition. Also replace mem->pbl with pbl while at it. Fixes: `303ae1cdfd` ("rdma/siw: application interface") Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20230821133255.31111-3-guoqing.jiang@linux.dev Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-08-22 17:05:12 +03:00
Guoqing Jiang	b056327bee	RDMA/siw: Balance the reference of cep->kref in the error path The siw_connect can go to err in below after cep is allocated successfully: 1. If siw_cm_alloc_work returns failure. In this case socket is not assoicated with cep so siw_cep_put can't be called by siw_socket_disassoc. We need to call siw_cep_put twice since cep->kref is increased once after it was initialized. 2. If siw_cm_queue_work can't find a work, which means siw_cep_get is not called in siw_cm_queue_work, so cep->kref is increased twice by siw_cep_get and when associate socket with cep after it was initialized. So we need to call siw_cep_put three times (one in siw_socket_disassoc). 3. siw_send_mpareqrep returns error, this scenario is similar as 2. So we need to remove one siw_cep_put in the error path. Fixes: `6c52fdc244` ("rdma/siw: connection management") Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20230821133255.31111-2-guoqing.jiang@linux.dev Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-08-22 17:05:12 +03:00
Leon Romanovsky	dfe261107c	Revert "IB/isert: Fix incorrect release of isert connection" Commit: `699826f4e3` ("IB/isert: Fix incorrect release of isert connection") is causing problems on OPA when DEVICE_REMOVAL is happening. ------------[ cut here ]------------ WARNING: CPU: 52 PID: 2117247 at drivers/infiniband/core/cq.c:359 ib_cq_pool_cleanup+0xac/0xb0 [ib_core] Modules linked in: nfsd nfs_acl target_core_user uio tcm_fc libfc scsi_transport_fc tcm_loop target_core_pscsi target_core_iblock target_core_file rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill rpcrdma rdma_ucm ib_srpt sunrpc ib_isert iscsi_target_mod target_core_mod opa_vnic ib_iser libiscsi ib_umad scsi_transport_iscsi rdma_cm ib_ipoib iw_cm ib_cm hfi1(-) rdmavt ib_uverbs intel_rapl_msr intel_rapl_common sb_edac ib_core x86_pkg_temp_thermal intel_powerclamp coretemp i2c_i801 mxm_wmi rapl iTCO_wdt ipmi_si iTCO_vendor_support mei_me ipmi_devintf mei intel_cstate ioatdma intel_uncore i2c_smbus joydev pcspkr lpc_ich ipmi_msghandler acpi_power_meter acpi_pad xfs libcrc32c sr_mod sd_mod cdrom t10_pi sg crct10dif_pclmul crc32_pclmul crc32c_intel drm_kms_helper drm_shmem_helper ahci libahci ghash_clmulni_intel igb drm libata dca i2c_algo_bit wmi fuse CPU: 52 PID: 2117247 Comm: modprobe Not tainted 6.5.0-rc1+ #1 Hardware name: Intel Corporation S2600CWR/S2600CW, BIOS SE5C610.86B.01.01.0014.121820151719 12/18/2015 RIP: 0010:ib_cq_pool_cleanup+0xac/0xb0 [ib_core] Code: ff 48 8b 43 40 48 8d 7b 40 48 83 e8 40 4c 39 e7 75 b3 49 83 c4 10 4d 39 fc 75 94 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc <0f> 0b eb a1 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f RSP: 0018:ffffc10bea13fc80 EFLAGS: 00010206 RAX: 000000000000010c RBX: ffff9bf5c7e66c00 RCX: 000000008020001d RDX: 000000008020001e RSI: fffff175221f9900 RDI: ffff9bf5c7e67640 RBP: ffff9bf5c7e67600 R08: ffff9bf5c7e64400 R09: 000000008020001d R10: 0000000040000000 R11: 0000000000000000 R12: ffff9bee4b1e8a18 R13: dead000000000122 R14: dead000000000100 R15: ffff9bee4b1e8a38 FS: 00007ff1e6d38740(0000) GS:ffff9bfd9fb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005652044ecc68 CR3: 0000000889b5c005 CR4: 00000000001706e0 Call Trace: <TASK> ? __warn+0x80/0x130 ? ib_cq_pool_cleanup+0xac/0xb0 [ib_core] ? report_bug+0x195/0x1a0 ? handle_bug+0x3c/0x70 ? exc_invalid_op+0x14/0x70 ? asm_exc_invalid_op+0x16/0x20 ? ib_cq_pool_cleanup+0xac/0xb0 [ib_core] disable_device+0x9d/0x160 [ib_core] __ib_unregister_device+0x42/0xb0 [ib_core] ib_unregister_device+0x22/0x30 [ib_core] rvt_unregister_device+0x20/0x90 [rdmavt] hfi1_unregister_ib_device+0x16/0xf0 [hfi1] remove_one+0x55/0x1a0 [hfi1] pci_device_remove+0x36/0xa0 device_release_driver_internal+0x193/0x200 driver_detach+0x44/0x90 bus_remove_driver+0x69/0xf0 pci_unregister_driver+0x2a/0xb0 hfi1_mod_cleanup+0xc/0x3c [hfi1] __do_sys_delete_module.constprop.0+0x17a/0x2f0 ? exit_to_user_mode_prepare+0xc4/0xd0 ? syscall_trace_enter.constprop.0+0x126/0x1a0 do_syscall_64+0x5c/0x90 ? syscall_exit_to_user_mode+0x12/0x30 ? do_syscall_64+0x69/0x90 ? syscall_exit_work+0x103/0x130 ? syscall_exit_to_user_mode+0x12/0x30 ? do_syscall_64+0x69/0x90 ? exc_page_fault+0x65/0x150 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 RIP: 0033:0x7ff1e643f5ab Code: 73 01 c3 48 8b 0d 75 a8 1b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 45 a8 1b 00 f7 d8 64 89 01 48 RSP: 002b:00007ffec9103cc8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 RAX: ffffffffffffffda RBX: 00005615267fdc50 RCX: 00007ff1e643f5ab RDX: 0000000000000000 RSI: 0000000000000800 RDI: 00005615267fdcb8 RBP: 00005615267fdc50 R08: 0000000000000000 R09: 0000000000000000 R10: 00007ff1e659eac0 R11: 0000000000000206 R12: 00005615267fdcb8 R13: 0000000000000000 R14: 00005615267fdcb8 R15: 00007ffec9105ff8 </TASK> ---[ end trace 0000000000000000 ]--- And... restrack: ------------[ cut here ]------------ infiniband hfi1_0: BUG: RESTRACK detected leak of resources restrack: Kernel PD object allocated by ib_isert is not freed restrack: Kernel CQ object allocated by ib_core is not freed restrack: Kernel QP object allocated by rdma_cm is not freed restrack: ------------[ cut here ]------------ Fixes: `699826f4e3` ("IB/isert: Fix incorrect release of isert connection") Reported-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Closes: https://lore.kernel.org/all/921cd1d9-2879-f455-1f50-0053fe6a6655@cornelisnetworks.com Link: https://lore.kernel.org/r/a27982d3235005c58f6d321f3fad5eb6e1beaf9e.1692604607.git.leonro@nvidia.com Tested-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2023-08-22 17:01:04 +03:00
Leon Romanovsky	c6c0052df2	RDMA/bnxt_re: Fix kernel doc errors Fix set of the following errors due to use of wrong kernel doc format to describe function parameters: drivers/infiniband/hw/bnxt_re/qplib_rcfw.c:68: warning: Function parameter or member 'rcfw' not described in '__wait_for_resp' Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202308180600.oOnkIAQV-lkp@intel.com/ Closes: https://lore.kernel.org/oe-kbuild-all/202308180401.iaj2ktTc-lkp@intel.com/ Closes: https://lore.kernel.org/oe-kbuild-all/202308180214.Lt9NAhbM-lkp@intel.com/ Closes: https://lore.kernel.org/oe-kbuild-all/202308180055.6zM4AK6V-lkp@intel.com/ Closes: https://lore.kernel.org/oe-kbuild-all/202308172136.ipx1wvs6-lkp@intel.com/ Link: https://lore.kernel.org/r/4b22c385f1b68590ace8f82f2985d14b20999432.1692539554.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2023-08-21 10:40:38 +03:00
Patrisious Haddad	58dbd6428a	RDMA/mlx5: Handles RoCE MACsec steering rules addition and deletion Add RoCE MACsec rules when a gid is added for the MACsec netdevice and handle their cleanup when the gid is removed or the MACsec SA is deleted. Also support alias IP for the MACsec device, as long as we don't have more ips than what the gid table can hold. In addition handle the case where a gid is added but there are still no SAs added for the MACsec device, so the rules are added later on when the SAs are added. Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2023-08-20 12:35:24 +03:00
Patrisious Haddad	a019b1258d	IB/core: Reorder GID delete code for RoCE Reorder GID delete code so that the driver del_gid operation is executed before nullifying the gid attribute ndev parameter, this allows drivers to access the ndev during their gid delete operation, which makes more sense since they had access to it during the gid addition operation. Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2023-08-20 12:35:24 +03:00
Patrisious Haddad	758ce14aee	RDMA/mlx5: Implement MACsec gid addition and deletion Handle MACsec IP ambiguity issue, since mlx5 hw can't support programming both the MACsec and the physical gid when they have the same IP address, because it wouldn't know to whom to steer the traffic. Hence in such case we delete the physical gid from the hw gid table, which would then cause all traffic sent over it to fail, and we'll only be able to send traffic over the MACsec gid. Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2023-08-20 12:35:24 +03:00
Christopher Bednarz	bb6d73d9ad	RDMA/irdma: Prevent zero-length STAG registration Currently irdma allows zero-length STAGs to be programmed in HW during the kernel mode fast register flow. Zero-length MR or STAG registration disable HW memory length checks. Improve gaps in bounds checking in irdma by preventing zero-length STAG or MR registrations except if the IB_PD_UNSAFE_GLOBAL_RKEY is set. This addresses the disclosure CVE-2023-25775. Fixes: `b48c24c2d7` ("RDMA/irdma: Implement device supported verb APIs") Signed-off-by: Christopher Bednarz <christopher.n.bednarz@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Link: https://lore.kernel.org/r/20230818144838.1758-1-shiraz.saleem@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-08-19 19:28:17 +03:00
Cheng Xu	ed10435d35	RDMA/erdma: Implement hierarchical MTT Hierarchical MTT allows large MR registration without the need of continuous physical address. This commit adds the support of hierarchical MTT support for erdma. Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Link: https://lore.kernel.org/r/20230817102151.75964-4-chengyou@linux.alibaba.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-08-19 14:41:01 +03:00
Cheng Xu	7244b4aa42	RDMA/erdma: Refactor the storage structure of MTT entries Currently our MTT only support inline mtt entries (0 level MTT) and indirect MTT entries (1 level mtt), which will limit the maximum length of MRs. In order to implement a multi-level MTT, we refactor the structure of MTT first. Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Link: https://lore.kernel.org/r/20230817102151.75964-3-chengyou@linux.alibaba.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-08-19 14:40:30 +03:00
Cheng Xu	d7cfbba90b	RDMA/erdma: Renaming variable names and field names of struct erdma_mem Currently, variable names and field names of struct erdma_mem contain 'mtt', which is not accurate. Renaming them with 'xxx_mem' or 'mem'. Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Link: https://lore.kernel.org/r/20230817102151.75964-2-chengyou@linux.alibaba.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-08-19 14:36:29 +03:00
Chengchang Tang	5a87279591	RDMA/hns: Support hns HW stats Support query hns HW stats for rdma-tool to help debugging. Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20230816091812.2899366-3-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-08-19 14:26:34 +03:00
Chengchang Tang	c4bb187379	RDMA/hns: Dump whole QP/CQ/MR resource in raw Currently, some fields in the QP/CQ/MR resource can be dumped by rdma-tool, but this information is not enough. It is very inconvenient to continue to expand on the current field, and it will also introduce some trouble to parse these raw data. This patch dump whole resource in raw to avoid the above problems. Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20230816091812.2899366-2-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-08-19 14:26:08 +03:00
Jakub Kicinski	7ff57803d2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/ethernet/sfc/tc.c `fa165e1949` ("sfc: don't unregister flow_indr if it was never registered") `3bf969e88a` ("sfc: add MAE table machinery for conntrack table") https://lore.kernel.org/all/20230818112159.7430e9b4@canb.auug.org.au/ No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-18 12:44:56 -07:00
Leon Romanovsky	5f513c8b97	RDMA/irdma: Add missing kernel-doc in irdma_setup_umode_qp() Fix the following warning reported by kbuild: drivers/infiniband/hw/irdma/verbs.c:584: warning: Function parameter or member 'udata' not described in 'irdma_setup_umode_qp' Fixes: `3a84987204` ("RDMA/irdma: Allow accurate reporting on QP max send/recv WR") Link: https://lore.kernel.org/r/2c9bcd2b773c400a1699bd7973e22bfba1e4b379.1692260011.git.leonro@nvidia.com Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202308171620.m4MNACWz-lkp@intel.com/ Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-08-18 13:48:16 -03:00
Gustavo A. R. Silva	18ddaeb03b	RDMA/mlx4: Copy union directly Copy union directly instead of using memcpy(). Note that in this case, a direct assignment is more readable and consistent with the subsequent assignments. This addresses the following -Wstringop-overflow warning seen in s390 with defconfig: drivers/infiniband/hw/mlx4/main.c:296:33: warning: writing 16 bytes into a region of size 0 [-Wstringop-overflow=] 296 \| memcpy(&port_gid_table->gids[free].gid, \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 297 \| &attr->gid, sizeof(attr->gid)); \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This helps with the ongoing efforts to globally enable -Wstringop-overflow. Link: https://github.com/KSPP/linux/issues/308 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/ZNvimeRAPkJ24zRG@work Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-08-16 09:28:33 +03:00
Shiraz Saleem	295c95aa7e	RDMA/irdma: Drop unused kernel push code The driver has code blocks for kernel push WQEs but does not map the doorbell page rendering this mode non functional [1] Remove code associated with this feature from the kernel fast path as there is currently no plan of record to support this. This also address a sparse issue reported by lkp. drivers/infiniband/hw/irdma/uk.c:285:24: sparse: sparse: incorrect type in assignment (different base types) @@ expected bool [usertype] push_wqe:1 @@ got restricted __le32 [usertype] push_db @@ drivers/infiniband/hw/irdma/uk.c:285:24: sparse: expected bool [usertype] push_wqe:1 drivers/infiniband/hw/irdma/uk.c:285:24: sparse: got restricted __le32 [usertype] push_db drivers/infiniband/hw/irdma/uk.c:386:24: sparse: sparse: incorrect type in assignment (different base types) @@ expected bool [usertype] push_wqe:1 @@ got restricted __le32 [usertype] *push_db @@ [1] https://lore.kernel.org/linux-rdma/20230815051809.GB22185@unreal/T/#t Fixes: `272bba19d6` ("RDMA: Remove unnecessary ternary operators") Fixes: `551c46edc7` ("RDMA/irdma: Add user/kernel shared libraries") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202308110251.BV6BcwUR-lkp@intel.com/ Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Link: https://lore.kernel.org/r/20230816001209.1721-1-shiraz.saleem@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-08-16 09:24:52 +03:00
Saravanan Vajravel	0a30e59f22	RDMA/bnxt_re: Add support for dmabuf pinned memory regions Support the new verb which indicates dmabuf support. bnxt doesn't support ODP. So use the pinned version of the dmabuf APIs to enable bnxt_re devices to work as dmabuf importer. Link: https://lore.kernel.org/r/1690790473-25850-2-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-08-15 15:54:08 -03:00
Selvin Xavier	213d2b9bb2	RDMA/bnxt_re: Protect the PD table bitmap Syncrhonization is required to avoid simultaneous allocation of the PD. Add a new mutex lock to handle allocation from the PD table. Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1692032419-21680-2-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-08-15 09:06:06 +03:00
Kashyap Desai	811e0ce9e6	RDMA/bnxt_re: Initialize mutex dbq_lock Fix the missing dbq_lock mutex initialization Fixes: `2ad4e6303a` ("RDMA/bnxt_re: Implement doorbell pacing algorithm") Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1692032419-21680-1-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-08-15 09:06:06 +03:00
Kalesh AP	ca60fd116c	IB/core: Add more speed parsing in ib_get_width_and_speed() When the Ethernet driver does not provide the number of lanes in the __ethtool_get_link_ksettings() response, the function ib_get_width_and_speed() does not take consideration of 50G, 100G and 200G speeds while calculating the IB width and speed. Update the width and speed for the above netdev speeds. Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1690966823-8159-1-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-08-13 10:41:57 +03:00
Guoqing Jiang	25944c0681	RDMA/cxgb4: Set sq_sig_type correctly Replace '0' with IB_SIGNAL_REQ_WR given the sq_sig_type is either IB_SIGNAL_ALL_WR or IB_SIGNAL_REQ_WR per the below. enum ib_sig_type { IB_SIGNAL_ALL_WR, IB_SIGNAL_REQ_WR }; Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20230731092106.10396-1-guoqing.jiang@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-08-13 10:26:11 +03:00
Kashyap Desai	64b632654b	RDMA/bnxt_re: Initialize dpi_tbl_lock mutex Fix the missing dpi_tbl_lock mutex initialization. Fixes: `0ac20faf5d` ("RDMA/bnxt_re: Reorg the bar mapping") Link: https://lore.kernel.org/r/1691642677-21369-4-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-08-10 16:35:54 -03:00
Kalesh AP	5ac8480ae4	RDMA/bnxt_re: Fix error handling in probe failure path During bnxt_re_dev_init(), when bnxt_re_setup_chip_ctx() fails unregister with L2 first before bailing out probe. Fixes: `ae8637e131` ("RDMA/bnxt_re: Add chip context to identify 57500 series") Link: https://lore.kernel.org/r/1691642677-21369-3-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-08-10 16:35:54 -03:00
Selvin Xavier	5363fc488d	RDMA/bnxt_re: Properly order ib_device_unalloc() to avoid UAF ib_dealloc_device() should be called only after device cleanup. Fix the dealloc sequence. Fixes: `6d758147c7` ("RDMA/bnxt_re: Use auxiliary driver interface") Link: https://lore.kernel.org/r/1691642677-21369-2-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-08-10 16:35:54 -03:00
Yue Haibing	d952f54d01	RDMA/hns: Remove unused declaration hns_roce_modify_srq() Commit `c7bcb13442` ("RDMA/hns: Add SRQ support for hip08 kernel mode") declared but never implemented this. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Link: https://lore.kernel.org/r/20230804130418.41728-1-yuehaibing@huawei.com Reviewed-by: Junxian Huang <huangjunxian6@hisilicon.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-08-08 21:35:05 +03:00
Ivan Orlov	64917f4c35	RDMA: Make all 'class' structures const Now that the driver core allows for struct class to be in read-only memory, making all 'class' structures to be declared at build time placing them into read-only memory, instead of having to be dynamically allocated at load time. Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Leon Romanovsky <leon@kernel.org> Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Cc: "Md. Haris Iqbal" <haris.iqbal@ionos.com> Cc: Jack Wang <jinpu.wang@ionos.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Yishai Hadas <yishaih@nvidia.com> Cc: Ivan Orlov <ivan.orlov0322@gmail.com> Cc: Benjamin Tissoires <benjamin.tissoires@redhat.com> Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ivan Orlov <ivan.orlov0322@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/2023080427-commuting-crewless-cbee@gregkh Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-08-08 13:25:15 +03:00
Maher Sanalla	f14c1a14e6	net/mlx5: Allocate completion EQs dynamically This commit enables the dynamic allocation of EQs at runtime, allowing for more flexibility in managing completion EQs and reducing the memory overhead of driver load. Whenever a CQ is created for a given vector index, the driver will lookup to see if there is an already mapped completion EQ for that vector, if so, utilize it. Otherwise, allocate a new EQ on demand and then utilize it for the CQ completion events. Add a protection lock to the EQ table to protect from concurrent EQ creation attempts. While at it, replace mlx5_vector2irqn()/mlx5_vector2eqn() with mlx5_comp_eqn_get() and mlx5_comp_irqn_get() which will allocate an EQ on demand if no EQ is found for the given vector. Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2023-08-07 10:53:52 -07:00
Maher Sanalla	674dd4e2e0	net/mlx5: Rename mlx5_comp_vectors_count() to mlx5_comp_vectors_max() To accurately represent its purpose, rename the function that retrieves the value of maximum vectors from mlx5_comp_vectors_count() to mlx5_comp_vectors_max(). Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2023-08-07 10:53:51 -07:00
Ruan Jinjie	849b1955ad	RDMA: Remove unnecessary NULL values The NULL initialization of the pointers assigned by kzalloc() first is not necessary, because if the kzalloc() failed, the pointers will be assigned NULL, otherwise it works as usual. so remove it. Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com> Link: https://lore.kernel.org/r/20230804082102.3361961-1-ruanjinjie@huawei.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-08-07 16:56:57 +03:00
Xiang Yang	26b7d1a271	IB/uverbs: Fix an potential error pointer dereference smatch reports the warning below: drivers/infiniband/core/uverbs_std_types_counters.c:110 ib_uverbs_handler_UVERBS_METHOD_COUNTERS_READ() error: 'uattr' dereferencing possible ERR_PTR() The return value of uattr maybe ERR_PTR(-ENOENT), fix this by checking the value of uattr before using it. Fixes: `ebb6796bd3` ("IB/uverbs: Add read counters support") Signed-off-by: Xiang Yang <xiangyang3@huawei.com> Link: https://lore.kernel.org/r/20230804022525.1916766-1-xiangyang3@huawei.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-08-07 16:49:59 +03:00
Chengchang Tang	9e03dbea2b	RDMA/hns: Fix CQ and QP cache affinity Currently, the affinity between QP cache and CQ cache is not considered when assigning QPN, it will affect the message rate of HW. Allocate QPN from QP cache with better CQ affinity to get better performance. Fixes: `71586dd200` ("RDMA/hns: Create QP with selected QPN for bank load balance") Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20230804012711.808069-5-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-08-07 16:46:58 +03:00
Junxian Huang	c9c0bd3c17	RDMA/hns: Fix inaccurate error label name in init instance This patch fixes inaccurate error label name in init instance. Fixes: `70f9252158` ("RDMA/hns: Use the reserved loopback QPs to free MR before destroying MPT") Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20230804012711.808069-4-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-08-07 16:46:58 +03:00
Junxian Huang	706efac447	RDMA/hns: Fix incorrect post-send with direct wqe of wr-list Currently, direct wqe is not supported for wr-list. RoCE driver excludes direct wqe for wr-list by judging whether the number of wr is 1. For a wr-list where the second wr is a length-error atomic wr, the post-send driver handles the first wr and adds 1 to the wr number counter firstly. While handling the second wr, the driver finds out a length error and terminates the wr handle process, remaining the counter at 1. This causes the driver mistakenly judges there is only 1 wr and thus enters the direct wqe process, carrying the current length-error atomic wqe. This patch fixes the error by adding a judgement whether the current wr is a bad wr. If so, use the normal doorbell process but not direct wqe despite the wr number is 1. Fixes: `01584a5edc` ("RDMA/hns: Add support of direct wqe") Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20230804012711.808069-3-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-08-07 16:46:58 +03:00
Chengchang Tang	df1bcf90a6	RDMA/hns: Fix port active speed HW supports a variety of different speed, but the current speed is fixed. The real speed should be querried from ethernet. Fixes: `9a4435375c` ("IB/hns: Add driver files for hns RoCE driver") Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20230804012711.808069-2-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-08-07 16:46:58 +03:00
Kalesh AP	14611b9b98	RDMA/bnxt_re: Remove unnecessary variable initializations Remove unnecessary variable initializations. Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1691052326-32143-7-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-08-07 16:39:42 +03:00
Kalesh AP	00d0427fd8	RDMA/bnxt_re: Avoid unnecessary memset Avoid memset by initializing the variables during declaration itself. Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1691052326-32143-6-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-08-07 16:39:42 +03:00
Kalesh AP	e59a5cec3f	RDMA/bnxt_re: Cleanup bnxt_re_process_raw_qp_pkt_rx() function - Remove unnecessary memset by initializing the variables during declaration itself. - Arranged variable declarartion in RCT order. Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1691052326-32143-5-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-08-07 16:39:42 +03:00
Selvin Xavier	c9f3e4e1d8	RDMA/bnxt_re: Fix the sideband buffer size handling for FW commands bnxt_qplib_rcfw_alloc_sbuf allocates 24 bytes and it is better to fit on stack variables. This way we can avoid unwanted kmalloc call. Call dma_alloc_coherent directly instead of wrapper bnxt_qplib_rcfw_alloc_sbuf. Also, FW expects the side buffer needs to be aligned to BNXT_QPLIB_CMDQE_UNITS(16B). So align the size to have the extra padding bytes. Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Hongguang Gao <hongguang.gao@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1691052326-32143-4-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-08-07 16:39:42 +03:00
Kalesh AP	fd28c8a8c7	RDMA/bnxt_re: Remove a redundant flag After the cited commit, BNXT_RE_FLAG_GOT_MSIX is redundant. Remove it. Fixes: `3034322113` ("bnxt_en: Remove runtime interrupt vector allocation") Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1691052326-32143-3-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-08-07 16:39:42 +03:00
Kalesh AP	f19fba1f79	RDMA/bnxt_re: Fix max_qp count for virtual functions Driver has not accounted QP1 for virtual functions when fetching device attributes and hence max_qp count is one less than active_qp count. Fixed driver so that it counts QP1 for virtual functions as well while fetching device attributes Fixes: `ccd9d0d3df` ("RDMA/bnxt_re: Enable RoCE on virtual functions") Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1691052326-32143-2-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-08-07 16:39:42 +03:00
Douglas Miller	4fdfaef71f	IB/hfi1: Fix possible panic during hotplug remove During hotplug remove it is possible that the update counters work might be pending, and may run after memory has been freed. Cancel the update counters work before freeing memory. Fixes: `7724105686` ("IB/hfi1: add driver files") Signed-off-by: Douglas Miller <doug.miller@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Link: https://lore.kernel.org/r/169099756100.3927190.15284930454106475280.stgit@awfm-02.cornelisnetworks.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-08-03 21:13:57 +03:00
Gustavo A. R. Silva	38313c6d2a	RDMA/irdma: Replace one-element array with flexible-array member One-element and zero-length arrays are deprecated. So, replace one-element array in struct irdma_qvlist_info with flexible-array member. A patch for this was sent a while ago[1]. However, it seems that, at the time, the changes were partially folded[2][3], and the actual flexible-array transformation was omitted. This patch fixes that. The only binary difference seen before/after changes is shown below: \| drivers/infiniband/hw/irdma/hw.o \| @@ -868,7 +868,7 @@ \| drivers/infiniband/hw/irdma/hw.c:484 (discriminator 2) \| size += struct_size(iw_qvlist, qv_info, rf->msix_count); \| 55b: imul $0x45c,%rdi,%rdi \|- 562: add $0x10,%rdi \|+ 562: add $0x4,%rdi which is, of course, expected as it reflects the mistake made while folding the patch I've mentioned above. Worth mentioning is the fact that with this change we save 12 bytes of memory, as can be inferred from the diff snapshot above. Notice that: $ pahole -C rdma_qv_info idrivers/infiniband/hw/irdma/hw.o struct irdma_qv_info { u32 v_idx; /* 0 4 / u16 ceq_idx; / 4 2 / u16 aeq_idx; / 6 2 / u8 itr_idx; / 8 1 / / size: 12, cachelines: 1, members: 4 / / padding: 3 / / last cacheline: 12 bytes */ }; Link: https://lore.kernel.org/linux-hardening/20210525230038.GA175516@embeddedor/ [1] Link: https://lore.kernel.org/linux-hardening/bf46b428deef4e9e89b0ea1704b1f0e5@intel.com/ [2] Link: https://lore.kernel.org/linux-rdma/20210520143809.819-1-shiraz.saleem@intel.com/T/#u [3] Fixes: `44d9e52977` ("RDMA/irdma: Implement device initialization definitions") Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/ZMpsQrZadBaJGkt4@work Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-08-03 21:11:18 +03:00
Yue Haibing	2897f1925b	RDMA/hns: Remove unused function declarations commit `b16f818847` ("RDMA/hns: Refactor eq code for hip06") left behind hns_roce_cleanup_eq_table(). commit `773f841ab1` ("RDMA/hns: Avoid filling sgid index when modifying QP to RTR") leave hns_get_gid_index() unused. Remove both. Link: https://lore.kernel.org/r/20230731135916.32392-1-yuehaibing@huawei.com Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-07-31 15:27:37 -03:00
Bob Pearson	5d122db2ff	RDMA/rxe: Fix incomplete state save in rxe_requester If a send packet is dropped by the IP layer in rxe_requester() the call to rxe_xmit_packet() can fail with err == -EAGAIN. To recover, the state of the wqe is restored to the state before the packet was sent so it can be resent. However, the routines that save and restore the state miss a significnt part of the variable state in the wqe, the dma struct which is used to process through the sge table. And, the state is not saved before the packet is built which modifies the dma struct. Under heavy stress testing with many QPs on a fast node sending large messages to a slow node dropped packets are observed and the resent packets are corrupted because the dma struct was not restored. This patch fixes this behavior and allows the test cases to succeed. Fixes: `3050b99850` ("IB/rxe: Fix race condition between requester and completer") Link: https://lore.kernel.org/r/20230721200748.4604-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-07-31 15:24:12 -03:00
Bob Pearson	cc28f35115	RDMA/rxe: Fix rxe_modify_srq This patch corrects an error in rxe_modify_srq where if the caller changes the srq size the actual new value is not returned to the caller since it may be larger than what is requested. Additionally it open codes the subroutine rcv_wqe_size() which adds very little value, and makes some whitespace changes. Fixes: `8700e3e7c4` ("Soft RoCE driver") Link: https://lore.kernel.org/r/20230620140142.9452-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-07-31 15:24:12 -03:00
Bob Pearson	5993b75d0b	RDMA/rxe: Fix unsafe drain work queue code If create_qp does not fully succeed it is possible for qp cleanup code to attempt to drain the send or recv work queues before the queues have been created causing a seg fault. This patch checks to see if the queues exist before attempting to drain them. Link: https://lore.kernel.org/r/20230620135519.9365-3-rpearsonhpe@gmail.com Reported-by: syzbot+2da1965168e7dbcba136@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-rdma/00000000000012d89205fe7cfe00@google.com/raw Fixes: `49dc9c1f0c` ("RDMA/rxe: Cleanup reset state handling in rxe_resp.c") Fixes: `fbdeb828a2` ("RDMA/rxe: Cleanup error state handling in rxe_comp.c") Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-07-31 15:24:12 -03:00
Bob Pearson	e0ba8ff467	RDMA/rxe: Move work queue code to subroutines This patch: - Moves code to initialize a qp send work queue to a subroutine named rxe_init_sq. - Moves code to initialize a qp recv work queue to a subroutine named rxe_init_rq. - Moves initialization of qp request and response packet queues ahead of work queue initialization so that cleanup of a qp if it is not fully completed can successfully attempt to drain the packet queues without a seg fault. - Makes minor whitespace cleanups. Fixes: `8700e3e7c4` ("Soft RoCE driver") Link: https://lore.kernel.org/r/20230620135519.9365-2-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-07-31 15:24:12 -03:00
Ruan Jinjie	272bba19d6	RDMA: Remove unnecessary ternary operators There are a little ternary operators, the true or false judgment of which is unnecessary in C language semantics. Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com> Link: https://lore.kernel.org/r/20230731085118.394443-1-ruanjinjie@huawei.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-07-31 15:16:12 +03:00
Shetu Ayalew	f0ff2a2dd0	IB/mlx5: Add HW counter called rx_dct_connect The rx_dct_connect counter shows the number of received connection requests for the associated DCTs. Signed-off-by: Shetu Ayalew <shetu@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Link: https://lore.kernel.org/r/01cd24cd7f591734741309921fdc01fc770d84a8.1690121941.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2023-07-31 11:40:32 +03:00
Michael Guralnik	186b169cf1	RDMA/umem: Set iova in ODP flow Fixing the ODP registration flow to set the iova correctly. The calculation in ib_umem_num_dma_blocks() function assumes the iova of the umem is set correctly. When iova is not set, the calculation in ib_umem_num_dma_blocks() is equivalent to length/page_size, which is true only when memory is aligned. For unaligned memory, iova must be set for the ALIGN() in the ib_umem_num_dma_blocks() to take effect and return a correct value. mlx5_ib uses ib_umem_num_dma_blocks() to decide the mkey size to use for the MR. Without this fix, when registering unaligned ODP MR, a wrong size mkey might be chosen and this might cause the UMR to fail. UMR would fail over insufficient size to update the mkey translation: infiniband mlx5_0: dump_cqe:273:(pid 0): dump error cqe 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000030: 00 00 00 00 0f 00 78 06 25 00 00 58 00 da ac d2 infiniband mlx5_0: mlx5_ib_post_send_wait:806:(pid 20311): reg umr failed (6) infiniband mlx5_0: pagefault_real_mr:661:(pid 20311): Failed to update mkey page tables Fixes: `f0093fb1a7` ("RDMA/mlx5: Move mlx5_ib_cont_pages() to the creation of the mlx5_ib_mr") Fixes: `a665aca89a` ("RDMA/umem: Split ib_umem_num_pages() into ib_umem_num_dma_blocks()") Signed-off-by: Artemy Kovalyov <artemyko@nvidia.com> Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://lore.kernel.org/r/3d4be7ca2155bf239dd8c00a2d25974a92c26ab8.1689757344.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-07-31 11:38:38 +03:00
Ruan Jinjie	50f338cd88	RDMA/mthca: Remove unnecessary NULL assignments There are many pointers assigned first, which need not to be initialized, so remove the NULL assignments. Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com> Link: https://lore.kernel.org/r/20230731065543.2285928-1-ruanjinjie@huawei.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-07-31 10:59:06 +03:00
Yang Li	d43ea9c3d5	RDMA/irdma: Fix one kernel-doc comment Remove description of @free_hwcqp in irdma_destroy_cqp(). to silence the warning: drivers/infiniband/hw/irdma/hw.c:580: warning: Excess function parameter 'free_hwcqp' description in 'irdma_destroy_cqp' Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=6028 Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Link: https://lore.kernel.org/r/20230731015915.34867-1-yang.lee@linux.alibaba.com Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-07-31 10:38:53 +03:00
Bernard Metzler	91f36237b4	RDMA/siw: Fix tx thread initialization. Immediately removing the siw module after insertion may crash in siw_stop_tx_thread(), if the according thread did not yet had a chance to initialize its wait queue and siw_stop_tx_thread() tries to wakeup that thread. Initializing the threads state before spwaning it fixes it. Reported-by: Guoqing Jiang <guoqing.jiang@linux.dev> Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Link: https://lore.kernel.org/r/20230728114418.124328-1-bmt@zurich.ibm.com Tested-by: Guoqing Jiang <guoqing.jiang@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-07-31 10:05:23 +03:00
Ruan Jinjie	a45e5f1859	RDMA/mlx: Remove unnecessary variable initializations Remove unnecessary variable initializations. Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com> Link: https://lore.kernel.org/r/20230728065139.3411703-1-ruanjinjie@huawei.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-07-31 10:05:23 +03:00
Sindhu Devale	72d422c246	RDMA/irdma: Use HW specific minimum WQ size HW GEN1 and GEN2 have different min WQ sizes but they are currently set to the same value. Use a gen specific attribute min_hw_wq_size and extend ABI to pass it to user-space. Signed-off-by: Sindhu Devale <sindhu.devale@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Link: https://lore.kernel.org/r/20230725155525.1081-3-shiraz.saleem@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-07-30 15:43:00 +03:00
Sindhu Devale	3a84987204	RDMA/irdma: Allow accurate reporting on QP max send/recv WR Currently the attribute cap.max_send_wr and cap.max_recv_wr sent from user-space during create QP are the provider computed SQ/RQ depth as opposed to raw values passed from application. This inhibits computation of an accurate value for max_send_wr and max_recv_wr for this QP in the kernel which matches the value returned in user create QP. Also these capabilities needs to be reported from the driver in query QP. Add support by extending the ABI to allow the raw cap.max_send_wr and cap.max_recv_wr to be passed from user-space, while keeping compatibility for the older scheme. The internal HW depth and shift needed for the WQs needs to be computed now for both kernel and user-mode QPs. Add new helpers to assist with this: irdma_uk_calc_depth_shift_sq, irdma_uk_calc_depth_shift_rq and irdma_uk_calc_depth_shift_wq. Consolidate all the user mode QP setup into a new function irdma_setup_umode_qp which keeps it with its counterpart irdma_setup_kmode_qp. Signed-off-by: Youvaraj Sagar <youvaraj.sagar@intel.com> Signed-off-by: Sindhu Devale <sindhu.devale@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Link: https://lore.kernel.org/r/20230725155525.1081-2-shiraz.saleem@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-07-30 15:43:00 +03:00
Haoyue Xu	cb06b6b3f6	RDMA/core: Get IB width and speed from netdev Previously, there was no way to query the number of lanes for a network card, so the same netdev_speed would result in a fixed pair of width and speed. As network card specifications become more diverse, such fixed mode is no longer suitable, so a method is needed to obtain the correct width and speed based on the number of lanes. This patch retrieves netdev lanes and speed from net_device and translates them to IB width and speed. Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com> Signed-off-by: Luoyouming <luoyouming@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20230721092052.2090449-1-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-07-30 14:55:37 +03:00
Chandramohan Akula	8b6573ff34	bnxt_re: Update the debug counters for doorbell pacing Add debug counters to track the Doorbell pacing events and report the doorbell pacing debug stats. Signed-off-by: Chandramohan Akula <chandramohan.akula@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1690383081-15033-5-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-07-30 14:37:04 +03:00
Chandramohan Akula	4405baf85a	bnxt_re: Expose the missing hw counters Add code to expose some of the HW counters related to tx/rx data and Congestion control. Signed-off-by: Chandramohan Akula <chandramohan.akula@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1690383081-15033-4-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-07-30 14:37:04 +03:00
Chandramohan Akula	cb95709e0d	bnxt_re: Update the hw counters for resource stats Report the additional resource counters which enables better debugging. Includes active RC/UD QPs, Watermark of the resources and a count that indicates the resize cq operations after driver load. Signed-off-by: Chandramohan Akula <chandramohan.akula@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1690383081-15033-3-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-07-30 14:37:04 +03:00
Chandramohan Akula	063975feed	bnxt_re: Reorganize the resource stats Move the resource stats to a separate stats structure. Signed-off-by: Chandramohan Akula <chandramohan.akula@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1690383081-15033-2-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-07-30 14:37:04 +03:00
Mustafa Ismail	693e1cdebb	RDMA/irdma: Cleanup and rename irdma_netdev_vlan_ipv6() The return value from irdma_netdev_vlan_ipv6() is not used. Rename the functions and change to a void return. Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Link: https://lore.kernel.org/r/20230725155505.1069-5-shiraz.saleem@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-07-26 15:05:36 +03:00
Krzysztof Czurylo	e49bad785e	RDMA/irdma: Add table based lookup for CQ pointer during an event Add a CQ table based loookup to allow quick search for CQ pointer having CQ ID in case of CQ related asynchrononous event. The table is implemented in a similar fashion to QP table. Also add a reference counters for CQ. This is to prevent destroying CQ while an asynchronous event is being processed. The memory resource table size is sized higher with this update, and this table doesn't need to be physically contiguous, so use a vzalloc vs kzalloc to allocate the table. Signed-off-by: Krzysztof Czurylo <krzysztof.czurylo@intel.com> Signed-off-by: Sindhu Devale <sindhu.devale@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Link: https://lore.kernel.org/r/20230725155505.1069-4-shiraz.saleem@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-07-26 15:05:36 +03:00
Sindhu Devale	133b1cba46	RDMA/irdma: Refactor error handling in create CQP In case of a failure in irdma_create_cqp, do not call irdma_destroy_cqp, but cleanup all the allocated resources in reverse order. Drop the extra argument in irdma_destroy_cqp as its no longer needed. Signed-off-by: Krzysztof Czurylo <krzysztof.czurylo@intel.com> Signed-off-by: Sindhu Devale <sindhu.devale@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Link: https://lore.kernel.org/r/20230725155505.1069-3-shiraz.saleem@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-07-26 15:05:36 +03:00
Sindhu Devale	8cfc99dada	RDMA/irdma: Drop a local in irdma_sc_get_next_aeqe Drop the local wqe_idx in irdma_sc_get_next_aeqe and instead store the wqe_idx in the info structure for all asynchronous events(AE) received. There is no reason it should be tied to a specific AE source. Signed-off-by: Sindhu Devale <sindhu.devale@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Link: https://lore.kernel.org/r/20230725155505.1069-2-shiraz.saleem@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-07-26 15:05:36 +03:00
Sindhu Devale	ae463563b7	RDMA/irdma: Report correct WC error Report the correct WC error if a MW bind is performed on an already valid/bound window. Fixes: `44d9e52977` ("RDMA/irdma: Implement device initialization definitions") Signed-off-by: Sindhu Devale <sindhu.devale@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Link: https://lore.kernel.org/r/20230725155439.1057-2-shiraz.saleem@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-07-26 14:58:42 +03:00
Sindhu Devale	3bfb25fa2b	RDMA/irdma: Fix op_type reporting in CQEs The op_type field CQ poll info structure is incorrectly filled in with the queue type as opposed to the op_type received in the CQEs. The wrong opcode could be decoded and returned to the ULP. Copy the op_type field received in the CQE in the CQ poll info structure. Fixes: `24419777e9` ("RDMA/irdma: Fix RQ completion opcode") Signed-off-by: Sindhu Devale <sindhu.devale@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Link: https://lore.kernel.org/r/20230725155439.1057-1-shiraz.saleem@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-07-26 14:58:42 +03:00
Bart Van Assche	89e637c19b	scsi: RDMA/srp: Fix residual handling Although the code for residual handling in the SRP initiator follows the SCSI documentation, that documentation has never been correct. Because scsi_finish_command() starts from the data buffer length and subtracts the residual, scsi_set_resid() must not be called if a residual overflow occurs. Hence remove the scsi_set_resid() calls from the SRP initiator if a residual overflow occurrs. Cc: Leon Romanovsky <leon@kernel.org> Cc: Jason Gunthorpe <jgg@nvidia.com> Fixes: `9237f04e12` ("scsi: core: Fix scsi_get/set_resid() interface") Fixes: `e714531a34` ("IB/srp: Fix residual handling") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20230724200843.3376570-3-bvanassche@acm.org Acked-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2023-07-25 21:34:35 -04:00
Christophe JAILLET	24b1b5d85c	IB/hfi1: Use struct_size() Use struct_size() instead of hand-writing it, when allocating a structure with a flex array. This is less verbose, more robust and more informative. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/f4618a67d5ae0a30eb3f2b4558c8cc790feed79a.1690044376.git.christophe.jaillet@wanadoo.fr Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-07-24 14:43:12 +03:00
Junxian Huang	0b5eed0683	RDMA/hns: Remove VF extend configuration Remove VF extend configuration since the relative registers are configured in firmware currently. Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20230721025146.450831-3-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-07-24 14:42:53 +03:00
Luoyouming	f5a61344ed	RDMA/hns: Support get XRCD number from firmware Support driver get the num of XRCD from firmware. Signed-off-by: Luoyouming <luoyouming@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20230721025146.450831-2-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-07-24 14:42:30 +03:00
Minjie Du	44725a8738	RDMA/qedr: Remove duplicate assignments of va Avoid double assignment of iwqp->ietf_mem.va. Signed-off-by: Minjie Du <duminjie@vivo.com> Link: https://lore.kernel.org/r/20230705031849.2443-1-duminjie@vivo.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-07-23 10:51:08 +03:00

... 9 10 11 12 13 ...

14888 Commits