summaryrefslogtreecommitdiff
path: root/drivers/infiniband
AgeCommit message (Collapse)Author
2021-11-02RDMA/irdma: Do not hold qos mutex twice on QP resumeMustafa Ismail
[ Upstream commit 2dace185caa580720c7cd67fec9efc5ee26108ac ] When irdma_ws_add fails, irdma_ws_remove is used to cleanup the leaf node. This lead to holding the qos mutex twice in the QP resume path. Fix this by avoiding the call to irdma_ws_remove and unwinding the error in irdma_ws_add. This skips the call to irdma_tc_in_use function which is not needed in the error unwind cases. Fixes: 3ae331c75128 ("RDMA/irdma: Add QoS definitions") Link: https://lore.kernel.org/r/20211019151654.1943-2-shiraz.saleem@intel.com Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-02RDMA/irdma: Set VLAN in UD work completion correctlyMustafa Ismail
[ Upstream commit cc07b73ef11d11d4359fb104d0199b22451dd3d8 ] Currently VLAN is reported in UD work completion when VLAN id is zero, i.e. no VLAN case. Report VLAN in UD work completion only when VLAN id is non-zero. Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs") Link: https://lore.kernel.org/r/20211019151654.1943-1-shiraz.saleem@intel.com Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-02RDMA/irdma: Process extended CQ entries correctlyShiraz Saleem
[ Upstream commit e93c7d8e8c4cf80c6afe56e71c83c1cd31b4fce1 ] The valid bit for extended CQE's written by HW is retrieved from the incorrect quad-word. This leads to missed completions for any UD traffic particularly after a wrap-around. Get the valid bit for extended CQE's from the correct quad-word in the descriptor. Fixes: 551c46edc769 ("RDMA/irdma: Add user/kernel shared libraries") Link: https://lore.kernel.org/r/20211005182302.374-1-shiraz.saleem@intel.com Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-02RDMA/sa_query: Use strscpy_pad instead of memcpy to copy a stringMark Zhang
commit 64733956ebba7cc629856f4a6ee35a52bc9c023f upstream. When copying the device name, the length of the data memcpy copied exceeds the length of the source buffer, which cause the KASAN issue below. Use strscpy_pad() instead. BUG: KASAN: slab-out-of-bounds in ib_nl_set_path_rec_attrs+0x136/0x320 [ib_core] Read of size 64 at addr ffff88811a10f5e0 by task rping/140263 CPU: 3 PID: 140263 Comm: rping Not tainted 5.15.0-rc1+ #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack_lvl+0x57/0x7d print_address_description.constprop.0+0x1d/0xa0 kasan_report+0xcb/0x110 kasan_check_range+0x13d/0x180 memcpy+0x20/0x60 ib_nl_set_path_rec_attrs+0x136/0x320 [ib_core] ib_nl_make_request+0x1c6/0x380 [ib_core] send_mad+0x20a/0x220 [ib_core] ib_sa_path_rec_get+0x3e3/0x800 [ib_core] cma_query_ib_route+0x29b/0x390 [rdma_cm] rdma_resolve_route+0x308/0x3e0 [rdma_cm] ucma_resolve_route+0xe1/0x150 [rdma_ucm] ucma_write+0x17b/0x1f0 [rdma_ucm] vfs_write+0x142/0x4d0 ksys_write+0x133/0x160 do_syscall_64+0x43/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f26499aa90f Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 29 fd ff ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 5c fd ff ff 48 RSP: 002b:00007f26495f2dc0 EFLAGS: 00000293 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 00000000000007d0 RCX: 00007f26499aa90f RDX: 0000000000000010 RSI: 00007f26495f2e00 RDI: 0000000000000003 RBP: 00005632a8315440 R08: 0000000000000000 R09: 0000000000000001 R10: 0000000000000000 R11: 0000000000000293 R12: 00007f26495f2e00 R13: 00005632a83154e0 R14: 00005632a8315440 R15: 00005632a830a810 Allocated by task 131419: kasan_save_stack+0x1b/0x40 __kasan_kmalloc+0x7c/0x90 proc_self_get_link+0x8b/0x100 pick_link+0x4f1/0x5c0 step_into+0x2eb/0x3d0 walk_component+0xc8/0x2c0 link_path_walk+0x3b8/0x580 path_openat+0x101/0x230 do_filp_open+0x12e/0x240 do_sys_openat2+0x115/0x280 __x64_sys_openat+0xce/0x140 do_syscall_64+0x43/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: 2ca546b92a02 ("IB/sa: Route SA pathrecord query through netlink") Link: https://lore.kernel.org/r/72ede0f6dab61f7f23df9ac7a70666e07ef314b0.1635055496.git.leonro@nvidia.com Signed-off-by: Mark Zhang <markzhang@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-02RDMA/mlx5: Initialize the ODP xarray when creating an ODP MRAharon Landau
commit 5508546631a0f555d7088203dec2614e41b5106e upstream. Normally the zero fill would hide the missing initialization, but an errant set to desc_size in reg_create() causes a crash: BUG: unable to handle page fault for address: 0000000800000000 PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 5 PID: 890 Comm: ib_write_bw Not tainted 5.15.0-rc4+ #47 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:mlx5_ib_dereg_mr+0x14/0x3b0 [mlx5_ib] Code: 48 63 cd 4c 89 f7 48 89 0c 24 e8 37 30 03 e1 48 8b 0c 24 eb a0 90 0f 1f 44 00 00 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 30 <48> 8b 2f 65 48 8b 04 25 28 00 00 00 48 89 44 24 28 31 c0 8b 87 c8 RSP: 0018:ffff88811afa3a60 EFLAGS: 00010286 RAX: 000000000000001c RBX: 0000000800000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000800000000 RBP: 0000000800000000 R08: 0000000000000000 R09: c0000000fffff7ff R10: ffff88811afa38f8 R11: ffff88811afa38f0 R12: ffffffffa02c7ac0 R13: 0000000000000000 R14: ffff88811afa3cd8 R15: ffff88810772fa00 FS: 00007f47b9080740(0000) GS:ffff88852cd40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000800000000 CR3: 000000010761e003 CR4: 0000000000370ea0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: mlx5_ib_free_odp_mr+0x95/0xc0 [mlx5_ib] mlx5_ib_dereg_mr+0x128/0x3b0 [mlx5_ib] ib_dereg_mr_user+0x45/0xb0 [ib_core] ? xas_load+0x8/0x80 destroy_hw_idr_uobject+0x1a/0x50 [ib_uverbs] uverbs_destroy_uobject+0x2f/0x150 [ib_uverbs] uobj_destroy+0x3c/0x70 [ib_uverbs] ib_uverbs_cmd_verbs+0x467/0xb00 [ib_uverbs] ? uverbs_finalize_object+0x60/0x60 [ib_uverbs] ? ttwu_queue_wakelist+0xa9/0xe0 ? pty_write+0x85/0x90 ? file_tty_write.isra.33+0x214/0x330 ? process_echoes+0x60/0x60 ib_uverbs_ioctl+0xa7/0x110 [ib_uverbs] __x64_sys_ioctl+0x10d/0x8e0 ? vfs_write+0x17f/0x260 do_syscall_64+0x3c/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae Add the missing xarray initialization and remove the desc_size set. Fixes: a639e66703ee ("RDMA/mlx5: Zero out ODP related items in the mlx5_ib_mr") Link: https://lore.kernel.org/r/a4846a11c9de834663e521770da895007f9f0d30.1634642730.git.leonro@nvidia.com Signed-off-by: Aharon Landau <aharonl@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-02RDMA/mlx5: Set user priority for DCTPatrisious Haddad
commit 1ab52ac1e9bc9391f592c9fa8340a6e3e9c36286 upstream. Currently, the driver doesn't set the PCP-based priority for DCT, hence DCT response packets are transmitted without user priority. Fix it by setting user provided priority in the eth_prio field in the DCT context, which in turn sets the value in the transmitted packet. Fixes: 776a3906b692 ("IB/mlx5: Add support for DC target QP") Link: https://lore.kernel.org/r/5fd2d94a13f5742d8803c218927322257d53205c.1633512672.git.leonro@nvidia.com Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-02IB/hfi1: Fix abba locking issue with sc_disable()Mike Marciniszyn
commit 13bac861952a78664907a0f927d3e874e9a59034 upstream. sc_disable() after having disabled the send context wakes up any waiters by calling hfi1_qp_wakeup() while holding the waitlock for the sc. This is contrary to the model for all other calls to hfi1_qp_wakeup() where the waitlock is dropped and a local is used to drive calls to hfi1_qp_wakeup(). Fix by moving the sc->piowait into a local list and driving the wakeup calls from the list. Fixes: 099a884ba4c0 ("IB/hfi1: Handle wakeup of orphaned QPs for pio") Link: https://lore.kernel.org/r/20211013141852.128104.2682.stgit@awfm-01.cornelisnetworks.com Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com> Reported-by: TOTE Robot <oslab@tsinghua.edu.cn> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-02IB/qib: Protect from buffer overflow in struct qib_user_sdma_pkt fieldsMike Marciniszyn
commit d39bf40e55e666b5905fdbd46a0dced030ce87be upstream. Overflowing either addrlimit or bytes_togo can allow userspace to trigger a buffer overflow of kernel memory. Check for overflows in all the places doing math on user controlled buffers. Fixes: f931551bafe1 ("IB/qib: Add new qib driver for QLogic PCIe InfiniBand adapters") Link: https://lore.kernel.org/r/20211012175519.7298.77738.stgit@awfm-01.cornelisnetworks.com Reported-by: Ilja Van Sprundel <ivansprundel@ioactive.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-07RDMA/hns: Add the check of the CQE size of the user spaceWenpeng Liang
[ Upstream commit e671f0ecfece14940a9bb81981098910ea278cf7 ] If the CQE size of the user space is not the size supported by the hardware, the creation of CQ should be stopped. Fixes: 09a5f210f67e ("RDMA/hns: Add support for CQE in size of 64 Bytes") Link: https://lore.kernel.org/r/20210927125557.15031-3-liangwenpeng@huawei.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-07RDMA/hns: Fix the size setting error when copying CQE in clean_cq()Wenpeng Liang
[ Upstream commit cc26aee100588a3f293921342a307b6309ace193 ] The size of CQE is different for different versions of hardware, so the driver needs to specify the size of CQE explicitly. Fixes: 09a5f210f67e ("RDMA/hns: Add support for CQE in size of 64 Bytes") Link: https://lore.kernel.org/r/20210927125557.15031-2-liangwenpeng@huawei.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-07RDMA/hfi1: Fix kernel pointer leakGuo Zhi
[ Upstream commit 7d5cfafe8b4006a75b55c2f1fdfdb363f9a5cc98 ] Pointers should be printed with %p or %px rather than cast to 'unsigned long long' and printed with %llx. Change %llx to %p to print the secured pointer. Fixes: 042a00f93aad ("IB/{ipoib,hfi1}: Add a timeout handler for rdma_netdev") Link: https://lore.kernel.org/r/20210922134857.619602-1-qtxuning1999@sjtu.edu.cn Signed-off-by: Guo Zhi <qtxuning1999@sjtu.edu.cn> Acked-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-07RDMA/hns: Work around broken constant propagation in gcc 8Jason Gunthorpe
[ Upstream commit 14351f08ed5c8b888cdd95651152db7e096ee27f ] gcc 8.3 and 5.4 throw this: In function 'modify_qp_init_to_rtr', ././include/linux/compiler_types.h:322:38: error: call to '__compiletime_assert_1859' declared with attribute error: FIELD_PREP: value too large for the field _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__) [..] drivers/infiniband/hw/hns/hns_roce_common.h:91:52: note: in expansion of macro 'FIELD_PREP' *((__le32 *)ptr + (field_h) / 32) |= cpu_to_le32(FIELD_PREP( \ ^~~~~~~~~~ drivers/infiniband/hw/hns/hns_roce_common.h:95:39: note: in expansion of macro '_hr_reg_write' #define hr_reg_write(ptr, field, val) _hr_reg_write(ptr, field, val) ^~~~~~~~~~~~~ drivers/infiniband/hw/hns/hns_roce_hw_v2.c:4412:2: note: in expansion of macro 'hr_reg_write' hr_reg_write(context, QPC_LP_PKTN_INI, lp_pktn_ini); Because gcc has miscalculated the constantness of lp_pktn_ini: mtu = ib_mtu_enum_to_int(ib_mtu); if (WARN_ON(mtu < 0)) [..] lp_pktn_ini = ilog2(MAX_LP_MSG_LEN / mtu); Since mtu is limited to {256,512,1024,2048,4096} lp_pktn_ini is between 4 and 8 which is compatible with the 4 bit field in the FIELD_PREP. Work around this broken compiler by adding a 'can never be true' constraint on lp_pktn_ini's value which clears out the problem. Fixes: f0cb411aad23 ("RDMA/hns: Use new interface to modify QP context") Link: https://lore.kernel.org/r/0-v1-c773ecb137bc+11f-hns_gcc8_jgg@nvidia.com Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-07RDMA/irdma: Report correct WC error when there are MW bind errorsSindhu Devale
[ Upstream commit 9f7fa37a6bd90f2749c67f8524334c387d972eb9 ] Report the correct WC error when MW bind error related asynchronous events are generated by HW. Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs") Link: https://lore.kernel.org/r/20210916191222.824-5-shiraz.saleem@intel.com Signed-off-by: Sindhu Devale <sindhu.devale@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-07RDMA/irdma: Report correct WC error when transport retry counter is exceededSindhu Devale
[ Upstream commit d3bdcd59633907ee306057b6bb70f06dce47dddc ] When the retry counter exceeds, as the remote QP didn't send any Ack or Nack an asynchronous event (AE) for too many retries is generated. Add code to handle the AE and set the correct IB WC error code IB_WC_RETRY_EXC_ERR. Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs") Link: https://lore.kernel.org/r/20210916191222.824-4-shiraz.saleem@intel.com Signed-off-by: Sindhu Devale <sindhu.devale@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-07RDMA/irdma: Validate number of CQ entries on create CQSindhu Devale
[ Upstream commit f4475f249445b3c1fb99919b0514a075b6d6b3d4 ] Add lower bound check for CQ entries at creation time. Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs") Link: https://lore.kernel.org/r/20210916191222.824-3-shiraz.saleem@intel.com Signed-off-by: Sindhu Devale <sindhu.devale@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-07RDMA/irdma: Skip CQP ring during a resetSindhu Devale
[ Upstream commit 5b1e985f7626307c451f98883f5e2665ee208e1c ] Due to duplicate reset flags, CQP commands are processed during reset. This leads CQP failures such as below: irdma0: [Delete Local MAC Entry Cmd Error][op_code=49] status=-27 waiting=1 completion_err=0 maj=0x0 min=0x0 Remove the redundant flag and set the correct reset flag so CPQ is paused during reset Fixes: 8498a30e1b94 ("RDMA/irdma: Register auxiliary driver and implement private channel OPs") Link: https://lore.kernel.org/r/20210916191222.824-2-shiraz.saleem@intel.com Reported-by: LiLiang <liali@redhat.com> Signed-off-by: Sindhu Devale <sindhu.devale@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-07RDMA/cma: Fix listener leak in rdma_cma_listen_on_all() failureTao Liu
[ Upstream commit ca465e1f1f9b38fe916a36f7d80c5d25f2337c81 ] If cma_listen_on_all() fails it leaves the per-device ID still on the listen_list but the state is not set to RDMA_CM_ADDR_BOUND. When the cmid is eventually destroyed cma_cancel_listens() is not called due to the wrong state, however the per-device IDs are still holding the refcount preventing the ID from being destroyed, thus deadlocking: task:rping state:D stack: 0 pid:19605 ppid: 47036 flags:0x00000084 Call Trace: __schedule+0x29a/0x780 ? free_unref_page_commit+0x9b/0x110 schedule+0x3c/0xa0 schedule_timeout+0x215/0x2b0 ? __flush_work+0x19e/0x1e0 wait_for_completion+0x8d/0xf0 _destroy_id+0x144/0x210 [rdma_cm] ucma_close_id+0x2b/0x40 [rdma_ucm] __destroy_id+0x93/0x2c0 [rdma_ucm] ? __xa_erase+0x4a/0xa0 ucma_destroy_id+0x9a/0x120 [rdma_ucm] ucma_write+0xb8/0x130 [rdma_ucm] vfs_write+0xb4/0x250 ksys_write+0xb5/0xd0 ? syscall_trace_enter.isra.19+0x123/0x190 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Ensure that cma_listen_on_all() atomically unwinds its action under the lock during error. Fixes: c80a0c52d85c ("RDMA/cma: Add missing error handling of listen_id") Link: https://lore.kernel.org/r/20210913093344.17230-1-thomas.liu@ucloud.cn Signed-off-by: Tao Liu <thomas.liu@ucloud.cn> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-07IB/cma: Do not send IGMP leaves for sendonly Multicast groupsChristoph Lameter
[ Upstream commit 2cc74e1ee31d00393b6698ec80b322fd26523da4 ] ROCE uses IGMP for Multicast instead of the native Infiniband system where joins are required in order to post messages on the Multicast group. On Ethernet one can send Multicast messages to arbitrary addresses without the need to subscribe to a group. So ROCE correctly does not send IGMP joins during rdma_join_multicast(). F.e. in cma_iboe_join_multicast() we see: if (addr->sa_family == AF_INET) { if (gid_type == IB_GID_TYPE_ROCE_UDP_ENCAP) { ib.rec.hop_limit = IPV6_DEFAULT_HOPLIMIT; if (!send_only) { err = cma_igmp_send(ndev, &ib.rec.mgid, true); } } } else { So the IGMP join is suppressed as it is unnecessary. However no such check is done in destroy_mc(). And therefore leaving a sendonly multicast group will send an IGMP leave. This means that the following scenario can lead to a multicast receiver unexpectedly being unsubscribed from a MC group: 1. Sender thread does a sendonly join on MC group X. No IGMP join is sent. 2. Receiver thread does a regular join on the same MC Group x. IGMP join is sent and the receiver begins to get messages. 3. Sender thread terminates and destroys MC group X. IGMP leave is sent and the receiver no longer receives data. This patch adds the same logic for sendonly joins to destroy_mc() that is also used in cma_iboe_join_multicast(). Fixes: ab15c95a17b3 ("IB/core: Support for CMA multicast join flags") Link: https://lore.kernel.org/r/alpine.DEB.2.22.394.2109081340540.668072@gentwo.de Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-07RDMA/cma: Ensure rdma_addr_cancel() happens before issuing more requestsJason Gunthorpe
commit 305d568b72f17f674155a2a8275f865f207b3808 upstream. The FSM can run in a circle allowing rdma_resolve_ip() to be called twice on the same id_priv. While this cannot happen without going through the work, it violates the invariant that the same address resolution background request cannot be active twice. CPU 1 CPU 2 rdma_resolve_addr(): RDMA_CM_IDLE -> RDMA_CM_ADDR_QUERY rdma_resolve_ip(addr_handler) #1 process_one_req(): for #1 addr_handler(): RDMA_CM_ADDR_QUERY -> RDMA_CM_ADDR_BOUND mutex_unlock(&id_priv->handler_mutex); [.. handler still running ..] rdma_resolve_addr(): RDMA_CM_ADDR_BOUND -> RDMA_CM_ADDR_QUERY rdma_resolve_ip(addr_handler) !! two requests are now on the req_list rdma_destroy_id(): destroy_id_handler_unlock(): _destroy_id(): cma_cancel_operation(): rdma_addr_cancel() // process_one_req() self removes it spin_lock_bh(&lock); cancel_delayed_work(&req->work); if (!list_empty(&req->list)) == true ! rdma_addr_cancel() returns after process_on_req #1 is done kfree(id_priv) process_one_req(): for #2 addr_handler(): mutex_lock(&id_priv->handler_mutex); !! Use after free on id_priv rdma_addr_cancel() expects there to be one req on the list and only cancels the first one. The self-removal behavior of the work only happens after the handler has returned. This yields a situations where the req_list can have two reqs for the same "handle" but rdma_addr_cancel() only cancels the first one. The second req remains active beyond rdma_destroy_id() and will use-after-free id_priv once it inevitably triggers. Fix this by remembering if the id_priv has called rdma_resolve_ip() and always cancel before calling it again. This ensures the req_list never gets more than one item in it and doesn't cost anything in the normal flow that never uses this strange error path. Link: https://lore.kernel.org/r/0-v1-3bc675b8006d+22-syz_cancel_uaf_jgg@nvidia.com Cc: stable@vger.kernel.org Fixes: e51060f08a61 ("IB: IP address based RDMA connection manager") Reported-by: syzbot+dc3dfba010d7671e05f5@syzkaller.appspotmail.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-07RDMA/cma: Do not change route.addr.src_addr.ss_familyJason Gunthorpe
commit bc0bdc5afaa740d782fbf936aaeebd65e5c2921d upstream. If the state is not idle then rdma_bind_addr() will immediately fail and no change to global state should happen. For instance if the state is already RDMA_CM_LISTEN then this will corrupt the src_addr and would cause the test in cma_cancel_operation(): if (cma_any_addr(cma_src_addr(id_priv)) && !id_priv->cma_dev) To view a mangled src_addr, eg with a IPv6 loopback address but an IPv4 family, failing the test. This would manifest as this trace from syzkaller: BUG: KASAN: use-after-free in __list_add_valid+0x93/0xa0 lib/list_debug.c:26 Read of size 8 at addr ffff8881546491e0 by task syz-executor.1/32204 CPU: 1 PID: 32204 Comm: syz-executor.1 Not tainted 5.12.0-rc8-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x141/0x1d7 lib/dump_stack.c:120 print_address_description.constprop.0.cold+0x5b/0x2f8 mm/kasan/report.c:232 __kasan_report mm/kasan/report.c:399 [inline] kasan_report.cold+0x7c/0xd8 mm/kasan/report.c:416 __list_add_valid+0x93/0xa0 lib/list_debug.c:26 __list_add include/linux/list.h:67 [inline] list_add_tail include/linux/list.h:100 [inline] cma_listen_on_all drivers/infiniband/core/cma.c:2557 [inline] rdma_listen+0x787/0xe00 drivers/infiniband/core/cma.c:3751 ucma_listen+0x16a/0x210 drivers/infiniband/core/ucma.c:1102 ucma_write+0x259/0x350 drivers/infiniband/core/ucma.c:1732 vfs_write+0x28e/0xa30 fs/read_write.c:603 ksys_write+0x1ee/0x250 fs/read_write.c:658 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xae Which is indicating that an rdma_id_private was destroyed without doing cma_cancel_listens(). Instead of trying to re-use the src_addr memory to indirectly create an any address build one explicitly on the stack and bind to that as any other normal flow would do. Link: https://lore.kernel.org/r/0-v1-9fbb33f5e201+2a-cma_listen_jgg@nvidia.com Cc: stable@vger.kernel.org Fixes: 732d41c545bb ("RDMA/cma: Make the locking for automatic state transition more clear") Reported-by: syzbot+6bb0528b13611047209c@syzkaller.appspotmail.com Tested-by: Hao Sun <sunhao.th@gmail.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-26RDMA/mlx5: Fix xlt_chunk_align calculationNiklas Schnelle
commit f4c6f31011eafe027abddf6cee1288a1b5a05b73 upstream. The XLT chunk alignment depends on ent_size not sizeof(ent_size) aka sizeof(size_t). The incoming ent_size is either 8 or 16, so the miscalculation when 16 is required is only an over-alignment and functional harmless. Fixes: 8010d74b9965 ("RDMA/mlx5: Split the WR setup out of mlx5_ib_update_xlt()") Link: https://lore.kernel.org/r/20210908081849.7948-2-schnelle@linux.ibm.com Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-26RDMA/hns: Enable stash feature of HIP09Yixing Liu
commit 260f64a40198309008026447f7fda277a73ed8c3 upstream. The stash feature is enabled by default on HIP09. Fixes: f93c39bc9547 ("RDMA/hns: Add support for QP stash") Fixes: bfefae9f108d ("RDMA/hns: Add support for CQ stash") Link: https://lore.kernel.org/r/1629539607-33217-3-git-send-email-liangwenpeng@huawei.com Signed-off-by: Yixing Liu <liuyixing1@huawei.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-18RDMA/hns: Fix QP's resp incomplete assignmentWenpeng Liang
[ Upstream commit d2e0ccffcdd7209fc9881c8970d2a7e28dcb43b9 ] The resp passed to the user space represents the enable flag of qp, incomplete assignment will cause some features of the user space to be disabled. Fixes: 90ae0b57e4a5 ("RDMA/hns: Combine enable flags of qp") Fixes: aba457ca890c ("RDMA/hns: Support owner mode doorbell") Link: https://lore.kernel.org/r/1629985056-57004-3-git-send-email-liangwenpeng@huawei.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-18RDMA/hns: Fix query destination qpnWenpeng Liang
[ Upstream commit e788a3cd5787aca74e0ee00202d4dca64b43f043 ] The bit width of dqpn is 24 bits, using u8 will cause truncation error. Fixes: 926a01dc000d ("RDMA/hns: Add QP operations support for hip08 SoC") Link: https://lore.kernel.org/r/1629985056-57004-2-git-send-email-liangwenpeng@huawei.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-18RDMA/hns: Bugfix for incorrect association between dip_idx and dgidJunxian Huang
[ Upstream commit eb653eda1e91dd3e7d1d2448d528d033dbfbe78f ] dip_idx and dgid should be a one-to-one mapping relationship, but when qp_num loops back to the start number, it may happen that two different dgid are assiociated to the same dip_idx incorrectly. One solution is to store the qp_num that is not assigned to dip_idx in an array. When a dip_idx needs to be allocated to a new dgid, an spare qp_num is extracted and assigned to dip_idx. Fixes: f91696f2f053 ("RDMA/hns: Support congestion control type selection according to the FW") Link: https://lore.kernel.org/r/1629884592-23424-4-git-send-email-liangwenpeng@huawei.com Signed-off-by: Junxian Huang <huangjunxian4@hisilicon.com> Signed-off-by: Yangyang Li <liyangyang20@huawei.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-18RDMA/hns: Bugfix for the missing assignment for dip_idxJunxian Huang
[ Upstream commit 074f315fc54a9ce45559a44ca36d9fa1ee1ea2cd ] When the dgid-dip_idx mapping relationship exists, dip should be assigned. Fixes: f91696f2f053 ("RDMA/hns: Support congestion control type selection according to the FW") Link: https://lore.kernel.org/r/1629884592-23424-3-git-send-email-liangwenpeng@huawei.com Signed-off-by: Junxian Huang <huangjunxian4@hisilicon.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-18RDMA/hns: Bugfix for data type of dip_idxJunxian Huang
[ Upstream commit 4303e61264c45cb535255c5b76400f5c4ab1305d ] dip_idx is associated with qp_num whose data type is u32. However, dip_idx is incorrectly defined as u8 data in the hns_roce_dip struct, which leads to data truncation during value assignment. Fixes: f91696f2f053 ("RDMA/hns: Support congestion control type selection according to the FW") Link: https://lore.kernel.org/r/1629884592-23424-2-git-send-email-liangwenpeng@huawei.com Signed-off-by: Junxian Huang <huangjunxian4@hisilicon.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-18RDMA/hns: Fix incorrect lsn fieldYixing Liu
[ Upstream commit 9bed8a70716ba65d300b9cc30eb7e0276353f7bf ] In RNR NAK screnario, according to the specification, when no credit is available, only the first fragment of the send request can be sent. The LSN(Limit Sequence Number) field should be 0 or the entire packet will be resent. Fixes: 926a01dc000d ("RDMA/hns: Add QP operations support for hip08 SoC") Link: https://lore.kernel.org/r/1629883169-2306-1-git-send-email-liangwenpeng@huawei.com Signed-off-by: Yixing Liu <liuyixing1@huawei.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-18RDMA/hns: Ownerbit mode add control fieldLang Cheng
[ Upstream commit f8c549afd1e76ad78b1d044a307783c9b94ae3ab ] The ownerbit mode is for external card mode. Make it controlled by the firmware. Fixes: aba457ca890c ("RDMA/hns: Support owner mode doorbell") Link: https://lore.kernel.org/r/1629539607-33217-4-git-send-email-liangwenpeng@huawei.com Signed-off-by: Lang Cheng <chenglang@huawei.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-18RDMA/hns: Fix return in hns_roce_rereg_user_mr()YueHaibing
[ Upstream commit c4c7d7a43246a42b0355692c3ed53dff7cbb29bb ] If re-registering an MR in hns_roce_rereg_user_mr(), we should return NULL instead of passing 0 to ERR_PTR for clarity. Fixes: 4e9fc1dae2a9 ("RDMA/hns: Optimize the MR registration process") Link: https://lore.kernel.org/r/20210804125939.20516-1-yuehaibing@huawei.com Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-18RDMA/mlx5: Delete not-available udata checkLeon Romanovsky
[ Upstream commit 5f6bb7e32283b8e3339b7adc00638234ac199cc4 ] XRC_TGT QPs are created through kernel verbs and don't have udata at all. Fixes: 6eefa839c4dd ("RDMA/mlx5: Protect from kernel crash if XRC_TGT doesn't have udata") Fixes: e383085c2425 ("RDMA/mlx5: Set ECE options during QP create") Link: https://lore.kernel.org/r/b68228597e730675020aa5162745390a2d39d3a2.1628014762.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-18RDMA/efa: Remove double QP type assignmentLeon Romanovsky
[ Upstream commit f9193d266347fe9bed5c173e7a1bf96268142a79 ] The QP type is set by the IB/core and shouldn't be set in the driver. Fixes: 40909f664d27 ("RDMA/efa: Add EFA verbs implementation") Link: https://lore.kernel.org/r/838c40134c1590167b888ca06ad51071139ff2ae.1627040189.git.leonro@nvidia.com Acked-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-18RDMA/hns: Don't overwrite supplied QP attributesLeon Romanovsky
[ Upstream commit e66e49592b690d6abd537cc207b07a3db2f413d0 ] QP attributes that were supplied by IB/core already have all parameters set when they are passed to the driver. The drivers are not supposed to change anything in struct ib_qp_init_attr. Fixes: 66d86e529dd5 ("RDMA/hns: Add UD support for HIP09") Link: https://lore.kernel.org/r/5987138875e8ade9aa339d4db6e1bd9694ed4591.1627040189.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-18RDMA/iwcm: Release resources if iw_cm module initialization failsLeon Romanovsky
[ Upstream commit e677b72a0647249370f2635862bf0241c86f66ad ] The failure during iw_cm module initialization partially left the system with unreleased memory and other resources. Rewrite the module init/exit routines in such way that netlink commands will be opened only after successful initialization. Fixes: b493d91d333e ("iwcm: common code for port mapper") Link: https://lore.kernel.org/r/b01239f99cb1a3e6d2b0694c242d89e6410bcd93.1627048781.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-18IB/hfi1: Adjust pkey entry in index 0Mike Marciniszyn
[ Upstream commit 62004871e1fa7f9a60797595c03477af5b5ec36f ] It is possible for the primary IPoIB network device associated with any RDMA device to fail to join certain multicast groups preventing IPv6 neighbor discovery and possibly other network ULPs from working correctly. The IPv4 broadcast group is not affected as the IPoIB network device handles joining that multicast group directly. This is because the primary IPoIB network device uses the pkey at ndex 0 in the associated RDMA device's pkey table. Anytime the pkey value of index 0 changes, the primary IPoIB network device automatically modifies it's broadcast address (i.e. /sys/class/net/[ib0]/broadcast), since the broadcast address includes the pkey value, and then bounces carrier. This includes initial pkey assignment, such as when the pkey at index 0 transitions from the opa default of invalid (0x0000) to some value such as the OPA default pkey for Virtual Fabric 0: 0x8001 or when the fabric manager is restarted with a configuration change causing the pkey at index 0 to change. Many network ULPs are not sensitive to the carrier bounce and are not expecting the broadcast address to change including the linux IPv6 stack. This problem does not affect IPoIB child network devices as their pkey value is constant for all time. To mitigate this issue, change the default pkey in at index 0 to 0x8001 to cover the predominant case and avoid issues as ipoib comes up and the FM sweeps. At some point, ipoib multicast support should automatically fix non-broadcast addresses as it does with the primary broadcast address. Fixes: 7724105686e7 ("IB/hfi1: add driver files") Link: https://lore.kernel.org/r/20210715160445.142451.47651.stgit@awfm-01.cornelisnetworks.com Suggested-by: Josh Collier <josh.d.collier@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-18RDMA/rtrs: Move sq_wr_avail to rtrs_conJack Wang
[ Upstream commit cfcdbd9dd7632a9bb1e308a029f5fa65008333af ] In order to account HB for sq_wr_avail properly, move sq_wr_avail from rtrs_srv_con to rtrs_con. Although rtrs-clt do not care sq_wr_avail, but still init it to max_send_wr. Fixes: b38041d50add ("RDMA/rtrs: Do not signal for heatbeat") Link: https://lore.kernel.org/r/20210712060750.16494-7-jinpu.wang@ionos.com Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Aleksei Marov <aleksei.marov@ionos.com> Reviewed-by: Gioh Kim <gi-oh.kim@ionos.com> Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-18RDMA/rtrs: Enable the same selective signal for heartbeat and IOJack Wang
[ Upstream commit e2d98504c697f9c8e45b815062f8893b10808d8e ] On idle session, because we do not do signal for heartbeat, it will overflow the send queue after sometime. To avoid that, we need to enable the signal for heartbeat. To do that, add a new member signal_interval in rtrs_path, which will set min of queue_depth and SERVICE_CON_QUEUE_DEPTH, and track it for both heartbeat and IO, so the sq queue full accounting is correct. Fixes: b38041d50add ("RDMA/rtrs: Do not signal for heatbeat") Link: https://lore.kernel.org/r/20210712060750.16494-4-jinpu.wang@ionos.com Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Aleksei Marov <aleksei.marov@ionos.com> Reviewed-by: Gioh Kim <gi-oh.kim@ionos.com> Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-18RDMA/rtrs: move wr_cnt from rtrs_srv_con to rtrs_conJack Wang
[ Upstream commit a10431eff136ef15e1f9955efe369744b1294db1 ] We need to track also the wr used for heatbeat. This is a preparation for that, will be used in later patch. The io_cnt in rtrs_clt is removed, use wr_cnt instead. Link: https://lore.kernel.org/r/20210712060750.16494-3-jinpu.wang@ionos.com Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Aleksei Marov <aleksei.marov@ionos.com> Reviewed-by: Gioh Kim <gi-oh.kim@ionos.com> Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-15RDMA/mlx5: Fix number of allocated XLT entriesNiklas Schnelle
commit 9660dcbe0d9186976917c94bce4e69dbd8d7a974 upstream. In commit 8010d74b9965b ("RDMA/mlx5: Split the WR setup out of mlx5_ib_update_xlt()") the allocation logic was split out of mlx5_ib_update_xlt() and the logic was changed to enable better OOM handling. Sadly this change introduced a miscalculation of the number of entries that were actually allocated when under memory pressure where it can actually become 0 which on s390 lets dma_map_single() fail. It can also lead to corruption of the free pages list when the wrong number of entries is used in the calculation of sg->length which is used as argument for free_pages(). Fix this by using the allocation size instead of misusing get_order(size). Cc: stable@vger.kernel.org Fixes: 8010d74b9965 ("RDMA/mlx5: Split the WR setup out of mlx5_ib_update_xlt()") Link: https://lore.kernel.org/r/20210908081849.7948-1-schnelle@linux.ibm.com Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-08-20RDMA/rxe: Zero out index member of struct rxe_queueXiao Yang
1) New index member of struct rxe_queue was introduced but not zeroed so the initial value of index may be random. 2) The current index is not masked off to index_mask. In this case producer_addr() and consumer_addr() will get an invalid address by the random index and then accessing the invalid address triggers the following panic: "BUG: unable to handle page fault for address: ffff9ae2c07a1414" Fix the issue by using kzalloc() to zero out index member. Fixes: 5bcf5a59c41e ("RDMA/rxe: Protext kernel index from user space") Link: https://lore.kernel.org/r/20210820111509.172500-1-yangx.jy@fujitsu.com Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-20RDMA/efa: Free IRQ vectors on error flowGal Pressman
Make sure to free the IRQ vectors in case the allocation doesn't return the expected number of IRQs. Fixes: b7f5e880f377 ("RDMA/efa: Add the efa module") Link: https://lore.kernel.org/r/20210811151131.39138-2-galpress@amazon.com Reviewed-by: Firas JahJah <firasj@amazon.com> Reviewed-by: Yossi Leybovich <sleybo@amazon.com> Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-19RDMA/rxe: Fix memory allocation while in a spin lockBob Pearson
rxe_mcast_add_grp_elem() in rxe_mcast.c calls rxe_alloc() while holding spinlocks which in turn calls kzalloc(size, GFP_KERNEL) which is incorrect. This patch replaces rxe_alloc() by rxe_alloc_locked() which uses GFP_ATOMIC. This bug was caused by the below mentioned commit and failing to handle the need for the atomic allocate. Fixes: 4276fd0dddc9 ("RDMA/rxe: Remove RXE_POOL_ATOMIC") Link: https://lore.kernel.org/r/20210813210625.4484-1-rpearsonhpe@gmail.com Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-19RDMA/bnxt_re: Remove unpaired rtnl unlock in bnxt_re_dev_init()Dinghao Liu
The fixed commit removes all rtnl_lock() and rtnl_unlock() calls in function bnxt_re_dev_init(), but forgets to remove a rtnl_unlock() in the error handling path of bnxt_re_register_netdev(), which may cause a deadlock. This bug is suggested by a static analysis tool. Fixes: c2b777a95923 ("RDMA/bnxt_re: Refactor device add/remove functionalities") Link: https://lore.kernel.org/r/20210816085531.12167-1-dinghao.liu@zju.edu.cn Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn> Acked-by: Selvin Xavier <selvin.xavier@broadcom.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-19IB/hfi1: Fix possible null-pointer dereference in _extend_sdma_tx_descs()Tuo Li
kmalloc_array() is called to allocate memory for tx->descp. If it fails, the function __sdma_txclean() is called: __sdma_txclean(dd, tx); However, in the function __sdma_txclean(), tx-descp is dereferenced if tx->num_desc is not zero: sdma_unmap_desc(dd, &tx->descp[0]); To fix this possible null-pointer dereference, assign the return value of kmalloc_array() to a local variable descp, and then assign it to tx->descp if it is not NULL. Otherwise, go to enomem. Fixes: 7724105686e7 ("IB/hfi1: add driver files") Link: https://lore.kernel.org/r/20210806133029.194964-1-islituo@gmail.com Reported-by: TOTE Robot <oslab@tsinghua.edu.cn> Signed-off-by: Tuo Li <islituo@gmail.com> Tested-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com> Acked-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-19RDMA/irdma: Use correct kconfig symbol for AUXILIARY_BUSLukas Bulwahn
In Kconfig, references to config symbols do not use the prefix "CONFIG_". Commit fa0cf568fd76 ("RDMA/irdma: Add irdma Kconfig/Makefile and remove i40iw") selects config CONFIG_AUXILIARY_BUS in config INFINIBAND_IRDMA, but intended to select config AUXILIARY_BUS. Fixes: fa0cf568fd76 ("RDMA/irdma: Add irdma Kconfig/Makefile and remove i40iw") Link: https://lore.kernel.org/r/20210817084158.10095-1-lukas.bulwahn@gmail.com Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-19RDMA/bnxt_re: Add missing spin lock initializationNaresh Kumar PBS
Add the missing initialization of srq lock. Fixes: 37cb11acf1f7 ("RDMA/bnxt_re: Add SRQ support for Broadcom adapters") Link: https://lore.kernel.org/r/1629343553-5843-3-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Naresh Kumar PBS <nareshkumar.pbs@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-19RDMA/uverbs: Track dmabuf memory regionsGal Pressman
The dmabuf memory registrations are missing the restrack handling and hence do not appear in rdma tool. Fixes: bfe0cc6eb249 ("RDMA/uverbs: Add uverbs command for dma-buf based MR registration") Link: https://lore.kernel.org/r/20210812135607.6228-1-galpress@amazon.com Signed-off-by: Gal Pressman <galpress@amazon.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-19RDMA/mlx5: Fix crash when unbind multiport slaveMaor Gottlieb
Fix the below crash when deleting a slave from the unaffiliated list twice. First time when the slave is bound to the master and the second when the slave is unloaded. Fix it by checking if slave is unaffiliated (doesn't have ib device) before removing from the list. RIP: 0010:mlx5r_mp_remove+0x4e/0xa0 [mlx5_ib] Call Trace: auxiliary_bus_remove+0x18/0x30 __device_release_driver+0x177/x220 device_release_driver+0x24/0x30 bus_remove_device+0xd8/0x140 device_del+0x18a/0x3e0 mlx5_rescan_drivers_locked+0xa9/0x210 [mlx5_core] mlx5_unregister_device+0x34/0x60 [mlx5_core] mlx5_uninit_one+0x32/0x100 [mlx5_core] remove_one+0x6e/0xe0 [mlx5_core] pci_device_remove+0x36/0xa0 __device_release_driver+0x177/0x220 device_driver_detach+0x3c/0xa0 unbind_store+0x113/0x130 kernfs_fop_write_iter+0x110/0x1a0 new_sync_write+0x116/0x1a0 vfs_write+0x1ba/0x260 ksys_write+0x5f/0xe0 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: 93f8244431ad ("RDMA/mlx5: Convert mlx5_ib to use auxiliary bus") Link: https://lore.kernel.org/r/17ec98989b0ba88f7adfbad68eb20bce8d567b44.1628587493.git.leonro@nvidia.com Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-12Merge tag 'net-5.14-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Networking fixes, including fixes from netfilter, bpf, can and ieee802154. The size of this is pretty normal, but we got more fixes for 5.14 changes this week than last week. Nothing major but the trend is the opposite of what we like. We'll see how the next week goes.. Current release - regressions: - r8169: fix ASPM-related link-up regressions - bridge: fix flags interpretation for extern learn fdb entries - phy: micrel: fix link detection on ksz87xx switch - Revert "tipc: Return the correct errno code" - ptp: fix possible memory leak caused by invalid cast Current release - new code bugs: - bpf: add missing bpf_read_[un]lock_trace() for syscall program - bpf: fix potentially incorrect results with bpf_get_local_storage() - page_pool: mask the page->signature before the checking, avoid dma mapping leaks - netfilter: nfnetlink_hook: 5 fixes to information in netlink dumps - bnxt_en: fix firmware interface issues with PTP - mlx5: Bridge, fix ageing time Previous releases - regressions: - linkwatch: fix failure to restore device state across suspend/resume - bareudp: fix invalid read beyond skb's linear data Previous releases - always broken: - bpf: fix integer overflow involving bucket_size - ppp: fix issues when desired interface name is specified via netlink - wwan: mhi_wwan_ctrl: fix possible deadlock - dsa: microchip: ksz8795: fix number of VLAN related bugs - dsa: drivers: fix broken backpressure in .port_fdb_dump - dsa: qca: ar9331: make proper initial port defaults Misc: - bpf: add lockdown check for probe_write_user helper - netfilter: conntrack: remove offload_pickup sysctl before 5.14 is out - netfilter: conntrack: collect all entries in one cycle, heuristically slow down garbage collection scans on idle systems to prevent frequent wake ups" * tag 'net-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (87 commits) vsock/virtio: avoid potential deadlock when vsock device remove wwan: core: Avoid returning NULL from wwan_create_dev() net: dsa: sja1105: unregister the MDIO buses during teardown Revert "tipc: Return the correct errno code" net: mscc: Fix non-GPL export of regmap APIs net: igmp: increase size of mr_ifc_count MAINTAINERS: switch to my OMP email for Renesas Ethernet drivers tcp_bbr: fix u32 wrap bug in round logic if bbr_init() called after 2B packets net: pcs: xpcs: fix error handling on failed to allocate memory net: linkwatch: fix failure to restore device state across suspend/resume net: bridge: fix memleak in br_add_if() net: switchdev: zero-initialize struct switchdev_notifier_fdb_info emitted by drivers towards the bridge net: bridge: fix flags interpretation for extern learn fdb entries net: dsa: sja1105: fix broken backpressure in .port_fdb_dump net: dsa: lantiq: fix broken backpressure in .port_fdb_dump net: dsa: lan9303: fix broken backpressure in .port_fdb_dump net: dsa: hellcreek: fix broken backpressure in .port_fdb_dump bpf, core: Fix kernel-doc notation net: igmp: fix data-race in igmp_ifc_timer_expire() net: Fix memory leak in ieee802154_raw_deliver ...
2021-08-09net/mlx5: Synchronize correct IRQ when destroying CQShay Drory
The CQ destroy is performed based on the IRQ number that is stored in cq->irqn. That number wasn't set explicitly during CQ creation and as expected some of the API users of mlx5_core_create_cq() forgot to update it. This caused to wrong synchronization call of the wrong IRQ with a number 0 instead of the real one. As a fix, set the IRQ number directly in the mlx5_core_create_cq() and update all users accordingly. Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices") Fixes: ef1659ade359 ("IB/mlx5: Add DEVX support for CQ events") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>