summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2021-05-22ipv6: remove extra dev_hold() for fallback tunnelsEric Dumazet
commit 0d7a7b2014b1a499a0fe24c9f3063d7856b5aaaf upstream. My previous commits added a dev_hold() in tunnels ndo_init(), but forgot to remove it from special functions setting up fallback tunnels. Fallback tunnels do call their respective ndo_init() This leads to various reports like : unregister_netdevice: waiting for ip6gre0 to become free. Usage count = 2 Fixes: 48bb5697269a ("ip6_tunnel: sit: proper dev_{hold|put} in ndo_[un]init methods") Fixes: 6289a98f0817 ("sit: proper dev_{hold|put} in ndo_[un]init methods") Fixes: 40cb881b5aaa ("ip6_vti: proper dev_{hold|put} in ndo_[un]init methods") Fixes: 7f700334be9a ("ip6_gre: proper dev_{hold|put} in ndo_[un]init methods") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-22ip6_tunnel: sit: proper dev_{hold|put} in ndo_[un]init methodsEric Dumazet
commit 48bb5697269a7cbe5194dbb044dc38c517e34c58 upstream. Same reasons than for the previous commits : 6289a98f0817 ("sit: proper dev_{hold|put} in ndo_[un]init methods") 40cb881b5aaa ("ip6_vti: proper dev_{hold|put} in ndo_[un]init methods") 7f700334be9a ("ip6_gre: proper dev_{hold|put} in ndo_[un]init methods") After adopting CONFIG_PCPU_DEV_REFCNT=n option, syzbot was able to trigger a warning [1] Issue here is that: - all dev_put() should be paired with a corresponding prior dev_hold(). - A driver doing a dev_put() in its ndo_uninit() MUST also do a dev_hold() in its ndo_init(), only when ndo_init() is returning 0. Otherwise, register_netdevice() would call ndo_uninit() in its error path and release a refcount too soon. [1] WARNING: CPU: 1 PID: 21059 at lib/refcount.c:31 refcount_warn_saturate+0xbf/0x1e0 lib/refcount.c:31 Modules linked in: CPU: 1 PID: 21059 Comm: syz-executor.4 Not tainted 5.12.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:refcount_warn_saturate+0xbf/0x1e0 lib/refcount.c:31 Code: 1d 6a 5a e8 09 31 ff 89 de e8 8d 1a ab fd 84 db 75 e0 e8 d4 13 ab fd 48 c7 c7 a0 e1 c1 89 c6 05 4a 5a e8 09 01 e8 2e 36 fb 04 <0f> 0b eb c4 e8 b8 13 ab fd 0f b6 1d 39 5a e8 09 31 ff 89 de e8 58 RSP: 0018:ffffc900025aefe8 EFLAGS: 00010282 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000040000 RSI: ffffffff815c51f5 RDI: fffff520004b5def RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000000 R10: ffffffff815bdf8e R11: 0000000000000000 R12: ffff888023488568 R13: ffff8880254e9000 R14: 00000000dfd82cfd R15: ffff88802ee2d7c0 FS: 00007f13bc590700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f0943e74000 CR3: 0000000025273000 CR4: 00000000001506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __refcount_dec include/linux/refcount.h:344 [inline] refcount_dec include/linux/refcount.h:359 [inline] dev_put include/linux/netdevice.h:4135 [inline] ip6_tnl_dev_uninit+0x370/0x3d0 net/ipv6/ip6_tunnel.c:387 register_netdevice+0xadf/0x1500 net/core/dev.c:10308 ip6_tnl_create2+0x1b5/0x400 net/ipv6/ip6_tunnel.c:263 ip6_tnl_newlink+0x312/0x580 net/ipv6/ip6_tunnel.c:2052 __rtnl_newlink+0x1062/0x1710 net/core/rtnetlink.c:3443 rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3491 rtnetlink_rcv_msg+0x44e/0xad0 net/core/rtnetlink.c:5553 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2502 netlink_unicast_kernel net/netlink/af_netlink.c:1312 [inline] netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1338 netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1927 sock_sendmsg_nosec net/socket.c:654 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:674 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2350 ___sys_sendmsg+0xf3/0x170 net/socket.c:2404 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2433 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: 919067cc845f ("net: add CONFIG_PCPU_DEV_REFCNT") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-22sit: proper dev_{hold|put} in ndo_[un]init methodsEric Dumazet
commit 6289a98f0817a4a457750d6345e754838eae9439 upstream. After adopting CONFIG_PCPU_DEV_REFCNT=n option, syzbot was able to trigger a warning [1] Issue here is that: - all dev_put() should be paired with a corresponding prior dev_hold(). - A driver doing a dev_put() in its ndo_uninit() MUST also do a dev_hold() in its ndo_init(), only when ndo_init() is returning 0. Otherwise, register_netdevice() would call ndo_uninit() in its error path and release a refcount too soon. Fixes: 919067cc845f ("net: add CONFIG_PCPU_DEV_REFCNT") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-22ip6_gre: proper dev_{hold|put} in ndo_[un]init methodsEric Dumazet
commit 7f700334be9aeb91d5d86ef9ad2d901b9b453e9b upstream. After adopting CONFIG_PCPU_DEV_REFCNT=n option, syzbot was able to trigger a warning [1] Issue here is that: - all dev_put() should be paired with a corresponding dev_hold(), and vice versa. - A driver doing a dev_put() in its ndo_uninit() MUST also do a dev_hold() in its ndo_init(), only when ndo_init() is returning 0. Otherwise, register_netdevice() would call ndo_uninit() in its error path and release a refcount too soon. ip6_gre for example (among others problematic drivers) has to use dev_hold() in ip6gre_tunnel_init_common() instead of from ip6gre_newlink_common(), covering both ip6gre_tunnel_init() and ip6gre_tap_init()/ Note that ip6gre_tunnel_init_common() is not called from ip6erspan_tap_init() thus we also need to add a dev_hold() there, as ip6erspan_tunnel_uninit() does call dev_put() [1] refcount_t: decrement hit 0; leaking memory. WARNING: CPU: 0 PID: 8422 at lib/refcount.c:31 refcount_warn_saturate+0xbf/0x1e0 lib/refcount.c:31 Modules linked in: CPU: 1 PID: 8422 Comm: syz-executor854 Not tainted 5.12.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:refcount_warn_saturate+0xbf/0x1e0 lib/refcount.c:31 Code: 1d 6a 5a e8 09 31 ff 89 de e8 8d 1a ab fd 84 db 75 e0 e8 d4 13 ab fd 48 c7 c7 a0 e1 c1 89 c6 05 4a 5a e8 09 01 e8 2e 36 fb 04 <0f> 0b eb c4 e8 b8 13 ab fd 0f b6 1d 39 5a e8 09 31 ff 89 de e8 58 RSP: 0018:ffffc900018befd0 EFLAGS: 00010282 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: ffff88801ef19c40 RSI: ffffffff815c51f5 RDI: fffff52000317dec RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000000 R10: ffffffff815bdf8e R11: 0000000000000000 R12: ffff888018cf4568 R13: ffff888018cf4c00 R14: ffff8880228f2000 R15: ffffffff8d659b80 FS: 00000000014eb300(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055d7bf2b3138 CR3: 0000000014933000 CR4: 00000000001506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __refcount_dec include/linux/refcount.h:344 [inline] refcount_dec include/linux/refcount.h:359 [inline] dev_put include/linux/netdevice.h:4135 [inline] ip6gre_tunnel_uninit+0x3d7/0x440 net/ipv6/ip6_gre.c:420 register_netdevice+0xadf/0x1500 net/core/dev.c:10308 ip6gre_newlink_common.constprop.0+0x158/0x410 net/ipv6/ip6_gre.c:1984 ip6gre_newlink+0x275/0x7a0 net/ipv6/ip6_gre.c:2017 __rtnl_newlink+0x1062/0x1710 net/core/rtnetlink.c:3443 rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3491 rtnetlink_rcv_msg+0x44e/0xad0 net/core/rtnetlink.c:5553 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2502 netlink_unicast_kernel net/netlink/af_netlink.c:1312 [inline] netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1338 netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1927 sock_sendmsg_nosec net/socket.c:654 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:674 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2350 ___sys_sendmsg+0xf3/0x170 net/socket.c:2404 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2433 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 Fixes: 919067cc845f ("net: add CONFIG_PCPU_DEV_REFCNT") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-22net: hsr: check skb can contain struct hsr_ethhdr in fill_frame_infoPhillip Potter
[ Upstream commit 2e9f60932a2c19e8a11b4a69d419f107024b05a0 ] Check at start of fill_frame_info that the MAC header in the supplied skb is large enough to fit a struct hsr_ethhdr, as otherwise this is not a valid HSR frame. If it is too small, return an error which will then cause the callers to clean up the skb. Fixes a KMSAN-found uninit-value bug reported by syzbot at: https://syzkaller.appspot.com/bug?id=f7e9b601f1414f814f7602a82b6619a8d80bce3f Reported-by: syzbot+e267bed19bfc5478fb33@syzkaller.appspotmail.com Signed-off-by: Phillip Potter <phil@philpotter.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-22bridge: Fix possible races between assigning rx_handler_data and setting ↵Zhang Zhengming
IFF_BRIDGE_PORT bit [ Upstream commit 59259ff7a81b9eb6213891c6451221e567f8f22f ] There is a crash in the function br_get_link_af_size_filtered, as the port_exists(dev) is true and the rx_handler_data of dev is NULL. But the rx_handler_data of dev is correct saved in vmcore. The oops looks something like: ... pc : br_get_link_af_size_filtered+0x28/0x1c8 [bridge] ... Call trace: br_get_link_af_size_filtered+0x28/0x1c8 [bridge] if_nlmsg_size+0x180/0x1b0 rtnl_calcit.isra.12+0xf8/0x148 rtnetlink_rcv_msg+0x334/0x370 netlink_rcv_skb+0x64/0x130 rtnetlink_rcv+0x28/0x38 netlink_unicast+0x1f0/0x250 netlink_sendmsg+0x310/0x378 sock_sendmsg+0x4c/0x70 __sys_sendto+0x120/0x150 __arm64_sys_sendto+0x30/0x40 el0_svc_common+0x78/0x130 el0_svc_handler+0x38/0x78 el0_svc+0x8/0xc In br_add_if(), we found there is no guarantee that assigning rx_handler_data to dev->rx_handler_data will before setting the IFF_BRIDGE_PORT bit of priv_flags. So there is a possible data competition: CPU 0: CPU 1: (RCU read lock) (RTNL lock) rtnl_calcit() br_add_slave() if_nlmsg_size() br_add_if() br_get_link_af_size_filtered() -> netdev_rx_handler_register ... // The order is not guaranteed ... -> dev->priv_flags |= IFF_BRIDGE_PORT; // The IFF_BRIDGE_PORT bit of priv_flags has been set -> if (br_port_exists(dev)) { // The dev->rx_handler_data has NOT been assigned -> p = br_port_get_rcu(dev); .... -> rcu_assign_pointer(dev->rx_handler_data, rx_handler_data); ... Fix it in br_get_link_af_size_filtered, using br_port_get_check_rcu() and checking the return value. Signed-off-by: Zhang Zhengming <zhangzhengming@huawei.com> Reviewed-by: Zhao Lei <zhaolei69@huawei.com> Reviewed-by: Wang Xiaogang <wangxiaogang3@huawei.com> Suggested-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19mm: fix struct page layout on 32-bit systemsMatthew Wilcox (Oracle)
commit 9ddb3c14afba8bc5950ed297f02d4ae05ff35cd1 upstream. 32-bit architectures which expect 8-byte alignment for 8-byte integers and need 64-bit DMA addresses (arm, mips, ppc) had their struct page inadvertently expanded in 2019. When the dma_addr_t was added, it forced the alignment of the union to 8 bytes, which inserted a 4 byte gap between 'flags' and the union. Fix this by storing the dma_addr_t in one or two adjacent unsigned longs. This restores the alignment to that of an unsigned long. We always store the low bits in the first word to prevent the PageTail bit from being inadvertently set on a big endian platform. If that happened, get_user_pages_fast() racing against a page which was freed and reallocated to the page_pool could dereference a bogus compound_head(), which would be hard to trace back to this cause. Link: https://lkml.kernel.org/r/20210510153211.1504886-1-willy@infradead.org Fixes: c25fff7171be ("mm: add dma_addr_t to struct page") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Tested-by: Matteo Croce <mcroce@linux.microsoft.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-19mptcp: fix splat when closing unaccepted socketPaolo Abeni
[ Upstream commit 578c18eff1627d6a911f08f4cf351eca41fdcc7d ] If userspace exits before calling accept() on a listener that had at least one new connection ready, we get: Attempt to release TCP socket in state 8 This happens because the mptcp socket gets cloned when the TCP connection is ready, but the socket is never exposed to userspace. The client additionally sends a DATA_FIN, which brings connection into CLOSE_WAIT state. This in turn prevents the orphan+state reset fixup in mptcp_sock_destruct() from doing its job. Fixes: 3721b9b64676b ("mptcp: Track received DATA_FIN sequence number and add related helpers") Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/185 Tested-by: Florian Westphal <fw@strlen.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Link: https://lore.kernel.org/r/20210507001638.225468-1-mathew.j.martineau@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19netfilter: nftables: avoid overflows in nft_hash_buckets()Eric Dumazet
[ Upstream commit a54754ec9891830ba548e2010c889e3c8146e449 ] Number of buckets being stored in 32bit variables, we have to ensure that no overflows occur in nft_hash_buckets() syzbot injected a size == 0x40000000 and reported: UBSAN: shift-out-of-bounds in ./include/linux/log2.h:57:13 shift exponent 64 is too large for 64-bit type 'long unsigned int' CPU: 1 PID: 29539 Comm: syz-executor.4 Not tainted 5.12.0-rc7-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x141/0x1d7 lib/dump_stack.c:120 ubsan_epilogue+0xb/0x5a lib/ubsan.c:148 __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:327 __roundup_pow_of_two include/linux/log2.h:57 [inline] nft_hash_buckets net/netfilter/nft_set_hash.c:411 [inline] nft_hash_estimate.cold+0x19/0x1e net/netfilter/nft_set_hash.c:652 nft_select_set_ops net/netfilter/nf_tables_api.c:3586 [inline] nf_tables_newset+0xe62/0x3110 net/netfilter/nf_tables_api.c:4322 nfnetlink_rcv_batch+0xa09/0x24b0 net/netfilter/nfnetlink.c:488 nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:612 [inline] nfnetlink_rcv+0x3af/0x420 net/netfilter/nfnetlink.c:630 netlink_unicast_kernel net/netlink/af_netlink.c:1312 [inline] netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1338 netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1927 sock_sendmsg_nosec net/socket.c:654 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:674 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2350 ___sys_sendmsg+0xf3/0x170 net/socket.c:2404 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2433 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 Fixes: 0ed6389c483d ("netfilter: nf_tables: rename set implementations") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19netfilter: nftables: Fix a memleak from userdata error path in new objectsPablo Neira Ayuso
[ Upstream commit 85dfd816fabfc16e71786eda0a33a7046688b5b0 ] Release object name if userdata allocation fails. Fixes: b131c96496b3 ("netfilter: nf_tables: add userdata support for nft_object") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19netfilter: nfnetlink_osf: Fix a missing skb_header_pointer() NULL checkPablo Neira Ayuso
[ Upstream commit 5e024c325406470d1165a09c6feaf8ec897936be ] Do not assume that the tcph->doff field is correct when parsing for TCP options, skb_header_pointer() might fail to fetch these bits. Fixes: 11eeef41d5f6 ("netfilter: passive OS fingerprint xtables match") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19smc: disallow TCP_ULP in smc_setsockopt()Cong Wang
[ Upstream commit 8621436671f3a4bba5db57482e1ee604708bf1eb ] syzbot is able to setup kTLS on an SMC socket which coincidentally uses sk_user_data too. Later, kTLS treats it as psock so triggers a refcnt warning. The root cause is that smc_setsockopt() simply calls TCP setsockopt() which includes TCP_ULP. I do not think it makes sense to setup kTLS on top of SMC sockets, so we should just disallow this setup. It is hard to find a commit to blame, but we can apply this patch since the beginning of TCP_ULP. Reported-and-tested-by: syzbot+b54a1ce86ba4a623b7f0@syzkaller.appspotmail.com Fixes: 734942cc4ea6 ("tcp: ULP infrastructure") Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19ethtool: fix missing NLM_F_MULTI flag when dumpingFernando Fernandez Mancera
[ Upstream commit cf754ae331be7cc192b951756a1dd031e9ed978a ] When dumping the ethtool information from all the interfaces, the netlink reply should contain the NLM_F_MULTI flag. This flag allows userspace tools to identify that multiple messages are expected. Link: https://bugzilla.redhat.com/1953847 Fixes: 365f9ae4ee36 ("ethtool: fix genlmsg_put() failure handling in ethnl_default_dumpit()") Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19xsk: Fix for xp_aligned_validate_desc() when len == chunk_sizeXuan Zhuo
[ Upstream commit ac31565c21937eee9117e43c9cd34f557f6f1cb8 ] When desc->len is equal to chunk_size, it is legal. But when the xp_aligned_validate_desc() got chunk_end from desc->addr + desc->len pointing to the next chunk during the check, it caused the check to fail. This problem was first introduced in bbff2f321a86 ("xsk: new descriptor addressing scheme"). Later in 2b43470add8c ("xsk: Introduce AF_XDP buffer allocation API") this piece of code was moved into the new function called xp_aligned_validate_desc(). This function was then moved into xsk_queue.h via 26062b185eee ("xsk: Explicitly inline functions and move definitions"). Fixes: bbff2f321a86 ("xsk: new descriptor addressing scheme") Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20210428094424.54435-1-xuanzhuo@linux.alibaba.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19netfilter: xt_SECMARK: add new revision to fix structure layoutPablo Neira Ayuso
[ Upstream commit c7d13358b6a2f49f81a34aa323a2d0878a0532a2 ] This extension breaks when trying to delete rules, add a new revision to fix this. Fixes: 5e6874cdb8de ("[SECMARK]: Add xtables SECMARK target") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19sctp: fix a SCTP_MIB_CURRESTAB leak in sctp_sf_do_dupcook_bXin Long
[ Upstream commit f282df0391267fb2b263da1cc3233aa6fb81defc ] Normally SCTP_MIB_CURRESTAB is always incremented once asoc enter into ESTABLISHED from the state < ESTABLISHED and decremented when the asoc is being deleted. However, in sctp_sf_do_dupcook_b(), the asoc's state can be changed to ESTABLISHED from the state >= ESTABLISHED where it shouldn't increment SCTP_MIB_CURRESTAB. Otherwise, one asoc may increment MIB_CURRESTAB multiple times but only decrement once at the end. I was able to reproduce it by using scapy to do the 4-way shakehands, after that I replayed the COOKIE-ECHO chunk with 'peer_vtag' field changed to different values, and SCTP_MIB_CURRESTAB was incremented multiple times and never went back to 0 even when the asoc was freed. This patch is to fix it by only incrementing SCTP_MIB_CURRESTAB when the state < ESTABLISHED in sctp_sf_do_dupcook_b(). Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19sunrpc: Fix misplaced barrier in call_decodeBaptiste Lepers
[ Upstream commit f8f7e0fb22b2e75be55f2f0c13e229e75b0eac07 ] Fix a misplaced barrier in call_decode. The struct rpc_rqst is modified as follows by xprt_complete_rqst: req->rq_private_buf.len = copied; /* Ensure all writes are done before we update */ /* req->rq_reply_bytes_recvd */ smp_wmb(); req->rq_reply_bytes_recvd = copied; And currently read as follows by call_decode: smp_rmb(); // misplaced if (!req->rq_reply_bytes_recvd) goto out; req->rq_rcv_buf.len = req->rq_private_buf.len; This patch places the smp_rmb after the if to ensure that rq_reply_bytes_recvd and rq_private_buf.len are read in order. Fixes: 9ba828861c56a ("SUNRPC: Don't try to parse incomplete RPC messages") Signed-off-by: Baptiste Lepers <baptiste.lepers@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19sctp: do asoc update earlier in sctp_sf_do_dupcook_aXin Long
[ Upstream commit 35b4f24415c854cd718ccdf38dbea6297f010aae ] There's a panic that occurs in a few of envs, the call trace is as below: [] general protection fault, ... 0x29acd70f1000a: 0000 [#1] SMP PTI [] RIP: 0010:sctp_ulpevent_notify_peer_addr_change+0x4b/0x1fa [sctp] [] sctp_assoc_control_transport+0x1b9/0x210 [sctp] [] sctp_do_8_2_transport_strike.isra.16+0x15c/0x220 [sctp] [] sctp_cmd_interpreter.isra.21+0x1231/0x1a10 [sctp] [] sctp_do_sm+0xc3/0x2a0 [sctp] [] sctp_generate_timeout_event+0x81/0xf0 [sctp] This is caused by a transport use-after-free issue. When processing a duplicate COOKIE-ECHO chunk in sctp_sf_do_dupcook_a(), both COOKIE-ACK and SHUTDOWN chunks are allocated with the transort from the new asoc. However, later in the sideeffect machine, the old asoc is used to send them out and old asoc's shutdown_last_sent_to is set to the transport that SHUTDOWN chunk attached to in sctp_cmd_setup_t2(), which actually belongs to the new asoc. After the new_asoc is freed and the old asoc T2 timeout, the old asoc's shutdown_last_sent_to that is already freed would be accessed in sctp_sf_t2_timer_expire(). Thanks Alexander and Jere for helping dig into this issue. To fix it, this patch is to do the asoc update first, then allocate the COOKIE-ACK and SHUTDOWN chunks with the 'updated' old asoc. This would make more sense, as a chunk from an asoc shouldn't be sent out with another asoc. We had fixed quite a few issues caused by this. Fixes: 145cb2f7177d ("sctp: Fix bundling of SHUTDOWN with COOKIE-ACK") Reported-by: Alexander Sverdlin <alexander.sverdlin@nokia.com> Reported-by: syzbot+bbe538efd1046586f587@syzkaller.appspotmail.com Reported-by: Michal Tesar <mtesar@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19xprtrdma: rpcrdma_mr_pop() already does list_del_init()Chuck Lever
[ Upstream commit 1363e6388c363d0433f9aa4e2f33efe047572687 ] The rpcrdma_mr_pop() earlier in the function has already cleared out mr_list, so it must not be done again in the error path. Fixes: 847568942f93 ("xprtrdma: Remove fr_state") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19xprtrdma: Fix cwnd update orderingChuck Lever
[ Upstream commit 35d8b10a25884050bb3b0149b62c3818ec59f77c ] After a reconnect, the reply handler is opening the cwnd (and thus enabling more RPC Calls to be sent) /before/ rpcrdma_post_recvs() can post enough Receive WRs to receive their replies. This causes an RNR and the new connection is lost immediately. The race is most clearly exposed when KASAN and disconnect injection are enabled. This slows down rpcrdma_rep_create() enough to allow the send side to post a bunch of RPC Calls before the Receive completion handler can invoke ib_post_recv(). Fixes: 2ae50ad68cd7 ("xprtrdma: Close window between waking RPC senders and posting Receives") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19xprtrdma: Avoid Receive Queue wrappingChuck Lever
[ Upstream commit 32e6b68167f1d446111c973d57e6f52aee11897a ] Commit e340c2d6ef2a ("xprtrdma: Reduce the doorbell rate (Receive)") increased the number of Receive WRs that are posted by the client, but did not increase the size of the Receive Queue allocated during transport set-up. This is usually not an issue because RPCRDMA_BACKWARD_WRS is defined as (32) when SUNRPC_BACKCHANNEL is defined. In cases where it isn't, there is a real risk of Receive Queue wrapping. Fixes: e340c2d6ef2a ("xprtrdma: Reduce the doorbell rate (Receive)") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Tom Talpey <tom@talpey.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19SUNRPC: fix ternary sign expansion bug in tracingDan Carpenter
[ Upstream commit cb579086536f6564f5846f89808ec394ef8b8621 ] This code is supposed to pass negative "err" values for tracing but it passes positive values instead. The problem is that the trace_svcsock_tcp_send() function takes a long but "err" is an int and "sent" is a u32. The negative is first type promoted to u32 so it becomes a high positive then it is promoted to long and it stays positive. Fix this by casting "err" directly to long. Fixes: 998024dee197 ("SUNRPC: Add more svcsock tracepoints") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19SUNRPC: Handle major timeout in xprt_adjust_timeout()Chris Dion
[ Upstream commit 09252177d5f924f404551b4b4eded5daa7f04a3a ] Currently if a major timeout value is reached, but the minor value has not been reached, an ETIMEOUT will not be sent back to the caller. This can occur if the v4 server is not responding to requests and retrans is configured larger than the default of two. For example, A TCP mount with a configured timeout value of 50 and a retransmission count of 3 to a v4 server which is not responding: 1. Initial value and increment set to 5s, maxval set to 20s, retries at 3 2. Major timeout is set to 20s, minor timeout set to 5s initially 3. xport_adjust_timeout() is called after 5s, retry with 10s timeout, minor timeout is bumped to 10s 4. And again after another 10s, 15s total time with minor timeout set to 15s 5. After 20s total time xport_adjust_timeout is called as major timeout is reached, but skipped because the minor timeout is not reached - After this time the cpu spins continually calling xport_adjust_timeout() and returning 0 for 10 seconds. As seen on perf sched: 39243.913182 [0005] mount.nfs[3794] 4607.938 0.017 9746.863 6. This continues until the 15s minor timeout condition is reached (in this case for 10 seconds). After which the ETIMEOUT is processed back to the caller, the cpu spinning stops, and normal operations continue Fixes: 7de62bc09fe6 ("SUNRPC dont update timeout value on connection reset") Signed-off-by: Chris Dion <Christopher.Dion@dell.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19SUNRPC: Remove trace_xprt_transmit_queuedChuck Lever
[ Upstream commit 6cf23783f750634e10daeede48b0f5f5d64ebf3a ] This tracepoint can crash when dereferencing snd_task because when some transports connect, they put a cookie in that field instead of a pointer to an rpc_task. BUG: KASAN: use-after-free in trace_event_raw_event_xprt_writelock_event+0x141/0x18e [sunrpc] Read of size 2 at addr ffff8881a83bd3a0 by task git/331872 CPU: 11 PID: 331872 Comm: git Tainted: G S 5.12.0-rc2-00007-g3ab6e585a7f9 #1453 Hardware name: Supermicro SYS-6028R-T/X10DRi, BIOS 1.1a 10/16/2015 Call Trace: dump_stack+0x9c/0xcf print_address_description.constprop.0+0x18/0x239 kasan_report+0x174/0x1b0 trace_event_raw_event_xprt_writelock_event+0x141/0x18e [sunrpc] xprt_prepare_transmit+0x8e/0xc1 [sunrpc] call_transmit+0x4d/0xc6 [sunrpc] Fixes: 9ce07ae5eb1d ("SUNRPC: Replace dprintk() call site in xprt_prepare_transmit") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19SUNRPC: Move fault injection call sitesChuck Lever
[ Upstream commit 7638e0bfaed1b653d3ca663e560e9ffb44bb1030 ] I've hit some crashes that occur in the xprt_rdma_inject_disconnect path. It appears that, for some provides, rdma_disconnect() can take so long that the transport can disconnect and release its hardware resources while rdma_disconnect() is still running, resulting in a UAF in the provider. The transport's fault injection method may depend on the stability of transport data structures. That means it needs to be invoked only from contexts that hold the transport write lock. Fixes: 4a0682583988 ("SUNRPC: Transport fault injection") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19net: sched: tapr: prevent cycle_time == 0 in parse_taprio_scheduleDu Cheng
[ Upstream commit ed8157f1ebf1ae81a8fa2653e3f20d2076fad1c9 ] There is a reproducible sequence from the userland that will trigger a WARN_ON() condition in taprio_get_start_time, which causes kernel to panic if configured as "panic_on_warn". Catch this condition in parse_taprio_schedule to prevent this condition. Reported as bug on syzkaller: https://syzkaller.appspot.com/bug?extid=d50710fd0873a9c6b40c Reported-by: syzbot+d50710fd0873a9c6b40c@syzkaller.appspotmail.com Signed-off-by: Du Cheng <ducheng2@gmail.com> Acked-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19ethtool: ioctl: Fix out-of-bounds warning in store_link_ksettings_for_user()Gustavo A. R. Silva
[ Upstream commit c1d9e34e11281a8ba1a1c54e4db554232a461488 ] Fix the following out-of-bounds warning: net/ethtool/ioctl.c:492:2: warning: 'memcpy' offset [49, 84] from the object at 'link_usettings' is out of the bounds of referenced subobject 'base' with type 'struct ethtool_link_settings' at offset 0 [-Warray-bounds] The problem is that the original code is trying to copy data into a some struct members adjacent to each other in a single call to memcpy(). This causes a legitimate compiler warning because memcpy() overruns the length of &link_usettings.base. Fix this by directly using &link_usettings and _from_ as destination and source addresses, instead. This helps with the ongoing efforts to globally enable -Warray-bounds and get us closer to being able to tighten the FORTIFY_SOURCE routines on memcpy(). Link: https://github.com/KSPP/linux/issues/109 Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19flow_dissector: Fix out-of-bounds warning in __skb_flow_bpf_to_target()Gustavo A. R. Silva
[ Upstream commit 1e3d976dbb23b3fce544752b434bdc32ce64aabc ] Fix the following out-of-bounds warning: net/core/flow_dissector.c:835:3: warning: 'memcpy' offset [33, 48] from the object at 'flow_keys' is out of the bounds of referenced subobject 'ipv6_src' with type '__u32[4]' {aka 'unsigned int[4]'} at offset 16 [-Warray-bounds] The problem is that the original code is trying to copy data into a couple of struct members adjacent to each other in a single call to memcpy(). So, the compiler legitimately complains about it. As these are just a couple of members, fix this by copying each one of them in separate calls to memcpy(). This helps with the ongoing efforts to globally enable -Warray-bounds and get us closer to being able to tighten the FORTIFY_SOURCE routines on memcpy(). Link: https://github.com/KSPP/linux/issues/109 Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19sctp: Fix out-of-bounds warning in sctp_process_asconf_param()Gustavo A. R. Silva
[ Upstream commit e5272ad4aab347dde5610c0aedb786219e3ff793 ] Fix the following out-of-bounds warning: net/sctp/sm_make_chunk.c:3150:4: warning: 'memcpy' offset [17, 28] from the object at 'addr' is out of the bounds of referenced subobject 'v4' with type 'struct sockaddr_in' at offset 0 [-Warray-bounds] This helps with the ongoing efforts to globally enable -Warray-bounds and get us closer to being able to tighten the FORTIFY_SOURCE routines on memcpy(). Link: https://github.com/KSPP/linux/issues/109 Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19mac80211: clear the beacon's CRC after channel switchEmmanuel Grumbach
[ Upstream commit d6843d1ee283137723b4a8c76244607ce6db1951 ] After channel switch, we should consider any beacon with a CSA IE as a new switch. If the CSA IE is a leftover from before the switch that the AP forgot to remove, we'll get a CSA-to-Self. This caused issues in iwlwifi where the firmware saw a beacon with a CSA-to-Self with mode = 1 on the new channel after a switch. The firmware considered this a new switch and closed its queues. Since the beacon didn't change between before and after the switch, we wouldn't handle it (the CRC is the same) and we wouldn't let the firmware open its queues again or disconnect if the CSA IE stays for too long. Clear the CRC valid state after we switch to make sure that we handle the beacon and handle the CSA IE as required. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Link: https://lore.kernel.org/r/20210408143124.b9e68aa98304.I465afb55ca2c7d59f7bf610c6046a1fd732b4c28@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19ip6_vti: proper dev_{hold|put} in ndo_[un]init methodsEric Dumazet
[ Upstream commit 40cb881b5aaa0b69a7d93dec8440d5c62dae299f ] After adopting CONFIG_PCPU_DEV_REFCNT=n option, syzbot was able to trigger a warning [1] Issue here is that: - all dev_put() should be paired with a corresponding prior dev_hold(). - A driver doing a dev_put() in its ndo_uninit() MUST also do a dev_hold() in its ndo_init(), only when ndo_init() is returning 0. Otherwise, register_netdevice() would call ndo_uninit() in its error path and release a refcount too soon. Therefore, we need to move dev_hold() call from vti6_tnl_create2() to vti6_dev_init_gen() [1] WARNING: CPU: 0 PID: 15951 at lib/refcount.c:31 refcount_warn_saturate+0xbf/0x1e0 lib/refcount.c:31 Modules linked in: CPU: 0 PID: 15951 Comm: syz-executor.3 Not tainted 5.12.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:refcount_warn_saturate+0xbf/0x1e0 lib/refcount.c:31 Code: 1d 6a 5a e8 09 31 ff 89 de e8 8d 1a ab fd 84 db 75 e0 e8 d4 13 ab fd 48 c7 c7 a0 e1 c1 89 c6 05 4a 5a e8 09 01 e8 2e 36 fb 04 <0f> 0b eb c4 e8 b8 13 ab fd 0f b6 1d 39 5a e8 09 31 ff 89 de e8 58 RSP: 0018:ffffc90001eaef28 EFLAGS: 00010282 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000040000 RSI: ffffffff815c51f5 RDI: fffff520003d5dd7 RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000000 R10: ffffffff815bdf8e R11: 0000000000000000 R12: ffff88801bb1c568 R13: ffff88801f69e800 R14: 00000000ffffffff R15: ffff888050889d40 FS: 00007fc79314e700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1c1ff47108 CR3: 0000000020fd5000 CR4: 00000000001506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __refcount_dec include/linux/refcount.h:344 [inline] refcount_dec include/linux/refcount.h:359 [inline] dev_put include/linux/netdevice.h:4135 [inline] vti6_dev_uninit+0x31a/0x360 net/ipv6/ip6_vti.c:297 register_netdevice+0xadf/0x1500 net/core/dev.c:10308 vti6_tnl_create2+0x1b5/0x400 net/ipv6/ip6_vti.c:190 vti6_newlink+0x9d/0xd0 net/ipv6/ip6_vti.c:1020 __rtnl_newlink+0x1062/0x1710 net/core/rtnetlink.c:3443 rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3491 rtnetlink_rcv_msg+0x44e/0xad0 net/core/rtnetlink.c:5553 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2502 netlink_unicast_kernel net/netlink/af_netlink.c:1312 [inline] netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1338 netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1927 sock_sendmsg_nosec net/socket.c:654 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:674 ____sys_sendmsg+0x331/0x810 net/socket.c:2350 ___sys_sendmsg+0xf3/0x170 net/socket.c:2404 __sys_sendmmsg+0x195/0x470 net/socket.c:2490 __do_sys_sendmmsg net/socket.c:2519 [inline] __se_sys_sendmmsg net/socket.c:2516 [inline] __x64_sys_sendmmsg+0x99/0x100 net/socket.c:2516 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19Bluetooth: check for zapped sk before connectingArchie Pusaka
[ Upstream commit 3af70b39fa2d415dc86c370e5b24ddb9fdacbd6f ] There is a possibility of receiving a zapped sock on l2cap_sock_connect(). This could lead to interesting crashes, one such case is tearing down an already tore l2cap_sock as is happened with this call trace: __dump_stack lib/dump_stack.c:15 [inline] dump_stack+0xc4/0x118 lib/dump_stack.c:56 register_lock_class kernel/locking/lockdep.c:792 [inline] register_lock_class+0x239/0x6f6 kernel/locking/lockdep.c:742 __lock_acquire+0x209/0x1e27 kernel/locking/lockdep.c:3105 lock_acquire+0x29c/0x2fb kernel/locking/lockdep.c:3599 __raw_spin_lock_bh include/linux/spinlock_api_smp.h:137 [inline] _raw_spin_lock_bh+0x38/0x47 kernel/locking/spinlock.c:175 spin_lock_bh include/linux/spinlock.h:307 [inline] lock_sock_nested+0x44/0xfa net/core/sock.c:2518 l2cap_sock_teardown_cb+0x88/0x2fb net/bluetooth/l2cap_sock.c:1345 l2cap_chan_del+0xa3/0x383 net/bluetooth/l2cap_core.c:598 l2cap_chan_close+0x537/0x5dd net/bluetooth/l2cap_core.c:756 l2cap_chan_timeout+0x104/0x17e net/bluetooth/l2cap_core.c:429 process_one_work+0x7e3/0xcb0 kernel/workqueue.c:2064 worker_thread+0x5a5/0x773 kernel/workqueue.c:2196 kthread+0x291/0x2a6 kernel/kthread.c:211 ret_from_fork+0x4e/0x80 arch/x86/entry/entry_64.S:604 Signed-off-by: Archie Pusaka <apusaka@chromium.org> Reported-by: syzbot+abfc0f5e668d4099af73@syzkaller.appspotmail.com Reviewed-by: Alain Michaud <alainm@chromium.org> Reviewed-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org> Reviewed-by: Guenter Roeck <groeck@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19net: bridge: when suppression is enabled exclude RARP packetsNikolay Aleksandrov
[ Upstream commit 0353b4a96b7a9f60fe20d1b3ebd4931a4085f91c ] Recently we had an interop issue where RARP packets got suppressed with bridge neigh suppression enabled, but the check in the code was meant to suppress GARP. Exclude RARP packets from it which would allow some VMWare setups to work, to quote the report: "Those RARP packets usually get generated by vMware to notify physical switches when vMotion occurs. vMware may use random sip/tip or just use sip=tip=0. So the RARP packet sometimes get properly flooded by the vtep and other times get dropped by the logic" Reported-by: Amer Abdalamer <amer@nvidia.com> Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19net/sched: cls_flower: use ntohs for struct flow_dissector_key_portsVladimir Oltean
[ Upstream commit 6215afcb9a7e35cef334dc0ae7f998cc72c8465f ] A make W=1 build complains that: net/sched/cls_flower.c:214:20: warning: cast from restricted __be16 net/sched/cls_flower.c:214:20: warning: incorrect type in argument 1 (different base types) net/sched/cls_flower.c:214:20: expected unsigned short [usertype] val net/sched/cls_flower.c:214:20: got restricted __be16 [usertype] dst This is because we use htons on struct flow_dissector_key_ports members src and dst, which are defined as __be16, so they are already in network byte order, not host. The byte swap function for the other direction should have been used. Because htons and ntohs do the same thing (either both swap, or none does), this change has no functional effect except to silence the warnings. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19Bluetooth: initialize skb_queue_head at l2cap_chan_create()Tetsuo Handa
[ Upstream commit be8597239379f0f53c9710dd6ab551bbf535bec6 ] syzbot is hitting "INFO: trying to register non-static key." message [1], for "struct l2cap_chan"->tx_q.lock spinlock is not yet initialized when l2cap_chan_del() is called due to e.g. timeout. Since "struct l2cap_chan"->lock mutex is initialized at l2cap_chan_create() immediately after "struct l2cap_chan" is allocated using kzalloc(), let's as well initialize "struct l2cap_chan"->{tx_q,srej_q}.lock spinlocks there. [1] https://syzkaller.appspot.com/bug?extid=fadfba6a911f6bf71842 Reported-and-tested-by: syzbot <syzbot+fadfba6a911f6bf71842@syzkaller.appspotmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19Bluetooth: Set CONF_NOT_COMPLETE as l2cap_chan defaultArchie Pusaka
[ Upstream commit 3a9d54b1947ecea8eea9a902c0b7eb58a98add8a ] Currently l2cap_chan_set_defaults() reset chan->conf_state to zero. However, there is a flag CONF_NOT_COMPLETE which is set when creating the l2cap_chan. It is suggested that the flag should be cleared when l2cap_chan is ready, but when l2cap_chan_set_defaults() is called, l2cap_chan is not yet ready. Therefore, we must set this flag as the default. Example crash call trace: __dump_stack lib/dump_stack.c:15 [inline] dump_stack+0xc4/0x118 lib/dump_stack.c:56 panic+0x1c6/0x38b kernel/panic.c:117 __warn+0x170/0x1b9 kernel/panic.c:471 warn_slowpath_fmt+0xc7/0xf8 kernel/panic.c:494 debug_print_object+0x175/0x193 lib/debugobjects.c:260 debug_object_assert_init+0x171/0x1bf lib/debugobjects.c:614 debug_timer_assert_init kernel/time/timer.c:629 [inline] debug_assert_init kernel/time/timer.c:677 [inline] del_timer+0x7c/0x179 kernel/time/timer.c:1034 try_to_grab_pending+0x81/0x2e5 kernel/workqueue.c:1230 cancel_delayed_work+0x7c/0x1c4 kernel/workqueue.c:2929 l2cap_clear_timer+0x1e/0x41 include/net/bluetooth/l2cap.h:834 l2cap_chan_del+0x2d8/0x37e net/bluetooth/l2cap_core.c:640 l2cap_chan_close+0x532/0x5d8 net/bluetooth/l2cap_core.c:756 l2cap_sock_shutdown+0x806/0x969 net/bluetooth/l2cap_sock.c:1174 l2cap_sock_release+0x64/0x14d net/bluetooth/l2cap_sock.c:1217 __sock_release+0xda/0x217 net/socket.c:580 sock_close+0x1b/0x1f net/socket.c:1039 __fput+0x322/0x55c fs/file_table.c:208 ____fput+0x17/0x19 fs/file_table.c:244 task_work_run+0x19b/0x1d3 kernel/task_work.c:115 exit_task_work include/linux/task_work.h:21 [inline] do_exit+0xe4c/0x204a kernel/exit.c:766 do_group_exit+0x291/0x291 kernel/exit.c:891 get_signal+0x749/0x1093 kernel/signal.c:2396 do_signal+0xa5/0xcdb arch/x86/kernel/signal.c:737 exit_to_usermode_loop arch/x86/entry/common.c:243 [inline] prepare_exit_to_usermode+0xed/0x235 arch/x86/entry/common.c:277 syscall_return_slowpath+0x3a7/0x3b3 arch/x86/entry/common.c:348 int_ret_from_sys_call+0x25/0xa3 Signed-off-by: Archie Pusaka <apusaka@chromium.org> Reported-by: syzbot+338f014a98367a08a114@syzkaller.appspotmail.com Reviewed-by: Alain Michaud <alainm@chromium.org> Reviewed-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org> Reviewed-by: Guenter Roeck <groeck@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19Bluetooth: Fix incorrect status handling in LE PHY UPDATE eventAyush Garg
[ Upstream commit 87df8bcccd2cede62dfb97dc3d4ca1fe66cb4f83 ] Skip updation of tx and rx PHYs values, when PHY Update event's status is not successful. Signed-off-by: Ayush Garg <ayush.garg@samsung.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19tipc: convert dest node's address to network orderHoang Le
[ Upstream commit 1980d37565061ab44bdc2f9e4da477d3b9752e81 ] (struct tipc_link_info)->dest is in network order (__be32), so we must convert the value to network order before assigning. The problem detected by sparse: net/tipc/netlink_compat.c:699:24: warning: incorrect type in assignment (different base types) net/tipc/netlink_compat.c:699:24: expected restricted __be32 [usertype] dest net/tipc/netlink_compat.c:699:24: got int Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14sctp: delay auto_asconf init until binding the first addrXin Long
commit 34e5b01186858b36c4d7c87e1a025071e8e2401f upstream. As Or Cohen described: If sctp_destroy_sock is called without sock_net(sk)->sctp.addr_wq_lock held and sp->do_auto_asconf is true, then an element is removed from the auto_asconf_splist without any proper locking. This can happen in the following functions: 1. In sctp_accept, if sctp_sock_migrate fails. 2. In inet_create or inet6_create, if there is a bpf program attached to BPF_CGROUP_INET_SOCK_CREATE which denies creation of the sctp socket. This patch is to fix it by moving the auto_asconf init out of sctp_init_sock(), by which inet_create()/inet6_create() won't need to operate it in sctp_destroy_sock() when calling sk_common_release(). It also makes more sense to do auto_asconf init while binding the first addr, as auto_asconf actually requires an ANY addr bind, see it in sctp_addr_wq_timeout_handler(). This addresses CVE-2021-23133. Fixes: 610236587600 ("bpf: Add new cgroup attach type to enable sock modifications") Reported-by: Or Cohen <orcohen@paloaltonetworks.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-14Revert "net/sctp: fix race condition in sctp_destroy_sock"Xin Long
commit 01bfe5e8e428b475982a98a46cca5755726f3f7f upstream. This reverts commit b166a20b07382b8bc1dcee2a448715c9c2c81b5b. This one has to be reverted as it introduced a dead lock, as syzbot reported: CPU0 CPU1 ---- ---- lock(&net->sctp.addr_wq_lock); lock(slock-AF_INET6); lock(&net->sctp.addr_wq_lock); lock(slock-AF_INET6); CPU0 is the thread of sctp_addr_wq_timeout_handler(), and CPU1 is that of sctp_close(). The original issue this commit fixed will be fixed in the next patch. Reported-by: syzbot+959223586843e69a2674@syzkaller.appspotmail.com Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-14net: Only allow init netns to set default tcp cong to a restricted algoJonathon Reinhart
commit 8d432592f30fcc34ef5a10aac4887b4897884493 upstream. tcp_set_default_congestion_control() is netns-safe in that it writes to &net->ipv4.tcp_congestion_control, but it also sets ca->flags |= TCP_CONG_NON_RESTRICTED which is not namespaced. This has the unintended side-effect of changing the global net.ipv4.tcp_allowed_congestion_control sysctl, despite the fact that it is read-only: 97684f0970f6 ("net: Make tcp_allowed_congestion_control readonly in non-init netns") Resolve this netns "leak" by only allowing the init netns to set the default algorithm to one that is restricted. This restriction could be removed if tcp_allowed_congestion_control were namespace-ified in the future. This bug was uncovered with https://github.com/JonathonReinhart/linux-netns-sysctl-verify Fixes: 6670e1524477 ("tcp: Namespace-ify sysctl_tcp_default_congestion_control") Signed-off-by: Jonathon Reinhart <jonathon.reinhart@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-14net:nfc:digital: Fix a double free in digital_tg_recv_dep_reqLv Yunlong
[ Upstream commit 75258586793efc521e5dd52a5bf6c7a4cf7002be ] In digital_tg_recv_dep_req, it calls nfc_tm_data_received(..,resp). If nfc_tm_data_received() failed, the callee will free the resp via kfree_skb() and return error. But in the exit branch, the resp will be freed again. My patch sets resp to NULL if nfc_tm_data_received() failed, to avoid the double free. Fixes: 1c7a4c24fbfd9 ("NFC Digital: Add target NFC-DEP support") Signed-off-by: Lv Yunlong <lyl2019@mail.ustc.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14net: bridge: mcast: fix broken length + header check for MRDv6 Adv.Linus Lüssing
[ Upstream commit 99014088156cd78867d19514a0bc771c4b86b93b ] The IPv6 Multicast Router Advertisements parsing has the following two issues: For one thing, ICMPv6 MRD Advertisements are smaller than ICMPv6 MLD messages (ICMPv6 MRD Adv.: 8 bytes vs. ICMPv6 MLDv1/2: >= 24 bytes, assuming MLDv2 Reports with at least one multicast address entry). When ipv6_mc_check_mld_msg() tries to parse an Multicast Router Advertisement its MLD length check will fail - and it will wrongly return -EINVAL, even if we have a valid MRD Advertisement. With the returned -EINVAL the bridge code will assume a broken packet and will wrongly discard it, potentially leading to multicast packet loss towards multicast routers. The second issue is the MRD header parsing in br_ip6_multicast_mrd_rcv(): It wrongly checks for an ICMPv6 header immediately after the IPv6 header (IPv6 next header type). However according to RFC4286, section 2 all MRD messages contain a Router Alert option (just like MLD). So instead there is an IPv6 Hop-by-Hop option for the Router Alert between the IPv6 and ICMPv6 header, again leading to the bridge wrongly discarding Multicast Router Advertisements. To fix these two issues, introduce a new return value -ENODATA to ipv6_mc_check_mld() to indicate a valid ICMPv6 packet with a hop-by-hop option which is not an MLD but potentially an MRD packet. This also simplifies further parsing in the bridge code, as ipv6_mc_check_mld() already fully checks the ICMPv6 header and hop-by-hop option. These issues were found and fixed with the help of the mrdisc tool (https://github.com/troglobit/mrdisc). Fixes: 4b3087c7e37f ("bridge: Snoop Multicast Router Advertisements") Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14vsock/virtio: free queued packets when closing socketStefano Garzarella
[ Upstream commit 8432b8114957235f42e070a16118a7f750de9d39 ] As reported by syzbot [1], there is a memory leak while closing the socket. We partially solved this issue with commit ac03046ece2b ("vsock/virtio: free packets during the socket release"), but we forgot to drain the RX queue when the socket is definitely closed by the scheduled work. To avoid future issues, let's use the new virtio_transport_remove_sock() to drain the RX queue before removing the socket from the af_vsock lists calling vsock_remove_sock(). [1] https://syzkaller.appspot.com/bug?extid=24452624fc4c571eedd9 Fixes: ac03046ece2b ("vsock/virtio: free packets during the socket release") Reported-and-tested-by: syzbot+24452624fc4c571eedd9@syzkaller.appspotmail.com Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14gro: fix napi_gro_frags() Fast GRO breakage due to IP alignment checkAlexander Lobakin
[ Upstream commit 7ad18ff6449cbd6beb26b53128ddf56d2685aa93 ] Commit 38ec4944b593 ("gro: ensure frag0 meets IP header alignment") did the right thing, but missed the fact that napi_gro_frags() logics calls for skb_gro_reset_offset() *before* pulling Ethernet header to the skb linear space. That said, the introduced check for frag0 address being aligned to 4 always fails for it as Ethernet header is obviously 14 bytes long, and in case with NET_IP_ALIGN its start is not aligned to 4. Fix this by adding @nhoff argument to skb_gro_reset_offset() which tells if an IP header is placed right at the start of frag0 or not. This restores Fast GRO for napi_gro_frags() that became very slow after the mentioned commit, and preserves the introduced check to avoid silent unaligned accesses. From v1 [0]: - inline tiny skb_gro_reset_offset() to let the code be optimized more efficively (esp. for the !NET_IP_ALIGN case) (Eric); - pull in Reviewed-by from Eric. [0] https://lore.kernel.org/netdev/20210418114200.5839-1-alobakin@pm.me Fixes: 38ec4944b593 ("gro: ensure frag0 meets IP header alignment") Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14vsock/vmci: log once the failed queue pair allocationStefano Garzarella
[ Upstream commit e16edc99d658cd41c60a44cc14d170697aa3271f ] VMCI feature is not supported in conjunction with the vSphere Fault Tolerance (FT) feature. VMware Tools can repeatedly try to create a vsock connection. If FT is enabled the kernel logs is flooded with the following messages: qp_alloc_hypercall result = -20 Could not attach to queue pair with -20 "qp_alloc_hypercall result = -20" was hidden by commit e8266c4c3307 ("VMCI: Stop log spew when qp allocation isn't possible"), but "Could not attach to queue pair with -20" is still there flooding the log. Since the error message can be useful in some cases, print it only once. Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14netfilter: nftables_offload: special ethertype handling for VLANPablo Neira Ayuso
[ Upstream commit 783003f3bb8a565326e89d18bbd948ad8ffc816a ] The nftables offload parser sets FLOW_DISSECTOR_KEY_BASIC .n_proto to the ethertype field in the ethertype frame. However: - FLOW_DISSECTOR_KEY_BASIC .n_proto field always stores either IPv4 or IPv6 ethertypes. - FLOW_DISSECTOR_KEY_VLAN .vlan_tpid stores either the 802.1q and 802.1ad ethertypes. Same as for FLOW_DISSECTOR_KEY_CVLAN. This function adjusts the flow dissector to handle two scenarios: 1) FLOW_DISSECTOR_KEY_VLAN .vlan_tpid is set to 802.1q or 802.1ad. Then, transfer: - the .n_proto field to FLOW_DISSECTOR_KEY_VLAN .tpid. - the original FLOW_DISSECTOR_KEY_VLAN .tpid to the FLOW_DISSECTOR_KEY_CVLAN .tpid - the original FLOW_DISSECTOR_KEY_CVLAN .tpid to the .n_proto field. 2) .n_proto is set to 802.1q or 802.1ad. Then, transfer: - the .n_proto field to FLOW_DISSECTOR_KEY_VLAN .tpid. - the original FLOW_DISSECTOR_KEY_VLAN .tpid to the .n_proto field. Fixes: a82055af5959 ("netfilter: nft_payload: add VLAN offload support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14netfilter: nftables_offload: VLAN id needs host byteorder in flow dissectorPablo Neira Ayuso
[ Upstream commit ff4d90a89d3d4d9814e0a2696509a7d495be4163 ] The flow dissector representation expects the VLAN id in host byteorder. Add the NFT_OFFLOAD_F_NETWORK2HOST flag to swap the bytes from nft_cmp. Fixes: a82055af5959 ("netfilter: nft_payload: add VLAN offload support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14netfilter: nft_payload: fix C-VLAN offload supportPablo Neira Ayuso
[ Upstream commit 14c20643ef9457679cc6934d77adc24296505214 ] - add another struct flow_dissector_key_vlan for C-VLAN - update layer 3 dependency to allow to match on IPv4/IPv6 Fixes: 89d8fd44abfb ("netfilter: nft_payload: add C-VLAN offload support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14net/packet: remove data races in fanout operationsEric Dumazet
[ Upstream commit 94f633ea8ade8418634d152ad0931133338226f6 ] af_packet fanout uses RCU rules to ensure f->arr elements are not dismantled before RCU grace period. However, it lacks rcu accessors to make sure KCSAN and other tools wont detect data races. Stupid compilers could also play games. Fixes: dc99f600698d ("packet: Add fanout support.") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: "Gong, Sishuai" <sishuai@purdue.edu> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>