summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2020-06-22sunrpc: clean up properly in gss_mech_unregister()NeilBrown
commit 24c5efe41c29ee3e55bcf5a1c9f61ca8709622e8 upstream. gss_mech_register() calls svcauth_gss_register_pseudoflavor() for each flavour, but gss_mech_unregister() does not call auth_domain_put(). This is unbalanced and makes it impossible to reload the module. Change svcauth_gss_register_pseudoflavor() to return the registered auth_domain, and save it for later release. Cc: stable@vger.kernel.org (v2.6.12+) Link: https://bugzilla.kernel.org/show_bug.cgi?id=206651 Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-22sunrpc: svcauth_gss_register_pseudoflavor must reject duplicate registrations.NeilBrown
commit d47a5dc2888fd1b94adf1553068b8dad76cec96c upstream. There is no valid case for supporting duplicate pseudoflavor registrations. Currently the silent acceptance of such registrations is hiding a bug. The rpcsec_gss_krb5 module registers 2 flavours but does not unregister them, so if you load, unload, reload the module, it will happily continue to use the old registration which now has pointers to the memory were the module was originally loaded. This could lead to unexpected results. So disallow duplicate registrations. Link: https://bugzilla.kernel.org/show_bug.cgi?id=206651 Cc: stable@vger.kernel.org (v2.6.12+) Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-22netfilter: nft_nat: return EOPNOTSUPP if type or flags are not supportedPablo Neira Ayuso
[ Upstream commit 0d7c83463fdf7841350f37960a7abadd3e650b41 ] Instead of EINVAL which should be used for malformed netlink messages. Fixes: eb31628e37a0 ("netfilter: nf_tables: Add support for IPv6 NAT") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-22batman-adv: Revert "disable ethtool link speed detection when auto ↵Sven Eckelmann
negotiation off" [ Upstream commit 9ad346c90509ebd983f60da7d082f261ad329507 ] The commit 8c46fcd78308 ("batman-adv: disable ethtool link speed detection when auto negotiation off") disabled the usage of ethtool's link_ksetting when auto negotation was enabled due to invalid values when used with tun/tap virtual net_devices. According to the patch, automatic measurements should be used for these kind of interfaces. But there are major flaws with this argumentation: * automatic measurements are not implemented * auto negotiation has nothing to do with the validity of the retrieved values The first point has to be fixed by a longer patch series. The "validity" part of the second point must be addressed in the same patch series by dropping the usage of ethtool's link_ksetting (thus always doing automatic measurements over ethernet). Drop the patch again to have more default values for various net_device types/configurations. The user can still overwrite them using the batadv_hardif's BATADV_ATTR_THROUGHPUT_OVERRIDE. Reported-by: Matthias Schiffer <mschiffer@universe-factory.net> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-22Bluetooth: Add SCO fallback for invalid LMP parameters errorHsin-Yu Chao
[ Upstream commit 56b5453a86203a44726f523b4133c1feca49ce7c ] Bluetooth PTS test case HFP/AG/ACC/BI-12-I accepts SCO connection with invalid parameter at the first SCO request expecting AG to attempt another SCO request with the use of "safe settings" for given codec, base on section 5.7.1.2 of HFP 1.7 specification. This patch addresses it by adding "Invalid LMP Parameters" (0x1e) to the SCO fallback case. Verified with below log: < HCI Command: Setup Synchronous Connection (0x01|0x0028) plen 17 Handle: 256 Transmit bandwidth: 8000 Receive bandwidth: 8000 Max latency: 13 Setting: 0x0003 Input Coding: Linear Input Data Format: 1's complement Input Sample Size: 8-bit # of bits padding at MSB: 0 Air Coding Format: Transparent Data Retransmission effort: Optimize for link quality (0x02) Packet type: 0x0380 3-EV3 may not be used 2-EV5 may not be used 3-EV5 may not be used > HCI Event: Command Status (0x0f) plen 4 Setup Synchronous Connection (0x01|0x0028) ncmd 1 Status: Success (0x00) > HCI Event: Number of Completed Packets (0x13) plen 5 Num handles: 1 Handle: 256 Count: 1 > HCI Event: Max Slots Change (0x1b) plen 3 Handle: 256 Max slots: 1 > HCI Event: Synchronous Connect Complete (0x2c) plen 17 Status: Invalid LMP Parameters / Invalid LL Parameters (0x1e) Handle: 0 Address: 00:1B:DC:F2:21:59 (OUI 00-1B-DC) Link type: eSCO (0x02) Transmission interval: 0x00 Retransmission window: 0x02 RX packet length: 0 TX packet length: 0 Air mode: Transparent (0x03) < HCI Command: Setup Synchronous Connection (0x01|0x0028) plen 17 Handle: 256 Transmit bandwidth: 8000 Receive bandwidth: 8000 Max latency: 8 Setting: 0x0003 Input Coding: Linear Input Data Format: 1's complement Input Sample Size: 8-bit # of bits padding at MSB: 0 Air Coding Format: Transparent Data Retransmission effort: Optimize for link quality (0x02) Packet type: 0x03c8 EV3 may be used 2-EV3 may not be used 3-EV3 may not be used 2-EV5 may not be used 3-EV5 may not be used > HCI Event: Command Status (0x0f) plen 4 Setup Synchronous Connection (0x01|0x0028) ncmd 1 Status: Success (0x00) > HCI Event: Max Slots Change (0x1b) plen 3 Handle: 256 Max slots: 5 > HCI Event: Max Slots Change (0x1b) plen 3 Handle: 256 Max slots: 1 > HCI Event: Synchronous Connect Complete (0x2c) plen 17 Status: Success (0x00) Handle: 257 Address: 00:1B:DC:F2:21:59 (OUI 00-1B-DC) Link type: eSCO (0x02) Transmission interval: 0x06 Retransmission window: 0x04 RX packet length: 30 TX packet length: 30 Air mode: Transparent (0x03) Signed-off-by: Hsin-Yu Chao <hychao@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-22bridge: Avoid infinite loop when suppressing NS messages with invalid optionsIdo Schimmel
[ Upstream commit 53fc685243bd6fb90d90305cea54598b78d3cbfc ] When neighbor suppression is enabled the bridge device might reply to Neighbor Solicitation (NS) messages on behalf of remote hosts. In case the NS message includes the "Source link-layer address" option [1], the bridge device will use the specified address as the link-layer destination address in its reply. To avoid an infinite loop, break out of the options parsing loop when encountering an option with length zero and disregard the NS message. This is consistent with the IPv6 ndisc code and RFC 4886 which states that "Nodes MUST silently discard an ND packet that contains an option with length zero" [2]. [1] https://tools.ietf.org/html/rfc4861#section-4.3 [2] https://tools.ietf.org/html/rfc4861#section-4.6 Fixes: ed842faeb2bd ("bridge: suppress nd pkts on BR_NEIGH_SUPPRESS ports") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Alla Segal <allas@mellanox.com> Tested-by: Alla Segal <allas@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-22ipv6: fix IPV6_ADDRFORM operation logicHangbin Liu
[ Upstream commit 79a1f0ccdbb4ad700590f61b00525b390cb53905 ] Socket option IPV6_ADDRFORM supports UDP/UDPLITE and TCP at present. Previously the checking logic looks like: if (sk->sk_protocol == IPPROTO_UDP || sk->sk_protocol == IPPROTO_UDPLITE) do_some_check; else if (sk->sk_protocol != IPPROTO_TCP) break; After commit b6f6118901d1 ("ipv6: restrict IPV6_ADDRFORM operation"), TCP was blocked as the logic changed to: if (sk->sk_protocol == IPPROTO_UDP || sk->sk_protocol == IPPROTO_UDPLITE) do_some_check; else if (sk->sk_protocol == IPPROTO_TCP) do_some_check; break; else break; Then after commit 82c9ae440857 ("ipv6: fix restrict IPV6_ADDRFORM operation") UDP/UDPLITE were blocked as the logic changed to: if (sk->sk_protocol == IPPROTO_UDP || sk->sk_protocol == IPPROTO_UDPLITE) do_some_check; if (sk->sk_protocol == IPPROTO_TCP) do_some_check; if (sk->sk_protocol != IPPROTO_TCP) break; Fix it by using Eric's code and simply remove the break in TCP check, which looks like: if (sk->sk_protocol == IPPROTO_UDP || sk->sk_protocol == IPPROTO_UDPLITE) do_some_check; else if (sk->sk_protocol == IPPROTO_TCP) do_some_check; else break; Fixes: 82c9ae440857 ("ipv6: fix restrict IPV6_ADDRFORM operation") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-10vsock: fix timeout in vsock_accept()Stefano Garzarella
[ Upstream commit 7e0afbdfd13d1e708fe96e31c46c4897101a6a43 ] The accept(2) is an "input" socket interface, so we should use SO_RCVTIMEO instead of SO_SNDTIMEO to set the timeout. So this patch replace sock_sndtimeo() with sock_rcvtimeo() to use the right timeout in the vsock_accept(). Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-10l2tp: do not use inet_hash()/inet_unhash()Eric Dumazet
[ Upstream commit 02c71b144c811bcdd865e0a1226d0407d11357e8 ] syzbot recently found a way to crash the kernel [1] Issue here is that inet_hash() & inet_unhash() are currently only meant to be used by TCP & DCCP, since only these protocols provide the needed hashinfo pointer. L2TP uses a single list (instead of a hash table) This old bug became an issue after commit 610236587600 ("bpf: Add new cgroup attach type to enable sock modifications") since after this commit, sk_common_release() can be called while the L2TP socket is still considered 'hashed'. general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f] CPU: 0 PID: 7063 Comm: syz-executor654 Not tainted 5.7.0-rc6-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:inet_unhash+0x11f/0x770 net/ipv4/inet_hashtables.c:600 Code: 03 0f b6 04 02 84 c0 74 08 3c 03 0f 8e dd 04 00 00 48 8d 7d 08 44 8b 73 08 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 55 05 00 00 48 8d 7d 14 4c 8b 6d 08 48 b8 00 00 RSP: 0018:ffffc90001777d30 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: ffff88809a6df940 RCX: ffffffff8697c242 RDX: 0000000000000001 RSI: ffffffff8697c251 RDI: 0000000000000008 RBP: 0000000000000000 R08: ffff88809f3ae1c0 R09: fffffbfff1514cc1 R10: ffffffff8a8a6607 R11: fffffbfff1514cc0 R12: ffff88809a6df9b0 R13: 0000000000000007 R14: 0000000000000000 R15: ffffffff873a4d00 FS: 0000000001d2b880(0000) GS:ffff8880ae600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000006cd090 CR3: 000000009403a000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: sk_common_release+0xba/0x370 net/core/sock.c:3210 inet_create net/ipv4/af_inet.c:390 [inline] inet_create+0x966/0xe00 net/ipv4/af_inet.c:248 __sock_create+0x3cb/0x730 net/socket.c:1428 sock_create net/socket.c:1479 [inline] __sys_socket+0xef/0x200 net/socket.c:1521 __do_sys_socket net/socket.c:1530 [inline] __se_sys_socket net/socket.c:1528 [inline] __x64_sys_socket+0x6f/0xb0 net/socket.c:1528 do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295 entry_SYSCALL_64_after_hwframe+0x49/0xb3 RIP: 0033:0x441e29 Code: e8 fc b3 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 eb 08 fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007ffdce184148 EFLAGS: 00000246 ORIG_RAX: 0000000000000029 RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000441e29 RDX: 0000000000000073 RSI: 0000000000000002 RDI: 0000000000000002 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000402c30 R14: 0000000000000000 R15: 0000000000000000 Modules linked in: ---[ end trace 23b6578228ce553e ]--- RIP: 0010:inet_unhash+0x11f/0x770 net/ipv4/inet_hashtables.c:600 Code: 03 0f b6 04 02 84 c0 74 08 3c 03 0f 8e dd 04 00 00 48 8d 7d 08 44 8b 73 08 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 55 05 00 00 48 8d 7d 14 4c 8b 6d 08 48 b8 00 00 RSP: 0018:ffffc90001777d30 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: ffff88809a6df940 RCX: ffffffff8697c242 RDX: 0000000000000001 RSI: ffffffff8697c251 RDI: 0000000000000008 RBP: 0000000000000000 R08: ffff88809f3ae1c0 R09: fffffbfff1514cc1 R10: ffffffff8a8a6607 R11: fffffbfff1514cc0 R12: ffff88809a6df9b0 R13: 0000000000000007 R14: 0000000000000000 R15: ffffffff873a4d00 FS: 0000000001d2b880(0000) GS:ffff8880ae600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000006cd090 CR3: 000000009403a000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Fixes: 0d76751fad77 ("l2tp: Add L2TPv3 IP encapsulation (no UDP) support") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: James Chapman <jchapman@katalix.com> Cc: Andrii Nakryiko <andriin@fb.com> Reported-by: syzbot+3610d489778b57cc8031@syzkaller.appspotmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-10l2tp: add sk_family checks to l2tp_validate_socketEric Dumazet
[ Upstream commit d9a81a225277686eb629938986d97629ea102633 ] syzbot was able to trigger a crash after using an ISDN socket and fool l2tp. Fix this by making sure the UDP socket is of the proper family. BUG: KASAN: slab-out-of-bounds in setup_udp_tunnel_sock+0x465/0x540 net/ipv4/udp_tunnel.c:78 Write of size 1 at addr ffff88808ed0c590 by task syz-executor.5/3018 CPU: 0 PID: 3018 Comm: syz-executor.5 Not tainted 5.7.0-rc6-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x188/0x20d lib/dump_stack.c:118 print_address_description.constprop.0.cold+0xd3/0x413 mm/kasan/report.c:382 __kasan_report.cold+0x20/0x38 mm/kasan/report.c:511 kasan_report+0x33/0x50 mm/kasan/common.c:625 setup_udp_tunnel_sock+0x465/0x540 net/ipv4/udp_tunnel.c:78 l2tp_tunnel_register+0xb15/0xdd0 net/l2tp/l2tp_core.c:1523 l2tp_nl_cmd_tunnel_create+0x4b2/0xa60 net/l2tp/l2tp_netlink.c:249 genl_family_rcv_msg_doit net/netlink/genetlink.c:673 [inline] genl_family_rcv_msg net/netlink/genetlink.c:718 [inline] genl_rcv_msg+0x627/0xdf0 net/netlink/genetlink.c:735 netlink_rcv_skb+0x15a/0x410 net/netlink/af_netlink.c:2469 genl_rcv+0x24/0x40 net/netlink/genetlink.c:746 netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline] netlink_unicast+0x537/0x740 net/netlink/af_netlink.c:1329 netlink_sendmsg+0x882/0xe10 net/netlink/af_netlink.c:1918 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:672 ____sys_sendmsg+0x6e6/0x810 net/socket.c:2352 ___sys_sendmsg+0x100/0x170 net/socket.c:2406 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2439 do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295 entry_SYSCALL_64_after_hwframe+0x49/0xb3 RIP: 0033:0x45ca29 Code: 0d b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 db b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007effe76edc78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00000000004fe1c0 RCX: 000000000045ca29 RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000005 RBP: 000000000078bf00 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff R13: 000000000000094e R14: 00000000004d5d00 R15: 00007effe76ee6d4 Allocated by task 3018: save_stack+0x1b/0x40 mm/kasan/common.c:49 set_track mm/kasan/common.c:57 [inline] __kasan_kmalloc mm/kasan/common.c:495 [inline] __kasan_kmalloc.constprop.0+0xbf/0xd0 mm/kasan/common.c:468 __do_kmalloc mm/slab.c:3656 [inline] __kmalloc+0x161/0x7a0 mm/slab.c:3665 kmalloc include/linux/slab.h:560 [inline] sk_prot_alloc+0x223/0x2f0 net/core/sock.c:1612 sk_alloc+0x36/0x1100 net/core/sock.c:1666 data_sock_create drivers/isdn/mISDN/socket.c:600 [inline] mISDN_sock_create+0x272/0x400 drivers/isdn/mISDN/socket.c:796 __sock_create+0x3cb/0x730 net/socket.c:1428 sock_create net/socket.c:1479 [inline] __sys_socket+0xef/0x200 net/socket.c:1521 __do_sys_socket net/socket.c:1530 [inline] __se_sys_socket net/socket.c:1528 [inline] __x64_sys_socket+0x6f/0xb0 net/socket.c:1528 do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295 entry_SYSCALL_64_after_hwframe+0x49/0xb3 Freed by task 2484: save_stack+0x1b/0x40 mm/kasan/common.c:49 set_track mm/kasan/common.c:57 [inline] kasan_set_free_info mm/kasan/common.c:317 [inline] __kasan_slab_free+0xf7/0x140 mm/kasan/common.c:456 __cache_free mm/slab.c:3426 [inline] kfree+0x109/0x2b0 mm/slab.c:3757 kvfree+0x42/0x50 mm/util.c:603 __free_fdtable+0x2d/0x70 fs/file.c:31 put_files_struct fs/file.c:420 [inline] put_files_struct+0x248/0x2e0 fs/file.c:413 exit_files+0x7e/0xa0 fs/file.c:445 do_exit+0xb04/0x2dd0 kernel/exit.c:791 do_group_exit+0x125/0x340 kernel/exit.c:894 get_signal+0x47b/0x24e0 kernel/signal.c:2739 do_signal+0x81/0x2240 arch/x86/kernel/signal.c:784 exit_to_usermode_loop+0x26c/0x360 arch/x86/entry/common.c:161 prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline] syscall_return_slowpath arch/x86/entry/common.c:279 [inline] do_syscall_64+0x6b1/0x7d0 arch/x86/entry/common.c:305 entry_SYSCALL_64_after_hwframe+0x49/0xb3 The buggy address belongs to the object at ffff88808ed0c000 which belongs to the cache kmalloc-2k of size 2048 The buggy address is located 1424 bytes inside of 2048-byte region [ffff88808ed0c000, ffff88808ed0c800) The buggy address belongs to the page: page:ffffea00023b4300 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 flags: 0xfffe0000000200(slab) raw: 00fffe0000000200 ffffea0002838208 ffffea00015ba288 ffff8880aa000e00 raw: 0000000000000000 ffff88808ed0c000 0000000100000001 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88808ed0c480: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff88808ed0c500: 00 00 00 fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff88808ed0c580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ^ ffff88808ed0c600: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88808ed0c680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc Fixes: 6b9f34239b00 ("l2tp: fix races in tunnel creation") Fixes: fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: James Chapman <jchapman@katalix.com> Cc: Guillaume Nault <gnault@redhat.com> Reported-by: syzbot <syzkaller@googlegroups.com> Acked-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-10devinet: fix memleak in inetdev_init()Yang Yingliang
[ Upstream commit 1b49cd71b52403822731dc9f283185d1da355f97 ] When devinet_sysctl_register() failed, the memory allocated in neigh_parms_alloc() should be freed. Fixes: 20e61da7ffcf ("ipv4: fail early when creating netdev named all or default") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-03netfilter: nf_conntrack_pptp: fix compilation warning with W=1 buildPablo Neira Ayuso
commit 4946ea5c1237036155c3b3a24f049fd5f849f8f6 upstream. >> include/linux/netfilter/nf_conntrack_pptp.h:13:20: warning: 'const' type qualifier on return type has no effect [-Wignored-qualifiers] extern const char *const pptp_msg_name(u_int16_t msg); ^~~~~~ Reported-by: kbuild test robot <lkp@intel.com> Fixes: 4c559f15efcc ("netfilter: nf_conntrack_pptp: prevent buffer overflows in debug code") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-03xsk: Add overflow check for u64 division, stored into u32Björn Töpel
commit b16a87d0aef7a6be766f6618976dc5ff2c689291 upstream. The npgs member of struct xdp_umem is an u32 entity, and stores the number of pages the UMEM consumes. The calculation of npgs npgs = size / PAGE_SIZE can overflow. To avoid overflow scenarios, the division is now first stored in a u64, and the result is verified to fit into 32b. An alternative would be storing the npgs as a u64, however, this wastes memory and is an unrealisticly large packet area. Fixes: c0c77d8fb787 ("xsk: add user memory registration support sockopt") Reported-by: "Minh Bùi Quang" <minhquangbui99@gmail.com> Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Link: https://lore.kernel.org/bpf/CACtPs=GGvV-_Yj6rbpzTVnopgi5nhMoCcTkSkYrJHGQHJWFZMQ@mail.gmail.com/ Link: https://lore.kernel.org/bpf/20200525080400.13195-1-bjorn.topel@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-03esp6: get the right proto for transport mode in esp6_gso_encapXin Long
commit 3c96ec56828922e3fe5477f75eb3fc02f98f98b5 upstream. For transport mode, when ipv6 nexthdr is set, the packet format might be like: ---------------------------------------------------- | | dest | | | | ESP | ESP | | IP6 hdr| opts.| ESP | TCP | Data | Trailer | ICV | ---------------------------------------------------- What it wants to get for x-proto in esp6_gso_encap() is the proto that will be set in ESP nexthdr. So it should skip all ipv6 nexthdrs and get the real transport protocol. Othersize, the wrong proto number will be set into ESP nexthdr. This patch is to skip all ipv6 nexthdrs by calling ipv6_skip_exthdr() in esp6_gso_encap(). Fixes: 7862b4058b9f ("esp: Add gso handlers for esp4 and esp6") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-03netfilter: nf_conntrack_pptp: prevent buffer overflows in debug codePablo Neira Ayuso
commit 4c559f15efcc43b996f4da528cd7f9483aaca36d upstream. Dan Carpenter says: "Smatch complains that the value for "cmd" comes from the network and can't be trusted." Add pptp_msg_name() helper function that checks for the array boundary. Fixes: f09943fefe6b ("[NETFILTER]: nf_conntrack/nf_nat: add PPTP helper port") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-03netfilter: nfnetlink_cthelper: unbreak userspace helper supportPablo Neira Ayuso
commit 703acd70f2496537457186211c2f03e792409e68 upstream. Restore helper data size initialization and fix memcopy of the helper data size. Fixes: 157ffffeb5dc ("netfilter: nfnetlink_cthelper: reject too large userspace allocation requests") Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-03netfilter: ipset: Fix subcounter update skipPhil Sutter
commit a164b95ad6055c50612795882f35e0efda1f1390 upstream. If IPSET_FLAG_SKIP_SUBCOUNTER_UPDATE is set, user requested to not update counters in sub sets. Therefore IPSET_FLAG_SKIP_COUNTER_UPDATE must be set, not unset. Fixes: 6e01781d1c80e ("netfilter: ipset: set match: add support to match the counters") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-03netfilter: nft_reject_bridge: enable reject with bridge vlanMichael Braun
commit e9c284ec4b41c827f4369973d2792992849e4fa5 upstream. Currently, using the bridge reject target with tagged packets results in untagged packets being sent back. Fix this by mirroring the vlan id as well. Fixes: 85f5b3086a04 ("netfilter: bridge: add reject support") Signed-off-by: Michael Braun <michael-dev@fami-braun.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-03ip_vti: receive ipip packet by calling ip_tunnel_rcvXin Long
commit 976eba8ab596bab94b9714cd46d38d5c6a2c660d upstream. In Commit dd9ee3444014 ("vti4: Fix a ipip packet processing bug in 'IPCOMP' virtual tunnel"), it tries to receive IPIP packets in vti by calling xfrm_input(). This case happens when a small packet or frag sent by peer is too small to get compressed. However, xfrm_input() will still get to the IPCOMP path where skb sec_path is set, but never dropped while it should have been done in vti_ipcomp4_protocol.cb_handler(vti_rcv_cb), as it's not an ipcomp4 packet. This will cause that the packet can never pass xfrm4_policy_check() in the upper protocol rcv functions. So this patch is to call ip_tunnel_rcv() to process IPIP packets instead. Fixes: dd9ee3444014 ("vti4: Fix a ipip packet processing bug in 'IPCOMP' virtual tunnel") Reported-by: Xiumei Mu <xmu@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-03vti4: eliminated some duplicate code.Jeremy Sowden
commit f981c57ffd2d7cf2dd4b6d6f8fcb3965df42f54c upstream. The ipip tunnel introduced in commit dd9ee3444014 ("vti4: Fix a ipip packet processing bug in 'IPCOMP' virtual tunnel") largely duplicated the existing vti_input and vti_recv functions. Refactored to deduplicate the common code. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-03xfrm: fix a NULL-ptr deref in xfrm_local_errorXin Long
commit f6a23d85d078c2ffde79c66ca81d0a1dde451649 upstream. This patch is to fix a crash: [ ] kasan: GPF could be caused by NULL-ptr deref or user memory access [ ] general protection fault: 0000 [#1] SMP KASAN PTI [ ] RIP: 0010:ipv6_local_error+0xac/0x7a0 [ ] Call Trace: [ ] xfrm6_local_error+0x1eb/0x300 [ ] xfrm_local_error+0x95/0x130 [ ] __xfrm6_output+0x65f/0xb50 [ ] xfrm6_output+0x106/0x46f [ ] udp_tunnel6_xmit_skb+0x618/0xbf0 [ip6_udp_tunnel] [ ] vxlan_xmit_one+0xbc6/0x2c60 [vxlan] [ ] vxlan_xmit+0x6a0/0x4276 [vxlan] [ ] dev_hard_start_xmit+0x165/0x820 [ ] __dev_queue_xmit+0x1ff0/0x2b90 [ ] ip_finish_output2+0xd3e/0x1480 [ ] ip_do_fragment+0x182d/0x2210 [ ] ip_output+0x1d0/0x510 [ ] ip_send_skb+0x37/0xa0 [ ] raw_sendmsg+0x1b4c/0x2b80 [ ] sock_sendmsg+0xc0/0x110 This occurred when sending a v4 skb over vxlan6 over ipsec, in which case skb->protocol == htons(ETH_P_IPV6) while skb->sk->sk_family == AF_INET in xfrm_local_error(). Then it will go to xfrm6_local_error() where it tries to get ipv6 info from a ipv4 sk. This issue was actually fixed by Commit 628e341f319f ("xfrm: make local error reporting more robust"), but brought back by Commit 844d48746e4b ("xfrm: choose protocol family by skb protocol"). So to fix it, we should call xfrm6_local_error() only when skb->protocol is htons(ETH_P_IPV6) and skb->sk->sk_family is AF_INET6. Fixes: 844d48746e4b ("xfrm: choose protocol family by skb protocol") Reported-by: Xiumei Mu <xmu@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-03xfrm: fix a warning in xfrm_policy_insert_listXin Long
commit ed17b8d377eaf6b4a01d46942b4c647378a79bdd upstream. This waring can be triggered simply by: # ip xfrm policy update src 192.168.1.1/24 dst 192.168.1.2/24 dir in \ priority 1 mark 0 mask 0x10 #[1] # ip xfrm policy update src 192.168.1.1/24 dst 192.168.1.2/24 dir in \ priority 2 mark 0 mask 0x1 #[2] # ip xfrm policy update src 192.168.1.1/24 dst 192.168.1.2/24 dir in \ priority 2 mark 0 mask 0x10 #[3] Then dmesg shows: [ ] WARNING: CPU: 1 PID: 7265 at net/xfrm/xfrm_policy.c:1548 [ ] RIP: 0010:xfrm_policy_insert_list+0x2f2/0x1030 [ ] Call Trace: [ ] xfrm_policy_inexact_insert+0x85/0xe50 [ ] xfrm_policy_insert+0x4ba/0x680 [ ] xfrm_add_policy+0x246/0x4d0 [ ] xfrm_user_rcv_msg+0x331/0x5c0 [ ] netlink_rcv_skb+0x121/0x350 [ ] xfrm_netlink_rcv+0x66/0x80 [ ] netlink_unicast+0x439/0x630 [ ] netlink_sendmsg+0x714/0xbf0 [ ] sock_sendmsg+0xe2/0x110 The issue was introduced by Commit 7cb8a93968e3 ("xfrm: Allow inserting policies with matching mark and different priorities"). After that, the policies [1] and [2] would be able to be added with different priorities. However, policy [3] will actually match both [1] and [2]. Policy [1] was matched due to the 1st 'return true' in xfrm_policy_mark_match(), and policy [2] was matched due to the 2nd 'return true' in there. It caused WARN_ON() in xfrm_policy_insert_list(). This patch is to fix it by only (the same value and priority) as the same policy in xfrm_policy_mark_match(). Thanks to Yuehaibing, we could make this fix better. v1->v2: - check policy->mark.v == pol->mark.v only without mask. Fixes: 7cb8a93968e3 ("xfrm: Allow inserting policies with matching mark and different priorities") Reported-by: Xiumei Mu <xmu@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-03xfrm interface: fix oops when deleting a x-netns interfaceNicolas Dichtel
commit c95c5f58b35ef995f66cb55547eee6093ab5fcb8 upstream. Here is the steps to reproduce the problem: ip netns add foo ip netns add bar ip -n foo link add xfrmi0 type xfrm dev lo if_id 42 ip -n foo link set xfrmi0 netns bar ip netns del foo ip netns del bar Which results to: [ 186.686395] general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6bd3: 0000 [#1] SMP PTI [ 186.687665] CPU: 7 PID: 232 Comm: kworker/u16:2 Not tainted 5.6.0+ #1 [ 186.688430] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 [ 186.689420] Workqueue: netns cleanup_net [ 186.689903] RIP: 0010:xfrmi_dev_uninit+0x1b/0x4b [xfrm_interface] [ 186.690657] Code: 44 f6 ff ff 31 c0 5b 5d 41 5c 41 5d 41 5e c3 48 8d 8f c0 08 00 00 8b 05 ce 14 00 00 48 8b 97 d0 08 00 00 48 8b 92 c0 0e 00 00 <48> 8b 14 c2 48 8b 02 48 85 c0 74 19 48 39 c1 75 0c 48 8b 87 c0 08 [ 186.692838] RSP: 0018:ffffc900003b7d68 EFLAGS: 00010286 [ 186.693435] RAX: 000000000000000d RBX: ffff8881b0f31000 RCX: ffff8881b0f318c0 [ 186.694334] RDX: 6b6b6b6b6b6b6b6b RSI: 0000000000000246 RDI: ffff8881b0f31000 [ 186.695190] RBP: ffffc900003b7df0 R08: ffff888236c07740 R09: 0000000000000040 [ 186.696024] R10: ffffffff81fce1b8 R11: 0000000000000002 R12: ffffc900003b7d80 [ 186.696859] R13: ffff8881edcc6a40 R14: ffff8881a1b6e780 R15: ffffffff81ed47c8 [ 186.697738] FS: 0000000000000000(0000) GS:ffff888237dc0000(0000) knlGS:0000000000000000 [ 186.698705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 186.699408] CR2: 00007f2129e93148 CR3: 0000000001e0a000 CR4: 00000000000006e0 [ 186.700221] Call Trace: [ 186.700508] rollback_registered_many+0x32b/0x3fd [ 186.701058] ? __rtnl_unlock+0x20/0x3d [ 186.701494] ? arch_local_irq_save+0x11/0x17 [ 186.702012] unregister_netdevice_many+0x12/0x55 [ 186.702594] default_device_exit_batch+0x12b/0x150 [ 186.703160] ? prepare_to_wait_exclusive+0x60/0x60 [ 186.703719] cleanup_net+0x17d/0x234 [ 186.704138] process_one_work+0x196/0x2e8 [ 186.704652] worker_thread+0x1a4/0x249 [ 186.705087] ? cancel_delayed_work+0x92/0x92 [ 186.705620] kthread+0x105/0x10f [ 186.706000] ? __kthread_bind_mask+0x57/0x57 [ 186.706501] ret_from_fork+0x35/0x40 [ 186.706978] Modules linked in: xfrm_interface nfsv3 nfs_acl auth_rpcgss nfsv4 nfs lockd grace fscache sunrpc button parport_pc parport serio_raw evdev pcspkr loop ext4 crc16 mbcache jbd2 crc32c_generic 8139too ide_cd_mod cdrom ide_gd_mod ata_generic ata_piix libata scsi_mod piix psmouse i2c_piix4 ide_core 8139cp i2c_core mii floppy [ 186.710423] ---[ end trace 463bba18105537e5 ]--- The problem is that x-netns xfrm interface are not removed when the link netns is removed. This causes later this oops when thoses interfaces are removed. Let's add a handler to remove all interfaces related to a netns when this netns is removed. Fixes: f203b76d7809 ("xfrm: Add virtual xfrm interfaces") Reported-by: Christophe Gouault <christophe.gouault@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-03xfrm: call xfrm_output_gso when inner_protocol is set in xfrm_outputXin Long
commit a204aef9fd77dce1efd9066ca4e44eede99cd858 upstream. An use-after-free crash can be triggered when sending big packets over vxlan over esp with esp offload enabled: [] BUG: KASAN: use-after-free in ipv6_gso_pull_exthdrs.part.8+0x32c/0x4e0 [] Call Trace: [] dump_stack+0x75/0xa0 [] kasan_report+0x37/0x50 [] ipv6_gso_pull_exthdrs.part.8+0x32c/0x4e0 [] ipv6_gso_segment+0x2c8/0x13c0 [] skb_mac_gso_segment+0x1cb/0x420 [] skb_udp_tunnel_segment+0x6b5/0x1c90 [] inet_gso_segment+0x440/0x1380 [] skb_mac_gso_segment+0x1cb/0x420 [] esp4_gso_segment+0xae8/0x1709 [esp4_offload] [] inet_gso_segment+0x440/0x1380 [] skb_mac_gso_segment+0x1cb/0x420 [] __skb_gso_segment+0x2d7/0x5f0 [] validate_xmit_skb+0x527/0xb10 [] __dev_queue_xmit+0x10f8/0x2320 <--- [] ip_finish_output2+0xa2e/0x1b50 [] ip_output+0x1a8/0x2f0 [] xfrm_output_resume+0x110e/0x15f0 [] __xfrm4_output+0xe1/0x1b0 [] xfrm4_output+0xa0/0x200 [] iptunnel_xmit+0x5a7/0x920 [] vxlan_xmit_one+0x1658/0x37a0 [vxlan] [] vxlan_xmit+0x5e4/0x3ec8 [vxlan] [] dev_hard_start_xmit+0x125/0x540 [] __dev_queue_xmit+0x17bd/0x2320 <--- [] ip6_finish_output2+0xb20/0x1b80 [] ip6_output+0x1b3/0x390 [] ip6_xmit+0xb82/0x17e0 [] inet6_csk_xmit+0x225/0x3d0 [] __tcp_transmit_skb+0x1763/0x3520 [] tcp_write_xmit+0xd64/0x5fe0 [] __tcp_push_pending_frames+0x8c/0x320 [] tcp_sendmsg_locked+0x2245/0x3500 [] tcp_sendmsg+0x27/0x40 As on the tx path of vxlan over esp, skb->inner_network_header would be set on vxlan_xmit() and xfrm4_tunnel_encap_add(), and the later one can overwrite the former one. It causes skb_udp_tunnel_segment() to use a wrong skb->inner_network_header, then the issue occurs. This patch is to fix it by calling xfrm_output_gso() instead when the inner_protocol is set, in which gso_segment of inner_protocol will be done first. While at it, also improve some code around. Fixes: 7862b4058b9f ("esp: Add gso handlers for esp4 and esp6") Reported-by: Xiumei Mu <xmu@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-03xfrm: allow to accept packets with ipv6 NEXTHDR_HOP in xfrm_inputXin Long
commit afcaf61be9d1dbdee5ec186d1dcc67b6b692180f upstream. For beet mode, when it's ipv6 inner address with nexthdrs set, the packet format might be: ---------------------------------------------------- | outer | | dest | | | ESP | ESP | | IP hdr | ESP | opts.| TCP | Data | Trailer | ICV | ---------------------------------------------------- The nexthdr from ESP could be NEXTHDR_HOP(0), so it should continue processing the packet when nexthdr returns 0 in xfrm_input(). Otherwise, when ipv6 nexthdr is set, the packet will be dropped. I don't see any error cases that nexthdr may return 0. So fix it by removing the check for nexthdr == 0. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-03mac80211: mesh: fix discovery timer re-arming issue / crashLinus Lüssing
commit e2d4a80f93fcfaf72e2e20daf6a28e39c3b90677 upstream. On a non-forwarding 802.11s link between two fairly busy neighboring nodes (iperf with -P 16 at ~850MBit/s TCP; 1733.3 MBit/s VHT-MCS 9 80MHz short GI VHT-NSS 4), so with frequent PREQ retries, usually after around 30-40 seconds the following crash would occur: [ 1110.822428] Unable to handle kernel read from unreadable memory at virtual address 00000000 [ 1110.830786] Mem abort info: [ 1110.833573] Exception class = IABT (current EL), IL = 32 bits [ 1110.839494] SET = 0, FnV = 0 [ 1110.842546] EA = 0, S1PTW = 0 [ 1110.845678] user pgtable: 4k pages, 48-bit VAs, pgd = ffff800076386000 [ 1110.852204] [0000000000000000] *pgd=00000000f6322003, *pud=00000000f62de003, *pmd=0000000000000000 [ 1110.861167] Internal error: Oops: 86000004 [#1] PREEMPT SMP [ 1110.866730] Modules linked in: pppoe ppp_async batman_adv ath10k_pci ath10k_core ath pppox ppp_generic nf_conntrack_ipv6 mac80211 iptable_nat ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_FLOWOFFLOAD slhc nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack iptable_mangle iptable_filter ip_tables crc_ccitt compat nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 usb_storage xhci_plat_hcd xhci_pci xhci_hcd dwc3 usbcore usb_common [ 1110.932190] Process swapper/3 (pid: 0, stack limit = 0xffff0000090c8000) [ 1110.938884] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.14.162 #0 [ 1110.944965] Hardware name: LS1043A RGW Board (DT) [ 1110.949658] task: ffff8000787a81c0 task.stack: ffff0000090c8000 [ 1110.955568] PC is at 0x0 [ 1110.958097] LR is at call_timer_fn.isra.27+0x24/0x78 [ 1110.963055] pc : [<0000000000000000>] lr : [<ffff0000080ff29c>] pstate: 00400145 [ 1110.970440] sp : ffff00000801be10 [ 1110.973744] x29: ffff00000801be10 x28: ffff000008bf7018 [ 1110.979047] x27: ffff000008bf87c8 x26: ffff000008c160c0 [ 1110.984352] x25: 0000000000000000 x24: 0000000000000000 [ 1110.989657] x23: dead000000000200 x22: 0000000000000000 [ 1110.994959] x21: 0000000000000000 x20: 0000000000000101 [ 1111.000262] x19: ffff8000787a81c0 x18: 0000000000000000 [ 1111.005565] x17: ffff0000089167b0 x16: 0000000000000058 [ 1111.010868] x15: ffff0000089167b0 x14: 0000000000000000 [ 1111.016172] x13: ffff000008916788 x12: 0000000000000040 [ 1111.021475] x11: ffff80007fda9af0 x10: 0000000000000001 [ 1111.026777] x9 : ffff00000801bea0 x8 : 0000000000000004 [ 1111.032080] x7 : 0000000000000000 x6 : ffff80007fda9aa8 [ 1111.037383] x5 : ffff00000801bea0 x4 : 0000000000000010 [ 1111.042685] x3 : ffff00000801be98 x2 : 0000000000000614 [ 1111.047988] x1 : 0000000000000000 x0 : 0000000000000000 [ 1111.053290] Call trace: [ 1111.055728] Exception stack(0xffff00000801bcd0 to 0xffff00000801be10) [ 1111.062158] bcc0: 0000000000000000 0000000000000000 [ 1111.069978] bce0: 0000000000000614 ffff00000801be98 0000000000000010 ffff00000801bea0 [ 1111.077798] bd00: ffff80007fda9aa8 0000000000000000 0000000000000004 ffff00000801bea0 [ 1111.085618] bd20: 0000000000000001 ffff80007fda9af0 0000000000000040 ffff000008916788 [ 1111.093437] bd40: 0000000000000000 ffff0000089167b0 0000000000000058 ffff0000089167b0 [ 1111.101256] bd60: 0000000000000000 ffff8000787a81c0 0000000000000101 0000000000000000 [ 1111.109075] bd80: 0000000000000000 dead000000000200 0000000000000000 0000000000000000 [ 1111.116895] bda0: ffff000008c160c0 ffff000008bf87c8 ffff000008bf7018 ffff00000801be10 [ 1111.124715] bdc0: ffff0000080ff29c ffff00000801be10 0000000000000000 0000000000400145 [ 1111.132534] bde0: ffff8000787a81c0 ffff00000801bde8 0000ffffffffffff 000001029eb19be8 [ 1111.140353] be00: ffff00000801be10 0000000000000000 [ 1111.145220] [< (null)>] (null) [ 1111.149917] [<ffff0000080ff77c>] run_timer_softirq+0x184/0x398 [ 1111.155741] [<ffff000008081938>] __do_softirq+0x100/0x1fc [ 1111.161130] [<ffff0000080a2e28>] irq_exit+0x80/0xd8 [ 1111.166002] [<ffff0000080ea708>] __handle_domain_irq+0x88/0xb0 [ 1111.171825] [<ffff000008081678>] gic_handle_irq+0x68/0xb0 [ 1111.177213] Exception stack(0xffff0000090cbe30 to 0xffff0000090cbf70) [ 1111.183642] be20: 0000000000000020 0000000000000000 [ 1111.191461] be40: 0000000000000001 0000000000000000 00008000771af000 0000000000000000 [ 1111.199281] be60: ffff000008c95180 0000000000000000 ffff000008c19360 ffff0000090cbef0 [ 1111.207101] be80: 0000000000000810 0000000000000400 0000000000000098 ffff000000000000 [ 1111.214920] bea0: 0000000000000001 ffff0000089167b0 0000000000000000 ffff0000089167b0 [ 1111.222740] bec0: 0000000000000000 ffff000008c198e8 ffff000008bf7018 ffff000008c19000 [ 1111.230559] bee0: 0000000000000000 0000000000000000 ffff8000787a81c0 ffff000008018000 [ 1111.238380] bf00: ffff00000801c000 ffff00000913ba34 ffff8000787a81c0 ffff0000090cbf70 [ 1111.246199] bf20: ffff0000080857cc ffff0000090cbf70 ffff0000080857d0 0000000000400145 [ 1111.254020] bf40: ffff000008018000 ffff00000801c000 ffffffffffffffff ffff0000080fa574 [ 1111.261838] bf60: ffff0000090cbf70 ffff0000080857d0 [ 1111.266706] [<ffff0000080832e8>] el1_irq+0xe8/0x18c [ 1111.271576] [<ffff0000080857d0>] arch_cpu_idle+0x10/0x18 [ 1111.276880] [<ffff0000080d7de4>] do_idle+0xec/0x1b8 [ 1111.281748] [<ffff0000080d8020>] cpu_startup_entry+0x20/0x28 [ 1111.287399] [<ffff00000808f81c>] secondary_start_kernel+0x104/0x110 [ 1111.293662] Code: bad PC value [ 1111.296710] ---[ end trace 555b6ca4363c3edd ]--- [ 1111.301318] Kernel panic - not syncing: Fatal exception in interrupt [ 1111.307661] SMP: stopping secondary CPUs [ 1111.311574] Kernel Offset: disabled [ 1111.315053] CPU features: 0x0002000 [ 1111.318530] Memory Limit: none [ 1111.321575] Rebooting in 3 seconds.. With some added debug output / delays we were able to push the crash from the timer callback runner into the callback function and by that shedding some light on which object holding the timer gets corrupted: [ 401.720899] Unable to handle kernel read from unreadable memory at virtual address 00000868 [...] [ 402.335836] [<ffff0000088fafa4>] _raw_spin_lock_bh+0x14/0x48 [ 402.341548] [<ffff000000dbe684>] mesh_path_timer+0x10c/0x248 [mac80211] [ 402.348154] [<ffff0000080ff29c>] call_timer_fn.isra.27+0x24/0x78 [ 402.354150] [<ffff0000080ff77c>] run_timer_softirq+0x184/0x398 [ 402.359974] [<ffff000008081938>] __do_softirq+0x100/0x1fc [ 402.365362] [<ffff0000080a2e28>] irq_exit+0x80/0xd8 [ 402.370231] [<ffff0000080ea708>] __handle_domain_irq+0x88/0xb0 [ 402.376053] [<ffff000008081678>] gic_handle_irq+0x68/0xb0 The issue happens due to the following sequence of events: 1) mesh_path_start_discovery(): -> spin_unlock_bh(&mpath->state_lock) before mesh_path_sel_frame_tx() 2) mesh_path_free_rcu() -> del_timer_sync(&mpath->timer) [...] -> kfree_rcu(mpath) 3) mesh_path_start_discovery(): -> mod_timer(&mpath->timer, ...) [...] -> rcu_read_unlock() 4) mesh_path_free_rcu()'s kfree_rcu(): -> kfree(mpath) 5) mesh_path_timer() starts after timeout, using freed mpath object So a use-after-free issue due to a timer re-arming bug caused by an early spin-unlocking. This patch fixes this issue by re-checking if mpath is about to be free'd and if so bails out of re-arming the timer. Cc: stable@vger.kernel.org Fixes: 050ac52cbe1f ("mac80211: code for on-demand Hybrid Wireless Mesh Protocol") Cc: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Linus Lüssing <ll@simonwunderlich.de> Link: https://lore.kernel.org/r/20200522170413.14973-1-linus.luessing@c0d3.blue Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-03libceph: ignore pool overlay and cache logic on redirectsJerry Lee
[ Upstream commit 890bd0f8997ae6ac0a367dd5146154a3963306dd ] OSD client should ignore cache/overlay flag if got redirect reply. Otherwise, the client hangs when the cache tier is in forward mode. [ idryomov: Redirects are effectively deprecated and no longer used or tested. The original tiering modes based on redirects are inherently flawed because redirects can race and reorder, potentially resulting in data corruption. The new proxy and readproxy tiering modes should be used instead of forward and readforward. Still marking for stable as obviously correct, though. ] Cc: stable@vger.kernel.org URL: https://tracker.ceph.com/issues/23296 URL: https://tracker.ceph.com/issues/36406 Signed-off-by: Jerry Lee <leisurelysw24@gmail.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-03sctp: Start shutdown on association restart if in SHUTDOWN-SENT state and ↵Jere Leppänen
socket is closed [ Upstream commit d3e8e4c11870413789f029a71e72ae6e971fe678 ] Commit bdf6fa52f01b ("sctp: handle association restarts when the socket is closed.") starts shutdown when an association is restarted, if in SHUTDOWN-PENDING state and the socket is closed. However, the rationale stated in that commit applies also when in SHUTDOWN-SENT state - we don't want to move an association to ESTABLISHED state when the socket has been closed, because that results in an association that is unreachable from user space. The problem scenario: 1. Client crashes and/or restarts. 2. Server (using one-to-one socket) calls close(). SHUTDOWN is lost. 3. Client reconnects using the same addresses and ports. 4. Server's association is restarted. The association and the socket move to ESTABLISHED state, even though the server process has closed its descriptor. Also, after step 4 when the server process exits, some resources are leaked in an attempt to release the underlying inet sock structure in ESTABLISHED state: IPv4: Attempt to release TCP socket in state 1 00000000377288c7 Fix by acting the same way as in SHUTDOWN-PENDING state. That is, if an association is restarted in SHUTDOWN-SENT state and the socket is closed, then start shutdown and don't move the association or the socket to ESTABLISHED state. Fixes: bdf6fa52f01b ("sctp: handle association restarts when the socket is closed.") Signed-off-by: Jere Leppänen <jere.leppanen@nokia.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-03sctp: Don't add the shutdown timer if its already been addedNeil Horman
[ Upstream commit 20a785aa52c82246055a089e55df9dac47d67da1 ] This BUG halt was reported a while back, but the patch somehow got missed: PID: 2879 TASK: c16adaa0 CPU: 1 COMMAND: "sctpn" #0 [f418dd28] crash_kexec at c04a7d8c #1 [f418dd7c] oops_end at c0863e02 #2 [f418dd90] do_invalid_op at c040aaca #3 [f418de28] error_code (via invalid_op) at c08631a5 EAX: f34baac0 EBX: 00000090 ECX: f418deb0 EDX: f5542950 EBP: 00000000 DS: 007b ESI: f34ba800 ES: 007b EDI: f418dea0 GS: 00e0 CS: 0060 EIP: c046fa5e ERR: ffffffff EFLAGS: 00010286 #4 [f418de5c] add_timer at c046fa5e #5 [f418de68] sctp_do_sm at f8db8c77 [sctp] #6 [f418df30] sctp_primitive_SHUTDOWN at f8dcc1b5 [sctp] #7 [f418df48] inet_shutdown at c080baf9 #8 [f418df5c] sys_shutdown at c079eedf #9 [f418df70] sys_socketcall at c079fe88 EAX: ffffffda EBX: 0000000d ECX: bfceea90 EDX: 0937af98 DS: 007b ESI: 0000000c ES: 007b EDI: b7150ae4 SS: 007b ESP: bfceea7c EBP: bfceeaa8 GS: 0033 CS: 0073 EIP: b775c424 ERR: 00000066 EFLAGS: 00000282 It appears that the side effect that starts the shutdown timer was processed multiple times, which can happen as multiple paths can trigger it. This of course leads to the BUG halt in add_timer getting called. Fix seems pretty straightforward, just check before the timer is added if its already been started. If it has mod the timer instead to min(current expiration, new expiration) Its been tested but not confirmed to fix the problem, as the issue has only occured in production environments where test kernels are enjoined from being installed. It appears to be a sane fix to me though. Also, recentely, Jere found a reproducer posted on list to confirm that this resolves the issues Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Vlad Yasevich <vyasevich@gmail.com> CC: "David S. Miller" <davem@davemloft.net> CC: jere.leppanen@nokia.com CC: marcelo.leitner@gmail.com CC: netdev@vger.kernel.org Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-03net: revert "net: get rid of an signed integer overflow in ip_idents_reserve()"Yuqi Jin
[ Upstream commit a6211caa634da39d861a47437ffcda8b38ef421b ] Commit adb03115f459 ("net: get rid of an signed integer overflow in ip_idents_reserve()") used atomic_cmpxchg to replace "atomic_add_return" inside the function "ip_idents_reserve". The reason was to avoid UBSAN warning. However, this change has caused performance degrade and in GCC-8, fno-strict-overflow is now mapped to -fwrapv -fwrapv-pointer and signed integer overflow is now undefined by default at all optimization levels[1]. Moreover, it was a bug in UBSAN vs -fwrapv /-fno-strict-overflow, so Let's revert it safely. [1] https://gcc.gnu.org/gcc-8/changes.html Suggested-by: Peter Zijlstra <peterz@infradead.org> Suggested-by: Eric Dumazet <edumazet@google.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Jiri Pirko <jiri@resnulli.us> Cc: Arvind Sankar <nivedita@alum.mit.edu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Eric Dumazet <edumazet@google.com> Cc: Jiong Wang <jiongwang@huawei.com> Signed-off-by: Yuqi Jin <jinyuqi@huawei.com> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-03net: qrtr: Fix passing invalid reference to qrtr_local_enqueue()Manivannan Sadhasivam
[ Upstream commit d28ea1fbbf437054ef339afec241019f2c4e2bb6 ] Once the traversal of the list is completed with list_for_each_entry(), the iterator (node) will point to an invalid object. So passing this to qrtr_local_enqueue() which is outside of the iterator block is erroneous eventhough the object is not used. So fix this by passing NULL to qrtr_local_enqueue(). Fixes: bdabad3e363d ("net: Add Qualcomm IPC router") Reported-by: kbuild test robot <lkp@intel.com> Reported-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-03net: ipip: fix wrong address family in init error pathVadim Fedorenko
[ Upstream commit 57ebc8f08504f176eb0f25b3e0fde517dec61a4f ] In case of error with MPLS support the code is misusing AF_INET instead of AF_MPLS. Fixes: 1b69e7e6c4da ("ipip: support MPLS over IPv4") Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-03net: inet_csk: Fix so_reuseport bind-address cache in tb->fast*Martin KaFai Lau
[ Upstream commit 88d7fcfa3b1fe670f0412b95be785aafca63352b ] The commit 637bc8bbe6c0 ("inet: reset tb->fastreuseport when adding a reuseport sk") added a bind-address cache in tb->fast*. The tb->fast* caches the address of a sk which has successfully been binded with SO_REUSEPORT ON. The idea is to avoid the expensive conflict search in inet_csk_bind_conflict(). There is an issue with wildcard matching where sk_reuseport_match() should have returned false but it is currently returning true. It ends up hiding bind conflict. For example, bind("[::1]:443"); /* without SO_REUSEPORT. Succeed. */ bind("[::2]:443"); /* with SO_REUSEPORT. Succeed. */ bind("[::]:443"); /* with SO_REUSEPORT. Still Succeed where it shouldn't */ The last bind("[::]:443") with SO_REUSEPORT on should have failed because it should have a conflict with the very first bind("[::1]:443") which has SO_REUSEPORT off. However, the address "[::2]" is cached in tb->fast* in the second bind. In the last bind, the sk_reuseport_match() returns true because the binding sk's wildcard addr "[::]" matches with the "[::2]" cached in tb->fast*. The correct bind conflict is reported by removing the second bind such that tb->fast* cache is not involved and forces the bind("[::]:443") to go through the inet_csk_bind_conflict(): bind("[::1]:443"); /* without SO_REUSEPORT. Succeed. */ bind("[::]:443"); /* with SO_REUSEPORT. -EADDRINUSE */ The expected behavior for sk_reuseport_match() is, it should only allow the "cached" tb->fast* address to be used as a wildcard match but not the address of the binding sk. To do that, the current "bool match_wildcard" arg is split into "bool match_sk1_wildcard" and "bool match_sk2_wildcard". This change only affects the sk_reuseport_match() which is only used by inet_csk (e.g. TCP). The other use cases are calling inet_rcv_saddr_equal() and this patch makes it pass the same "match_wildcard" arg twice to the "ipv[46]_rcv_saddr_equal(..., match_wildcard, match_wildcard)". Cc: Josef Bacik <jbacik@fb.com> Fixes: 637bc8bbe6c0 ("inet: reset tb->fastreuseport when adding a reuseport sk") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-03__netif_receive_skb_core: pass skb by referenceBoris Sukholitko
[ Upstream commit c0bbbdc32febd4f034ecbf3ea17865785b2c0652 ] __netif_receive_skb_core may change the skb pointer passed into it (e.g. in rx_handler). The original skb may be freed as a result of this operation. The callers of __netif_receive_skb_core may further process original skb by using pt_prev pointer returned by __netif_receive_skb_core thus leading to unpleasant effects. The solution is to pass skb by reference into __netif_receive_skb_core. v2: Added Fixes tag and comment regarding ppt_prev and skb invariant. Fixes: 88eb1944e18c ("net: core: propagate SKB lists through packet_type lookup") Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com> Acked-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-03net: dsa: mt7530: fix roaming from DSA user portsDENG Qingfang
[ Upstream commit 5e5502e012b8129e11be616acb0f9c34bc8f8adb ] When a client moves from a DSA user port to a software port in a bridge, it cannot reach any other clients that connected to the DSA user ports. That is because SA learning on the CPU port is disabled, so the switch ignores the client's frames from the CPU port and still thinks it is at the user port. Fix it by enabling SA learning on the CPU port. To prevent the switch from learning from flooding frames from the CPU port, set skb->offload_fwd_mark to 1 for unicast and broadcast frames, and let the switch flood them instead of trapping to the CPU port. Multicast frames still need to be trapped to the CPU port for snooping, so set the SA_DIS bit of the MTK tag to 1 when transmitting those frames to disable SA learning. Fixes: b8f126a8d543 ("net-next: dsa: add dsa support for Mediatek MT7530 switch") Signed-off-by: DENG Qingfang <dqfext@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-03ax25: fix setsockopt(SO_BINDTODEVICE)Eric Dumazet
[ Upstream commit 687775cec056b38a4c8f3291e0dd7a9145f7b667 ] syzbot was able to trigger this trace [1], probably by using a zero optlen. While we are at it, cap optlen to IFNAMSIZ - 1 instead of IFNAMSIZ. [1] BUG: KMSAN: uninit-value in strnlen+0xf9/0x170 lib/string.c:569 CPU: 0 PID: 8807 Comm: syz-executor483 Not tainted 5.7.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c9/0x220 lib/dump_stack.c:118 kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:121 __msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:215 strnlen+0xf9/0x170 lib/string.c:569 dev_name_hash net/core/dev.c:207 [inline] netdev_name_node_lookup net/core/dev.c:277 [inline] __dev_get_by_name+0x75/0x2b0 net/core/dev.c:778 ax25_setsockopt+0xfa3/0x1170 net/ax25/af_ax25.c:654 __compat_sys_setsockopt+0x4ed/0x910 net/compat.c:403 __do_compat_sys_setsockopt net/compat.c:413 [inline] __se_compat_sys_setsockopt+0xdd/0x100 net/compat.c:410 __ia32_compat_sys_setsockopt+0x62/0x80 net/compat.c:410 do_syscall_32_irqs_on arch/x86/entry/common.c:339 [inline] do_fast_syscall_32+0x3bf/0x6d0 arch/x86/entry/common.c:398 entry_SYSENTER_compat+0x68/0x77 arch/x86/entry/entry_64_compat.S:139 RIP: 0023:0xf7f57dd9 Code: 90 e8 0b 00 00 00 f3 90 0f ae e8 eb f9 8d 74 26 00 89 3c 24 c3 90 90 90 90 90 90 90 90 90 90 90 90 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 eb 0d 90 90 90 90 90 90 90 90 90 90 90 90 RSP: 002b:00000000ffae8c1c EFLAGS: 00000217 ORIG_RAX: 000000000000016e RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000000101 RDX: 0000000000000019 RSI: 0000000020000000 RDI: 0000000000000004 RBP: 0000000000000012 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 Local variable ----devname@ax25_setsockopt created at: ax25_setsockopt+0xe6/0x1170 net/ax25/af_ax25.c:536 ax25_setsockopt+0xe6/0x1170 net/ax25/af_ax25.c:536 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-27rxrpc: Fix ack discardDavid Howells
[ Upstream commit 441fdee1eaf050ef0040bde0d7af075c1c6a6d8b ] The Rx protocol has a "previousPacket" field in it that is not handled in the same way by all protocol implementations. Sometimes it contains the serial number of the last DATA packet received, sometimes the sequence number of the last DATA packet received and sometimes the highest sequence number so far received. AF_RXRPC is using this to weed out ACKs that are out of date (it's possible for ACK packets to get reordered on the wire), but this does not work with OpenAFS which will just stick the sequence number of the last packet seen into previousPacket. The issue being seen is that big AFS FS.StoreData RPC (eg. of ~256MiB) are timing out when partly sent. A trace was captured, with an additional tracepoint to show ACKs being discarded in rxrpc_input_ack(). Here's an excerpt showing the problem. 52873.203230: rxrpc_tx_data: c=000004ae DATA ed1a3584:00000002 0002449c q=00024499 fl=09 A DATA packet with sequence number 00024499 has been transmitted (the "q=" field). ... 52873.243296: rxrpc_rx_ack: c=000004ae 00012a2b DLY r=00024499 f=00024497 p=00024496 n=0 52873.243376: rxrpc_rx_ack: c=000004ae 00012a2c IDL r=0002449b f=00024499 p=00024498 n=0 52873.243383: rxrpc_rx_ack: c=000004ae 00012a2d OOS r=0002449d f=00024499 p=0002449a n=2 The Out-Of-Sequence ACK indicates that the server didn't see DATA sequence number 00024499, but did see seq 0002449a (previousPacket, shown as "p=", skipped the number, but firstPacket, "f=", which shows the bottom of the window is set at that point). 52873.252663: rxrpc_retransmit: c=000004ae q=24499 a=02 xp=14581537 52873.252664: rxrpc_tx_data: c=000004ae DATA ed1a3584:00000002 000244bc q=00024499 fl=0b *RETRANS* The packet has been retransmitted. Retransmission recurs until the peer says it got the packet. 52873.271013: rxrpc_rx_ack: c=000004ae 00012a31 OOS r=000244a1 f=00024499 p=0002449e n=6 More OOS ACKs indicate that the other packets that are already in the transmission pipeline are being received. The specific-ACK list is up to 6 ACKs and NAKs. ... 52873.284792: rxrpc_rx_ack: c=000004ae 00012a49 OOS r=000244b9 f=00024499 p=000244b6 n=30 52873.284802: rxrpc_retransmit: c=000004ae q=24499 a=0a xp=63505500 52873.284804: rxrpc_tx_data: c=000004ae DATA ed1a3584:00000002 000244c2 q=00024499 fl=0b *RETRANS* 52873.287468: rxrpc_rx_ack: c=000004ae 00012a4a OOS r=000244ba f=00024499 p=000244b7 n=31 52873.287478: rxrpc_rx_ack: c=000004ae 00012a4b OOS r=000244bb f=00024499 p=000244b8 n=32 At this point, the server's receive window is full (n=32) with presumably 1 NAK'd packet and 31 ACK'd packets. We can't transmit any more packets. 52873.287488: rxrpc_retransmit: c=000004ae q=24499 a=0a xp=61327980 52873.287489: rxrpc_tx_data: c=000004ae DATA ed1a3584:00000002 000244c3 q=00024499 fl=0b *RETRANS* 52873.293850: rxrpc_rx_ack: c=000004ae 00012a4c DLY r=000244bc f=000244a0 p=00024499 n=25 And now we've received an ACK indicating that a DATA retransmission was received. 7 packets have been processed (the occupied part of the window moved, as indicated by f= and n=). 52873.293853: rxrpc_rx_discard_ack: c=000004ae r=00012a4c 000244a0<00024499 00024499<000244b8 However, the DLY ACK gets discarded because its previousPacket has gone backwards (from p=000244b8, in the ACK at 52873.287478 to p=00024499 in the ACK at 52873.293850). We then end up in a continuous cycle of retransmit/discard. kafs fails to update its window because it's discarding the ACKs and can't transmit an extra packet that would clear the issue because the window is full. OpenAFS doesn't change the previousPacket value in the ACKs because no new DATA packets are received with a different previousPacket number. Fix this by altering the discard check to only discard an ACK based on previousPacket if there was no advance in the firstPacket. This allows us to transmit a new packet which will cause previousPacket to advance in the next ACK. The check, however, needs to allow for the possibility that previousPacket may actually have had the serial number placed in it instead - in which case it will go outside the window and we should ignore it. Fixes: 1a2391c30c0b ("rxrpc: Fix detection of out of order acks") Reported-by: Dave Botsch <botsch@cnf.cornell.edu> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-05-27rxrpc: Trace discarded ACKsDavid Howells
[ Upstream commit d1f129470e6cb79b8b97fecd12689f6eb49e27fe ] Add a tracepoint to track received ACKs that are discarded due to being outside of the Tx window. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-05-27rxrpc: Fix a memory leak in rxkad_verify_response()Qiushi Wu
commit f45d01f4f30b53c3a0a1c6c1c154acb7ff74ab9f upstream. A ticket was not released after a call of the function "rxkad_decrypt_ticket" failed. Thus replace the jump target "temporary_error_free_resp" by "temporary_error_free_ticket". Fixes: 8c2f826dc3631 ("rxrpc: Don't put crypto buffers on the stack") Signed-off-by: Qiushi Wu <wu000273@umn.edu> Signed-off-by: David Howells <dhowells@redhat.com> cc: Markus Elfring <Markus.Elfring@web.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-20netfilter: nft_set_rbtree: Introduce and use nft_rbtree_interval_start()Stefano Brivio
[ Upstream commit 6f7c9caf017be8ab0fe3b99509580d0793bf0833 ] Replace negations of nft_rbtree_interval_end() with a new helper, nft_rbtree_interval_start(), wherever this helps to visualise the problem at hand, that is, for all the occurrences except for the comparison against given flags in __nft_rbtree_get(). This gets especially useful in the next patch. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-05-20netfilter: conntrack: avoid gcc-10 zero-length-bounds warningArnd Bergmann
[ Upstream commit 2c407aca64977ede9b9f35158e919773cae2082f ] gcc-10 warns around a suspicious access to an empty struct member: net/netfilter/nf_conntrack_core.c: In function '__nf_conntrack_alloc': net/netfilter/nf_conntrack_core.c:1522:9: warning: array subscript 0 is outside the bounds of an interior zero-length array 'u8[0]' {aka 'unsigned char[0]'} [-Wzero-length-bounds] 1522 | memset(&ct->__nfct_init_offset[0], 0, | ^~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from net/netfilter/nf_conntrack_core.c:37: include/net/netfilter/nf_conntrack.h:90:5: note: while referencing '__nfct_init_offset' 90 | u8 __nfct_init_offset[0]; | ^~~~~~~~~~~~~~~~~~ The code is correct but a bit unusual. Rework it slightly in a way that does not trigger the warning, using an empty struct instead of an empty array. There are probably more elegant ways to do this, but this is the smallest change. Fixes: c41884ce0562 ("netfilter: conntrack: avoid zeroing timer") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-05-20tcp: fix SO_RCVLOWAT hangs with fat skbsEric Dumazet
[ Upstream commit 24adbc1676af4e134e709ddc7f34cf2adc2131e4 ] We autotune rcvbuf whenever SO_RCVLOWAT is set to account for 100% overhead in tcp_set_rcvlowat() This works well when skb->len/skb->truesize ratio is bigger than 0.5 But if we receive packets with small MSS, we can end up in a situation where not enough bytes are available in the receive queue to satisfy RCVLOWAT setting. As our sk_rcvbuf limit is hit, we send zero windows in ACK packets, preventing remote peer from sending more data. Even autotuning does not help, because it only triggers at the time user process drains the queue. If no EPOLLIN is generated, this can not happen. Note poll() has a similar issue, after commit c7004482e8dc ("tcp: Respect SO_RCVLOWAT in tcp_poll().") Fixes: 03f45c883c6f ("tcp: avoid extra wakeups for SO_RCVLOWAT users") Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-20net: tcp: fix rx timestamp behavior for tcp_recvmsgKelly Littlepage
[ Upstream commit cc4de047b33be247f9c8150d3e496743a49642b8 ] The stated intent of the original commit is to is to "return the timestamp corresponding to the highest sequence number data returned." The current implementation returns the timestamp for the last byte of the last fully read skb, which is not necessarily the last byte in the recv buffer. This patch converts behavior to the original definition, and to the behavior of the previous draft versions of commit 98aaa913b4ed ("tcp: Extend SOF_TIMESTAMPING_RX_SOFTWARE to TCP recvmsg") which also match this behavior. Fixes: 98aaa913b4ed ("tcp: Extend SOF_TIMESTAMPING_RX_SOFTWARE to TCP recvmsg") Co-developed-by: Iris Liu <iris@onechronos.com> Signed-off-by: Iris Liu <iris@onechronos.com> Signed-off-by: Kelly Littlepage <kelly@onechronos.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-20netprio_cgroup: Fix unlimited memory leak of v2 cgroupsZefan Li
[ Upstream commit 090e28b229af92dc5b40786ca673999d59e73056 ] If systemd is configured to use hybrid mode which enables the use of both cgroup v1 and v2, systemd will create new cgroup on both the default root (v2) and netprio_cgroup hierarchy (v1) for a new session and attach task to the two cgroups. If the task does some network thing then the v2 cgroup can never be freed after the session exited. One of our machines ran into OOM due to this memory leak. In the scenario described above when sk_alloc() is called cgroup_sk_alloc() thought it's in v2 mode, so it stores the cgroup pointer in sk->sk_cgrp_data and increments the cgroup refcnt, but then sock_update_netprioidx() thought it's in v1 mode, so it stores netprioidx value in sk->sk_cgrp_data, so the cgroup refcnt will never be freed. Currently we do the mode switch when someone writes to the ifpriomap cgroup control file. The easiest fix is to also do the switch when a task is attached to a new cgroup. Fixes: bd1060a1d671 ("sock, cgroup: add sock->sk_cgroup") Reported-by: Yang Yingliang <yangyingliang@huawei.com> Tested-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Zefan Li <lizefan@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-20net: ipv4: really enforce backoff for redirectsPaolo Abeni
[ Upstream commit 57644431a6c2faac5d754ebd35780cf43a531b1a ] In commit b406472b5ad7 ("net: ipv4: avoid mixed n_redirects and rate_tokens usage") I missed the fact that a 0 'rate_tokens' will bypass the backoff algorithm. Since rate_tokens is cleared after a redirect silence, and never incremented on redirects, if the host keeps receiving packets requiring redirect it will reply ignoring the backoff. Additionally, the 'rate_last' field will be updated with the cadence of the ingress packet requiring redirect. If that rate is high enough, that will prevent the host from generating any other kind of ICMP messages The check for a zero 'rate_tokens' value was likely a shortcut to avoid the more complex backoff algorithm after a redirect silence period. Address the issue checking for 'n_redirects' instead, which is incremented on successful redirect, and does not interfere with other ICMP replies. Fixes: b406472b5ad7 ("net: ipv4: avoid mixed n_redirects and rate_tokens usage") Reported-and-tested-by: Colin Walters <walters@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-20tcp: fix error recovery in tcp_zerocopy_receive()Eric Dumazet
[ Upstream commit e776af608f692a7a647455106295fa34469e7475 ] If user provides wrong virtual address in TCP_ZEROCOPY_RECEIVE operation we want to return -EINVAL error. But depending on zc->recv_skip_hint content, we might return -EIO error if the socket has SOCK_DONE set. Make sure to return -EINVAL in this case. BUG: KMSAN: uninit-value in tcp_zerocopy_receive net/ipv4/tcp.c:1833 [inline] BUG: KMSAN: uninit-value in do_tcp_getsockopt+0x4494/0x6320 net/ipv4/tcp.c:3685 CPU: 1 PID: 625 Comm: syz-executor.0 Not tainted 5.7.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c9/0x220 lib/dump_stack.c:118 kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:121 __msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:215 tcp_zerocopy_receive net/ipv4/tcp.c:1833 [inline] do_tcp_getsockopt+0x4494/0x6320 net/ipv4/tcp.c:3685 tcp_getsockopt+0xf8/0x1f0 net/ipv4/tcp.c:3728 sock_common_getsockopt+0x13f/0x180 net/core/sock.c:3131 __sys_getsockopt+0x533/0x7b0 net/socket.c:2177 __do_sys_getsockopt net/socket.c:2192 [inline] __se_sys_getsockopt+0xe1/0x100 net/socket.c:2189 __x64_sys_getsockopt+0x62/0x80 net/socket.c:2189 do_syscall_64+0xb8/0x160 arch/x86/entry/common.c:297 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x45c829 Code: 0d b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 db b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007f1deeb72c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000037 RAX: ffffffffffffffda RBX: 00000000004e01e0 RCX: 000000000045c829 RDX: 0000000000000023 RSI: 0000000000000006 RDI: 0000000000000009 RBP: 000000000078bf00 R08: 0000000020000200 R09: 0000000000000000 R10: 00000000200001c0 R11: 0000000000000246 R12: 00000000ffffffff R13: 00000000000001d8 R14: 00000000004d3038 R15: 00007f1deeb736d4 Local variable ----zc@do_tcp_getsockopt created at: do_tcp_getsockopt+0x1a74/0x6320 net/ipv4/tcp.c:3670 do_tcp_getsockopt+0x1a74/0x6320 net/ipv4/tcp.c:3670 Fixes: 05255b823a61 ("tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-20Revert "ipv6: add mtu lock check in __ip6_rt_update_pmtu"Maciej Żenczykowski
[ Upstream commit 09454fd0a4ce23cb3d8af65066c91a1bf27120dd ] This reverts commit 19bda36c4299ce3d7e5bce10bebe01764a655a6d: | ipv6: add mtu lock check in __ip6_rt_update_pmtu | | Prior to this patch, ipv6 didn't do mtu lock check in ip6_update_pmtu. | It leaded to that mtu lock doesn't really work when receiving the pkt | of ICMPV6_PKT_TOOBIG. | | This patch is to add mtu lock check in __ip6_rt_update_pmtu just as ipv4 | did in __ip_rt_update_pmtu. The above reasoning is incorrect. IPv6 *requires* icmp based pmtu to work. There's already a comment to this effect elsewhere in the kernel: $ git grep -p -B1 -A3 'RTAX_MTU lock' net/ipv6/route.c=4813= static int rt6_mtu_change_route(struct fib6_info *f6i, void *p_arg) ... /* In IPv6 pmtu discovery is not optional, so that RTAX_MTU lock cannot disable it. We still use this lock to block changes caused by addrconf/ndisc. */ This reverts to the pre-4.9 behaviour. Cc: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Xin Long <lucien.xin@gmail.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Maciej Żenczykowski <maze@google.com> Fixes: 19bda36c4299 ("ipv6: add mtu lock check in __ip6_rt_update_pmtu") Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-20netlabel: cope with NULL catmapPaolo Abeni
[ Upstream commit eead1c2ea2509fd754c6da893a94f0e69e83ebe4 ] The cipso and calipso code can set the MLS_CAT attribute on successful parsing, even if the corresponding catmap has not been allocated, as per current configuration and external input. Later, selinux code tries to access the catmap if the MLS_CAT flag is present via netlbl_catmap_getlong(). That may cause null ptr dereference while processing incoming network traffic. Address the issue setting the MLS_CAT flag only if the catmap is really allocated. Additionally let netlbl_catmap_getlong() cope with NULL catmap. Reported-by: Matthew Sheets <matthew.sheets@gd-ms.com> Fixes: 4b8feff251da ("netlabel: fix the horribly broken catmap functions") Fixes: ceba1832b1b2 ("calipso: Set the calipso socket label to match the secattr.") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-20net: fix a potential recursive NETDEV_FEAT_CHANGECong Wang
[ Upstream commit dd912306ff008891c82cd9f63e8181e47a9cb2fb ] syzbot managed to trigger a recursive NETDEV_FEAT_CHANGE event between bonding master and slave. I managed to find a reproducer for this: ip li set bond0 up ifenslave bond0 eth0 brctl addbr br0 ethtool -K eth0 lro off brctl addif br0 bond0 ip li set br0 up When a NETDEV_FEAT_CHANGE event is triggered on a bonding slave, it captures this and calls bond_compute_features() to fixup its master's and other slaves' features. However, when syncing with its lower devices by netdev_sync_lower_features() this event is triggered again on slaves when the LRO feature fails to change, so it goes back and forth recursively until the kernel stack is exhausted. Commit 17b85d29e82c intentionally lets __netdev_update_features() return -1 for such a failure case, so we have to just rely on the existing check inside netdev_sync_lower_features() and skip NETDEV_FEAT_CHANGE event only for this specific failure case. Fixes: fd867d51f889 ("net/core: generic support for disabling netdev features down stack") Reported-by: syzbot+e73ceacfd8560cc8a3ca@syzkaller.appspotmail.com Reported-by: syzbot+c2fb6f9ddcea95ba49b5@syzkaller.appspotmail.com Cc: Jarod Wilson <jarod@redhat.com> Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Jann Horn <jannh@google.com> Reviewed-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-20drop_monitor: work around gcc-10 stringop-overflow warningArnd Bergmann
[ Upstream commit dc30b4059f6e2abf3712ab537c8718562b21c45d ] The current gcc-10 snapshot produces a false-positive warning: net/core/drop_monitor.c: In function 'trace_drop_common.constprop': cc1: error: writing 8 bytes into a region of size 0 [-Werror=stringop-overflow=] In file included from net/core/drop_monitor.c:23: include/uapi/linux/net_dropmon.h:36:8: note: at offset 0 to object 'entries' with size 4 declared here 36 | __u32 entries; | ^~~~~~~ I reported this in the gcc bugzilla, but in case it does not get fixed in the release, work around it by using a temporary variable. Fixes: 9a8afc8d3962 ("Network Drop Monitor: Adding drop monitor implementation & Netlink protocol") Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94881 Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>