summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2021-04-14net: openvswitch: conntrack: simplify the return expression of ↵Zheng Yongjun
ovs_ct_limit_get_default_limit() [ Upstream commit 5e359044c107ecbdc2e9b3fd5ce296006e6de4bc ] Simplify the return expression. Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Reviewed-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-04-14net: udp: Add support for getsockopt(..., ..., UDP_GRO, ..., ...);Norman Maurer
[ Upstream commit 98184612aca0a9ee42b8eb0262a49900ee9eef0d ] Support for UDP_GRO was added in the past but the implementation for getsockopt was missed which did lead to an error when we tried to retrieve the setting for UDP_GRO. This patch adds the missing switch case for UDP_GRO Fixes: e20cf8d3f1f7 ("udp: implement GRO for plain UDP sockets.") Signed-off-by: Norman Maurer <norman_maurer@apple.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-04-14net/rds: Fix a use after free in rds_message_map_pagesLv Yunlong
[ Upstream commit bdc2ab5c61a5c07388f4820ff21e787b4dfd1ced ] In rds_message_map_pages, the rm is freed by rds_message_put(rm). But rm is still used by rm->data.op_sg in return value. My patch assigns ERR_CAST(rm->data.op_sg) to err before the rm is freed to avoid the uaf. Fixes: 7dba92037baf3 ("net/rds: Use ERR_PTR for rds_message_alloc_sgs()") Signed-off-by: Lv Yunlong <lyl2019@mail.ustc.edu.cn> Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-04-14net/ncsi: Avoid channel_monitor hrtimer deadlockMilton Miller
[ Upstream commit 03cb4d05b4ea9a3491674ca40952adb708d549fa ] Calling ncsi_stop_channel_monitor from channel_monitor is a guaranteed deadlock on SMP because stop calls del_timer_sync on the timer that invoked channel_monitor as its timer function. Recognise the inherent race of marking the monitor disabled before deleting the timer by just returning if enable was cleared. After a timeout (the default case -- reset to START when response received) just mark the monitor.enabled false. If the channel has an entry on the channel_queue list, or if the state is not ACTIVE or INACTIVE, then warn and mark the timer stopped and don't restart, as the locking is broken somehow. Fixes: 0795fb2021f0 ("net/ncsi: Stop monitor if channel times out or is inactive") Signed-off-by: Milton Miller <miltonm@us.ibm.com> Signed-off-by: Eddie James <eajames@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-04-14net:tipc: Fix a double free in tipc_sk_mcast_rcvLv Yunlong
[ Upstream commit 6bf24dc0cc0cc43b29ba344b66d78590e687e046 ] In the if(skb_peek(arrvq) == skb) branch, it calls __skb_dequeue(arrvq) to get the skb by skb = skb_peek(arrvq). Then __skb_dequeue() unlinks the skb from arrvq and returns the skb which equals to skb_peek(arrvq). After __skb_dequeue(arrvq) finished, the skb is freed by kfree_skb(__skb_dequeue(arrvq)) in the first time. Unfortunately, the same skb is freed in the second time by kfree_skb(skb) after the branch completed. My patch removes kfree_skb() in the if(skb_peek(arrvq) == skb) branch, because this skb will be freed by kfree_skb(skb) finally. Fixes: cb1b728096f54 ("tipc: eliminate race condition at multicast reception") Signed-off-by: Lv Yunlong <lyl2019@mail.ustc.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-04-14can: bcm/raw: fix msg_namelen values depending on CAN_REQUIRED_SIZEOliver Hartkopp
[ Upstream commit 9e9714742fb70467464359693a73b911a630226f ] Since commit f5223e9eee65 ("can: extend sockaddr_can to include j1939 members") the sockaddr_can has been extended in size and a new CAN_REQUIRED_SIZE macro has been introduced to calculate the protocol specific needed size. The ABI for the msg_name and msg_namelen has not been adapted to the new CAN_REQUIRED_SIZE macro for the other CAN protocols which leads to a problem when an existing binary reads the (increased) struct sockaddr_can in msg_name. Fixes: f5223e9eee65 ("can: extend sockaddr_can to include j1939 members") Reported-by: Richard Weinberger <richard@nod.at> Tested-by: Richard Weinberger <richard@nod.at> Acked-by: Kurt Van Dijck <dev.kurt@vandijck-laurijssen.be> Link: https://lore.kernel.org/linux-can/1135648123.112255.1616613706554.JavaMail.zimbra@nod.at/T/#t Link: https://lore.kernel.org/r/20210325125850.1620-1-socketcan@hartkopp.net Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-04-14esp: delete NETIF_F_SCTP_CRC bit from features for esp offloadXin Long
[ Upstream commit 154deab6a3ba47792936edf77f2f13a1cbc4351d ] Now in esp4/6_gso_segment(), before calling inner proto .gso_segment, NETIF_F_CSUM_MASK bits are deleted, as HW won't be able to do the csum for inner proto due to the packet encrypted already. So the UDP/TCP packet has to do the checksum on its own .gso_segment. But SCTP is using CRC checksum, and for that NETIF_F_SCTP_CRC should be deleted to make SCTP do the csum in own .gso_segment as well. In Xiumei's testing with SCTP over IPsec/veth, the packets are kept dropping due to the wrong CRC checksum. Reported-by: Xiumei Mu <xmu@redhat.com> Fixes: 7862b4058b9f ("esp: Add gso handlers for esp4 and esp6") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-04-14net: xfrm: Localize sequence counter per network namespaceAhmed S. Darwish
[ Upstream commit e88add19f68191448427a6e4eb059664650a837f ] A sequence counter write section must be serialized or its internal state can get corrupted. The "xfrm_state_hash_generation" seqcount is global, but its write serialization lock (net->xfrm.xfrm_state_lock) is instantiated per network namespace. The write protection is thus insufficient. To provide full protection, localize the sequence counter per network namespace instead. This should be safe as both the seqcount read and write sections access data exclusively within the network namespace. It also lays the foundation for transforming "xfrm_state_hash_generation" data type from seqcount_t to seqcount_LOCKNAME_t in further commits. Fixes: b65e3d7be06f ("xfrm: state: add sequence count to detect hash resizes") Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-04-14xfrm: interface: fix ipv4 pmtu check to honor ip header dfEyal Birger
[ Upstream commit 8fc0e3b6a8666d656923d214e4dc791e9a17164a ] Frag needed should only be sent if the header enables DF. This fix allows packets larger than MTU to pass the xfrm interface and be fragmented after encapsulation, aligning behavior with non-interface xfrm. Fixes: f203b76d7809 ("xfrm: Add virtual xfrm interfaces") Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-04-14net: let skb_orphan_partial wake-up waiters.Paolo Abeni
commit 9adc89af724f12a03b47099cd943ed54e877cd59 upstream. Currently the mentioned helper can end-up freeing the socket wmem without waking-up any processes waiting for more write memory. If the partially orphaned skb is attached to an UDP (or raw) socket, the lack of wake-up can hang the user-space. Even for TCP sockets not calling the sk destructor could have bad effects on TSQ. Address the issue using skb_orphan to release the sk wmem before setting the new sock_efree destructor. Additionally bundle the whole ownership update in a new helper, so that later other potential users could avoid duplicate code. v1 -> v2: - use skb_orphan() instead of sort of open coding it (Eric) - provide an helper for the ownership change (Eric) Fixes: f6ba8d33cfbb ("netem: fix skb_orphan_partial()") Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-14net-ipv6: bugfix - raw & sctp - switch to ipv6_can_nonlocal_bind()Maciej Żenczykowski
commit 630e4576f83accf90366686f39808d665d8dbecc upstream. Found by virtue of ipv6 raw sockets not honouring the per-socket IP{,V6}_FREEBIND setting. Based on hits found via: git grep '[.]ip_nonlocal_bind' We fix both raw ipv6 sockets to honour IP{,V6}_FREEBIND and IP{,V6}_TRANSPARENT, and we fix sctp sockets to honour IP{,V6}_TRANSPARENT (they already honoured FREEBIND), and not just the ipv6 'ip_nonlocal_bind' sysctl. The helper is defined as: static inline bool ipv6_can_nonlocal_bind(struct net *net, struct inet_sock *inet) { return net->ipv6.sysctl.ip_nonlocal_bind || inet->freebind || inet->transparent; } so this change only widens the accepted opt-outs and is thus a clean bugfix. I'm not entirely sure what 'fixes' tag to add, since this is AFAICT an ancient bug, but IMHO this should be applied to stable kernels as far back as possible. As such I'm adding a 'fixes' tag with the commit that originally added the helper, which happened in 4.19. Backporting to older LTS kernels (at least 4.9 and 4.14) would presumably require open-coding it or backporting the helper as well. Other possibly relevant commits: v4.18-rc6-1502-g83ba4645152d net: add helpers checking if socket can be bound to nonlocal address v4.18-rc6-1431-gd0c1f01138c4 net/ipv6: allow any source address for sendmsg pktinfo with ip_nonlocal_bind v4.14-rc5-271-gb71d21c274ef sctp: full support for ipv6 ip_nonlocal_bind & IP_FREEBIND v4.7-rc7-1883-g9b9742022888 sctp: support ipv6 nonlocal bind v4.1-12247-g35a256fee52c ipv6: Nonlocal bind Cc: Lorenzo Colitti <lorenzo@google.com> Fixes: 83ba4645152d ("net: add helpers checking if socket can be bound to nonlocal address") Signed-off-by: Maciej Żenczykowski <maze@google.com> Reviewed-By: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-14net: hsr: Reset MAC header for Tx pathKurt Kanzenbach
commit 9d6803921a16f4d768dc41a75375629828f4d91e upstream. Reset MAC header in HSR Tx path. This is needed, because direct packet transmission, e.g. by specifying PACKET_QDISC_BYPASS does not reset the MAC header. This has been observed using the following setup: |$ ip link add name hsr0 type hsr slave1 lan0 slave2 lan1 supervision 45 version 1 |$ ifconfig hsr0 up |$ ./test hsr0 The test binary is using mmap'ed sockets and is specifying the PACKET_QDISC_BYPASS socket option. This patch resolves the following warning on a non-patched kernel: |[ 112.725394] ------------[ cut here ]------------ |[ 112.731418] WARNING: CPU: 1 PID: 257 at net/hsr/hsr_forward.c:560 hsr_forward_skb+0x484/0x568 |[ 112.739962] net/hsr/hsr_forward.c:560: Malformed frame (port_src hsr0) The warning can be safely removed, because the other call sites of hsr_forward_skb() make sure that the skb is prepared correctly. Fixes: d346a3fae3ff ("packet: introduce PACKET_QDISC_BYPASS socket option") Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-14mac80211: fix TXQ AC confusionJohannes Berg
commit 1153a74768a9212daadbb50767aa400bc6a0c9b0 upstream. Normally, TXQs have txq->tid = tid; txq->ac = ieee80211_ac_from_tid(tid); However, the special management TXQ actually has txq->tid = IEEE80211_NUM_TIDS; // 16 txq->ac = IEEE80211_AC_VO; This makes sense, but ieee80211_ac_from_tid(16) is the same as ieee80211_ac_from_tid(0) which is just IEEE80211_AC_BE. Now, normally this is fine. However, if the netdev queues were stopped, then the code in ieee80211_tx_dequeue() will propagate the stop from the interface (vif->txqs_stopped[]) if the AC 2 (ieee80211_ac_from_tid(txq->tid)) is marked as stopped. On wake, however, __ieee80211_wake_txqs() will wake the TXQ if AC 0 (txq->ac) is woken up. If a driver stops all queues with ieee80211_stop_tx_queues() and then wakes them again with ieee80211_wake_tx_queues(), the ieee80211_wake_txqs() tasklet will run to resync queue and TXQ state. If all queues were woken, then what'll happen is that _ieee80211_wake_txqs() will run in order of HW queues 0-3, typically (and certainly for iwlwifi) corresponding to ACs 0-3, so it'll call __ieee80211_wake_txqs() for each AC in order 0-3. When __ieee80211_wake_txqs() is called for AC 0 (VO) that'll wake up the management TXQ (remember its tid is 16), and the driver's wake_tx_queue() will be called. That tries to get a frame, which will immediately *stop* the TXQ again, because now we check against AC 2, and AC 2 hasn't yet been marked as woken up again in sdata->vif.txqs_stopped[] since we're only in the __ieee80211_wake_txqs() call for AC 0. Thus, the management TXQ will never be started again. Fix this by checking txq->ac directly instead of calculating the AC as ieee80211_ac_from_tid(txq->tid). Fixes: adf8ed01e4fd ("mac80211: add an optional TXQ for other PS-buffered frames") Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/20210323210500.bf4d50afea4a.I136ffde910486301f8818f5442e3c9bf8670a9c4@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-14net: sched: sch_teql: fix null-pointer dereferencePavel Tikhomirov
commit 1ffbc7ea91606e4abd10eb60de5367f1c86daf5e upstream. Reproduce: modprobe sch_teql tc qdisc add dev teql0 root teql0 This leads to (for instance in Centos 7 VM) OOPS: [ 532.366633] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 [ 532.366733] IP: [<ffffffffc06124a8>] teql_destroy+0x18/0x100 [sch_teql] [ 532.366825] PGD 80000001376d5067 PUD 137e37067 PMD 0 [ 532.366906] Oops: 0000 [#1] SMP [ 532.366987] Modules linked in: sch_teql ... [ 532.367945] CPU: 1 PID: 3026 Comm: tc Kdump: loaded Tainted: G ------------ T 3.10.0-1062.7.1.el7.x86_64 #1 [ 532.368041] Hardware name: Virtuozzo KVM, BIOS 1.11.0-2.vz7.2 04/01/2014 [ 532.368125] task: ffff8b7d37d31070 ti: ffff8b7c9fdbc000 task.ti: ffff8b7c9fdbc000 [ 532.368224] RIP: 0010:[<ffffffffc06124a8>] [<ffffffffc06124a8>] teql_destroy+0x18/0x100 [sch_teql] [ 532.368320] RSP: 0018:ffff8b7c9fdbf8e0 EFLAGS: 00010286 [ 532.368394] RAX: ffffffffc0612490 RBX: ffff8b7cb1565e00 RCX: ffff8b7d35ba2000 [ 532.368476] RDX: ffff8b7d35ba2000 RSI: 0000000000000000 RDI: ffff8b7cb1565e00 [ 532.368557] RBP: ffff8b7c9fdbf8f8 R08: ffff8b7d3fd1f140 R09: ffff8b7d3b001600 [ 532.368638] R10: ffff8b7d3b001600 R11: ffffffff84c7d65b R12: 00000000ffffffd8 [ 532.368719] R13: 0000000000008000 R14: ffff8b7d35ba2000 R15: ffff8b7c9fdbf9a8 [ 532.368800] FS: 00007f6a4e872740(0000) GS:ffff8b7d3fd00000(0000) knlGS:0000000000000000 [ 532.368885] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 532.368961] CR2: 00000000000000a8 CR3: 00000001396ee000 CR4: 00000000000206e0 [ 532.369046] Call Trace: [ 532.369159] [<ffffffff84c8192e>] qdisc_create+0x36e/0x450 [ 532.369268] [<ffffffff846a9b49>] ? ns_capable+0x29/0x50 [ 532.369366] [<ffffffff849afde2>] ? nla_parse+0x32/0x120 [ 532.369442] [<ffffffff84c81b4c>] tc_modify_qdisc+0x13c/0x610 [ 532.371508] [<ffffffff84c693e7>] rtnetlink_rcv_msg+0xa7/0x260 [ 532.372668] [<ffffffff84907b65>] ? sock_has_perm+0x75/0x90 [ 532.373790] [<ffffffff84c69340>] ? rtnl_newlink+0x890/0x890 [ 532.374914] [<ffffffff84c8da7b>] netlink_rcv_skb+0xab/0xc0 [ 532.376055] [<ffffffff84c63708>] rtnetlink_rcv+0x28/0x30 [ 532.377204] [<ffffffff84c8d400>] netlink_unicast+0x170/0x210 [ 532.378333] [<ffffffff84c8d7a8>] netlink_sendmsg+0x308/0x420 [ 532.379465] [<ffffffff84c2f3a6>] sock_sendmsg+0xb6/0xf0 [ 532.380710] [<ffffffffc034a56e>] ? __xfs_filemap_fault+0x8e/0x1d0 [xfs] [ 532.381868] [<ffffffffc034a75c>] ? xfs_filemap_fault+0x2c/0x30 [xfs] [ 532.383037] [<ffffffff847ec23a>] ? __do_fault.isra.61+0x8a/0x100 [ 532.384144] [<ffffffff84c30269>] ___sys_sendmsg+0x3e9/0x400 [ 532.385268] [<ffffffff847f3fad>] ? handle_mm_fault+0x39d/0x9b0 [ 532.386387] [<ffffffff84d88678>] ? __do_page_fault+0x238/0x500 [ 532.387472] [<ffffffff84c31921>] __sys_sendmsg+0x51/0x90 [ 532.388560] [<ffffffff84c31972>] SyS_sendmsg+0x12/0x20 [ 532.389636] [<ffffffff84d8dede>] system_call_fastpath+0x25/0x2a [ 532.390704] [<ffffffff84d8de21>] ? system_call_after_swapgs+0xae/0x146 [ 532.391753] Code: 00 00 00 00 00 00 5b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 41 55 41 54 53 48 8b b7 48 01 00 00 48 89 fb <48> 8b 8e a8 00 00 00 48 85 c9 74 43 48 89 ca eb 0f 0f 1f 80 00 [ 532.394036] RIP [<ffffffffc06124a8>] teql_destroy+0x18/0x100 [sch_teql] [ 532.395127] RSP <ffff8b7c9fdbf8e0> [ 532.396179] CR2: 00000000000000a8 Null pointer dereference happens on master->slaves dereference in teql_destroy() as master is null-pointer. When qdisc_create() calls teql_qdisc_init() it imediately fails after check "if (m->dev == dev)" because both devices are teql0, and it does not set qdisc_priv(sch)->m leaving it zero on error path, then qdisc_create() imediately calls teql_destroy() which does not expect zero master pointer and we get OOPS. Fixes: 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation") Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-14batman-adv: initialize "struct batadv_tvlv_tt_vlan_data"->reserved fieldTetsuo Handa
commit 08c27f3322fec11950b8f1384aa0f3b11d028528 upstream. KMSAN found uninitialized value at batadv_tt_prepare_tvlv_local_data() [1], for commit ced72933a5e8ab52 ("batman-adv: use CRC32C instead of CRC16 in TT code") inserted 'reserved' field into "struct batadv_tvlv_tt_data" and commit 7ea7b4a142758dea ("batman-adv: make the TT CRC logic VLAN specific") moved that field to "struct batadv_tvlv_tt_vlan_data" but left that field uninitialized. [1] https://syzkaller.appspot.com/bug?id=07f3e6dba96f0eb3cabab986adcd8a58b9bdbe9d Reported-by: syzbot <syzbot+50ee810676e6a089487b@syzkaller.appspotmail.com> Tested-by: syzbot <syzbot+50ee810676e6a089487b@syzkaller.appspotmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Fixes: ced72933a5e8ab52 ("batman-adv: use CRC32C instead of CRC16 in TT code") Fixes: 7ea7b4a142758dea ("batman-adv: make the TT CRC logic VLAN specific") Acked-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-14net: ipv6: check for validity before dereferencing cfg->fc_nlinfo.nlhMuhammad Usama Anjum
commit 864db232dc7036aa2de19749c3d5be0143b24f8f upstream. nlh is being checked for validtity two times when it is dereferenced in this function. Check for validity again when updating the flags through nlh pointer to make the dereferencing safe. CC: <stable@vger.kernel.org> Addresses-Coverity: ("NULL pointer dereference") Signed-off-by: Muhammad Usama Anjum <musamaanjum@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-14nfc: Avoid endless loops caused by repeated llcp_sock_connect()Xiaoming Ni
commit 4b5db93e7f2afbdfe3b78e37879a85290187e6f1 upstream. When sock_wait_state() returns -EINPROGRESS, "sk->sk_state" is LLCP_CONNECTING. In this case, llcp_sock_connect() is repeatedly invoked, nfc_llcp_sock_link() will add sk to local->connecting_sockets twice. sk->sk_node->next will point to itself, that will make an endless loop and hang-up the system. To fix it, check whether sk->sk_state is LLCP_CONNECTING in llcp_sock_connect() to avoid repeated invoking. Fixes: b4011239a08e ("NFC: llcp: Fix non blocking sockets connections") Reported-by: "kiyin(尹亮)" <kiyin@tencent.com> Link: https://www.openwall.com/lists/oss-security/2020/11/01/1 Cc: <stable@vger.kernel.org> #v3.11 Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-14nfc: fix memory leak in llcp_sock_connect()Xiaoming Ni
commit 7574fcdbdcb335763b6b322f6928dc0fd5730451 upstream. In llcp_sock_connect(), use kmemdup to allocate memory for "llcp_sock->service_name". The memory is not released in the sock_unlink label of the subsequent failure branch. As a result, memory leakage occurs. fix CVE-2020-25672 Fixes: d646960f7986 ("NFC: Initial LLCP support") Reported-by: "kiyin(尹亮)" <kiyin@tencent.com> Link: https://www.openwall.com/lists/oss-security/2020/11/01/1 Cc: <stable@vger.kernel.org> #v3.3 Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-14nfc: fix refcount leak in llcp_sock_connect()Xiaoming Ni
commit 8a4cd82d62b5ec7e5482333a72b58a4eea4979f0 upstream. nfc_llcp_local_get() is invoked in llcp_sock_connect(), but nfc_llcp_local_put() is not invoked in subsequent failure branches. As a result, refcount leakage occurs. To fix it, add calling nfc_llcp_local_put(). fix CVE-2020-25671 Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket") Reported-by: "kiyin(尹亮)" <kiyin@tencent.com> Link: https://www.openwall.com/lists/oss-security/2020/11/01/1 Cc: <stable@vger.kernel.org> #v3.6 Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-14nfc: fix refcount leak in llcp_sock_bind()Xiaoming Ni
commit c33b1cc62ac05c1dbb1cdafe2eb66da01c76ca8d upstream. nfc_llcp_local_get() is invoked in llcp_sock_bind(), but nfc_llcp_local_put() is not invoked in subsequent failure branches. As a result, refcount leakage occurs. To fix it, add calling nfc_llcp_local_put(). fix CVE-2020-25670 Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket") Reported-by: "kiyin(尹亮)" <kiyin@tencent.com> Link: https://www.openwall.com/lists/oss-security/2020/11/01/1 Cc: <stable@vger.kernel.org> #v3.6 Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-10netfilter: conntrack: Fix gre tunneling over ipv6Ludovic Senecaux
[ Upstream commit 8b2030b4305951f44afef80225f1475618e25a73 ] This fix permits gre connections to be tracked within ip6tables rules Signed-off-by: Ludovic Senecaux <linuxludo@free.fr> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-04-10mac80211: choose first enabled channel for monitorKarthikeyan Kathirvel
[ Upstream commit 041c881a0ba8a75f71118bd9766b78f04beed469 ] Even if the first channel from sband channel list is invalid or disabled mac80211 ends up choosing it as the default channel for monitor interfaces, making them not usable. Fix this by assigning the first available valid or enabled channel instead. Signed-off-by: Karthikeyan Kathirvel <kathirve@codeaurora.org> Link: https://lore.kernel.org/r/1615440547-7661-1-git-send-email-kathirve@codeaurora.org [reword commit message, comment, code cleanups] Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-04-07bpf: Remove MTU check in __bpf_skb_max_lenJesper Dangaard Brouer
commit 6306c1189e77a513bf02720450bb43bd4ba5d8ae upstream. Multiple BPF-helpers that can manipulate/increase the size of the SKB uses __bpf_skb_max_len() as the max-length. This function limit size against the current net_device MTU (skb->dev->mtu). When a BPF-prog grow the packet size, then it should not be limited to the MTU. The MTU is a transmit limitation, and software receiving this packet should be allowed to increase the size. Further more, current MTU check in __bpf_skb_max_len uses the MTU from ingress/current net_device, which in case of redirects uses the wrong net_device. This patch keeps a sanity max limit of SKB_MAX_ALLOC (16KiB). The real limit is elsewhere in the system. Jesper's testing[1] showed it was not possible to exceed 8KiB when expanding the SKB size via BPF-helper. The limiting factor is the define KMALLOC_MAX_CACHE_SIZE which is 8192 for SLUB-allocator (CONFIG_SLUB) in-case PAGE_SIZE is 4096. This define is in-effect due to this being called from softirq context see code __gfp_pfmemalloc_flags() and __do_kmalloc_node(). Jakub's testing showed that frames above 16KiB can cause NICs to reset (but not crash). Keep this sanity limit at this level as memory layer can differ based on kernel config. [1] https://github.com/xdp-project/bpf-examples/tree/master/MTU-tests Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/161287788936.790810.2937823995775097177.stgit@firesoul Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-07appletalk: Fix skb allocation size in loopback caseDoug Brown
[ Upstream commit 39935dccb21c60f9bbf1bb72d22ab6fd14ae7705 ] If a DDP broadcast packet is sent out to a non-gateway target, it is also looped back. There is a potential for the loopback device to have a longer hardware header length than the original target route's device, which can result in the skb not being created with enough room for the loopback device's hardware header. This patch fixes the issue by determining that a loopback will be necessary prior to allocating the skb, and if so, ensuring the skb has enough room. This was discovered while testing a new driver that creates a LocalTalk network interface (LTALK_HLEN = 1). It caused an skb_under_panic. Signed-off-by: Doug Brown <doug@schmorgal.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-04-07net: introduce CAN specific pointer in the struct net_deviceOleksij Rempel
[ Upstream commit 4e096a18867a5a989b510f6999d9c6b6622e8f7b ] Since 20dd3850bcf8 ("can: Speed up CAN frame receiption by using ml_priv") the CAN framework uses per device specific data in the AF_CAN protocol. For this purpose the struct net_device->ml_priv is used. Later the ml_priv usage in CAN was extended for other users, one of them being CAN_J1939. Later in the kernel ml_priv was converted to an union, used by other drivers. E.g. the tun driver started storing it's stats pointer. Since tun devices can claim to be a CAN device, CAN specific protocols will wrongly interpret this pointer, which will cause system crashes. Mostly this issue is visible in the CAN_J1939 stack. To fix this issue, we request a dedicated CAN pointer within the net_device struct. Reported-by: syzbot+5138c4dd15a0401bec7b@syzkaller.appspotmail.com Fixes: 20dd3850bcf8 ("can: Speed up CAN frame receiption by using ml_priv") Fixes: ffd956eef69b ("can: introduce CAN midlayer private and allocate it automatically") Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol") Fixes: 497a5757ce4e ("tun: switch to net core provided statistics counters") Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://lore.kernel.org/r/20210223070127.4538-1-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-04-07flow_dissector: fix TTL and TOS dissection on IPv4 fragmentsDavide Caratti
[ Upstream commit d2126838050ccd1dadf310ffb78b2204f3b032b9 ] the following command: # tc filter add dev $h2 ingress protocol ip pref 1 handle 101 flower \ $tcflags dst_ip 192.0.2.2 ip_ttl 63 action drop doesn't drop all IPv4 packets that match the configured TTL / destination address. In particular, if "fragment offset" or "more fragments" have non zero value in the IPv4 header, setting of FLOW_DISSECTOR_KEY_IP is simply ignored. Fix this dissecting IPv4 TTL and TOS before fragment info; while at it, add a selftest for tc flower's match on 'ip_ttl' that verifies the correct behavior. Fixes: 518d8a2e9bad ("net/flow_dissector: add support for dissection of misc ip header fields") Reported-by: Shuang Li <shuali@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-04-07rpc: fix NULL dereference on kmalloc failureJ. Bruce Fields
[ Upstream commit 0ddc942394013f08992fc379ca04cffacbbe3dae ] I think this is unlikely but possible: svc_authenticate sets rq_authop and calls svcauth_gss_accept. The kmalloc(sizeof(*svcdata), GFP_KERNEL) fails, leaving rq_auth_data NULL, and returning SVC_DENIED. This causes svc_process_common to go to err_bad_auth, and eventually call svc_authorise. That calls ->release == svcauth_gss_release, which tries to dereference rq_auth_data. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Link: https://lore.kernel.org/linux-nfs/3F1B347F-B809-478F-A1E9-0BE98E22B0F0@oracle.com/T/#t Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-04-07ipv6: weaken the v4mapped source checkJakub Kicinski
[ Upstream commit dcc32f4f183ab8479041b23a1525d48233df1d43 ] This reverts commit 6af1799aaf3f1bc8defedddfa00df3192445bbf3. Commit 6af1799aaf3f ("ipv6: drop incoming packets having a v4mapped source address") introduced an input check against v4mapped addresses. Use of such addresses on the wire is indeed questionable and not allowed on public Internet. As the commit pointed out https://tools.ietf.org/html/draft-itojun-v6ops-v4mapped-harmful-02 lists potential issues. Unfortunately there are applications which use v4mapped addresses, and breaking them is a clear regression. For example v4mapped addresses (or any semi-valid addresses, really) may be used for uni-direction event streams or packet export. Since the issue which sparked the addition of the check was with TCP and request_socks in particular push the check down to TCPv6 and DCCP. This restores the ability to receive UDPv6 packets with v4mapped address as the source. Keep using the IPSTATS_MIB_INHDRERRORS statistic to minimize the user-visible changes. Fixes: 6af1799aaf3f ("ipv6: drop incoming packets having a v4mapped source address") Reported-by: Sunyi Shao <sunyishao@fb.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-04-07selinux: vsock: Set SID for socket returned by accept()David Brazdil
[ Upstream commit 1f935e8e72ec28dddb2dc0650b3b6626a293d94b ] For AF_VSOCK, accept() currently returns sockets that are unlabelled. Other socket families derive the child's SID from the SID of the parent and the SID of the incoming packet. This is typically done as the connected socket is placed in the queue that accept() removes from. Reuse the existing 'security_sk_clone' hook to copy the SID from the parent (server) socket to the child. There is no packet SID in this case. Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") Signed-off-by: David Brazdil <dbrazdil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-30mac80211: fix double free in ibss_leaveMarkus Theil
commit 3bd801b14e0c5d29eeddc7336558beb3344efaa3 upstream. Clear beacon ie pointer and ie length after free in order to prevent double free. ================================================================== BUG: KASAN: double-free or invalid-free \ in ieee80211_ibss_leave+0x83/0xe0 net/mac80211/ibss.c:1876 CPU: 0 PID: 8472 Comm: syz-executor100 Not tainted 5.11.0-rc6-syzkaller #0 Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x107/0x163 lib/dump_stack.c:120 print_address_description.constprop.0.cold+0x5b/0x2c6 mm/kasan/report.c:230 kasan_report_invalid_free+0x51/0x80 mm/kasan/report.c:355 ____kasan_slab_free+0xcc/0xe0 mm/kasan/common.c:341 kasan_slab_free include/linux/kasan.h:192 [inline] __cache_free mm/slab.c:3424 [inline] kfree+0xed/0x270 mm/slab.c:3760 ieee80211_ibss_leave+0x83/0xe0 net/mac80211/ibss.c:1876 rdev_leave_ibss net/wireless/rdev-ops.h:545 [inline] __cfg80211_leave_ibss+0x19a/0x4c0 net/wireless/ibss.c:212 __cfg80211_leave+0x327/0x430 net/wireless/core.c:1172 cfg80211_leave net/wireless/core.c:1221 [inline] cfg80211_netdev_notifier_call+0x9e8/0x12c0 net/wireless/core.c:1335 notifier_call_chain+0xb5/0x200 kernel/notifier.c:83 call_netdevice_notifiers_info+0xb5/0x130 net/core/dev.c:2040 call_netdevice_notifiers_extack net/core/dev.c:2052 [inline] call_netdevice_notifiers net/core/dev.c:2066 [inline] __dev_close_many+0xee/0x2e0 net/core/dev.c:1586 __dev_close net/core/dev.c:1624 [inline] __dev_change_flags+0x2cb/0x730 net/core/dev.c:8476 dev_change_flags+0x8a/0x160 net/core/dev.c:8549 dev_ifsioc+0x210/0xa70 net/core/dev_ioctl.c:265 dev_ioctl+0x1b1/0xc40 net/core/dev_ioctl.c:511 sock_do_ioctl+0x148/0x2d0 net/socket.c:1060 sock_ioctl+0x477/0x6a0 net/socket.c:1177 vfs_ioctl fs/ioctl.c:48 [inline] __do_sys_ioctl fs/ioctl.c:753 [inline] __se_sys_ioctl fs/ioctl.c:739 [inline] __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:739 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Reported-by: syzbot+93976391bf299d425f44@syzkaller.appspotmail.com Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de> Link: https://lore.kernel.org/r/20210213133653.367130-1-markus.theil@tu-ilmenau.de Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-30net: qrtr: fix a kernel-infoleak in qrtr_recvmsg()Eric Dumazet
commit 50535249f624d0072cd885bcdce4e4b6fb770160 upstream. struct sockaddr_qrtr has a 2-byte hole, and qrtr_recvmsg() currently does not clear it before copying kernel data to user space. It might be too late to name the hole since sockaddr_qrtr structure is uapi. BUG: KMSAN: kernel-infoleak in kmsan_copy_to_user+0x9c/0xb0 mm/kmsan/kmsan_hooks.c:249 CPU: 0 PID: 29705 Comm: syz-executor.3 Not tainted 5.11.0-rc7-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x21c/0x280 lib/dump_stack.c:120 kmsan_report+0xfb/0x1e0 mm/kmsan/kmsan_report.c:118 kmsan_internal_check_memory+0x202/0x520 mm/kmsan/kmsan.c:402 kmsan_copy_to_user+0x9c/0xb0 mm/kmsan/kmsan_hooks.c:249 instrument_copy_to_user include/linux/instrumented.h:121 [inline] _copy_to_user+0x1ac/0x270 lib/usercopy.c:33 copy_to_user include/linux/uaccess.h:209 [inline] move_addr_to_user+0x3a2/0x640 net/socket.c:237 ____sys_recvmsg+0x696/0xd50 net/socket.c:2575 ___sys_recvmsg net/socket.c:2610 [inline] do_recvmmsg+0xa97/0x22d0 net/socket.c:2710 __sys_recvmmsg net/socket.c:2789 [inline] __do_sys_recvmmsg net/socket.c:2812 [inline] __se_sys_recvmmsg+0x24a/0x410 net/socket.c:2805 __x64_sys_recvmmsg+0x62/0x80 net/socket.c:2805 do_syscall_64+0x9f/0x140 arch/x86/entry/common.c:48 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x465f69 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f43659d6188 EFLAGS: 00000246 ORIG_RAX: 000000000000012b RAX: ffffffffffffffda RBX: 000000000056bf60 RCX: 0000000000465f69 RDX: 0000000000000008 RSI: 0000000020003e40 RDI: 0000000000000003 RBP: 00000000004bfa8f R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000010060 R11: 0000000000000246 R12: 000000000056bf60 R13: 0000000000a9fb1f R14: 00007f43659d6300 R15: 0000000000022000 Local variable ----addr@____sys_recvmsg created at: ____sys_recvmsg+0x168/0xd50 net/socket.c:2550 ____sys_recvmsg+0x168/0xd50 net/socket.c:2550 Bytes 2-3 of 12 are uninitialized Memory access of size 12 starts at ffff88817c627b40 Data copied to user address 0000000020000140 Fixes: bdabad3e363d ("net: Add Qualcomm IPC router") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Courtney Cavin <courtney.cavin@sonymobile.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-30can: dev: Move device back to init netns on owning netns deleteMartin Willi
commit 3a5ca857079ea022e0b1b17fc154f7ad7dbc150f upstream. When a non-initial netns is destroyed, the usual policy is to delete all virtual network interfaces contained, but move physical interfaces back to the initial netns. This keeps the physical interface visible on the system. CAN devices are somewhat special, as they define rtnl_link_ops even if they are physical devices. If a CAN interface is moved into a non-initial netns, destroying that netns lets the interface vanish instead of moving it back to the initial netns. default_device_exit() skips CAN interfaces due to having rtnl_link_ops set. Reproducer: ip netns add foo ip link set can0 netns foo ip netns delete foo WARNING: CPU: 1 PID: 84 at net/core/dev.c:11030 ops_exit_list+0x38/0x60 CPU: 1 PID: 84 Comm: kworker/u4:2 Not tainted 5.10.19 #1 Workqueue: netns cleanup_net [<c010e700>] (unwind_backtrace) from [<c010a1d8>] (show_stack+0x10/0x14) [<c010a1d8>] (show_stack) from [<c086dc10>] (dump_stack+0x94/0xa8) [<c086dc10>] (dump_stack) from [<c086b938>] (__warn+0xb8/0x114) [<c086b938>] (__warn) from [<c086ba10>] (warn_slowpath_fmt+0x7c/0xac) [<c086ba10>] (warn_slowpath_fmt) from [<c0629f20>] (ops_exit_list+0x38/0x60) [<c0629f20>] (ops_exit_list) from [<c062a5c4>] (cleanup_net+0x230/0x380) [<c062a5c4>] (cleanup_net) from [<c0142c20>] (process_one_work+0x1d8/0x438) [<c0142c20>] (process_one_work) from [<c0142ee4>] (worker_thread+0x64/0x5a8) [<c0142ee4>] (worker_thread) from [<c0148a98>] (kthread+0x148/0x14c) [<c0148a98>] (kthread) from [<c0100148>] (ret_from_fork+0x14/0x2c) To properly restore physical CAN devices to the initial netns on owning netns exit, introduce a flag on rtnl_link_ops that can be set by drivers. For CAN devices setting this flag, default_device_exit() considers them non-virtual, applying the usual namespace move. The issue was introduced in the commit mentioned below, as at that time CAN devices did not have a dellink() operation. Fixes: e008b5fc8dc7 ("net: Simplfy default_device_exit and improve batching.") Link: https://lore.kernel.org/r/20210302122423.872326-1-martin@strongswan.org Signed-off-by: Martin Willi <martin@strongswan.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-30Revert "netfilter: x_tables: Update remaining dereference to RCU"Mark Tomlinson
[ Upstream commit abe7034b9a8d57737e80cc16d60ed3666990bdbf ] This reverts commit 443d6e86f821a165fae3fc3fc13086d27ac140b1. This (and the following) patch basically re-implemented the RCU mechanisms of patch 784544739a25. That patch was replaced because of the performance problems that it created when replacing tables. Now, we have the same issue: the call to synchronize_rcu() makes replacing tables slower by as much as an order of magnitude. Revert these patches and fix the issue in a different way. Signed-off-by: Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-30netfilter: x_tables: Use correct memory barriers.Mark Tomlinson
[ Upstream commit 175e476b8cdf2a4de7432583b49c871345e4f8a1 ] When a new table value was assigned, it was followed by a write memory barrier. This ensured that all writes before this point would complete before any writes after this point. However, to determine whether the rules are unused, the sequence counter is read. To ensure that all writes have been done before these reads, a full memory barrier is needed, not just a write memory barrier. The same argument applies when incrementing the counter, before the rules are read. Changing to using smp_mb() instead of smp_wmb() fixes the kernel panic reported in cc00bcaa5899 (which is still present), while still maintaining the same speed of replacing tables. The smb_mb() barriers potentially slow the packet path, however testing has shown no measurable change in performance on a 4-core MIPS64 platform. Fixes: 7f5c6d4f665b ("netfilter: get rid of atomic ops in fast path") Signed-off-by: Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-30Revert "netfilter: x_tables: Switch synchronization to RCU"Mark Tomlinson
[ Upstream commit d3d40f237480abf3268956daf18cdc56edd32834 ] This reverts commit cc00bcaa589914096edef7fb87ca5cee4a166b5c. This (and the preceding) patch basically re-implemented the RCU mechanisms of patch 784544739a25. That patch was replaced because of the performance problems that it created when replacing tables. Now, we have the same issue: the call to synchronize_rcu() makes replacing tables slower by as much as an order of magnitude. Prior to using RCU a script calling "iptables" approx. 200 times was taking 1.16s. With RCU this increased to 11.59s. Revert these patches and fix the issue in a different way. Signed-off-by: Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-30mac80211: fix rate mask resetJohannes Berg
[ Upstream commit 1944015fe9c1d9fa5e9eb7ffbbb5ef8954d6753b ] Coverity reported the strange "if (~...)" condition that's always true. It suggested that ! was intended instead of ~, but upon further analysis I'm convinced that what really was intended was a comparison to 0xff/0xffff (in HT/VHT cases respectively), since this indicates that all of the rates are enabled. Change the comparison accordingly. I'm guessing this never really mattered because a reset to not having a rate mask is basically equivalent to having a mask that enables all rates. Reported-by: Colin Ian King <colin.king@canonical.com> Fixes: 2ffbe6d33366 ("mac80211: fix and optimize MCS mask handling") Fixes: b119ad6e726c ("mac80211: add rate mask logic for vht rates") Reviewed-by: Colin Ian King <colin.king@canonical.com> Link: https://lore.kernel.org/r/20210212112213.36b38078f569.I8546a20c80bc1669058eb453e213630b846e107b@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-30tcp: relookup sock for RST+ACK packets handled by obsolete req sockAlexander Ovechkin
[ Upstream commit 7233da86697efef41288f8b713c10c2499cffe85 ] Currently tcp_check_req can be called with obsolete req socket for which big socket have been already created (because of CPU race or early demux assigning req socket to multiple packets in gro batch). Commit e0f9759f530bf789e984 ("tcp: try to keep packet if SYN_RCV race is lost") added retry in case when tcp_check_req is called for PSH|ACK packet. But if client sends RST+ACK immediatly after connection being established (it is performing healthcheck, for example) retry does not occur. In that case tcp_check_req tries to close req socket, leaving big socket active. Fixes: e0f9759f530 ("tcp: try to keep packet if SYN_RCV race is lost") Signed-off-by: Alexander Ovechkin <ovov@yandex-team.ru> Reported-by: Oleg Senin <olegsenin@yandex-team.ru> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-30netfilter: ctnetlink: fix dump of the expect mask attributeFlorian Westphal
[ Upstream commit b58f33d49e426dc66e98ed73afb5d97b15a25f2d ] Before this change, the mask is never included in the netlink message, so "conntrack -E expect" always prints 0.0.0.0. In older kernels the l3num callback struct was passed as argument, based on tuple->src.l3num. After the l3num indirection got removed, the call chain is based on m.src.l3num, but this value is 0xffff. Init l3num to the correct value. Fixes: f957be9d349a3 ("netfilter: conntrack: remove ctnetlink callbacks from l3 protocol trackers") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-30net: sched: validate stab valuesEric Dumazet
[ Upstream commit e323d865b36134e8c5c82c834df89109a5c60dab ] iproute2 package is well behaved, but malicious user space can provide illegal shift values and trigger UBSAN reports. Add stab parameter to red_check_params() to validate user input. syzbot reported: UBSAN: shift-out-of-bounds in ./include/net/red.h:312:18 shift exponent 111 is too large for 64-bit type 'long unsigned int' CPU: 1 PID: 14662 Comm: syz-executor.3 Not tainted 5.12.0-rc2-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x141/0x1d7 lib/dump_stack.c:120 ubsan_epilogue+0xb/0x5a lib/ubsan.c:148 __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:327 red_calc_qavg_from_idle_time include/net/red.h:312 [inline] red_calc_qavg include/net/red.h:353 [inline] choke_enqueue.cold+0x18/0x3dd net/sched/sch_choke.c:221 __dev_xmit_skb net/core/dev.c:3837 [inline] __dev_queue_xmit+0x1943/0x2e00 net/core/dev.c:4150 neigh_hh_output include/net/neighbour.h:499 [inline] neigh_output include/net/neighbour.h:508 [inline] ip6_finish_output2+0x911/0x1700 net/ipv6/ip6_output.c:117 __ip6_finish_output net/ipv6/ip6_output.c:182 [inline] __ip6_finish_output+0x4c1/0xe10 net/ipv6/ip6_output.c:161 ip6_finish_output+0x35/0x200 net/ipv6/ip6_output.c:192 NF_HOOK_COND include/linux/netfilter.h:290 [inline] ip6_output+0x1e4/0x530 net/ipv6/ip6_output.c:215 dst_output include/net/dst.h:448 [inline] NF_HOOK include/linux/netfilter.h:301 [inline] NF_HOOK include/linux/netfilter.h:295 [inline] ip6_xmit+0x127e/0x1eb0 net/ipv6/ip6_output.c:320 inet6_csk_xmit+0x358/0x630 net/ipv6/inet6_connection_sock.c:135 dccp_transmit_skb+0x973/0x12c0 net/dccp/output.c:138 dccp_send_reset+0x21b/0x2b0 net/dccp/output.c:535 dccp_finish_passive_close net/dccp/proto.c:123 [inline] dccp_finish_passive_close+0xed/0x140 net/dccp/proto.c:118 dccp_terminate_connection net/dccp/proto.c:958 [inline] dccp_close+0xb3c/0xe60 net/dccp/proto.c:1028 inet_release+0x12e/0x280 net/ipv4/af_inet.c:431 inet6_release+0x4c/0x70 net/ipv6/af_inet6.c:478 __sock_release+0xcd/0x280 net/socket.c:599 sock_close+0x18/0x20 net/socket.c:1258 __fput+0x288/0x920 fs/file_table.c:280 task_work_run+0xdd/0x1a0 kernel/task_work.c:140 tracehook_notify_resume include/linux/tracehook.h:189 [inline] Fixes: 8afa10cbe281 ("net_sched: red: Avoid illegal values") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-30ipv6: fix suspecious RCU usage warningWei Wang
[ Upstream commit 28259bac7f1dde06d8ba324e222bbec9d4e92f2b ] Syzbot reported the suspecious RCU usage in nexthop_fib6_nh() when called from ipv6_route_seq_show(). The reason is ipv6_route_seq_start() calls rcu_read_lock_bh(), while nexthop_fib6_nh() calls rcu_dereference_rtnl(). The fix proposed is to add a variant of nexthop_fib6_nh() to use rcu_dereference_bh_rtnl() for ipv6_route_seq_show(). The reported trace is as follows: ./include/net/nexthop.h:416 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 2 locks held by syz-executor.0/17895: at: seq_read+0x71/0x12a0 fs/seq_file.c:169 at: seq_file_net include/linux/seq_file_net.h:19 [inline] at: ipv6_route_seq_start+0xaf/0x300 net/ipv6/ip6_fib.c:2616 stack backtrace: CPU: 1 PID: 17895 Comm: syz-executor.0 Not tainted 4.15.0-syzkaller #0 Call Trace: [<ffffffff849edf9e>] __dump_stack lib/dump_stack.c:17 [inline] [<ffffffff849edf9e>] dump_stack+0xd8/0x147 lib/dump_stack.c:53 [<ffffffff8480b7fa>] lockdep_rcu_suspicious+0x153/0x15d kernel/locking/lockdep.c:5745 [<ffffffff8459ada6>] nexthop_fib6_nh include/net/nexthop.h:416 [inline] [<ffffffff8459ada6>] ipv6_route_native_seq_show net/ipv6/ip6_fib.c:2488 [inline] [<ffffffff8459ada6>] ipv6_route_seq_show+0x436/0x7a0 net/ipv6/ip6_fib.c:2673 [<ffffffff81c556df>] seq_read+0xccf/0x12a0 fs/seq_file.c:276 [<ffffffff81dbc62c>] proc_reg_read+0x10c/0x1d0 fs/proc/inode.c:231 [<ffffffff81bc28ae>] do_loop_readv_writev fs/read_write.c:714 [inline] [<ffffffff81bc28ae>] do_loop_readv_writev fs/read_write.c:701 [inline] [<ffffffff81bc28ae>] do_iter_read+0x49e/0x660 fs/read_write.c:935 [<ffffffff81bc81ab>] vfs_readv+0xfb/0x170 fs/read_write.c:997 [<ffffffff81c88847>] kernel_readv fs/splice.c:361 [inline] [<ffffffff81c88847>] default_file_splice_read+0x487/0x9c0 fs/splice.c:416 [<ffffffff81c86189>] do_splice_to+0x129/0x190 fs/splice.c:879 [<ffffffff81c86f66>] splice_direct_to_actor+0x256/0x890 fs/splice.c:951 [<ffffffff81c8777d>] do_splice_direct+0x1dd/0x2b0 fs/splice.c:1060 [<ffffffff81bc4747>] do_sendfile+0x597/0xce0 fs/read_write.c:1459 [<ffffffff81bca205>] SYSC_sendfile64 fs/read_write.c:1520 [inline] [<ffffffff81bca205>] SyS_sendfile64+0x155/0x170 fs/read_write.c:1506 [<ffffffff81015fcf>] do_syscall_64+0x1ff/0x310 arch/x86/entry/common.c:305 [<ffffffff84a00076>] entry_SYSCALL_64_after_hwframe+0x42/0xb7 Fixes: f88d8ea67fbdb ("ipv6: Plumb support for nexthop object in a fib6_info") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Wei Wang <weiwan@google.com> Cc: David Ahern <dsahern@kernel.org> Cc: Ido Schimmel <idosch@idosch.org> Cc: Petr Machata <petrm@nvidia.com> Cc: Eric Dumazet <edumazet@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-24net/qrtr: fix __netdev_alloc_skb callPavel Skripkin
commit 093b036aa94e01a0bea31a38d7f0ee28a2749023 upstream. syzbot found WARNING in __alloc_pages_nodemask()[1] when order >= MAX_ORDER. It was caused by a huge length value passed from userspace to qrtr_tun_write_iter(), which tries to allocate skb. Since the value comes from the untrusted source there is no need to raise a warning in __alloc_pages_nodemask(). [1] WARNING in __alloc_pages_nodemask+0x5f8/0x730 mm/page_alloc.c:5014 Call Trace: __alloc_pages include/linux/gfp.h:511 [inline] __alloc_pages_node include/linux/gfp.h:524 [inline] alloc_pages_node include/linux/gfp.h:538 [inline] kmalloc_large_node+0x60/0x110 mm/slub.c:3999 __kmalloc_node_track_caller+0x319/0x3f0 mm/slub.c:4496 __kmalloc_reserve net/core/skbuff.c:150 [inline] __alloc_skb+0x4e4/0x5a0 net/core/skbuff.c:210 __netdev_alloc_skb+0x70/0x400 net/core/skbuff.c:446 netdev_alloc_skb include/linux/skbuff.h:2832 [inline] qrtr_endpoint_post+0x84/0x11b0 net/qrtr/qrtr.c:442 qrtr_tun_write_iter+0x11f/0x1a0 net/qrtr/tun.c:98 call_write_iter include/linux/fs.h:1901 [inline] new_sync_write+0x426/0x650 fs/read_write.c:518 vfs_write+0x791/0xa30 fs/read_write.c:605 ksys_write+0x12d/0x250 fs/read_write.c:658 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Reported-by: syzbot+80dccaee7c6630fa9dcf@syzkaller.appspotmail.com Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Acked-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-24sunrpc: fix refcount leak for rpc auth modulesDaniel Kobras
commit f1442d6349a2e7bb7a6134791bdc26cb776c79af upstream. If an auth module's accept op returns SVC_CLOSE, svc_process_common() enters a call path that does not call svc_authorise() before leaving the function, and thus leaks a reference on the auth module's refcount. Hence, make sure calls to svc_authenticate() and svc_authorise() are paired for all call paths, to make sure rpc auth modules can be unloaded. Signed-off-by: Daniel Kobras <kobras@puzzle-itc.de> Fixes: 4d712ef1db05 ("svcauth_gss: Close connection when dropping an incoming message") Link: https://lore.kernel.org/linux-nfs/3F1B347F-B809-478F-A1E9-0BE98E22B0F0@oracle.com/T/#t Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-24svcrdma: disable timeouts on rdma backchannelTimo Rothenpieler
commit 6820bf77864d5894ff67b5c00d7dba8f92011e3d upstream. This brings it in line with the regular tcp backchannel, which also has all those timeouts disabled. Prevents the backchannel from timing out, getting some async operations like server side copying getting stuck indefinitely on the client side. Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org> Fixes: 5d252f90a800 ("svcrdma: Add class for RDMA backwards direction transport") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-24NFSD: Repair misuse of sv_lock in 5.10.16-rt30.Joe Korty
commit c7de87ff9dac5f396f62d584f3908f80ddc0e07b upstream. [ This problem is in mainline, but only rt has the chops to be able to detect it. ] Lockdep reports a circular lock dependency between serv->sv_lock and softirq_ctl.lock on system shutdown, when using a kernel built with CONFIG_PREEMPT_RT=y, and a nfs mount exists. This is due to the definition of spin_lock_bh on rt: local_bh_disable(); rt_spin_lock(lock); which forces a softirq_ctl.lock -> serv->sv_lock dependency. This is not a problem as long as _every_ lock of serv->sv_lock is a: spin_lock_bh(&serv->sv_lock); but there is one of the form: spin_lock(&serv->sv_lock); This is what is causing the circular dependency splat. The spin_lock() grabs the lock without first grabbing softirq_ctl.lock via local_bh_disable. If later on in the critical region, someone does a local_bh_disable, we get a serv->sv_lock -> softirq_ctrl.lock dependency established. Deadlock. Fix is to make serv->sv_lock be locked with spin_lock_bh everywhere, no exceptions. [ OK ] Stopped target NFS client services. Stopping Logout off all iSCSI sessions on shutdown... Stopping NFS server and services... [ 109.442380] [ 109.442385] ====================================================== [ 109.442386] WARNING: possible circular locking dependency detected [ 109.442387] 5.10.16-rt30 #1 Not tainted [ 109.442389] ------------------------------------------------------ [ 109.442390] nfsd/1032 is trying to acquire lock: [ 109.442392] ffff994237617f60 ((softirq_ctrl.lock).lock){+.+.}-{2:2}, at: __local_bh_disable_ip+0xd9/0x270 [ 109.442405] [ 109.442405] but task is already holding lock: [ 109.442406] ffff994245cb00b0 (&serv->sv_lock){+.+.}-{0:0}, at: svc_close_list+0x1f/0x90 [ 109.442415] [ 109.442415] which lock already depends on the new lock. [ 109.442415] [ 109.442416] [ 109.442416] the existing dependency chain (in reverse order) is: [ 109.442417] [ 109.442417] -> #1 (&serv->sv_lock){+.+.}-{0:0}: [ 109.442421] rt_spin_lock+0x2b/0xc0 [ 109.442428] svc_add_new_perm_xprt+0x42/0xa0 [ 109.442430] svc_addsock+0x135/0x220 [ 109.442434] write_ports+0x4b3/0x620 [ 109.442438] nfsctl_transaction_write+0x45/0x80 [ 109.442440] vfs_write+0xff/0x420 [ 109.442444] ksys_write+0x4f/0xc0 [ 109.442446] do_syscall_64+0x33/0x40 [ 109.442450] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 109.442454] [ 109.442454] -> #0 ((softirq_ctrl.lock).lock){+.+.}-{2:2}: [ 109.442457] __lock_acquire+0x1264/0x20b0 [ 109.442463] lock_acquire+0xc2/0x400 [ 109.442466] rt_spin_lock+0x2b/0xc0 [ 109.442469] __local_bh_disable_ip+0xd9/0x270 [ 109.442471] svc_xprt_do_enqueue+0xc0/0x4d0 [ 109.442474] svc_close_list+0x60/0x90 [ 109.442476] svc_close_net+0x49/0x1a0 [ 109.442478] svc_shutdown_net+0x12/0x40 [ 109.442480] nfsd_destroy+0xc5/0x180 [ 109.442482] nfsd+0x1bc/0x270 [ 109.442483] kthread+0x194/0x1b0 [ 109.442487] ret_from_fork+0x22/0x30 [ 109.442492] [ 109.442492] other info that might help us debug this: [ 109.442492] [ 109.442493] Possible unsafe locking scenario: [ 109.442493] [ 109.442493] CPU0 CPU1 [ 109.442494] ---- ---- [ 109.442495] lock(&serv->sv_lock); [ 109.442496] lock((softirq_ctrl.lock).lock); [ 109.442498] lock(&serv->sv_lock); [ 109.442499] lock((softirq_ctrl.lock).lock); [ 109.442501] [ 109.442501] *** DEADLOCK *** [ 109.442501] [ 109.442501] 3 locks held by nfsd/1032: [ 109.442503] #0: ffffffff93b49258 (nfsd_mutex){+.+.}-{3:3}, at: nfsd+0x19a/0x270 [ 109.442508] #1: ffff994245cb00b0 (&serv->sv_lock){+.+.}-{0:0}, at: svc_close_list+0x1f/0x90 [ 109.442512] #2: ffffffff93a81b20 (rcu_read_lock){....}-{1:2}, at: rt_spin_lock+0x5/0xc0 [ 109.442518] [ 109.442518] stack backtrace: [ 109.442519] CPU: 0 PID: 1032 Comm: nfsd Not tainted 5.10.16-rt30 #1 [ 109.442522] Hardware name: Supermicro X9DRL-3F/iF/X9DRL-3F/iF, BIOS 3.2 09/22/2015 [ 109.442524] Call Trace: [ 109.442527] dump_stack+0x77/0x97 [ 109.442533] check_noncircular+0xdc/0xf0 [ 109.442546] __lock_acquire+0x1264/0x20b0 [ 109.442553] lock_acquire+0xc2/0x400 [ 109.442564] rt_spin_lock+0x2b/0xc0 [ 109.442570] __local_bh_disable_ip+0xd9/0x270 [ 109.442573] svc_xprt_do_enqueue+0xc0/0x4d0 [ 109.442577] svc_close_list+0x60/0x90 [ 109.442581] svc_close_net+0x49/0x1a0 [ 109.442585] svc_shutdown_net+0x12/0x40 [ 109.442588] nfsd_destroy+0xc5/0x180 [ 109.442590] nfsd+0x1bc/0x270 [ 109.442595] kthread+0x194/0x1b0 [ 109.442600] ret_from_fork+0x22/0x30 [ 109.518225] nfsd: last server has exited, flushing export cache [ OK ] Stopped NFSv4 ID-name mapping service. [ OK ] Stopped GSSAPI Proxy Daemon. [ OK ] Stopped NFS Mount Daemon. [ OK ] Stopped NFS status monitor for NFSv2/3 locking.. Fixes: 719f8bcc883e ("svcrpc: fix xpt_list traversal locking on shutdown") Signed-off-by: Joe Korty <joe.korty@concurrent-rt.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-20net: dsa: tag_mtk: fix 802.1ad VLAN egressDENG Qingfang
commit 9200f515c41f4cbaeffd8fdd1d8b6373a18b1b67 upstream. A different TPID bit is used for 802.1ad VLAN frames. Reported-by: Ilario Gelmetti <iochesonome@gmail.com> Fixes: f0af34317f4b ("net: dsa: mediatek: combine MediaTek tag with VLAN tag") Signed-off-by: DENG Qingfang <dqfext@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-17SUNRPC: Set memalloc_nofs_save() for sync tasksBenjamin Coddington
[ Upstream commit f0940f4b3284a00f38a5d42e6067c2aaa20e1f2e ] We could recurse into NFS doing memory reclaim while sending a sync task, which might result in a deadlock. Set memalloc_nofs_save for sync task execution. Fixes: a1231fda7e94 ("SUNRPC: Set memalloc_nofs_save() on all rpciod/xprtiod jobs") Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-17net: qrtr: fix error return code of qrtr_sendmsg()Jia-Ju Bai
commit 179d0ba0c454057a65929c46af0d6ad986754781 upstream. When sock_alloc_send_skb() returns NULL to skb, no error return code of qrtr_sendmsg() is assigned. To fix this bug, rc is assigned with -ENOMEM in this case. Fixes: 194ccc88297a ("net: qrtr: Support decoding incoming v2 packets") Reported-by: TOTE Robot <oslab@tsinghua.edu.cn> Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-17cipso,calipso: resolve a number of problems with the DOI refcountsPaul Moore
commit ad5d07f4a9cd671233ae20983848874731102c08 upstream. The current CIPSO and CALIPSO refcounting scheme for the DOI definitions is a bit flawed in that we: 1. Don't correctly match gets/puts in netlbl_cipsov4_list(). 2. Decrement the refcount on each attempt to remove the DOI from the DOI list, only removing it from the list once the refcount drops to zero. This patch fixes these problems by adding the missing "puts" to netlbl_cipsov4_list() and introduces a more conventional, i.e. not-buggy, refcounting mechanism to the DOI definitions. Upon the addition of a DOI to the DOI list, it is initialized with a refcount of one, removing a DOI from the list removes it from the list and drops the refcount by one; "gets" and "puts" behave as expected with respect to refcounts, increasing and decreasing the DOI's refcount by one. Fixes: b1edeb102397 ("netlabel: Replace protocol/NetLabel linking with refrerence counts") Fixes: d7cce01504a0 ("netlabel: Add support for removing a CALIPSO DOI.") Reported-by: syzbot+9ec037722d2603a9f52e@syzkaller.appspotmail.com Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-17net: sched: avoid duplicates in classes dumpMaximilian Heyne
commit bfc2560563586372212b0a8aeca7428975fa91fe upstream. This is a follow up of commit ea3274695353 ("net: sched: avoid duplicates in qdisc dump") which has fixed the issue only for the qdisc dump. The duplicate printing also occurs when dumping the classes via tc class show dev eth0 Fixes: 59cc1f61f09c ("net: sched: convert qdisc linked list to hashtable") Signed-off-by: Maximilian Heyne <mheyne@amazon.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-17nexthop: Do not flush blackhole nexthops when loopback goes downIdo Schimmel
commit 76c03bf8e2624076b88d93542d78e22d5345c88e upstream. As far as user space is concerned, blackhole nexthops do not have a nexthop device and therefore should not be affected by the administrative or carrier state of any netdev. However, when the loopback netdev goes down all the blackhole nexthops are flushed. This happens because internally the kernel associates blackhole nexthops with the loopback netdev. This behavior is both confusing to those not familiar with kernel internals and also diverges from the legacy API where blackhole IPv4 routes are not flushed when the loopback netdev goes down: # ip route add blackhole 198.51.100.0/24 # ip link set dev lo down # ip route show 198.51.100.0/24 blackhole 198.51.100.0/24 Blackhole IPv6 routes are flushed, but at least user space knows that they are associated with the loopback netdev: # ip -6 route show 2001:db8:1::/64 blackhole 2001:db8:1::/64 dev lo metric 1024 pref medium Fix this by only flushing blackhole nexthops when the loopback netdev is unregistered. Fixes: ab84be7e54fc ("net: Initial nexthop code") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reported-by: Donald Sharp <sharpd@nvidia.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>