summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-08-24Revert "md-raid: destroy the bitmap after destroying the thread"Guoqing Jiang
This reverts commit e151db8ecfb019b7da31d076130a794574c89f6f. Because it obviously breaks clustered raid as noticed by Neil though it fixed KASAN issue for dm-raid, let's revert it and fix KASAN issue in next commit. [1]. https://lore.kernel.org/linux-raid/a6657e08-b6a7-358b-2d2a-0ac37d49d23a@linux.dev/T/#m95ac225cab7409f66c295772483d091084a6d470 Fixes: e151db8ecfb0 ("md-raid: destroy the bitmap after destroying the thread") Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Signed-off-by: Song Liu <song@kernel.org>
2022-08-24Merge tag 'trace-v6.0-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fix from Steven Rostedt: - Fix build warning for when MODULES and FTRACE_WITH_DIRECT_CALLS are not set. A warning happens with ops_references_rec() defined but not used. * tag 'trace-v6.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ftrace: Fix build warning for ops_references_rec() not used
2022-08-24md: Flush workqueue md_rdev_misc_wq in md_alloc()David Sloan
A race condition still exists when removing and re-creating md devices in test cases. However, it is only seen on some setups. The race condition was tracked down to a reference still being held to the kobject by the rdev in the md_rdev_misc_wq which will be released in rdev_delayed_delete(). md_alloc() waits for previous deletions by waiting on the md_misc_wq, but the md_rdev_misc_wq may still be holding a reference to a recently removed device. To fix this, also flush the md_rdev_misc_wq in md_alloc(). Signed-off-by: David Sloan <david.sloan@eideticom.com> [logang@deltatee.com: rewrote commit message] Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Song Liu <song@kernel.org>
2022-08-24md/raid10: Fix the data type of an r10_sync_page_io() argumentBart Van Assche
Fix the following sparse warning: drivers/md/raid10.c:2647:60: sparse: sparse: incorrect type in argument 5 (different base types) @@ expected restricted blk_opf_t [usertype] opf @@ got int rw @@ This patch does not change any functionality since REQ_OP_READ = READ = 0 and since REQ_OP_WRITE = WRITE = 1. Cc: Rong A Chen <rong.a.chen@intel.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Paul Menzel <pmenzel@molgen.mpg.de> Fixes: 4ce4c73f662b ("md/core: Combine two sync_page_io() arguments") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Song Liu <song@kernel.org>
2022-08-24Merge branch 'dmi-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging Pull dmi update from Jean Delvare. Tiny cleanup. * 'dmi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging: firmware: dmi: Use the proper accessor for the version field
2022-08-24io_uring: conditional ->async_data allocationPavel Begunkov
There are opcodes that need ->async_data only in some cases and allocation it unconditionally may hurt performance. Add an option to opdef to make move the allocation part from the core io_uring to opcode specific code. Note, we can't just set opdef->async_size to zero because there are other helpers that rely on it, e.g. io_alloc_async_data(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9dc62be9e88dd0ed63c48365340e8922d2498293.1661342812.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-24io_uring/notif: order notif vs send CQEsPavel Begunkov
Currently, there is no ordering between notification CQEs and completions of the send flushing it, this quite complicates the userspace, especially since we don't flush notification when the send(+flush) request fails, i.e. there will be only one CQE. What we can do is to make sure that notification completions come only after sends. The easiest way to achieve this is to not try to complete a notification inline from io_sendzc() but defer it to task_work, considering that io-wq sendzc is disallowed CQEs will be naturally ordered because task_works will only be executed after we're done with submission and so inline completion. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/cddfd1c2bf91f22b9fe08e13b7dffdd8f858a151.1661342812.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-24io_uring/net: fix indentationPavel Begunkov
Fix up indentation before we get complaints from tooling. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/bd5754e3764215ccd7fb04cd636ea9167aaa275d.1661342812.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-24io_uring/net: fix zc send link failingPavel Begunkov
Failed requests should be marked with req_set_fail(), so links and cqe skipping work correctly, which is missing in io_sendzc(). Note, io_sendzc() return IOU_OK on failure, so the core code won't do the cleanup for us. Fixes: 06a5464be84e4 ("io_uring: wire send zc request type") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e47d46fda9db30154ce66a549bb0d3380b780520.1661342812.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-24io_uring/net: fix must_hold annotationPavel Begunkov
Fix up the io_alloc_notif()'s __must_hold as we don't have a ctx argument there but should get it from the slot instead. Reported-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/cbb0a920f18e0aed590bf58300af817b9befb8a3.1661342812.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-24loop: Check for overflow while configuring loopSiddh Raman Pant
The userspace can configure a loop using an ioctl call, wherein a configuration of type loop_config is passed (see lo_ioctl()'s case on line 1550 of drivers/block/loop.c). This proceeds to call loop_configure() which in turn calls loop_set_status_from_info() (see line 1050 of loop.c), passing &config->info which is of type loop_info64*. This function then sets the appropriate values, like the offset. loop_device has lo_offset of type loff_t (see line 52 of loop.c), which is typdef-chained to long long, whereas loop_info64 has lo_offset of type __u64 (see line 56 of include/uapi/linux/loop.h). The function directly copies offset from info to the device as follows (See line 980 of loop.c): lo->lo_offset = info->lo_offset; This results in an overflow, which triggers a warning in iomap_iter() due to a call to iomap_iter_done() which has: WARN_ON_ONCE(iter->iomap.offset > iter->pos); Thus, check for negative value during loop_set_status_from_info(). Bug report: https://syzkaller.appspot.com/bug?id=c620fe14aac810396d3c3edc9ad73848bf69a29e Reported-and-tested-by: syzbot+a8e049cd3abd342936b6@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Siddh Raman Pant <code@siddh.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220823160810.181275-1-code@siddh.me Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-24Merge branch 'sysctl-data-races'David S. Miller
Kuniyuki Iwashima says: ==================== net: sysctl: Fix data-races around net.core.XXX This series fixes data-races around all knobs in net_core_table and netns_core_table except for bpf stuff. These knobs are skipped: - 4 bpf knobs - netdev_rss_key: Written only once by net_get_random_once() and read-only knob - rps_sock_flow_entries: Protected with sock_flow_mutex - flow_limit_cpu_bitmap: Protected with flow_limit_update_mutex - flow_limit_table_len: Protected with flow_limit_update_mutex - default_qdisc: Protected with qdisc_mod_lock - warnings: Unused - high_order_alloc_disable: Protected with static_key_mutex - skb_defer_max: Already using READ_ONCE() - sysctl_txrehash: Already using READ_ONCE() Note 5th patch fixes net.core.message_cost and net.core.message_burst, and lib/ratelimit.c does not have an explicit maintainer. Changes: v3: * Fix build failures of CONFIG_SYSCTL=n case in 13th & 14th patches v2: https://lore.kernel.org/netdev/20220818035227.81567-1-kuniyu@amazon.com/ * Remove 4 bpf knobs and added 6 knobs v1: https://lore.kernel.org/netdev/20220816052347.70042-1-kuniyu@amazon.com/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24net: Fix a data-race around sysctl_somaxconn.Kuniyuki Iwashima
While reading sysctl_somaxconn, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24net: Fix a data-race around netdev_unregister_timeout_secs.Kuniyuki Iwashima
While reading netdev_unregister_timeout_secs, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: 5aa3afe107d9 ("net: make unregister netdev warning timeout configurable") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24net: Fix a data-race around gro_normal_batch.Kuniyuki Iwashima
While reading gro_normal_batch, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: 323ebb61e32b ("net: use listified RX for handling GRO_NORMAL skbs") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24net: Fix data-races around sysctl_devconf_inherit_init_net.Kuniyuki Iwashima
While reading sysctl_devconf_inherit_init_net, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: 856c395cfa63 ("net: introduce a knob to control whether to inherit devconf config") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24net: Fix data-races around sysctl_fb_tunnels_only_for_init_net.Kuniyuki Iwashima
While reading sysctl_fb_tunnels_only_for_init_net, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: 79134e6ce2c9 ("net: do not create fallback tunnels for non-default namespaces") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24net: Fix a data-race around netdev_budget_usecs.Kuniyuki Iwashima
While reading netdev_budget_usecs, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: 7acf8a1e8a28 ("Replace 2 jiffies with sysctl netdev_budget_usecs to enable softirq tuning") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24net: Fix data-races around sysctl_max_skb_frags.Kuniyuki Iwashima
While reading sysctl_max_skb_frags, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: 5f74f82ea34c ("net:Add sysctl_max_skb_frags") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24net: Fix a data-race around netdev_budget.Kuniyuki Iwashima
While reading netdev_budget, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: 51b0bdedb8e7 ("[NET]: Separate two usages of netdev_max_backlog.") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24net: Fix a data-race around sysctl_net_busy_read.Kuniyuki Iwashima
While reading sysctl_net_busy_read, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: 2d48d67fa8cd ("net: poll/select low latency socket support") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24net: Fix a data-race around sysctl_net_busy_poll.Kuniyuki Iwashima
While reading sysctl_net_busy_poll, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: 060212928670 ("net: add low latency socket poll") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24net: Fix a data-race around sysctl_tstamp_allow_data.Kuniyuki Iwashima
While reading sysctl_tstamp_allow_data, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: b245be1f4db1 ("net-timestamp: no-payload only sysctl") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24net: Fix data-races around sysctl_optmem_max.Kuniyuki Iwashima
While reading sysctl_optmem_max, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24ratelimit: Fix data-races in ___ratelimit().Kuniyuki Iwashima
While reading rs->interval and rs->burst, they can be changed concurrently via sysctl (e.g. net_ratelimit_state). Thus, we need to add READ_ONCE() to their readers. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24net: Fix data-races around netdev_tstamp_prequeue.Kuniyuki Iwashima
While reading netdev_tstamp_prequeue, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: 3b098e2d7c69 ("net: Consistent skb timestamping") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24net: Fix data-races around netdev_max_backlog.Kuniyuki Iwashima
While reading netdev_max_backlog, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. While at it, we remove the unnecessary spaces in the doc. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24net: Fix data-races around weight_p and dev_weight_[rt]x_bias.Kuniyuki Iwashima
While reading weight_p, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Also, dev_[rt]x_weight can be read/written at the same time. So, we need to use READ_ONCE() and WRITE_ONCE() for its access. Moreover, to use the same weight_p while changing dev_[rt]x_weight, we add a mutex in proc_do_dev_weight(). Fixes: 3d48b53fb2ae ("net: dev_weight: TX/RX orthogonality") Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24net: Fix data-races around sysctl_[rw]mem_(max|default).Kuniyuki Iwashima
While reading sysctl_[rw]mem_(max|default), they can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24net/core/skbuff: Check the return value of skb_copy_bits()lily
skb_copy_bits() could fail, which requires a check on the return value. Signed-off-by: Li Zhong <floridsleeves@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2022-08-24 1) Fix a refcount leak in __xfrm_policy_check. From Xin Xiong. 2) Revert "xfrm: update SA curlft.use_time". This violates RFC 2367. From Antony Antony. 3) Fix a comment on XFRMA_LASTUSED. From Antony Antony. 4) x->lastused is not cloned in xfrm_do_migrate. Fix from Antony Antony. 5) Serialize the calls to xfrm_probe_algs. From Herbert Xu. 6) Fix a null pointer dereference of dst->dev on a metadata dst in xfrm_lookup_with_ifid. From Nikolay Aleksandrov. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24fec: Restart PPS after link state changeCsókás Bence
On link state change, the controller gets reset, causing PPS to drop out and the PHC to lose its time and calibration. So we restart it if needed, restoring calibration and time registers. Changes since v2: * Add `fec_ptp_save_state()`/`fec_ptp_restore_state()` * Use `ktime_get_real_ns()` * Use `BIT()` macro Changes since v1: * More ECR #define's * Stop PPS in `fec_ptp_stop()` Signed-off-by: Csókás Bence <csokas.bence@prolan.hu> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24net: neigh: don't call kfree_skb() under spin_lock_irqsave()Yang Yingliang
It is not allowed to call kfree_skb() from hardware interrupt context or with interrupts being disabled. So add all skb to a tmp list, then free them after spin_unlock_irqrestore() at once. Fixes: 66ba215cb513 ("neigh: fix possible DoS due to net iface start/stop loop") Suggested-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24netfilter: nf_defrag_ipv6: allow nf_conntrack_frag6_high_thresh increasesEric Dumazet
Currently, net.netfilter.nf_conntrack_frag6_high_thresh can only be lowered. I found this issue while investigating a probable kernel issue causing flakes in tools/testing/selftests/net/ip_defrag.sh In particular, these sysctl changes were ignored: ip netns exec "${NETNS}" sysctl -w net.netfilter.nf_conntrack_frag6_high_thresh=9000000 >/dev/null 2>&1 ip netns exec "${NETNS}" sysctl -w net.netfilter.nf_conntrack_frag6_low_thresh=7000000 >/dev/null 2>&1 This change is inline with commit 836196239298 ("net/ipfrag: let ip[6]frag_high_thresh in ns be higher than in init_net") Fixes: 8db3d41569bb ("netfilter: nf_defrag_ipv6: use net_generic infra") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-24netfilter: flowtable: fix stuck flows on cleanup due to pending workPablo Neira Ayuso
To clear the flow table on flow table free, the following sequence normally happens in order: 1) gc_step work is stopped to disable any further stats/del requests. 2) All flow table entries are set to teardown state. 3) Run gc_step which will queue HW del work for each flow table entry. 4) Waiting for the above del work to finish (flush). 5) Run gc_step again, deleting all entries from the flow table. 6) Flow table is freed. But if a flow table entry already has pending HW stats or HW add work step 3 will not queue HW del work (it will be skipped), step 4 will wait for the pending add/stats to finish, and step 5 will queue HW del work which might execute after freeing of the flow table. To fix the above, this patch flushes the pending work, then it sets the teardown flag to all flows in the flowtable and it forces a garbage collector run to queue work to remove the flows from hardware, then it flushes this new pending work and (finally) it forces another garbage collector run to remove the entry from the software flowtable. Stack trace: [47773.882335] BUG: KASAN: use-after-free in down_read+0x99/0x460 [47773.883634] Write of size 8 at addr ffff888103b45aa8 by task kworker/u20:6/543704 [47773.885634] CPU: 3 PID: 543704 Comm: kworker/u20:6 Not tainted 5.12.0-rc7+ #2 [47773.886745] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009) [47773.888438] Workqueue: nf_ft_offload_del flow_offload_work_handler [nf_flow_table] [47773.889727] Call Trace: [47773.890214] dump_stack+0xbb/0x107 [47773.890818] print_address_description.constprop.0+0x18/0x140 [47773.892990] kasan_report.cold+0x7c/0xd8 [47773.894459] kasan_check_range+0x145/0x1a0 [47773.895174] down_read+0x99/0x460 [47773.899706] nf_flow_offload_tuple+0x24f/0x3c0 [nf_flow_table] [47773.907137] flow_offload_work_handler+0x72d/0xbe0 [nf_flow_table] [47773.913372] process_one_work+0x8ac/0x14e0 [47773.921325] [47773.921325] Allocated by task 592159: [47773.922031] kasan_save_stack+0x1b/0x40 [47773.922730] __kasan_kmalloc+0x7a/0x90 [47773.923411] tcf_ct_flow_table_get+0x3cb/0x1230 [act_ct] [47773.924363] tcf_ct_init+0x71c/0x1156 [act_ct] [47773.925207] tcf_action_init_1+0x45b/0x700 [47773.925987] tcf_action_init+0x453/0x6b0 [47773.926692] tcf_exts_validate+0x3d0/0x600 [47773.927419] fl_change+0x757/0x4a51 [cls_flower] [47773.928227] tc_new_tfilter+0x89a/0x2070 [47773.936652] [47773.936652] Freed by task 543704: [47773.937303] kasan_save_stack+0x1b/0x40 [47773.938039] kasan_set_track+0x1c/0x30 [47773.938731] kasan_set_free_info+0x20/0x30 [47773.939467] __kasan_slab_free+0xe7/0x120 [47773.940194] slab_free_freelist_hook+0x86/0x190 [47773.941038] kfree+0xce/0x3a0 [47773.941644] tcf_ct_flow_table_cleanup_work Original patch description and stack trace by Paul Blakey. Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support") Reported-by: Paul Blakey <paulb@nvidia.com> Tested-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-24netfilter: flowtable: add function to invoke garbage collection immediatelyPablo Neira Ayuso
Expose nf_flow_table_gc_run() to force a garbage collector run from the offload infrastructure. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-24netfilter: nf_tables: disallow binding to already bound chainPablo Neira Ayuso
Update nft_data_init() to report EINVAL if chain is already bound. Fixes: d0e2c7de92c7 ("netfilter: nf_tables: add NFT_CHAIN_BINDING") Reported-by: Gwangun Jung <exsociety@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-24netfilter: nft_tunnel: restrict it to netdev familyPablo Neira Ayuso
Only allow to use this expression from NFPROTO_NETDEV family. Fixes: af308b94a2a4 ("netfilter: nf_tables: add tunnel support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-24netfilter: nft_osf: restrict osf to ipv4, ipv6 and inet familiesPablo Neira Ayuso
As it was originally intended, restrict extension to supported families. Fixes: b96af92d6eaf ("netfilter: nf_tables: implement Passive OS fingerprint module in nft_osf") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-24netfilter: nf_tables: do not leave chain stats enabled on errorPablo Neira Ayuso
Error might occur later in the nf_tables_addchain() codepath, enable static key only after transaction has been created. Fixes: 9f08ea848117 ("netfilter: nf_tables: keep chain counters away from hot path") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-24netfilter: nft_payload: do not truncate csum_offset and csum_typePablo Neira Ayuso
Instead report ERANGE if csum_offset is too long, and EOPNOTSUPP if type is not support. Fixes: 7ec3f7b47b8d ("netfilter: nft_payload: add packet mangling support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-24netfilter: nft_payload: report ERANGE for too long offset and lengthPablo Neira Ayuso
Instead of offset and length are truncation to u8, report ERANGE. Fixes: 96518518cc41 ("netfilter: add nftables") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-24netfilter: nf_tables: make table handle allocation per-netns friendlyPablo Neira Ayuso
mutex is per-netns, move table_netns to the pernet area. *read-write* to 0xffffffff883a01e8 of 8 bytes by task 6542 on cpu 0: nf_tables_newtable+0x6dc/0xc00 net/netfilter/nf_tables_api.c:1221 nfnetlink_rcv_batch net/netfilter/nfnetlink.c:513 [inline] nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:634 [inline] nfnetlink_rcv+0xa6a/0x13a0 net/netfilter/nfnetlink.c:652 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline] netlink_unicast+0x652/0x730 net/netlink/af_netlink.c:1345 netlink_sendmsg+0x643/0x740 net/netlink/af_netlink.c:1921 Fixes: f102d66b335a ("netfilter: nf_tables: use dedicated mutex to guard transactions") Reported-by: Abhishek Shah <abhishek.shah@columbia.edu> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-24netfilter: nf_tables: disallow updates of implicit chainPablo Neira Ayuso
Updates on existing implicit chain make no sense, disallow this. Fixes: d0e2c7de92c7 ("netfilter: nf_tables: add NFT_CHAIN_BINDING") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-23Merge tag 'cgroup-for-6.0-rc2-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: - The psi data structure was changed to be allocated dynamically but it wasn't being cleared leading to it reporting garbage values and triggering spurious oom kills. - A deadlock involving cpuset and cpu hotplug. - When a controller is moved across cgroup hierarchies, css->rstat_css_node didn't get RCU drained properly from the previous list. * tag 'cgroup-for-6.0-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: Fix race condition at rebind_subsystems() cgroup: Fix threadgroup_rwsem <-> cpus_read_lock() deadlock sched/psi: Remove redundant cgroup_psi() when !CONFIG_CGROUPS sched/psi: Remove unused parameter nbytes of psi_trigger_create() sched/psi: Zero the memory of struct psi_group
2022-08-23Merge tag 'audit-pr-20220823' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit Pull audit fix from Paul Moore: "A single fix for a potential double-free on a fsnotify error path" * tag 'audit-pr-20220823' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit: fix potential double free on error path from fsnotify_add_inode_mark
2022-08-23Merge tag 'fs.fixes.v6.0-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping Pull file_remove_privs() fix from Christian Brauner: "As part of Stefan's and Jens' work to add async buffered write support to xfs we refactored file_remove_privs() and added __file_remove_privs() to avoid calling __remove_privs() when IOCB_NOWAIT is passed. While debugging a recent performance regression report I found that during review we missed that commit faf99b563558 ("fs: add __remove_file_privs() with flags parameter") accidently changed behavior when dentry_needs_remove_privs() returns zero. Before the commit it would still call inode_has_no_xattr() setting the S_NOSEC bit and thereby avoiding even calling into dentry_needs_remove_privs() the next time this function is called. After that commit inode_has_no_xattr() would only be called if __remove_privs() had to be called. Restore the old behavior. This is likely the cause of the performance regression" * tag 'fs.fixes.v6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping: fs: __file_remove_privs(): restore call to inode_has_no_xattr()
2022-08-23Merge tag 'mlx5-fixes-2022-08-22' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 fixes 2022-08-22 This series provides bug fixes to mlx5 driver. * tag 'mlx5-fixes-2022-08-22' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5: Unlock on error in mlx5_sriov_enable() net/mlx5e: Fix use after free in mlx5e_fs_init() net/mlx5e: kTLS, Use _safe() iterator in mlx5e_tls_priv_tx_list_cleanup() net/mlx5: unlock on error path in esw_vfs_changed_event_handler() net/mlx5e: Fix wrong tc flag used when set hw-tc-offload off net/mlx5e: TC, Add missing policer validation net/mlx5e: Fix wrong application of the LRO state net/mlx5: Avoid false positive lockdep warning by adding lock_class_key net/mlx5: Fix cmd error logging for manage pages cmd net/mlx5: Disable irq when locking lag_lock net/mlx5: Eswitch, Fix forwarding decision to uplink net/mlx5: LAG, fix logic over MLX5_LAG_FLAG_NDEVS_READY net/mlx5e: Properly disable vlan strip on non-UL reps ==================== Link: https://lore.kernel.org/r/20220822195917.216025-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-23Merge branch '100GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== ice: xsk: reduced queue count fixes Maciej Fijalkowski says: this small series is supposed to fix the issues around AF_XDP usage with reduced queue count on interface. Due to the XDP rings setup, some configurations can result in sockets not seeing traffic flowing. More about this in description of patch 2. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: ice: xsk: use Rx ring's XDP ring when picking NAPI context ice: xsk: prohibit usage of non-balanced queue id ==================== Link: https://lore.kernel.org/r/20220822163257.2382487-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-23Merge branch 'bnxt_en-bug-fixes'Jakub Kicinski
Michael Chan says: ==================== bnxt_en: Bug fixes This series includes 2 fixes for regressions introduced by the XDP multi-buffer feature, 1 devlink reload bug fix, and 1 SRIOV resource accounting bug fix. ==================== Link: https://lore.kernel.org/r/1661180814-19350-1-git-send-email-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>