summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2022-06-16x86/speculation/mmio: Add sysfs reporting for Processor MMIO Stale DataPawan Gupta
commit 8d50cdf8b8341770bc6367bce40c0c1bb0e1d5b3 upstream Add the sysfs reporting file for Processor MMIO Stale Data vulnerability. It exposes the vulnerability and mitigation state similar to the existing files for the other hardware vulnerabilities. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-14xsk: Fix possible crash when multiple sockets are createdMaciej Fijalkowski
commit ba3beec2ec1d3b4fd8672ca6e781dac4b3267f6e upstream. Fix a crash that happens if an Rx only socket is created first, then a second socket is created that is Tx only and bound to the same umem as the first socket and also the same netdev and queue_id together with the XDP_SHARED_UMEM flag. In this specific case, the tx_descs array page pool was not created by the first socket as it was an Rx only socket. When the second socket is bound it needs this tx_descs array of this shared page pool as it has a Tx component, but unfortunately it was never allocated, leading to a crash. Note that this array is only used for zero-copy drivers using the batched Tx APIs, currently only ice and i40e. [ 5511.150360] BUG: kernel NULL pointer dereference, address: 0000000000000008 [ 5511.158419] #PF: supervisor write access in kernel mode [ 5511.164472] #PF: error_code(0x0002) - not-present page [ 5511.170416] PGD 0 P4D 0 [ 5511.173347] Oops: 0002 [#1] PREEMPT SMP PTI [ 5511.178186] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G E 5.18.0-rc1+ #97 [ 5511.187245] Hardware name: Intel Corp. GRANTLEY/GRANTLEY, BIOS GRRFCRB1.86B.0276.D07.1605190235 05/19/2016 [ 5511.198418] RIP: 0010:xsk_tx_peek_release_desc_batch+0x198/0x310 [ 5511.205375] Code: c0 83 c6 01 84 c2 74 6d 8d 46 ff 23 07 44 89 e1 48 83 c0 14 48 c1 e1 04 48 c1 e0 04 48 03 47 10 4c 01 c1 48 8b 50 08 48 8b 00 <48> 89 51 08 48 89 01 41 80 bd d7 00 00 00 00 75 82 48 8b 19 49 8b [ 5511.227091] RSP: 0018:ffffc90000003dd0 EFLAGS: 00010246 [ 5511.233135] RAX: 0000000000000000 RBX: ffff88810c8da600 RCX: 0000000000000000 [ 5511.241384] RDX: 000000000000003c RSI: 0000000000000001 RDI: ffff888115f555c0 [ 5511.249634] RBP: ffffc90000003e08 R08: 0000000000000000 R09: ffff889092296b48 [ 5511.257886] R10: 0000ffffffffffff R11: ffff889092296800 R12: 0000000000000000 [ 5511.266138] R13: ffff88810c8db500 R14: 0000000000000040 R15: 0000000000000100 [ 5511.274387] FS: 0000000000000000(0000) GS:ffff88903f800000(0000) knlGS:0000000000000000 [ 5511.283746] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 5511.290389] CR2: 0000000000000008 CR3: 00000001046e2001 CR4: 00000000003706f0 [ 5511.298640] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 5511.306892] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 5511.315142] Call Trace: [ 5511.317972] <IRQ> [ 5511.320301] ice_xmit_zc+0x68/0x2f0 [ice] [ 5511.324977] ? ktime_get+0x38/0xa0 [ 5511.328913] ice_napi_poll+0x7a/0x6a0 [ice] [ 5511.333784] __napi_poll+0x2c/0x160 [ 5511.337821] net_rx_action+0xdd/0x200 [ 5511.342058] __do_softirq+0xe6/0x2dd [ 5511.346198] irq_exit_rcu+0xb5/0x100 [ 5511.350339] common_interrupt+0xa4/0xc0 [ 5511.354777] </IRQ> [ 5511.357201] <TASK> [ 5511.359625] asm_common_interrupt+0x1e/0x40 [ 5511.364466] RIP: 0010:cpuidle_enter_state+0xd2/0x360 [ 5511.370211] Code: 49 89 c5 0f 1f 44 00 00 31 ff e8 e9 00 7b ff 45 84 ff 74 12 9c 58 f6 c4 02 0f 85 72 02 00 00 31 ff e8 02 0c 80 ff fb 45 85 f6 <0f> 88 11 01 00 00 49 63 c6 4c 2b 2c 24 48 8d 14 40 48 8d 14 90 49 [ 5511.391921] RSP: 0018:ffffffff82a03e60 EFLAGS: 00000202 [ 5511.397962] RAX: ffff88903f800000 RBX: 0000000000000001 RCX: 000000000000001f [ 5511.406214] RDX: 0000000000000000 RSI: ffffffff823400b9 RDI: ffffffff8234c046 [ 5511.424646] RBP: ffff88810a384800 R08: 000005032a28c046 R09: 0000000000000008 [ 5511.443233] R10: 000000000000000b R11: 0000000000000006 R12: ffffffff82bcf700 [ 5511.461922] R13: 000005032a28c046 R14: 0000000000000001 R15: 0000000000000000 [ 5511.480300] cpuidle_enter+0x29/0x40 [ 5511.494329] do_idle+0x1c7/0x250 [ 5511.507610] cpu_startup_entry+0x19/0x20 [ 5511.521394] start_kernel+0x649/0x66e [ 5511.534626] secondary_startup_64_no_verify+0xc3/0xcb [ 5511.549230] </TASK> Detect such case during bind() and allocate this memory region via newly introduced xp_alloc_tx_descs(). Also, use kvcalloc instead of kcalloc as for other buffer pool allocations, so that it matches the kvfree() from xp_destroy(). Fixes: d1bc532e99be ("i40e: xsk: Move tmp desc array from driver to pool") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20220425153745.481322-1-maciej.fijalkowski@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-14random: mark bootloader randomness code as __initJason A. Donenfeld
commit 39e0f991a62ed5efabd20711a7b6e7da92603170 upstream. add_bootloader_randomness() and the variables it touches are only used during __init and not after, so mark these as __init. At the same time, unexport this, since it's only called by other __init code that's built-in. Cc: stable@vger.kernel.org Fixes: 428826f5358c ("fdt: add support for rng-seed") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-14nodemask: Fix return values to be unsignedKees Cook
[ Upstream commit 0dfe54071d7c828a02917b595456bfde1afdddc9 ] The nodemask routines had mixed return values that provided potentially signed return values that could never happen. This was leading to the compiler getting confusing about the range of possible return values (it was thinking things could be negative where they could not be). Fix all the nodemask routines that should be returning unsigned (or bool) values. Silences: mm/swapfile.c: In function ‘setup_swap_info’: mm/swapfile.c:2291:47: error: array subscript -1 is below array bounds of ‘struct plist_node[]’ [-Werror=array-bounds] 2291 | p->avail_lists[i].prio = 1; | ~~~~~~~~~~~~~~^~~ In file included from mm/swapfile.c:16: ./include/linux/swap.h:292:27: note: while referencing ‘avail_lists’ 292 | struct plist_node avail_lists[]; /* | ^~~~~~~~~~~ Reported-by: Christophe de Dinechin <dinechin@redhat.com> Link: https://lore.kernel.org/lkml/20220414150855.2407137-3-dinechin@redhat.com/ Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Yury Norov <yury.norov@gmail.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14jump_label,noinstr: Avoid instrumentation for JUMP_LABEL=n buildsPeter Zijlstra
[ Upstream commit 656d054e0a15ec327bd82801ccd58201e59f6896 ] When building x86_64 with JUMP_LABEL=n it's possible for instrumentation to sneak into noinstr: vmlinux.o: warning: objtool: exit_to_user_mode+0x14: call to static_key_count.constprop.0() leaves .noinstr.text section vmlinux.o: warning: objtool: syscall_exit_to_user_mode+0x2d: call to static_key_count.constprop.0() leaves .noinstr.text section vmlinux.o: warning: objtool: irqentry_exit_to_user_mode+0x1b: call to static_key_count.constprop.0() leaves .noinstr.text section Switch to arch_ prefixed atomic to avoid the explicit instrumentation. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14extcon: Fix extcon_get_extcon_dev() error handlingDan Carpenter
[ Upstream commit 58e4a2d27d3255e4e8c507fdc13734dccc9fc4c7 ] The extcon_get_extcon_dev() function returns error pointers on error, NULL when it's a -EPROBE_DEFER defer situation, and ERR_PTR(-ENODEV) when the CONFIG_EXTCON option is disabled. This is very complicated for the callers to handle and a number of them had bugs that would lead to an Oops. In real life, there are two things which prevented crashes. First, error pointers would only be returned if there was bug in the caller where they passed a NULL "extcon_name" and none of them do that. Second, only two out of the eight drivers will build when CONFIG_EXTCON is disabled. The normal way to write this would be to return -EPROBE_DEFER directly when appropriate and return NULL when CONFIG_EXTCON is disabled. Then the error handling is simple and just looks like: dev->edev = extcon_get_extcon_dev(acpi_dev_name(adev)); if (IS_ERR(dev->edev)) return PTR_ERR(dev->edev); For the two drivers which can build with CONFIG_EXTCON disabled, then extcon_get_extcon_dev() will now return NULL which is not treated as an error and the probe will continue successfully. Those two drivers are "typec_fusb302" and "max8997-battery". In the original code, the typec_fusb302 driver had an 800ms hang in tcpm_get_current_limit() but now that function is a no-op. For the max8997-battery driver everything should continue working as is. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Sebastian Reichel <sebastian.reichel@collabora.com> Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14iio: st_sensors: Add a local lock for protecting odrMiquel Raynal
[ Upstream commit 474010127e2505fc463236470908e1ff5ddb3578 ] Right now the (framework) mlock lock is (ab)used for multiple purposes: 1- protecting concurrent accesses over the odr local cache 2- avoid changing samplig frequency whilst buffer is running Let's start by handling situation #1 with a local lock. Suggested-by: Jonathan Cameron <jic23@kernel.org> Cc: Denis Ciocca <denis.ciocca@st.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/r/20220207143840.707510-7-miquel.raynal@bootlin.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14i40e: xsk: Move tmp desc array from driver to poolMagnus Karlsson
[ Upstream commit d1bc532e99becf104635ed4da6fefa306f452321 ] Move desc_array from the driver to the pool. The reason behind this is that we can then reuse this array as a temporary storage for descriptors in all zero-copy drivers that use the batched interface. This will make it easier to add batching to more drivers. i40e is the only driver that has a batched Tx zero-copy implementation, so no need to touch any other driver. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Link: https://lore.kernel.org/bpf/20220125160446.78976-6-maciej.fijalkowski@intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14netfilter: nf_tables: bail out early if hardware offload is not supportedPablo Neira Ayuso
[ Upstream commit 3a41c64d9c1185a2f3a184015e2a9b78bfc99c71 ] If user requests for NFT_CHAIN_HW_OFFLOAD, then check if either device provides the .ndo_setup_tc interface or there is an indirect flow block that has been registered. Otherwise, bail out early from the preparation phase. Moreover, validate that family == NFPROTO_NETDEV and hook is NF_NETDEV_INGRESS. Fixes: c9626a2cbdb2 ("netfilter: nf_tables: add hardware offload support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14netfilter: nf_tables: delete flowtable hooks via transaction listPablo Neira Ayuso
[ Upstream commit b6d9014a3335194590abdd2a2471ef5147a67645 ] Remove inactive bool field in nft_hook object that was introduced in abadb2f865d7 ("netfilter: nf_tables: delete devices from flowtable"). Move stale flowtable hooks to transaction list instead. Deleting twice the same device does not result in ENOENT. Fixes: abadb2f865d7 ("netfilter: nf_tables: delete devices from flowtable") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14net: sched: add barrier to fix packet stuck problem for lockless qdiscGuoju Fang
[ Upstream commit 2e8728c955ce0624b958eee6e030a37aca3a5d86 ] In qdisc_run_end(), the spin_unlock() only has store-release semantic, which guarantees all earlier memory access are visible before it. But the subsequent test_bit() has no barrier semantics so may be reordered ahead of the spin_unlock(). The store-load reordering may cause a packet stuck problem. The concurrent operations can be described as below, CPU 0 | CPU 1 qdisc_run_end() | qdisc_run_begin() . | . ----> /* may be reorderd here */ | . | . | . | spin_unlock() | set_bit() | . | smp_mb__after_atomic() ---- test_bit() | spin_trylock() . | . Consider the following sequence of events: CPU 0 reorder test_bit() ahead and see MISSED = 0 CPU 1 calls set_bit() CPU 1 calls spin_trylock() and return fail CPU 0 executes spin_unlock() At the end of the sequence, CPU 0 calls spin_unlock() and does nothing because it see MISSED = 0. The skb on CPU 1 has beed enqueued but no one take it, until the next cpu pushing to the qdisc (if ever ...) will notice and dequeue it. This patch fix this by adding one explicit barrier. As spin_unlock() and test_bit() ordering is a store-load ordering, a full memory barrier smp_mb() is needed here. Fixes: a90c57f2cedd ("net: sched: fix packet stuck problem for lockless qdisc") Signed-off-by: Guoju Fang <gjfang@linux.alibaba.com> Link: https://lore.kernel.org/r/20220528101628.120193-1-gjfang@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14net/mlx5: correct ECE offset in query qp outputChangcheng Liu
[ Upstream commit 3fc2a9e89b3508a5cc0c324f26d7b4740ba8c456 ] ECE field should be after opt_param_mask in query qp output. Fixes: 6b646a7e4af6 ("net/mlx5: Add ability to read and write ECE options") Signed-off-by: Changcheng Liu <jerrliu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14tcp: add accessors to read/set tp->snd_cwndEric Dumazet
[ Upstream commit 40570375356c874b1578e05c1dcc3ff7c1322dbe ] We had various bugs over the years with code breaking the assumption that tp->snd_cwnd is greater than zero. Lately, syzbot reported the WARN_ON_ONCE(!tp->prior_cwnd) added in commit 8b8a321ff72c ("tcp: fix zero cwnd in tcp_cwnd_reduction") can trigger, and without a repro we would have to spend considerable time finding the bug. Instead of complaining too late, we want to catch where and when tp->snd_cwnd is set to an illegal value. Signed-off-by: Eric Dumazet <edumazet@google.com> Suggested-by: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Link: https://lore.kernel.org/r/20220405233538.947344-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14net: sched: fixed barrier to prevent skbuff sticking in qdisc backlogVincent Ray
[ Upstream commit a54ce3703613e41fe1d98060b62ec09a3984dc28 ] In qdisc_run_begin(), smp_mb__before_atomic() used before test_bit() does not provide any ordering guarantee as test_bit() is not an atomic operation. This, added to the fact that the spin_trylock() call at the beginning of qdisc_run_begin() does not guarantee acquire semantics if it does not grab the lock, makes it possible for the following statement : if (test_bit(__QDISC_STATE_MISSED, &qdisc->state)) to be executed before an enqueue operation called before qdisc_run_begin(). As a result the following race can happen : CPU 1 CPU 2 qdisc_run_begin() qdisc_run_begin() /* true */ set(MISSED) . /* returns false */ . . /* sees MISSED = 1 */ . /* so qdisc not empty */ . __qdisc_run() . . . pfifo_fast_dequeue() ----> /* may be done here */ . | . clear(MISSED) | . . | . smp_mb __after_atomic(); | . . | . /* recheck the queue */ | . /* nothing => exit */ | enqueue(skb1) | . | qdisc_run_begin() | . | spin_trylock() /* fail */ | . | smp_mb__before_atomic() /* not enough */ | . ---- if (test_bit(MISSED)) return false; /* exit */ In the above scenario, CPU 1 and CPU 2 both try to grab the qdisc->seqlock at the same time. Only CPU 2 succeeds and enters the bypass code path, where it emits its skb then calls __qdisc_run(). CPU1 fails, sets MISSED and goes down the traditionnal enqueue() + dequeue() code path. But when executing qdisc_run_begin() for the second time, after enqueuing its skbuff, it sees the MISSED bit still set (by itself) and consequently chooses to exit early without setting it again nor trying to grab the spinlock again. Meanwhile CPU2 has seen MISSED = 1, cleared it, checked the queue and found it empty, so it returned. At the end of the sequence, we end up with skb1 enqueued in the backlog, both CPUs out of __dev_xmit_skb(), the MISSED bit not set, and no __netif_schedule() called made. skb1 will now linger in the qdisc until somebody later performs a full __qdisc_run(). Associated to the bypass capacity of the qdisc, and the ability of the TCP layer to avoid resending packets which it knows are still in the qdisc, this can lead to serious traffic "holes" in a TCP connection. We fix this by replacing the smp_mb__before_atomic() / test_bit() / set_bit() / smp_mb__after_atomic() sequence inside qdisc_run_begin() by a single test_and_set_bit() call, which is more concise and enforces the needed memory barriers. Fixes: 89837eb4b246 ("net: sched: add barrier to ensure correct ordering for lockless qdisc") Signed-off-by: Vincent Ray <vray@kalrayinc.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220526001746.2437669-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14export: fix string handling of namespace in EXPORT_SYMBOL_NSGreg Kroah-Hartman
[ Upstream commit d143b9db8069f0e2a0fa34484e806a55a0dd4855 ] Commit c3a6cf19e695 ("export: avoid code duplication in include/linux/export.h") broke the ability for a defined string to be used as a namespace value. Fix this up by using stringify to properly encode the namespace name. Fixes: c3a6cf19e695 ("export: avoid code duplication in include/linux/export.h") Cc: Miroslav Benes <mbenes@suse.cz> Cc: Emil Velikov <emil.l.velikov@gmail.com> Cc: Jessica Yu <jeyu@kernel.org> Cc: Quentin Perret <qperret@google.com> Cc: Matthias Maennich <maennich@google.com> Reviewed-by: Masahiro Yamada <masahiroy@kernel.org> Link: https://lore.kernel.org/r/20220427090442.2105905-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09fs: add two trivial lookup helpersChristian Brauner
commit 00675017e0aeba5305665c52ded4ddce6a4c0231 upstream. Similar to the addition of lookup_one() add a version of lookup_one_unlocked() and lookup_one_positive_unlocked() that take idmapped mounts into account. This is required to port overlay to support idmapped base layers. Cc: <linux-fsdevel@vger.kernel.org> Tested-by: Giuseppe Scrivano <gscrivan@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-09nodemask.h: fix compilation error with GCC12Christophe de Dinechin
commit 37462a920392cb86541650a6f4121155f11f1199 upstream. With gcc version 12.0.1 20220401 (Red Hat 12.0.1-0), building with defconfig results in the following compilation error: | CC mm/swapfile.o | mm/swapfile.c: In function `setup_swap_info': | mm/swapfile.c:2291:47: error: array subscript -1 is below array bounds | of `struct plist_node[]' [-Werror=array-bounds] | 2291 | p->avail_lists[i].prio = 1; | | ~~~~~~~~~~~~~~^~~ | In file included from mm/swapfile.c:16: | ./include/linux/swap.h:292:27: note: while referencing `avail_lists' | 292 | struct plist_node avail_lists[]; /* | | ^~~~~~~~~~~ This is due to the compiler detecting that the mask in node_states[__state] could theoretically be zero, which would lead to first_node() returning -1 through find_first_bit. I believe that the warning/error is legitimate. I first tried adding a test to check that the node mask is not emtpy, since a similar test exists in the case where MAX_NUMNODES == 1. However, adding the if statement causes other warnings to appear in for_each_cpu_node_but, because it introduces a dangling else ambiguity. And unfortunately, GCC is not smart enough to detect that the added test makes the case where (node) == -1 impossible, so it still complains with the same message. This is why I settled on replacing that with a harmless, but relatively useless (node) >= 0 test. Based on the warning for the dangling else, I also decided to fix the case where MAX_NUMNODES == 1 by moving the condition inside the for loop. It will still only be tested once. This ensures that the meaning of an else following for_each_node_mask or derivatives would not silently have a different meaning depending on the configuration. Link: https://lkml.kernel.org/r/20220414150855.2407137-3-dinechin@redhat.com Signed-off-by: Christophe de Dinechin <christophe@dinechin.org> Signed-off-by: Christophe de Dinechin <dinechin@redhat.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Ben Segall <bsegall@google.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Zhen Lei <thunder.leizhen@huawei.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-09kexec_file: drop weak attribute from arch_kexec_apply_relocations[_add]Naveen N. Rao
commit 3e35142ef99fe6b4fe5d834ad43ee13cca10a2dc upstream. Since commit d1bcae833b32f1 ("ELF: Don't generate unused section symbols") [1], binutils (v2.36+) started dropping section symbols that it thought were unused. This isn't an issue in general, but with kexec_file.c, gcc is placing kexec_arch_apply_relocations[_add] into a separate .text.unlikely section and the section symbol ".text.unlikely" is being dropped. Due to this, recordmcount is unable to find a non-weak symbol in .text.unlikely to generate a relocation record against. Address this by dropping the weak attribute from these functions. Instead, follow the existing pattern of having architectures #define the name of the function they want to override in their headers. [1] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=d1bcae833b32f1 [akpm@linux-foundation.org: arch/s390/include/asm/kexec.h needs linux/module.h] Link: https://lkml.kernel.org/r/20220519091237.676736-1-naveen.n.rao@linux.vnet.ibm.com Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-09mtd: cfi_cmdset_0002: Use chip_ready() for write on S29GL064NTokunori Ikegami
commit 0a8e98305f63deaf0a799d5cf5532cc83af035d1 upstream. Since commit dfeae1073583("mtd: cfi_cmdset_0002: Change write buffer to check correct value") buffered writes fail on S29GL064N. This is because, on S29GL064N, reads return 0xFF at the end of DQ polling for write completion, where as, chip_good() check expects actual data written to the last location to be returned post DQ polling completion. Fix is to revert to using chip_good() for S29GL064N which only checks for DQ lines to settle down to determine write completion. Link: https://lore.kernel.org/r/b687c259-6413-26c9-d4c9-b3afa69ea124@pengutronix.de/ Fixes: dfeae1073583("mtd: cfi_cmdset_0002: Change write buffer to check correct value") Cc: stable@vger.kernel.org Signed-off-by: Tokunori Ikegami <ikegami.t@gmail.com> Acked-by: Vignesh Raghavendra <vigneshr@ti.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/linux-mtd/20220323170458.5608-3-ikegami.t@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-09landlock: Fix landlock_add_rule(2) documentationMickaël Salaün
commit a13e248ff90e81e9322406c0e618cf2168702f4e upstream. It is not mandatory to pass a file descriptor obtained with the O_PATH flag. Also, replace rule's accesses with ruleset's accesses. Link: https://lore.kernel.org/r/20220506160820.524344-2-mic@digikod.net Cc: stable@vger.kernel.org Signed-off-by: Mickaël Salaün <mic@digikod.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-09landlock: Add clang-format exceptionsMickaël Salaün
commit 6cc2df8e3a3967e7c13a424f87f6efb1d4a62d80 upstream. In preparation to a following commit, add clang-format on and clang-format off stanzas around constant definitions. This enables to keep aligned values, which is much more readable than packed definitions. Link: https://lore.kernel.org/r/20220506160513.523257-2-mic@digikod.net Cc: stable@vger.kernel.org Signed-off-by: Mickaël Salaün <mic@digikod.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-09tty: goldfish: Introduce gf_ioread32()/gf_iowrite32()Laurent Vivier
commit 2e2ac4a3327479f7e2744cdd88a5c823f2057bad upstream. The goldfish TTY device was clearly defined as having little-endian registers, but the switch to __raw_{read,write}l(() broke its driver when running on big-endian kernels (if anyone ever tried this). The m68k qemu implementation got this wrong, and assumed native-endian registers. While this is a bug in qemu, it is probably impossible to fix that since there is no way of knowing which other operating systems have started relying on that bug over the years. Hence revert commit da31de35cd2f ("tty: goldfish: use __raw_writel()/__raw_readl()", and define gf_ioread32()/gf_iowrite32() to be able to use accessors defined by the architecture. Cc: stable@vger.kernel.org # v5.11+ Fixes: da31de35cd2fb78f ("tty: goldfish: use __raw_writel()/__raw_readl()") Signed-off-by: Laurent Vivier <laurent@vivier.eu> Link: https://lore.kernel.org/r/20220406201523.243733-2-laurent@vivier.eu [geert: Add rationale based on Arnd's comments] Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-09NFSv4.1 mark qualified async operations as MOVEABLE tasksOlga Kornievskaia
[ Upstream commit 118f09eda21d392e1eeb9f8a4bee044958cccf20 ] Mark async operations such as RENAME, REMOVE, COMMIT MOVEABLE for the nfsv4.1+ sessions. Fixes: 85e39feead948 ("NFSv4.1 identify and mark RPC tasks that can move between transports") Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09NFS: Create a new nfs_alloc_fattr_with_label() functionAnna Schumaker
[ Upstream commit d755ad8dc752d44545613ea04d660aed674e540d ] For creating fattrs with the label field already allocated for us. I also update nfs_free_fattr() to free the label in the end. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09mailbox: forward the hrtimer if not queued and under a lockBjörn Ardö
[ Upstream commit bca1a1004615efe141fd78f360ecc48c60bc4ad5 ] This reverts commit c7dacf5b0f32957b24ef29df1207dc2cd8307743, "mailbox: avoid timer start from callback" The previous commit was reverted since it lead to a race that caused the hrtimer to not be started at all. The check for hrtimer_active() in msg_submit() will return true if the callback function txdone_hrtimer() is currently running. This function could return HRTIMER_NORESTART and then the timer will not be restarted, and also msg_submit() will not start the timer. This will lead to a message actually being submitted but no timer will start to check for its compleation. The original fix that added checking hrtimer_active() was added to avoid a warning with hrtimer_forward. Looking in the kernel another solution to avoid this warning is to check hrtimer_is_queued() before calling hrtimer_forward_now() instead. This however requires a lock so the timer is not started by msg_submit() inbetween this check and the hrtimer_forward() call. Fixes: c7dacf5b0f32 ("mailbox: avoid timer start from callback") Signed-off-by: Björn Ardö <bjorn.ardo@axis.com> Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09list: fix a data-race around ep->rdllistKuniyuki Iwashima
[ Upstream commit d679ae94fdd5d3ab00c35078f5af5f37e068b03d ] ep_poll() first calls ep_events_available() with no lock held and checks if ep->rdllist is empty by list_empty_careful(), which reads rdllist->prev. Thus all accesses to it need some protection to avoid store/load-tearing. Note INIT_LIST_HEAD_RCU() already has the annotation for both prev and next. Commit bf3b9f6372c4 ("epoll: Add busy poll support to epoll with socket fds.") added the first lockless ep_events_available(), and commit c5a282e9635e ("fs/epoll: reduce the scope of wq lock in epoll_wait()") made some ep_events_available() calls lockless and added single call under a lock, finally commit e59d3c64cba6 ("epoll: eliminate unnecessary lock for zero timeout") made the last ep_events_available() lockless. BUG: KCSAN: data-race in do_epoll_wait / do_epoll_wait write to 0xffff88810480c7d8 of 8 bytes by task 1802 on cpu 0: INIT_LIST_HEAD include/linux/list.h:38 [inline] list_splice_init include/linux/list.h:492 [inline] ep_start_scan fs/eventpoll.c:622 [inline] ep_send_events fs/eventpoll.c:1656 [inline] ep_poll fs/eventpoll.c:1806 [inline] do_epoll_wait+0x4eb/0xf40 fs/eventpoll.c:2234 do_epoll_pwait fs/eventpoll.c:2268 [inline] __do_sys_epoll_pwait fs/eventpoll.c:2281 [inline] __se_sys_epoll_pwait+0x12b/0x240 fs/eventpoll.c:2275 __x64_sys_epoll_pwait+0x74/0x80 fs/eventpoll.c:2275 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae read to 0xffff88810480c7d8 of 8 bytes by task 1799 on cpu 1: list_empty_careful include/linux/list.h:329 [inline] ep_events_available fs/eventpoll.c:381 [inline] ep_poll fs/eventpoll.c:1797 [inline] do_epoll_wait+0x279/0xf40 fs/eventpoll.c:2234 do_epoll_pwait fs/eventpoll.c:2268 [inline] __do_sys_epoll_pwait fs/eventpoll.c:2281 [inline] __se_sys_epoll_pwait+0x12b/0x240 fs/eventpoll.c:2275 __x64_sys_epoll_pwait+0x74/0x80 fs/eventpoll.c:2275 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae value changed: 0xffff88810480c7d0 -> 0xffff888103c15098 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 1799 Comm: syz-fuzzer Tainted: G W 5.17.0-rc7-syzkaller-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Link: https://lkml.kernel.org/r/20220322002653.33865-3-kuniyu@amazon.co.jp Fixes: e59d3c64cba6 ("epoll: eliminate unnecessary lock for zero timeout") Fixes: c5a282e9635e ("fs/epoll: reduce the scope of wq lock in epoll_wait()") Fixes: bf3b9f6372c4 ("epoll: Add busy poll support to epoll with socket fds.") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Reported-by: syzbot+bdd6e38a1ed5ee58d8bd@syzkaller.appspotmail.com Cc: Al Viro <viro@zeniv.linux.org.uk>, Andrew Morton <akpm@linux-foundation.org> Cc: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Cc: Kuniyuki Iwashima <kuni1840@gmail.com> Cc: "Soheil Hassas Yeganeh" <soheil@google.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: "Sridhar Samudrala" <sridhar.samudrala@intel.com> Cc: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09list: introduce list_is_head() helper and re-use it in list.hAndy Shevchenko
[ Upstream commit 0425473037db40d9e322631f2d4dc6ef51f97e88 ] Introduce list_is_head() in the similar (*) way as it's done for list_entry_is_head(). Make use of it in the list.h. *) it's done as inliner and not a macro to be aligned with other list_is_*() APIs; while at it, make all three to have the same style. Link: https://lkml.kernel.org/r/20211201141824.81400-1-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Heikki Krogerus <heikki.krogerus@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09scsi: fcoe: Fix Wstringop-overflow warnings in fcoe_wwn_from_mac()Gustavo A. R. Silva
[ Upstream commit 54db804d5d7d36709d1ce70bde3b9a6c61b290b6 ] Fix the following Wstringop-overflow warnings when building with GCC-11: drivers/scsi/fcoe/fcoe.c: In function ‘fcoe_netdev_config’: drivers/scsi/fcoe/fcoe.c:744:32: warning: ‘fcoe_wwn_from_mac’ accessing 32 bytes in a region of size 6 [-Wstringop-overflow=] 744 | wwnn = fcoe_wwn_from_mac(ctlr->ctl_src_addr, 1, 0); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/scsi/fcoe/fcoe.c:744:32: note: referencing argument 1 of type ‘unsigned char *’ In file included from drivers/scsi/fcoe/fcoe.c:36: ./include/scsi/libfcoe.h:252:5: note: in a call to function ‘fcoe_wwn_from_mac’ 252 | u64 fcoe_wwn_from_mac(unsigned char mac[MAX_ADDR_LEN], unsigned int, unsigned int); | ^~~~~~~~~~~~~~~~~ drivers/scsi/fcoe/fcoe.c:747:32: warning: ‘fcoe_wwn_from_mac’ accessing 32 bytes in a region of size 6 [-Wstringop-overflow=] 747 | wwpn = fcoe_wwn_from_mac(ctlr->ctl_src_addr, | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 748 | 2, 0); | ~~~~~ drivers/scsi/fcoe/fcoe.c:747:32: note: referencing argument 1 of type ‘unsigned char *’ In file included from drivers/scsi/fcoe/fcoe.c:36: ./include/scsi/libfcoe.h:252:5: note: in a call to function ‘fcoe_wwn_from_mac’ 252 | u64 fcoe_wwn_from_mac(unsigned char mac[MAX_ADDR_LEN], unsigned int, unsigned int); | ^~~~~~~~~~~~~~~~~ CC drivers/scsi/bnx2fc/bnx2fc_io.o In function ‘bnx2fc_net_config’, inlined from ‘bnx2fc_if_create’ at drivers/scsi/bnx2fc/bnx2fc_fcoe.c:1543:7: drivers/scsi/bnx2fc/bnx2fc_fcoe.c:833:32: warning: ‘fcoe_wwn_from_mac’ accessing 32 bytes in a region of size 6 [-Wstringop-overflow=] 833 | wwnn = fcoe_wwn_from_mac(ctlr->ctl_src_addr, | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 834 | 1, 0); | ~~~~~ drivers/scsi/bnx2fc/bnx2fc_fcoe.c: In function ‘bnx2fc_if_create’: drivers/scsi/bnx2fc/bnx2fc_fcoe.c:833:32: note: referencing argument 1 of type ‘unsigned char *’ In file included from drivers/scsi/bnx2fc/bnx2fc.h:53, from drivers/scsi/bnx2fc/bnx2fc_fcoe.c:17: ./include/scsi/libfcoe.h:252:5: note: in a call to function ‘fcoe_wwn_from_mac’ 252 | u64 fcoe_wwn_from_mac(unsigned char mac[MAX_ADDR_LEN], unsigned int, unsigned int); | ^~~~~~~~~~~~~~~~~ In function ‘bnx2fc_net_config’, inlined from ‘bnx2fc_if_create’ at drivers/scsi/bnx2fc/bnx2fc_fcoe.c:1543:7: drivers/scsi/bnx2fc/bnx2fc_fcoe.c:839:32: warning: ‘fcoe_wwn_from_mac’ accessing 32 bytes in a region of size 6 [-Wstringop-overflow=] 839 | wwpn = fcoe_wwn_from_mac(ctlr->ctl_src_addr, | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 840 | 2, 0); | ~~~~~ drivers/scsi/bnx2fc/bnx2fc_fcoe.c: In function ‘bnx2fc_if_create’: drivers/scsi/bnx2fc/bnx2fc_fcoe.c:839:32: note: referencing argument 1 of type ‘unsigned char *’ In file included from drivers/scsi/bnx2fc/bnx2fc.h:53, from drivers/scsi/bnx2fc/bnx2fc_fcoe.c:17: ./include/scsi/libfcoe.h:252:5: note: in a call to function ‘fcoe_wwn_from_mac’ 252 | u64 fcoe_wwn_from_mac(unsigned char mac[MAX_ADDR_LEN], unsigned int, unsigned int); | ^~~~~~~~~~~~~~~~~ drivers/scsi/qedf/qedf_main.c: In function ‘__qedf_probe’: drivers/scsi/qedf/qedf_main.c:3520:30: warning: ‘fcoe_wwn_from_mac’ accessing 32 bytes in a region of size 6 [-Wstringop-overflow=] 3520 | qedf->wwnn = fcoe_wwn_from_mac(qedf->mac, 1, 0); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/scsi/qedf/qedf_main.c:3520:30: note: referencing argument 1 of type ‘unsigned char *’ In file included from drivers/scsi/qedf/qedf.h:9, from drivers/scsi/qedf/qedf_main.c:23: ./include/scsi/libfcoe.h:252:5: note: in a call to function ‘fcoe_wwn_from_mac’ 252 | u64 fcoe_wwn_from_mac(unsigned char mac[MAX_ADDR_LEN], unsigned int, unsigned int); | ^~~~~~~~~~~~~~~~~ drivers/scsi/qedf/qedf_main.c:3521:30: warning: ‘fcoe_wwn_from_mac’ accessing 32 bytes in a region of size 6 [-Wstringop-overflow=] 3521 | qedf->wwpn = fcoe_wwn_from_mac(qedf->mac, 2, 0); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/scsi/qedf/qedf_main.c:3521:30: note: referencing argument 1 of type ‘unsigned char *’ In file included from drivers/scsi/qedf/qedf.h:9, from drivers/scsi/qedf/qedf_main.c:23: ./include/scsi/libfcoe.h:252:5: note: in a call to function ‘fcoe_wwn_from_mac’ 252 | u64 fcoe_wwn_from_mac(unsigned char mac[MAX_ADDR_LEN], unsigned int, unsigned int); | ^~~~~~~~~~~~~~~~~ by changing the array size to the correct value of ETH_ALEN in the argument declaration. Also, fix a couple of checkpatch warnings: WARNING: function definition argument 'unsigned int' should also have an identifier name This helps with the ongoing efforts to globally enable -Wstringop-overflow. Link: https://github.com/KSPP/linux/issues/181 Fixes: 85b4aa4926a5 ("[SCSI] fcoe: Fibre Channel over Ethernet") Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09gpiolib: of: Introduce hook for missing gpio-rangesStefan Wahren
[ Upstream commit 3550bba25d5587a701e6edf20e20984d2ee72c78 ] Since commit 2ab73c6d8323 ("gpio: Support GPIO controllers without pin-ranges") the device tree nodes of GPIO controller need the gpio-ranges property to handle gpio-hogs. Unfortunately it's impossible to guarantee that every new kernel is shipped with an updated device tree binary. In order to provide backward compatibility with those older DTB, we need a callback within of_gpiochip_add_pin_range() so the relevant platform driver can handle this case. Fixes: 2ab73c6d8323 ("gpio: Support GPIO controllers without pin-ranges") Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Bartosz Golaszewski <brgl@bgdev.pl> Link: https://lore.kernel.org/r/20220409095129.45786-2-stefan.wahren@i2se.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09platform/chrome: Re-introduce cros_ec_cmd_xfer and use it for ioctlsGuenter Roeck
[ Upstream commit 57b888ca2541785de2fcb90575b378921919b6c0 ] Commit 413dda8f2c6f ("platform/chrome: cros_ec_chardev: Use cros_ec_cmd_xfer_status helper") inadvertendly changed the userspace ABI. Previously, cros_ec ioctls would only report errors if the EC communication failed, and otherwise return success and the result of the EC communication. An EC command execution failure was reported in the EC response field. The above mentioned commit changed this behavior, and the ioctl itself would fail. This breaks userspace commands trying to analyze the EC command execution error since the actual EC command response is no longer reported to userspace. Fix the problem by re-introducing the cros_ec_cmd_xfer() helper, and use it to handle ioctl messages. Fixes: 413dda8f2c6f ("platform/chrome: cros_ec_chardev: Use cros_ec_cmd_xfer_status helper") Cc: Daisuke Nojiri <dnojiri@chromium.org> Cc: Rob Barnes <robbarnes@google.com> Cc: Rajat Jain <rajatja@google.com> Cc: Brian Norris <briannorris@chromium.org> Cc: Parth Malkan <parthmalkan@google.com> Reviewed-by: Daisuke Nojiri <dnojiri@chromium.org> Reviewed-by: Brian Norris <briannorris@chromium.org> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09rxrpc: Fix decision on when to generate an IDLE ACKDavid Howells
[ Upstream commit 9a3dedcf18096e8f7f22b8777d78c4acfdea1651 ] Fix the decision on when to generate an IDLE ACK by keeping a count of the number of packets we've received, but not yet soft-ACK'd, and the number of packets we've processed, but not yet hard-ACK'd, rather than trying to keep track of which DATA sequence numbers correspond to those points. We then generate an ACK when either counter exceeds 2. The counters are both cleared when we transcribe the information into any sort of ACK packet for transmission. IDLE and DELAY ACKs are skipped if both counters are 0 (ie. no change). Fixes: 805b21b929e2 ("rxrpc: Send an ACK after every few DATA packets we receive") Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09net: macb: Fix PTP one step sync supportHarini Katakam
[ Upstream commit 5cebb40bc9554aafcc492431181f43c6231b0459 ] PTP one step sync packets cannot have CSUM padding and insertion in SW since time stamp is inserted on the fly by HW. In addition, ptp4l version 3.0 and above report an error when skb timestamps are reported for packets that not processed for TX TS after transmission. Add a helper to identify PTP one step sync and fix the above two errors. Add a common mask for PTP header flag field "twoStepflag". Also reset ptp OSS bit when one step is not selected. Fixes: ab91f0a9b5f4 ("net: macb: Add hardware PTP support") Fixes: 653e92a9175e ("net: macb: add support for padding and fcs computation") Signed-off-by: Harini Katakam <harini.katakam@xilinx.com> Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com> Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com> Link: https://lore.kernel.org/r/20220518170756.7752-1-harini.katakam@xilinx.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09block: Fix the bio.bi_opf commentBart Van Assche
[ Upstream commit 5d2ae14276e698c76fa0c8ce870103f343b38263 ] Commit ef295ecf090d modified the Linux kernel such that the bottom bits of the bi_opf member contain the operation instead of the topmost bits. That commit did not update the comment next to bi_opf. Hence this patch. From commit ef295ecf090d: -#define bio_op(bio) ((bio)->bi_opf >> BIO_OP_SHIFT) +#define bio_op(bio) ((bio)->bi_opf & REQ_OP_MASK) Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Fixes: ef295ecf090d ("block: better op and flags encoding") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220511235152.1082246-1-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09x86/speculation: Add missing prototype for unpriv_ebpf_notify()Josh Poimboeuf
[ Upstream commit 2147c438fde135d6c145a96e373d9348e7076f7f ] Fix the following warnings seen with "make W=1": kernel/sysctl.c:183:13: warning: no previous prototype for ‘unpriv_ebpf_notify’ [-Wmissing-prototypes] 183 | void __weak unpriv_ebpf_notify(int new_state) | ^~~~~~~~~~~~~~~~~~ arch/x86/kernel/cpu/bugs.c:659:6: warning: no previous prototype for ‘unpriv_ebpf_notify’ [-Wmissing-prototypes] 659 | void unpriv_ebpf_notify(int new_state) | ^~~~~~~~~~~~~~~~~~ Fixes: 44a3918c8245 ("x86/speculation: Include unprivileged eBPF status in Spectre v2 mitigation reporting") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/5689d065f739602ececaee1e05e68b8644009608.1650930000.git.jpoimboe@redhat.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09scsi: iscsi: Fix harmless double shift bugDan Carpenter
[ Upstream commit 565138ac5f8a5330669a20e5f94759764e9165ec ] These flags are supposed to be bit numbers. Right now they cause a double shift bug where we use BIT(BIT(2)) instead of BIT(2). Fortunately, the bit numbers are small and it's done consistently so it does not cause an issue at run time. Link: https://lore.kernel.org/r/YmFyWHf8nrrx+SHa@kili Fixes: 5bd856256f8c ("scsi: iscsi: Merge suspend fields") Reviewed-by: Mike Christie <michael.christie@oracle.com> Reviewed-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09signal: Deliver SIGTRAP on perf event asynchronously if blockedMarco Elver
[ Upstream commit 78ed93d72ded679e3caf0758357209887bda885f ] With SIGTRAP on perf events, we have encountered termination of processes due to user space attempting to block delivery of SIGTRAP. Consider this case: <set up SIGTRAP on a perf event> ... sigset_t s; sigemptyset(&s); sigaddset(&s, SIGTRAP | <and others>); sigprocmask(SIG_BLOCK, &s, ...); ... <perf event triggers> When the perf event triggers, while SIGTRAP is blocked, force_sig_perf() will force the signal, but revert back to the default handler, thus terminating the task. This makes sense for error conditions, but not so much for explicitly requested monitoring. However, the expectation is still that signals generated by perf events are synchronous, which will no longer be the case if the signal is blocked and delivered later. To give user space the ability to clearly distinguish synchronous from asynchronous signals, introduce siginfo_t::si_perf_flags and TRAP_PERF_FLAG_ASYNC (opted for flags in case more binary information is required in future). The resolution to the problem is then to (a) no longer force the signal (avoiding the terminations), but (b) tell user space via si_perf_flags if the signal was synchronous or not, so that such signals can be handled differently (e.g. let user space decide to ignore or consider the data imprecise). The alternative of making the kernel ignore SIGTRAP on perf events if the signal is blocked may work for some usecases, but likely causes issues in others that then have to revert back to interception of sigprocmask() (which we want to avoid). [ A concrete example: when using breakpoint perf events to track data-flow, in a region of code where signals are blocked, data-flow can no longer be tracked accurately. When a relevant asynchronous signal is received after unblocking the signal, the data-flow tracking logic needs to know its state is imprecise. ] Fixes: 97ba62b27867 ("perf: Add support for SIGTRAP on perf events") Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Tested-by: Dmitry Vyukov <dvyukov@google.com> Link: https://lore.kernel.org/r/20220404111204.935357-1-elver@google.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09device property: Allow error pointer to be passed to fwnode APIsAndy Shevchenko
[ Upstream commit 002752af7b89b74c64fe6bec8c5fde3d3a7810d8 ] Some of the fwnode APIs might return an error pointer instead of NULL or valid fwnode handle. The result of such API call may be considered optional and hence the test for it is usually done in a form of fwnode = fwnode_find_reference(...); if (IS_ERR(fwnode)) ...error handling... Nevertheless the resulting fwnode may have bumped the reference count and hence caller of the above API is obliged to call fwnode_handle_put(). Since fwnode may be not valid either as NULL or error pointer the check has to be performed there. This approach uglifies the code and adds a point of making a mistake, i.e. forgetting about error point case. To prevent this, allow an error pointer to be passed to the fwnode APIs. Fixes: 83b34afb6b79 ("device property: Introduce fwnode_find_reference()") Reported-by: Nuno Sá <nuno.sa@analog.com> Tested-by: Nuno Sá <nuno.sa@analog.com> Acked-by: Nuno Sá <nuno.sa@analog.com> Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Tested-by: Michael Walle <michael@walle.cc> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09efi: Add missing prototype for efi_capsule_setup_infoJan Kiszka
[ Upstream commit aa480379d8bdb33920d68acfd90f823c8af32578 ] Fixes "no previous declaration for 'efi_capsule_setup_info'" warnings under W=1. Fixes: 2959c95d510c ("efi/capsule: Add support for Quark security header") Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Link: https://lore.kernel.org/r/c28d3f86-dd72-27d1-e2c2-40971b8da6bd@siemens.com Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09drm: fix EDID struct for old ARM OABI formatLinus Torvalds
[ Upstream commit 47f15561b69e226bfc034e94ff6dbec51a4662af ] When building the kernel for arm with the "-mabi=apcs-gnu" option, gcc will force alignment of all structures and unions to a word boundary (see also STRUCTURE_SIZE_BOUNDARY and the "-mstructure-size-boundary=XX" option if you're a gcc person), even when the members of said structures do not want or need said alignment. This completely messes up the structure alignment of 'struct edid' on those targets, because even though all the embedded structures are marked with "__attribute__((packed))", the unions that contain them are not. This was exposed by commit f1e4c916f97f ("drm/edid: add EDID block count and size helpers"), but the bug is pre-existing. That commit just made the structure layout problem cause a build failure due to the addition of the BUILD_BUG_ON(sizeof(*edid) != EDID_LENGTH); sanity check in drivers/gpu/drm/drm_edid.c:edid_block_data(). This legacy union alignment should probably not be used in the first place, but we can fix the layout by adding the packed attribute to the union entries even when each member is already packed and it shouldn't matter in a sane build environment. You can see this issue with a trivial test program: union { struct { char c[5]; }; struct { char d; unsigned e; } __attribute__((packed)); } a = { "1234" }; where building this with a normal "gcc -S" will result in the expected 5-byte size of said union: .type a, @object .size a, 5 but with an ARM compiler and the old ABI: arm-linux-gnu-gcc -mabi=apcs-gnu -mfloat-abi=soft -S t.c you get .type a, %object .size a, 8 instead, because even though each member of the union is packed, the union itself still gets aligned. This was reported by Sudip for the spear3xx_defconfig target. Link: https://lore.kernel.org/lkml/YpCUzStDnSgQLNFN@debian/ Reported-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09tracing: incorrect isolate_mote_t cast in mm_vmscan_lru_isolateVasily Averin
[ Upstream commit 2b132903de7124dd9a758be0c27562e91a510848 ] Fixes following sparse warnings: CHECK mm/vmscan.c mm/vmscan.c: note: in included file (through include/trace/trace_events.h, include/trace/define_trace.h, include/trace/events/vmscan.h): ./include/trace/events/vmscan.h:281:1: sparse: warning: cast to restricted isolate_mode_t ./include/trace/events/vmscan.h:281:1: sparse: warning: restricted isolate_mode_t degrades to integer Link: https://lkml.kernel.org/r/e85d7ff2-fd10-53f8-c24e-ba0458439c1b@openvz.org Signed-off-by: Vasily Averin <vvs@openvz.org> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09ALSA: jack: Access input_dev under mutexAmadeusz Sławiński
[ Upstream commit 1b6a6fc5280e97559287b61eade2d4b363e836f2 ] It is possible when using ASoC that input_dev is unregistered while calling snd_jack_report, which causes NULL pointer dereference. In order to prevent this serialize access to input_dev using mutex lock. Signed-off-by: Amadeusz Sławiński <amadeuszx.slawinski@linux.intel.com> Reviewed-by: Cezary Rojewski <cezary.rojewski@intel.com> Link: https://lore.kernel.org/r/20220412091628.3056922-1-amadeuszx.slawinski@linux.intel.com Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09ipv6: fix locking issues with loops over idev->addr_listNiels Dossche
[ Upstream commit 51454ea42c1ab4e0c2828bb0d4d53957976980de ] idev->addr_list needs to be protected by idev->lock. However, it is not always possible to do so while iterating and performing actions on inet6_ifaddr instances. For example, multiple functions (like addrconf_{join,leave}_anycast) eventually call down to other functions that acquire the idev->lock. The current code temporarily unlocked the idev->lock during the loops, which can cause race conditions. Moving the locks up is also not an appropriate solution as the ordering of lock acquisition will be inconsistent with for example mc_lock. This solution adds an additional field to inet6_ifaddr that is used to temporarily add the instances to a temporary list while holding idev->lock. The temporary list can then be traversed without holding idev->lock. This change was done in two places. In addrconf_ifdown, the list_for_each_entry_safe variant of the list loop is also no longer necessary as there is no deletion within that specific loop. Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Niels Dossche <dossche.niels@gmail.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20220403231523.45843-1-dossche.niels@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09ptrace/xtensa: Replace PT_SINGLESTEP with TIF_SINGLESTEPEric W. Biederman
commit 4a3d2717d140401df7501a95e454180831a0c5af upstream. xtensa is the last user of the PT_SINGLESTEP flag. Changing tsk->ptrace in user_enable_single_step and user_disable_single_step without locking could potentiallly cause problems. So use a thread info flag instead of a flag in tsk->ptrace. Use TIF_SINGLESTEP that xtensa already had defined but unused. Remove the definitions of PT_SINGLESTEP and PT_BLOCKSTEP as they have no more users. Cc: stable@vger.kernel.org Acked-by: Max Filippov <jcmvbkbc@gmail.com> Tested-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Link: https://lkml.kernel.org/r/20220505182645.497868-4-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-09ptrace/um: Replace PT_DTRACE with TIF_SINGLESTEPEric W. Biederman
commit c200e4bb44e80b343c09841e7caaaca0aac5e5fa upstream. User mode linux is the last user of the PT_DTRACE flag. Using the flag to indicate single stepping is a little confusing and worse changing tsk->ptrace without locking could potentionally cause problems. So use a thread info flag with a better name instead of flag in tsk->ptrace. Remove the definition PT_DTRACE as uml is the last user. Cc: stable@vger.kernel.org Acked-by: Johannes Berg <johannes@sipsolutions.net> Tested-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Link: https://lkml.kernel.org/r/20220505182645.497868-3-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-09usb: core: hcd: Add support for deferring roothub registrationKishon Vijay Abraham I
commit a44623d9279086c89f631201d993aa332f7c9e66 upstream. It has been observed with certain PCIe USB cards (like Inateck connected to AM64 EVM or J7200 EVM) that as soon as the primary roothub is registered, port status change is handled even before xHC is running leading to cold plug USB devices not detected. For such cases, registering both the root hubs along with the second HCD is required. Add support for deferring roothub registration in usb_add_hcd(), so that both primary and secondary roothubs are registered along with the second HCD. This patch has been added and reverted earier as it triggered a race in usb device enumeration. That race is now fixed in 5.16-rc3, and in stable back to 5.4 commit 6cca13de26ee ("usb: hub: Fix locking issues with address0_mutex") commit 6ae6dc22d2d1 ("usb: hub: Fix usb enumeration issue due to address0 race") CC: stable@vger.kernel.org # 5.4+ Suggested-by: Mathias Nyman <mathias.nyman@linux.intel.com> Tested-by: Chris Chiu <chris.chiu@canonical.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Link: https://lore.kernel.org/r/20220510091630.16564-2-kishon@ti.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-06netfilter: conntrack: re-fetch conntrack after insertionFlorian Westphal
commit 56b14ecec97f39118bf85c9ac2438c5a949509ed upstream. In case the conntrack is clashing, insertion can free skb->_nfct and set skb->_nfct to the already-confirmed entry. This wasn't found before because the conntrack entry and the extension space used to free'd after an rcu grace period, plus the race needs events enabled to trigger. Reported-by: <syzbot+793a590957d9c1b96620@syzkaller.appspotmail.com> Fixes: 71d8c47fc653 ("netfilter: conntrack: introduce clash resolution on insertion race") Fixes: 2ad9d7747c10 ("netfilter: conntrack: free extension area immediately") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-06crypto: drbg - make reseeding from get_random_bytes() synchronousNicolai Stange
commit 074bcd4000e0d812bc253f86fedc40f81ed59ccc upstream. get_random_bytes() usually hasn't full entropy available by the time DRBG instances are first getting seeded from it during boot. Thus, the DRBG implementation registers random_ready_callbacks which would in turn schedule some work for reseeding the DRBGs once get_random_bytes() has sufficient entropy available. For reference, the relevant history around handling DRBG (re)seeding in the context of a not yet fully seeded get_random_bytes() is: commit 16b369a91d0d ("random: Blocking API for accessing nonblocking_pool") commit 4c7879907edd ("crypto: drbg - add async seeding operation") commit 205a525c3342 ("random: Add callback API for random pool readiness") commit 57225e679788 ("crypto: drbg - Use callback API for random readiness") commit c2719503f5e1 ("random: Remove kernel blocking API") However, some time later, the initialization state of get_random_bytes() has been made queryable via rng_is_initialized() introduced with commit 9a47249d444d ("random: Make crng state queryable"). This primitive now allows for streamlining the DRBG reseeding from get_random_bytes() by replacing that aforementioned asynchronous work scheduling from random_ready_callbacks with some simpler, synchronous code in drbg_generate() next to the related logic already present therein. Apart from improving overall code readability, this change will also enable DRBG users to rely on wait_for_random_bytes() for ensuring that the initial seeding has completed, if desired. The previous patches already laid the grounds by making drbg_seed() to record at each DRBG instance whether it was being seeded at a time when rng_is_initialized() still had been false as indicated by ->seeded == DRBG_SEED_STATE_PARTIAL. All that remains to be done now is to make drbg_generate() check for this condition, determine whether rng_is_initialized() has flipped to true in the meanwhile and invoke a reseed from get_random_bytes() if so. Make this move: - rename the former drbg_async_seed() work handler, i.e. the one in charge of reseeding a DRBG instance from get_random_bytes(), to "drbg_seed_from_random()", - change its signature as appropriate, i.e. make it take a struct drbg_state rather than a work_struct and change its return type from "void" to "int" in order to allow for passing error information from e.g. its __drbg_seed() invocation onwards to callers, - make drbg_generate() invoke this drbg_seed_from_random() once it encounters a DRBG instance with ->seeded == DRBG_SEED_STATE_PARTIAL by the time rng_is_initialized() has flipped to true and - prune everything related to the former, random_ready_callback based mechanism. As drbg_seed_from_random() is now getting invoked from drbg_generate() with the ->drbg_mutex being held, it must not attempt to recursively grab it once again. Remove the corresponding mutex operations from what is now drbg_seed_from_random(). Furthermore, as drbg_seed_from_random() can now report errors directly to its caller, there's no need for it to temporarily switch the DRBG's ->seeded state to DRBG_SEED_STATE_UNSEEDED so that a failure of the subsequently invoked __drbg_seed() will get signaled to drbg_generate(). Don't do it then. Signed-off-by: Nicolai Stange <nstange@suse.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> [Jason: for stable, undid the modifications for the backport of 5acd3548.] Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-06crypto: drbg - track whether DRBG was seeded with !rng_is_initialized()Nicolai Stange
commit 2bcd25443868aa8863779a6ebc6c9319633025d2 upstream. Currently, the DRBG implementation schedules asynchronous works from random_ready_callbacks for reseeding the DRBG instances with output from get_random_bytes() once the latter has sufficient entropy available. However, as the get_random_bytes() initialization state can get queried by means of rng_is_initialized() now, there is no real need for this asynchronous reseeding logic anymore and it's better to keep things simple by doing it synchronously when needed instead, i.e. from drbg_generate() once rng_is_initialized() has flipped to true. Of course, for this to work, drbg_generate() would need some means by which it can tell whether or not rng_is_initialized() has flipped to true since the last seeding from get_random_bytes(). Or equivalently, whether or not the last seed from get_random_bytes() has happened when rng_is_initialized() was still evaluating to false. As it currently stands, enum drbg_seed_state allows for the representation of two different DRBG seeding states: DRBG_SEED_STATE_UNSEEDED and DRBG_SEED_STATE_FULL. The former makes drbg_generate() to invoke a full reseeding operation involving both, the rather expensive jitterentropy as well as the get_random_bytes() randomness sources. The DRBG_SEED_STATE_FULL state on the other hand implies that no reseeding at all is required for a !->pr DRBG variant. Introduce the new DRBG_SEED_STATE_PARTIAL state to enum drbg_seed_state for representing the condition that a DRBG was being seeded when rng_is_initialized() had still been false. In particular, this new state implies that - the given DRBG instance has been fully seeded from the jitterentropy source (if enabled) - and drbg_generate() is supposed to reseed from get_random_bytes() *only* once rng_is_initialized() turns to true. Up to now, the __drbg_seed() helper used to set the given DRBG instance's ->seeded state to constant DRBG_SEED_STATE_FULL. Introduce a new argument allowing for the specification of the to be written ->seeded value instead. Make the first of its two callers, drbg_seed(), determine the appropriate value based on rng_is_initialized(). The remaining caller, drbg_async_seed(), is known to get invoked only once rng_is_initialized() is true, hence let it pass constant DRBG_SEED_STATE_FULL for the new argument to __drbg_seed(). There is no change in behaviour, except for that the pr_devel() in drbg_generate() would now report "unseeded" for ->pr DRBG instances which had last been seeded when rng_is_initialized() was still evaluating to false. Signed-off-by: Nicolai Stange <nstange@suse.de> Reviewed-by: Stephan Müller <smueller@chronox.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-06crypto: drbg - prepare for more fine-grained tracking of seeding stateNicolai Stange
commit ce8ce31b2c5c8b18667784b8c515650c65d57b4e upstream. There are two different randomness sources the DRBGs are getting seeded from, namely the jitterentropy source (if enabled) and get_random_bytes(). At initial DRBG seeding time during boot, the latter might not have collected sufficient entropy for seeding itself yet and thus, the DRBG implementation schedules a reseed work from a random_ready_callback once that has happened. This is particularly important for the !->pr DRBG instances, for which (almost) no further reseeds are getting triggered during their lifetime. Because collecting data from the jitterentropy source is a rather expensive operation, the aforementioned asynchronously scheduled reseed work restricts itself to get_random_bytes() only. That is, it in some sense amends the initial DRBG seed derived from jitterentropy output at full (estimated) entropy with fresh randomness obtained from get_random_bytes() once that has been seeded with sufficient entropy itself. With the advent of rng_is_initialized(), there is no real need for doing the reseed operation from an asynchronously scheduled work anymore and a subsequent patch will make it synchronous by moving it next to related logic already present in drbg_generate(). However, for tracking whether a full reseed including the jitterentropy source is required or a "partial" reseed involving only get_random_bytes() would be sufficient already, the boolean struct drbg_state's ->seeded member must become a tristate value. Prepare for this by introducing the new enum drbg_seed_state and change struct drbg_state's ->seeded member's type from bool to that type. For facilitating review, enum drbg_seed_state is made to only contain two members corresponding to the former ->seeded values of false and true resp. at this point: DRBG_SEED_STATE_UNSEEDED and DRBG_SEED_STATE_FULL. A third one for tracking the intermediate state of "seeded from jitterentropy only" will be introduced with a subsequent patch. There is no change in behaviour at this point. Signed-off-by: Nicolai Stange <nstange@suse.de> Reviewed-by: Stephan Müller <smueller@chronox.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-06pipe: make poll_usage boolean and annotate its accessKuniyuki Iwashima
commit f485922d8fe4e44f6d52a5bb95a603b7c65554bb upstream. Patch series "Fix data-races around epoll reported by KCSAN." This series suppresses a false positive KCSAN's message and fixes a real data-race. This patch (of 2): pipe_poll() runs locklessly and assigns 1 to poll_usage. Once poll_usage is set to 1, it never changes in other places. However, concurrent writes of a value trigger KCSAN, so let's make KCSAN happy. BUG: KCSAN: data-race in pipe_poll / pipe_poll write to 0xffff8880042f6678 of 4 bytes by task 174 on cpu 3: pipe_poll (fs/pipe.c:656) ep_item_poll.isra.0 (./include/linux/poll.h:88 fs/eventpoll.c:853) do_epoll_wait (fs/eventpoll.c:1692 fs/eventpoll.c:1806 fs/eventpoll.c:2234) __x64_sys_epoll_wait (fs/eventpoll.c:2246 fs/eventpoll.c:2241 fs/eventpoll.c:2241) do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113) write to 0xffff8880042f6678 of 4 bytes by task 177 on cpu 1: pipe_poll (fs/pipe.c:656) ep_item_poll.isra.0 (./include/linux/poll.h:88 fs/eventpoll.c:853) do_epoll_wait (fs/eventpoll.c:1692 fs/eventpoll.c:1806 fs/eventpoll.c:2234) __x64_sys_epoll_wait (fs/eventpoll.c:2246 fs/eventpoll.c:2241 fs/eventpoll.c:2241) do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113) Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 177 Comm: epoll_race Not tainted 5.17.0-58927-gf443e374ae13 #6 Hardware name: Red Hat KVM, BIOS 1.11.0-2.amzn2 04/01/2014 Link: https://lkml.kernel.org/r/20220322002653.33865-1-kuniyu@amazon.co.jp Link: https://lkml.kernel.org/r/20220322002653.33865-2-kuniyu@amazon.co.jp Fixes: 3b844826b6c6 ("pipe: avoid unnecessary EPOLLET wakeups under normal loads") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Cc: Alexander Duyck <alexander.h.duyck@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Kuniyuki Iwashima <kuni1840@gmail.com> Cc: "Soheil Hassas Yeganeh" <soheil@google.com> Cc: "Sridhar Samudrala" <sridhar.samudrala@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>