summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-03-08hamradio: fix macro redefine warningHuang Pei
commit 16517829f2e02f096fb5ea9083d160381127faf3 upstream. MIPS/IA64 define END as assembly function ending, which conflict with END definition in mkiss.c, just undef it at first Reported-by: lkp@intel.com Signed-off-by: Huang Pei <huangpei@loongson.cn> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-08KVM: x86/mmu: Passing up the error state of mmu_alloc_shadow_roots()Like Xu
commit c6c937d673aaa1d603f62f134e1ca9c173eeeed3 upstream. Just like on the optional mmu_alloc_direct_roots() path, once shadow path reaches "r = -EIO" somewhere, the caller needs to know the actual state in order to enter error handling and avoid something worse. Fixes: 4a38162ee9f1 ("KVM: MMU: load PDPTRs outside mmu_lock") Signed-off-by: Like Xu <likexu@tencent.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220301124941.48412-1-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-08proc: fix documentation and description of pagemapYun Zhou
commit dd21bfa425c098b95ca86845f8e7d1ec1ddf6e4a upstream. Since bit 57 was exported for uffd-wp write-protected (commit fb8e37f35a2f: "mm/pagemap: export uffd-wp protection information"), fixing it can reduce some unnecessary confusion. Link: https://lkml.kernel.org/r/20220301044538.3042713-1-yun.zhou@windriver.com Fixes: fb8e37f35a2fe1 ("mm/pagemap: export uffd-wp protection information") Signed-off-by: Yun Zhou <yun.zhou@windriver.com> Reviewed-by: Peter Xu <peterx@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Tiberiu A Georgescu <tiberiu.georgescu@nutanix.com> Cc: Florian Schmidt <florian.schmidt@nutanix.com> Cc: Ivan Teterevkov <ivan.teterevkov@nutanix.com> Cc: SeongJae Park <sj@kernel.org> Cc: Yang Shi <shy828301@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Colin Cross <ccross@google.com> Cc: Alistair Popple <apopple@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-08Revert "xfrm: xfrm_state_mtu should return at least 1280 for ipv6"Jiri Bohac
commit a6d95c5a628a09be129f25d5663a7e9db8261f51 upstream. This reverts commit b515d2637276a3810d6595e10ab02c13bfd0b63a. Commit b515d2637276a3810d6595e10ab02c13bfd0b63a ("xfrm: xfrm_state_mtu should return at least 1280 for ipv6") in v5.14 breaks the TCP MSS calculation in ipsec transport mode, resulting complete stalls of TCP connections. This happens when the (P)MTU is 1280 or slighly larger. The desired formula for the MSS is: MSS = (MTU - ESP_overhead) - IP header - TCP header However, the above commit clamps the (MTU - ESP_overhead) to a minimum of 1280, turning the formula into MSS = max(MTU - ESP overhead, 1280) - IP header - TCP header With the (P)MTU near 1280, the calculated MSS is too large and the resulting TCP packets never make it to the destination because they are over the actual PMTU. The above commit also causes suboptimal double fragmentation in xfrm tunnel mode, as described in https://lore.kernel.org/netdev/20210429202529.codhwpc7w6kbudug@dwarf.suse.cz/ The original problem the above commit was trying to fix is now fixed by commit 6596a0229541270fb8d38d989f91b78838e5e9da ("xfrm: fix MTU regression"). Signed-off-by: Jiri Bohac <jbohac@suse.cz> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-08btrfs: do not start relocation until in progress drops are doneJosef Bacik
commit b4be6aefa73c9a6899ef3ba9c5faaa8a66e333ef upstream. We hit a bug with a recovering relocation on mount for one of our file systems in production. I reproduced this locally by injecting errors into snapshot delete with balance running at the same time. This presented as an error while looking up an extent item WARNING: CPU: 5 PID: 1501 at fs/btrfs/extent-tree.c:866 lookup_inline_extent_backref+0x647/0x680 CPU: 5 PID: 1501 Comm: btrfs-balance Not tainted 5.16.0-rc8+ #8 RIP: 0010:lookup_inline_extent_backref+0x647/0x680 RSP: 0018:ffffae0a023ab960 EFLAGS: 00010202 RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 000000000000000c RDI: 0000000000000000 RBP: ffff943fd2a39b60 R08: 0000000000000000 R09: 0000000000000001 R10: 0001434088152de0 R11: 0000000000000000 R12: 0000000001d05000 R13: ffff943fd2a39b60 R14: ffff943fdb96f2a0 R15: ffff9442fc923000 FS: 0000000000000000(0000) GS:ffff944e9eb40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1157b1fca8 CR3: 000000010f092000 CR4: 0000000000350ee0 Call Trace: <TASK> insert_inline_extent_backref+0x46/0xd0 __btrfs_inc_extent_ref.isra.0+0x5f/0x200 ? btrfs_merge_delayed_refs+0x164/0x190 __btrfs_run_delayed_refs+0x561/0xfa0 ? btrfs_search_slot+0x7b4/0xb30 ? btrfs_update_root+0x1a9/0x2c0 btrfs_run_delayed_refs+0x73/0x1f0 ? btrfs_update_root+0x1a9/0x2c0 btrfs_commit_transaction+0x50/0xa50 ? btrfs_update_reloc_root+0x122/0x220 prepare_to_merge+0x29f/0x320 relocate_block_group+0x2b8/0x550 btrfs_relocate_block_group+0x1a6/0x350 btrfs_relocate_chunk+0x27/0xe0 btrfs_balance+0x777/0xe60 balance_kthread+0x35/0x50 ? btrfs_balance+0xe60/0xe60 kthread+0x16b/0x190 ? set_kthread_struct+0x40/0x40 ret_from_fork+0x22/0x30 </TASK> Normally snapshot deletion and relocation are excluded from running at the same time by the fs_info->cleaner_mutex. However if we had a pending balance waiting to get the ->cleaner_mutex, and a snapshot deletion was running, and then the box crashed, we would come up in a state where we have a half deleted snapshot. Again, in the normal case the snapshot deletion needs to complete before relocation can start, but in this case relocation could very well start before the snapshot deletion completes, as we simply add the root to the dead roots list and wait for the next time the cleaner runs to clean up the snapshot. Fix this by setting a bit on the fs_info if we have any DEAD_ROOT's that had a pending drop_progress key. If they do then we know we were in the middle of the drop operation and set a flag on the fs_info. Then balance can wait until this flag is cleared to start up again. If there are DEAD_ROOT's that don't have a drop_progress set then we're safe to start balance right away as we'll be properly protected by the cleaner_mutex. CC: stable@vger.kernel.org # 5.10+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-08btrfs: add missing run of delayed items after unlink during log replayFilipe Manana
commit 4751dc99627e4d1465c5bfa8cb7ab31ed418eff5 upstream. During log replay, whenever we need to check if a name (dentry) exists in a directory we do searches on the subvolume tree for inode references or or directory entries (BTRFS_DIR_INDEX_KEY keys, and BTRFS_DIR_ITEM_KEY keys as well, before kernel 5.17). However when during log replay we unlink a name, through btrfs_unlink_inode(), we may not delete inode references and dir index keys from a subvolume tree and instead just add the deletions to the delayed inode's delayed items, which will only be run when we commit the transaction used for log replay. This means that after an unlink operation during log replay, if we attempt to search for the same name during log replay, we will not see that the name was already deleted, since the deletion is recorded only on the delayed items. We run delayed items after every unlink operation during log replay, except at unlink_old_inode_refs() and at add_inode_ref(). This was due to an overlook, as delayed items should be run after evert unlink, for the reasons stated above. So fix those two cases. Fixes: 0d836392cadd5 ("Btrfs: fix mount failure after fsync due to hard link recreation") Fixes: 1f250e929a9c9 ("Btrfs: fix log replay failure after unlink and link combination") CC: stable@vger.kernel.org # 4.19+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-08btrfs: qgroup: fix deadlock between rescan worker and remove qgroupSidong Yang
commit d4aef1e122d8bbdc15ce3bd0bc813d6b44a7d63a upstream. The commit e804861bd4e6 ("btrfs: fix deadlock between quota disable and qgroup rescan worker") by Kawasaki resolves deadlock between quota disable and qgroup rescan worker. But also there is a deadlock case like it. It's about enabling or disabling quota and creating or removing qgroup. It can be reproduced in simple script below. for i in {1..100} do btrfs quota enable /mnt & btrfs qgroup create 1/0 /mnt & btrfs qgroup destroy 1/0 /mnt & btrfs quota disable /mnt & done Here's why the deadlock happens: 1) The quota rescan task is running. 2) Task A calls btrfs_quota_disable(), locks the qgroup_ioctl_lock mutex, and then calls btrfs_qgroup_wait_for_completion(), to wait for the quota rescan task to complete. 3) Task B calls btrfs_remove_qgroup() and it blocks when trying to lock the qgroup_ioctl_lock mutex, because it's being held by task A. At that point task B is holding a transaction handle for the current transaction. 4) The quota rescan task calls btrfs_commit_transaction(). This results in it waiting for all other tasks to release their handles on the transaction, but task B is blocked on the qgroup_ioctl_lock mutex while holding a handle on the transaction, and that mutex is being held by task A, which is waiting for the quota rescan task to complete, resulting in a deadlock between these 3 tasks. To resolve this issue, the thread disabling quota should unlock qgroup_ioctl_lock before waiting rescan completion. Move btrfs_qgroup_wait_for_completion() after unlock of qgroup_ioctl_lock. Fixes: e804861bd4e6 ("btrfs: fix deadlock between quota disable and qgroup rescan worker") CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Signed-off-by: Sidong Yang <realwakka@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-08btrfs: do not WARN_ON() if we have PageError setJosef Bacik
commit a50e1fcbc9b85fd4e95b89a75c0884cb032a3e06 upstream. Whenever we do any extent buffer operations we call assert_eb_page_uptodate() to complain loudly if we're operating on an non-uptodate page. Our overnight tests caught this warning earlier this week WARNING: CPU: 1 PID: 553508 at fs/btrfs/extent_io.c:6849 assert_eb_page_uptodate+0x3f/0x50 CPU: 1 PID: 553508 Comm: kworker/u4:13 Tainted: G W 5.17.0-rc3+ #564 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014 Workqueue: btrfs-cache btrfs_work_helper RIP: 0010:assert_eb_page_uptodate+0x3f/0x50 RSP: 0018:ffffa961440a7c68 EFLAGS: 00010246 RAX: 0017ffffc0002112 RBX: ffffe6e74453f9c0 RCX: 0000000000001000 RDX: ffffe6e74467c887 RSI: ffffe6e74453f9c0 RDI: ffff8d4c5efc2fc0 RBP: 0000000000000d56 R08: ffff8d4d4a224000 R09: 0000000000000000 R10: 00015817fa9d1ef0 R11: 000000000000000c R12: 00000000000007b1 R13: ffff8d4c5efc2fc0 R14: 0000000001500000 R15: 0000000001cb1000 FS: 0000000000000000(0000) GS:ffff8d4dbbd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ff31d3448d8 CR3: 0000000118be8004 CR4: 0000000000370ee0 Call Trace: extent_buffer_test_bit+0x3f/0x70 free_space_test_bit+0xa6/0xc0 load_free_space_tree+0x1f6/0x470 caching_thread+0x454/0x630 ? rcu_read_lock_sched_held+0x12/0x60 ? rcu_read_lock_sched_held+0x12/0x60 ? rcu_read_lock_sched_held+0x12/0x60 ? lock_release+0x1f0/0x2d0 btrfs_work_helper+0xf2/0x3e0 ? lock_release+0x1f0/0x2d0 ? finish_task_switch.isra.0+0xf9/0x3a0 process_one_work+0x26d/0x580 ? process_one_work+0x580/0x580 worker_thread+0x55/0x3b0 ? process_one_work+0x580/0x580 kthread+0xf0/0x120 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30 This was partially fixed by c2e39305299f01 ("btrfs: clear extent buffer uptodate when we fail to write it"), however all that fix did was keep us from finding extent buffers after a failed writeout. It didn't keep us from continuing to use a buffer that we already had found. In this case we're searching the commit root to cache the block group, so we can start committing the transaction and switch the commit root and then start writing. After the switch we can look up an extent buffer that hasn't been written yet and start processing that block group. Then we fail to write that block out and clear Uptodate on the page, and then we start spewing these errors. Normally we're protected by the tree lock to a certain degree here. If we read a block we have that block read locked, and we block the writer from locking the block before we submit it for the write. However this isn't necessarily fool proof because the read could happen before we do the submit_bio and after we locked and unlocked the extent buffer. Also in this particular case we have path->skip_locking set, so that won't save us here. We'll simply get a block that was valid when we read it, but became invalid while we were using it. What we really want is to catch the case where we've "read" a block but it's not marked Uptodate. On read we ClearPageError(), so if we're !Uptodate and !Error we know we didn't do the right thing for reading the page. Fix this by checking !Uptodate && !Error, this way we will not complain if our buffer gets invalidated while we're using it, and we'll maintain the spirit of the check which is to make sure we have a fully in-cache block while we're messing with it. CC: stable@vger.kernel.org # 5.4+ Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-08btrfs: fix relocation crash due to premature return from ↵Omar Sandoval
btrfs_commit_transaction() commit 5fd76bf31ccfecc06e2e6b29f8c809e934085b99 upstream. We are seeing crashes similar to the following trace: [38.969182] WARNING: CPU: 20 PID: 2105 at fs/btrfs/relocation.c:4070 btrfs_relocate_block_group+0x2dc/0x340 [btrfs] [38.973556] CPU: 20 PID: 2105 Comm: btrfs Not tainted 5.17.0-rc4 #54 [38.974580] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 [38.976539] RIP: 0010:btrfs_relocate_block_group+0x2dc/0x340 [btrfs] [38.980336] RSP: 0000:ffffb0dd42e03c20 EFLAGS: 00010206 [38.981218] RAX: ffff96cfc4ede800 RBX: ffff96cfc3ce0000 RCX: 000000000002ca14 [38.982560] RDX: 0000000000000000 RSI: 4cfd109a0bcb5d7f RDI: ffff96cfc3ce0360 [38.983619] RBP: ffff96cfc309c000 R08: 0000000000000000 R09: 0000000000000000 [38.984678] R10: ffff96cec0000001 R11: ffffe84c80000000 R12: ffff96cfc4ede800 [38.985735] R13: 0000000000000000 R14: 0000000000000000 R15: ffff96cfc3ce0360 [38.987146] FS: 00007f11c15218c0(0000) GS:ffff96d6dfb00000(0000) knlGS:0000000000000000 [38.988662] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [38.989398] CR2: 00007ffc922c8e60 CR3: 00000001147a6001 CR4: 0000000000370ee0 [38.990279] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [38.991219] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [38.992528] Call Trace: [38.992854] <TASK> [38.993148] btrfs_relocate_chunk+0x27/0xe0 [btrfs] [38.993941] btrfs_balance+0x78e/0xea0 [btrfs] [38.994801] ? vsnprintf+0x33c/0x520 [38.995368] ? __kmalloc_track_caller+0x351/0x440 [38.996198] btrfs_ioctl_balance+0x2b9/0x3a0 [btrfs] [38.997084] btrfs_ioctl+0x11b0/0x2da0 [btrfs] [38.997867] ? mod_objcg_state+0xee/0x340 [38.998552] ? seq_release+0x24/0x30 [38.999184] ? proc_nr_files+0x30/0x30 [38.999654] ? call_rcu+0xc8/0x2f0 [39.000228] ? __x64_sys_ioctl+0x84/0xc0 [39.000872] ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs] [39.001973] __x64_sys_ioctl+0x84/0xc0 [39.002566] do_syscall_64+0x3a/0x80 [39.003011] entry_SYSCALL_64_after_hwframe+0x44/0xae [39.003735] RIP: 0033:0x7f11c166959b [39.007324] RSP: 002b:00007fff2543e998 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [39.008521] RAX: ffffffffffffffda RBX: 00007f11c1521698 RCX: 00007f11c166959b [39.009833] RDX: 00007fff2543ea40 RSI: 00000000c4009420 RDI: 0000000000000003 [39.011270] RBP: 0000000000000003 R08: 0000000000000013 R09: 00007f11c16f94e0 [39.012581] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff25440df3 [39.014046] R13: 0000000000000000 R14: 00007fff2543ea40 R15: 0000000000000001 [39.015040] </TASK> [39.015418] ---[ end trace 0000000000000000 ]--- [43.131559] ------------[ cut here ]------------ [43.132234] kernel BUG at fs/btrfs/extent-tree.c:2717! [43.133031] invalid opcode: 0000 [#1] PREEMPT SMP PTI [43.133702] CPU: 1 PID: 1839 Comm: btrfs Tainted: G W 5.17.0-rc4 #54 [43.134863] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 [43.136426] RIP: 0010:unpin_extent_range+0x37a/0x4f0 [btrfs] [43.139913] RSP: 0000:ffffb0dd4216bc70 EFLAGS: 00010246 [43.140629] RAX: 0000000000000000 RBX: ffff96cfc34490f8 RCX: 0000000000000001 [43.141604] RDX: 0000000080000001 RSI: 0000000051d00000 RDI: 00000000ffffffff [43.142645] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff96cfd07dca50 [43.143669] R10: ffff96cfc46e8a00 R11: fffffffffffec000 R12: 0000000041d00000 [43.144657] R13: ffff96cfc3ce0000 R14: ffffb0dd4216bd08 R15: 0000000000000000 [43.145686] FS: 00007f7657dd68c0(0000) GS:ffff96d6df640000(0000) knlGS:0000000000000000 [43.146808] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [43.147584] CR2: 00007f7fe81bf5b0 CR3: 00000001093ee004 CR4: 0000000000370ee0 [43.148589] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [43.149581] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [43.150559] Call Trace: [43.150904] <TASK> [43.151253] btrfs_finish_extent_commit+0x88/0x290 [btrfs] [43.152127] btrfs_commit_transaction+0x74f/0xaa0 [btrfs] [43.152932] ? btrfs_attach_transaction_barrier+0x1e/0x50 [btrfs] [43.153786] btrfs_ioctl+0x1edc/0x2da0 [btrfs] [43.154475] ? __check_object_size+0x150/0x170 [43.155170] ? preempt_count_add+0x49/0xa0 [43.155753] ? __x64_sys_ioctl+0x84/0xc0 [43.156437] ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs] [43.157456] __x64_sys_ioctl+0x84/0xc0 [43.157980] do_syscall_64+0x3a/0x80 [43.158543] entry_SYSCALL_64_after_hwframe+0x44/0xae [43.159231] RIP: 0033:0x7f7657f1e59b [43.161819] RSP: 002b:00007ffda5cd1658 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [43.162702] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f7657f1e59b [43.163526] RDX: 0000000000000000 RSI: 0000000000009408 RDI: 0000000000000003 [43.164358] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000 [43.165208] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 [43.166029] R13: 00005621b91c3232 R14: 00005621b91ba580 R15: 00007ffda5cd1800 [43.166863] </TASK> [43.167125] Modules linked in: btrfs blake2b_generic xor pata_acpi ata_piix libata raid6_pq scsi_mod libcrc32c virtio_net virtio_rng net_failover rng_core failover scsi_common [43.169552] ---[ end trace 0000000000000000 ]--- [43.171226] RIP: 0010:unpin_extent_range+0x37a/0x4f0 [btrfs] [43.174767] RSP: 0000:ffffb0dd4216bc70 EFLAGS: 00010246 [43.175600] RAX: 0000000000000000 RBX: ffff96cfc34490f8 RCX: 0000000000000001 [43.176468] RDX: 0000000080000001 RSI: 0000000051d00000 RDI: 00000000ffffffff [43.177357] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff96cfd07dca50 [43.178271] R10: ffff96cfc46e8a00 R11: fffffffffffec000 R12: 0000000041d00000 [43.179178] R13: ffff96cfc3ce0000 R14: ffffb0dd4216bd08 R15: 0000000000000000 [43.180071] FS: 00007f7657dd68c0(0000) GS:ffff96d6df800000(0000) knlGS:0000000000000000 [43.181073] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [43.181808] CR2: 00007fe09905f010 CR3: 00000001093ee004 CR4: 0000000000370ee0 [43.182706] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [43.183591] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 We first hit the WARN_ON(rc->block_group->pinned > 0) in btrfs_relocate_block_group() and then the BUG_ON(!cache) in unpin_extent_range(). This tells us that we are exiting relocation and removing the block group with bytes still pinned for that block group. This is supposed to be impossible: the last thing relocate_block_group() does is commit the transaction to get rid of pinned extents. Commit d0c2f4fa555e ("btrfs: make concurrent fsyncs wait less when waiting for a transaction commit") introduced an optimization so that commits from fsync don't have to wait for the previous commit to unpin extents. This was only intended to affect fsync, but it inadvertently made it possible for any commit to skip waiting for the previous commit to unpin. This is because if a call to btrfs_commit_transaction() finds that another thread is already committing the transaction, it waits for the other thread to complete the commit and then returns. If that other thread was in fsync, then it completes the commit without completing the previous commit. This makes the following sequence of events possible: Thread 1____________________|Thread 2 (fsync)_____________________|Thread 3 (balance)___________________ btrfs_commit_transaction(N) | | btrfs_run_delayed_refs | | pin extents | | ... | | state = UNBLOCKED |btrfs_sync_file | | btrfs_start_transaction(N + 1) |relocate_block_group | | btrfs_join_transaction(N + 1) | btrfs_commit_transaction(N + 1) | ... | trans->state = COMMIT_START | | | btrfs_commit_transaction(N + 1) | | wait_for_commit(N + 1, COMPLETED) | wait_for_commit(N, SUPER_COMMITTED)| state = SUPER_COMMITTED | ... | btrfs_finish_extent_commit| | unpin_extent_range() | trans->state = COMPLETED | | | return | | ... | |Thread 1 isn't done, so pinned > 0 | |and we WARN | | | |btrfs_remove_block_group unpin_extent_range() | | Thread 3 removed the | | block group, so we BUG| | There are other sequences involving SUPER_COMMITTED transactions that can cause a similar outcome. We could fix this by making relocation explicitly wait for unpinning, but there may be other cases that need it. Josef mentioned ENOSPC flushing and the free space cache inode as other potential victims. Rather than playing whack-a-mole, this fix is conservative and makes all commits not in fsync wait for all previous transactions, which is what the optimization intended. Fixes: d0c2f4fa555e ("btrfs: make concurrent fsyncs wait less when waiting for a transaction commit") CC: stable@vger.kernel.org # 5.15+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-08btrfs: fix lost prealloc extents beyond eof after full fsyncFilipe Manana
commit d99478874355d3a7b9d86dfb5d7590d5b1754b1f upstream. When doing a full fsync, if we have prealloc extents beyond (or at) eof, and the leaves that contain them were not modified in the current transaction, we end up not logging them. This results in losing those extents when we replay the log after a power failure, since the inode is truncated to the current value of the logged i_size. Just like for the fast fsync path, we need to always log all prealloc extents starting at or beyond i_size. The fast fsync case was fixed in commit 471d557afed155 ("Btrfs: fix loss of prealloc extents past i_size after fsync log replay") but it missed the full fsync path. The problem exists since the very early days, when the log tree was added by commit e02119d5a7b439 ("Btrfs: Add a write ahead tree log to optimize synchronous operations"). Example reproducer: $ mkfs.btrfs -f /dev/sdc $ mount /dev/sdc /mnt # Create our test file with many file extent items, so that they span # several leaves of metadata, even if the node/page size is 64K. Use # direct IO and not fsync/O_SYNC because it's both faster and it avoids # clearing the full sync flag from the inode - we want the fsync below # to trigger the slow full sync code path. $ xfs_io -f -d -c "pwrite -b 4K 0 16M" /mnt/foo # Now add two preallocated extents to our file without extending the # file's size. One right at i_size, and another further beyond, leaving # a gap between the two prealloc extents. $ xfs_io -c "falloc -k 16M 1M" /mnt/foo $ xfs_io -c "falloc -k 20M 1M" /mnt/foo # Make sure everything is durably persisted and the transaction is # committed. This makes all created extents to have a generation lower # than the generation of the transaction used by the next write and # fsync. sync # Now overwrite only the first extent, which will result in modifying # only the first leaf of metadata for our inode. Then fsync it. This # fsync will use the slow code path (inode full sync bit is set) because # it's the first fsync since the inode was created/loaded. $ xfs_io -c "pwrite 0 4K" -c "fsync" /mnt/foo # Extent list before power failure. $ xfs_io -c "fiemap -v" /mnt/foo /mnt/foo: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..7]: 2178048..2178055 8 0x0 1: [8..16383]: 26632..43007 16376 0x0 2: [16384..32767]: 2156544..2172927 16384 0x0 3: [32768..34815]: 2172928..2174975 2048 0x800 4: [34816..40959]: hole 6144 5: [40960..43007]: 2174976..2177023 2048 0x801 <power fail> # Mount fs again, trigger log replay. $ mount /dev/sdc /mnt # Extent list after power failure and log replay. $ xfs_io -c "fiemap -v" /mnt/foo /mnt/foo: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..7]: 2178048..2178055 8 0x0 1: [8..16383]: 26632..43007 16376 0x0 2: [16384..32767]: 2156544..2172927 16384 0x1 # The prealloc extents at file offsets 16M and 20M are missing. So fix this by calling btrfs_log_prealloc_extents() when we are doing a full fsync, so that we always log all prealloc extents beyond eof. A test case for fstests will follow soon. CC: stable@vger.kernel.org # 4.19+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-08tracing: Fix return value of __setup handlersRandy Dunlap
commit 1d02b444b8d1345ea4708db3bab4db89a7784b55 upstream. __setup() handlers should generally return 1 to indicate that the boot options have been handled. Using invalid option values causes the entire kernel boot option string to be reported as Unknown and added to init's environment strings, polluting it. Unknown kernel command line parameters "BOOT_IMAGE=/boot/bzImage-517rc6 kprobe_event=p,syscall_any,$arg1 trace_options=quiet trace_clock=jiffies", will be passed to user space. Run /sbin/init as init process with arguments: /sbin/init with environment: HOME=/ TERM=linux BOOT_IMAGE=/boot/bzImage-517rc6 kprobe_event=p,syscall_any,$arg1 trace_options=quiet trace_clock=jiffies Return 1 from the __setup() handlers so that init's environment is not polluted with kernel boot options. Link: lore.kernel.org/r/64644a2f-4a20-bab3-1e15-3b2cdd0defe3@omprussia.ru Link: https://lkml.kernel.org/r/20220303031744.32356-1-rdunlap@infradead.org Cc: stable@vger.kernel.org Fixes: 7bcfaf54f591 ("tracing: Add trace_options kernel command line parameter") Fixes: e1e232ca6b8f ("tracing: Add trace_clock=<clock> kernel parameter") Fixes: 970988e19eb0 ("tracing/kprobe: Add kprobe_event= boot parameter") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: Igor Zhbanov <i.zhbanov@omprussia.ru> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-08tracing/histogram: Fix sorting on old "cpu" valueSteven Rostedt (Google)
commit 1d1898f65616c4601208963c3376c1d828cbf2c7 upstream. When trying to add a histogram against an event with the "cpu" field, it was impossible due to "cpu" being a keyword to key off of the running CPU. So to fix this, it was changed to "common_cpu" to match the other generic fields (like "common_pid"). But since some scripts used "cpu" for keying off of the CPU (for events that did not have "cpu" as a field, which is most of them), a backward compatibility trick was added such that if "cpu" was used as a key, and the event did not have "cpu" as a field name, then it would fallback and switch over to "common_cpu". This fix has a couple of subtle bugs. One was that when switching over to "common_cpu", it did not change the field name, it just set a flag. But the code still found a "cpu" field. The "cpu" field is used for filtering and is returned when the event does not have a "cpu" field. This was found by: # cd /sys/kernel/tracing # echo hist:key=cpu,pid:sort=cpu > events/sched/sched_wakeup/trigger # cat events/sched/sched_wakeup/hist Which showed the histogram unsorted: { cpu: 19, pid: 1175 } hitcount: 1 { cpu: 6, pid: 239 } hitcount: 2 { cpu: 23, pid: 1186 } hitcount: 14 { cpu: 12, pid: 249 } hitcount: 2 { cpu: 3, pid: 994 } hitcount: 5 Instead of hard coding the "cpu" checks, take advantage of the fact that trace_event_field_field() returns a special field for "cpu" and "CPU" if the event does not have "cpu" as a field. This special field has the "filter_type" of "FILTER_CPU". Check that to test if the returned field is of the CPU type instead of doing the string compare. Also, fix the sorting bug by testing for the hist_field flag of HIST_FIELD_FL_CPU when setting up the sort routine. Otherwise it will use the special CPU field to know what compare routine to use, and since that special field does not have a size, it returns tracing_map_cmp_none. Cc: stable@vger.kernel.org Fixes: 1e3bac71c505 ("tracing/histogram: Rename "cpu" to "common_cpu"") Reported-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-08HID: add mapping for KEY_ALL_APPLICATIONSWilliam Mahon
commit 327b89f0acc4c20a06ed59e4d9af7f6d804dc2e2 upstream. This patch adds a new key definition for KEY_ALL_APPLICATIONS and aliases KEY_DASHBOARD to it. It also maps the 0x0c/0x2a2 usage code to KEY_ALL_APPLICATIONS. Signed-off-by: William Mahon <wmahon@chromium.org> Acked-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Link: https://lore.kernel.org/r/20220303035618.1.I3a7746ad05d270161a18334ae06e3b6db1a1d339@changeid Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-08HID: add mapping for KEY_DICTATEWilliam Mahon
commit bfa26ba343c727e055223be04e08f2ebdd43c293 upstream. Numerous keyboards are adding dictate keys which allows for text messages to be dictated by a microphone. This patch adds a new key definition KEY_DICTATE and maps 0x0c/0x0d8 usage code to this new keycode. Additionally hid-debug is adjusted to recognize this new usage code as well. Signed-off-by: William Mahon <wmahon@chromium.org> Acked-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Link: https://lore.kernel.org/r/20220303021501.1.I5dbf50eb1a7a6734ee727bda4a8573358c6d3ec0@changeid Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-08Input: samsung-keypad - properly state IOMEM dependencyDavid Gow
commit ba115adf61b36b8c167126425a62b0efc23f72c0 upstream. Make the samsung-keypad driver explicitly depend on CONFIG_HAS_IOMEM, as it calls devm_ioremap(). This prevents compile errors in some configs (e.g, allyesconfig/randconfig under UML): /usr/bin/ld: drivers/input/keyboard/samsung-keypad.o: in function `samsung_keypad_probe': samsung-keypad.c:(.text+0xc60): undefined reference to `devm_ioremap' Signed-off-by: David Gow <davidgow@google.com> Acked-by: anton ivanov <anton.ivanov@cambridgegreys.com> Link: https://lore.kernel.org/r/20220225041727.1902850-1-davidgow@google.com Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-08Input: elan_i2c - fix regulator enable count imbalance after suspend/resumeHans de Goede
commit 04b7762e37c95d9b965d16bb0e18dbd1fa2e2861 upstream. Before these changes elan_suspend() would only disable the regulator when device_may_wakeup() returns false; whereas elan_resume() would unconditionally enable it, leading to an enable count imbalance when device_may_wakeup() returns true. This triggers the "WARN_ON(regulator->enable_count)" in regulator_put() when the elan_i2c driver gets unbound, this happens e.g. with the hot-plugable dock with Elan I2C touchpad for the Asus TF103C 2-in-1. Fix this by making the regulator_enable() call also be conditional on device_may_wakeup() returning false. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20220131135436.29638-2-hdegoede@redhat.com Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-08Input: elan_i2c - move regulator_[en|dis]able() out of elan_[en|dis]able_power()Hans de Goede
commit 81a36d8ce554b82b0a08e2b95d0bd44fcbff339b upstream. elan_disable_power() is called conditionally on suspend, where as elan_enable_power() is always called on resume. This leads to an imbalance in the regulator's enable count. Move the regulator_[en|dis]able() calls out of elan_[en|dis]able_power() in preparation of fixing this. No functional changes intended. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20220131135436.29638-1-hdegoede@redhat.com [dtor: consolidate elan_[en|dis]able() into elan_set_power()] Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-08MAINTAINERS: adjust file entry for of_net.c after movementLukas Bulwahn
commit f616447034a120b18f6e612814641e7d8f5d7f0a upstream. Commit e330fb14590c ("of: net: move of_net under net/") moves of_net.c to ./net/core/, but misses to adjust the reference to this file in MAINTAINERS. Hence, ./scripts/get_maintainer.pl --self-test=patterns complains: warning: no file matches F: drivers/of/of_net.c Adjust the file entry after this file movement. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Link: https://lore.kernel.org/r/20211016055815.14397-1-lukas.bulwahn@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-08iavf: missing unlocks in iavf_watchdog_task()Dan Carpenter
commit bc2f39a6252ee40d9bfc2743d4437d420aec5f6e upstream. This code was re-organized and there some unlocks missing now. Fixes: 898ef1cb1cb2 ("iavf: Combine init and watchdog state machines") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-08iavf: do not override the adapter state in the watchdog task (again)Stefan Assmann
commit fe523d7c9a8332855376ad5eb1aa301091129ba4 upstream. The watchdog task incorrectly changes the state to __IAVF_RESETTING, instead of letting the reset task take care of that. This was already resolved by commit 22c8fd71d3a5 ("iavf: do not override the adapter state in the watchdog task") but the problem was reintroduced by the recent code refactoring in commit 45eebd62999d ("iavf: Refactor iavf state machine tracking"). Fixes: 45eebd62999d ("iavf: Refactor iavf state machine tracking") Signed-off-by: Stefan Assmann <sassmann@kpanic.de> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-08net: stmmac: perserve TX and RX coalesce value during XDP setupOng Boon Leong
commit 61da6ac715700bcfeef50d187e15c6cc7c9d079b upstream. When XDP program is loaded, it is desirable that the previous TX and RX coalesce values are not re-inited to its default value. This prevents unnecessary re-configurig the coalesce values that were working fine before. Fixes: ac746c8520d9 ("net: stmmac: enhance XDP ZC driver level switching performance") Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Tested-by: Kurt Kanzenbach <kurt@linutronix.de> Link: https://lore.kernel.org/r/20211124114019.3949125-1-boon.leong.ong@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-08selftests: mlxsw: resource_scale: Fix return valueAmit Cohen
[ Upstream commit 196f9bc050cbc5085b4cbb61cce2efe380bc66d0 ] The test runs several test cases and is supposed to return an error in case at least one of them failed. Currently, the check of the return value of each test case is in the wrong place, which can result in the wrong return value. For example: # TESTS='tc_police' ./resource_scale.sh TEST: 'tc_police' [default] 968 [FAIL] tc police offload count failed Error: mlxsw_spectrum: Failed to allocate policer index. We have an error talking to the kernel Command failed /tmp/tmp.i7Oc5HwmXY:969 TEST: 'tc_police' [default] overflow 969 [ OK ] ... TEST: 'tc_police' [ipv4_max] overflow 969 [ OK ] $ echo $? 0 Fix this by moving the check to be done after each test case. Fixes: 059b18e21c63 ("selftests: mlxsw: Return correct error code in resource scale test") Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-08net: dcb: disable softirqs in dcbnl_flush_dev()Vladimir Oltean
[ Upstream commit 10b6bb62ae1a49ee818fc479cf57b8900176773e ] Ido Schimmel points out that since commit 52cff74eef5d ("dcbnl : Disable software interrupts before taking dcb_lock"), the DCB API can be called by drivers from softirq context. One such in-tree example is the chelsio cxgb4 driver: dcb_rpl -> cxgb4_dcb_handle_fw_update -> dcb_ieee_setapp If the firmware for this driver happened to send an event which resulted in a call to dcb_ieee_setapp() at the exact same time as another DCB-enabled interface was unregistering on the same CPU, the softirq would deadlock, because the interrupted process was already holding the dcb_lock in dcbnl_flush_dev(). Fix this unlikely event by using spin_lock_bh() in dcbnl_flush_dev() as in the rest of the dcbnl code. Fixes: 91b0383fef06 ("net: dcb: flush lingering app table entries for unregistered devices") Reported-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20220302193939.1368823-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-08drm/amdgpu: fix suspend/resume hang regressionQiang Yu
[ Upstream commit f1ef17011c765495c876fa75435e59eecfdc1ee4 ] Regression has been reported that suspend/resume may hang with the previous vm ready check commit. So bring back the evicted list check as a temp fix. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1922 Fixes: c1a66c3bc425 ("drm/amdgpu: check vm ready by amdgpu_vm->evicting flag") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Qiang Yu <qiang.yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-08nl80211: Handle nla_memdup failures in handle_nan_filterJiasheng Jiang
[ Upstream commit 6ad27f522cb3b210476daf63ce6ddb6568c0508b ] As there's potential for failure of the nla_memdup(), check the return value. Fixes: a442b761b24b ("cfg80211: add add_nan_func / del_nan_func") Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn> Link: https://lore.kernel.org/r/20220301100020.3801187-1-jiasheng@iscas.ac.cn Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-08MIPS: ralink: mt7621: use bitwise NOT instead of logicalIlya Lipnitskiy
[ Upstream commit 5d8965704fe5662e2e4a7e4424a2cbe53e182670 ] It was the intention to reverse the bits, not make them all zero by using logical NOT operator. Fixes: cc19db8b312a ("MIPS: ralink: mt7621: do memory detection on KSEG1") Suggested-by: Chuanhong Guo <gch981213@gmail.com> Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com> Reviewed-by: Sergio Paracuellos <sergio.paracuellos@gmail.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-08e1000e: Fix possible HW unit hang after an s0ix exitSasha Neftin
[ Upstream commit 1866aa0d0d6492bc2f8d22d0df49abaccf50cddd ] Disable the OEM bit/Gig Disable/restart AN impact and disable the PHY LAN connected device (LCD) reset during power management flows. This fixes possible HW unit hangs on the s0ix exit on some corporate ADL platforms. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214821 Fixes: 3e55d231716e ("e1000e: Add handshake with the CSME to support S0ix") Suggested-by: Dima Ruinskiy <dima.ruinskiy@intel.com> Suggested-by: Nir Efrati <nir.efrati@intel.com> Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-08drm/bridge: ti-sn65dsi86: Properly undo autosuspendDouglas Anderson
[ Upstream commit 26d3474348293dc752c55fe6d41282199f73714c ] The PM Runtime docs say: Drivers in ->remove() callback should undo the runtime PM changes done in ->probe(). Usually this means calling pm_runtime_disable(), pm_runtime_dont_use_autosuspend() etc. We weren't doing that for autosuspend. Let's do it. Fixes: 9bede63127c6 ("drm/bridge: ti-sn65dsi86: Use pm_runtime autosuspend") Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://patchwork.freedesktop.org/patch/msgid/20220222141838.1.If784ba19e875e8ded4ec4931601ce6d255845245@changeid Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-08drm/i915/guc/slpc: Correct the param count for unset paramVinay Belgaumkar
[ Upstream commit 1b279f6ad467535c3b8a66b4edefaca2cdd5bdc3 ] SLPC unset param H2G only needs one parameter - the id of the param. Fixes: 025cb07bebfa ("drm/i915/guc/slpc: Cache platform frequency limits") Suggested-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Ramalingam C <ramalingam.c@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220216181504.7155-1-vinay.belgaumkar@intel.com (cherry picked from commit 9648f1c3739505557d94ff749a4f32192ea81fe3) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-08iavf: Fix __IAVF_RESETTING state usageSlawomir Laba
[ Upstream commit 14756b2ae265d526b8356e86729090b01778fdf6 ] The setup of __IAVF_RESETTING state in watchdog task had no effect and could lead to slow resets in the driver as the task for __IAVF_RESETTING state only requeues watchdog. Till now the __IAVF_RESETTING was interpreted by reset task as running state which could lead to errors with allocating and resources disposal. Make watchdog_task queue the reset task when it's necessary. Do not update the state to __IAVF_RESETTING so the reset task knows exactly what is the current state of the adapter. Fixes: 898ef1cb1cb2 ("iavf: Combine init and watchdog state machines") Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com> Signed-off-by: Phani Burra <phani.r.burra@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-08iavf: Fix race in init stateSlawomir Laba
[ Upstream commit a472eb5cbaebb5774672c565e024336c039e9128 ] When iavf_init_version_check sends VIRTCHNL_OP_GET_VF_RESOURCES message, the driver will wait for the response after requeueing the watchdog task in iavf_init_get_resources call stack. The logic is implemented this way that iavf_init_get_resources has to be called in order to allocate adapter->vf_res. It is polling for the AQ response in iavf_get_vf_config function. Expect a call trace from kernel when adminq_task worker handles this message first. adapter->vf_res will be NULL in iavf_virtchnl_completion. Make the watchdog task not queue the adminq_task if the init process is not finished yet. Fixes: 898ef1cb1cb2 ("iavf: Combine init and watchdog state machines") Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com> Signed-off-by: Phani Burra <phani.r.burra@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-08iavf: Fix locking for VIRTCHNL_OP_GET_OFFLOAD_VLAN_V2_CAPSSlawomir Laba
[ Upstream commit 0579fafd37fb7efe091f0e6c8ccf968864f40f3e ] iavf_virtchnl_completion is called under crit_lock but when the code for VIRTCHNL_OP_GET_OFFLOAD_VLAN_V2_CAPS is called, this lock is released in order to obtain rtnl_lock to avoid ABBA deadlock with unregister_netdev. Along with the new way iavf_remove behaves, there exist many risks related to the lock release and attmepts to regrab it. The driver faces crashes related to races between unregister_netdev and netdev_update_features. Yet another risk is that the driver could already obtain the crit_lock in order to destroy it and iavf_virtchnl_completion could crash or block forever. Make iavf_virtchnl_completion never relock crit_lock in it's call paths. Extract rtnl_lock locking logic to the driver for unregister_netdev in order to set the netdev_registered flag inside the lock. Introduce a new flag that will inform adminq_task to perform the code from VIRTCHNL_OP_GET_OFFLOAD_VLAN_V2_CAPS right after it finishes processing messages. Guard this code with remove flags so it's never called when the driver is in remove state. Fixes: 5951a2b9812d ("iavf: Fix VLAN feature flags after VFR") Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com> Signed-off-by: Phani Burra <phani.r.burra@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-08iavf: Fix init state closure on removeSlawomir Laba
[ Upstream commit 3ccd54ef44ebfa0792c5441b6d9c86618f3378d1 ] When init states of the adapter work, the errors like lack of communication with the PF might hop in. If such events occur the driver restores previous states in order to retry initialization in a proper way. When remove task kicks in, this situation could lead to races with unregistering the netdevice as well as resources cleanup. With the commit introducing the waiting in remove for init to complete, this problem turns into an endless waiting if init never recovers from errors. Introduce __IAVF_IN_REMOVE_TASK bit to indicate that the remove thread has started. Make __IAVF_COMM_FAILED adapter state respect the __IAVF_IN_REMOVE_TASK bit and set the __IAVF_INIT_FAILED state and return without any action instead of trying to recover. Make __IAVF_INIT_FAILED adapter state respect the __IAVF_IN_REMOVE_TASK bit and return without any further actions. Make the loop in the remove handler break when adapter has __IAVF_INIT_FAILED state set. Fixes: 898ef1cb1cb2 ("iavf: Combine init and watchdog state machines") Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com> Signed-off-by: Phani Burra <phani.r.burra@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-08iavf: Add waiting so the port is initialized in removeSlawomir Laba
[ Upstream commit 974578017fc1fdd06cea8afb9dfa32602e8529ed ] There exist races when port is being configured and remove is triggered. unregister_netdev is not and can't be called under crit_lock mutex since it is calling ndo_stop -> iavf_close which requires this lock. Depending on init state the netdev could be still unregistered so unregister_netdev never cleans up, when shortly after that the device could become registered. Make iavf_remove wait until port finishes initialization. All critical state changes are atomic (under crit_lock). Crashes that come from iavf_reset_interrupt_capability and iavf_free_traffic_irqs should now be solved in a graceful manner. Fixes: 605ca7c5c6707 ("iavf: Fix kernel BUG in free_msi_irqs") Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com> Signed-off-by: Phani Burra <phani.r.burra@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-08iavf: Fix kernel BUG in free_msi_irqsPrzemyslaw Patynowski
[ Upstream commit 605ca7c5c670762e36ccb475cfa089d7ad0698e0 ] Fix driver not freeing VF's traffic irqs, prior to calling pci_disable_msix in iavf_remove. There were possible 2 erroneous states in which, iavf_close would not be called. One erroneous state is fixed by allowing netdev to register, when state is already running. It was possible for VF adapter to enter state loop from running to resetting, where iavf_open would subsequently fail. If user would then unload driver/remove VF pci, iavf_close would not be called, as the netdev was not registered, leaving traffic pcis still allocated. Fixed this by breaking loop, allowing netdev to open device when adapter state is __IAVF_RUNNING and it is not explicitily downed. Other possiblity is entering to iavf_remove from __IAVF_RESETTING state, where iavf_close would not free irqs, but just return 0. Fixed this by checking for last adapter state and then removing irqs. Kernel panic: [ 2773.628585] kernel BUG at drivers/pci/msi.c:375! ... [ 2773.631567] RIP: 0010:free_msi_irqs+0x180/0x1b0 ... [ 2773.640939] Call Trace: [ 2773.641572] pci_disable_msix+0xf7/0x120 [ 2773.642224] iavf_reset_interrupt_capability.part.41+0x15/0x30 [iavf] [ 2773.642897] iavf_remove+0x12e/0x500 [iavf] [ 2773.643578] pci_device_remove+0x3b/0xc0 [ 2773.644266] device_release_driver_internal+0x103/0x1f0 [ 2773.644948] pci_stop_bus_device+0x69/0x90 [ 2773.645576] pci_stop_and_remove_bus_device+0xe/0x20 [ 2773.646215] pci_iov_remove_virtfn+0xba/0x120 [ 2773.646862] sriov_disable+0x2f/0xe0 [ 2773.647531] ice_free_vfs+0x2f8/0x350 [ice] [ 2773.648207] ice_sriov_configure+0x94/0x960 [ice] [ 2773.648883] ? _kstrtoull+0x3b/0x90 [ 2773.649560] sriov_numvfs_store+0x10a/0x190 [ 2773.650249] kernfs_fop_write+0x116/0x190 [ 2773.650948] vfs_write+0xa5/0x1a0 [ 2773.651651] ksys_write+0x4f/0xb0 [ 2773.652358] do_syscall_64+0x5b/0x1a0 [ 2773.653075] entry_SYSCALL_64_after_hwframe+0x65/0xca Fixes: 22ead37f8af8 ("i40evf: Add longer wait after remove module") Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-08iavf: Add helper function to go from pci_dev to adapterKaren Sornek
[ Upstream commit 247aa001b72b6c8a89df9d108a2ec6f274a6b64d ] Add helper function to go from pci_dev to adapter to make work simple - to go from a pci_dev to the adapter structure and make netdev assignment instead of having to go to the net_device then the adapter. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Karen Sornek <karen.sornek@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-08iavf: Rework mutexes for better synchronisationSlawomir Laba
[ Upstream commit fc2e6b3b132a907378f6af08356b105a4139c4fb ] The driver used to crash in multiple spots when put to stress testing of the init, reset and remove paths. The user would experience call traces or hangs when creating, resetting, removing VFs. Depending on the machines, the call traces are happening in random spots, like reset restoring resources racing with driver remove. Make adapter->crit_lock mutex a mandatory lock for guarding the operations performed on all workqueues and functions dealing with resource allocation and disposal. Make __IAVF_REMOVE a final state of the driver respected by workqueues that shall not requeue, when they fail to obtain the crit_lock. Make the IRQ handler not to queue the new work for adminq_task when the __IAVF_REMOVE state is set. Fixes: 5ac49f3c2702 ("iavf: use mutexes for locking of critical sections") Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com> Signed-off-by: Phani Burra <phani.r.burra@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-08iavf: Add trace while removing deviceJedrzej Jagielski
[ Upstream commit bdb9e5c7aec73a7b8b5acab37587b6de1203e68d ] Add kernel trace that device was removed. Currently there is no such information. I.e. Host admin removes a PCI device from a VM, than on VM shall be info about the event. This patch adds info log to iavf_remove function. Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-08iavf: Combine init and watchdog state machinesMateusz Palczewski
[ Upstream commit 898ef1cb1cb24040c3e89263e02c605af70c776a ] Use single state machine for driver initialization and for service initialized driver. The init state machine implemented in init_task() is merged into the watchdog_task(). The init_task() function is removed. Signed-off-by: Jakub Pawlak <jakub.pawlak@intel.com> Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-08iavf: Add __IAVF_INIT_FAILED stateMateusz Palczewski
[ Upstream commit 59756ad6948be91d66867ce458083b820c59b8ba ] This commit adds a new state, __IAVF_INIT_FAILED to the state machine. From now on initialization functions report errors not by returning an error value, but by changing the state to indicate that something went wrong. Signed-off-by: Jakub Pawlak <jakub.pawlak@intel.com> Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-08iavf: Refactor iavf state machine trackingMateusz Palczewski
[ Upstream commit 45eebd62999d37d13568723524b99d828e0ce22c ] Replace state changes of iavf state machine with a method that also tracks the previous state the machine was on. This change is required for further work with refactoring init and watchdog state machines. Tracking of previous state would help us recover iavf after failure has occurred. Signed-off-by: Jakub Pawlak <jakub.pawlak@intel.com> Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-08net: sparx5: Fix add vlan when invalid operationCasper Andersson
[ Upstream commit b3a34dc362c03215031b268fcc0b988e69490231 ] Check if operation is valid before changing any settings in hardware. Otherwise it results in changes being made despite it not being a valid operation. Fixes: 78eab33bb68b ("net: sparx5: add vlan support") Signed-off-by: Casper Andersson <casper.casan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-08net: chelsio: cxgb3: check the return value of pci_find_capability()Jia-Ju Bai
[ Upstream commit 767b9825ed1765894e569a3d698749d40d83762a ] The function pci_find_capability() in t3_prep_adapter() can fail, so its return value should be checked. Fixes: 4d22de3e6cc4 ("Add support for the latest 1G/10G Chelsio adapter, T3") Reported-by: TOTE Robot <oslab@tsinghua.edu.cn> Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-08ibmvnic: complete init_done on transport eventsSukadev Bhattiprolu
[ Upstream commit 36491f2df9ad2501e5a4ec25d3d95d72bafd2781 ] If we get a transport event, set the error and mark the init as complete so the attempt to send crq-init or login fail sooner rather than wait for the timeout. Fixes: bbd669a868bb ("ibmvnic: Fix completion structure initialization") Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-08ibmvnic: define flush_reset_queue helperSukadev Bhattiprolu
[ Upstream commit 83da53f7e4bd86dca4b2edc1e2bb324fb3c033a1 ] Define and use a helper to flush the reset queue. Fixes: 2770a7984db5 ("ibmvnic: Introduce hard reset recovery") Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-08ibmvnic: initialize rc before completing waitSukadev Bhattiprolu
[ Upstream commit 765559b10ce514eb1576595834f23cdc92125fee ] We should initialize ->init_done_rc before calling complete(). Otherwise the waiting thread may see ->init_done_rc as 0 before we have updated it and may assume that the CRQ was successful. Fixes: 6b278c0cb378 ("ibmvnic delay complete()") Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-08net: stmmac: only enable DMA interrupts when readyVincent Whitchurch
[ Upstream commit 087a7b944c5db409f7c1a68bf4896c56ba54eaff ] In this driver's ->ndo_open() callback, it enables DMA interrupts, starts the DMA channels, then requests interrupts with request_irq(), and then finally enables napi. If RX DMA interrupts are received before napi is enabled, no processing is done because napi_schedule_prep() will return false. If the network has a lot of broadcast/multicast traffic, then the RX ring could fill up completely before napi is enabled. When this happens, no further RX interrupts will be delivered, and the driver will fail to receive any packets. Fix this by only enabling DMA interrupts after all other initialization is complete. Fixes: 523f11b5d4fd72efb ("net: stmmac: move hardware setup for stmmac_open to new function") Reported-by: Lars Persson <larper@axis.com> Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-08net: stmmac: enhance XDP ZC driver level switching performanceOng Boon Leong
[ Upstream commit ac746c8520d9d056b6963ecca8ff1da9929d02f1 ] The previous stmmac_xdp_set_prog() implementation uses stmmac_release() and stmmac_open() which tear down the PHY device and causes undesirable autonegotiation which causes a delay whenever AFXDP ZC is setup. This patch introduces two new functions that just sufficiently tear down DMA descriptors, buffer, NAPI process, and IRQs and reestablish them accordingly in both stmmac_xdp_release() and stammac_xdp_open(). As the results of this enhancement, we get rid of transient state introduced by the link auto-negotiation: $ ./xdpsock -i eth0 -t -z sock0@eth0:0 txonly xdp-drv pps pkts 1.00 rx 0 0 tx 634444 634560 sock0@eth0:0 txonly xdp-drv pps pkts 1.00 rx 0 0 tx 632330 1267072 sock0@eth0:0 txonly xdp-drv pps pkts 1.00 rx 0 0 tx 632438 1899584 sock0@eth0:0 txonly xdp-drv pps pkts 1.00 rx 0 0 tx 632502 2532160 Reported-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Tested-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-08can: etas_es58x: change opened_channel_cnt's type from atomic_t to u8Vincent Mailhol
[ Upstream commit f4896248e9025ff744b4147e6758274a1cb8cbae ] The driver uses an atomic_t variable: struct es58x_device::opened_channel_cnt to keep track of the number of opened channels in order to only allocate memory for the URBs when this count changes from zero to one. While the intent was to prevent race conditions, the choice of an atomic_t turns out to be a bad idea for several reasons: - implementation is incorrect and fails to decrement opened_channel_cnt when the URB allocation fails as reported in [1]. - even if opened_channel_cnt were to be correctly decremented, atomic_t is insufficient to cover edge cases: there can be a race condition in which 1/ a first process fails to allocate URBs memory 2/ a second process enters es58x_open() before the first process does its cleanup and decrements opened_channed_cnt. In which case, the second process would successfully return despite the URBs memory not being allocated. - actually, any kind of locking mechanism was useless here because it is redundant with the network stack big kernel lock (a.k.a. rtnl_lock) which is being hold by all the callers of net_device_ops:ndo_open() and net_device_ops:ndo_close(). c.f. the ASSERST_RTNL() calls in __dev_open() [2] and __dev_close_many() [3]. The atmomic_t is thus replaced by a simple u8 type and the logic to increment and decrement es58x_device:opened_channel_cnt is simplified accordingly fixing the bug reported in [1]. We do not check again for ASSERST_RTNL() as this is already done by the callers. [1] https://lore.kernel.org/linux-can/20220201140351.GA2548@kili/T/#u [2] https://elixir.bootlin.com/linux/v5.16/source/net/core/dev.c#L1463 [3] https://elixir.bootlin.com/linux/v5.16/source/net/core/dev.c#L1541 Fixes: 8537257874e9 ("can: etas_es58x: add core support for ETAS ES58X CAN USB interfaces") Link: https://lore.kernel.org/all/20220212112713.577957-1-mailhol.vincent@wanadoo.fr Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-08ARM: tegra: Move panels to AUX busThierry Reding
[ Upstream commit 8d3b01e0d4bb54368d73d0984466d72c2eeeac74 ] Move the eDP panel on Venice 2 and Nyan boards into the corresponding AUX bus device tree node. This allows us to avoid a nasty circular dependency that would otherwise be created between the DPAUX and panel nodes via the DDC/I2C phandle. Fixes: eb481f9ac95c ("ARM: tegra: add Acer Chromebook 13 device tree") Fixes: 59fe02cb079f ("ARM: tegra: Add DTS for the nyan-blaze board") Fixes: 40e231c770a4 ("ARM: tegra: Enable eDP for Venice2") Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>