summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2023-04-21iomap: Add DIO tracepointsiomap-6.4-merge-1Ritesh Harjani (IBM)
Add trace_iomap_dio_rw_begin, trace_iomap_dio_rw_queued and trace_iomap_dio_complete tracepoint. trace_iomap_dio_rw_queued is mostly only to know that the request was queued and -EIOCBQUEUED was returned. It is mostly trace_iomap_dio_rw_begin & trace_iomap_dio_complete which has all the details. <example output log> a.out-2073 [006] 134.225717: iomap_dio_rw_begin: dev 7:7 ino 0xe size 0x0 offset 0x0 length 0x1000 done_before 0x0 flags DIRECT|WRITE dio_flags DIO_FORCE_WAIT aio 1 a.out-2073 [006] 134.226234: iomap_dio_complete: dev 7:7 ino 0xe size 0x1000 offset 0x1000 flags DIRECT|WRITE aio 1 error 0 ret 4096 a.out-2074 [006] 136.225975: iomap_dio_rw_begin: dev 7:7 ino 0xe size 0x1000 offset 0x0 length 0x1000 done_before 0x0 flags DIRECT dio_flags aio 1 a.out-2074 [006] 136.226173: iomap_dio_rw_queued: dev 7:7 ino 0xe size 0x1000 offset 0x1000 length 0x0 ksoftirqd/3-31 [003] 136.226389: iomap_dio_complete: dev 7:7 ino 0xe size 0x1000 offset 0x1000 flags DIRECT aio 1 error 0 ret 4096 a.out-2075 [003] 141.674969: iomap_dio_rw_begin: dev 7:7 ino 0xe size 0x1000 offset 0x0 length 0x1000 done_before 0x0 flags DIRECT|WRITE dio_flags aio 1 a.out-2075 [003] 141.676085: iomap_dio_rw_queued: dev 7:7 ino 0xe size 0x1000 offset 0x1000 length 0x0 kworker/2:0-27 [002] 141.676432: iomap_dio_complete: dev 7:7 ino 0xe size 0x1000 offset 0x1000 flags DIRECT|WRITE aio 1 error 0 ret 4096 a.out-2077 [006] 143.443746: iomap_dio_rw_begin: dev 7:7 ino 0xe size 0x1000 offset 0x0 length 0x1000 done_before 0x0 flags DIRECT dio_flags aio 1 a.out-2077 [006] 143.443866: iomap_dio_rw_queued: dev 7:7 ino 0xe size 0x1000 offset 0x1000 length 0x0 ksoftirqd/5-41 [005] 143.444134: iomap_dio_complete: dev 7:7 ino 0xe size 0x1000 offset 0x1000 flags DIRECT aio 1 error 0 ret 4096 a.out-2078 [007] 146.716833: iomap_dio_rw_begin: dev 7:7 ino 0xe size 0x1000 offset 0x0 length 0x1000 done_before 0x0 flags DIRECT dio_flags aio 0 a.out-2078 [007] 146.717639: iomap_dio_complete: dev 7:7 ino 0xe size 0x1000 offset 0x1000 flags DIRECT aio 0 error 0 ret 4096 a.out-2079 [006] 148.972605: iomap_dio_rw_begin: dev 7:7 ino 0xe size 0x1000 offset 0x0 length 0x1000 done_before 0x0 flags DIRECT dio_flags aio 0 a.out-2079 [006] 148.973099: iomap_dio_complete: dev 7:7 ino 0xe size 0x1000 offset 0x1000 flags DIRECT aio 0 error 0 ret 4096 Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> [djwong: line up strings all prettylike] Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2023-04-21iomap: Remove IOMAP_DIO_NOSYNC unused dio flagRitesh Harjani (IBM)
IOMAP_DIO_NOSYNC earlier was added for use in btrfs. But it seems for aio dsync writes this is not useful anyway. For aio dsync case, we we queue the request and return -EIOCBQUEUED. Now, since IOMAP_DIO_NOSYNC doesn't let iomap_dio_complete() to call generic_write_sync(), hence we may lose the sync write. Hence kill this flag as it is not in use by any FS now. Tested-by: Disha Goel <disgoel@linux.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2023-04-08Merge tag '6.3-rc5-smb3-cifs-client-fixes' of ↵Linus Torvalds
git://git.samba.org/sfrench/cifs-2.6 Pull cifs client fixes from Steve French: "Two cifs/smb3 client fixes, one for stable: - double lock fix for a cifs/smb1 reconnect path - DFS prefixpath fix for reconnect when server moved" * tag '6.3-rc5-smb3-cifs-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: double lock in cifs_reconnect_tcon() cifs: sanitize paths in cifs_update_super_prepath.
2023-04-08Merge tag 'mm-hotfixes-stable-2023-04-07-16-23' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM fixes from Andrew Morton: "28 hotfixes. 23 are cc:stable and the other five address issues which were introduced during this merge cycle. 20 are for MM and the remainder are for other subsystems" * tag 'mm-hotfixes-stable-2023-04-07-16-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (28 commits) maple_tree: fix a potential concurrency bug in RCU mode maple_tree: fix get wrong data_end in mtree_lookup_walk() mm/swap: fix swap_info_struct race between swapoff and get_swap_pages() nilfs2: fix sysfs interface lifetime mm: take a page reference when removing device exclusive entries mm: vmalloc: avoid warn_alloc noise caused by fatal signal nilfs2: initialize "struct nilfs_binfo_dat"->bi_pad field nilfs2: fix potential UAF of struct nilfs_sc_info in nilfs_segctor_thread() zsmalloc: document freeable stats zsmalloc: document new fullness grouping fsdax: force clear dirty mark if CoW mm/hugetlb: fix uffd wr-protection for CoW optimization path mm: enable maple tree RCU mode by default maple_tree: add RCU lock checking to rcu callback functions maple_tree: add smp_rmb() to dead node detection maple_tree: fix write memory barrier of nodes once dead for RCU mode maple_tree: remove extra smp_wmb() from mas_dead_leaves() maple_tree: fix freeing of nodes in rcu mode maple_tree: detect dead nodes in mas_start() maple_tree: be more cautious about dead nodes ...
2023-04-07Merge tag '6.3-rc5-ksmbd-server-fixes' of git://git.samba.org/ksmbdLinus Torvalds
Pull ksmbd server fixes from Steve French: "Four fixes, three for stable: - slab out of bounds fix - lock cancellation fix - minor cleanup to address clang warning - fix for xfstest 551 (wrong parms passed to kvmalloc)" * tag '6.3-rc5-ksmbd-server-fixes' of git://git.samba.org/ksmbd: ksmbd: fix slab-out-of-bounds in init_smb2_rsp_hdr ksmbd: delete asynchronous work from list ksmbd: remove unused is_char_allowed function ksmbd: do not call kvmalloc() with __GFP_NORETRY | __GFP_NO_WARN
2023-04-06cifs: double lock in cifs_reconnect_tcon()Dan Carpenter
This lock was supposed to be an unlock. Fixes: 6cc041e90c17 ("cifs: avoid races in parallel reconnects in smb1") Signed-off-by: Dan Carpenter <error27@gmail.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-04-05nilfs2: fix sysfs interface lifetimeRyusuke Konishi
The current nilfs2 sysfs support has issues with the timing of creation and deletion of sysfs entries, potentially leading to null pointer dereferences, use-after-free, and lockdep warnings. Some of the sysfs attributes for nilfs2 per-filesystem instance refer to metadata file "cpfile", "sufile", or "dat", but nilfs_sysfs_create_device_group that creates those attributes is executed before the inodes for these metadata files are loaded, and nilfs_sysfs_delete_device_group which deletes these sysfs entries is called after releasing their metadata file inodes. Therefore, access to some of these sysfs attributes may occur outside of the lifetime of these metadata files, resulting in inode NULL pointer dereferences or use-after-free. In addition, the call to nilfs_sysfs_create_device_group() is made during the locking period of the semaphore "ns_sem" of nilfs object, so the shrinker call caused by the memory allocation for the sysfs entries, may derive lock dependencies "ns_sem" -> (shrinker) -> "locks acquired in nilfs_evict_inode()". Since nilfs2 may acquire "ns_sem" deep in the call stack holding other locks via its error handler __nilfs_error(), this causes lockdep to report circular locking. This is a false positive and no circular locking actually occurs as no inodes exist yet when nilfs_sysfs_create_device_group() is called. Fortunately, the lockdep warnings can be resolved by simply moving the call to nilfs_sysfs_create_device_group() out of "ns_sem". This fixes these sysfs issues by revising where the device's sysfs interface is created/deleted and keeping its lifetime within the lifetime of the metadata files above. Link: https://lkml.kernel.org/r/20230330205515.6167-1-konishi.ryusuke@gmail.com Fixes: dd70edbde262 ("nilfs2: integrate sysfs support into driver") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+979fa7f9c0d086fdc282@syzkaller.appspotmail.com Link: https://lkml.kernel.org/r/0000000000003414b505f7885f7e@google.com Reported-by: syzbot+5b7d542076d9bddc3c6a@syzkaller.appspotmail.com Link: https://lkml.kernel.org/r/0000000000006ac86605f5f44eb9@google.com Cc: Viacheslav Dubeyko <slava@dubeyko.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-05nilfs2: initialize "struct nilfs_binfo_dat"->bi_pad fieldTetsuo Handa
nilfs_btree_assign_p() and nilfs_direct_assign_p() are not initializing "struct nilfs_binfo_dat"->bi_pad field, causing uninit-value reports when being passed to CRC function. Link: https://lkml.kernel.org/r/20230326152146.15872-1-konishi.ryusuke@gmail.com Reported-by: syzbot <syzbot+048585f3f4227bb2b49b@syzkaller.appspotmail.com> Link: https://syzkaller.appspot.com/bug?extid=048585f3f4227bb2b49b Reported-by: Dipanjan Das <mail.dipanjan.das@gmail.com> Link: https://lkml.kernel.org/r/CANX2M5bVbzRi6zH3PTcNE_31TzerstOXUa9Bay4E6y6dX23_pg@mail.gmail.com Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Alexander Potapenko <glider@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-05nilfs2: fix potential UAF of struct nilfs_sc_info in nilfs_segctor_thread()Ryusuke Konishi
The finalization of nilfs_segctor_thread() can race with nilfs_segctor_kill_thread() which terminates that thread, potentially causing a use-after-free BUG as KASAN detected. At the end of nilfs_segctor_thread(), it assigns NULL to "sc_task" member of "struct nilfs_sc_info" to indicate the thread has finished, and then notifies nilfs_segctor_kill_thread() of this using waitqueue "sc_wait_task" on the struct nilfs_sc_info. However, here, immediately after the NULL assignment to "sc_task", it is possible that nilfs_segctor_kill_thread() will detect it and return to continue the deallocation, freeing the nilfs_sc_info structure before the thread does the notification. This fixes the issue by protecting the NULL assignment to "sc_task" and its notification, with spinlock "sc_state_lock" of the struct nilfs_sc_info. Since nilfs_segctor_kill_thread() does a final check to see if "sc_task" is NULL with "sc_state_lock" locked, this can eliminate the race. Link: https://lkml.kernel.org/r/20230327175318.8060-1-konishi.ryusuke@gmail.com Reported-by: syzbot+b08ebcc22f8f3e6be43a@syzkaller.appspotmail.com Link: https://lkml.kernel.org/r/00000000000000660d05f7dfa877@google.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-05fsdax: force clear dirty mark if CoWShiyang Ruan
XFS allows CoW on non-shared extents to combat fragmentation[1]. The old non-shared extent could be mwrited before, its dax entry is marked dirty. This results in a WARNing: [ 28.512349] ------------[ cut here ]------------ [ 28.512622] WARNING: CPU: 2 PID: 5255 at fs/dax.c:390 dax_insert_entry+0x342/0x390 [ 28.513050] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache netfs nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables [ 28.515462] CPU: 2 PID: 5255 Comm: fsstress Kdump: loaded Not tainted 6.3.0-rc1-00001-g85e1481e19c1-dirty #117 [ 28.515902] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Arch Linux 1.16.1-1-1 04/01/2014 [ 28.516307] RIP: 0010:dax_insert_entry+0x342/0x390 [ 28.516536] Code: 30 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc 48 8b 45 20 48 83 c0 01 e9 e2 fe ff ff 48 8b 45 20 48 83 c0 01 e9 cd fe ff ff <0f> 0b e9 53 ff ff ff 48 8b 7c 24 08 31 f6 e8 1b 61 a1 00 eb 8c 48 [ 28.517417] RSP: 0000:ffffc9000845fb18 EFLAGS: 00010086 [ 28.517721] RAX: 0000000000000053 RBX: 0000000000000155 RCX: 000000000018824b [ 28.518113] RDX: 0000000000000000 RSI: ffffffff827525a6 RDI: 00000000ffffffff [ 28.518515] RBP: ffffea00062092c0 R08: 0000000000000000 R09: ffffc9000845f9c8 [ 28.518905] R10: 0000000000000003 R11: ffffffff82ddb7e8 R12: 0000000000000155 [ 28.519301] R13: 0000000000000000 R14: 000000000018824b R15: ffff88810cfa76b8 [ 28.519703] FS: 00007f14a0c94740(0000) GS:ffff88817bd00000(0000) knlGS:0000000000000000 [ 28.520148] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 28.520472] CR2: 00007f14a0c8d000 CR3: 000000010321c004 CR4: 0000000000770ee0 [ 28.520863] PKRU: 55555554 [ 28.521043] Call Trace: [ 28.521219] <TASK> [ 28.521368] dax_fault_iter+0x196/0x390 [ 28.521595] dax_iomap_pte_fault+0x19b/0x3d0 [ 28.521852] __xfs_filemap_fault+0x234/0x2b0 [ 28.522116] __do_fault+0x30/0x130 [ 28.522334] do_fault+0x193/0x340 [ 28.522586] __handle_mm_fault+0x2d3/0x690 [ 28.522975] handle_mm_fault+0xe6/0x2c0 [ 28.523259] do_user_addr_fault+0x1bc/0x6f0 [ 28.523521] exc_page_fault+0x60/0x140 [ 28.523763] asm_exc_page_fault+0x22/0x30 [ 28.524001] RIP: 0033:0x7f14a0b589ca [ 28.524225] Code: c5 fe 7f 07 c5 fe 7f 47 20 c5 fe 7f 47 40 c5 fe 7f 47 60 c5 f8 77 c3 66 0f 1f 84 00 00 00 00 00 40 0f b6 c6 48 89 d1 48 89 fa <f3> aa 48 89 d0 c5 f8 77 c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 [ 28.525198] RSP: 002b:00007fff1dea1c98 EFLAGS: 00010202 [ 28.525505] RAX: 000000000000001e RBX: 000000000014a000 RCX: 0000000000006046 [ 28.525895] RDX: 00007f14a0c82000 RSI: 000000000000001e RDI: 00007f14a0c8d000 [ 28.526290] RBP: 000000000000006f R08: 0000000000000004 R09: 000000000014a000 [ 28.526681] R10: 0000000000000008 R11: 0000000000000246 R12: 028f5c28f5c28f5c [ 28.527067] R13: 8f5c28f5c28f5c29 R14: 0000000000011046 R15: 00007f14a0c946c0 [ 28.527449] </TASK> [ 28.527600] ---[ end trace 0000000000000000 ]--- To be able to delete this entry, clear its dirty mark before invalidate_inode_pages2_range(). [1] https://lore.kernel.org/linux-xfs/20230321151339.GA11376@frogsfrogsfrogs/ Link: https://lkml.kernel.org/r/1679653680-2-1-git-send-email-ruansy.fnst@fujitsu.com Fixes: f80e1668888f3 ("fsdax: invalidate pages when CoW") Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-05cifs: sanitize paths in cifs_update_super_prepath.Thiago Rafael Becker
After a server reboot, clients are failing to move files with ENOENT. This is caused by DFS referrals containing multiple separators, which the server move call doesn't recognize. v1: Initial patch. v2: Move prototype to header. Link: https://bugzilla.redhat.com/show_bug.cgi?id=2182472 Fixes: a31080899d5f ("cifs: sanitize multiple delimiters in prepath") Actually-Fixes: 24e0a1eff9e2 ("cifs: switch to new mount api") Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Thiago Rafael Becker <tbecker@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-04-04Merge tag 'nfsd-6.3-5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: - Fix a crash and a resource leak in NFSv4 COMPOUND processing - Fix issues with AUTH_SYS credential handling - Try again to address an NFS/NFSD/SUNRPC build dependency regression * tag 'nfsd-6.3-5' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: NFSD: callback request does not use correct credential for AUTH_SYS NFS: Remove "select RPCSEC_GSS_KRB5 sunrpc: only free unix grouplist after RCU settles nfsd: call op_release, even when op_func returns an error NFSD: Avoid calling OPDESC() with ops->opnum == OP_ILLEGAL
2023-04-04NFSD: callback request does not use correct credential for AUTH_SYSDai Ngo
Currently callback request does not use the credential specified in CREATE_SESSION if the security flavor for the back channel is AUTH_SYS. Problem was discovered by pynfs 4.1 DELEG5 and DELEG7 test with error: DELEG5 st_delegation.testCBSecParms : FAILURE expected callback with uid, gid == 17, 19, got 0, 0 Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Fixes: 8276c902bbe9 ("SUNRPC: remove uid and gid from struct auth_cred") Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-04-04NFS: Remove "select RPCSEC_GSS_KRB5Chuck Lever
If CONFIG_CRYPTO=n (e.g. arm/shmobile_defconfig): WARNING: unmet direct dependencies detected for RPCSEC_GSS_KRB5 Depends on [n]: NETWORK_FILESYSTEMS [=y] && SUNRPC [=y] && CRYPTO [=n] Selected by [y]: - NFS_V4 [=y] && NETWORK_FILESYSTEMS [=y] && NFS_FS [=y] As NFSv4 can work without crypto enabled, remove the RPCSEC_GSS_KRB5 dependency altogether. Trond says: > It is possible to use the NFSv4.1 client with just AUTH_SYS, and > in fact there are plenty of people out there using only that. The > fact that RFC5661 gets its knickers in a twist about RPCSEC_GSS > support is largely irrelevant to those people. > > The other issue is that ’select’ enforces the strict dependency > that if the NFS client is compiled into the kernel, then the > RPCSEC_GSS and kerberos code needs to be compiled in as well: they > cannot exist as modules. Fixes: e57d06527738 ("NFS & NFSD: Update GSS dependencies") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Niklas Söderlund <niklas.soderlund@ragnatech.se> Suggested-by: Trond Myklebust <trondmy@hammerspace.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-04-03Merge tag 'vfs.misc.fixes.v6.3-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping Pull vfs fix from Christian Brauner: "When a mount or mount tree is made shared the vfs allocates new peer group ids for all mounts that have no peer group id set. Only mounts that aren't marked with MNT_SHARED are relevant here as MNT_SHARED indicates that the mount has fully transitioned to a shared mount. The peer group id handling is done with namespace lock held. On failure, the peer group id settings of mounts for which a new peer group id was allocated need to be reverted and the allocated peer group id freed. The cleanup_group_ids() helper can identify the mounts to cleanup by checking whether a given mount has a peer group id set but isn't marked MNT_SHARED. The deallocation always needs to happen with namespace lock held to protect against concurrent modifications of the propagation settings. This fixes the one place where the namespace lock was dropped before calling cleanup_group_ids()" * tag 'vfs.misc.fixes.v6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping: fs: drop peer group ids under namespace lock
2023-04-02ksmbd: fix slab-out-of-bounds in init_smb2_rsp_hdrNamjae Jeon
When smb1 mount fails, KASAN detect slab-out-of-bounds in init_smb2_rsp_hdr like the following one. For smb1 negotiate(56bytes) , init_smb2_rsp_hdr() for smb2 is called. The issue occurs while handling smb1 negotiate as smb2 server operations. Add smb server operations for smb1 (get_cmd_val, init_rsp_hdr, allocate_rsp_buf, check_user_session) to handle smb1 negotiate so that smb2 server operation does not handle it. [ 411.400423] CIFS: VFS: Use of the less secure dialect vers=1.0 is not recommended unless required for access to very old servers [ 411.400452] CIFS: Attempting to mount \\192.168.45.139\homes [ 411.479312] ksmbd: init_smb2_rsp_hdr : 492 [ 411.479323] ================================================================== [ 411.479327] BUG: KASAN: slab-out-of-bounds in init_smb2_rsp_hdr+0x1e2/0x1f4 [ksmbd] [ 411.479369] Read of size 16 at addr ffff888488ed0734 by task kworker/14:1/199 [ 411.479379] CPU: 14 PID: 199 Comm: kworker/14:1 Tainted: G OE 6.1.21 #3 [ 411.479386] Hardware name: ASUSTeK COMPUTER INC. Z10PA-D8 Series/Z10PA-D8 Series, BIOS 3801 08/23/2019 [ 411.479390] Workqueue: ksmbd-io handle_ksmbd_work [ksmbd] [ 411.479425] Call Trace: [ 411.479428] <TASK> [ 411.479432] dump_stack_lvl+0x49/0x63 [ 411.479444] print_report+0x171/0x4a8 [ 411.479452] ? kasan_complete_mode_report_info+0x3c/0x200 [ 411.479463] ? init_smb2_rsp_hdr+0x1e2/0x1f4 [ksmbd] [ 411.479497] kasan_report+0xb4/0x130 [ 411.479503] ? init_smb2_rsp_hdr+0x1e2/0x1f4 [ksmbd] [ 411.479537] kasan_check_range+0x149/0x1e0 [ 411.479543] memcpy+0x24/0x70 [ 411.479550] init_smb2_rsp_hdr+0x1e2/0x1f4 [ksmbd] [ 411.479585] handle_ksmbd_work+0x109/0x760 [ksmbd] [ 411.479616] ? _raw_spin_unlock_irqrestore+0x50/0x50 [ 411.479624] ? smb3_encrypt_resp+0x340/0x340 [ksmbd] [ 411.479656] process_one_work+0x49c/0x790 [ 411.479667] worker_thread+0x2b1/0x6e0 [ 411.479674] ? process_one_work+0x790/0x790 [ 411.479680] kthread+0x177/0x1b0 [ 411.479686] ? kthread_complete_and_exit+0x30/0x30 [ 411.479692] ret_from_fork+0x22/0x30 [ 411.479702] </TASK> Fixes: 39b291b86b59 ("ksmbd: return unsupported error on smb1 mount") Cc: stable@vger.kernel.org Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-04-02ksmbd: delete asynchronous work from listNamjae Jeon
When smb2_lock request is canceled by smb2_cancel or smb2_close(), ksmbd is missing deleting async_request_entry async_requests list. Because calling init_smb2_rsp_hdr() in smb2_lock() mark ->synchronous as true and then it will not be deleted in ksmbd_conn_try_dequeue_request(). This patch add release_async_work() to release the ones allocated for async work. Cc: stable@vger.kernel.org Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-04-02Merge tag 'for-6.3-rc4-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - scan block devices in non-exclusive mode to avoid temporary mkfs failures - fix race between quota disable and quota assign ioctls - fix deadlock when aborting transaction during relocation with scrub - ignore fiemap path cache when there are multiple paths for a node * tag 'for-6.3-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: ignore fiemap path cache when there are multiple paths for a node btrfs: fix deadlock when aborting transaction during relocation with scrub btrfs: scan device in non-exclusive mode btrfs: fix race between quota disable and quota assign ioctls
2023-04-01Merge tag '6.3-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs client fixes from Steve French: "Four cifs/smb3 client (reconnect and DFS related) fixes, including two for stable: - DFS oops fix - DFS reconnect recursion fix - An SMB1 parallel reconnect fix - Trivial dead code removal in smb2_reconnect" * tag '6.3-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: get rid of dead check in smb2_reconnect() cifs: prevent infinite recursion in CIFSGetDFSRefer() cifs: avoid races in parallel reconnects in smb1 cifs: fix DFS traversal oops without CONFIG_CIFS_DFS_UPCALL
2023-03-31nfsd: call op_release, even when op_func returns an errorJeff Layton
For ops with "trivial" replies, nfsd4_encode_operation will shortcut most of the encoding work and skip to just marshalling up the status. One of the things it skips is calling op_release. This could cause a memory leak in the layoutget codepath if there is an error at an inopportune time. Have the compound processing engine always call op_release, even when op_func sets an error in op->status. With this change, we also need nfsd4_block_get_device_info_scsi to set the gd_device pointer to NULL on error to avoid a double free. Reported-by: Zhi Li <yieli@redhat.com> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2181403 Fixes: 34b1744c91cc ("nfsd4: define ->op_release for compound ops") Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-03-31NFSD: Avoid calling OPDESC() with ops->opnum == OP_ILLEGALChuck Lever
OPDESC() simply indexes into nfsd4_ops[] by the op's operation number, without range checking that value. It assumes callers are careful to avoid calling it with an out-of-bounds opnum value. nfsd4_decode_compound() is not so careful, and can invoke OPDESC() with opnum set to OP_ILLEGAL, which is 10044 -- well beyond the end of nfsd4_ops[]. Reported-by: Jeff Layton <jlayton@kernel.org> Fixes: f4f9ef4a1b0a ("nfsd4: opdesc will be useful outside nfs4proc.c") Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-03-31Merge tag 'nfs-for-6.3-3' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull NFS client fixes from Anna Schumaker: - Fix shutdown of NFS TCP client sockets - Fix hangs when recovering open state after a server reboot * tag 'nfs-for-6.3-3' of git://git.linux-nfs.org/projects/anna/linux-nfs: SUNRPC: fix shutdown of NFS TCP client socket NFSv4: Fix hangs when recovering open state after a server reboot
2023-03-31fs: drop peer group ids under namespace lockChristian Brauner
When cleaning up peer group ids in the failure path we need to make sure to hold on to the namespace lock. Otherwise another thread might just turn the mount from a shared into a non-shared mount concurrently. Link: https://lore.kernel.org/lkml/00000000000088694505f8132d77@google.com Fixes: 2a1867219c7b ("fs: add mount_setattr()") Reported-by: syzbot+8ac3859139c685c4f597@syzkaller.appspotmail.com Cc: stable@vger.kernel.org # 5.12+ Message-Id: <20230330-vfs-mount_setattr-propagation-fix-v1-1-37548d91533b@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-03-30cifs: get rid of dead check in smb2_reconnect()Paulo Alcantara
The SMB2_IOCTL check in the switch statement will never be true as we return earlier from smb2_reconnect() if @smb2_command == SMB2_IOCTL. Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-30cifs: prevent infinite recursion in CIFSGetDFSRefer()Paulo Alcantara
We can't call smb_init() in CIFSGetDFSRefer() as cifs_reconnect_tcon() may end up calling CIFSGetDFSRefer() again to get new DFS referrals and thus causing an infinite recursion. Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Cc: stable@vger.kernel.org # 6.2 Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-30cifs: avoid races in parallel reconnects in smb1Paulo Alcantara
Prevent multiple threads of doing negotiate, session setup and tree connect by holding @ses->session_mutex in cifs_reconnect_tcon() while reconnecting session and tcon. Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-30cifs: fix DFS traversal oops without CONFIG_CIFS_DFS_UPCALLDavid Disseldorp
When compiled with CONFIG_CIFS_DFS_UPCALL disabled, cifs_dfs_d_automount is NULL. cifs.ko logic for mapping CIFS_FATTR_DFS_REFERRAL attributes to S_AUTOMOUNT and corresponding dentry flags is retained regardless of CONFIG_CIFS_DFS_UPCALL, leading to a NULL pointer dereference in VFS follow_automount() when traversing a DFS referral link: BUG: kernel NULL pointer dereference, address: 0000000000000000 ... Call Trace: <TASK> __traverse_mounts+0xb5/0x220 ? cifs_revalidate_mapping+0x65/0xc0 [cifs] step_into+0x195/0x610 ? lookup_fast+0xe2/0xf0 path_lookupat+0x64/0x140 filename_lookup+0xc2/0x140 ? __create_object+0x299/0x380 ? kmem_cache_alloc+0x119/0x220 ? user_path_at_empty+0x31/0x50 user_path_at_empty+0x31/0x50 __x64_sys_chdir+0x2a/0xd0 ? exit_to_user_mode_prepare+0xca/0x100 do_syscall_64+0x42/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc This fix adds an inline cifs_dfs_d_automount() {return -EREMOTE} handler when CONFIG_CIFS_DFS_UPCALL is disabled. An alternative would be to avoid flagging S_AUTOMOUNT, etc. without CONFIG_CIFS_DFS_UPCALL. This approach was chosen as it provides more control over the error path. Signed-off-by: David Disseldorp <ddiss@suse.de> Cc: stable@vger.kernel.org Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-30zonefs: Do not propagate iomap_dio_rw() ENOTBLK error to user spaceDamien Le Moal
The call to invalidate_inode_pages2_range() in __iomap_dio_rw() may fail, in which case -ENOTBLK is returned and this error code is propagated back to user space trhough iomap_dio_rw() -> zonefs_file_dio_write() return chain. This error code is fairly obscure and may confuse the user. Avoid this and be consistent with the behavior of zonefs_file_dio_append() for similar invalidate_inode_pages2_range() errors by returning -EBUSY to user space when iomap_dio_rw() returns -ENOTBLK. Suggested-by: Christoph Hellwig <hch@infradead.org> Fixes: 8dcc1a9d90c1 ("fs: New zonefs file system") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Tested-by: Hans Holmberg <hans.holmberg@wdc.com>
2023-03-30zonefs: Always invalidate last cached page on append writeDamien Le Moal
When a direct append write is executed, the append offset may correspond to the last page of a sequential file inode which might have been cached already by buffered reads, page faults with mmap-read or non-direct readahead. To ensure that the on-disk and cached data is consistant for such last cached page, make sure to always invalidate it in zonefs_file_dio_append(). If the invalidation fails, return -EBUSY to userspace to differentiate from IO errors. This invalidation will always be a no-op when the FS block size (device zone write granularity) is equal to the page size (e.g. 4K). Reported-by: Hans Holmberg <Hans.Holmberg@wdc.com> Fixes: 02ef12a663c7 ("zonefs: use REQ_OP_ZONE_APPEND for sync DIO") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Tested-by: Hans Holmberg <hans.holmberg@wdc.com>
2023-03-29btrfs: ignore fiemap path cache when there are multiple paths for a nodeFilipe Manana
During fiemap, when walking backreferences to determine if a b+tree node/leaf is shared, we may find a tree block (leaf or node) for which two parents were added to the references ulist. This happens if we get for example one direct ref (shared tree block ref) and one indirect ref (non-shared tree block ref) for the tree block at the current level, which can happen during relocation. In that case the fiemap path cache can not be used since it's meant for a single path, with one tree block at each possible level, so having multiple references for a tree block at any level may result in getting the level counter exceed BTRFS_MAX_LEVEL and eventually trigger the warning: WARN_ON_ONCE(level >= BTRFS_MAX_LEVEL) at lookup_backref_shared_cache() and at store_backref_shared_cache(). This is harmless since the code ignores any level >= BTRFS_MAX_LEVEL, the warning is there just to catch any unexpected case like the one described above. However if a user finds this it may be scary and get reported. So just ignore the path cache once we find a tree block for which there are more than one reference, which is the less common case, and update the cache with the sharedness check result for all levels below the level for which we found multiple references. Reported-by: Jarno Pelkonen <jarno.pelkonen@gmail.com> Link: https://lore.kernel.org/linux-btrfs/CAKv8qLmDNAGJGCtsevxx_VZ_YOvvs1L83iEJkTgyA4joJertng@mail.gmail.com/ Fixes: 12a824dc67a6 ("btrfs: speedup checking for extent sharedness during fiemap") CC: stable@vger.kernel.org # 6.1+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-03-28fsdax: dedupe should compare the min of two iters' lengthShiyang Ruan
In an dedupe comparison iter loop, the length of iomap_iter decreases because it implies the remaining length after each iteration. The dedupe command will fail with -EIO if the range is larger than one page size and not aligned to the page size. Also report warning in dmesg: [ 4338.498374] ------------[ cut here ]------------ [ 4338.498689] WARNING: CPU: 3 PID: 1415645 at fs/iomap/iter.c:16 ... The compare function should use the min length of the current iters, not the total length. Link: https://lkml.kernel.org/r/1679469958-2-1-git-send-email-ruansy.fnst@fujitsu.com Fixes: 0e79e3736d54 ("fsdax: dedupe: iter two files at the same time") Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-03-28fsdax: unshare: zero destination if srcmap is HOLE or UNWRITTENShiyang Ruan
unshare copies data from source to destination. But if the source is HOLE or UNWRITTEN extents, we should zero the destination, otherwise the HOLE or UNWRITTEN part will be user-visible old data of the new allocated extent. Found by running generic/649 while mounting with -o dax=always on pmem. Link: https://lkml.kernel.org/r/1679483469-2-1-git-send-email-ruansy.fnst@fujitsu.com Fixes: d984648e428b ("fsdax,xfs: port unshare to fsdax") Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Alistair Popple <apopple@nvidia.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-03-28btrfs: fix deadlock when aborting transaction during relocation with scrubFilipe Manana
Before relocating a block group we pause scrub, then do the relocation and then unpause scrub. The relocation process requires starting and committing a transaction, and if we have a failure in the critical section of the transaction commit path (transaction state >= TRANS_STATE_COMMIT_START), we will deadlock if there is a paused scrub. That results in stack traces like the following: [42.479] BTRFS info (device sdc): relocating block group 53876686848 flags metadata|raid6 [42.936] BTRFS warning (device sdc): Skipping commit of aborted transaction. [42.936] ------------[ cut here ]------------ [42.936] BTRFS: Transaction aborted (error -28) [42.936] WARNING: CPU: 11 PID: 346822 at fs/btrfs/transaction.c:1977 btrfs_commit_transaction+0xcc8/0xeb0 [btrfs] [42.936] Modules linked in: dm_flakey dm_mod loop btrfs (...) [42.936] CPU: 11 PID: 346822 Comm: btrfs Tainted: G W 6.3.0-rc2-btrfs-next-127+ #1 [42.936] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [42.936] RIP: 0010:btrfs_commit_transaction+0xcc8/0xeb0 [btrfs] [42.936] Code: ff ff 45 8b (...) [42.936] RSP: 0018:ffffb58649633b48 EFLAGS: 00010282 [42.936] RAX: 0000000000000000 RBX: ffff8be6ef4d5bd8 RCX: 0000000000000000 [42.936] RDX: 0000000000000002 RSI: ffffffffb35e7782 RDI: 00000000ffffffff [42.936] RBP: ffff8be6ef4d5c98 R08: 0000000000000000 R09: ffffb586496339e8 [42.936] R10: 0000000000000001 R11: 0000000000000001 R12: ffff8be6d38c7c00 [42.936] R13: 00000000ffffffe4 R14: ffff8be6c268c000 R15: ffff8be6ef4d5cf0 [42.936] FS: 00007f381a82b340(0000) GS:ffff8beddfcc0000(0000) knlGS:0000000000000000 [42.936] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [42.936] CR2: 00007f1e35fb7638 CR3: 0000000117680006 CR4: 0000000000370ee0 [42.936] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [42.936] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [42.936] Call Trace: [42.936] <TASK> [42.936] ? start_transaction+0xcb/0x610 [btrfs] [42.936] prepare_to_relocate+0x111/0x1a0 [btrfs] [42.936] relocate_block_group+0x57/0x5d0 [btrfs] [42.936] ? btrfs_wait_nocow_writers+0x25/0xb0 [btrfs] [42.936] btrfs_relocate_block_group+0x248/0x3c0 [btrfs] [42.936] ? __pfx_autoremove_wake_function+0x10/0x10 [42.936] btrfs_relocate_chunk+0x3b/0x150 [btrfs] [42.936] btrfs_balance+0x8ff/0x11d0 [btrfs] [42.936] ? __kmem_cache_alloc_node+0x14a/0x410 [42.936] btrfs_ioctl+0x2334/0x32c0 [btrfs] [42.937] ? mod_objcg_state+0xd2/0x360 [42.937] ? refill_obj_stock+0xb0/0x160 [42.937] ? seq_release+0x25/0x30 [42.937] ? __rseq_handle_notify_resume+0x3b5/0x4b0 [42.937] ? percpu_counter_add_batch+0x2e/0xa0 [42.937] ? __x64_sys_ioctl+0x88/0xc0 [42.937] __x64_sys_ioctl+0x88/0xc0 [42.937] do_syscall_64+0x38/0x90 [42.937] entry_SYSCALL_64_after_hwframe+0x72/0xdc [42.937] RIP: 0033:0x7f381a6ffe9b [42.937] Code: 00 48 89 44 24 (...) [42.937] RSP: 002b:00007ffd45ecf060 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [42.937] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f381a6ffe9b [42.937] RDX: 00007ffd45ecf150 RSI: 00000000c4009420 RDI: 0000000000000003 [42.937] RBP: 0000000000000003 R08: 0000000000000013 R09: 0000000000000000 [42.937] R10: 00007f381a60c878 R11: 0000000000000246 R12: 00007ffd45ed0423 [42.937] R13: 00007ffd45ecf150 R14: 0000000000000000 R15: 00007ffd45ecf148 [42.937] </TASK> [42.937] ---[ end trace 0000000000000000 ]--- [42.937] BTRFS: error (device sdc: state A) in cleanup_transaction:1977: errno=-28 No space left [59.196] INFO: task btrfs:346772 blocked for more than 120 seconds. [59.196] Tainted: G W 6.3.0-rc2-btrfs-next-127+ #1 [59.196] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [59.196] task:btrfs state:D stack:0 pid:346772 ppid:1 flags:0x00004002 [59.196] Call Trace: [59.196] <TASK> [59.196] __schedule+0x392/0xa70 [59.196] ? __pv_queued_spin_lock_slowpath+0x165/0x370 [59.196] schedule+0x5d/0xd0 [59.196] __scrub_blocked_if_needed+0x74/0xc0 [btrfs] [59.197] ? __pfx_autoremove_wake_function+0x10/0x10 [59.197] scrub_pause_off+0x21/0x50 [btrfs] [59.197] scrub_simple_mirror+0x1c7/0x950 [btrfs] [59.197] ? scrub_parity_put+0x1a5/0x1d0 [btrfs] [59.198] ? __pfx_autoremove_wake_function+0x10/0x10 [59.198] scrub_stripe+0x20d/0x740 [btrfs] [59.198] scrub_chunk+0xc4/0x130 [btrfs] [59.198] scrub_enumerate_chunks+0x3e4/0x7a0 [btrfs] [59.198] ? __pfx_autoremove_wake_function+0x10/0x10 [59.198] btrfs_scrub_dev+0x236/0x6a0 [btrfs] [59.199] ? btrfs_ioctl+0xd97/0x32c0 [btrfs] [59.199] ? _copy_from_user+0x7b/0x80 [59.199] btrfs_ioctl+0xde1/0x32c0 [btrfs] [59.199] ? refill_stock+0x33/0x50 [59.199] ? should_failslab+0xa/0x20 [59.199] ? kmem_cache_alloc_node+0x151/0x460 [59.199] ? alloc_io_context+0x1b/0x80 [59.199] ? preempt_count_add+0x70/0xa0 [59.199] ? __x64_sys_ioctl+0x88/0xc0 [59.199] __x64_sys_ioctl+0x88/0xc0 [59.199] do_syscall_64+0x38/0x90 [59.199] entry_SYSCALL_64_after_hwframe+0x72/0xdc [59.199] RIP: 0033:0x7f82ffaffe9b [59.199] RSP: 002b:00007f82ff9fcc50 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [59.199] RAX: ffffffffffffffda RBX: 000055b191e36310 RCX: 00007f82ffaffe9b [59.199] RDX: 000055b191e36310 RSI: 00000000c400941b RDI: 0000000000000003 [59.199] RBP: 0000000000000000 R08: 00007fff1575016f R09: 0000000000000000 [59.199] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f82ff9fd640 [59.199] R13: 000000000000006b R14: 00007f82ffa87580 R15: 0000000000000000 [59.199] </TASK> [59.199] INFO: task btrfs:346773 blocked for more than 120 seconds. [59.200] Tainted: G W 6.3.0-rc2-btrfs-next-127+ #1 [59.200] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [59.201] task:btrfs state:D stack:0 pid:346773 ppid:1 flags:0x00004002 [59.201] Call Trace: [59.201] <TASK> [59.201] __schedule+0x392/0xa70 [59.201] ? __pv_queued_spin_lock_slowpath+0x165/0x370 [59.201] schedule+0x5d/0xd0 [59.201] __scrub_blocked_if_needed+0x74/0xc0 [btrfs] [59.201] ? __pfx_autoremove_wake_function+0x10/0x10 [59.201] scrub_pause_off+0x21/0x50 [btrfs] [59.202] scrub_simple_mirror+0x1c7/0x950 [btrfs] [59.202] ? scrub_parity_put+0x1a5/0x1d0 [btrfs] [59.202] ? __pfx_autoremove_wake_function+0x10/0x10 [59.202] scrub_stripe+0x20d/0x740 [btrfs] [59.202] scrub_chunk+0xc4/0x130 [btrfs] [59.203] scrub_enumerate_chunks+0x3e4/0x7a0 [btrfs] [59.203] ? __pfx_autoremove_wake_function+0x10/0x10 [59.203] btrfs_scrub_dev+0x236/0x6a0 [btrfs] [59.203] ? btrfs_ioctl+0xd97/0x32c0 [btrfs] [59.203] ? _copy_from_user+0x7b/0x80 [59.203] btrfs_ioctl+0xde1/0x32c0 [btrfs] [59.204] ? should_failslab+0xa/0x20 [59.204] ? kmem_cache_alloc_node+0x151/0x460 [59.204] ? alloc_io_context+0x1b/0x80 [59.204] ? preempt_count_add+0x70/0xa0 [59.204] ? __x64_sys_ioctl+0x88/0xc0 [59.204] __x64_sys_ioctl+0x88/0xc0 [59.204] do_syscall_64+0x38/0x90 [59.204] entry_SYSCALL_64_after_hwframe+0x72/0xdc [59.204] RIP: 0033:0x7f82ffaffe9b [59.204] RSP: 002b:00007f82ff1fbc50 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [59.204] RAX: ffffffffffffffda RBX: 000055b191e36790 RCX: 00007f82ffaffe9b [59.204] RDX: 000055b191e36790 RSI: 00000000c400941b RDI: 0000000000000003 [59.204] RBP: 0000000000000000 R08: 00007fff1575016f R09: 0000000000000000 [59.204] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f82ff1fc640 [59.204] R13: 000000000000006b R14: 00007f82ffa87580 R15: 0000000000000000 [59.204] </TASK> [59.204] INFO: task btrfs:346774 blocked for more than 120 seconds. [59.205] Tainted: G W 6.3.0-rc2-btrfs-next-127+ #1 [59.205] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [59.206] task:btrfs state:D stack:0 pid:346774 ppid:1 flags:0x00004002 [59.206] Call Trace: [59.206] <TASK> [59.206] __schedule+0x392/0xa70 [59.206] schedule+0x5d/0xd0 [59.206] __scrub_blocked_if_needed+0x74/0xc0 [btrfs] [59.206] ? __pfx_autoremove_wake_function+0x10/0x10 [59.206] scrub_pause_off+0x21/0x50 [btrfs] [59.207] scrub_simple_mirror+0x1c7/0x950 [btrfs] [59.207] ? scrub_parity_put+0x1a5/0x1d0 [btrfs] [59.207] ? __pfx_autoremove_wake_function+0x10/0x10 [59.207] scrub_stripe+0x20d/0x740 [btrfs] [59.208] scrub_chunk+0xc4/0x130 [btrfs] [59.208] scrub_enumerate_chunks+0x3e4/0x7a0 [btrfs] [59.208] ? __mutex_unlock_slowpath.isra.0+0x9a/0x120 [59.208] btrfs_scrub_dev+0x236/0x6a0 [btrfs] [59.208] ? btrfs_ioctl+0xd97/0x32c0 [btrfs] [59.209] ? _copy_from_user+0x7b/0x80 [59.209] btrfs_ioctl+0xde1/0x32c0 [btrfs] [59.209] ? should_failslab+0xa/0x20 [59.209] ? kmem_cache_alloc_node+0x151/0x460 [59.209] ? alloc_io_context+0x1b/0x80 [59.209] ? preempt_count_add+0x70/0xa0 [59.209] ? __x64_sys_ioctl+0x88/0xc0 [59.209] __x64_sys_ioctl+0x88/0xc0 [59.209] do_syscall_64+0x38/0x90 [59.209] entry_SYSCALL_64_after_hwframe+0x72/0xdc [59.209] RIP: 0033:0x7f82ffaffe9b [59.209] RSP: 002b:00007f82fe9fac50 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [59.209] RAX: ffffffffffffffda RBX: 000055b191e36c10 RCX: 00007f82ffaffe9b [59.209] RDX: 000055b191e36c10 RSI: 00000000c400941b RDI: 0000000000000003 [59.209] RBP: 0000000000000000 R08: 00007fff1575016f R09: 0000000000000000 [59.209] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f82fe9fb640 [59.209] R13: 000000000000006b R14: 00007f82ffa87580 R15: 0000000000000000 [59.209] </TASK> [59.209] INFO: task btrfs:346775 blocked for more than 120 seconds. [59.210] Tainted: G W 6.3.0-rc2-btrfs-next-127+ #1 [59.210] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [59.211] task:btrfs state:D stack:0 pid:346775 ppid:1 flags:0x00004002 [59.211] Call Trace: [59.211] <TASK> [59.211] __schedule+0x392/0xa70 [59.211] schedule+0x5d/0xd0 [59.211] __scrub_blocked_if_needed+0x74/0xc0 [btrfs] [59.211] ? __pfx_autoremove_wake_function+0x10/0x10 [59.211] scrub_pause_off+0x21/0x50 [btrfs] [59.212] scrub_simple_mirror+0x1c7/0x950 [btrfs] [59.212] ? scrub_parity_put+0x1a5/0x1d0 [btrfs] [59.212] ? __pfx_autoremove_wake_function+0x10/0x10 [59.212] scrub_stripe+0x20d/0x740 [btrfs] [59.213] scrub_chunk+0xc4/0x130 [btrfs] [59.213] scrub_enumerate_chunks+0x3e4/0x7a0 [btrfs] [59.213] ? __mutex_unlock_slowpath.isra.0+0x9a/0x120 [59.213] btrfs_scrub_dev+0x236/0x6a0 [btrfs] [59.213] ? btrfs_ioctl+0xd97/0x32c0 [btrfs] [59.214] ? _copy_from_user+0x7b/0x80 [59.214] btrfs_ioctl+0xde1/0x32c0 [btrfs] [59.214] ? should_failslab+0xa/0x20 [59.214] ? kmem_cache_alloc_node+0x151/0x460 [59.214] ? alloc_io_context+0x1b/0x80 [59.214] ? preempt_count_add+0x70/0xa0 [59.214] ? __x64_sys_ioctl+0x88/0xc0 [59.214] __x64_sys_ioctl+0x88/0xc0 [59.214] do_syscall_64+0x38/0x90 [59.214] entry_SYSCALL_64_after_hwframe+0x72/0xdc [59.214] RIP: 0033:0x7f82ffaffe9b [59.214] RSP: 002b:00007f82fe1f9c50 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [59.214] RAX: ffffffffffffffda RBX: 000055b191e37090 RCX: 00007f82ffaffe9b [59.214] RDX: 000055b191e37090 RSI: 00000000c400941b RDI: 0000000000000003 [59.214] RBP: 0000000000000000 R08: 00007fff1575016f R09: 0000000000000000 [59.214] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f82fe1fa640 [59.214] R13: 000000000000006b R14: 00007f82ffa87580 R15: 0000000000000000 [59.214] </TASK> [59.214] INFO: task btrfs:346776 blocked for more than 120 seconds. [59.215] Tainted: G W 6.3.0-rc2-btrfs-next-127+ #1 [59.216] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [59.217] task:btrfs state:D stack:0 pid:346776 ppid:1 flags:0x00004002 [59.217] Call Trace: [59.217] <TASK> [59.217] __schedule+0x392/0xa70 [59.217] ? __pv_queued_spin_lock_slowpath+0x165/0x370 [59.217] schedule+0x5d/0xd0 [59.217] __scrub_blocked_if_needed+0x74/0xc0 [btrfs] [59.217] ? __pfx_autoremove_wake_function+0x10/0x10 [59.217] scrub_pause_off+0x21/0x50 [btrfs] [59.217] scrub_simple_mirror+0x1c7/0x950 [btrfs] [59.217] ? scrub_parity_put+0x1a5/0x1d0 [btrfs] [59.218] ? __pfx_autoremove_wake_function+0x10/0x10 [59.218] scrub_stripe+0x20d/0x740 [btrfs] [59.218] scrub_chunk+0xc4/0x130 [btrfs] [59.218] scrub_enumerate_chunks+0x3e4/0x7a0 [btrfs] [59.219] ? __pfx_autoremove_wake_function+0x10/0x10 [59.219] btrfs_scrub_dev+0x236/0x6a0 [btrfs] [59.219] ? btrfs_ioctl+0xd97/0x32c0 [btrfs] [59.219] ? _copy_from_user+0x7b/0x80 [59.219] btrfs_ioctl+0xde1/0x32c0 [btrfs] [59.219] ? should_failslab+0xa/0x20 [59.219] ? kmem_cache_alloc_node+0x151/0x460 [59.219] ? alloc_io_context+0x1b/0x80 [59.219] ? preempt_count_add+0x70/0xa0 [59.219] ? __x64_sys_ioctl+0x88/0xc0 [59.219] __x64_sys_ioctl+0x88/0xc0 [59.219] do_syscall_64+0x38/0x90 [59.219] entry_SYSCALL_64_after_hwframe+0x72/0xdc [59.219] RIP: 0033:0x7f82ffaffe9b [59.219] RSP: 002b:00007f82fd9f8c50 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [59.219] RAX: ffffffffffffffda RBX: 000055b191e37510 RCX: 00007f82ffaffe9b [59.219] RDX: 000055b191e37510 RSI: 00000000c400941b RDI: 0000000000000003 [59.219] RBP: 0000000000000000 R08: 00007fff1575016f R09: 0000000000000000 [59.219] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f82fd9f9640 [59.219] R13: 000000000000006b R14: 00007f82ffa87580 R15: 0000000000000000 [59.219] </TASK> [59.219] INFO: task btrfs:346822 blocked for more than 120 seconds. [59.220] Tainted: G W 6.3.0-rc2-btrfs-next-127+ #1 [59.221] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [59.222] task:btrfs state:D stack:0 pid:346822 ppid:1 flags:0x00004002 [59.222] Call Trace: [59.222] <TASK> [59.222] __schedule+0x392/0xa70 [59.222] schedule+0x5d/0xd0 [59.222] btrfs_scrub_cancel+0x91/0x100 [btrfs] [59.222] ? __pfx_autoremove_wake_function+0x10/0x10 [59.222] btrfs_commit_transaction+0x572/0xeb0 [btrfs] [59.223] ? start_transaction+0xcb/0x610 [btrfs] [59.223] prepare_to_relocate+0x111/0x1a0 [btrfs] [59.223] relocate_block_group+0x57/0x5d0 [btrfs] [59.223] ? btrfs_wait_nocow_writers+0x25/0xb0 [btrfs] [59.223] btrfs_relocate_block_group+0x248/0x3c0 [btrfs] [59.224] ? __pfx_autoremove_wake_function+0x10/0x10 [59.224] btrfs_relocate_chunk+0x3b/0x150 [btrfs] [59.224] btrfs_balance+0x8ff/0x11d0 [btrfs] [59.224] ? __kmem_cache_alloc_node+0x14a/0x410 [59.224] btrfs_ioctl+0x2334/0x32c0 [btrfs] [59.225] ? mod_objcg_state+0xd2/0x360 [59.225] ? refill_obj_stock+0xb0/0x160 [59.225] ? seq_release+0x25/0x30 [59.225] ? __rseq_handle_notify_resume+0x3b5/0x4b0 [59.225] ? percpu_counter_add_batch+0x2e/0xa0 [59.225] ? __x64_sys_ioctl+0x88/0xc0 [59.225] __x64_sys_ioctl+0x88/0xc0 [59.225] do_syscall_64+0x38/0x90 [59.225] entry_SYSCALL_64_after_hwframe+0x72/0xdc [59.225] RIP: 0033:0x7f381a6ffe9b [59.225] RSP: 002b:00007ffd45ecf060 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [59.225] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f381a6ffe9b [59.225] RDX: 00007ffd45ecf150 RSI: 00000000c4009420 RDI: 0000000000000003 [59.225] RBP: 0000000000000003 R08: 0000000000000013 R09: 0000000000000000 [59.225] R10: 00007f381a60c878 R11: 0000000000000246 R12: 00007ffd45ed0423 [59.225] R13: 00007ffd45ecf150 R14: 0000000000000000 R15: 00007ffd45ecf148 [59.225] </TASK> What happens is the following: 1) A scrub is running, so fs_info->scrubs_running is 1; 2) Task A starts block group relocation, and at btrfs_relocate_chunk() it pauses scrub by calling btrfs_scrub_pause(). That increments fs_info->scrub_pause_req from 0 to 1 and waits for the scrub task to pause (for fs_info->scrubs_paused to be == to fs_info->scrubs_running); 3) The scrub task pauses at scrub_pause_off(), waiting for fs_info->scrub_pause_req to decrease to 0; 4) Task A then enters btrfs_relocate_block_group(), and down that call chain we start a transaction and then attempt to commit it; 5) When task A calls btrfs_commit_transaction(), it either will do the commit itself or wait for some other task that already started the commit of the transaction - it doesn't matter which case; 6) The transaction commit enters state TRANS_STATE_COMMIT_START; 7) An error happens during the transaction commit, like -ENOSPC when running delayed refs or delayed items for example; 8) This results in calling transaction.c:cleanup_transaction(), where we call btrfs_scrub_cancel(), incrementing fs_info->scrub_cancel_req from 0 to 1, and blocking this task waiting for fs_info->scrubs_running to decrease to 0; 9) From this point on, both the transaction commit and the scrub task hang forever: 1) The transaction commit is waiting for fs_info->scrubs_running to be decreased to 0; 2) The scrub task is at scrub_pause_off() waiting for fs_info->scrub_pause_req to decrease to 0 - so it can not proceed to stop the scrub and decrement fs_info->scrubs_running from 0 to 1. Therefore resulting in a deadlock. Fix this by having cleanup_transaction(), called if a transaction commit fails, not call btrfs_scrub_cancel() if relocation is in progress, and having btrfs_relocate_block_group() call btrfs_scrub_cancel() instead if the relocation failed and a transaction abort happened. This was triggered with btrfs/061 from fstests. Fixes: 55e3a601c81c ("btrfs: Fix data checksum error cause by replace with io-load.") CC: stable@vger.kernel.org # 4.14+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-03-28btrfs: scan device in non-exclusive modeAnand Jain
This fixes mkfs/mount/check failures due to race with systemd-udevd scan. During the device scan initiated by systemd-udevd, other user space EXCL operations such as mkfs, mount, or check may get blocked and result in a "Device or resource busy" error. This is because the device scan process opens the device with the EXCL flag in the kernel. Two reports were received: - btrfs/179 test case, where the fsck command failed with the -EBUSY error - LTP pwritev03 test case, where mkfs.vfs failed with the -EBUSY error, when mkfs.vfs tried to overwrite old btrfs filesystem on the device. In both cases, fsck and mkfs (respectively) were racing with a systemd-udevd device scan, and systemd-udevd won, resulting in the -EBUSY error for fsck and mkfs. Reproducing the problem has been difficult because there is a very small window during which these userspace threads can race to acquire the exclusive device open. Even on the system where the problem was observed, the problem occurrences were anywhere between 10 to 400 iterations and chances of reproducing decreases with debug printk()s. However, an exclusive device open is unnecessary for the scan process, as there are no write operations on the device during scan. Furthermore, during the mount process, the superblock is re-read in the below function call chain: btrfs_mount_root btrfs_open_devices open_fs_devices btrfs_open_one_device btrfs_get_bdev_and_sb So, to fix this issue, removes the FMODE_EXCL flag from the scan operation, and add a comment. The case where mkfs may still write to the device and a scan is running, the btrfs signature is not written at that time so scan will not recognize such device. Reported-by: Sherry Yang <sherry.yang@oracle.com> Reported-by: kernel test robot <oliver.sang@intel.com> Link: https://lore.kernel.org/oe-lkp/202303170839.fdf23068-oliver.sang@intel.com CC: stable@vger.kernel.org # 5.4+ Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-03-28btrfs: fix race between quota disable and quota assign ioctlsFilipe Manana
The quota assign ioctl can currently run in parallel with a quota disable ioctl call. The assign ioctl uses the quota root, while the disable ioctl frees that root, and therefore we can have a use-after-free triggered in the assign ioctl, leading to a trace like the following when KASAN is enabled: [672.723][T736] BUG: KASAN: slab-use-after-free in btrfs_search_slot+0x2962/0x2db0 [672.723][T736] Read of size 8 at addr ffff888022ec0208 by task btrfs_search_sl/27736 [672.724][T736] [672.725][T736] CPU: 1 PID: 27736 Comm: btrfs_search_sl Not tainted 6.3.0-rc3 #37 [672.723][T736] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 [672.727][T736] Call Trace: [672.728][T736] <TASK> [672.728][T736] dump_stack_lvl+0xd9/0x150 [672.725][T736] print_report+0xc1/0x5e0 [672.720][T736] ? __virt_addr_valid+0x61/0x2e0 [672.727][T736] ? __phys_addr+0xc9/0x150 [672.725][T736] ? btrfs_search_slot+0x2962/0x2db0 [672.722][T736] kasan_report+0xc0/0xf0 [672.729][T736] ? btrfs_search_slot+0x2962/0x2db0 [672.724][T736] btrfs_search_slot+0x2962/0x2db0 [672.723][T736] ? fs_reclaim_acquire+0xba/0x160 [672.722][T736] ? split_leaf+0x13d0/0x13d0 [672.726][T736] ? rcu_is_watching+0x12/0xb0 [672.723][T736] ? kmem_cache_alloc+0x338/0x3c0 [672.722][T736] update_qgroup_status_item+0xf7/0x320 [672.724][T736] ? add_qgroup_rb+0x3d0/0x3d0 [672.739][T736] ? do_raw_spin_lock+0x12d/0x2b0 [672.730][T736] ? spin_bug+0x1d0/0x1d0 [672.737][T736] btrfs_run_qgroups+0x5de/0x840 [672.730][T736] ? btrfs_qgroup_rescan_worker+0xa70/0xa70 [672.738][T736] ? __del_qgroup_relation+0x4ba/0xe00 [672.738][T736] btrfs_ioctl+0x3d58/0x5d80 [672.735][T736] ? tomoyo_path_number_perm+0x16a/0x550 [672.737][T736] ? tomoyo_execute_permission+0x4a0/0x4a0 [672.731][T736] ? btrfs_ioctl_get_supported_features+0x50/0x50 [672.737][T736] ? __sanitizer_cov_trace_switch+0x54/0x90 [672.734][T736] ? do_vfs_ioctl+0x132/0x1660 [672.730][T736] ? vfs_fileattr_set+0xc40/0xc40 [672.730][T736] ? _raw_spin_unlock_irq+0x2e/0x50 [672.732][T736] ? sigprocmask+0xf2/0x340 [672.737][T736] ? __fget_files+0x26a/0x480 [672.732][T736] ? bpf_lsm_file_ioctl+0x9/0x10 [672.738][T736] ? btrfs_ioctl_get_supported_features+0x50/0x50 [672.736][T736] __x64_sys_ioctl+0x198/0x210 [672.736][T736] do_syscall_64+0x39/0xb0 [672.731][T736] entry_SYSCALL_64_after_hwframe+0x63/0xcd [672.739][T736] RIP: 0033:0x4556ad [672.742][T736] </TASK> [672.743][T736] [672.748][T736] Allocated by task 27677: [672.743][T736] kasan_save_stack+0x22/0x40 [672.741][T736] kasan_set_track+0x25/0x30 [672.741][T736] __kasan_kmalloc+0xa4/0xb0 [672.749][T736] btrfs_alloc_root+0x48/0x90 [672.746][T736] btrfs_create_tree+0x146/0xa20 [672.744][T736] btrfs_quota_enable+0x461/0x1d20 [672.743][T736] btrfs_ioctl+0x4a1c/0x5d80 [672.747][T736] __x64_sys_ioctl+0x198/0x210 [672.749][T736] do_syscall_64+0x39/0xb0 [672.744][T736] entry_SYSCALL_64_after_hwframe+0x63/0xcd [672.756][T736] [672.757][T736] Freed by task 27677: [672.759][T736] kasan_save_stack+0x22/0x40 [672.759][T736] kasan_set_track+0x25/0x30 [672.756][T736] kasan_save_free_info+0x2e/0x50 [672.751][T736] ____kasan_slab_free+0x162/0x1c0 [672.758][T736] slab_free_freelist_hook+0x89/0x1c0 [672.752][T736] __kmem_cache_free+0xaf/0x2e0 [672.752][T736] btrfs_put_root+0x1ff/0x2b0 [672.759][T736] btrfs_quota_disable+0x80a/0xbc0 [672.752][T736] btrfs_ioctl+0x3e5f/0x5d80 [672.756][T736] __x64_sys_ioctl+0x198/0x210 [672.753][T736] do_syscall_64+0x39/0xb0 [672.765][T736] entry_SYSCALL_64_after_hwframe+0x63/0xcd [672.769][T736] [672.768][T736] The buggy address belongs to the object at ffff888022ec0000 [672.768][T736] which belongs to the cache kmalloc-4k of size 4096 [672.769][T736] The buggy address is located 520 bytes inside of [672.769][T736] freed 4096-byte region [ffff888022ec0000, ffff888022ec1000) [672.760][T736] [672.764][T736] The buggy address belongs to the physical page: [672.761][T736] page:ffffea00008bb000 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x22ec0 [672.766][T736] head:ffffea00008bb000 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0 [672.779][T736] flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff) [672.770][T736] raw: 00fff00000010200 ffff888012842140 ffffea000054ba00 dead000000000002 [672.770][T736] raw: 0000000000000000 0000000000040004 00000001ffffffff 0000000000000000 [672.771][T736] page dumped because: kasan: bad access detected [672.778][T736] page_owner tracks the page as allocated [672.777][T736] page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd2040(__GFP_IO|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 88 [672.779][T736] get_page_from_freelist+0x119c/0x2d50 [672.779][T736] __alloc_pages+0x1cb/0x4a0 [672.776][T736] alloc_pages+0x1aa/0x270 [672.773][T736] allocate_slab+0x260/0x390 [672.771][T736] ___slab_alloc+0xa9a/0x13e0 [672.778][T736] __slab_alloc.constprop.0+0x56/0xb0 [672.771][T736] __kmem_cache_alloc_node+0x136/0x320 [672.789][T736] __kmalloc+0x4e/0x1a0 [672.783][T736] tomoyo_realpath_from_path+0xc3/0x600 [672.781][T736] tomoyo_path_perm+0x22f/0x420 [672.782][T736] tomoyo_path_unlink+0x92/0xd0 [672.780][T736] security_path_unlink+0xdb/0x150 [672.788][T736] do_unlinkat+0x377/0x680 [672.788][T736] __x64_sys_unlink+0xca/0x110 [672.789][T736] do_syscall_64+0x39/0xb0 [672.783][T736] entry_SYSCALL_64_after_hwframe+0x63/0xcd [672.784][T736] page last free stack trace: [672.787][T736] free_pcp_prepare+0x4e5/0x920 [672.787][T736] free_unref_page+0x1d/0x4e0 [672.784][T736] __unfreeze_partials+0x17c/0x1a0 [672.797][T736] qlist_free_all+0x6a/0x180 [672.796][T736] kasan_quarantine_reduce+0x189/0x1d0 [672.797][T736] __kasan_slab_alloc+0x64/0x90 [672.793][T736] kmem_cache_alloc+0x17c/0x3c0 [672.799][T736] getname_flags.part.0+0x50/0x4e0 [672.799][T736] getname_flags+0x9e/0xe0 [672.792][T736] vfs_fstatat+0x77/0xb0 [672.791][T736] __do_sys_newlstat+0x84/0x100 [672.798][T736] do_syscall_64+0x39/0xb0 [672.796][T736] entry_SYSCALL_64_after_hwframe+0x63/0xcd [672.790][T736] [672.791][T736] Memory state around the buggy address: [672.799][T736] ffff888022ec0100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [672.805][T736] ffff888022ec0180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [672.802][T736] >ffff888022ec0200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [672.809][T736] ^ [672.809][T736] ffff888022ec0280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [672.809][T736] ffff888022ec0300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb Fix this by having the qgroup assign ioctl take the qgroup ioctl mutex before calling btrfs_run_qgroups(), which is what all qgroup ioctls should call. Reported-by: butt3rflyh4ck <butterflyhuangxx@gmail.com> Link: https://lore.kernel.org/linux-btrfs/CAFcO6XN3VD8ogmHwqRk4kbiwtpUSNySu2VAxN8waEPciCHJvMA@mail.gmail.com/ CC: stable@vger.kernel.org # 5.10+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-03-26Merge tag 'smb3-client-fixes-6.3-rc3' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs client fixes from Steve French: "Twelve cifs/smb3 client fixes (most also for stable) - forced umount fix - fix for two perf regressions - reconnect fixes - small debugging improvements - multichannel fixes" * tag 'smb3-client-fixes-6.3-rc3' of git://git.samba.org/sfrench/cifs-2.6: smb3: fix unusable share after force unmount failure cifs: fix dentry lookups in directory handle cache smb3: lower default deferred close timeout to address perf regression cifs: fix missing unload_nls() in smb2_reconnect() cifs: avoid race conditions with parallel reconnects cifs: append path to open_enter trace event cifs: print session id while listing open files cifs: dump pending mids for all channels in DebugData cifs: empty interface list when server doesn't support query interfaces cifs: do not poll server interfaces too regularly cifs: lock chan_lock outside match_session cifs: check only tcon status on tcon related functions
2023-03-25Merge tag 'xfs-6.3-fixes-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull yet more xfs bug fixes from Darrick Wong: "The first bugfix addresses a longstanding problem where we use the wrong file mapping cursors when trying to compute the speculative preallocation quantity. This has been causing sporadic crashes when alwayscow mode is engaged. The other two fixes correct minor problems in more recent changes. - Fix the new allocator tracepoints because git am mismerged the changes such that the trace_XXX got rebased to be in function YYY instead of XXX - Ensure that the perag AGFL_RESET state is consistent with whatever we've just read off the disk - Fix a bug where we used the wrong iext cursor during a write begin" * tag 'xfs-6.3-fixes-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: fix mismerged tracepoints xfs: clear incore AGFL_RESET state if it's not needed xfs: pass the correct cursor to xfs_iomap_prealloc_size
2023-03-25ksmbd: remove unused is_char_allowed functionTom Rix
clang with W=1 reports fs/ksmbd/unicode.c:122:19: error: unused function 'is_char_allowed' [-Werror,-Wunused-function] static inline int is_char_allowed(char *ch) ^ This function is not used so remove it. Signed-off-by: Tom Rix <trix@redhat.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-25ksmbd: do not call kvmalloc() with __GFP_NORETRY | __GFP_NO_WARNMarios Makassikis
Commit 83dcedd5540d ("ksmbd: fix infinite loop in ksmbd_conn_handler_loop()"), changes GFP modifiers passed to kvmalloc(). This cause xfstests generic/551 test to fail. We limit pdu length size according to connection status and maximum number of connections. In the rest, memory allocation of request is limited by credit management. so these flags are no longer needed. Fixes: 83dcedd5540d ("ksmbd: fix infinite loop in ksmbd_conn_handler_loop()") Cc: stable@vger.kernel.org Signed-off-by: Marios Makassikis <mmakassikis@freebox.fr> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-25Merge tag 'xfs-6.3-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs fixes from Darrick Wong: "This batch started with some debugging enhancements to the new allocator refactoring that we put in 6.3-rc1 to assist developers in rebasing their dev branches. As for more serious code changes -- there's a bug fix to make the lockless allocator scan the whole filesystem before resorting to the locking allocator. We're also adding a selftest for the venerable directory/xattr hash function to make sure that it produces consistent results so that we can address any fallout as soon as possible. - Add a few debugging assertions so that people (me) trying to port code to the new allocator functions don't mess up the caller requirements - Relax some overly cautious lock ordering enforcement in the new allocator code, which means that file allocations will locklessly scan for the best space they can get before backing off to the traditional lock-and-really-get-it behavior - Add tracepoints to make it easier to trace the xfs allocator behavior - Actually test the dir/xattr hash algorithm to make sure it produces consistent results across all the platforms XFS supports" * tag 'xfs-6.3-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: test dir/attr hash when loading module xfs: add tracepoints for each of the externally visible allocators xfs: walk all AGs if TRYLOCK passed to xfs_alloc_vextent_iterate_ags xfs: try to idiot-proof the allocators
2023-03-24Merge tag 'mm-hotfixes-stable-2023-03-24-17-09' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "21 hotfixes, 8 of which are cc:stable. 11 are for MM, the remainder are for other subsystems" * tag 'mm-hotfixes-stable-2023-03-24-17-09' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (21 commits) mm: mmap: remove newline at the end of the trace mailmap: add entries for Richard Leitner kcsan: avoid passing -g for test kfence: avoid passing -g for test mm: kfence: fix using kfence_metadata without initialization in show_object() lib: dhry: fix unstable smp_processor_id(_) usage mailmap: add entry for Enric Balletbo i Serra mailmap: map Sai Prakash Ranjan's old address to his current one mailmap: map Rajendra Nayak's old address to his current one Revert "kasan: drop skip_kasan_poison variable in free_pages_prepare" mailmap: add entry for Tobias Klauser kasan, powerpc: don't rename memintrinsics if compiler adds prefixes mm/ksm: fix race with VMA iteration and mm_struct teardown kselftest: vm: fix unused variable warning mm: fix error handling for map_deny_write_exec mm: deduplicate error handling for map_deny_write_exec checksyscalls: ignore fstat to silence build warning on LoongArch nilfs2: fix kernel-infoleak in nilfs_ioctl_wrap_copy() test_maple_tree: add more testing for mas_empty_area() maple_tree: fix mas_skip_node() end slot detection ...
2023-03-24Merge tag '6.3-rc3-ksmbd-smb3-server-fixes' of git://git.samba.org/ksmbdLinus Torvalds
Pull ksmbd server fixes from Steve French: - return less confusing messages on unsupported dialects (STATUS_NOT_SUPPORTED instead of I/O error) - fix for overly frequent inactive session termination - fix refcount leak - fix bounds check problems found by static checkers - fix to advertise named stream support correctly - Fix AES256 signing bug when connected to from MacOS * tag '6.3-rc3-ksmbd-smb3-server-fixes' of git://git.samba.org/ksmbd: ksmbd: return unsupported error on smb1 mount ksmbd: return STATUS_NOT_SUPPORTED on unsupported smb2.0 dialect ksmbd: don't terminate inactive sessions after a few seconds ksmbd: fix possible refcount leak in smb2_open() ksmbd: add low bound validation to FSCTL_QUERY_ALLOCATED_RANGES ksmbd: add low bound validation to FSCTL_SET_ZERO_DATA ksmbd: set FILE_NAMED_STREAMS attribute in FS_ATTRIBUTE_INFORMATION ksmbd: fix wrong signingkey creation when encryption is AES256
2023-03-24xfs: fix mismerged tracepointsxfs-6.3-fixes_2023-04-05xfs-6.3-fixes_2023-04-04xfs-6.3-fixes-7xfs-6.3-fixes-6Darrick J. Wong
At some point in between sending this patch to the list and merging it into for-next, the tracepoints got all mixed up because I've over-reliant on automated tools not sucking. The end result is that the tracepoints are all wrong, so fix them. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2023-03-24smb3: fix unusable share after force unmount failureSteve French
If user does forced unmount ("umount -f") while files are still open on the share (as was seen in a Kubernetes example running on SMB3.1.1 mount) then we were marking the share as "TID_EXITING" in umount_begin() which caused all subsequent operations (except write) to fail ... but unfortunately when umount_begin() is called we do not know yet that there are open files or active references on the share that would prevent unmount from succeeding. Kubernetes had example when they were doing umount -f when files were open which caused the share to become unusable until the files were closed (and the umount retried). Fix this so that TID_EXITING is not set until we are about to send the tree disconnect (not at the beginning of forced umounts in umount_begin) so that if "umount -f" fails (due to open files or references) the mount is still usable. Cc: stable@vger.kernel.org Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-24cifs: fix dentry lookups in directory handle cachePaulo Alcantara
Get rid of any prefix paths in @path before lookup_positive_unlocked() as it will call ->lookup() which already adds those prefix paths through build_path_from_dentry(). This has caused a performance regression when mounting shares with a prefix path where readdir(2) would end up retrying several times to open bad directory names that contained duplicate prefix paths. Fix this by skipping any prefix paths in @path before calling lookup_positive_unlocked(). Fixes: e4029e072673 ("cifs: find and use the dentry for cached non-root directories also") Cc: stable@vger.kernel.org # 6.1+ Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-24smb3: lower default deferred close timeout to address perf regressionSteve French
Performance tests with large number of threads noted that the change of the default closetimeo (deferred close timeout between when close is done by application and when client has to send the close to the server), to 5 seconds from 1 second, significantly degraded perf in some cases like this (in the filebench example reported, the stats show close requests on the wire taking twice as long, and 50% regression in filebench perf). This is stil configurable via mount parm closetimeo, but to be safe, decrease default back to its previous value of 1 second. Reported-by: Yin Fengwei <fengwei.yin@intel.com> Reported-by: kernel test robot <yujie.liu@intel.com> Link: https://lore.kernel.org/lkml/997614df-10d4-af53-9571-edec36b0e2f3@intel.com/ Fixes: 5efdd9122eff ("smb3: allow deferred close timeout to be configurable") Cc: stable@vger.kernel.org # 6.0+ Tested-by: Yin Fengwei <fengwei.yin@intel.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-24cifs: fix missing unload_nls() in smb2_reconnect()Paulo Alcantara
Make sure to unload_nls() @nls_codepage if we no longer need it. Fixes: bc962159e8e3 ("cifs: avoid race conditions with parallel reconnects") Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Cc: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-24xfs: clear incore AGFL_RESET state if it's not neededxfs-6.3-fixes-5Darrick J. Wong
Prior to commit 7ac2ff8bb371, when we loaded the incore perag structure with information from the AGF header, we would set or clear the pagf_agfl_reset field based on whether or not the AGFL list was misaligned within the block. IOWs, it's an incore state bit that's supposed to cache something in the ondisk metadata. Therefore, the code still needs to support clearing the incore bit if (somehow) the AGFL were to correct itself. It turns out that xfs_repair does exactly this -- phase 4 loads the AGF to scan the rmapbt for corrupt records, which can set NEEDS_AGFL_RESET. The scan unsets AGF_INIT but doesn't unset NEEDS_AGFL_RESET. Phase 5 totally rewrites the AGFL and fixes the alignment problem, didn't clear NEEDS_AGFL_RESET historically, and reloads the perag state to fix the freelist. This results in the AGFL being reset based on stale data, which then causes the new AGFL blocks to be leaked. A subsequent xfs_repair -n then complains about the leaks. One could argue that phase 5 ought to clear this bit directly when it reloads the perag AGF data after rewriting the AGFL, but libxfs used to handle this for us, so it should go back to doing that. Found by fuzzing flfirst = ones in xfs/352. Fixes: 7ac2ff8bb371 ("xfs: perags need atomic operational state") Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2023-03-24xfs: pass the correct cursor to xfs_iomap_prealloc_sizeDarrick J. Wong
In xfs_buffered_write_iomap_begin, @icur is the iext cursor for the data fork and @ccur is the cursor for the cow fork. Pass in whichever cursor corresponds to allocfork, because otherwise the xfs_iext_prev_extent call can use the data fork cursor to walk off the end of the cow fork structure. Best case it returns the wrong results, worst case it does this: stack segment: 0000 [#1] PREEMPT SMP CPU: 2 PID: 3141909 Comm: fsstress Tainted: G W 6.3.0-rc2-xfsx #6.3.0-rc2 7bf5cc2e98997627cae5c930d890aba3aeec65dd Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20171121_152543-x86-ol7-builder-01.us.oracle.com-4.el7.1 04/01/2014 RIP: 0010:xfs_iext_prev+0x71/0x150 [xfs] RSP: 0018:ffffc90002233aa8 EFLAGS: 00010297 RAX: 000000000000000f RBX: 000000000000000e RCX: 000000000000000c RDX: 0000000000000002 RSI: 000000000000000e RDI: ffff8883d0019ba0 RBP: 989642409af8a7a7 R08: ffffea0000000001 R09: 0000000000000002 R10: 0000000000000000 R11: 000000000000000c R12: ffffc90002233b00 R13: ffff8883d0019ba0 R14: 989642409af8a6bf R15: 000ffffffffe0000 FS: 00007fdf8115f740(0000) GS:ffff88843fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fdf8115e000 CR3: 0000000357256000 CR4: 00000000003506e0 Call Trace: <TASK> xfs_iomap_prealloc_size.constprop.0.isra.0+0x1a6/0x410 [xfs 619a268fb2406d68bd34e007a816b27e70abc22c] xfs_buffered_write_iomap_begin+0xa87/0xc60 [xfs 619a268fb2406d68bd34e007a816b27e70abc22c] iomap_iter+0x132/0x2f0 iomap_file_buffered_write+0x92/0x330 xfs_file_buffered_write+0xb1/0x330 [xfs 619a268fb2406d68bd34e007a816b27e70abc22c] vfs_write+0x2eb/0x410 ksys_write+0x65/0xe0 do_syscall_64+0x2b/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Found by xfs/538 in alwayscow mode, but this doesn't seem particular to that test. Fixes: 590b16516ef3 ("xfs: refactor xfs_iomap_prealloc_size") Actually-Fixes: 66ae56a53f0e ("xfs: introduce an always_cow mode") Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2023-03-24Merge tag 'for-6.3-rc3-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "A few more fixes, the zoned accounting fix is spread across a few patches, preparatory and the actual fixes: - zoned mode: - fix accounting of unusable zone space - fix zone activation condition for DUP profile - preparatory patches - improved error handling of missing chunks - fix compiler warning" * tag 'for-6.3-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: zoned: drop space_info->active_total_bytes btrfs: zoned: count fresh BG region as zone unusable btrfs: use temporary variable for space_info in btrfs_update_block_group btrfs: rename BTRFS_FS_NO_OVERCOMMIT to BTRFS_FS_ACTIVE_ZONE_TRACKING btrfs: zoned: fix btrfs_can_activate_zone() to support DUP profile btrfs: fix compiler warning on SPARC/PA-RISC handling fscrypt_setup_filename btrfs: handle missing chunk mapping more gracefully