summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2023-02-10xfs: revert commit 8954c44ff477xfs-6.3-merge-2Darrick J. Wong
The name passed into __xfs_xattr_put_listent is exactly namelen bytes long and not null-terminated. Passing namelen+1 to the strscpy function strscpy(offset, (char *)name, namelen + 1); is therefore wrong. Go back to the old code, which works fine because strncpy won't find a null in @name and stops after namelen bytes. It really could be a memcpy call, but it worked for years. Reported-by: syzbot+898115bc6d7140437215@syzkaller.appspotmail.com Fixes: 8954c44ff477 ("xfs: use strscpy() to instead of strncpy()") Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2023-02-10xfs: make kobj_type structures constantThomas Weißschuh
Since commit ee6d3dd4ed48 ("driver core: make kobj_type constant.") the driver core allows the usage of const struct kobj_type. Take advantage of this to constify the structure definitions to prevent modification at runtime. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2023-02-09xfs: allow setting full range of panic tagsDonald Douwsma
xfs will not allow combining other panic masks with XFS_PTAG_VERIFIER_ERROR. # sysctl fs.xfs.panic_mask=511 sysctl: setting key "fs.xfs.panic_mask": Invalid argument fs.xfs.panic_mask = 511 Update to the maximum value that can be set to allow the full range of masks. Do this using a mask of possible values to prevent this happening again as suggested by Darrick. Fixes: d519da41e2b7 ("xfs: Introduce XFS_PTAG_VERIFIER_ERROR panic mask") Signed-off-by: Donald Douwsma <ddouwsma@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-02-05xfs: don't use BMBT btree split workers for IO completionxfs-6.3-merge-1Dave Chinner
When we split a BMBT due to record insertion, we offload it to a worker thread because we can be deep in the stack when we try to allocate a new block for the BMBT. Allocation can use several kilobytes of stack (full memory reclaim, swap and/or IO path can end up on the stack during allocation) and we can already be several kilobytes deep in the stack when we need to split the BMBT. A recent workload demonstrated a deadlock in this BMBT split offload. It requires several things to happen at once: 1. two inodes need a BMBT split at the same time, one must be unwritten extent conversion from IO completion, the other must be from extent allocation. 2. there must be a no available xfs_alloc_wq worker threads available in the worker pool. 3. There must be sustained severe memory shortages such that new kworker threads cannot be allocated to the xfs_alloc_wq pool for both threads that need split work to be run 4. The split work from the unwritten extent conversion must run first. 5. when the BMBT block allocation runs from the split work, it must loop over all AGs and not be able to either trylock an AGF successfully, or each AGF is is able to lock has no space available for a single block allocation. 6. The BMBT allocation must then attempt to lock the AGF that the second task queued to the rescuer thread already has locked before it finds an AGF it can allocate from. At this point, we have an ABBA deadlock between tasks queued on the xfs_alloc_wq rescuer thread and a locked AGF. i.e. The queued task holding the AGF lock can't be run by the rescuer thread until the task the rescuer thread is runing gets the AGF lock.... This is a highly improbably series of events, but there it is. There's a couple of ways to fix this, but the easiest way to ensure that we only punt tasks with a locked AGF that holds enough space for the BMBT block allocations to the worker thread. This works for unwritten extent conversion in IO completion (which doesn't have a locked AGF and space reservations) because we have tight control over the IO completion stack. It is typically only 6 functions deep when xfs_btree_split() is called because we've already offloaded the IO completion work to a worker thread and hence we don't need to worry about stack overruns here. The other place we can be called for a BMBT split without a preceeding allocation is __xfs_bunmapi() when punching out the center of an existing extent. We don't remove extents in the IO path, so these operations don't tend to be called with a lot of stack consumed. Hence we don't really need to ship the split off to a worker thread in these cases, either. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2023-02-05xfs: fix confusing variable names in xfs_refcount_item.cDarrick J. Wong
Variable names in this code module are inconsistent and confusing. xfs_phys_extent describe physical mappings, so rename them "pmap". xfs_refcount_intents describe refcount intents, so rename them "ri". Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2023-02-05xfs: pass refcount intent directly through the log intent codeDarrick J. Wong
Pass the incore refcount intent through the CUI logging code instead of repeatedly boxing and unboxing parameters. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2023-02-05xfs: fix confusing variable names in xfs_rmap_item.cDarrick J. Wong
Variable names in this code module are inconsistent and confusing. xfs_map_extent describe file mappings, so rename them "map". xfs_rmap_intents describe block mapping intents, so rename them "ri". Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2023-02-05xfs: pass rmap space mapping directly through the log intent codeDarrick J. Wong
Pass the incore rmap space mapping through the RUI logging code instead of repeatedly boxing and unboxing parameters. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2023-02-05xfs: fix confusing xfs_extent_item variable namesDarrick J. Wong
Change the name of all pointers to xfs_extent_item structures to "xefi" to make the name consistent and because the current selections ("new" and "free") mean other things in C. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2023-02-05xfs: pass xfs_extent_free_item directly through the log intent codeDarrick J. Wong
Pass the incore xfs_extent_free_item through the EFI logging code instead of repeatedly boxing and unboxing parameters. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2023-02-05xfs: fix confusing variable names in xfs_bmap_item.cDarrick J. Wong
Variable names in this code module are inconsistent and confusing. xfs_map_extent describe file mappings, so rename them "map". xfs_bmap_intents describe block mapping intents, so rename them "bi". Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2023-02-05xfs: pass the xfs_bmbt_irec directly through the log intent codeDarrick J. Wong
Instead of repeatedly boxing and unboxing the incore extent mapping structure as it passes through the BUI code, pass the pointer directly through. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2023-02-05xfs: use strscpy() to instead of strncpy()Xu Panda
The implementation of strscpy() is more robust and safer. That's now the recommended way to copy NUL-terminated strings. Signed-off-by: Xu Panda <xu.panda@zte.com.cn> Signed-off-by: Yang Yang <yang.yang29@zte.com.cn> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2023-01-28Merge tag '6.2-rc5-ksmbd-server-fixes' of git://git.samba.org/ksmbdLinus Torvalds
Pull ksmbd server fixes from Steve French: "Four smb3 server fixes, all also for stable: - fix for signing bug - fix to more strictly check packet length - add a max connections parm to limit simultaneous connections - fix error message flood that can occur with newer Samba xattr format" * tag '6.2-rc5-ksmbd-server-fixes' of git://git.samba.org/ksmbd: ksmbd: downgrade ndr version error message to debug ksmbd: limit pdu length size according to connection status ksmbd: do not sign response to session request for guest login ksmbd: add max connections parameter
2023-01-27Merge tag '6.2-rc5-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs fix from Steve French: "Fix for reconnect oops in smbdirect (RDMA), also is marked for stable" * tag '6.2-rc5-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: Fix oops due to uncleared server->smbd_conn in reconnect
2023-01-27Merge tag 'ovl-fixes-6.2-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs fixes from Miklos Szeredi: "Fix two bugs, a recent one introduced in the last cycle, and an older one from v5.11" * tag 'ovl-fixes-6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: fail on invalid uid/gid mapping at copy up ovl: fix tmpfile leak
2023-01-27ovl: fail on invalid uid/gid mapping at copy upMiklos Szeredi
If st_uid/st_gid doesn't have a mapping in the mounter's user_ns, then copy-up should fail, just like it would fail if the mounter task was doing the copy using "cp -a". There's a corner case where the "cp -a" would succeed but copy up fail: if there's a mapping of the invalid uid/gid (65534 by default) in the user namespace. This is because stat(2) will return this value if the mapping doesn't exist in the current user_ns and "cp -a" will in turn be able to create a file with this uid/gid. This behavior would be inconsistent with POSIX ACL's, which return -1 for invalid uid/gid which result in a failed copy. For consistency and simplicity fail the copy of the st_uid/st_gid are invalid. Fixes: 459c7c565ac3 ("ovl: unprivieged mounts") Cc: <stable@vger.kernel.org> # v5.11 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Seth Forshee <sforshee@kernel.org>
2023-01-27ovl: fix tmpfile leakMiklos Szeredi
Missed an error cleanup. Reported-by: syzbot+fd749a7ea127a84e0ffd@syzkaller.appspotmail.com Fixes: 2b1a77461f16 ("ovl: use vfs_tmpfile_open() helper") Cc: <stable@vger.kernel.org> # v6.1 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-01-25ksmbd: downgrade ndr version error message to debugNamjae Jeon
When user switch samba to ksmbd, The following message flood is coming when accessing files. Samba seems to changs dos attribute version to v5. This patch downgrade ndr version error message to debug. $ dmesg ... [68971.766914] ksmbd: v5 version is not supported [68971.779808] ksmbd: v5 version is not supported [68971.871544] ksmbd: v5 version is not supported [68971.910135] ksmbd: v5 version is not supported ... Cc: stable@vger.kernel.org Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3") Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-01-25ksmbd: limit pdu length size according to connection statusNamjae Jeon
Stream protocol length will never be larger than 16KB until session setup. After session setup, the size of requests will not be larger than 16KB + SMB2 MAX WRITE size. This patch limits these invalidly oversized requests and closes the connection immediately. Fixes: 0626e6641f6b ("cifsd: add server handler for central processing and tranport layers") Cc: stable@vger.kernel.org Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-18259 Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-01-25Merge tag 'fs.fuse.acl.v6.2-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping Pull fuse ACL fix from Christian Brauner: "The new posix acl API doesn't depend on the xattr handler infrastructure anymore and instead only relies on the posix acl inode operations. As a result daemons without FUSE_POSIX_ACL are unable to use posix acls like they used to. Fix this by copying what we did for overlayfs during the posix acl api conversion. Make fuse implement a dedicated ->get_inode_acl() method as does overlayfs. Fuse can then also uses this to express different needs for vfs permission checking during lookup and acl based retrieval via the regular system call path. This allows fuse to continue to refuse retrieving posix acls for daemons that don't set FUSE_POSXI_ACL for permission checking while also allowing a fuse server to retrieve it via the usual system calls" * tag 'fs.fuse.acl.v6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping: fuse: fixes after adapting to new posix acl api
2023-01-25cifs: Fix oops due to uncleared server->smbd_conn in reconnectDavid Howells
In smbd_destroy(), clear the server->smbd_conn pointer after freeing the smbd_connection struct that it points to so that reconnection doesn't get confused. Fixes: 8ef130f9ec27 ("CIFS: SMBD: Implement function to destroy a SMB Direct connection") Cc: stable@vger.kernel.org Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Acked-by: Tom Talpey <tom@talpey.com> Signed-off-by: David Howells <dhowells@redhat.com> Cc: Long Li <longli@microsoft.com> Cc: Pavel Shilovsky <piastryyy@gmail.com> Cc: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-01-24Merge tag 'nfsd-6.2-5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fix from Chuck Lever: - Nail another UAF in NFSD's filecache * tag 'nfsd-6.2-5' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: nfsd: don't free files unconditionally in __nfsd_file_cache_purge
2023-01-24ext4: make xattr char unsignedness in hash explicitLinus Torvalds
Commit f3bbac32475b ("ext4: deal with legacy signed xattr name hash values") added a hashing function for the legacy case of having the xattr hash calculated using a signed 'char' type. It left the unsigned case alone, since it's all implicitly handled by the '-funsigned-char' compiler option. However, there's been some noise about back-porting it all into stable kernels that lack the '-funsigned-char', so let's just make that at least possible by making the whole 'this uses unsigned char' very explicit in the code itself. Whether such a back-port is really warranted or not, I'll leave to others, but at least together with this change it is technically sensible. Also, add a 'pr_warn_once()' for reporting the "hey, signedness for this hash calculation has changed" issue. Hopefully it never triggers except for that xfstests generic/454 test-case, but even if it does it's just good information to have. If for no other reason than "we can remove the legacy signed hash code entirely if nobody ever sees the message any more". Cc: Sasha Levin <sashal@kernel.org> Cc: Eric Biggers <ebiggers@kernel.org> Cc: Andreas Dilger <adilger@dilger.ca> Cc: Theodore Ts'o <tytso@mit.edu>, Cc: Jason Donenfeld <Jason@zx2c4.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-01-24fuse: fixes after adapting to new posix acl apiChristian Brauner
This cycle we ported all filesystems to the new posix acl api. While looking at further simplifications in this area to remove the last remnants of the generic dummy posix acl handlers we realized that we regressed fuse daemons that don't set FUSE_POSIX_ACL but still make use of posix acls. With the change to a dedicated posix acl api interacting with posix acls doesn't go through the old xattr codepaths anymore and instead only relies the get acl and set acl inode operations. Before this change fuse daemons that don't set FUSE_POSIX_ACL were able to get and set posix acl albeit with two caveats. First, that posix acls aren't cached. And second, that they aren't used for permission checking in the vfs. We regressed that use-case as we currently refuse to retrieve any posix acls if they aren't enabled via FUSE_POSIX_ACL. So older fuse daemons would see a change in behavior. We can restore the old behavior in multiple ways. We could change the new posix acl api and look for a dedicated xattr handler and if we find one prefer that over the dedicated posix acl api. That would break the consistency of the new posix acl api so we would very much prefer not to do that. We could introduce a new ACL_*_CACHE sentinel that would instruct the vfs permission checking codepath to not call into the filesystem and ignore acls. But a more straightforward fix for v6.2 is to do the same thing that Overlayfs does and give fuse a separate get acl method for permission checking. Overlayfs uses this to express different needs for vfs permission lookup and acl based retrieval via the regular system call path as well. Let fuse do the same for now. This way fuse can continue to refuse to retrieve posix acls for daemons that don't set FUSE_POSXI_ACL for permission checking while allowing a fuse server to retrieve it via the usual system calls. In the future, we could extend the get acl inode operation to not just pass a simple boolean to indicate rcu lookup but instead make it a flag argument. Then in addition to passing the information that this is an rcu lookup to the filesystem we could also introduce a flag that tells the filesystem that this is a request from the vfs to use these acls for permission checking. Then fuse could refuse the get acl request for permission checking when the daemon doesn't have FUSE_POSIX_ACL set in the same get acl method. This would also help Overlayfs and allow us to remove the second method for it as well. But since that change is more invasive as we need to update the get acl inode operation for multiple filesystems we should not do this as a fix for v6.2. Instead we will do this for the v6.3 merge window. Fwiw, since posix acls are now always correctly translated in the new posix acl api we could also allow them to be used for daemons without FUSE_POSIX_ACL that are not mounted on the host. But this is behavioral change and again if dones should be done for v6.3. For now, let's just restore the original behavior. A nice side-effect of this change is that for fuse daemons with and without FUSE_POSIX_ACL the same code is used for posix acls in a backwards compatible way. This also means we can remove the legacy xattr handlers completely. We've also added comments to explain the expected behavior for daemons without FUSE_POSIX_ACL into the code. Fixes: 318e66856dde ("xattr: use posix acl api") Signed-off-by: Seth Forshee (Digital Ocean) <sforshee@kernel.org> Reviewed-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-23nfsd: don't free files unconditionally in __nfsd_file_cache_purgeJeff Layton
nfsd_file_cache_purge is called when the server is shutting down, in which case, tearing things down is generally fine, but it also gets called when the exports cache is flushed. Instead of walking the cache and freeing everything unconditionally, handle it the same as when we have a notification of conflicting access. Fixes: ac3a2585f018 ("nfsd: rework refcounting in filecache") Reported-by: Ruben Vestergaard <rubenv@drcmr.dk> Reported-by: Torkil Svensgaard <torkil@drcmr.dk> Reported-by: Shachar Kagan <skagan@nvidia.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Tested-by: Shachar Kagan <skagan@nvidia.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-01-22Merge tag 'gfs2-v6.2-rc4-fix' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 writepage fix from Andreas Gruenbacher: - Fix a regression introduced by commit "gfs2: stop using generic_writepages in gfs2_ail1_start_one". * tag 'gfs2-v6.2-rc4-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: Revert "gfs2: stop using generic_writepages in gfs2_ail1_start_one"
2023-01-22Revert "gfs2: stop using generic_writepages in gfs2_ail1_start_one"Andreas Gruenbacher
Commit b2b0a5e97855 switched from generic_writepages() to filemap_fdatawrite_wbc() in gfs2_ail1_start_one() on the path to replacing ->writepage() with ->writepages() and eventually eliminating the former. Function gfs2_ail1_start_one() is called from gfs2_log_flush(), our main function for flushing the filesystem log. Unfortunately, at least as implemented today, ->writepage() and ->writepages() are entirely different operations for journaled data inodes: while the former creates and submits transactions covering the data to be written, the latter flushes dirty buffers out to disk. With gfs2_ail1_start_one() now calling ->writepages(), we end up creating filesystem transactions while we are in the course of a log flush, which immediately deadlocks on the sdp->sd_log_flush_lock semaphore. Work around that by going back to how things used to work before commit b2b0a5e97855 for now; figuring out a superior solution will take time we don't have available right now. However ... Since the removal of generic_writepages() is imminent, open-code it here. We're already inside a blk_start_plug() ... blk_finish_plug() section here, so skip that part of the original generic_writepages(). This reverts commit b2b0a5e978552e348f85ad9c7568b630a5ede659. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Acked-by: Christoph Hellwig <hch@lst.de>
2023-01-21ext4: deal with legacy signed xattr name hash valuesLinus Torvalds
We potentially have old hashes of the xattr names generated on systems with signed 'char' types. Now that everybody uses '-funsigned-char', those hashes will no longer match. This only happens if you use xattrs names that have the high bit set, which probably doesn't happen in practice, but the xfstest generic/454 shows it. Instead of adding a new "signed xattr hash filesystem" bit and having to deal with all the possible combinations, just calculate the hash both ways if the first one fails, and always generate new hashes with the proper unsigned char version. Reported-by: kernel test robot <oliver.sang@intel.com> Link: https://lore.kernel.org/oe-lkp/202212291509.704a11c9-oliver.sang@intel.com Link: https://lore.kernel.org/all/CAHk-=whUNjwqZXa-MH9KMmc_CpQpoFKFjAB9ZKHuu=TbsouT4A@mail.gmail.com/ Exposed-by: 3bc753c06dd0 ("kbuild: treat char as always unsigned") Cc: Eric Biggers <ebiggers@kernel.org> Cc: Andreas Dilger <adilger@dilger.ca> Cc: Theodore Ts'o <tytso@mit.edu>, Cc: Jason Donenfeld <Jason@zx2c4.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-01-20Merge tag '6.2-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs fixes from Steve French: - important fix for packet signature calculation error - three fixes to correct DFS deadlock, and DFS refresh problem - remove an unused DFS function, and duplicate tcon refresh code - DFS cache lookup fix - uninitialized rc fix * tag '6.2-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: remove unused function cifs: do not include page data when checking signature cifs: fix return of uninitialized rc in dfs_cache_update_tgthint() cifs: handle cache lookup errors different than -ENOENT cifs: remove duplicate code in __refresh_tcon() cifs: don't take exclusive lock for updating target hints cifs: avoid re-lookups in dfs_cache_find() cifs: fix potential deadlock in cache_refresh_path()
2023-01-20ksmbd: do not sign response to session request for guest loginMarios Makassikis
If ksmbd.mountd is configured to assign unknown users to the guest account ("map to guest = bad user" in the config), ksmbd signs the response. This is wrong according to MS-SMB2 3.3.5.5.3: 12. If the SMB2_SESSION_FLAG_IS_GUEST bit is not set in the SessionFlags field, and Session.IsAnonymous is FALSE, the server MUST sign the final session setup response before sending it to the client, as follows: [...] This fixes libsmb2 based applications failing to establish a session ("Wrong signature in received"). Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3") Cc: stable@vger.kernel.org Signed-off-by: Marios Makassikis <mmakassikis@freebox.fr> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-01-20ksmbd: add max connections parameterNamjae Jeon
Add max connections parameter to limit number of maximum simultaneous connections. Fixes: 0626e6641f6b ("cifsd: add server handler for central processing and tranport layers") Cc: stable@vger.kernel.org Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-01-20Merge tag 'for-6.2-rc4-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - fix potential out-of-bounds access to leaf data when seeking in an inline file - fix potential crash in quota when rescan races with disable - reimplement super block signature scratching by marking page/folio dirty and syncing block device, allow removing write_one_page * tag 'for-6.2-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix race between quota rescan and disable leading to NULL pointer deref btrfs: fix invalid leaf access due to inline extent during lseek btrfs: stop using write_one_page in btrfs_scratch_superblock btrfs: factor out scratching of one regular super block
2023-01-19Merge tag 'zonefs-6.2-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs Pull zonefs fix from Damien Le Moal: - A single patch to fix sync write operations to detect and handle errors due to external zone corruptions resulting in writes at invalid location, from me. * tag 'zonefs-6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs: zonefs: Detect append writes at invalid locations
2023-01-18cifs: remove unused functionPaulo Alcantara
Remove dfs_cache_update_tgthint() as it is not used anywhere. Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-01-18cifs: do not include page data when checking signatureEnzo Matsumiya
On async reads, page data is allocated before sending. When the response is received but it has no data to fill (e.g. STATUS_END_OF_FILE), __calc_signature() will still include the pages in its computation, leading to an invalid signature check. This patch fixes this by not setting the async read smb_rqst page data (zeroed by default) if its got_bytes is 0. This can be reproduced/verified with xfstests generic/465. Cc: <stable@vger.kernel.org> Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-01-18Merge tag 'affs-for-6.2-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull affs fix from David Sterba: "One minor fix for a KCSAN report" * tag 'affs-for-6.2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: affs: initialize fsdata in affs_truncate()
2023-01-18Merge tag 'erofs-for-6.2-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs fixes from Gao Xiang: "Two patches fixes issues reported by syzbot, one fixes a missing `domain_id` mount option in documentation and a minor cleanup: - Fix wrong iomap->length calculation post EOF, which could cause a WARN_ON in iomap_iter_done() (Siddh) - Fix improper kvcalloc() use with __GFP_NOFAIL (me) - Add missing `domain_id` mount option in documentation (Jingbo) - Clean up fscache option parsing (Jingbo)" * tag 'erofs-for-6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: clean up parsing of fscache related options erofs: add documentation for 'domain_id' mount option erofs: fix kvcalloc() misuse with __GFP_NOFAIL erofs/zmap.c: Fix incorrect offset calculation
2023-01-18cifs: fix return of uninitialized rc in dfs_cache_update_tgthint()Paulo Alcantara
Fix this by initializing rc to 0 as cache_refresh_path() would not set it in case of success. Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/all/202301190004.bEHvbKG6-lkp@intel.com/ Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-01-18cifs: handle cache lookup errors different than -ENOENTPaulo Alcantara
lookup_cache_entry() might return an error different than -ENOENT (e.g. from ->char2uni), so handle those as well in cache_refresh_path(). Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-01-18cifs: remove duplicate code in __refresh_tcon()Paulo Alcantara
The logic for creating or updating a cache entry in __refresh_tcon() could be simply done with cache_refresh_path(), so use it instead. Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-01-18cifs: don't take exclusive lock for updating target hintsPaulo Alcantara
Avoid contention while updating dfs target hints. This should be perfectly fine to update them under shared locks. Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-01-18cifs: avoid re-lookups in dfs_cache_find()Paulo Alcantara
Simply downgrade the write lock on cache updates from cache_refresh_path() and avoid unnecessary re-lookup in dfs_cache_find(). Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-01-18cifs: fix potential deadlock in cache_refresh_path()Paulo Alcantara
Avoid getting DFS referral from an exclusive lock in cache_refresh_path() because the tcon IPC used for getting the referral could be disconnected and thus causing a deadlock as shown below: task A task B ====== ====== cifs_demultiplex_thread() dfs_cache_find() cifs_handle_standard() cache_refresh_path() reconnect_dfs_server() down_write() dfs_cache_noreq_find() get_dfs_referral() down_read() <- deadlock smb2_get_dfs_refer() SMB2_ioctl() cifs_send_recv() compound_send_recv() wait_for_response() where task A cannot wake up task B because it is blocked on down_read() due to the exclusive lock held in cache_refresh_path() and therefore not being able to make progress. Fixes: c9f711039905 ("cifs: keep referral server sessions alive") Reviewed-by: Aurélien Aptel <aurelien.aptel@gmail.com> Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-01-17Merge tag 'nfsd-6.2-4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: - Fix recently introduced use-after-free bugs * tag 'nfsd-6.2-4' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: NFSD: replace delayed_work with work_struct for nfsd_client_shrinker NFSD: register/unregister of nfsd-client shrinker at nfsd startup/shutdown time NFSD: fix use-after-free in nfsd4_ssc_setup_dul()
2023-01-16Merge tag 'mm-hotfixes-stable-2023-01-16-15-23' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc hotfixes from Andrew Morton: "21 hotfixes. Thirteen of these address pre-6.1 issues and hence have the cc:stable tag" * tag 'mm-hotfixes-stable-2023-01-16-15-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (21 commits) init/Kconfig: fix typo (usafe -> unsafe) nommu: fix split_vma() map_count error nommu: fix do_munmap() error path nommu: fix memory leak in do_mmap() error path MAINTAINERS: update Robert Foss' email address proc: fix PIE proc-empty-vm, proc-pid-vm tests mm: update mmap_sem comments to refer to mmap_lock include/linux/mm: fix release_pages_arg kernel doc comment lib/win_minmax: use /* notation for regular comments kasan: mark kasan_kunit_executing as static nilfs2: fix general protection fault in nilfs_btree_insert() Docs/admin-guide/mm/zswap: remove zsmalloc's lack of writeback warning mm/hugetlb: pre-allocate pgtable pages for uffd wr-protects hugetlb: unshare some PMDs when splitting VMAs mm: fix vma->anon_name memory leak for anonymous shmem VMAs mm/shmem: restore SHMEM_HUGE_DENY precedence over MADV_COLLAPSE mm/MADV_COLLAPSE: don't expand collapse when vm_end is past requested end mm/userfaultfd: enable writenotify while userfaultfd-wp is enabled for a VMA mm/khugepaged: fix collapse_pte_mapped_thp() to allow anon_vma mm/hugetlb: fix uffd-wp handling for migration entries in hugetlb_change_protection() ...
2023-01-16btrfs: fix race between quota rescan and disable leading to NULL pointer derefFilipe Manana
If we have one task trying to start the quota rescan worker while another one is trying to disable quotas, we can end up hitting a race that results in the quota rescan worker doing a NULL pointer dereference. The steps for this are the following: 1) Quotas are enabled; 2) Task A calls the quota rescan ioctl and enters btrfs_qgroup_rescan(). It calls qgroup_rescan_init() which returns 0 (success) and then joins a transaction and commits it; 3) Task B calls the quota disable ioctl and enters btrfs_quota_disable(). It clears the bit BTRFS_FS_QUOTA_ENABLED from fs_info->flags and calls btrfs_qgroup_wait_for_completion(), which returns immediately since the rescan worker is not yet running. Then it starts a transaction and locks fs_info->qgroup_ioctl_lock; 4) Task A queues the rescan worker, by calling btrfs_queue_work(); 5) The rescan worker starts, and calls rescan_should_stop() at the start of its while loop, which results in 0 iterations of the loop, since the flag BTRFS_FS_QUOTA_ENABLED was cleared from fs_info->flags by task B at step 3); 6) Task B sets fs_info->quota_root to NULL; 7) The rescan worker tries to start a transaction and uses fs_info->quota_root as the root argument for btrfs_start_transaction(). This results in a NULL pointer dereference down the call chain of btrfs_start_transaction(). The stack trace is something like the one reported in Link tag below: general protection fault, probably for non-canonical address 0xdffffc0000000041: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000208-0x000000000000020f] CPU: 1 PID: 34 Comm: kworker/u4:2 Not tainted 6.1.0-syzkaller-13872-gb6bb9676f216 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022 Workqueue: btrfs-qgroup-rescan btrfs_work_helper RIP: 0010:start_transaction+0x48/0x10f0 fs/btrfs/transaction.c:564 Code: 48 89 fb 48 (...) RSP: 0018:ffffc90000ab7ab0 EFLAGS: 00010206 RAX: 0000000000000041 RBX: 0000000000000208 RCX: ffff88801779ba80 RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000 RBP: dffffc0000000000 R08: 0000000000000001 R09: fffff52000156f5d R10: fffff52000156f5d R11: 1ffff92000156f5c R12: 0000000000000000 R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000003 FS: 0000000000000000(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f2bea75b718 CR3: 000000001d0cc000 CR4: 00000000003506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> btrfs_qgroup_rescan_worker+0x3bb/0x6a0 fs/btrfs/qgroup.c:3402 btrfs_work_helper+0x312/0x850 fs/btrfs/async-thread.c:280 process_one_work+0x877/0xdb0 kernel/workqueue.c:2289 worker_thread+0xb14/0x1330 kernel/workqueue.c:2436 kthread+0x266/0x300 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308 </TASK> Modules linked in: So fix this by having the rescan worker function not attempt to start a transaction if it didn't do any rescan work. Reported-by: syzbot+96977faa68092ad382c4@syzkaller.appspotmail.com Link: https://lore.kernel.org/linux-btrfs/000000000000e5454b05f065a803@google.com/ Fixes: e804861bd4e6 ("btrfs: fix deadlock between quota disable and qgroup rescan worker") CC: stable@vger.kernel.org # 5.4+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-01-16btrfs: fix invalid leaf access due to inline extent during lseekFilipe Manana
During lseek, for SEEK_DATA and SEEK_HOLE modes, we access the disk_bytenr of an extent without checking its type. However inline extents have their data starting the offset of the disk_bytenr field, so accessing that field when we have an inline extent can result in either of the following: 1) Interpret the inline extent's data as a disk_bytenr value; 2) In case the inline data is less than 8 bytes, we access part of some other item in the leaf, or unused space in the leaf; 3) In case the inline data is less than 8 bytes and the extent item is the first item in the leaf, we can access beyond the leaf's limit. So fix this by not accessing the disk_bytenr field if we have an inline extent. Fixes: b6e833567ea1 ("btrfs: make hole and data seeking a lot more efficient") Reported-by: Matthias Schoepfer <matthias.schoepfer@googlemail.com> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=216908 Link: https://lore.kernel.org/linux-btrfs/7f25442f-b121-2a3a-5a3d-22bcaae83cd4@leemhuis.info/ CC: stable@vger.kernel.org # 6.1 Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-01-16btrfs: stop using write_one_page in btrfs_scratch_superblockChristoph Hellwig
write_one_page is an awkward interface that expects the page locked and ->writepage to be implemented. Replace that by zeroing the signature bytes and synchronize the block device page using the proper bdev helpers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> [ update changelog ] Signed-off-by: David Sterba <dsterba@suse.com>
2023-01-16btrfs: factor out scratching of one regular super blockChristoph Hellwig
btrfs_scratch_superblocks open codes scratching super block of a non-zoned super block. Split the code to read, zero and write the superblock for regular devices into a separate helper. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> [ update changelog ] Signed-off-by: David Sterba <dsterba@suse.com>