summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2022-06-06NFSD: Fix possible sleep during nfsd4_release_lockowner()Chuck Lever
commit ce3c4ad7f4ce5db7b4f08a1e237d8dd94b39180b upstream. nfsd4_release_lockowner() holds clp->cl_lock when it calls check_for_locks(). However, check_for_locks() calls nfsd_file_get() / nfsd_file_put() to access the backing inode's flc_posix list, and nfsd_file_put() can sleep if the inode was recently removed. Let's instead rely on the stateowner's reference count to gate whether the release is permitted. This should be a reliable indication of locks-in-use since file lock operations and ->lm_get_owner take appropriate references, which are released appropriately when file locks are removed. Reported-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-06exec: Force single empty string when argv is emptyKees Cook
commit dcd46d897adb70d63e025f175a00a89797d31a43 upstream. Quoting[1] Ariadne Conill: "In several other operating systems, it is a hard requirement that the second argument to execve(2) be the name of a program, thus prohibiting a scenario where argc < 1. POSIX 2017 also recommends this behaviour, but it is not an explicit requirement[2]: The argument arg0 should point to a filename string that is associated with the process being started by one of the exec functions. ... Interestingly, Michael Kerrisk opened an issue about this in 2008[3], but there was no consensus to support fixing this issue then. Hopefully now that CVE-2021-4034 shows practical exploitative use[4] of this bug in a shellcode, we can reconsider. This issue is being tracked in the KSPP issue tracker[5]." While the initial code searches[6][7] turned up what appeared to be mostly corner case tests, trying to that just reject argv == NULL (or an immediately terminated pointer list) quickly started tripping[8] existing userspace programs. The next best approach is forcing a single empty string into argv and adjusting argc to match. The number of programs depending on argc == 0 seems a smaller set than those calling execve with a NULL argv. Account for the additional stack space in bprm_stack_limits(). Inject an empty string when argc == 0 (and set argc = 1). Warn about the case so userspace has some notice about the change: process './argc0' launched './argc0' with NULL argv: empty string added Additionally WARN() and reject NULL argv usage for kernel threads. [1] https://lore.kernel.org/lkml/20220127000724.15106-1-ariadne@dereferenced.org/ [2] https://pubs.opengroup.org/onlinepubs/9699919799/functions/exec.html [3] https://bugzilla.kernel.org/show_bug.cgi?id=8408 [4] https://www.qualys.com/2022/01/25/cve-2021-4034/pwnkit.txt [5] https://github.com/KSPP/linux/issues/176 [6] https://codesearch.debian.net/search?q=execve%5C+*%5C%28%5B%5E%2C%5D%2B%2C+*NULL&literal=0 [7] https://codesearch.debian.net/search?q=execlp%3F%5Cs*%5C%28%5B%5E%2C%5D%2B%2C%5Cs*NULL&literal=0 [8] https://lore.kernel.org/lkml/20220131144352.GE16385@xsang-OptiPlex-9020/ Reported-by: Ariadne Conill <ariadne@dereferenced.org> Reported-by: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Rich Felker <dalias@libc.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Christian Brauner <brauner@kernel.org> Acked-by: Ariadne Conill <ariadne@dereferenced.org> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lore.kernel.org/r/20220201000947.2453721-1-keescook@chromium.org [vegard: fixed conflicts due to missing 886d7de631da71e30909980fdbf318f7caade262^- and 3950e975431bc914f7e81b8f2a2dbdf2064acb0f^- and 655c16a8ce9c15842547f40ce23fd148aeccc074] Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-15VFS: Fix memory leak caused by concurrently mounting fs with subtypeChenXiaoSong
If two processes mount same superblock, memory leak occurs: CPU0 | CPU1 do_new_mount | do_new_mount fs_set_subtype | fs_set_subtype kstrdup | | kstrdup memrory leak | The following reproducer triggers the problem: 1. shell command: mount -t ntfs /dev/sda1 /mnt & 2. c program: mount("/dev/sda1", "/mnt", "fuseblk", 0, "...") with kmemleak report being along the lines of unreferenced object 0xffff888235f1a5c0 (size 8): comm "mount.ntfs", pid 2860, jiffies 4295757824 (age 43.423s) hex dump (first 8 bytes): 00 a5 f1 35 82 88 ff ff ...5.... backtrace: [<00000000656e30cc>] __kmalloc_track_caller+0x16e/0x430 [<000000008e591727>] kstrdup+0x3e/0x90 [<000000008430d12b>] do_mount.cold+0x7b/0xd9 [<0000000078d639cd>] ksys_mount+0xb2/0x150 [<000000006015988d>] __x64_sys_mount+0x29/0x40 [<00000000e0a7c118>] do_syscall_64+0xc1/0x1d0 [<00000000bcea7df5>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [<00000000803a4067>] 0xffffffffffffffff Linus's tree already have refactoring patchset [1], one of them can fix this bug: c30da2e981a7 ("fuse: convert to use the new mount API") After refactoring, init super_block->s_subtype in fuse_fill_super. Since we did not merge the refactoring patchset in this branch, I create this patch. This patch fix this by adding a write lock while calling fs_set_subtype. [1] https://patchwork.kernel.org/project/linux-fsdevel/patch/20190903113640.7984-3-mszeredi@redhat.com/ Fixes: 79c0b2df79eb ("add filesystem subtype support") Cc: David Howells <dhowells@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-12btrfs: always log symlinks in full modeFilipe Manana
commit d0e64a981fd841cb0f28fcd6afcac55e6f1e6994 upstream. On Linux, empty symlinks are invalid, and attempting to create one with the system call symlink(2) results in an -ENOENT error and this is explicitly documented in the man page. If we rename a symlink that was created in the current transaction and its parent directory was logged before, we actually end up logging the symlink without logging its content, which is stored in an inline extent. That means that after a power failure we can end up with an empty symlink, having no content and an i_size of 0 bytes. It can be easily reproduced like this: $ mkfs.btrfs -f /dev/sdc $ mount /dev/sdc /mnt $ mkdir /mnt/testdir $ sync # Create a file inside the directory and fsync the directory. $ touch /mnt/testdir/foo $ xfs_io -c "fsync" /mnt/testdir # Create a symlink inside the directory and then rename the symlink. $ ln -s /mnt/testdir/foo /mnt/testdir/bar $ mv /mnt/testdir/bar /mnt/testdir/baz # Now fsync again the directory, this persist the log tree. $ xfs_io -c "fsync" /mnt/testdir <power failure> $ mount /dev/sdc /mnt $ stat -c %s /mnt/testdir/baz 0 $ readlink /mnt/testdir/baz $ Fix this by always logging symlinks in full mode (LOG_INODE_ALL), so that their content is also logged. A test case for fstests will follow. CC: stable@vger.kernel.org # 4.9+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-12cifs: destage any unwritten data to the server before calling copychunk_writeRonnie Sahlberg
[ Upstream commit f5d0f921ea362636e4a2efb7c38d1ead373a8700 ] because the copychunk_write might cover a region of the file that has not yet been sent to the server and thus fail. A simple way to reproduce this is: truncate -s 0 /mnt/testfile; strace -f -o x -ttT xfs_io -i -f -c 'pwrite 0k 128k' -c 'fcollapse 16k 24k' /mnt/testfile the issue is that the 'pwrite 0k 128k' becomes rearranged on the wire with the 'fcollapse 16k 24k' due to write-back caching. fcollapse is implemented in cifs.ko as a SMB2 IOCTL(COPYCHUNK_WRITE) call and it will fail serverside since the file is still 0b in size serverside until the writes have been destaged. To avoid this we must ensure that we destage any unwritten data to the server before calling COPYCHUNK_WRITE. Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1997373 Reported-by: Xiaoli Feng <xifeng@redhat.com> Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-27ext4: force overhead calculation if the s_overhead_cluster makes no senseTheodore Ts'o
commit 85d825dbf4899a69407338bae462a59aa9a37326 upstream. If the file system does not use bigalloc, calculating the overhead is cheap, so force the recalculation of the overhead so we don't have to trust the precalculated overhead in the superblock. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-27ext4: fix overhead calculation to account for the reserved gdt blocksTheodore Ts'o
commit 10b01ee92df52c8d7200afead4d5e5f55a5c58b1 upstream. The kernel calculation was underestimating the overhead by not taking into account the reserved gdt blocks. With this change, the overhead calculated by the kernel matches the overhead calculation in mke2fs. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-27ext4: limit length to bitmap_maxbytes - blocksize in punch_holeTadeusz Struk
commit 2da376228a2427501feb9d15815a45dbdbdd753e upstream. Syzbot found an issue [1] in ext4_fallocate(). The C reproducer [2] calls fallocate(), passing size 0xffeffeff000ul, and offset 0x1000000ul, which, when added together exceed the bitmap_maxbytes for the inode. This triggers a BUG in ext4_ind_remove_space(). According to the comments in this function the 'end' parameter needs to be one block after the last block to be removed. In the case when the BUG is triggered it points to the last block. Modify the ext4_punch_hole() function and add constraint that caps the length to satisfy the one before laster block requirement. LINK: [1] https://syzkaller.appspot.com/bug?id=b80bd9cf348aac724a4f4dff251800106d721331 LINK: [2] https://syzkaller.appspot.com/text?tag=ReproC&x=14ba0238700000 Fixes: a4bb6b64e39a ("ext4: enable "punch hole" functionality") Reported-by: syzbot+7a806094edd5d07ba029@syzkaller.appspotmail.com Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org> Link: https://lore.kernel.org/r/20220331200515.153214-1-tadeusz.struk@linaro.org Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-27ext4: fix symlink file size not match to file contentYe Bin
commit a2b0b205d125f27cddfb4f7280e39affdaf46686 upstream. We got issue as follows: [home]# fsck.ext4 -fn ram0yb e2fsck 1.45.6 (20-Mar-2020) Pass 1: Checking inodes, blocks, and sizes Pass 2: Checking directory structure Symlink /p3/d14/d1a/l3d (inode #3494) is invalid. Clear? no Entry 'l3d' in /p3/d14/d1a (3383) has an incorrect filetype (was 7, should be 0). Fix? no As the symlink file size does not match the file content. If the writeback of the symlink data block failed, ext4_finish_bio() handles the end of IO. However this function fails to mark the buffer with BH_write_io_error and so when unmount does journal checkpoint it cannot detect the writeback error and will cleanup the journal. Thus we've lost the correct data in the journal area. To solve this issue, mark the buffer as BH_write_io_error in ext4_finish_bio(). Cc: stable@kernel.org Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220321144438.201685-1-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-27stat: fix inconsistency between struct stat and struct compat_statMikulas Patocka
[ Upstream commit 932aba1e169090357a77af18850a10c256b50819 ] struct stat (defined in arch/x86/include/uapi/asm/stat.h) has 32-bit st_dev and st_rdev; struct compat_stat (defined in arch/x86/include/asm/compat.h) has 16-bit st_dev and st_rdev followed by a 16-bit padding. This patch fixes struct compat_stat to match struct stat. [ Historical note: the old x86 'struct stat' did have that 16-bit field that the compat layer had kept around, but it was changes back in 2003 by "struct stat - support larger dev_t": https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/?id=e95b2065677fe32512a597a79db94b77b90c968d and back in those days, the x86_64 port was still new, and separate from the i386 code, and had already picked up the old version with a 16-bit st_dev field ] Note that we can't change compat_dev_t because it is used by compat_loop_info. Also, if the st_dev and st_rdev values are 32-bit, we don't have to use old_valid_dev to test if the value fits into them. This fixes -EOVERFLOW on filesystems that are on NVMe because NVMe uses the major number 259. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: Andreas Schwab <schwab@linux-m68k.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-27cifs: Check the IOCB_DIRECT flag, not O_DIRECTDavid Howells
[ Upstream commit 994fd530a512597ffcd713b0f6d5bc916c5698f0 ] Use the IOCB_DIRECT indicator flag on the I/O context rather than checking to see if the file was opened O_DIRECT. Signed-off-by: David Howells <dhowells@redhat.com> cc: Steve French <sfrench@samba.org> cc: Shyam Prasad N <nspmangalore@gmail.com> cc: Rohith Surabattula <rohiths.msft@gmail.com> cc: linux-cifs@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-27gfs2: assign rgrp glock before compute_bitstructsBob Peterson
commit 428f651cb80b227af47fc302e4931791f2fb4741 upstream. Before this patch, function read_rindex_entry called compute_bitstructs before it allocated a glock for the rgrp. But if compute_bitstructs found a problem with the rgrp, it called gfs2_consist_rgrpd, and that called gfs2_dump_glock for rgd->rd_gl which had not yet been assigned. read_rindex_entry compute_bitstructs gfs2_consist_rgrpd gfs2_dump_glock <---------rgd->rd_gl was not set. This patch changes read_rindex_entry so it assigns an rgrp glock before calling compute_bitstructs so gfs2_dump_glock does not reference an unassigned pointer. If an error is discovered, the glock must also be put, so a new goto and label were added. Reported-by: syzbot+c6fd14145e2f62ca0784@syzkaller.appspotmail.com Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20cifs: potential buffer overflow in handling symlinksHarshit Mogalapalli
[ Upstream commit 64c4a37ac04eeb43c42d272f6e6c8c12bfcf4304 ] Smatch printed a warning: arch/x86/crypto/poly1305_glue.c:198 poly1305_update_arch() error: __memcpy() 'dctx->buf' too small (16 vs u32max) It's caused because Smatch marks 'link_len' as untrusted since it comes from sscanf(). Add a check to ensure that 'link_len' is not larger than the size of the 'link_str' buffer. Fixes: c69c1b6eaea1 ("cifs: implement CIFSParseMFSymlink()") Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-20btrfs: fix qgroup reserve overflow the qgroup limitEthan Lien
commit b642b52d0b50f4d398cb4293f64992d0eed2e2ce upstream. We use extent_changeset->bytes_changed in qgroup_reserve_data() to record how many bytes we set for EXTENT_QGROUP_RESERVED state. Currently the bytes_changed is set as "unsigned int", and it will overflow if we try to fallocate a range larger than 4GiB. The result is we reserve less bytes and eventually break the qgroup limit. Unlike regular buffered/direct write, which we use one changeset for each ordered extent, which can never be larger than 256M. For fallocate, we use one changeset for the whole range, thus it no longer respects the 256M per extent limit, and caused the problem. The following example test script reproduces the problem: $ cat qgroup-overflow.sh #!/bin/bash DEV=/dev/sdj MNT=/mnt/sdj mkfs.btrfs -f $DEV mount $DEV $MNT # Set qgroup limit to 2GiB. btrfs quota enable $MNT btrfs qgroup limit 2G $MNT # Try to fallocate a 3GiB file. This should fail. echo echo "Try to fallocate a 3GiB file..." fallocate -l 3G $MNT/3G.file # Try to fallocate a 5GiB file. echo echo "Try to fallocate a 5GiB file..." fallocate -l 5G $MNT/5G.file # See we break the qgroup limit. echo sync btrfs qgroup show -r $MNT umount $MNT When running the test: $ ./qgroup-overflow.sh (...) Try to fallocate a 3GiB file... fallocate: fallocate failed: Disk quota exceeded Try to fallocate a 5GiB file... qgroupid         rfer         excl     max_rfer --------         ----         ----     -------- 0/5           5.00GiB      5.00GiB      2.00GiB Since we have no control of how bytes_changed is used, it's better to set it to u64. CC: stable@vger.kernel.org # 4.14+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Ethan Lien <ethanlien@synology.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20jfs: prevent NULL deref in diFreeHaimin Zhang
[ Upstream commit a53046291020ec41e09181396c1e829287b48d47 ] Add validation check for JFS_IP(ipimap)->i_imap to prevent a NULL deref in diFree since diFree uses it without do any validations. When function jfs_mount calls diMount to initialize fileset inode allocation map, it can fail and JFS_IP(ipimap)->i_imap won't be initialized. Then it calls diFreeSpecial to close fileset inode allocation map inode and it will flow into jfs_evict_inode. Function jfs_evict_inode just validates JFS_SBI(inode->i_sb)->ipimap, then calls diFree. diFree use JFS_IP(ipimap)->i_imap directly, then it will cause a NULL deref. Reported-by: TCS Robot <tcs_robot@tencent.com> Signed-off-by: Haimin Zhang <tcs_kernel@tencent.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-20NFS: swap-out must always use STABLE writes.NeilBrown
[ Upstream commit c265de257f558a05c1859ee9e3fed04883b9ec0e ] The commit handling code is not safe against memory-pressure deadlocks when writing to swap. In particular, nfs_commitdata_alloc() blocks indefinitely waiting for memory, and this can consume all available workqueue threads. swap-out most likely uses STABLE writes anyway as COND_STABLE indicates that a stable write should be used if the write fits in a single request, and it normally does. However if we ever swap with a small wsize, or gather unusually large numbers of pages for a single write, this might change. For safety, make it explicit in the code that direct writes used for swap must always use FLUSH_STABLE. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-20NFS: swap IO handling is slightly different for O_DIRECT IONeilBrown
[ Upstream commit 64158668ac8b31626a8ce48db4cad08496eb8340 ] 1/ Taking the i_rwsem for swap IO triggers lockdep warnings regarding possible deadlocks with "fs_reclaim". These deadlocks could, I believe, eventuate if a buffered read on the swapfile was attempted. We don't need coherence with the page cache for a swap file, and buffered writes are forbidden anyway. There is no other need for i_rwsem during direct IO. So never take it for swap_rw() 2/ generic_write_checks() explicitly forbids writes to swap, and performs checks that are not needed for swap. So bypass it for swap_rw(). Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-20ubifs: Rectify space amount budget for mkdir/tmpfile operationsZhihao Cheng
[ Upstream commit a6dab6607d4681d227905d5198710b575dbdb519 ] UBIFS should make sure the flash has enough space to store dirty (Data that is newer than disk) data (in memory), space budget is exactly designed to do that. If space budget calculates less data than we need, 'make_reservation()' will do more work(return -ENOSPC if no free space lelf, sometimes we can see "cannot reserve xxx bytes in jhead xxx, error -28" in ubifs error messages) with ubifs inodes locked, which may effect other syscalls. A simple way to decide how much space do we need when make a budget: See how much space is needed by 'make_reservation()' in ubifs_jnl_xxx() function according to corresponding operation. It's better to report ENOSPC in ubifs_budget_space(), as early as we can. Fixes: 474b93704f32163 ("ubifs: Implement O_TMPFILE") Fixes: 1e51764a3c2ac05 ("UBIFS: add new flash file system") Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-20gfs2: Make sure FITRIM minlen is rounded up to fs block sizeAndrew Price
commit 27ca8273fda398638ca994a207323a85b6d81190 upstream. Per fstrim(8) we must round up the minlen argument to the fs block size. The current calculation doesn't take into account devices that have a discard granularity and requested minlen less than 1 fs block, so the value can get shifted away to zero in the translation to fs blocks. The zero minlen passed to gfs2_rgrp_send_discards() then allows sb_issue_discard() to be called with nr_sects == 0 which returns -EINVAL and results in gfs2_rgrp_send_discards() returning -EIO. Make sure minlen is never < 1 fs block by taking the max of the requested minlen and the fs block size before comparing to the device's discard granularity and shifting to fs blocks. Fixes: 076f0faa764ab ("GFS2: Fix FITRIM argument handling") Signed-off-by: Andrew Price <anprice@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20ubifs: rename_whiteout: correct old_dir size computingBaokun Li
commit 705757274599e2e064dd3054aabc74e8af31a095 upstream. When renaming the whiteout file, the old whiteout file is not deleted. Therefore, we add the old dentry size to the old dir like XFS. Otherwise, an error may be reported due to `fscki->calc_sz != fscki->size` in check_indes. Fixes: 9e0a1fff8db56ea ("ubifs: Implement RENAME_WHITEOUT") Reported-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Baokun Li <libaokun1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20ubifs: setflags: Make dirtied_ino_d 8 bytes alignedZhihao Cheng
commit 1b83ec057db16b4d0697dc21ef7a9743b6041f72 upstream. Make 'ui->data_len' aligned with 8 bytes before it is assigned to dirtied_ino_d. Since 8871d84c8f8b0c6b("ubifs: convert to fileattr") applied, 'setflags()' only affects regular files and directories, only xattr inode, symlink inode and special inode(pipe/char_dev/block_dev) have none- zero 'ui->data_len' field, so assertion '!(req->dirtied_ino_d & 7)' cannot fail in ubifs_budget_space(). To avoid assertion fails in future evolution(eg. setflags can operate special inodes), it's better to make dirtied_ino_d 8 bytes aligned, after all aligned size is still zero for regular files. Fixes: 1e51764a3c2ac05a ("UBIFS: add new flash file system") Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20ubifs: Add missing iput if do_tmpfile() failed in rename whiteoutZhihao Cheng
commit 716b4573026bcbfa7b58ed19fe15554bac66b082 upstream. whiteout inode should be put when do_tmpfile() failed if inode has been initialized. Otherwise we will get following warning during umount: UBIFS error (ubi0:0 pid 1494): ubifs_assert_failed [ubifs]: UBIFS assert failed: c->bi.dd_growth == 0, in fs/ubifs/super.c:1930 VFS: Busy inodes after unmount of ubifs. Self-destruct in 5 seconds. Fixes: 9e0a1fff8db56ea ("ubifs: Implement RENAME_WHITEOUT") Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Suggested-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20ubifs: rename_whiteout: Fix double free for whiteout_ui->dataZhihao Cheng
commit 40a8f0d5e7b3999f096570edab71c345da812e3e upstream. 'whiteout_ui->data' will be freed twice if space budget fail for rename whiteout operation as following process: rename_whiteout dev = kmalloc whiteout_ui->data = dev kfree(whiteout_ui->data) // Free first time iput(whiteout) ubifs_free_inode kfree(ui->data) // Double free! KASAN reports: ================================================================== BUG: KASAN: double-free or invalid-free in ubifs_free_inode+0x4f/0x70 Call Trace: kfree+0x117/0x490 ubifs_free_inode+0x4f/0x70 [ubifs] i_callback+0x30/0x60 rcu_do_batch+0x366/0xac0 __do_softirq+0x133/0x57f Allocated by task 1506: kmem_cache_alloc_trace+0x3c2/0x7a0 do_rename+0x9b7/0x1150 [ubifs] ubifs_rename+0x106/0x1f0 [ubifs] do_syscall_64+0x35/0x80 Freed by task 1506: kfree+0x117/0x490 do_rename.cold+0x53/0x8a [ubifs] ubifs_rename+0x106/0x1f0 [ubifs] do_syscall_64+0x35/0x80 The buggy address belongs to the object at ffff88810238bed8 which belongs to the cache kmalloc-8 of size 8 ================================================================== Let ubifs_free_inode() free 'whiteout_ui->data'. BTW, delete unused assignment 'whiteout_ui->data_len = 0', process 'ubifs_evict_inode() -> ubifs_jnl_delete_inode() -> ubifs_jnl_write_inode()' doesn't need it (because 'inc_nlink(whiteout)' won't be excuted by 'goto out_release', and the nlink of whiteout inode is 0). Fixes: 9e0a1fff8db56ea ("ubifs: Implement RENAME_WHITEOUT") Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20ntfs: add sanity check on allocation sizeDongliang Mu
[ Upstream commit 714fbf2647b1a33d914edd695d4da92029c7e7c0 ] ntfs_read_inode_mount invokes ntfs_malloc_nofs with zero allocation size. It triggers one BUG in the __ntfs_malloc function. Fix this by adding sanity check on ni->attr_list_size. Link: https://lkml.kernel.org/r/20220120094914.47736-1-dzm91@hust.edu.cn Reported-by: syzbot+3c765c5248797356edaa@syzkaller.appspotmail.com Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com> Acked-by: Anton Altaparmakov <anton@tuxera.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-20ext4: don't BUG if someone dirty pages without asking ext4 firstTheodore Ts'o
[ Upstream commit cc5095747edfb054ca2068d01af20be3fcc3634f ] [un]pin_user_pages_remote is dirtying pages without properly warning the file system in advance. A related race was noted by Jan Kara in 2018[1]; however, more recently instead of it being a very hard-to-hit race, it could be reliably triggered by process_vm_writev(2) which was discovered by Syzbot[2]. This is technically a bug in mm/gup.c, but arguably ext4 is fragile in that if some other kernel subsystem dirty pages without properly notifying the file system using page_mkwrite(), ext4 will BUG, while other file systems will not BUG (although data will still be lost). So instead of crashing with a BUG, issue a warning (since there may be potential data loss) and just mark the page as clean to avoid unprivileged denial of service attacks until the problem can be properly fixed. More discussion and background can be found in the thread starting at [2]. [1] https://lore.kernel.org/linux-mm/20180103100430.GE4911@quack2.suse.cz [2] https://lore.kernel.org/r/Yg0m6IjcNmfaSokM@google.com Reported-by: syzbot+d59332e2db681cf18f0318a06e994ebbb529a8db@syzkaller.appspotmail.com Reported-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Link: https://lore.kernel.org/r/YiDS9wVfq4mM2jGK@mit.edu Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-20NFSv4/pNFS: Fix another issue with a list iterator pointing to the headTrond Myklebust
[ Upstream commit 7c9d845f0612e5bcd23456a2ec43be8ac43458f1 ] In nfs4_callback_devicenotify(), if we don't find a matching entry for the deviceid, we're left with a pointer to 'struct nfs_server' that actually points to the list of super blocks associated with our struct nfs_client. Furthermore, even if we have a valid pointer, nothing pins the super block, and so the struct nfs_server could end up getting freed while we're using it. Since all we want is a pointer to the struct pnfs_layoutdriver_type, let's skip all the iteration over super blocks, and just use APIs to find the layout driver directly. Reported-by: Xiaomeng Tong <xiam0nd.tong@gmail.com> Fixes: 1be5683b03a7 ("pnfs: CB_NOTIFY_DEVICEID") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-20jfs: fix divide error in dbNextAGPavel Skripkin
[ Upstream commit 2cc7cc01c15f57d056318c33705647f87dcd4aab ] Syzbot reported divide error in dbNextAG(). The problem was in missing validation check for malicious image. Syzbot crafted an image with bmp->db_numag equal to 0. There wasn't any validation checks, but dbNextAG() blindly use bmp->db_numag in divide expression Fix it by validating bmp->db_numag in dbMount() and return an error if image is malicious Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-and-tested-by: syzbot+46f5c25af73eb8330eb6@syzkaller.appspotmail.com Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-20NFS: remove unneeded check in decode_devicenotify_args()Alexey Khoroshilov
[ Upstream commit cb8fac6d2727f79f211e745b16c9abbf4d8be652 ] [You don't often get email from khoroshilov@ispras.ru. Learn why this is important at http://aka.ms/LearnAboutSenderIdentification.] Overflow check in not needed anymore after we switch to kmalloc_array(). Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Fixes: a4f743a6bb20 ("NFSv4.1: Convert open-coded array allocation calls to kmalloc_array()") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-20ext2: correct max file size computingZhang Yi
[ Upstream commit 50b3a818991074177a56c87124c7a7bdf5fa4f67 ] We need to calculate the max file size accurately if the total blocks that can address by block tree exceed the upper_limit. But this check is not correct now, it only compute the total data blocks but missing metadata blocks are needed. So in the case of "data blocks < upper_limit && total blocks > upper_limit", we will get wrong result. Fortunately, this case could not happen in reality, but it's confused and better to correct the computing. bits data blocks metadatablocks upper_limit 10 16843020 66051 2147483647 11 134480396 263171 1073741823 12 1074791436 1050627 536870911 (*) 13 8594130956 4198403 268435455 (*) 14 68736258060 16785411 134217727 (*) 15 549822930956 67125251 67108863 (*) 16 4398314962956 268468227 33554431 (*) [*] Need to calculate in depth. Fixes: 1c2d14212b15 ("ext2: Fix underflow in ext2_max_size()") Link: https://lore.kernel.org/r/20220212050532.179055-1-yi.zhang@huawei.com Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-20jffs2: fix memory leak in jffs2_scan_mediumBaokun Li
commit 9cdd3128874f5fe759e2c4e1360ab7fb96a8d1df upstream. If an error is returned in jffs2_scan_eraseblock() and some memory has been added to the jffs2_summary *s, we can observe the following kmemleak report: -------------------------------------------- unreferenced object 0xffff88812b889c40 (size 64): comm "mount", pid 692, jiffies 4294838325 (age 34.288s) hex dump (first 32 bytes): 40 48 b5 14 81 88 ff ff 01 e0 31 00 00 00 50 00 @H........1...P. 00 00 01 00 00 00 01 00 00 00 02 00 00 00 09 08 ................ backtrace: [<ffffffffae93a3a3>] __kmalloc+0x613/0x910 [<ffffffffaf423b9c>] jffs2_sum_add_dirent_mem+0x5c/0xa0 [<ffffffffb0f3afa8>] jffs2_scan_medium.cold+0x36e5/0x4794 [<ffffffffb0f3dbe1>] jffs2_do_mount_fs.cold+0xa7/0x2267 [<ffffffffaf40acf3>] jffs2_do_fill_super+0x383/0xc30 [<ffffffffaf40c00a>] jffs2_fill_super+0x2ea/0x4c0 [<ffffffffb0315d64>] mtd_get_sb+0x254/0x400 [<ffffffffb0315f5f>] mtd_get_sb_by_nr+0x4f/0xd0 [<ffffffffb0316478>] get_tree_mtd+0x498/0x840 [<ffffffffaf40bd15>] jffs2_get_tree+0x25/0x30 [<ffffffffae9f358d>] vfs_get_tree+0x8d/0x2e0 [<ffffffffaea7a98f>] path_mount+0x50f/0x1e50 [<ffffffffaea7c3d7>] do_mount+0x107/0x130 [<ffffffffaea7c5c5>] __se_sys_mount+0x1c5/0x2f0 [<ffffffffaea7c917>] __x64_sys_mount+0xc7/0x160 [<ffffffffb10142f5>] do_syscall_64+0x45/0x70 unreferenced object 0xffff888114b54840 (size 32): comm "mount", pid 692, jiffies 4294838325 (age 34.288s) hex dump (first 32 bytes): c0 75 b5 14 81 88 ff ff 02 e0 02 00 00 00 02 00 .u.............. 00 00 84 00 00 00 44 00 00 00 6b 6b 6b 6b 6b a5 ......D...kkkkk. backtrace: [<ffffffffae93be24>] kmem_cache_alloc_trace+0x584/0x880 [<ffffffffaf423b04>] jffs2_sum_add_inode_mem+0x54/0x90 [<ffffffffb0f3bd44>] jffs2_scan_medium.cold+0x4481/0x4794 [...] unreferenced object 0xffff888114b57280 (size 32): comm "mount", pid 692, jiffies 4294838393 (age 34.357s) hex dump (first 32 bytes): 10 d5 6c 11 81 88 ff ff 08 e0 05 00 00 00 01 00 ..l............. 00 00 38 02 00 00 28 00 00 00 6b 6b 6b 6b 6b a5 ..8...(...kkkkk. backtrace: [<ffffffffae93be24>] kmem_cache_alloc_trace+0x584/0x880 [<ffffffffaf423c34>] jffs2_sum_add_xattr_mem+0x54/0x90 [<ffffffffb0f3a24f>] jffs2_scan_medium.cold+0x298c/0x4794 [...] unreferenced object 0xffff8881116cd510 (size 16): comm "mount", pid 692, jiffies 4294838395 (age 34.355s) hex dump (first 16 bytes): 00 00 00 00 00 00 00 00 09 e0 60 02 00 00 6b a5 ..........`...k. backtrace: [<ffffffffae93be24>] kmem_cache_alloc_trace+0x584/0x880 [<ffffffffaf423cc4>] jffs2_sum_add_xref_mem+0x54/0x90 [<ffffffffb0f3b2e3>] jffs2_scan_medium.cold+0x3a20/0x4794 [...] -------------------------------------------- Therefore, we should call jffs2_sum_reset_collected(s) on exit to release the memory added in s. In addition, a new tag "out_buf" is added to prevent the NULL pointer reference caused by s being NULL. (thanks to Zhang Yi for this analysis) Fixes: e631ddba5887 ("[JFFS2] Add erase block summary support (mount time improvement)") Cc: stable@vger.kernel.org Co-developed-with: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Baokun Li <libaokun1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20jffs2: fix memory leak in jffs2_do_mount_fsBaokun Li
commit d051cef784de4d54835f6b6836d98a8f6935772c upstream. If jffs2_build_filesystem() in jffs2_do_mount_fs() returns an error, we can observe the following kmemleak report: -------------------------------------------- unreferenced object 0xffff88811b25a640 (size 64): comm "mount", pid 691, jiffies 4294957728 (age 71.952s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffffa493be24>] kmem_cache_alloc_trace+0x584/0x880 [<ffffffffa5423a06>] jffs2_sum_init+0x86/0x130 [<ffffffffa5400e58>] jffs2_do_mount_fs+0x798/0xac0 [<ffffffffa540acf3>] jffs2_do_fill_super+0x383/0xc30 [<ffffffffa540c00a>] jffs2_fill_super+0x2ea/0x4c0 [...] unreferenced object 0xffff88812c760000 (size 65536): comm "mount", pid 691, jiffies 4294957728 (age 71.952s) hex dump (first 32 bytes): bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................ bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................ backtrace: [<ffffffffa493a449>] __kmalloc+0x6b9/0x910 [<ffffffffa5423a57>] jffs2_sum_init+0xd7/0x130 [<ffffffffa5400e58>] jffs2_do_mount_fs+0x798/0xac0 [<ffffffffa540acf3>] jffs2_do_fill_super+0x383/0xc30 [<ffffffffa540c00a>] jffs2_fill_super+0x2ea/0x4c0 [...] -------------------------------------------- This is because the resources allocated in jffs2_sum_init() are not released. Call jffs2_sum_exit() to release these resources to solve the problem. Fixes: e631ddba5887 ("[JFFS2] Add erase block summary support (mount time improvement)") Cc: stable@vger.kernel.org Signed-off-by: Baokun Li <libaokun1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20jffs2: fix use-after-free in jffs2_clear_xattr_subsystemBaokun Li
commit 4c7c44ee1650677fbe89d86edbad9497b7679b5c upstream. When we mount a jffs2 image, assume that the first few blocks of the image are normal and contain at least one xattr-related inode, but the next block is abnormal. As a result, an error is returned in jffs2_scan_eraseblock(). jffs2_clear_xattr_subsystem() is then called in jffs2_build_filesystem() and then again in jffs2_do_fill_super(). Finally we can observe the following report: ================================================================== BUG: KASAN: use-after-free in jffs2_clear_xattr_subsystem+0x95/0x6ac Read of size 8 at addr ffff8881243384e0 by task mount/719 Call Trace: dump_stack+0x115/0x16b jffs2_clear_xattr_subsystem+0x95/0x6ac jffs2_do_fill_super+0x84f/0xc30 jffs2_fill_super+0x2ea/0x4c0 mtd_get_sb+0x254/0x400 mtd_get_sb_by_nr+0x4f/0xd0 get_tree_mtd+0x498/0x840 jffs2_get_tree+0x25/0x30 vfs_get_tree+0x8d/0x2e0 path_mount+0x50f/0x1e50 do_mount+0x107/0x130 __se_sys_mount+0x1c5/0x2f0 __x64_sys_mount+0xc7/0x160 do_syscall_64+0x45/0x70 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Allocated by task 719: kasan_save_stack+0x23/0x60 __kasan_kmalloc.constprop.0+0x10b/0x120 kasan_slab_alloc+0x12/0x20 kmem_cache_alloc+0x1c0/0x870 jffs2_alloc_xattr_ref+0x2f/0xa0 jffs2_scan_medium.cold+0x3713/0x4794 jffs2_do_mount_fs.cold+0xa7/0x2253 jffs2_do_fill_super+0x383/0xc30 jffs2_fill_super+0x2ea/0x4c0 [...] Freed by task 719: kmem_cache_free+0xcc/0x7b0 jffs2_free_xattr_ref+0x78/0x98 jffs2_clear_xattr_subsystem+0xa1/0x6ac jffs2_do_mount_fs.cold+0x5e6/0x2253 jffs2_do_fill_super+0x383/0xc30 jffs2_fill_super+0x2ea/0x4c0 [...] The buggy address belongs to the object at ffff8881243384b8 which belongs to the cache jffs2_xattr_ref of size 48 The buggy address is located 40 bytes inside of 48-byte region [ffff8881243384b8, ffff8881243384e8) [...] ================================================================== The triggering of the BUG is shown in the following stack: ----------------------------------------------------------- jffs2_fill_super jffs2_do_fill_super jffs2_do_mount_fs jffs2_build_filesystem jffs2_scan_medium jffs2_scan_eraseblock <--- ERROR jffs2_clear_xattr_subsystem <--- free jffs2_clear_xattr_subsystem <--- free again ----------------------------------------------------------- An error is returned in jffs2_do_mount_fs(). If the error is returned by jffs2_sum_init(), the jffs2_clear_xattr_subsystem() does not need to be executed. If the error is returned by jffs2_build_filesystem(), the jffs2_clear_xattr_subsystem() also does not need to be executed again. So move jffs2_clear_xattr_subsystem() from 'out_inohash' to 'out_root' to fix this UAF problem. Fixes: aa98d7cf59b5 ("[JFFS2][XATTR] XATTR support on JFFS2 (version. 5)") Cc: stable@vger.kernel.org Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Baokun Li <libaokun1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20NFSD: prevent underflow in nfssvc_decode_writeargs()Dan Carpenter
commit 184416d4b98509fb4c3d8fc3d6dc1437896cc159 upstream. Smatch complains: fs/nfsd/nfsxdr.c:341 nfssvc_decode_writeargs() warn: no lower bound on 'args->len' Change the type to unsigned to prevent this issue. Cc: stable@vger.kernel.org Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20fuse: fix pipe buffer lifetime for direct_ioMiklos Szeredi
commit 0c4bcfdecb1ac0967619ee7ff44871d93c08c909 upstream. In FOPEN_DIRECT_IO mode, fuse_file_write_iter() calls fuse_direct_write_iter(), which normally calls fuse_direct_io(), which then imports the write buffer with fuse_get_user_pages(), which uses iov_iter_get_pages() to grab references to userspace pages instead of actually copying memory. On the filesystem device side, these pages can then either be read to userspace (via fuse_dev_read()), or splice()d over into a pipe using fuse_dev_splice_read() as pipe buffers with &nosteal_pipe_buf_ops. This is wrong because after fuse_dev_do_read() unlocks the FUSE request, the userspace filesystem can mark the request as completed, causing write() to return. At that point, the userspace filesystem should no longer have access to the pipe buffer. Fix by copying pages coming from the user address space to new pipe buffers. Reported-by: Jann Horn <jannh@google.com> Fixes: c3021629a0d8 ("fuse: support splice() reading from fuse device") Cc: <stable@vger.kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-23fs: sysfs_emit: Remove PAGE_SIZE alignment checkLucas Wei
For kernel releases older than 4.20, using the SLUB alloctor will cause this alignment check to fail as that allocator did NOT align kmalloc allocations on a PAGE_SIZE boundry. Remove the check for these older kernels as it is a false-positive and causes problems on many devices. Signed-off-by: Lucas Wei <lucaswei@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-16btrfs: unlock newly allocated extent buffer after errorQu Wenruo
commit 19ea40dddf1833db868533958ca066f368862211 upstream. [BUG] There is a bug report that injected ENOMEM error could leave a tree block locked while we return to user-space: BTRFS info (device loop0): enabling ssd optimizations FAULT_INJECTION: forcing a failure. name failslab, interval 1, probability 0, space 0, times 0 CPU: 0 PID: 7579 Comm: syz-executor Not tainted 5.15.0-rc1 #16 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 Call Trace: __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x8d/0xcf lib/dump_stack.c:106 fail_dump lib/fault-inject.c:52 [inline] should_fail+0x13c/0x160 lib/fault-inject.c:146 should_failslab+0x5/0x10 mm/slab_common.c:1328 slab_pre_alloc_hook.constprop.99+0x4e/0xc0 mm/slab.h:494 slab_alloc_node mm/slub.c:3120 [inline] slab_alloc mm/slub.c:3214 [inline] kmem_cache_alloc+0x44/0x280 mm/slub.c:3219 btrfs_alloc_delayed_extent_op fs/btrfs/delayed-ref.h:299 [inline] btrfs_alloc_tree_block+0x38c/0x670 fs/btrfs/extent-tree.c:4833 __btrfs_cow_block+0x16f/0x7d0 fs/btrfs/ctree.c:415 btrfs_cow_block+0x12a/0x300 fs/btrfs/ctree.c:570 btrfs_search_slot+0x6b0/0xee0 fs/btrfs/ctree.c:1768 btrfs_insert_empty_items+0x80/0xf0 fs/btrfs/ctree.c:3905 btrfs_new_inode+0x311/0xa60 fs/btrfs/inode.c:6530 btrfs_create+0x12b/0x270 fs/btrfs/inode.c:6783 lookup_open+0x660/0x780 fs/namei.c:3282 open_last_lookups fs/namei.c:3352 [inline] path_openat+0x465/0xe20 fs/namei.c:3557 do_filp_open+0xe3/0x170 fs/namei.c:3588 do_sys_openat2+0x357/0x4a0 fs/open.c:1200 do_sys_open+0x87/0xd0 fs/open.c:1216 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x34/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x46ae99 Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f46711b9c48 EFLAGS: 00000246 ORIG_RAX: 0000000000000055 RAX: ffffffffffffffda RBX: 000000000078c0a0 RCX: 000000000046ae99 RDX: 0000000000000000 RSI: 00000000000000a1 RDI: 0000000020005800 RBP: 00007f46711b9c80 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000017 R13: 0000000000000000 R14: 000000000078c0a0 R15: 00007ffc129da6e0 ================================================ WARNING: lock held when returning to user space! 5.15.0-rc1 #16 Not tainted ------------------------------------------------ syz-executor/7579 is leaving the kernel with locks still held! 1 lock held by syz-executor/7579: #0: ffff888104b73da8 (btrfs-tree-01/1){+.+.}-{3:3}, at: __btrfs_tree_lock+0x2e/0x1a0 fs/btrfs/locking.c:112 [CAUSE] In btrfs_alloc_tree_block(), after btrfs_init_new_buffer(), the new extent buffer @buf is locked, but if later operations like adding delayed tree ref fail, we just free @buf without unlocking it, resulting above warning. [FIX] Unlock @buf in out_free_buf: label. Reported-by: Hao Sun <sunhao.th@gmail.com> Link: https://lore.kernel.org/linux-btrfs/CACkBjsZ9O6Zr0KK1yGn=1rQi6Crh1yeCRdTSBxx9R99L4xdn-Q@mail.gmail.com/ CC: stable@vger.kernel.org # 5.4+ Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Denis Efremov <denis.e.efremov@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-16ext4: add check to prevent attempting to resize an fs with sparse_super2Josh Triplett
commit b1489186cc8391e0c1e342f9fbc3eedf6b944c61 upstream. The in-kernel ext4 resize code doesn't support filesystem with the sparse_super2 feature. It fails with errors like this and doesn't finish the resize: EXT4-fs (loop0): resizing filesystem from 16640 to 7864320 blocks EXT4-fs warning (device loop0): verify_reserved_gdb:760: reserved GDT 2 missing grp 1 (32770) EXT4-fs warning (device loop0): ext4_resize_fs:2111: error (-22) occurred during file system resize EXT4-fs (loop0): resized filesystem to 2097152 To reproduce: mkfs.ext4 -b 4096 -I 256 -J size=32 -E resize=$((256*1024*1024)) -O sparse_super2 ext4.img 65M truncate -s 30G ext4.img mount ext4.img /mnt python3 -c 'import fcntl, os, struct ; fd = os.open("/mnt", os.O_RDONLY | os.O_DIRECTORY) ; fcntl.ioctl(fd, 0x40086610, struct.pack("Q", 30 * 1024 * 1024 * 1024 // 4096), False) ; os.close(fd)' dmesg | tail e2fsck ext4.img The userspace resize2fs tool has a check for this case: it checks if the filesystem has sparse_super2 set and if the kernel provides /sys/fs/ext4/features/sparse_super2. However, the former check requires manually reading and parsing the filesystem superblock. Detect this case in ext4_resize_begin and error out early with a clear error message. Signed-off-by: Josh Triplett <josh@joshtriplett.org> Link: https://lore.kernel.org/r/74b8ae78405270211943cd7393e65586c5faeed1.1623093259.git.josh@joshtriplett.org Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-08cifs: fix double free race when mount fails in cifs_get_root()Ronnie Sahlberg
[ Upstream commit 3d6cc9898efdfb062efb74dc18cfc700e082f5d5 ] When cifs_get_root() fails during cifs_smb3_do_mount() we call deactivate_locked_super() which eventually will call delayed_free() which will free the context. In this situation we should not proceed to enter the out: section in cifs_smb3_do_mount() and free the same resources a second time. [Thu Feb 10 12:59:06 2022] BUG: KASAN: use-after-free in rcu_cblist_dequeue+0x32/0x60 [Thu Feb 10 12:59:06 2022] Read of size 8 at addr ffff888364f4d110 by task swapper/1/0 [Thu Feb 10 12:59:06 2022] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G OE 5.17.0-rc3+ #4 [Thu Feb 10 12:59:06 2022] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.0 12/17/2019 [Thu Feb 10 12:59:06 2022] Call Trace: [Thu Feb 10 12:59:06 2022] <IRQ> [Thu Feb 10 12:59:06 2022] dump_stack_lvl+0x5d/0x78 [Thu Feb 10 12:59:06 2022] print_address_description.constprop.0+0x24/0x150 [Thu Feb 10 12:59:06 2022] ? rcu_cblist_dequeue+0x32/0x60 [Thu Feb 10 12:59:06 2022] kasan_report.cold+0x7d/0x117 [Thu Feb 10 12:59:06 2022] ? rcu_cblist_dequeue+0x32/0x60 [Thu Feb 10 12:59:06 2022] __asan_load8+0x86/0xa0 [Thu Feb 10 12:59:06 2022] rcu_cblist_dequeue+0x32/0x60 [Thu Feb 10 12:59:06 2022] rcu_core+0x547/0xca0 [Thu Feb 10 12:59:06 2022] ? call_rcu+0x3c0/0x3c0 [Thu Feb 10 12:59:06 2022] ? __this_cpu_preempt_check+0x13/0x20 [Thu Feb 10 12:59:06 2022] ? lock_is_held_type+0xea/0x140 [Thu Feb 10 12:59:06 2022] rcu_core_si+0xe/0x10 [Thu Feb 10 12:59:06 2022] __do_softirq+0x1d4/0x67b [Thu Feb 10 12:59:06 2022] __irq_exit_rcu+0x100/0x150 [Thu Feb 10 12:59:06 2022] irq_exit_rcu+0xe/0x30 [Thu Feb 10 12:59:06 2022] sysvec_hyperv_stimer0+0x9d/0xc0 ... [Thu Feb 10 12:59:07 2022] Freed by task 58179: [Thu Feb 10 12:59:07 2022] kasan_save_stack+0x26/0x50 [Thu Feb 10 12:59:07 2022] kasan_set_track+0x25/0x30 [Thu Feb 10 12:59:07 2022] kasan_set_free_info+0x24/0x40 [Thu Feb 10 12:59:07 2022] ____kasan_slab_free+0x137/0x170 [Thu Feb 10 12:59:07 2022] __kasan_slab_free+0x12/0x20 [Thu Feb 10 12:59:07 2022] slab_free_freelist_hook+0xb3/0x1d0 [Thu Feb 10 12:59:07 2022] kfree+0xcd/0x520 [Thu Feb 10 12:59:07 2022] cifs_smb3_do_mount+0x149/0xbe0 [cifs] [Thu Feb 10 12:59:07 2022] smb3_get_tree+0x1a0/0x2e0 [cifs] [Thu Feb 10 12:59:07 2022] vfs_get_tree+0x52/0x140 [Thu Feb 10 12:59:07 2022] path_mount+0x635/0x10c0 [Thu Feb 10 12:59:07 2022] __x64_sys_mount+0x1bf/0x210 [Thu Feb 10 12:59:07 2022] do_syscall_64+0x5c/0xc0 [Thu Feb 10 12:59:07 2022] entry_SYSCALL_64_after_hwframe+0x44/0xae [Thu Feb 10 12:59:07 2022] Last potentially related work creation: [Thu Feb 10 12:59:07 2022] kasan_save_stack+0x26/0x50 [Thu Feb 10 12:59:07 2022] __kasan_record_aux_stack+0xb6/0xc0 [Thu Feb 10 12:59:07 2022] kasan_record_aux_stack_noalloc+0xb/0x10 [Thu Feb 10 12:59:07 2022] call_rcu+0x76/0x3c0 [Thu Feb 10 12:59:07 2022] cifs_umount+0xce/0xe0 [cifs] [Thu Feb 10 12:59:07 2022] cifs_kill_sb+0xc8/0xe0 [cifs] [Thu Feb 10 12:59:07 2022] deactivate_locked_super+0x5d/0xd0 [Thu Feb 10 12:59:07 2022] cifs_smb3_do_mount+0xab9/0xbe0 [cifs] [Thu Feb 10 12:59:07 2022] smb3_get_tree+0x1a0/0x2e0 [cifs] [Thu Feb 10 12:59:07 2022] vfs_get_tree+0x52/0x140 [Thu Feb 10 12:59:07 2022] path_mount+0x635/0x10c0 [Thu Feb 10 12:59:07 2022] __x64_sys_mount+0x1bf/0x210 [Thu Feb 10 12:59:07 2022] do_syscall_64+0x5c/0xc0 [Thu Feb 10 12:59:07 2022] entry_SYSCALL_64_after_hwframe+0x44/0xae Reported-by: Shyam Prasad N <sprasad@microsoft.com> Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-02fget: clarify and improve __fget_files() implementationLinus Torvalds
commit e386dfc56f837da66d00a078e5314bc8382fab83 upstream. Commit 054aa8d439b9 ("fget: check that the fd still exists after getting a ref to it") fixed a race with getting a reference to a file just as it was being closed. It was a fairly minimal patch, and I didn't think re-checking the file pointer lookup would be a measurable overhead, since it was all right there and cached. But I was wrong, as pointed out by the kernel test robot. The 'poll2' case of the will-it-scale.per_thread_ops benchmark regressed quite noticeably. Admittedly it seems to be a very artificial test: doing "poll()" system calls on regular files in a very tight loop in multiple threads. That means that basically all the time is spent just looking up file descriptors without ever doing anything useful with them (not that doing 'poll()' on a regular file is useful to begin with). And as a result it shows the extra "re-check fd" cost as a sore thumb. Happily, the regression is fixable by just writing the code to loook up the fd to be better and clearer. There's still a cost to verify the file pointer, but now it's basically in the noise even for that benchmark that does nothing else - and the code is more understandable and has better comments too. [ Side note: this patch is also a classic case of one that looks very messy with the default greedy Myers diff - it's much more legible with either the patience of histogram diff algorithm ] Link: https://lore.kernel.org/lkml/20211210053743.GA36420@xsang-OptiPlex-9020/ Link: https://lore.kernel.org/lkml/20211213083154.GA20853@linux.intel.com/ Reported-by: kernel test robot <oliver.sang@intel.com> Tested-by: Carel Si <beibei.si@intel.com> Cc: Jann Horn <jannh@google.com> Cc: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Baokun Li <libaokun1@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-02tracefs: Set the group ownership in apply_options() not parse_options()Steven Rostedt (Google)
commit 851e99ebeec3f4a672bb5010cf1ece095acee447 upstream. Al Viro brought it to my attention that the dentries may not be filled when the parse_options() is called, causing the call to set_gid() to possibly crash. It should only be called if parse_options() succeeds totally anyway. He suggested the logical place to do the update is in apply_options(). Link: https://lore.kernel.org/all/20220225165219.737025658@goodmis.org/ Link: https://lkml.kernel.org/r/20220225153426.1c4cab6b@gandalf.local.home Cc: stable@vger.kernel.org Acked-by: Al Viro <viro@zeniv.linux.org.uk> Reported-by: Al Viro <viro@zeniv.linux.org.uk> Fixes: 48b27b6b5191 ("tracefs: Set all files to the same group ownership as the mount option") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-02configfs: fix a race in configfs_{,un}register_subsystem()ChenXiaoSong
[ Upstream commit 84ec758fb2daa236026506868c8796b0500c047d ] When configfs_register_subsystem() or configfs_unregister_subsystem() is executing link_group() or unlink_group(), it is possible that two processes add or delete list concurrently. Some unfortunate interleavings of them can cause kernel panic. One of cases is: A --> B --> C --> D A <-- B <-- C <-- D delete list_head *B | delete list_head *C --------------------------------|----------------------------------- configfs_unregister_subsystem | configfs_unregister_subsystem unlink_group | unlink_group unlink_obj | unlink_obj list_del_init | list_del_init __list_del_entry | __list_del_entry __list_del | __list_del // next == C | next->prev = prev | | next->prev = prev prev->next = next | | // prev == B | prev->next = next Fix this by adding mutex when calling link_group() or unlink_group(), but parent configfs_subsystem is NULL when config_item is root. So I create a mutex configfs_subsystem_mutex. Fixes: 7063fbf22611 ("[PATCH] configfs: User-driven configuration filesystem") Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com> Signed-off-by: Laibin Qiu <qiulaibin@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-02-23NFS: Do not report writeback errors in nfs_getattr()Trond Myklebust
[ Upstream commit d19e0183a88306acda07f4a01fedeeffe2a2a06b ] The result of the writeback, whether it is an ENOSPC or an EIO, or anything else, does not inhibit the NFS client from reporting the correct file timestamps. Fixes: 79566ef018f5 ("NFS: Getattr doesn't require data sync semantics") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-02-23NFS: LOOKUP_DIRECTORY is also ok with symlinksTrond Myklebust
commit e0caaf75d443e02e55e146fd75fe2efc8aed5540 upstream. Commit ac795161c936 (NFSv4: Handle case where the lookup of a directory fails) [1], part of Linux since 5.17-rc2, introduced a regression, where a symbolic link on an NFS mount to a directory on another NFS does not resolve(?) the first time it is accessed: Reported-by: Paul Menzel <pmenzel@molgen.mpg.de> Fixes: ac795161c936 ("NFSv4: Handle case where the lookup of a directory fails") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Tested-by: Donald Buczek <buczek@molgen.mpg.de> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-23quota: make dquot_quota_sync return errors from ->sync_fsDarrick J. Wong
[ Upstream commit dd5532a4994bfda0386eb2286ec00758cee08444 ] Strangely, dquot_quota_sync ignores the return code from the ->sync_fs call, which means that quotacalls like Q_SYNC never see the error. This doesn't seem right, so fix that. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-02-23vfs: make freeze_super abort when sync_filesystem returns errorDarrick J. Wong
[ Upstream commit 2719c7160dcfaae1f73a1c0c210ad3281c19022e ] If we fail to synchronize the filesystem while preparing to freeze the fs, abort the freeze. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-02-23btrfs: send: in case of IO error log itDāvis Mosāns
commit 2e7be9db125a0bf940c5d65eb5c40d8700f738b5 upstream. Currently if we get IO error while doing send then we abort without logging information about which file caused issue. So log it to help with debugging. CC: stable@vger.kernel.org # 4.9+ Signed-off-by: Dāvis Mosāns <davispuh@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-16NFSv4 expose nfs_parse_server_name functionOlga Kornievskaia
[ Upstream commit f5b27cc6761e27ee6387a24df1a99ca77b360fea ] Make nfs_parse_server_name available outside of nfs4namespace.c. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-02-16NFSv4 remove zero number of fs_locations entries error checkOlga Kornievskaia
[ Upstream commit 90e12a3191040bd3854d3e236c35921e4e92a044 ] Remove the check for the zero length fs_locations reply in the xdr decoding, and instead check for that in the migration code. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-02-16NFSv4.1: Fix uninitialised variable in devicenotifyTrond Myklebust
[ Upstream commit b05bf5c63b326ce1da84ef42498d8e0e292e694c ] When decode_devicenotify_args() exits with no entries, we need to ensure that the struct cb_devicenotifyargs is initialised to { 0, NULL } in order to avoid problems in nfs4_callback_devicenotify(). Reported-by: <rtm@csail.mit.edu> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-02-16nfs: nfs4clinet: check the return value of kstrdup()Xiaoke Wang
[ Upstream commit fbd2057e5329d3502a27491190237b6be52a1cb6 ] kstrdup() returns NULL when some internal memory errors happen, it is better to check the return value of it so to catch the memory error in time. Signed-off-by: Xiaoke Wang <xkernel.wang@foxmail.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <sashal@kernel.org>