summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2020-09-21iomap: Change calling convention for zeroingiomap-5.10-merge-2Matthew Wilcox (Oracle)
Pass the full length to iomap_zero() and dax_iomap_zero(), and have them return how many bytes they actually handled. This is preparatory work for handling THP, although it looks like DAX could actually take advantage of it if there's a larger contiguous area. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-09-21iomap: Convert iomap_write_end typesMatthew Wilcox (Oracle)
iomap_write_end cannot return an error, so switch it to return size_t instead of int and remove the error checking from the callers. Also convert the arguments to size_t from unsigned int, in case anyone ever wants to support a page size larger than 2GB. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-09-21iomap: Convert write_count to write_bytes_pendingMatthew Wilcox (Oracle)
Instead of counting bio segments, count the number of bytes submitted. This insulates us from the block layer's definition of what a 'same page' is, which is not necessarily clear once THPs are involved. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-09-21iomap: Convert read_count to read_bytes_pendingMatthew Wilcox (Oracle)
Instead of counting bio segments, count the number of bytes submitted. This insulates us from the block layer's definition of what a 'same page' is, which is not necessarily clear once THPs are involved. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-09-21iomap: Support arbitrarily many blocks per pageMatthew Wilcox (Oracle)
Size the uptodate array dynamically to support larger pages in the page cache. With a 64kB page, we're only saving 8 bytes per page today, but with a 2MB maximum page size, we'd have to allocate more than 4kB per page. Add a few debugging assertions. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-09-21iomap: Use bitmap ops to set uptodate bitsMatthew Wilcox (Oracle)
Now that the bitmap is protected by a spinlock, we can use the more efficient bitmap ops instead of individual test/set bit ops. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-09-21iomap: Use kzalloc to allocate iomap_pageMatthew Wilcox (Oracle)
We can skip most of the initialisation, although spinlocks still need explicit initialisation as architectures may use a non-zero value to indicate unlocked. The comment is no longer useful as attach_page_private() handles the refcount now. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-09-21fs: Introduce i_blocks_per_pageMatthew Wilcox (Oracle)
This helper is useful for both THPs and for supporting block size larger than page size. Convert all users that I could find (we have a few different ways of writing this idiom, and I may have missed some). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2020-09-21iomap: Fix misplaced page flushingMatthew Wilcox (Oracle)
If iomap_unshare_actor() unshares to an inline iomap, the page was not being flushed. block_write_end() and __iomap_write_end() already contain flushes, so adding it to iomap_write_end_inline() seems like the best place. That means we can remove it from iomap_write_actor(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-09-21iomap: Use round_down/round_up macros in __iomap_write_beginNikolay Borisov
Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-09-10iomap: Mark read blocks uptodate in write_beginiomap-5.10-merge-1Matthew Wilcox (Oracle)
When bringing (portions of) a page uptodate, we were marking blocks that were zeroed as being uptodate, but not blocks that were read from storage. Like the previous commit, this problem was found with generic/127 and a kernel which failed readahead I/Os. This bug causes writes to be silently lost when working with flaky storage. Fixes: 9dc55f1389f9 ("iomap: add support for sub-pagesize buffered I/O without buffer heads") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-09-10iomap: Clear page error before beginning a writeMatthew Wilcox (Oracle)
If we find a page in write_begin which is !Uptodate, we need to clear any error on the page before starting to read data into it. This matches how filemap_fault(), do_read_cache_page() and generic_file_buffered_read() handle PageError on !Uptodate pages. When calling iomap_set_range_uptodate() in __iomap_write_begin(), blocks were not being marked as uptodate. This was found with generic/127 and a specially modified kernel which would fail (some) readahead I/Os. The test read some bytes in a prior page which caused readahead to extend into page 0x34. There was a subsequent write to page 0x34, followed by a read to page 0x34. Because the blocks were still marked as !Uptodate, the read caused all blocks to be re-read, overwriting the write. With this change, and the next one, the bytes which were written are marked as being Uptodate, so even though the page is still marked as !Uptodate, the blocks containing the written data are not re-read from storage. Fixes: 9dc55f1389f9 ("iomap: add support for sub-pagesize buffered I/O without buffer heads") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-09-10iomap: Fix direct I/O write consistency checkAndreas Gruenbacher
When a direct I/O write falls back to buffered I/O entirely, dio->size will be 0 in iomap_dio_complete. Function invalidate_inode_pages2_range will try to invalidate the rest of the address space. If there are any dirty pages in that range, the write will fail and a "Page cache invalidation failure on direct I/O" error will be logged. On gfs2, this can be reproduced as follows: xfs_io \ -c "open -ft foo" -c "pwrite 4k 4k" -c "close" \ -c "open -d foo" -c "pwrite 0 4k" Fix this by recognizing 0-length writes. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-09-10iomap: fix WARN_ON_ONCE() from unprivileged usersQian Cai
It is trivial to trigger a WARN_ON_ONCE(1) in iomap_dio_actor() by unprivileged users which would taint the kernel, or worse - panic if panic_on_warn or panic_on_taint is set. Hence, just convert it to pr_warn_ratelimited() to let users know their workloads are racing. Thank Dave Chinner for the initial analysis of the racing reproducers. Signed-off-by: Qian Cai <cai@lca.pw> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-09-06Merge tag 'io_uring-5.9-2020-09-06' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull more io_uring fixes from Jens Axboe: "Two followup fixes. One is fixing a regression from this merge window, the other is two commits fixing cancelation of deferred requests. Both have gone through full testing, and both spawned a few new regression test additions to liburing. - Don't play games with const, properly store the output iovec and assign it as needed. - Deferred request cancelation fix (Pavel)" * tag 'io_uring-5.9-2020-09-06' of git://git.kernel.dk/linux-block: io_uring: fix linked deferred ->files cancellation io_uring: fix cancel of deferred reqs with ->files io_uring: fix explicit async read/write mapping for large segments
2020-09-05io_uring: fix linked deferred ->files cancellationio_uring-5.9-2020-09-06Pavel Begunkov
While looking for ->files in ->defer_list, consider that requests there may actually be links. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-05io_uring: fix cancel of deferred reqs with ->filesPavel Begunkov
While trying to cancel requests with ->files, it also should look for requests in ->defer_list, otherwise it might end up hanging a thread. Cancel all requests in ->defer_list up to the last request there with matching ->files, that's needed to follow drain ordering semantics. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-05Merge tag 'xfs-5.9-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs fix from Darrick Wong: "Fix a broken metadata verifier that would incorrectly validate attr fork extents of a realtime file against the realtime volume" * tag 'xfs-5.9-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: fix xfs_bmap_validate_extent_raw when checking attr fork of rt files
2020-09-05xfs: don't update mtime on COW faultsMikulas Patocka
When running in a dax mode, if the user maps a page with MAP_PRIVATE and PROT_WRITE, the xfs filesystem would incorrectly update ctime and mtime when the user hits a COW fault. This breaks building of the Linux kernel. How to reproduce: 1. extract the Linux kernel tree on dax-mounted xfs filesystem 2. run make clean 3. run make -j12 4. run make -j12 at step 4, make would incorrectly rebuild the whole kernel (although it was already built in step 3). The reason for the breakage is that almost all object files depend on objtool. When we run objtool, it takes COW page fault on its .data section, and these faults will incorrectly update the timestamp of the objtool binary. The updated timestamp causes make to rebuild the whole tree. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-05ext2: don't update mtime on COW faultsMikulas Patocka
When running in a dax mode, if the user maps a page with MAP_PRIVATE and PROT_WRITE, the ext2 filesystem would incorrectly update ctime and mtime when the user hits a COW fault. This breaks building of the Linux kernel. How to reproduce: 1. extract the Linux kernel tree on dax-mounted ext2 filesystem 2. run make clean 3. run make -j12 4. run make -j12 at step 4, make would incorrectly rebuild the whole kernel (although it was already built in step 3). The reason for the breakage is that almost all object files depend on objtool. When we run objtool, it takes COW page fault on its .data section, and these faults will incorrectly update the timestamp of the objtool binary. The updated timestamp causes make to rebuild the whole tree. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-05io_uring: fix explicit async read/write mapping for large segmentsJens Axboe
If we exceed UIO_FASTIOV, we don't handle the transition correctly between an allocated vec for requests that are queued with IOSQE_ASYNC. Store the iovec appropriately and re-set it in the iter iov in case it changed. Fixes: ff6165b2d7f6 ("io_uring: retain iov_iter state over io_read/io_write calls") Reported-by: Nick Hill <nick@nickhill.org> Tested-by: Norman Maurer <norman.maurer@googlemail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-04Merge tag 'io_uring-5.9-2020-09-04' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring fixes from Jens Axboe: - EAGAIN with O_NONBLOCK retry fix - Two small fixes for registered files (Jiufei) * tag 'io_uring-5.9-2020-09-04' of git://git.kernel.dk/linux-block: io_uring: no read/write-retry on -EAGAIN error and O_NONBLOCK marked file io_uring: set table->files[i] to NULL when io_sqe_file_register failed io_uring: fix removing the wrong file in __io_sqe_files_update()
2020-09-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds
Pull networking fixes from David Miller: 1) Use netif_rx_ni() when necessary in batman-adv stack, from Jussi Kivilinna. 2) Fix loss of RTT samples in rxrpc, from David Howells. 3) Memory leak in hns_nic_dev_probe(), from Dignhao Liu. 4) ravb module cannot be unloaded, fix from Yuusuke Ashizuka. 5) We disable BH for too lokng in sctp_get_port_local(), add a cond_resched() here as well, from Xin Long. 6) Fix memory leak in st95hf_in_send_cmd, from Dinghao Liu. 7) Out of bound access in bpf_raw_tp_link_fill_link_info(), from Yonghong Song. 8) Missing of_node_put() in mt7530 DSA driver, from Sumera Priyadarsini. 9) Fix crash in bnxt_fw_reset_task(), from Michael Chan. 10) Fix geneve tunnel checksumming bug in hns3, from Yi Li. 11) Memory leak in rxkad_verify_response, from Dinghao Liu. 12) In tipc, don't use smp_processor_id() in preemptible context. From Tuong Lien. 13) Fix signedness issue in mlx4 memory allocation, from Shung-Hsi Yu. 14) Missing clk_disable_prepare() in gemini driver, from Dan Carpenter. 15) Fix ABI mismatch between driver and firmware in nfp, from Louis Peens. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (110 commits) net/smc: fix sock refcounting in case of termination net/smc: reset sndbuf_desc if freed net/smc: set rx_off for SMCR explicitly net/smc: fix toleration of fake add_link messages tg3: Fix soft lockup when tg3_reset_task() fails. doc: net: dsa: Fix typo in config code sample net: dp83867: Fix WoL SecureOn password nfp: flower: fix ABI mismatch between driver and firmware tipc: fix shutdown() of connectionless socket ipv6: Fix sysctl max for fib_multipath_hash_policy drivers/net/wan/hdlc: Change the default of hard_header_len to 0 net: gemini: Fix another missing clk_disable_unprepare() in probe net: bcmgenet: fix mask check in bcmgenet_validate_flow() amd-xgbe: Add support for new port mode net: usb: dm9601: Add USB ID of Keenetic Plus DSL vhost: fix typo in error message net: ethernet: mlx4: Fix memory allocation in mlx4_buddy_init() pktgen: fix error message with wrong function name net: ethernet: ti: am65-cpsw: fix rmii 100Mbit link mode cxgb4: fix thermal zone device registration ...
2020-09-03Merge tag 'affs-for-5.9-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull affs fix from David Sterba: "One fix to make permissions work the same way as on AmigaOS" * tag 'affs-for-5.9-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: affs: fix basic permission bits to actually work
2020-09-03xfs: fix xfs_bmap_validate_extent_raw when checking attr fork of rt filesxfs-5.9-fixes-2Darrick J. Wong
The realtime flag only applies to the data fork, so don't use the realtime block number checks on the attr fork of a realtime file. Fixes: 30b0984d9117 ("xfs: refactor bmap record validation") Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
2020-09-02Merge tag 'xfs-5.9-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs fixes from Darrick Wong: "Various small corruption fixes that have come in during the past month: - Avoid a log recovery failure for an insert range operation by rolling deferred ops incrementally instead of at the end. - Fix an off-by-one error when calculating log space reservations for anything involving an inode allocation or free. - Fix a broken shortform xattr verifier. - Ensure that the shortform xattr header padding is always initialized to zero" * tag 'xfs-5.9-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: initialize the shortform attr header padding entry xfs: fix boundary test in xfs_attr_shortform_verify xfs: fix off-by-one in inode alloc block reservation calculation xfs: finish dfops on every insert range shift iteration
2020-09-02Merge branch 'work.epoll' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull epoll fixup from Al Viro: "Fixup for epoll regression; there's a better solution longer term, but this is the least intrusive fix" * 'work.epoll' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fix regression in "epoll: Keep a reference on files added to the check list"
2020-09-02io_uring: no read/write-retry on -EAGAIN error and O_NONBLOCK marked fileio_uring-5.9-2020-09-04Jens Axboe
Actually two things that need fixing up here: - The io_rw_reissue() -EAGAIN retry is explicit to block devices and regular files, so don't ever attempt to do that on other types of files. - If we hit -EAGAIN on a nonblock marked file, don't arm poll handler for it. It should just complete with -EAGAIN. Cc: stable@vger.kernel.org Reported-by: Norman Maurer <norman.maurer@googlemail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-02fix regression in "epoll: Keep a reference on files added to the check list"Al Viro
epoll_loop_check_proc() can run into a file already committed to destruction; we can't grab a reference on those and don't need to add them to the set for reverse path check anyway. Tested-by: Marc Zyngier <maz@kernel.org> Fixes: a9ed4a6560b8 ("epoll: Keep a reference on files added to the check list") Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-09-02io_uring: set table->files[i] to NULL when io_sqe_file_register failedJiufei Xue
While io_sqe_file_register() failed in __io_sqe_files_update(), table->files[i] still point to the original file which may freed soon, and that will trigger use-after-free problems. Cc: stable@vger.kernel.org Fixes: f3bd9dae3708 ("io_uring: fix memleak in __io_sqe_files_update()") Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-01Merge tag 'for-5.9-rc3-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "Two small fixes and a bunch of lockdep fixes for warnings that show up with an upcoming tree locking update but are valid with current locks as well" * tag 'for-5.9-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: tree-checker: fix the error message for transid error btrfs: set the lockdep class for log tree extent buffers btrfs: set the correct lockdep class for new nodes btrfs: allocate scrub workqueues outside of locks btrfs: fix potential deadlock in the search ioctl btrfs: drop path before adding new uuid tree entry btrfs: block-group: fix free-space bitmap threshold
2020-09-01io_uring: fix removing the wrong file in __io_sqe_files_update()Jiufei Xue
Index here is already the position of the file in fixed_file_table, we should not use io_file_from_index() again to get it. Otherwise, the wrong file which still in use may be released unexpectedly. Cc: stable@vger.kernel.org # v5.6 Fixes: 05f3fb3c5397 ("io_uring: avoid ring quiesce for fixed file set unregister and update") Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-31affs: fix basic permission bits to actually workMax Staudt
The basic permission bits (protection bits in AmigaOS) have been broken in Linux' AFFS - it would only set bits, but never delete them. Also, contrary to the documentation, the Archived bit was not handled. Let's fix this for good, and set the bits such that Linux and classic AmigaOS can coexist in the most peaceful manner. Also, update the documentation to represent the current state of things. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: stable@vger.kernel.org Signed-off-by: Max Staudt <max@enpas.org> Signed-off-by: David Sterba <dsterba@suse.com>
2020-08-30Merge tag '5.9-rc2-smb-fix' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cfis fix from Steve French: "DFS fix for referral problem when using SMB1" * tag '5.9-rc2-smb-fix' of git://git.samba.org/sfrench/cifs-2.6: cifs: fix check of tcon dfs in smb1
2020-08-29Merge tag 'fallthrough-fixes-5.9-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux Pull fallthrough fixes from Gustavo A. R. Silva: "Fix some minor issues introduced by the recent treewide fallthrough conversions: - Fix identation issue - Fix erroneous fallthrough annotation - Remove unnecessary fallthrough annotation - Fix code comment changed by fallthrough conversion" * tag 'fallthrough-fixes-5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux: arm64/cpuinfo: Remove unnecessary fallthrough annotation media: dib0700: Fix identation issue in dib8096_set_param_override() afs: Remove erroneous fallthough annotation iio: dpot-dac: fix code comment in dpot_dac_read_raw()
2020-08-28Merge tag 'io_uring-5.9-2020-08-28' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring fixes from Jens Axboe: "A few fixes in here, all based on reports and test cases from folks using it. Most of it is stable material as well: - Hashed work cancelation fix (Pavel) - poll wakeup signalfd fix - memlock accounting fix - nonblocking poll retry fix - ensure we never return -ERESTARTSYS for reads - ensure offset == -1 is consistent with preadv2() as documented - IOPOLL -EAGAIN handling fixes - remove useless task_work bounce for block based -EAGAIN retry" * tag 'io_uring-5.9-2020-08-28' of git://git.kernel.dk/linux-block: io_uring: don't bounce block based -EAGAIN retry off task_work io_uring: fix IOPOLL -EAGAIN retries io_uring: clear req->result on IOPOLL re-issue io_uring: make offset == -1 consistent with preadv2/pwritev2 io_uring: ensure read requests go through -ERESTART* transformation io_uring: don't use poll handler if file can't be nonblocking read/written io_uring: fix imbalanced sqo_mm accounting io_uring: revert consumed iov_iter bytes on error io-wq: fix hang after cancelling pending hashed work io_uring: don't recurse on tsk->sighand->siglock with signalfd
2020-08-28Merge tag 'writeback_for_v5.9-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull writeback fixes from Jan Kara: "Fixes for writeback code occasionally skipping writeback of some inodes or livelocking sync(2)" * tag 'writeback_for_v5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: writeback: Drop I_DIRTY_TIME_EXPIRE writeback: Fix sync livelock due to b_dirty_time processing writeback: Avoid skipping inode writeback writeback: Protect inode->i_io_list with inode->i_lock
2020-08-28Merge tag 'gfs2-v5.9-rc2-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 fix from Andreas Gruenbacher: "Fix a memory leak on filesystem withdraw. We didn't detect this bug because we have slab merging on by default (CONFIG_SLAB_MERGE_DEFAULT). Adding 'slub_nomerge' to the kernel command line exposed the problem" * tag 'gfs2-v5.9-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: add some much needed cleanup for log flushes that fail
2020-08-28Merge tag 'ceph-for-5.9-rc3' of git://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph fixes from Ilya Dryomov: "We have an inode number handling change, prompted by s390x which is a 64-bit architecture with a 32-bit ino_t, a patch to disallow leases to avoid potential data integrity issues when CephFS is re-exported via NFS or CIFS and a fix for the bulk of W=1 compilation warnings" * tag 'ceph-for-5.9-rc3' of git://github.com/ceph/ceph-client: ceph: don't allow setlease on cephfs ceph: fix inode number handling on arches with 32-bit ino_t libceph: add __maybe_unused to DEFINE_CEPH_FEATURE
2020-08-28cifs: fix check of tcon dfs in smb1Paulo Alcantara
For SMB1, the DFS flag should be checked against tcon->Flags rather than tcon->share_flags. While at it, add an is_tcon_dfs() helper to check for DFS capability in a more generic way. Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Shyam Prasad N <nspmangalore@gmail.com>
2020-08-27io_uring: don't bounce block based -EAGAIN retry off task_workio_uring-5.9-2020-08-28Jens Axboe
These events happen inline from submission, so there's no need to bounce them through the original task. Just set them up for retry and issue retry directly instead of going over task_work. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-27io_uring: fix IOPOLL -EAGAIN retriesJens Axboe
This normally isn't hit, as polling is mostly done on NVMe with deep queue depths. But if we do run into request starvation, we need to ensure that retries are properly serialized. Reported-by: Andres Freund <andres@anarazel.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-27Merge tag 'rxrpc-fixes-20200820' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== rxrpc, afs: Fix probing issues Here are some fixes for rxrpc and afs to fix issues in the RTT measuring in rxrpc and thence the Volume Location server probing in afs: (1) Move the serial number of a received ACK into a local variable to simplify the next patch. (2) Fix the loss of RTT samples due to extra interposed ACKs causing baseline information to be discarded too early. This is a particular problem for afs when it sends a single very short call to probe a server it hasn't talked to recently. (3) Fix rxrpc_kernel_get_srtt() to indicate whether it actually has seen any valid samples or not. (4) Remove a field that's set/woken, but never read/waited on. (5) Expose the RTT and other probe information through procfs to make debugging of this stuff easier. (6) Fix VL rotation in afs to only use summary information from VL probing and not the probe running state (which gets clobbered when next a probe is issued). (7) Fix VL rotation to actually return the error aggregated from the probe errors. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-27afs: Remove erroneous fallthough annotationDan Carpenter
The fall through annotation comes after a return statement so it's not reachable. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-08-27xfs: initialize the shortform attr header padding entryxfs-5.9-fixes-1Darrick J. Wong
Don't leak kernel memory contents into the shortform attr fork. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-08-27btrfs: tree-checker: fix the error message for transid errorQu Wenruo
The error message for inode transid is the same as for inode generation, which makes us unable to detect the real problem. Reported-by: Tyler Richmond <t.d.richmond@gmail.com> Fixes: 496245cac57e ("btrfs: tree-checker: Verify inode item") CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Marcos Paulo de Souza <mpdesouza@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-08-27btrfs: set the lockdep class for log tree extent buffersJosef Bacik
These are special extent buffers that get rewound in order to lookup the state of the tree at a specific point in time. As such they do not go through the normal initialization paths that set their lockdep class, so handle them appropriately when they are created and before they are locked. CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-08-27btrfs: set the correct lockdep class for new nodesJosef Bacik
When flipping over to the rw_semaphore I noticed I'd get a lockdep splat in replace_path(), which is weird because we're swapping the reloc root with the actual target root. Turns out this is because we're using the root->root_key.objectid as the root id for the newly allocated tree block when setting the lockdep class, however we need to be using the actual owner of this new block, which is saved in owner. The affected path is through btrfs_copy_root as all other callers of btrfs_alloc_tree_block (which calls init_new_buffer) have root_objectid == root->root_key.objectid . CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-08-27btrfs: allocate scrub workqueues outside of locksJosef Bacik
I got the following lockdep splat while testing: ====================================================== WARNING: possible circular locking dependency detected 5.8.0-rc7-00172-g021118712e59 #932 Not tainted ------------------------------------------------------ btrfs/229626 is trying to acquire lock: ffffffff828513f0 (cpu_hotplug_lock){++++}-{0:0}, at: alloc_workqueue+0x378/0x450 but task is already holding lock: ffff889dd3889518 (&fs_info->scrub_lock){+.+.}-{3:3}, at: btrfs_scrub_dev+0x11c/0x630 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #7 (&fs_info->scrub_lock){+.+.}-{3:3}: __mutex_lock+0x9f/0x930 btrfs_scrub_dev+0x11c/0x630 btrfs_dev_replace_by_ioctl.cold.21+0x10a/0x1d4 btrfs_ioctl+0x2799/0x30a0 ksys_ioctl+0x83/0xc0 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x50/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #6 (&fs_devs->device_list_mutex){+.+.}-{3:3}: __mutex_lock+0x9f/0x930 btrfs_run_dev_stats+0x49/0x480 commit_cowonly_roots+0xb5/0x2a0 btrfs_commit_transaction+0x516/0xa60 sync_filesystem+0x6b/0x90 generic_shutdown_super+0x22/0x100 kill_anon_super+0xe/0x30 btrfs_kill_super+0x12/0x20 deactivate_locked_super+0x29/0x60 cleanup_mnt+0xb8/0x140 task_work_run+0x6d/0xb0 __prepare_exit_to_usermode+0x1cc/0x1e0 do_syscall_64+0x5c/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #5 (&fs_info->tree_log_mutex){+.+.}-{3:3}: __mutex_lock+0x9f/0x930 btrfs_commit_transaction+0x4bb/0xa60 sync_filesystem+0x6b/0x90 generic_shutdown_super+0x22/0x100 kill_anon_super+0xe/0x30 btrfs_kill_super+0x12/0x20 deactivate_locked_super+0x29/0x60 cleanup_mnt+0xb8/0x140 task_work_run+0x6d/0xb0 __prepare_exit_to_usermode+0x1cc/0x1e0 do_syscall_64+0x5c/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #4 (&fs_info->reloc_mutex){+.+.}-{3:3}: __mutex_lock+0x9f/0x930 btrfs_record_root_in_trans+0x43/0x70 start_transaction+0xd1/0x5d0 btrfs_dirty_inode+0x42/0xd0 touch_atime+0xa1/0xd0 btrfs_file_mmap+0x3f/0x60 mmap_region+0x3a4/0x640 do_mmap+0x376/0x580 vm_mmap_pgoff+0xd5/0x120 ksys_mmap_pgoff+0x193/0x230 do_syscall_64+0x50/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #3 (&mm->mmap_lock#2){++++}-{3:3}: __might_fault+0x68/0x90 _copy_to_user+0x1e/0x80 perf_read+0x141/0x2c0 vfs_read+0xad/0x1b0 ksys_read+0x5f/0xe0 do_syscall_64+0x50/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #2 (&cpuctx_mutex){+.+.}-{3:3}: __mutex_lock+0x9f/0x930 perf_event_init_cpu+0x88/0x150 perf_event_init+0x1db/0x20b start_kernel+0x3ae/0x53c secondary_startup_64+0xa4/0xb0 -> #1 (pmus_lock){+.+.}-{3:3}: __mutex_lock+0x9f/0x930 perf_event_init_cpu+0x4f/0x150 cpuhp_invoke_callback+0xb1/0x900 _cpu_up.constprop.26+0x9f/0x130 cpu_up+0x7b/0xc0 bringup_nonboot_cpus+0x4f/0x60 smp_init+0x26/0x71 kernel_init_freeable+0x110/0x258 kernel_init+0xa/0x103 ret_from_fork+0x1f/0x30 -> #0 (cpu_hotplug_lock){++++}-{0:0}: __lock_acquire+0x1272/0x2310 lock_acquire+0x9e/0x360 cpus_read_lock+0x39/0xb0 alloc_workqueue+0x378/0x450 __btrfs_alloc_workqueue+0x15d/0x200 btrfs_alloc_workqueue+0x51/0x160 scrub_workers_get+0x5a/0x170 btrfs_scrub_dev+0x18c/0x630 btrfs_dev_replace_by_ioctl.cold.21+0x10a/0x1d4 btrfs_ioctl+0x2799/0x30a0 ksys_ioctl+0x83/0xc0 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x50/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xa9 other info that might help us debug this: Chain exists of: cpu_hotplug_lock --> &fs_devs->device_list_mutex --> &fs_info->scrub_lock Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&fs_info->scrub_lock); lock(&fs_devs->device_list_mutex); lock(&fs_info->scrub_lock); lock(cpu_hotplug_lock); *** DEADLOCK *** 2 locks held by btrfs/229626: #0: ffff88bfe8bb86e0 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: btrfs_scrub_dev+0xbd/0x630 #1: ffff889dd3889518 (&fs_info->scrub_lock){+.+.}-{3:3}, at: btrfs_scrub_dev+0x11c/0x630 stack backtrace: CPU: 15 PID: 229626 Comm: btrfs Kdump: loaded Not tainted 5.8.0-rc7-00172-g021118712e59 #932 Hardware name: Quanta Tioga Pass Single Side 01-0030993006/Tioga Pass Single Side, BIOS F08_3A18 12/20/2018 Call Trace: dump_stack+0x78/0xa0 check_noncircular+0x165/0x180 __lock_acquire+0x1272/0x2310 lock_acquire+0x9e/0x360 ? alloc_workqueue+0x378/0x450 cpus_read_lock+0x39/0xb0 ? alloc_workqueue+0x378/0x450 alloc_workqueue+0x378/0x450 ? rcu_read_lock_sched_held+0x52/0x80 __btrfs_alloc_workqueue+0x15d/0x200 btrfs_alloc_workqueue+0x51/0x160 scrub_workers_get+0x5a/0x170 btrfs_scrub_dev+0x18c/0x630 ? start_transaction+0xd1/0x5d0 btrfs_dev_replace_by_ioctl.cold.21+0x10a/0x1d4 btrfs_ioctl+0x2799/0x30a0 ? do_sigaction+0x102/0x250 ? lockdep_hardirqs_on_prepare+0xca/0x160 ? _raw_spin_unlock_irq+0x24/0x30 ? trace_hardirqs_on+0x1c/0xe0 ? _raw_spin_unlock_irq+0x24/0x30 ? do_sigaction+0x102/0x250 ? ksys_ioctl+0x83/0xc0 ksys_ioctl+0x83/0xc0 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x50/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xa9 This happens because we're allocating the scrub workqueues under the scrub and device list mutex, which brings in a whole host of other dependencies. Because the work queue allocation is done with GFP_KERNEL, it can trigger reclaim, which can lead to a transaction commit, which in turns needs the device_list_mutex, it can lead to a deadlock. A different problem for which this fix is a solution. Fix this by moving the actual allocation outside of the scrub lock, and then only take the lock once we're ready to actually assign them to the fs_info. We'll now have to cleanup the workqueues in a few more places, so I've added a helper to do the refcount dance to safely free the workqueues. CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-08-27btrfs: fix potential deadlock in the search ioctlJosef Bacik
With the conversion of the tree locks to rwsem I got the following lockdep splat: ====================================================== WARNING: possible circular locking dependency detected 5.8.0-rc7-00165-g04ec4da5f45f-dirty #922 Not tainted ------------------------------------------------------ compsize/11122 is trying to acquire lock: ffff889fabca8768 (&mm->mmap_lock#2){++++}-{3:3}, at: __might_fault+0x3e/0x90 but task is already holding lock: ffff889fe720fe40 (btrfs-fs-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x39/0x180 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (btrfs-fs-00){++++}-{3:3}: down_write_nested+0x3b/0x70 __btrfs_tree_lock+0x24/0x120 btrfs_search_slot+0x756/0x990 btrfs_lookup_inode+0x3a/0xb4 __btrfs_update_delayed_inode+0x93/0x270 btrfs_async_run_delayed_root+0x168/0x230 btrfs_work_helper+0xd4/0x570 process_one_work+0x2ad/0x5f0 worker_thread+0x3a/0x3d0 kthread+0x133/0x150 ret_from_fork+0x1f/0x30 -> #1 (&delayed_node->mutex){+.+.}-{3:3}: __mutex_lock+0x9f/0x930 btrfs_delayed_update_inode+0x50/0x440 btrfs_update_inode+0x8a/0xf0 btrfs_dirty_inode+0x5b/0xd0 touch_atime+0xa1/0xd0 btrfs_file_mmap+0x3f/0x60 mmap_region+0x3a4/0x640 do_mmap+0x376/0x580 vm_mmap_pgoff+0xd5/0x120 ksys_mmap_pgoff+0x193/0x230 do_syscall_64+0x50/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #0 (&mm->mmap_lock#2){++++}-{3:3}: __lock_acquire+0x1272/0x2310 lock_acquire+0x9e/0x360 __might_fault+0x68/0x90 _copy_to_user+0x1e/0x80 copy_to_sk.isra.32+0x121/0x300 search_ioctl+0x106/0x200 btrfs_ioctl_tree_search_v2+0x7b/0xf0 btrfs_ioctl+0x106f/0x30a0 ksys_ioctl+0x83/0xc0 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x50/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xa9 other info that might help us debug this: Chain exists of: &mm->mmap_lock#2 --> &delayed_node->mutex --> btrfs-fs-00 Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(btrfs-fs-00); lock(&delayed_node->mutex); lock(btrfs-fs-00); lock(&mm->mmap_lock#2); *** DEADLOCK *** 1 lock held by compsize/11122: #0: ffff889fe720fe40 (btrfs-fs-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x39/0x180 stack backtrace: CPU: 17 PID: 11122 Comm: compsize Kdump: loaded Not tainted 5.8.0-rc7-00165-g04ec4da5f45f-dirty #922 Hardware name: Quanta Tioga Pass Single Side 01-0030993006/Tioga Pass Single Side, BIOS F08_3A18 12/20/2018 Call Trace: dump_stack+0x78/0xa0 check_noncircular+0x165/0x180 __lock_acquire+0x1272/0x2310 lock_acquire+0x9e/0x360 ? __might_fault+0x3e/0x90 ? find_held_lock+0x72/0x90 __might_fault+0x68/0x90 ? __might_fault+0x3e/0x90 _copy_to_user+0x1e/0x80 copy_to_sk.isra.32+0x121/0x300 ? btrfs_search_forward+0x2a6/0x360 search_ioctl+0x106/0x200 btrfs_ioctl_tree_search_v2+0x7b/0xf0 btrfs_ioctl+0x106f/0x30a0 ? __do_sys_newfstat+0x5a/0x70 ? ksys_ioctl+0x83/0xc0 ksys_ioctl+0x83/0xc0 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x50/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xa9 The problem is we're doing a copy_to_user() while holding tree locks, which can deadlock if we have to do a page fault for the copy_to_user(). This exists even without my locking changes, so it needs to be fixed. Rework the search ioctl to do the pre-fault and then copy_to_user_nofault for the copying. CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>