summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-06-08xfs: replace do_mod with native operationsxfs-4.18-merge-9Dave Chinner
do_mod() is a hold-over from when we have different sizes for file offsets and and other internal values for 40 bit XFS filesystems. Hence depending on build flags variables passed to do_mod() could change size. We no longer support those small format filesystems and hence everything is of fixed size theses days, even on 32 bit platforms. As such, we can convert all the do_mod() callers to platform optimised modulus operations as defined by linux/math64.h. Individual conversions depend on the types of variables being used. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-08xfs: don't call xfs_da_shrink_inode with NULL bpEric Sandeen
xfs_attr3_leaf_create may have errored out before instantiating a buffer, for example if the blkno is out of range. In that case there is no work to do to remove it, and in fact xfs_da_shrink_inode will lead to an oops if we try. This also seems to fix a flaw where the original error from xfs_attr3_leaf_create gets overwritten in the cleanup case, and it removes a pointless assignment to bp which isn't used after this. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199969 Reported-by: Xu, Wen <wen.xu@gatech.edu> Tested-by: Xu, Wen <wen.xu@gatech.edu> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-08xfs: clean up MIN/MAXDave Chinner
Get rid of the MIN/MAX macros and just use the native min/max macros directly in the XFS code. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-08xfs: move various type verifiers to common fileDave Chinner
New verification functions like xfs_verify_fsbno() and xfs_verify_agino() are spread across multiple files and different header files. They really don't fit cleanly into the places they've been put, and have wider scope than the current header includes. Move the type verifiers to a new file in libxfs (xfs-types.c) and the prototypes to xfs_types.h where they will be visible to all the code that uses the types. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-08xfs: xfs_reflink_convert_cow() memory allocation deadlockDave Chinner
xfs_reflink_convert_cow() manipulates the incore extent list in GFP_KERNEL context in the IO submission path whilst holding locked pages under writeback. This is a memory reclaim deadlock vector. This code is not in a transaction, so any memory allocations it makes aren't protected via the memalloc_nofs_save() context that transactions carry. Hence we need to run this call under memalloc_nofs_save() context to prevent potential memory allocations from being run as GFP_KERNEL and deadlocking. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-08xfs: setup VFS i_rwsem lockdep state correctlyDave Chinner
When lockdep is enabled, it changes the type of the inode i_rwsem semaphore before unlocking a newly instantiated inode. THere is the possibility that there is already a waiter on that inode lock by the time we unlock the new inode, so having lockdep re-initialise the lock is a vector for trouble. Avoid this whole situation by setting up the i_rwsem lockdep class at the same time we set up the XFS inode i_ilock classes and so the VFS doesn't have to change the lock class itself when it is potentially unsafe. This change is necessary because the equivalent fixes to the VFS code made in commit 1e2e547a93a0 ("do d_instantiate/unlock_new_inode combinations safely") are not relevant to XFS as it has it's own internal inode cache lookup and instantiation routines. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-06xfs: fix string handling in label get/set functionsxfs-4.18-merge-8Arnd Bergmann
[sandeen: fix subject, avoid copy-out of uninit data in getlabel] gcc-8 reports two warnings for the newly added getlabel/setlabel code: fs/xfs/xfs_ioctl.c: In function 'xfs_ioc_getlabel': fs/xfs/xfs_ioctl.c:1822:38: error: argument to 'sizeof' in 'strncpy' call is the same expression as the source; did you mean to use the size of the destination? [-Werror=sizeof-pointer-memaccess] strncpy(label, sbp->sb_fname, sizeof(sbp->sb_fname)); ^ In function 'strncpy', inlined from 'xfs_ioc_setlabel' at /git/arm-soc/fs/xfs/xfs_ioctl.c:1863:2, inlined from 'xfs_file_ioctl' at /git/arm-soc/fs/xfs/xfs_ioctl.c:1918:10: include/linux/string.h:254:9: error: '__builtin_strncpy' output may be truncated copying 12 bytes from a string of length 12 [-Werror=stringop-truncation] return __builtin_strncpy(p, q, size); In both cases, part of the problem is that one of the strncpy() arguments is a fixed-length character array with zero-padding rather than a zero-terminated string. In the first one case, we also get an odd warning about sizeof-pointer-memaccess, which doesn't seem right (the sizeof is for an array that happens to be the same as the second strncpy argument). To work around the bogus warning, I use a plain 'XFSLABEL_MAX' for the strncpy() length when copying the label in getlabel. For setlabel(), using memcpy() with the correct length that is already known avoids the second warning and is slightly simpler. In a related issue, it appears that we accidentally skip the trailing \0 when copying a 12-character label back to user space in getlabel(). Using the correct sizeof() argument here copies the extra character. Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85602 Fixes: f7664b31975b ("xfs: implement online get/set fs label") Cc: Eric Sandeen <sandeen@redhat.com> Cc: Martin Sebor <msebor@gmail.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-06xfs: convert to SPDX license tagsDave Chinner
Remove the verbose license text from XFS files and replace them with SPDX tags. This does not change the license of any of the code, merely refers to the common, up-to-date license files in LICENSES/ This change was mostly scripted. fs/xfs/Makefile and fs/xfs/libxfs/xfs_fs.h were modified by hand, the rest were detected and modified by the following command: for f in `git grep -l "GNU General" fs/xfs/` ; do echo $f cat $f | awk -f hdr.awk > $f.new mv -f $f.new $f done And the hdr.awk script that did the modification (including detecting the difference between GPL-2.0 and GPL-2.0+ licenses) is as follows: $ cat hdr.awk BEGIN { hdr = 1.0 tag = "GPL-2.0" str = "" } /^ \* This program is free software/ { hdr = 2.0; next } /any later version./ { tag = "GPL-2.0+" next } /^ \*\// { if (hdr > 0.0) { print "// SPDX-License-Identifier: " tag print str print $0 str="" hdr = 0.0 next } print $0 next } /^ \* / { if (hdr > 1.0) next if (hdr > 0.0) { if (str != "") str = str "\n" str = str $0 next } print $0 next } /^ \*/ { if (hdr > 0.0) next print $0 next } // { if (hdr > 0.0) { if (str != "") str = str "\n" str = str $0 next } print $0 } END { } $ Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-06xfs: validate btree records on retrievalDave Chinner
So we don't check the validity of records as we walk the btree. When there are corrupt records in the free space btree (e.g. zero startblock/length or beyond EOAG) we just blindly use it and things go bad from there. That leads to assert failures on debug kernels like this: XFS: Assertion failed: fs_is_ok, file: fs/xfs/libxfs/xfs_alloc.c, line: 450 .... Call Trace: xfs_alloc_fixup_trees+0x368/0x5c0 xfs_alloc_ag_vextent_near+0x79a/0xe20 xfs_alloc_ag_vextent+0x1d3/0x330 xfs_alloc_vextent+0x5e9/0x870 Or crashes like this: XFS (loop0): xfs_buf_find: daddr 0x7fb28 out of range, EOFS 0x8000 ..... BUG: unable to handle kernel NULL pointer dereference at 00000000000000c8 .... Call Trace: xfs_bmap_add_extent_hole_real+0x67d/0x930 xfs_bmapi_write+0x934/0xc90 xfs_da_grow_inode_int+0x27e/0x2f0 xfs_dir2_grow_inode+0x55/0x130 xfs_dir2_sf_to_block+0x94/0x5d0 xfs_dir2_sf_addname+0xd0/0x590 xfs_dir_createname+0x168/0x1a0 xfs_rename+0x658/0x9b0 By checking that free space records pulled from the trees are within the valid range, we catch many of these corruptions before they can do damage. This is a generic btree record checking deficiency. We need to validate the records we fetch from all the different btrees before we use them to catch corruptions like this. This patch results in a corrupt record emitting an error message and returning -EFSCORRUPTED, and the higher layers catch that and abort: XFS (loop0): Size Freespace BTree record corruption in AG 0 detected! XFS (loop0): start block 0x0 block count 0x0 XFS (loop0): Internal error xfs_trans_cancel at line 1012 of file fs/xfs/xfs_trans.c. Caller xfs_create+0x42a/0x670 ..... Call Trace: dump_stack+0x85/0xcb xfs_trans_cancel+0x19f/0x1c0 xfs_create+0x42a/0x670 xfs_generic_create+0x1f6/0x2c0 vfs_create+0xf9/0x180 do_mknodat+0x1f9/0x210 do_syscall_64+0x5a/0x180 entry_SYSCALL_64_after_hwframe+0x49/0xbe ..... XFS (loop0): xfs_do_force_shutdown(0x8) called from line 1013 of file fs/xfs/xfs_trans.c. Return address = ffffffff81500868 XFS (loop0): Corruption of in-memory data detected. Shutting down filesystem Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-06xfs: push corruption -> ESTALE conversion to xfs_nfs_get_inode()Dave Chinner
In xfs_imap_to_bp(), we convert a -EFSCORRUPTED error to -EINVAL if we are doing an untrusted lookup. This is done because we need failed filehandle lookups to report -ESTALE to the caller, and it does this by converting -EINVAL and -ENOENT errors to -ESTALE. The squashing of EFSCORRUPTED in imap_to_bp makes it impossible for for xfs_iget(UNTRUSTED) callers to determine the difference between "inode does not exist" and "corruption detected during lookup". We realy need that distinction in places calling xfS_iget(UNTRUSTED), so move the filehandle error case handling all the way out to xfs_nfs_get_inode() where it is needed. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-06xfs: verify root inode more thoroughlyDave Chinner
When looking up the root inode at mount time, we don't actually do any verification to check that the inode is allocated and accounted for correctly in the INOBT. Make the checks on the root inode more robust by making it an untrusted lookup. This forces the inode lookup to use the inode btree to verify the inode is allocated and mapped correctly to disk. This will also have the effect of catching a significant number of AGI/INOBT related corruptions in AG 0 at mount time. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-06xfs: verify COW extent size hint is valid in inode verifierDave Chinner
There are rules for vald extent size hints. We enforce them when applications set them, but fuzzers violate those rules and that screws us over. Validate COW extent size hint rules in the inode verifier to catch this. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-06xfs: verify extent size hint is valid in inode verifierDave Chinner
There are rules for vald extent size hints. We enforce them when applications set them, but fuzzers violate those rules and that screws us over. This results in alignment assertion failures when setting up allocations such as this in direct IO: XFS: Assertion failed: ap->length, file: fs/xfs/libxfs/xfs_bmap.c, line: 3432 .... Call Trace: xfs_bmap_btalloc+0x415/0x910 xfs_bmapi_write+0x71c/0x12e0 xfs_iomap_write_direct+0x2a9/0x420 xfs_file_iomap_begin+0x4dc/0xa70 iomap_apply+0x43/0x100 iomap_file_buffered_write+0x62/0x90 xfs_file_buffered_aio_write+0xba/0x300 __vfs_write+0xd5/0x150 vfs_write+0xb6/0x180 ksys_write+0x45/0xa0 do_syscall_64+0x5a/0x180 entry_SYSCALL_64_after_hwframe+0x49/0xbe And from xfs_db: core.extsize = 10380288 Which is not an integer multiple of the block size, and so violates Rule #7 for setting extent size hints. Validate extent size hint rules in the inode verifier to catch this. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-06xfs: catch bad stripe alignment configurationsDave Chinner
When stripe alignments are invalid, data alignment algorithms in the allocator may not work correctly. Ensure we catch superblocks with invalid stripe alignment setups at mount time. These data alignment mismatches are now detected at mount time like this: XFS (loop0): SB stripe unit sanity check failed XFS (loop0): Metadata corruption detected at xfs_sb_read_verify+0xab/0x110, xfs_sb block 0xffffffffffffffff XFS (loop0): Unmount and run xfs_repair XFS (loop0): First 128 bytes of corrupted metadata buffer: 0000000091c2de02: 58 46 53 42 00 00 10 00 00 00 00 00 00 00 10 00 XFSB............ 0000000023bff869: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000000cdd8c893: 17 32 37 15 ff ca 46 3d 9a 17 d3 33 04 b5 f1 a2 .27...F=...3.... 000000009fd2844f: 00 00 00 00 00 00 00 04 00 00 00 00 00 00 06 d0 ................ 0000000088e9b0bb: 00 00 00 00 00 00 06 d1 00 00 00 00 00 00 06 d2 ................ 00000000ff233a20: 00 00 00 01 00 00 10 00 00 00 00 01 00 00 00 00 ................ 000000009db0ac8b: 00 00 03 60 e1 34 02 00 08 00 00 02 00 00 00 00 ...`.4.......... 00000000f7022460: 00 00 00 00 00 00 00 00 0c 09 0b 01 0c 00 00 19 ................ XFS (loop0): SB validate failed with error -117. And the mount fails. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-05iomap: fsync swap files before iterating mappingsDarrick J. Wong
Swap files require that all the file mapping metadata be stable on disk. It is insufficient to flush dirty pages in the page cache because that won't necessarily result in filesystems pushing all their metadata out to disk. Therefore, call fsync from iomap_swapfile_activate. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz>
2018-06-04xfs: use xfs_trans_getsb in xfs_sync_sb_bufxfs-4.18-merge-7Eric Sandeen
Use xfs_trans_getsb rather than reaching right in for mp->m_sb_bp; I think this is more correct, and it facilitates building this libxfs code in userspace as well. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-04xfs: don't assert on corrupted unlinked inode listDarrick J. Wong
Use the per-ag inode number verifiers to detect corrupt lists and error out, instead of using ASSERTs. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-06-04xfs: explicitly pass buffer size to xfs_corruption_errorDarrick J. Wong
Explicitly pass the buffer length to xfs_corruption_error() instead of assuming XFS_CORRUPTION_DUMP_LEN so that we avoid dumping off the end of the buffer. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-06-04xfs: don't assert when on-disk btree pointers are garbageDarrick J. Wong
Don't ASSERT when we encounter bad on-disk btree pointers in the debug check functions. Log the error to leave breadcrumbs and let the upper layers deal with it. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-06-04xfs: strengthen btree pointer checks before useDarrick J. Wong
Instead of ASSERTing on null btree pointers in xfs_btree_ptr_to_daddr, use the new block number verifiers to ensure that the btree pointer doesn't point to any sensitive areas (AG headers, past-EOFS) and return -EFSCORRUPTED if this is the case. Remove the ASSERT because on-disk corruptions shouldn't trigger ASSERTs. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-06-04xfs: introduce xfs_btree_debug_check_ptrDarrick J. Wong
Make xfs_btree_check_ptr a non-debug function and introduce a new _debug version that only runs when #ifdef DEBUG. This will enable us to reuse the checking logic with other parts of the btree code. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-06-04xfs: check directory bestfree information in the verifierDarrick J. Wong
Create a variant of xfs_dir2_data_freefind that is suitable for use in a verifier. Because _freefind is called by the verifier, we simply duplicate the _freefind function, convert the ASSERTs to return __this_address, and modify the verifier to call our new function. Once we've made it impossible for directory blocks with bad bestfree data to make it into the filesystem we can remove the DEBUG code from the regular _freefind function. Underlying argument: corruption of on-disk metadata should return -EFSCORRUPTED instead of blowing ASSERTs. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-06-04xfs: don't return garbage buffers in xfs_da3_node_readDarrick J. Wong
If we're reading a node in a dir/attr btree and the buffer comes off the disk with a magic number we don't recognize, don't ASSERT and don't set a garbage buffer type (0 also triggers ASSERTs). Instead, report the corruption, release the buffer, and return -EFSCORRUPTED because that's what the dabtree is -- corrupt. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-06-04xfs: don't ASSERT on short form btree root pointer of zeroDarrick J. Wong
Don't ASSERT if the short form btree root pointer is zero. Now that we use xfs_verify_agbno to check all short form btree pointers, we'll let that log the error and pass it to the upper layers. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-06-04xfs: btree lookup shouldn't ASSERT on empty btree nodesDarrick J. Wong
If a btree lookup encounters an empty btree node or an empty btree leaf on a multi-level btree, that's evidence of a corrupt on-disk btree. Therefore, we should return -EFSCORRUPTED to the upper levels, not an ASSERT failure. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-06-04xfs: xfs_alloc_get_rec should return EFSCORRUPTED for obvious bnobt corruptionDarrick J. Wong
Return -EFSCORRUPTED when the bnobt/cntbt return obviously corrupt values, rather than letting them bounce around in the internal code. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-06-04xfs: remove redundant ASSERT on insufficient bestfree length in _leaf_addnameDarrick J. Wong
In xfs_dir2_leaf_addname we ASSERT if the length of the unused space described by bestfree[0] is less the amount of space we wish to consume. Immediately after it is a call to xfs_dir2_data_use_free where the offset parameter is offset of the unused space and the length parameter is the amount of space we wish to consume. Both values (and the unused space pointer) are passed into xfs_dir2_data_check_free, which also validates that the region of unused space is big enough to cover the space we wish to consume. This is effectively the same check that the ASSERT covers, and since a check failure results in a corruption message being logged we can remove the ASSERT. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-06-04xfs: don't assert when reporting on-disk corruption while loading btreeDarrick J. Wong
Don't bother ASSERTing when we're already going to log and return the corruption status. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-06-04xfs: don't forbid setting dax flag on directories if device doesn't daxDarrick J. Wong
On a directory, the DAX flag is merely a hint that files created in the directory should have the DAX flag set at creation time. We don't care if the underlying device supports DAX or not because directory metadata are always cached in DRAM. We don't care if new files get the flag even if the device doesn't support DAX because we always check for DAX support before setting the VFS flag (S_DAX). Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-06-03xfs: verify AGI unlinked list contains valid blocksDave Chinner
The heads of tha AGI unlinked list are only scanned on debug kernels when the verifier runs. Change that to always scan the heads and validate that the inode numbers are valid. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-01fs: use ->is_partially_uptodate in page_cache_seek_hole_dataxfs-4.18-merge-3Christoph Hellwig
This way the implementation doesn't depend on buffer_head internals. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-01fs: remove the buffer_unwritten check in page_seek_hole_dataChristoph Hellwig
We only call into this function through the iomap iterators, so we already know the buffer is unwritten. In addition to that we always require the uptodate flag that is ORed with the result anyway. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-01fs: move page_cache_seek_hole_data to iomap.cChristoph Hellwig
This function is only used by the iomap code, depends on being called from it, and will soon stop poking into buffer head internals. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-01xfs: use iomap_bmapChristoph Hellwig
Switch to the iomap based bmap implementation to get rid of one of the last users of xfs_get_blocks. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-01iomap: add an iomap-based bmap implementationChristoph Hellwig
This adds a simple iomap-based implementation of the legacy ->bmap interface. Note that we can't easily add checks for rt or reflink files, so these will have to remain in the callers. This interface just needs to die.. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-01iomap: add a iomap_sector helperChristoph Hellwig
Factor the repeated calculation of the on-disk sector for a given logical block into a littler helper. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-01iomap: use __bio_add_page in iomap_dio_zeroChristoph Hellwig
We don't need any merging logic, and this also replaces a BUG_ON with a WARN_ON_ONCE inside __bio_add_page for the impossible overflow condition. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-01iomap: move IOMAP_F_BOUNDARY to gfs2Christoph Hellwig
Just define a range of fs specific flags and use that in gfs2 instead of exposing this internal flag globally. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-01iomap: fix the comment describing IOMAP_NOWAITChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-01iomap: inline data should be an iomap type, not a flagChristoph Hellwig
Inline data is fundamentally different from our normal mapped case in that it doesn't even have a block address. So instead of having a flag for it it should be an entirely separate iomap range type. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-01mm: split ->readpages calls to avoid non-contiguous pages listsChristoph Hellwig
That way file systems don't have to go spotting for non-contiguous pages and work around them. It also kicks off I/O earlier, allowing it to finish earlier and reduce latency. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-01mm: return an unsigned int from __do_page_cache_readaheadChristoph Hellwig
We never return an error, so switch to returning an unsigned int. Most callers already did implicit casts to an unsigned type, and the one that didn't can be simplified now. Suggested-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-01mm: give the 'ret' variable a better name __do_page_cache_readaheadChristoph Hellwig
It counts the number of pages acted on, so name it nr_pages to make that obvious. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-01block: add a lower-level bio_add_page interfaceChristoph Hellwig
For the upcoming removal of buffer heads in XFS we need to keep track of the number of outstanding writeback requests per page. For this we need to know if bio_add_page merged a region with the previous bvec or not. Instead of adding additional arguments this refactors bio_add_page to be implemented using three lower level helpers which users like XFS can use directly if they care about the merge decisions. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-01xfs: fix error handling in xfs_refcount_insert()xfs-4.18-merge-2Dave Chinner
generic/475 fired an assert failure just after the filesystem was shut down: XFS: Assertion failed: fs_is_ok, file: fs/xfs/libxfs/xfs_refcount.c, line: 182 ..... Call Trace: xfs_refcount_insert+0x151/0x190 xfs_refcount_adjust_extents.constprop.11+0x9c/0x470 xfs_refcount_adjust.constprop.10+0xb0/0x270 xfs_refcount_finish_one+0x25a/0x420 xfs_trans_log_finish_refcount_update+0x2a/0x40 xfs_refcount_update_finish_item+0x35/0xa0 xfs_defer_finish+0x15e/0x4d0 xfs_reflink_remap_extent+0x1bc/0x610 xfs_reflink_remap_blocks+0x6e/0x280 xfs_reflink_remap_range+0x311/0x530 vfs_clone_file_range+0x119/0x200 .... If xfs_btree_insert() returns an error, the corruption check fires instead of passing the error back the caller. The corruption check should be after we've checked for an error, not before, thereby avoiding assert failures if the filesystem shuts down during a refcount btree record insert. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-01xfs: fix xfs_rtalloc_rec unitsDarrick J. Wong
All the realtime allocation functions deal with space on the rtdev in units of realtime extents. However, struct xfs_rtalloc_rec confusingly uses the word 'block' in the name, even though they're really extents. Fix the naming problem and fix all the unit handling problems in the two existing users. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com>
2018-06-01xfs: strengthen rtalloc query range checksDarrick J. Wong
Strengthen the rtalloc range query checks to make sure that the keys do not run off the end of the realtime device inappropriately. Note that the query range functions require units of rt extents, not blocks, despite the type name. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com>
2018-06-01xfs: xfs_rtbuf_get should check the bmapi_read resultsDarrick J. Wong
The xfs_rtbuf_get function should check the block mapping it gets back from bmapi_read. If there are no mappings or the mapping isn't a real extent, we should return -EFSCORRUPTED rather than trying to read a garbage value. We also require realtime bitmap blocks to be real, written allocations. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com>
2018-06-01xfs: xfs_rtword_t should be unsigned, not signedDarrick J. Wong
xfs_rtword_t is used for bit manipulations in the realtime bitmap file. Since we're performing bit shifts with this type, we don't want sign extension and we don't want to be left shifting negative quantities because that's undefined behavior. This also shuts up these UBSAN warnings: UBSAN: Undefined behaviour in fs/xfs/libxfs/xfs_rtbitmap.c:833:48 signed integer overflow: -2147483648 - 1 cannot be represented in type 'int' Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com>
2018-05-31dax: change bdev_dax_supported() to support boolean returnsxfs-4.18-merge-1Dave Jiang
The function return values are confusing with the way the function is named. We expect a true or false return value but it actually returns 0/-errno. This makes the code very confusing. Changing the return values to return a bool where if DAX is supported then return true and no DAX support returns false. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>