summaryrefslogtreecommitdiff
path: root/fs/xfs/libxfs
AgeCommit message (Collapse)Author
2021-10-22xfs: online repair of directoriesDarrick J. Wong
If a directory looks like it's in bad shape, try to sift through the rubble to find whatever directory entries we can, scan the directory tree for the parent (if needed), stage the new directory contents in a temporary file and use the atomic extent swapping mechanism to commit the results in bulk. As a side effect of this patch, directory inactivation will be able to purge any leftover dir blocks. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-10-22xfs: repair extended attributesDarrick J. Wong
If the extended attributes look bad, try to sift through the rubble to find whatever keys/values we can, stage a new attribute structure in a temporary file and use the atomic extent swapping mechanism to commit the results in bulk. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-10-22xfs: enable atomic swapext featureatomic-file-updates_2021-10-22Darrick J. Wong
Add the atomic swapext feature to the set of features that we will permit. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-10-22xfs: make atomic extent swapping support realtime filesDarrick J. Wong
Now that bmap items support the realtime device, we can add the necessary pieces to the atomic extent swapping code to support such things. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-10-22xfs: condense directories after an atomic swapDarrick J. Wong
The previous commit added a new swapext flag that enables us to perform post-swap processing on file2 once we're done swapping the extent maps. Now add this ability for directories. This isn't used anywhere right now, but we need to have the basic ondisk flags in place so that a future online directory repair feature can create salvaged dirents in a temporary directory and swap the data forks when ready. If one file is in extents format and the other is inline, we will have to promote both to extents format to perform the swap. After the swap, we can try to condense the fixed directory down to inline format if possible. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-10-22xfs: condense extended attributes after an atomic swapDarrick J. Wong
Add a new swapext flag that enables us to perform post-swap processing on file2 once we're done swapping the extent maps. If we were swapping the extended attributes, we want to be able to convert file2's attr fork from block to inline format. This isn't used anywhere right now, but we need to have the basic ondisk flags in place so that a future online xattr repair feature can create salvaged attrs in a temporary file and swap the attr forks when ready. If one file is in extents format and the other is inline, we will have to promote both to extents format to perform the swap. After the swap, we can try to condense the fixed file's attr fork back down to inline format if possible. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-10-22xfs: add error injection to test swapext recoveryDarrick J. Wong
Add an errortag so that we can test recovery of swapext log items. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-10-22xfs: create deferred log items for extent swappingDarrick J. Wong
Now that we've created the skeleton of a log intent item to track and restart extent swap operations, add the upper level logic to commit intent items and turn them into concrete work recorded in the log. We use the deferred item "multihop" feature that was introduced a few patches ago to constrain the number of active swap operations to one per thread. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-10-22xfs: introduce a swap-extent log intent itemDarrick J. Wong
Introduce a new intent log item to handle swapping extents. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-10-22xfs: create a log incompat flag for atomic extent swappingDarrick J. Wong
Create a log incompat flag so that we only attempt to process swap extent log items if the filesystem supports it. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-10-22vfs: introduce new file range exchange ioctlDarrick J. Wong
Introduce a new ioctl to handle swapping ranges of bytes between files. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-10-22xfs: port the defer ops capture and continue to resource capturelog-recovery-defer-capture_2021-10-22Darrick J. Wong
When log recovery tries to recover a transaction that had log intent items attached to it, it has to save certain parts of the transaction state (reservation, dfops chain, inodes with no automatic unlock) so that it can finish single-stepping the recovered transactions before finishing the chains. This is done with the xfs_defer_ops_capture and xfs_defer_ops_continue functions. Right now they open-code this functionality, so let's port this to the formalized resource capture structure that we introduced in the previous patch. This enables us to hold up to two inodes and two buffers during log recovery, the same way we do for regular runtime. With this patch applied, we'll be ready to support atomic extent swap which holds two inodes; and logged xattrs which holds one inode and one xattr leaf buffer. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-10-22xfs: formalize the process of holding onto resources across a defer rollDarrick J. Wong
Transaction users are allowed to flag up to two buffers and two inodes for ownership preservation across a deferred transaction roll. Hoist the variables and code responsible for this out of xfs_defer_trans_roll so that we can use it for the defer capture mechanism. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-10-22xfs: xfs_bmap_finish_one should map unwritten extents properlyDarrick J. Wong
The deferred bmap work state and the log item can transmit unwritten state, so the XFS_BMAP_MAP handler must map in extents with that unwritten state. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-10-22xfs: support deferred bmap updates on the attr forkDarrick J. Wong
The deferred bmap update log item has always supported the attr fork, so plumb this in so that higher layers can access this. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-10-22xfs: add a realtime flag to the bmap update log redo itemsDarrick J. Wong
Extend the bmap update (BUI) log items with a new realtime flag that indicates that the updates apply against a realtime file's data fork. We'll wire up the actual code later. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-10-22xfs: hoist freeing of rt data fork extent mappingsDarrick J. Wong
Currently, xfs_bmap_del_extent_real contains a bunch of code to convert the physical extent of a data fork mapping for a realtime file into rt extents and pass that to the rt extent freeing function. Since the details of this aren't needed when CONFIG_XFS_REALTIME=n, move it to xfs_rtbitmap.c to reduce code size when realtime isn't enabled. This will (one day) enable realtime EFIs to reuse the same unit-converting call with less code duplication. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-10-22xfs: fix xfs_bunmapi to allow unmapping of partial rt extentsDarrick J. Wong
When XFS_BMAPI_REMAP is passed to bunmapi, that means that we want to remove part of a block mapping without touching the allocator. For realtime files with rtextsize > 1, that also means that we should skip all the code that changes a partial remove request into an unwritten extent conversion. IOWs, bunmapi in this mode should handle removing the mapping from the rt file and nothing else. Note that XFS_BMAPI_REMAP callers are required to decrement the reference count and/or free the space manually. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-10-22xfs: clean up bmap log intent item tracepoint callsitesDarrick J. Wong
Pass the incore bmap structure to the tracepoints instead of open-coding the argument passing. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-10-22xfs: pass the xfs_bmbt_irec directly through the log intent codeDarrick J. Wong
Instead of repeatedly boxing and unboxing the incore extent mapping structure as it passes through the BUI code, pass the pointer directly through. We'll clean up the tracepoints shortly. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-10-22xfs: repair the rmapbtDarrick J. Wong
Rebuild the reverse mapping btree from all primary metadata. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-10-22xfs: create a helper to decide if a file mapping targets the rt volumeDarrick J. Wong
Create a helper so that we can stop open-coding this decision everywhere. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-10-22xfs: introduce online scrub freezeDarrick J. Wong
Introduce a new 'online scrub freeze' that we can use to lock out all filesystem modifications and background activity so that we can perform global scans in order to rebuild metadata. This introduces a new IFLAG to the scrub ioctl to indicate that userspace is willing to allow a freeze. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-10-22xfs: update health status if we get a clean bill of healthindirect-health-reporting_2021-10-22Darrick J. Wong
If scrub finds that everything is ok with the filesystem, we need a way to tell the health tracking that it can let go of indirect health flags, since indirect flags only mean that at some point in the past we lost some context. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-10-22xfs: remember sick inodes that get inactivatedDarrick J. Wong
If an unhealthy inode gets inactivated, remember this fact in the per-fs health summary. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-10-22xfs: add secondary and indirect classes to the health tracking systemDarrick J. Wong
Establish two more classes of health tracking bits: * Indirect problems, which suggest problems in other health domains that we weren't able to preserve. * Secondary problems, which track state that's related to primary evidence of health problems; and The first class we'll use in an upcoming patch to record in the AG health status the fact that we ran out of memory and had to inactivate an inode with defective metadata. The second class we use to indicate that repair knows that an inode is bad and we need to fix it later. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-10-22xfs: report XFS_CORRUPT_ON errors to the health systemcorruption-health-reports_2021-10-22Darrick J. Wong
Whenever we encounter XFS_CORRUPT_ON failures, we should report that to the health monitoring system for later reporting. I started with this and massaged everything until it built: @@ expression mp, test; @@ - if (XFS_CORRUPT_ON(mp, test)) return -EFSCORRUPTED; + if (XFS_CORRUPT_ON(mp, test)) { xfs_btree_mark_sick(cur); return -EFSCORRUPTED; } @@ expression mp, test; identifier label, error; @@ - if (XFS_CORRUPT_ON(mp, test)) { error = -EFSCORRUPTED; goto label; } + if (XFS_CORRUPT_ON(mp, test)) { xfs_btree_mark_sick(cur); error = -EFSCORRUPTED; goto label; } Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-10-22xfs: report realtime metadata corruption errors to the health systemDarrick J. Wong
Whenever we encounter corrupt realtime metadat blocks, we should report that to the health monitoring system for later reporting. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-10-22xfs: report inode corruption errors to the health systemDarrick J. Wong
Whenever we encounter corrupt inode records, we should report that to the health monitoring system for later reporting. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-10-22xfs: report dir/attr block corruption errors to the health systemDarrick J. Wong
Whenever we encounter corrupt directory or extended attribute blocks, we should report that to the health monitoring system for later reporting. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-10-22xfs: report btree block corruption errors to the health systemDarrick J. Wong
Whenever we encounter corrupt btree blocks, we should report that to the health monitoring system for later reporting. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-10-22xfs: report block map corruption errors to the health tracking systemDarrick J. Wong
Whenever we encounter a corrupt block mapping, we should report that to the health monitoring system for later reporting. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-10-22xfs: report ag header corruption errors to the health tracking systemDarrick J. Wong
Whenever we encounter a corrupt AG header, we should report that to the health monitoring system for later reporting. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-10-22xfs: separate the marking of sick and checked metadataDarrick J. Wong
Split the setting of the sick and checked masks into separate functions as part of preparing to add the ability for regular runtime fs code (i.e. not scrub) to mark metadata structures sick when corruptions are found. Improve the documentation of libxfs' requirements for helper behavior. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-10-22xfs: implement live quotacheck inode scanDarrick J. Wong
Create a new trio of scrub functions to check quota counters. While the dquots themselves are filesystem metadata and should be checked early, the dquot counter values are computed from other metadata and are therefore summary counters. We don't plug these into the scrub dispatch just yet, because we still need to be able to watch quota updates while doing our scan. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-10-22xfs: report the health of quota countsDarrick J. Wong
Report the health of quota counts. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-10-22xfs: create a new inode fork block unmap helperDarrick J. Wong
Create a new helper to unmap blocks from an inode's fork. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-10-22xfs: repair inode block mapsDarrick J. Wong
Use the reverse-mapping btree information to rebuild an inode block map. Update the btree bulk loading code as necessary to support inode rooted btrees and fix some bitrot problems. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-10-22xfs: zap broken inode forksDarrick J. Wong
Determine if inode fork damage is responsible for the inode being unable to pass the ifork verifiers in xfs_iget and zap the fork contents if this is true. Once this is done the fork will be empty but we'll be able to construct an in-core inode, and a subsequent call to the inode fork repair ioctl will search the rmapbt to rebuild the records that were in the fork. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-10-22xfs: repair inode recordsDarrick J. Wong
Try to reinitialize corrupt inodes, or clear the reflink flag if it's not needed. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-10-22xfs: repair inode btreesDarrick J. Wong
Use the rmapbt to find inode chunks, query the chunks to compute hole and free masks, and with that information rebuild the inobt and finobt. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-10-22xfs: repair free space btreesDarrick J. Wong
Rebuild the free space btrees from the gaps in the rmap btree. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-10-22xfs: reduce transaction reservations with reflinkDarrick J. Wong
Before to the introduction of deferred refcount operations, reflink would try to cram refcount btree updates into the same transaction as an allocation or a free event. Mainline XFS has never actually done that, but we never refactored the transaction reservations to reflect that we now do all refcount updates in separate transactions. Fix this to reduce the transaction reservation size even farther, so that between this patch and the previous one, we reduce the tr_write and tr_itruncate sizes by 66%. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-10-22xfs: reduce the absurdly large log reservationsDarrick J. Wong
Back in the early days of reflink and rmap development I set the transaction reservation sizes to be overly generous for rmap+reflink filesystems, and a little under-generous for rmap-only filesystems. Since we don't need *eight* transaction rolls to handle three new log intent items, decrease the logcounts to what we actually need, and amend the shadow reservation computation function to reflect what we used to do so that the minimum log size doesn't change. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-10-22xfs: create shadow transaction reservations for computing minimum log sizeDarrick J. Wong
Every time someone changes the transaction reservation sizes, they introduce potential compatibility problems if the changes affect the minimum log size that we validate at mount time. If the minimum log size gets larger (which should be avoided because doing so presents a serious risk of log livelock), filesystems created with old mkfs will not mount on a newer kernel; if the minimum size shrinks, filesystems created with newer mkfs will not mount on older kernels. Therefore, enable the creation of a shadow log reservation structure where we can "undo" the effects of tweaks when computing minimum log sizes. These shadow reservations should never be used in practice, but they insulate us from perturbations in minimum log size. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-10-22xfs: stop artificially limiting the length of bunmap callsDarrick J. Wong
In commit e1a4e37cc7b6, we clamped the length of bunmapi calls on the data forks of shared files to avoid two failure scenarios: one where the extent being unmapped is so sparsely shared that we exceed the transaction reservation with the sheer number of refcount btree updates and EFI intent items; and the other where we attach so many deferred updates to the transaction that we pin the log tail and later the log head meets the tail, causing the log to livelock. We avoid triggering the first problem by tracking the number of ops in the refcount btree cursor and forcing a requeue of the refcount intent item any time we think that we might be close to overflowing. This has been baked into XFS since before the original e1a4 patch. A recent patchset fixed the second problem by changing the deferred ops code to finish all the work items created by each round of trying to complete a refcount intent item, which eliminates the long chains of deferred items (27dad); and causing long-running transactions to relog their intent log items when space in the log gets low (74f4d). Because this clamp affects /any/ unmapping request regardless of the sharing factors of the component blocks, it degrades the performance of all large unmapping requests -- whereas with an unshared file we can unmap millions of blocks in one go, shared files are limited to unmapping a few thousand blocks at a time, which causes the upper level code to spin in a bunmapi loop even if it wasn't needed. This also eliminates one more place where log recovery behavior can differ from online behavior, because bunmapi operations no longer need to requeue. Partial-revert-of: e1a4e37cc7b6 ("xfs: try to avoid blowing out the transaction reservation when bunmaping a shared extent") Depends: 27dada070d59 ("xfs: change the order in which child and parent defer ops ar finished") Depends: 74f4d6a1e065 ("xfs: only relog deferred intent items if free space in the log gets low") Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-10-22xfs: speed up write operations by using non-overlapped lookups when possiblermap-speedups_2021-10-22Darrick J. Wong
Reverse mapping on a reflink-capable filesystem has some pretty high overhead when performing file operations. This is because the rmap records for logically and physically adjacent extents might not be adjacent in the rmap index due to data block sharing. As a result, we use expensive overlapped-interval btree search, which walks every record that overlaps with the supplied key in the hopes of finding the record. However, profiling data shows that when the index contains a record that is an exact match for a query key, the non-overlapped btree search function can find the record much faster than the overlapped version. Try the non-overlapped lookup first when we're trying to find the left neighbor rmap record for a given file mapping, which makes unwritten extent conversion and remap operations run faster if data block sharing is minimal in this part of the filesystem. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-10-22xfs: speed up rmap lookups by using non-overlapped lookups when possibleDarrick J. Wong
Reverse mapping on a reflink-capable filesystem has some pretty high overhead when performing file operations. This is because the rmap records for logically and physically adjacent extents might not be adjacent in the rmap index due to data block sharing. As a result, we use expensive overlapped-interval btree search, which walks every record that overlaps with the supplied key in the hopes of finding the record. However, profiling data shows that when the index contains a record that is an exact match for a query key, the non-overlapped btree search function can find the record much faster than the overlapped version. Try the non-overlapped lookup first, which will make scrub run much faster. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-10-22xfs: simplify xfs_rmap_lookup_le call sitesDarrick J. Wong
Most callers of xfs_rmap_lookup_le will retrieve the btree record immediately if the lookup succeeds. The overlapped version of this function (xfs_rmap_lookup_le_range) will return the record if the lookup succeeds, so make the regular version do it too. Get rid of the useless len argument, since it's not part of the lookup key. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-10-22xfs: teach xfs_btree_has_record to return false if there are gapsDarrick J. Wong
The current implementation of xfs_btree_has_record returns true if it finds /any/ record within the given range. Unfortunately, that's not what the predicate is supposed to do -- it's supposed to test if the /entire/ range is covered by records. Therefore, enhance the routine to check that the first record it encounters starts earlier or at the same point as the low key, the last record ends at or after the same point as the high key, and that there aren't any gaps in the records. Signed-off-by: Darrick J. Wong <djwong@kernel.org>