Age  Commit message  Author
2021-03-25  xfs: enable atomic swapext feature  [atomic-file-updates_2021-03-25]  Darrick J. Wong
Add the atomic swapext feature to the set of features that we will permit. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25  xfs: make atomic extent swapping support realtime files  Darrick J. Wong
Now that bmap items support the realtime device, we can add the necessary pieces to the atomic extent swapping code to support such things. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25  xfs: condense directories after an atomic swap  Darrick J. Wong
The previous commit added a new swapext flag that enables us to perform post-swap processing on file2 once we're done swapping the extent maps. Now add this ability for directories. This isn't used anywhere right now, but we need to have the basic ondisk flags in place so that a future online directory repair feature can create salvaged dirents in a temporary directory and swap the data forks when ready. If one file is in extents format and the other is inline, we will have to promote both to extents format to perform the swap. After the swap, we can try to condense the fixed directory down to inline format if possible. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25  xfs: condense extended attributes after an atomic swap  Darrick J. Wong
Add a new swapext flag that enables us to perform post-swap processing on file2 once we're done swapping the extent maps. If we were swapping the extended attributes, we want to be able to convert file2's attr fork from block to inline format. This isn't used anywhere right now, but we need to have the basic ondisk flags in place so that a future online xattr repair feature can create salvaged attrs in a temporary file and swap the attr forks when ready. If one file is in extents format and the other is inline, we will have to promote both to extents format to perform the swap. After the swap, we can try to condense the fixed file's attr fork back down to inline format if possible. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
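The promote, swap, condense sequence described in this message can be sketched in C roughly as follows. This is only an illustration under stated assumptions: `struct fork`, `INLINE_MAX`, and the `fork_*` helpers are hypothetical names invented here, not the patch's actual data structures or functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fork formats; the real code distinguishes more cases. */
enum fork_format { FORK_INLINE, FORK_EXTENTS };

/* Hypothetical stand-in for one file's attr (or directory data) fork. */
struct fork {
	enum fork_format format;
	size_t           bytes;   /* size of the payload held by the fork */
};

/* Assumed threshold below which a fork fits inline in the inode. */
#define INLINE_MAX 64

/* Promote an inline fork so that it has an extent map to swap. */
static void fork_promote(struct fork *f)
{
	if (f->format == FORK_INLINE)
		f->format = FORK_EXTENTS;
}

/* Try to shrink a fork back to inline format; true if it was condensed. */
static bool fork_condense(struct fork *f)
{
	if (f->format == FORK_EXTENTS && f->bytes <= INLINE_MAX) {
		f->format = FORK_INLINE;
		return true;
	}
	return false;
}

/*
 * Swap two forks. Both are promoted first so that the swap only ever
 * deals with extent maps. If the caller asks for post-swap processing,
 * file2 (the file being fixed) is condensed afterwards when possible.
 */
static void fork_swap(struct fork *file1, struct fork *file2,
		      bool condense_file2)
{
	struct fork tmp;

	assert(file1 != file2);
	fork_promote(file1);
	fork_promote(file2);

	tmp = *file1;
	*file1 = *file2;
	*file2 = tmp;

	if (condense_file2)
		fork_condense(file2);
}
```

In this model, swapping a small inline fork from a temporary file into a file whose fork is in extents format leaves the fixed file inline again, while the temporary file keeps the old extents-format contents.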
2021-03-25  xfs: remove old swap extents implementation  Darrick J. Wong
Migrate the old XFS_IOC_SWAPEXT implementation to use our shiny new one. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25  xfs: allow xfs_swap_range to use older extent swap algorithms  Darrick J. Wong
If userspace permits non-atomic swap operations, use the older code paths to implement the same functionality. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25  xfs: refactor reflink flag handling in xfs_swap_extent_forks  Darrick J. Wong
Refactor the old data fork swap function to use the new reflink flag helpers to propagate reflink flags between the two files. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25  xfs: consolidate all of the xfs_swap_extent_forks code  Darrick J. Wong
Consolidate the bmbt owner change scan code in xfs_swap_extent_forks, since it's not needed for the deferred bmap log item swapext implementation. The goal is to package up all three implementations into functions that have the same preconditions and leave the system in the same state. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25  xfs: port xfs_swap_extents_rmap to our new code  Darrick J. Wong
The inner loop of xfs_swap_extents_rmap does the same work as xfs_swapext_finish_one, so adapt it to use that. Doing so has the side benefit that the older code path no longer wastes its time remapping shared extents. This forms the basis of the non-atomic swaprange implementation. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25  xfs: add error injection to test swapext recovery  Darrick J. Wong
Add an errortag so that we can test recovery of swapext log items. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25  xfs: add a ->xchg_file_range handler  Darrick J. Wong
Add a function to handle file range exchange requests from the vfs. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25  xfs: create deferred log items for extent swapping  Darrick J. Wong
Now that we've created the skeleton of a log intent item to track and restart extent swap operations, add the upper level logic to commit intent items and turn them into concrete work recorded in the log. We use the deferred item "multihop" feature that was introduced a few patches ago to constrain the number of active swap operations to one per thread. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25  xfs: introduce a swap-extent log intent item  Darrick J. Wong
Introduce a new intent log item to handle swapping extents. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25  xfs: create a log incompat flag for atomic extent swapping  Darrick J. Wong
Create a log incompat flag so that we only attempt to process swap extent log items if the filesystem supports it. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25  xfs: clear log incompat feature bits when the log is idle  Darrick J. Wong
When there are no ongoing transactions and the log contents have been checkpointed back into the filesystem, the log performs 'covering', which is to say that it logs a dummy transaction to record the fact that the tail has caught up with the head. This is a good time to clear log incompat feature flags, because they are flags that are temporarily set to limit the range of kernels that can replay a dirty log. Since it's possible that some other higher level thread is about to start logging items protected by a log incompat flag, we create a rwsem so that upper level threads can coordinate this with the log. It would probably be more performant to use a percpu rwsem, but the ability to /try/ taking the write lock during covering is critical, and percpu rwsems do not provide that. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
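The coordination described here can be modeled, very loosely, with a reader count in place of a real rwsem. The following single-threaded C sketch is an assumption-laden illustration: `struct log_model`, `LOG_INCOMPAT_SWAPEXT`, and the three functions are hypothetical names, and a real implementation would use an actual reader/writer semaphore with a write-trylock rather than a plain counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bit; the real flag names live in the XFS headers. */
#define LOG_INCOMPAT_SWAPEXT (1u << 0)

/*
 * Single-threaded model of the coordination. "users" stands in for the
 * read side of the rwsem: each holder may be logging protected items.
 */
struct log_model {
	int      users;
	uint32_t incompat;
};

/* An upper-level thread announces that it will log protected items. */
static void log_use_incompat(struct log_model *log, uint32_t flag)
{
	log->users++;                /* models taking the read lock */
	log->incompat |= flag;
}

/* The upper-level thread is done logging protected items. */
static void log_drop_incompat(struct log_model *log)
{
	assert(log->users > 0);
	log->users--;                /* models releasing the read lock */
}

/*
 * Called when the log covers itself. The flags are cleared only if
 * nobody holds the read side; otherwise covering proceeds without
 * clearing them and a later covering pass can try again.
 */
static bool log_cover_try_clear(struct log_model *log)
{
	if (log->users > 0)
		return false;        /* models a failed write-trylock */
	log->incompat = 0;
	return true;
}
```

The point of the try-lock is visible in `log_cover_try_clear`: covering never waits for upper-level threads, it simply skips the clearing step when one of them is active.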
2021-03-25  xfs: allow setting and clearing of log incompat feature flags  Darrick J. Wong
Log incompat feature flags in the superblock exist for one purpose: to protect the contents of a dirty log from replay on a kernel that isn't prepared to handle those dirty contents. This means that they can be cleared if (a) we know the log is clean and (b) we know that there aren't any other threads in the system that might be setting or relying upon a log incompat flag. Therefore, clear the log incompat flags when we've finished recovering the log, when we're unmounting cleanly, remounting read-only, or freezing; and provide a function so that subsequent patches can start using this. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25  xfs: support two inodes in the defer capture structure  Darrick J. Wong
Make it so that xfs_defer_ops_capture_and_commit can capture two inodes. This will be needed by the atomic extent swap log item so that it can recover an operation involving two inodes. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25  vfs: introduce new file range exchange ioctl  Darrick J. Wong
Introduce a new ioctl to handle swapping ranges of bytes between files. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25  xfs: xfs_bmap_finish_one should map unwritten extents properly  [expand-bmap-intent-usage_2021-03-25]  Darrick J. Wong
The deferred bmap work state and the log item can transmit unwritten state, so the XFS_BMAP_MAP handler must map in extents with that unwritten state. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25  xfs: support deferred bmap updates on the attr fork  Darrick J. Wong
The deferred bmap update log item has always supported the attr fork, so plumb this in so that higher layers can access this. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25  xfs: support recovering bmap intent items targetting realtime extents  [realtime-bmap-intents_2021-03-25]  Darrick J. Wong
Now that we have reflink on the realtime device, bmap intent items have to support remapping extents on the realtime volume. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25  xfs: add a realtime flag to the bmap update log redo items  Darrick J. Wong
Extend the bmap update (BUI) log items with a new realtime flag that indicates that the updates apply against a realtime file's data fork. We'll wire up the actual code later. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25  xfs: create a helper to decide if a file mapping targets the rt volume  Darrick J. Wong
Create a helper so that we can stop open-coding this decision everywhere. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25  xfs: fix xfs_bunmapi to allow unmapping of partial rt extents  Darrick J. Wong
When XFS_BMAPI_REMAP is passed to bunmapi, that means that we want to remove part of a block mapping without touching the allocator. For realtime files with rtextsize > 1, that also means that we should skip all the code that changes a partial remove request into an unwritten extent conversion. REMAP callers are of course required to unmap full rt extents, which implies log intents. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25  xfs: repair summary counters  [repair-hard-problems_2021-03-25]  Darrick J. Wong
Use the same summary counter calculation infrastructure to generate new values for the in-core summary counters. The difference between the scrubber and the repairer is that the repairer will freeze the fs during setup, which means that the values should match exactly. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25  xfs: allow rmap repair to grab unlinked inodes  Darrick J. Wong
Permit rmapbt repair to grab unlinked inodes so that we can avoid erroring out on those inodes. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25  xfs: repair the rmapbt  Darrick J. Wong
Rebuild the reverse mapping btree from all primary metadata. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25  xfs: ask to freeze if fscounters scrubber fails  Darrick J. Wong
If the fscounters scrubber notices incorrect summary counters, it's entirely possible that scrub is simply racing with other threads that are updating the incore counters. Therefore, if there's a mismatch and the fs isn't frozen, ask userspace if we can freeze the fs to eliminate the race condition. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
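The decision this message describes is small enough to sketch as a pure function. The C below is a hypothetical model only: `enum scrub_outcome` and `scrub_fscounters` are names made up for illustration, and the real scrubber compares several counters with tolerances rather than a single pair of values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcomes of one fscounters check. */
enum scrub_outcome {
	SCRUB_CLEAN,        /* computed and incore values agree */
	SCRUB_WANT_FREEZE,  /* mismatch may be a race; retry while frozen */
	SCRUB_CORRUPT       /* mismatch seen with the fs frozen */
};

/*
 * Compare a freshly computed summary counter against the incore value.
 * A mismatch is only trusted when the filesystem is frozen, because
 * otherwise other threads may be updating the incore counter.
 */
static enum scrub_outcome scrub_fscounters(uint64_t computed,
					   uint64_t incore, bool frozen)
{
	if (computed == incore)
		return SCRUB_CLEAN;
	if (!frozen)
		return SCRUB_WANT_FREEZE;
	return SCRUB_CORRUPT;
}
```

A caller would treat `SCRUB_WANT_FREEZE` as a request to ask userspace for permission to freeze and then run the check again.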
2021-03-25  xfs: introduce online scrub freeze  Darrick J. Wong
Introduce a new 'online scrub freeze' that we can use to lock out all filesystem modifications and background activity so that we can perform global scans in order to rebuild metadata. This introduces a new IFLAG to the scrub ioctl to indicate that userspace is willing to allow a freeze. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25  xfs: update health status if we get a clean bill of health  [indirect-health-reporting_2021-03-25]  Darrick J. Wong
If scrub finds that everything is ok with the filesystem, we need a way to tell the health tracking that it can let go of indirect health flags, since indirect flags only mean that at some point in the past we lost some context. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25  xfs: remember sick inodes that get inactivated  Darrick J. Wong
If an unhealthy inode gets inactivated, remember this fact in the per-fs health summary. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25  xfs: add secondary and indirect classes to the health tracking system  Darrick J. Wong
Establish two more classes of health tracking bits:
 * Indirect problems, which suggest problems in other health domains that we weren't able to preserve; and
 * Secondary problems, which track state that's related to primary evidence of health problems.
The first class we'll use in an upcoming patch to record in the AG health status the fact that we ran out of memory and had to inactivate an inode with defective metadata. The second class we use to indicate that repair knows that an inode is bad and we need to fix it later. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
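One way to picture the three classes of health bits is as disjoint masks within a single flags word. The C sketch below assumes a made-up layout: the mask values, the `SICK_*` bit names, and the helper functions are all hypothetical and do not correspond to the real XFS health flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: one byte of flags per class. */
#define HEALTH_PRIMARY_MASK    0x0000ffu
#define HEALTH_SECONDARY_MASK  0x00ff00u
#define HEALTH_INDIRECT_MASK   0xff0000u
#define HEALTH_ALL_MASK        0xffffffu

/* Illustrative bits only, not the real flag values. */
#define SICK_INODE_CORE        0x000001u  /* primary: bad metadata seen */
#define SICK_INODE_FIX_LATER   0x000100u  /* secondary: repair pending */
#define SICK_INODE_FORGOTTEN   0x010000u  /* indirect: context was lost */

struct health {
	uint32_t sick;
};

/* Record one or more problems. */
static void health_mark(struct health *h, uint32_t bits)
{
	assert((bits & ~HEALTH_ALL_MASK) == 0);
	h->sick |= bits;
}

/*
 * Drop the indirect flags, as a clean scrub would: they only say that
 * context was lost at some point, not that anything is bad right now.
 */
static void health_clear_indirect(struct health *h)
{
	h->sick &= ~HEALTH_INDIRECT_MASK;
}

/* True if any bit in the given class mask is set. */
static bool health_has(const struct health *h, uint32_t class_mask)
{
	return (h->sick & class_mask) != 0;
}
```

Keeping the classes in separate masks lets a clean scrub release indirect state without touching the secondary bits that repair still relies on.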
2021-03-25  xfs: push inactive inodes if the quotacheck scrubber hits them  [deferred-inactivation_2021-03-25]  Darrick J. Wong
If the online quotacheck code encounters an inode that is awaiting inactivation, it won't be able to iget the inode. Push inode gc in the respective AG to try to clear the inode, and try again. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25  xfs: don't run speculative preallocation gc when fs is frozen  Darrick J. Wong
Now that we have the infrastructure to switch background workers on and off at will, fix the block gc worker code so that we don't actually run the worker when the filesystem is frozen, same as we do for deferred inactivation. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25  xfs: add inode scan limits to the eofblocks ioctl  Darrick J. Wong
Allow callers of the userspace eofblocks ioctl to set a limit on the number of inodes to scan, and then plumb that through the interface. This removes a minor wart from the internal inode walk interface. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25  xfs: create a polled function to force inode inactivation  Darrick J. Wong
Create a polled version of xfs_inactive_force so that we can force inactivation while holding a lock (usually the umount lock) without tripping over the softlockup timer. This is for callers that hold vfs locks while calling inactivation, which is currently unmount, iunlink processing during mount, and rw->ro remount. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25  xfs: parallelize inode inactivation  Darrick J. Wong
Split the inode inactivation work into per-AG work items so that we can take advantage of parallelization. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25  xfs: force inode garbage collection before fallocate when space is low  Darrick J. Wong
Generally speaking, when a user calls fallocate, they're looking to preallocate space in a file in the largest contiguous chunks possible. If free space is low, it's possible that the free space will look unnecessarily fragmented because there are unlinked inodes that are holding on to space that we could allocate. When this happens, fallocate makes suboptimal allocation decisions for the sake of deleted files, which doesn't make much sense, so scan the filesystem for dead items to delete to try to avoid this. Note that there are a handful of fstests that fill a filesystem, delete just enough files to allow a single large allocation, and check that fallocate actually gets the allocation. These tests regress because the test runs fallocate before the inode gc has a chance to run, so add this behavior to maintain as much of the old behavior as possible. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25  xfs: force inode inactivation and retry fs writes when there isn't space  Darrick J. Wong
Any time we try to modify a file's contents and it fails due to ENOSPC or EDQUOT, force inode inactivation work to try to free space. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
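The retry pattern in this message can be sketched against a toy filesystem model. Everything here is hypothetical: `struct fake_fs`, `fake_write`, and `fake_force_inactivation` are stand-ins invented for illustration, and the sketch assumes a single retry, which the message does not specify.

```c
#include <assert.h>
#include <errno.h>

/* Toy model: space that is free now, and space held by dead inodes. */
struct fake_fs {
	long free_blocks;
	long deferred_blocks;   /* freed only once inactivation runs */
};

/* Attempt an allocation; fails with -ENOSPC if there is not enough room. */
static int fake_write(struct fake_fs *fs, long need)
{
	assert(need >= 0);
	if (fs->free_blocks < need)
		return -ENOSPC;
	fs->free_blocks -= need;
	return 0;
}

/* Run the deferred inactivation work now, releasing the held space. */
static void fake_force_inactivation(struct fake_fs *fs)
{
	fs->free_blocks += fs->deferred_blocks;
	fs->deferred_blocks = 0;
}

/*
 * Try the write; on ENOSPC or EDQUOT, force inactivation to reclaim
 * space held by unlinked inodes and try once more (assumed policy).
 */
static int write_with_retry(struct fake_fs *fs, long need)
{
	int error = fake_write(fs, need);

	if (error == -ENOSPC || error == -EDQUOT) {
		fake_force_inactivation(fs);
		error = fake_write(fs, need);
	}
	return error;
}
```

If the forced inactivation does not free enough space, the second attempt fails with the same error and the caller sees it unchanged.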
2021-03-25  xfs: expose sysfs knob to control inode inactivation delay  Darrick J. Wong
Allow administrators to control how long we defer inode inactivation. By default we'll set the delay to 2 seconds, as an arbitrary choice between allowing for some batching of a deltree operation, and not letting too many inodes pile up in memory. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25  xfs: deferred inode inactivation  Darrick J. Wong
Instead of calling xfs_inactive directly from xfs_fs_destroy_inode, defer the inactivation phase to a separate workqueue. With this we avoid blocking memory reclaim on filesystem metadata updates that are necessary to free an in-core inode, such as post-eof block freeing, COW staging extent freeing, and truncating and freeing unlinked inodes. Now that work is deferred to a workqueue where we can do the freeing in batches. We introduce two new inode flags -- NEEDS_INACTIVE and INACTIVATING. The first flag helps our worker find inodes needing inactivation, and the second flag marks inodes that are in the process of being inactivated. A concurrent xfs_iget on the inode can still resurrect the inode by clearing NEEDS_INACTIVE (or bailing if INACTIVATING is set). Unfortunately, deferring the inactivation has one huge downside -- eventual consistency. Since all the freeing is deferred to a worker thread, one can rm a file but the space doesn't come back immediately. This can cause some odd side effects with quota accounting and statfs, so we also force inactivation scans in order to maintain the existing behaviors, at least outwardly. For this patch we'll set the delay to zero to mimic the old timing as much as possible; in the next patch we'll play with different delay settings. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
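The interplay of the two flags can be sketched as a small state machine. This C model is illustrative only: the flag values, `struct inode_model`, and the function names are hypothetical, and two details are assumptions not stated in the message, namely that the worker clears NEEDS_INACTIVE when it sets INACTIVATING, and that a bailing lookup reports -EAGAIN.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical bit values for the two flags named in the message. */
#define NEEDS_INACTIVE (1u << 0)
#define INACTIVATING   (1u << 1)

struct inode_model {
	unsigned int flags;
};

/* The VFS destroys the inode: queue it instead of inactivating now. */
static void inode_destroy(struct inode_model *ip)
{
	ip->flags |= NEEDS_INACTIVE;
}

/*
 * A concurrent lookup. It bails if the worker has already started on
 * this inode (assumed error code); otherwise it resurrects the inode
 * by clearing NEEDS_INACTIVE.
 */
static int inode_iget(struct inode_model *ip)
{
	if (ip->flags & INACTIVATING)
		return -EAGAIN;
	ip->flags &= ~NEEDS_INACTIVE;
	return 0;
}

/* The worker claims an inode; false if there is nothing to do. */
static bool worker_begin(struct inode_model *ip)
{
	if (!(ip->flags & NEEDS_INACTIVE))
		return false;
	ip->flags &= ~NEEDS_INACTIVE;   /* assumed transition */
	ip->flags |= INACTIVATING;
	return true;
}

/* The worker finished the deferred freeing for this inode. */
static void worker_end(struct inode_model *ip)
{
	assert(ip->flags & INACTIVATING);
	ip->flags &= ~INACTIVATING;
}
```

Whichever of `inode_iget` and `worker_begin` reaches a queued inode first wins: a resurrected inode is skipped by the worker, and an inode the worker has claimed cannot be resurrected until the work completes.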
2021-03-25  xfs: refactor the inode recycling code  Darrick J. Wong
Hoist the code in xfs_iget_cache_hit that restores the VFS inode state to an xfs_inode that was previously vfs-destroyed. The next patch will add a new set of state flags, so we need the helper to avoid duplication. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25  xfs: refactor per-AG inode tagging functions  Darrick J. Wong
In preparation for adding another incore inode tree tag, refactor the code that sets and clears tags from the per-AG inode tree and the tree of per-AG structures, and remove the open-coded versions used by the blockgc code. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25  xfs: merge xfs_reclaim_inodes_ag into xfs_inode_walk_ag  Darrick J. Wong
Merge these two inode walk loops together, since they're pretty similar now. Get rid of XFS_ICI_NO_TAG since nobody uses it. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25  xfs: pass struct xfs_eofblocks to the inode scan callback  Darrick J. Wong
Pass a pointer to the actual eofb structure around the inode scanner functions instead of a void pointer, now that none of the functions is used as a callback. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25  xfs: remove indirect calls from xfs_inode_walk{,_ag}  Darrick J. Wong
It turns out that there is a 1:1 mapping between the execute and tag parameters that are passed to xfs_inode_walk_ag:

    xfs_blockgc_scan_inode <=> XFS_ICI_BLOCKGC_TAG

Since the only user of the inode walk function is the blockgc code, we don't need the tag parameter or the execute function pointer. The inode deferred inactivation changes in the next series will add a second tag:function pair, so we'll leave the tag parameter for now. For the price of a forward static declaration, we can eliminate the indirect function call. This likely has a negligible impact on performance (since the execute function runs transactions), but it also simplifies the function signature. Radix tree tags are unsigned ints, so fix the type usage for all those tags. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
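Replacing a function-pointer parameter with a dispatch on the tag might look roughly like the C sketch below. The names are hypothetical stand-ins (`ICI_BLOCKGC_TAG`, `inode_walk`, `blockgc_scan_inode`, and `struct inode_model` are not the kernel's definitions), and the walk is reduced to a loop over an array.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical tag value standing in for XFS_ICI_BLOCKGC_TAG. */
#define ICI_BLOCKGC_TAG 1u

struct inode_model {
	int blockgc_runs;   /* how many times the scan visited this inode */
};

/* Forward static declaration: the "price" paid for the direct call. */
static int blockgc_scan_inode(struct inode_model *ip);

/*
 * Walk the inodes and run the function that belongs to the tag. The
 * tag alone selects the work, so no execute pointer is passed in, and
 * a second tag:function pair would just be another case label.
 */
static int inode_walk(struct inode_model *inodes, size_t count,
		      unsigned int tag)
{
	size_t i;
	int error;

	for (i = 0; i < count; i++) {
		switch (tag) {
		case ICI_BLOCKGC_TAG:
			error = blockgc_scan_inode(&inodes[i]);
			break;
		default:
			return -EINVAL;
		}
		if (error)
			return error;
	}
	return 0;
}

/* Stand-in for the real per-inode block gc work. */
static int blockgc_scan_inode(struct inode_model *ip)
{
	ip->blockgc_runs++;
	return 0;
}
```

Because the callee is named directly in the switch, the compiler sees an ordinary direct call rather than an indirect one through a pointer.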
2021-03-25  xfs: remove iter_flags parameter from xfs_inode_walk_*  Darrick J. Wong
The sole iter_flags is XFS_INODE_WALK_INEW_WAIT, and there are no users. Remove the flag, and the parameter, and all the code that used it. Since there are no longer any external callers of xfs_inode_walk, make it static. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25  xfs: use s_inodes in xfs_qm_dqrele_all_inodes  Christoph Hellwig
Using xfs_inode_walk in xfs_qm_dqrele_all_inodes is complete overkill, given that the function simply wants to iterate all live inodes known to the VFS. Just iterate over the s_inodes list. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25  xfs: move the check for post-EOF mappings into xfs_can_free_eofblocks  Darrick J. Wong
Fix the weird split of responsibilities between xfs_can_free_eofblocks and xfs_free_eofblocks by moving the chunk of code that looks for any actual post-EOF space mappings from the second function into the first. This clears the way for deferred inode inactivation to be able to decide if an inode needs inactivation work before committing the released inode to the inactivation code paths (vs. marking it for reclaim). Signed-off-by: Darrick J. Wong <djwong@kernel.org>
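The kind of check being moved can be sketched as a predicate over a file's block mappings. This C model covers only the "is anything mapped past EOF" question; `struct mapping` and `has_post_eof_mappings` are hypothetical names, and the real function applies further conditions that are not modeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent record: a run of file blocks that are mapped. */
struct mapping {
	uint64_t start_block;   /* first file block covered */
	uint64_t block_count;   /* number of blocks covered */
};

/*
 * Return true if any mapping extends past the block containing EOF.
 * Deciding this up front tells the caller whether there is any
 * post-EOF space worth freeing before it commits to doing the work.
 */
static bool has_post_eof_mappings(uint64_t size_bytes, uint32_t block_size,
				  const struct mapping *maps, size_t nmaps)
{
	uint64_t eof_block;
	size_t i;

	assert(block_size > 0);

	/* First file block that lies entirely beyond the file size. */
	eof_block = (size_bytes + block_size - 1) / block_size;

	for (i = 0; i < nmaps; i++) {
		if (maps[i].block_count > 0 &&
		    maps[i].start_block + maps[i].block_count > eof_block)
			return true;
	}
	return false;
}
```

With 4096-byte blocks, a file of 8192 bytes that has six blocks mapped reports post-EOF space, while the same mappings under a 24576-byte file do not.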
2021-03-25  xfs: move the xfs_can_free_eofblocks call under the IOLOCK  Darrick J. Wong
In xfs_inode_free_eofblocks, move the xfs_can_free_eofblocks call further down in the function to the point where we have taken the IOLOCK. This is preparation for the next patch, where we will need that lock (or equivalent) so that we can check if there are any post-eof blocks to clean out. Signed-off-by: Darrick J. Wong <djwong@kernel.org>