summaryrefslogtreecommitdiff
path: root/fs/xfs
AgeCommit message (Collapse)Author
2021-03-25xfs: don't run speculative preallocation gc when fs is frozenDarrick J. Wong
Now that we have the infrastructure to switch background workers on and off at will, fix the block gc worker code so that we don't actually run the worker when the filesystem is frozen, same as we do for deferred inactivation. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25xfs: add inode scan limits to the eofblocks ioctlDarrick J. Wong
Allow callers of the userspace eofblocks ioctl to set a limit on the number of inodes to scan, and then plumb that through the interface. This removes a minor wart from the internal inode walk interface. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25xfs: create a polled function to force inode inactivationDarrick J. Wong
Create a polled version of xfs_inactive_force so that we can force inactivation while holding a lock (usually the umount lock) without tripping over the softlockup timer. This is for callers that hold vfs locks while calling inactivation, which is currently unmount, iunlink processing during mount, and rw->ro remount. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25xfs: parallelize inode inactivationDarrick J. Wong
Split the inode inactivation work into per-AG work items so that we can take advantage of parallelization. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25xfs: force inode garbage collection before fallocate when space is lowDarrick J. Wong
Generally speaking, when a user calls fallocate, they're looking to preallocate space in a file in the largest contiguous chunks possible. If free space is low, it's possible that the free space will look unnecessarily fragmented because there are unlinked inodes that are holding on to space that we could allocate. When this happens, fallocate makes suboptimal allocation decisions for the sake of deleted files, which doesn't make much sense, so scan the filesystem for dead items to delete to try to avoid this. Note that there are a handful of fstests that fill a filesystem, delete just enough files to allow a single large allocation, and check that fallocate actually gets the allocation. These tests regress because the test runs fallocate before the inode gc has a chance to run, so add this behavior to maintain as much of the old behavior as possible. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25xfs: force inode inactivation and retry fs writes when there isn't spaceDarrick J. Wong
Any time we try to modify a file's contents and it fails due to ENOSPC or EDQUOT, force inode inactivation work to try to free space. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25xfs: expose sysfs knob to control inode inactivation delayDarrick J. Wong
Allow administrators to control the length that we defer inode inactivation. By default we'll set the delay to 2 seconds, as an arbitrary choice between allowing for some batching of a deltree operation, and not letting too many inodes pile up in memory. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25xfs: deferred inode inactivationDarrick J. Wong
Instead of calling xfs_inactive directly from xfs_fs_destroy_inode, defer the inactivation phase to a separate workqueue. With this we avoid blocking memory reclaim on filesystem metadata updates that are necessary to free an in-core inode, such as post-eof block freeing, COW staging extent freeing, and truncating and freeing unlinked inodes. Now that work is deferred to a workqueue where we can do the freeing in batches. We introduce two new inode flags -- NEEDS_INACTIVE and INACTIVATING. The first flag helps our worker find inodes needing inactivation, and the second flag marks inodes that are in the process of being inactivated. A concurrent xfs_iget on the inode can still resurrect the inode by clearing NEEDS_INACTIVE (or bailing if INACTIVATING is set). Unfortunately, deferring the inactivation has one huge downside -- eventual consistency. Since all the freeing is deferred to a worker thread, one can rm a file but the space doesn't come back immediately. This can cause some odd side effects with quota accounting and statfs, so we also force inactivation scans in order to maintain the existing behaviors, at least outwardly. For this patch we'll set the delay to zero to mimic the old timing as much as possible; in the next patch we'll play with different delay settings. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25xfs: refactor the inode recycling codeDarrick J. Wong
Hoist the code in xfs_iget_cache_hit that restores the VFS inode state to an xfs_inode that was previously vfs-destroyed. The next patch will add a new set of state flags, so we need the helper to avoid duplication. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25xfs: refactor per-AG inode tagging functionsDarrick J. Wong
In preparation for adding another incore inode tree tag, refactor the code that sets and clears tags from the per-AG inode tree and the tree of per-AG structures, and remove the open-coded versions used by the blockgc code. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25xfs: merge xfs_reclaim_inodes_ag into xfs_inode_walk_agDarrick J. Wong
Merge these two inode walk loops together, since they're pretty similar now. Get rid of XFS_ICI_NO_TAG since nobody uses it. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25xfs: pass struct xfs_eofblocks to the inode scan callbackDarrick J. Wong
Pass a pointer to the actual eofb structure around the inode scanner functions instead of a void pointer, now that none of the functions is used as a callback. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25xfs: remove indirect calls from xfs_inode_walk{,_ag}Darrick J. Wong
It turns out that there is a 1:1 mapping between the execute and tag parameters that are passed to xfs_inode_walk_ag: xfs_blockgc_scan_inode <=> XFS_ICI_BLOCKGC_TAG Since the only user of the inode walk function is the blockgc code, we don't need the tag parameter or the execute function pointer. The inode deferred inactivation changes in the next series will add a second tag:function pair, so we'll leave the tag parameter for now. For the price of a forward static declaration, we can eliminate the indirect function call. This likely has a negligible impact on performance (since the execute function runs transactions), but it also simplifies the function signature. Radix tree tags are unsigned ints, so fix the type usage for all those tags. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25xfs: remove iter_flags parameter from xfs_inode_walk_*Darrick J. Wong
The sole iter_flags is XFS_INODE_WALK_INEW_WAIT, and there are no users. Remove the flag, and the parameter, and all the code that used it. Since there are no longer any external callers of xfs_inode_walk, make it static. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25xfs: use s_inodes in xfs_qm_dqrele_all_inodesChristoph Hellwig
Using xfs_inode_walk in xfs_qm_dqrele_all_inodes is complete overkill, given that function simplify wants to iterate all live inodes known to the VFS. Just iterate over the s_inodes list. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25xfs: move the check for post-EOF mappings into xfs_can_free_eofblocksDarrick J. Wong
Fix the weird split of responsibilities between xfs_can_free_eofblocks and xfs_free_eofblocks by moving the chunk of code that looks for any actual post-EOF space mappings from the second function into the first. This clears the way for deferred inode inactivation to be able to decide if an inode needs inactivation work before committing the released inode to the inactivation code paths (vs. marking it for reclaim). Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25xfs: move the xfs_can_free_eofblocks call under the IOLOCKDarrick J. Wong
In xfs_inode_free_eofblocks, move the xfs_can_free_eofblocks call further down in the function to the point where we have taken the IOLOCK. This is preparation for the next patch, where we will need that lock (or equivalent) so that we can check if there are any post-eof blocks to clean out. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25xfs: rename the blockgc workqueueDarrick J. Wong
Since we're about to start using the blockgc workqueue to dispose of inactivated inodes, strip the "block" prefix from the name; now it's merely the general garbage collection (gc) workqueue. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2021-03-25xfs: prevent metadata files from being inactivatedDarrick J. Wong
Files containing metadata (quota records, rt bitmap and summary info) are fully managed by the filesystem, which means that all resource cleanup must be explicit, not automatic. This means that they should never be subjected automatic to post-eof truncation, nor should they be freed automatically even if the link count drops to zero. In other words, xfs_inactive() should leave these files alone. Add the necessary predicate functions to make this happen. This adds a second layer of prevention for the kinds of fs corruption that was fixed by commit f4c32e87de7d. If we ever decide to support removing metadata files, we should make all those metadata updates explicit. Rearrange the order of #includes to fix compiler errors, since xfs_mount.h is supposed to be included before xfs_inode.h Followup-to: f4c32e87de7d ("xfs: fix realtime bitmap/summary file truncation when growing rt volume") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2021-03-25xfs: report XFS_CORRUPT_ON errors to the health systemcorruption-health-reports_2021-03-25Darrick J. Wong
Whenever we encounter XFS_CORRUPT_ON failures, we should report that to the health monitoring system for later reporting. I started with this and massaged everything until it built: @@ expression mp, test; @@ - if (XFS_CORRUPT_ON(mp, test)) return -EFSCORRUPTED; + if (XFS_CORRUPT_ON(mp, test)) { xfs_btree_mark_sick(cur); return -EFSCORRUPTED; } @@ expression mp, test; identifier label, error; @@ - if (XFS_CORRUPT_ON(mp, test)) { error = -EFSCORRUPTED; goto label; } + if (XFS_CORRUPT_ON(mp, test)) { xfs_btree_mark_sick(cur); error = -EFSCORRUPTED; goto label; } Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25xfs: report realtime metadata corruption errors to the health systemDarrick J. Wong
Whenever we encounter corrupt realtime metadat blocks, we should report that to the health monitoring system for later reporting. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25xfs: report quota block corruption errors to the health systemDarrick J. Wong
Whenever we encounter corrupt quota blocks, we should report that to the health monitoring system for later reporting. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25xfs: report inode corruption errors to the health systemDarrick J. Wong
Whenever we encounter corrupt inode records, we should report that to the health monitoring system for later reporting. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25xfs: report symlink block corruption errors to the health systemDarrick J. Wong
Whenever we encounter corrupt symbolic link blocks, we should report that to the health monitoring system for later reporting. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25xfs: report dir/attr block corruption errors to the health systemDarrick J. Wong
Whenever we encounter corrupt directory or extended attribute blocks, we should report that to the health monitoring system for later reporting. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25xfs: report btree block corruption errors to the health systemDarrick J. Wong
Whenever we encounter corrupt btree blocks, we should report that to the health monitoring system for later reporting. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25xfs: report block map corruption errors to the health tracking systemDarrick J. Wong
Whenever we encounter a corrupt block mapping, we should report that to the health monitoring system for later reporting. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25xfs: report ag header corruption errors to the health tracking systemDarrick J. Wong
Whenever we encounter a corrupt AG header, we should report that to the health monitoring system for later reporting. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25xfs: separate the marking of sick and checked metadataDarrick J. Wong
Split the setting of the sick and checked masks into separate functions as part of preparing to add the ability for regular runtime fs code (i.e. not scrub) to mark metadata structures sick when corruptions are found. Improve the documentation of libxfs' requirements for helper behavior. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25xfs: repair dquots based on live quotacheck resultsrepair-quota_2021-03-25Darrick J. Wong
Use the shadow quota counters that live quotacheck creates to reset the incore dquot counters. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25xfs: track quota updates during live quotacheckDarrick J. Wong
Create a shadow dqtrx system in the quotacheck code that hooks the regular dquot counter update code. This will be the means to keep our copy of the dquot counters up to date while the scan runs in real time. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25xfs: implement live quotacheck inode scanDarrick J. Wong
Create a new trio of scrub functions to check quota counters. While the dquots themselves are filesystem metadata and should be checked early, the dquot counter values are computed from other metadata and are therefore summary counters. We don't plug these into the scrub dispatch just yet, because we still need to be able to watch quota updates while doing our scan. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25xfs: report the health of quota countsDarrick J. Wong
Report the health of quota counts. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25xfs: repair quotasDarrick J. Wong
Fix anything that causes the quota verifiers to fail. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25xfs: create a new inode fork block unmap helperDarrick J. Wong
Create a new helper to unmap blocks from an inode's fork. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25xfs: repair the inode core and forks of a metadata inodeDarrick J. Wong
Add a helper function to repair the core and forks of a metadata inode, so that we can get move onto the task of repairing higher level metadata that lives in an inode. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25xfs: repair damaged symlinksrepair-inodes_2021-03-25Darrick J. Wong
Repair inconsistent symbolic link data. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25xfs: repair inode block mapsDarrick J. Wong
Use the reverse-mapping btree information to rebuild an inode block map. Update the btree bulk loading code as necessary to support inode rooted btrees and fix some bitrot problems. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25xfs: repair obviously broken inode modesDarrick J. Wong
Building off the rmap scanner that we added in the previous patch, we can now find block 0 and try to use the information contained inside of it to guess the mode of an inode if it's totally improper. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25xfs: zap broken inode forksDarrick J. Wong
Determine if inode fork damage is responsible for the inode being unable to pass the ifork verifiers in xfs_iget and zap the fork contents if this is true. Once this is done the fork will be empty but we'll be able to construct an in-core inode, and a subsequent call to the inode fork repair ioctl will search the rmapbt to rebuild the records that were in the fork. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25xfs: repair inode recordsDarrick J. Wong
Try to reinitialize corrupt inodes, or clear the reflink flag if it's not needed. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25xfs: repair refcount btreesrepair-ag-btrees_2021-03-25Darrick J. Wong
Reconstruct the refcount data from the rmap btree. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25xfs: repair inode btreesDarrick J. Wong
Use the rmapbt to find inode chunks, query the chunks to compute hole and free masks, and with that information rebuild the inobt and finobt. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25xfs: repair free space btreesDarrick J. Wong
Rebuild the free space btrees from the gaps in the rmap btree. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25xfs: always rescan allegedly healthy per-ag metadata after repairDarrick J. Wong
After an online repair function runs for a per-AG metadata structure, sc->sick_mask is supposed to reflect the per-AG metadata that the repair function fixed. Our next move is to re-check the metadata to assess the completeness of our repair, so we don't want the rebuilt structure to be excluded from the rescan just because the health system previously logged a problem with the data structure. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25xfs: implement online scrubbing of rtsummary infoscrub-rtsummary_2021-03-25Darrick J. Wong
Finish the realtime summary scrubber by adding the functions we need to compute a fresh copy of the rtsummary info and comparing it to the copy on disk. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25xfs: move the realtime summary file scrubber to a separate source fileDarrick J. Wong
Move the realtime summary file checking code to a separate file in preparation to actually implement it. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25xfs: get our own reference to inodes that we want to scrubDarrick J. Wong
When we want to scrub a file, get our own reference to the inode unconditionally. This will make disposal rules simpler in the long run. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25xfs: create a big array data structureDarrick J. Wong
Create a simple 'big array' data structure for storage of fixed-size metadata records that will be used to reconstruct a btree index. For repair operations, the most important operations are append, iterate, and sort. Earlier implementations of the big array used linked lists and suffered from severe problems -- pinning all records in kernel memory was not a good idea and frequently lead to OOM situations; random access was very inefficient; and record overhead for the lists was unacceptably high at 40-60%. Therefore, the big memory array relies on the 'xfile' abstraction, which creates a memfd file and stores the records in page cache pages. Since the memfd is created in tmpfs, the memory pages can be pushed out to disk if necessary and we have a built-in usage limit of 50% of physical memory. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25xfs: log EFIs for all btree blocks being used to stage a btreerepair-prep-for-bulk-loading_2021-03-25Darrick J. Wong
We need to log EFIs for every extent that we allocate for the purpose of staging a new btree so that if we fail then the blocks will be freed during log recovery. Add a function to relog the EFIs, so that repair can relog them all every time it creates a new btree block, which will help us to avoid pinning the log tail. Signed-off-by: Darrick J. Wong <djwong@kernel.org>