summaryrefslogtreecommitdiff
path: root/fs/xfs/scrub/trace.h
AgeCommit message (Collapse)Author
2022-10-14xfs: scrub each rtgroup's portion of the rtbitmap separatelyDarrick J. Wong
Create a new scrub type code so that userspace can scrub each rtgroup's portion of the rtbitmap file separately. This reduces the long tail latency that results from scanning the entire bitmap all at once, and prepares us for future patchsets, wherein we'll need to be able to lock a specific rtgroup so that we can rebuild that rtgroup's part of the rtbitmap contents from the rtgroup's rmap btree. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14xfs: scrub the realtime group superblockDarrick J. Wong
Enable scrubbing of realtime group superblocks. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14xfs: use accessor functions for summary info wordsrefactor-rtbitmap-macros_2022-10-14Darrick J. Wong
Create get and set functions for rtsummary words so that we can redefine the ondisk format with a specific endianness. Note that this requires the definition of a distinct type for ondisk summary info words so that the compiler can perform proper typechecking. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14xfs: convert rt extent numbers to xfs_rtxnum_tclean-up-realtime-units_2022-10-14Darrick J. Wong
Further disambiguate the xfs_rtblock_t uses by creating a new type, xfs_rtxnum_t, to store the position of an extent within the realtime section, in units of rtextents. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14xfs: convert rt bitmap extent lengths to xfs_rtbxlen_tDarrick J. Wong
XFS uses xfs_rtblock_t for many different uses, which makes it much more difficult to perform a unit analysis on the codebase. One of these (ab)uses is when we need to store the length of a free space extent as stored in the realtime bitmap. Because there can be up to 2^64 realtime extents in a filesystem, we need a new type that is larger than xfs_rtxlen_t for callers that are querying the bitmap directly. This means scrub and growfs. Create this type as "xfs_rtbxlen_t" and use it to store 64-bit rtx lengths. 'b' stands for 'bitmap' or 'big'; reader's choice. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14xfs: repair AGI unlinked inode bucket listsrepair-iunlink_2022-10-14Darrick J. Wong
Teach the AGI repair code to rebuild the unlinked buckets and lists. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14xfs: online repair of symbolic linksrepair-symlink_2022-10-14Darrick J. Wong
If a symbolic link target looks bad, try to sift through the rubble to find as much of the target buffer that we can, and stage a new target (short or remote format as needed) in a temporary file and use the atomic extent swapping mechanism to commit the results. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14xfs: ensure dentry consistency when the orphanage adopts a filerepair-orphanage_2022-10-14Darrick J. Wong
When the orphanage adopts a file, that file becomes a child of the orphanage. The dentry cache may have entries for the orphanage directory and the name we've chosen, so (1) make sure we abort if the dcache has a positive entry because something's not right; and (2) invalidate and purge negative dentries if the adoption goes through. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14xfs: move files to orphanage instead of letting nlinks drop to zeroDarrick J. Wong
If we encounter an inode with a nonzero link count but zero observed links, move it to the orphanage. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14xfs: move orphan files to the orphanageDarrick J. Wong
If we can't find a parent for a file, move it to the orphanage. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14xfs: ask the dentry cache if it knows the parent of a directoryrepair-dirs_2022-10-14Darrick J. Wong
It's possible that the dentry cache can tell us the parent of a directory. Therefore, when repairing directory dot dot entries, query the dcache as a last resort before scanning the entire filesystem. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14xfs: online repair of parent pointersDarrick J. Wong
Teach the online repair code to fix directory '..' entries (aka directory parent pointers). Since this requires us to know how to scan every dirent in every directory on the filesystem, we can reuse the parent scanner components to validate (or find!) the correct parent entry when rebuilding directories too. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14xfs: online repair of directoriesDarrick J. Wong
If a directory looks like it's in bad shape, try to sift through the rubble to find whatever directory entries we can, scan the directory tree for the parent (if needed), stage the new directory contents in a temporary file and use the atomic extent swapping mechanism to commit the results in bulk. As a side effect of this patch, directory inactivation will be able to purge any leftover dir blocks. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14xfs: scrub should set preen if attr leaf has holesDarrick J. Wong
If an attr block indicates that it could use compaction, set the preen flag to have the attr fork rebuilt, since the attr fork rebuilder can take care of that for us. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-14xfs: repair extended attributesDarrick J. Wong
If the extended attributes look bad, try to sift through the rubble to find whatever keys/values we can, stage a new attribute structure in a temporary file and use the atomic extent swapping mechanism to commit the results in bulk. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14xfs: teach the tempfile to support atomic extent swappingDarrick J. Wong
Create some new routines to exchange the contents of a temporary file created to stage a repair with another ondisk file. This will be used by the realtime summary repair function to commit atomically the new rtsummary data, which will be staged in the tempfile. The rest of XFS coordinates access to the realtime metadata inodes solely through the ILOCK. For repair to hold its exclusive access to the realtime summary file, it has to allocate a single large transaction and roll it repeatedly throughout the repair while holding the ILOCK. In turn, this means that for now there's only a partial swapext implementation for the temporary file, because we can only work within an existing transaction. Hence the only tempswap functions needed here are to estimate the resource requirements of swapext between, reserve more space/quota to an existing transaction, and kick off the actual swap. The rest will be added in a later patch in preparation for repairing xattrs and directories. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14xfs: support preallocating and copying content into temporary filesDarrick J. Wong
Create the routines we need to preallocate space in a temporary ondisk file and then copy the contents of an xfile into the tempfile. The upcoming rtsummary repair feature will construct the contents of a realtime summary file in memory, after which it will want to copy all that into the ondisk temporary file before atomically committing the new rtsummary contents. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14xfs: add the ability to reap entire inode forksrepair-tempfiles_2022-10-14Darrick J. Wong
In preparation for supporting repair of indexed file-based metadata (such as realtime bitmaps, directories, and extended attribute data), add a function to reap the old blocks after a metadata repair finishes. IOWs, this is an elaborate bunmapi call that deals with crosslinked blocks by unmapping them without freeing them, and also scans for incore buffers to invalidate. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14xfs: create temporary files and directories for online repairDarrick J. Wong
Teach the online repair code how to create temporary files or directories. These temporary files can be used to stage reconstructed information until we're ready to perform an atomic extent swap to commit the new metadata. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14xfs: hook live rmap operations during a repair operationrepair-rmap-btree_2022-10-14Darrick J. Wong
Hook the regular rmap code when an rmapbt repair operation is running so that we can unlock the AGF buffer to scan the filesystem and keep the in-memory btree up to date during the scan. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14xfs: repair the rmapbtDarrick J. Wong
Rebuild the reverse mapping btree from all primary metadata. This first patch establishes the bare mechanics of finding records and putting together a new ondisk tree; more complex pieces are needed to make it work properly. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14xfs: connect in-memory btrees to xfilesin-memory-btrees_2022-10-14Darrick J. Wong
Add to our stubbed-out in-memory btrees the ability to connect them with an actual in-memory backing file (aka xfiles) and the necessary pieces to track free space in the xfile and flush dirty xfbtree buffers on demand, which we'll need for online repair. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14xfs: repair summary countersrepair-fscounters_2022-10-14Darrick J. Wong
Use the same summary counter calculation infrastructure to generate new values for the in-core summary counters. The difference between the scrubber and the repairer is that the repairer will freeze the fs during setup, which means that the values should match exactly. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14xfs: remove XCHK_REAPING_DISABLED from scrubDarrick J. Wong
Nobody uses this code anymore, so get rid of it. It was racy with regards to freezes and remounts anyway. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14xfs: stabilize fs summary counters for online fsckDarrick J. Wong
If the fscounters scrubber notices incorrect summary counters, it's entirely possible that scrub is simply racing with other threads that are updating the incore counters. There isn't a good way to stabilize percpu counters or set ourselves up to observe live updates with hooks like we do for the quotacheck or nlinks scanners, so we instead choose to freeze the filesystem long enough to walk the incore per-AG structures. Past me thought that it was going to be commonplace to have to freeze the filesystem to perform some kind of repair and set up a whole separate infrastructure to freeze the filesystem in such a way that userspace could not unfreeze while we were running. This involved adding a mutex and freeze_super/thaw_super functions and dealing with the fact that the VFS freeze/thaw functions can free the VFS superblock references on return. This was all very overwrought, since fscounters turned out to be the only user of scrub freezes, and it doesn't require the log to quiesce, only the incore superblock counters. We prevent other threads from changing the freeze level by adding a new SB_FREEZE_EXCLUSIVE level. The end result is that fscounters should be much more efficient. When we're checking a busy system and we can't stabilize the counters, the custom freeze will do less work, which should result in less downtime. Repair should be similarly speedy, but that's in the next patch. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14xfs: update health status if we get a clean bill of healthindirect-health-reporting_2022-10-14Darrick J. Wong
If scrub finds that everything is ok with the filesystem, we need a way to tell the health tracking that it can let go of indirect health flags, since indirect flags only mean that at some point in the past we lost some context. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14xfs: teach repair to fix file nlinksscrub-nlinks_2022-10-14Darrick J. Wong
Fix the nlinks now too. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14xfs: track file link count updates during live nlinks fsckDarrick J. Wong
Create the necessary hooks in the file create/unlink/rename code so that our live nlink scrub code can stay up to date with the rest of the filesystem. This will be the means to keep our shadow link count information up to date while the scan runs in real time. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14xfs: teach scrub to check file nlinksDarrick J. Wong
Create the necessary scrub code to walk the filesystem's directory tree so that we can compute file link counts. Similar to quotacheck, we create an incore shadow array of link count information and then we walk the filesystem a second time to compare the link counts. We need live updates to keep the information up to date during the lengthy scan, so this scrubber remains disabled until the next patch. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14xfs: repair dquots based on live quotacheck resultsrepair-quotacheck_2022-10-14Darrick J. Wong
Use the shadow quota counters that live quotacheck creates to reset the incore dquot counters. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14xfs: track quota updates during live quotacheckDarrick J. Wong
Create a shadow dqtrx system in the quotacheck code that hooks the regular dquot counter update code. This will be the means to keep our copy of the dquot counters up to date while the scan runs in real time. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14xfs: implement live quotacheck inode scanDarrick J. Wong
Create a new trio of scrub functions to check quota counters. While the dquots themselves are filesystem metadata and should be checked early, the dquot counter values are computed from other metadata and are therefore summary counters. We don't plug these into the scrub dispatch just yet, because we still need to be able to watch quota updates while doing our scan. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14xfs: implement live inode scan for scrubDarrick J. Wong
This patch implements a live file scanner for online fsck functions that require the ability to walk a filesystem to gather metadata records and stay informed about metadata changes to files that have already been visited. The iscan structure consists of two inode number cursors: one to track which inode we want to visit next, and a second one to track which inodes have already been visited. This second cursor is key to capturing live updates to files previously scanned while the main thread continues scanning -- any inode greater than this value hasn't been scanned and can go on its way; any other update must be incorporated into the collected data. It is critical for the scanning thraad to hold exclusive access on the inode until after marking the inode visited. This new code is split out as a separate patch from its initial user for the sake of enabling the author to move patches around his tree with ease. The intended usage model for this code is roughly: xchk_iscan_start(iscan, 0, 0); while ((error = xchk_iscan_iter(sc, iscan, &ip)) == 1) { xfs_ilock(ip, ...); /* capture inode metadata */ xchk_iscan_mark_visited(iscan, ip); xfs_iunlock(ip, ...); xfs_irele(ip); } xchk_iscan_stop(iscan); if (error) return error; Hook functions for live updates can then do: if (xchk_iscan_want_live_update(...)) /* update the captured inode metadata */ Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14xfs: repair quotasrepair-quota_2022-10-14Darrick J. Wong
Fix anything that causes the quota verifiers to fail. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14xfs: repair problems in CoW forksrepair-file-mappings_2022-10-14Darrick J. Wong
Try to repair errors that we see in file CoW forks so that we don't do stupid things like remap garbage into a file. There's not a lot we can do with the COW fork -- the ondisk metadata record only that the COW staging extents are owned by the refcount btree, which effectively means that we can't reconstruct this incore structure from scratch. Actually, this is even worse -- we can't touch written extents, because those map space that are actively under writeback, and there's not much to do with delalloc reservations. Hence we can only detect crosslinked unwritten extents and fix them by punching out the problematic parts and replacing them with delalloc extents. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14xfs: repair inode fork block mapping data structuresDarrick J. Wong
Use the reverse-mapping btree information to rebuild an inode block map. Update the btree bulk loading code as necessary to support inode rooted btrees and fix some bitrot problems. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14xfs: repair obviously broken inode modesrepair-inodes_2022-10-14Darrick J. Wong
Building off the rmap scanner that we added in the previous patch, we can now find block 0 and try to use the information contained inside of it to guess the mode of an inode if it's totally improper. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14xfs: zap broken inode forksDarrick J. Wong
Determine if inode fork damage is responsible for the inode being unable to pass the ifork verifiers in xfs_iget and zap the fork contents if this is true. Once this is done the fork will be empty but we'll be able to construct an in-core inode, and a subsequent call to the inode fork repair ioctl will search the rmapbt to rebuild the records that were in the fork. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14xfs: repair inode recordsDarrick J. Wong
If an inode is so badly damaged that it cannot be loaded into the cache, fix the ondisk metadata and try again. If there /is/ a cached inode, fix any problems and apply any optimizations that can be solved incore. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14xfs: repair refcount btreesrepair-ag-btrees_2022-10-14Darrick J. Wong
Reconstruct the refcount data from the rmap btree. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14xfs: repair inode btreesDarrick J. Wong
Use the rmapbt to find inode chunks, query the chunks to compute hole and free masks, and with that information rebuild the inobt and finobt. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14xfs: repair free space btreesDarrick J. Wong
Rebuild the free space btrees from the gaps in the rmap btree. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14xfs: allow userspace to rebuild metadata structuresrepair-force-rebuild_2022-10-14Darrick J. Wong
Add a new (superuser-only) flag to the online metadata repair ioctl to force it to rebuild structures, even if they're not broken. We will use this to move metadata structures out of the way during a free space defragmentation operation. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14xfs: implement online scrubbing of rtsummary infoscrub-rtsummary_2022-10-14Darrick J. Wong
Finish the realtime summary scrubber by adding the functions we need to compute a fresh copy of the rtsummary info and comparing it to the copy on disk. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14xfs: speed up xfarray sort by sorting xfile page contents directlyDarrick J. Wong
If all the records in an xfarray subset live within the same memory page, we can short-circuit even more quicksort recursion by mapping that page into the local CPU and using the kernel's heapsort function to sort the subset. On the author's computer, this reduces the runtime by another 15% on a 500,000 element array. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14xfs: teach xfile to pass back direct-map pages to callerDarrick J. Wong
Certain xfile array operations (such as sorting) can be sped up quite a bit by allowing xfile users to grab a page to bulk-read the records contained within it. Create helper methods to facilitate this. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14xfs: convert xfarray insertion sort to heapsort using scratchpad memoryDarrick J. Wong
In the previous patch, we created a very basic quicksort implementation for xfile arrays. While the use of an alternate sorting algorithm to avoid quicksort recursion on very small subsets reduces the runtime modestly, we could do better than a load and store-heavy insertion sort, particularly since each load and store requires a page mapping lookup in the xfile. For a small increase in kernel memory requirements, we could instead bulk load the xfarray records into memory, use the kernel's existing heapsort implementation to sort the records, and bulk store the memory buffer back into the xfile. On the author's computer, this reduces the runtime by about 5% on a 500,000 element array. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14xfs: enable sorting of xfile-backed arraysDarrick J. Wong
The btree bulk loading code requires that records be provided in the correct record sort order for the given btree type. In general, repair code cannot be required to collect records in order, and it is not feasible to insert new records in the middle of an array to maintain sort order. Implement a sorting algorithm so that we can sort the records just prior to bulk loading. In principle, an xfarray could consume many gigabytes of memory and its backing pages can be sent out to disk at any time. This means that we cannot map the entire array into memory at once, so we must find a way to divide the work into smaller portions (e.g. a page) that /can/ be mapped into memory. Quicksort seems like a reasonable fit for this purpose, since it uses a divide and conquer strategy to keep its average runtime logarithmic. The solution presented here is a port of the glibc implementation, which itself is derived from the median-of-three and tail call recursion strategies outlined by Sedgwick. Subsequent patches will optimize the implementation further by utilizing the kernel's heapsort on directly-mapped memory whenever possible, and improving the quicksort pivot selection algorithm to try to avoid O(n^2) collapses. Note: The sorting functionality gets its own patch because the basic big array mechanisms were plenty for a single code patch. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14xfs: create a big array data structureDarrick J. Wong
Create a simple 'big array' data structure for storage of fixed-size metadata records that will be used to reconstruct a btree index. For repair operations, the most important operations are append, iterate, and sort. Earlier implementations of the big array used linked lists and suffered from severe problems -- pinning all records in kernel memory was not a good idea and frequently lead to OOM situations; random access was very inefficient; and record overhead for the lists was unacceptably high at 40-60%. Therefore, the big memory array relies on the 'xfile' abstraction, which creates a memfd file and stores the records in page cache pages. Since the memfd is created in tmpfs, the memory pages can be pushed out to disk if necessary and we have a built-in usage limit of 50% of physical memory. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14xfs: implement block reservation accounting for btrees we're stagingDarrick J. Wong
Create a new xrep_newbt structure to encapsulate a fake root for creating a staged btree cursor as well as to track all the blocks that we need to reserve in order to build that btree. Signed-off-by: Darrick J. Wong <djwong@kernel.org>