summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-11-09xfs: update health status if we get a clean bill of healthindirect-health-reporting_2022-11-09Darrick J. Wong
If scrub finds that everything is ok with the filesystem, we need a way to tell the health tracking that it can let go of indirect health flags, since indirect flags only mean that at some point in the past we lost some context. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09xfs: remember sick inodes that get inactivatedDarrick J. Wong
If an unhealthy inode gets inactivated, remember this fact in the per-fs health summary. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09xfs: add secondary and indirect classes to the health tracking systemDarrick J. Wong
Establish two more classes of health tracking bits: * Indirect problems, which suggest problems in other health domains that we weren't able to preserve. * Secondary problems, which track state that's related to primary evidence of health problems; and The first class we'll use in an upcoming patch to record in the AG health status the fact that we ran out of memory and had to inactivate an inode with defective metadata. The second class we use to indicate that repair knows that an inode is bad and we need to fix it later. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09xfs: report XFS_IS_CORRUPT errors to the health systemcorruption-health-reports_2022-11-09Darrick J. Wong
Whenever we encounter XFS_IS_CORRUPT failures, we should report that to the health monitoring system for later reporting. I started with this semantic patch and massaged everything until it built: @@ expression mp, test; @@ - if (XFS_IS_CORRUPT(mp, test)) return -EFSCORRUPTED; + if (XFS_IS_CORRUPT(mp, test)) { xfs_btree_mark_sick(cur); return -EFSCORRUPTED; } @@ expression mp, test; identifier label, error; @@ - if (XFS_IS_CORRUPT(mp, test)) { error = -EFSCORRUPTED; goto label; } + if (XFS_IS_CORRUPT(mp, test)) { xfs_btree_mark_sick(cur); error = -EFSCORRUPTED; goto label; } Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09xfs: report realtime metadata corruption errors to the health systemDarrick J. Wong
Whenever we encounter corrupt realtime metadat blocks, we should report that to the health monitoring system for later reporting. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09xfs: report quota block corruption errors to the health systemDarrick J. Wong
Whenever we encounter corrupt quota blocks, we should report that to the health monitoring system for later reporting. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09xfs: report inode corruption errors to the health systemDarrick J. Wong
Whenever we encounter corrupt inode records, we should report that to the health monitoring system for later reporting. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09xfs: report symlink block corruption errors to the health systemDarrick J. Wong
Whenever we encounter corrupt symbolic link blocks, we should report that to the health monitoring system for later reporting. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09xfs: report dir/attr block corruption errors to the health systemDarrick J. Wong
Whenever we encounter corrupt directory or extended attribute blocks, we should report that to the health monitoring system for later reporting. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09xfs: report btree block corruption errors to the health systemDarrick J. Wong
Whenever we encounter corrupt btree blocks, we should report that to the health monitoring system for later reporting. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09xfs: report block map corruption errors to the health tracking systemDarrick J. Wong
Whenever we encounter a corrupt block mapping, we should report that to the health monitoring system for later reporting. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09xfs: report ag header corruption errors to the health tracking systemDarrick J. Wong
Whenever we encounter a corrupt AG header, we should report that to the health monitoring system for later reporting. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09xfs: report fs corruption errors to the health tracking systemDarrick J. Wong
Whenever we encounter corrupt fs metadata, we should report that to the health monitoring system for later reporting. A convenient program for identifying places to insert xfs_*_mark_sick calls is as follows: #!/bin/bash # Detect missing calls to xfs_*_mark_sick filter=cat tty -s && filter=less git grep -B3 EFSCORRUPTED fs/xfs/*.[ch] fs/xfs/libxfs/*.[ch] fs/xfs/scrub/*.[ch] | awk ' BEGIN { ignore = 0; lineno = 0; delete lines; } { if ($0 == "--") { if (!ignore) { for (i = 0; i < lineno; i++) { print(lines[i]); } printf("--\n"); } delete lines; lineno = 0; ignore = 0; } else if ($0 ~ /mark_sick/) { ignore = 1; } else if ($0 ~ /if .fa/) { ignore = 1; } else if ($0 ~ /failaddr/) { ignore = 1; } else if ($0 ~ /_verifier_error/) { ignore = 1; } else if ($0 ~ /^ \* .*EFSCORRUPTED/) { ignore = 1; } else if ($0 ~ /== -EFSCORRUPTED/) { ignore = 1; } else if ($0 ~ /!= -EFSCORRUPTED/) { ignore = 1; } else { lines[lineno++] = $0; } } ' | $filter Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09xfs: separate the marking of sick and checked metadataDarrick J. Wong
Split the setting of the sick and checked masks into separate functions as part of preparing to add the ability for regular runtime fs code (i.e. not scrub) to mark metadata structures sick when corruptions are found. Improve the documentation of libxfs' requirements for helper behavior. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09xfs: teach repair to fix file nlinksscrub-nlinks_2022-11-09Darrick J. Wong
Fix the nlinks now too. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09xfs: track file link count updates during live nlinks fsckDarrick J. Wong
Create the necessary hooks in the file create/unlink/rename code so that our live nlink scrub code can stay up to date with the rest of the filesystem. This will be the means to keep our shadow link count information up to date while the scan runs in real time. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09xfs: teach scrub to check file nlinksDarrick J. Wong
Create the necessary scrub code to walk the filesystem's directory tree so that we can compute file link counts. Similar to quotacheck, we create an incore shadow array of link count information and then we walk the filesystem a second time to compare the link counts. We need live updates to keep the information up to date during the lengthy scan, so this scrubber remains disabled until the next patch. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09xfs: streamline the directory iteration code for scrubDarrick J. Wong
Currently, online scrub reuses the xfs_readdir code to walk every entry in a directory. This isn't awesome for performance, since we end up cycling the directory ILOCK needlessly and coding around the particular quirks of the VFS dir_context interface. Create a streamlined version of readdir that keeps the ILOCK (since the walk function isn't going to copy stuff to userspace), skips a whole lot of directory walk cursor checks (since we start at 0 and walk to the end) and has a sane way to return error codes. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09xfs: report health of inode link countsDarrick J. Wong
Report on the health of the inode link counts. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09xfs: repair dquots based on live quotacheck resultsrepair-quotacheck_2022-11-09Darrick J. Wong
Use the shadow quota counters that live quotacheck creates to reset the incore dquot counters. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09xfs: repair cannot update the summary counters when logging quota flagsDarrick J. Wong
While running xfs/804 (quota repairs racing with fsstress), I observed a filesystem shutdown in the primary sb write verifier: run fstests xfs/804 at 2022-05-23 18:43:48 XFS (sda4): Mounting V5 Filesystem XFS (sda4): Ending clean mount XFS (sda4): Quotacheck needed: Please wait. XFS (sda4): Quotacheck: Done. XFS (sda4): EXPERIMENTAL online scrub feature in use. Use at your own risk! XFS (sda4): SB ifree sanity check failed 0xb5 > 0x80 XFS (sda4): Metadata corruption detected at xfs_sb_write_verify+0x5e/0x100 [xfs], xfs_sb block 0x0 XFS (sda4): Unmount and run xfs_repair The "SB ifree sanity check failed" message was a debugging printk that I added to the kernel; observe that 0xb5 - 0x80 = 53, which is less than one inode chunk. I traced this to the xfs_log_sb calls from the online quota repair code, which tries to clear the CHKD flags from the superblock to force a mount-time quotacheck if the repair fails. On a V5 filesystem, xfs_log_sb updates the ondisk sb summary counters with the current contents of the percpu counters. This is done without quiescing other writer threads, which means it could be racing with a thread that has updated icount and is about to update ifree. If the other write thread had incremented ifree before updating icount, the repair thread will write icount > ifree into the logged update. If the AIL writes the logged superblock back to disk before anyone else fixes this siutation, this will lead to a write verifier failure, which causes a filesystem shutdown. Resolve this problem by updating the quota flags and calling xfs_sb_to_disk directly, which does not touch the percpu counters. While we're at it, we can elide the entire update if the selected qflags aren't set. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09xfs: track quota updates during live quotacheckDarrick J. Wong
Create a shadow dqtrx system in the quotacheck code that hooks the regular dquot counter update code. This will be the means to keep our copy of the dquot counters up to date while the scan runs in real time. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09xfs: implement live quotacheck inode scanDarrick J. Wong
Create a new trio of scrub functions to check quota counters. While the dquots themselves are filesystem metadata and should be checked early, the dquot counter values are computed from other metadata and are therefore summary counters. We don't plug these into the scrub dispatch just yet, because we still need to be able to watch quota updates while doing our scan. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09xfs: report the health of quota countsDarrick J. Wong
Report the health of quota counts. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09xfs: allow blocking notifier chains with filesystem hooksscrub-iscan_2022-11-09Darrick J. Wong
Make it so that we can switch between notifier chain implementations for testing purposes. On the author's test system, calling an empty srcu notifier chain cost about 19ns per call, vs. 4ns for a blocking notifier chain. Hm. Might we actually want regular blocking notifiers? Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09xfs: allow scrub to hook metadata updates in other writersDarrick J. Wong
Certain types of filesystem metadata can only be checked by scanning every file in the entire filesystem. Specific examples of this include quota counts, file link counts, and reverse mappings of file extents. Directory and parent pointer reconstruction may also fall into this category. File scanning is much trickier than scanning AG metadata because we have to take inode locks in the same order as the rest of [VX]FS, we can't be holding buffer locks when we do that, and scanning the whole filesystem takes time. Earlier versions of the online repair patchset relied heavily on fsfreeze as a means to quiesce the filesystem so that we could take locks in the proper order without worrying about concurrent updates from other writers. Reviewers of those patches opined that freezing the entire fs to check and repair something was not sufficiently better than unmounting to run fsck offline. I don't agree with that 100%, but the message was clear: find a way to repair things that minimizes the quiet period where nobody can write to the filesystem. Generally, building btree indexes online can be split into two phases: a collection phase where we compute the records that will be put into the new btree; and a construction phase, where we construct the physical btree blocks and persist them. While it's simple to hold resource locks for the entirety of the two phases to ensure that the new index is consistent with the rest of the system, we don't need to hold resource locks during the collection phase if we have a means to receive live updates of other work going on elsewhere in the system. The goal of this patch, then, is to enable online fsck to learn about metadata updates going on in other threads while it constructs a shadow copy of the metadata records to verify or correct the real metadata. To minimize the overhead when online fsck isn't running, we use srcu notifiers because they prioritize fast access to the notifier call chain (particularly when the chain is empty) at a cost to configuring notifiers. Online fsck should be relatively infrequent, so this is acceptable. The intended usage model is fairly simple. Code that modifies a metadata structure of interest should declare a xfs_hook_chain structure in some well defined place, and call xfs_hook_call whenever an update happens. Online fsck code should define a struct notifier_block and use xfs_hook_add to attach the block to the chain, along with a function to be called. This function should synchronize with the fsck scanner to update whatever in-memory data the scanner is collecting. When finished, xfs_hook_del removes the notifier from the list and waits for them all to complete. On the author's computer, calling an empty srcu notifier chain was observed to have an overhead averaging ~40ns with a maximum of 60ns. Adding a no-op notifier function increased the average to ~58ns and 66ns. When the quotacheck live update notifier is attached, the average increases to ~322ns with a max of 372ns to update scrub's in-memory observation data, assuming no lock contention. With jump labels enabled, calls to empty srcu notifier chains are elided from the call sites when there are no hooks registered, which means that the overhead is 0.36ns when fsck is not running. For compilers that do not support jump labels (all major architectures do), the overhead of a no-op notifier call is less bad (on a many-cpu system) than the atomic counter ops, so we make the hook switch itself a nop. Note: This new code is also split out as a separate patch from its initial user so that the author can move patches around his tree with ease. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09xfs: implement live inode scan for scrubDarrick J. Wong
This patch implements a live file scanner for online fsck functions that require the ability to walk a filesystem to gather metadata records and stay informed about metadata changes to files that have already been visited. The iscan structure consists of two inode number cursors: one to track which inode we want to visit next, and a second one to track which inodes have already been visited. This second cursor is key to capturing live updates to files previously scanned while the main thread continues scanning -- any inode greater than this value hasn't been scanned and can go on its way; any other update must be incorporated into the collected data. It is critical for the scanning thraad to hold exclusive access on the inode until after marking the inode visited. This new code is split out as a separate patch from its initial user for the sake of enabling the author to move patches around his tree with ease. The intended usage model for this code is roughly: xchk_iscan_start(iscan, 0, 0); while ((error = xchk_iscan_iter(sc, iscan, &ip)) == 1) { xfs_ilock(ip, ...); /* capture inode metadata */ xchk_iscan_mark_visited(iscan, ip); xfs_iunlock(ip, ...); xfs_irele(ip); } xchk_iscan_stop(iscan); if (error) return error; Hook functions for live updates can then do: if (xchk_iscan_want_live_update(...)) /* update the captured inode metadata */ Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09xfs: speed up xfs_iwalk_adjust_start a little bitDarrick J. Wong
Replace the open-coded loop that recomputes freecount with a single call to a bit weight function. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09xfs: repair quotasrepair-quota_2022-11-09Darrick J. Wong
Fix anything that causes the quota verifiers to fail. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09xfs: online repair of realtime bitmapsDarrick J. Wong
Rebuild the realtime bitmap from the realtime rmap btree. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09xfs: create a new inode fork block unmap helperDarrick J. Wong
Create a new helper to unmap blocks from an inode's fork. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09xfs: repair the inode core and forks of a metadata inodeDarrick J. Wong
Add a helper function to repair the core and forks of a metadata inode, so that we can get move onto the task of repairing higher level metadata that lives in an inode. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09xfs: repair problems in CoW forksrepair-file-mappings_2022-11-09Darrick J. Wong
Try to repair errors that we see in file CoW forks so that we don't do stupid things like remap garbage into a file. There's not a lot we can do with the COW fork -- the ondisk metadata record only that the COW staging extents are owned by the refcount btree, which effectively means that we can't reconstruct this incore structure from scratch. Actually, this is even worse -- we can't touch written extents, because those map space that are actively under writeback, and there's not much to do with delalloc reservations. Hence we can only detect crosslinked unwritten extents and fix them by punching out the problematic parts and replacing them with delalloc extents. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09xfs: create a ranged query function for refcount btreesDarrick J. Wong
Implement ranged queries for refcount records. The next patch will use this to scan refcount data. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09xfs: refactor repair forcing tests into a repair.c helperDarrick J. Wong
There are a couple of conditions that userspace can set to force repairs of metadata. These really belong in the repair code and not open-coded into the check code, so refactor them into a helper. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09xfs: repair inode fork block mapping data structuresDarrick J. Wong
Use the reverse-mapping btree information to rebuild an inode block map. Update the btree bulk loading code as necessary to support inode rooted btrees and fix some bitrot problems. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09xfs: reintroduce reaping of file metadata blocks to xrep_reap_extentsDarrick J. Wong
Reintroduce to xrep_reap_extents the ability to reap extents from any AG. We dropped this before because it was buggy, but in the next patch we will gain the ability to reap old bmap btrees, which can have blocks in any AG. To do this, we require that sc->sa is uninitialized, so that we can use it to hold all the per-AG context for a given extent. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09xfs: repair obviously broken inode modesrepair-inodes_2022-11-09Darrick J. Wong
Building off the rmap scanner that we added in the previous patch, we can now find block 0 and try to use the information contained inside of it to guess the mode of an inode if it's totally improper. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09xfs: zap broken inode forksDarrick J. Wong
Determine if inode fork damage is responsible for the inode being unable to pass the ifork verifiers in xfs_iget and zap the fork contents if this is true. Once this is done the fork will be empty but we'll be able to construct an in-core inode, and a subsequent call to the inode fork repair ioctl will search the rmapbt to rebuild the records that were in the fork. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09xfs: repair inode recordsDarrick J. Wong
If an inode is so badly damaged that it cannot be loaded into the cache, fix the ondisk metadata and try again. If there /is/ a cached inode, fix any problems and apply any optimizations that can be solved incore. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09xfs: try to attach dquots to files before repairing themDarrick J. Wong
Soon, we will be adding the ability to repair inodes. Inode resource usage is tracked in quota, which means that if we think we might have to repair a file, we ought to attach dquots from the start. Do this before we take the file's ILOCK, though we don't require success here because quota itself could also be in need of repair. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09xfs: disable online repair quota helpers when quota not enabledDarrick J. Wong
Don't compile the quota helper functions if quota isn't being built into the XFS module. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09xfs: repair refcount btreesrepair-ag-btrees_2022-11-09Darrick J. Wong
Reconstruct the refcount data from the rmap btree. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09xfs: repair inode btreesDarrick J. Wong
Use the rmapbt to find inode chunks, query the chunks to compute hole and free masks, and with that information rebuild the inobt and finobt. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09xfs: rewrite xfs_icache_inode_is_allocatedDarrick J. Wong
Back in the mists of time[1], I proposed this function to assist the inode btree scrubbers in checking the inode btree contents against the allocation state of the inode records. The original version performed a direct lookup in the inode cache and returned the allocation status if the cached inode hadn't been reused and wasn't in an intermediate state. Brian thought it would be better to use the usual iget/irele mechanisms, so that was changed for the final version. Unfortunately, this hasn't aged well -- the IGET_INCORE flag only has one user and clutters up the regular iget path, which makes it hard to reason about how it actually works. Worse yet, the inode inactivation series silently broke it because iget won't return inodes that are anywhere in the inactivation machinery, even though the caller is already required to prevent inode allocation and freeing. Inodes in the inactivation machinery are still allocated, but the current code's interactions with the iget code prevent us from being able to say that. Now that I understand the inode lifecycle better than I did in early 2017, I now realize that as long as the cached inode hasn't been reused and isn't actively being reclaimed, it's safe to access the i_mode field (with the AGI, rcu, and i_flags locks held), and we don't need to worry about the inode being freed out from under us. Therefore, port the original version to modern code structure, which fixes the brokennes w.r.t. inactivation. In the next patch we'll remove IGET_INCORE since it's no longer necessary. [1] https://lore.kernel.org/linux-xfs/149643868294.23065.8094890990886436794.stgit@birch.djwong.org/ Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09xfs: repair free space btreesDarrick J. Wong
Rebuild the free space btrees from the gaps in the rmap btree. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09xfs: clear pagf_agflreset when repairing the AGFLDarrick J. Wong
Clear the pagf_agflreset flag when we're repairing the AGFL because we fix all the same padding problems that xfs_agfl_reset does. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09xfs: allow userspace to rebuild metadata structuresrepair-force-rebuild_2022-11-09Darrick J. Wong
Add a new (superuser-only) flag to the online metadata repair ioctl to force it to rebuild structures, even if they're not broken. We will use this to move metadata structures out of the way during a free space defragmentation operation. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09xfs: don't complain about unfixed metadata when repairs were injectedDarrick J. Wong
While debugging other parts of online repair, I noticed that if someone injects FORCE_SCRUB_REPAIR, starts an IFLAG_REPAIR scrub on a piece of metadata, and the metadata repair fails, we'll log a message about uncorrected errors in the filesystem. This isn't strictly true if the scrub function didn't set OFLAG_CORRUPT and we're only doing the repair because the error injection knob is set. Repair functions are allowed to abort the entire operation at any point before committing new metadata, in which case the piece of metadata is in the same state as it was before. Therefore, the log message should be gated on the results of the scrub. Refactor the predicate and rearrange the code flow to make this happen. Note: If the repair function errors out after it commits the new metadata, the transaction cancellation will shut down the filesystem, which is an obvious sign of corrupt metadata. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09xfs: allow the user to cancel repairs before we start writingrepair-tweaks_2022-11-09Darrick J. Wong
All online repair functions have the same structure: walk filesystem metadata structures gathering enough data to rebuild the structure, stage a new copy, and then commit the new copy. The gathering steps do not write anything to disk, so they are peppered with xchk_should_terminate calls to avoid softlockup warnings and to provide an opportunity to abort the repair (by killing xfs_scrub). However, it's not clear in the code base when is the last chance to abort cleanly without having to undo a bunch of structure. Therefore, add one more call to xchk_should_terminate (along with a comment) providing the sysadmin with the ability to abort before it's too late and to make it clear in the source code when it's no longer convenient or safe to abort a repair. As there are only four repair functions right now, this patch exists more to establish a precedent for subsequent additions than to deliver practical functionality. Signed-off-by: Darrick J. Wong <djwong@kernel.org>