bcachefs.git - Unnamed repository; edit this file 'description' to name the repository.

Age	Commit message (Collapse)	Author
2022-10-14	xfs: update health status if we get a clean bill of healthindirect-health-reporting_2022-10-14	Darrick J. Wong
	If scrub finds that everything is ok with the filesystem, we need a way to tell the health tracking that it can let go of indirect health flags, since indirect flags only mean that at some point in the past we lost some context. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14	xfs: remember sick inodes that get inactivated	Darrick J. Wong
	If an unhealthy inode gets inactivated, remember this fact in the per-fs health summary. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14	xfs: report XFS_IS_CORRUPT errors to the health systemcorruption-health-reports_2022-10-14	Darrick J. Wong
	Whenever we encounter XFS_IS_CORRUPT failures, we should report that to the health monitoring system for later reporting. I started with this semantic patch and massaged everything until it built: @@ expression mp, test; @@ - if (XFS_IS_CORRUPT(mp, test)) return -EFSCORRUPTED; + if (XFS_IS_CORRUPT(mp, test)) { xfs_btree_mark_sick(cur); return -EFSCORRUPTED; } @@ expression mp, test; identifier label, error; @@ - if (XFS_IS_CORRUPT(mp, test)) { error = -EFSCORRUPTED; goto label; } + if (XFS_IS_CORRUPT(mp, test)) { xfs_btree_mark_sick(cur); error = -EFSCORRUPTED; goto label; } Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14	xfs: report btree block corruption errors to the health system	Darrick J. Wong
	Whenever we encounter corrupt btree blocks, we should report that to the health monitoring system for later reporting. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14	xfs: separate the marking of sick and checked metadata	Darrick J. Wong
	Split the setting of the sick and checked masks into separate functions as part of preparing to add the ability for regular runtime fs code (i.e. not scrub) to mark metadata structures sick when corruptions are found. Improve the documentation of libxfs' requirements for helper behavior. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14	xfs: teach repair to fix file nlinksscrub-nlinks_2022-10-14	Darrick J. Wong
	Fix the nlinks now too. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14	xfs: track file link count updates during live nlinks fsck	Darrick J. Wong
	Create the necessary hooks in the file create/unlink/rename code so that our live nlink scrub code can stay up to date with the rest of the filesystem. This will be the means to keep our shadow link count information up to date while the scan runs in real time. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14	xfs: teach scrub to check file nlinks	Darrick J. Wong
	Create the necessary scrub code to walk the filesystem's directory tree so that we can compute file link counts. Similar to quotacheck, we create an incore shadow array of link count information and then we walk the filesystem a second time to compare the link counts. We need live updates to keep the information up to date during the lengthy scan, so this scrubber remains disabled until the next patch. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14	xfs: streamline the directory iteration code for scrub	Darrick J. Wong
	Currently, online scrub reuses the xfs_readdir code to walk every entry in a directory. This isn't awesome for performance, since we end up cycling the directory ILOCK needlessly and coding around the particular quirks of the VFS dir_context interface. Create a streamlined version of readdir that keeps the ILOCK (since the walk function isn't going to copy stuff to userspace), skips a whole lot of directory walk cursor checks (since we start at 0 and walk to the end) and has a sane way to return error codes. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14	xfs: repair dquots based on live quotacheck resultsrepair-quotacheck_2022-10-14	Darrick J. Wong
	Use the shadow quota counters that live quotacheck creates to reset the incore dquot counters. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14	xfs: repair cannot update the summary counters when logging quota flags	Darrick J. Wong
	While running xfs/804 (quota repairs racing with fsstress), I observed a filesystem shutdown in the primary sb write verifier: run fstests xfs/804 at 2022-05-23 18:43:48 XFS (sda4): Mounting V5 Filesystem XFS (sda4): Ending clean mount XFS (sda4): Quotacheck needed: Please wait. XFS (sda4): Quotacheck: Done. XFS (sda4): EXPERIMENTAL online scrub feature in use. Use at your own risk! XFS (sda4): SB ifree sanity check failed 0xb5 > 0x80 XFS (sda4): Metadata corruption detected at xfs_sb_write_verify+0x5e/0x100 [xfs], xfs_sb block 0x0 XFS (sda4): Unmount and run xfs_repair The "SB ifree sanity check failed" message was a debugging printk that I added to the kernel; observe that 0xb5 - 0x80 = 53, which is less than one inode chunk. I traced this to the xfs_log_sb calls from the online quota repair code, which tries to clear the CHKD flags from the superblock to force a mount-time quotacheck if the repair fails. On a V5 filesystem, xfs_log_sb updates the ondisk sb summary counters with the current contents of the percpu counters. This is done without quiescing other writer threads, which means it could be racing with a thread that has updated icount and is about to update ifree. If the other write thread had incremented ifree before updating icount, the repair thread will write icount > ifree into the logged update. If the AIL writes the logged superblock back to disk before anyone else fixes this siutation, this will lead to a write verifier failure, which causes a filesystem shutdown. Resolve this problem by updating the quota flags and calling xfs_sb_to_disk directly, which does not touch the percpu counters. While we're at it, we can elide the entire update if the selected qflags aren't set. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14	xfs: track quota updates during live quotacheck	Darrick J. Wong
	Create a shadow dqtrx system in the quotacheck code that hooks the regular dquot counter update code. This will be the means to keep our copy of the dquot counters up to date while the scan runs in real time. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14	xfs: implement live quotacheck inode scan	Darrick J. Wong
	Create a new trio of scrub functions to check quota counters. While the dquots themselves are filesystem metadata and should be checked early, the dquot counter values are computed from other metadata and are therefore summary counters. We don't plug these into the scrub dispatch just yet, because we still need to be able to watch quota updates while doing our scan. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14	xfs: implement live inode scan for scrub	Darrick J. Wong
	This patch implements a live file scanner for online fsck functions that require the ability to walk a filesystem to gather metadata records and stay informed about metadata changes to files that have already been visited. The iscan structure consists of two inode number cursors: one to track which inode we want to visit next, and a second one to track which inodes have already been visited. This second cursor is key to capturing live updates to files previously scanned while the main thread continues scanning -- any inode greater than this value hasn't been scanned and can go on its way; any other update must be incorporated into the collected data. It is critical for the scanning thraad to hold exclusive access on the inode until after marking the inode visited. This new code is split out as a separate patch from its initial user for the sake of enabling the author to move patches around his tree with ease. The intended usage model for this code is roughly: xchk_iscan_start(iscan, 0, 0); while ((error = xchk_iscan_iter(sc, iscan, &ip)) == 1) { xfs_ilock(ip, ...); /* capture inode metadata / xchk_iscan_mark_visited(iscan, ip); xfs_iunlock(ip, ...); xfs_irele(ip); } xchk_iscan_stop(iscan); if (error) return error; Hook functions for live updates can then do: if (xchk_iscan_want_live_update(...)) / update the captured inode metadata */ Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14	xfs: repair quotasrepair-quota_2022-10-14	Darrick J. Wong
	Fix anything that causes the quota verifiers to fail. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14	xfs: online repair of realtime bitmaps	Darrick J. Wong
	Rebuild the realtime bitmap from the realtime rmap btree. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14	xfs: repair the inode core and forks of a metadata inode	Darrick J. Wong
	Add a helper function to repair the core and forks of a metadata inode, so that we can get move onto the task of repairing higher level metadata that lives in an inode. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14	xfs: repair problems in CoW forksrepair-file-mappings_2022-10-14	Darrick J. Wong
	Try to repair errors that we see in file CoW forks so that we don't do stupid things like remap garbage into a file. There's not a lot we can do with the COW fork -- the ondisk metadata record only that the COW staging extents are owned by the refcount btree, which effectively means that we can't reconstruct this incore structure from scratch. Actually, this is even worse -- we can't touch written extents, because those map space that are actively under writeback, and there's not much to do with delalloc reservations. Hence we can only detect crosslinked unwritten extents and fix them by punching out the problematic parts and replacing them with delalloc extents. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14	xfs: refactor repair forcing tests into a repair.c helper	Darrick J. Wong
	There are a couple of conditions that userspace can set to force repairs of metadata. These really belong in the repair code and not open-coded into the check code, so refactor them into a helper. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14	xfs: repair inode fork block mapping data structures	Darrick J. Wong
	Use the reverse-mapping btree information to rebuild an inode block map. Update the btree bulk loading code as necessary to support inode rooted btrees and fix some bitrot problems. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14	xfs: reintroduce reaping of file metadata blocks to xrep_reap_extents	Darrick J. Wong
	Reintroduce to xrep_reap_extents the ability to reap extents from any AG. We dropped this before because it was buggy, but in the next patch we will gain the ability to reap old bmap btrees, which can have blocks in any AG. To do this, we require that sc->sa is uninitialized, so that we can use it to hold all the per-AG context for a given extent. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14	xfs: repair obviously broken inode modesrepair-inodes_2022-10-14	Darrick J. Wong
	Building off the rmap scanner that we added in the previous patch, we can now find block 0 and try to use the information contained inside of it to guess the mode of an inode if it's totally improper. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14	xfs: zap broken inode forks	Darrick J. Wong
	Determine if inode fork damage is responsible for the inode being unable to pass the ifork verifiers in xfs_iget and zap the fork contents if this is true. Once this is done the fork will be empty but we'll be able to construct an in-core inode, and a subsequent call to the inode fork repair ioctl will search the rmapbt to rebuild the records that were in the fork. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14	xfs: repair inode records	Darrick J. Wong
	If an inode is so badly damaged that it cannot be loaded into the cache, fix the ondisk metadata and try again. If there /is/ a cached inode, fix any problems and apply any optimizations that can be solved incore. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14	xfs: try to attach dquots to files before repairing them	Darrick J. Wong
	Soon, we will be adding the ability to repair inodes. Inode resource usage is tracked in quota, which means that if we think we might have to repair a file, we ought to attach dquots from the start. Do this before we take the file's ILOCK, though we don't require success here because quota itself could also be in need of repair. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14	xfs: disable online repair quota helpers when quota not enabled	Darrick J. Wong
	Don't compile the quota helper functions if quota isn't being built into the XFS module. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14	xfs: repair refcount btreesrepair-ag-btrees_2022-10-14	Darrick J. Wong
	Reconstruct the refcount data from the rmap btree. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14	xfs: repair inode btrees	Darrick J. Wong
	Use the rmapbt to find inode chunks, query the chunks to compute hole and free masks, and with that information rebuild the inobt and finobt. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14	xfs: repair free space btrees	Darrick J. Wong
	Rebuild the free space btrees from the gaps in the rmap btree. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14	xfs: clear pagf_agflreset when repairing the AGFL	Darrick J. Wong
	Clear the pagf_agflreset flag when we're repairing the AGFL because we fix all the same padding problems that xfs_agfl_reset does. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14	xfs: allow userspace to rebuild metadata structuresrepair-force-rebuild_2022-10-14	Darrick J. Wong
	Add a new (superuser-only) flag to the online metadata repair ioctl to force it to rebuild structures, even if they're not broken. We will use this to move metadata structures out of the way during a free space defragmentation operation. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14	xfs: don't complain about unfixed metadata when repairs were injected	Darrick J. Wong
	While debugging other parts of online repair, I noticed that if someone injects FORCE_SCRUB_REPAIR, starts an IFLAG_REPAIR scrub on a piece of metadata, and the metadata repair fails, we'll log a message about uncorrected errors in the filesystem. This isn't strictly true if the scrub function didn't set OFLAG_CORRUPT and we're only doing the repair because the error injection knob is set. Repair functions are allowed to abort the entire operation at any point before committing new metadata, in which case the piece of metadata is in the same state as it was before. Therefore, the log message should be gated on the results of the scrub. Refactor the predicate and rearrange the code flow to make this happen. Note: If the repair function errors out after it commits the new metadata, the transaction cancellation will shut down the filesystem, which is an obvious sign of corrupt metadata. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14	xfs: allow the user to cancel repairs before we start writingrepair-tweaks_2022-10-14	Darrick J. Wong
	All online repair functions have the same structure: walk filesystem metadata structures gathering enough data to rebuild the structure, stage a new copy, and then commit the new copy. The gathering steps do not write anything to disk, so they are peppered with xchk_should_terminate calls to avoid softlockup warnings and to provide an opportunity to abort the repair (by killing xfs_scrub). However, it's not clear in the code base when is the last chance to abort cleanly without having to undo a bunch of structure. Therefore, add one more call to xchk_should_terminate (along with a comment) providing the sysadmin with the ability to abort before it's too late and to make it clear in the source code when it's no longer convenient or safe to abort a repair. As there are only four repair functions right now, this patch exists more to establish a precedent for subsequent additions than to deliver practical functionality. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14	xfs: always rescan allegedly healthy per-ag metadata after repair	Darrick J. Wong
	After an online repair function runs for a per-AG metadata structure, sc->sick_mask is supposed to reflect the per-AG metadata that the repair function fixed. Our next move is to re-check the metadata to assess the completeness of our repair, so we don't want the rebuilt structure to be excluded from the rescan just because the health system previously logged a problem with the data structure. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14	xfs: implement online scrubbing of rtsummary infoscrub-rtsummary_2022-10-14	Darrick J. Wong
	Finish the realtime summary scrubber by adding the functions we need to compute a fresh copy of the rtsummary info and comparing it to the copy on disk. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14	xfs: move the realtime summary file scrubber to a separate source file	Darrick J. Wong
	Move the realtime summary file checking code to a separate file in preparation to actually implement it. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14	xfs: wrap ilock/iunlock operations on sc->ip	Darrick J. Wong
	Scrub tracks the resources that it's holding onto in the xfs_scrub structure. This includes the inode being checked (if applicable) and the inode lock state of that inode. Replace the open-coded structure manipulation with a trivial helper to eliminate sources of error. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14	xfs: get our own reference to inodes that we want to scrub	Darrick J. Wong
	When we want to scrub a file, get our own reference to the inode unconditionally. This will make disposal rules simpler in the long run. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14	xfs: improve xfarray quicksort pivotbig-array_2022-10-14	Darrick J. Wong
	Now that we have the means to do insertion sorts of small in-memory subsets of an xfarray, use it to improve the quicksort pivot algorithm by reading 7 records into memory and finding the median of that. This should prevent bad partitioning when a[lo] and a[hi] end up next to each other in the final sort, which can happen when sorting for cntbt repair when the free space is extremely fragmented (e.g. generic/176). This doesn't speed up the average quicksort run by much, but it will (hopefully) avoid the quadratic time collapse for which quicksort is famous. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14	xfs: cache pages used for xfarray quicksort convergence	Darrick J. Wong
	After quicksort picks a pivot item for a particular subsort, it walks the records in that subset from the outside in, rearranging them so that every record less than the pivot comes before it, and every record greater than the pivot comes after it. This scan has a lot of locality, so we can speed it up quite a bit by grabbing the xfile backing page and holding onto it as long as we possibly can. Doing so reduces the runtime by another 5% on the author's computer. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14	xfs: speed up xfarray sort by sorting xfile page contents directly	Darrick J. Wong
	If all the records in an xfarray subset live within the same memory page, we can short-circuit even more quicksort recursion by mapping that page into the local CPU and using the kernel's heapsort function to sort the subset. On the author's computer, this reduces the runtime by another 15% on a 500,000 element array. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14	xfs: teach xfile to pass back direct-map pages to caller	Darrick J. Wong
	Certain xfile array operations (such as sorting) can be sped up quite a bit by allowing xfile users to grab a page to bulk-read the records contained within it. Create helper methods to facilitate this. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14	xfs: convert xfarray insertion sort to heapsort using scratchpad memory	Darrick J. Wong
	In the previous patch, we created a very basic quicksort implementation for xfile arrays. While the use of an alternate sorting algorithm to avoid quicksort recursion on very small subsets reduces the runtime modestly, we could do better than a load and store-heavy insertion sort, particularly since each load and store requires a page mapping lookup in the xfile. For a small increase in kernel memory requirements, we could instead bulk load the xfarray records into memory, use the kernel's existing heapsort implementation to sort the records, and bulk store the memory buffer back into the xfile. On the author's computer, this reduces the runtime by about 5% on a 500,000 element array. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14	xfs: enable sorting of xfile-backed arrays	Darrick J. Wong
	The btree bulk loading code requires that records be provided in the correct record sort order for the given btree type. In general, repair code cannot be required to collect records in order, and it is not feasible to insert new records in the middle of an array to maintain sort order. Implement a sorting algorithm so that we can sort the records just prior to bulk loading. In principle, an xfarray could consume many gigabytes of memory and its backing pages can be sent out to disk at any time. This means that we cannot map the entire array into memory at once, so we must find a way to divide the work into smaller portions (e.g. a page) that /can/ be mapped into memory. Quicksort seems like a reasonable fit for this purpose, since it uses a divide and conquer strategy to keep its average runtime logarithmic. The solution presented here is a port of the glibc implementation, which itself is derived from the median-of-three and tail call recursion strategies outlined by Sedgwick. Subsequent patches will optimize the implementation further by utilizing the kernel's heapsort on directly-mapped memory whenever possible, and improving the quicksort pivot selection algorithm to try to avoid O(n^2) collapses. Note: The sorting functionality gets its own patch because the basic big array mechanisms were plenty for a single code patch. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14	xfs: create a big array data structure	Darrick J. Wong
	Create a simple 'big array' data structure for storage of fixed-size metadata records that will be used to reconstruct a btree index. For repair operations, the most important operations are append, iterate, and sort. Earlier implementations of the big array used linked lists and suffered from severe problems -- pinning all records in kernel memory was not a good idea and frequently lead to OOM situations; random access was very inefficient; and record overhead for the lists was unacceptably high at 40-60%. Therefore, the big memory array relies on the 'xfile' abstraction, which creates a memfd file and stores the records in page cache pages. Since the memfd is created in tmpfs, the memory pages can be pushed out to disk if necessary and we have a built-in usage limit of 50% of physical memory. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14	xfs: log EFIs for all btree blocks being used to stage a btree	Darrick J. Wong
	We need to log EFIs for every extent that we allocate for the purpose of staging a new btree so that if we fail then the blocks will be freed during log recovery. Add a function to relog the EFIs, so that repair can relog them all every time it creates a new btree block, which will help us to avoid pinning the log tail. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14	xfs: add debug knobs to control btree bulk load slack factors	Darrick J. Wong
	Add some debug knobs so that we can control the leaf and node block slack when rebuilding btrees. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14	xfs: implement block reservation accounting for btrees we're staging	Darrick J. Wong
	Create a new xrep_newbt structure to encapsulate a fake root for creating a staged btree cursor as well as to track all the blocks that we need to reserve in order to build that btree. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14	xfs: use per-AG bitmaps to reap unused AG metadata blocks during repairrepair-reap-fixes_2022-10-14	Darrick J. Wong
	The AGFL repair code uses a series of bitmaps to figure out where there are OWN_AG blocks that are not claimed by the free space and rmap btrees. These blocks become the new AGFL, and any overflow is reaped. The bitmaps current track xfs_fsblock_t even though we already know the AG number. In the last patch, we introduced a new bitmap "type" for tracking xfs_agblock_t extents. Port the reaping code and the AGFL repair to use this new type, which makes it very obvious what we're tracking. This also eliminates a bunch of unnecessary agblock <-> fsblock conversions. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-14	xfs: reap large AG metadata extents when possible	Darrick J. Wong
	When we're freeing extents that have been set in a bitmap, break the bitmap extent into multiple sub-extents organized by fate, and reap the extents. This enables us to dispose of old resources more efficiently than doing them block by block. While we're at it, rename the reaping functions to make it clear that they're reaping per-AG extents. Signed-off-by: Darrick J. Wong <djwong@kernel.org>