summaryrefslogtreecommitdiff
path: root/fs/xfs
AgeCommit message (Collapse)Author
2021-12-15xfs: repair damaged symlinksrepair-inodes_2021-12-15Darrick J. Wong
Repair inconsistent symbolic link data. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-12-15xfs: repair inode block mapsDarrick J. Wong
Use the reverse-mapping btree information to rebuild an inode block map. Update the btree bulk loading code as necessary to support inode rooted btrees and fix some bitrot problems. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-12-15xfs: reintroduce reaping of file extents to xrep_reap_extentsDarrick J. Wong
Reintroduce to xrep_reap_extents the ability to reap extents from any AG. We dropped this before because it was buggy, but in the next patch we will gain the ability to reap old bmap btrees, which can have blocks in any AG. To do this, we require that sc->sa is uninitialized, so that we can use it to hold all the per-AG context for a given extent. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-12-15xfs: repair obviously broken inode modesDarrick J. Wong
Building off the rmap scanner that we added in the previous patch, we can now find block 0 and try to use the information contained inside of it to guess the mode of an inode if it's totally improper. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-12-15xfs: zap broken inode forksDarrick J. Wong
Determine if inode fork damage is responsible for the inode being unable to pass the ifork verifiers in xfs_iget and zap the fork contents if this is true. Once this is done the fork will be empty but we'll be able to construct an in-core inode, and a subsequent call to the inode fork repair ioctl will search the rmapbt to rebuild the records that were in the fork. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-12-15xfs: repair inode recordsDarrick J. Wong
Try to reinitialize corrupt inodes, or clear the reflink flag if it's not needed. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-12-15xfs: warn about inodes with project id of -1Darrick J. Wong
Inodes aren't supposed to have a project id of -1U (aka 4294967295) but the kernel hasn't always validated FSSETXATTR correctly. Flag this as something for the sysadmin to check out. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-12-15xfs: repair refcount btreesrepair-ag-btrees_2021-12-15Darrick J. Wong
Reconstruct the refcount data from the rmap btree. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-12-15xfs: repair inode btreesDarrick J. Wong
Use the rmapbt to find inode chunks, query the chunks to compute hole and free masks, and with that information rebuild the inobt and finobt. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-12-15xfs: rewrite xfs_icache_inode_is_allocatedDarrick J. Wong
Back in the mists of time[1], I proposed this function to assist the inode btree scrubbers in checking the inode btree contents against the allocation state of the inode records. The original version performed a direct lookup in the inode cache and returned the allocation status if the cached inode hadn't been reused and wasn't in an intermediate state. Brian thought it would be better to use the usual iget/irele mechanisms, so that was changed for the final version. Unfortunately, this hasn't aged well -- the IGET_INCORE flag only has one user and clutters up the regular iget path, which makes it hard to reason about how it actually works. Worse yet, the inode inactivation series silently broke it because iget won't return inodes that are anywhere in the inactivation machinery, even though the caller is already required to prevent inode allocation and freeing. Inodes in the inactivation machinery are still allocated, but the current code's interactions with the iget code prevent us from being able to say that. Now that I understand the inode lifecycle better than I did in early 2017, I now realize that as long as the cached inode hasn't been reused and isn't actively being reclaimed, it's safe to access the i_mode field (with the AGI, rcu, and i_flags locks held), and we don't need to worry about the inode being freed out from under us. Therefore, port the original version to modern code structure, which fixes the brokennes w.r.t. inactivation. In the next patch we'll remove IGET_INCORE since it's no longer necessary. [1] https://lore.kernel.org/linux-xfs/149643868294.23065.8094890990886436794.stgit@birch.djwong.org/ Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-12-15xfs: repair free space btreesDarrick J. Wong
Rebuild the free space btrees from the gaps in the rmap btree. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-12-15xfs: allow the user to cancel repairs before we start writingDarrick J. Wong
All online repair functions have the same structure: walk filesystem metadata structures gathering enough data to rebuild the structure, stage a new copy, and then commit the new copy. The gathering steps do not write anything to disk, so they are peppered with xchk_should_terminate calls to avoid softlockup warnings and to provide an opportunity to abort the repair (by killing xfs_scrub). However, it's not clear in the code base when is the last chance to abort cleanly without having to undo a bunch of structure. Therefore, add one more call to xchk_should_terminate (along with a comment) providing the sysadmin with the ability to abort before it's too late and to make it clear in the source code when it's no longer convenient or safe to abort a repair. As there are only four repair functions right now, this patch exists more to establish a precedent for subsequent additions than to deliver practical functionality. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-12-15xfs: don't complain about unfixed metadata when repairs were injectedDarrick J. Wong
While debugging other parts of online repair, I noticed that if someone injects FORCE_SCRUB_REPAIR, starts an IFLAG_REPAIR scrub on a piece of metadata, and the metadata repair fails, we'll log a message about uncorrected errors in the filesystem. This isn't strictly true if the scrub function didn't set OFLAG_CORRUPT and we're only doing the repair because the error injection knob is set. Repair functions are allowed to abort the entire operation at any point before committing new metadata, in which case the piece of metadata is in the same state as it was before. Therefore, the log message should be gated on the results of the scrub. Refactor the predicate and rearrange the code flow to make this happen. Note: If the repair function errors out after it commits the new metadata, the transaction cancellation will shut down the filesystem, which is an obvious sign of corrupt metadata. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-12-15xfs: always rescan allegedly healthy per-ag metadata after repairDarrick J. Wong
After an online repair function runs for a per-AG metadata structure, sc->sick_mask is supposed to reflect the per-AG metadata that the repair function fixed. Our next move is to re-check the metadata to assess the completeness of our repair, so we don't want the rebuilt structure to be excluded from the rescan just because the health system previously logged a problem with the data structure. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-12-15xfs: implement online scrubbing of rtsummary infoscrub-rtsummary_2021-12-15Darrick J. Wong
Finish the realtime summary scrubber by adding the functions we need to compute a fresh copy of the rtsummary info and comparing it to the copy on disk. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-12-15xfs: move the realtime summary file scrubber to a separate source fileDarrick J. Wong
Move the realtime summary file checking code to a separate file in preparation to actually implement it. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-12-15xfs: wrap ilock/iunlock operations on sc->ipDarrick J. Wong
Scrub tracks the resources that it's holding onto in the xfs_scrub structure. This includes the inode being checked (if applicable) and the inode lock state of that inode. Replace the open-coded structure manipulation with a trivial helper to eliminate sources of error. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-12-15xfs: get our own reference to inodes that we want to scrubDarrick J. Wong
When we want to scrub a file, get our own reference to the inode unconditionally. This will make disposal rules simpler in the long run. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-12-15xfs: create a big array data structureDarrick J. Wong
Create a simple 'big array' data structure for storage of fixed-size metadata records that will be used to reconstruct a btree index. For repair operations, the most important operations are append, iterate, and sort. Earlier implementations of the big array used linked lists and suffered from severe problems -- pinning all records in kernel memory was not a good idea and frequently lead to OOM situations; random access was very inefficient; and record overhead for the lists was unacceptably high at 40-60%. Therefore, the big memory array relies on the 'xfile' abstraction, which creates a memfd file and stores the records in page cache pages. Since the memfd is created in tmpfs, the memory pages can be pushed out to disk if necessary and we have a built-in usage limit of 50% of physical memory. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-12-15xfs: log EFIs for all btree blocks being used to stage a btreerepair-prep-for-bulk-loading_2021-12-15Darrick J. Wong
We need to log EFIs for every extent that we allocate for the purpose of staging a new btree so that if we fail then the blocks will be freed during log recovery. Add a function to relog the EFIs, so that repair can relog them all every time it creates a new btree block, which will help us to avoid pinning the log tail. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-12-15xfs: add debug knobs to control btree bulk load slack factorsDarrick J. Wong
Add some debug knobs so that we can control the leaf and node block slack when rebuilding btrees. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-12-15xfs: implement block reservation accounting for btrees we're stagingDarrick J. Wong
Create a new xrep_newbt structure to encapsulate a fake root for creating a staged btree cursor as well as to track all the blocks that we need to reserve in order to build that btree. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-12-15xfs: force all buffers to be written during btree bulk loadDarrick J. Wong
While stress-testing online repair of btrees, I noticed periodic assertion failures from the buffer cache about buffer readers encountering buffers with DELWRI_Q set, even though the btree bulk load had already committed and the buffer itself wasn't on any delwri list. I traced this to a misunderstanding of how the delwri lists work, particularly with regards to the AIL's buffer list. If a buffer is logged and committed, the buffer can end up on that AIL buffer list. If btree repairs are run twice in rapid succession, it's possible that the first repair will invalidate the buffer and free it before the next time the AIL wakes up. This clears DELWRI_Q from the buffer state. If the second repair allocates the same block, it will then recycle the buffer to start writing the new btree block. Meanwhile, if the AIL wakes up and walks the buffer list, it will ignore the buffer because it can't lock it, and go back to sleep. When the second repair calls delwri_queue to put the buffer on the list of buffers to write before committing the new btree, it will set DELWRI_Q again, but since the buffer hasn't been removed from the AIL's buffer list, it won't add it to the bulkload buffer's list. This is incorrect, because the bulkload caller relies on delwri_submit to ensure that all the buffers have been sent to disk /before/ committing the new btree root pointer. This ordering requirement is required for data consistency. Worse, the AIL won't clear DELWRI_Q from the buffer when it does finally drop it, so the next thread to walk through the btree will trip over a debug assertion on that flag. To fix this, create a new function that waits for the buffer to be removed from any other delwri lists before adding the buffer to the caller's delwri list. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-12-15xfs: convert xbitmap to interval treerepair-bitmap-rework_2021-12-15Darrick J. Wong
Convert the xbitmap code to use interval trees instead of linked lists. This reduces the amount of coding required to handle the disunion operation and in the future will make it easier to set bits in arbitrary order yet later be able to extract maximally sized extents, which we'll need for rebuilding certain structures. We define our own interval tree type so that it can deal with 64-bit indices even on 32-bit machines. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-12-15xfs: reap large extents when possibleDarrick J. Wong
When we're freeing extents that have been set in a bitmap, break the bitmap extent into multiple sub-extents organized by fate, and reap the extents. This enables us to dispose of old resources more efficiently than doing them block by block. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-12-15xfs: ignore stale buffers when scanning the buffer cacheDarrick J. Wong
After an online repair, we need to invalidate buffers representing the blocks from the old metadata that we're replacing. It's possible that parts of a tree that were previously cached in memory are no longer accessible due to media failure or other corruption on interior nodes, so repair figures out the old blocks from the reverse mapping data and scans the buffer cache directly. Unfortunately, the current buffer cache code triggers asserts if the rhashtable lookup finds a non-stale buffer of a different length than the key we searched for. For regular operation this is desirable, but for this repair procedure, we don't care since we're going to forcibly stale the buffer anyway. Add an internal lookup flag to avoid the assert. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-12-15xfs: rearrange xrep_reap_block to make future code flow easierDarrick J. Wong
Rearrange the logic inside xrep_reap_block to make it more obvious that crosslinked metadata blocks are handled differently. Add a couple of tracepoints so that we can tell what's going on at the end of a btree rebuild operation. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-12-15xfs: drop the _safe behavior from the xbitmap foreach macroDarrick J. Wong
It's not safe to edit bitmap intervals while we're iterating them with for_each_xbitmap_extent. None of the existing callers actually need that ability anyway, so drop the safe variable. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-12-15xfs: remove the for_each_xbitmap_ helpersDarrick J. Wong
Remove the for_each_xbitmap_ macros in favor of proper iterator functions. We'll soon be switching this data structure over to an interval tree implementation, which means that we can't allow callers to modify the bitmap during iteration without telling us. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-12-15xfs: use deferred frees to reap old btree blocksrepair-reap-fixes_2021-12-15Darrick J. Wong
Use deferred frees (EFIs) to reap the blocks of a btree that we just replaced. This helps us to shrink the window in which those old blocks could be lost due to a system crash, though we try to flush the EFIs every few hundred blocks so that we don't also overflow the transaction reservations during and after we commit the new btree. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-12-15xfs: only allow reaping of per-AG blocks in xrep_reap_extentsDarrick J. Wong
Now that we've refactored btree cursors to require the caller to pass in a perag structure, there are numerous problems in xrep_reap_extents if it's being called to reap extents for an inode metadata repair. We don't have any repair functions that can do that, so drop the support for now. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-12-15xfs: only invalidate blocks if we're going to free themDarrick J. Wong
When we're discarding old btree blocks after a repair, only invalidate the buffers for the ones that we're freeing -- if the metadata was crosslinked with another data structure, we don't want to touch it. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-12-15xfs: don't return -EFSCORRUPTED from repair when resources cannot be grabbedDarrick J. Wong
If we tried to repair something but the repair failed with -EDEADLOCK or -EAGAIN, that means that the repair function couldn't grab some resource it needed and wants us to try again. If we try again (with TRY_HARDER) but still can't do it, exit back to userspace, since xfs_scrub_metadata requires xrep_attempt to return -EAGAIN. This makes the return value diagnostics look less weird, and fixes a wart that remains from very early in the repair implementation. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-12-15xfs: amend xfs_rmap_has_other_keys to check file offsets properlyDarrick J. Wong
This function was created to help online repair deal with freeing blocks from an old metadata structure after a repair completes. When considering a block to free, we want to know if the rmap btree contains records of any other owners. If yes, the block is crosslinked and we merely want to drop the forward reference to the block and any rmap record from the deleted structure. If no, then we should additionally free the block. The current version doesn't actually check file offsets properly, since I hadn't built gotten that far with online repair. Now that I have, I know that it should be performing a range comparison of the offsets and not the simple offset check that it currently performs. So, fix this deficiency ahead of the rest of online repair. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-12-15xfs: rewrite xfs_reflink_end_cow to use intentsreflink-speedups_2021-12-15Darrick J. Wong
Currently, the code that performs CoW remapping after a write has this odd behavior where it walks /backwards/ through the data fork to remap extents in reverse order. Earlier, we rewrote the reflink remap function to use deferred bmap log items instead of trying to cram as much into the first transaction that we could. Now do the same for the CoW remap code. There doesn't seem to be any performance impact; we're just making better use of code that we added for the benefit of reflink. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-12-15xfs: reduce transaction reservations with reflinkDarrick J. Wong
Before to the introduction of deferred refcount operations, reflink would try to cram refcount btree updates into the same transaction as an allocation or a free event. Mainline XFS has never actually done that, but we never refactored the transaction reservations to reflect that we now do all refcount updates in separate transactions. Fix this to reduce the transaction reservation size even farther, so that between this patch and the previous one, we reduce the tr_write and tr_itruncate sizes by 66%. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-12-15xfs: reduce the absurdly large log reservationsDarrick J. Wong
Back in the early days of reflink and rmap development I set the transaction reservation sizes to be overly generous for rmap+reflink filesystems, and a little under-generous for rmap-only filesystems. Since we don't need *eight* transaction rolls to handle three new log intent items, decrease the logcounts to what we actually need, and amend the shadow reservation computation function to reflect what we used to do so that the minimum log size doesn't change. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-12-15xfs: create shadow transaction reservations for computing minimum log sizeDarrick J. Wong
Every time someone changes the transaction reservation sizes, they introduce potential compatibility problems if the changes affect the minimum log size that we validate at mount time. If the minimum log size gets larger (which should be avoided because doing so presents a serious risk of log livelock), filesystems created with old mkfs will not mount on a newer kernel; if the minimum size shrinks, filesystems created with newer mkfs will not mount on older kernels. Therefore, enable the creation of a shadow log reservation structure where we can "undo" the effects of tweaks when computing minimum log sizes. These shadow reservations should never be used in practice, but they insulate us from perturbations in minimum log size. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-12-15xfs: remove a __xfs_bunmapi call from reflinkDarrick J. Wong
This raw call isn't necessary since we can always remove a full delalloc extent. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-12-15xfs: stop artificially limiting the length of bunmap callsDarrick J. Wong
In commit e1a4e37cc7b6, we clamped the length of bunmapi calls on the data forks of shared files to avoid two failure scenarios: one where the extent being unmapped is so sparsely shared that we exceed the transaction reservation with the sheer number of refcount btree updates and EFI intent items; and the other where we attach so many deferred updates to the transaction that we pin the log tail and later the log head meets the tail, causing the log to livelock. We avoid triggering the first problem by tracking the number of ops in the refcount btree cursor and forcing a requeue of the refcount intent item any time we think that we might be close to overflowing. This has been baked into XFS since before the original e1a4 patch. A recent patchset fixed the second problem by changing the deferred ops code to finish all the work items created by each round of trying to complete a refcount intent item, which eliminates the long chains of deferred items (27dad); and causing long-running transactions to relog their intent log items when space in the log gets low (74f4d). Because this clamp affects /any/ unmapping request regardless of the sharing factors of the component blocks, it degrades the performance of all large unmapping requests -- whereas with an unshared file we can unmap millions of blocks in one go, shared files are limited to unmapping a few thousand blocks at a time, which causes the upper level code to spin in a bunmapi loop even if it wasn't needed. This also eliminates one more place where log recovery behavior can differ from online behavior, because bunmapi operations no longer need to requeue. Partial-revert-of: e1a4e37cc7b6 ("xfs: try to avoid blowing out the transaction reservation when bunmaping a shared extent") Depends: 27dada070d59 ("xfs: change the order in which child and parent defer ops ar finished") Depends: 74f4d6a1e065 ("xfs: only relog deferred intent items if free space in the log gets low") Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-12-15xfs: speed up write operations by using non-overlapped lookups when possiblermap-speedups_2021-12-15Darrick J. Wong
Reverse mapping on a reflink-capable filesystem has some pretty high overhead when performing file operations. This is because the rmap records for logically and physically adjacent extents might not be adjacent in the rmap index due to data block sharing. As a result, we use expensive overlapped-interval btree search, which walks every record that overlaps with the supplied key in the hopes of finding the record. However, profiling data shows that when the index contains a record that is an exact match for a query key, the non-overlapped btree search function can find the record much faster than the overlapped version. Try the non-overlapped lookup first when we're trying to find the left neighbor rmap record for a given file mapping, which makes unwritten extent conversion and remap operations run faster if data block sharing is minimal in this part of the filesystem. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-12-15xfs: speed up rmap lookups by using non-overlapped lookups when possibleDarrick J. Wong
Reverse mapping on a reflink-capable filesystem has some pretty high overhead when performing file operations. This is because the rmap records for logically and physically adjacent extents might not be adjacent in the rmap index due to data block sharing. As a result, we use expensive overlapped-interval btree search, which walks every record that overlaps with the supplied key in the hopes of finding the record. However, profiling data shows that when the index contains a record that is an exact match for a query key, the non-overlapped btree search function can find the record much faster than the overlapped version. Try the non-overlapped lookup first, which will make scrub run much faster. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-12-15xfs: simplify xfs_rmap_lookup_le call sitesDarrick J. Wong
Most callers of xfs_rmap_lookup_le will retrieve the btree record immediately if the lookup succeeds. The overlapped version of this function (xfs_rmap_lookup_le_range) will return the record if the lookup succeeds, so make the regular version do it too. Get rid of the useless len argument, since it's not part of the lookup key. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-12-15xfs: check the reference counts of gaps in the refcount btreescrub-fix-checking-gaps_2021-12-15Darrick J. Wong
Gaps in the reference count btree are also significant -- for these regions, there must not be any overlapping reverse mappings. We don't currently check this, so make the refcount scrubber more complete. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-12-15xfs: check quota files for unwritten extentsDarrick J. Wong
Teach scrub to flag quota files containing unwritten extents. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-12-15xfs: teach scrub to check for adjacent bmaps when rmap larger than bmapDarrick J. Wong
When scrub is checking file fork mappings against rmap records and the rmap record starts before or ends after the bmap record, check the adjacent bmap records to make sure that they're adjacent to the one we're checking. This helps us to detect cases where the rmaps cover territory that the bmaps do not. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-12-15xfs: online checking of the free rt extent countDarrick J. Wong
Teach the summary count checker to count the number of free realtime extents and compare that to the superblock copy. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-12-15xfs: check btree keys reflect the child blockDarrick J. Wong
When scrub is checking a non-root btree block, it should make sure that the keys in the parent btree block accurately capture the keyspace that the child block stores. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-12-15xfs: make checking directory dotdot entries more efficientDarrick J. Wong
The current directory parent scrubbing code could be tighter in its checking -- when we've dropped all the directory iolocks, it's not necessary to trylock the first of a nested pair, so we can eliminate the first call to xchk_ilock_inverted on those grounds. Second, if the child directory's parent changes during the lock cycling, we know that the new parent has stamped the correct parent into the dotdot entry, so we can conclude that the parent entry is correct. Therefore, we don't need the second xchk_ilock_inverted call at all. We can spin in a lock/trylock loop until we can grab both locks and recheck the child directory afterwards. This eliminates an entire source of -EDEADLOCK-based "retry harder" code executions. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-12-15xfs: teach xfs_btree_has_record to return false if there are gapsDarrick J. Wong
The current implementation of xfs_btree_has_record returns true if it finds /any/ record within the given range. Unfortunately, that's not what the predicate is supposed to do -- it's supposed to test if the /entire/ range is covered by records. Therefore, enhance the routine to check that the first record it encounters starts earlier or at the same point as the low key, the last record ends at or after the same point as the high key, and that there aren't any gaps in the records. Signed-off-by: Darrick J. Wong <djwong@kernel.org>