bcachefs.git - Unnamed repository; edit this file 'description' to name the repository.

Age	Commit message (Collapse)	Author
2022-11-09	xfs: remove unnecessary dstmap in xattr scrubber	Darrick J. Wong
	Replace bitmap_and with bitmap_intersects in the xattr leaf block scrubber, since we only care if there's overlap between the used space bitmap and the free space bitmap. This means we don't need dstmap any more, and can thus reduce the memory requirements. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09	xfs: don't shadow @leaf in xchk_xattr_block	Darrick J. Wong
	Don't shadow the leaf variable here, because it's misleading to have one place in the codebase where two variables with different types have the same name. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09	xfs: check for reverse mapping records that could be mergedscrub-detect-mergeable-records_2022-11-09	Darrick J. Wong
	Enhance the rmap scrubber to flag adjacent records that could be merged. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09	xfs: check overlapping rmap btree records	Darrick J. Wong
	The rmap btree scrubber doesn't contain sufficient checking for records that cannot overlap but do anyway. For the other btrees, this is enforced by the inorder checks in xchk_btree_rec, but the rmap btree is special because it allows overlapping records to handle shared data extents. Therefore, enhance the rmap btree record check function to compare each record against the previous one so that we can detect overlapping rmap records for space allocations that do not allow sharing. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09	xfs: flag refcount btree records that could be merged	Darrick J. Wong
	Complain if we encounter refcount btree records that could be merged. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09	xfs: flag free space btree records that could be merged	Darrick J. Wong
	Complain if we encounter free space btree records that could be merged. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09	xfs: alert the user about data/attr fork mappings that could be merged	Darrick J. Wong
	If the data or attr forks have mappings that could be merged, let the user know that the structure could be optimized. This isn't a filesystem corruption since the regular filesystem does not try to be smart about merging bmbt records. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09	xfs: change bmap scrubber to store the previous mapping	Darrick J. Wong
	Convert the inode data/attr/cow fork scrubber to remember the entire previous mapping, not just the next expected offset. No behavior changes here, but this will enable some better checking in subsequent patches. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09	xfs: always check the existence of a dirent's child inodescrub-dir-iget-fixes_2022-11-09	Darrick J. Wong
	When we're scrubbing directory entries, we always need to iget the child inode to make sure that the inode pointer points to a valid inode. The original directory scrub code (commit a5c4) only set us up to do this for ftype=1 filesystems, which is not sufficient; and then commit 4b80 made it worse by exempting the dot and dotdot entries. Sorta-fixes: a5c46e5e8912 ("xfs: scrub directory metadata") Sorta-fixes: 4b80ac64450f ("xfs: scrub should mark a directory corrupt if any entries cannot be iget'd") Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09	xfs: xfs_iget in the directory scrubber needs to use UNTRUSTED	Darrick J. Wong
	In commit 4b80ac64450f, we tried to strengthen the directory scrubber by using the iget call to detect directory entries that point to unallocated inodes. Unfortunately, that commit neglected to pass XFS_IGET_UNTRUSTED to xfs_iget, so we don't check the inode btree first. If the inode number points to something that isn't even an inode cluster, iget will throw corruption errors and return -EFSCORRUPTED, which means that we fail to mark the directory corrupt. Fixes: 4b80ac64450f ("xfs: scrub should mark a directory corrupt if any entries cannot be iget'd") Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09	xfs: make checking directory dotdot entries more reliable	Darrick J. Wong
	The current directory parent scrubbing code could be tighter in its execution -- instead of bailing out to userspace after a couple of seconds of waiting for the (alleged) parent directory's IOLOCK while refusing to release the child directory's IOLOCK, we could just cycle both locks until we get both or the child process absorbs a fatal signal. Note that because the usual sequence is to take IOLOCKs before grabbing a transaction, we have to use the _nowait variants on both inodes to avoid an ABBA deadlock. Since parent pointer checking is the only place in scrub that needs this kind of functionality, move it to parent.c as a private function. Furthermore, if the child directory's parent changes during the lock cycling, we know that the new parent has stamped the correct parent into the dotdot entry, so we can conclude that the parent entry is correct. This eliminates an entire source of -EDEADLOCK-based "retry harder" scrub executions. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09	xfs: retain the AGI when we can't iget an inode to scrub the corescrub-iget-fixes_2022-11-09	Darrick J. Wong
	xchk_get_inode is not quite the right function to be calling from the inode scrubber setup function. The common get_inode function either gets an inode and installs it in the scrub context, or it returns an error code explaining what happened. This is acceptable for most file scrubbers because it is not in their scope to fix corruptions in the inode core and fork areas that cause iget to fail. Dealing with these problems is within the scope of the inode scrubber, however. If iget fails with EFSCORRUPTED, we need to xchk_inode to flag that as corruption. Since we can't get our hands on an incore inode, we need to hold the AGI to prevent inode allocation activity so that nothing changes in the inode metadata. Looking ahead to the inode core repair patches, we will also need to hold the AGI buffer into xrep_inode so that we can make modifications to the xfs_dinode structure without any other thread swooping in to allocate or free the inode. Adapt the xchk_get_inode into xchk_setup_inode since this is a one-off use case where the error codes we check for are a little different, and the return state is much different from the common function. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09	xfs: fix an inode lookup race in xchk_get_inode	Darrick J. Wong
	In commit d658e, we tried to improve the robustnes of xchk_get_inode in the face of EINVAL returns from iget by calling xfs_imap to see if the inobt itself thinks that the inode is allocated. Unfortunately, that commit didn't consider the possibility that the inode gets allocated after iget but before imap. In this case, the imap call will succeed, but we turn that into a corruption error and tell userspace the inode is corrupt. Avoid this false corruption report by grabbing the AGI header and retrying the iget before calling imap. If the iget succeeds, we can proceed with the usual scrub-by-handle code. Fix all the incorrect comments too, since unreadable/corrupt inodes no longer result in EINVAL returns. Fixes: d658e72b4a09 ("xfs: distinguish between corrupt inode and invalid inum in xfs_scrub_get_inode") Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09	xfs: manage inode DONTCACHE status at irele time	Darrick J. Wong
	Right now, there are statements scattered all over the online fsck codebase about how we can't use XFS_IGET_DONTCACHE because of concerns about scrub's unusual practice of releasing inodes with transactions held. However, iget is the wrong place to handle this -- the DONTCACHE state doesn't matter at all until we try to release the inode, and here we get things wrong in multiple ways: First, if we /do/ have a transaction, we must NOT drop the inode, because the inode could have dirty pages, dropping the inode will trigger writeback, and writeback can trigger a nested transaction. Second, if the inode already had an active reference and the DONTCACHE flag set, the icache hit when scrub grabs another ref will not clear DONTCACHE. This is sort of by design, since DONTCACHE is now used to initiate cache drops so that sysadmins can change a file's access mode between pagecache and DAX. Third, if we do actually have the last active reference to the inode, we can set DONTCACHE to avoid polluting the cache. This is the /one/ case where we actually want that flag. Create an xchk_irele helper to encode all that logic and switch the online fsck code to use it. Since this now means that nearly all scrubbers use the same xfs_iget flags, we can wrap them too. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09	xfs: check inode core when scrubbing metadata filesscrub-check-metadata-inode-records_2022-11-09	Darrick J. Wong
	Metadata files (e.g. realtime bitmaps and quota files) do not show up in the bulkstat output, which means that scrub-by-handle does not work; they can only be checked through a specific scrub type. Therefore, each scrub type calls xchk_metadata_inode_forks to check the metadata for whatever's in the file. Unfortunately, that function doesn't actually check the inode record itself. Refactor the function a bit to make that happen. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09	xfs: don't warn about files that are exactly s_maxbytes long	Darrick J. Wong
	We can handle files that are exactly s_maxbytes bytes long; we just can't handle anything larger than that. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09	xfs: ensure that single-owner file blocks are not owned by othersscrub-detect-rmapbt-gaps_2022-11-09	Darrick J. Wong
	For any file fork mapping that can only have a single owner, make sure that there are no other rmap owners for that mapping. This patch requires the more detailed checking provided by xfs_rmap_count_owners so that we can know how many rmap records for a given range of space had a matching owner, how many had a non-matching owner, and how many conflicted with the records that have a matching owner. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09	xfs: teach scrub to check for sole ownership of metadata objects	Darrick J. Wong
	Strengthen online scrub's checking even further by enabling us to check that a range of blocks are owned solely by a given owner. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09	xfs: convert xfs_ialloc_has_inodes_at_extent to return keyfill scan resultsscrub-detect-inobt-gaps_2022-11-09	Darrick J. Wong
	Convert the xfs_ialloc_has_inodes_at_extent function to return keyfill scan results because for a given range of inode numbers, we might have no indexed inodes at all; the entire region might be allocated ondisk inodes; or there might be a mix of the two. Unfortunately, sparse inodes adds to the complexity, because each inode record can have holes, which means that we cannot use the generic btree _scan_keyfill function because we must look for holes in individual records to decide the result. On the plus side, online fsck can now detect sub-chunk discrepancies in the inobt. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09	xfs: directly cross-reference the inode btrees with each other	Darrick J. Wong
	Improve the cross-referencing of the two inode btrees by directly checking the free and hole state of each inode with the other btree. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09	xfs: clean up broken eearly-exit code in the inode btree scrubber	Darrick J. Wong
	Corrupt inode chunks should cause us to exit early after setting the CORRUPT flag on the scrub state. While we're at it, collapse trivial helpers. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09	xfs: remove pointless shadow variable from xfs_difree_inobt	Darrick J. Wong
	In xfs_difree_inobt, the pag passed in was previously used to look up the AGI buffer. There's no need to extract it again, so remove the shadow variable and shut up -Wshadow. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09	xfs: ensure that all metadata and data blocks are not cow staging extentsscrub-detect-refcount-gaps_2022-11-09	Darrick J. Wong
	Make sure that all filesystem metadata blocks and file data blocks are not also marked as CoW staging extents. The extra checking added here was inspired by an actual VM host filesystem corruption incident due to bugs in the CoW handling of 4.x kernels. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-11-09	xfs: check the reference counts of gaps in the refcount btree	Darrick J. Wong
	Gaps in the reference count btree are also significant -- for these regions, there must not be any overlapping reverse mappings. We don't currently check this, so make the refcount scrubber more complete. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-11-09	xfs: implement masked btree key comparisons for _has_records scans	Darrick J. Wong
	For keyspace fullness scans, we want to be able to mask off the parts of the key that we don't care about. For most btree types we /do/ want the full keyspace, but for checking that a given space usage also has a full complement of rmapbt records (even if different/multiple owners) we need this masking so that we only track sparseness of rm_startblock, not the whole keyspace (which is extremely sparse). Augment the ->diff_two_keys and ->keys_contiguous helpers to take a third union xfs_btree_key argument, and wire up xfs_rmap_has_records to pass this through. This third "mask" argument should contain a nonzero value in each structure field that should be used in the key comparisons done during the scan. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09	xfs: replace xfs_btree_has_record with a general keyspace scanner	Darrick J. Wong
	The current implementation of xfs_btree_has_record returns true if it finds /any/ record within the given range. Unfortunately, that's not sufficient for scrub. We want to be able to tell if a range of keyspace for a btree is devoid of records, is totally mapped to records, or is somewhere in between. By forcing this to be a boolean, we conflated sparseness and fullness, which caused scrub to return incorrect results. Fix the API so that we can tell the caller which of those three is the current state. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09	xfs: refactor ->diff_two_keys callsites	Darrick J. Wong
	Create wrapper functions around ->diff_two_keys so that we don't have to remember what the return values mean, and adjust some of the code comments to reflect the longtime code behavior. We're going to introduce more uses of ->diff_two_keys in the next patch, so reduce the cognitive load for readers by doing this refactoring now. Suggested-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09	xfs: refactor converting btree irec to btree key	Darrick J. Wong
	We keep doing these conversions to support btree queries, so refactor this into a helper. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-11-09	xfs: always scrub record/key order of interior recordsbtree-key-enhancements_2022-11-09	Darrick J. Wong
	In commit d47fef9342d0, we removed the firstrec and firstkey fields of struct xchk_btree because Christoph thought they were unnecessary because we could use the record index in the btree cursor. This is incorrect because bc_ptrs (now bc_levels[].ptr) tracks the cursor position within a specific btree block, not within the entire level. The end result is that scrub no longer detects situations where the rightmost record of a block is identical to the leftmost record of that block's right sibling. Fix this regression by reintroducing record validity booleans so that order checking skips only the leftmost record/key in each level. Fixes: d47fef9342d0 ("xfs: don't track firstrec/firstkey separately in xchk_btree") Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09	xfs: check btree keys reflect the child block	Darrick J. Wong
	When scrub is checking a non-root btree block, it should make sure that the keys in the parent btree block accurately capture the keyspace that the child block stores. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-11-09	xfs: detect unwritten bit set in rmapbt node block keys	Darrick J. Wong
	In the last patch, we changed the rmapbt code to remove the UNWRITTEN bit when creating an rmapbt key from an rmapbt record, and we changed the rmapbt key comparison code to start considering the ATTR and BMBT flags during lookup. This brought the behavior of the rmapbt implementation in line with its specification. However, there may exist filesystems that have the unwritten bit still set in the rmapbt keys. We should detect these situations and flag the rmapbt as one that would benefit from optimization. Eventually, online repair will be able to do something in response to this. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09	xfs: fix rm_offset flag handling in rmap keys	Darrick J. Wong
	Keys for extent interval records in the reverse mapping btree are supposed to be computed as follows: (physical block, owner, fork, is_btree, offset) This provides users the ability to look up a reverse mapping from a file block mapping record -- start with the physical block; then if there are multiple records for the same block, move on to the owner; then the inode fork type; and so on to the file offset. Unfortunately, the code that creates rmap lookup keys from rmap records forgot to mask off the record attribute flags, leading to ondisk keys that look like this: (physical block, owner, fork, is_btree, unwritten state, offset) Fortunately, this has all worked ok for the past six years because the key comparison functions incorrectly ignore the fork/bmbt/unwritten information that's encoded in the on-disk offset. This means that lookup comparisons are only done with: (physical block, owner, offset) Queries can (theoretically) return incorrect results because of this omission. On consistent filesystems this isn't an issue because xattr and bmbt blocks cannot be shared and hence the comparisons succeed purely on the contents of the rm_startblock field. For the one case where we support sharing (written data fork blocks) all flag bits are zero, so the omission in the comparison has no ill effects. Unfortunately, this bug prevents scrub from detecting incorrect fork and bmbt flag bits in the rmap btree, so we really do need to fix the compare code. Old filesystems with the unwritten bit erroneously set in the rmap key struct will work fine on new kernels since we still ignore the unwritten bit. New filesystems on older kernels will work fine since the old kernels never paid attention to the unwritten bit. A previous version of this patch forgot to keep the (un)written state flag masked during the comparison and caused a major regression in 5.9.x since unwritten extent conversion can update an rmap record without requiring key updates. Note that blocks cannot go directly from data fork to attr fork without being deallocated and reallocated, nor can they be added to or removed from a bmbt without a free/alloc cycle, so this should not cause any regressions. Found by fuzzing keys[1].attrfork = ones on xfs/371. Fixes: 4b8ed67794fe ("xfs: add rmap btree operations") Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09	xfs: hoist inode record alignment checks from scrubbtree-hoist-scrub-checks_2022-11-09	Darrick J. Wong
	Move the inobt record alignment checks from xchk_iallocbt_rec into xfs_inobt_check_irec so that they are applied everywhere. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09	xfs: hoist rmap record flag checks from scrub	Darrick J. Wong
	Move the rmap record flag checks from xchk_rmapbt_rec into xfs_rmap_check_irec so that they are applied everywhere. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09	xfs: hoist rmap record flag checks from scrub	Darrick J. Wong
	Move the rmap record flag checks from xchk_rmapbt_rec into xfs_rmap_check_irec so that they are applied everywhere. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09	xfs: complain about bad file mapping records in the ondisk bmbtbtree-complain-bad-records_2022-11-09	Darrick J. Wong
	Similar to what we've just done for the other btrees, create a function to log corrupt bmbt records and call it whenever we encounter a bad record in the ondisk btree. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09	xfs: complain about bad records in query_range helpers	Darrick J. Wong
	For every btree type except for the bmbt, refactor the code that complains about bad records into a helper and make the ->query_range helpers call it so that corruptions found via that avenue are logged. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09	xfs: standardize ondisk to incore conversion for bmap btrees	Darrick J. Wong
	Fix all xfs_bmbt_disk_get_all callsites to call xfs_bmap_validate_extent and bubble up corruption reports. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09	xfs: standardize ondisk to incore conversion for rmap btrees	Darrick J. Wong
	Create a xfs_rmap_check_irec function to detect corruption in btree records. Fix all xfs_rmap_btrec_to_irec callsites to call the new helper and bubble up corruption reports. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09	xfs: return a failure address from xfs_rmap_irec_offset_unpack	Darrick J. Wong
	Currently, xfs_rmap_irec_offset_unpack returns only 0 or -EFSCORRUPTED. Change this function to return the code address of a failed conversion in preparation for the next patch, which standardizes localized record checking and reporting code. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09	xfs: standardize ondisk to incore conversion for refcount btrees	Darrick J. Wong
	Create a xfs_refcount_check_irec function to detect corruption in btree records. Fix all xfs_refcount_btrec_to_irec callsites to call the new helper and bubble up corruption reports. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09	xfs: standardize ondisk to incore conversion for inode btrees	Darrick J. Wong
	Create a xfs_inobt_check_irec function to detect corruption in btree records. Fix all xfs_inobt_btrec_to_irec callsites to call the new helper and bubble up corruption reports. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09	xfs: standardize ondisk to incore conversion for free space btrees	Darrick J. Wong
	Create a xfs_alloc_btrec_to_irec function to convert an ondisk record to an incore record, and a xfs_alloc_check_irec function to detect corruption. Replace all the open-coded logic with calls to the new helpers and bubble up corruption reports. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09	xfs: scrub should use ECHRNG to signal that the drain is neededscrub-drain-intents_2022-11-09	Darrick J. Wong
	In the previous patch, we added jump labels to the intent drain code so that regular filesystem operations need not pay the price of checking for someone (scrub) waiting on intents to drain from some part of the filesystem when that someone isn't running. However, I observed that xfs/285 now spends a lot more time pushing the AIL from the inode btree scrubber than it used to. This is because the inobt scrubber will try push the AIL to try to get logged inode cores written to the filesystem when it sees a weird discrepancy between the ondisk inode and the inobt records. This AIL push is triggered when the setup function sees TRY_HARDER is set; and the requisite EDEADLOCK return is initiated when the discrepancy is seen. The solution to this performance slow down is to use a different result code (ECHRNG) for scrub code to signal that it needs to wait for deferred intent work items to drain out of some part of the filesystem. When this happens, set a new scrub state flag (XCHK_NEED_DRAIN) so that setup functions will activate the jump label. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09	xfs: minimize overhead of drain wakeups by using jump labels	Darrick J. Wong
	To reduce the runtime overhead even further when online fsck isn't running, use a static branch key to decide if we call wake_up on the drain. For compilers that support jump labels, the call to wake_up is replaced by a nop sled when nobody is waiting for intents to drain. From my initial microbenchmarking, every transition of the static key between the on and off states takes about 22000ns to complete; this is paid entirely by the xfs_scrub process. When the static key is off (which it should be when fsck isn't running), the nop sled adds an overhead of approximately 0.36ns to runtime code. For the few compilers that don't support jump labels, runtime code pays the cost of calling wake_up on an empty waitqueue, which was observed to be about 30ns. However, most architectures that have sufficient memory and CPU capacity to run XFS also support jump labels, so this is not much of a worry. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09	xfs: clean up scrub context if scrub setup returns -EDEADLOCK	Darrick J. Wong
	It has been a longstanding convention that online scrub and repair functions can return -EDEADLOCK to signal that they weren't able to obtain some necessary resource. When this happens, the scrub framework is supposed to release all resources attached to the scrub context, set the TRY_HARDER flag in the scrub context flags, and try again. In this context, individual scrub functions are supposed to take all the resources they (incorrectly) speculated were not necessary. We're about to make it so that the functions that lock and wait for a filesystem AG can also return EDEADLOCK to signal that we need to try again with the drain waiters enabled. Therefore, refactor xfs_scrub_metadata to support this behavior for ->setup() functions. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09	xfs: allow queued AG intents to drain before scrubbing	Darrick J. Wong
	When a writer thread executes a chain of log intent items, the AG header buffer locks will cycle during a transaction roll to get from one intent item to the next in a chain. Although scrub takes all AG header buffer locks, this isn't sufficient to guard against scrub checking an AG while that writer thread is in the middle of finishing a chain because there's no higher level locking primitive guarding allocation groups. When there's a collision, cross-referencing between data structures (e.g. rmapbt and refcountbt) yields false corruption events; if repair is running, this results in incorrect repairs, which is catastrophic. Fix this by adding to the perag structure the count of active intents and make scrub wait until it has both AG header buffer locks and the intent counter reaches zero. One quirk of the drain code is that deferred bmap updates also bump and drop the intent counter. A fundamental decision made during the design phase of the reverse mapping feature is that updates to the rmapbt records are always made by the same code that updates the primary metadata. In other words, callers of bmapi functions expect that the bmapi functions will queue deferred rmap updates. Some parts of the reflink code queue deferred refcount (CUI) and bmap (BUI) updates in the same head transaction, but the deferred work manager completely finishes the CUI before the BUI work is started. As a result, the CUI drops the intent count long before the deferred rmap (RUI) update even has a chance to bump the intent count. The only way to keep the intent count elevated between the CUI and RUI is for the BUI to bump the counter until the RUI has been created. A second quirk of the intent drain code is that deferred work items must increment the intent counter as soon as the work item is added to the transaction. When a BUI completes and queues an RUI, the RUI must increment the counter before the BUI decrements it. The only way to accomplish this is to require that the counter be bumped as soon as the deferred work item is created in memory. In the next patches we'll improve on this facility, but this patch provides the basic functionality. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09	xfs: add a tracepoint to report incorrect extent refcounts	Darrick J. Wong
	Add a new tracepoint so that I can see exactly what and where we failed the refcount check. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09	xfs: create a function to duplicate an active perag referencepass-perag-refs_2022-11-09	Darrick J. Wong
	There a few object constructor functions throughout XFS where a caller provides an active perag reference and the constructor wants to give the new object its own active reference. Replace the open-coded logic with a common function to do this instead of open-coding atomic_inc logic. This new function adds a few safeguards -- it checks that there's at least one active reference to the perag structure passed in, and it records the refcount bump in the ftrace information. This makes it much easier to debug refcounting problems. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-11-09	xfs: give xfs_refcount_intent its own perag referenceintents-perag-refs_2022-11-09	Darrick J. Wong
	Give the xfs_refcount_intent an active reference to the perag structure data. This reference will be used to enable scrub intent draining functionality in subsequent patches. Later, shrink will use these active references to know if an AG is quiesced or not. Signed-off-by: Darrick J. Wong <djwong@kernel.org>