Rework xfs_btree_compute_maxlevels to handle larger record counts, since
we're about to add support for very large indices for the realtime rmap
btree.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
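
For illustration only, here is a minimal sketch of how a worst-case btree height can be derived from a 64-bit record count. The names sketch_howmany and sketch_btree_maxlevels and the two minrecs parameters are hypothetical; this is not presented as the kernel's implementation of xfs_btree_compute_maxlevels.

```c
#include <assert.h>
#include <stdint.h>

/* Round-up division that cannot overflow for large 64-bit counts. */
static uint64_t
sketch_howmany(uint64_t x, uint64_t y)
{
	return x / y + (x % y != 0);
}

/*
 * Hypothetical helper: tallest possible tree for `records` records when
 * each leaf holds at least leaf_minrecs records and each node holds at
 * least node_minrecs pointers (the sparsest legal packing).
 */
static unsigned int
sketch_btree_maxlevels(uint64_t records, unsigned int leaf_minrecs,
		       unsigned int node_minrecs)
{
	uint64_t	blocks;
	unsigned int	height = 1;

	assert(leaf_minrecs > 0 && node_minrecs > 1);

	/* Leaf blocks needed to hold every record. */
	blocks = sketch_howmany(records, leaf_minrecs);

	/* Add one level per pass until a single block covers the level. */
	while (blocks > 1) {
		blocks = sketch_howmany(blocks, node_minrecs);
		height++;
	}
	return height;
}
```

Keeping the record count in a 64-bit type is the point of the sketch: a 32-bit count would truncate for very large indices.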
|
|
Don't waste tracepoint segment memory on per-btree block allocation
tracepoints when we can do it from the generic btree code.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Don't waste tracepoint segment memory on per-btree block freeing
tracepoints when we can do it from the generic btree code.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Change the name of all pointers to xfs_extent_item structures to "xefi"
to make the name consistent and because the current selections ("new"
and "free") mean other things in C.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
A handful of fstests expect to be able to test what happens when extent
free intents fail to actually free the extent. Now that we're
supporting EFIs for realtime extents, add to xfs_rtfree_extent the same
injection point that exists in the regular extent freeing code.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Now that we have reflink on the realtime device, extent-free intent
items have to support remapping extents on the realtime volume.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Teach the EFI mechanism how to free realtime extents. We do this very
sneakily, by using the upper bit of the length field in the log format
(and a boolean flag incore) to convey the realtime status. We're going
to need this to enforce proper ordering of operations when we enable
realtime rmap.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
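
As a rough sketch of the encoding trick described above, assuming a 32-bit length field: the top bit carries the realtime status in the log format, while the incore side keeps a plain length and a boolean. SKETCH_LEN_RT_FLAG and both helper names are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag: top bit of a 32-bit log-format length field. */
#define SKETCH_LEN_RT_FLAG	(UINT32_C(1) << 31)

/* Fold the incore boolean into the length written to the log. */
static uint32_t
sketch_pack_len(uint32_t len, bool realtime)
{
	/* A real length must never collide with the flag bit. */
	assert((len & SKETCH_LEN_RT_FLAG) == 0);
	return realtime ? (len | SKETCH_LEN_RT_FLAG) : len;
}

/* Split a logged length back into the plain length and the boolean. */
static uint32_t
sketch_unpack_len(uint32_t logged, bool *realtime)
{
	*realtime = (logged & SKETCH_LEN_RT_FLAG) != 0;
	return logged & ~SKETCH_LEN_RT_FLAG;
}
```

The cost of this design is that the usable length range is halved, which is why the sketch asserts that the flag bit is clear before packing.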
|
|
Convert the boolean to skip discard on free into a proper flags field so
that we can add more flags in the next patch.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
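
A small sketch of the general pattern, with hypothetical names throughout (the struct, the flag macros, and the helper are illustrative, not the kernel's definitions): a single boolean becomes one bit in a flags word, leaving room for further flags.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits; only the first mirrors the old boolean. */
#define SKETCH_FREE_SKIP_DISCARD	(1U << 0)
#define SKETCH_FREE_FUTURE_FLAG		(1U << 1)	/* placeholder for a later flag */

/* Hypothetical incore free item carrying a flags word instead of a bool. */
struct sketch_free_item {
	uint64_t	start;
	uint32_t	len;
	unsigned int	flags;
};

/* Callers test a bit rather than reading a dedicated boolean. */
static bool
sketch_should_discard(const struct sketch_free_item *item)
{
	return !(item->flags & SKETCH_FREE_SKIP_DISCARD);
}
```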
|
|
Pass the incore EFI structure to the tracepoints instead of open-coding
the argument passing, and augment the tracepoints to tell us which
operation we're selecting to match the other intent item tracepoints.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Pass the incore xfs_extent_free_item through the EFI logging code
instead of repeatedly boxing and unboxing parameters. We'll clean up
the tracepoints shortly.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
xfs_bmap_add_free isn't a block mapping function; it schedules deferred
freeing operations for a later point in a compound transaction chain.
While it's primarily used by bunmapi, its use has expanded beyond that.
Move it to xfs_alloc.c and rename the function since it's now general
freeing functionality.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Create a new space reservation scheme so that btree metadata for the
realtime volume can reserve space in the data device to avoid space
underruns.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
It's not possible to fail at increasing fdblocks, so get rid of all the
error returns here.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Now that we've centralized the realtime metadata locking routines, get
rid of the ILOCK subclasses since we now use explicit lockdep classes.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Refactor realtime metadata inode locking so that we can get some sense
here.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Realtime metadata files are not quite regular files because userspace
can't access the realtime bitmap directly, and because we take the ILOCK
of the rt bitmap file while holding the ILOCK of a realtime file. The
double nature of inodes confuses lockdep, so up until now we've created
lockdep subclasses to help lockdep keep things straight.
We've gotten away with using lockdep subclasses because there are only two
rt metadata files, but with the coming addition of realtime rmap and
refcounting, we'd need two more subclasses, which is a lot of class bits
to burn on a side feature.
Therefore, switch to manually setting the lockdep class of the rt
metadata ILOCKs. In the next patch we'll remove the rt-related ILOCK
subclasses.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Create a pair of helpers to deal with setting up the necessary incore
context to check metadata records against the realtime metadata. Right
now this is limited to locking the realtime bitmap and summary inodes,
but as we add rmap and reflink to the realtime device this will grow to
include btree cursors.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Nobody uses this symbol anymore, so kill it.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Instead of assuming that the hardcoded XFS_BTREE_MAXLEVELS value is big
enough to handle the maximally tall rmap btree when all blocks are in
use and maximally shared, let's compute the maximum height assuming the
rmapbt consumes as many blocks as possible.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Compute the actual maximum btree height when deciding if per-AG block
reservation is critically low. This only affects the sanity check
condition, since we /generally/ will trigger on the 10% threshold.
This is a long-winded way of saying that we're removing one more
usage of XFS_BTREE_MAXLEVELS.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Replace the statically-sized btree cursor zone with dynamically sized
allocations so that we can reduce the memory overhead for per-AG bt
cursors while handling very tall btrees for rt metadata.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Encode the maximum btree height in the cursor, since we're soon going to
allow smaller cursors for AG btrees and larger cursors for file btrees.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
The btree geometry computation function has an off-by-one error in that
it does not allow maximally tall btrees (nlevels == XFS_BTREE_MAXLEVELS).
This can result in repairs failing unnecessarily on very fragmented
filesystems. Subsequent patches to remove MAXLEVELS usage in favor of
the per-btree type computations will make this a much more likely
occurrence.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Refactor btree allocation to a common helper.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Split out the btree level information into a separate struct and put it
at the end of the cursor structure as a VLA. The realtime rmap btree
(which is rooted in an inode) will require the ability to support many
more levels than a per-AG btree cursor, which means that we're going to
create two btree cursor caches to conserve memory for the more common
case.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
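
To illustrate the layout idea, here is a minimal userspace sketch using a C99 flexible array member, which is presumably what "VLA" refers to here. The struct and function names are hypothetical, and calloc stands in for whatever cache allocator the kernel uses.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-level state, split out of the cursor proper. */
struct sketch_level {
	void		*bp;	/* buffer for this level */
	int		ptr;	/* record/key index within the block */
};

/* Hypothetical cursor: fixed header followed by one slot per level. */
struct sketch_cur {
	unsigned int		maxlevels;
	unsigned int		nlevels;
	struct sketch_level	levels[];	/* flexible array member */
};

/* Bytes needed for a cursor that can describe `maxlevels` levels. */
static size_t
sketch_cur_sizeof(unsigned int maxlevels)
{
	return sizeof(struct sketch_cur) +
	       (size_t)maxlevels * sizeof(struct sketch_level);
}

/* Allocate a zeroed cursor sized for exactly `maxlevels` levels. */
static struct sketch_cur *
sketch_cur_alloc(unsigned int maxlevels)
{
	struct sketch_cur *cur = calloc(1, sketch_cur_sizeof(maxlevels));

	if (cur)
		cur->maxlevels = maxlevels;
	return cur;
}
```

Because the size depends only on maxlevels, short and tall cursors can come from separate fixed-size caches, so the common per-AG case does not pay for the tallest tree.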
|
|
Warn if we ever bump nlevels higher than the allowed maximum cursor
height.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
When we're scanning for btree roots to rebuild the AG headers, make sure
that the proposed tree does not exceed the maximum height for that btree
type (and not just XFS_BTREE_MAXLEVELS).
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Since each btree type has its own precomputed maxlevels variable now,
use them instead of the generic XFS_BTREE_MAXLEVELS to check the level
of each per-AG btree.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Reorganize struct xchk_btree so that we can dynamically size the context
structure to fit the type of btree cursor that we have. This will
enable us to use memory more efficiently once we start adding very tall
btree types.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Convert the on-stack scrub context, btree scrub context, and da btree
scrub context into a heap allocation so that we reduce stack usage and
gain the ability to handle tall btrees without issue.
Specifically, this saves us ~208 bytes for the dabtree scrub, ~464 bytes
for the btree scrub, and ~200 bytes for the main scrub context.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
In commit 2c813ad66a72, I partially fixed a bug wherein xfs_btree_insrec
would erroneously try to update the parent's key for a block that had
been split if we decided to insert the new record into the new block.
The solution was to detect this situation and update the in-core key
value that we pass up to the caller so that the caller will (eventually)
add the new block to the parent level of the tree with the correct key.
However, I missed a subtlety about the way inode-rooted btrees work. If
the full block was a maximally sized inode root block, we'll solve that
fullness by moving the root block's records to a new block, resizing the
root block, and updating the root to point to the new block. We don't
pass a pointer to the new block to the caller because that work has
already been done. The new record will /always/ land in the new block,
so in this case we need to use xfs_btree_update_keys to update the keys.
This bug can theoretically manifest itself in the very rare case that we
split a bmbt root block and the new record lands in the very first slot
of the new block, though I've never managed to trigger it in practice.
However, it is very easy to reproduce by running generic/522 with the
realtime rmapbt patchset if rtinherit=1.
Fixes: 2c813ad66a72 ("xfs: support btrees with overlapping intervals for keys")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Add the necessary flags and code so that we can support storing leaf
records in the inode root block of a btree. This hasn't been necessary
before, but the realtime rmapbt will need to be able to do this.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
In preparation for allowing records in an inode btree root, hoist the
code that copies keyptrs from an existing node child into the root block
to a separate function. Remove some unnecessary conditionals and clean
up a few function calls in the new function. Note that this change
reorders the ->free_block call with respect to the change in bc_nlevels
to make it easier to support inode root leaf blocks in the next patch.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
In preparation for allowing records in an inode btree root, hoist the
code that copies keyptrs from an existing node root into a child block
to a separate function. Note that the new function explicitly computes
the keys of the new child block and stores them in the root block; while
the bmap btree could rely on leaving the key alone, realtime rmap needs
to set the new high key.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Add some logic to xfs_iroot_realloc so that we can handle leaf records
in the btree root block correctly.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
In preparation for storing realtime rmap btree roots in an inode fork,
make xfs_iroot_realloc take an ops structure that takes care of all the
btree-specific geometry pieces.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Standardize the parameters in xfs_{alloc,bm,ino,rmap,refcount}bt_maxrecs
so that we have consistent calling conventions. This doesn't affect the
kernel that much, but enables us to clean up userspace a bit.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Rearrange the innards of xfs_iroot_realloc so that we can reduce
duplicated code prior to genericizing the function. No functional
changes.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
The bmap btree cannot ever have zero records in an incore btree block.
If the number of records drops to zero, that means we're converting the
fork to extents format and are trying to remove the tree. This logic
won't hold for the future realtime rmap btree, so move the logic into
the bmbt code.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Whenever we change the size of the memory buffer holding an inode fork
btree root block, we have to copy the contents over. Refactor all this
into a single function that handles both, in preparation for making
xfs_iroot_realloc more generic.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
While refactoring code, I noticed that when xfs_iroot_realloc tries to
shrink a bmbt root block, it allocates a smaller new block and then
copies "records" and pointers to the new block. However, bmbt root
blocks cannot ever be leaves, which means that it's not technically
correct to copy records. We /should/ be copying keys.
Note that this has never resulted in actual memory corruption because
sizeof(bmbt_rec) == (sizeof(bmbt_key) + sizeof(bmbt_ptr)). However,
this will no longer be true when we start adding realtime rmap stuff,
so fix this now.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
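
The size coincidence can be made concrete with a short sketch. The structs below are illustrative stand-ins with plain uint64_t fields (a 16-byte record, an 8-byte key, an 8-byte pointer), not the on-disk definitions, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in layouts: two 64-bit words per record, one per key, one per ptr. */
struct sketch_bmbt_rec { uint64_t l0, l1; };
struct sketch_bmbt_key { uint64_t br_startoff; };
typedef uint64_t sketch_bmbt_ptr;

/* The coincidence that hid the bug: one record is as big as key + ptr. */
static_assert(sizeof(struct sketch_bmbt_rec) ==
	      sizeof(struct sketch_bmbt_key) + sizeof(sketch_bmbt_ptr),
	      "record size happens to equal key size plus pointer size");

/* Bytes copied if the root is (wrongly) treated as holding records. */
static size_t
sketch_rec_bytes(unsigned int n)
{
	return (size_t)n * sizeof(struct sketch_bmbt_rec);
}

/* Bytes copied if the root is treated as holding keys and pointers. */
static size_t
sketch_keyptr_bytes(unsigned int n)
{
	return (size_t)n * (sizeof(struct sketch_bmbt_key) +
			    sizeof(sketch_bmbt_ptr));
}
```

With these stand-ins both helpers return the same byte count for any n, so copying "records" moves the right amount of data by accident; a btree type whose record size differs from key plus pointer would break that assumption.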
|
|
Now that we've created inode fork helpers to allocate and free btree
roots, create a new bmap btree helper to create a new bmbt root, and
refactor the extents <-> btree conversion functions to use our new
helpers.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Refactor the code that allocates and frees the incore inode fork btree
roots. This will help us disentangle some of the weird logic when we're
creating and tearing down inode-based btrees.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Replace all the shouty bmap btree and bmap disk root macros with actual
functions, and fix a type handling error in the xattr code that the
macros previously didn't care about.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Enable the metadata directory feature.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Teach online scrub about the metadata directory tree.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Allow the V5 bulkstat ioctl to return information about metadata
directory files so that xfs_scrub can find and scrub them, since they
are otherwise ordinary directories.
(Metadata files of course require per-file scrub code and hence do not
need exposure.)
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Advertise the existence of the metadata directory feature; this will be
used by scrub to decide if it needs to scan the metadir too.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Metadata inodes are private files and therefore cannot be exposed to
userspace. This means no bulkstat, no open-by-handle, no linking them
into the directory tree, and no feeding them to LSMs. As such, we mark
them S_PRIVATE, which stops all that.
While we're at it, put them in a separate lockdep class so that lockdep
won't get confused by "recursive" i_rwsem locking such as what happens
when we write to a rt file and need to allocate from the rt bitmap file.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|