In xfs_bui_item_recover, there exists a use-after-free bug with regard
to the inode that is involved in the bmap replay operation. If the
mapping operation does not complete, we call xfs_bmap_unmap_extent to
create a deferred op to finish the unmapping work, and we retain a
pointer to the incore inode.
Unfortunately, the very next thing we do is commit the transaction and
drop the inode. If reclaim tears down the inode before we try to finish
the defer ops, we dereference garbage and blow up. Therefore, create a
way to join inodes to the defer ops freezer so that we can maintain the
xfs_inode reference until we're done with the inode.
Note: This imposes the requirement that there be enough memory to keep
every incore inode in memory throughout recovery.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
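
As a rough illustration of the idea above (not the actual kernel change), the
toy model below shows how taking an extra reference when an object is joined
to a capture structure keeps it alive after the original holder drops its own
reference. Every name here (toy_inode, toy_capture, and the helpers) is
hypothetical and stands in for the real XFS structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an incore inode; not the real struct xfs_inode. */
struct toy_inode {
	int	refcount;	/* number of current holders */
	bool	torn_down;	/* set once the last reference is dropped */
};

/* Hypothetical stand-in for the structure holding captured defer ops. */
struct toy_capture {
	struct toy_inode	*ip;	/* inode the captured work still needs */
};

static void toy_inode_grab(struct toy_inode *ip)
{
	ip->refcount++;
}

static void toy_inode_release(struct toy_inode *ip)
{
	assert(ip->refcount > 0);
	if (--ip->refcount == 0)
		ip->torn_down = true;	/* models reclaim tearing down the inode */
}

/* Joining takes an extra reference so the inode outlives the caller's release. */
static void toy_capture_join_inode(struct toy_capture *cap,
				   struct toy_inode *ip)
{
	toy_inode_grab(ip);
	cap->ip = ip;
}

/* Called once the captured work is done; drops the capture's reference. */
static void toy_capture_finish(struct toy_capture *cap)
{
	toy_inode_release(cap->ip);
	cap->ip = NULL;
}
```

In this sketch the caller may release its reference right after joining; the
inode is only torn down when toy_capture_finish runs.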
|
|
In most places in XFS, we have a specific order in which we gather
resources: grab the inode, allocate a transaction, then lock the inode.
xfs_bui_item_recover doesn't do it in that order, so fix it to be more
consistent. This also makes the error bailout code a bit less weird.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
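
The acquisition order described above can be sketched with a toy model. The
structure and function names are hypothetical, and the boolean flags merely
stand in for the real inode, transaction, and lock state. The point is only
that acquiring in a fixed order lets the error path unwind in reverse with
plain goto labels.

```c
#include <stdbool.h>

/* Hypothetical resource flags; a toy model, not real XFS state. */
struct toy_state {
	bool	inode_held;
	bool	trans_allocated;
	bool	inode_locked;
};

/*
 * Acquire in the order named in the message: inode, transaction, lock.
 * fail_at selects which step fails (0 = none, 1..3 = that step).
 * Returns 0 on success or -1 on failure; either way nothing stays held.
 */
static int toy_recover(struct toy_state *st, int fail_at)
{
	if (fail_at == 1)
		return -1;		/* nothing acquired yet */
	st->inode_held = true;

	if (fail_at == 2)
		goto out_rele;
	st->trans_allocated = true;

	if (fail_at == 3)
		goto out_cancel;
	st->inode_locked = true;

	/* ... the recovery work would happen here ... */

	/* Success: release in reverse order of acquisition. */
	st->inode_locked = false;
	st->trans_allocated = false;
	st->inode_held = false;
	return 0;

out_cancel:
	st->trans_allocated = false;
out_rele:
	st->inode_held = false;
	return -1;
}
```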
|
|
The bmap intent item checking code in xfs_bui_item_recover is spread all
over the function. We should check the recovered log item at the top
before we allocate any resources or do anything else, so do that.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
When xfs_defer_capture extracts the deferred ops and transaction state
from a transaction, it should absorb the remaining block reservation so
that when we continue the dfops chain, we still have those blocks to
use.
This adds the requirement that every log intent item recovery function
must be careful to reserve enough blocks to handle both itself and all
defer ops that it can queue. On the other hand, this enables us to do
away with the handwaving block estimation nonsense that was going on in
xlog_finish_defer_ops.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
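
A minimal sketch of the reservation hand-off described above, using made-up
structures (toy_trans, toy_blk_capture) rather than the real transaction and
capture types. It assumes a transaction tracks a total block reservation and
how much of it has been used. Capture moves the unused remainder into the
capture structure, and continuing the chain later gives those blocks to a
fresh transaction.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical transaction: total blocks reserved and blocks consumed. */
struct toy_trans {
	uint32_t	blk_res;
	uint32_t	blk_res_used;
};

/* Hypothetical capture structure holding the absorbed reservation. */
struct toy_blk_capture {
	uint32_t	blk_res;
};

/* Move the unused part of the reservation out of the transaction. */
static void toy_defer_capture(struct toy_trans *tp,
			      struct toy_blk_capture *cap)
{
	assert(tp->blk_res_used <= tp->blk_res);
	cap->blk_res = tp->blk_res - tp->blk_res_used;
	tp->blk_res = tp->blk_res_used;	/* nothing left over to give back */
}

/* Hand the absorbed blocks to the transaction that continues the chain. */
static void toy_defer_continue(struct toy_blk_capture *cap,
			       struct toy_trans *tp)
{
	tp->blk_res = cap->blk_res;
	tp->blk_res_used = 0;
	cap->blk_res = 0;
}
```

For example, a transaction that reserved 10 blocks and used 3 would leave 7
in the capture structure, and the continuing transaction would start with a
reservation of 7.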
|
|
When we replay unfinished intent items that have been recovered from the
log, it's possible that the replay will cause the creation of more
deferred work items. As outlined in commit 509955823cc9c ("xfs: log
recovery should replay deferred ops in order"), later work items have an
implicit ordering dependency on earlier work items. Therefore, recovery
must replay the items (both recovered and created) in the same order
that they would have been during normal operation.
For log recovery, we enforce this ordering by using an empty transaction
to collect deferred ops that get created in the process of recovering a
log intent item to prevent them from being committed before the rest of
the recovered intent items. After we finish committing all the
recovered log items, we allocate a transaction with an enormous block
reservation, splice our huge list of created deferred ops into that
transaction, and commit it, thereby finishing all those ops.
This is /really/ hokey -- it's the one place in XFS where we allow
nested transactions; the splicing of the defer ops list is inelegant
and has to be done twice per recovery function; and the broken way we
handle inode pointers and block reservations causes subtle use-after-free
and allocator problems that will be fixed by this patch and the two
patches after it.
Therefore, replace the hokey empty transaction with a structure designed
to capture each chain of deferred ops that are created as part of
recovering a single unfinished log intent. Finally, refactor the loop
that replays those chains to do so using one transaction per chain.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
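
The replay scheme described above can be modeled roughly as follows. This is
a toy sketch with hypothetical names (toy_chain, toy_capture_list), where
each deferred op is reduced to an integer id. Chains are appended to a list
in the order they are captured, and the finishing loop walks that list in
order, counting one (imaginary) transaction per chain.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_OPS	8

/* One captured chain of deferred ops, reduced to integer ids. */
struct toy_chain {
	int			ops[TOY_MAX_OPS];	/* in queue order */
	size_t			nr_ops;
	struct toy_chain	*next;	/* next chain, in capture order */
};

struct toy_capture_list {
	struct toy_chain	*head;
	struct toy_chain	*tail;
};

/* Append a chain so that capture order is preserved. */
static void toy_capture_add(struct toy_capture_list *list,
			    struct toy_chain *c)
{
	assert(c->nr_ops <= TOY_MAX_OPS);
	c->next = NULL;
	if (list->tail)
		list->tail->next = c;
	else
		list->head = c;
	list->tail = c;
}

/*
 * Replay every chain in capture order, recording the op ids in out[].
 * Returns the number of toy transactions used (one per chain).
 */
static size_t toy_finish_chains(struct toy_capture_list *list,
				int *out, size_t out_len, size_t *nr_out)
{
	size_t	nr_trans = 0;

	*nr_out = 0;
	for (struct toy_chain *c = list->head; c; c = c->next) {
		nr_trans++;	/* a fresh transaction for this chain */
		for (size_t i = 0; i < c->nr_ops && *nr_out < out_len; i++)
			out[(*nr_out)++] = c->ops[i];
	}
	list->head = list->tail = NULL;
	return nr_trans;
}
```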
|
|
Whenever we encounter corrupt realtime rmap btree blocks, we should
report that to the health monitoring system for later reporting.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Repair the realtime rmap btree while mounted.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Rebuild the realtime bitmap from the realtime rmap btree.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Repair the block mappings of realtime files.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Teach the data fork and realtime bitmap scrubbers to cross-reference
information with the realtime rmap btree.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
When we're checking the realtime rmap btree entries, cross-reference
those entries with the realtime bitmap too.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Check the realtime reverse mapping btree against the rtbitmap, and
modify the rtbitmap scrub to check against the rtrmapbt.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Create a pair of helpers to deal with setting up the necessary incore
context to check metadata records against the realtime metadata. This
was already (sort of) open-coded in the data fork checker.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Connect the getfsmap ioctl to the realtime rmapbt.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
If the administrator asks us to add a realtime volume to an existing
rmap filesystem, we must allocate and attach the rtrmapbt inode to the
system prior to enabling the rt volume.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Create a library routine to allocate and initialize an empty realtime
rmapbt inode. We'll use this for growfs, mkfs, and repair.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Connect the map and unmap reverse-mapping operations to the realtime
rmapbt via the deferred operation callbacks. This enables us to
perform rmap operations against the correct btree.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Plumb in the pieces we need to embed the root of the realtime rmap
btree in an inode's data fork, complete with new fork type and
on-disk interpretation functions.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Add a metadir path to select the realtime rmap btree inode and load
it at mount time. The rtrmapbt inode will have a unique extent format
code, which means that we also have to update the inode validation and
flush routines to look for it.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Identify rtrmapbt blocks in the log correctly so that we can
validate them during log recovery.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Extend the rmap update (RUI) log items with a new realtime flag that
indicates that the updates apply against the realtime rmapbt. We'll
wire up the actual rmap code later.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Prepare the high-level rmap functions to deal with the new realtime
rmapbt and its slightly different conventions. Provide the ability
to talk to either rmapbt or rtrmapbt formats from the same high
level code.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Implement the generic btree operations needed to manipulate rtrmap
btree blocks. This is different from the regular rmapbt in that we
allocate space from the filesystem at large, and are constrained
neither to the free space nor to any particular AG.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Make sure that there's enough log reservation to handle mapping
and unmapping realtime extents. We have to reserve enough space
to handle a split in the rtrmapbt to add the record and a second
split in the regular rmapbt to record the rtrmapbt split.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Start filling out the rtrmap btree implementation. Start with the
on-disk btree format; add everything needed to read, write and
manipulate rmap btree blocks. This prepares the way for connecting the
btree operations implementation.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Add new realtime rmap btree definitions. The realtime rmap btree will
be rooted from a hidden inode, but has its own shape and therefore
needs to have most of its own separate types.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Change the startblock and blockcount fields of xfs_rmap_irec to be 64
bits wide. This enables us to use the same high level rmap code for
either tree. We'll also collect all the resulting breakage fixes here.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
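
To illustrate why field width matters here, the sketch below assumes (the
message does not spell this out) that one of the two trees may need block
numbers or lengths that do not fit in 32 bits. The record type
toy_rmap_irec and the helper are hypothetical and only mirror the two field
names mentioned above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical incore record with the two widened fields. */
struct toy_rmap_irec {
	uint64_t	startblock;
	uint64_t	blockcount;
};

/* True if the value would survive being stored in a 32-bit field. */
static bool toy_fits_in_u32(uint64_t v)
{
	return (uint64_t)(uint32_t)v == v;
}
```

A start block of 2^32 fails this check but is stored without loss in the
64-bit field.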
|
|
Make it so that we can actually store btree records in the inode
core (i.e. enable bb_level == 0) so that the rtrmapbt can do this.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
For btrees that are rooted in the inode core, we have to have a
function to resize the root. This is fairly specific to each
btree type, so make xfs_iroot_realloc a per-btree function.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Enable the metadata inode directory feature.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Ideally, we'd put all the metadata inodes in one place if we could, so
that the metadata all stay reasonably close together instead of
spreading out over the disk. Furthermore, if the log is internal we'd
probably prefer to keep the metadata near the log. Therefore, disable
AGI rotoring for metadata inode allocations.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Since xfs_imeta_create can create new metadata files arbitrarily deep in
the metadata directory tree, we must supply a function that can ensure
that all directories in a path exist, and call it before the quota
functions create the quota inodes.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
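
A rough, in-memory sketch of what "ensure that all directories in a path
exist" can look like. Nothing here reflects the real imeta code: toy_tree is
a flat table of directory paths, the function names are hypothetical, and
paths are assumed to be relative with single '/' separators.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_DIRS	16
#define TOY_MAX_PATH	64

/* Toy directory tree: just a table of the directory paths that exist. */
struct toy_tree {
	char	dirs[TOY_MAX_DIRS][TOY_MAX_PATH];
	size_t	nr_dirs;
};

static bool toy_dir_exists(const struct toy_tree *t, const char *path)
{
	for (size_t i = 0; i < t->nr_dirs; i++)
		if (strcmp(t->dirs[i], path) == 0)
			return true;
	return false;
}

/*
 * Create every missing parent directory of path, shallowest first.
 * The final component (the file itself) is not created.
 * Returns 0 on success, -1 if the path or the table is too large.
 */
static int toy_ensure_dirpath(struct toy_tree *t, const char *path)
{
	char	prefix[TOY_MAX_PATH];
	size_t	len = strlen(path);

	if (len >= TOY_MAX_PATH)
		return -1;
	for (size_t i = 1; i < len; i++) {
		if (path[i] != '/')
			continue;
		memcpy(prefix, path, i);	/* path up to this separator */
		prefix[i] = '\0';
		if (toy_dir_exists(t, prefix))
			continue;
		if (t->nr_dirs == TOY_MAX_DIRS)
			return -1;
		strcpy(t->dirs[t->nr_dirs++], prefix);
	}
	return 0;
}
```

Calling the helper twice with the same path is harmless in this model,
because existing directories are skipped.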
|
|
Plumb in the bits we need to look up metadata inode numbers from the
metadata inode directory and save them back.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Add checks for the metadata inode flag so that we don't ever leak
metadata inodes out to userspace, and we don't ever try to read a
regular inode as metadata.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
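
The two-way check described above can be pictured with a small toy model.
The flag bit, the structure, and both lookup helpers are hypothetical. They
only show that a lookup on behalf of userspace rejects flagged inodes, while
a lookup on behalf of metadata code rejects unflagged ones.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_IFLAG_METADATA	(1u << 0)	/* hypothetical flag bit */

/* Toy inode carrying only a flags word. */
struct toy_flag_inode {
	uint32_t	flags;
};

static bool toy_is_metadata(const struct toy_flag_inode *ip)
{
	return (ip->flags & TOY_IFLAG_METADATA) != 0;
}

/* Lookup on behalf of userspace: never hand out a metadata inode. */
static int toy_lookup_for_user(const struct toy_flag_inode *ip)
{
	return toy_is_metadata(ip) ? -1 : 0;
}

/* Lookup on behalf of metadata code: never accept a regular inode. */
static int toy_lookup_for_meta(const struct toy_flag_inode *ip)
{
	return toy_is_metadata(ip) ? 0 : -1;
}
```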
|
|
Convert the magic metadata inode lookup keys to use actual strings
for paths.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Load the metadata directory inode into memory at mount time and release
it at unmount time. We also make sure that the obsolete inode pointers
in the superblock are not logged or read from the superblock.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Define the on-disk layout and feature flags for the metadata inode
directory feature.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Create an xfs_iget_meta function for metadata inodes to ensure that we
always check that the inobt thinks a metadata inode is in use.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Convert all open-coded sb metadata inode pointer logging to use
xfs_imeta_log.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Refactor the group and project quota inode pointer switcheroo that
happens only on v4 filesystems into a separate function prior to
enhancing the xfs_qm_qino_alloc function.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Create transaction reservation types and block reservation helpers to
help us calculate transaction requirements.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Create some helper routines to get and set metadata inode numbers
instead of open-coding them throughout xfs.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Get rid of the largely pointless xfs_cross_rename now that we've
refactored its parent.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Create a new libxfs function to rename two directory entries. The
upcoming metadata directory feature will need this to replace a metadata
inode directory entry.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Create a new libxfs function to exchange two directory entries.
The upcoming metadata directory feature will need this to replace a
metadata inode directory entry.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Create a new libxfs function to remove a (name, inode) entry from a
directory. The upcoming metadata directory feature will need this to
create a metadata directory tree.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Create a libxfs helper function that marks an inode free on disk.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Create a new libxfs function to link an existing inode into a directory.
The upcoming metadata directory feature will need this to create a
metadata directory tree.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Create a new libxfs function to link a newly created inode into a
directory. The upcoming metadata directory feature will need this to
create a metadata directory tree.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|