Age | Commit message (Collapse) | Author |
|
In an upcoming patch, we will need to be able to look for xfs_buf
objects caching file-based metadata blocks without needing to walk the
(possibly corrupt) structures to find all the buffers. Repair already
has most of the code needed to scan the buffer cache, so hoist these
utility functions.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Teach the online repair code how to create temporary files or
directories. These temporary files can be used to stage reconstructed
information until we're ready to perform an atomic extent swap to commit
the new metadata.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
We're about to start adding functionality that uses internal inodes that
are private to XFS. What this means is that userspace should never be
able to access any information about these files, and should not be able
to open these files by handle. Callers are not allowed to link these
files into the directory tree, which should suffice to make these
private inodes actually private.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Add the atomic swapext feature to the set of features that we will
permit.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
The VFS exchange-range alignment checks use (fast) bitmasks to perform
block alignment checks on the exchange parameters. Unfortunately,
bitmasks require that the alignment size be a power of two. This isn't
true for realtime devices, so we have to copy-pasta the VFS checks using
long division for this to work properly.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Now that bmap items support the realtime device, we can add the
necessary pieces to the atomic extent swapping code to support such
things.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
The previous commit added a new swapext flag that enables us to perform
post-swap processing on file2 once we're done swapping the extent maps.
Now add this ability for symlinks.
This isn't used anywhere right now, but we need to have the basic ondisk
flags in place so that a future online symlink repair feature can
salvage the remote target in a temporary link and swap the data forks
when ready. If one file is in extents format and the other is inline,
we will have to promote both to extents format to perform the swap.
After the swap, we can try to condense the fixed symlink down to inline
format if possible.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
The previous commit added a new swapext flag that enables us to perform
post-swap processing on file2 once we're done swapping the extent maps.
Now add this ability for directories.
This isn't used anywhere right now, but we need to have the basic ondisk
flags in place so that a future online directory repair feature can
create salvaged dirents in a temporary directory and swap the data forks
when ready. If one file is in extents format and the other is inline,
we will have to promote both to extents format to perform the swap.
After the swap, we can try to condense the fixed directory down to
inline format if possible.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Add a new swapext flag that enables us to perform post-swap processing
on file2 once we're done swapping the extent maps. If we were swapping
the extended attributes, we want to be able to convert file2's attr fork
from block to inline format.
This isn't used anywhere right now, but we need to have the basic ondisk
flags in place so that a future online xattr repair feature can create
salvaged attrs in a temporary file and swap the attr forks when ready.
If one file is in extents format and the other is inline, we will have to
promote both to extents format to perform the swap. After the swap, we
can try to condense the fixed file's attr fork back down to inline
format if possible.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Migrate the old XFS_IOC_SWAPEXT implementation to use our shiny new one.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
If userspace permits non-atomic swap operations, use the older code
paths to implement the same functionality.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Port the old extent fork swapping function to take a xfs_swapext_req as
input, which aligns it with the new fiexchange interface.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Now that we've moved the old swapext code to use the new log-assisted
extent swap code for rmap filesystems, let's start porting the old
implementation to the new ioctl interface so that later we can port the
old interface to the new interface.
Consolidate the reflink flag swap code and the the bmbt owner change
scan code in xfs_swap_extent_forks, since both interfaces are going to
need that.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
The inner loop of xfs_swap_extents_rmap does the same work as
xfs_swapext_finish_one, so adapt it to use that. Doing so has the side
benefit that the older code path no longer wastes its time remapping
shared extents.
This forms the basis of the non-atomic swaprange implementation.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Add an errortag so that we can test recovery of swapext log items.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Add a function to handle file range exchange requests from the vfs.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Plumb the necessary bits into the xlog code so that higher level callers
can enable the atomic extent swapping feature and have it clear
automatically when possible.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Now that we've created the skeleton of a log intent item to track and
restart extent swap operations, add the upper level logic to commit
intent items and turn them into concrete work recorded in the log. We
use the deferred item "multihop" feature that was introduced a few
patches ago to constrain the number of active swap operations to one per
thread.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Introduce a new intent log item to handle swapping extents.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Create a log incompat flag so that we only attempt to process swap
extent log items if the filesystem supports it, and a geometry flag to
advertise support if it's present.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
We're about to define a new XFS_SB_FEAT_INCOMPAT_LOG_ bit, which means
that callers will soon require the ability to toggle on and off
different log incompat feature bits. Parameterize the
xlog_{use,drop}_incompat_feat and xfs_sb_remove_incompat_log_features
functions so that callers can specify which feature they're trying to
use and so that we can clear individual log incompat bits as needed.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Create a helper function that can compute if a 64-bit number is an
integer multiple of a 32-bit number, where the 32-bit number is not
required to be an even power of two. This is needed for some new code
for the realtime device, where we can set 37k allocation units and then
have to remap them.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Create a new helper function to calculate the fundamental allocation
unit (i.e. the smallest unit of space we can allocate) of a file.
Things are going to get hairy with range-exchange on the realtime
device, so prepare for this now.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Introduce a new ioctl to handle swapping ranges of bytes between files.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Move xfs_symlink_write_target to xfs_symlink_remote.c so that kernel and
mkfs can share the same function.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Move xfs_readlink_bmap_ilocked to xfs_symlink_remote.c so that the
swapext code can use it to convert a remote format symlink back to
shortform format after a metadata repair. While we're at it, fix a
broken printf prefix.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Move declarations for libxfs symlink functions into a separate header
file like we do for most everything else.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
The deferred bmap work state and the log item can transmit unwritten
state, so the XFS_BMAP_MAP handler must map in extents with that
unwritten state.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
The deferred bmap update log item has always supported the attr fork, so
plumb this in so that higher layers can access this.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Now that we have reflink on the realtime device, bmap intent items have
to support remapping extents on the realtime volume.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Extend the bmap update (BUI) log items with a new realtime flag that
indicates that the updates apply against a realtime file's data fork.
We'll wire up the actual code later.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Currently, xfs_bmap_del_extent_real contains a bunch of code to convert
the physical extent of a data fork mapping for a realtime file into rt
extents and pass that to the rt extent freeing function. Since the
details of this aren't needed when CONFIG_XFS_REALTIME=n, move it to
xfs_rtbitmap.c to reduce code size when realtime isn't enabled.
This will (one day) enable realtime EFIs to reuse the same
unit-converting call with less code duplication.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
When XFS_BMAPI_REMAP is passed to bunmapi, that means that we want to
remove part of a block mapping without touching the allocator. For
realtime files with rtextsize > 1, that also means that we should skip
all the code that changes a partial remove request into an unwritten
extent conversion. IOWs, bunmapi in this mode should handle removing
the mapping from the rt file and nothing else.
Note that XFS_BMAPI_REMAP callers are required to decrement the
reference count and/or free the space manually.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Remove this single-use helper.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Pass the incore bmap structure to the tracepoints instead of open-coding
the argument passing.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
We're about to start adding support for deferred log intent items for
realtime extents, so split these four types into separate classes so
that we can customize them as the transition happens.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Hook the regular rmap code when an rmapbt repair operation is running so
that we can unlock the AGF buffer to scan the filesystem and keep the
in-memory btree up to date during the scan.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Create an in-memory btree of rmap records instead of an array. This
enables us to do live record collection instead of freezing the fs.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Rebuild the reverse mapping btree from all primary metadata. This first
patch establishes the bare mechanics of finding records and putting
together a new ondisk tree; more complex pieces are needed to make it
work properly.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Create a helper so that we can stop open-coding this decision
everywhere.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Add to our stubbed-out in-memory btrees the ability to connect them with
an actual in-memory backing file (aka xfiles) and the necessary pieces
to track free space in the xfile and flush dirty xfbtree buffers on
demand, which we'll need for online repair.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Adapt the generic btree cursor code to be able to create a btree whose
buffers come from a (presumably in-memory) buftarg with a header block
that's specific to in-memory btrees. We'll connect this to other parts
of online scrub in the next patches.
Note that in-memory btrees always have a block size matching the system
memory page size for efficiency reasons.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Don't waste tracepoint segment memory on per-btree block allocation
tracepoints when we can do it from the generic btree code.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Don't waste tracepoint segment memory on per-btree block freeing
tracepoints when we can do it from the generic btree code.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Allow the buffer cache to target in-memory files by connecting it to
xfiles.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Currently, cached buffers are indexed by per-AG hashtables. This works
great for the data device, but won't work for in-memory btrees. Make it
so that buftargs can index buffers too. Introduce XFS_BSTATE_CACHED as
an explicit state flag for buffers that are cached in an rhashtable,
since we can't rely on b_pag being set for buffers that are cached but
not on behalf of an AG. We'll soon be using the buffer cache for
xfiles.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Add a debug function to dump an xfile's contents for debug purposes.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Use the same summary counter calculation infrastructure to generate new
values for the in-core summary counters. The difference between the
scrubber and the repairer is that the repairer will freeze the fs during
setup, which means that the values should match exactly.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Nobody uses this code anymore, so get rid of it. It was racy with
regards to freezes and remounts anyway.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
If the fscounters scrubber notices incorrect summary counters, it's
entirely possible that scrub is simply racing with other threads that
are updating the incore counters. There isn't a good way to stabilize
percpu counters or set ourselves up to observe live updates with hooks
like we do for the quotacheck or nlinks scanners, so we instead choose
to freeze the filesystem long enough to walk the incore per-AG
structures.
Past me thought that it was going to be commonplace to have to freeze
the filesystem to perform some kind of repair and set up a whole
separate infrastructure to freeze the filesystem in such a way that
userspace could not unfreeze while we were running. This involved
adding a mutex and freeze_super/thaw_super functions and dealing with
the fact that the VFS freeze/thaw functions can free the VFS superblock
references on return.
This was all very overwrought, since fscounters turned out to be the
only user of scrub freezes, and it doesn't require the log to quiesce,
only the incore superblock counters. We prevent other threads from
changing the freeze level by adding a new SB_FREEZE_EXCLUSIVE level.
The end result is that fscounters should be much more efficient. When
we're checking a busy system and we can't stabilize the counters, the
custom freeze will do less work, which should result in less downtime.
Repair should be similarly speedy, but that's in the next patch.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|