Age | Commit message (Collapse) | Author |
|
Enable quotas for the realtime device.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
When growing the realtime bitmap or summary inodes, use
xfs_trans_alloc_inode to reserve quota for the blocks that could be
allocated to the file. Although we never enforce limits against the
root dquot, making a reservation means that the bmap code will update
the quota block count, which is necessary for correct accounting.
Found by running xfs/521.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Make chown's quota adjustments work with realtime files.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Add the necessary alignment checking code to the reflink remap code to
ensure that remap requests are aligned to rt extent boundaries if the
realtime extent size isn't a power of two. The VFS helpers assume that
they can use the usual (blocksize - 1) masking to avoid slow 64-bit
division, but since XFS is special we won't make everyone pay that cost
for our weird edge case.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Make the necessary tweaks to the reflink remapping code to support
remapping on the realtime volume when the rt extent size is larger than
a single rt block. We need to check that the remap arguments from
userspace are aligned to a rt extent boundary, and that the length
is always aligned, even if the kernel tried to round it up to EOF for
us. XFS can only map and remap full rt extents, so we have to be a
little more strict about the alignment there.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
CoW extent size hints are not allowed on filesystems that have large
realtime extents because we only want to perform the minimum required
amount of write-around (aka write amplification) for shared extents.
On filesystems where rtextsize > 1, allocations can only be done in
units of full rt extents, which means that we can only map an entire rt
extent's worth of blocks into the data fork. Hole punch requests become
conversions to unwritten if the request isn't aligned properly.
Because a copy-write fundamentally requires remapping, this means that
we also can only do copy-writes of a full rt extent. This is too
expensive for large hint sizes, since it's all or nothing.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
If we have shared realtime files and the rt extent size is larger than a
single fs block, we need to extend writeback requests to be aligned to
rt extent size granularity because we cannot share partial rt extents.
The front end should have set us up for this by dirtying the relevant
ranges.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
As noted in the previous patch, XFS can only unmap and map full rt
extents. This means that we cannot stop mid-extent for any reason,
including stepping around unwritten/written extents. Second, the
reflink and CoW mechanisms were not designed to handle shared unwritten
extents, so we have to do something to get rid of them.
If the user asks us to remap two files, we must scan both ranges
beforehand to convert any unwritten extents that are not aligned to rt
extent boundaries into zeroed written extents before sharing.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Copy on write encounters a major plot twist when the file being CoW'd
lives on the realtime volume and the realtime extent size is larger than
a single filesystem block. XFS can only unmap and remap full rt
extents, which means that allocations are always done in units of full
rt extents, and a request to unmap less than one extent is treated as a
request to convert an extent to unwritten status.
This behavioral quirk is not compatible with the existing CoW mechanism,
so we have to intercept every path through which files can be modified
to ensure that we dirty an entire rt extent at once so that we can remap
a full rt extent. Use the existing VFS unshare functions to dirty the
page cache to set that up.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
In anticipation of enabling reflink on the realtime volume where the
allocation unit is larger than a page, create an iomap function to dirty
arbitrary parts of a file's page cache so that when we dirty part of a
file that could undergo a COW extent, we can dirty an entire allocation
unit's worth of pages.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Make it so that filesystems can pass an explicit blocksize to the remap
prep function. This enables filesystems whose fundamental allocation
units are /not/ the same as the blocksize to ensure that the remapping
checks are aligned properly.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Enable reflink for realtime devices, sort of.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Port the CoW fork repair to realtime files.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Plumb knowledge of refcount btrees into the inode core repair code.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Port the data device's refcount btree repair code to the realtime
refcount btree.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Walk the realtime refcount btree to find the CoW staging extents when
we're rebuilding the realtime rmap btree.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
When we're rebuilding the data device rmap, if we encounter a "refcount"
format fork, we have to walk the (realtime) refcount btree inode to
build the appropriate mappings.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
When we're rebuilding the realtime bitmap, check the proposed free
extents against the rt refcount btree to make sure we don't commit any
grievous errors.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
If we encounter a directory that has been configured to pass on a CoW
extent size hint to a new realtime file and the hint isn't an integer
multiple of the rt extent size, we should flag the hint for
administrative review and/or turn it off because that is a
misconfiguration.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Update the quota scrubber to allow dquots where the realtime block count
exceeds the block count of the rt volume if reflink is enabled.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
If there's a gap between records in the rt refcount btree, we ought to
cross-reference the gap with the rtrmap records to make sure that there
aren't any overlapping records for a region that doesn't have any shared
ownership.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Allow overlapping realtime reverse mapping records if they both describe
shared data extents and the fs supports reflink on the realtime volume.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Use the realtime refcount btree to implement cross-reference checks in
other data structures.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Add code to scrub realtime refcount btrees.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Whenever we encounter corrupt realtime refcount btree blocks, we should
report that to the health monitoring system for later reporting.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
If we're adding enough space to the realtime section to require the
creation of new realtime groups, create the rt refcount btree inode
before we start adding the space.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
The size of filesystem transaction reservations depends on the maximum
height (maxlevels) of the realtime btrees. Since we don't want a grow
operation to increase the reservation size enough that we'll fail the
minimum log size checks on the next mount, constrain growfs operations
if they would cause an increase in the rt refcount btree maxlevels.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Wire up the copy-on-write extent size hint for realtime files, and
connect it to the rt allocator so that we avoid fragmentation on rt
filesystems.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
The copy-on-write extent size hint is subject to the same alignment
constraints as the regular extent size hint. Since we're in the process
of adding reflink (and therefore CoW) to the realtime device, we must
apply the same scattered rextsize alignment validation strategies to
both hints to deal with the possibility of rextsize changing.
Therefore, fix the inode validator to perform rextsize alignment checks
on regular realtime files, and to remove misaligned directory hints.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Currently, we (ab)use xfs_get_extsz_hint so that it always returns a
nonzero value for realtime files. This apparently was done to disable
delayed allocation for realtime files.
However, once we enable realtime reflink, we can also turn on the
alwayscow flag to force CoW writes to realtime files. In this case, the
logic will incorrectly send the write through the delalloc write path.
Fix this by adjusting the logic slightly.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Scan the realtime refcount tree at mount time to get rid of leftover
CoW staging extents.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Now that we can share blocks between realtime files, allow this
combination.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Update the remapping routines to be able to handle realtime files.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Update our write paths to support copy on write on the rt volume. This
works in more or less the same way as it does on the data device, with
the major exception that we never do delalloc on the rt volume.
Because we consider unwritten CoW fork staging extents to be incore
quota reservation, we update xfs_quota_reserve_blkres to support this
case. Though xfs doesn't allow rt and quota together, the change is
trivial and we shouldn't leave a logic bomb here.
While we're at it, add a missing xfs_mod_delalloc call when we remove
delalloc block reservation from the inode. This is largely irrelvant
since realtime files do not use delalloc, but we want to avoid leaving
logic bombs.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Hoist all quota updates for reflink into a helper function, since things
are about to become more complicated.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Compute the maximum possible height of the realtime rmap btree when
reflink is enabled.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Don't error out on CoW staging extent records when realtime reflink is
enabled.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Create a library routine to allocate and initialize an empty realtime
refcountbt inode. We'll use this for growfs, mkfs, and repair.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Wire up realtime refcount btree cursors wherever they're needed
throughout the code base.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Plumb in the pieces we need to embed the root of the realtime refcount
btree in an inode's data fork, complete with new fork type and
on-disk interpretation functions.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Reserve some free blocks so that we will always have enough free blocks
in the data volume to handle expansion of the realtime refcount btree.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Add a metadir path to select the realtime refcount btree inode and load
it at mount time. The rtrefcountbt inode will have a unique extent format
code, which means that we also have to update the inode validation and
flush routines to look for it.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Identify rt refcount btree blocks in the log correctly so that we can
validate them during log recovery.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Now that we have reflink on the realtime device, refcount intent items
have to support remapping extents on the realtime volume.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Extend the refcount update (CUI) log items with a new realtime flag that
indicates that the updates apply against the realtime refcountbt. We'll
wire up the actual refcount code later.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Prepare the high-level refcount functions to deal with the new realtime
refcountbt and its slightly different conventions. Provide the ability
to talk to either refcountbt or rtrefcountbt formats from the same high
level code.
Note that we leave the _recover_cow_leftovers functions for a separate
patch so that we can convert it all at once.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Implement the generic btree operations needed to manipulate rtrefcount
btree blocks. This is different from the regular refcountbt in that we
allocate space from the filesystem at large, and are neither constrained
to the free space nor any particular AG.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Make sure that there's enough log reservation to handle mapping
and unmapping realtime extents. We have to reserve enough space
to handle a split in the rtrefcountbt to add the record and a second
split in the regular refcountbt to record the rtrefcountbt split.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Start filling out the rtrefcount btree implementation. Start with the
on-disk btree format; add everything needed to read, write and
manipulate refcount btree blocks. This prepares the way for connecting
the btree operations implementation.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Actually namespace these variables properly, so that readers can tell
that this is an XFS symbol, and that it's for the refcount
functionality.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|