bcachefs.git

Age	Commit message	Author
2019-04-15	xfs: implement live quotacheck as part of quota repair [repair-part-two_2019-04-15]	Darrick J. Wong
	Use the fs freezing mechanism we developed for the rmapbt repair to freeze the fs, this time to scan the fs for a live quotacheck. We add a new dqget variant to use the existing scrub transaction to allocate an on-disk dquot block if it is missing. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-04-15	xfs: repair the rmapbt	Darrick J. Wong
	Rebuild the reverse mapping btree from all primary metadata. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-04-15	xfs: introduce online scrub freeze	Darrick J. Wong
	Introduce a new 'online scrub freeze' that we can use to lock out all filesystem modifications and background activity so that we can perform global scans in order to rebuild metadata. This introduces a new IFLAG to the scrub ioctl to indicate that userspace is willing to allow a freeze. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-04-15	xfs: update health status if we get a clean bill of health [deferred-inactivation_2019-04-15]	Darrick J. Wong
	If scrub finds that everything is ok with the filesystem, we need a way to tell the health tracking that it can let go of indirect health flags, since indirect flags only mean that at some point in the past we lost some context. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-04-15	xfs: remember sick inodes that get inactivated	Darrick J. Wong
	If an unhealthy inode gets inactivated, remember this fact in the per-fs health summary. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-04-15	xfs: add secondary and indirect classes to the health tracking system	Darrick J. Wong
	Establish two more classes of health tracking bits: * Indirect problems, which suggest problems in other health domains that we weren't able to preserve. * Secondary problems, which track state that's related to primary evidence of health problems. The first class we'll use in an upcoming patch to record in the AG health status the fact that we ran out of memory and had to inactivate an inode with defective metadata. The second class we use to indicate that repair knows that an inode is bad and we need to fix it later. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
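A minimal C sketch of how primary, secondary, and indirect health bits might interact, covering the "remember sick inodes that get inactivated" and "clean bill of health" commits as well. The flag names, the two structs, and the helpers are hypothetical. The real XFS health code uses its own per-domain masks and locking, and a clean scrub there also updates primary flags. The sketch only shows the indirect bit being cleared.

```c
#include <stdbool.h>

/* Hypothetical health classes (not the real XFS flag names). */
enum {
	SICK_PRIMARY   = 1u << 0, /* direct evidence that this metadata is bad */
	SICK_SECONDARY = 1u << 1, /* e.g. repair knows this inode must be fixed later */
	SICK_INDIRECT  = 1u << 2, /* context was lost; something here was once bad */
};

struct inode_health { unsigned int sick; };
struct ag_health    { unsigned int sick; };

/* Inactivating an unhealthy inode throws away its in-core record, so leave
 * an indirect mark in the AG summary before the inode disappears. */
static void inactivate_inode(struct inode_health *ip, struct ag_health *ag)
{
	if (ip->sick & (SICK_PRIMARY | SICK_SECONDARY))
		ag->sick |= SICK_INDIRECT;
	ip->sick = 0; /* the in-core inode, and its flags, are gone */
}

/* A scrub that finds the whole AG clean lets go of the indirect mark. */
static void scrub_ag_clean(struct ag_health *ag)
{
	ag->sick &= ~SICK_INDIRECT;
}

static bool ag_needs_attention(const struct ag_health *ag)
{
	return ag->sick != 0;
}
```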
2019-04-15	xfs: parallelize inode inactivation	Darrick J. Wong
	Split the inode inactivation work into per-AG work items so that we can take advantage of parallelization. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-04-15	xfs: force inactivation before fallocate when space is low	Darrick J. Wong
	If we think that inactivation will free enough blocks to make it easier to satisfy an fallocate request, force inactivation. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-04-15	xfs: retry fs writes when there isn't space	Darrick J. Wong
	Any time we try a file write that fails due to ENOSPC or EDQUOT, force inactivation work to free up some resources and try one more time. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
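The retry policy can be sketched in C as a single retry after a forced flush. The names write_with_retry, fake_fs, fake_write, and fake_flush are hypothetical stand-ins for the XFS write path and the forced inactivation scan. The model simply assumes that flushing releases every block held by queued inodes.

```c
#include <errno.h>
#include <stddef.h>

typedef int (*write_fn)(void *ctx, size_t len);
typedef void (*flush_fn)(void *ctx);

/* Try a write; on ENOSPC or EDQUOT, force deferred inactivation to release
 * space and try exactly one more time. Returns 0 or a negative errno. */
static int write_with_retry(void *ctx, size_t len, write_fn do_write,
			    flush_fn flush_inactive)
{
	int error = do_write(ctx, len);

	if (error == -ENOSPC || error == -EDQUOT) {
		flush_inactive(ctx);
		error = do_write(ctx, len);
	}
	return error;
}

/* Toy filesystem model, for illustration only. */
struct fake_fs {
	size_t free_blocks;
	size_t pending_inactive; /* blocks held by inodes awaiting inactivation */
	int flushes;
};

static int fake_write(void *ctx, size_t len)
{
	struct fake_fs *fs = ctx;

	if (len > fs->free_blocks)
		return -ENOSPC;
	fs->free_blocks -= len;
	return 0;
}

static void fake_flush(void *ctx)
{
	struct fake_fs *fs = ctx;

	fs->free_blocks += fs->pending_inactive;
	fs->pending_inactive = 0;
	fs->flushes++;
}
```

Retrying only once keeps a genuinely full filesystem from looping. If the flush frees nothing useful, the second ENOSPC is returned to the caller.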
2019-04-15	xfs: deferred inode inactivation	Darrick J. Wong
	Instead of calling xfs_inactive directly from xfs_fs_destroy_inode, defer the inactivation phase to a separate workqueue. With this we avoid blocking memory reclaim on filesystem metadata updates that are necessary to free an in-core inode, such as post-eof block freeing, COW staging extent freeing, and truncating and freeing unlinked inodes. Now that work is deferred to a workqueue where we can do the freeing in batches. We introduce two new inode flags -- NEEDS_INACTIVE and INACTIVATING. The first flag helps our worker find inodes needing inactivation, and the second flag marks inodes that are in the process of being inactivated. A concurrent xfs_iget on the inode can still resurrect the inode by clearing NEEDS_INACTIVE (or bailing if INACTIVATING is set). Unfortunately, deferring the inactivation has one huge downside -- eventual consistency. Since all the freeing is deferred to a worker thread, one can rm a file but the space doesn't come back immediately. This can cause some odd side effects with quota accounting and statfs, so we also force inactivation scans in order to maintain the existing behaviors, at least outwardly. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
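Below is a rough C model of the NEEDS_INACTIVE / INACTIVATING handshake between destroy, iget, and the worker. The flag names, struct, and functions are hypothetical. The sketch also assumes the worker clears NEEDS_INACTIVE when it sets INACTIVATING. Locking is omitted; real code would flip these bits under the inode's flags lock.

```c
#include <stdbool.h>

enum {
	I_NEEDS_INACTIVE = 1u << 0, /* queued; the worker will inactivate it */
	I_INACTIVATING   = 1u << 1, /* worker has started; too late to reuse */
};

struct inode { unsigned int flags; };

/* Where destroy_inode used to call xfs_inactive directly: just tag the
 * inode and leave the slow metadata updates to the background worker. */
static void destroy_inode_deferred(struct inode *ip)
{
	ip->flags |= I_NEEDS_INACTIVE;
}

/* Lookup path: a queued inode can be resurrected by clearing the flag,
 * but one already being inactivated cannot (caller must back off). */
static bool iget_try_resurrect(struct inode *ip)
{
	if (ip->flags & I_INACTIVATING)
		return false;
	ip->flags &= ~I_NEEDS_INACTIVE;
	return true;
}

/* Worker: claim the inode only if nobody resurrected it in the meantime. */
static bool worker_claim(struct inode *ip)
{
	if (!(ip->flags & I_NEEDS_INACTIVE))
		return false;
	ip->flags &= ~I_NEEDS_INACTIVE;
	ip->flags |= I_INACTIVATING;
	return true;
}
```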
2019-04-15	xfs: refactor eofblocks inode match code	Darrick J. Wong
	Refactor the code that determines if an inode matches an eofblocks structure into a helper, since we already use it twice and we're about to use it a third time. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-04-15	xfs: refactor walking of per-AG RECLAIM inodes	Darrick J. Wong
	Refactor the code that walks reclaim-tagged inodes so that we can reuse the same loop in a subsequent patch. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-04-15	xfs: track unlinked inactive inode quota counters	Darrick J. Wong
	Set up quota counters to track the number of inodes and blocks that will be freed from inactivating unlinked inodes. We'll use this in the deferred inactivation patch to hide the effects of deferred processing. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-04-15	xfs: track unlinked inactive inode fs summary counters	Darrick J. Wong
	Set up counters to track the number of inodes and blocks that will be freed from inactivating unlinked inodes. We'll use this in the deferred inactivation patch to hide the effects of deferred processing. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
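One way such pending counters could hide deferred processing is sketched below in C. The sketch assumes that statfs reports blocks held by queued unlinked inodes as already free. The struct and function names are hypothetical, and the quota counters from the previous commit would follow the same pattern per dquot.

```c
#include <stdint.h>

struct fs_counters {
	uint64_t free_blocks;    /* actually free right now */
	uint64_t free_inodes;
	uint64_t pending_blocks; /* held by unlinked inodes awaiting inactivation */
	uint64_t pending_inodes;
};

/* Unlinked inode queued: its space is not free yet, but remember it. */
static void queue_unlinked(struct fs_counters *c, uint64_t nblocks)
{
	c->pending_blocks += nblocks;
	c->pending_inodes += 1;
}

/* Worker finished inactivating: move the space from pending to free. */
static void finish_inactivation(struct fs_counters *c, uint64_t nblocks)
{
	c->pending_blocks -= nblocks;
	c->pending_inodes -= 1;
	c->free_blocks += nblocks;
	c->free_inodes += 1;
}

/* statfs view: pending space counts as free, so rm appears immediate. */
static uint64_t statfs_free_blocks(const struct fs_counters *c)
{
	return c->free_blocks + c->pending_blocks;
}
```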
2019-04-15	xfs: decide if inode needs inactivation	Darrick J. Wong
	Add a predicate function to decide if an inode needs (deferred) inactivation. Any file that has been unlinked or has speculative preallocations either for post-EOF writes or for CoW qualifies. This function will also be used by the upcoming deferred inactivation patch. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
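The predicate's three conditions are expressed below in C over a simplified, hypothetical inode view. The field names are assumptions, not XFS fields. Real code would look at the data fork extent list and the CoW fork, not summary numbers.

```c
#include <stdbool.h>
#include <stdint.h>

struct inode_state {
	uint32_t nlink;      /* 0 once the last link is removed */
	uint64_t eof_block;  /* first block offset at or beyond EOF */
	uint64_t extent_end; /* block offset just past the last mapped extent */
	uint64_t cow_blocks; /* blocks reserved in the CoW fork */
};

/* True if tearing down the in-core inode needs metadata updates, i.e. the
 * inode qualifies for (deferred) inactivation. */
static bool inode_needs_inactive(const struct inode_state *ip)
{
	if (ip->nlink == 0)
		return true; /* unlinked: the inode and its blocks must be freed */
	if (ip->extent_end > ip->eof_block)
		return true; /* speculative post-EOF preallocation to trim */
	return ip->cow_blocks > 0; /* leftover CoW staging reservations */
}
```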
2019-04-15	xfs: refactor the predicate part of xfs_free_eofblocks	Darrick J. Wong
	Refactor the part of _free_eofblocks that decides if it's really going to truncate post-EOF blocks into a separate helper function. The upcoming deferred inode inactivation patch requires us to be able to decide this prior to actual inactivation. No functionality changes. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-04-15	xfs: convert big array and blob array to use memfd backend [repair-part-one_2019-04-15]	Darrick J. Wong
	There are several problems with the initial implementations of the big array and the blob array data structures. First, using linked lists imposes a two-pointer overhead on every record stored. For blobs this isn't serious, but for fixed-size records this increases memory requirements by 40-60%. Second, we're using kernel memory to store the intermediate records. Kernel memory cannot be paged out, which means we run the risk of OOMing the machine when we run out of physical memory. Therefore, replace the linked lists in both structures with memfd files. Random access becomes much easier, memory overhead drops to a negligible amount, and because memfd pages can be swapped, we have considerably more flexibility for memory use. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
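The memfd idea can be illustrated in userspace, which is not how the kernel patch does it. In this sketch, record idx lives at byte offset idx * recsize in an anonymous memfd file, so there is no per-record pointer overhead and the pages are swappable. The struct and function names are hypothetical; memfd_create(2) needs Linux and glibc 2.27 or later.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>  /* memfd_create */
#include <sys/types.h>
#include <unistd.h>

struct memfd_array {
	int fd;
	size_t recsize;
};

static int marray_init(struct memfd_array *a, size_t recsize)
{
	a->fd = memfd_create("marray", 0);
	a->recsize = recsize;
	return a->fd < 0 ? -1 : 0;
}

/* Writing past EOF grows the file; skipped slots read back as zeroes. */
static int marray_put(struct memfd_array *a, size_t idx, const void *rec)
{
	ssize_t n = pwrite(a->fd, rec, a->recsize, (off_t)(idx * a->recsize));
	return n == (ssize_t)a->recsize ? 0 : -1;
}

/* Fails (short read) for indices beyond the last record ever written. */
static int marray_get(struct memfd_array *a, size_t idx, void *rec)
{
	ssize_t n = pread(a->fd, rec, a->recsize, (off_t)(idx * a->recsize));
	return n == (ssize_t)a->recsize ? 0 : -1;
}

static void marray_destroy(struct memfd_array *a)
{
	close(a->fd);
}
```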
2019-04-15	xfs: repair quotas	Darrick J. Wong
	Fix anything that causes the quota verifiers to fail. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-04-15	xfs: scrub should set preen if attr leaf has holes	Darrick J. Wong
	If an attr block indicates that it could use compaction, set the preen flag to have the attr fork rebuilt, since the attr fork rebuilder can take care of that for us. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2019-04-15	xfs: repair extended attributes	Darrick J. Wong
	If the extended attributes look bad, try to sift through the rubble to find whatever keys/values we can, zap the attr tree, and re-add the values. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-04-15	xfs: create a new inode fork block unmap helper	Darrick J. Wong
	Create a new helper to unmap blocks from an inode's fork. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-04-15	xfs: remove unnecessary inode-transaction roll	Darrick J. Wong
	Remove the transaction roll at the end of the loop in xfs_itruncate_extents_flags. xfs_defer_finish takes care of rolling the transaction as needed and reattaching the inode, which means we already start each loop with a clean transaction. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-04-15	xfs: convert xfs_itruncate_extents_flags to use __xfs_bunmapi	Darrick J. Wong
	There's no reason why we can't consume unmap_len, so just use the raw version. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-04-15	xfs: create a blob array data structure	Darrick J. Wong
	Create a simple 'blob array' data structure for storage of arbitrarily sized metadata objects that will be used to reconstruct metadata. For the intended usage (temporarily storing extended attribute names and values) we only have to support storing objects and retrieving them. This initial implementation uses linked lists to store the blobs, but a subsequent patch will restructure the backend to avoid using high order pinned kernel memory. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
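A store-and-retrieve blob array of this kind is sketched below in C. Each blob is appended to one growable buffer behind a 32-bit length header, and the caller keeps the returned byte offset as a cookie. This contiguous buffer replaces the commit's linked lists for brevity, and all names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct blob_array {
	unsigned char *buf;
	size_t used, cap;
};

/* Append a blob; returns its cookie, or SIZE_MAX on allocation failure. */
static size_t blob_store(struct blob_array *ba, const void *data, uint32_t len)
{
	size_t need = ba->used + sizeof(len) + len;

	if (need > ba->cap) {
		size_t ncap = ba->cap ? ba->cap : 64;
		while (ncap < need)
			ncap *= 2;
		unsigned char *nbuf = realloc(ba->buf, ncap);
		if (!nbuf)
			return SIZE_MAX;
		ba->buf = nbuf;
		ba->cap = ncap;
	}
	size_t cookie = ba->used;
	memcpy(ba->buf + cookie, &len, sizeof(len));          /* length header */
	memcpy(ba->buf + cookie + sizeof(len), data, len);    /* payload */
	ba->used = need;
	return cookie;
}

/* Copy a blob into dst; returns its length, or -1 if dst is too small. */
static int64_t blob_load(const struct blob_array *ba, size_t cookie,
			 void *dst, size_t dstlen)
{
	uint32_t len;

	memcpy(&len, ba->buf + cookie, sizeof(len));
	if (len > dstlen)
		return -1;
	memcpy(dst, ba->buf + cookie + sizeof(len), len);
	return len;
}

static void blob_destroy(struct blob_array *ba)
{
	free(ba->buf);
	ba->buf = NULL;
	ba->used = ba->cap = 0;
}
```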
2019-04-15	xfs: repair damaged symlinks	Darrick J. Wong
	Repair inconsistent symbolic link data. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-04-15	xfs: repair inode block maps	Darrick J. Wong
	Use the reverse-mapping btree information to rebuild an inode fork. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-04-15	xfs: zap broken inode forks	Darrick J. Wong
	Determine if inode fork damage is responsible for the inode being unable to pass the ifork verifiers in xfs_iget and zap the fork contents if this is true. Once this is done the fork will be empty but we'll be able to construct an in-core inode, and a subsequent call to the inode fork repair ioctl will search the rmapbt to rebuild the records that were in the fork. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-04-15	xfs: repair inode records	Darrick J. Wong
	Try to reinitialize corrupt inodes, or clear the reflink flag if it's not needed. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-04-15	xfs: repair refcount btrees	Darrick J. Wong
	Reconstruct the refcount data from the rmap btree. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
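The rmap-to-refcount reconstruction can be illustrated with a sweep over rmap start and end events. Any block range covered by two or more rmap records becomes a shared extent with that owner count. This is a hypothetical sketch: it ignores owner types and CoW staging extents, and its structs are not the on-disk formats.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct rmap  { uint32_t start, len; };           /* AG block range with an owner */
struct refc  { uint32_t start, len, refcount; }; /* shared range, owner count */
struct event { uint32_t pos; int delta; };

static int cmp_event(const void *a, const void *b)
{
	const struct event *x = a, *y = b;
	return (x->pos > y->pos) - (x->pos < y->pos);
}

/* Emit ranges covered by >= 2 rmaps; out needs room for 2*n records.
 * Returns the number of records written, or -1 on allocation failure. */
static int build_refcounts(const struct rmap *r, size_t n, struct refc *out)
{
	struct event *ev = malloc(2 * n * sizeof(*ev));
	if (!ev && n)
		return -1;
	for (size_t i = 0; i < n; i++) {
		ev[2 * i] = (struct event){ r[i].start, +1 };
		ev[2 * i + 1] = (struct event){ r[i].start + r[i].len, -1 };
	}
	qsort(ev, 2 * n, sizeof(*ev), cmp_event);

	size_t nout = 0, i = 0;
	long count = 0;
	while (i < 2 * n) {
		uint32_t pos = ev[i].pos;
		while (i < 2 * n && ev[i].pos == pos)
			count += ev[i++].delta; /* apply every start/end at this block */
		if (i == 2 * n || count < 2)
			continue;               /* past the last extent, or not shared */
		uint32_t end = ev[i].pos;
		struct refc *prev = nout ? &out[nout - 1] : NULL;
		if (prev && prev->refcount == (uint32_t)count &&
		    prev->start + prev->len == pos)
			prev->len += end - pos; /* same count, adjacent: extend */
		else
			out[nout++] = (struct refc){ pos, end - pos, (uint32_t)count };
	}
	free(ev);
	return (int)nout;
}
```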
2019-04-15	xfs: repair inode btrees	Darrick J. Wong
	Use the rmapbt to find inode chunks, query the chunks to compute hole and free masks, and with that information rebuild the inobt and finobt. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
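Computing the hole and free masks for one chunk might look like the C sketch below. It assumes the XFS sparse-inode layout: 64 inodes per chunk and a 16-bit holemask with one bit per group of 4 inodes. It also assumes that holes are reported as free. The present bitmap (inodes found on disk via the rmapbt) and inuse bitmap are hypothetical inputs.

```c
#include <stdint.h>

#define INODES_PER_CHUNK        64
#define INODES_PER_HOLEMASK_BIT 4

/* Set a holemask bit for every 4-inode group with no inodes on disk. */
static uint16_t chunk_holemask(uint64_t present)
{
	uint16_t holemask = 0;

	for (int bit = 0; bit < INODES_PER_CHUNK / INODES_PER_HOLEMASK_BIT; bit++) {
		uint64_t group = (present >> (bit * INODES_PER_HOLEMASK_BIT)) & 0xF;
		if (group == 0)
			holemask |= (uint16_t)(1u << bit);
	}
	return holemask;
}

/* Free mask: every inode not in use (inodes inside holes can't be in use). */
static uint64_t chunk_freemask(uint64_t present, uint64_t inuse)
{
	return ~(inuse & present);
}
```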
2019-04-15	xfs: separate inode geometry	Darrick J. Wong
	Separate the inode geometry information into a distinct structure. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-04-15	xfs: repair free space btrees	Darrick J. Wong
	Rebuild the free space btrees from the gaps in the rmap btree. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
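Deriving free space from rmap gaps reduces to a single pass over records sorted by start block. Overlapping (shared) records are handled by tracking the furthest end seen so far. The sketch is hypothetical: it ignores the blocks occupied by the old and new free space btrees and the AGFL, which real repair must account for.

```c
#include <stddef.h>
#include <stdint.h>

struct extent { uint32_t start, len; };

/* rmaps sorted by start; may overlap. out needs room for n + 1 extents.
 * Returns the number of free extents found in [0, ag_blocks). */
static size_t free_from_rmap_gaps(const struct extent *rmaps, size_t n,
				  uint32_t ag_blocks, struct extent *out)
{
	size_t nfree = 0;
	uint32_t next = 0; /* first block not yet known to be in use */

	for (size_t i = 0; i < n; i++) {
		if (rmaps[i].start > next)
			out[nfree++] = (struct extent){ next, rmaps[i].start - next };
		uint32_t end = rmaps[i].start + rmaps[i].len;
		if (end > next)
			next = end;
	}
	if (ag_blocks > next)
		out[nfree++] = (struct extent){ next, ag_blocks - next };
	return nfree;
}
```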
2019-04-15	xfs: create a big array data structure	Darrick J. Wong
	Create a simple 'big array' data structure for storage of fixed-size metadata records that will be used to reconstruct a btree index. For repair operations, the most important operations are append, iterate, and sort; while supported, get and put are not for frequent use. For the initial implementation we will use linked-list containers, though a subsequent patch will restructure the backend to avoid using pinned kernel memory. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
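The emphasis on append, iterate, and sort suggests an interface like the C sketch below. It is backed by a plain realloc'd buffer rather than the linked lists (or later memfd) of the actual patches. The names, including the example record type and comparator, are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

struct big_array {
	char *data;
	size_t recsize, nr, cap;
};

static int barray_append(struct big_array *a, const void *rec)
{
	if (a->nr == a->cap) {
		size_t ncap = a->cap ? a->cap * 2 : 16;
		char *nd = realloc(a->data, ncap * a->recsize);
		if (!nd)
			return -1;
		a->data = nd;
		a->cap = ncap;
	}
	memcpy(a->data + a->nr * a->recsize, rec, a->recsize);
	a->nr++;
	return 0;
}

/* Random access; NULL past the end. Iteration is a loop over 0..nr-1. */
static void *barray_at(struct big_array *a, size_t idx)
{
	return idx < a->nr ? a->data + idx * a->recsize : NULL;
}

static void barray_sort(struct big_array *a, int (*cmp)(const void *, const void *))
{
	qsort(a->data, a->nr, a->recsize, cmp);
}

/* Example record: a rebuilt btree record keyed by start block. */
struct rec { unsigned int startblock, len; };

static int rec_cmp(const void *a, const void *b)
{
	const struct rec *x = a, *y = b;
	return (x->startblock > y->startblock) - (x->startblock < y->startblock);
}
```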
2019-04-15	xfs: add a repair revalidation function pointer	Darrick J. Wong
	Allow repair functions to set a separate function pointer to validate the metadata that they've rebuilt. This prevents us from exiting from a repair function that rebuilds both A and B without checking that both A and B can pass a scrub test. We'll need this for the free space and inode btree repair strategies. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-04-15	xfs: extend the range of flush_unmap ranges [stale-exposure_2019-04-15]	Darrick J. Wong
	If we have to initiate writeback of a range that starts beyond the on-disk EOF, extend the flushed range to start at the on-disk EOF so that there's no chance that we put real extents in the data fork having not actually flushed the data. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-04-15	xfs: force writes to delalloc regions to unwritten	Darrick J. Wong
	When writing to a delalloc region in the data fork, commit the new allocations (of the da reservation) as unwritten so that the mappings are only marked written once writeback completes successfully. This fixes the problem of stale data exposure if the system goes down during targeted writeback of a specific region of a file, as tested by generic/042. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-04-15	xfs: scrub should only cross-reference with healthy btrees [scrub-health-tracking_2019-04-15]	Darrick J. Wong
	Skip cross-referencing with a btree if the health report tells us that it's known to be bad. This should reduce the dmesg spew considerably. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-04-15	xfs: scrub/repair should update filesystem metadata health	Darrick J. Wong
	Now that we have the ability to track sick metadata in-core, make scrub and repair update those health assessments after doing work. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-04-15	xfs: hoist the already_fixed variable to the scrub context	Darrick J. Wong
	Now that we no longer memset the scrub context, we can move the already_fixed variable into the scrub context's state flags instead of passing around pointers to separate stack variables. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-04-15	xfs: collapse scrub bool state flags into a single unsigned int	Darrick J. Wong
	Combine all the boolean state flags in struct xfs_scrub into a single unsigned int, because we're going to be adding more state flags soon. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
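Collapsing several bools into one flags word usually takes the shape shown below. The flag names and the struct are hypothetical illustrations, not the identifiers used in struct xfs_scrub. The already_fixed state from the following commit is included as one more bit.

```c
#include <stdbool.h>

/* Hypothetical flags, each replacing what used to be a separate bool. */
#define SC_TRY_HARDER       (1u << 0)
#define SC_HAS_QUOTAOFFLOCK (1u << 1)
#define SC_ALREADY_FIXED    (1u << 2)

struct scrub_ctx {
	unsigned int flags; /* one word instead of a bool per state */
};

static inline void sc_set(struct scrub_ctx *sc, unsigned int f)   { sc->flags |= f; }
static inline void sc_clear(struct scrub_ctx *sc, unsigned int f) { sc->flags &= ~f; }
static inline bool sc_test(const struct scrub_ctx *sc, unsigned int f)
{
	return (sc->flags & f) != 0;
}
```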
2019-04-15	xfs: refactor scrub context initialization	Darrick J. Wong
	It's a little silly how the memset in scrub context initialization forces us to declare stack variables to preserve context variables across a retry. Since the teardown functions already null out most of the ephemeral state (buffer pointers, btree cursors, etc.), just skip the memset and move the initialization as needed. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-04-15	ext4: don't allow any modifications to an immutable file [immutable-files_2019-04-15]	Darrick J. Wong
	Don't allow any modifications to a file that's marked immutable, which means that we have to flush all the writable pages to make them read-only and we have to check the setattr/setflags parameters more closely. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-04-15	xfs: don't allow most setxattr to immutable files	Darrick J. Wong
	The chattr manpage has this to say about immutable files: "A file with the 'i' attribute cannot be modified: it cannot be deleted or renamed, no link can be created to this file, most of the file's metadata can not be modified, and the file can not be opened in write mode." However, we don't actually check the immutable flag in the setattr code, which means that we can update project ids and extent size hints on supposedly immutable files. Therefore, reject a setattr call on an immutable file except for the case where we're trying to unset IMMUTABLE. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
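The rule "reject a setattr on an immutable file unless it unsets IMMUTABLE" is sketched below in C. The sketch also allows a no-op request that leaves every attribute unchanged, matching the setflags behavior described in the refactor commit. FL_IMMUTABLE, struct attrs, and check_setattr are hypothetical. The CAP_LINUX_IMMUTABLE permission check that real code also performs is left out.

```c
#include <errno.h>
#include <stdbool.h>

#define FL_IMMUTABLE (1u << 0)

struct attrs {
	unsigned int flags;
	unsigned int projid;  /* project id */
	unsigned int extsize; /* extent size hint */
};

/* Returns 0 if the request may proceed, -EPERM otherwise. */
static int check_setattr(const struct attrs *cur, const struct attrs *req)
{
	if (!(cur->flags & FL_IMMUTABLE))
		return 0; /* not immutable: nothing extra to enforce */
	if (!(req->flags & FL_IMMUTABLE))
		return 0; /* request clears IMMUTABLE: allowed */
	if (req->flags != cur->flags || req->projid != cur->projid ||
	    req->extsize != cur->extsize)
		return -EPERM; /* still immutable, but something else changes */
	return 0;          /* exact no-op */
}
```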
2019-04-15	xfs: clean up xfs_merge_ioc_xflags	Darrick J. Wong
	Clean up the calling convention since we're editing the fsxattr struct anyway. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-04-15	xfs: refactor setflags to use setattr code directly	Darrick J. Wong
	Refactor the SETFLAGS implementation to use the SETXATTR code directly instead of partially constructing a struct fsxattr and calling bits and pieces of the setxattr code. This reduces code size and becomes necessary in the next patch to maintain the behavior of allowing userspace to set immutable on an immutable file so long as nothing /else/ about the attributes change. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-04-15	xfs: flush page mappings as part of setting immutable	Darrick J. Wong
	The chattr manpage has this to say about immutable files: "A file with the 'i' attribute cannot be modified: it cannot be deleted or renamed, no link can be created to this file, most of the file's metadata can not be modified, and the file can not be opened in write mode." This means that we need to flush the page cache when setting the immutable flag so that all mappings will become read-only again and therefore programs cannot continue to write to writable mappings. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-04-15	xfs: unlock inode when xfs_ioctl_setattr_get_trans can't get transaction	Darrick J. Wong
	We passed an inode into xfs_ioctl_setattr_get_trans with join_flags indicating which locks are held on that inode. If we can't allocate a transaction then we need to unlock the inode before we bail out. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-04-15	mm/fs: don't allow writes to immutable files	Darrick J. Wong
	The chattr manpage has this to say about immutable files: "A file with the 'i' attribute cannot be modified: it cannot be deleted or renamed, no link can be created to this file, most of the file's metadata can not be modified, and the file can not be opened in write mode." Once the flag is set, it is enforced for quite a few file operations, such as fallocate, fpunch, fzero, rm, touch, open, etc. However, we don't check for immutability when doing a write(), a PROT_WRITE mmap(), a truncate(), or a write to a previously established mmap. If a program has an open write fd to a file that the administrator subsequently marks immutable, the program still can change the file contents. Weird! The ability to write to an immutable file does not follow the manpage promise that immutable files cannot be modified. Worse yet it's inconsistent with the behavior of other syscalls which don't allow modifications of immutable files. Therefore, add the necessary checks to make the write, mmap, and truncate behavior consistent with what the manpage says and consistent with other syscalls on filesystems which support IMMUTABLE. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-04-15	xfs: merge adjacent io completions of the same type [merged-completions_2019-04-15]	Darrick J. Wong
	It's possible for pagecache writeback to split up a large amount of work into smaller pieces for throttling purposes or to reduce the amount of time a writeback operation is pending. Whatever the reason, XFS can end up with a bunch of IO completions that call for the same operation to be performed on a contiguous extent mapping. Since mappings are extent based in XFS, we'd prefer to run fewer transactions when we can. When we're processing an ioend on the list of io completions, check to see if the next items on the list are both adjacent and of the same type. If so, we can merge the completions to reduce transaction overhead. On fast storage this doesn't seem to make much of a difference in performance, though the number of transactions for an overnight xfstests run seems to drop by ~5%. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
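The merge step can be modelled over an offset-sorted array of completions: fold each entry into its predecessor when the two are the same type and byte-contiguous. The patch works on a list of ioends; the array, the io_type values, and merge_ioends here are hypothetical simplifications.

```c
#include <stddef.h>
#include <stdint.h>

enum io_type { IO_OVERWRITE, IO_UNWRITTEN, IO_COW };

struct ioend {
	enum io_type type;
	uint64_t offset; /* file offset in bytes */
	uint64_t size;
};

/* Merge in place; returns the new count. Each survivor then needs one
 * completion transaction instead of one per original ioend. */
static size_t merge_ioends(struct ioend *io, size_t n)
{
	if (n == 0)
		return 0;

	size_t out = 0;
	for (size_t i = 1; i < n; i++) {
		struct ioend *last = &io[out];

		if (io[i].type == last->type &&
		    last->offset + last->size == io[i].offset)
			last->size += io[i].size; /* adjacent, same kind: fold in */
		else
			io[++out] = io[i];
	}
	return out + 1;
}
```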
2019-04-15	xfs: remove unused m_data_workqueue	Darrick J. Wong
	Now that we're no longer using m_data_workqueue, remove it. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>