summaryrefslogtreecommitdiff
path: root/fs/bcachefs
AgeCommit message (Collapse)Author
2024-03-31bcachefs: Split out recovery_passes.cKent Overstreet
We've grown a fair amount of code for managing recovery passes; tracking which ones we're running, which ones need to be run, and flagging in the superblock which ones need to be run on the next recovery. So it's worth splitting out into its own file, this code is pretty different from the code in recovery.c. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-31bcachefs: fix backpointer for missing alloc key msgKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-31bcachefs: Fix bch2_btree_increase_depth()Kent Overstreet
When we haven't yet allocated any btree nodes for a given btree, we first need to call the regular split path to allocate one. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-31bcachefs: Kill bch2_bkey_ptr_data_type()Kent Overstreet
Remove some duplication, and inconsistency between check_fix_ptrs and the main ptr marking paths Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-31bcachefs: Fix use after free in check_root_trans()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-31bcachefs: Fix repair path for missing indirect extentsKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-31bcachefs: Fix use after free in bch2_check_fix_ptrs()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-31bcachefs: Fix btree node keys accounting in topology repair pathKent Overstreet
When dropping keys now outside a now because we're changing the node min/max, we need to redo the node's accounting as well. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-31bcachefs: Check btree ptr min_key in .invalidKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-31bcachefs: add REQ_SYNC and REQ_IDLE in write diozhuxiaohui
when writing file with direct_IO on bcachefs, then performance is much lower than other fs due to write back throttle in block layer: wbt_wait+1 __rq_qos_throttle+32 blk_mq_submit_bio+394 submit_bio_noacct_nocheck+649 bch2_submit_wbio_replicas+538 __bch2_write+2539 bch2_direct_write+1663 bch2_write_iter+318 aio_write+355 io_submit_one+1224 __x64_sys_io_submit+169 do_syscall_64+134 entry_SYSCALL_64_after_hwframe+110 add set REQ_SYNC and REQ_IDLE in bio->bi_opf as standard dirct-io Signed-off-by: zhuxiaohui <zhuxiaohui.400@bytedance.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-31bcachefs: Improved topology repair checksKent Overstreet
Consolidate bch2_gc_check_topology() and btree_node_interior_verify(), and replace them with an improved version, bch2_btree_node_check_topology(). This checks that children of an interior node correctly span the full range of the parent node with no overlaps. Also, ensure that topology repairs at runtime are always a fatal error; in particular, this adds a check in btree_iter_down() - if we don't find a key while walking down the btree that's indicative of a topology error and should be flagged as such, not a null ptr deref. Some checks in btree_update_interior.c remaining BUG_ONS(), because we already checked the node for topology errors when starting the update, and the assertions indicate that we _just_ corrupted the btree node - i.e. the problem can't be that existing on disk corruption, they indicate an actual algorithmic bug. In the future, we'll be annotating the fsck errors list with which recovery pass corrects them; the open coded "run explicit recovery pass or fatal error" in bch2_btree_node_check_topology() will in the future be done for every fsck_err() call. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-31bcachefs: Be careful about btree node splits during journal replayKent Overstreet
Don't pick a pivot that's going to be deleted. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-31bcachefs: btree_and_journal_iter now respects trans->journal_replay_not_finishedKent Overstreet
btree_and_journal_iter is now safe to use at runtime, not just during recovery before journal keys have been freed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-31bcachefs: fix trans->mem realloc in __bch2_trans_kmallocHongbo Li
The old code doesn't consider the mem alloced from mempool when call krealloc on trans->mem. Also in bch2_trans_put, using mempool_free to free trans->mem by condition "trans->mem_bytes == BTREE_TRANS_MEM_MAX" is inaccurate when trans->mem was allocated by krealloc function. Instead, we use used_mempool stuff to record the situation, and realloc or free the trans->mem in elegant way. Also, after krealloc failed in __bch2_trans_kmalloc, the old data should be copied to the new buffer when alloc from mempool_alloc. Fixes: 31403dca5bb1 ("bcachefs: optimize __bch2_trans_get(), kill DEBUG_TRANSACTIONS") Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-31bcachefs: Don't do extent merging before journal replay is finishedKent Overstreet
We don't normally do extent updates this early in recovery, but some of the repair paths have to and when we do, we don't want to do anything that requires the snapshots table. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-31bcachefs: Add checks for invalid snapshot IDsKent Overstreet
Previously, we assumed that keys were consistent with the snapshots btree - but that's not correct as fsck may not have been run or may not be complete. This adds checks and error handling when using the in-memory snapshots table (that mirrors the snapshots btree). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-31bcachefs: Move snapshot table size to struct snapshot_tableKent Overstreet
We need to add bounds checking for snapshot table accesses - it turns out there are cases where we do need to use the snapshots table before fsck checks have completed (and indeed, fsck may not have been run). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-31bcachefs: Add an assertion for trying to evict btree rootKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-31bcachefs: fix mount error pathKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-31bcachefs: fix misplaced newline in __bch2_inode_unpacked_to_text()Thomas Bertschinger
before: u64s 18 type inode_v3 0:1879048192:U32_MAX len 0 ver 0: mode=40700 flags= (15300000) journal_seq=4 bi_size=0 bi_sectors=0 bi_version=0bi_atime=227064388944 ... after: u64s 18 type inode_v3 0:1879048192:U32_MAX len 0 ver 0: mode=40700 flags= (15300000) journal_seq=4 bi_size=0 bi_sectors=0 bi_version=0 bi_atime=227064388944 ... Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-31bcachefs: Fix journal pins in btree write bufferKent Overstreet
btree write buffer flush has two phases - in natural key order, which is more efficient but may fail - then in journal order The journal order flush was assuming that keys were still correctly ordered by journal sequence number - but due to coalescing by the previous phase, we need an additional sort. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-31bcachefs: Fix assert in bch2_backpointer_invalid()Kent Overstreet
Backpointers that point to invalid devices are caught by fsck, not .key_invalid; so .key_invalid needs to check for them instead of hitting asserts. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-19Merge tag 'bcachefs-2024-03-19' of https://evilpiepirate.org/git/bcachefsLinus Torvalds
Pull bcachefs fixes from Kent Overstreet: "Assorted bugfixes. Most are fixes for simple assertion pops; the most significant fix is for a deadlock in recovery when we have to rewrite large numbers of btree nodes to fix errors. This was incorrectly running out of the same workqueue as the core interior btree update path - we now give it its own single threaded workqueue. This was visible to users as "bch2_btree_update_start(): error: BCH_ERR_journal_reclaim_would_deadlock" - and then recovery hanging" * tag 'bcachefs-2024-03-19' of https://evilpiepirate.org/git/bcachefs: bcachefs: Fix lost wakeup on journal shutdown bcachefs; Fix deadlock in bch2_btree_update_start() bcachefs: ratelimit errors from async_btree_node_rewrite bcachefs: Run check_topology() first bcachefs: Improve bch2_fatal_error() bcachefs: Fix lost transaction restart error bcachefs: Don't corrupt journal keys gap buffer when dropping alloc info bcachefs: fix for building in userspace bcachefs: bch2_snapshot_is_ancestor() now safe to call in early recovery bcachefs: Fix nested transaction restart handling in bch2_bucket_gens_init() bcachefs: Improve sysfs internal/btree_updates bcachefs: Split out btree_node_rewrite_worker bcachefs: Fix locking in bch2_alloc_write_key() bcachefs: Avoid extent entry type assertions in .invalid() bcachefs: Fix spurious -BCH_ERR_transaction_restart_nested bcachefs: Fix check_key_has_snapshot() call bcachefs: Change "accounting overran journal reservation" to a warning
2024-03-18bcachefs: Fix lost wakeup on journal shutdownbcachefs-2024-03-19Kent Overstreet
We need to check for journal shutdown first in __journal_res_get() - after the journal is shutdown, j->watermark won't be changing anymore. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-18bcachefs; Fix deadlock in bch2_btree_update_start()Kent Overstreet
BCH_TRANS_COMMIT_journal_reclaim with watermark != BCH_WATERMARK_reclaim means nonblocking, and we need the journal_res_get() in btree_update_start() to respect that. In a future refactoring we'll be deleting BCH_TRANS_COMMIT_journal_reclaim and replacing it with an explicit BCH_TRANS_COMMIT_nonblocking. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-18bcachefs: ratelimit errors from async_btree_node_rewriteKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-18bcachefs: Run check_topology() firstKent Overstreet
check_topology() doesn't actually require alloc info - and running it first means other passes don't have to catch btree read errors. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-18bcachefs: Improve bch2_fatal_error()Kent Overstreet
error messages should always include __func__ Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-18bcachefs: Fix lost transaction restart errorKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-17bcachefs: Don't corrupt journal keys gap buffer when dropping alloc infoKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-17bcachefs: fix for building in userspaceKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-17bcachefs: bch2_snapshot_is_ancestor() now safe to call in early recoveryKent Overstreet
this fixes an assertion pop in bch2_check_snapshot_trees() -> check_snapshot_tree() -> bch2_snapshot_tree_master_subvol() -> bch2_snapshot_is_ancestor() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-17bcachefs: Fix nested transaction restart handling in bch2_bucket_gens_init()Kent Overstreet
Nested transaction restart handling is typically best avoided; when the inner context handles a transaction restart it invalidates the outer transaction context, so we need to make sure to return a transaction_restart_nested error. This code wasn't doing that, and hit the assertion in for_each_btree_key() that checks for that via trans->restart_count. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-17bcachefs: Improve sysfs internal/btree_updatesKent Overstreet
Print out the function that launched the btree update. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-17bcachefs: Split out btree_node_rewrite_workerKent Overstreet
This fixes a deadlock due to using btree_interior_update_worker for non interior updates - async btree node rewrites were blocking, and then blocking other interior updates. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-17bcachefs: Fix locking in bch2_alloc_write_key()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-17bcachefs: Avoid extent entry type assertions in .invalid()Kent Overstreet
After keys have passed bkey_ops.key_invalid we should never see invalid extent entry types - but .key_invalid itself needs to cope with them. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-17bcachefs: Fix spurious -BCH_ERR_transaction_restart_nestedKent Overstreet
We only need to return transaction_restart_nested when we're inside a context that's handling transaction restarts. Also, add a missing check_subdir_count() call. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-17bcachefs: Fix check_key_has_snapshot() callKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-17bcachefs: Change "accounting overran journal reservation" to a warningKent Overstreet
This doesn't need to be a BUG_ON(); the actual serious "things break" condition is if the whole journal write overruns the available space, and that has a fatal error, not a BUG_ON(). This check indicates we screwed something up, but it should be a warning. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-15Merge tag 'bcachefs-2024-03-13' of https://evilpiepirate.org/git/bcachefsLinus Torvalds
Pull bcachefs updates from Kent Overstreet: - Subvolume children btree; this is needed for providing a userspace interface for walking subvolumes, which will come later - Lots of improvements to directory structure checking - Improved journal pipelining, significantly improving performance on high iodepth write workloads - Discard path improvements: the discard path is more efficient, and no longer flushes the journal unnecessarily - Buffered write path can now avoid taking the inode lock - new mm helper: memalloc_flags_{save|restore} - mempool now does kvmalloc mempools * tag 'bcachefs-2024-03-13' of https://evilpiepirate.org/git/bcachefs: (128 commits) bcachefs: time_stats: shrink time_stat_buffer for better alignment bcachefs: time_stats: split stats-with-quantiles into a separate structure bcachefs: mean_and_variance: put struct mean_and_variance_weighted on a diet bcachefs: time_stats: add larger units bcachefs: pull out time_stats.[ch] bcachefs: reconstruct_alloc cleanup bcachefs: fix bch_folio_sector padding bcachefs: Fix btree key cache coherency during replay bcachefs: Always flush write buffer in delete_dead_inodes() bcachefs: Fix order of gc_done passes bcachefs: fix deletion of indirect extents in btree_gc bcachefs: Prefer struct_size over open coded arithmetic bcachefs: Kill unused flags argument to btree_split() bcachefs: Check for writing superblocks with nonsense member seq fields bcachefs: fix bch2_journal_buf_to_text() lib/generic-radix-tree.c: Make nodes more reasonably sized bcachefs: copy_(to|from)_user_errcode() bcachefs: Split out bkey_types.h bcachefs: fix lost journal buf wakeup due to improved pipelining bcachefs: intercept mountoption value for bool type ...
2024-03-13bcachefs: time_stats: shrink time_stat_buffer for better alignmentbcachefs-2024-03-13Darrick J. Wong
Shrink this percpu object by one array element so that the object size becomes exactly 512 bytes. This will lead to more efficient memory use, hopefully. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13bcachefs: time_stats: split stats-with-quantiles into a separate structureDarrick J. Wong
Currently, struct time_stats has the optional ability to quantize the information that it collects. This is /probably/ useful for callers who want to see quantized information, but it more than doubles the size of the structure from 224 bytes to 464. For users who don't care about that (e.g. upcoming xfs patches) and want to avoid wasting 240 bytes per counter, split the two into separate pieces. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13bcachefs: mean_and_variance: put struct mean_and_variance_weighted on a dietDarrick J. Wong
The only caller of this code (time_stats) always knows the weights and whether or not any information has been collected. Pass this information into the mean and variance code so that it doesn't have to store that information. This reduces the structure size from 24 to 16 bytes, which shrinks each time_stats counter to 192 bytes from 208. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13bcachefs: time_stats: add larger unitsDarrick J. Wong
Filesystems can stay mounted for a very long time, so add some larger units. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13bcachefs: pull out time_stats.[ch]Kent Overstreet
prep work for lifting out of fs/bcachefs/ Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13bcachefs: reconstruct_alloc cleanupKent Overstreet
Now that we've got the errors_silent mechanism, we don't have to check if the reconstruct_alloc option is set all over the place. Also - users no longer have to explicitly select fsck and fix_errors. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13bcachefs: fix bch_folio_sector paddingKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13bcachefs: Fix btree key cache coherency during replayKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13bcachefs: Always flush write buffer in delete_dead_inodes()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>