bcachefs.git - Unnamed repository; edit this file 'description' to name the repository.

Age	Commit message (Collapse)	Author
5 hours	bcachefs: opts.casefold_disabledbcachefs-for-upstream	Kent Overstreet
	Add an option for completely disabling casefolding on a filesystem, as a workaround for overlayfs. This should only be needed as a temporary workaround, until the overlayfs fix arrives. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
5 hours	bcachefs: Fix incorrect transaction restart handling	Alan Huang
	Reported-by: syzbot+cc7567f096079cb4146f@syzkaller.appspotmail.com Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
46 hours	bcachefs: fix btree_trans_peek_prev_journal()	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
3 days	bcachefs: mark invalid_btree_id autofix	Bharadwaj Raju
	Checking for invalid IDs was introduced in 9e7cfb35e266 ("bcachefs: Check for invalid btree IDs") to prevent an invalid shift later, but since 141526548052 ("bcachefs: Bad btree roots are now autofix") which made btree_root_bkey_invalid autofix, the fsck_err_on call didn't do anything. We can mark this err type (invalid_btree_id) autofix as well, so it gets handled. Reported-by: syzbot+029d1989099aa5ae3e89@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=029d1989099aa5ae3e89 Fixes: 141526548052 ("bcachefs: Bad btree roots are now autofix") Signed-off-by: Bharadwaj Raju <bharadwaj.raju777@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
5 days	bcachefs: Plumb correct ip to trans_relock_fail tracepointbcachefs-2025-06-26	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
5 days	bcachefs: Ensure we rewind to run recovery passes	Kent Overstreet
	Fix a 6.16 regression from the recovery pass rework, which introduced a bug where calling bch2_run_explicit_recovery_pass() would only return the error code to rewind recovery for the first call that scheduled that recovery pass. If the error code from the first call was swallowed (because it was called by an asynchronous codepath), subsequent calls would go "ok, this pass is already marked as needing to run" and return 0. Fixing this ensures that check_topology bails out to run btree_node_scan before doing any repair. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
5 days	bcachefs: Ensure btree node scan runs before checking for scanned nodes	Kent Overstreet
	Previously, calling bch2_btree_has_scanned_nodes() when btree node scan hadn't actually run would erroniously return false - causing us to think a btree was entirely gone. This fixes a 6.16 regression from moving the scheduling of btree node scan out of bch2_btree_lost_data() (fixing the bug where we'd schedule it persistently in the superblock) and only scheduling it when check_toploogy() is asking for scanned btree nodes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
5 days	bcachefs: btree_root_unreadable_and_scan_found_nothing should not be autofix	Kent Overstreet
	Autofix is specified in btree_gc.c if it's not an important btree. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 days	bcachefs: fix bch2_journal_keys_peek_prev_min() underflow	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 days	bcachefs: Use wait_on_allocator() when allocating journal	Kent Overstreet
	wait_on_allocator() emits debug info when we hang trying to allocate. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 days	bcachefs: Check for bad write buffer key when moving from journal	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 days	bcachefs: Don't unlock the trans if ret doesn't match BCH_ERR_operation_blocked	Alan Huang
	Reported-by: syzbot+d540192e763531d307ff@syzkaller.appspotmail.com Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 days	bcachefs: Fix range in bch2_lookup_indirect_extent() error path	Kent Overstreet
	Before calling bch2_indirect_extent_missing_error(), we have to calculate the missing range, which is the intersection of the reflink pointer and the non-indirect-extent we found. The calculation didn't take into account that the returned extent may span the iter position, leading to an infinite loop when we (unnecessarily) resized the extent we were returning to one that didn't extend past the offset we were looking up. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 days	bcachefs: fix spurious error_throw	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 days	bcachefs: Add missing bch2_err_class() to fileattr_set()	Kent Overstreet
	Make sure we return a standard error code. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 days	bcachefs: Add missing key type checks to check_snapshot_exists()bcachefs-2025-06-19	Kent Overstreet
	For now we only have one key type in these btrees, but forward compatibility means we do have to check. Reported-by: syzbot+b4cb4a6988aced0cec4b@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 days	bcachefs: Don't log fsck err in the journal if doing repair elsewhere	Kent Overstreet
	This fixes exceeding the bump allocator limit when the allocator finds many buckets that need repair - they're repaired asynchronously, which means that every error logged a message in the bump allocator, without committing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
11 days	bcachefs: Fix *__bch2_trans_subbuf_alloc() error path	Kent Overstreet
	Don't change buf->size on error - this would usually be a transaction restart, but it could also be -ENOMEM - when we've exceeded the bump allocator max). Fixes: 247abee6ae6d ("bcachefs: btree_trans_subbuf") Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
13 days	bcachefs: Fix missing newlines before ero	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
13 days	bcachefs: fix spurious error in read_btree_roots()	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
13 days	bcachefs: fsck: Fix oops in key_visible_in_snapshot()	Kent Overstreet
	The normal fsck code doesn't call key_visible_in_snapshot() with an empty list of snapshot IDs seen (the current snapshot ID will always be on the list), but str_hash_repair_key() -> bch2_get_snapshot_overwrites() can, and that's totally fine as long as we check for it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
13 days	bcachefs: fsck: fix unhandled restart in topology repair	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
13 days	bcachefs: fsck: Fix check_directory_structure when no check_dirents	Kent Overstreet
	check_directory_structure runs after check_dirents, so it expects that it won't see any inodes with missing backpointers - normally. But online fsck can't run check_dirents yet, or the user might only be running a specific pass, so we need to be careful that this isn't an error. If an inode is unreachable, that's handled by a separate pass. Also, add a new 'bch2_inode_has_backpointer()' helper, since we were doing this inconsistently. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
13 days	bcachefs: Fix restart handling in btree_node_scrub_work()	Kent Overstreet
	btree node scrub was sometimes failing to rewrite nodes with errors; bch2_btree_node_rewrite() can return a transaction restart and we weren't checking - the lockrestart_do() needs to wrap the entire operation. And there's a better helper it should've been using, bch2_btree_node_rewrite_key(), which makes all this more convenient. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-16	bcachefs: Fix bch2_read_bio_to_text()	Kent Overstreet
	We can only pass negative error codes to bch2_err_str(); if it's a positive integer it's not an error and we trip an assert. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-16	bcachefs: fsck: Fix check_path_loop() + snapshots	Kent Overstreet
	A path exists in a particular snapshot: we should do the pathwalk in the snapshot ID of the inode we started from, _not_ change snapshot ID as we walk inodes and dirents. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-16	bcachefs: fsck: check_subdir_count logs path	Kent Overstreet
	We can easily go from inode number -> path now, which makes for more useful log messages. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-16	bcachefs: fsck: additional diagnostics for reattach_inode()	Kent Overstreet
	Log the inode's new path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-16	bcachefs: fsck: check_directory_structure runs in reverse order	Kent Overstreet
	When we find a directory connectivity problem, we should do the repair in the oldest snapshot that has the issue - so that we don't end up duplicating work or making a real mess of things. Oldest snapshot IDs have the highest integer value, so - just walk inodes in reverse order. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-16	bcachefs: fsck: Fix reattach_inode() for subvol roots	Kent Overstreet
	bch_subvolume.fs_path_parent needs to be updated as well, it should match inode.bi_parent_subvol. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-16	bcachefs: fsck: Fix remove_backpointer() for subvol roots	Kent Overstreet
	The dirent will be in a different snapshot if the inode is a subvolume root. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-16	bcachefs: fsck: Print path when we find a subvol loop	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-16	bcachefs: Fix __bch2_inum_to_path() when crossing subvol boundaries	Kent Overstreet
	The bch2_subvolume_get_snapshot() call needs to happen before the dirent lookup - the dirent is in the parent subvolume. Also, check for loops. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-16	bcachefs: Call bch2_fs_init_rw() early if we'll be going rw	Kent Overstreet
	kthread creation checks for pending signals, which is _very_ annoying if we have to do a long recovery and don't go rw until we've done significant work. Check if we'll be going rw and pre-allocate kthreads/workqueues. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-16	bcachefs: fsck: Improve check_key_has_inode()	Kent Overstreet
	Print out more info when we find a key (extent, dirent, xattr) for a missing inode - was there a good inode in an older snapshot, full(ish) list of keys for that missing inode, so we can make better decisions on how to repair. If it looks like it should've been deleted, autofix it. If we ever hit the non-autofix cases, we'll want to write more repair code (possibly reconstituting the inode). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-16	bcachefs: don't return fsck_fix for unfixable node errors in __btree_err	Bharadwaj Raju
	After cd3cdb1ef706 ("Single err message for btree node reads"), all errors caused __btree_err to return -BCH_ERR_fsck_fix no matter what the actual error type was if the recovery pass was scanning for btree nodes. This lead to the code continuing despite things like bad node formats when they earlier would have caused a jump to fsck_err, because btree_err only jumps when the return from __btree_err does not match fsck_fix. Ultimately this lead to undefined behavior by attempting to unpack a key based on an invalid format. Make only errors of type -BCH_ERR_btree_node_read_err_fixable cause __btree_err to return -BCH_ERR_fsck_fix when scanning for btree nodes. Reported-by: syzbot+cfd994b9cdf00446fd54@syzkaller.appspotmail.com Fixes: cd3cdb1ef706 ("bcachefs: Single err message for btree node reads") Signed-off-by: Bharadwaj Raju <bharadwaj.raju777@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-16	bcachefs: Fix pool->alloc NULL pointer dereference	Alan Huang
	btree_interior_update_pool has not been initialized before the filesystem becomes read-write, thus mempool_alloc in bch2_btree_update_start will trigger pool->alloc NULL pointer dereference in mempool_alloc_noprof Reported-by: syzbot+2f3859bd28f20fa682e6@syzkaller.appspotmail.com Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-16	bcachefs: Move bset size check before csum check	Alan Huang
	In syzbot's crash, the bset's u64s is larger than the btree node. Reported-by: syzbot+bfaeaa8e26281970158d@syzkaller.appspotmail.com Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-16	bcachefs: mark more errors autofix	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-16	bcachefs: Kill unused tracepoints	Kent Overstreet
	Dead code cleanup. Link: https://lore.kernel.org/linux-bcachefs/20250612224059.39fddd07@batman.local.home/ Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-16	bcachefs: opts.journal_rewind	Kent Overstreet
	Add a mount option for rewinding the journal, bringing the entire filesystem to where it was at a previous point in time. This is for extreme disaster recovery scenarios - it's not intended as an undelete operation. The option takes a journal sequence number; the desired sequence number can be determined with 'bcachefs list_journal' Caveats: - The 'journal_transaction_names' option must have been enabled (it's on by default). The option controls emitting of extra debug info in the journal, so we can see what individual transactions were doing; It also enables journalling of keys being overwritten, which is what we rely on here. - A full fsck run will be automatically triggered since alloc info will be inconsistent. Only leaf node updates to non-alloc btrees are rewound, since rewinding interior btree updates isn't possible or desirable. - We can't do anything about data that was deleted and overwritten. Lots of metadata updates after the point in time we're rewinding to shouldn't cause a problem, since we segragate data and metadata allocations (this is in order to make repair by btree node scan practical on larger filesystems; there's a small 64-bit per device bitmap in the superblock of device ranges with btree nodes, and we try to keep this small). However, having discards enabled will cause problems, since buckets are discarded as soon as they become empty (this is why we don't implement fstrim: we don't need it). Hopefully, this feature will be a one-off thing that's never used again: this was implemented for recovering from the "vfs i_nlink 0 -> subvol deletion" bug, and that bug was unusually disastrous and additional safeguards have since been implemented. But if it does turn out that we need this more in the future, I'll have to implement an option so that empty buckets aren't discarded immediately - lagging by perhaps 1% of device capacity. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-15	bcachefs: fsck: fix extent past end of inode repair	Kent Overstreet
	Fix the case where we're deleting in a different snapshot and need to emit a whiteout - that requires a regular BTREE_ITER_filter_snapshots iterator. Also, only delete the part of the extent that extents past i_size. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-15	bcachefs: fsck: fix add_inode()	Kent Overstreet
	the inode btree uses the offset field for the inum, not the inode field. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-15	bcachefs: Fix snapshot_key_missing_inode_snapshot repair	Kent Overstreet
	When the inode was a whiteout, we were inserting a new whiteout at the wrong (old) snapshot. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-15	bcachefs: Fix "now allowing incompatible features" message	Kent Overstreet
	Check against version_incompat_allowed, not version_incompat. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-15	bcachefs: pass last_seq into fs_journal_start()	Kent Overstreet
	Prep work for journal rewind, where the seq we're replaying from may be different than the last journal entry's last_seq. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-15	bcachefs: better __bch2_snapshot_is_ancestor() assert	Kent Overstreet
	Previously, we weren't checking the result of the skiplist walk, just the is_ancestor bitmap. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-15	bcachefs: btree_iter: fix updates, journal overlay	Kent Overstreet
	We need to start searching from search_key - _not_ path->pos, which will point to the key we found in the btree Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-15	bcachefs: Fix bch2_journal_keys_peek_prev_min()	Kent Overstreet
	this code is rarely invoked, so - we had a few bugs left from basing it off of bch2_journal_keys_peek_max()... Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-15	bcachefs: Delay calculation of trans->journal_u64s	Alan Huang
	When there is commit error that need split btree leaf, fsck might change the value of trans->journal_entries.u64s, when retry commit, the value of trans->journal_u64s would be incorrect, which will lead to trans->journal_res.u64s underflow, and then out of bounds write will occur: [ 464.496970][T11969] Call trace: [ 464.496973][T11969] show_stack+0x3c/0x88 (C) [ 464.496995][T11969] dump_stack_lvl+0xf8/0x178 [ 464.497014][T11969] dump_stack+0x20/0x30 [ 464.497031][T11969] __bch2_trans_log_str+0x344/0x350 [ 464.497048][T11969] bch2_trans_log_str+0x3c/0x60 [ 464.497065][T11969] __bch2_fsck_err+0x11bc/0x1390 [ 464.497083][T11969] bch2_check_discard_freespace_key+0xad4/0x10d0 [ 464.497100][T11969] bch2_bucket_alloc_freelist+0x99c/0x1130 [ 464.497117][T11969] bch2_bucket_alloc_trans+0x79c/0xcb8 [ 464.497133][T11969] bch2_bucket_alloc_set_trans+0x378/0xc20 [ 464.497151][T11969] __open_bucket_add_buckets+0x7fc/0x1c00 [ 464.497168][T11969] open_bucket_add_buckets+0x184/0x3a8 [ 464.497185][T11969] bch2_alloc_sectors_start_trans+0xa04/0x1da0 [ 464.497203][T11969] bch2_btree_reserve_get+0x6e0/0xef0 [ 464.497220][T11969] bch2_btree_update_start+0x1618/0x2600 [ 464.497239][T11969] bch2_btree_split_leaf+0xcc/0x730 [ 464.497258][T11969] bch2_trans_commit_error+0x22c/0xc30 [ 464.497276][T11969] __bch2_trans_commit+0x207c/0x4e30 [ 464.497292][T11969] bch2_journal_replay+0x9e0/0x1420 [ 464.497305][T11969] __bch2_run_recovery_passes+0x458/0xf98 [ 464.497318][T11969] bch2_run_recovery_passes+0x280/0x478 [ 464.497331][T11969] bch2_fs_recovery+0x24f0/0x3a28 [ 464.497344][T11969] bch2_fs_start+0xb80/0x1248 [ 464.497358][T11969] bch2_fs_get_tree+0xe94/0x1708 [ 464.497377][T11969] vfs_get_tree+0x84/0x2d0 Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>