summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-10-26bcachefs: Fix a kasan splat in bch2_dev_add()Kent Overstreet
This fixes a use after free - mi is dangling after the resize call. Additionally, resizing the device's member info section was useless - we were attempting to preallocate the space required before adding it to the filesystem superblock, but there's other sections that we should have been preallocating as well for that to work. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-26fixup! bcachefs: Add new helper to retrieve bch_member from sbKent Overstreet
2023-10-25fixup! bcachefs: rebalance_workKent Overstreet
2023-10-25closures: Fix race in closure_sync()Kent Overstreet
As pointed out by Linus, closure_sync() was racy; we could skip blocking immediately after a get() and a put(), but then that would skip any barrier corresponding to the other thread's put() barrier. To fix this, always do the full __closure_sync() sequence whenever any get() has happened and the closure might have been used by other threads. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-25closures: Better memory barriersKent Overstreet
atomic_(dec|sub)_return_release() are a thing now - use them. Also, delete the useless barrier in set_closure_fn(): it's redundant with the memory barrier in closure_put(0. Since closure_put() would now otherwise just have a release barrier, we also need a new barrier when the ref hits 0 - smp_acquire__after_ctrl_dep(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-24bcachefs: rebalance_workKent Overstreet
This adds a new btree, rebalance_work, to eliminate scanning required for finding extents that need work done on them in the background - i.e. for the background_target and background_compression options. rebalance_work is a bitset btree, where a KEY_TYPE_set corresponds to an extent in the extents or reflink btree at the same pos. A new extent field is added, bch_extent_rebalance, which indicates that this extent has work that needs to be done in the background - and which options to use. This allows per-inode options to be propagated to indirect extents - at least in some circumstances. In this patch, changing IO options on a file will not propagate the new options to indirect extents pointed to by that file. Updating (setting/clearing) the rebalance_work btree is done by the extent trigger, which looks at the bch_extent_rebalance field. Scanning is still requrired after changing IO path options - either just for a given inode, or for the whole filesystem. We indicate that scanning is required by adding a KEY_TYPE_cookie key to the rebalance_work btree: the cookie counter is so that we can detect that scanning is still required when an option has been flipped mid-way through an existing scan. Future possible work: - Propagate options to indirect extents when being changed - Add other IO path options - nr_replicas, ec, to rebalance_work so they can be applied in the background when they change - Add a counter, for bcachefs fs usage output, showing the pending amount of rebalance work: we'll probably want to do this after the disk space accounting rewrite (moving it to a new btree) Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: bch2_inum_opts_get()Kent Overstreet
New helper for new rebalance code Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: move: move_stats refactoringKent Overstreet
data_progress_list is gone - it was redundant with moving_context_list The upcoming rebalance rewrite is going to have it using two different move_stats objects with the same moving_context, depending on whether it's scanning or using the rebalance_work btree - this patch plumbs stats around a bit differently so that will work. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: move: convert to bbposKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: moving_context now owns a btree_transKent Overstreet
btree_trans and moving_context are used together, and having the moving_context owns the transaction object reduces some plumbing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: move.c exports, refactoringKent Overstreet
Prep work for the new rebalance code - we need a few helpers exported. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Guard against unknown compression optionsKent Overstreet
Since compression options now include compression level, proper validation is a bit more involved. This adds bch2_compression_opt_valid(), and plumbs it around appropriately. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: trivial extents.c refactoringKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix bch2_prt_bitflags()Kent Overstreet
This fixes an infinite loop when there's a set bit at position >= 32. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Check for too-large encoded extentsKent Overstreet
We don't yet repair (split) them, just check. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Ensure we don't exceed encoded_extent_maxKent Overstreet
The write path may (rarely) see an encoded (checksummed) extent that exceeds encoded_extent_max - this can happen when we're moving an existing extent that was not checksummed, but was given a checksum by bch2_write_rechecksum(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: bch2_disk_path_to_text() no longer takes sb_lockKent Overstreet
We're going to be using bch2_target_to_text() -> bch2_disk_path_to_text() from bch2_bkey_ptrs_to_text() and bch2_bkey_ptrs_invalid(), which can be called in any context. This patch adds the actual label to bch_disk_group_cpu so that it can be used by bch2_disk_path_to_text, and splits out bch2_disk_path_to_text() into two variants - like the previous patch, one for when we have a running filesystem and another for when we only have a superblock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Split out disk_groups_types.hKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Split apart bch2_target_to_text(), bch2_target_to_text_sb()Kent Overstreet
Previously we just had bch2_opt_target_to_text() which could be passed either a filesystem object or just a superblock - depending on if we have a running filesystem or not. Split these into two functions for clarity. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-21bcachefs: All triggers are BTREE_TRIGGER_WANTS_OLD_AND_NEWKent Overstreet
Upcoming rebalance_work btree will require extent triggers to be BTREE_TRIGGER_WANTS_OLD_AND_NEW - so to reduce potential confusion, let's just make all triggers BTREE_TRIGGER_WANTS_OLD_AND_NEW. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-21bcachefs: Improve io option handling in data move pathKent Overstreet
The data move path now correctly picks IO options when inodes in different snapshots have different options applied. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-21bcachefs: Ensure devices are always correctly initializedKent Overstreet
We can't mark device superblocks or allocate journal on a device that isn't online. That means we may need to do this on every mount, because we may have formatted a new filesystem and then done the first mount (bch2_fs_initialize()) in degraded mode. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-21bcachefs: Delete duplicate time stats initializationKent Overstreet
This code duplicated initialization already done in bch2_fs_btree_iter_init(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-20bcachefs: Kill dead code extent_save()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-20bcachefs: Fix ca->oldest_gen allocationKent Overstreet
The ca->oldest_gen array needs to be the same size as the bucket_gens array; ca->mi.nbuckets is updated with only state_lock held, not gc_lock, so bch2_gc_gens() could race with device resize and allocate too small of an oldest_gens array. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-20mm: Add shrinker_to_text() to debugfs interfaceKent Overstreet
Previously, we added shrinker_to_text() and hooked it up to the OOM report - now, the same report is available via debugfs. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-20bcachefs: Fix shrinker namesKent Overstreet
Shrinkers are now exported to debugfs, so the names can't have slashes in them. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-20bcachefs: Fix btree_node_type enumKent Overstreet
More forwards compatibility fixups: having BKEY_TYPE_btree at the end of the enum conflicts with unnkown btree IDs, this shifts BKEY_TYPE_btree to slot 0 and fixes things up accordingly. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-19bcachefs: bch2_btree_id_str()Kent Overstreet
Since we can run with unknown btree IDs, we can't directly index btree IDs into fixed size arrays. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-19bcachefs: Don't run bch2_delete_dead_snapshots() unnecessarilyKent Overstreet
Be a bit more careful about when bch2_delete_dead_snapshots needs to run: it only needs to run synchronously if we're running fsck, and it only needs to run at all if we have snapshot nodes to delete or if fsck has noticed that it needs to run. Also: Rename BCH_FS_HAVE_DELETED_SNAPSHOTS -> BCH_FS_NEED_DELETE_DEAD_SNAPSHOTS Kill bch2_delete_dead_snapshots_hook(), move functionality to bch2_mark_snapshot() Factor out bch2_check_snapshot_needs_deletion(), to explicitly check if we need to be running snapshot deletion. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-19fixup! bcachefs: snapshot_create_lockKent Overstreet
2023-10-18bcachefs: Refactor memcpy into direct assignmentKees Cook
The memcpy() in bch2_bkey_append_ptr() is operating on an embedded fake flexible array which looks to the compiler like it has 0 size. This causes W=1 builds to emit warnings due to -Wstringop-overflow: In file included from include/linux/string.h:254, from include/linux/bitmap.h:11, from include/linux/cpumask.h:12, from include/linux/smp.h:13, from include/linux/lockdep.h:14, from include/linux/radix-tree.h:14, from include/linux/backing-dev-defs.h:6, from fs/bcachefs/bcachefs.h:182: fs/bcachefs/extents.c: In function 'bch2_bkey_append_ptr': include/linux/fortify-string.h:57:33: warning: writing 8 bytes into a region of size 0 [-Wstringop-overflow=] 57 | #define __underlying_memcpy __builtin_memcpy | ^ include/linux/fortify-string.h:648:9: note: in expansion of macro '__underlying_memcpy' 648 | __underlying_##op(p, q, __fortify_size); \ | ^~~~~~~~~~~~~ include/linux/fortify-string.h:693:26: note: in expansion of macro '__fortify_memcpy_chk' 693 | #define memcpy(p, q, s) __fortify_memcpy_chk(p, q, s, \ | ^~~~~~~~~~~~~~~~~~~~ fs/bcachefs/extents.c:235:17: note: in expansion of macro 'memcpy' 235 | memcpy((void *) &k->v + bkey_val_bytes(&k->k), | ^~~~~~ fs/bcachefs/bcachefs_format.h:287:33: note: destination object 'v' of size 0 287 | struct bch_val v; | ^ Avoid making any structure changes and just replace the u64 copy into a direct assignment, side-stepping the entire problem. Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Brian Foster <bfoster@redhat.com> Cc: linux-bcachefs@vger.kernel.org Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202309192314.VBsjiIm5-lkp@intel.com/ Link: https://lore.kernel.org/r/20231010235609.work.594-kees@kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-18bcachefs: Fix drop_alloc_keys()Kent Overstreet
For consistency with the rest of the reconstruct_alloc option, we should be skipping all alloc keys. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-18bcachefs: snapshot_create_lockKent Overstreet
Add a new lock for snapshot creation - this addresses a few races with logged operations and snapshot deletion. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-18bcachefs: Fix snapshot skiplists during snapshot deletionKent Overstreet
In snapshot deleion, we have to pick new skiplist nodes for entries that point to nodes being deleted. The function that finds a new skiplist node, skipping over entries being deleted, was incorrect: if n = 0, but the parent node is being deleted, we also need to skip over that node. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-18bcachefs: bch2_sb_field_get() refactoringKent Overstreet
Instead of using token pasting to generate methods for each superblock section, just make the type a parameter to bch2_sb_field_get(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-18bcachefs: KEY_TYPE_error now counts towards i_sectorsKent Overstreet
KEY_TYPE_error is used when all replicas in an extent are marked as failed; it indicates that data was present, but has been lost. So that i_sectors doesn't change when replacing extents with KEY_TYPE_error, we now have to count error keys as allocations - this fixes fsck errors later. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-18bcachefs: Fix handling of unknown bkey typesKent Overstreet
min_val_size was U8_MAX for unknown key types, causing us to flag any known key as invalid - it should have been 0. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-18bcachefs: Switch to unsafe_memcpy() in a few placesKent Overstreet
The new fortify checking doesn't work for us in all places; this switches to unsafe_memcpy() where appropriate to silence a few warnings/errors. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-18bcachefs: Use struct_size()Christophe JAILLET
Use struct_size() instead of hand writing it. This is less verbose and more robust. While at it, prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-18bcachefs: Correctly initialize new buckets on device resizeKent Overstreet
bch2_dev_resize() was never updated for the allocator rewrite with persistent freelists, and it wasn't noticed because the tests weren't running fsck - oops. Fix this by running bch2_dev_freespace_init() for the new buckets. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-18bcachefs: Fix another smatch complaintKent Overstreet
This should be harmless, but initialize last_seq anyways. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-18bcachefs: Use strsep() in split_devs()Kent Overstreet
Minor refactoring to fix a smatch complaint. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-18bcachefs: Add iops fields to bch_memberHunter Shaffer
Signed-off-by: Hunter Shaffer <huntershaffer182456@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-18bcachefs: Rename bch_sb_field_members -> bch_sb_field_members_v1Hunter Shaffer
Signed-off-by: Hunter Shaffer <huntershaffer182456@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-18bcachefs: New superblock section members_v2Hunter Shaffer
members_v2 has dynamically resizable entries so that we can extend bch_member. The members can no longer be accessed with simple array indexing Instead members_v2_get is used to find a member's exact location within the array and returns a copy of that member. Alternatively member_v2_get_mut retrieves a mutable point to a member. Signed-off-by: Hunter Shaffer <huntershaffer182456@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-18bcachefs: Add new helper to retrieve bch_member from sbHunter Shaffer
Prep work for introducing bch_sb_field_members_v2 - introduce new helpers that will check for members_v2 if it exists, otherwise using v1 Signed-off-by: Hunter Shaffer <huntershaffer182456@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-06bcachefs: bucket_lock() is now a sleepable lockKent Overstreet
fsck_err() may sleep - it takes a mutex and may allocate memory, so bucket_lock() needs to be a sleepable lock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-06bcachefs: fix crc32c checksum merge byte order problemBrian Foster
An fsstress task on a big endian system (s390x) quickly produces a bunch of CRC errors in the system logs. Most of these are related to the narrow CRCs path, but the fundamental problem can be reduced to a single write and re-read (after dropping caches) of a previously merged extent. The key merge path that handles extent merges eventually calls into bch2_checksum_merge() to combine the CRCs of the associated extents. This code attempts to avoid a byte order swap by feeding the le64 values into the crc32c code, but the latter casts the resulting u64 value down to a u32, which truncates the high bytes where the actual crc value ends up. This results in a CRC value that does not change (since it is merged with a CRC of 0), and checksum failures ensue. Fix the checksum merge code to swap to cpu byte order on the boundaries to the external crc code such that any value casting is handled properly. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-06bcachefs: Fix bch2_inode_delete_keys()Kent Overstreet
bch2_inode_delete_keys() was using BTREE_ITER_NOT_EXTENTS, on the assumption that it would never need to split extents. But that caused a race with extents being split by other threads - specifically, the data move path. Extents iterators have the iterator position pointing to the start of the extent, which avoids the race. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>