summaryrefslogtreecommitdiff
path: root/fs/bcachefs/btree_key_cache.c
AgeCommit message (Collapse)Author
4 hoursbcachefs: shrinker.to_text() methodsKent Overstreet
This adds shrinker.to_text() methods for our shrinkers and hooks them up to our existing to_text() functions. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
7 hoursbcachefs: Evict/bypass key cache when deletingKent Overstreet
Fix a performance bug when doing many unlinks. The btree has optimizations to ensure we don't have too many whiteouts to scan in peek() before we find a real key to return, but unflushed key cache deletions break this. To fix this, tweak the existing code for redirecting updates that create a key to the underlying btree so that we can use it for deletions as well. Reported-by: John Schoenick <johns@valvesoftware.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-02bcachefs: bch_err_throw()Kent Overstreet
Add a tracepoint for any time we return an error and unwind. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-01bcachefs: Replace rcu_read_lock() with guardsKent Overstreet
The new guard(), scoped_guard() allow for more natural code. Some of the uses with creative flow control have been left. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23bcachefs: Plumb btree_trans for more locking assertsKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23bcachefs: Clear should_be_locked before unlock in key_cache_drop()Kent Overstreet
We're adding new should_be_locked assertions, also add a comment explaining why clearing should_be_locked is safe here. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: btree key cache assertsKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-17bcachefs: Fix bch2_btree_path_traverse_cached() when paths reallocedKent Overstreet
btree_key_cache_fill() will allocate and traverse another path (for the underlying btree), so we can't hold pointers to paths across a call - we have to pass indices. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-02bcachefs: Kill btree_iter.transKent Overstreet
This was planned to be done ages ago, now finally completed; there are places where we have quite a few btree_trans objects on the stack, so this reduces stack usage somewhat. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-02bcachefs: do_trace_key_cache_fill()Kent Overstreet
Reducing stack frame usage; this moves the printbuf out of the main stack frame. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-02-26bcachefs: Fix deadlockAlan Huang
This fixes two deadlocks: 1.pcpu_alloc_mutex involved one as pointed by syzbot[1] 2.recursion deadlock. The root cause is that we hold the bc lock during alloc_percpu, fix it by following the pattern used by __btree_node_mem_alloc(). [1] https://lore.kernel.org/all/66f97d9a.050a0220.6bad9.001d.GAE@google.com/T/ Reported-by: syzbot+fe63f377148a6371a9db@syzkaller.appspotmail.com Tested-by: syzbot+fe63f377148a6371a9db@syzkaller.appspotmail.com Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-02-06bcachefs: Fix rcu imbalance in bch2_fs_btree_key_cache_exit()Kent Overstreet
Spotted by sparse. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-21bcachefs: Fix btree_trans_peek_key_cache()Kent Overstreet
BTREE_ITER_cached_nofill has some tricky corner cases; it's used internally for iterators that aren't walking the key cache, but need to be coherent with the key cache. It tells traverse to look up and lock the key cache entry if present, but don't create one if it doesn't exist. That means we have to have a BTREE_ITER_UPTODATE path (because after traverse the path has to be UPTODATE, or we pop assertions) that doesn't point to anything (which is the less bad option, taken by the previous fix). The previous fix for this path missed an issue that can happen in bch2_trans_peek_key_cache(): we can't set should_be_locked on a path that doesn't point to anything and doesn't hold locks. Fixes: bd5b09727f3d ("bcachefs: Don't set btree_path to updtodate if we don't fill") Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09bcachefs: Don't set btree_path to updtodate if we don't fillKent Overstreet
This fixes various locking asserts, and a null ptr deref in bch2_btree_iter_peek_path(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09bcachefs: fix bch2_btree_key_cache_drop()Kent Overstreet
When evicting, we shouldn't leave a pointer to the key cache entry lying around - that screws up btree path asserts we're adding. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09bcachefs: Write lock btree node in key cache fillsKent Overstreet
this addresses a key cache coherency bug Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-29bcachefs: trace_key_cache_fillKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Assert that we're not violating key cache coherency rulesKent Overstreet
We're not allowed to have a dirty key in the key cache if the key doesn't exist at all in the btree - creation has to bypass the key cache, so that iteration over the btree can check if the key is present in the key cache. Things break in subtle ways if cache coherency is broken, so this needs an assert. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-21bcachefs: Use __GFP_ACCOUNT for reclaimable memoryKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09bcachefs: Fix format specifier in bch2_btree_key_cache_to_text()Nathan Chancellor
When building for a 32-bit architecture, for which 'size_t' is 'unsigned int', there is a compiler warning due to use of '%lu': In file included from fs/bcachefs/vstructs.h:5, from fs/bcachefs/bcachefs_format.h:80, from fs/bcachefs/bcachefs.h:207, from fs/bcachefs/btree_key_cache.c:3: fs/bcachefs/btree_key_cache.c: In function 'bch2_btree_key_cache_to_text': fs/bcachefs/btree_key_cache.c:795:25: error: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t' {aka 'unsigned int'} [-Werror=format=] 795 | prt_printf(out, "pending:\t%lu\r\n", per_cpu_sum(bc->nr_pending)); | ^~~~~~~~~~~~~~~~~~~ fs/bcachefs/util.h:78:63: note: in definition of macro 'prt_printf' 78 | #define prt_printf(_out, ...) bch2_prt_printf(_out, __VA_ARGS__) | ^~~~~~~~~~~ fs/bcachefs/btree_key_cache.c:795:38: note: format string is defined here 795 | prt_printf(out, "pending:\t%lu\r\n", per_cpu_sum(bc->nr_pending)); | ~~^ | | | long unsigned int | %u cc1: all warnings being treated as errors Use the proper specifier, '%zu', to resolve the warning. Fixes: e447e49977b8 ("bcachefs: key cache can now allocate from pending") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09bcachefs: key cache can now allocate from pendingKent Overstreet
btree_trans objects can hold the btree_trans_barrier srcu read lock for an extended amount of time (they shouldn't, but it's difficult to guarantee). the srcu barrier blocks memory reclaim, so to avoid too many stranded key cache items, this uses the new pending_rcu_items to allocate from pending items - like we did before, but now without a global lock on the key cache. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09bcachefs: Rip out freelists from btree key cacheKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-22bcachefs: don't use rht_bucket() in btree_key_cache_scan()Kent Overstreet
rht_bucket() does strange complicated things when a rehash is in progress. Instead, just skip scanning when a rehash is in progress: scanning is going to be more expensive (many more empty slots to cover), and some sort of infinite loop is being observed Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-22bcachefs: clear path->should_be_locked in bch2_btree_key_cache_drop()Kent Overstreet
bch2_btree_key_cache_drop() evicts the key cache entry - it's used when we're doing an update that bypasses the key cache, because for cache coherency reasons a key can't be in the key cache unless it also exists in the btree - i.e. creates have to bypass the cache. After evicting, the path no longer points to a key cache key, and relock() will always fail if should_be_locked is true. Prep for improving path->should_be_locked assertions Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-13bcachefs: delete faulty fastpath in bch2_btree_path_traverse_cached()Kent Overstreet
bch2_btree_path_traverse_cached() was previously checking if it could just relock the path, which is a common idiom in path traversal. However, it was using btree_node_relock(), not btree_path_relock(); btree_path_relock() only succeeds if the path was in state BTREE_ITER_NEED_RELOCK. If the path was in state BTREE_ITER_NEED_TRAVERSE a full traversal is needed; this led to a null ptr deref in bch2_btree_path_traverse_cached(). And the short circuit check here isn't needed, since it was already done in the main bch2_btree_path_traverse_one(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: Simplify btree key cache fill pathKent Overstreet
Don't allocate the new bkey_cached until after we've done the btree lookup; this means we can kill bkey_cached.valid. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: bch2_btree_key_cache_drop() now evictsKent Overstreet
As part of improving btree key cache coherency, the bkey_cached.valid flag is going away. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: btree_path_cached_set()Kent Overstreet
new helper - small refactoring Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-10bcachefs: Leave a buffer in the btree key cache to avoid lock thrashingKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-10bcachefs: Fix reporting of freed objects from key cache shrinkerKent Overstreet
We count objects as freed when we move them to the srcu-pending lists because we're doing the equivalent of a kfree_srcu(); the only difference is managing the pending list ourself means we can allocate from the pending list. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-10bcachefs: increase key cache shrinker batch sizeKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-10bcachefs: Enable automatic shrinking for rhashtablesKent Overstreet
Since the key cache shrinker walks the rhashtable, a mostly empty rhashtable leads to really nasty reclaim performance issues. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-28bcachefs: Fix locking assertKent Overstreet
We now track whether a transaction is locked, and verify that we don't have nodes locked when the transaction isn't locked; reorder relocks to not pop the new assert. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: x-macroize journal flags enumsKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: uninline set_btree_iter_dontneed()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: Fix format specifiers in bch2_btree_key_cache_to_text()Nathan Chancellor
When building for a 32-bit target, for which 'size_t' is 'unsigned int', there are two warnings around mismatched format specifiers and argument types: In file included from fs/bcachefs/vstructs.h:5, from fs/bcachefs/bcachefs_format.h:79, from fs/bcachefs/bcachefs.h:207, from fs/bcachefs/btree_key_cache.c:3: fs/bcachefs/btree_key_cache.c: In function 'bch2_btree_key_cache_to_text': fs/bcachefs/btree_key_cache.c:1046:25: error: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t' {aka 'unsigned int'} [-Werror=format=] 1046 | prt_printf(out, "nonpcpu freelist:\t%lu\r\n", bc->nr_freed_nonpcpu); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~ | | | size_t {aka unsigned int} fs/bcachefs/util.h:192:63: note: in definition of macro 'prt_printf' 192 | #define prt_printf(_out, ...) bch2_prt_printf(_out, __VA_ARGS__) | ^~~~~~~~~~~ fs/bcachefs/btree_key_cache.c:1046:47: note: format string is defined here 1046 | prt_printf(out, "nonpcpu freelist:\t%lu\r\n", bc->nr_freed_nonpcpu); | ~~^ | | | long unsigned int | %u fs/bcachefs/btree_key_cache.c:1047:25: error: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t' {aka 'unsigned int'} [-Werror=format=] 1047 | prt_printf(out, "pcpu freelist:\t%lu\r\n", bc->nr_freed_pcpu); | ^~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~ | | | size_t {aka unsigned int} fs/bcachefs/util.h:192:63: note: in definition of macro 'prt_printf' 192 | #define prt_printf(_out, ...) bch2_prt_printf(_out, __VA_ARGS__) | ^~~~~~~~~~~ fs/bcachefs/btree_key_cache.c:1047:44: note: format string is defined here 1047 | prt_printf(out, "pcpu freelist:\t%lu\r\n", bc->nr_freed_pcpu); | ~~^ | | | long unsigned int | %u cc1: all warnings being treated as error Use the proper 'size_t' specifier, '%zu', to clear up the warnings for these platforms. Fixes: f2d47ec26af5 ("bcachefs: Btree key cache instrumentation") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: Btree key cache instrumentationKent Overstreet
It turns out the btree key cache shrinker wasn't actually reclaiming anything, prior to the previous patch. This adds instrumentation so that if we have further issues we can see what's going on. Specifically, sysfs internal/btree_key_cache is greatly expanded with new counters, and the SRCU sequence numbers of the first 10 entries on each pending freelist, and we also add trigger_btree_key_cache_shrink for testing without having to prune all the system caches. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: Use bch2_btree_path_upgrade() in key cache traverseKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: iter/update/trigger/str_hash flag cleanupKent Overstreet
Combine iter/update/trigger/str_hash flags into a single enum, and x-macroize them for a to_text() function later. These flags are all for a specific iter/key/update context, so it makes sense to group them together - iter/update/trigger flags were already given distinct bits, this cleans up and unifies that handling. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: prt_printf() now respects \r\n\tKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-06bcachefs: Fix early error path in bch2_fs_btree_key_cache_exit()Kent Overstreet
Reported-by: syzbot+a35cdb62ec34d44fb062@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-20bcachefs: Tweak btree key cache shrinker so it actually freesKent Overstreet
Freeing key cache items is a multi stage process; we need to wait for an SRCU grace period to elapse, and we handle this ourselves - partially to avoid callback overhead, but primarily so that when allocating we can first allocate from the freed items waiting for an SRCU grace period. Previously, the shrinker was counting the items on the 'waiting for SRCU grace period' lists as items being scanned, but this meant that too many items waiting for an SRCU grace period could prevent it from doing any work at all. After this, we're seeing that items skipped due to the accessed bit are the main cause of the shrinker not making any progress, and we actually want the key cache shrinker to run quite aggressively because reclaimed items will still generally be found (more compactly) in the btree node cache - so we also tweak the shrinker to not count those against nr_to_scan. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-07bcachefs: fix the count of nr_freed_pcpu after changing bc->freed_nonpcpu listHongbo Li
When allocating bkey_cached from bc->freed_pcpu list, it missed decreasing the count of nr_freed_pcpu which would cause the mismatch between the value of nr_freed_pcpu and the list items. This problem also exists in moving new bkey_cached to bc->freed_pcpu list. If these happened, the bug info may appear in bch2_fs_btree_key_cache_exit by the follow code: BUG_ON(list_count_nodes(&bc->freed_pcpu) != bc->nr_freed_pcpu); BUG_ON(list_count_nodes(&bc->freed_nonpcpu) != bc->nr_freed_nonpcpu); Fixes: c65c13f0eac6 ("bcachefs: Run btree key cache shrinker less aggressively") Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-06bcachefs: JOURNAL_SPACE_LOWKent Overstreet
"bcachefs; Fix deadlock in bch2_btree_update_start()" was a significant performance regression (nearly 50%) on multithreaded random writes with fio. The reason is that the journal watermark checks multiple things, including the state of the btree write buffer, and on multithreaded update heavy workloads we're bottleneked on write buffer flushing - we don't want kicknig off btree updates to depend on the state of the write buffer. This isn't strictly correct; the interior btree update path does do write buffer updates, but it's a tiny fraction of total accounting updates and we're more concerned with space in the journal itself. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-18bcachefs: Improve bch2_fatal_error()Kent Overstreet
error messages should always include __func__ Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13bcachefs: Fix btree key cache coherency during replayKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: trans_for_each_path() no longer uses path->idxKent Overstreet
path->idx is now a code smell: we should be using path_idx_t, since it's stable across btree path reallocation. This is also a bit faster, using the same loop counter vs. fetching path->idx from each path we iterate over. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: btree_insert_entry -> btree_path_idx_tKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: btree_iter -> btree_path_idx_tKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: skip journal more often in key cache reclaimKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>