summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-04-20bcachefs: Btree key cache instrumentationbcachefs-memalloc-profilingKent Overstreet
It turns out the btree key cache shrinker wasn't actually reclaiming anything, prior to the previous patch. This adds instrumentation so that if we have further issues we can see what's going on. Specifically, sysfs internal/btree_key_cache is greatly expanded with new counters, and the SRCU sequence numbers of the first 10 entries on each pending freelist, and we also add trigger_btree_key_cache_shrink for testing without having to prune all the system caches. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-20bcachefs: Tweak btree key cache shrinker so it actually freesKent Overstreet
Freeing key cache items is a multi stage process; we need to wait for an SRCU grace period to elapse, and we handle this ourselves - partially to avoid callback overhead, but primarily so that when allocating we can first allocate from the freed items waiting for an SRCU grace period. Previously, the shrinker was counting the items on the 'waiting for SRCU grace period' lists as items being scanned, but this meant that too many items waiting for an SRCU grace period could prevent it from doing any work at all. After this, we're seeing that items skipped due to the accessed bit are the main cause of the shrinker not making any progress, and we actually want the key cache shrinker to run quite aggressively because reclaimed items will still generally be found (more compactly) in the btree node cache - so we also tweak the shrinker to not count those against nr_to_scan. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-20bcachefs: bkey_cached.btree_trans_barrier_seq needs to be a ulongKent Overstreet
this stores the SRCU sequence number, which we use to check if an SRCU barrier has elapsed; this is a partial fix for the key cache shrinker not actually freeing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-20bcachefs: Fix missing call to bch2_fs_allocator_background_exit()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-17bcachefs: memalloc profiling support for darrayKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-17bcachefs: alloc tagging support for bkey bufsKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-15Merge branch 'memalloc_prof_v6' of https://github.com/surenbaghdasaryan/linuxReed Riley
2024-04-13fixup! bcachefs: Fix UAFs of btree_insert_entry arrayKent Overstreet
2024-04-13bcachefs: Fix btree node merging on write buffer btreesKent Overstreet
The btree write buffer flush fastpath that avoids the main transaction commit path had the unfortunate side effect of not doing btree node merging. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13bcachefs: Disable merges from interior update pathKent Overstreet
There's been a bug in the btree write buffer where it wasn't triggering btree node merges - and leaving behind a bunch of nearly empty btree nodes. Then during journal replay, when updates to the backpointers btree aren't using the btree write buffer (because we require synchronization with journal replay), we end up doing those merges all at once. Then if it's the interior update path running them, we deadlock because those run with the highest watermark. There's no real need for the interior update path to be doing btree node merges; other code paths can handle that at lower watermarks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13bcachefs: bch2_trans_commit_flags_to_text()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13bcachefs: prefer drop_locks_do()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13bcachefs: Run merges at BCH_WATERMARK_btreeKent Overstreet
This fixes a deadlock where the interior update path during journal replay ends up doing a ton of merges on the backpointers btree, and deadlocking. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13bcachefs: get_unlocked_mut_path -> bch2_path_get_unlocked_mutKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13bcachefs: bch2_trans_verify_not_unlocked()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13bcachefs: bch2_btree_path_can_relock()Kent Overstreet
With the new assertions, we shouldn't be holding locks when trans->locked is false, thus, we shouldn't use relock when we just want to check if we can relock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13bcachefs: trans->lockedKent Overstreet
Add a field for tracking whether a transaction object holds btree locks, and assertions to verify state. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13bcachefs: bch2_trans_unlock() must always be followed by relock() or begin()Kent Overstreet
We're about to add new asserts for btree_trans locking consistency, and part of that requires that aren't using the btree_trans while it's unlocked. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13bcachefs: Use bch2_btree_path_upgrade() in key cache traverseKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13bcachefs: bch2_btree_path_upgrade() checks nodes_locked, not uptodateKent Overstreet
In the key cache fill path, we use path_upgrade() on a path that isn't uptodate yet but should be locked. This change makes bch2_btree_path_upgrade() slightly looser so we can use it in key cache upgrade, instead of the __ version. Also, make the related assert - that path->uptodate implies nodes_locked - slightly clearer. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13bcachefs: maintain lock invariants in btree_iter_next_node()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13bcachefs: fix typo in reference to BCACHEFS_DEBUGLukas Bulwahn
Commit ec9cc18fc2e6 ("bcachefs: Add checks for invalid snapshot IDs") intends to check the sanity of a snapshot and panic when BCACHEFS_DEBUG is set, but that conditional has a typo. Fix the typo to refer to the actual existing Kconfig symbol. This was found with ./scripts/checkkconfigsymbols.py. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13bcachefs: chardev: make bch_chardev_class constantRicardo B. Marliere
Since commit 43a7206b0963 ("driver core: class: make class_register() take a const *"), the driver core allows for struct class to be in read-only memory, so move the bch_chardev_class structure to be declared at build time placing it into read-only memory, instead of having to be dynamically allocated at boot time. Also, correctly clean up after failing paths in bch2_chardev_init(). Cc: Hongbo Li <lihongbo22@huawei.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13bcachefs: member helper cleanupsKent Overstreet
Some renaming for better consistency bch2_member_exists -> bch2_member_alive bch2_dev_exists -> bch2_member_exists bch2_dev_exsits2 -> bch2_dev_exists bch_dev_locked -> bch2_dev_locked bch_dev_bkey_exists -> bch2_dev_bkey_exists new helper - bch2_dev_safe Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13bcachefs: bucket_valid()Kent Overstreet
cut out a branch from doing it the obvious way Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13bcachefs: bch2_trans_relock_fail() - factor out slowpathKent Overstreet
Factor out slowpath into a separate helper Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13bcachefs: bch2_dir_emit() - drop_locks_do() conversionKent Overstreet
Add a new helper that calls dir_emit() and updates ctx->pos on success; this lets us convert bch2_readdir() to drop_locks_do(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13bcachefs: bch2_btree_insert_trans() no longer specifies BTREE_ITER_cachedKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13bcachefs: iter/update/trigger/str_hash flag cleanupKent Overstreet
Combine iter/update/trigger/str_hash flags into a single enum, and x-macroize them for a to_text() function later. These flags are all for a specific iter/key/update context, so it makes sense to group them together - iter/update/trigger flags were already given distinct bits, this cleans up and unifies that handling. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13bcachefs: __BTREE_ITER_ALL_SNAPSHOTS -> BTREE_ITER_SNAPSHOT_FIELDKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13bcachefs: mark_superblock cleanupKent Overstreet
Consolidate mark_superblock() and trans_mark_superblock(), like we did with the other trigger paths. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13bcachefs: gc_btree_init_recurse() uses gc_mark_node()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13bcachefs: move root node topo checks to node_check_topology()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13bcachefs: move topology repair kick to gc_btrees()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13bcachefs: kill metadata only gcKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13bcachefs: Finish converting reconstruct_alloc to errors_silentKent Overstreet
with errors_silent, reconstruct_alloc no longer requires fsck and fix_errors to work Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13bcachefs: bch2_gc() is now private to btree_gc.cKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13bcachefs: for_each_btree_key_continue()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13bcachefs: kill for_each_btree_key_old()Kent Overstreet
Dead code Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13bcachefs: Optimize eytzinger0_sort() with bottom-up heapsortKuan-Wei Chiu
This optimization reduces the average number of comparisons required from 2*n*log2(n) - 3*n + o(n) to n*log2(n) + 0.37*n + o(n). When n is sufficiently large, it results in approximately 50% fewer comparisons. Currently, eytzinger0_sort employs the textbook version of heapsort, where during the heapify process, each level requires two comparisons to determine the maximum among three elements. In contrast, the bottom-up heapsort, during heapify, only compares two children at each level until reaching a leaf node. Then, it backtracks from the leaf node to find the correct position. Since heapify typically continues until very close to the leaf node, the standard heapify requires about 2*log2(n) comparisons, while the bottom-up variant only needs log2(n) comparisons. The experimental data presented below is based on an array generated by get_random_u32(). | N | comparisons(old) | comparisons(new) | time(old) | time(new) | |-------|------------------|------------------|-----------|-----------| | 10000 | 235381 | 136615 | 25545 us | 20366 us | | 20000 | 510694 | 293425 | 31336 us | 18312 us | | 30000 | 800384 | 457412 | 35042 us | 27386 us | | 40000 | 1101617 | 626831 | 48779 us | 38253 us | | 50000 | 1409762 | 799637 | 62238 us | 46950 us | | 60000 | 1721191 | 974521 | 75588 us | 58367 us | | 70000 | 2038536 | 1152171 | 90823 us | 68778 us | | 80000 | 2362958 | 1333472 | 104165 us | 78625 us | | 90000 | 2690900 | 1516065 | 116111 us | 89573 us | | 100000| 3019413 | 1699879 | 133638 us | 100998 us | Refs: BOTTOM-UP-HEAPSORT, a new variant of HEAPSORT beating, on an average, QUICKSORT (if n is not very small) Ingo Wegener Theoretical Computer Science, 118(1); Pages 81-98, 13 September 1993 https://doi.org/10.1016/0304-3975(93)90364-Y Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13bcachefs: When traversing to interior nodes, propagate result to paths to ↵Kent Overstreet
same leaf node Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13bcachefs: Don't read journal just for fsckKent Overstreet
reading the journal can take a decent amount of time compared to the rest of fsck, let's only read it when required. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13bcachefs: allow for custom action in fsck error messagesKent Overstreet
Be more explicit to the user about what we're doing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13bcachefs: New assertion for writing to the journal after shutdownKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13bcachefs: bch2_btree_path_to_text()Kent Overstreet
Long form version of bch2_btree_path_to_text() - useful in error messages and tracepoints. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13bcachefs: add btree_node_merging_disabled debug paramKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13bcachefs: bch2_hash_lookup() now returns bkey_s_cKent Overstreet
small cleanup Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13bcachefs: bch2_journal_keys_dump()Kent Overstreet
debug helper Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13bcachefs: bch2_btree_node_header_to_text()Kent Overstreet
better btree node read path error messages Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13bcachefs: prt_printf() now respects \r\n\tKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>