summaryrefslogtreecommitdiff
path: root/fs/bcachefs/btree_key_cache.c
AgeCommit message (Collapse)Author
22 hoursbcachefs: bch2_btree_key_cache_drop() now evictsKent Overstreet
As part of improving btree key cache coherency, the bkey_cached.valid flag is going away. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
22 hoursbcachefs: btree_path_cached_set()Kent Overstreet
new helper - small refactoring Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
13 daysbcachefs: Leave a buffer in the btree key cache to avoid lock thrashingKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
13 daysbcachefs: Fix reporting of freed objects from key cache shrinkerKent Overstreet
We count objects as freed when we move them to the srcu-pending lists because we're doing the equivalent of a kfree_srcu(); the only difference is managing the pending list ourself means we can allocate from the pending list. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
13 daysbcachefs: increase key cache shrinker batch sizeKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
13 daysbcachefs: Enable automatic shrinking for rhashtablesKent Overstreet
Since the key cache shrinker walks the rhashtable, a mostly empty rhashtable leads to really nasty reclaim performance issues. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-28bcachefs: Fix locking assertKent Overstreet
We now track whether a transaction is locked, and verify that we don't have nodes locked when the transaction isn't locked; reorder relocks to not pop the new assert. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: x-macroize journal flags enumsKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: uninline set_btree_iter_dontneed()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: Fix format specifiers in bch2_btree_key_cache_to_text()Nathan Chancellor
When building for a 32-bit target, for which 'size_t' is 'unsigned int', there are two warnings around mismatched format specifiers and argument types: In file included from fs/bcachefs/vstructs.h:5, from fs/bcachefs/bcachefs_format.h:79, from fs/bcachefs/bcachefs.h:207, from fs/bcachefs/btree_key_cache.c:3: fs/bcachefs/btree_key_cache.c: In function 'bch2_btree_key_cache_to_text': fs/bcachefs/btree_key_cache.c:1046:25: error: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t' {aka 'unsigned int'} [-Werror=format=] 1046 | prt_printf(out, "nonpcpu freelist:\t%lu\r\n", bc->nr_freed_nonpcpu); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~ | | | size_t {aka unsigned int} fs/bcachefs/util.h:192:63: note: in definition of macro 'prt_printf' 192 | #define prt_printf(_out, ...) bch2_prt_printf(_out, __VA_ARGS__) | ^~~~~~~~~~~ fs/bcachefs/btree_key_cache.c:1046:47: note: format string is defined here 1046 | prt_printf(out, "nonpcpu freelist:\t%lu\r\n", bc->nr_freed_nonpcpu); | ~~^ | | | long unsigned int | %u fs/bcachefs/btree_key_cache.c:1047:25: error: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t' {aka 'unsigned int'} [-Werror=format=] 1047 | prt_printf(out, "pcpu freelist:\t%lu\r\n", bc->nr_freed_pcpu); | ^~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~ | | | size_t {aka unsigned int} fs/bcachefs/util.h:192:63: note: in definition of macro 'prt_printf' 192 | #define prt_printf(_out, ...) bch2_prt_printf(_out, __VA_ARGS__) | ^~~~~~~~~~~ fs/bcachefs/btree_key_cache.c:1047:44: note: format string is defined here 1047 | prt_printf(out, "pcpu freelist:\t%lu\r\n", bc->nr_freed_pcpu); | ~~^ | | | long unsigned int | %u cc1: all warnings being treated as error Use the proper 'size_t' specifier, '%zu', to clear up the warnings for these platforms. Fixes: f2d47ec26af5 ("bcachefs: Btree key cache instrumentation") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: Btree key cache instrumentationKent Overstreet
It turns out the btree key cache shrinker wasn't actually reclaiming anything, prior to the previous patch. This adds instrumentation so that if we have further issues we can see what's going on. Specifically, sysfs internal/btree_key_cache is greatly expanded with new counters, and the SRCU sequence numbers of the first 10 entries on each pending freelist, and we also add trigger_btree_key_cache_shrink for testing without having to prune all the system caches. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: Use bch2_btree_path_upgrade() in key cache traverseKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: iter/update/trigger/str_hash flag cleanupKent Overstreet
Combine iter/update/trigger/str_hash flags into a single enum, and x-macroize them for a to_text() function later. These flags are all for a specific iter/key/update context, so it makes sense to group them together - iter/update/trigger flags were already given distinct bits, this cleans up and unifies that handling. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: prt_printf() now respects \r\n\tKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-06bcachefs: Fix early error path in bch2_fs_btree_key_cache_exit()Kent Overstreet
Reported-by: syzbot+a35cdb62ec34d44fb062@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-20bcachefs: Tweak btree key cache shrinker so it actually freesKent Overstreet
Freeing key cache items is a multi stage process; we need to wait for an SRCU grace period to elapse, and we handle this ourselves - partially to avoid callback overhead, but primarily so that when allocating we can first allocate from the freed items waiting for an SRCU grace period. Previously, the shrinker was counting the items on the 'waiting for SRCU grace period' lists as items being scanned, but this meant that too many items waiting for an SRCU grace period could prevent it from doing any work at all. After this, we're seeing that items skipped due to the accessed bit are the main cause of the shrinker not making any progress, and we actually want the key cache shrinker to run quite aggressively because reclaimed items will still generally be found (more compactly) in the btree node cache - so we also tweak the shrinker to not count those against nr_to_scan. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-07bcachefs: fix the count of nr_freed_pcpu after changing bc->freed_nonpcpu listHongbo Li
When allocating bkey_cached from bc->freed_pcpu list, it missed decreasing the count of nr_freed_pcpu which would cause the mismatch between the value of nr_freed_pcpu and the list items. This problem also exists in moving new bkey_cached to bc->freed_pcpu list. If these happened, the bug info may appear in bch2_fs_btree_key_cache_exit by the follow code: BUG_ON(list_count_nodes(&bc->freed_pcpu) != bc->nr_freed_pcpu); BUG_ON(list_count_nodes(&bc->freed_nonpcpu) != bc->nr_freed_nonpcpu); Fixes: c65c13f0eac6 ("bcachefs: Run btree key cache shrinker less aggressively") Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-06bcachefs: JOURNAL_SPACE_LOWKent Overstreet
"bcachefs; Fix deadlock in bch2_btree_update_start()" was a significant performance regression (nearly 50%) on multithreaded random writes with fio. The reason is that the journal watermark checks multiple things, including the state of the btree write buffer, and on multithreaded update heavy workloads we're bottleneked on write buffer flushing - we don't want kicknig off btree updates to depend on the state of the write buffer. This isn't strictly correct; the interior btree update path does do write buffer updates, but it's a tiny fraction of total accounting updates and we're more concerned with space in the journal itself. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-18bcachefs: Improve bch2_fatal_error()Kent Overstreet
error messages should always include __func__ Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13bcachefs: Fix btree key cache coherency during replayKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: trans_for_each_path() no longer uses path->idxKent Overstreet
path->idx is now a code smell: we should be using path_idx_t, since it's stable across btree path reallocation. This is also a bit faster, using the same loop counter vs. fetching path->idx from each path we iterate over. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: btree_insert_entry -> btree_path_idx_tKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: btree_iter -> btree_path_idx_tKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: skip journal more often in key cache reclaimKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs; kill bch2_btree_key_cache_flush()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: delete useless commit_do()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: convert bch_fs_flags to x-macroKent Overstreet
Now we can print out filesystem flags in sysfs, useful for debugging various "what's my filesystem doing" issues. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Don't rejournal keys in key cache flushKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Rename BTREE_INSERT flagsKent Overstreet
BTREE_INSERT flags are actually transaction commit flags - rename them for clarity. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-03bcachefs: Don't drop journal pins in exit pathKent Overstreet
There's no need to drop journal pins in our exit paths - the code was trying to have everything cleaned up on any shutdown, but better to just tweak the assertions a bit. This fixes a bug where calling into journal reclaim in the exit path would cass a null ptr deref. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-14bcachefs: Kill journal pre-reservationsKent Overstreet
This deletes the complicated and somewhat expensive journal pre-reservation machinery in favor of just using journal watermarks: when the journal is more than half full, we run journal reclaim more aggressively, and when the journal is more than 3/4s full we only allow journal reclaim to get new journal reservations. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-13bcachefs: Run btree key cache shrinker less aggressivelyKent Overstreet
The btree key cache maintains lists of items that have been freed, but can't yet be reclaimed because a bch2_trans_relock() call might find them - we're waiting for SRCU readers to release. Previously, we wouldn't count these items against the number we're attempting to scan for, which would mean we'd evict more live key cache entries - doing quite a bit of potentially unecessary work. With recent work to make sure we don't hold SRCU locks for too long, it should be safe to count all the items on the freelists against number to scan - even if we can't reclaim them yet, we will be able to soon. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-07Merge tag 'bcachefs-2023-11-5' of https://evilpiepirate.org/git/bcachefsLinus Torvalds
Pull more bcachefs updates from Kent Overstreet: "Here's the second big bcachefs pull request. This brings your tree up to date with my master branch, which is what existing bcachefs users are currently running. New features: - rebalance_work btree (and metadata version 1.3): the rebalance thread no longer has to scan to find extents that need processing - big scalability improvement. - sb_errors superblock section: this adds counters for each fsck error type, since filesystem creation, along with the date of the most recent error. It'll get us better bug reports (since users do not typically report errors that fsck was able to fix), and I might add telemetry for this in the future. Fixes include: - multiple snapshot deletion fixes - members_v2 fixups - deleted_inodes btree fixes - copygc thread no longer spins when a device is full but has no fragmented buckets (i.e. rebalance needs to move data around instead) - a fix for a memory reclaim issue with the btree key cache: we're now careful not to hold the srcu read lock that blocks key cache reclaim for too long - an early allocator locking fix, from Brian - endianness fixes, from Brian - CONFIG_BCACHEFS_DEBUG_TRANSACTIONS no longer defaults to y, a big performance improvement on multithreaded workloads" * tag 'bcachefs-2023-11-5' of https://evilpiepirate.org/git/bcachefs: (70 commits) bcachefs: Improve stripe checksum error message bcachefs: Simplify, fix bch2_backpointer_get_key() bcachefs: kill thing_it_points_to arg to backpointer_not_found() bcachefs: bch2_ec_read_extent() now takes btree_trans bcachefs: bch2_stripe_to_text() now prints ptr gens bcachefs: Don't iterate over journal entries just for btree roots bcachefs: Break up bch2_journal_write() bcachefs: Replace ERANGE with private error codes bcachefs: bkey_copy() is no longer a macro bcachefs: x-macro-ify inode flags enum bcachefs: Convert bch2_fs_open() to darray bcachefs: Move __bch2_members_v2_get_mut to sb-members.h bcachefs: bch2_prt_datetime() bcachefs: CONFIG_BCACHEFS_DEBUG_TRANSACTIONS no longer defaults to y bcachefs: Add a comment for BTREE_INSERT_NOJOURNAL usage bcachefs: rebalance_work btree is not a snapshots btree bcachefs: Add missing printk newlines bcachefs: Fix recovery when forced to use JSET_NO_FLUSH journal entry bcachefs: .get_parent() should return an error pointer bcachefs: Fix bch2_delete_dead_inodes() ...
2023-11-02Merge tag 'mm-stable-2023-11-01-14-33' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "Many singleton patches against the MM code. The patch series which are included in this merge do the following: - Kemeng Shi has contributed some compation maintenance work in the series 'Fixes and cleanups to compaction' - Joel Fernandes has a patchset ('Optimize mremap during mutual alignment within PMD') which fixes an obscure issue with mremap()'s pagetable handling during a subsequent exec(), based upon an implementation which Linus suggested - More DAMON/DAMOS maintenance and feature work from SeongJae Park i the following patch series: mm/damon: misc fixups for documents, comments and its tracepoint mm/damon: add a tracepoint for damos apply target regions mm/damon: provide pseudo-moving sum based access rate mm/damon: implement DAMOS apply intervals mm/damon/core-test: Fix memory leaks in core-test mm/damon/sysfs-schemes: Do DAMOS tried regions update for only one apply interval - In the series 'Do not try to access unaccepted memory' Adrian Hunter provides some fixups for the recently-added 'unaccepted memory' feature. To increase the feature's checking coverage. 'Plug a few gaps where RAM is exposed without checking if it is unaccepted memory' - In the series 'cleanups for lockless slab shrink' Qi Zheng has done some maintenance work which is preparation for the lockless slab shrinking code - Qi Zheng has redone the earlier (and reverted) attempt to make slab shrinking lockless in the series 'use refcount+RCU method to implement lockless slab shrink' - David Hildenbrand contributes some maintenance work for the rmap code in the series 'Anon rmap cleanups' - Kefeng Wang does more folio conversions and some maintenance work in the migration code. Series 'mm: migrate: more folio conversion and unification' - Matthew Wilcox has fixed an issue in the buffer_head code which was causing long stalls under some heavy memory/IO loads. Some cleanups were added on the way. Series 'Add and use bdev_getblk()' - In the series 'Use nth_page() in place of direct struct page manipulation' Zi Yan has fixed a potential issue with the direct manipulation of hugetlb page frames - In the series 'mm: hugetlb: Skip initialization of gigantic tail struct pages if freed by HVO' has improved our handling of gigantic pages in the hugetlb vmmemmep optimizaton code. This provides significant boot time improvements when significant amounts of gigantic pages are in use - Matthew Wilcox has sent the series 'Small hugetlb cleanups' - code rationalization and folio conversions in the hugetlb code - Yin Fengwei has improved mlock()'s handling of large folios in the series 'support large folio for mlock' - In the series 'Expose swapcache stat for memcg v1' Liu Shixin has added statistics for memcg v1 users which are available (and useful) under memcg v2 - Florent Revest has enhanced the MDWE (Memory-Deny-Write-Executable) prctl so that userspace may direct the kernel to not automatically propagate the denial to child processes. The series is named 'MDWE without inheritance' - Kefeng Wang has provided the series 'mm: convert numa balancing functions to use a folio' which does what it says - In the series 'mm/ksm: add fork-exec support for prctl' Stefan Roesch makes is possible for a process to propagate KSM treatment across exec() - Huang Ying has enhanced memory tiering's calculation of memory distances. This is used to permit the dax/kmem driver to use 'high bandwidth memory' in addition to Optane Data Center Persistent Memory Modules (DCPMM). The series is named 'memory tiering: calculate abstract distance based on ACPI HMAT' - In the series 'Smart scanning mode for KSM' Stefan Roesch has optimized KSM by teaching it to retain and use some historical information from previous scans - Yosry Ahmed has fixed some inconsistencies in memcg statistics in the series 'mm: memcg: fix tracking of pending stats updates values' - In the series 'Implement IOCTL to get and optionally clear info about PTEs' Peter Xu has added an ioctl to /proc/<pid>/pagemap which permits us to atomically read-then-clear page softdirty state. This is mainly used by CRIU - Hugh Dickins contributed the series 'shmem,tmpfs: general maintenance', a bunch of relatively minor maintenance tweaks to this code - Matthew Wilcox has increased the use of the VMA lock over file-backed page faults in the series 'Handle more faults under the VMA lock'. Some rationalizations of the fault path became possible as a result - In the series 'mm/rmap: convert page_move_anon_rmap() to folio_move_anon_rmap()' David Hildenbrand has implemented some cleanups and folio conversions - In the series 'various improvements to the GUP interface' Lorenzo Stoakes has simplified and improved the GUP interface with an eye to providing groundwork for future improvements - Andrey Konovalov has sent along the series 'kasan: assorted fixes and improvements' which does those things - Some page allocator maintenance work from Kemeng Shi in the series 'Two minor cleanups to break_down_buddy_pages' - In thes series 'New selftest for mm' Breno Leitao has developed another MM self test which tickles a race we had between madvise() and page faults - In the series 'Add folio_end_read' Matthew Wilcox provides cleanups and an optimization to the core pagecache code - Nhat Pham has added memcg accounting for hugetlb memory in the series 'hugetlb memcg accounting' - Cleanups and rationalizations to the pagemap code from Lorenzo Stoakes, in the series 'Abstract vma_merge() and split_vma()' - Audra Mitchell has fixed issues in the procfs page_owner code's new timestamping feature which was causing some misbehaviours. In the series 'Fix page_owner's use of free timestamps' - Lorenzo Stoakes has fixed the handling of new mappings of sealed files in the series 'permit write-sealed memfd read-only shared mappings' - Mike Kravetz has optimized the hugetlb vmemmap optimization in the series 'Batch hugetlb vmemmap modification operations' - Some buffer_head folio conversions and cleanups from Matthew Wilcox in the series 'Finish the create_empty_buffers() transition' - As a page allocator performance optimization Huang Ying has added automatic tuning to the allocator's per-cpu-pages feature, in the series 'mm: PCP high auto-tuning' - Roman Gushchin has contributed the patchset 'mm: improve performance of accounted kernel memory allocations' which improves their performance by ~30% as measured by a micro-benchmark - folio conversions from Kefeng Wang in the series 'mm: convert page cpupid functions to folios' - Some kmemleak fixups in Liu Shixin's series 'Some bugfix about kmemleak' - Qi Zheng has improved our handling of memoryless nodes by keeping them off the allocation fallback list. This is done in the series 'handle memoryless nodes more appropriately' - khugepaged conversions from Vishal Moola in the series 'Some khugepaged folio conversions'" [ bcachefs conflicts with the dynamically allocated shrinkers have been resolved as per Stephen Rothwell in https://lore.kernel.org/all/20230913093553.4290421e@canb.auug.org.au/ with help from Qi Zheng. The clone3 test filtering conflict was half-arsed by yours truly ] * tag 'mm-stable-2023-11-01-14-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (406 commits) mm/damon/sysfs: update monitoring target regions for online input commit mm/damon/sysfs: remove requested targets when online-commit inputs selftests: add a sanity check for zswap Documentation: maple_tree: fix word spelling error mm/vmalloc: fix the unchecked dereference warning in vread_iter() zswap: export compression failure stats Documentation: ubsan: drop "the" from article title mempolicy: migration attempt to match interleave nodes mempolicy: mmap_lock is not needed while migrating folios mempolicy: alloc_pages_mpol() for NUMA policy without vma mm: add page_rmappable_folio() wrapper mempolicy: remove confusing MPOL_MF_LAZY dead code mempolicy: mpol_shared_policy_init() without pseudo-vma mempolicy trivia: use pgoff_t in shared mempolicy tree mempolicy trivia: slightly more consistent naming mempolicy trivia: delete those ancient pr_debug()s mempolicy: fix migrate_pages(2) syscall return nr_failed kernfs: drop shared NUMA mempolicy hooks hugetlbfs: drop shared NUMA mempolicy pretence mm/damon/sysfs-test: add a unit test for damon_sysfs_set_targets() ...
2023-11-01bcachefs: Don't downgrade locks on transaction restartKent Overstreet
We should only be downgrading locks on success - otherwise, our transaction restarts won't be getting the correct locks and we'll livelock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-31bcachefs: Fix shrinker namesKent Overstreet
Shrinkers are now exported to debugfs, so the names can't have slashes in them. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-31bcachefs: bch2_btree_id_str()Kent Overstreet
Since we can run with unknown btree IDs, we can't directly index btree IDs into fixed size arrays. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Heap allocate btree_transKent Overstreet
We're using more stack than we'd like in a number of functions, and btree_trans is the biggest object that we stack allocate. But we have to do a heap allocatation to initialize it anyways, so there's no real downside to heap allocating the entire thing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix W=12 build errorsKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix -Wformat in bch2_btree_key_cache_to_text()Nathan Chancellor
When building bcachefs for 32-bit ARM, there is a compiler warning in bch2_btree_key_cache_to_text() due to use of an incorrect format specifier: fs/bcachefs/btree_key_cache.c:1060:36: error: format specifies type 'size_t' (aka 'unsigned int') but the argument has type 'long' [-Werror,-Wformat] 1060 | prt_printf(out, "nr_freed:\t%zu", atomic_long_read(&c->nr_freed)); | ~~~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | %ld fs/bcachefs/util.h:223:54: note: expanded from macro 'prt_printf' 223 | #define prt_printf(_out, ...) bch2_prt_printf(_out, __VA_ARGS__) | ^~~~~~~~~~~ 1 error generated. On 64-bit architectures, size_t is 'unsigned long', so there is no warning when using %zu but on 32-bit architectures, size_t is 'unsigned int'. Use '%lu' to match the other format specifiers used in this function for printing values returned from atomic_long_read(). Fixes: 6d799930ce0f ("bcachefs: btree key cache pcpu freedlist") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix silent enum conversion errorKent Overstreet
This changes mark_btree_node_locked() to take an enum btree_node_locked_type, not a six_lock_type, since BTREE_NODE_UNLOCKED is -1 which may cause problems converting back and forth to six_lock_type if short enums are in use. With this change, we never store BTREE_NODE_UNLOCKED in a six_lock_type enum. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: mark bch_inode_info and bkey_cached as reclaimableMikulas Patocka
Mark these caches as reclaimable, so that available memory is correctly reported when there is a lot of cached inodes. Note that more work is needed - you should add __GFP_RECLAIMABLE to some of the kmalloc calls, so that they are allocated from the "kmalloc-rcl-*" caches. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add new assertions for shutdown pathKent Overstreet
We've been seeing assertions pop that indicate the btree node cache or key cache have dirty items when we just did a clean shutdown. Add some more assertions so we can catch this when we're dirtying items. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Kill BTREE_INSERT_USE_RESERVEKent Overstreet
Now that we have journal watermarks and alloc watermarks unified, BTREE_INSERT_USE_RESERVE is redundant and can be deleted. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Kill JOURNAL_WATERMARKKent Overstreet
This unifies JOURNAL_WATERMARK with BCH_WATERMARK; we're working towards specifying watermarks once in the transaction commit path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: unregister_shrinker() now safe on not-registered shrinkerKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: allocate_dropping_locks()Kent Overstreet
Add two new helpers for allocating memory with btree locks held: The idea is to first try the allocation with GFP_NOWAIT|__GFP_NOWARN, then if that fails - unlock, retry with GFP_KERNEL, and then call trans_relock(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22six locks: Kill six_lock_state unionKent Overstreet
As suggested by Linus, this drops the six_lock_state union in favor of raw bitmasks. On the one hand, bitfields give more type-level structure to the code. However, a significant amount of the code was working with six_lock_state as a u64/atomic64_t, and the conversions from the bitfields to the u64 were deemed a bit too out-there. More significantly, because bitfield order is poorly defined (#ifdef __LITTLE_ENDIAN_BITFIELD can be used, but is gross), incrementing the sequence number would overflow into the rest of the bitfield if the compiler didn't put the sequence number at the high end of the word. The new code is a bit saner when we're on an architecture without real atomic64_t support - all accesses to lock->state now go through atomic64_*() operations. On architectures with real atomic64_t support, we additionally use atomic bit ops for setting/clearing individual bits. Text size: 7467 bytes -> 4649 bytes - compilers still suck at bitfields. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22six locks: Kill six_lock_pcpu_(alloc|free)Kent Overstreet
six_lock_pcpu_alloc() is an unsafe interface: it's not safe to allocate or free the percpu reader count on an existing lock that's in use, the only safe time to allocate percpu readers is when the lock is first being initialized. This patch adds a flags parameter to six_lock_init(), and instead of six_lock_pcpu_free() we now expose six_lock_exit(), which does the same thing but is less likely to be misused. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: bch2_bkey_get_iter() helpersKent Overstreet
Introduce new helpers for a common pattern: bch2_trans_iter_init(); bch2_btree_iter_peek_slot(); - bch2_bkey_get_iter_type() returns -ENOENT if it doesn't find a key of the correct type - bch2_bkey_get_val_typed() copies the val out of the btree to a (typically stack allocated) variable; it handles the case where the value in the btree is smaller than the current version of the type, zeroing out the remainder. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>