summaryrefslogtreecommitdiff
path: root/fs/bcachefs/btree_update_interior.c
AgeCommit message (Collapse)Author
2023-06-20bcachefs: Improve trans_restart_split_race tracepointKent Overstreet
Seeing occasional test failures where we get stuck in a livelock that involves this event - this will help track it down. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-06-20bcachefs: Fix bch2_extent_fallocate() in nocow modeKent Overstreet
When we allocate disk space, we need to be incrementing the WRITE io clock, which perhaps should be renamed to sectors allocated - copygc uses this io clock to know when to run. Also, we should be incrementing the same clock when allocating btree nodes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-06-20bcachefs: Private error codes: ENOMEMKent Overstreet
This adds private error codes for most (but not all) of our ENOMEM uses, which makes it easier to track down assorted allocation failures. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-06-20bcachefs: Drop some anonymous structs, unionsKent Overstreet
Rust bindgen doesn't cope well with anonymous structs and unions. This patch drops the fancy anonymous structs & unions in bkey_i that let us use the same helpers for bkey_i and bkey_packed; since bkey_packed is an internal type that's never exposed to outside code, it's only a minor inconvenienc. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-06-20bcachefs: BKEY_PADDED_ONSTACK()Kent Overstreet
Rust bindgen doesn't do anonymous structs very nicely: BKEY_PADDED() only needs the anonymous struct when it's used on the stack, to guarantee layout, not when it's embedded in another struct. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-06-20bcachefs: Plumb btree_trans through btree cache codeKent Overstreet
Soon, __bch2_btree_node_write() is going to require a btree_trans: zoned device support is going to require a new allocation for every btree node write. This is a bit of prep work. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-06-20bcachefs: Add tracepoint & counter for btree split raceKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-06-20bcachefs: bch2_journal_entries_postprocess()Kent Overstreet
This brings back journal_entries_compact(), but in a more efficient form - we need to do multiple postprocess steps, so iterate over the journal entries being written just once to make it more efficient. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-06-20bcachefs: Handle btree node rewrites before going RWKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-06-20bcachefs: Add some logging for btree node rewrites due to errorsKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-06-20bcachefs: Debug mode for c->writes referencesKent Overstreet
This adds a debug mode where we split up the c->writes refcount into distinct refcounts for every codepath that takes a reference, and adds sysfs code to print the value of each ref. This will make it easier to debug shutdown hangs due to refcount leaks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-06-20bcachefs: Fix btree_node_write_blocked() not being clearedKent Overstreet
The btree_node_write_blocked bit was a later addition to this code, it only mirrors the state of the b->write_blocked list (empty or nonempty) - unfortunately, when it was added it wasn't correctly kept in sync - oops. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-06-20bcachefs: Improve btree_reserve_get_fail tracepointKent Overstreet
Now we include the return code. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-06-20bcachefs: Convert EAGAIN errors to private error codesKent Overstreet
More error code cleanup, for better error messages and debugability. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-06-20bcachefs: btree_iter->ip_allocatedKent Overstreet
In debug mode, we now track where btree iterators and paths are initialized/allocated - helpful in tracking down btree path overflows. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-06-20bcachefs: New bpos_cmp(), bkey_cmp() replacementsKent Overstreet
This patch introduces - bpos_eq() - bpos_lt() - bpos_le() - bpos_gt() - bpos_ge() and equivalent replacements for bkey_cmp(). Looking at the generated assembly these could probably be improved further, but we already see a significant code size improvement with this patch. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-06-20bcachefs: Fix a race with b->write_typeKent Overstreet
b->write_type needs to be set atomically with setting the btree_node_need_write flag, so move it into b->flags. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-06-20bcachefs: Btree split improvementKent Overstreet
This improves the bkey_format calculation when splitting btree nodes. Previously, we'd use a format calculated for the original node for the lower of the two new nodes. This was particularly bad on sequential insertions, where we iteratively split the last btree node, whos format has to include KEY_MAX. Now, we calculate formats precisely for the keys the two new nodes will contain. This also should make splitting a bit more efficient, since we're only copying keys once (from the original node to the new node, instead of new node, replacement node, then upper split). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-06-20bcachefs: Improved btree write statisticsKent Overstreet
This replaces sysfs btree_avg_write_size with btree_write_stats, which now breaks out statistics by the source of the btree write. Btree writes that are too small are a source of inefficiency, and excessive btree resort overhead - this will let us see what's causing them. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-06-20bcachefs: Make error messages more uniformKent Overstreet
Use __func__ in error messages that refer to function name, and do so more uniformly - suggested by checkpatch.pl Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-06-20bcachefs: Assorted checkpatch fixesKent Overstreet
checkpatch.pl gives lots of warnings that we don't want - suggested ignore list: ASSIGN_IN_IF UNSPECIFIED_INT - bcachefs coding style prefers single token type names NEW_TYPEDEFS - typedefs are occasionally good FUNCTION_ARGUMENTS - we prefer to look at functions in .c files (hopefully with docbook documentation), not .h file prototypes MULTISTATEMENT_MACRO_USE_DO_WHILE - we have _many_ x-macros and other macros where we can't do this Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-06-20bcachefs: Call bch2_btree_update_add_new_node() before dropping write lockKent Overstreet
btree nodes can be written by other threads (shrinker, journal reclaim) with only a read lock, but brand new nodes should only be written by the thread doing the split/interior update. bch2_btree_update_add_new_node() sets btree node flags to indicate that this is a new node and should not be written out by other threads, thus we need to call it before dropping our write lock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-06-20bcachefs: Btree splits now only take the locks they needKent Overstreet
Previously, bch2_btree_update_start() would always take all intent locks, all the way up to the root. We've finally got data from users where this became a scalability issue - so, this patch fixes bch2_btree_update_start() to only take the locks we need. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-06-20bcachefs: bch2_btree_insert_node() no longer uses lock_write_nofailKent Overstreet
Now that we have an error path plumbed through, there's no need to be using bch2_btree_node_lock_write_nofail(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-06-20bcachefs: Add error path to btree_split()Kent Overstreet
The next patch in the series is (finally!) going to change btree splits (and interior updates in general) to not take intent locks all the way up to the root - instead only locking the nodes they'll need to modify. However, this will be introducing a race since if we're not holding a write lock on a btree node it can be written out by another thread, and then we might not have enough space for a new bset entry. We can handle this by retrying - we just need to introduce a new error path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-06-20bcachefs: Write new btree nodes after parent updateKent Overstreet
In order to avoid locking all btree nodes up to the root for btree node splits, we're going to have to introduce a new error path into bch2_btree_insert_node(); this mean we can't have done any writes or modified global state before that point. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-06-20bcachefs: Fix a deadlock in btree_update_nodes_written()Kent Overstreet
btree_node_lock_nopath() is something we'd like to get rid of, it's always prone to deadlocks if we accidentally are holding other locks, because it doesn't mark the lock it's taking in a path: we'll want to get rid of it in the future, but for now this patch works it by calling bch2_trans_unlock(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-06-20bcachefs; Mark __bch2_trans_iter_init as inlineKent Overstreet
This function is fairly small and only used in two places: one very hot, the other cold, so it should definitely be inlined. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-06-20bcachefs: Fix blocking with locks heldKent Overstreet
This is a major oopsy - we should always be unlocking before calling closure_sync(), else we'll cause a deadlock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-06-20bcachefs: btree_update_nodes_written() needs BTREE_INSERT_USE_RESERVEKent Overstreet
This fixes an obvious deadlock - whoops. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-06-20bcachefs: Fix error handling in bch2_btree_update_start()Kent Overstreet
We were checking for -EAGAIN, but we're not returned that when we didn't pass a closure to wait with - oops. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-06-20bcachefs: All held locks must be in a btree pathKent Overstreet
With the new deadlock cycle detector, it's critical that all held locks be marked in a btree_path, because that's what the cycle detector traverses - any locks that aren't correctly marked will cause deadlocks. This changes the btree_path to allocate some btree_paths for the new nodes, since until the final update is done we otherwise don't have a path referencing them. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-06-20bcachefs: bch2_btree_path_upgrade() now emits transaction restartKent Overstreet
Centralizing the transaction restart/tracepoint in bch2_btree_path_upgrade() lets us improve the tracepoint - now it emits old and new locks_want. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-06-20bcachefs: Avoid using btree_node_lock_nopath()Kent Overstreet
With the upcoming cycle detector, we have to be careful about using btree_node_lock_nopath - in particular, using it to take write locks can cause deadlocks. All held locks need to be tracked in a btree_path, so that the cycle detector knows about them - unless we know that we cannot cause deadlocks for other reasons: e.g. we are only taking read locks, or we're in very early fsck (topology repair). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-06-20bcachefs: Convert more locking code to btree_bkey_cached_commonKent Overstreet
Ideally, all the code in btree_locking.c should be converted, but then we'd want to convert btree_path to point to btree_key_cached_common too, and then we'd be in for a much bigger cleanup - but a bit of incremental cleanup will still be helpful for the next patches. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-06-20bcachefs: bch2_btree_node_lock_write_nofail()Kent Overstreet
Taking a write lock will be able to fail, with the new cycle detector - unless we pass it nofail, which is possible but not preferred. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-06-20bcachefs: New locking functionsKent Overstreet
In the future, with the new deadlock cycle detector, we won't be using bare six_lock_* anymore: lock wait entries will all be embedded in btree_trans, and we will need a btree_trans context whenever locking a btree node. This patch plumbs a btree_trans to the few places that need it, and adds two new locking functions - btree_node_lock_nopath, which may fail returning a transaction restart, and - btree_node_lock_nopath_nofail, to be used in places where we know we cannot deadlock (i.e. because we're holding no other locks). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-06-20bcachefs: Add persistent counters for all tracepointsKent Overstreet
Also, do some reorganizing/renaming, convert atomic counters in bch_fs to persistent counters, and add a few missing counters. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-06-20bcachefs: Fix bch2_btree_update_start() to return ↵Kent Overstreet
-BCH_ERR_journal_reclaim_would_deadlock On failure to get a journal pre-reservation because we're called from journal reclaim we're not supposed to return a transaction restart error - this fixes a livelock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-06-20bcachefs: Improve trans_restart_journal_preres_get tracepointKent Overstreet
It now includes journal_flags. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-06-20bcachefs: Add assertions for unexpected transaction restartsKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-06-20bcachefs: Tracepoint improvementsKent Overstreet
Our types are exported to the tracepoint code, so it's not necessary to break things out individually when passing them to tracepoints - we can also call other functions from TP_fast_assign(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-06-20bcachefs: BTREE_ITER_NO_NODE -> BCH_ERR codesKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-06-20bcachefs: Don't set should_be_locked on paths that aren't lockedKent Overstreet
It doesn't make any sense to set should_be_locked on btree_paths that aren't locked, and is often a bug - this patch adds assertions and fixes some of those bugs. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-06-20bcachefs: EINTR -> BCH_ERR_transaction_restartKent Overstreet
Now that we have error codes, with subtypes, we can switch to our own error code for transaction restarts - and even better, a distinct error code for each transaction restart reason: clearer code and better debugging. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-06-20bcachefs: lock time stats prep work.Daniel Hill
We need the caller name and a place to store our results, btree_trans provides this. Signed-off-by: Daniel Hill <daniel@gluo.nz>
2023-06-20bcachefs: Rename __bch2_trans_do() -> commit_do()Kent Overstreet
Better/more descriptive naming, and prep for adding nested_lockrestart_do() and nested_commit_do(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-06-20bcachefs: Always use percpu_ref_tryget_live() on c->writesKent Overstreet
If we're trying to get a ref and the refcount has been killed, it means we're doing an emergency shutdown - we always want tryget_live(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-06-20bcachefs: Printbuf reworkKent Overstreet
This converts bcachefs to the modern printbuf interface/implementation, synced with the version to be submitted upstream. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-06-20bcachefs: Tracepoint improvementsKent Overstreet
Delete some obsolete tracepoints, organize alloc tracepoints better, make a few tracepoints more consistent. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>