summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-03-22thread_with_file: Lift from bcachefstime_stats_twfKent Overstreet
thread_with_file and thread_with_stdio are abstractions for connecting kthreads to file descriptors, which is handy for all sorts of things - the running kthread has its lifetime connected to the file descriptor, which means an asynchronous job running in the kernel can easily exit in response to a ctrl-c, and the file descriptor also provides a communications channel. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-22darray: lift from bcachefsKent Overstreet
dynamic arrays - inspired from CCAN darrays, basically c++ stl vectors. Used by thread_with_stdio, which is also being lifted from bcachefs for xfs. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-22bcache: Convert to lib/time_statsKent Overstreet
delete bcache's time stats code, convert to newer version from bcachefs. example output: root@moria-kvm:/sys/fs/bcache/bdaedb8c-4554-4dd2-87e4-276e51eb47cc# cat internal/btree_sort_times count: 6414 since mount recent duration of events min: 440 ns max: 1102 us total: 674 ms mean: 105 us 102 us stddev: 101 us 88 us time between events min: 881 ns max: 3 s mean: 7 ms 6 ms stddev: 52 ms 6 ms Cc: Coly Li <colyli@suse.de> Cc: linux-bcache@vger.kernel.org Acked-by: Coly Li <colyli@suse.de> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-22time_stats: report information in json formatDarrick J. Wong
Export json versions of time statistics information. Given the tabular nature of the numbers exposed, this will make it a lot easier for higher (than C) level languages (e.g. python) to import information without needing to write yet another clumsy string parser. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-22time_stats: Promote to lib/Kent Overstreet
Library code from bcachefs for tracking latency measurements. The main interface is time_stats_update(stats, start_time); which collects a new event with an end time of the current time. It features percpu buffering of input values, making it very low overhead, and nicely formatted output to printbufs or seq_buf. Sample output, from the bcache conversion: root@moria-kvm:/sys/fs/bcache/bdaedb8c-4554-4dd2-87e4-276e51eb47cc# cat internal/btree_sort_times count: 6414 since mount recent duration of events min: 440 ns max: 1102 us total: 674 ms mean: 105 us 102 us stddev: 101 us 88 us time between events min: 881 ns max: 3 s mean: 7 ms 6 ms stddev: 52 ms 6 ms Cc: Darrick J. Wong <djwong@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Coly Li <colyli@suse.de> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2024-03-22eytzinger: Promote to include/linux/Kent Overstreet
eytzinger trees are a faster alternative to binary search. They're a bit more expensive to setup, but lookups perform much better assuming the tree isn't entirely in cache. Binary search is a worst case scenario for branch prediction and prefetching, but eytzinger trees have children adjacent in memory and thus we can prefetch before knowing the result of a comparison. An eytzinger tree is a binary tree laid out in an array, with the same geometry as the usual binary heap construction, but used as a search tree instead. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-22mean and variance: Promote to lib/mathKent Overstreet
Small statistics library, for taking in a series of value and computing mean, weighted mean, standard deviation and weighted deviation. The main use case is for statistics on latency measurements. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Cc: Daniel Hill <daniel@gluo.nz> Cc: Darrick J. Wong <djwong@kernel.org>
2024-03-22time_stats: allow custom epoch namesDarrick J. Wong
Let callers of time_stats_to_seq_buf define the epoch name; "mount" doesn't make sense generally. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-22time_stats: don't print any output if event count is zeroDarrick J. Wong
There's no point in printing an empty report for no data, so add a flag that allows us to do that. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-22time_stats: report lifetime of the stats objectDarrick J. Wong
Capture the initialization time of the time_stats object so that we can report how long the counter has been observing data. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-22bcachefs: bch2_time_stats_to_seq_buf()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-22bcachefs: create debugfs dir for each btreeThomas Bertschinger
This creates a subdirectory for each individual btree under the btrees/ debugfs directory. Directory structure, before: /sys/kernel/debug/bcachefs/$FS_ID/btrees/ ├── alloc ├── alloc-bfloat-failed ├── alloc-formats ├── backpointers ├── backpointers-bfloat-failed ├── backpointers-formats ... Directory structure, after: /sys/kernel/debug/bcachefs/$FS_ID/btrees/ ├── alloc │   ├── bfloat-failed │   ├── formats │   └── keys ├── backpointers │   ├── bfloat-failed │   ├── formats │   └── keys ... Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-22bcachefs: Handle empty btree roots in topology repairKent Overstreet
With the new btree node scan code, we can now recover from corrupt btree roots - simply create a new fake root at depth 1, and then insert all the leaves we found. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-22bcachefs: Repair code for missing btree rootsKent Overstreet
If a btree root or interior btree node goes bad, we're going to lose a lot of data, unless we can recover the nodes that it pointed to by scanning. Fortunately btree node headers are fully self describing, and additionally the magic number is xored with the filesytem UUID, so we can do so safely. This implements that scanning and reconstruction - for now just for roots, not other interior nodes - we can now recover from a lost btree root without losing data. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-22bcachefs: Don't skip fake btree roots in fsckKent Overstreet
When a btree root is unreadable, we might still have keys fro the journal to walk and mark. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-22bcachefs: bch2_btree_root_alloc() -> bch2_btree_root_alloc_fake()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-22bcachefs: Limited recreation of missing snapshot nodesKent Overstreet
A snapshot node that goes missing will typically cause deletion of all data associated with that snapshot ID - not a desirable outcome, but the usual expected inconsistency is data being left around after snapshot deletion. But if we know that snapshot nodes have gone missing, we should instead attempt to reconstruct. We can't do much because the snapshot tree topology is gone if there were multiple snapshots, but we can at least retain data for the snapshot ID directly pointed to by a given subvolume. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-22bcachefs: Flag btrees with missing dataKent Overstreet
We need this to know when we should attempt to reconstruct the snapshots btree Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-22bcachefs: Fix bch2_btree_increase_depth()Kent Overstreet
When we haven't yet allocated any btree nodes for a given btree, we first need to call the regular split path to allocate one. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-22bcachefs: Add checks for invalid snapshot IDsKent Overstreet
Previously, we assumed that keys were consistent with the snapshots btree - but that's not correct as fsck may not have been run or may not be complete. This adds checks and error handling when using the in-memory snapshots table (that mirrors the snapshots btree). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-22bcachefs: Move snapshot table size to struct snapshot_tableKent Overstreet
We need to add bounds checking for snapshot table accesses - it turns out there are cases where we do need to use the snapshots table before fsck checks have completed (and indeed, fsck may not have been run). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-22bcachefs: bch2_shoot_down_journal_keys()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-22bcachefs: fix misplaced newline in __bch2_inode_unpacked_to_text()Thomas Bertschinger
before: u64s 18 type inode_v3 0:1879048192:U32_MAX len 0 ver 0: mode=40700 flags= (15300000) journal_seq=4 bi_size=0 bi_sectors=0 bi_version=0bi_atime=227064388944 ... after: u64s 18 type inode_v3 0:1879048192:U32_MAX len 0 ver 0: mode=40700 flags= (15300000) journal_seq=4 bi_size=0 bi_sectors=0 bi_version=0 bi_atime=227064388944 ... Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-22bcachefs: Fix journal pins in btree write bufferKent Overstreet
btree write buffer flush has two phases - in natural key order, which is more efficient but may fail - then in journal order The journal order flush was assuming that keys were still correctly ordered by journal sequence number - but due to coalescing by the previous phase, we need an additional sort. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-22bcachefs: Fix assert in bch2_backpointer_invalid()Kent Overstreet
Backpointers that point to invalid devices are caught by fsck, not .key_invalid; so .key_invalid needs to check for them instead of hitting asserts. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-18bcachefs: Fix lost wakeup on journal shutdownbcachefs-2024-03-19Kent Overstreet
We need to check for journal shutdown first in __journal_res_get() - after the journal is shutdown, j->watermark won't be changing anymore. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-18bcachefs; Fix deadlock in bch2_btree_update_start()Kent Overstreet
BCH_TRANS_COMMIT_journal_reclaim with watermark != BCH_WATERMARK_reclaim means nonblocking, and we need the journal_res_get() in btree_update_start() to respect that. In a future refactoring we'll be deleting BCH_TRANS_COMMIT_journal_reclaim and replacing it with an explicit BCH_TRANS_COMMIT_nonblocking. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-18bcachefs: ratelimit errors from async_btree_node_rewriteKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-18bcachefs: Run check_topology() firstKent Overstreet
check_topology() doesn't actually require alloc info - and running it first means other passes don't have to catch btree read errors. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-18bcachefs: Improve bch2_fatal_error()Kent Overstreet
error messages should always include __func__ Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-18bcachefs: Fix lost transaction restart errorKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-17bcachefs: Don't corrupt journal keys gap buffer when dropping alloc infoKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-17bcachefs: fix for building in userspaceKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-17bcachefs: bch2_snapshot_is_ancestor() now safe to call in early recoveryKent Overstreet
this fixes an assertion pop in bch2_check_snapshot_trees() -> check_snapshot_tree() -> bch2_snapshot_tree_master_subvol() -> bch2_snapshot_is_ancestor() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-17bcachefs: Fix nested transaction restart handling in bch2_bucket_gens_init()Kent Overstreet
Nested transaction restart handling is typically best avoided; when the inner context handles a transaction restart it invalidates the outer transaction context, so we need to make sure to return a transaction_restart_nested error. This code wasn't doing that, and hit the assertion in for_each_btree_key() that checks for that via trans->restart_count. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-17bcachefs: Improve sysfs internal/btree_updatesKent Overstreet
Print out the function that launched the btree update. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-17bcachefs: Split out btree_node_rewrite_workerKent Overstreet
This fixes a deadlock due to using btree_interior_update_worker for non interior updates - async btree node rewrites were blocking, and then blocking other interior updates. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-17bcachefs: Fix locking in bch2_alloc_write_key()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-17bcachefs: Avoid extent entry type assertions in .invalid()Kent Overstreet
After keys have passed bkey_ops.key_invalid we should never see invalid extent entry types - but .key_invalid itself needs to cope with them. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-17bcachefs: Fix spurious -BCH_ERR_transaction_restart_nestedKent Overstreet
We only need to return transaction_restart_nested when we're inside a context that's handling transaction restarts. Also, add a missing check_subdir_count() call. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-17bcachefs: Fix check_key_has_snapshot() callKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-17bcachefs: Change "accounting overran journal reservation" to a warningKent Overstreet
This doesn't need to be a BUG_ON(); the actual serious "things break" condition is if the whole journal write overruns the available space, and that has a fatal error, not a BUG_ON(). This check indicates we screwed something up, but it should be a warning. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13bcachefs: time_stats: shrink time_stat_buffer for better alignmentbcachefs-2024-03-13Darrick J. Wong
Shrink this percpu object by one array element so that the object size becomes exactly 512 bytes. This will lead to more efficient memory use, hopefully. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13bcachefs: time_stats: split stats-with-quantiles into a separate structureDarrick J. Wong
Currently, struct time_stats has the optional ability to quantize the information that it collects. This is /probably/ useful for callers who want to see quantized information, but it more than doubles the size of the structure from 224 bytes to 464. For users who don't care about that (e.g. upcoming xfs patches) and want to avoid wasting 240 bytes per counter, split the two into separate pieces. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13bcachefs: mean_and_variance: put struct mean_and_variance_weighted on a dietDarrick J. Wong
The only caller of this code (time_stats) always knows the weights and whether or not any information has been collected. Pass this information into the mean and variance code so that it doesn't have to store that information. This reduces the structure size from 24 to 16 bytes, which shrinks each time_stats counter to 192 bytes from 208. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13bcachefs: time_stats: add larger unitsDarrick J. Wong
Filesystems can stay mounted for a very long time, so add some larger units. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13bcachefs: pull out time_stats.[ch]Kent Overstreet
prep work for lifting out of fs/bcachefs/ Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13bcachefs: reconstruct_alloc cleanupKent Overstreet
Now that we've got the errors_silent mechanism, we don't have to check if the reconstruct_alloc option is set all over the place. Also - users no longer have to explicitly select fsck and fix_errors. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13bcachefs: fix bch_folio_sector paddingKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13bcachefs: Fix btree key cache coherency during replayKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>