summaryrefslogtreecommitdiff
path: root/fs/bcachefs/bcachefs_format.h
AgeCommit message (Collapse)Author
9 daysbcachefs: split out lru_format.hKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 daysbcachefs: Disk space accounting rewriteKent Overstreet
Main part of the disk accounting rewrite. This is a wholesale rewrite of the existing disk space accounting, which relies on percepu counters that are sharded by journal buffer, and rolled up and added to each journal write. With the new scheme, every set of counters is a distinct key in the accounting btree; this fixes scaling limitations of the old scheme, where counters took up space in each journal entry and required multiple percpu counters. Now, in memory accounting requires a single set of percpu counters - not multiple for each in flight journal buffer - and in the future we'll probably also have counters that don't use in memory percpu counters, they're not strictly required. An accounting update is now a normal btree update, using the btree write buffer path. At transaction commit time, we apply accounting updates to the in memory counters, which are percpu counters indexed in an eytzinger tree by the accounting key. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 daysbcachefs: KEY_TYPE_accountingKent Overstreet
New key type for the disk space accounting rewrite. - Holds a variable sized array of u64s (may be more than one for accounting e.g. compressed and uncompressed size, or buckets and sectors for a given data type) - Updates are deltas, not new versions of the key: this means updates to accounting can happen via the btree write buffer, which we'll be teaching to accumulate deltas. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 daysbcachefs: metadata version bucket_stripe_sectorsKent Overstreet
New on disk format version for bch_alloc->stripe_sectors and BCH_DATA_unstriped - accounting for unstriped data in stripe buckets. Upgrade/downgrade requires regenerating alloc info - but only if erasure coding is in use. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
9 daysbcachefs: BCH_DATA_unstripedKent Overstreet
Add a new pseudo data type, to track buckets that are members of a stripe, but have unstriped data in them. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
13 daysbcachefs: Fix safe errors by defaultKent Overstreet
i.e. the start of automatic self healing: If errors=continue or fix_safe, we now automatically fix simple errors without user intervention. New error action option: fix_safe This replaces the existing errors=ro option, which gets a new slot, i.e. existing errors=ro users now get errors=fix_safe. This is currently only enabled for a limited set of errors - initially just disk accounting; errors we would never not want to fix, and we don't want to require user intervention (i.e. to make sure a bug report gets filed). Errors will still be counted in the superblock, so we (developers) will still know they've been occuring if a bug report gets filed (as bug reports typically include the errors superblock section). Eventually we'll be enabling this for a much wider set of errors, after we've done thorough error injection testing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
13 daysbcachefs: Guard against overflowing LRU_TIME_BITSKent Overstreet
LRUs only have 48 bits for the time field (i.e. LRU order); thus we need overflow checks and guards. Reported-by: syzbot+df3bf3f088dcaa728857@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
13 daysbcachefs: Fix btree ID bitmasksKent Overstreet
these should be 64 bit bitmasks, not 32 bit. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-28bcachefs: Split out sb-errors_format.hKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-28bcachefs: Split out journal_seq_blacklist_format.hKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-28bcachefs: Split out replicas_format.hKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-28bcachefs: Split out disk_groups_format.hKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-28bcachefs: split out sb-downgrade_format.hKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-28bcachefs: split out sb-members_format.hKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-20bcachefs: Fix shift overflow in btree_lost_data()Kent Overstreet
Reported-by: syzbot+29f65db1a5fe427b5c56@syzkaller.appspotmail.com Fixes: 55936afe1107 ("bcachefs: Flag btrees with missing data") Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: Move BCACHEFS_STATFS_MAGIC value to UAPI magic.hPetr Vorel
Move BCACHEFS_STATFS_MAGIC value to UAPI <linux/magic.h> under BCACHEFS_SUPER_MAGIC definition (use common approach for name) and reuse the definition in bcachefs_format.h BCACHEFS_STATFS_MAGIC. There are other bcachefs magic definitions: BCACHE_MAGIC, BCHFS_MAGIC, which use UUID_INIT() and are used only in libbcachefs. Therefore move only BCACHEFS_STATFS_MAGIC value, which can be used outside of libbcachefs for f_type field in struct statfs in statfs() or fstatfs(). Suggested-by: Su Yue <glass.su@suse.com> Signed-off-by: Petr Vorel <pvorel@suse.cz> Acked-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: bch_member.last_journal_bucketKent Overstreet
On recovery from clean shutdown we don't typically read the journal, but we still want to avoid overwriting existing entries in the journal for list_journal debugging. Thus, add some fields to the member info section so we can remember where we left off. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-06bcachefs: BCH_SB_LAYOUT_SIZE_BITS_MAXKent Overstreet
Define a constant for the max superblock size, to avoid a too-large shift. Reported-by: syzbot+a8b0fb419355c91dda7f@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-06bcachefs: Add a better limit for maximum number of bucketsKent Overstreet
The bucket_gens array is a single array allocation (one byte per bucket), and kernel allocations are still limited to INT_MAX. Check this limit to avoid failing the bucket_gens array allocation. Reported-by: syzbot+b29f436493184ea42e2b@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-17bcachefs: KEY_TYPE_error is allowed for reflinkKent Overstreet
KEY_TYPE_error is left behind when we have to delete all pointers in an extent in fsck; it allows errors to be correctly returned by reads later. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-14bcachefs: bch_member.btree_allocated_bitmapKent Overstreet
This adds a small (64 bit) per-device bitmap that tracks ranges that have btree nodes, for accelerating btree node scan if it is ever needed. - New helpers, bch2_dev_btree_bitmap_marked() and bch2_dev_bitmap_mark(), for checking and updating the bitmap - Interior btree update path updates the bitmaps when required - The check_allocations pass has a new fsck_err check, btree_bitmap_not_marked - New on disk format version, mi_btree_mitmap, which indicates the new bitmap is present - Upgrade table lists the required recovery pass and expected fsck error - Btree node scan uses the bitmap to skip ranges if we're on the new version Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13bcachefs: Standardize helpers for printing enum strs with bounds checksKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-09bcachefs: Don't scan for btree nodes when we can reconstructKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-03bcachefs: Flag btrees with missing dataKent Overstreet
We need this to know when we should attempt to reconstruct the snapshots btree Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13bcachefs: omit alignment attribute on big endian struct bkeyThomas Bertschinger
This is needed for building Rust bindings on big endian architectures like s390x. Currently this is only done in userspace, but it might happen in-kernel in the future. When creating a Rust binding for struct bkey, the "packed" attribute is needed to get a type with the correct member offsets in the big endian case. However, rustc does not allow types to have both a "packed" and "align" attribute. Thus, in order to get a Rust type compatible with the C type, we must omit the "aligned" attribute in C. This does not affect the struct's size or member offsets, only its toplevel alignment, which should be an acceptable impact. The little endian version can have the "align" attribute because the "packed" attr is redundant, and rust-bindgen will omit the "packed" attr when an "align" attr is present and it can do so without changing a type's layout Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13bcachefs: BTREE_ID_subvolume_childrenKent Overstreet
Add a btree to record a parent -> child subvolume relationships, according to the filesystem heirarchy. The subvolume_children btree is a bitset btree: if a bit is set at pos p, that means p.offset is a child of subvolume p.inode. This will be used for efficiently listing subvolumes, as well as recursive deletion. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13bcachefs: bch_subvolume::fs_path_parentKent Overstreet
Record the filesystem path heirarchy for subvolumes in bch_subvolume Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-10bcachefs: jset_entry_datetimeKent Overstreet
This gives us a way to record the date and time every journal entry was written - useful for debugging. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: logged_ops_format.hKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: reflink_format.hKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs; extents_format.hKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: ec_format.hKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: subvolume_format.hKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: snapshot_format.hKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: alloc_background_format.hKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: xattr_format.hKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: dirent_format.hKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: inode_format.hKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs; quota_format.hKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: sb-counters_format.hKent Overstreet
bcachefs_format.h has gotten too big; let's do some organizing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: comment bch_subvolumeKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: bch_snapshot::btimeKent Overstreet
Add a field to bch_snapshot for creation time; this will be important when we start exposing the snapshot tree to userspace. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05bcachefs: Upgrades now specify errors to fix, like downgradesKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05bcachefs: bch_member->seqKent Overstreet
Add new fields for split brain detection: - bch_member->seq, which tracks the sequence number of the last superblock write that happened to each member device - bch_sb->write_time, which tracks the time of the last superblock write, to allow detection of when two members have diverged but had the same number of superblock writes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05bcachefs: Check journal entries for invalid keys in trans commit pathKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: fix userspace build errorsKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: btree write buffer now slurps keys from journalKent Overstreet
Previosuly, the transaction commit path would have to add keys to the btree write buffer as a separate operation, requiring additional global synchronization. This patch introduces a new journal entry type, which indicates that the keys need to be copied into the btree write buffer prior to being written out. We switch the journal entry type back to JSET_ENTRY_btree_keys prior to write, so this is not an on disk format change. Flushing the btree write buffer may require pulling keys out of journal entries yet to be written, and quiescing outstanding journal reservations; we previously added journal->buf_lock for synchronization with the journal write path. We also can't put strict bounds on the number of keys in the journal destined for the write buffer, which means we might overflow the size of the preallocated buffer and have to reallocate - this introduces a potentially fatal memory allocation failure. This is something we'll have to watch for, if it becomes an issue in practice we can do additional mitigation. The transaction commit path no longer has to explicitly check if the write buffer is full and wait on flushing; this is another performance optimization. Instead, when the btree write buffer is close to full we change the journal watermark, so that only reservations for journal reclaim are allowed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Improve btree write buffer tracepointsKent Overstreet
- add a tracepoint for write_buffer_flush_sync; this is expensive - fix the write_buffer_flush_slowpath tracepoint Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Kill dev_usage->buckets_ecKent Overstreet
This counter is redundant; it's simply the sum of BCH_DATA_stripe and BCH_DATA_parity buckets. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Rename bch_replicas_entry -> bch_replicas_entry_v1Kent Overstreet
Prep work for introducing bch_replicas_entry_v2 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>