summaryrefslogtreecommitdiff
path: root/fs/bcachefs/btree_io.c
AgeCommit message (Collapse)Author
2020-06-15bcachefs: btree_bkey_cached_commonKent Overstreet
This is prep work for the btree key cache: btree iterators will point to either struct btree, or a new struct bkey_cached. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2020-06-09bcachefs: Improve assorted error messagesKent Overstreet
This also consolidates the various checks in bch2_mark_pointer() and bch2_trans_mark_pointer(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2020-06-09bcachefs: Validate that we read the correct btree nodeKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2020-06-09bcachefs: Fix two more deadlocksKent Overstreet
Deadlock on shutdown: btree_update_nodes_written() unblocks btree nodes from being written; after doing so, it has to check if they were marked as needing to be written and if so kick off those writes - if that doesn't happen, we'll never release journal pins and shutdown will get stuck when flushing the journal. There was an error path where this didn't happen, because in the error path we don't actually want those btree nodes write to happen; however, we still have to kick off the write path so the journal pins get released. The btree write path checks if we're in a journal error state and doesn't do the actual write if we are. Also - there was another deadlock because btree_update_nodes_written() was taking the btree update off of the unwritten_list too soon - before getting a journal reservation, which could fail and have to be retried. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2020-06-09bcachefs: Kill bkey_type_successorKent Overstreet
Previously, BTREE_ID_INODES was special - inodes were indexed by the inode field, which meant the offset field of struct bpos wasn't used, which led to special cases in e.g. the btree iterator code. Now, inodes in the inodes btree are indexed by the offset field. Also: prevously min_key was special for extents btrees, min_key for extents would equal max_key for the previous node. Now, min_key = bkey_successor() of the previous node, same as non extent btrees. This means we can completely get rid of btree_type_sucessor/predecessor. Also make some improvements to the metadata IO validate/compat code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2020-06-09bcachefs: Use memalloc_nofs_save()Kent Overstreet
vmalloc allocations don't always obey GFP_NOFS - memalloc_nofs_save() is the prefered approach for the future. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2020-06-09bcachefs: Journal updates to interior nodesKent Overstreet
Previously, the btree has always been self contained and internally consistent on disk without anything from the journal - the journal just contained pointers to the btree roots. However, this meant that btree node split or compact operations - i.e. anything that changes btree node topology and involves updates to interior nodes - would require that interior btree node to be written immediately, which means emitting a btree node write that's mostly empty (using 4k of space on disk if the filesystemm blocksize is 4k to only write perhaps ~100 bytes of new keys). More importantly, this meant most btree node writes had to be FUA, and consumer drives have a history of slow and/or buggy FUA support - other filesystes have been bit by this. This patch changes the interior btree update path to journal updates to interior nodes, after the writes for the new btree nodes have completed. Best of all, it turns out to simplify the interior node update path somewhat. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2020-06-09bcachefs: Move extent overwrite handling out of core btree codeKent Overstreet
Ever since the btree code was first written, handling of overwriting existing extents - including partially overwriting and splittin existing extents - was handled as part of the core btree insert path. The modern transaction and iterator infrastructure didn't exist then, so that was the only way for it to be done. This patch moves that outside of the core btree code to a pass that runs at transaction commit time. This is a significant simplification to the btree code and overall reduction in code size, but more importantly it gets us much closer to the core btree code being completely independent of extents and is important prep work for snapshots. This introduces a new feature bit; the old and new extent update models are incompatible when the filesystem needs journal replay. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2020-06-09bcachefs: Improve an error messageKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2020-06-09bcachefs: Use btree_ptr_v2.mem_ptr to avoid hash table lookupKent Overstreet
Nice performance optimization Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2020-06-09bcachefs: btree_ptr_v2Kent Overstreet
Add a new btree ptr type which contains the sequence number (random 64 bit cookie, actually) for that btree node - this lets us verify that when we read in a btree node it really is the btree node we wanted. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2020-06-09bcachefs: introduce b->hash_valKent Overstreet
This is partly prep work for introducing bch_btree_ptr_v2, but it'll also be a bit of a performance boost by moving the full key out of the hot part of struct btree. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2020-06-09bcachefs: Fix bch2_ptr_swab for indirect extentsKent Overstreet
bch2_ptr_swab was never updated when the code for generic keys with pointers was added - it assumed the entire val was only used for pointers. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2020-06-09bcachefs: Use KEY_TYPE_deleted whitouts for extentsKent Overstreet
Previously, partial overwrites of existing extents were handled implicitly by the btree code; when reading in a btree node, we'd do a mergesort of the different bsets and detect and fix partially overlapping extents during that mergesort. That approach won't work with snapshots: this changes extents to work like regular keys as far as the btree code is concerned, where a 0 size KEY_TYPE_deleted whiteout will completely overwrite an existing extent. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2020-06-09bcachefs: Kill btree_node_iter_largeKent Overstreet
Long overdue cleanup - this converts btree_node_iter_large uses to sort_iter. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2020-06-09bcachefs: Use one buffer for sorting whiteoutsKent Overstreet
We're not really supposed to allocate from the same mempool more than once. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2020-06-09bcachefs: Refactor whiteouts compactionKent Overstreet
The whiteout compaction path - as opposed to just dropping whiteouts - is now only needed for extents, and soon will only be needed for extent btree nodes in the old format. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2020-06-09bcachefs: Whiteout changesKent Overstreet
More prep work for snapshots: extents will soon be using KEY_TYPE_deleted for whiteouts, with 0 size. But we wen't be able to keep these whiteouts with the rest of the extents in the btree node, due to sorting invariants breaking. We can deal with this by immediately moving the new whiteouts to the unwritten whiteouts area - this just means those whiteouts won't be sorted, so we need new code to sort them prior to merging them with the rest of the keys to be written. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2020-06-09bcachefs: bkey noopsKent Overstreet
For upcoming inline data extents, we're going to need to be able to shorten the value of existing bkeys in the btree - and to make that work we're going to be able to need to pad out the space the value previously took up with something. This patch changes the various code that iterates over bkeys to handle k->u64s == 0 as meaning "skip the next 8 bytes". Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2020-06-09bcachefs: Don't use FUA unnecessarilyKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2020-06-09bcachefs: Kill direct access to bi_io_vecKent Overstreet
Switch to always using bio_add_page(), which merges contiguous pages now that we have multipage bvecs. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2020-06-09bcachefs: More work to avoid transaction restartsKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2020-06-09bcachefs: Rewrite journal_seq_blacklist machineryKent Overstreet
Now, we store blacklisted journal sequence numbers in the superblock, not the journal: this helps to greatly simplify the code, and more importantly it's now implemented in a way that doesn't require all btree nodes to be visited before starting the journal - instead, we unconditionally blacklist the next 4 journal sequence numbers after an unclean shutdown. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2020-06-09bcachefs: Initial commitKent Overstreet
Fork of drivers/md/bcache Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>