diff options
Diffstat (limited to 'BtreeWhiteouts.mdwn')
-rw-r--r-- | BtreeWhiteouts.mdwn | 39 |
1 files changed, 39 insertions, 0 deletions
diff --git a/BtreeWhiteouts.mdwn b/BtreeWhiteouts.mdwn new file mode 100644 index 0000000..b75646a --- /dev/null +++ b/BtreeWhiteouts.mdwn @@ -0,0 +1,39 @@ +Whiteout optimizations: +======================= + +bcachefs btree nodes are log structured, and thus are also effectively a hybrid +compacting data structure. Most btree node keys are in large sorted lists of +keys that are too big to be efficiently inserted to or deleted from - but since +we're not doing random updates on them, we can build special data structures to +highly accelerate lookups. + +Since we usually can't delete or update in place an existing key, this means we +need whiteouts, but whiteouts add their own complications. When we generate a +whiteout because we overwrite or delete an existing key, we may or may not need +to keep that whiteout around and write it out to disk: if the key we overwrote +had been written out to disk we need to ensure that we write something out to +disk that overwrites it. If we're updating that key - i.e. inserting a new +version of that key - then writing out the new key is sufficient, but if we were +deleting we need to ensure that that whiteout is kept around and written out to +disk. + +To track this we have the flag `bkey.needs_whiteout`: keys have this flag set +when they've been written out to disk, or when they overwrote something that was +written out to disk. + +Additionally: as mentioned, some whiteouts need to be retained until the next +btree node write, but we don't want to keep them mixed in with the rest of the +keys in a btree node where they'd have to be skipped over when we're iterating +over live keys. Therefore, when a given bset in a btree node has too many +whiteouts (as a fraction of the total amount of data in that bset), we do a +compact operation that drops whiteouts, saving the ones that need to be written +where the next btree node write will find it. + +Extents +------- + +Extents add their own complications. `KEY_TYPE_deleted` is used for whiteouts +for normal keys, and it's also used for extents that have been trimmed to 0 +size. But a 0 size extent is meaningless - these can always be dropped. +`KEY_TYPE_discard` is used for nonzero size extent whiteouts - i.e. whiteouts +that actually do overwrite other extents. |