From c0e0f92431d4413dfc8a676f129a7a98ed7f2655 Mon Sep 17 00:00:00 2001 From: Kent Overstreet Date: Wed, 20 Nov 2019 13:52:15 -0500 Subject: Btree whiteouts doc --- BtreeWhiteouts.mdwn | 39 +++++++++++++++++++++++++++++++++++++++ Snapshots.mdwn | 10 +++++----- sidebar.mdwn | 2 ++ 3 files changed, 46 insertions(+), 5 deletions(-) create mode 100644 BtreeWhiteouts.mdwn diff --git a/BtreeWhiteouts.mdwn b/BtreeWhiteouts.mdwn new file mode 100644 index 0000000..b75646a --- /dev/null +++ b/BtreeWhiteouts.mdwn @@ -0,0 +1,39 @@ +Whiteout optimizations: +======================= + +bcachefs btree nodes are log structured, and thus are also effectively a hybrid +compacting data structure. Most btree node keys are in large sorted lists of +keys that are too big to be efficiently inserted to or deleted from - but since +we're not doing random updates on them, we can build special data structures to +highly accelerate lookups. + +Since we usually can't delete or update in place an existing key, this means we +need whiteouts, but whiteouts add their own complications. When we generate a +whiteout because we overwrite or delete an existing key, we may or may not need +to keep that whiteout around and write it out to disk: if the key we overwrote +had been written out to disk we need to ensure that we write something out to +disk that overwrites it. If we're updating that key - i.e. inserting a new +version of that key - then writing out the new key is sufficient, but if we were +deleting we need to ensure that that whiteout is kept around and written out to +disk. + +To track this we have the flag `bkey.needs_whiteout`: keys have this flag set +when they've been written out to disk, or when they overwrote something that was +written out to disk. + +Additionally: as mentioned, some whiteouts need to be retained until the next +btree node write, but we don't want to keep them mixed in with the rest of the +keys in a btree node where they'd have to be skipped over when we're iterating +over live keys. Therefore, when a given bset in a btree node has too many +whiteouts (as a fraction of the total amount of data in that bset), we do a +compact operation that drops whiteouts, saving the ones that need to be written +where the next btree node write will find it. + +Extents +------- + +Extents add their own complications. `KEY_TYPE_deleted` is used for whiteouts +for normal keys, and it's also used for extents that have been trimmed to 0 +size. But a 0 size extent is meaningless - these can always be dropped. +`KEY_TYPE_discard` is used for nonzero size extent whiteouts - i.e. whiteouts +that actually do overwrite other extents. diff --git a/Snapshots.mdwn b/Snapshots.mdwn index 185e4c3..c5d4527 100644 --- a/Snapshots.mdwn +++ b/Snapshots.mdwn @@ -158,9 +158,9 @@ extent pos to get extent 1. -2: |-----| -1: |----------------| + 2: |-----| + 1: |----------------| -2: |-----| -1: |----------------| -2: |------------| + 2: |-----| + 1: |----------------| + 2: |------------| diff --git a/sidebar.mdwn b/sidebar.mdwn index aa428e9..be2edb7 100644 --- a/sidebar.mdwn +++ b/sidebar.mdwn @@ -11,7 +11,9 @@ * Bcachefs developer documentation * [[Architecture]] * [[BtreeNodes]] + * [[BtreeWhiteouts]] * [[Encryption]] * [[Transactions]] + * [[Snapshots]] * Other technical docs * [[StablePages]] -- cgit v1.2.3