summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorKent Overstreet <kent.overstreet@gmail.com>2019-11-20 13:52:15 -0500
committerKent Overstreet <kent.overstreet@gmail.com>2019-11-20 13:52:15 -0500
commitc0e0f92431d4413dfc8a676f129a7a98ed7f2655 (patch)
tree8e1b5eceb26b141a50963880acaf7616c6eddb81
parent8b95f9672373b80fe4b799b9def4f2969be8a082 (diff)
Btree whiteouts doc
-rw-r--r--BtreeWhiteouts.mdwn39
-rw-r--r--Snapshots.mdwn10
-rw-r--r--sidebar.mdwn2
3 files changed, 46 insertions, 5 deletions
diff --git a/BtreeWhiteouts.mdwn b/BtreeWhiteouts.mdwn
new file mode 100644
index 0000000..b75646a
--- /dev/null
+++ b/BtreeWhiteouts.mdwn
@@ -0,0 +1,39 @@
+Whiteout optimizations:
+=======================
+
+bcachefs btree nodes are log structured, and thus are also effectively a hybrid
+compacting data structure. Most btree node keys are in large sorted lists of
+keys that are too big to be efficiently inserted to or deleted from - but since
+we're not doing random updates on them, we can build special data structures to
+highly accelerate lookups.
+
+Since we usually can't delete or update in place an existing key, this means we
+need whiteouts, but whiteouts add their own complications. When we generate a
+whiteout because we overwrite or delete an existing key, we may or may not need
+to keep that whiteout around and write it out to disk: if the key we overwrote
+had been written out to disk we need to ensure that we write something out to
+disk that overwrites it. If we're updating that key - i.e. inserting a new
+version of that key - then writing out the new key is sufficient, but if we were
+deleting we need to ensure that that whiteout is kept around and written out to
+disk.
+
+To track this we have the flag `bkey.needs_whiteout`: keys have this flag set
+when they've been written out to disk, or when they overwrote something that was
+written out to disk.
+
+Additionally: as mentioned, some whiteouts need to be retained until the next
+btree node write, but we don't want to keep them mixed in with the rest of the
+keys in a btree node where they'd have to be skipped over when we're iterating
+over live keys. Therefore, when a given bset in a btree node has too many
+whiteouts (as a fraction of the total amount of data in that bset), we do a
+compact operation that drops whiteouts, saving the ones that need to be written
+where the next btree node write will find it.
+
+Extents
+-------
+
+Extents add their own complications. `KEY_TYPE_deleted` is used for whiteouts
+for normal keys, and it's also used for extents that have been trimmed to 0
+size. But a 0 size extent is meaningless - these can always be dropped.
+`KEY_TYPE_discard` is used for nonzero size extent whiteouts - i.e. whiteouts
+that actually do overwrite other extents.
diff --git a/Snapshots.mdwn b/Snapshots.mdwn
index 185e4c3..c5d4527 100644
--- a/Snapshots.mdwn
+++ b/Snapshots.mdwn
@@ -158,9 +158,9 @@ extent pos to get extent 1.
-2: |-----|
-1: |----------------|
+ 2: |-----|
+ 1: |----------------|
-2: |-----|
-1: |----------------|
-2: |------------|
+ 2: |-----|
+ 1: |----------------|
+ 2: |------------|
diff --git a/sidebar.mdwn b/sidebar.mdwn
index aa428e9..be2edb7 100644
--- a/sidebar.mdwn
+++ b/sidebar.mdwn
@@ -11,7 +11,9 @@
* Bcachefs developer documentation
* [[Architecture]]
* [[BtreeNodes]]
+ * [[BtreeWhiteouts]]
* [[Encryption]]
* [[Transactions]]
+ * [[Snapshots]]
* Other technical docs
* [[StablePages]]