diff options
author | Kent Overstreet <kent.overstreet@gmail.com> | 2019-11-20 16:51:08 -0500 |
---|---|---|
committer | Kent Overstreet <kent.overstreet@gmail.com> | 2019-11-20 16:51:08 -0500 |
commit | d29fc4f7f90184f21a4fd6277d00634edfed5a88 (patch) | |
tree | 4432fdda6e96d79e6e5789059efd5823977f748b | |
parent | c0e0f92431d4413dfc8a676f129a7a98ed7f2655 (diff) |
more whiteouts
-rw-r--r-- | BtreeWhiteouts.mdwn | 19 |
1 files changed, 19 insertions, 0 deletions
diff --git a/BtreeWhiteouts.mdwn b/BtreeWhiteouts.mdwn index b75646a..1775bfa 100644 --- a/BtreeWhiteouts.mdwn +++ b/BtreeWhiteouts.mdwn @@ -7,6 +7,9 @@ keys that are too big to be efficiently inserted to or deleted from - but since we're not doing random updates on them, we can build special data structures to highly accelerate lookups. +Optimization: tracking when whiteouts need to be retained/written out: +---------------------------------------------------------------------- + Since we usually can't delete or update in place an existing key, this means we need whiteouts, but whiteouts add their own complications. When we generate a whiteout because we overwrite or delete an existing key, we may or may not need @@ -21,6 +24,9 @@ To track this we have the flag `bkey.needs_whiteout`: keys have this flag set when they've been written out to disk, or when they overwrote something that was written out to disk. +Optimization: storing whiteouts separately from other keys +---------------------------------------------------------- + Additionally: as mentioned, some whiteouts need to be retained until the next btree node write, but we don't want to keep them mixed in with the rest of the keys in a btree node where they'd have to be skipped over when we're iterating @@ -29,6 +35,19 @@ whiteouts (as a fraction of the total amount of data in that bset), we do a compact operation that drops whiteouts, saving the ones that need to be written where the next btree node write will find it. +Optimization: deleting keys without emitting a new whiteout +----------------------------------------------------------- + +We can delete a key, even one that's been written out to disk, without emitting +a new whiteout (because that would require inserting into the last bset and an +expensive memmove). + +This works by changing the key type to `KEY_TYPE_deleted` - as usual whenever we +overwrite an existing key - and leaving `needs_whiteout` set for that key, and +additionally calling `reserve_whiteout()` to reserve space in the next btree +write. The btree write code will scan the already-written bsets for whiteouts +that need to be written and pick them up. + Extents ------- |