2015-02-12bcache: Validate bkey formatbcache-dev-februaryKent Overstreet
2015-02-12bcache: Hook up new btree node fieldsKent Overstreet
2015-02-12bcache: New btree node formatKent Overstreet
2015-02-12bcache: Bkey format field offsetsKent Overstreet
2015-02-12bcache: Add accounting for nr packed/unpacked keysKent Overstreet
2015-02-12bcache: Add accounting for nr packed/unpacked keysKent Overstreet
2015-02-12bcache: New debugfs codeKent Overstreet
2015-02-12bcache: Move some assertions to debug buildsKent Overstreet
2015-02-12bcache: Pointer compression for btree_node_iterKent Overstreet
2015-02-12bcache: Drop btree_node_iter->b, btree_node_iter->sizeKent Overstreet
2015-02-12bcache: Packed bkeysKent Overstreet
2015-02-12bch_query_uuid now returns user_uuid of cache-set instead ofRaghu Krishnamurthy
bch_query_uuid now returns user_uuid of cache-set instead of set_uuid (which is internal uuid). Issue DAT-
2015-02-12bcache: Kill bch_btree_count_u64s()Kent Overstreet
Since we only merge extents when doing a full sort, we don't need this anymore.
2015-02-12bcache: Validate, show btree node sizeKent Overstreet
2015-02-12bcache: Make __ptr_invalid() more explicitKent Overstreet
also, more useful bkey_to_text()
2015-02-12bcache: Better inliningKent Overstreet
Inlining bch_bset_search() turned out to be a performance regression (inlining?). We also don't actually want to inline bch_btree_node_iter_push(), so do it like this instead.
2015-02-12bcache: Fix compiler warningsKent Overstreet
gcc complains about unused results with the old definition of EBUG_ON. -Werror was accidentally turned off; turn it back on.
2015-02-12bcache: fix rare race on first startup of fresh cache setSlava Pestov
On the very first startup of a cache set, we would set CACHE_SYNC to false and then initialize the first journal entry by calling bch_journal_next(). If the allocator thread calls bch_journal_meta() in between these two steps, bad things could happen:
a) we could dereference a NULL pointer because c->journal.cur wasn't set yet
b) if we were in between setting c->journal.cur and pushing the first entry onto c-> journal_reclaim_fast() would hang when trying to pop elements off c->
Change the order of these two steps so that bch_journal_meta() doesn't try to get a journal reservation before bch_journal_next() has been called, and change the FIFO reclaim loop to BUG_ON if the FIFO is empty instead of just looping.
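The reclaim-loop change described above can be illustrated with a small userspace sketch. This is not the bcache code: `struct pin_fifo`, `pin_fifo_push()` and `reclaim_fast()` are hypothetical names, the FIFO is a plain ring of reference counts, and `assert()` stands in for BUG_ON(). The point is only that a loop which pops fully released entries from the front should trap on an empty FIFO rather than spin.

```c
#include <assert.h>
#include <stddef.h>

#define PIN_FIFO_SIZE 8

/* Hypothetical stand-in for the journal pin FIFO: one refcount per entry. */
struct pin_fifo {
	unsigned refs[PIN_FIFO_SIZE];
	size_t front;	/* index of the oldest entry */
	size_t used;	/* number of entries currently in the FIFO */
};

static void pin_fifo_push(struct pin_fifo *f, unsigned refs)
{
	assert(f->used < PIN_FIFO_SIZE);
	f->refs[(f->front + f->used) % PIN_FIFO_SIZE] = refs;
	f->used++;
}

/* Pop released entries from the front; returns how many were reclaimed. */
static size_t reclaim_fast(struct pin_fifo *f)
{
	size_t reclaimed = 0;

	for (;;) {
		/* Stand-in for BUG_ON(): trap instead of looping on an empty FIFO. */
		assert(f->used > 0);

		if (f->refs[f->front] != 0)
			break;	/* oldest entry is still pinned */

		f->front = (f->front + 1) % PIN_FIFO_SIZE;
		f->used--;
		reclaimed++;
	}

	return reclaimed;
}
```

In this sketch the newest entry is expected to hold a reference, which is what normally terminates the loop; the assertion only fires if that expectation is violated.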
2015-02-12bcache: mca_alloc() never fails with -ENOMEMSlava Pestov
This would previously happen if all nodes in the cache were intent-locked. This is very unlikely to happen, so instead of failing IOs, just try to reap a node again. Issue DAT-1050
2015-02-12bcache: skip non-extents in bch_sectors_dirty_init()Slava Pestov
On bootup, we would count discards as dirty sectors on the backing device. This was wrong. This fixes a regression from "New bkey format" or "Don't insert deleted keys with nonzero size", depending on your political beliefs. Issue DAT-1844
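As a rough illustration of the fix, the following userspace sketch sums sector counts while skipping anything that is not an extent. The types and names (`key_sketch`, `KEY_SKETCH_DISCARD`, `count_dirty_sectors()`) are hypothetical and not taken from bcache, and every extent is assumed to be dirty for simplicity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical key types; the real bkey format is not modelled here. */
enum key_type_sketch { KEY_SKETCH_DISCARD, KEY_SKETCH_EXTENT };

struct key_sketch {
	enum key_type_sketch type;
	unsigned size;	/* length of the key's range, in sectors */
};

/* Sum the sizes of extent keys only; discards must not count as dirty. */
static unsigned long long count_dirty_sectors(const struct key_sketch *keys,
					      size_t nr)
{
	unsigned long long dirty = 0;

	for (size_t i = 0; i < nr; i++) {
		if (keys[i].type != KEY_SKETCH_EXTENT)
			continue;	/* skip non-extents such as discards */
		dirty += keys[i].size;
	}

	return dirty;
}
```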
2015-02-12bcache: add more dynamic faults for init and device add pathsSlava Pestov
Also fix bugs in device add path exposed by these. Issue DAT-1050
2015-02-12bcache: enforce minimum journal entry sizeSlava Pestov
This lowers write latency by reducing the likelihood that we fill up a journal entry really quickly while the previous journal write is still in progress. Previously, this would often happen when we were near the end of a journal bucket. Now, we just skip to the next journal bucket if we can't get a 32KiB journal entry.
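The bucket-skipping rule can be sketched as follows. Only the 32KiB minimum comes from the commit message; the struct, the helper names and the wrap-around to the next bucket index are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MIN_ENTRY_BYTES (32u * 1024u)	/* minimum journal entry size */

/* Hypothetical view of a journal bucket: total size and bytes already used. */
struct journal_bucket_sketch {
	size_t size_bytes;
	size_t used_bytes;
};

static bool bucket_fits_min_entry(const struct journal_bucket_sketch *b)
{
	return b->size_bytes - b->used_bytes >= MIN_ENTRY_BYTES;
}

/* Returns the index of the bucket in which the next entry should start. */
static size_t next_entry_bucket(const struct journal_bucket_sketch *buckets,
				size_t nr_buckets, size_t cur)
{
	if (bucket_fits_min_entry(&buckets[cur]))
		return cur;

	/* Too little room left: give up the tail and move to the next bucket. */
	return (cur + 1) % nr_buckets;
}
```

The trade-off in this sketch is a little wasted space at the end of a bucket in exchange for never starting an entry that is too small to absorb writes while the previous journal write is in flight.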
2015-02-12bcache: get c->verify mode working againSlava Pestov
Probably a waste of time, but I noticed it wasn't being tested in coverage reports.
2015-02-12bcache: Minor refactoringKent Overstreet
2015-02-12bcache: Better bch_btree_node_iter_verify()Kent Overstreet
Change-Id: Id0f750939bd99626f73dbdf2ed73757dcea7a6bf
2015-02-12bcache: Better inliningKent Overstreet
bch_bset_search() is now only called in the one place, so flatten that function instead
2015-02-12bcache: NO_IOKent Overstreet
2015-02-12bcache: don't evaluate EBUG_ON() expressions when not in debug modeKent Overstreet
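One common way to get this behaviour is sketched below. This is an assumption about the technique, not the definition used in the commit: the macro name `EBUG_ON_SKETCH`, the `EBUG_SKETCH_DEBUG` switch and the helper functions are hypothetical. Wrapping the condition in `sizeof` keeps it parsed and type-checked while guaranteeing it is never evaluated in non-debug builds.

```c
#include <assert.h>

/* Define EBUG_SKETCH_DEBUG at compile time to model a debug build. */
#ifdef EBUG_SKETCH_DEBUG
#define EBUG_ON_SKETCH(cond)	assert(!(cond))
#else
/*
 * The operand of sizeof is not evaluated here, so the condition is still
 * compiled and type-checked but none of its side effects or costs occur.
 */
#define EBUG_ON_SKETCH(cond)	((void) sizeof(!(cond)))
#endif

static int expensive_check_calls;

/* Stand-in for a costly consistency check; counts how often it runs. */
static int expensive_check(void)
{
	expensive_check_calls++;
	return 0;	/* 0 means "no bug found" */
}

/* Returns how many times the check has actually run so far. */
static int run_hot_path(void)
{
	EBUG_ON_SKETCH(expensive_check());
	return expensive_check_calls;
}
```

Built without `EBUG_SKETCH_DEBUG`, `run_hot_path()` never calls `expensive_check()`.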
2015-02-12bcache: Fix a null ptr deref in btree_iter_traverse()Kent Overstreet
btree_iter_node_set() does the lookup within the node, so we don't want to do it while the node is still empty...
2015-02-12bcache: remove over-eager BUG_ONSlava Pestov
This was added in "fix journal reclaim deadlock during journal replay" but we don't believe it's actually helpful.
2015-02-12bcache: make discard work like it did before when version is zeroSlava Pestov
- list_extents ioctl skips discard keys with zero version
- inserting discard key with zero version unconditionally deletes overlapping keys
2015-02-12bcache: fix init error pathSlava Pestov
If cache_set_alloc() fails before we add ourselves to sysfs, we would end up calling kobject_del() on a kobject that hasn't been added yet. This was exposed by the new init fault added recently.
2015-02-12bcache: kick off background journal reclaim eagerlySlava Pestov
This patch changes bch_journal_reclaim_fast() to return that reclaim is needed before we're completely out of journal space. This allows background reclaim to overlap with using more journal buckets from bch_journal_next_bucket(), eliminating stalls from the write path waiting on a journal reservation.
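The difference between the old and new behaviour can be sketched as two predicates over the number of free journal buckets. The low-water mark of 2 is purely hypothetical (the commit message does not state a threshold), as are the function names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical low-water mark; the real threshold is not given in the log. */
#define RECLAIM_LOW_WATER 2

/* Old behaviour in this sketch: ask for reclaim only once space is gone. */
static bool reclaim_needed_lazy(size_t free_buckets)
{
	return free_buckets == 0;
}

/* New behaviour in this sketch: ask for reclaim while some space remains. */
static bool reclaim_needed_eager(size_t free_buckets)
{
	return free_buckets <= RECLAIM_LOW_WATER;
}
```

With the eager predicate, background reclaim can start while writers are still consuming the remaining buckets, instead of starting only after they have stalled.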
2015-02-12bcache: do btree node flushing in a work itemSlava Pestov
This is the first patch preparing us for background journal reclaim.
2015-02-12bcache: improve journal tracepointsSlava Pestov
2015-02-12bcache: rename c-> to c->journal.write_workSlava Pestov
2015-02-12bcache: rename journal_reclaim() to journal_next_bucket()Slava Pestov
This more accurately describes what it is doing.
2015-02-12bcache: don't run journal_reclaim() logic if current journal bucket has spaceSlava Pestov
We can just start a new entry in this case, only doing all the other stuff when the journal bucket is completely full.
2015-02-12bcache: move btree node flush to journal_reclaim()Slava Pestov
This is cleaner than having the caller do it.
2015-02-12bcache: don't need to call journal_reclaim() when kicking off journal writeSlava Pestov
2015-02-12bcache: fix faulty logic in bch_journal_res_get()Slava Pestov
If the current journal entry was completely full, we should try to write it before doing a journal reclaim. Otherwise we might do a reclaim for nothing and just sit there waiting 10ms for the timer to write the entry. This fixes a very old regression from "journal reservations".
2015-02-12bcache: only wake up journal.wait if pin refcount is 0Slava Pestov
2015-02-12bcache: fix journal reclaim deadlock during journal replaySlava Pestov
Journal reclaim has to work during journal replay, because the allocator might need to invalidate buckets and write out prios and gens, or because we might need to set a new btree root. The recent patch "Fix journal replay" made this work by tracking reference counts on journal entries during replay in the same way that we do during normal operation, except that a reference is dropped once an entry has been replayed rather than dropping a reference when an entry has been written out.

The problem with that patch is that we might start replay with a completely full journal, and be unable to add any new journal entries until the first bucket of entries has been replayed. If replaying the first bucket of entries required allocating buckets, we would deadlock in the allocator thread while waiting on a journal entry to write out prios and gens, because we would be unable to reclaim any journal buckets -- no entries have been replayed yet.

Dig ourselves out of this hole by priming the allocator freelists with completely free buckets, by extending the existing logic to prime the PRIO freelist to prime all freelists. Also, wake up any threads waiting on reclaim when we drop a journal entry's reference count. Finally, add a BUG_ON() to ensure that flushing btree nodes makes forward progress during replay.
2015-02-12bcache: add BUG_ONs for suspected memory scribble around ↵Slava Pestov
Issue DAT-1868
2015-02-12bcache: fix NULL deref in init error pathSlava Pestov
This fixes a regression from "notify user space of state changes using kobject_uevent".
2015-02-12bcache: add remove_failed notificationSlava Pestov
Change-Id: I1c2b68248eefd77b48fc5deb15e2908db6ef3f28
2015-02-12bcache: fix handling of PTR_LOST_DEV keysSlava Pestov
If we lose all copies of a key, we want to fail the read, not treat it as a hole in the keyspace, so don't mark the key as deleted. Instead, set the key type to error. This fixes a regression from "New bkey format".

Also, instead of having bch_extent_normalize() set a key's deleted flag, just return true if the key should be dropped. If the key is an extent and has no pointers, it becomes a discard.

This could come up in bch_flag_key_bad(). If a cached key points to a device that has gone bad, we end up dropping all pointers from the key. This would cause us to insert a deleted key, which triggers a BUG_ON ever since "Don't insert deleted keys with nonzero size". What we want instead is to discard this range of the keyspace -- we just lost some cached data, which is not an error.
2015-02-12bcache: re-work bbio IO error reportingSlava Pestov
We can't call bch_notify_*() from atomic context, so move it to a new ca->io_error_work.
2015-02-12bcache: Add sysfs internal uuid attributeJacob Malevich
Signed-off-by: Jacob Malevich <> Issue DAT-1913
2015-02-12bcache: fix erroneous BUG_ON in journal.cSlava Pestov
It is fine for there to be a dirty journal entry with no keys, as long as JOURNAL_NEED_WRITE is also set, meaning the journal write is about to go down. This happens, for example, when bch_journal_meta() is called.
2015-02-12bcache: Drop bch_check_keys()Kent Overstreet
bch_btree_node_iter_next_check() is now able to check everything bch_check_keys() did.