summaryrefslogtreecommitdiff
path: root/Documentation/device-mapper/cache-policies.rst
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2019-07-09 12:34:26 -0700
committerLinus Torvalds <torvalds@linux-foundation.org>2019-07-09 12:34:26 -0700
commite9a83bd2322035ed9d7dcf35753d3f984d76c6a5 (patch)
tree66dc466ff9aec0f9bb7f39cba50a47eab6585559 /Documentation/device-mapper/cache-policies.rst
parent7011b7e1b702cc76f9e969b41d9a95969f2aecaa (diff)
parent454f96f2b738374da4b0a703b1e2e7aed82c4486 (diff)
Merge tag 'docs-5.3' of git://git.lwn.net/linux
Pull Documentation updates from Jonathan Corbet: "It's been a relatively busy cycle for docs: - A fair pile of RST conversions, many from Mauro. These create more than the usual number of simple but annoying merge conflicts with other trees, unfortunately. He has a lot more of these waiting on the wings that, I think, will go to you directly later on. - A new document on how to use merges and rebases in kernel repos, and one on Spectre vulnerabilities. - Various improvements to the build system, including automatic markup of function() references because some people, for reasons I will never understand, were of the opinion that :c:func:``function()`` is unattractive and not fun to type. - We now recommend using sphinx 1.7, but still support back to 1.4. - Lots of smaller improvements, warning fixes, typo fixes, etc" * tag 'docs-5.3' of git://git.lwn.net/linux: (129 commits) docs: automarkup.py: ignore exceptions when seeking for xrefs docs: Move binderfs to admin-guide Disable Sphinx SmartyPants in HTML output doc: RCU callback locks need only _bh, not necessarily _irq docs: format kernel-parameters -- as code Doc : doc-guide : Fix a typo platform: x86: get rid of a non-existent document Add the RCU docs to the core-api manual Documentation: RCU: Add TOC tree hooks Documentation: RCU: Rename txt files to rst Documentation: RCU: Convert RCU UP systems to reST Documentation: RCU: Convert RCU linked list to reST Documentation: RCU: Convert RCU basic concepts to reST docs: filesystems: Remove uneeded .rst extension on toctables scripts/sphinx-pre-install: fix out-of-tree build docs: zh_CN: submitting-drivers.rst: Remove a duplicated Documentation/ Documentation: PGP: update for newer HW devices Documentation: Add section about CPU vulnerabilities for Spectre Documentation: platform: Delete x86-laptop-drivers.txt docs: Note that :c:func: should no longer be used ...
Diffstat (limited to 'Documentation/device-mapper/cache-policies.rst')
-rw-r--r--Documentation/device-mapper/cache-policies.rst131
1 files changed, 131 insertions, 0 deletions
diff --git a/Documentation/device-mapper/cache-policies.rst b/Documentation/device-mapper/cache-policies.rst
new file mode 100644
index 000000000000..b17fe352fc41
--- /dev/null
+++ b/Documentation/device-mapper/cache-policies.rst
@@ -0,0 +1,131 @@
+=============================
+Guidance for writing policies
+=============================
+
+Try to keep transactionality out of it. The core is careful to
+avoid asking about anything that is migrating. This is a pain, but
+makes it easier to write the policies.
+
+Mappings are loaded into the policy at construction time.
+
+Every bio that is mapped by the target is referred to the policy.
+The policy can return a simple HIT or MISS or issue a migration.
+
+Currently there's no way for the policy to issue background work,
+e.g. to start writing back dirty blocks that are going to be evicted
+soon.
+
+Because we map bios, rather than requests it's easy for the policy
+to get fooled by many small bios. For this reason the core target
+issues periodic ticks to the policy. It's suggested that the policy
+doesn't update states (eg, hit counts) for a block more than once
+for each tick. The core ticks by watching bios complete, and so
+trying to see when the io scheduler has let the ios run.
+
+
+Overview of supplied cache replacement policies
+===============================================
+
+multiqueue (mq)
+---------------
+
+This policy is now an alias for smq (see below).
+
+The following tunables are accepted, but have no effect::
+
+ 'sequential_threshold <#nr_sequential_ios>'
+ 'random_threshold <#nr_random_ios>'
+ 'read_promote_adjustment <value>'
+ 'write_promote_adjustment <value>'
+ 'discard_promote_adjustment <value>'
+
+Stochastic multiqueue (smq)
+---------------------------
+
+This policy is the default.
+
+The stochastic multi-queue (smq) policy addresses some of the problems
+with the multiqueue (mq) policy.
+
+The smq policy (vs mq) offers the promise of less memory utilization,
+improved performance and increased adaptability in the face of changing
+workloads. smq also does not have any cumbersome tuning knobs.
+
+Users may switch from "mq" to "smq" simply by appropriately reloading a
+DM table that is using the cache target. Doing so will cause all of the
+mq policy's hints to be dropped. Also, performance of the cache may
+degrade slightly until smq recalculates the origin device's hotspots
+that should be cached.
+
+Memory usage
+^^^^^^^^^^^^
+
+The mq policy used a lot of memory; 88 bytes per cache block on a 64
+bit machine.
+
+smq uses 28bit indexes to implement its data structures rather than
+pointers. It avoids storing an explicit hit count for each block. It
+has a 'hotspot' queue, rather than a pre-cache, which uses a quarter of
+the entries (each hotspot block covers a larger area than a single
+cache block).
+
+All this means smq uses ~25bytes per cache block. Still a lot of
+memory, but a substantial improvement nontheless.
+
+Level balancing
+^^^^^^^^^^^^^^^
+
+mq placed entries in different levels of the multiqueue structures
+based on their hit count (~ln(hit count)). This meant the bottom
+levels generally had the most entries, and the top ones had very
+few. Having unbalanced levels like this reduced the efficacy of the
+multiqueue.
+
+smq does not maintain a hit count, instead it swaps hit entries with
+the least recently used entry from the level above. The overall
+ordering being a side effect of this stochastic process. With this
+scheme we can decide how many entries occupy each multiqueue level,
+resulting in better promotion/demotion decisions.
+
+Adaptability:
+The mq policy maintained a hit count for each cache block. For a
+different block to get promoted to the cache its hit count has to
+exceed the lowest currently in the cache. This meant it could take a
+long time for the cache to adapt between varying IO patterns.
+
+smq doesn't maintain hit counts, so a lot of this problem just goes
+away. In addition it tracks performance of the hotspot queue, which
+is used to decide which blocks to promote. If the hotspot queue is
+performing badly then it starts moving entries more quickly between
+levels. This lets it adapt to new IO patterns very quickly.
+
+Performance
+^^^^^^^^^^^
+
+Testing smq shows substantially better performance than mq.
+
+cleaner
+-------
+
+The cleaner writes back all dirty blocks in a cache to decommission it.
+
+Examples
+========
+
+The syntax for a table is::
+
+ cache <metadata dev> <cache dev> <origin dev> <block size>
+ <#feature_args> [<feature arg>]*
+ <policy> <#policy_args> [<policy arg>]*
+
+The syntax to send a message using the dmsetup command is::
+
+ dmsetup message <mapped device> 0 sequential_threshold 1024
+ dmsetup message <mapped device> 0 random_threshold 8
+
+Using dmsetup::
+
+ dmsetup create blah --table "0 268435456 cache /dev/sdb /dev/sdc \
+ /dev/sdd 512 0 mq 4 sequential_threshold 1024 random_threshold 8"
+ creates a 128GB large mapped device named 'blah' with the
+ sequential threshold set to 1024 and the random_threshold set to 8.