# IO Tunables, Options, and Knobs

## What there is

These IO tunables can be set on a per-filesystem basis in
`/sys/fs/bcache/<FSUUID>`. Some can be overridden on a per-inode basis via
extended attributes; the xattr name is listed below for each such option.

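For a concrete sketch of the mechanism: a per-filesystem option can be read
and changed at runtime through sysfs. The mount point, the UUID lookup, and
the exact sysfs file layout below are assumptions and may differ between
kernel versions.

    # Find the filesystem UUID (assuming it is mounted at /mnt).
    FSUUID=$(findmnt -no UUID /mnt)

    # List the available tunables, read one, then change it at runtime.
    ls /sys/fs/bcache/$FSUUID/
    cat /sys/fs/bcache/$FSUUID/data_checksum
    echo crc64 > /sys/fs/bcache/$FSUUID/data_checksum
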
### Metadata tunables

- `metadata_checksum`
    - What checksum algorithm to use for metadata
    - Default: `crc32c`
    - Valid values: `none`, `crc32c`, `crc64`
- `metadata_replicas`
    - The number of replicas to keep for metadata
    - Default: `1`
- `metadata_replicas_required`
    - The minimum number of metadata replicas that must be written for a
      write to succeed (e.g. while running degraded)
    - Mount-only (not in sysfs)
    - Default: `1`
- `str_hash`
    - The hash algorithm to use for dentries and xattrs
    - Default: `siphash`
    - Valid values: `crc32c`, `crc64`, `siphash`

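Because `metadata_replicas_required` is mount-only, it has to be passed at
mount time rather than written through sysfs. A minimal sketch, with
hypothetical device names and mount point:

    # Keep 2 metadata replicas, but tolerate a minimum of 1 (e.g. while a
    # device is missing). Member devices are given colon-separated.
    mount -t bcachefs \
        -o metadata_replicas=2,metadata_replicas_required=1 \
        /dev/sdb:/dev/sdc /mnt
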
### Data tunables

- `data_checksum`
    - What checksum algorithm to use for data
    - Extended attribute: `bcachefs.data_checksum`
    - Default: `crc32c`
    - Valid values: `none`, `crc32c`, `crc64`
- `data_replicas`
    - The number of replicas to keep for data
    - Extended attribute: `bcachefs.data_replicas`
    - Default: `1`
- `data_replicas_required`
    - The minimum number of data replicas that must be written for a write
      to succeed (e.g. while running degraded)
    - Mount-only (not in sysfs)
    - Default: `1`

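Per-inode overrides use the extended attribute names listed above. A sketch,
with a hypothetical file path:

    # Give one important file stronger checksums and an extra replica,
    # overriding the filesystem-wide defaults.
    setfattr -n bcachefs.data_checksum -v crc64 /mnt/db/journal
    setfattr -n bcachefs.data_replicas -v 2 /mnt/db/journal

    # Inspect all bcachefs.* attributes on the file.
    getfattr -d -m '^bcachefs\.' /mnt/db/journal
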
### Foreground tunables

- `compression`
    - What compression algorithm to use for foreground writes
    - Extended attribute: `bcachefs.compression`
    - Default: `none`
    - Valid values: `none`, `lz4`, `gzip`, `zstd`
- `foreground_target`
    - What disk group foreground writes should prefer (may use other disks if
      sufficient replicas are not available in-group)
    - Extended attribute: `bcachefs.foreground_target`

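For example, latency-sensitive data could be kept uncompressed and pointed at
a fast disk group on the foreground path. A sketch; the `ssd` group name and
the directory are hypothetical, and this assumes options set on a directory
apply to files created under it:

    # Prefer the (hypothetical) "ssd" disk group for foreground writes under
    # this directory, and skip foreground compression entirely.
    setfattr -n bcachefs.foreground_target -v ssd /mnt/db
    setfattr -n bcachefs.compression -v none /mnt/db
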
### Background tunables

- `background_compression`
    - What compression algorithm to recompress to in the background
    - Extended attribute: `bcachefs.background_compression`
    - Default: `none`
    - Valid values: `none`, `lz4`, `gzip`, `zstd`
- `background_target`
    - What disk group data should be written back to in the background (may
      use other disks if sufficient replicas are not available in-group)
    - Extended attribute: `bcachefs.background_target`

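Filesystem-wide, the background options pair naturally with a larger, slower
disk group: data lands on the foreground target and is later recompressed and
written back. A sketch reusing the sysfs path from above, with a hypothetical
`hdd` group:

    # Recompress with zstd and migrate data to the "hdd" group in the
    # background.
    echo zstd > /sys/fs/bcache/$FSUUID/background_compression
    echo hdd > /sys/fs/bcache/$FSUUID/background_target
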
### Promote (cache) tunables

- `promote_target`
    - What disk group data should be copied to when frequently accessed
    - Extended attribute: `bcachefs.promote_target`

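Combined with the foreground and background targets above, this gives a
writeback-cache style layout: writes land on the fast group, cold data
migrates to the slow group, and frequently read data is promoted back. A
sketch with the same hypothetical group names:

    # Promote frequently read data back onto the "ssd" group.
    echo ssd > /sys/fs/bcache/$FSUUID/promote_target
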
## What would be nice

- Different replication/RAID settings between foreground, background, and promote
    - For example, replicated foreground across all SSDs (durability with low
      latency), RAID-6 background (sufficient durability, optimal space
      overhead per durability), and RAID-0/single promote (maximum performance,
      minimum cache footprint)

- Different compression for promote
    - For example, uncompressed foreground (minimize latency), zstd background
      (minimize size), and lz4 promote (save cache space, but prioritize speed)

- Fine-grained tuning of stripes vs. replicas vs. parity?
    - Would allow configuring N stripes + M parity, so trading off throughput
      (more stripes) vs. durability (more parity or replicas) vs. space
      efficiency (fewer replicas, higher stripes/parity ratio)
    - Well defined: (N stripes, R replicas, P parity) gives N * R + P total
      chunks, with each stripe replicated R times and P parity chunks
      available for reconstruction when the replicas are insufficient (see
      the worked example at the end of this page).
    - Can be implemented with P <= 2 for now, possibly extended later using
      Andrea Mazzoleni's compatible approach (see [[Todo]], wishlist section)

- Writethrough caching
    - Would this basically obviate 'foreground' as a category, writing
      directly into the background and promote targets instead?
    - What constraints does this place on (say) parity, given that the plan
      for it seems to be to restripe and pack as a migration action?
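
As a worked example of the chunk counts above: N = 4 stripes, R = 1 replica,
P = 2 parity gives 4 * 1 + 2 = 6 total chunks and survives the loss of any
two chunks (a RAID-6-like layout), while N = 4, R = 2, P = 0 gives 4 * 2 = 8
chunks (pure replication). The first spends less space per unit of
durability; the second avoids parity reconstruction after a failure.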