1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
|
# Bcachefs
Bcachefs is an advanced new filesystem for Linux, with an emphasis on
reliability and robustness: it's the COW filesystem that won't lose your data.
It has a long list of features, completed or in progress:
* Copy on write (COW) - like zfs or btrfs
* Full data and metadata checksumming
* Multiple devices, including replication and other types of RAID
* Caching
* Compression
* Encryption
* Snapshots
* Scalable - has been tested to 50+ TB, will eventually scale far higher
* Already working and stable, with a small community of users
We prioritize robustness and reliability over features and hype: we make every
effort to ensure you won't lose data. It's building on top of a codebase with a
pedigree - bcache already has a reasonably good track record for reliability
(particularly considering how young upstream bcache is, in terms of engineer
man/years). Starting from there, bcachefs development has prioritized
incremental development, and keeping things stable, and aggressively fixing
design issues as they are found; the bcachefs codebase is considerably more
robust and mature than upstream bcache.
Developing a filesystem is also not cheap or quick or easy; we need funding!
Please chip in on [[Patreon|https://www.patreon.com/bcachefs]] - the Patreon
page also has more information on the motivation for bcachefs and the state of
Linux filesystems, as well as some bcachefs status updates and information on
development.
If you don't want to use Patreon, I'm also happy to take donations via paypal:
kent.overstreet@gmail.com.
Join us in the bcache IRC channel, we have a small group of bcachefs users and
testers there: #bcache on OFTC (irc.oftc.net).
## Getting started
Bcachefs is not yet upstream - you'll have to build a kernel to use it.
First, check out the bcache kernel and tools repositories:
git clone https://evilpiepirate.org/git/bcachefs.git
git clone https://evilpiepirate.org/git/bcachefs-tools.git
Build and install as usual - make sure you enable `CONFIG_BCACHE_FS` Then, to
format and mount a single device with the default options, run:
bcachefs format /dev/sda1
mount -t bcachefs /dev/sda1 /mnt
See `bcachefs format --help` for more options.
## Documentation
End user documentation is currently fairly minimal; this would be a very helpful
area for anyone who wishes to contribute - I would like the bcache man page in
the bcache-tools repository to be rewritten and expanded.
## Status
Bcachefs can currently be considered beta quality. It has a small pool of
outside users and has been stable for quite some time now; there's no reason
to expect issues as long as you stick to the currently supported feature set.
However, given that it's still under active development backups are a good idea.
It's been passing all xfstests for well over a year.
Performance is generally quite good - generally faster than btrfs, and not far
behind xfs/ext4. On metadata intensive benchmarks, it's often considerably
faster than xfs/ext4/btrfs.
Normal posix filesystem functionality is all finished - if you're using bcachefs
as a replacement for ext4 on a desktop, you shouldn't find anything missing. For
servers, NFS export support is still missing (but coming soon) and we don't yet
support quotas (probably further off).
Up until bcachefs goes upstream I reserve the right to change the on disk format
if necessary, but I'm not expecting any more incompatible disk format changes.
### Feature status
- Full data checksumming
Fully supported and enabled by default. We do need to implement scrubbing,
once we've got replication and can take advantage of it.
- Compression
Not _quite_ finished - it's safe to enable, but there's some work left
related to copy GC before we can enable free space accounting based on
compressed size: right now, enabling compression won't actually let you store
any more data in your filesystem than if the data was uncompressed
- Tiering/writeback caching:
Bcachefs allows you to assign devices to different tiers - the faster tier
will effectively be used as a writeback cache for the slower tier, and
metadata will be pinned in the faster tier.
Basic tiering functionality works, but it's not (yet) as configurable as
bcache's caching (e.g. you can't specify writethrough caching).
- Replication
All the core functionality is complete, and it's getting close to usable: you
can create a multi device filesystem with replication, and then while the
filesystem is in use take one device offline without any loss of
availability.
- [[Encryption]]
Whole filesystem AEAD style encryption (with ChaCha20 and Poly1305) is done
and merged. I would suggest not relying on it for anything critical until the
code has seen more outside review, though.
- Snapshots
Snapshot implementation has been started, but snapshots are by far the most
complex of the remaining features to implement - it's going to be quite
awhile before I can dedicate enough time to finishing them, but I'm very much
looking forward to showing off what it'll be able to do.
### Known issues/caveats
- Mount time
We currently walk all metadata at mount time (multiple times, in fact) - on
flash this shouldn't even be noticeable unless your filesystem is very large,
but on rotating disk expect mount times to be slow.
This will be addressed in the future - mount times will likely be the next
big push after the next big batch of on disk format changes.
## Todo list
### Current priorities:
* Replication
* Compression is almost done: it's quite thoroughly tested, the only remaining
issue is a problem with copygc fragmenting existing compressed extents that
only breaks accounting.
* NFS export support is almost done: implementing i_generation correctly
required some new transaction machinery, but that's mostly done. What's left
is implementing a new kind of reservation of journal space for the new, long
running transactions.
### Other wishlist items:
* When we're using compression, we end up wasting a fair amount of space on
internal fragmentation because compressed extents get rounded up to the
filesystem block size when they're written - usually 4k. It'd be really nice
if we could pack them in more efficiently - probably 512 byte sector
granularity.
On the read side this is no big deal to support - we have to bounce
compressed extents anyways. The write side is the annoying part. The options
are:
* Buffer up writes when we don't have full blocks to write? Highly
problematic, not going to do this.
* Read modify write? Not an option for raw flash, would prefer it to not be
our only option
* Do data journalling when we don't have a full block to write? Possible
solution, we want data journalling anyways
* Inline extents - good for space efficiency for both small files, and
compression when extents happen to compress particularly well.
* Full data journalling - we're definitely going to want this for when the
journal is on an NVRAM device (also need to implement external journalling
(easy), and direct journal on NVRAM support (what's involved here?)).
Would be good to get a simple implementation done and tested so we know what
the on disk format is going to be.
|