From 1fec6f13837263d5abc35a87a78999fd26eb580f Mon Sep 17 00:00:00 2001 From: Kent Overstreet Date: Fri, 22 Sep 2023 18:30:23 -0400 Subject: More website improvements - expand, reorg frontpage - update roadmap - new debugging page - new btree perf numbers Signed-off-by: Kent Overstreet --- Todo.mdwn | 157 -------------------------------------------------------------- 1 file changed, 157 deletions(-) delete mode 100644 Todo.mdwn (limited to 'Todo.mdwn') diff --git a/Todo.mdwn b/Todo.mdwn deleted file mode 100644 index 06fbdb5..0000000 --- a/Todo.mdwn +++ /dev/null @@ -1,157 +0,0 @@ -# TODO - -## Kernel developers - -### P0 features - - * Disk space accounting rework: now that we've got the btree write buffer, we - can store disk space accounting in btree keys (instead of the current hacky - mechanism where it's only stored in the journal). - - This will make it possible to store many more counters - meaning we'll be - able to store per-snapshot-id accounting. - - We also need to add separate counters for compressed/uncompressed size, and - broken out by compression type. - - It'd also be really nice to add more counters to the inode for - compressed/uncompressed size. Right now the inode just has `bi_sectors`, - which the ondisk size for fallocate purposes, i.e. discounting replication. - We could add something like - - bi_sectors_ondisk_compressed - bi_sectors_ondisk_uncompressed - -### P1 features - -### P2 features - - * When we're using compression, we end up wasting a fair amount of space on - internal fragmentation because compressed extents get rounded up to the - filesystem block size when they're written - usually 4k. It'd be really nice - if we could pack them in more efficiently - probably 512 byte sector - granularity. - - On the read side this is no big deal to support - we have to bounce - compressed extents anyways. The write side is the annoying part. The options - are: - * Buffer up writes when we don't have full blocks to write? Highly - problematic, not going to do this. - * Read modify write? Not an option for raw flash, would prefer it to not be - our only option - * Do data journalling when we don't have a full block to write? Possible - solution, we want data journalling anyways - - * Full data journalling - we're definitely going to want this for when the - journal is on an NVRAM device (also need to implement external journalling - (easy), and direct journal on NVRAM support (what's involved here?)). - - Would be good to get a simple implementation done and tested so we know what - the on disk format is going to be. - -## Developers - - * End user documentation needs a lot of work - complete man pages, etc. - - * bcachefs-tools needs some fleshing out in the --help department - -## Users - - * Benchmark bcachefs performance on different configurations: - (Note that these tests will have to be repeated in the future.) - * SSD - * HDD - * SSD + HDD - - * Update the website / Documentation - * Ask questions (so and we can update the documentation / website). - -## Wishlist - - * "Seed devices", hard-readonly devices that are CoWed from on write (btrfs - has this; useful for base devices for virtualization, among other things). - * Nonce-misuse-resistant authenticated encryption, such as [AES-SIV][AESSIV] - or HS1-SIV (Closes potential hole regarding nonce reuse and "external" - snapshots, as might happen to VMs or systems with externally-managed storage - like iSCSI). - * Some form of "secure delete" functionality. (However, see [this LWN article][secdel] - regarding implementation strategies and pitfalls). - * A simplified userspace API with no hierarchy, only blobs identified by - unique integer keys (eternaleye thinks this might be useful for - object-capability systems, such as [Robigalia][robigalia]). - * An API like the above, but supporting multiple streams per blob, possibly - with string identifiers (needs further examination, intent is to match the - [needs of CephFS for OSD backends][cephback]). - * More advanced caching algorithms; one potentially-relevant paper is - [Pannier: A Container-based Flash Cache for Compound Objects][pannier]. - * "Asymmetrical" compression algorithms, that support only decompression (XZ - is a nice candidate here, and would be a very good match for some seed - device use cases). - * RAID-6 with parity 3 or greater - could potentially use Andrea Mazzoleni's - [technique][triple-parity] for generating Cauchy matrices compatible with - Linux' current RAID-5 and RAID-6 formats, providing a clean upgrade path. - * "Inline" forward error correction, possibly using a fountain code like - [RaptorQ][RFC6330]. - * Support Trusted/Encrypted kernel keyring keys, in order to take advantage - of TPMs. - * Support for multiple key slots. - * LUKS2 has a new [keyslot system][LUKS2-keyslots] that better supports - two-factor auth and other external keying mechanisms. - * Ponder the ramifications of (and safe defaults for) compression in the - presence of encryption. - * Swap file support. - * Support (a subset of?) the Ext[234] attributes denoting special behaviors: - * `+c` for compressed files - * `+C` for disabling copy-on-write - * `+e` for extent-based storage (always set? btrfs doesn't set it...) - * `+E` for _displaying_ that a file is encrypted - * `+N` for _displaying_ that a file's contents are inlined into the inode - * `+s` for files that should be securely erased on delete - * `+u` for files that should permit being "undeleted" - * Others seem less relevant, but may be worth investigating. - * "Remote Subvolumes", subvolumes that reside on a subset of the filesystem's - physical devices and can be deactivated so that the physical devices can be - detached (if all subvolumes on them are remote and deactivated) without - unmounting the entire filesystem. - * Per-file allocation policies - highly useful for VM disks, potential zvol - killer. - * Conversion of multi-device filesystems (potentially relevant for btrfs) - * May be possible to use the new `getfsmap` ioctl; it seems to provide - device information, and it looks like it iterates the whole FS (perfect - for doing a whole-FS conversion, so it may even simplify single-device - cases). - * A fully-explicit mount syntax that does no scanning at all, possibly with - a purely-userspace mount.bcachefs helper for scanning that then invokes - the explicit form. Allows eliminating scanning/registration from the kernel. - * Strawman: mount -t bcachefs bcachefs -o dev=...,dev=... /mount/point - * Support the [`i_version` inode attribute][iversion], useful for NFSv4, - backup tools, indexers, and other things that want to have a counter - that updates whenever something changes about a file. - * Support DAX, to allow using NVM devices directly and reduce page-cache - pressure - * Caveat: May need careful thinking if the kernel assumes it can write - directly if DAX is supported; getting checksums invalidated would suck. - * Support converting filesystems not just to bcachefs volumes, but to image - files _on_ bcachefs volumes (and once reflinks are in, to both at once) - * Notably, if conversion made an image as the first step, this could allow - the subsequent extraction of files to bcachefs files a _completely - userspace process_, which merely reflinks ranges of content out. This - would also avoid the risks facing the current system, where the converter - notes where the data lives in the "inner" bcachefs image, and then the - content is moved out from under it (such as by GC, or btrfs rebalancing) - * Support "adopting" devices, by importing their content _into_ an existing - filesystem using the existing extents on the old device. In theory, could - supplant "adding a cache" in a minimally-invasive way. - * Add a bcachefs command to export a path as a block device (satisfy some - bcache use cases that bcachefs is clumsy for today, and in combination - with the above two items may make migration from other filesystems easier) - -[AESSIV]: https://tools.ietf.org/html/rfc5297 -[secdel]: https://lwn.net/Articles/462437/ -[robigalia]: https://robigalia.org -[cephback]: http://bryanapperson.com/blog/ceph-osd-performance/ -[pannier]: https://pdfs.semanticscholar.org/fa5f/3aa6de62e126e6fe2986c70a34e4d678860b.pdf -[triple-parity]: https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg28964.html -[RFC6330]: https://tools.ietf.org/html/rfc6330 -[LUKS2-keyslots]: https://fosdem.org/2018/schedule/event/cryptsetup/attachments/slides/2506/export/events/attachments/cryptsetup/slides/2506/fosdem18_cryptsetup_aead.pdf#page=24 -[iversion]: https://jtlayton.wordpress.com/2016/12/16/the-inode-i_version-counter-in-linux/ -- cgit v1.2.3