Diffstat (limited to 'Documentation/filesystems')
-rw-r--r-- Documentation/filesystems/bcachefs/SubmittingPatches.rst | 98
-rw-r--r-- Documentation/filesystems/bcachefs/index.rst             |  1
-rw-r--r-- Documentation/filesystems/fuse-io-uring.rst              | 99
-rw-r--r-- Documentation/filesystems/index.rst                      |  1
-rw-r--r-- Documentation/filesystems/locking.rst                    |  7
-rw-r--r-- Documentation/filesystems/porting.rst                    | 16
-rw-r--r-- Documentation/filesystems/vfs.rst                        | 24
7 files changed, 244 insertions, 2 deletions
diff --git a/Documentation/filesystems/bcachefs/SubmittingPatches.rst b/Documentation/filesystems/bcachefs/SubmittingPatches.rst
new file mode 100644
index 000000000000..026b12ae0d6a
--- /dev/null
+++ b/Documentation/filesystems/bcachefs/SubmittingPatches.rst
@@ -0,0 +1,98 @@
+Submitting patches to bcachefs:
+===============================
+
+Patches must be tested before being submitted, either with the xfstests suite
+[0], or the full bcachefs test suite in ktest [1], depending on what's being
+touched. Note that ktest wraps xfstests and will be an easier way to run it
+for most users; it includes single-command wrappers for all the mainstream
+in-kernel local filesystems.
+
+Patches will undergo more testing after being merged (including
+lockdep/kasan/preempt/etc. variants); these are not generally required to be
+run by the submitter, but do put some thought into what you're changing and
+which tests might be relevant. Doing tricky memory layout work? Try kasan.
+Doing locking work? Try lockdep. ktest includes single-command variants for
+the debug build types you'll most likely need.
+
+The exception to this rule is incomplete WIP/RFC patches: if you're working on
+something nontrivial, it's encouraged to send out a WIP patch to let people
+know what you're doing and make sure you're on the right track. Just make sure
+it includes a brief note as to what's done and what's incomplete, to avoid
+confusion.
+
+Rigorous checkpatch.pl adherence is not required (many of its warnings are
+considered out of date), but try not to deviate too much without reason.
+
+Focus on writing code that reads well and is organized well; code should be
+aesthetically pleasing.
+
+CI:
+===
+
+When running the full test suite, rather than running it locally it's
+preferable to let a server farm run it in parallel and present the results
+in a nice test dashboard (which can tell you which failures are new and
+shows results in a git log view, avoiding the need for most bisecting).
+
+That exists [2], and community members may request an account. If you work for
+a big tech company, you'll need to help out with server costs to get access -
+but the CI is not restricted to running bcachefs tests: it runs any ktest test
+(which generally makes it easy to wrap other tests that can run in qemu).
+
+Other things to think about:
+============================
+
+- How will we debug this code? Is there sufficient introspection to diagnose
+ when something starts acting wonky on a user machine?
+
+ We don't necessarily need every single field of every data structure visible
+ with introspection, but having the important fields of all the core data
+ types wired up makes debugging drastically easier - a bit of thoughtful
+ foresight greatly reduces the need to have people build custom kernels with
+ debug patches.
+
+ More broadly, think about all the debug tooling that might be needed.
+
+- Does it make the codebase more or less of a mess? Can we do some organizing
+ while we're at it?
+
+- Do new tests need to be written? New assertions? How do we know and verify
+ that the code is correct, and what happens if something goes wrong?
+
+ We don't yet have automated code coverage analysis or easy fault injection -
+ but for now, pretend we did and ask what they might tell us.
+
+ Assertions are hugely important, given that we don't yet have a systems
+ language that can do ergonomic embedded correctness proofs. Hitting an assert
+ in testing is much better than wandering off into undefined behaviour la-la
+ land - use them. Use them judiciously, and not as a replacement for proper
+ error handling, but use them.
+
+- Does it need to be performance tested? Should we add new performance counters?
+
+ bcachefs has a set of persistent runtime counters which can be viewed with
+ the 'bcachefs fs top' command; this should give users a basic idea of what
+ their filesystem is currently doing. If you're adding a new feature or
+ looking at old code, think about whether anything should be added.
+
+- If it's a new on-disk format feature, have upgrades and downgrades been
+ tested? (Automated tests exist but aren't in the CI, due to the hassle of
+ disk image management; coordinate to have them run.)
+
+Mailing list, IRC:
+==================
+
+Patches should hit the list [3], but much discussion and code review happens on
+IRC as well [4]; many people appreciate the more conversational approach and
+quicker feedback.
+
+Additionally, we have a lively user community doing excellent QA work, which
+exists primarily on IRC. Please make use of that resource; user feedback is
+important for any nontrivial feature, and documenting that feedback in commit
+messages is a good idea.
+
+[0]: git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git
+[1]: https://evilpiepirate.org/git/ktest.git/
+[2]: https://evilpiepirate.org/~testdashboard/ci/
+[3]: linux-bcachefs@vger.kernel.org
+[4]: irc.oftc.net#bcache, #bcachefs-dev
diff --git a/Documentation/filesystems/bcachefs/index.rst b/Documentation/filesystems/bcachefs/index.rst
index 95fc4b90739e..7db4d7ceab58 100644
--- a/Documentation/filesystems/bcachefs/index.rst
+++ b/Documentation/filesystems/bcachefs/index.rst
@@ -9,4 +9,5 @@ bcachefs Documentation
:numbered:
CodingStyle
+ SubmittingPatches
errorcodes
diff --git a/Documentation/filesystems/fuse-io-uring.rst b/Documentation/filesystems/fuse-io-uring.rst
new file mode 100644
index 000000000000..d73dd0dbd238
--- /dev/null
+++ b/Documentation/filesystems/fuse-io-uring.rst
@@ -0,0 +1,99 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=======================================
+FUSE-over-io-uring design documentation
+=======================================
+
+This document covers the basic details of how FUSE kernel/userspace
+communication through io-uring is configured and how it works. For
+generic details about FUSE, see fuse.rst.
+
+This document also covers the current interface, which is
+still in development and might change.
+
+Limitations
+===========
+As of now, not all request types are supported through io-uring; userspace
+must also handle requests through /dev/fuse after io-uring setup is
+complete. Specifically, these are notifications (initiated from the daemon
+side) and interrupts.
+
+Fuse io-uring configuration
+===========================
+
+Fuse kernel requests are queued through the classic /dev/fuse
+read/write interface until io-uring setup is complete.
+
+To set up fuse-over-io-uring, the fuse server (userspace) needs
+to submit SQEs (opcode = IORING_OP_URING_CMD) to the /dev/fuse
+connection file descriptor. The initial submission uses the subcommand
+FUSE_URING_CMD_REGISTER, which just registers entries as
+available in the kernel.
+
+Once at least one entry per queue has been submitted, the kernel starts
+to enqueue requests to the ring queues.
+Note that every CPU core has its own fuse-io-uring queue.
+Userspace handles the CQE/fuse request and submits the result with the
+subcommand FUSE_URING_CMD_COMMIT_AND_FETCH; the kernel completes
+the request and marks the entry available again. If there are
+pending requests waiting, the next one is immediately submitted
+to the daemon.
+
+Initial SQE
+-----------
+::
+
+ |                                      |  FUSE filesystem daemon
+ |                                      |
+ |                                      |  >io_uring_submit()
+ |                                      |  IORING_OP_URING_CMD /
+ |                                      |  FUSE_URING_CMD_REGISTER
+ |                                      |  [wait cqe]
+ |                                      |  >io_uring_wait_cqe() or
+ |                                      |  >io_uring_submit_and_wait()
+ |                                      |
+ |  >fuse_uring_cmd()                   |
+ |  >fuse_uring_register()              |
+
+Sending requests with CQEs
+--------------------------
+::
+
+ |                                      |  FUSE filesystem daemon
+ |                                      |  [waiting for CQEs]
+ |  "rm /mnt/fuse/file"                 |
+ |                                      |
+ |  >sys_unlink()                       |
+ |  >fuse_unlink()                      |
+ |  [allocate request]                  |
+ |  >fuse_send_one()                    |
+ |  ...                                 |
+ |  >fuse_uring_queue_fuse_req          |
+ |  [queue request on fg queue]         |
+ |  >fuse_uring_add_req_to_ring_ent()   |
+ |  ...                                 |
+ |  >fuse_uring_copy_to_ring()          |
+ |  >io_uring_cmd_done()                |
+ |  >request_wait_answer()              |
+ |  [sleep on req->waitq]               |
+ |                                      |  [receives and handles CQE]
+ |                                      |  [submit result and fetch next]
+ |                                      |  >io_uring_submit()
+ |                                      |  IORING_OP_URING_CMD /
+ |                                      |  FUSE_URING_CMD_COMMIT_AND_FETCH
+ |  >fuse_uring_cmd()                   |
+ |  >fuse_uring_commit_fetch()          |
+ |  >fuse_uring_commit()                |
+ |  >fuse_uring_copy_from_ring()        |
+ |  [ copy the result to the fuse req]  |
+ |  >fuse_uring_req_end()               |
+ |  >fuse_request_end()                 |
+ |  [wake up req->waitq]                |
+ |  >fuse_uring_next_fuse_req           |
+ |  [wait or handle next req]           |
+ |                                      |
+ |  [req->waitq woken up]               |
+ |  <fuse_unlink()                      |
+ |  <sys_unlink()                       |
+
+
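The queueing rules in fuse-io-uring.rst above can be modeled in plain userspace C. In this model, entries are handed over with REGISTER, and the ring is used only once every per-CPU queue has an entry. COMMIT_AND_FETCH then either hands the next pending request straight back to the daemon or marks the entry available again. This is a toy sketch under stated assumptions, not kernel or liburing code. Every `model_*` name is hypothetical, and two queues stand in for two CPUs. Delivering the next pending request on the same entry that was just committed is an assumption based on the `fuse_uring_next_fuse_req` step in the diagram.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Toy userspace model of the fuse-io-uring queueing rules. All names are
 * hypothetical; this is not kernel or liburing code.
 */
#define NR_QUEUES 2  /* assumption: one queue per CPU on a 2-CPU machine */
#define MAX_ENTS  4  /* ring entries per queue */
#define MAX_PEND  16 /* toy bound on the pending FIFO */

struct model_queue {
	bool registered[MAX_ENTS]; /* entry handed over via REGISTER */
	bool avail[MAX_ENTS];      /* entry idle, waiting for a request */
	int  req_on[MAX_ENTS];     /* request the entry currently carries */
	int  pending[MAX_PEND];    /* requests queued while no entry was idle */
	int  head, tail;           /* FIFO indices into pending[] */
};

static struct model_queue queues[NR_QUEUES];
static int completed; /* requests ended by commit-and-fetch */

/* Daemon side: FUSE_URING_CMD_REGISTER of entry ent on queue q. */
static void model_register(int q, int ent)
{
	queues[q].registered[ent] = true;
	queues[q].avail[ent] = true;
}

/* The ring is used only once every queue has at least one entry. */
static bool model_ring_ready(void)
{
	for (int q = 0; q < NR_QUEUES; q++) {
		bool any = false;

		for (int e = 0; e < MAX_ENTS; e++)
			any = any || queues[q].registered[e];
		if (!any)
			return false;
	}
	return true;
}

/* Hand req to the daemon on entry ent (stands in for posting a CQE). */
static void model_send(int q, int ent, int req)
{
	queues[q].avail[ent] = false;
	queues[q].req_on[ent] = req;
	printf("queue %d entry %d: request %d -> daemon\n", q, ent, req);
}

/*
 * Kernel side: queue req on queue q. Returns false if the request must
 * still take the classic /dev/fuse read/write path.
 */
static bool model_queue_req(int q, int req)
{
	struct model_queue *mq = &queues[q];

	if (!model_ring_ready())
		return false;
	for (int e = 0; e < MAX_ENTS; e++) {
		if (mq->avail[e]) {
			model_send(q, e, req);
			return true;
		}
	}
	assert(mq->tail < MAX_PEND); /* toy model: no overflow handling */
	mq->pending[mq->tail++] = req;
	return true;
}

/*
 * Daemon side: FUSE_URING_CMD_COMMIT_AND_FETCH on entry ent. The request
 * on ent is ended; the next pending request (if any) goes straight back
 * to the daemon on the same entry, otherwise the entry becomes available.
 */
static void model_commit_and_fetch(int q, int ent)
{
	struct model_queue *mq = &queues[q];

	completed++;
	if (mq->head < mq->tail)
		model_send(q, ent, mq->pending[mq->head++]);
	else
		mq->avail[ent] = true;
}
```

Calling `model_queue_req()` before every queue has registered an entry returns false, mirroring the fallback to /dev/fuse. Once entry 0 of queue 0 is busy, a second request on that queue waits in the FIFO until the daemon's next commit-and-fetch on that entry.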
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 44e9e77ffe0d..2636f2a41bd3 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -98,6 +98,7 @@ Documentation for filesystem implementations.
hpfs
fuse
fuse-io
+ fuse-io-uring
inotify
isofs
nilfs2
diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index f5e3676db954..d20a32b77b60 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -17,7 +17,8 @@ dentry_operations
prototypes::
- int (*d_revalidate)(struct dentry *, unsigned int);
+ int (*d_revalidate)(struct inode *, const struct qstr *,
+ struct dentry *, unsigned int);
int (*d_weak_revalidate)(struct dentry *, unsigned int);
int (*d_hash)(const struct dentry *, struct qstr *);
int (*d_compare)(const struct dentry *,
@@ -30,6 +31,8 @@ prototypes::
struct vfsmount *(*d_automount)(struct path *path);
int (*d_manage)(const struct path *, bool);
struct dentry *(*d_real)(struct dentry *, enum d_real_type type);
+ bool (*d_unalias_trylock)(const struct dentry *);
+ void (*d_unalias_unlock)(const struct dentry *);
locking rules:
@@ -49,6 +52,8 @@ d_dname: no no no no
d_automount: no no yes no
d_manage: no no yes (ref-walk) maybe
d_real no no yes no
+d_unalias_trylock yes no no no
+d_unalias_unlock yes no no no
================== =========== ======== ============== ========
inode_operations
diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
index c1c121055204..1639e78e3146 100644
--- a/Documentation/filesystems/porting.rst
+++ b/Documentation/filesystems/porting.rst
@@ -1141,3 +1141,19 @@ pointer are gone.
set_blocksize() takes opened struct file instead of struct block_device now
and it *must* be opened exclusive.
+
+---
+
+**mandatory**
+
+->d_revalidate() gets two extra arguments: the inode of the parent directory
+and the name our dentry is expected to have. Both are stable (the directory
+is pinned in the non-RCU case and will stay around during the call in the
+RCU case, and the name is guaranteed not to change). Your instance doesn't
+have to use either, but they often help avoid a lot of painful boilerplate.
+Note that while the string at name->name is stable and NUL-terminated, it
+may (and often will) have name->name[name->len] equal to '/' rather than
+'\0': in the normal case it points into the pathname being looked up.
+NOTE: if you need something like the full path from the filesystem root,
+you are still on your own - this assists with simple cases, but it's not
+magic.
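Because `name->name[name->len]` is often '/', a `->d_revalidate()` instance that compares the expected name with some other name has to bound the comparison by `name->len` rather than rely on a terminator. Below is a minimal userspace sketch. It assumes a simplified, hypothetical stand-in for `struct qstr` (`struct qstr_model`, with only `name` and `len`) and a hypothetical helper, `name_matches()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical userspace stand-in for the kernel's struct qstr, keeping
 * only the two fields discussed above (the real one also carries a hash).
 */
struct qstr_model {
	const char *name;
	unsigned int len;
};

/*
 * Hypothetical helper: does the expected name equal other[0..other_len)?
 * name->name[name->len] may be '/', so the comparison is bounded by len
 * and never depends on strcmp() finding a '\0' right after the name.
 */
static bool name_matches(const struct qstr_model *name,
			 const char *other, size_t other_len)
{
	return name->len == other_len &&
	       memcmp(name->name, other, other_len) == 0;
}
```

For example, with `struct qstr_model q = { "file/rest", 4 };`, `name_matches(&q, "file", 4)` is true. A `strcmp(q.name, "file")` check would instead see "file/rest" and report a mismatch.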
diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index 0b18af3f954e..31eea688609a 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -1251,7 +1251,8 @@ defined:
.. code-block:: c
struct dentry_operations {
- int (*d_revalidate)(struct dentry *, unsigned int);
+ int (*d_revalidate)(struct inode *, const struct qstr *,
+ struct dentry *, unsigned int);
int (*d_weak_revalidate)(struct dentry *, unsigned int);
int (*d_hash)(const struct dentry *, struct qstr *);
int (*d_compare)(const struct dentry *,
@@ -1264,6 +1265,8 @@ defined:
struct vfsmount *(*d_automount)(struct path *);
int (*d_manage)(const struct path *, bool);
struct dentry *(*d_real)(struct dentry *, enum d_real_type type);
+ bool (*d_unalias_trylock)(const struct dentry *);
+ void (*d_unalias_unlock)(const struct dentry *);
};
``d_revalidate``
@@ -1427,6 +1430,25 @@ defined:
For non-regular files, the 'dentry' argument is returned.
+``d_unalias_trylock``
+ if present, will be called by d_splice_alias() before moving a
+ preexisting attached alias. Returning false prevents __d_move(),
+ making d_splice_alias() fail with -ESTALE.
+
+ Rationale: setting FS_RENAME_DOES_D_MOVE will prevent d_move()
+ and d_exchange() calls from outside of filesystem methods;
+ however, it does not guarantee that attached dentries won't
+ be renamed or moved by d_splice_alias() finding a preexisting
+ alias for a directory inode. Normally we would not care;
+ however, something that wants to stabilize the entire path to
+ root over a blocking operation might need that. See 9p for one
+ (and hopefully only) example.
+
+``d_unalias_unlock``
+ should be paired with ``d_unalias_trylock``; it is called after the
+ __d_move() call in __d_unalias().
+
+
Each dentry has a pointer to its parent dentry, as well as a hash list
of child dentries. Child dentries are basically like files in a
directory.
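The `d_unalias_trylock`/`d_unalias_unlock` contract described above can be sketched as a single-threaded userspace model. Under that contract, d_splice_alias() tries the lock first, refuses the move with -ESTALE on failure, and unlocks after the move. Everything in the sketch is hypothetical. `struct fs_model` and its reader/writer counters stand in for a filesystem-private rename lock, and `splice_alias_model()` stands in for the unalias path of d_splice_alias(). The locking.rst hunk above lists both methods as called under rename_lock and not allowed to block. That is why the filesystem side is a trylock rather than a sleeping lock.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Toy model of the d_unalias_trylock()/d_unalias_unlock() contract.
 * All names are hypothetical; this is not kernel code.
 */
struct rename_sem_model {
	int readers;  /* users that need their path to root to stay put */
	bool writer;  /* held across an alias move */
};

struct fs_model {
	struct rename_sem_model sem;
	int moves;    /* aliases actually moved */
};

/* Called under rename_lock and must not block, hence a trylock. */
static bool fs_unalias_trylock(struct fs_model *fs)
{
	if (fs->sem.readers > 0 || fs->sem.writer)
		return false;
	fs->sem.writer = true;
	return true;
}

/* Paired with a successful fs_unalias_trylock(), after the move. */
static void fs_unalias_unlock(struct fs_model *fs)
{
	fs->sem.writer = false;
}

/*
 * Stand-in for the unalias path of d_splice_alias(): ask the filesystem
 * first, move the alias only if it agreed, then let it drop its lock.
 */
static int splice_alias_model(struct fs_model *fs)
{
	if (!fs_unalias_trylock(fs))
		return -ESTALE;   /* the move is refused */
	fs->moves++;          /* where __d_move() would run */
	fs_unalias_unlock(fs);
	return 0;
}
```

A filesystem with nothing to protect leaves both methods unset. In that case d_splice_alias() moves the alias without consulting it, which this model would represent by skipping the trylock/unlock pair.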