From e84009336711d2bba885fc9cea66348ddfce3758 Mon Sep 17 00:00:00 2001
From: Al Viro
Date: Thu, 31 Oct 2019 01:43:31 -0400
Subject: fix dget_parent() fastpath race

We are overoptimistic about taking the fast path there; seeing the same
value in ->d_parent after having grabbed a reference to that parent does
*not* mean that it has remained our parent all along.

That wouldn't be a big deal (in the end it is our parent and we have
grabbed the reference we are about to return), but... the situation with
barriers is messed up.

We might have hit the following sequence:

	d is a dentry of /tmp/a/b

	CPU1:                                           CPU2:
	parent = d->d_parent (i.e. dentry of /tmp/a)
	                                                rename /tmp/a/b to /tmp/b
	                                                rmdir /tmp/a, making its dentry negative
	grab reference to parent, end up with
	cached parent->d_inode (NULL)
	                                                mkdir /tmp/a, rename /tmp/b to /tmp/a/b
	recheck d->d_parent, which is back to original
	decide that everything's fine and return the
	reference we'd got.

The trouble is, the caller (on CPU1) will observe dget_parent() returning
an apparently negative dentry.  It actually is positive, but CPU1 has a
stale ->d_inode cached.

Use d->d_seq to see if it has been moved instead of rechecking ->d_parent.

NOTE: we are *NOT* going to retry on any kind of ->d_seq mismatch; we just
go into the slow path in that case.  We don't wait for ->d_seq to become
even either - again, if we are racing with renames, we can bloody well go
to the slow path anyway.

Signed-off-by: Al Viro
---
 fs/dcache.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

(limited to 'fs/dcache.c')

diff --git a/fs/dcache.c b/fs/dcache.c
index e88cf0554e65..b2a7f1765f0b 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -903,17 +903,19 @@ struct dentry *dget_parent(struct dentry *dentry)
 {
 	int gotref;
 	struct dentry *ret;
+	unsigned seq;
 
 	/*
 	 * Do optimistic parent lookup without any
 	 * locking.
 	 */
 	rcu_read_lock();
+	seq = raw_seqcount_begin(&dentry->d_seq);
 	ret = READ_ONCE(dentry->d_parent);
 	gotref = lockref_get_not_zero(&ret->d_lockref);
 	rcu_read_unlock();
 	if (likely(gotref)) {
-		if (likely(ret == READ_ONCE(dentry->d_parent)))
+		if (!read_seqcount_retry(&dentry->d_seq, seq))
 			return ret;
 		dput(ret);
 	}
-- cgit v1.2.3
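For readers without a kernel tree handy, the sketch below models the reworked
fastpath with plain C11 atomics and a pthread mutex.  It is not kernel code:
fake_dentry, model_dget_parent() and the other names are invented for the
sketch, the slow path is reduced to a single mutex, and the seqcount helpers
are only approximated (an acquire load stands in for raw_seqcount_begin(), an
acquire fence for the smp_rmb() inside read_seqcount_retry()).  What it shows
is the pattern the patch switches to: snapshot the per-dentry sequence count,
read ->d_parent, take a reference, and drop to the slow path on *any* sequence
change rather than retrying.

/*
 * Userspace model of the reworked dget_parent() fastpath -- NOT kernel
 * code; all names are invented for illustration.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

struct fake_dentry {
    _Atomic unsigned int d_seq;          /* odd while a move is in progress */
    struct fake_dentry *_Atomic d_parent;
    _Atomic int d_count;
};

static pthread_mutex_t model_rename_lock = PTHREAD_MUTEX_INITIALIZER;

/* Rough analogue of raw_seqcount_begin(): mask off the low bit so a
 * snapshot taken while a writer is active can never match the recheck. */
static unsigned int model_seq_begin(struct fake_dentry *d)
{
    return atomic_load_explicit(&d->d_seq, memory_order_acquire) & ~1u;
}

/* Rough analogue of read_seqcount_retry(); the acquire fence stands in
 * for smp_rmb(), keeping the earlier reads ahead of the re-read. */
static bool model_seq_retry(struct fake_dentry *d, unsigned int seq)
{
    atomic_thread_fence(memory_order_acquire);
    return atomic_load_explicit(&d->d_seq, memory_order_relaxed) != seq;
}

/* Stand-in for lockref_get_not_zero(): refuse a dentry already at zero. */
static bool model_ref_get_not_zero(struct fake_dentry *d)
{
    int c = atomic_load_explicit(&d->d_count, memory_order_relaxed);

    while (c > 0)
        if (atomic_compare_exchange_weak(&d->d_count, &c, c + 1))
            return true;
    return false;
}

static void model_ref_put(struct fake_dentry *d)   /* stand-in for dput() */
{
    atomic_fetch_sub_explicit(&d->d_count, 1, memory_order_relaxed);
}

/* Slow-path stand-in: the kernel takes ->d_lock and loops; one mutex
 * shared with the (not shown) rename side is enough for the model. */
static struct fake_dentry *model_dget_parent_slow(struct fake_dentry *d)
{
    struct fake_dentry *parent;

    pthread_mutex_lock(&model_rename_lock);
    parent = atomic_load_explicit(&d->d_parent, memory_order_relaxed);
    atomic_fetch_add_explicit(&parent->d_count, 1, memory_order_relaxed);
    pthread_mutex_unlock(&model_rename_lock);
    return parent;
}

static struct fake_dentry *model_dget_parent(struct fake_dentry *d)
{
    unsigned int seq = model_seq_begin(d);
    /* ~READ_ONCE(d->d_parent); the kernel relies on RCU to keep *parent
     * alive here, while the model's objects simply never go away. */
    struct fake_dentry *parent =
        atomic_load_explicit(&d->d_parent, memory_order_relaxed);

    if (model_ref_get_not_zero(parent)) {
        if (!model_seq_retry(d, seq))
            return parent;        /* d_seq untouched: d was never moved */
        model_ref_put(parent);    /* raced with a rename: undo, go slow */
    }
    return model_dget_parent_slow(d);
}

int main(void)
{
    struct fake_dentry root = { .d_count = 1 };
    struct fake_dentry child = { .d_count = 1 };

    atomic_store(&child.d_parent, &root);
    printf("parent refcount now %d\n",
           atomic_load(&model_dget_parent(&child)->d_count));
    return 0;
}

Bailing straight into the slow path on a mismatch keeps the fast path free of
retry loops; as the commit message notes, if we are racing with renames there
is no point spinning on ->d_seq.
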
From 2fa6b1e01a9b1a54769c394f06cd72c3d12a2d48 Mon Sep 17 00:00:00 2001
From: Al Viro
Date: Tue, 12 Nov 2019 16:13:06 -0500
Subject: fs/namei.c: fix missing barriers when checking positivity

Pinned negative dentries can, generally, be made positive by another
thread.  Conditions that prevent that are
	* ->d_lock on the dentry in question
	* parent directory held at least shared
	* nobody else could have observed the address of the dentry
Most of the places working with those fall into one of those categories;
however, d_lookup() and friends need to be used with some care.
Fortunately, there are not a lot of call sites, and with few exceptions
all of those fall under one of the cases above.

Exceptions are all in fs/namei.c - in lookup_fast(), lookup_dcache() and
mountpoint_last().  Another one is lookup_slow() - there the dcache lookup
is done with the parent held shared, but the result is used after we'd
dropped the lock.  The same happens in do_last() - the lookup (in
lookup_one()) is done with the parent locked, but the result is used after
unlocking.

lookup_fast(), do_last() and mountpoint_last() flat-out reject negatives.

Most of the lookup_dcache() calls are made with the parent locked at least
shared; the only exception is lookup_one_len_unlocked().  It might return
a pinned negative, which needs serious care from callers.  Fortunately,
almost nobody calls it directly anymore; all but two callers have
converted to lookup_positive_unlocked(), which rejects negatives.

lookup_slow() is called by the same lookup_one_len_unlocked() (see above),
mountpoint_last() and walk_component().  In those two, negatives are
rejected.

In other words, there is a small set of places where we need to check
carefully whether a pinned, potentially negative dentry is, in fact,
positive.  After that check we want to be sure that both ->d_inode and the
type bits in ->d_flags are stable and observed.  The set consists of
follow_managed() (where the rejection happens for lookup_fast(),
walk_component() and do_last()), mountpoint_last() and
lookup_positive_unlocked().

Solution:
1) the transition from negative to positive (in __d_set_inode_and_type())
stores ->d_inode, then uses smp_store_release() to set the type bits in
->d_flags.
2) the aforementioned 3 places in fs/namei.c fetch ->d_flags with
smp_load_acquire() and bugger off if its type bits say "negative".

That way anyone downstream of those checks has the dentry known positive
and pinned, with ->d_inode and the type bits of ->d_flags stable and
observed.

I considered splitting off d_lookup_positive(), so that the checks could
be done right there, under ->d_lock.  However, that leads to massive
duplication of rather subtle code in fs/namei.c and fs/dcache.c.  It's
worse than it might seem, thanks to autofs ->d_manage() getting involved
;-/  No matter what, autofs_d_manage()/autofs_d_automount() must live with
the possibility of a pinned negative dentry passed their way, becoming
positive under them - that's the intended behaviour when a lookup comes in
the middle of an automount in progress, so we can't keep them out of the
area that has to deal with those, more's the pity...

Reported-by: Ritesh Harjani
Signed-off-by: Al Viro
---
 fs/dcache.c | 2 +-
 fs/namei.c  | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

(limited to 'fs/dcache.c')

diff --git a/fs/dcache.c b/fs/dcache.c
index b2a7f1765f0b..a6d6b5f95f62 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -319,7 +319,7 @@ static inline void __d_set_inode_and_type(struct dentry *dentry,
 	flags = READ_ONCE(dentry->d_flags);
 	flags &= ~(DCACHE_ENTRY_TYPE | DCACHE_FALLTHRU);
 	flags |= type_flags;
-	WRITE_ONCE(dentry->d_flags, flags);
+	smp_store_release(&dentry->d_flags, flags);
 }
 
 static inline void __d_clear_type_and_inode(struct dentry *dentry)
diff --git a/fs/namei.c b/fs/namei.c
index 6f72fb7ef5ad..117950657e63 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1220,7 +1220,7 @@ static int follow_managed(struct path *path, struct nameidata *nd)
 	/* Given that we're not holding a lock here, we retain the value in a
 	 * local variable for each dentry as we look at it so that we don't see
 	 * the components of that value change under us */
-	while (flags = READ_ONCE(path->dentry->d_flags),
+	while (flags = smp_load_acquire(&path->dentry->d_flags),
 	       unlikely(flags & DCACHE_MANAGED_DENTRY)) {
 		/* Allow the filesystem to manage the transit without i_mutex
 		 * being held. */
@@ -2569,7 +2569,7 @@ struct dentry *lookup_positive_unlocked(const char *name,
 				       struct dentry *base, int len)
 {
 	struct dentry *ret = lookup_one_len_unlocked(name, base, len);
-	if (!IS_ERR(ret) && d_is_negative(ret)) {
+	if (!IS_ERR(ret) && d_flags_negative(smp_load_acquire(&ret->d_flags))) {
 		dput(ret);
 		ret = ERR_PTR(-ENOENT);
 	}
@@ -2671,7 +2671,7 @@ mountpoint_last(struct nameidata *nd)
 			return PTR_ERR(path.dentry);
 		}
 	}
-	if (d_is_negative(path.dentry)) {
+	if (d_flags_negative(smp_load_acquire(&path.dentry->d_flags))) {
 		dput(path.dentry);
 		return -ENOENT;
 	}
-- cgit v1.2.3
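The barrier pairing in this second patch can also be shown in miniature.  The
sketch below is a userspace model in C11 atomics, not the kernel
implementation: model_dentry, model_set_inode_and_type(), MODEL_ENTRY_TYPE and
the flag values are all invented placeholders.  It demonstrates the ordering
the commit message describes: the writer publishes ->d_inode first and then
release-stores the type bits into ->d_flags, so a reader that acquire-loads
->d_flags and sees "positive" type bits is guaranteed to also see the inode
pointer.

/*
 * Userspace model of the publish/observe pairing above -- NOT kernel
 * code; names and flag values are invented for the sketch.
 */
#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>

#define MODEL_ENTRY_TYPE   0x00700000u  /* pretend type-bit mask */
#define MODEL_MISS_TYPE    0x00000000u  /* all-zero type bits == negative */
#define MODEL_REGULAR_TYPE 0x00100000u

struct model_inode { long ino; };

struct model_dentry {
    _Atomic unsigned int d_flags;
    struct model_inode *_Atomic d_inode;
};

/* Analogue of __d_set_inode_and_type(): publish ->d_inode before the
 * type bits that announce "this dentry is positive".  A single updater
 * is assumed, as in the kernel where this runs under ->d_lock. */
static void model_set_inode_and_type(struct model_dentry *d,
                                     struct model_inode *inode,
                                     unsigned int type_flags)
{
    unsigned int flags;

    atomic_store_explicit(&d->d_inode, inode, memory_order_relaxed);
    flags = atomic_load_explicit(&d->d_flags, memory_order_relaxed);
    flags = (flags & ~MODEL_ENTRY_TYPE) | type_flags;
    atomic_store_explicit(&d->d_flags, flags, memory_order_release);
}

/* Analogue of the d_flags_negative(smp_load_acquire(...)) checks: bail
 * out on a negative, otherwise the acquire load guarantees the
 * subsequent ->d_inode read cannot observe a stale NULL. */
static struct model_inode *model_get_inode_if_positive(struct model_dentry *d)
{
    unsigned int flags = atomic_load_explicit(&d->d_flags, memory_order_acquire);

    if ((flags & MODEL_ENTRY_TYPE) == MODEL_MISS_TYPE)
        return NULL;                         /* still negative */
    return atomic_load_explicit(&d->d_inode, memory_order_relaxed);
}

int main(void)
{
    static struct model_inode ino = { .ino = 42 };
    static struct model_dentry dent;   /* starts negative: flags 0, inode NULL */

    printf("before: %p\n", (void *)model_get_inode_if_positive(&dent));
    model_set_inode_and_type(&dent, &ino, MODEL_REGULAR_TYPE);
    printf("after:  %ld\n", model_get_inode_if_positive(&dent)->ino);
    return 0;
}

With a plain store of ->d_flags (the old WRITE_ONCE()), nothing would prevent
a reader from observing the positive type bits while still seeing a NULL
->d_inode, which is exactly the class of mis-ordering both patches close.
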