Age | Commit message (Collapse) | Author |
|
commit eea2fb4851e9dcbab6b991aaf47e2e024f1f55a0 upstream.
When mounting overlayfs it needs a clean "work" directory under the
supplied workdir.
Previously the mount code removed this directory if it already existed and
created a new one. If the removal failed (e.g. directory was not empty)
then it fell back to a read-only mount not using the workdir.
While this has never been reported, it is possible to get a non-empty
"work" dir from a previous mount of overlayfs in case of crash in the
middle of an operation using the work directory.
In this case the left over state should be discarded and the overlay
filesystem will be consistent, guaranteed by the atomicity of operations on
moving to/from the workdir to the upper layer.
This patch implements cleaning out any files left in workdir. It is
implemented using real recursion for simplicity, but the depth is limited
to 2, because the worst case is that of a directory containing whiteouts
under "work".
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: SZ Lin (林上智) <sz.lin@moxa.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3fe6e52f062643676eb4518d68cee3bc1272091b upstream.
In user namespace the whiteout creation fails with -EPERM because the
current process isn't capable(CAP_SYS_ADMIN) when setting xattr.
A simple reproducer:
$ mkdir upper lower work merged lower/dir
$ sudo mount -t overlay overlay -olowerdir=lower,upperdir=upper,workdir=work merged
$ unshare -m -p -f -U -r bash
Now as root in the user namespace:
\# touch merged/dir/{1,2,3} # this will force a copy up of lower/dir
\# rm -fR merged/*
This ends up failing with -EPERM after the files in dir has been
correctly deleted:
unlinkat(4, "2", 0) = 0
unlinkat(4, "1", 0) = 0
unlinkat(4, "3", 0) = 0
close(4) = 0
unlinkat(AT_FDCWD, "merged/dir", AT_REMOVEDIR) = -1 EPERM (Operation not
permitted)
Interestingly, if you don't place files in merged/dir you can remove it,
meaning if upper/dir does not exist, creating the char device file works
properly in that same location.
This patch uses ovl_sb_creator_cred() to get the cred struct from the
superblock mounter and override the old cred with these new ones so that
the whiteout creation is possible because overlay is wrong in assuming that
the creds it will get with prepare_creds will be in the initial user
namespace. The old cap_raise game is removed in favor of just overriding
the old cred struct.
This patch also drops from ovl_copy_up_one() the following two lines:
override_cred->fsuid = stat->uid;
override_cred->fsgid = stat->gid;
This is because the correct uid and gid are taken directly with the stat
struct and correctly set with ovl_set_attr().
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: SZ Lin (林上智) <sz.lin@moxa.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 56656e960b555cb98bc414382566dcb59aae99a2 upstream.
The 'is_merge' is an historical naming from when only a single lower layer
could exist. With the introduction of multiple lower layers the meaning of
this flag was changed to mean only the "lowest layer" (while all lower
layers were being merged).
So now 'is_merge' is inaccurate and hence renaming to 'is_lowest'
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: SZ Lin (林上智) <sz.lin@moxa.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 45aebeaf4f67468f76bedf62923a576a519a9b68 upstream.
In some instances xfs has been created with ftype=0 and there if a file
on lower fs is removed, overlay leaves a whiteout in upper fs but that
whiteout does not get filtered out and is visible to overlayfs users.
And reason it does not get filtered out because upper filesystem does
not report file type of whiteout as DT_CHR during iterate_dir().
So it seems to be a requirement that upper filesystem support d_type for
overlayfs to work properly. Do this check during mount and fail if d_type
is not supported.
Suggested-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: SZ Lin (林上智) <sz.lin@moxa.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d796e77f1dd541fe34481af2eee6454688d13982 upstream.
As a writable mount, it is not expected for overlayfs to return
EINVAL/EROFS for fsync, even if dir/file is not changed.
This commit fixes the case of fsync of directory, which is easier to
address, because overlayfs already implements fsync file operation for
directories.
The problem reported by Raphael is that new PostgreSQL 10.0 with a
database in overlayfs where lower layer in squashfs fails to start.
The failure is due to fsync error, when PostgreSQL does fsync on all
existing db directories on startup and a specific directory exists
lower layer with no changes.
Reported-by: Raphael Hertzog <raphael@ouaza.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Tested-by: Raphaël Hertzog <hertzog@debian.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 84889d49335627bc770b32787c1ef9ebad1da232 upstream.
This patch fixes kernel crash at removing directory which contains
whiteouts from lower layers.
Cache of directory content passed as "list" contains entries from all
layers, including whiteouts from lower layers. So, lookup in upper dir
(moved into work at this stage) will return negative entry. Plus this
cache is filled long before and we can race with external removal.
Example:
mkdir -p lower0/dir lower1/dir upper work overlay
touch lower0/dir/a lower0/dir/b
mknod lower1/dir/a c 0 0
mount -t overlay none overlay -o lowerdir=lower1:lower0,upperdir=upper,workdir=work
rm -fr overlay/dir
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
If jffs2 can deadlock on overlayfs readdir because it takes the same lock
on ->iterate() as in ->lookup().
Fix by moving whiteout checking outside iterate_dir(). Optimized by
collecting potential whiteouts (DT_CHR) in a temporary list and if
non-empty iterating throug these and checking for a 0/0 chardev.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Fixes: 49c21e1cacd7 ("ovl: check whiteout while reading directory")
Reported-by: Roman Yeryomin <leroi.lists@gmail.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs into for-next
|
|
Since the ovl_dir_cache is stable during a directory reading, the cursor
of struct ovl_dir_file don't need to be an independent entry in the list
of a merged directory.
This patch changes *cursor* to a pointer which points to the entry in the
ovl_dir_cache. After this, we don't need to check *is_cursor* either.
Signed-off-by: hujianyang <hujianyang@huawei.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
|
|
Not checking whiteouts on lowest layer was an optimization (there's nothing
to white out there), but it could result in inconsitent behavior when a
layer previously used as upper/middle is later used as lowest.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
|
|
If multiple lower layers exist, merge them as well in readdir according to
the same rules as merging upper with lower. I.e. take whiteouts and opaque
directories into account on all but the lowers layer.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
|
|
OVL_PATH_PURE_UPPER -> __OVL_PATH_UPPER | __OVL_PATH_PURE
OVL_PATH_UPPER -> __OVL_PATH_UPPER
OVL_PATH_MERGE -> __OVL_PATH_UPPER | __OVL_PATH_MERGE
OVL_PATH_LOWER -> 0
Multiple R/O layers will allow __OVL_PATH_MERGE without __OVL_PATH_UPPER.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
|
|
Don't make a separate pass for checking whiteouts, since we can do it while
reading the upper directory.
This will make it easier to handle multiple layers.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
|
|
|
|
Check against !OVL_PATH_LOWER instead of OVL_PATH_MERGE. For a copied up
directory the two are currently equivalent.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
|
|
Pass dentry into ovl_dir_read_merged() insted of upperpath and lowerpath.
This cleans up callers and paves the way for multi-layer directory reads.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
|
|
ovl_cache_put() can be called from ovl_dir_reset() if the cache needs to be
rebuilt. We did list_del() on the cursor, which results in an Oops on the
poisoned pointer in ovl_seek_cursor().
Reported-by: Jordi Pujol Palomer <jordipujolp@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Tested-by: Jordi Pujol Palomer <jordipujolp@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
In an overlay directory that shadows an empty lower directory, say
/mnt/a/empty102, do:
touch /mnt/a/empty102/x
unlink /mnt/a/empty102/x
rmdir /mnt/a/empty102
It's actually harmless, but needs another level of nesting between
I_MUTEX_CHILD and I_MUTEX_NORMAL.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Tested-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
ovl_cache_entry.name is now an array not a pointer, so it makes no sense
test for it being NULL.
Detected by coverity.
From: Miklos Szeredi <mszeredi@suse.cz>
Fixes: 68bf8611076a ("overlayfs: make ovl_cache_entry->name an array instead of
+pointer")
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
make sure that
a) all stores done by opening struct file don't leak past storing
the reference in od->upperfile
b) the lockless side has read dependency barrier
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
same story...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
no sense having it a pointer - all instances have it pointing to
local variable in the same stack frame
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
just use it to serialize the assignment
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Overlayfs allows one, usually read-write, directory tree to be
overlaid onto another, read-only directory tree. All modifications
go to the upper, writable layer.
This type of mechanism is most often used for live CDs but there's a
wide variety of other uses.
The implementation differs from other "union filesystem"
implementations in that after a file is opened all operations go
directly to the underlying, lower or upper, filesystems. This
simplifies the implementation and allows native performance in these
cases.
The dentry tree is duplicated from the underlying filesystems, this
enables fast cached lookups without adding special support into the
VFS. This uses slightly more memory than union mounts, but dentries
are relatively small.
Currently inodes are duplicated as well, but it is a possible
optimization to share inodes for non-directories.
Opening non directories results in the open forwarded to the
underlying filesystem. This makes the behavior very similar to union
mounts (with the same limitations vs. fchmod/fchown on O_RDONLY file
descriptors).
Usage:
mount -t overlayfs overlayfs -olowerdir=/lower,upperdir=/upper/upper,workdir=/upper/work /overlay
The following cotributions have been folded into this patch:
Neil Brown <neilb@suse.de>:
- minimal remount support
- use correct seek function for directories
- initialise is_real before use
- rename ovl_fill_cache to ovl_dir_read
Felix Fietkau <nbd@openwrt.org>:
- fix a deadlock in ovl_dir_read_merged
- fix a deadlock in ovl_remove_whiteouts
Erez Zadok <ezk@fsl.cs.sunysb.edu>
- fix cleanup after WARN_ON
Sedat Dilek <sedat.dilek@googlemail.com>
- fix up permission to confirm to new API
Robin Dong <hao.bigrat@gmail.com>
- fix possible leak in ovl_new_inode
- create new inode in ovl_link
Andy Whitcroft <apw@canonical.com>
- switch to __inode_permission()
- copy up i_uid/i_gid from the underlying inode
AV:
- ovl_copy_up_locked() - dput(ERR_PTR(...)) on two failure exits
- ovl_clear_empty() - one failure exit forgetting to do unlock_rename(),
lack of check for udir being the parent of upper, dropping and regaining
the lock on udir (which would require _another_ check for parent being
right).
- bogus d_drop() in copyup and rename [fix from your mail]
- copyup/remove and copyup/rename races [fix from your mail]
- ovl_dir_fsync() leaving ERR_PTR() in ->realfile
- ovl_entry_free() is pointless - it's just a kfree_rcu()
- fold ovl_do_lookup() into ovl_lookup()
- manually assigning ->d_op is wrong. Just use ->s_d_op.
[patches picked from Miklos]:
* copyup/remove and copyup/rename races
* bogus d_drop() in copyup and rename
Also thanks to the following people for testing and reporting bugs:
Jordi Pujol <jordipujolp@gmail.com>
Andy Whitcroft <apw@canonical.com>
Michal Suchanek <hramrach@centrum.cz>
Felix Fietkau <nbd@openwrt.org>
Erez Zadok <ezk@fsl.cs.sunysb.edu>
Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
|