summaryrefslogtreecommitdiff
path: root/fs/namei.c
AgeCommit message (Collapse)Author
2020-03-11namei: only return -ECHILD from follow_dotdot_rcu()Aleksa Sarai
commit 2b98149c2377bff12be5dd3ce02ae0506e2dd613 upstream. It's over-zealous to return hard errors under RCU-walk here, given that a REF-walk will be triggered for all other cases handling ".." under RCU. The original purpose of this check was to ensure that if a rename occurs such that a directory is moved outside of the bind-mount which the resolution started in, it would be detected and blocked to avoid being able to mess with paths outside of the bind-mount. However, triggering a new REF-walk is just as effective a solution. Cc: "Eric W. Biederman" <ebiederm@xmission.com> Fixes: 397d425dc26d ("vfs: Test for and handle paths that are unreachable from their mnt_root") Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Aleksa Sarai <cyphar@cyphar.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-05vfs: fix do_last() regressionAl Viro
commit 6404674acd596de41fd3ad5f267b4525494a891a upstream. Brown paperbag time: fetching ->i_uid/->i_mode really should've been done from nd->inode. I even suggested that, but the reason for that has slipped through the cracks and I went for dir->d_inode instead - made for more "obvious" patch. Analysis: - at the entry into do_last() and all the way to step_into(): dir (aka nd->path.dentry) is known not to have been freed; so's nd->inode and it's equal to dir->d_inode unless we are already doomed to -ECHILD. inode of the file to get opened is not known. - after step_into(): inode of the file to get opened is known; dir might be pointing to freed memory/be negative/etc. - at the call of may_create_in_sticky(): guaranteed to be out of RCU mode; inode of the file to get opened is known and pinned; dir might be garbage. The last was the reason for the original patch. Except that at the do_last() entry we can be in RCU mode and it is possible that nd->path.dentry->d_inode has already changed under us. In that case we are going to fail with -ECHILD, but we need to be careful; nd->inode is pointing to valid struct inode and it's the same as nd->path.dentry->d_inode in "won't fail with -ECHILD" case, so we should use that. Reported-by: "Rantala, Tommi T. (Nokia - FI/Espoo)" <tommi.t.rantala@nokia.com> Reported-by: syzbot+190005201ced78a74ad6@syzkaller.appspotmail.com Wearing-brown-paperbag: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@kernel.org Fixes: d0cb50185ae9 ("do_last(): fetch directory ->i_mode and ->i_uid before it's too late") Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-29do_last(): fetch directory ->i_mode and ->i_uid before it's too lateAl Viro
commit d0cb50185ae942b03c4327be322055d622dc79f6 upstream. may_create_in_sticky() call is done when we already have dropped the reference to dir. Fixes: 30aba6656f61e (namei: allow restricted O_CREAT of FIFOs and regular files) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-01namei: allow restricted O_CREAT of FIFOs and regular filesSalvatore Mesoraca
commit 30aba6656f61ed44cba445a3c0d38b296fa9e8f5 upstream. Disallows open of FIFOs or regular files not owned by the user in world writable sticky directories, unless the owner is the same as that of the directory or the file is opened without the O_CREAT flag. The purpose is to make data spoofing attacks harder. This protection can be turned on and off separately for FIFOs and regular files via sysctl, just like the symlinks/hardlinks protection. This patch is based on Openwall's "HARDEN_FIFO" feature by Solar Designer. This is a brief list of old vulnerabilities that could have been prevented by this feature, some of them even allow for privilege escalation: CVE-2000-1134 CVE-2007-3852 CVE-2008-0525 CVE-2009-0416 CVE-2011-4834 CVE-2015-1838 CVE-2015-7442 CVE-2016-7489 This list is not meant to be complete. It's difficult to track down all vulnerabilities of this kind because they were often reported without any mention of this particular attack vector. In fact, before hardlinks/symlinks restrictions, fifos/regular files weren't the favorite vehicle to exploit them. [s.mesoraca16@gmail.com: fix bug reported by Dan Carpenter] Link: https://lkml.kernel.org/r/20180426081456.GA7060@mwanda Link: http://lkml.kernel.org/r/1524829819-11275-1-git-send-email-s.mesoraca16@gmail.com [keescook@chromium.org: drop pr_warn_ratelimited() in favor of audit changes in the future] [keescook@chromium.org: adjust commit subjet] Link: http://lkml.kernel.org/r/20180416175918.GA13494@beast Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Suggested-by: Solar Designer <solar@openwall.com> Suggested-by: Kees Cook <keescook@chromium.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Loic <hackurx@opensec.fr> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-24getname_kernel() needs to make sure that ->name != ->iname in long caseAl Viro
commit 30ce4d1903e1d8a7ccd110860a5eef3c638ed8be upstream. missed it in "kill struct filename.separate" several years ago. Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-22fs: Teach path_connected to handle nfs filesystems with multiple roots.Eric W. Biederman
commit 95dd77580ccd66a0da96e6d4696945b8cea39431 upstream. On nfsv2 and nfsv3 the nfs server can export subsets of the same filesystem and report the same filesystem identifier, so that the nfs client can know they are the same filesystem. The subsets can be from disjoint directory trees. The nfsv2 and nfsv3 filesystems provides no way to find the common root of all directory trees exported form the server with the same filesystem identifier. The practical result is that in struct super s_root for nfs s_root is not necessarily the root of the filesystem. The nfs mount code sets s_root to the root of the first subset of the nfs filesystem that the kernel mounts. This effects the dcache invalidation code in generic_shutdown_super currently called shrunk_dcache_for_umount and that code for years has gone through an additional list of dentries that might be dentry trees that need to be freed to accomodate nfs. When I wrote path_connected I did not realize nfs was so special, and it's hueristic for avoiding calling is_subdir can fail. The practical case where this fails is when there is a move of a directory from the subtree exposed by one nfs mount to the subtree exposed by another nfs mount. This move can happen either locally or remotely. With the remote case requiring that the move directory be cached before the move and that after the move someone walks the path to where the move directory now exists and in so doing causes the already cached directory to be moved in the dcache through the magic of d_splice_alias. If someone whose working directory is in the move directory or a subdirectory and now starts calling .. from the initial mount of nfs (where s_root == mnt_root), then path_connected as a heuristic will not bother with the is_subdir check. As s_root really is not the root of the nfs filesystem this heuristic is wrong, and the path may actually not be connected and path_connected can fail. The is_subdir function might be cheap enough that we can call it unconditionally. Verifying that will take some benchmarking and the result may not be the same on all kernels this fix needs to be backported to. So I am avoiding that for now. Filesystems with snapshots such as nilfs and btrfs do something similar. But as the directory tree of the snapshots are disjoint from one another and from the main directory tree rename won't move things between them and this problem will not occur. Cc: stable@vger.kernel.org Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Fixes: 397d425dc26d ("vfs: Test for and handle paths that are unreachable from their mnt_root") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-22vfs: don't do RCU lookup of empty pathnamesLinus Torvalds
commit c0eb027e5aef70b71e5a38ee3e264dc0b497f343 upstream. Normal pathname lookup doesn't allow empty pathnames, but using AT_EMPTY_PATH (with name_to_handle_at() or fstatat(), for example) you can trigger an empty pathname lookup. And not only is the RCU lookup in that case entirely unnecessary (because we'll obviously immediately finalize the end result), it is actively wrong. Why? An empth path is a special case that will return the original 'dirfd' dentry - and that dentry may not actually be RCU-free'd, resulting in a potential use-after-free if we were to initialize the path lazily under the RCU read lock and depend on complete_walk() finalizing the dentry. Found by syzkaller and KASAN. Reported-by: Dmitry Vyukov <dvyukov@google.com> Reported-by: Vegard Nossum <vegard.nossum@gmail.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Eric Biggers <ebiggers3@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-06dentry name snapshotsAl Viro
commit 49d31c2f389acfe83417083e1208422b4091cd9e upstream. take_dentry_name_snapshot() takes a safe snapshot of dentry name; if the name is a short one, it gets copied into caller-supplied structure, otherwise an extra reference to external name is grabbed (those are never modified). In either case the pointer to stable string is stored into the same structure. dentry must be held by the caller of take_dentry_name_snapshot(), but may be freely dropped afterwards - the snapshot will stay until destroyed by release_dentry_name_snapshot(). Intended use: struct name_snapshot s; take_dentry_name_snapshot(&s, dentry); ... access s.name ... release_dentry_name_snapshot(&s); Replaces fsnotify_oldname_...(), gets used in fsnotify to obtain the name to pass down with event. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15fs: Check for invalid i_uid in may_follow_link()Seth Forshee
[ Upstream commit 2d7f9e2ad35e4e7a3086231f19bfab33c6a8a64a ] Filesystem uids which don't map into a user namespace may result in inode->i_uid being INVALID_UID. A symlink and its parent could have different owners in the filesystem can both get mapped to INVALID_UID, which may result in following a symlink when this would not have otherwise been permitted when protected symlinks are enabled. Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-18atomic_open(): fix the handling of create_errorAl Viro
commit 10c64cea04d3c75c306b3f990586ffb343b63287 upstream. * if we have a hashed negative dentry and either CREAT|EXCL on r/o filesystem, or CREAT|TRUNC on r/o filesystem, or CREAT|EXCL with failing may_o_create(), we should fail with EROFS or the error may_o_create() has returned, but not ENOENT. Which is what the current code ends up returning. * if we have CREAT|TRUNC hitting a regular file on a read-only filesystem, we can't fail with EROFS here. At the very least, not until we'd done follow_managed() - we might have a writable file (or a device, for that matter) bound on top of that one. Moreover, the code downstream will see that O_TRUNC and attempt to grab the write access (*after* following possible mount), so if we really should fail with EROFS, it will happen. No need to do that inside atomic_open(). The real logics is much simpler than what the current code is trying to do - if we decided to go for simple lookup, ended up with a negative dentry *and* had create_error set, fail with create_error. No matter whether we'd got that negative dentry from lookup_real() or had found it in dcache. Acked-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-18vfs: rename: check backing inode being equalMiklos Szeredi
commit 9409e22acdfc9153f88d9b1ed2bd2a5b34d2d3ca upstream. If a file is renamed to a hardlink of itself POSIX specifies that rename(2) should do nothing and return success. This condition is checked in vfs_rename(). However it won't detect hard links on overlayfs where these are given separate inodes on the overlayfs layer. Overlayfs itself detects this condition and returns success without doing anything, but then vfs_rename() will proceed as if this was a successful rename (detach_mounts(), d_move()). The correct thing to do is to detect this condition before even calling into overlayfs. This patch does this by calling vfs_select_inode() to get the underlying inodes. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-03do_last(): ELOOP failure exit should be done after leaving RCU modeAl Viro
commit 5129fa482b16615fd4464d2f5d23acb1b7056c66 upstream. ... or we risk seeing a bogus value of d_is_symlink() there. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-03should_follow_link(): validate ->d_seq after having decided to followAl Viro
commit a7f775428b8f5808815c0e3004020cedb94cbe3b upstream. ... otherwise d_is_symlink() above might have nothing to do with the inode value we've got. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-03do_last(): don't let a bogus return value from ->open() et.al. to confuse usAl Viro
commit c80567c82ae4814a41287618e315a60ecf513be6 upstream. ... into returning a positive to path_openat(), which would interpret that as "symlink had been encountered" and proceed to corrupt memory, etc. It can only happen due to a bug in some ->open() instance or in some LSM hook, etc., so we report any such event *and* make sure it doesn't trick us into further unpleasantness. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-03namei: ->d_inode of a pinned dentry is stable only for positivesAl Viro
commit d4565649b6d6923369112758212b851adc407f0c upstream. both do_last() and walk_component() risk picking a NULL inode out of dentry about to become positive, *then* checking its flags and seeing that it's not negative anymore and using (already stale by then) value they'd fetched earlier. Usually ends up oopsing soon after that... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-06Don't reset ->total_link_count on nested calls of vfs_path_lookup()Al Viro
we already zero it on outermost set_nameidata(), so initialization in path_init() is pointless and wrong. The same DoS exists on pre-4.2 kernels, but there a slightly different fix will be needed. Cc: stable@vger.kernel.org # v4.2 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-07Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge second patch-bomb from Andrew Morton: - most of the rest of MM - procfs - lib/ updates - printk updates - bitops infrastructure tweaks - checkpatch updates - nilfs2 update - signals - various other misc bits: coredump, seqfile, kexec, pidns, zlib, ipc, dma-debug, dma-mapping, ... * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (102 commits) ipc,msg: drop dst nil validation in copy_msg include/linux/zutil.h: fix usage example of zlib_adler32() panic: release stale console lock to always get the logbuf printed out dma-debug: check nents in dma_sync_sg* dma-mapping: tidy up dma_parms default handling pidns: fix set/getpriority and ioprio_set/get in PRIO_USER mode kexec: use file name as the output message prefix fs, seqfile: always allow oom killer seq_file: reuse string_escape_str() fs/seq_file: use seq_* helpers in seq_hex_dump() coredump: change zap_threads() and zap_process() to use for_each_thread() coredump: ensure all coredumping tasks have SIGNAL_GROUP_COREDUMP signal: remove jffs2_garbage_collect_thread()->allow_signal(SIGCONT) signal: introduce kernel_signal_stop() to fix jffs2_garbage_collect_thread() signal: turn dequeue_signal_lock() into kernel_dequeue_signal() signals: kill block_all_signals() and unblock_all_signals() nilfs2: fix gcc uninitialized-variable warnings in powerpc build nilfs2: fix gcc unused-but-set-variable warnings MAINTAINERS: nilfs2: add header file for tracing nilfs2: add tracepoints for analyzing reading and writing metadata files ...
2015-11-07Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial updates from Jiri Kosina: "Trivial stuff from trivial tree that can be trivially summed up as: - treewide drop of spurious unlikely() before IS_ERR() from Viresh Kumar - cosmetic fixes (that don't really affect basic functionality of the driver) for pktcdvd and bcache, from Julia Lawall and Petr Mladek - various comment / printk fixes and updates all over the place" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: bcache: Really show state of work pending bit hwmon: applesmc: fix comment typos Kconfig: remove comment about scsi_wait_scan module class_find_device: fix reference to argument "match" debugfs: document that debugfs_remove*() accepts NULL and error values net: Drop unlikely before IS_ERR(_OR_NULL) mm: Drop unlikely before IS_ERR(_OR_NULL) fs: Drop unlikely before IS_ERR(_OR_NULL) drivers: net: Drop unlikely before IS_ERR(_OR_NULL) drivers: misc: Drop unlikely before IS_ERR(_OR_NULL) UBI: Update comments to reflect UBI_METAONLY flag pktcdvd: drop null test before destroy functions
2015-11-06mm, fs: introduce mapping_gfp_constraint()Michal Hocko
There are many places which use mapping_gfp_mask to restrict a more generic gfp mask which would be used for allocations which are not directly related to the page cache but they are performed in the same context. Let's introduce a helper function which makes the restriction explicit and easier to track. This patch doesn't introduce any functional changes. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Michal Hocko <mhocko@suse.com> Suggested-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull userns hardlink capability check fix from Eric Biederman: "This round just contains a single patch. There has been a lot of other work this period but it is not quite ready yet, so I am pushing it until 4.5. The remaining change by Dirk Steinmetz wich fixes both Gentoo and Ubuntu containers allows hardlinks if we have the appropriate capabilities in the user namespace. Security wise it is really a gimme as the user namespace root can already call setuid become that user and create the hardlink" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: namei: permit linking with CAP_FOWNER in userns
2015-10-27namei: permit linking with CAP_FOWNER in usernsDirk Steinmetz
Attempting to hardlink to an unsafe file (e.g. a setuid binary) from within an unprivileged user namespace fails, even if CAP_FOWNER is held within the namespace. This may cause various failures, such as a gentoo installation within a lxc container failing to build and install specific packages. This change permits hardlinking of files owned by mapped uids, if CAP_FOWNER is held for that namespace. Furthermore, it improves consistency by using the existing inode_owner_or_capable(), which is aware of namespaced capabilities as of 23adbe12ef7d3 ("fs,userns: Change inode_capable to capable_wrt_inode_uidgid"). Signed-off-by: Dirk Steinmetz <public@rsjtdrjgfuzkfg.com> This is hitting us in Ubuntu during some dpkg upgrades in containers. When upgrading a file dpkg creates a hard link to the old file to back it up before overwriting it. When packages upgrade suid files owned by a non-root user the link isn't permitted, and the package upgrade fails. This patch fixes our problem. Tested-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2015-10-10namei: results of d_is_negative() should be checked after dentry revalidationTrond Myklebust
Leandro Awa writes: "After switching to version 4.1.6, our parallelized and distributed workflows now fail consistently with errors of the form: T34: ./regex.c:39:22: error: config.h: No such file or directory From our 'git bisect' testing, the following commit appears to be the possible cause of the behavior we've been seeing: commit 766c4cbfacd8" Al Viro says: "What happens is that 766c4cbfacd8 got the things subtly wrong. We used to treat d_is_negative() after lookup_fast() as "fall with ENOENT". That was wrong - checking ->d_flags outside of ->d_seq protection is unreliable and failing with hard error on what should've fallen back to non-RCU pathname resolution is a bug. Unfortunately, we'd pulled the test too far up and ran afoul of another kind of staleness. The dentry might have been absolutely stable from the RCU point of view (and we might be on UP, etc), but stale from the remote fs point of view. If ->d_revalidate() returns "it's actually stale", dentry gets thrown away and the original code wouldn't even have looked at its ->d_flags. What we need is to check ->d_flags where 766c4cbfacd8 does (prior to ->d_seq validation) but only use the result in cases where we do not discard this dentry outright" Reported-by: Leandro Awa <lawa@nvidia.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=104911 Fixes: 766c4cbfacd8 ("namei: d_is_negative() should be checked...") Tested-by: Leandro Awa <lawa@nvidia.com> Cc: stable@vger.kernel.org # v4.1+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-29fs: Drop unlikely before IS_ERR(_OR_NULL)Viresh Kumar
IS_ERR(_OR_NULL) already contain an 'unlikely' compiler flag and there is no need to do that again from its callers. Drop it. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Jeff Layton <jlayton@poochiereds.net> Reviewed-by: David Howells <dhowells@redhat.com> Reviewed-by: Steve French <smfrench@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2015-09-10namei: fix warning while make xmldocs caused by namei.cMasanari Iida
Fix the following warnings: Warning(.//fs/namei.c:2422): No description found for parameter 'nd' Warning(.//fs/namei.c:2422): Excess function parameter 'nameidata' description in 'path_mountpoint' Signed-off-by: Masanari Iida <standby24x7@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-21vfs: Test for and handle paths that are unreachable from their mnt_rootEric W. Biederman
In rare cases a directory can be renamed out from under a bind mount. In those cases without special handling it becomes possible to walk up the directory tree to the root dentry of the filesystem and down from the root dentry to every other file or directory on the filesystem. Like division by zero .. from an unconnected path can not be given a useful semantic as there is no predicting at which path component the code will realize it is unconnected. We certainly can not match the current behavior as the current behavior is a security hole. Therefore when encounting .. when following an unconnected path return -ENOENT. - Add a function path_connected to verify path->dentry is reachable from path->mnt.mnt_root. AKA to validate that rename did not do something nasty to the bind mount. To avoid races path_connected must be called after following a path component to it's next path component. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-08-04may_follow_link() should use nd->inodeAl Viro
Now that we can get there in RCU mode, we shouldn't play with nd->path.dentry->d_inode - it's not guaranteed to be stable. Use nd->inode instead. Reported-by: Hugh Dickins <hughd@google.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-08-01link_path_walk(): be careful when failing with ENOTDIRAl Viro
In RCU mode we might end up with dentry evicted just we check that it's a directory. In such case we should return ECHILD rather than ENOTDIR, so that pathwalk would be retries in non-RCU mode. Breakage had been introduced in commit b18825a - prior to that we were looking at nd->inode, which had been fetched before verifying that ->d_seq was still valid. That form of check would only be satisfied if at some point the pathname prefix would indeed have resolved to a non-directory. The fix consists of checking ->d_seq after we'd run into a non-directory dentry, and failing with ECHILD in case of mismatch. Note that all branches since 3.12 have that problem... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-06-29namei: make set_root_rcu() return voidAl Viro
The only caller that cares about its return value can just as easily pick it from nd->root_seq itself. We used to just calculate it and return to caller, but these days we are storing it in nd->root_seq in all cases. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15turn user_{path_at,path,lpath,path_dir}() into static inlinesAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15namei: move saved_nd pointer into struct nameidataAl Viro
these guys are always declared next to each other; might as well put the former (pointer to previous instance) into the latter and simplify the calling conventions for {set,restore}_nameidata() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15inline user_path_create()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15inline user_path_parent()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15namei: trim do_last() argumentsAl Viro
now that struct filename is stashed in nameidata we have no need to pass it in Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15namei: stash dfd and name into nameidataAl Viro
fewer arguments to pass around... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15namei: fold path_cleanup() into terminate_walk()Al Viro
they are always called next to each other; moreover, terminate_walk() is more symmetrical that way. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15namei: saner calling conventions for filename_parentat()Al Viro
a) make it reject ERR_PTR() for name b) make it putname(name) on all other failure exits c) make it return name on success again, simplifies the callers Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15namei: saner calling conventions for filename_create()Al Viro
a) make it reject ERR_PTR() for name b) make it putname(name) upon return in all other cases. seriously simplifies the callers... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15namei: shift nameidata down into filename_parentat()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15namei: make filename_lookup() reject ERR_PTR() passed as nameAl Viro
makes for much easier life in callers Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15namei: shift nameidata inside filename_lookup()Al Viro
pass root instead; non-NULL => copy to nd.root and set LOOKUP_ROOT in flags Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15namei: move putname() call into filename_lookup()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15namei: pass the struct path to store the result down into path_lookupat()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15namei: uninline set_root{,_rcu}()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15namei: be careful with mountpoint crossings in follow_dotdot_rcu()Al Viro
Otherwise we are risking a hard error where nonlazy restart would be the right thing to do; it's a very narrow race with mount --move and most of the time it ends up being completely harmless, but it's possible to construct a case when we'll get a bogus hard error instead of falling back to non-lazy walk... For one thing, when crossing _into_ overmount of parent we need to check for mount_lock bumps when we get NULL from __lookup_mnt() as well. For another, and less exotically, we need to make sure that the data fetched in follow_up_rcu() had been consistent. ->mnt_mountpoint is pinned for as long as it is a mountpoint, but we need to check mount_lock after fetching to verify that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15namei: unlazy_walk() doesn't need to mess with current->fs anymoreAl Viro
now that we have ->root_seq, legitimize_path(&nd->root, nd->root_seq) will do just fine... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15namei: handle absolute symlinks without dropping out of RCU modeAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15enable passing fast relative symlinks without dropping out of RCU modeAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15VFS/namei: make the use of touch_atime() in get_link() RCU-safe.NeilBrown
touch_atime is not RCU-safe, and so cannot be called on an RCU walk. However, in situations where RCU-walk makes a difference, the symlink will likely to accessed much more often than it is useful to update the atime. So split out the test of "Does the atime actually need to be updated" into atime_needs_update(), and have get_link() unlazy if it finds that it will need to do that update. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15namei: don't unlazy until get_link()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15namei: make unlazy_walk and terminate_walk handle nd->stack, add unlazy_linkAl Viro
We are almost done - primitives for leaving RCU mode are aware of nd->stack now, a new primitive for going to non-RCU mode when we have a symlink on hands added. The thing we are heavily relying upon is that *any* unlazy failure will be shortly followed by terminate_walk(), with no access to nameidata in between. So it's enough to leave the things in a state terminate_walk() would cope with. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>