summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2009-01-18XFS: truncate readdir offsets to signed 32 bit valuesChristoph Hellwig
commit 15440319767942a363f282d6585303d3d75088ba upstream. John Stanley reported EOVERFLOW errors in readdir from his self-build glibc. I traced this down to glibc enabling d_off overflow checks in one of the about five million different getdents implementations. In 2.6.28 Dave Woodhouse moved our readdir double buffering required for NFS4 readdirplus into nfsd and at that point we lost the capping of the directory offsets to 32 bit signed values. Johns glibc used getdents64 to even implement readdir for normal 32 bit offset dirents, and failed with EOVERFLOW only if this happens on the first dirent in a getdents call. I managed to come up with a testcase that uses raw getdents and does the EOVERFLOW check manually. We always hit it with our last entry due to the special end of directory marker. The patch below is a dumb version of just putting back the masking, to make sure we have the same behavior as in 2.6.27 and earlier. I will work on a better and cleaner fix for 2.6.30. Reported-by: John Stanley <jpsinthemix@verizon.net> Tested-by: John Stanley <jpsinthemix@verizon.net> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-18eCryptfs: check readlink result was not an error before using itDuane Griffin
commit a17d5232de7b53d34229de79ec22f4bb04adb7e4 upstream. The result from readlink is being used to index into the link name buffer without checking whether it is a valid length. If readlink returns an error this will fault or cause memory corruption. Cc: Tyler Hicks <tyhicks@linux.vnet.ibm.com> Cc: Dustin Kirkland <kirkland@canonical.com> Cc: ecryptfs-devel@lists.launchpad.net Signed-off-by: Duane Griffin <duaneg@dghda.com> Acked-by: Michael Halcrow <mhalcrow@us.ibm.com> Acked-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-18nfs: remove redundant tests on reading new pagesWu Fengguang
commit 136221fc3219b3805c48db5da065e8e3467175d4 upstream. aops->readpages() and its NFS helper readpage_async_filler() will only be called to do readahead I/O for newly allocated pages. So it's not necessary to test for the always 0 dirty/uptodate page flags. The removal of nfs_wb_page() call also fixes a readahead bug: the NFS readahead has been synchronous since 2.6.23, because that call will clear PG_readahead, which is the reminder for asynchronous readahead. More background: the PG_readahead page flag is shared with PG_reclaim, one for read path and the other for write path. clear_page_dirty_for_io() unconditionally clears PG_readahead to prevent possible readahead residuals, assuming itself to be always called in the write path. However, NFS is one and the only exception in that it _always_ calls clear_page_dirty_for_io() in the read path, i.e. for readpages()/readpage(). Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Wu Fengguang <wfg@linux.intel.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-18fix switch_names() breakage in short-to-short caseAl Viro
commit dc711ca35f9d95a1eec02118e0c298b5e3068315 upstream. We want ->name.len to match the resulting name on *both* source and target Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-18Fix timeouts in sys_pselect7Bernd Schmidt
commit 62568510b8e2679cbc331d7de10ea9ba81ae8b3d upstream. Since we (Analog Devices) updated our Blackfin kernel to 2.6.28, we've seen occasional 5-second hangs from telnet. telnetd calls select with a NULL timeout, but with the new kernel, the system call occasionally returns 0, which causes telnet to call sleep (5). This did not happen with earlier kernels. The code in sys_pselect7 looks a bit strange, in particular the variable "to" is initialized to NULL, then changed if a non-null timeout was passed in, but not used further. It needs to be passed to core_sys_select instead of &end_time. This bug was introduced by 8ff3e8e85fa6c312051134b3953e397feb639f51 ("select: switch select() and poll() over to hrtimers"). Signed-off-by: Bernd Schmidt <bernd.schmidt@analog.com> Reviewed-by: Ulrich Drepper <drepper@redhat.com> Tested-by: Robin Getz <rgetz@blackfin.uclinux.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-18System call wrappers part 33Heiko Carstens
commit 2b66421995d2e93c9d1a0111acf2581f8529c6e5 upstream. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-18System call wrappers part 32Heiko Carstens
commit d4e82042c4cfa87a7d51710b71f568fe80132551 upstream. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-18System call wrappers part 31Heiko Carstens
commit 836f92adf121f806e9beb5b6b88bd5c9c4ea3f24 upstream. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-18System call wrappers part 30Heiko Carstens
commit 6559eed8ca7db0531a207cd80be5e28cd6f213c5 upstream. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-18System call wrappers part 29Heiko Carstens
commit 2e4d0924eb0c403ce4014fa139d1d61bf2c44fee upstream. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-18System call wrappers part 28Heiko Carstens
commit 938bb9f5e840eddbf54e4f62f6c5ba9b3ae12c9d upstream. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-18System call wrappers part 27Heiko Carstens
commit 1e7bfb2134dfec37ce04fb3a4ca89299e892d10c upstream. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-18System call wrappers part 23Heiko Carstens
commit 5a8a82b1d306a325d899b67715618413657efda4 upstream. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-18System call wrappers part 21Heiko Carstens
commit 20f37034fb966a1c35894f9fe529fda0b6440101 upstream. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-18System call wrappers part 20Heiko Carstens
commit 3cdad42884bbd95d5aa01297e8236ea1bad70053 upstream. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-18System call wrappers part 19Heiko Carstens
commit 003d7ab479168132a2b2c6700fe682b08f08ab0c upstream. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-18System call wrappers part 17Heiko Carstens
commit ca013e945b1ba5828b151ee646946f1297b67a4c upstream. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-18System call wrappers part 16Heiko Carstens
commit 002c8976ee537724b20a5e179d9b349309438836 upstream. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-18System call wrappers part 15Heiko Carstens
commit a26eab2400f0477bfac0255600552394855016f7 upstream. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-18System call wrappers part 14Heiko Carstens
commit 3480b25743cb7404928d57efeaa3d085708b04c2 upstream. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-18System call wrappers part 13Heiko Carstens
commit 6a6160a7b5c27b3c38651baef92a14fa7072b3c1 upstream. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-18System call wrappers part 12Heiko Carstens
commit 64fd1de3d821659ac0a3004fd5ee1de59e64af30 upstream. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-18System call wrappers part 11Heiko Carstens
commit 257ac264d69017270fbc3cf5536953525db4076c upstream. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-18System call wrappers part 10Heiko Carstens
commit bdc480e3bef6eb0e7071770834cbdda7e30a5436 upstream. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-18System call wrappers part 09Heiko Carstens
commit a5f8fa9e9ba5ef3305e147f41ad6e1e84ac1f0bd upstream. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-18System call wrapper special casesHeiko Carstens
commit 6673e0c3fbeaed2cd08e2fd4a4aa97382d6fedb0 upstream. System calls with an unsigned long long argument can't be converted with the standard wrappers since that would include a cast to long, which in turn means that we would lose the upper 32 bit on 32 bit architectures. Also semctl can't use the standard wrapper since it has a 'union' parameter. So we handle them as special case and add some extra wrappers instead. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-18Make sys_pselect7 staticHeiko Carstens
commit c9da9f2129d6a421c32e334a83770a9e67f7feac upstream. Not a single architecture has wired up sys_pselect7 plus it is the only system call with seven parameters. Just make it static and rename it to do_pselect which will do the work for sys_pselect6. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-18Remove __attribute__((weak)) from sys_pipe/sys_pipe2Heiko Carstens
commit 1134723e96f6e2abcf8bfd7a2d1c96fcc323ef35 upstream. Remove __attribute__((weak)) from common code sys_pipe implemantation. IA64, ALPHA, SUPERH (32bit) and SPARC (32bit) have own implemantations with the same name. Just rename them. For sys_pipe2 there is no architecture specific implementation. Cc: Richard Henderson <rth@twiddle.net> Cc: David S. Miller <davem@davemloft.net> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-18Rename old_readdir to sys_old_readdirHeiko Carstens
commit e55380edf68796d75bf41391a781c68ee678587d upstream. This way it matches the generic system call name convention. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-18Convert all system calls to return a longHeiko Carstens
commit 2ed7c03ec17779afb4fcfa3b8c61df61bd4879ba upstream. Convert all system calls to return a long. This should be a NOP since all converted types should have the same size anyway. With the exception of sys_exit_group which returned void. But that doesn't matter since the system call doesn't return. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-18inotify: fix type errors in interfacesMichael Kerrisk
commit 4ae8978cf92a96257cd8998a49e781be83571d64 upstream. The problems lie in the types used for some inotify interfaces, both at the kernel level and at the glibc level. This mail addresses the kernel problem. I will follow up with some suggestions for glibc changes. For the sys_inotify_rm_watch() interface, the type of the 'wd' argument is currently 'u32', it should be '__s32' . That is Robert's suggestion, and is consistent with the other declarations of watch descriptors in the kernel source, in particular, the inotify_event structure in include/linux/inotify.h: struct inotify_event { __s32 wd; /* watch descriptor */ __u32 mask; /* watch mask */ __u32 cookie; /* cookie to synchronize two events */ __u32 len; /* length (including nulls) of name */ char name[0]; /* stub for possible name */ }; The patch makes the changes needed for inotify_rm_watch(). Signed-off-by: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: Robert Love <rlove@google.com> Cc: Vegard Nossum <vegard.nossum@gmail.com> Cc: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-18minix: fix add link's wrong position calculationEvgeniy Dushistov
commit d6b54841f4ddd836c886d1e6ac381cf309ee98a3 upstream. Fix the add link method. The oosition in the directory was calculated in wrong way - it had the incorrect shift direction. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-18fs: symlink write_begin allocation context fixNick Piggin
commit 54566b2c1594c2326a645a3551f9d989f7ba3c5e upstream. With the write_begin/write_end aops, page_symlink was broken because it could no longer pass a GFP_NOFS type mask into the point where the allocations happened. They are done in write_begin, which would always assume that the filesystem can be entered from reclaim. This bug could cause filesystem deadlocks. The funny thing with having a gfp_t mask there is that it doesn't really allow the caller to arbitrarily tinker with the context in which it can be called. It couldn't ever be GFP_ATOMIC, for example, because it needs to take the page lock. The only thing any callers care about is __GFP_FS anyway, so turn that into a single flag. Add a new flag for write_begin, AOP_FLAG_NOFS. Filesystems can now act on this flag in their write_begin function. Change __grab_cache_page to accept a nofs argument as well, to honour that flag (while we're there, change the name to grab_cache_page_write_begin which is more instructive and does away with random leading underscores). This is really a more flexible way to go in the end anyway -- if a filesystem happens to want any extra allocations aside from the pagecache ones in ints write_begin function, it may now use GFP_KERNEL (rather than GFP_NOFS) for common case allocations (eg. ocfs2_alloc_write_ctxt, for a random example). [kosaki.motohiro@jp.fujitsu.com: fix ubifs] [kosaki.motohiro@jp.fujitsu.com: fix fuse] Signed-off-by: Nick Piggin <npiggin@suse.de> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> [ Cleaned up the calling convention: just pass in the AOP flags untouched to the grab_cache_page_write_begin() function. That just simplifies everybody, and may even allow future expansion of the logic. - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-18CIFS: make sure that DFS pathnames are properly formedSteve French
commit c6fbba0546d3ead18d4a623e76e28bcbaa66a325 upstream. The paths in a DFS request are supposed to only have a single preceding backslash, but we are sending them with a double backslash. This is exposing a bug in Windows where it also sends a path in the response that has a double backslash. The existing code that builds the mount option string however expects a double backslash prefix in a couple of places when it tries to use the path returned by build_path_from_dentry. Fix compose_mount_options to expect properly formed DFS paths (single backslash at front). Also clean up error handling in that function. There was a possible NULL pointer dereference and situations where a partially built option string would be returned. Tested against Samba 3.0.28-ish server and Samba 3.3 and Win2k8. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-12-19fs/9p: change simple_strtol to simple_strtoulJulia Lawall
Since v9ses->uid is unsigned, it would seem better to use simple_strtoul that simple_strtol. A simplified version of the semantic patch that makes this change is as follows: (http://www.emn.fr/x-info/coccinelle/) // <smpl> @r2@ long e; position p; @@ e = simple_strtol@p(...) @@ position p != r2.p; type T; T e; @@ e = - simple_strtol@p + simple_strtoul (...) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Acked-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-12-199p: convert d_iname references to d_name.nameWu Fengguang
d_iname is rubbish for long file names. Use d_name.name in printks instead. Signed-off-by: Wu Fengguang <wfg@linux.intel.com> Acked-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-12-199p: Remove potentially bad parameter from function entry debug print.Duane Griffin
Signed-off-by: Duane Griffin <duaneg@dghda.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-12-17Merge branch 'upstream-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: ocfs2: Add JBD2 compat feature bit. ocfs2: Always update xattr search when creating bucket.
2008-12-17cifs: fix buffer overrun in parse_DFS_referralsJeff Layton
While testing a kernel with memory poisoning enabled, I saw some warnings about the redzone getting clobbered when chasing DFS referrals. The buffer allocation for the unicode converted version of the searchName is too small and needs to take null termination into account. Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-16ocfs2: Add JBD2 compat feature bit.Joel Becker
Define the OCFS2_FEATURE_COMPAT_JBD2 bit in the filesystem header. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-12-16ocfs2: Always update xattr search when creating bucket.Tao Ma
When we create xattr bucket during the process of xattr set, we always need to update the ocfs2_xattr_search since even if the bucket size is the same as block size, the offset will change because of the removal of the ocfs2_xattr_block header. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-12-10Merge branch 'to-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-roland * 'to-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-roland: tracehook: exec double-reporting fix
2008-12-10KSYM_SYMBOL_LEN fixesHugh Dickins
Miles Lane tailing /sys files hit a BUG which Pekka Enberg has tracked to my 966c8c12dc9e77f931e2281ba25d2f0244b06949 sprint_symbol(): use less stack exposing a bug in slub's list_locations() - kallsyms_lookup() writes a 0 to namebuf[KSYM_NAME_LEN-1], but that was beyond the end of page provided. The 100 slop which list_locations() allows at end of page looks roughly enough for all the other stuff it might print after the symbol before it checks again: break out KSYM_SYMBOL_LEN earlier than before. Latencytop and ftrace and are using KSYM_NAME_LEN buffers where they need KSYM_SYMBOL_LEN buffers, and vmallocinfo a 2*KSYM_NAME_LEN buffer where it wants a KSYM_SYMBOL_LEN buffer: fix those before anyone copies them. [akpm@linux-foundation.org: ftrace.h needs module.h] Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc Miles Lane <miles.lane@gmail.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Steven Rostedt <srostedt@redhat.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-10inotify: fix IN_ONESHOT unmount event watcherDmitri Monakhov
On umount two event will be dispatched to watcher: 1: inotify_dev_queue_event(.., IN_UNMOUNT,..) 2: remove_watch(watch, dev) ->inotify_dev_queue_event(.., IN_IGNORED, ..) But if watcher has IN_ONESHOT bit set then the watcher will be released inside first event. Which result in accessing invalid object later. IMHO it is not pure regression. This bug wasn't triggered while initial inotify interface testing phase because of another bug in IN_ONESHOT handling logic :) commit ac74c00e499ed276a965e5b5600667d5dc04a84a Author: Ulisses Furquim <ulissesf@gmail.com> Date: Fri Feb 8 04:18:16 2008 -0800 inotify: fix check for one-shot watches before destroying them As the IN_ONESHOT bit is never set when an event is sent we must check it in the watch's mask and not in the event's mask. TESTCASE: mkdir mnt mount -ttmpfs none mnt mkdir mnt/d ./inotify mnt/d& umount mnt ## << lockup or crash here TESTSOURCE: /* gcc -oinotify inotify.c */ #include <stdio.h> #include <stdlib.h> #include <sys/inotify.h> int main(int argc, char **argv) { char buf[1024]; struct inotify_event *ie; char *p; int i; ssize_t l; p = argv[1]; i = inotify_init(); inotify_add_watch(i, p, ~0); l = read(i, buf, sizeof(buf)); printf("read %d bytes\n", l); ie = (struct inotify_event *) buf; printf("event mask: %d\n", ie->mask); return 0; } Signed-off-by: Dmitri Monakhov <dmonakhov@openvz.org> Cc: John McCutchan <ttb@tentacle.dhs.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Robert Love <rlove@google.com> Cc: Ulisses Furquim <ulissesf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-10pagemap: fix 32-bit pagemap regressionMatt Mackall
The large pages fix from bcf8039ed45 broke 32-bit pagemap by pulling the pagemap entry code out into a function with the wrong return type. Pagemap entries are 64 bits on all systems and unsigned long is only 32 bits on 32-bit systems. Signed-off-by: Matt Mackall <mpm@selenic.com> Reported-by: Doug Graham <dgraham@nortel.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Cc: <stable@kernel.org> [2.6.26.x, 2.6.27.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-10revert "percpu_counter: new function percpu_counter_sum_and_set"Andrew Morton
Revert commit e8ced39d5e8911c662d4d69a342b9d053eaaac4e Author: Mingming Cao <cmm@us.ibm.com> Date: Fri Jul 11 19:27:31 2008 -0400 percpu_counter: new function percpu_counter_sum_and_set As described in revert "percpu counter: clean up percpu_counter_sum_and_set()" the new percpu_counter_sum_and_set() is racy against updates to the cpu-local accumulators on other CPUs. Revert that change. This means that ext4 will be slow again. But correct. Reported-by: Eric Dumazet <dada1@cosmosbay.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mingming Cao <cmm@us.ibm.com> Cc: <linux-ext4@vger.kernel.org> Cc: <stable@kernel.org> [2.6.27.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-10revert "percpu counter: clean up percpu_counter_sum_and_set()"Andrew Morton
Revert commit 1f7c14c62ce63805f9574664a6c6de3633d4a354 Author: Mingming Cao <cmm@us.ibm.com> Date: Thu Oct 9 12:50:59 2008 -0400 percpu counter: clean up percpu_counter_sum_and_set() Before this patch we had the following: percpu_counter_sum(): return the percpu_counter's value percpu_counter_sum_and_set(): return the percpu_counter's value, copying that value into the central value and zeroing the per-cpu counters before returning. After this patch, percpu_counter_sum_and_set() has gone, and percpu_counter_sum() gets the old percpu_counter_sum_and_set() functionality. Problem is, as Eric points out, the old percpu_counter_sum_and_set() functionality was racy and wrong. It zeroes out counters on "other" cpus, without holding any locks which will prevent races agaist updates from those other CPUS. This patch reverts 1f7c14c62ce63805f9574664a6c6de3633d4a354. This means that percpu_counter_sum_and_set() still has the race, but percpu_counter_sum() does not. Note that this is not a simple revert - ext4 has since started using percpu_counter_sum() for its dirty_blocks counter as well. Note that this revert patch changes percpu_counter_sum() semantics. Before the patch, a call to percpu_counter_sum() will bring the counter's central counter mostly up-to-date, so a following percpu_counter_read() will return a close value. After this patch, a call to percpu_counter_sum() will leave the counter's central accumulator unaltered, so a subsequent call to percpu_counter_read() can now return a significantly inaccurate result. If there is any code in the tree which was introduced after e8ced39d5e8911c662d4d69a342b9d053eaaac4e was merged, and which depends upon the new percpu_counter_sum() semantics, that code will break. Reported-by: Eric Dumazet <dada1@cosmosbay.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mingming Cao <cmm@us.ibm.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-09tracehook: exec double-reporting fixRoland McGrath
The patch 6341c39 "tracehook: exec" introduced a small regression in 2.6.27 regarding binfmt_misc exec event reporting. Since the reporting is now done in the common search_binary_handler() function, an exec of a misc binary will result in two (or possibly multiple) exec events being reported, instead of just a single one, because the misc handler contains a recursive call to search_binary_handler. To add to the confusion, if PTRACE_O_TRACEEXEC is not active, the multiple SIGTRAP signals will in fact cause only a single ptrace intercept, as the signals are not queued. However, if PTRACE_O_TRACEEXEC is on, the debugger will actually see multiple ptrace intercepts (PTRACE_EVENT_EXEC). The test program included below demonstrates the problem. This change fixes the bug by calling tracehook_report_exec() only in the outermost search_binary_handler() call (bprm->recursion_depth == 0). The additional change to restore bprm->recursion_depth after each binfmt load_binary call is actually superfluous for this bug, since we test the value saved on entry to search_binary_handler(). But it keeps the use of of the depth count to its most obvious expected meaning. Depending on what binfmt handlers do in certain cases, there could have been false-positive tests for recursion limits before this change. /* Test program using PTRACE_O_TRACEEXEC. This forks and exec's the first argument with the rest of the arguments, while ptrace'ing. It expects to see one PTRACE_EVENT_EXEC stop and then a successful exit, with no other signals or events in between. Test for kernel doing two PTRACE_EVENT_EXEC stops for a binfmt_misc exec: $ gcc -g traceexec.c -o traceexec $ sudo sh -c 'echo :test:M::foobar::/bin/cat: > /proc/sys/fs/binfmt_misc/register' $ echo 'foobar test' > ./foobar $ chmod +x ./foobar $ ./traceexec ./foobar; echo $? ==> good <== foobar test 0 $ ==> bad <== foobar test unexpected status 0x4057f != 0 3 $ */ #include <stdio.h> #include <sys/types.h> #include <sys/wait.h> #include <sys/ptrace.h> #include <unistd.h> #include <signal.h> #include <stdlib.h> static void wait_for (pid_t child, int expect) { int status; pid_t p = wait (&status); if (p != child) { perror ("wait"); exit (2); } if (status != expect) { fprintf (stderr, "unexpected status %#x != %#x\n", status, expect); exit (3); } } int main (int argc, char **argv) { pid_t child = fork (); if (child < 0) { perror ("fork"); return 127; } else if (child == 0) { ptrace (PTRACE_TRACEME); raise (SIGUSR1); execv (argv[1], &argv[1]); perror ("execve"); _exit (127); } wait_for (child, W_STOPCODE (SIGUSR1)); if (ptrace (PTRACE_SETOPTIONS, child, 0L, (void *) (long) PTRACE_O_TRACEEXEC) != 0) { perror ("PTRACE_SETOPTIONS"); return 4; } if (ptrace (PTRACE_CONT, child, 0L, 0L) != 0) { perror ("PTRACE_CONT"); return 5; } wait_for (child, W_STOPCODE (SIGTRAP | (PTRACE_EVENT_EXEC << 8))); if (ptrace (PTRACE_CONT, child, 0L, 0L) != 0) { perror ("PTRACE_CONT"); return 6; } wait_for (child, W_EXITCODE (0, 0)); return 0; } Reported-by: Arnd Bergmann <arnd@arndb.de> CC: Ulrich Weigand <ulrich.weigand@de.ibm.com> Signed-off-by: Roland McGrath <roland@redhat.com>
2008-12-08EXPORTFS: handle NULL returns from fh_to_dentry()/fh_to_parent()J. Bruce Fields
While 440037287c5 "[PATCH] switch all filesystems over to d_obtain_alias" removed some cases where fh_to_dentry() and fh_to_parent() could return NULL, there are still a few NULL returns left in individual filesystems. Thus it was a mistake for that commit to remove the handling of NULL returns in the callers. Revert those parts of 440037287c5 which removed the NULL handling. (We could, alternatively, modify all implementations to return -ESTALE instead of NULL, but that proves to require fixing a number of filesystems, and in some cases it's arguably more natural to return NULL.) Thanks to David for original patch and Linus, Christoph, and Hugh for review. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Cc: David Howells <dhowells@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-05Fix a race condition in FASYNC handlingJonathan Corbet
Changeset a238b790d5f99c7832f9b73ac8847025815b85f7 (Call fasync() functions without the BKL) introduced a race which could leave file->f_flags in a state inconsistent with what the underlying driver/filesystem believes. Revert that change, and also fix the same races in ioctl_fioasync() and ioctl_fionbio(). This is a minimal, short-term fix; the real fix will not involve the BKL. Reported-by: Oleg Nesterov <oleg@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@kernel.org Signed-off-by: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>