summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2014-09-10fat: permit to return phy block number by fibmap in fallocated regionNamjae Jeon
Make the fibmap call the return the proper physical block number for any offset request in the fallocated range. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-09-10fat: fallback to buffered write in case of fallocated region on direct IONamjae Jeon
For normal cases of direct IO write, trying to seek to location greater than file size, makes it fall back to buffered write to fill that region. Similarly, in case for write in Fallocated region, make it fall to buffered write. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-09-10fat: zero out seek range on _fat_get_blockNamjae Jeon
For normal buffered write operations, normally if we try to write to an offset > than file size, it does a cont_expand_zero till that offset. Now, in case of fallocated regions, since the blocks are already allocated. So, make it zero out that buffers for those blocks till the seek'ed offset. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-09-10fat: add fat_fallocate operationNamjae Jeon
Implement preallocation via the fallocate syscall on VFAT partitions. This patch is based on an earlier patch of the same name which had some issues detailed below and did not get accepted. Refer https://lkml.org/lkml/2007/12/22/130. a) The preallocated space was not persistent when the FALLOC_FL_KEEP_SIZE flag was set. It will deallocate cluster at evict time. b) There was no need to zero out the clusters when the flag was set Instead of doing an expanding truncate, just allocate clusters and add them to the fat chain. This reduces preallocation time. Compatibility with windows: There are no issues when FALLOC_FL_KEEP_SIZE is not set because it just does an expanding truncate. Thus reading from the preallocated area on windows returns null until data is written to it. When a file with preallocated area using the FALLOC_FL_KEEP_SIZE was written to on windows, the windows driver freed-up the preallocated clusters and allocated new clusters for the new data. The freed up clusters gets reflected in the free space available for the partition which can be seen from the Volume properties. The windows chkdsk tool also does not report any errors on a disk containing files with preallocated space. And there is also no issue using linux fat fsck. because discard preallocated clusters at repair time. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-09-10fat: add i_disksize to represent uninitialized sizeNamjae Jeon
Add i_disksize to represent uninitialized allocated size. And mmu_private represent initialized allocated size. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-09-10Merge branch 'akpm-current/current'Stephen Rothwell
2014-09-10Merge remote-tracking branch 'userns/for-next'Stephen Rothwell
Conflicts: fs/nfs/client.c
2014-09-10Merge remote-tracking branch 'percpu/for-next'Stephen Rothwell
Conflicts: fs/ext4/super.c
2014-09-10Merge remote-tracking branch 'rcu/rcu/next'Stephen Rothwell
2014-09-10Merge remote-tracking branch 'tip/auto-latest'Stephen Rothwell
2014-09-10Merge remote-tracking branch 'trivial/for-next'Stephen Rothwell
Conflicts: block/blk-core.c
2014-09-10Merge remote-tracking branch 'integrity/next'Stephen Rothwell
Conflicts: fs/nfsd/vfs.c
2014-09-10Merge remote-tracking branch 'block/for-next'Stephen Rothwell
2014-09-10Merge remote-tracking branch 'vfs/for-next'Stephen Rothwell
2014-09-10Merge remote-tracking branch 'file-locks/linux-next'Stephen Rothwell
Conflicts: fs/locks.c include/linux/fs.h
2014-09-10Merge remote-tracking branch 'xfs/for-next'Stephen Rothwell
2014-09-10Merge remote-tracking branch 'ubifs/linux-next'Stephen Rothwell
2014-09-10Merge remote-tracking branch 'nfsd/nfsd-next'Stephen Rothwell
2014-09-10Merge remote-tracking branch 'nfs/linux-next'Stephen Rothwell
2014-09-10Merge remote-tracking branch 'logfs/master'Stephen Rothwell
2014-09-10Merge remote-tracking branch 'gfs2/master'Stephen Rothwell
2014-09-10Merge remote-tracking branch 'f2fs/dev'Stephen Rothwell
2014-09-10Merge remote-tracking branch 'ext4/dev'Stephen Rothwell
2014-09-10Merge remote-tracking branch 'ext3/for_next'Stephen Rothwell
2014-09-10Merge remote-tracking branch 'ecryptfs/next'Stephen Rothwell
2014-09-10Merge remote-tracking branch 'cifs/for-next'Stephen Rothwell
2014-09-10hfsplus: fix longname handlingSougata Santra
Longname is not correctly handled by hfsplus driver. If an attempt to create a longname(>255) file/directory is made, it succeeds by creating a file/directory with HFSPLUS_MAX_STRLEN and incorrect catalog key. Thus leaving the volume in an inconsistent state. This patch fixes this issue. Although lookup is always called first to create a negative entry, so just doing a check in lookup would probably fix this issue. I choose to propagate error to other iops as well. Please NOTE: I have factored out hfsplus_cat_build_key_with_cnid from hfsplus_cat_build_key, to avoid unncessary branching. Thanks a lot. TEST: ------ dir="TEST_DIR" cdir=`pwd` name255="_123456789_123456789_123456789_123456789_123456789_123456789\ _123456789_123456789_123456789_123456789_123456789_123456789_123456789\ _123456789_123456789_123456789_123456789_123456789_123456789_123456789\ _123456789_123456789_123456789_123456789_123456789_1234" name256="${name255}5" mkdir $dir cd $dir touch $name255 rm -f $name255 touch $name256 ls -la cd $cdir rm -rf $dir RESULT: ------- [sougata@ultrabook tmp]$ cdir=`pwd` [sougata@ultrabook tmp]$ name255="_123456789_123456789_123456789_123456789_123456789_123456789\ > _123456789_123456789_123456789_123456789_123456789_123456789_123456789\ > _123456789_123456789_123456789_123456789_123456789_123456789_123456789\ > _123456789_123456789_123456789_123456789_123456789_1234" [sougata@ultrabook tmp]$ name256="${name255}5" [sougata@ultrabook tmp]$ [sougata@ultrabook tmp]$ mkdir $dir [sougata@ultrabook tmp]$ cd $dir [sougata@ultrabook TEST_DIR]$ touch $name255 [sougata@ultrabook TEST_DIR]$ rm -f $name255 [sougata@ultrabook TEST_DIR]$ touch $name256 [sougata@ultrabook TEST_DIR]$ ls -la ls: cannot access _123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_1234: No such file or directory total 0 drwxrwxr-x 1 sougata sougata 3 Feb 20 19:56 . drwxrwxrwx 1 root root 6 Feb 20 19:56 .. -????????? ? ? ? ? ? _123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_1234 [sougata@ultrabook TEST_DIR]$ cd $cdir [sougata@ultrabook tmp]$ rm -rf $dir rm: cannot remove `TEST_DIR': Directory not empty -ENAMETOOLONG returned from hfsplus_asc2uni was not propaged to iops. This allowed hfsplus to create files/directories with HFSPLUS_MAX_STRLEN and incorrect keys, leaving the FS in an inconsistent state. This patch fixes this issue. Signed-off-by: Sougata Santra <sougata@tuxera.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Vyacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-09-10fs/befs/btree.c: remove typedef befs_btree_nodeHimangi Saraogi
The Linux kernel coding style guidelines suggest not using typedefs for structure types. This patch gets rid of the typedef for befs_btree_node. The following Coccinelle semantic patch detects the case. @tn1@ type td; @@ typedef struct { ... } td; @script:python tf@ td << tn1.td; tdres; @@ coccinelle.tdres = td; @@ type tn1.td; identifier tf.tdres; @@ -typedef struct + tdres { ... } -td ; @@ type tn1.td; identifier tf.tdres; @@ -td + struct tdres Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Acked-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-09-10autofs4: use ACCESS_ONCE rather than rcu_dereference for dentry->d_inodeNeilBrown
As the kbuild test robot reports, rcu_dereference() isn't really appropriate here - we don't need a memory barrier, just a guard against accidentally dereferencing NULL. This can be merged into autofs4: d_manage() should return -EISDIR when appropriate in rcu-walk mod= e. in 'mm'. Signed-off-by: NeilBrown <neilb@suse.de> Cc: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-09-10autofs4: d_manage() should return -EISDIR when appropriate in rcu-walk mode.NeilBrown
If rcu-walk mode we don't *have* to return -EISDIR for non-mount-traps as we will simply drop into REF-walk and handling DCACHE_NEED_AUTOMOUNT dentrys the slow way. But it is better if we do when possible. In 'oz_mode', use the same condition as ref-walk: if not a mountpoint, then it must be -EISDIR. In regular mode there are most tests needed. Most of them can be performed without taking any spinlocks. If we find a directory that isn't obviously empty, and isn't mounted on, we need to call 'simple_empty()' which does take a spinlock. If this turned out to hurt performance, some other approach could be found to signal when a directory is known to be empty. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Ian Kent <raven@themaw.net> Tested-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-09-10autofs4: avoid taking fs_lock during rcu-walkNeilBrown
->fs_lock protects AUTOFS_INF_EXPIRING. We need to be sure that once the flag is set, no new references beneath the dentry are taken. So rcu-walk currently needs to take fs_lock before checking the flag. This hurts performance. Change the expiry to a two-stage process. First set AUTOFS_INF_NO_RCU which forces any path walk into ref-walk mode, then drop the lock and call synchronize_rcu(). Once that returns we can be sure no rcu-walk is active beneath the dentry and we can check reference counts again. Now during an RCU-walk we can test AUTOFS_INF_EXPIRING without taking the lock as along as we test AUTOFS_INF_NO_RCU too. If either are set, we must abort the RCU-walk If neither are set, we know that refcounts will be tested again after we finish the RCU-walk so we are safe to continue. ->fs_lock is still taken in d_manage() to check for a non-trap directory. That will be resolved in the next patch. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Ian Kent <raven@themaw.net> Tested-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-09-10autofs4: make "autofs4_can_expire" idempotent.NeilBrown
Have a "test" function change the value it is testing can be confusing, particularly as a future patch will be calling this function twice. So move the update for 'last_used' to avoid repeat expiry to the place where the final determination on what to expire is known. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Ian Kent <raven@themaw.net> Tested-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-09-10autofs4: factor should_expire() out of autofs4_expire_indirect.NeilBrown
Future patch will potentially call this twice, so make it separate. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Ian Kent <raven@themaw.net> Tested-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-09-10autofs4: allow RCU-walk to walk through autofs4NeilBrown
This series teaches autofs about RCU-walk so that we don't drop straight into REF-walk when we hit an autofs directory, and so that we avoid spinlocks as much as possible when performing an RCU-walk. This is needed so that the benefits of the recent NFS support for RCU-walk are fully available when NFS filesystems are automounted. Patches have been carefully reviewed and tested both with test suites and in production - thanks a lot to Ian Kent for his support there. This patch (of 6): Any attempt to look up a pathname that passes though an autofs4 mount is currently forced out of RCU-walk into REF-walk. This can significantly hurt performance of many-thread work loads on many-core systems, especially if the automounted filesystem supports RCU-walk but doesn't get to benefit from it. So if autofs4_d_manage is called with rcu_walk set, only fail with -ECHILD if it is necessary to wait longer than a spinlock. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Ian Kent <raven@themaw.net> Tested-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-09-10binfmt_misc: expand the register format limit to 1920 bytesMike Frysinger
The current code places a 256 byte limit on the registration format. This ends up being fairly limited when you try to do matching against a binary format like ELF: - the magic & mask formats cannot have any embedded NUL chars (string_unescape_inplace halts at the first NUL) - each escape sequence quadruples the size: \x00 is needed for NUL - trying to match bytes at the start of the file as well as further on leads to a lot of \x00 sequences in the mask - magic & mask have to be the same length (when decoded) - still need bytes for the other fields - impossible! Let's look at a concrete (and common) example: using QEMU to run MIPS ELFs. The name field uses 11 bytes "qemu-mipsel". The interp uses 20 bytes "/usr/bin/qemu-mipsel". The type & flags takes up 4 bytes. We need 7 bytes for the delimiter (usually ":"). We can skip offset. So already we're down to 107 bytes to use with the magic/mask instead of the real limit of 128 (BINPRM_BUF_SIZE). If people use shell code to register (which they do the majority of the time), they're down to ~26 possible bytes since the escape sequence must be \x##. The ELF format looks like (both 32 & 64 bit): e_ident: 16 bytes e_type: 2 bytes e_machine: 2 bytes Those 20 bytes are enough for most architectures because they have so few formats in the first place, thus they can be uniquely identified. That also means for shell users, since 20 is smaller than 26, they can sanely register a handler. But for some targets (like MIPS), we need to poke further. The ELF fields continue on: e_entry: 4 or 8 bytes e_phoff: 4 or 8 bytes e_shoff: 4 or 8 bytes e_flags: 4 bytes We only care about e_flags here as that includes the bits to identify whether the ELF is O32/N32/N64. But now we have to consume another 16 bytes (for 32 bit ELFs) or 28 bytes (for 64 bit ELFs) just to match the flags. If every byte is escaped, we send 288 more bytes to the kernel ((20 {e_ident,e_type,e_machine} + 12 {e_entry,e_phoff,e_shoff} + 4 {e_flags}) * 2 {mask,magic} * 4 {escape}) and we've clearly blown our budget. Even if we try to be clever and do the decoding ourselves (rather than relying on the kernel to process \x##), we still can't hit the mark -- string_unescape_inplace treats mask & magic as C strings so NUL cannot be embedded. That leaves us with having to pass \x00 for the 12/24 entry/phoff/shoff bytes (as those will be completely random addresses), and that is a minimum requirement of 48/96 bytes for the mask alone. Add up the rest and we blow through it (this is for 64 bit ELFs): magic: 20 {e_ident,e_type,e_machine} + 24 {e_entry,e_phoff,e_shoff} + 4 {e_flags} = 48 # ^^ See note below. mask: 20 {e_ident,e_type,e_machine} + 96 {e_entry,e_phoff,e_shoff} + 4 {e_flags} = 120 Remember above we had 107 left over, and now we're at 168. This is of course the *best* case scenario -- you'll also want to have NUL bytes in the magic & mask too to match literal zeros. Note: the reason we can use 24 in the magic is that we can work off of the fact that for bytes the mask would clobber, we can stuff any value into magic that we want. So when mask is \x00, we don't need the magic to also be \x00, it can be an unescaped raw byte like '!'. This lets us handle more formats (barely) under the current 256 limit, but that's a pretty tall hoop to force people to jump through. With all that said, let's bump the limit from 256 bytes to 1920. This way we support escaping every byte of the mask & magic field (which is 1024 bytes by themselves -- 128 * 4 * 2), and we leave plenty of room for other fields. Like long paths to the interpreter (when you have source in your /really/long/homedir/qemu/foo). Since the current code stuffs more than one structure into the same buffer, we leave a bit of space to easily round up to 2k. 1920 is just as arbitrary as 256 ;). Signed-off-by: Mike Frysinger <vapier@gentoo.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-09-10fs/mpage.c: forgotten WRITE_SYNC in case of data integrity writeRoman Pen
In case of wbc->sync_mode == WB_SYNC_ALL we need to do data integrity write, thus mark request as WRITE_SYNC. akpm: afaict this change will cause the data integrity write bios to be placed onto the second queue in cfq_io_cq.cfqq[], which presumably results in special treatment. The documentation for REQ_SYNC is horrid. Signed-off-by: Roman Pen <r.peniaev@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-09-10mm: introduce common page state for ballooned memoryKonstantin Khlebnikov
Add page state PageBallon() and functions __Set/ClearPageBalloon. Like PageBuddy() PageBalloon() looks like page-flag but actually this is special state of page->_mapcount counter. There is no conflict because ballooned pages cannot be mapped and cannot be in buddy allocator. Ballooned pages are counted in vmstat counter NR_BALLOON_PAGES, it's shown them in /proc/meminfo and /proc/meminfo. Also this patch it exports PageBallon into userspace via /proc/kpageflags as KPF_BALLOON. All this code including mm/balloon_compaction.o is under CONFIG_MEMORY_BALLOON, it should be selected by ballooning driver which want use this feature. Signed-off-by: Konstantin Khlebnikov <k.khlebnikov@samsung.com> Cc: Rafael Aquini <aquini@redhat.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-09-10mempolicy: fix show_numa_map() vs exec() + do_set_mempolicy() raceOleg Nesterov
9e7814404b77 "hold task->mempolicy while numa_maps scans." fixed the race with the exiting task but this is not enough. The current code assumes that get_vma_policy(task) should either see task->mempolicy == NULL or it should be equal to ->task_mempolicy saved by hold_task_mempolicy(), so we can never race with __mpol_put(). But this can only work if we can't race with do_set_mempolicy(), and thus we can't race with another do_set_mempolicy() or do_exit() after that. However, do_set_mempolicy()->down_write(mmap_sem) can not prevent this race. This task can exec, change it's ->mm, and call do_set_mempolicy() after that; in this case they take 2 different locks. Change hold_task_mempolicy() to use get_task_policy(), it never returns NULL, and change show_numa_map() to use __get_vma_policy() or fall back to proc_priv->task_mempolicy. Note: this is the minimal fix, we will cleanup this code later. I think hold_task_mempolicy() and release_task_mempolicy() should die, we can move this logic into show_numa_map(). Or we can move get_task_policy() outside of ->mmap_sem and !CONFIG_NUMA code at least. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Hugh Dickins <hughd@google.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-09-10block_dev: implement readpages() to optimize sequential readAkinobu Mita
Sequential read from a block device is expected to be equal or faster than from the file on a filesystem. But it is not correct due to the lack of effective readpages() in the address space operations for block device. This implements readpages() operation for block device by using mpage_readpages() which can create multipage BIOs instead of BIOs for each page and reduce system CPU time consumption. Install 1GB of RAM disk storage: # modprobe scsi_debug dev_size_mb=1024 delay=0 Sequential read from file on a filesystem: # mkfs.ext4 /dev/$DEV # mount /dev/$DEV /mnt # fio --name=t --size=512m --rw=read --filename=/mnt/file ... read : io=524288KB, bw=2133.4MB/s, iops=546133, runt= 240msec Sequential read from a block device: # fio --name=t --size=512m --rw=read --filename=/dev/$DEV ... (Without this commit) read : io=524288KB, bw=1700.2MB/s, iops=435455, runt= 301msec (With this commit) read : io=524288KB, bw=2160.4MB/s, iops=553046, runt= 237msec Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-09-10vfs: guard end of device for mpage interfaceAkinobu Mita
Add guard_bio_eod() check for mpage code in order to allow us to do IO even on the odd last sectors of a device, even if the block size is some multiple of the physical sector size. Using mpage_readpages() for block device requires this guard check. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-09-10vfs: make guard_bh_eod() more genericAkinobu Mita
This patchset implements readpages() operation for block device by using mpage_readpages() which can create multipage BIOs instead of BIOs for each page and reduce system CPU time consumption. This patch (of 3): guard_bh_eod() is used in submit_bh() to allow us to do IO even on the odd last sectors of a device, even if the block size is some multiple of the physical sector size. This makes guard_bh_eod() more generic and renames it guard_bio_eod() so that we can use it without struct buffer_head argument. The reason for this change is that using mpage_readpages() for block device requires to add this guard check in mpage code. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-09-10not-adding-modules-range-to-kcore-if-its-equal-to-vmcore-range-checkpatch-fixesAndrew Morton
ERROR: space prohibited after that open parenthesis '(' #34: FILE: fs/proc/kcore.c:613: + if ( (MODULES_VADDR != VMALLOC_START) && ERROR: space prohibited before that close parenthesis ')' #35: FILE: fs/proc/kcore.c:614: + (MODULES_END != VMALLOC_END) ) { total: 2 errors, 0 warnings, 12 lines checked ./patches/not-adding-modules-range-to-kcore-if-its-equal-to-vmcore-range.patch has style problems, please review. If any of these errors are false positives, please report them to the maintainer, see CHECKPATCH in MAINTAINERS. Please run checkpatch prior to sending patches Cc: Baoquan He <bhe@redhat.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-09-10fs/proc/kcore.c: don't add modules range to kcore if it's equal to vmcore rangeBaoquan He
On some ARCHs modules range is eauql to vmalloc range. E.g on i686 "#define MODULES_VADDR VMALLOC_START" "#define MODULES_END VMALLOC_END" This will cause 2 duplicate program segments in /proc/kcore, and no flag to indicate they are different. This is confusing. And usually people who need check the elf header or read the content of kcore will check memory ranges. Two program segments which are the same are unnecessary. So check if the modules range is equal to vmalloc range. If so, just skip adding the modules range. Signed-off-by: Baoquan He <bhe@redhat.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-09-10proc/maps: make vm_is_stack() logic namespace-friendlyOleg Nesterov
- Rename vm_is_stack() to task_of_stack() and change it to return "struct task_struct *" rather than the global (and thus wrong in general) pid_t. - Add the new pid_of_stack() helper which calls task_of_stack() and uses the right namespace to report the correct pid_t. Unfortunately we need to define this helper twice, in task_mmu.c and in task_nommu.c. perhaps it makes sense to add fs/proc/util.c and move at least pid_of_stack/task_of_stack there to avoid the code duplication. - Change show_map_vma() and show_numa_map() to use the new helper. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Greg Ungerer <gerg@uclinux.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-09-10proc/maps: replace proc_maps_private->pid with "struct inode *inode"Oleg Nesterov
m_start() can use get_proc_task() instead, and "struct inode *" provides more potentially useful info, see the next changes. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Greg Ungerer <gerg@uclinux.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-09-10fs/proc/task_nommu.c: don't use priv->task->mmOleg Nesterov
I do not know if CONFIG_PREEMPT/SMP is possible without CONFIG_MMU but the usage of task->mm in m_stop(). The task can exit/exec before we take mmap_sem, in this case m_stop() can hit NULL or unlock the wrong rw_semaphore. Also, this code uses priv->task != NULL to decide whether we need up_read/mmput. This is correct, but we will probably kill priv->task. Change m_start/m_stop to rely on IS_ERR_OR_NULL() like task_mmu.c does. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Greg Ungerer <gerg@uclinux.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-09-10fs-proc-task_nommuc-shift-mm_access-from-m_start-to-proc_maps_open-checkpatc ↵Andrew Morton
h-fixes WARNING: Missing a blank line after declarations #46: FILE: fs/proc/task_nommu.c:280: + int err = PTR_ERR(priv->mm); + seq_release_private(inode, file); total: 0 errors, 1 warnings, 57 lines checked ./patches/fs-proc-task_nommuc-shift-mm_access-from-m_start-to-proc_maps_open.patch has style problems, please review. If any of these errors are false positives, please report them to the maintainer, see CHECKPATCH in MAINTAINERS. Please run checkpatch prior to sending patches Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-09-10fs/proc/task_nommu.c: shift mm_access() from m_start() to proc_maps_open()Oleg Nesterov
Copy-and-paste the changes from "fs/proc/task_mmu.c: shift mm_access() from m_start() to proc_maps_open()" into task_nommu.c. Change maps_open() to initialize priv->mm using proc_mem_open(), m_start() can rely on atomic_inc_not_zero(mm_users) like task_mmu.c does. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Greg Ungerer <gerg@uclinux.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-09-10fs/proc/task_nommu.c: change maps_open() to use __seq_open_private()Oleg Nesterov
Cleanup and preparation. maps_open() can use __seq_open_private() like proc_maps_open() does. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Greg Ungerer <gerg@uclinux.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-09-10fs/proc/task_mmu.c: update m->version in the main loop in m_start()Oleg Nesterov
Change the main loop in m_start() to update m->version. Mostly for consistency, but this can help to avoid the same loop if the very 1st ->show() fails due to seq_overflow(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>