summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2014-08-05Add linux-next specific files for 20140805next-20140805Stephen Rothwell
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2014-08-05Merge branch 'akpm/master'Stephen Rothwell
2014-08-05mm: add strictlimit knobMaxim Patlasov
The "strictlimit" feature was introduced to enforce per-bdi dirty limits for FUSE which sets bdi max_ratio to 1% by default: http://article.gmane.org/gmane.linux.kernel.mm/105809 However the feature can be useful for other relatively slow or untrusted BDIs like USB flash drives and DVD+RW. The patch adds a knob to enable the feature: echo 1 > /sys/class/bdi/X:Y/strictlimit Being enabled, the feature enforces bdi max_ratio limit even if global (10%) dirty limit is not reached. Of course, the effect is not visible until /sys/class/bdi/X:Y/max_ratio is decreased to some reasonable value. Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com> Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Cc: Theodore Ts'o <tytso@mit.edu> Cc: "Artem S. Tashkinov" <t.artem@lycos.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Jan Kara <jack@suse.cz> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-08-05drivers/w1/w1_int.c: call put_device if device_register failsLevente Kurusa
Currently, memsetting and kfreeing the device is bad behaviour. The device will have a reference count of 1 and hence can cause trouble because it has kfree'd. Proper way to handle a failed device_register is to call put_device right after it fails. Signed-off-by: Levente Kurusa <levex@linux.com> Acked-by: Evgeniy Polyakov <zbr@ioremap.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-08-05Documentation/filesystems/vfat.txt: update the limitation for fat fallocateNamjae Jeon
Update the limitation for fat fallocate. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-08-05fat: permit to return phy block number by fibmap in fallocated regionNamjae Jeon
Make the fibmap call the return the proper physical block number for any offset request in the fallocated range. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-08-05fat: fallback to buffered write in case of fallocated region on direct IONamjae Jeon
For normal cases of direct IO write, trying to seek to location greater than file size, makes it fall back to buffered write to fill that region. Similarly, in case for write in Fallocated region, make it fall to buffered write. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-08-05fat: zero out seek range on _fat_get_blockNamjae Jeon
For normal buffered write operations, normally if we try to write to an offset > than file size, it does a cont_expand_zero till that offset. Now, in case of fallocated regions, since the blocks are already allocated. So, make it zero out that buffers for those blocks till the seek'ed offset. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-08-05fat: add fat_fallocate operationNamjae Jeon
Implement preallocation via the fallocate syscall on VFAT partitions. This patch is based on an earlier patch of the same name which had some issues detailed below and did not get accepted. Refer https://lkml.org/lkml/2007/12/22/130. a) The preallocated space was not persistent when the FALLOC_FL_KEEP_SIZE flag was set. It will deallocate cluster at evict time. b) There was no need to zero out the clusters when the flag was set Instead of doing an expanding truncate, just allocate clusters and add them to the fat chain. This reduces preallocation time. Compatibility with windows: There are no issues when FALLOC_FL_KEEP_SIZE is not set because it just does an expanding truncate. Thus reading from the preallocated area on windows returns null until data is written to it. When a file with preallocated area using the FALLOC_FL_KEEP_SIZE was written to on windows, the windows driver freed-up the preallocated clusters and allocated new clusters for the new data. The freed up clusters gets reflected in the free space available for the partition which can be seen from the Volume properties. The windows chkdsk tool also does not report any errors on a disk containing files with preallocated space. And there is also no issue using linux fat fsck. because discard preallocated clusters at repair time. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-08-05fat: add i_disksize to represent uninitialized sizeNamjae Jeon
Add i_disksize to represent uninitialized allocated size. And mmu_private represent initialized allocated size. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-08-05mm: remap_file_pages: grab file ref to prevent race while mmapingSasha Levin
A file reference should be held while a file is mmaped, otherwise it might be freed while being used. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Suggested-by: Hugh Dickins <hughd@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-08-05mm: remap_file_pages: initialize populate before usageSasha Levin
'populate' wasn't initialized before being used in error paths, causing panics when mm_populate() would get called with invalid values. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-08-05mm-replace-remap_file_pages-syscall-with-emulation-fixAndrew Morton
fix spello Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Armin Rigo <arigo@tunes.org> Cc: Dave Jones <davej@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-08-05mm: replace remap_file_pages() syscall with emulationKirill A. Shutemov
remap_file_pages(2) was invented to be able efficiently map parts of huge file into limited 32-bit virtual address space such as in database workloads. Nonlinear mappings are pain to support and it seems there's no legitimate use-cases nowadays since 64-bit systems are widely available. Let's drop it and get rid of all these special-cased code. The patch replaces the syscall with emulation which creates new VMA on each remap_file_pages(), unless they it can be merged with an adjacent one. I didn't find *any* real code that uses remap_file_pages(2) to test emulation impact on. I've checked Debian code search and source of all packages in ALT Linux. No real users: libc wrappers, mentions in strace, gdb, valgrind and this kind of stuff. There are few basic tests in LTP for the syscall. They work just fine with emulation. To test performance impact, I've written small test case which demonstrate pretty much worst case scenario: map 4G shmfs file, write to begin of every page pgoff of the page, remap pages in reverse order, read every page. The test creates 1 million of VMAs if emulation is in use, so I had to set vm.max_map_count to 1100000 to avoid -ENOMEM. Before: 23.3 ( +- 4.31% ) seconds After: 43.9 ( +- 0.85% ) seconds Slowdown: 1.88x I believe we can live with that. Test case: #define _GNU_SOURCE #include <assert.h> #include <stdlib.h> #include <stdio.h> #include <sys/mman.h> #define MB (1024UL * 1024) #define SIZE (4096 * MB) int main(int argc, char **argv) { unsigned long *p; long i, pass; for (pass = 0; pass < 10; pass++) { p = mmap(NULL, SIZE, PROT_READ|PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0); if (p == MAP_FAILED) { perror("mmap"); return -1; } for (i = 0; i < SIZE / 4096; i++) p[i * 4096 / sizeof(*p)] = i; for (i = 0; i < SIZE / 4096; i++) { if (remap_file_pages(p + i * 4096 / sizeof(*p), 4096, 0, (SIZE - 4096 * (i + 1)) >> 12, 0)) { perror("remap_file_pages"); return -1; } } for (i = SIZE / 4096 - 1; i >= 0; i--) assert(p[i * 4096 / sizeof(*p)] == SIZE / 4096 - i - 1); munmap(p, SIZE); } return 0; } Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Dave Jones <davej@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Armin Rigo <arigo@tunes.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-08-05kernel/kprobes.c: convert printk to pr_foo()Fabian Frederick
Also fixes some checkpatch warnings -Static initialization -Lines over 80 characters Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-08-05MAINTAINERS: update nomadik patternsJoe Perches
commit 3a19805920f1 ("pinctrl: nomadik: move all Nomadik drivers to subdir") move the files, update the patterns Signed-off-by: Joe Perches <joe@perches.com> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Alessandro Rubini <rubini@unipv.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-08-05MAINTAINERS: update usb/gadget patternsJoe Perches
Several commits have moved files around, update the section patterns. Signed-off-by: Joe Perches <joe@perches.com> Cc: Thomas Dahlmann <dahlmann.thomas@arcor.de> Cc: Nicolas Ferre <nicolas.ferre@atmel.com> Cc: Li Yang <leoli@freescale.com> Cc: Eric Miao <eric.y.miao@gmail.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Haojian Zhuang <haojian.zhuang@gmail.com> Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Cc: Andrzej Pietrasiewicz <andrzej.p@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-08-05MAINTAINERS: update DMA BUFFER SHARING patternsJoe Perches
One pattern per F: line please... Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-08-05ksm-provide-support-to-use-deferrable-timers-for-scanner-thread-fix-fix-2Andrew Morton
fix build Cc: Chintan Pandya <cpandya@codeaurora.org> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-08-05ksm-provide-support-to-use-deferrable-timers-for-scanner-thread-fixAndrew Morton
clean up docs, fix deferrable_timer_store() error handling Cc: Chintan Pandya <cpandya@codeaurora.org> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-08-05ksm: provide support to use deferrable timers for scanner threadChintan Pandya
KSM thread to scan pages is scheduled on definite timeout. That wakes up CPU from idle state and hence may affect the power consumption. Provide an optional support to use deferrable timer which suites low-power use-cases. Typically, on our setup we observed, 10% less power consumption with some use-cases in which CPU goes to power collapse frequently. For example, playing audio while typically CPU remains idle. To enable deferrable timers, $ echo 1 > /sys/kernel/mm/ksm/deferrable_timer Signed-off-by: Chintan Pandya <cpandya@codeaurora.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-08-05timer: provide an api for deferrable timeoutChintan Pandya
schedule_timeout wakes up the CPU from IDLE state. For some use cases it is not desirable, hence introduce a convenient API (schedule_timeout_deferrable_interruptible) on similar pattern which uses a deferrable timer. Signed-off-by: Chintan Pandya <cpandya@codeaurora.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-08-05dlm: plock: reduce indentation by rearranging orderJoe Perches
if blocks that have a goto at the end of one branch can be simplified by reordering and unindenting. Signed-off-by: Joe Perches <joe@perches.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: David Teigland <teigland@redhat.com> Reviewed-by: Jeff Layton <jlayton@poochiereds.net> Cc: Christine Caulfield <ccaulfie@redhat.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-08-05dlm, lockd: remove unused conf from lm_grantJoe Perches
This argument is always NULL so don't pass it around. Signed-off-by: Joe Perches <joe@perches.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: David Teigland <teigland@redhat.com> Reviewed-by: Jeff Layton <jlayton@poochiereds.net> Cc: Christine Caulfield <ccaulfie@redhat.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-08-05fs/dlm/plock.c: add argument descriptions to notifyJoe Perches
Function pointer arguments without type names are not very clear. Add them. Signed-off-by: Joe Perches <joe@perches.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: David Teigland <teigland@redhat.com> Cc: Jeff Layton <jlayton@poochiereds.net> Cc: Christine Caulfield <ccaulfie@redhat.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-08-05fs.h: add member argument descriptions to lock_manager_operationsJoe Perches
Function pointer struct members without argument type names are not very clear. Add them. Signed-off-by: Joe Perches <joe@perches.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: David Teigland <teigland@redhat.com> Acked-by: Jeff Layton <jlayton@poochiereds.net> Cc: Christine Caulfield <ccaulfie@redhat.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-08-05fs.h: add argument names to struct file_lock_operations (*funcs)Joe Perches
Function pointer struct members without argument type names are not very clear. Add them. Signed-off-by: Joe Perches <joe@perches.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: David Teigland <teigland@redhat.com> Acked-by: Jeff Layton <jlayton@poochiereds.net> Cc: Christine Caulfield <ccaulfie@redhat.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-08-05fs.h: a few more whitespace neateningsJoe Perches
Some line removals and line wrapping to 80 columns along with few trivial other whitespace changes. Signed-off-by: Joe Perches <joe@perches.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: David Teigland <teigland@redhat.com> Cc: Jeff Layton <jlayton@poochiereds.net> Cc: Christine Caulfield <ccaulfie@redhat.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-08-05fs.h: whitespace neateningJoe Perches
Make the whitespace a lot more kernel style conformant. git diff -w shows no differences. Signed-off-by: Joe Perches <joe@perches.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: David Teigland <teigland@redhat.com> Cc: Jeff Layton <jlayton@poochiereds.net> Cc: Christine Caulfield <ccaulfie@redhat.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-08-05fs.h: remove unnecessary extern prototypesJoe Perches
This file has a mixture of prototypes with and without extern. Remove the extern uses. Signed-off-by: Joe Perches <joe@perches.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: David Teigland <teigland@redhat.com> Cc: Jeff Layton <jlayton@poochiereds.net> Cc: Christine Caulfield <ccaulfie@redhat.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-08-05kexec: verify the signature of signed PE bzImageVivek Goyal
This is the final piece of the puzzle of verifying kernel image signature during kexec_file_load() syscall. This patch calls into PE file routines to verify signature of bzImage. If signature are valid, kexec_file_load() succeeds otherwise it fails. Two new config options have been introduced. First one is CONFIG_KEXEC_VERIFY_SIG. This option enforces that kernel has to be validly signed otherwise kernel load will fail. If this option is not set, no signature verification will be done. Only exception will be when secureboot is enabled. In that case signature verification should be automatically enforced when secureboot is enabled. But that will happen when secureboot patches are merged. Second config option is CONFIG_KEXEC_BZIMAGE_VERIFY_SIG. This option enables signature verification support on bzImage. If this option is not set and previous one is set, kernel image loading will fail because kernel does not have support to verify signature of bzImage. I tested these patches with both "pesign" and "sbsign" signed bzImages. I used signing_key.priv key and signing_key.x509 cert for signing as generated during kernel build process (if module signing is enabled). Used following method to sign bzImage. pesign ====== - Convert DER format cert to PEM format cert openssl x509 -in signing_key.x509 -inform DER -out signing_key.x509.PEM -outform PEM - Generate a .p12 file from existing cert and private key file openssl pkcs12 -export -out kernel-key.p12 -inkey signing_key.priv -in signing_key.x509.PEM - Import .p12 file into pesign db pk12util -i /tmp/kernel-key.p12 -d /etc/pki/pesign - Sign bzImage pesign -i /boot/vmlinuz-3.16.0-rc3+ -o /boot/vmlinuz-3.16.0-rc3+.signed.pesign -c "Glacier signing key - Magrathea" -s sbsign ====== sbsign --key signing_key.priv --cert signing_key.x509.PEM --output /boot/vmlinuz-3.16.0-rc3+.signed.sbsign /boot/vmlinuz-3.16.0-rc3+ Patch details: Well all the hard work is done in previous patches. Now bzImage loader has just call into that code and verify whether bzImage signature are valid or not. Also create two config options. First one is CONFIG_KEXEC_VERIFY_SIG. This option enforces that kernel has to be validly signed otherwise kernel load will fail. If this option is not set, no signature verification will be done. Only exception will be when secureboot is enabled. In that case signature verification should be automatically enforced when secureboot is enabled. But that will happen when secureboot patches are merged. Second config option is CONFIG_KEXEC_BZIMAGE_VERIFY_SIG. This option enables signature verification support on bzImage. If this option is not set and previous one is set, kernel image loading will fail because kernel does not have support to verify signature of bzImage. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Greg Kroah-Hartman <greg@kroah.com> Cc: Dave Young <dyoung@redhat.com> Cc: WANG Chao <chaowang@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Matt Fleming <matt@console-pimps.org> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-08-05kexec-support-kexec-kdump-on-efi-systems-fixAndrew Morton
s/get_efi/efi_get/g, per Matt Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Matt Fleming <matt@console-pimps.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-08-05kexec: support kexec/kdump on EFI systemsVivek Goyal
This patch does two things. It passes EFI run time mappings to second kernel in bootparams efi_info. Second kernel parse this info and create new mappings in second kernel. That means mappings in first and second kernel will be same. This paves the way to enable EFI in kexec kernel. This patch also prepares and passes EFI setup data through bootparams. This contains bunch of information about various tables and their addresses. These information gathering and passing has been written along the lines of what current kexec-tools is doing to make kexec work with UEFI. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Greg Kroah-Hartman <greg@kroah.com> Cc: Dave Young <dyoung@redhat.com> Cc: WANG Chao <chaowang@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Matt Fleming <matt@console-pimps.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-08-05kexec-bzimage64: fix the ordering of registersVivek Goyal
hpa suggested to fix the ordering of register declaration. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-08-05kexec: support for kexec on panic using new system callVivek Goyal
This patch adds support for loading a kexec on panic (kdump) kernel usning new system call. It prepares ELF headers for memory areas to be dumped and for saved cpu registers. Also prepares the memory map for second kernel and limits its boot to reserved areas only. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Greg Kroah-Hartman <greg@kroah.com> Cc: Dave Young <dyoung@redhat.com> Cc: WANG Chao <chaowang@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-08-05kexec: fix freeing up for image loader data loadingVivek Goyal
During testing I noticed a crash. Which in turn showed that there are problems with how I am freeing up image->image_loader_data. In one case I am freeing up kimage->image_loader_data first and then calling up arch to free up which might have been contained in that structure. That's wrong. I have done little cleanup and this should fix the issues around freeing up of loader data. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-08-05kexec-bzImage64: support for loading bzImage using 64bit entryVivek Goyal
This is loader specific code which can load bzImage and set it up for 64bit entry. This does not take care of 32bit entry or real mode entry. 32bit mode entry can be implemented if somebody needs it. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Greg Kroah-Hartman <greg@kroah.com> Cc: Dave Young <dyoung@redhat.com> Cc: WANG Chao <chaowang@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-08-05kexec-load-and-relocate-purgatory-at-kernel-load-time-fixVivek Goyal
Set CRYPTO=y and CRYPTO_SHA256=y for all arch supporting kexec. Generic kexec implementation now makes use of crypto API to calculate the sha256 digest of loaded kernel segments (for new syscall kexec_file_load()). That means one need to enforce that CRYPTO and CRYPTO_SHA256 are built in for kexec to compile and for new syscall to work. I created this dependency for x86 but forgot to do for other arches supporting kexec. And ran into compilation failure reports from kbuild test robot. kernel/built-in.o: In function `sys_kexec_file_load': (.text+0x32314): undefined reference to `crypto_shash_final' kernel/built-in.o: In function `sys_kexec_file_load': (.text+0x32328): undefined reference to `crypto_shash_update' kernel/built-in.o: In function `sys_kexec_file_load': >> (.text+0x32338): undefined reference to `crypto_alloc_shash' Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Greg Kroah-Hartman <greg@kroah.com> Cc: Dave Young <dyoung@redhat.com> Cc: WANG Chao <chaowang@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-08-05kexec: load and relocate purgatory at kernel load timeVivek Goyal
Load purgatory code in RAM and relocate it based on the location. Relocation code has been inspired by module relocation code and purgatory relocation code in kexec-tools. Also compute the checksums of loaded kexec segments and store them in purgatory. Arch independent code provides this functionality so that arch dependent bootloaders can make use of it. Helper functions are provided to get/set symbol values in purgatory which are used by bootloaders later to set things like stack and entry point of second kernel etc. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Greg Kroah-Hartman <greg@kroah.com> Cc: Dave Young <dyoung@redhat.com> Cc: WANG Chao <chaowang@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-08-05purgatory: run host built programs from objtreeStephen Rothwell
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-08-05purgatory: core purgatory functionalityVivek Goyal
Create a stand alone relocatable object purgatory which runs between two kernels. This name, concept and some code has been taken from kexec-tools. Idea is that this code runs after a crash and it runs in minimal environment. So keep it separate from rest of the kernel and in long term we will have to practically do no maintenance of this code. This code also has the logic to do verify sha256 hashes of various segments which have been loaded into memory. So first we verify that the kernel we are jumping to is fine and has not been corrupted and make progress only if checsums are verified. This code also takes care of copying some memory contents to backup region. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Greg Kroah-Hartman <greg@kroah.com> Cc: Dave Young <dyoung@redhat.com> Cc: WANG Chao <chaowang@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-08-05purgatory/sha256: provide implementation of sha256 in purgaotory contextVivek Goyal
Next two patches provide code for purgatory. This is a code which does not link against the kernel and runs stand alone. This code runs between two kernels. One of the primary purpose of this code is to verify the digest of newly loaded kernel and making sure it matches the digest computed at kernel load time. We use sha256 for calculating digest of kexec segmetns. Purgatory can't use stanard crypto API as that API is not available in purgatory context. Hence, I have copied code from crypto/sha256_generic.c and compiled it with purgaotry code so that it could be used. I could not #include sha256_generic.c file here as some of the function signature requiered little tweaking. Original functions work with crypto API but these ones don't So instead of doing #include on sha256_generic.c I just copied relevant portions of code into arch/x86/purgatory/sha256.c. Now we shouldn't have to touch this code at all. Do let me know if there are better ways to handle it. This patch does not enable compiling of this code. That happens in next patch. I wanted to highlight this change in a separate patch for easy review. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Greg Kroah-Hartman <greg@kroah.com> Cc: Dave Young <dyoung@redhat.com> Cc: WANG Chao <chaowang@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-08-05kexec: return error if file bytes are less then file sizeVivek Goyal
If number of bytes read from file are not same as file size, return error. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-08-05kexec-implementation-of-new-syscall-kexec_file_load-checkpatch-fixesAndrew Morton
WARNING: else is not generally useful after a break or return #673: FILE: kernel/kexec.c:2024: + return locate_mem_hole_top_down(start, end, kbuf); + else total: 0 errors, 1 warnings, 677 lines checked ./patches/kexec-implementation-of-new-syscall-kexec_file_load.patch has style problems, please review. If any of these errors are false positives, please report them to the maintainer, see CHECKPATCH in MAINTAINERS. Please run checkpatch prior to sending patches Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-08-05kexec: implementation of new syscall kexec_file_loadVivek Goyal
Previous patch provided the interface definition and this patch prvides implementation of new syscall. Previously segment list was prepared in user space. Now user space just passes kernel fd, initrd fd and command line and kernel will create a segment list internally. This patch contains generic part of the code. Actual segment preparation and loading is done by arch and image specific loader. Which comes in next patch. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Greg Kroah-Hartman <greg@kroah.com> Cc: Dave Young <dyoung@redhat.com> Cc: WANG Chao <chaowang@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-08-05kexec: new syscall kexec_file_load() declarationVivek Goyal
This is the new syscall kexec_file_load() declaration/interface. I have reserved the syscall number only for x86_64 so far. Other architectures (including i386) can reserve syscall number when they enable the support for this new syscall. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Borislav Petkov <bp@suse.de> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Greg Kroah-Hartman <greg@kroah.com> Cc: Dave Young <dyoung@redhat.com> Cc: WANG Chao <chaowang@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-08-05kexec: make kexec_segment user buffer pointer a unionVivek Goyal
So far kexec_segment->buf was always a user space pointer as user space passed the array of kexec_segment structures and kernel copied it. But with new system call, list of kexec segments will be prepared by kernel and kexec_segment->buf will point to a kernel memory. So while I was adding code where I made assumption that ->buf is pointing to kernel memory, sparse started giving warning. Make ->buf a union. And where a user space pointer is expected, access it using ->buf and where a kernel space pointer is expected, access it using ->kbuf. That takes care of sparse warnings. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Greg Kroah-Hartman <greg@kroah.com> Cc: Dave Young <dyoung@redhat.com> Cc: WANG Chao <chaowang@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-08-05resource: provide new functions to walk through resourcesVivek Goyal
I have added two more functions to walk through resources. Currently walk_system_ram_range() deals with pfn and /proc/iomem can contain partial pages. By dealing in pfn, callback function loses the info that last page of a memory range is a partial page and not the full page. So I implemented walk_system_ram_res() which returns u64 values to callback functions and now it properly return start and end address. walk_system_ram_range() uses find_next_system_ram() to find the next ram resource. This in turn only travels through siblings of top level child and does not travers through all the nodes of the resoruce tree. I also need another function where I can walk through all the resources, for example figure out where "GART" aperture is. Figure out where ACPI memory is. So I wrote another function walk_iomem_res() which walks through all /proc/iomem resources and returns matches as asked by caller. Caller can specify "name" of resource, start and end and flags. Got rid of find_next_system_ram_res() and instead implemented more generic find_next_iomem_res() which can be used to traverse top level children only based on an argument. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@suse.de> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Greg Kroah-Hartman <greg@kroah.com> Cc: Dave Young <dyoung@redhat.com> Cc: WANG Chao <chaowang@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-08-05kexec: use common function for kimage_normal_alloc() and kimage_crash_alloc()Vivek Goyal
kimage_normal_alloc() and kimage_crash_alloc() are doing lot of similar things and differ only little. So instead of having two separate functions create a common function kimage_alloc_init() and pass it the "flags" argument which tells whether it is normal kexec or kexec_on_panic. And this function should be able to deal with both the cases. This consolidation also helps later where we can use a common function kimage_file_alloc_init() to handle normal and crash cases for new file based kexec syscall. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Greg Kroah-Hartman <greg@kroah.com> Cc: Dave Young <dyoung@redhat.com> Cc: WANG Chao <chaowang@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2014-08-05kexec: move segment verification code in a separate functionVivek Goyal
Previously do_kimage_alloc() will allocate a kimage structure, copy segment list from user space and then do the segment list sanity verification. Break down this function in 3 parts. do_kimage_alloc_init() to do actual allocation and basic initialization of kimage structure. copy_user_segment_list() to copy segment list from user space and sanity_check_segment_list() to verify the sanity of segment list as passed by user space. In later patches, I need to only allocate kimage and not copy segment list from user space. So breaking down in smaller functions enables re-use of code at other places. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Greg Kroah-Hartman <greg@kroah.com> Cc: Dave Young <dyoung@redhat.com> Cc: WANG Chao <chaowang@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>