diff options
author | Stephen Rothwell <sfr@canb.auug.org.au> | 2014-04-17 16:08:49 +1000 |
---|---|---|
committer | Stephen Rothwell <sfr@canb.auug.org.au> | 2014-04-17 17:07:09 +1000 |
commit | 50e056db0177508c067006b991c34cbd72c98eb4 (patch) | |
tree | 0de0703bb0098044c1b675ec7e1a224e2777df03 | |
parent | be728448326c78f925bee6c7153f7a775031b36f (diff) | |
parent | 7b5f689823ab9b3bbb07aa4a57ea9aa9db6f09c5 (diff) |
Merge branch 'akpm-current/current'
156 files changed, 2867 insertions, 1226 deletions
@@ -1377,6 +1377,9 @@ S: 17 rue Danton S: F - 94270 Le Kremlin-Bicêtre S: France +N: Jack Hammer +D: IBM ServeRAID RAID (ips) driver maintenance + N: Greg Hankins E: gregh@cc.gatech.edu D: fixed keyboard driver to separate LED and locking status @@ -1687,6 +1690,10 @@ S: Reading S: RG6 2NU S: United Kingdom +N: Dave Jeffery +E: dhjeffery@gmail.com +D: SCSI hacks and IBM ServeRAID RAID driver maintenance + N: Jakub Jelinek E: jakub@redhat.com W: http://sunsite.mff.cuni.cz/~jj diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches index 2a8e89e13e45..7e9abb8a276b 100644 --- a/Documentation/SubmittingPatches +++ b/Documentation/SubmittingPatches @@ -132,6 +132,20 @@ Example: platform_set_drvdata(), but left the variable "dev" unused, delete it. +If your patch fixes a bug in a specific commit, e.g. you found an issue using +git-bisect, please use the 'Fixes:' tag with the first 12 characters of the +SHA-1 ID, and the one line summary. +Example: + + Fixes: e21d2170f366 ("video: remove unnecessary platform_set_drvdata()") + +The following git-config settings can be used to add a pretty format for +outputting the above style in the git log or git show commands + + [core] + abbrev = 12 + [pretty] + fixes = Fixes: %h (\"%s\") 3) Separate your changes. @@ -443,7 +457,7 @@ person it names. This tag documents that potentially interested parties have been included in the discussion -14) Using Reported-by:, Tested-by:, Reviewed-by: and Suggested-by: +14) Using Reported-by:, Tested-by:, Reviewed-by:, Suggested-by: and Fixes: If this patch fixes a problem reported by somebody else, consider adding a Reported-by: tag to credit the reporter for their contribution. Please @@ -498,6 +512,12 @@ idea was not posted in a public forum. That said, if we diligently credit our idea reporters, they will, hopefully, be inspired to help us again in the future. +A Fixes: tag indicates that the patch fixes an issue in a previous commit. It +is used to make it easy to determine where a bug originated, which can help +review a bug fix. This tag also assists the stable kernel team in determining +which stable kernel versions should receive your fix. This is the preferred +method for indicating a bug fixed by the patch. See #2 above for more details. + 15) The canonical patch format diff --git a/Documentation/devicetree/bindings/rtc/xgene-rtc.txt b/Documentation/devicetree/bindings/rtc/xgene-rtc.txt new file mode 100644 index 000000000000..fd195c358446 --- /dev/null +++ b/Documentation/devicetree/bindings/rtc/xgene-rtc.txt @@ -0,0 +1,28 @@ +* APM X-Gene Real Time Clock + +RTC controller for the APM X-Gene Real Time Clock + +Required properties: +- compatible : Should be "apm,xgene-rtc" +- reg: physical base address of the controller and length of memory mapped + region. +- interrupts: IRQ line for the RTC. +- #clock-cells: Should be 1. +- clocks: Reference to the clock entry. + +Example: + +rtcclk: rtcclk { + compatible = "fixed-clock"; + #clock-cells = <1>; + clock-frequency = <100000000>; + clock-output-names = "rtcclk"; +}; + +rtc: rtc@10510000 { + compatible = "apm,xgene-rtc"; + reg = <0x0 0x10510000 0x0 0x400>; + interrupts = <0x0 0x46 0x4>; + #clock-cells = <1>; + clocks = <&rtcclk 0>; +}; diff --git a/Documentation/filesystems/vfat.txt b/Documentation/filesystems/vfat.txt index 4a93e98b290a..5cf57b368dc6 100644 --- a/Documentation/filesystems/vfat.txt +++ b/Documentation/filesystems/vfat.txt @@ -175,6 +175,16 @@ nfs=stale_rw|nostale_ro <bool>: 0,1,yes,no,true,false +LIMITATION +--------------------------------------------------------------------- +* The fallocated region of file is discarded at umount/evict time + when using fallocate with FALLOC_FL_KEEP_SIZE. + So, User should assume that fallocated region can be discarded at + last close if there is memory pressure resulting in eviction of + the inode from the memory. As a result, for any dependency on + the fallocated region, user should make sure to recheck fallocate + after reopening the file. + TODO ---------------------------------------------------------------------- * Need to get rid of the raw scanning stuff. Instead, always use diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 43842177b771..7ec22955857d 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -1287,6 +1287,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted. for working out where the kernel is dying during startup. + initcall_blacklist= [KNL] Do not execute a comma-separated list of + initcall functions. Useful for debugging built-in + modules and initcalls. + initrd= [BOOT] Specify the location of the initial ramdisk inport.irq= [HW] Inport (ATI XL and Microsoft) busmouse driver diff --git a/Documentation/leds/leds-class.txt b/Documentation/leds/leds-class.txt index 79699c200766..62261c04060a 100644 --- a/Documentation/leds/leds-class.txt +++ b/Documentation/leds/leds-class.txt @@ -2,9 +2,6 @@ LED handling under Linux ======================== -If you're reading this and thinking about keyboard leds, these are -handled by the input subsystem and the led class is *not* needed. - In its simplest form, the LED class just allows control of LEDs from userspace. LEDs appear in /sys/class/leds/. The maximum brightness of the LED is defined in max_brightness file. The brightness file will set the brightness diff --git a/Documentation/printk-formats.txt b/Documentation/printk-formats.txt index 6f4eb322ffaf..94459b42e0ab 100644 --- a/Documentation/printk-formats.txt +++ b/Documentation/printk-formats.txt @@ -184,6 +184,12 @@ dentry names: equivalent of %s dentry->d_name.name we used to use, %pd<n> prints n last components. %pD does the same thing for struct file. +task_struct comm name: + + %pT + + For printing task_struct->comm. + struct va_format: %pV diff --git a/MAINTAINERS b/MAINTAINERS index 8ebdcc2fbd7e..3152dd74be22 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -4384,10 +4384,7 @@ F: drivers/scsi/ibmvscsi/ X: drivers/scsi/ibmvscsi/ibmvstgt.c IBM ServeRAID RAID DRIVER -P: Jack Hammer -M: Dave Jeffery <ipslinux@adaptec.com> -W: http://www.developer.ibm.com/welcome/netfinity/serveraid.html -S: Supported +S: Orphan F: drivers/scsi/ips.* ICH LPC AND GPIO DRIVER diff --git a/arch/alpha/include/asm/Kbuild b/arch/alpha/include/asm/Kbuild index 96e54bed5088..e858aa0ad8af 100644 --- a/arch/alpha/include/asm/Kbuild +++ b/arch/alpha/include/asm/Kbuild @@ -6,4 +6,5 @@ generic-y += exec.h generic-y += hash.h generic-y += mcs_spinlock.h generic-y += preempt.h +generic-y += scatterlist.h generic-y += trace_clock.h diff --git a/arch/alpha/include/asm/scatterlist.h b/arch/alpha/include/asm/scatterlist.h deleted file mode 100644 index 017d7471c3c4..000000000000 --- a/arch/alpha/include/asm/scatterlist.h +++ /dev/null @@ -1,6 +0,0 @@ -#ifndef _ALPHA_SCATTERLIST_H -#define _ALPHA_SCATTERLIST_H - -#include <asm-generic/scatterlist.h> - -#endif /* !(_ALPHA_SCATTERLIST_H) */ diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index ab438cb5af55..8e9286af440a 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -83,6 +83,7 @@ config ARM <http://www.arm.linux.org.uk/>. config ARM_HAS_SG_CHAIN + select ARCH_HAS_SG_CHAIN bool config NEED_SG_DMA_LENGTH diff --git a/arch/arm/include/asm/Kbuild b/arch/arm/include/asm/Kbuild index 23e728ecf8ab..2d95820276fd 100644 --- a/arch/arm/include/asm/Kbuild +++ b/arch/arm/include/asm/Kbuild @@ -21,6 +21,7 @@ generic-y += parport.h generic-y += poll.h generic-y += preempt.h generic-y += resource.h +generic-y += scatterlist.h generic-y += sections.h generic-y += segment.h generic-y += sembuf.h diff --git a/arch/arm/include/asm/fixmap.h b/arch/arm/include/asm/fixmap.h index bbae919bceb4..68ea615c2a28 100644 --- a/arch/arm/include/asm/fixmap.h +++ b/arch/arm/include/asm/fixmap.h @@ -14,28 +14,15 @@ */ #define FIXADDR_START 0xfff00000UL -#define FIXADDR_TOP 0xfffe0000UL -#define FIXADDR_SIZE (FIXADDR_TOP - FIXADDR_START) +#define FIXADDR_END 0xfffe0000UL +#define FIXADDR_TOP (FIXADDR_END - PAGE_SIZE) -#define FIX_KMAP_BEGIN 0 -#define FIX_KMAP_END (FIXADDR_SIZE >> PAGE_SHIFT) +enum fixed_addresses { + FIX_KMAP_BEGIN, + FIX_KMAP_END = (FIXADDR_TOP - FIXADDR_START) >> PAGE_SHIFT, + __end_of_fixed_addresses +}; -#define __fix_to_virt(x) (FIXADDR_START + ((x) << PAGE_SHIFT)) -#define __virt_to_fix(x) (((x) - FIXADDR_START) >> PAGE_SHIFT) - -extern void __this_fixmap_does_not_exist(void); - -static inline unsigned long fix_to_virt(const unsigned int idx) -{ - if (idx >= FIX_KMAP_END) - __this_fixmap_does_not_exist(); - return __fix_to_virt(idx); -} - -static inline unsigned int virt_to_fix(const unsigned long vaddr) -{ - BUG_ON(vaddr >= FIXADDR_TOP || vaddr < FIXADDR_START); - return __virt_to_fix(vaddr); -} +#include <asm-generic/fixmap.h> #endif diff --git a/arch/arm/include/asm/scatterlist.h b/arch/arm/include/asm/scatterlist.h deleted file mode 100644 index cefdb8f898a1..000000000000 --- a/arch/arm/include/asm/scatterlist.h +++ /dev/null @@ -1,12 +0,0 @@ -#ifndef _ASMARM_SCATTERLIST_H -#define _ASMARM_SCATTERLIST_H - -#ifdef CONFIG_ARM_HAS_SG_CHAIN -#define ARCH_HAS_SG_CHAIN -#endif - -#include <asm/memory.h> -#include <asm/types.h> -#include <asm-generic/scatterlist.h> - -#endif /* _ASMARM_SCATTERLIST_H */ diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c index 2a77ba8796ae..91a468225853 100644 --- a/arch/arm/mm/init.c +++ b/arch/arm/mm/init.c @@ -578,7 +578,7 @@ void __init mem_init(void) MLK(DTCM_OFFSET, (unsigned long) dtcm_end), MLK(ITCM_OFFSET, (unsigned long) itcm_end), #endif - MLK(FIXADDR_START, FIXADDR_TOP), + MLK(FIXADDR_START, FIXADDR_END), MLM(VMALLOC_START, VMALLOC_END), MLM(PAGE_OFFSET, (unsigned long)high_memory), #ifdef CONFIG_HIGHMEM diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index e6e4d3749a6e..df24b9323ac6 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -2,6 +2,7 @@ config ARM64 def_bool y select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE select ARCH_USE_CMPXCHG_LOCKREF + select ARCH_HAS_SG_CHAIN select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST select ARCH_WANT_OPTIONAL_GPIOLIB select ARCH_WANT_COMPAT_IPC_PARSE_VERSION diff --git a/arch/arm64/boot/dts/apm-storm.dtsi b/arch/arm64/boot/dts/apm-storm.dtsi index 93f4b2dd9248..4917f3b81a44 100644 --- a/arch/arm64/boot/dts/apm-storm.dtsi +++ b/arch/arm64/boot/dts/apm-storm.dtsi @@ -257,6 +257,19 @@ enable-offset = <0x0>; enable-mask = <0x39>; }; + + rtcclk: rtcclk@17000000 { + compatible = "apm,xgene-device-clock"; + #clock-cells = <1>; + clocks = <&socplldiv2 0>; + reg = <0x0 0x17000000 0x0 0x2000>; + reg-names = "csr-reg"; + csr-offset = <0xc>; + csr-mask = <0x2>; + enable-offset = <0x10>; + enable-mask = <0x2>; + clock-output-names = "rtcclk"; + }; }; serial0: serial@1c020000 { @@ -339,5 +352,13 @@ phys = <&phy3 0>; phy-names = "sata-phy"; }; + + rtc: rtc@10510000 { + compatible = "apm,xgene-rtc"; + reg = <0x0 0x10510000 0x0 0x400>; + interrupts = <0x0 0x46 0x4>; + #clock-cells = <1>; + clocks = <&rtcclk 0>; + }; }; }; diff --git a/arch/cris/include/asm/Kbuild b/arch/cris/include/asm/Kbuild index afff5105909d..31742dfadff9 100644 --- a/arch/cris/include/asm/Kbuild +++ b/arch/cris/include/asm/Kbuild @@ -13,6 +13,7 @@ generic-y += linkage.h generic-y += mcs_spinlock.h generic-y += module.h generic-y += preempt.h +generic-y += scatterlist.h generic-y += trace_clock.h generic-y += vga.h generic-y += xor.h diff --git a/arch/cris/include/asm/scatterlist.h b/arch/cris/include/asm/scatterlist.h deleted file mode 100644 index f11f8f40ec4a..000000000000 --- a/arch/cris/include/asm/scatterlist.h +++ /dev/null @@ -1,6 +0,0 @@ -#ifndef __ASM_CRIS_SCATTERLIST_H -#define __ASM_CRIS_SCATTERLIST_H - -#include <asm-generic/scatterlist.h> - -#endif /* !(__ASM_CRIS_SCATTERLIST_H) */ diff --git a/arch/frv/include/asm/Kbuild b/arch/frv/include/asm/Kbuild index 87b95eb8aee5..5b73921b6e9d 100644 --- a/arch/frv/include/asm/Kbuild +++ b/arch/frv/include/asm/Kbuild @@ -5,4 +5,5 @@ generic-y += exec.h generic-y += hash.h generic-y += mcs_spinlock.h generic-y += preempt.h +generic-y += scatterlist.h generic-y += trace_clock.h diff --git a/arch/frv/include/asm/scatterlist.h b/arch/frv/include/asm/scatterlist.h deleted file mode 100644 index 0e5eb3018468..000000000000 --- a/arch/frv/include/asm/scatterlist.h +++ /dev/null @@ -1,6 +0,0 @@ -#ifndef _ASM_SCATTERLIST_H -#define _ASM_SCATTERLIST_H - -#include <asm-generic/scatterlist.h> - -#endif /* !_ASM_SCATTERLIST_H */ diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig index 12c3afee0f6f..43e7290fbccf 100644 --- a/arch/ia64/Kconfig +++ b/arch/ia64/Kconfig @@ -27,6 +27,7 @@ config IA64 select HAVE_MEMBLOCK select HAVE_MEMBLOCK_NODE_MAP select HAVE_VIRT_CPU_ACCOUNTING + select ARCH_HAS_SG_CHAIN select VIRT_TO_BUS select ARCH_DISCARD_MEMBLOCK select GENERIC_IRQ_PROBE diff --git a/arch/ia64/include/asm/Kbuild b/arch/ia64/include/asm/Kbuild index 0da4aa2602ae..e8317d2d6c8d 100644 --- a/arch/ia64/include/asm/Kbuild +++ b/arch/ia64/include/asm/Kbuild @@ -5,5 +5,6 @@ generic-y += hash.h generic-y += kvm_para.h generic-y += mcs_spinlock.h generic-y += preempt.h +generic-y += scatterlist.h generic-y += trace_clock.h generic-y += vtime.h diff --git a/arch/ia64/include/asm/scatterlist.h b/arch/ia64/include/asm/scatterlist.h deleted file mode 100644 index 08fd93bff1db..000000000000 --- a/arch/ia64/include/asm/scatterlist.h +++ /dev/null @@ -1,7 +0,0 @@ -#ifndef _ASM_IA64_SCATTERLIST_H -#define _ASM_IA64_SCATTERLIST_H - -#include <asm-generic/scatterlist.h> -#define ARCH_HAS_SG_CHAIN - -#endif /* _ASM_IA64_SCATTERLIST_H */ diff --git a/arch/m32r/include/asm/Kbuild b/arch/m32r/include/asm/Kbuild index 67779a74b62d..accc10a3dc78 100644 --- a/arch/m32r/include/asm/Kbuild +++ b/arch/m32r/include/asm/Kbuild @@ -6,4 +6,5 @@ generic-y += hash.h generic-y += mcs_spinlock.h generic-y += module.h generic-y += preempt.h +generic-y += scatterlist.h generic-y += trace_clock.h diff --git a/arch/m32r/include/asm/scatterlist.h b/arch/m32r/include/asm/scatterlist.h deleted file mode 100644 index 7370b8b6243e..000000000000 --- a/arch/m32r/include/asm/scatterlist.h +++ /dev/null @@ -1,6 +0,0 @@ -#ifndef _ASM_M32R_SCATTERLIST_H -#define _ASM_M32R_SCATTERLIST_H - -#include <asm-generic/scatterlist.h> - -#endif /* _ASM_M32R_SCATTERLIST_H */ diff --git a/arch/m68k/include/asm/signal.h b/arch/m68k/include/asm/signal.h index 214320b50384..8c8ce5e1ee0e 100644 --- a/arch/m68k/include/asm/signal.h +++ b/arch/m68k/include/asm/signal.h @@ -60,15 +60,6 @@ static inline int __gen_sigismember(sigset_t *set, int _sig) __const_sigismember(set,sig) : \ __gen_sigismember(set,sig)) -static inline int sigfindinword(unsigned long word) -{ - asm ("bfffo %1{#0,#0},%0" - : "=d" (word) - : "d" (word & -word) - : "cc"); - return word ^ 31; -} - #endif /* !CONFIG_CPU_HAS_NO_BITFIELDS */ #ifndef __uClinux__ diff --git a/arch/microblaze/include/asm/Kbuild b/arch/microblaze/include/asm/Kbuild index c98ed95c0541..2ea655be809d 100644 --- a/arch/microblaze/include/asm/Kbuild +++ b/arch/microblaze/include/asm/Kbuild @@ -6,5 +6,6 @@ generic-y += exec.h generic-y += hash.h generic-y += mcs_spinlock.h generic-y += preempt.h +generic-y += scatterlist.h generic-y += syscalls.h generic-y += trace_clock.h diff --git a/arch/microblaze/include/asm/scatterlist.h b/arch/microblaze/include/asm/scatterlist.h deleted file mode 100644 index 35d786fe93ae..000000000000 --- a/arch/microblaze/include/asm/scatterlist.h +++ /dev/null @@ -1 +0,0 @@ -#include <asm-generic/scatterlist.h> diff --git a/arch/mips/mm/cache.c b/arch/mips/mm/cache.c index e422b38d3113..9e67cdea3c74 100644 --- a/arch/mips/mm/cache.c +++ b/arch/mips/mm/cache.c @@ -29,15 +29,15 @@ void (*flush_cache_range)(struct vm_area_struct *vma, unsigned long start, void (*flush_cache_page)(struct vm_area_struct *vma, unsigned long page, unsigned long pfn); void (*flush_icache_range)(unsigned long start, unsigned long end); +EXPORT_SYMBOL_GPL(flush_icache_range); void (*local_flush_icache_range)(unsigned long start, unsigned long end); void (*__flush_cache_vmap)(void); void (*__flush_cache_vunmap)(void); void (*__flush_kernel_vmap_range)(unsigned long vaddr, int size); -void (*__invalidate_kernel_vmap_range)(unsigned long vaddr, int size); - EXPORT_SYMBOL_GPL(__flush_kernel_vmap_range); +void (*__invalidate_kernel_vmap_range)(unsigned long vaddr, int size); /* MIPS specific cache operations */ void (*flush_cache_sigtramp)(unsigned long addr); diff --git a/arch/mn10300/include/asm/Kbuild b/arch/mn10300/include/asm/Kbuild index 654d5ba6e310..ecbd6676bd33 100644 --- a/arch/mn10300/include/asm/Kbuild +++ b/arch/mn10300/include/asm/Kbuild @@ -6,4 +6,5 @@ generic-y += exec.h generic-y += hash.h generic-y += mcs_spinlock.h generic-y += preempt.h +generic-y += scatterlist.h generic-y += trace_clock.h diff --git a/arch/mn10300/include/asm/scatterlist.h b/arch/mn10300/include/asm/scatterlist.h deleted file mode 100644 index 7baa4006008a..000000000000 --- a/arch/mn10300/include/asm/scatterlist.h +++ /dev/null @@ -1,16 +0,0 @@ -/* MN10300 Scatterlist definitions - * - * Copyright (C) 2007 Matsushita Electric Industrial Co., Ltd. - * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved. - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public Licence - * as published by the Free Software Foundation; either version - * 2 of the Licence, or (at your option) any later version. - */ -#ifndef _ASM_SCATTERLIST_H -#define _ASM_SCATTERLIST_H - -#include <asm-generic/scatterlist.h> - -#endif /* _ASM_SCATTERLIST_H */ diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index e0998997943b..caece570d0b4 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -111,6 +111,7 @@ config PPC select HAVE_DMA_API_DEBUG select HAVE_OPROFILE select HAVE_DEBUG_KMEMLEAK + select ARCH_HAS_SG_CHAIN select GENERIC_ATOMIC64 if PPC32 select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE select HAVE_PERF_EVENTS diff --git a/arch/powerpc/include/asm/Kbuild b/arch/powerpc/include/asm/Kbuild index 3fb1bc432f4f..7f23f162ce9c 100644 --- a/arch/powerpc/include/asm/Kbuild +++ b/arch/powerpc/include/asm/Kbuild @@ -4,5 +4,6 @@ generic-y += hash.h generic-y += mcs_spinlock.h generic-y += preempt.h generic-y += rwsem.h +generic-y += scatterlist.h generic-y += trace_clock.h generic-y += vtime.h diff --git a/arch/powerpc/include/asm/scatterlist.h b/arch/powerpc/include/asm/scatterlist.h deleted file mode 100644 index de1f620bd5c9..000000000000 --- a/arch/powerpc/include/asm/scatterlist.h +++ /dev/null @@ -1,17 +0,0 @@ -#ifndef _ASM_POWERPC_SCATTERLIST_H -#define _ASM_POWERPC_SCATTERLIST_H -/* - * Copyright (C) 2001 PPC64 Team, IBM Corp - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software Foundation; either version - * 2 of the License, or (at your option) any later version. - */ - -#include <asm/dma.h> -#include <asm-generic/scatterlist.h> - -#define ARCH_HAS_SG_CHAIN - -#endif /* _ASM_POWERPC_SCATTERLIST_H */ diff --git a/arch/powerpc/mm/dma-noncoherent.c b/arch/powerpc/mm/dma-noncoherent.c index 7b6c10750179..d85e86aac7fb 100644 --- a/arch/powerpc/mm/dma-noncoherent.c +++ b/arch/powerpc/mm/dma-noncoherent.c @@ -33,6 +33,7 @@ #include <linux/export.h> #include <asm/tlbflush.h> +#include <asm/dma.h> #include "mmu_decl.h" diff --git a/arch/powerpc/mm/subpage-prot.c b/arch/powerpc/mm/subpage-prot.c index 6c0b1f5f8d2c..fa9fb5b4c66c 100644 --- a/arch/powerpc/mm/subpage-prot.c +++ b/arch/powerpc/mm/subpage-prot.c @@ -134,7 +134,7 @@ static void subpage_prot_clear(unsigned long addr, unsigned long len) static int subpage_walk_pmd_entry(pmd_t *pmd, unsigned long addr, unsigned long end, struct mm_walk *walk) { - struct vm_area_struct *vma = walk->private; + struct vm_area_struct *vma = walk->vma; split_huge_page_pmd(vma, addr, pmd); return 0; } @@ -163,9 +163,7 @@ static void subpage_mark_vma_nohuge(struct mm_struct *mm, unsigned long addr, if (vma->vm_start >= (addr + len)) break; vma->vm_flags |= VM_NOHUGEPAGE; - subpage_proto_walk.private = vma; - walk_page_range(vma->vm_start, vma->vm_end, - &subpage_proto_walk); + walk_page_vma(vma, &subpage_proto_walk); vma = vma->vm_next; } } diff --git a/arch/powerpc/platforms/44x/warp.c b/arch/powerpc/platforms/44x/warp.c index 534574a97ec9..3a104284b338 100644 --- a/arch/powerpc/platforms/44x/warp.c +++ b/arch/powerpc/platforms/44x/warp.c @@ -25,6 +25,7 @@ #include <asm/time.h> #include <asm/uic.h> #include <asm/ppc4xx.h> +#include <asm/dma.h> static __initdata struct of_device_id warp_of_bus[] = { diff --git a/arch/powerpc/platforms/52xx/efika.c b/arch/powerpc/platforms/52xx/efika.c index 18c104820198..47d66794cf3e 100644 --- a/arch/powerpc/platforms/52xx/efika.c +++ b/arch/powerpc/platforms/52xx/efika.c @@ -13,6 +13,7 @@ #include <generated/utsrelease.h> #include <linux/pci.h> #include <linux/of.h> +#include <asm/dma.h> #include <asm/prom.h> #include <asm/time.h> #include <asm/machdep.h> diff --git a/arch/powerpc/platforms/amigaone/setup.c b/arch/powerpc/platforms/amigaone/setup.c index 03aabc0e16ac..2fe12046279e 100644 --- a/arch/powerpc/platforms/amigaone/setup.c +++ b/arch/powerpc/platforms/amigaone/setup.c @@ -24,6 +24,7 @@ #include <asm/i8259.h> #include <asm/time.h> #include <asm/udbg.h> +#include <asm/dma.h> extern void __flush_disable_L1(void); diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig index 387a687833e2..6a71a6435629 100644 --- a/arch/s390/Kconfig +++ b/arch/s390/Kconfig @@ -146,6 +146,7 @@ config S390 select TTY select VIRT_CPU_ACCOUNTING select VIRT_TO_BUS + select ARCH_HAS_SG_CHAIN config SCHED_OMIT_FRAME_POINTER def_bool y diff --git a/arch/s390/include/asm/Kbuild b/arch/s390/include/asm/Kbuild index 57892a8a9055..b3fea0722ff1 100644 --- a/arch/s390/include/asm/Kbuild +++ b/arch/s390/include/asm/Kbuild @@ -4,4 +4,5 @@ generic-y += clkdev.h generic-y += hash.h generic-y += mcs_spinlock.h generic-y += preempt.h +generic-y += scatterlist.h generic-y += trace_clock.h diff --git a/arch/s390/include/asm/scatterlist.h b/arch/s390/include/asm/scatterlist.h deleted file mode 100644 index 6d45ef6c12a7..000000000000 --- a/arch/s390/include/asm/scatterlist.h +++ /dev/null @@ -1,3 +0,0 @@ -#include <asm-generic/scatterlist.h> - -#define ARCH_HAS_SG_CHAIN diff --git a/arch/score/include/asm/Kbuild b/arch/score/include/asm/Kbuild index 2f947aba4bd4..aad209199f7e 100644 --- a/arch/score/include/asm/Kbuild +++ b/arch/score/include/asm/Kbuild @@ -8,5 +8,6 @@ generic-y += cputime.h generic-y += hash.h generic-y += mcs_spinlock.h generic-y += preempt.h +generic-y += scatterlist.h generic-y += trace_clock.h generic-y += xor.h diff --git a/arch/score/include/asm/scatterlist.h b/arch/score/include/asm/scatterlist.h deleted file mode 100644 index 9f533b8362c7..000000000000 --- a/arch/score/include/asm/scatterlist.h +++ /dev/null @@ -1,6 +0,0 @@ -#ifndef _ASM_SCORE_SCATTERLIST_H -#define _ASM_SCORE_SCATTERLIST_H - -#include <asm-generic/scatterlist.h> - -#endif /* _ASM_SCORE_SCATTERLIST_H */ diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig index 29f2e988c56a..e1ea0ff154d7 100644 --- a/arch/sparc/Kconfig +++ b/arch/sparc/Kconfig @@ -42,6 +42,7 @@ config SPARC select MODULES_USE_ELF_RELA select ODD_RT_SIGACTION select OLD_SIGSUSPEND + select ARCH_HAS_SG_CHAIN config SPARC32 def_bool !64BIT diff --git a/arch/sparc/include/asm/Kbuild b/arch/sparc/include/asm/Kbuild index a45821818003..cdd1b447bb6c 100644 --- a/arch/sparc/include/asm/Kbuild +++ b/arch/sparc/include/asm/Kbuild @@ -15,6 +15,7 @@ generic-y += mcs_spinlock.h generic-y += module.h generic-y += mutex.h generic-y += preempt.h +generic-y += scatterlist.h generic-y += serial.h generic-y += trace_clock.h generic-y += types.h diff --git a/arch/sparc/include/asm/scatterlist.h b/arch/sparc/include/asm/scatterlist.h deleted file mode 100644 index 92bb638313f8..000000000000 --- a/arch/sparc/include/asm/scatterlist.h +++ /dev/null @@ -1,8 +0,0 @@ -#ifndef _SPARC_SCATTERLIST_H -#define _SPARC_SCATTERLIST_H - -#include <asm-generic/scatterlist.h> - -#define ARCH_HAS_SG_CHAIN - -#endif /* !(_SPARC_SCATTERLIST_H) */ diff --git a/arch/um/include/asm/Kbuild b/arch/um/include/asm/Kbuild index a5e4b6068213..7bd64aa2e94a 100644 --- a/arch/um/include/asm/Kbuild +++ b/arch/um/include/asm/Kbuild @@ -21,6 +21,7 @@ generic-y += param.h generic-y += pci.h generic-y += percpu.h generic-y += preempt.h +generic-y += scatterlist.h generic-y += sections.h generic-y += switch_to.h generic-y += topology.h diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 895560b1fd24..376e4f7def27 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -26,7 +26,7 @@ config X86 select ARCH_MIGHT_HAVE_PC_SERIO select HAVE_AOUT if X86_32 select HAVE_UNSTABLE_SCHED_CLOCK - select ARCH_SUPPORTS_NUMA_BALANCING + select ARCH_SUPPORTS_NUMA_BALANCING if X86_64 select ARCH_SUPPORTS_INT128 if X86_64 select ARCH_WANTS_PROT_NUMA_PROT_NONE select HAVE_IDE @@ -96,6 +96,7 @@ config X86 select IRQ_FORCED_THREADING select HAVE_BPF_JIT if X86_64 select HAVE_ARCH_TRANSPARENT_HUGEPAGE + select ARCH_HAS_SG_CHAIN select CLKEVT_I8253 select ARCH_HAVE_NMI_SAFE_CMPXCHG select GENERIC_IOMAP diff --git a/arch/x86/include/asm/Kbuild b/arch/x86/include/asm/Kbuild index 3ca9762e1649..3bf000fab0ae 100644 --- a/arch/x86/include/asm/Kbuild +++ b/arch/x86/include/asm/Kbuild @@ -5,6 +5,7 @@ genhdr-y += unistd_64.h genhdr-y += unistd_x32.h generic-y += clkdev.h -generic-y += early_ioremap.h generic-y += cputime.h +generic-y += early_ioremap.h generic-y += mcs_spinlock.h +generic-y += scatterlist.h diff --git a/arch/x86/include/asm/scatterlist.h b/arch/x86/include/asm/scatterlist.h deleted file mode 100644 index 4240878b9d76..000000000000 --- a/arch/x86/include/asm/scatterlist.h +++ /dev/null @@ -1,8 +0,0 @@ -#ifndef _ASM_X86_SCATTERLIST_H -#define _ASM_X86_SCATTERLIST_H - -#include <asm-generic/scatterlist.h> - -#define ARCH_HAS_SG_CHAIN - -#endif /* _ASM_X86_SCATTERLIST_H */ diff --git a/arch/x86/include/asm/signal.h b/arch/x86/include/asm/signal.h index 35e67a457182..31eab867e6d3 100644 --- a/arch/x86/include/asm/signal.h +++ b/arch/x86/include/asm/signal.h @@ -92,12 +92,6 @@ static inline int __gen_sigismember(sigset_t *set, int _sig) ? __const_sigismember((set), (sig)) \ : __gen_sigismember((set), (sig))) -static inline int sigfindinword(unsigned long word) -{ - asm("bsfl %1,%0" : "=r"(word) : "rm"(word) : "cc"); - return word; -} - struct pt_regs; #else /* __i386__ */ diff --git a/block/genhd.c b/block/genhd.c index 791f41943132..7bd4372e8b6f 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -849,7 +849,7 @@ static int show_partition(struct seq_file *seqf, void *v) char buf[BDEVNAME_SIZE]; /* Don't show non-partitionable removeable devices or empty devices */ - if (!get_capacity(sgp) || (!disk_max_parts(sgp) && + if (!get_capacity(sgp) || (!(disk_max_parts(sgp) > 1) && (sgp->flags & GENHD_FL_REMOVABLE))) return 0; if (sgp->flags & GENHD_FL_SUPPRESS_PARTITION_INFO) diff --git a/drivers/input/Kconfig b/drivers/input/Kconfig index a11ff74a5127..9eac8de9e8b7 100644 --- a/drivers/input/Kconfig +++ b/drivers/input/Kconfig @@ -178,6 +178,15 @@ comment "Input Device Drivers" source "drivers/input/keyboard/Kconfig" +config INPUT_LEDS + bool "LED Support" + depends on LEDS_CLASS = INPUT || LEDS_CLASS = y + select LEDS_TRIGGERS + default y + help + This option enables support for LEDs on keyboards managed + by the input layer. + source "drivers/input/mouse/Kconfig" source "drivers/input/joystick/Kconfig" diff --git a/drivers/input/Makefile b/drivers/input/Makefile index 5ca3f631497f..2ab5f3336da5 100644 --- a/drivers/input/Makefile +++ b/drivers/input/Makefile @@ -6,6 +6,9 @@ obj-$(CONFIG_INPUT) += input-core.o input-core-y := input.o input-compat.o input-mt.o ff-core.o +ifeq ($(CONFIG_INPUT_LEDS),y) +input-core-y += leds.o +endif obj-$(CONFIG_INPUT_FF_MEMLESS) += ff-memless.o obj-$(CONFIG_INPUT_POLLDEV) += input-polldev.o diff --git a/drivers/input/input.c b/drivers/input/input.c index 1c4c0db05550..3b9284b18e70 100644 --- a/drivers/input/input.c +++ b/drivers/input/input.c @@ -708,6 +708,9 @@ static void input_disconnect_device(struct input_dev *dev) handle->open = 0; spin_unlock_irq(&dev->event_lock); + + if (is_event_supported(EV_LED, dev->evbit, EV_MAX)) + input_led_disconnect(dev); } /** @@ -2134,6 +2137,9 @@ int input_register_device(struct input_dev *dev) list_add_tail(&dev->node, &input_dev_list); + if (is_event_supported(EV_LED, dev->evbit, EV_MAX)) + input_led_connect(dev); + list_for_each_entry(handler, &input_handler_list, node) input_attach_handler(dev, handler); diff --git a/drivers/input/leds.c b/drivers/input/leds.c new file mode 100644 index 000000000000..985fa7ebeec7 --- /dev/null +++ b/drivers/input/leds.c @@ -0,0 +1,249 @@ +/* + * LED support for the input layer + * + * Copyright 2010-2014 Samuel Thibault <samuel.thibault@ens-lyon.org> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ + +#include <linux/kernel.h> +#include <linux/slab.h> +#include <linux/module.h> +#include <linux/init.h> +#include <linux/leds.h> +#include <linux/input.h> + +/* + * Keyboard LEDs are propagated by default like the following example: + * + * VT keyboard numlock trigger + * -> vt::numl VT LED + * -> vt-numl VT trigger + * -> per-device inputX::numl LED + * + * Userland can however choose the trigger for the vt::numl LED, or + * independently choose the trigger for any inputx::numl LED. + * + * + * VT LED classes and triggers are registered on-demand according to + * existing LED devices + */ + +/* Handler for VT LEDs, just triggers the corresponding VT trigger. */ +static void vt_led_set(struct led_classdev *cdev, + enum led_brightness brightness); +static struct led_classdev vt_leds[LED_CNT] = { +#define DEFINE_INPUT_LED(vt_led, nam, deftrig) \ + [vt_led] = { \ + .name = "vt::"nam, \ + .max_brightness = 1, \ + .brightness_set = vt_led_set, \ + .default_trigger = deftrig, \ + } +/* Default triggers for the VT LEDs just correspond to the legacy + * usage. */ + DEFINE_INPUT_LED(LED_NUML, "numl", "kbd-numlock"), + DEFINE_INPUT_LED(LED_CAPSL, "capsl", "kbd-capslock"), + DEFINE_INPUT_LED(LED_SCROLLL, "scrolll", "kbd-scrollock"), + DEFINE_INPUT_LED(LED_COMPOSE, "compose", NULL), + DEFINE_INPUT_LED(LED_KANA, "kana", "kbd-kanalock"), + DEFINE_INPUT_LED(LED_SLEEP, "sleep", NULL), + DEFINE_INPUT_LED(LED_SUSPEND, "suspend", NULL), + DEFINE_INPUT_LED(LED_MUTE, "mute", NULL), + DEFINE_INPUT_LED(LED_MISC, "misc", NULL), + DEFINE_INPUT_LED(LED_MAIL, "mail", NULL), + DEFINE_INPUT_LED(LED_CHARGING, "charging", NULL), +}; +static const char *const vt_led_names[LED_CNT] = { + [LED_NUML] = "numl", + [LED_CAPSL] = "capsl", + [LED_SCROLLL] = "scrolll", + [LED_COMPOSE] = "compose", + [LED_KANA] = "kana", + [LED_SLEEP] = "sleep", + [LED_SUSPEND] = "suspend", + [LED_MUTE] = "mute", + [LED_MISC] = "misc", + [LED_MAIL] = "mail", + [LED_CHARGING] = "charging", +}; +/* Handler for hotplug initialization */ +static void vt_led_trigger_activate(struct led_classdev *cdev); +/* VT triggers */ +static struct led_trigger vt_led_triggers[LED_CNT] = { +#define DEFINE_INPUT_LED_TRIGGER(vt_led, nam) \ + [vt_led] = { \ + .name = "vt-"nam, \ + .activate = vt_led_trigger_activate, \ + } + DEFINE_INPUT_LED_TRIGGER(LED_NUML, "numl"), + DEFINE_INPUT_LED_TRIGGER(LED_CAPSL, "capsl"), + DEFINE_INPUT_LED_TRIGGER(LED_SCROLLL, "scrolll"), + DEFINE_INPUT_LED_TRIGGER(LED_COMPOSE, "compose"), + DEFINE_INPUT_LED_TRIGGER(LED_KANA, "kana"), + DEFINE_INPUT_LED_TRIGGER(LED_SLEEP, "sleep"), + DEFINE_INPUT_LED_TRIGGER(LED_SUSPEND, "suspend"), + DEFINE_INPUT_LED_TRIGGER(LED_MUTE, "mute"), + DEFINE_INPUT_LED_TRIGGER(LED_MISC, "misc"), + DEFINE_INPUT_LED_TRIGGER(LED_MAIL, "mail"), + DEFINE_INPUT_LED_TRIGGER(LED_CHARGING, "charging"), +}; + +/* Lock for registration coherency */ +static DEFINE_MUTEX(vt_led_registered_lock); + +/* Which VT LED classes and triggers are registered */ +static unsigned long vt_led_registered[BITS_TO_LONGS(LED_CNT)]; + +/* Number of input devices having each LED */ +static int vt_led_references[LED_CNT]; + +/* VT LED state change, tell the VT trigger. */ +static void vt_led_set(struct led_classdev *cdev, + enum led_brightness brightness) +{ + int led = cdev - vt_leds; + + led_trigger_event(&vt_led_triggers[led], !!brightness); +} + +/* LED state change for some keyboard, notify that keyboard. */ +static void perdevice_input_led_set(struct led_classdev *cdev, + enum led_brightness brightness) +{ + struct input_dev *dev; + struct led_classdev *leds; + int led; + + dev = cdev->dev->platform_data; + if (!dev) + /* Still initializing */ + return; + leds = dev->leds; + led = cdev - leds; + + input_event(dev, EV_LED, led, !!brightness); + input_event(dev, EV_SYN, SYN_REPORT, 0); +} + +/* Keyboard hotplug, initialize its LED status */ +static void vt_led_trigger_activate(struct led_classdev *cdev) +{ + struct led_trigger *trigger = cdev->trigger; + int led = trigger - vt_led_triggers; + + if (cdev->brightness_set) + cdev->brightness_set(cdev, vt_leds[led].brightness); +} + +/* Free led stuff from input device, used at abortion and disconnection. */ +static void input_led_delete(struct input_dev *dev) +{ + if (dev) { + struct led_classdev *leds = dev->leds; + if (leds) { + int i; + for (i = 0; i < LED_CNT; i++) + kfree(leds[i].name); + kfree(leds); + dev->leds = NULL; + } + } +} + +/* A new input device with potential LEDs to connect. */ +int input_led_connect(struct input_dev *dev) +{ + int i, error = 0; + struct led_classdev *leds; + + dev->leds = leds = kcalloc(LED_CNT, sizeof(*leds), GFP_KERNEL); + if (!dev->leds) + return -ENOMEM; + + /* lazily register missing VT LEDs */ + mutex_lock(&vt_led_registered_lock); + for (i = 0; i < LED_CNT; i++) + if (vt_leds[i].name && test_bit(i, dev->ledbit)) { + if (!vt_led_references[i]) { + led_trigger_register(&vt_led_triggers[i]); + /* This keyboard is first to have led i, + * try to register it */ + if (!led_classdev_register(NULL, &vt_leds[i])) + vt_led_references[i] = 1; + else + led_trigger_unregister(&vt_led_triggers[i]); + } else + vt_led_references[i]++; + } + mutex_unlock(&vt_led_registered_lock); + + /* and register this device's LEDs */ + for (i = 0; i < LED_CNT; i++) + if (vt_leds[i].name && test_bit(i, dev->ledbit)) { + leds[i].name = kasprintf(GFP_KERNEL, "%s::%s", + dev_name(&dev->dev), + vt_led_names[i]); + if (!leds[i].name) { + error = -ENOMEM; + goto err; + } + leds[i].max_brightness = 1; + leds[i].brightness_set = perdevice_input_led_set; + leds[i].default_trigger = vt_led_triggers[i].name; + } + + /* No issue so far, we can register for real. */ + for (i = 0; i < LED_CNT; i++) + if (leds[i].name) { + led_classdev_register(&dev->dev, &leds[i]); + leds[i].dev->platform_data = dev; + perdevice_input_led_set(&leds[i], + vt_leds[i].brightness); + } + + return 0; + +err: + input_led_delete(dev); + return error; +} + +/* + * Disconnected input device. Clean it, and deregister now-useless VT LEDs + * and triggers. + */ +void input_led_disconnect(struct input_dev *dev) +{ + int i; + struct led_classdev *leds = dev->leds; + + for (i = 0; i < LED_CNT; i++) + if (leds[i].name) + led_classdev_unregister(&leds[i]); + + input_led_delete(dev); + + mutex_lock(&vt_led_registered_lock); + for (i = 0; i < LED_CNT; i++) { + if (!vt_leds[i].name || !test_bit(i, dev->ledbit)) + continue; + + vt_led_references[i]--; + if (vt_led_references[i]) { + /* Still some devices needing it */ + continue; + } + + led_classdev_unregister(&vt_leds[i]); + led_trigger_unregister(&vt_led_triggers[i]); + clear_bit(i, vt_led_registered); + } + mutex_unlock(&vt_led_registered_lock); +} + +MODULE_LICENSE("GPL"); +MODULE_DESCRIPTION("User LED support for input layer"); +MODULE_AUTHOR("Samuel Thibault <samuel.thibault@ens-lyon.org>"); diff --git a/drivers/input/misc/da9055_onkey.c b/drivers/input/misc/da9055_onkey.c index 4b11ede34950..4765799fef74 100644 --- a/drivers/input/misc/da9055_onkey.c +++ b/drivers/input/misc/da9055_onkey.c @@ -109,7 +109,6 @@ static int da9055_onkey_probe(struct platform_device *pdev) INIT_DELAYED_WORK(&onkey->work, da9055_onkey_work); - irq = regmap_irq_get_virq(da9055->irq_data, irq); err = request_threaded_irq(irq, NULL, da9055_onkey_irq, IRQF_TRIGGER_HIGH | IRQF_ONESHOT, "ONKEY", onkey); diff --git a/drivers/leds/Kconfig b/drivers/leds/Kconfig index 03bc098136bc..7e55a85c303a 100644 --- a/drivers/leds/Kconfig +++ b/drivers/leds/Kconfig @@ -11,9 +11,6 @@ menuconfig NEW_LEDS Say Y to enable Linux LED support. This allows control of supported LEDs from both userspace and optionally, by kernel events (triggers). - This is not related to standard keyboard LEDs which are controlled - via the input system. - if NEW_LEDS config LEDS_CLASS diff --git a/drivers/misc/ti-st/st_core.c b/drivers/misc/ti-st/st_core.c index 1972d57aadb3..e7fbc08a0627 100644 --- a/drivers/misc/ti-st/st_core.c +++ b/drivers/misc/ti-st/st_core.c @@ -342,7 +342,7 @@ void st_int_recv(void *disc_data, /* Unknow packet? */ default: type = *ptr; - if (st_gdata->list[type] == NULL) { + if (type >= ST_MAX_CHANNELS || st_gdata->list[type] == NULL) { pr_err("chip/interface misbehavior dropping" " frame starting with 0x%02x", type); goto done; diff --git a/drivers/net/irda/donauboe.c b/drivers/net/irda/donauboe.c index 768dfe9a9315..6d3e2093bf7f 100644 --- a/drivers/net/irda/donauboe.c +++ b/drivers/net/irda/donauboe.c @@ -1755,17 +1755,4 @@ static struct pci_driver donauboe_pci_driver = { .resume = toshoboe_wakeup }; -static int __init -donauboe_init (void) -{ - return pci_register_driver(&donauboe_pci_driver); -} - -static void __exit -donauboe_cleanup (void) -{ - pci_unregister_driver(&donauboe_pci_driver); -} - -module_init(donauboe_init); -module_exit(donauboe_cleanup); +module_pci_driver(donauboe_pci_driver); diff --git a/drivers/pps/pps.c b/drivers/pps/pps.c index 2f07cd615665..983f50c4b7b4 100644 --- a/drivers/pps/pps.c +++ b/drivers/pps/pps.c @@ -152,35 +152,38 @@ static long pps_cdev_ioctl(struct file *file, if (err) return -EFAULT; - ev = pps->last_ev; - - /* Manage the timeout */ - if (fdata.timeout.flags & PPS_TIME_INVALID) - err = wait_event_interruptible(pps->queue, - ev != pps->last_ev); - else { - unsigned long ticks; - - dev_dbg(pps->dev, "timeout %lld.%09d\n", - (long long) fdata.timeout.sec, - fdata.timeout.nsec); - ticks = fdata.timeout.sec * HZ; - ticks += fdata.timeout.nsec / (NSEC_PER_SEC / HZ); - - if (ticks != 0) { - err = wait_event_interruptible_timeout( - pps->queue, - ev != pps->last_ev, - ticks); - if (err == 0) - return -ETIMEDOUT; + if (!(file->f_flags & O_NONBLOCK)) { + ev = pps->last_ev; + + /* Manage the timeout */ + if (fdata.timeout.flags & PPS_TIME_INVALID) + err = wait_event_interruptible(pps->queue, + ev != pps->last_ev); + else { + unsigned long ticks; + + dev_dbg(pps->dev, "timeout %lld.%09d\n", + (long long) fdata.timeout.sec, + fdata.timeout.nsec); + ticks = fdata.timeout.sec * HZ; + ticks += fdata.timeout.nsec / + (NSEC_PER_SEC / HZ); + + if (ticks != 0) { + err = wait_event_interruptible_timeout( + pps->queue, + ev != pps->last_ev, + ticks); + if (err == 0) + return -ETIMEDOUT; + } } - } - /* Check for pending signals */ - if (err == -ERESTARTSYS) { - dev_dbg(pps->dev, "pending signal caught\n"); - return -EINTR; + /* Check for pending signals */ + if (err == -ERESTARTSYS) { + dev_dbg(pps->dev, "pending signal caught\n"); + return -EINTR; + } } /* Return the fetched timestamp */ diff --git a/drivers/rtc/Kconfig b/drivers/rtc/Kconfig index 2e565f8e5165..ac4fa56280cd 100644 --- a/drivers/rtc/Kconfig +++ b/drivers/rtc/Kconfig @@ -1327,6 +1327,15 @@ config RTC_DRV_MOXART This driver can also be built as a module. If so, the module will be called rtc-moxart +config RTC_DRV_XGENE + tristate "APM X-Gene RTC" + help + If you say yes here you get support for the APM X-Gene SoC real time + clock. + + This driver can also be built as a module, if so, the module + will be called "rtc-xgene". + comment "HID Sensor RTC drivers" config RTC_DRV_HID_SENSOR_TIME diff --git a/drivers/rtc/Makefile b/drivers/rtc/Makefile index 40a09915c8f6..4545b4a88f30 100644 --- a/drivers/rtc/Makefile +++ b/drivers/rtc/Makefile @@ -135,5 +135,6 @@ obj-$(CONFIG_RTC_DRV_VT8500) += rtc-vt8500.o obj-$(CONFIG_RTC_DRV_WM831X) += rtc-wm831x.o obj-$(CONFIG_RTC_DRV_WM8350) += rtc-wm8350.o obj-$(CONFIG_RTC_DRV_X1205) += rtc-x1205.o +obj-$(CONFIG_RTC_DRV_XGENE) += rtc-xgene.o obj-$(CONFIG_RTC_DRV_SIRFSOC) += rtc-sirfsoc.o obj-$(CONFIG_RTC_DRV_MOXART) += rtc-moxart.o diff --git a/drivers/rtc/interface.c b/drivers/rtc/interface.c index c2eff6082363..13f93fa7fa80 100644 --- a/drivers/rtc/interface.c +++ b/drivers/rtc/interface.c @@ -292,7 +292,9 @@ int __rtc_read_alarm(struct rtc_device *rtc, struct rtc_wkalrm *alarm) dev_dbg(&rtc->dev, "alarm rollover: %s\n", "year"); do { alarm->time.tm_year++; - } while (rtc_valid_tm(&alarm->time) != 0); + } while (alarm->time.tm_mon == 1 + && is_leap_year(alarm->time.tm_year + 1900) + && rtc_valid_tm(&alarm->time) != 0); break; default: @@ -300,7 +302,16 @@ int __rtc_read_alarm(struct rtc_device *rtc, struct rtc_wkalrm *alarm) } done: - return 0; + err = rtc_valid_tm(&alarm->time); + + if (err) { + dev_warn(&rtc->dev, "invalid alarm value: %d-%d-%d %d:%d:%d\n", + alarm->time.tm_year + 1900, alarm->time.tm_mon + 1, + alarm->time.tm_mday, alarm->time.tm_hour, alarm->time.tm_min, + alarm->time.tm_sec); + } + + return err; } int rtc_read_alarm(struct rtc_device *rtc, struct rtc_wkalrm *alarm) diff --git a/drivers/rtc/rtc-xgene.c b/drivers/rtc/rtc-xgene.c new file mode 100644 index 000000000000..14129cc85bdb --- /dev/null +++ b/drivers/rtc/rtc-xgene.c @@ -0,0 +1,278 @@ +/* + * APM X-Gene SoC Real Time Clock Driver + * + * Copyright (c) 2014, Applied Micro Circuits Corporation + * Author: Rameshwar Prasad Sahu <rsahu@apm.com> + * Loc Ho <lho@apm.com> + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License as published by the + * Free Software Foundation; either version 2 of the License, or (at your + * option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see <http://www.gnu.org/licenses/>. + * + */ + +#include <linux/init.h> +#include <linux/module.h> +#include <linux/of.h> +#include <linux/platform_device.h> +#include <linux/io.h> +#include <linux/slab.h> +#include <linux/clk.h> +#include <linux/delay.h> +#include <linux/rtc.h> + +/* RTC CSR Registers */ +#define RTC_CCVR 0x00 +#define RTC_CMR 0x04 +#define RTC_CLR 0x08 +#define RTC_CCR 0x0C +#define RTC_CCR_IE BIT(0) +#define RTC_CCR_MASK BIT(1) +#define RTC_CCR_EN BIT(2) +#define RTC_CCR_WEN BIT(3) +#define RTC_STAT 0x10 +#define RTC_STAT_BIT BIT(0) +#define RTC_RSTAT 0x14 +#define RTC_EOI 0x18 +#define RTC_VER 0x1C + +struct xgene_rtc_dev { + struct rtc_device *rtc; + struct device *dev; + unsigned long alarm_time; + void __iomem *csr_base; + struct clk *clk; + unsigned int irq_wake; +}; + +static int xgene_rtc_read_time(struct device *dev, struct rtc_time *tm) +{ + struct xgene_rtc_dev *pdata = dev_get_drvdata(dev); + + rtc_time_to_tm(readl(pdata->csr_base + RTC_CCVR), tm); + return rtc_valid_tm(tm); +} + +static int xgene_rtc_set_mmss(struct device *dev, unsigned long secs) +{ + struct xgene_rtc_dev *pdata = dev_get_drvdata(dev); + + /* + * NOTE: After the following write, the RTC_CCVR is only reflected + * after the update cycle of 1 seconds. + */ + writel((u32) secs, pdata->csr_base + RTC_CLR); + readl(pdata->csr_base + RTC_CLR); /* Force a barrier */ + + return 0; +} + +static int xgene_rtc_read_alarm(struct device *dev, struct rtc_wkalrm *alrm) +{ + struct xgene_rtc_dev *pdata = dev_get_drvdata(dev); + + rtc_time_to_tm(pdata->alarm_time, &alrm->time); + alrm->enabled = readl(pdata->csr_base + RTC_CCR) & RTC_CCR_IE; + + return 0; +} + +static int xgene_rtc_alarm_irq_enable(struct device *dev, u32 enabled) +{ + struct xgene_rtc_dev *pdata = dev_get_drvdata(dev); + u32 ccr; + + ccr = readl(pdata->csr_base + RTC_CCR); + if (enabled) { + ccr &= ~RTC_CCR_MASK; + ccr |= RTC_CCR_IE; + } else { + ccr &= ~RTC_CCR_IE; + ccr |= RTC_CCR_MASK; + } + writel(ccr, pdata->csr_base + RTC_CCR); + + return 0; +} + +static int xgene_rtc_set_alarm(struct device *dev, struct rtc_wkalrm *alrm) +{ + struct xgene_rtc_dev *pdata = dev_get_drvdata(dev); + unsigned long rtc_time; + unsigned long alarm_time; + + rtc_time = readl(pdata->csr_base + RTC_CCVR); + rtc_tm_to_time(&alrm->time, &alarm_time); + + pdata->alarm_time = alarm_time; + writel((u32) pdata->alarm_time, pdata->csr_base + RTC_CMR); + + xgene_rtc_alarm_irq_enable(dev, alrm->enabled); + + return 0; +} + +static const struct rtc_class_ops xgene_rtc_ops = { + .read_time = xgene_rtc_read_time, + .set_mmss = xgene_rtc_set_mmss, + .read_alarm = xgene_rtc_read_alarm, + .set_alarm = xgene_rtc_set_alarm, + .alarm_irq_enable = xgene_rtc_alarm_irq_enable, +}; + +static irqreturn_t xgene_rtc_interrupt(int irq, void *id) +{ + struct xgene_rtc_dev *pdata = (struct xgene_rtc_dev *) id; + + /* Check if interrupt asserted */ + if (!(readl(pdata->csr_base + RTC_STAT) & RTC_STAT_BIT)) + return IRQ_NONE; + + /* Clear interrupt */ + readl(pdata->csr_base + RTC_EOI); + + rtc_update_irq(pdata->rtc, 1, RTC_IRQF | RTC_AF); + + return IRQ_HANDLED; +} + +static int xgene_rtc_probe(struct platform_device *pdev) +{ + struct xgene_rtc_dev *pdata; + struct resource *res; + int ret; + int irq; + + pdata = devm_kzalloc(&pdev->dev, sizeof(*pdata), GFP_KERNEL); + if (!pdata) + return -ENOMEM; + platform_set_drvdata(pdev, pdata); + pdata->dev = &pdev->dev; + + res = platform_get_resource(pdev, IORESOURCE_MEM, 0); + pdata->csr_base = devm_ioremap_resource(&pdev->dev, res); + if (IS_ERR(pdata->csr_base)) + return PTR_ERR(pdata->csr_base); + + irq = platform_get_irq(pdev, 0); + if (irq < 0) { + dev_err(&pdev->dev, "No IRQ resource\n"); + return irq; + } + ret = devm_request_irq(&pdev->dev, irq, xgene_rtc_interrupt, 0, + dev_name(&pdev->dev), pdata); + if (ret) { + dev_err(&pdev->dev, "Could not request IRQ\n"); + return ret; + } + + pdata->clk = devm_clk_get(&pdev->dev, NULL); + if (IS_ERR(pdata->clk)) { + dev_err(&pdev->dev, "Couldn't get the clock for RTC\n"); + return -ENODEV; + } + clk_prepare_enable(pdata->clk); + + /* Turn on the clock and the crystal */ + writel(RTC_CCR_EN, pdata->csr_base + RTC_CCR); + + device_init_wakeup(&pdev->dev, 1); + + pdata->rtc = devm_rtc_device_register(&pdev->dev, pdev->name, + &xgene_rtc_ops, THIS_MODULE); + if (IS_ERR(pdata->rtc)) { + clk_disable_unprepare(pdata->clk); + return PTR_ERR(pdata->rtc); + } + + /* HW does not support update faster than 1 seconds */ + pdata->rtc->uie_unsupported = 1; + + return 0; +} + +static int xgene_rtc_remove(struct platform_device *pdev) +{ + struct xgene_rtc_dev *pdata = platform_get_drvdata(pdev); + + xgene_rtc_alarm_irq_enable(&pdev->dev, 0); + device_init_wakeup(&pdev->dev, 0); + clk_disable_unprepare(pdata->clk); + return 0; +} + +#ifdef CONFIG_PM_SLEEP +static int xgene_rtc_suspend(struct device *dev) +{ + struct platform_device *pdev = to_platform_device(dev); + struct xgene_rtc_dev *pdata = platform_get_drvdata(pdev); + int irq; + + irq = platform_get_irq(pdev, 0); + if (device_may_wakeup(&pdev->dev)) { + if (!enable_irq_wake(irq)) + pdata->irq_wake = 1; + } else { + xgene_rtc_alarm_irq_enable(dev, 0); + clk_disable(pdata->clk); + } + + return 0; +} + +static int xgene_rtc_resume(struct device *dev) +{ + struct platform_device *pdev = to_platform_device(dev); + struct xgene_rtc_dev *pdata = platform_get_drvdata(pdev); + int irq; + + irq = platform_get_irq(pdev, 0); + if (device_may_wakeup(&pdev->dev)) { + if (pdata->irq_wake) { + disable_irq_wake(irq); + pdata->irq_wake = 0; + } + } else { + clk_enable(pdata->clk); + xgene_rtc_alarm_irq_enable(dev, 1); + } + + return 0; +} +#endif + +static SIMPLE_DEV_PM_OPS(xgene_rtc_pm_ops, xgene_rtc_suspend, xgene_rtc_resume); + +#ifdef CONFIG_OF +static const struct of_device_id xgene_rtc_of_match[] = { + {.compatible = "apm,xgene-rtc" }, + { } +}; +MODULE_DEVICE_TABLE(of, xgene_rtc_of_match); +#endif + +static struct platform_driver xgene_rtc_driver = { + .probe = xgene_rtc_probe, + .remove = xgene_rtc_remove, + .driver = { + .owner = THIS_MODULE, + .name = "xgene-rtc", + .pm = &xgene_rtc_pm_ops, + .of_match_table = of_match_ptr(xgene_rtc_of_match), + }, +}; + +module_platform_driver(xgene_rtc_driver); + +MODULE_DESCRIPTION("APM X-Gene SoC RTC driver"); +MODULE_AUTHOR("Rameshwar Sahu <rsahu@apm.com>"); +MODULE_LICENSE("GPL"); diff --git a/drivers/tty/Kconfig b/drivers/tty/Kconfig index b24aa010f68c..65cd80bf9aec 100644 --- a/drivers/tty/Kconfig +++ b/drivers/tty/Kconfig @@ -13,6 +13,10 @@ config VT bool "Virtual terminal" if EXPERT depends on !S390 && !UML select INPUT + select NEW_LEDS + select LEDS_CLASS + select LEDS_TRIGGERS + select INPUT_LEDS default y ---help--- If you say Y here, you will get support for terminal devices with diff --git a/drivers/tty/vt/keyboard.c b/drivers/tty/vt/keyboard.c index d0e3a4497707..d6ecfc9e734f 100644 --- a/drivers/tty/vt/keyboard.c +++ b/drivers/tty/vt/keyboard.c @@ -33,6 +33,7 @@ #include <linux/string.h> #include <linux/init.h> #include <linux/slab.h> +#include <linux/leds.h> #include <linux/kbd_kern.h> #include <linux/kbd_diacr.h> @@ -130,6 +131,7 @@ static char rep; /* flag telling character repeat */ static int shift_state = 0; static unsigned char ledstate = 0xff; /* undefined */ +static unsigned char lockstate = 0xff; /* undefined */ static unsigned char ledioctl; /* @@ -961,6 +963,41 @@ static void k_brl(struct vc_data *vc, unsigned char value, char up_flag) } } +/* We route VT keyboard "leds" through triggers */ +static void kbd_ledstate_trigger_activate(struct led_classdev *cdev); + +static struct led_trigger ledtrig_ledstate[] = { +#define DEFINE_LEDSTATE_TRIGGER(kbd_led, nam) \ + [kbd_led] = { \ + .name = nam, \ + .activate = kbd_ledstate_trigger_activate, \ + } + DEFINE_LEDSTATE_TRIGGER(VC_SCROLLOCK, "kbd-scrollock"), + DEFINE_LEDSTATE_TRIGGER(VC_NUMLOCK, "kbd-numlock"), + DEFINE_LEDSTATE_TRIGGER(VC_CAPSLOCK, "kbd-capslock"), + DEFINE_LEDSTATE_TRIGGER(VC_KANALOCK, "kbd-kanalock"), +#undef DEFINE_LEDSTATE_TRIGGER +}; + +static void kbd_lockstate_trigger_activate(struct led_classdev *cdev); + +static struct led_trigger ledtrig_lockstate[] = { +#define DEFINE_LOCKSTATE_TRIGGER(kbd_led, nam) \ + [kbd_led] = { \ + .name = nam, \ + .activate = kbd_lockstate_trigger_activate, \ + } + DEFINE_LOCKSTATE_TRIGGER(VC_SHIFTLOCK, "kbd-shiftlock"), + DEFINE_LOCKSTATE_TRIGGER(VC_ALTGRLOCK, "kbd-altgrlock"), + DEFINE_LOCKSTATE_TRIGGER(VC_CTRLLOCK, "kbd-ctrllock"), + DEFINE_LOCKSTATE_TRIGGER(VC_ALTLOCK, "kbd-altlock"), + DEFINE_LOCKSTATE_TRIGGER(VC_SHIFTLLOCK, "kbd-shiftllock"), + DEFINE_LOCKSTATE_TRIGGER(VC_SHIFTRLOCK, "kbd-shiftrlock"), + DEFINE_LOCKSTATE_TRIGGER(VC_CTRLLLOCK, "kbd-ctrlllock"), + DEFINE_LOCKSTATE_TRIGGER(VC_CTRLRLOCK, "kbd-ctrlrlock"), +#undef DEFINE_LOCKSTATE_TRIGGER +}; + /* * The leds display either (i) the status of NumLock, CapsLock, ScrollLock, * or (ii) whatever pattern of lights people want to show using KDSETLED, @@ -995,18 +1032,25 @@ static inline unsigned char getleds(void) return kbd->ledflagstate; } -static int kbd_update_leds_helper(struct input_handle *handle, void *data) +/* Called on trigger connection, to set initial state */ +static void kbd_ledstate_trigger_activate(struct led_classdev *cdev) { - unsigned char leds = *(unsigned char *)data; + struct led_trigger *trigger = cdev->trigger; + int led = trigger - ledtrig_ledstate; - if (test_bit(EV_LED, handle->dev->evbit)) { - input_inject_event(handle, EV_LED, LED_SCROLLL, !!(leds & 0x01)); - input_inject_event(handle, EV_LED, LED_NUML, !!(leds & 0x02)); - input_inject_event(handle, EV_LED, LED_CAPSL, !!(leds & 0x04)); - input_inject_event(handle, EV_SYN, SYN_REPORT, 0); - } + tasklet_disable(&keyboard_tasklet); + led_trigger_event(trigger, ledstate & (1 << led) ? LED_FULL : LED_OFF); + tasklet_enable(&keyboard_tasklet); +} - return 0; +static void kbd_lockstate_trigger_activate(struct led_classdev *cdev) +{ + struct led_trigger *trigger = cdev->trigger; + int led = trigger - ledtrig_lockstate; + + tasklet_disable(&keyboard_tasklet); + led_trigger_event(trigger, lockstate & (1 << led) ? LED_FULL : LED_OFF); + tasklet_enable(&keyboard_tasklet); } /** @@ -1095,16 +1139,29 @@ static void kbd_bh(unsigned long dummy) { unsigned char leds; unsigned long flags; - + int i; + spin_lock_irqsave(&led_lock, flags); leds = getleds(); spin_unlock_irqrestore(&led_lock, flags); if (leds != ledstate) { - input_handler_for_each_handle(&kbd_handler, &leds, - kbd_update_leds_helper); + for (i = 0; i < ARRAY_SIZE(ledtrig_ledstate); i++) + if ((leds ^ ledstate) & (1 << i)) + led_trigger_event(&ledtrig_ledstate[i], + leds & (1 << i) + ? LED_FULL : LED_OFF); ledstate = leds; } + + if (kbd->lockstate != lockstate) { + for (i = 0; i < ARRAY_SIZE(ledtrig_lockstate); i++) + if ((kbd->lockstate ^ lockstate) & (1 << i)) + led_trigger_event(&ledtrig_lockstate[i], + kbd->lockstate & (1 << i) + ? LED_FULL : LED_OFF); + lockstate = kbd->lockstate; + } } DECLARE_TASKLET_DISABLED(keyboard_tasklet, kbd_bh, 0); @@ -1442,20 +1499,6 @@ static void kbd_disconnect(struct input_handle *handle) kfree(handle); } -/* - * Start keyboard handler on the new keyboard by refreshing LED state to - * match the rest of the system. - */ -static void kbd_start(struct input_handle *handle) -{ - tasklet_disable(&keyboard_tasklet); - - if (ledstate != 0xff) - kbd_update_leds_helper(handle, &ledstate); - - tasklet_enable(&keyboard_tasklet); -} - static const struct input_device_id kbd_ids[] = { { .flags = INPUT_DEVICE_ID_MATCH_EVBIT, @@ -1477,7 +1520,6 @@ static struct input_handler kbd_handler = { .match = kbd_match, .connect = kbd_connect, .disconnect = kbd_disconnect, - .start = kbd_start, .name = "kbd", .id_table = kbd_ids, }; @@ -1501,6 +1543,20 @@ int __init kbd_init(void) if (error) return error; + for (i = 0; i < ARRAY_SIZE(ledtrig_ledstate); i++) { + error = led_trigger_register(&ledtrig_ledstate[i]); + if (error) + pr_err("error %d while registering trigger %s\n", + error, ledtrig_ledstate[i].name); + } + + for (i = 0; i < ARRAY_SIZE(ledtrig_lockstate); i++) { + error = led_trigger_register(&ledtrig_lockstate[i]); + if (error) + pr_err("error %d while registering trigger %s\n", + error, ledtrig_lockstate[i].name); + } + tasklet_enable(&keyboard_tasklet); tasklet_schedule(&keyboard_tasklet); diff --git a/drivers/video/backlight/backlight.c b/drivers/video/backlight/backlight.c index bd2172c2d650..31672740fe28 100644 --- a/drivers/video/backlight/backlight.c +++ b/drivers/video/backlight/backlight.c @@ -189,8 +189,6 @@ static ssize_t brightness_store(struct device *dev, } mutex_unlock(&bd->ops_lock); - backlight_generate_event(bd, BACKLIGHT_UPDATE_SYSFS); - return rc; } static DEVICE_ATTR_RW(brightness); diff --git a/fs/befs/linuxvfs.c b/fs/befs/linuxvfs.c index d626756ff721..1e27cd33f7f2 100644 --- a/fs/befs/linuxvfs.c +++ b/fs/befs/linuxvfs.c @@ -396,9 +396,8 @@ static struct inode *befs_iget(struct super_block *sb, unsigned long ino) if (S_ISLNK(inode->i_mode) && !(befs_ino->i_flags & BEFS_LONG_SYMLINK)){ inode->i_size = 0; inode->i_blocks = befs_sb->block_size / VFS_BLOCK_SIZE; - strncpy(befs_ino->i_data.symlink, raw_inode->data.symlink, - BEFS_SYMLINK_LEN - 1); - befs_ino->i_data.symlink[BEFS_SYMLINK_LEN - 1] = '\0'; + strlcpy(befs_ino->i_data.symlink, raw_inode->data.symlink, + BEFS_SYMLINK_LEN); } else { int num_blks; diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c index aa3cb626671e..c2e5d4647345 100644 --- a/fs/binfmt_elf.c +++ b/fs/binfmt_elf.c @@ -145,6 +145,25 @@ static int padzero(unsigned long elf_bss) #define ELF_BASE_PLATFORM NULL #endif +/* + * Use get_random_int() to implement AT_RANDOM while avoiding depletion + * of the entropy pool. + */ +static void get_atrandom_bytes(unsigned char *buf, size_t nbytes) +{ + unsigned char *p = buf; + + while (nbytes) { + unsigned int random_variable; + size_t chunk = min(nbytes, sizeof(random_variable)); + + random_variable = get_random_int(); + memcpy(p, &random_variable, chunk); + p += chunk; + nbytes -= chunk; + } +} + static int create_elf_tables(struct linux_binprm *bprm, struct elfhdr *exec, unsigned long load_addr, unsigned long interp_load_addr) @@ -206,7 +225,7 @@ create_elf_tables(struct linux_binprm *bprm, struct elfhdr *exec, /* * Generate 16 random bytes for userspace PRNG seeding. */ - get_random_bytes(k_rand_bytes, sizeof(k_rand_bytes)); + get_atrandom_bytes(k_rand_bytes, sizeof(k_rand_bytes)); u_rand_bytes = (elf_addr_t __user *) STACK_ALLOC(p, sizeof(k_rand_bytes)); if (__copy_to_user(u_rand_bytes, k_rand_bytes, sizeof(k_rand_bytes))) diff --git a/fs/fat/cache.c b/fs/fat/cache.c index 91ad9e1c9441..e26bc9a22ac9 100644 --- a/fs/fat/cache.c +++ b/fs/fat/cache.c @@ -303,6 +303,31 @@ static int fat_bmap_cluster(struct inode *inode, int cluster) return dclus; } +static int fat_get_mapped_cluster(struct inode *inode, sector_t sector, + sector_t last_block, + unsigned long *mapped_blocks, sector_t *bmap) +{ + struct super_block *sb = inode->i_sb; + struct msdos_sb_info *sbi = MSDOS_SB(sb); + int cluster, offset; + + cluster = sector >> (sbi->cluster_bits - sb->s_blocksize_bits); + offset = sector & (sbi->sec_per_clus - 1); + cluster = fat_bmap_cluster(inode, cluster); + + if (cluster < 0) + return cluster; + + else if (cluster) { + *bmap = fat_clus_to_blknr(sbi, cluster) + offset; + *mapped_blocks = sbi->sec_per_clus - offset; + if (*mapped_blocks > last_block - sector) + *mapped_blocks = last_block - sector; + } + + return 0; +} + int fat_bmap(struct inode *inode, sector_t sector, sector_t *phys, unsigned long *mapped_blocks, int create) { @@ -311,7 +336,6 @@ int fat_bmap(struct inode *inode, sector_t sector, sector_t *phys, const unsigned long blocksize = sb->s_blocksize; const unsigned char blocksize_bits = sb->s_blocksize_bits; sector_t last_block; - int cluster, offset; *phys = 0; *mapped_blocks = 0; @@ -329,25 +353,39 @@ int fat_bmap(struct inode *inode, sector_t sector, sector_t *phys, return 0; /* - * ->mmu_private can access on only allocation path. - * (caller must hold ->i_mutex) + * Both ->mmu_private and ->i_disksize can access + * on only allocation path. (caller must hold ->i_mutex) */ - last_block = (MSDOS_I(inode)->mmu_private + (blocksize - 1)) + last_block = (MSDOS_I(inode)->i_disksize + (blocksize - 1)) >> blocksize_bits; if (sector >= last_block) return 0; } - cluster = sector >> (sbi->cluster_bits - sb->s_blocksize_bits); - offset = sector & (sbi->sec_per_clus - 1); - cluster = fat_bmap_cluster(inode, cluster); - if (cluster < 0) - return cluster; - else if (cluster) { - *phys = fat_clus_to_blknr(sbi, cluster) + offset; - *mapped_blocks = sbi->sec_per_clus - offset; - if (*mapped_blocks > last_block - sector) - *mapped_blocks = last_block - sector; - } - return 0; + return fat_get_mapped_cluster(inode, sector, last_block, mapped_blocks, + phys); +} + +int fat_bmap2(struct inode *inode, sector_t sector, + unsigned long *mapped_blocks, struct buffer_head *bh_result, + int create, sector_t *bmap) +{ + struct super_block *sb = inode->i_sb; + sector_t last_block; + const unsigned long blocksize = sb->s_blocksize; + const unsigned char blocksize_bits = sb->s_blocksize_bits; + + BUG_ON(create != 0); + + *bmap = 0; + *mapped_blocks = 0; + + last_block = (MSDOS_I(inode)->i_disksize + (blocksize - 1)) + >> blocksize_bits; + + if (sector >= last_block) + return 0; + + return fat_get_mapped_cluster(inode, sector, last_block, mapped_blocks, + bmap); } diff --git a/fs/fat/fat.h b/fs/fat/fat.h index 7c31f4bc74a9..7270bdbca9c3 100644 --- a/fs/fat/fat.h +++ b/fs/fat/fat.h @@ -118,7 +118,8 @@ struct msdos_inode_info { unsigned int cache_valid_id; /* NOTE: mmu_private is 64bits, so must hold ->i_mutex to access */ - loff_t mmu_private; /* physically allocated size */ + loff_t mmu_private; /* physically allocated size (initialized) */ + loff_t i_disksize; /* physically allocated size (uninitialized) */ int i_start; /* first cluster or 0 */ int i_logstart; /* logical first cluster */ @@ -289,6 +290,9 @@ extern int fat_get_cluster(struct inode *inode, int cluster, int *fclus, int *dclus); extern int fat_bmap(struct inode *inode, sector_t sector, sector_t *phys, unsigned long *mapped_blocks, int create); +extern int fat_bmap2(struct inode *inode, sector_t sector, + unsigned long *mapped_blocks, + struct buffer_head *bh_result, int create, sector_t *bmap); /* fat/dir.c */ extern const struct file_operations fat_dir_operations; diff --git a/fs/fat/file.c b/fs/fat/file.c index 9b104f543056..e33c8a2cb99c 100644 --- a/fs/fat/file.c +++ b/fs/fat/file.c @@ -17,8 +17,12 @@ #include <linux/blkdev.h> #include <linux/fsnotify.h> #include <linux/security.h> +#include <linux/falloc.h> #include "fat.h" +static long fat_fallocate(struct file *file, int mode, + loff_t offset, loff_t len); + static int fat_ioctl_get_attributes(struct inode *inode, u32 __user *user_attr) { u32 attr; @@ -182,6 +186,7 @@ const struct file_operations fat_file_operations = { #endif .fsync = fat_file_fsync, .splice_read = generic_file_splice_read, + .fallocate = fat_fallocate, }; static int fat_cont_expand(struct inode *inode, loff_t size) @@ -220,6 +225,75 @@ out: return err; } +/* + * Preallocate space for a file. This implements fat's fallocate file + * operation, which gets called from sys_fallocate system call. User + * space requests len bytes at offset. If FALLOC_FL_KEEP_SIZE is set + * we just allocate clusters without zeroing them out. Otherwise we + * allocate and zero out clusters via an expanding truncate. + */ +static long fat_fallocate(struct file *file, int mode, + loff_t offset, loff_t len) +{ + int cluster; + int nr_cluster; /* Number of clusters to be allocated */ + loff_t mm_bytes; /* Number of bytes to be allocated for file */ + struct inode *inode = file->f_mapping->host; + struct super_block *sb = inode->i_sb; + struct msdos_sb_info *sbi = MSDOS_SB(sb); + int err = 0; + + /* No support for hole punch or other fallocate flags. */ + if (mode & ~FALLOC_FL_KEEP_SIZE) + return -EOPNOTSUPP; + + /* No support for dir */ + if (!S_ISREG(inode->i_mode)) + return -EOPNOTSUPP; + + mutex_lock(&inode->i_mutex); + if ((offset + len) <= MSDOS_I(inode)->i_disksize) + goto error; + + err = inode_newsize_ok(inode, (len + offset)); + if (err) + goto error; + + if (mode & FALLOC_FL_KEEP_SIZE) { + /* First compute the number of clusters to be allocated */ + mm_bytes = offset + len - round_up(MSDOS_I(inode)->i_disksize, + sbi->cluster_size); + nr_cluster = (mm_bytes + (sbi->cluster_size - 1)) >> + sbi->cluster_bits; + + /* Start the allocation.We are not zeroing out the clusters */ + while (nr_cluster-- > 0) { + err = fat_alloc_clusters(inode, &cluster, 1); + if (err) { + fat_msg(sb, KERN_ERR, + "fat_fallocate(): fat_alloc_clusters() error"); + goto error; + } + err = fat_chain_add(inode, cluster, 1); + if (err) { + fat_free_clusters(inode, cluster); + goto error; + } + MSDOS_I(inode)->i_disksize += sbi->cluster_size; + } + } else { + /* This is just an expanding truncate */ + err = fat_cont_expand(inode, (offset + len)); + if (err) + fat_msg(sb, KERN_ERR, + "fat_fallocate(): fat_cont_expand() error"); + } + +error: + mutex_unlock(&inode->i_mutex); + return err; +} + /* Free all clusters after the skip'th cluster. */ static int fat_free(struct inode *inode, int skip) { @@ -300,8 +374,10 @@ void fat_truncate_blocks(struct inode *inode, loff_t offset) * This protects against truncating a file bigger than it was then * trying to write into the hole. */ - if (MSDOS_I(inode)->mmu_private > offset) + if (MSDOS_I(inode)->i_disksize > offset) { MSDOS_I(inode)->mmu_private = offset; + MSDOS_I(inode)->i_disksize = offset; + } nr_clusters = (offset + (cluster_size - 1)) >> sbi->cluster_bits; diff --git a/fs/fat/inode.c b/fs/fat/inode.c index b3361fe2bcb5..992e8cb1132c 100644 --- a/fs/fat/inode.c +++ b/fs/fat/inode.c @@ -54,6 +54,25 @@ static int fat_add_cluster(struct inode *inode) return err; } +static void check_fallocated_region(struct inode *inode, sector_t iblock, + unsigned long *max_blocks, struct buffer_head *bh_result) +{ + struct super_block *sb = inode->i_sb; + sector_t last_block, disk_block; + const unsigned long blocksize = sb->s_blocksize; + const unsigned char blocksize_bits = sb->s_blocksize_bits; + + last_block = (MSDOS_I(inode)->mmu_private + (blocksize - 1)) + >> blocksize_bits; + disk_block = (MSDOS_I(inode)->i_disksize + (blocksize - 1)) + >> blocksize_bits; + if (iblock >= last_block && iblock <= disk_block) { + MSDOS_I(inode)->mmu_private += *max_blocks << blocksize_bits; + set_buffer_new(bh_result); + } + +} + static inline int __fat_get_block(struct inode *inode, sector_t iblock, unsigned long *max_blocks, struct buffer_head *bh_result, int create) @@ -68,8 +87,11 @@ static inline int __fat_get_block(struct inode *inode, sector_t iblock, if (err) return err; if (phys) { - map_bh(bh_result, sb, phys); *max_blocks = min(mapped_blocks, *max_blocks); + if (create) + check_fallocated_region(inode, iblock, max_blocks, + bh_result); + map_bh(bh_result, sb, phys); return 0; } if (!create) @@ -93,6 +115,7 @@ static inline int __fat_get_block(struct inode *inode, sector_t iblock, *max_blocks = min(mapped_blocks, *max_blocks); MSDOS_I(inode)->mmu_private += *max_blocks << sb->s_blocksize_bits; + MSDOS_I(inode)->i_disksize = MSDOS_I(inode)->mmu_private; err = fat_bmap(inode, iblock, &phys, &mapped_blocks, create); if (err) @@ -206,6 +229,13 @@ static ssize_t fat_direct_IO(int rw, struct kiocb *iocb, loff_t size = offset + iov_length(iov, nr_segs); if (MSDOS_I(inode)->mmu_private < size) return 0; + + /* + * In case of writing in fallocated region, return 0 and + * fallback to buffered write. + */ + if (MSDOS_I(inode)->i_disksize > MSDOS_I(inode)->mmu_private) + return 0; } /* @@ -220,13 +250,36 @@ static ssize_t fat_direct_IO(int rw, struct kiocb *iocb, return ret; } +static int fat_get_block_bmap(struct inode *inode, sector_t iblock, + struct buffer_head *bh_result, int create) +{ + struct super_block *sb = inode->i_sb; + unsigned long max_blocks = bh_result->b_size >> inode->i_blkbits; + int err; + sector_t bmap; + unsigned long mapped_blocks; + + err = fat_bmap2(inode, iblock, &mapped_blocks, bh_result, create, + &bmap); + if (err) + return err; + + if (bmap) { + map_bh(bh_result, sb, bmap); + max_blocks = min(mapped_blocks, max_blocks); + } + + bh_result->b_size = max_blocks << sb->s_blocksize_bits; + return 0; +} + static sector_t _fat_bmap(struct address_space *mapping, sector_t block) { sector_t blocknr; /* fat_get_cluster() assumes the requested blocknr isn't truncated. */ down_read(&MSDOS_I(mapping->host)->truncate_lock); - blocknr = generic_block_bmap(mapping, block, fat_get_block); + blocknr = generic_block_bmap(mapping, block, fat_get_block_bmap); up_read(&MSDOS_I(mapping->host)->truncate_lock); return blocknr; @@ -407,7 +460,6 @@ int fat_fill_inode(struct inode *inode, struct msdos_dir_entry *de) error = fat_calc_dir_size(inode); if (error < 0) return error; - MSDOS_I(inode)->mmu_private = inode->i_size; set_nlink(inode, fat_subdirs(inode)); } else { /* not a directory */ @@ -422,8 +474,12 @@ int fat_fill_inode(struct inode *inode, struct msdos_dir_entry *de) inode->i_op = &fat_file_inode_operations; inode->i_fop = &fat_file_operations; inode->i_mapping->a_ops = &fat_aops; - MSDOS_I(inode)->mmu_private = inode->i_size; } + + MSDOS_I(inode)->mmu_private = inode->i_size; + MSDOS_I(inode)->i_disksize = round_up(inode->i_size, + inode->i_sb->s_blocksize); + if (de->attr & ATTR_SYS) { if (sbi->options.sys_immutable) inode->i_flags |= S_IMMUTABLE; @@ -488,12 +544,34 @@ out: EXPORT_SYMBOL_GPL(fat_build_inode); +static int __fat_write_inode(struct inode *inode, int wait); static void fat_evict_inode(struct inode *inode) { truncate_inode_pages_final(&inode->i_data); if (!inode->i_nlink) { inode->i_size = 0; fat_truncate_blocks(inode, 0); + } else { + /* Release unwritten fallocated blocks on inode eviction. */ + if (MSDOS_I(inode)->i_disksize > + round_up(MSDOS_I(inode)->mmu_private, + inode->i_sb->s_blocksize)) { + int err; + fat_truncate_blocks(inode, MSDOS_I(inode)->mmu_private); + /* Fallocate results in updating the i_start/iogstart + * for the zero byte file. So, make it return to + * original state during evict and commit it to avoid + * any corruption on the next access to the cluster + * chain for the file. + */ + err = __fat_write_inode(inode, inode_needs_sync(inode)); + if (err) { + fat_msg(inode->i_sb, KERN_WARNING, "Failed to " + "update on disk inode for unused fallocated " + "blocks, inode could be corrupted. Please run " + "fsck"); + } + } } invalidate_inode_buffers(inode); clear_inode(inode); @@ -1225,6 +1303,7 @@ static int fat_read_root(struct inode *inode) & ~((loff_t)sbi->cluster_size - 1)) >> 9; MSDOS_I(inode)->i_logstart = 0; MSDOS_I(inode)->mmu_private = inode->i_size; + MSDOS_I(inode)->i_disksize = inode->i_size; fat_save_attrs(inode, ATTR_DIR); inode->i_mtime.tv_sec = inode->i_atime.tv_sec = inode->i_ctime.tv_sec = 0; diff --git a/fs/hfsplus/catalog.c b/fs/hfsplus/catalog.c index 32602c667b4a..7892e6fddb66 100644 --- a/fs/hfsplus/catalog.c +++ b/fs/hfsplus/catalog.c @@ -38,21 +38,30 @@ int hfsplus_cat_bin_cmp_key(const hfsplus_btree_key *k1, return hfsplus_strcmp(&k1->cat.name, &k2->cat.name); } -void hfsplus_cat_build_key(struct super_block *sb, hfsplus_btree_key *key, - u32 parent, struct qstr *str) +/* Generates key for catalog file/folders record. */ +int hfsplus_cat_build_key(struct super_block *sb, + hfsplus_btree_key *key, u32 parent, struct qstr *str) { - int len; + int len, err; key->cat.parent = cpu_to_be32(parent); - if (str) { - hfsplus_asc2uni(sb, &key->cat.name, HFSPLUS_MAX_STRLEN, - str->name, str->len); - len = be16_to_cpu(key->cat.name.length); - } else { - key->cat.name.length = 0; - len = 0; - } + err = hfsplus_asc2uni(sb, &key->cat.name, HFSPLUS_MAX_STRLEN, + str->name, str->len); + if (unlikely(err < 0)) + return err; + + len = be16_to_cpu(key->cat.name.length); key->key_len = cpu_to_be16(6 + 2 * len); + return 0; +} + +/* Generates key for catalog thread record. */ +void hfsplus_cat_build_key_with_cnid(struct super_block *sb, + hfsplus_btree_key *key, u32 parent) +{ + key->cat.parent = cpu_to_be32(parent); + key->cat.name.length = 0; + key->key_len = cpu_to_be16(6); } static void hfsplus_cat_build_key_uni(hfsplus_btree_key *key, u32 parent, @@ -167,11 +176,16 @@ static int hfsplus_fill_cat_thread(struct super_block *sb, hfsplus_cat_entry *entry, int type, u32 parentid, struct qstr *str) { + int err; + entry->type = cpu_to_be16(type); entry->thread.reserved = 0; entry->thread.parentID = cpu_to_be32(parentid); - hfsplus_asc2uni(sb, &entry->thread.nodeName, HFSPLUS_MAX_STRLEN, + err = hfsplus_asc2uni(sb, &entry->thread.nodeName, HFSPLUS_MAX_STRLEN, str->name, str->len); + if (unlikely(err < 0)) + return err; + return 10 + be16_to_cpu(entry->thread.nodeName.length) * 2; } @@ -183,7 +197,7 @@ int hfsplus_find_cat(struct super_block *sb, u32 cnid, int err; u16 type; - hfsplus_cat_build_key(sb, fd->search_key, cnid, NULL); + hfsplus_cat_build_key_with_cnid(sb, fd->search_key, cnid); err = hfs_brec_read(fd, &tmp, sizeof(hfsplus_cat_entry)); if (err) return err; @@ -250,11 +264,16 @@ int hfsplus_create_cat(u32 cnid, struct inode *dir, if (err) return err; - hfsplus_cat_build_key(sb, fd.search_key, cnid, NULL); + hfsplus_cat_build_key_with_cnid(sb, fd.search_key, cnid); entry_size = hfsplus_fill_cat_thread(sb, &entry, S_ISDIR(inode->i_mode) ? HFSPLUS_FOLDER_THREAD : HFSPLUS_FILE_THREAD, dir->i_ino, str); + if (unlikely(entry_size < 0)) { + err = entry_size; + goto err2; + } + err = hfs_brec_find(&fd, hfs_find_rec_by_key); if (err != -ENOENT) { if (!err) @@ -265,7 +284,10 @@ int hfsplus_create_cat(u32 cnid, struct inode *dir, if (err) goto err2; - hfsplus_cat_build_key(sb, fd.search_key, dir->i_ino, str); + err = hfsplus_cat_build_key(sb, fd.search_key, dir->i_ino, str); + if (unlikely(err)) + goto err1; + entry_size = hfsplus_cat_build_record(&entry, cnid, inode); err = hfs_brec_find(&fd, hfs_find_rec_by_key); if (err != -ENOENT) { @@ -288,7 +310,7 @@ int hfsplus_create_cat(u32 cnid, struct inode *dir, return 0; err1: - hfsplus_cat_build_key(sb, fd.search_key, cnid, NULL); + hfsplus_cat_build_key_with_cnid(sb, fd.search_key, cnid); if (!hfs_brec_find(&fd, hfs_find_rec_by_key)) hfs_brec_remove(&fd); err2: @@ -313,7 +335,7 @@ int hfsplus_delete_cat(u32 cnid, struct inode *dir, struct qstr *str) if (!str) { int len; - hfsplus_cat_build_key(sb, fd.search_key, cnid, NULL); + hfsplus_cat_build_key_with_cnid(sb, fd.search_key, cnid); err = hfs_brec_find(&fd, hfs_find_rec_by_key); if (err) goto out; @@ -329,7 +351,9 @@ int hfsplus_delete_cat(u32 cnid, struct inode *dir, struct qstr *str) off + 2, len); fd.search_key->key_len = cpu_to_be16(6 + len); } else - hfsplus_cat_build_key(sb, fd.search_key, dir->i_ino, str); + err = hfsplus_cat_build_key(sb, fd.search_key, dir->i_ino, str); + if (unlikely(err)) + goto out; err = hfs_brec_find(&fd, hfs_find_rec_by_key); if (err) @@ -360,7 +384,7 @@ int hfsplus_delete_cat(u32 cnid, struct inode *dir, struct qstr *str) if (err) goto out; - hfsplus_cat_build_key(sb, fd.search_key, cnid, NULL); + hfsplus_cat_build_key_with_cnid(sb, fd.search_key, cnid); err = hfs_brec_find(&fd, hfs_find_rec_by_key); if (err) goto out; @@ -405,7 +429,11 @@ int hfsplus_rename_cat(u32 cnid, dst_fd = src_fd; /* find the old dir entry and read the data */ - hfsplus_cat_build_key(sb, src_fd.search_key, src_dir->i_ino, src_name); + err = hfsplus_cat_build_key(sb, src_fd.search_key, + src_dir->i_ino, src_name); + if (unlikely(err)) + goto out; + err = hfs_brec_find(&src_fd, hfs_find_rec_by_key); if (err) goto out; @@ -419,7 +447,11 @@ int hfsplus_rename_cat(u32 cnid, type = be16_to_cpu(entry.type); /* create new dir entry with the data from the old entry */ - hfsplus_cat_build_key(sb, dst_fd.search_key, dst_dir->i_ino, dst_name); + err = hfsplus_cat_build_key(sb, dst_fd.search_key, + dst_dir->i_ino, dst_name); + if (unlikely(err)) + goto out; + err = hfs_brec_find(&dst_fd, hfs_find_rec_by_key); if (err != -ENOENT) { if (!err) @@ -436,7 +468,11 @@ int hfsplus_rename_cat(u32 cnid, dst_dir->i_mtime = dst_dir->i_ctime = CURRENT_TIME_SEC; /* finally remove the old entry */ - hfsplus_cat_build_key(sb, src_fd.search_key, src_dir->i_ino, src_name); + err = hfsplus_cat_build_key(sb, src_fd.search_key, + src_dir->i_ino, src_name); + if (unlikely(err)) + goto out; + err = hfs_brec_find(&src_fd, hfs_find_rec_by_key); if (err) goto out; @@ -449,7 +485,7 @@ int hfsplus_rename_cat(u32 cnid, src_dir->i_mtime = src_dir->i_ctime = CURRENT_TIME_SEC; /* remove old thread entry */ - hfsplus_cat_build_key(sb, src_fd.search_key, cnid, NULL); + hfsplus_cat_build_key_with_cnid(sb, src_fd.search_key, cnid); err = hfs_brec_find(&src_fd, hfs_find_rec_by_key); if (err) goto out; @@ -459,9 +495,14 @@ int hfsplus_rename_cat(u32 cnid, goto out; /* create new thread entry */ - hfsplus_cat_build_key(sb, dst_fd.search_key, cnid, NULL); + hfsplus_cat_build_key_with_cnid(sb, dst_fd.search_key, cnid); entry_size = hfsplus_fill_cat_thread(sb, &entry, type, dst_dir->i_ino, dst_name); + if (unlikely(entry_size < 0)) { + err = entry_size; + goto out; + } + err = hfs_brec_find(&dst_fd, hfs_find_rec_by_key); if (err != -ENOENT) { if (!err) diff --git a/fs/hfsplus/dir.c b/fs/hfsplus/dir.c index bdec66522de3..b306b66ccaba 100644 --- a/fs/hfsplus/dir.c +++ b/fs/hfsplus/dir.c @@ -43,7 +43,10 @@ static struct dentry *hfsplus_lookup(struct inode *dir, struct dentry *dentry, err = hfs_find_init(HFSPLUS_SB(sb)->cat_tree, &fd); if (err) return ERR_PTR(err); - hfsplus_cat_build_key(sb, fd.search_key, dir->i_ino, &dentry->d_name); + err = hfsplus_cat_build_key(sb, fd.search_key, dir->i_ino, + &dentry->d_name); + if (unlikely(err < 0)) + goto fail; again: err = hfs_brec_read(&fd, &entry, sizeof(entry)); if (err) { @@ -96,9 +99,11 @@ again: be32_to_cpu(entry.file.permissions.dev); str.len = sprintf(name, "iNode%d", linkid); str.name = name; - hfsplus_cat_build_key(sb, fd.search_key, + err = hfsplus_cat_build_key(sb, fd.search_key, HFSPLUS_SB(sb)->hidden_dir->i_ino, &str); + if (unlikely(err < 0)) + goto fail; goto again; } } else if (!dentry->d_fsdata) @@ -139,7 +144,7 @@ static int hfsplus_readdir(struct file *file, struct dir_context *ctx) err = hfs_find_init(HFSPLUS_SB(sb)->cat_tree, &fd); if (err) return err; - hfsplus_cat_build_key(sb, fd.search_key, inode->i_ino, NULL); + hfsplus_cat_build_key_with_cnid(sb, fd.search_key, inode->i_ino); err = hfs_brec_find(&fd, hfs_find_rec_by_key); if (err) goto out; diff --git a/fs/hfsplus/hfsplus_fs.h b/fs/hfsplus/hfsplus_fs.h index 83dc29286b10..7f36453a788d 100644 --- a/fs/hfsplus/hfsplus_fs.h +++ b/fs/hfsplus/hfsplus_fs.h @@ -444,8 +444,10 @@ int hfsplus_cat_case_cmp_key(const hfsplus_btree_key *, const hfsplus_btree_key *); int hfsplus_cat_bin_cmp_key(const hfsplus_btree_key *, const hfsplus_btree_key *); -void hfsplus_cat_build_key(struct super_block *sb, +int hfsplus_cat_build_key(struct super_block *sb, hfsplus_btree_key *, u32, struct qstr *); +void hfsplus_cat_build_key_with_cnid(struct super_block *sb, + hfsplus_btree_key *, u32); int hfsplus_find_cat(struct super_block *, u32, struct hfs_find_data *); int hfsplus_create_cat(u32, struct inode *, struct qstr *, struct inode *); int hfsplus_delete_cat(u32, struct inode *, struct qstr *); diff --git a/fs/hfsplus/super.c b/fs/hfsplus/super.c index a513d2d36be9..dcb474129d5c 100644 --- a/fs/hfsplus/super.c +++ b/fs/hfsplus/super.c @@ -514,7 +514,9 @@ static int hfsplus_fill_super(struct super_block *sb, void *data, int silent) err = hfs_find_init(sbi->cat_tree, &fd); if (err) goto out_put_root; - hfsplus_cat_build_key(sb, fd.search_key, HFSPLUS_ROOT_CNID, &str); + err = hfsplus_cat_build_key(sb, fd.search_key, HFSPLUS_ROOT_CNID, &str); + if (unlikely(err < 0)) + goto out_put_root; if (!hfs_brec_read(&fd, &entry, sizeof(entry))) { hfs_find_exit(&fd); if (entry.type != cpu_to_be16(HFSPLUS_FOLDER)) diff --git a/fs/jffs2/background.c b/fs/jffs2/background.c index 2b60ce1996aa..bb9cebc9ca8a 100644 --- a/fs/jffs2/background.c +++ b/fs/jffs2/background.c @@ -75,10 +75,13 @@ void jffs2_stop_garbage_collect_thread(struct jffs2_sb_info *c) static int jffs2_garbage_collect_thread(void *_c) { struct jffs2_sb_info *c = _c; + sigset_t hupmask; + siginitset(&hupmask, sigmask(SIGHUP)); allow_signal(SIGKILL); allow_signal(SIGSTOP); allow_signal(SIGCONT); + allow_signal(SIGHUP); c->gc_task = current; complete(&c->gc_thread_start); @@ -87,7 +90,7 @@ static int jffs2_garbage_collect_thread(void *_c) set_freezable(); for (;;) { - allow_signal(SIGHUP); + sigprocmask(SIG_UNBLOCK, &hupmask, NULL); again: spin_lock(&c->erase_completion_lock); if (!jffs2_thread_should_wake(c)) { @@ -95,10 +98,9 @@ static int jffs2_garbage_collect_thread(void *_c) spin_unlock(&c->erase_completion_lock); jffs2_dbg(1, "%s(): sleeping...\n", __func__); schedule(); - } else + } else { spin_unlock(&c->erase_completion_lock); - - + } /* Problem - immediately after bootup, the GCD spends a lot * of time in places like jffs2_kill_fragtree(); so much so * that userspace processes (like gdm and X) are starved @@ -150,7 +152,7 @@ static int jffs2_garbage_collect_thread(void *_c) } } /* We don't want SIGHUP to interrupt us. STOP and KILL are OK though. */ - disallow_signal(SIGHUP); + sigprocmask(SIG_BLOCK, &hupmask, NULL); jffs2_dbg(1, "%s(): pass\n", __func__); if (jffs2_garbage_collect_pass(c) == -ENOSPC) { diff --git a/fs/mpage.c b/fs/mpage.c index 4979ffa60aaa..4e0af5ae34fa 100644 --- a/fs/mpage.c +++ b/fs/mpage.c @@ -462,6 +462,7 @@ static int __mpage_writepage(struct page *page, struct writeback_control *wbc, struct buffer_head map_bh; loff_t i_size = i_size_read(inode); int ret = 0; + int wr = (wbc->sync_mode == WB_SYNC_ALL ? WRITE_SYNC : WRITE); if (page_has_buffers(page)) { struct buffer_head *head = page_buffers(page); @@ -570,7 +571,7 @@ page_is_mapped: * This page will go to BIO. Do we need to send this BIO off first? */ if (bio && mpd->last_block_in_bio != blocks[0] - 1) - bio = mpage_bio_submit(WRITE, bio); + bio = mpage_bio_submit(wr, bio); alloc_new: if (bio == NULL) { @@ -587,7 +588,7 @@ alloc_new: */ length = first_unmapped << blkbits; if (bio_add_page(bio, page, length, 0) < length) { - bio = mpage_bio_submit(WRITE, bio); + bio = mpage_bio_submit(wr, bio); goto alloc_new; } @@ -620,7 +621,7 @@ alloc_new: set_page_writeback(page); unlock_page(page); if (boundary || (first_unmapped != blocks_per_page)) { - bio = mpage_bio_submit(WRITE, bio); + bio = mpage_bio_submit(wr, bio); if (boundary_block) { write_boundary_block(boundary_bdev, boundary_block, 1 << blkbits); @@ -632,7 +633,7 @@ alloc_new: confused: if (bio) - bio = mpage_bio_submit(WRITE, bio); + bio = mpage_bio_submit(wr, bio); if (mpd->use_writepage) { ret = mapping->a_ops->writepage(page, wbc); @@ -688,8 +689,11 @@ mpage_writepages(struct address_space *mapping, }; ret = write_cache_pages(mapping, wbc, __mpage_writepage, &mpd); - if (mpd.bio) - mpage_bio_submit(WRITE, mpd.bio); + if (mpd.bio) { + int wr = (wbc->sync_mode == WB_SYNC_ALL ? + WRITE_SYNC : WRITE); + mpage_bio_submit(wr, mpd.bio); + } } blk_finish_plug(&plug); return ret; @@ -706,8 +710,11 @@ int mpage_writepage(struct page *page, get_block_t get_block, .use_writepage = 0, }; int ret = __mpage_writepage(page, wbc, &mpd); - if (mpd.bio) - mpage_bio_submit(WRITE, mpd.bio); + if (mpd.bio) { + int wr = (wbc->sync_mode == WB_SYNC_ALL ? + WRITE_SYNC : WRITE); + mpage_bio_submit(wr, mpd.bio); + } return ret; } EXPORT_SYMBOL(mpage_writepage); diff --git a/fs/ntfs/compress.c b/fs/ntfs/compress.c index ee4144ce5d7c..f82498c35e78 100644 --- a/fs/ntfs/compress.c +++ b/fs/ntfs/compress.c @@ -58,7 +58,7 @@ typedef enum { /** * ntfs_compression_buffer - one buffer for the decompression engine */ -static u8 *ntfs_compression_buffer = NULL; +static u8 *ntfs_compression_buffer; /** * ntfs_cb_lock - spinlock which protects ntfs_compression_buffer diff --git a/fs/ntfs/super.c b/fs/ntfs/super.c index 9de2491f2926..6c3296e546c3 100644 --- a/fs/ntfs/super.c +++ b/fs/ntfs/super.c @@ -50,8 +50,8 @@ static unsigned long ntfs_nr_compression_users; /* A global default upcase table and a corresponding reference count. */ -static ntfschar *default_upcase = NULL; -static unsigned long ntfs_nr_upcase_users = 0; +static ntfschar *default_upcase; +static unsigned long ntfs_nr_upcase_users; /* Error constants/strings used in inode.c::ntfs_show_options(). */ typedef enum { diff --git a/fs/ntfs/sysctl.c b/fs/ntfs/sysctl.c index 79a89184cb5e..1927170a35ce 100644 --- a/fs/ntfs/sysctl.c +++ b/fs/ntfs/sysctl.c @@ -56,7 +56,7 @@ static ctl_table sysctls_root[] = { }; /* Storage for the sysctls header. */ -static struct ctl_table_header *sysctls_root_table = NULL; +static struct ctl_table_header *sysctls_root_table; /** * ntfs_sysctl - add or remove the debug sysctl diff --git a/fs/ocfs2/cluster/tcp.c b/fs/ocfs2/cluster/tcp.c index c6b90e670389..681691bc233a 100644 --- a/fs/ocfs2/cluster/tcp.c +++ b/fs/ocfs2/cluster/tcp.c @@ -108,7 +108,7 @@ static struct rb_root o2net_handler_tree = RB_ROOT; static struct o2net_node o2net_nodes[O2NM_MAX_NODES]; /* XXX someday we'll need better accounting */ -static struct socket *o2net_listen_sock = NULL; +static struct socket *o2net_listen_sock; /* * listen work is only queued by the listening socket callbacks on the @@ -1799,7 +1799,7 @@ int o2net_register_hb_callbacks(void) /* ------------------------------------------------------------ */ -static int o2net_accept_one(struct socket *sock) +static int o2net_accept_one(struct socket *sock, int *more) { int ret, slen; struct sockaddr_in sin; @@ -1810,6 +1810,7 @@ static int o2net_accept_one(struct socket *sock) struct o2net_node *nn; BUG_ON(sock == NULL); + *more = 0; ret = sock_create_lite(sock->sk->sk_family, sock->sk->sk_type, sock->sk->sk_protocol, &new_sock); if (ret) @@ -1821,6 +1822,7 @@ static int o2net_accept_one(struct socket *sock) if (ret < 0) goto out; + *more = 1; new_sock->sk->sk_allocation = GFP_ATOMIC; ret = o2net_set_nodelay(new_sock); @@ -1919,11 +1921,36 @@ out: return ret; } +/* + * This function is invoked in response to one or more + * pending accepts at softIRQ level. We must drain the + * entire que before returning. + */ + static void o2net_accept_many(struct work_struct *work) { struct socket *sock = o2net_listen_sock; - while (o2net_accept_one(sock) == 0) + int more; + int err; + + /* + * It is critical to note that due to interrupt moderation + * at the network driver level, we can't assume to get a + * softIRQ for every single conn since tcp SYN packets + * can arrive back-to-back, and therefore many pending + * accepts may result in just 1 softIRQ. If we terminate + * the o2net_accept_one() loop upon seeing an err, what happens + * to the rest of the conns in the queue? If no new SYN + * arrives for hours, no softIRQ will be delivered, + * and the connections will just sit in the queue. + */ + + for (;;) { + err = o2net_accept_one(sock, &more); + if (!more) + break; cond_resched(); + } } static void o2net_listen_data_ready(struct sock *sk) diff --git a/fs/ocfs2/dlm/dlmdebug.c b/fs/ocfs2/dlm/dlmdebug.c index e33cd7a3c582..18f13c2e4a10 100644 --- a/fs/ocfs2/dlm/dlmdebug.c +++ b/fs/ocfs2/dlm/dlmdebug.c @@ -338,7 +338,7 @@ void dlm_print_one_mle(struct dlm_master_list_entry *mle) #ifdef CONFIG_DEBUG_FS -static struct dentry *dlm_debugfs_root = NULL; +static struct dentry *dlm_debugfs_root; #define DLM_DEBUGFS_DIR "o2dlm" #define DLM_DEBUGFS_DLM_STATE "dlm_state" diff --git a/fs/ocfs2/dlm/dlmlock.c b/fs/ocfs2/dlm/dlmlock.c index 5d32f7511f74..66c2a491f68d 100644 --- a/fs/ocfs2/dlm/dlmlock.c +++ b/fs/ocfs2/dlm/dlmlock.c @@ -52,7 +52,7 @@ #define MLOG_MASK_PREFIX ML_DLM #include "cluster/masklog.h" -static struct kmem_cache *dlm_lock_cache = NULL; +static struct kmem_cache *dlm_lock_cache; static DEFINE_SPINLOCK(dlm_cookie_lock); static u64 dlm_next_cookie = 1; diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c index af3f7aa73e13..1256dc49f83f 100644 --- a/fs/ocfs2/dlm/dlmmaster.c +++ b/fs/ocfs2/dlm/dlmmaster.c @@ -82,9 +82,9 @@ static inline int dlm_mle_equal(struct dlm_ctxt *dlm, return 1; } -static struct kmem_cache *dlm_lockres_cache = NULL; -static struct kmem_cache *dlm_lockname_cache = NULL; -static struct kmem_cache *dlm_mle_cache = NULL; +static struct kmem_cache *dlm_lockres_cache; +static struct kmem_cache *dlm_lockname_cache; +static struct kmem_cache *dlm_mle_cache; static void dlm_mle_release(struct kref *kref); static void dlm_init_mle(struct dlm_master_list_entry *mle, @@ -3084,11 +3084,15 @@ static int dlm_add_migration_mle(struct dlm_ctxt *dlm, /* remove it so that only one mle will be found */ __dlm_unlink_mle(dlm, tmp); __dlm_mle_detach_hb_events(dlm, tmp); - ret = DLM_MIGRATE_RESPONSE_MASTERY_REF; - mlog(0, "%s:%.*s: master=%u, newmaster=%u, " - "telling master to get ref for cleared out mle " - "during migration\n", dlm->name, namelen, name, - master, new_master); + if (tmp->type == DLM_MLE_MASTER) { + ret = DLM_MIGRATE_RESPONSE_MASTERY_REF; + mlog(0, "%s:%.*s: master=%u, newmaster=%u, " + "telling master to get ref " + "for cleared out mle during " + "migration\n", dlm->name, + namelen, name, master, + new_master); + } } spin_unlock(&tmp->spinlock); } diff --git a/fs/ocfs2/namei.c b/fs/ocfs2/namei.c index 2060fc398445..7629da1b6897 100644 --- a/fs/ocfs2/namei.c +++ b/fs/ocfs2/namei.c @@ -231,6 +231,7 @@ static int ocfs2_mknod(struct inode *dir, sigset_t oldset; int did_block_signals = 0; struct posix_acl *default_acl = NULL, *acl = NULL; + struct ocfs2_dentry_lock *dl = NULL; trace_ocfs2_mknod(dir, dentry, dentry->d_name.len, dentry->d_name.name, (unsigned long long)OCFS2_I(dir)->ip_blkno, @@ -423,6 +424,8 @@ static int ocfs2_mknod(struct inode *dir, goto leave; } + dl = dentry->d_fsdata; + status = ocfs2_add_entry(handle, dentry, inode, OCFS2_I(inode)->ip_blkno, parent_fe_bh, &lookup); @@ -469,6 +472,16 @@ leave: * ocfs2_delete_inode will mutex_lock again. */ if ((status < 0) && inode) { + if (dl) { + ocfs2_simple_drop_lockres(osb, &dl->dl_lockres); + ocfs2_lock_res_free(&dl->dl_lockres); + BUG_ON(dl->dl_count != 1); + spin_lock(&dentry_attach_lock); + dentry->d_fsdata = NULL; + spin_unlock(&dentry_attach_lock); + kfree(dl); + iput(inode); + } OCFS2_I(inode)->ip_flags |= OCFS2_INODE_SKIP_ORPHAN_DIR; clear_nlink(inode); iput(inode); @@ -991,6 +1004,65 @@ leave: return status; } +static int ocfs2_check_if_ancestor(struct ocfs2_super *osb, + u64 src_inode_no, u64 dest_inode_no) +{ + int ret = 0, i = 0; + u64 parent_inode_no = 0; + u64 child_inode_no = src_inode_no; + struct inode *child_inode; + +#define MAX_LOOKUP_TIMES 32 + while (1) { + child_inode = ocfs2_iget(osb, child_inode_no, 0, 0); + if (IS_ERR(child_inode)) { + ret = PTR_ERR(child_inode); + break; + } + + ret = ocfs2_inode_lock(child_inode, NULL, 0); + if (ret < 0) { + iput(child_inode); + if (ret != -ENOENT) + mlog_errno(ret); + break; + } + + ret = ocfs2_lookup_ino_from_name(child_inode, "..", 2, + &parent_inode_no); + ocfs2_inode_unlock(child_inode, 0); + iput(child_inode); + if (ret < 0) { + ret = -ENOENT; + break; + } + + if (parent_inode_no == dest_inode_no) { + ret = 1; + break; + } + + if (parent_inode_no == osb->root_inode->i_ino) { + ret = 0; + break; + } + + child_inode_no = parent_inode_no; + + if (++i >= MAX_LOOKUP_TIMES) { + mlog(ML_NOTICE, "max lookup times reached, filesystem " + "may have nested directories, " + "src inode: %llu, dest inode: %llu.\n", + (unsigned long long)src_inode_no, + (unsigned long long)dest_inode_no); + ret = 0; + break; + } + } + + return ret; +} + /* * The only place this should be used is rename! * if they have the same id, then the 1st one is the only one locked. @@ -1002,6 +1074,7 @@ static int ocfs2_double_lock(struct ocfs2_super *osb, struct inode *inode2) { int status; + int inode1_is_ancestor, inode2_is_ancestor; struct ocfs2_inode_info *oi1 = OCFS2_I(inode1); struct ocfs2_inode_info *oi2 = OCFS2_I(inode2); struct buffer_head **tmpbh; @@ -1015,9 +1088,26 @@ static int ocfs2_double_lock(struct ocfs2_super *osb, if (*bh2) *bh2 = NULL; - /* we always want to lock the one with the lower lockid first. */ + /* we always want to lock the one with the lower lockid first. + * and if they are nested, we lock ancestor first */ if (oi1->ip_blkno != oi2->ip_blkno) { - if (oi1->ip_blkno < oi2->ip_blkno) { + inode1_is_ancestor = ocfs2_check_if_ancestor(osb, oi2->ip_blkno, + oi1->ip_blkno); + if (inode1_is_ancestor < 0) { + status = inode1_is_ancestor; + goto bail; + } + + inode2_is_ancestor = ocfs2_check_if_ancestor(osb, oi1->ip_blkno, + oi2->ip_blkno); + if (inode2_is_ancestor < 0) { + status = inode2_is_ancestor; + goto bail; + } + + if ((inode1_is_ancestor == 1) || + (oi1->ip_blkno < oi2->ip_blkno && + inode2_is_ancestor == 0)) { /* switch id1 and id2 around */ tmpbh = bh2; bh2 = bh1; @@ -1134,6 +1224,22 @@ static int ocfs2_rename(struct inode *old_dir, goto bail; } rename_lock = 1; + + /* here we cannot guarantee the inodes haven't just been + * changed, so check if they are nested again */ + status = ocfs2_check_if_ancestor(osb, new_dir->i_ino, + old_inode->i_ino); + if (status < 0) { + mlog_errno(status); + goto bail; + } else if (status == 1) { + status = -EPERM; + mlog(ML_ERROR, "src inode %llu should not be ancestor " + "of new dir inode %llu\n", + (unsigned long long)old_inode->i_ino, + (unsigned long long)new_dir->i_ino); + goto bail; + } } /* if old and new are the same, this'll just do one lock. */ @@ -1642,6 +1748,7 @@ static int ocfs2_symlink(struct inode *dir, struct ocfs2_dir_lookup_result lookup = { NULL, }; sigset_t oldset; int did_block_signals = 0; + struct ocfs2_dentry_lock *dl = NULL; trace_ocfs2_symlink_begin(dir, dentry, symname, dentry->d_name.len, dentry->d_name.name); @@ -1830,6 +1937,8 @@ static int ocfs2_symlink(struct inode *dir, goto bail; } + dl = dentry->d_fsdata; + status = ocfs2_add_entry(handle, dentry, inode, le64_to_cpu(fe->i_blkno), parent_fe_bh, &lookup); @@ -1864,6 +1973,16 @@ bail: if (xattr_ac) ocfs2_free_alloc_context(xattr_ac); if ((status < 0) && inode) { + if (dl) { + ocfs2_simple_drop_lockres(osb, &dl->dl_lockres); + ocfs2_lock_res_free(&dl->dl_lockres); + BUG_ON(dl->dl_count != 1); + spin_lock(&dentry_attach_lock); + dentry->d_fsdata = NULL; + spin_unlock(&dentry_attach_lock); + kfree(dl); + iput(inode); + } OCFS2_I(inode)->ip_flags |= OCFS2_INODE_SKIP_ORPHAN_DIR; clear_nlink(inode); iput(inode); diff --git a/fs/ocfs2/stackglue.c b/fs/ocfs2/stackglue.c index 83f1a665ae97..5d965e83bd43 100644 --- a/fs/ocfs2/stackglue.c +++ b/fs/ocfs2/stackglue.c @@ -709,7 +709,7 @@ static struct ctl_table ocfs2_root_table[] = { { } }; -static struct ctl_table_header *ocfs2_table_header = NULL; +static struct ctl_table_header *ocfs2_table_header; /* diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c index a7cdd56f4c79..c7a89cea5c5d 100644 --- a/fs/ocfs2/super.c +++ b/fs/ocfs2/super.c @@ -75,7 +75,7 @@ #include "buffer_head_io.h" -static struct kmem_cache *ocfs2_inode_cachep = NULL; +static struct kmem_cache *ocfs2_inode_cachep; struct kmem_cache *ocfs2_dquot_cachep; struct kmem_cache *ocfs2_qf_chunk_cachep; @@ -85,7 +85,7 @@ struct kmem_cache *ocfs2_qf_chunk_cachep; * workqueue and schedule on our own. */ struct workqueue_struct *ocfs2_wq = NULL; -static struct dentry *ocfs2_debugfs_root = NULL; +static struct dentry *ocfs2_debugfs_root; MODULE_AUTHOR("Oracle"); MODULE_LICENSE("GPL"); @@ -2292,8 +2292,8 @@ static int ocfs2_initialize_super(struct super_block *sb, goto bail; } - strncpy(osb->vol_label, di->id2.i_super.s_label, 63); - osb->vol_label[63] = '\0'; + strlcpy(osb->vol_label, di->id2.i_super.s_label, + OCFS2_MAX_VOL_LABEL_LEN); osb->root_blkno = le64_to_cpu(di->id2.i_super.s_root_blkno); osb->system_dir_blkno = le64_to_cpu(di->id2.i_super.s_system_dir_blkno); osb->first_cluster_group_blkno = diff --git a/fs/ocfs2/uptodate.c b/fs/ocfs2/uptodate.c index 52eaf33d346f..82e17b076ce7 100644 --- a/fs/ocfs2/uptodate.c +++ b/fs/ocfs2/uptodate.c @@ -67,7 +67,7 @@ struct ocfs2_meta_cache_item { sector_t c_block; }; -static struct kmem_cache *ocfs2_uptodate_cachep = NULL; +static struct kmem_cache *ocfs2_uptodate_cachep; u64 ocfs2_metadata_cache_owner(struct ocfs2_caching_info *ci) { diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 442177b1119a..fa6d6a4e85b3 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -424,7 +424,6 @@ const struct file_operations proc_tid_maps_operations = { #ifdef CONFIG_PROC_PAGE_MONITOR struct mem_size_stats { - struct vm_area_struct *vma; unsigned long resident; unsigned long shared_clean; unsigned long shared_dirty; @@ -438,15 +437,16 @@ struct mem_size_stats { u64 pss; }; - -static void smaps_pte_entry(pte_t ptent, unsigned long addr, - unsigned long ptent_size, struct mm_walk *walk) +static int smaps_pte(pte_t *pte, unsigned long addr, unsigned long end, + struct mm_walk *walk) { struct mem_size_stats *mss = walk->private; - struct vm_area_struct *vma = mss->vma; + struct vm_area_struct *vma = walk->vma; pgoff_t pgoff = linear_page_index(vma, addr); struct page *page = NULL; int mapcount; + pte_t ptent = *pte; + unsigned long ptent_size = end - addr; if (pte_present(ptent)) { page = vm_normal_page(vma, addr, ptent); @@ -463,7 +463,7 @@ static void smaps_pte_entry(pte_t ptent, unsigned long addr, } if (!page) - return; + return 0; if (PageAnon(page)) mss->anonymous += ptent_size; @@ -489,35 +489,22 @@ static void smaps_pte_entry(pte_t ptent, unsigned long addr, mss->private_clean += ptent_size; mss->pss += (ptent_size << PSS_SHIFT); } + return 0; } -static int smaps_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end, - struct mm_walk *walk) +static int smaps_pmd(pmd_t *pmd, unsigned long addr, unsigned long end, + struct mm_walk *walk) { struct mem_size_stats *mss = walk->private; - struct vm_area_struct *vma = mss->vma; - pte_t *pte; spinlock_t *ptl; - if (pmd_trans_huge_lock(pmd, vma, &ptl) == 1) { - smaps_pte_entry(*(pte_t *)pmd, addr, HPAGE_PMD_SIZE, walk); + if (pmd_trans_huge_lock(pmd, walk->vma, &ptl) == 1) { + smaps_pte((pte_t *)pmd, addr, addr + HPAGE_PMD_SIZE, walk); spin_unlock(ptl); mss->anonymous_thp += HPAGE_PMD_SIZE; - return 0; + /* don't call smaps_pte() */ + walk->skip = 1; } - - if (pmd_trans_unstable(pmd)) - return 0; - /* - * The mmap_sem held all the way back in m_start() is what - * keeps khugepaged out of here and from collapsing things - * in here. - */ - pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); - for (; addr != end; pte++, addr += PAGE_SIZE) - smaps_pte_entry(*pte, addr, PAGE_SIZE, walk); - pte_unmap_unlock(pte - 1, ptl); - cond_resched(); return 0; } @@ -582,16 +569,16 @@ static int show_smap(struct seq_file *m, void *v, int is_pid) struct vm_area_struct *vma = v; struct mem_size_stats mss; struct mm_walk smaps_walk = { - .pmd_entry = smaps_pte_range, + .pmd_entry = smaps_pmd, + .pte_entry = smaps_pte, .mm = vma->vm_mm, + .vma = vma, .private = &mss, }; memset(&mss, 0, sizeof mss); - mss.vma = vma; /* mmap_sem is held in m_start */ - if (vma->vm_mm && !is_vm_hugetlb_page(vma)) - walk_page_range(vma->vm_start, vma->vm_end, &smaps_walk); + walk_page_vma(vma, &smaps_walk); show_map_vma(m, vma, is_pid); @@ -712,7 +699,6 @@ enum clear_refs_types { }; struct clear_refs_private { - struct vm_area_struct *vma; enum clear_refs_types type; }; @@ -737,48 +723,52 @@ static inline void clear_soft_dirty(struct vm_area_struct *vma, ptent = pte_file_clear_soft_dirty(ptent); } - if (vma->vm_flags & VM_SOFTDIRTY) - vma->vm_flags &= ~VM_SOFTDIRTY; - set_pte_at(vma->vm_mm, addr, pte, ptent); #endif } -static int clear_refs_pte_range(pmd_t *pmd, unsigned long addr, +static int clear_refs_pte(pte_t *pte, unsigned long addr, unsigned long end, struct mm_walk *walk) { struct clear_refs_private *cp = walk->private; - struct vm_area_struct *vma = cp->vma; - pte_t *pte, ptent; - spinlock_t *ptl; + struct vm_area_struct *vma = walk->vma; struct page *page; - split_huge_page_pmd(vma, addr, pmd); - if (pmd_trans_unstable(pmd)) + if (cp->type == CLEAR_REFS_SOFT_DIRTY) { + clear_soft_dirty(vma, addr, pte); return 0; + } + if (!pte_present(*pte)) + return 0; + page = vm_normal_page(vma, addr, *pte); + if (!page) + return 0; + /* Clear accessed and referenced bits. */ + ptep_test_and_clear_young(vma, addr, pte); + ClearPageReferenced(page); + return 0; +} - pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); - for (; addr != end; pte++, addr += PAGE_SIZE) { - ptent = *pte; - - if (cp->type == CLEAR_REFS_SOFT_DIRTY) { - clear_soft_dirty(vma, addr, pte); - continue; - } - - if (!pte_present(ptent)) - continue; - - page = vm_normal_page(vma, addr, ptent); - if (!page) - continue; +static int clear_refs_test_walk(unsigned long start, unsigned long end, + struct mm_walk *walk) +{ + struct clear_refs_private *cp = walk->private; + struct vm_area_struct *vma = walk->vma; - /* Clear accessed and referenced bits. */ - ptep_test_and_clear_young(vma, addr, pte); - ClearPageReferenced(page); + /* + * Writing 1 to /proc/pid/clear_refs affects all pages. + * Writing 2 to /proc/pid/clear_refs only affects anonymous pages. + * Writing 3 to /proc/pid/clear_refs only affects file mapped pages. + * Writing 4 to /proc/pid/clear_refs affects all pages. + */ + if (cp->type == CLEAR_REFS_ANON && vma->vm_file) + walk->skip = 1; + if (cp->type == CLEAR_REFS_MAPPED && !vma->vm_file) + walk->skip = 1; + if (cp->type == CLEAR_REFS_SOFT_DIRTY) { + if (vma->vm_flags & VM_SOFTDIRTY) + vma->vm_flags &= ~VM_SOFTDIRTY; } - pte_unmap_unlock(pte - 1, ptl); - cond_resched(); return 0; } @@ -807,8 +797,9 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf, if (type == CLEAR_REFS_SOFT_DIRTY) { soft_dirty_cleared = true; - pr_warn_once("The pagemap bits 55-60 has changed their meaning! " - "See the linux/Documentation/vm/pagemap.txt for details.\n"); + pr_warn_once("The pagemap bits 55-60 has changed their meaning!" + " See the linux/Documentation/vm/pagemap.txt for " + "details.\n"); } task = get_proc_task(file_inode(file)); @@ -820,33 +811,16 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf, .type = type, }; struct mm_walk clear_refs_walk = { - .pmd_entry = clear_refs_pte_range, + .pte_entry = clear_refs_pte, + .test_walk = clear_refs_test_walk, .mm = mm, .private = &cp, }; down_read(&mm->mmap_sem); if (type == CLEAR_REFS_SOFT_DIRTY) mmu_notifier_invalidate_range_start(mm, 0, -1); - for (vma = mm->mmap; vma; vma = vma->vm_next) { - cp.vma = vma; - if (is_vm_hugetlb_page(vma)) - continue; - /* - * Writing 1 to /proc/pid/clear_refs affects all pages. - * - * Writing 2 to /proc/pid/clear_refs only affects - * Anonymous pages. - * - * Writing 3 to /proc/pid/clear_refs only affects file - * mapped pages. - */ - if (type == CLEAR_REFS_ANON && vma->vm_file) - continue; - if (type == CLEAR_REFS_MAPPED && !vma->vm_file) - continue; - walk_page_range(vma->vm_start, vma->vm_end, - &clear_refs_walk); - } + for (vma = mm->mmap; vma; vma = vma->vm_next) + walk_page_vma(vma, &clear_refs_walk); if (type == CLEAR_REFS_SOFT_DIRTY) mmu_notifier_invalidate_range_end(mm, 0, -1); flush_tlb_mm(mm); @@ -987,19 +961,33 @@ static inline void thp_pmd_to_pagemap_entry(pagemap_entry_t *pme, struct pagemap } #endif -static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end, +static int pagemap_pte(pte_t *pte, unsigned long addr, unsigned long end, struct mm_walk *walk) { - struct vm_area_struct *vma; + struct vm_area_struct *vma = walk->vma; struct pagemapread *pm = walk->private; - spinlock_t *ptl; - pte_t *pte; + pagemap_entry_t pme = make_pme(PM_NOT_PRESENT(pm->v2)); + + if (vma && vma->vm_start <= addr && end <= vma->vm_end) { + pte_to_pagemap_entry(&pme, pm, vma, addr, *pte); + /* unmap before userspace copy */ + pte_unmap(pte); + } + return add_to_pagemap(addr, &pme, pm); +} + +static int pagemap_pmd(pmd_t *pmd, unsigned long addr, unsigned long end, + struct mm_walk *walk) +{ int err = 0; + struct vm_area_struct *vma = walk->vma; + struct pagemapread *pm = walk->private; pagemap_entry_t pme = make_pme(PM_NOT_PRESENT(pm->v2)); + spinlock_t *ptl; - /* find the first VMA at or above 'addr' */ - vma = find_vma(walk->mm, addr); - if (vma && pmd_trans_huge_lock(pmd, vma, &ptl) == 1) { + if (!vma) + return err; + if (pmd_trans_huge_lock(pmd, vma, &ptl) == 1) { int pmd_flags2; if ((vma->vm_flags & VM_SOFTDIRTY) || pmd_soft_dirty(*pmd)) @@ -1018,41 +1006,9 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end, break; } spin_unlock(ptl); - return err; + /* don't call pagemap_pte() */ + walk->skip = 1; } - - if (pmd_trans_unstable(pmd)) - return 0; - for (; addr != end; addr += PAGE_SIZE) { - int flags2; - - /* check to see if we've left 'vma' behind - * and need a new, higher one */ - if (vma && (addr >= vma->vm_end)) { - vma = find_vma(walk->mm, addr); - if (vma && (vma->vm_flags & VM_SOFTDIRTY)) - flags2 = __PM_SOFT_DIRTY; - else - flags2 = 0; - pme = make_pme(PM_NOT_PRESENT(pm->v2) | PM_STATUS2(pm->v2, flags2)); - } - - /* check that 'vma' actually covers this address, - * and that it isn't a huge page vma */ - if (vma && (vma->vm_start <= addr) && - !is_vm_hugetlb_page(vma)) { - pte = pte_offset_map(pmd, addr); - pte_to_pagemap_entry(&pme, pm, vma, addr, *pte); - /* unmap before userspace copy */ - pte_unmap(pte); - } - err = add_to_pagemap(addr, &pme, pm); - if (err) - return err; - } - - cond_resched(); - return err; } @@ -1070,24 +1026,22 @@ static void huge_pte_to_pagemap_entry(pagemap_entry_t *pme, struct pagemapread * } /* This function walks within one hugetlb entry in the single call */ -static int pagemap_hugetlb_range(pte_t *pte, unsigned long hmask, - unsigned long addr, unsigned long end, - struct mm_walk *walk) +static int pagemap_hugetlb(pte_t *pte, unsigned long addr, unsigned long end, + struct mm_walk *walk) { struct pagemapread *pm = walk->private; - struct vm_area_struct *vma; + struct vm_area_struct *vma = walk->vma; int err = 0; int flags2; pagemap_entry_t pme; + unsigned long hmask; - vma = find_vma(walk->mm, addr); - WARN_ON_ONCE(!vma); - - if (vma && (vma->vm_flags & VM_SOFTDIRTY)) + if (vma->vm_flags & VM_SOFTDIRTY) flags2 = __PM_SOFT_DIRTY; else flags2 = 0; + hmask = huge_page_mask(hstate_vma(vma)); for (; addr != end; addr += PAGE_SIZE) { int offset = (addr & ~hmask) >> PAGE_SHIFT; huge_pte_to_pagemap_entry(&pme, pm, *pte, offset, flags2); @@ -1095,9 +1049,6 @@ static int pagemap_hugetlb_range(pte_t *pte, unsigned long hmask, if (err) return err; } - - cond_resched(); - return err; } #endif /* HUGETLB_PAGE */ @@ -1164,10 +1115,11 @@ static ssize_t pagemap_read(struct file *file, char __user *buf, if (!mm || IS_ERR(mm)) goto out_free; - pagemap_walk.pmd_entry = pagemap_pte_range; + pagemap_walk.pte_entry = pagemap_pte; + pagemap_walk.pmd_entry = pagemap_pmd; pagemap_walk.pte_hole = pagemap_pte_hole; #ifdef CONFIG_HUGETLB_PAGE - pagemap_walk.hugetlb_entry = pagemap_hugetlb_range; + pagemap_walk.hugetlb_entry = pagemap_hugetlb; #endif pagemap_walk.mm = mm; pagemap_walk.private = ± @@ -1243,7 +1195,6 @@ const struct file_operations proc_pagemap_operations = { #ifdef CONFIG_NUMA struct numa_maps { - struct vm_area_struct *vma; unsigned long pages; unsigned long anon; unsigned long active; @@ -1309,44 +1260,42 @@ static struct page *can_gather_numa_stats(pte_t pte, struct vm_area_struct *vma, return page; } -static int gather_pte_stats(pmd_t *pmd, unsigned long addr, +static int gather_pte_stats(pte_t *pte, unsigned long addr, unsigned long end, struct mm_walk *walk) { - struct numa_maps *md; - spinlock_t *ptl; - pte_t *orig_pte; - pte_t *pte; + struct numa_maps *md = walk->private; - md = walk->private; + struct page *page = can_gather_numa_stats(*pte, walk->vma, addr); + if (!page) + return 0; + gather_stats(page, md, pte_dirty(*pte), 1); + return 0; +} - if (pmd_trans_huge_lock(pmd, md->vma, &ptl) == 1) { +static int gather_pmd_stats(pmd_t *pmd, unsigned long addr, + unsigned long end, struct mm_walk *walk) +{ + struct numa_maps *md = walk->private; + struct vm_area_struct *vma = walk->vma; + spinlock_t *ptl; + + if (pmd_trans_huge_lock(pmd, vma, &ptl) == 1) { pte_t huge_pte = *(pte_t *)pmd; struct page *page; - page = can_gather_numa_stats(huge_pte, md->vma, addr); + page = can_gather_numa_stats(huge_pte, vma, addr); if (page) gather_stats(page, md, pte_dirty(huge_pte), HPAGE_PMD_SIZE/PAGE_SIZE); spin_unlock(ptl); - return 0; + /* don't call gather_pte_stats() */ + walk->skip = 1; } - - if (pmd_trans_unstable(pmd)) - return 0; - orig_pte = pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl); - do { - struct page *page = can_gather_numa_stats(*pte, md->vma, addr); - if (!page) - continue; - gather_stats(page, md, pte_dirty(*pte), 1); - - } while (pte++, addr += PAGE_SIZE, addr != end); - pte_unmap_unlock(orig_pte, ptl); return 0; } #ifdef CONFIG_HUGETLB_PAGE -static int gather_hugetbl_stats(pte_t *pte, unsigned long hmask, - unsigned long addr, unsigned long end, struct mm_walk *walk) +static int gather_hugetlb_stats(pte_t *pte, unsigned long addr, + unsigned long end, struct mm_walk *walk) { struct numa_maps *md; struct page *page; @@ -1354,6 +1303,9 @@ static int gather_hugetbl_stats(pte_t *pte, unsigned long hmask, if (pte_none(*pte)) return 0; + if (!pte_present(*pte)) + return 0; + page = pte_page(*pte); if (!page) return 0; @@ -1364,8 +1316,8 @@ static int gather_hugetbl_stats(pte_t *pte, unsigned long hmask, } #else -static int gather_hugetbl_stats(pte_t *pte, unsigned long hmask, - unsigned long addr, unsigned long end, struct mm_walk *walk) +static int gather_hugetlb_stats(pte_t *pte, unsigned long addr, + unsigned long end, struct mm_walk *walk) { return 0; } @@ -1394,12 +1346,12 @@ static int show_numa_map(struct seq_file *m, void *v, int is_pid) /* Ensure we start with an empty set of numa_maps statistics. */ memset(md, 0, sizeof(*md)); - md->vma = vma; - - walk.hugetlb_entry = gather_hugetbl_stats; - walk.pmd_entry = gather_pte_stats; + walk.hugetlb_entry = gather_hugetlb_stats; + walk.pmd_entry = gather_pmd_stats; + walk.pte_entry = gather_pte_stats; walk.private = md; walk.mm = mm; + walk.vma = vma; pol = get_vma_policy(task, vma, vma->vm_start); mpol_to_str(buffer, sizeof(buffer), pol); @@ -1430,6 +1382,7 @@ static int show_numa_map(struct seq_file *m, void *v, int is_pid) if (is_vm_hugetlb_page(vma)) seq_printf(m, " huge"); + /* mmap_sem is held by m_start */ walk_page_range(vma->vm_start, vma->vm_end, &walk); if (!md->pages) diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h index 1ec08c198b66..a8015a7a55bb 100644 --- a/include/asm-generic/pgtable.h +++ b/include/asm-generic/pgtable.h @@ -693,24 +693,35 @@ static inline int pmd_numa(pmd_t pmd) #ifndef pte_mknonnuma static inline pte_t pte_mknonnuma(pte_t pte) { - pte = pte_clear_flags(pte, _PAGE_NUMA); - return pte_set_flags(pte, _PAGE_PRESENT|_PAGE_ACCESSED); + pteval_t val = pte_val(pte); + + val &= ~_PAGE_NUMA; + val |= (_PAGE_PRESENT|_PAGE_ACCESSED); + return __pte(val); } #endif #ifndef pmd_mknonnuma static inline pmd_t pmd_mknonnuma(pmd_t pmd) { - pmd = pmd_clear_flags(pmd, _PAGE_NUMA); - return pmd_set_flags(pmd, _PAGE_PRESENT|_PAGE_ACCESSED); + pmdval_t val = pmd_val(pmd); + + val &= ~_PAGE_NUMA; + val |= (_PAGE_PRESENT|_PAGE_ACCESSED); + + return __pmd(val); } #endif #ifndef pte_mknuma static inline pte_t pte_mknuma(pte_t pte) { - pte = pte_set_flags(pte, _PAGE_NUMA); - return pte_clear_flags(pte, _PAGE_PRESENT); + pteval_t val = pte_val(pte); + + val &= ~_PAGE_PRESENT; + val |= _PAGE_NUMA; + + return __pte(val); } #endif @@ -729,8 +740,12 @@ static inline void ptep_set_numa(struct mm_struct *mm, unsigned long addr, #ifndef pmd_mknuma static inline pmd_t pmd_mknuma(pmd_t pmd) { - pmd = pmd_set_flags(pmd, _PAGE_NUMA); - return pmd_clear_flags(pmd, _PAGE_PRESENT); + pmdval_t val = pmd_val(pmd); + + val &= ~_PAGE_PRESENT; + val |= _PAGE_NUMA; + + return __pmd(val); } #endif diff --git a/include/linux/crc64_ecma.h b/include/linux/crc64_ecma.h new file mode 100644 index 000000000000..bba7a4d692b3 --- /dev/null +++ b/include/linux/crc64_ecma.h @@ -0,0 +1,56 @@ +/* + * Copyright 2013 Freescale Semiconductor Inc. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * * Neither the name of Freescale Semiconductor nor the + * names of its contributors may be used to endorse or promote products + * derived from this software without specific prior written permission. + * + * + * ALTERNATIVELY, this software may be distributed under the terms of the + * GNU General Public License ("GPL") as published by the Free Software + * Foundation, either version 2 of that License or (at your option) any + * later version. + * + * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND ANY + * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED + * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE + * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY + * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES + * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; + * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND + * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS + * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#ifndef __CRC64_ECMA_H_ +#define __CRC64_ECMA_H_ + +#include <linux/types.h> + + +#define CRC64_DEFAULT_INITVAL 0xFFFFFFFFFFFFFFFFULL + + +/* + * crc64_ecma_seed - Initializes the CRC64 ECMA seed. + */ +u64 crc64_ecma_seed(void); + +/* + * crc64_ecma - Computes the 64 bit ECMA CRC. + * + * @pdata: pointer to the data to compute checksum for. + * @nbytes: number of bytes in data buffer. + * @seed: CRC seed. + */ +u64 crc64_ecma(u8 const *pdata, u32 nbytes, u64 seed); + +#endif /* __CRC64_ECMA_H_ */ diff --git a/include/linux/gfp.h b/include/linux/gfp.h index 39b81dc7d01a..d382db71e300 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -31,7 +31,6 @@ struct vm_area_struct; #define ___GFP_HARDWALL 0x20000u #define ___GFP_THISNODE 0x40000u #define ___GFP_RECLAIMABLE 0x80000u -#define ___GFP_KMEMCG 0x100000u #define ___GFP_NOTRACK 0x200000u #define ___GFP_NO_KSWAPD 0x400000u #define ___GFP_OTHER_NODE 0x800000u @@ -91,7 +90,6 @@ struct vm_area_struct; #define __GFP_NO_KSWAPD ((__force gfp_t)___GFP_NO_KSWAPD) #define __GFP_OTHER_NODE ((__force gfp_t)___GFP_OTHER_NODE) /* On behalf of other node */ -#define __GFP_KMEMCG ((__force gfp_t)___GFP_KMEMCG) /* Allocation comes from a memcg-accounted resource */ #define __GFP_WRITE ((__force gfp_t)___GFP_WRITE) /* Allocator intends to dirty page */ /* @@ -353,6 +351,10 @@ extern struct page *alloc_pages_vma(gfp_t gfp_mask, int order, #define alloc_page_vma_node(gfp_mask, vma, addr, node) \ alloc_pages_vma(gfp_mask, 0, vma, addr, node) +extern struct page *alloc_kmem_pages(gfp_t gfp_mask, unsigned int order); +extern struct page *alloc_kmem_pages_node(int nid, gfp_t gfp_mask, + unsigned int order); + extern unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order); extern unsigned long get_zeroed_page(gfp_t gfp_mask); @@ -372,8 +374,8 @@ extern void free_pages(unsigned long addr, unsigned int order); extern void free_hot_cold_page(struct page *page, int cold); extern void free_hot_cold_page_list(struct list_head *list, int cold); -extern void __free_memcg_kmem_pages(struct page *page, unsigned int order); -extern void free_memcg_kmem_pages(unsigned long addr, unsigned int order); +extern void __free_kmem_pages(struct page *page, unsigned int order); +extern void free_kmem_pages(unsigned long addr, unsigned int order); #define __free_page(page) __free_pages((page), 0) #define free_page(addr) free_pages((addr), 0) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 5b337cf8fb86..1ae16673c672 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -41,8 +41,6 @@ extern int hugetlb_max_hstate __read_mostly; struct hugepage_subpool *hugepage_new_subpool(long nr_blocks); void hugepage_put_subpool(struct hugepage_subpool *spool); -int PageHuge(struct page *page); - void reset_vma_resv_huge_pages(struct vm_area_struct *vma); int hugetlb_sysctl_handler(struct ctl_table *, int, void __user *, size_t *, loff_t *); int hugetlb_overcommit_handler(struct ctl_table *, int, void __user *, size_t *, loff_t *); @@ -109,11 +107,6 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma, #else /* !CONFIG_HUGETLB_PAGE */ -static inline int PageHuge(struct page *page) -{ - return 0; -} - static inline void reset_vma_resv_huge_pages(struct vm_area_struct *vma) { } diff --git a/include/linux/hugetlb_inline.h b/include/linux/hugetlb_inline.h index 2bb681fbeb35..4d60c82e9fda 100644 --- a/include/linux/hugetlb_inline.h +++ b/include/linux/hugetlb_inline.h @@ -10,6 +10,8 @@ static inline int is_vm_hugetlb_page(struct vm_area_struct *vma) return !!(vma->vm_flags & VM_HUGETLB); } +int PageHuge(struct page *page); + #else static inline int is_vm_hugetlb_page(struct vm_area_struct *vma) @@ -17,6 +19,11 @@ static inline int is_vm_hugetlb_page(struct vm_area_struct *vma) return 0; } +static inline int PageHuge(struct page *page) +{ + return 0; +} + #endif #endif diff --git a/include/linux/input.h b/include/linux/input.h index 82ce323b9986..6453b22372ac 100644 --- a/include/linux/input.h +++ b/include/linux/input.h @@ -79,6 +79,7 @@ struct input_value { * @led: reflects current state of device's LEDs * @snd: reflects current state of sound effects * @sw: reflects current state of device's switches + * @leds: leds objects for the device's LEDs * @open: this method is called when the very first user calls * input_open_device(). The driver must prepare the device * to start generating events (start polling thread, @@ -164,6 +165,8 @@ struct input_dev { unsigned long snd[BITS_TO_LONGS(SND_CNT)]; unsigned long sw[BITS_TO_LONGS(SW_CNT)]; + struct led_classdev *leds; + int (*open)(struct input_dev *dev); void (*close)(struct input_dev *dev); int (*flush)(struct input_dev *dev, struct file *file); @@ -531,4 +534,22 @@ int input_ff_erase(struct input_dev *dev, int effect_id, struct file *file); int input_ff_create_memless(struct input_dev *dev, void *data, int (*play_effect)(struct input_dev *, void *, struct ff_effect *)); +#ifdef CONFIG_INPUT_LEDS + +int input_led_connect(struct input_dev *dev); +void input_led_disconnect(struct input_dev *dev); + +#else + +static inline int input_led_connect(struct input_dev *dev) +{ + return 0; +} + +static inline void input_led_disconnect(struct input_dev *dev) +{ +} + +#endif + #endif diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index b569b8be5c5a..5155d09e749d 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -506,6 +506,9 @@ void memcg_update_array_size(int num_groups); struct kmem_cache * __memcg_kmem_get_cache(struct kmem_cache *cachep, gfp_t gfp); +int memcg_charge_kmem(struct mem_cgroup *memcg, gfp_t gfp, u64 size); +void memcg_uncharge_kmem(struct mem_cgroup *memcg, u64 size); + void mem_cgroup_destroy_cache(struct kmem_cache *cachep); int __kmem_cache_destroy_memcg_children(struct kmem_cache *s); @@ -534,7 +537,7 @@ memcg_kmem_newpage_charge(gfp_t gfp, struct mem_cgroup **memcg, int order) * res_counter_charge_nofail, but we hope those allocations are rare, * and won't be worth the trouble. */ - if (!(gfp & __GFP_KMEMCG) || (gfp & __GFP_NOFAIL)) + if (gfp & __GFP_NOFAIL) return true; if (in_interrupt() || (!current->mm) || (current->flags & PF_KTHREAD)) return true; @@ -583,17 +586,7 @@ memcg_kmem_commit_charge(struct page *page, struct mem_cgroup *memcg, int order) * @cachep: the original global kmem cache * @gfp: allocation flags. * - * This function assumes that the task allocating, which determines the memcg - * in the page allocator, belongs to the same cgroup throughout the whole - * process. Misacounting can happen if the task calls memcg_kmem_get_cache() - * while belonging to a cgroup, and later on changes. This is considered - * acceptable, and should only happen upon task migration. - * - * Before the cache is created by the memcg core, there is also a possible - * imbalance: the task belongs to a memcg, but the cache being allocated from - * is the global cache, since the child cache is not yet guaranteed to be - * ready. This case is also fine, since in this case the GFP_KMEMCG will not be - * passed and the page allocator will not attempt any cgroup accounting. + * All memory allocated from a per-memcg cache is charged to the owner memcg. */ static __always_inline struct kmem_cache * memcg_kmem_get_cache(struct kmem_cache *cachep, gfp_t gfp) diff --git a/include/linux/mm.h b/include/linux/mm.h index bf9811e1321a..9a3744d98b00 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1096,10 +1096,18 @@ void unmap_vmas(struct mmu_gather *tlb, struct vm_area_struct *start_vma, * @pte_entry: if set, called for each non-empty PTE (4th-level) entry * @pte_hole: if set, called for each hole at all levels * @hugetlb_entry: if set, called for each hugetlb entry - * *Caution*: The caller must hold mmap_sem() if @hugetlb_entry - * is used. + * @test_walk: caller specific callback function to determine whether + * we walk over the current vma or not. A positive returned + * value means "do page table walk over the current vma," + * and a negative one means "abort current page table walk + * right now." 0 means "skip the current vma." + * @mm: mm_struct representing the target process of page table walk + * @vma: vma currently walked + * @skip: internal control flag which is set when we skip the lower + * level entries. + * @private: private data for callbacks' use * - * (see walk_page_range for more details) + * (see the comment on walk_page_range() for more details) */ struct mm_walk { int (*pgd_entry)(pgd_t *pgd, unsigned long addr, @@ -1112,15 +1120,19 @@ struct mm_walk { unsigned long next, struct mm_walk *walk); int (*pte_hole)(unsigned long addr, unsigned long next, struct mm_walk *walk); - int (*hugetlb_entry)(pte_t *pte, unsigned long hmask, - unsigned long addr, unsigned long next, - struct mm_walk *walk); + int (*hugetlb_entry)(pte_t *pte, unsigned long addr, + unsigned long next, struct mm_walk *walk); + int (*test_walk)(unsigned long addr, unsigned long next, + struct mm_walk *walk); struct mm_struct *mm; + struct vm_area_struct *vma; + int skip; void *private; }; int walk_page_range(unsigned long addr, unsigned long end, struct mm_walk *walk); +int walk_page_vma(struct vm_area_struct *vma, struct mm_walk *walk); void free_pgd_range(struct mmu_gather *tlb, unsigned long addr, unsigned long end, unsigned long floor, unsigned long ceiling); int copy_page_range(struct mm_struct *dst, struct mm_struct *src, diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index 45598f1e9aa3..d4acafc51949 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -316,6 +316,34 @@ static inline loff_t page_file_offset(struct page *page) return ((loff_t)page_file_index(page)) << PAGE_CACHE_SHIFT; } +/* + * Get the order of a given page in the context of the pagecache which it + * belongs to. + * + * Pagecache unit size is not a fixed value (hugetlbfs is an example), but the + * vma_interval_tree and anon_vma_interval_tree APIs assume that indices are in + * PAGE_SIZE units. So this function helps us to get normalized indices. + * + * page_size_order() should be called only for pagecache pages/hugepages and + * anonymous pages/hugepages, because pagecache unit size is irrelevant except + * for those pages. + */ +static inline unsigned int page_size_order(struct page *page) +{ + return unlikely(PageHuge(page)) ? + compound_order(compound_head(page)) : + (PAGE_CACHE_SHIFT - PAGE_SHIFT); +} + +/* + * page->index stores pagecache index whose unit is not always PAGE_SIZE. + * This function converts it into PAGE_SIZE offset. + */ +static inline pgoff_t page_pgoff(struct page *page) +{ + return page->index << page_size_order(page); +} + extern pgoff_t linear_hugepage_index(struct vm_area_struct *vma, unsigned long address); diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h index a964f7285600..4b152c81c5fa 100644 --- a/include/linux/scatterlist.h +++ b/include/linux/scatterlist.h @@ -136,7 +136,7 @@ static inline void sg_set_buf(struct scatterlist *sg, const void *buf, static inline void sg_chain(struct scatterlist *prv, unsigned int prv_nents, struct scatterlist *sgl) { -#ifndef ARCH_HAS_SG_CHAIN +#ifndef CONFIG_ARCH_HAS_SG_CHAIN BUG(); #endif diff --git a/include/linux/sched.h b/include/linux/sched.h index 25f54c79f757..148481cbc89f 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -2366,9 +2366,6 @@ extern void flush_itimer_signals(void); extern void do_group_exit(int); -extern int allow_signal(int); -extern int disallow_signal(int); - extern int do_execve(struct filename *, const char __user * const __user *, const char __user * const __user *); diff --git a/include/linux/signal.h b/include/linux/signal.h index 2ac423bdb676..c9e65360c49a 100644 --- a/include/linux/signal.h +++ b/include/linux/signal.h @@ -63,11 +63,6 @@ static inline int sigismember(sigset_t *set, int _sig) return 1 & (set->sig[sig / _NSIG_BPW] >> (sig % _NSIG_BPW)); } -static inline int sigfindinword(unsigned long word) -{ - return ffz(~word); -} - #endif /* __HAVE_ARCH_SIG_BITOPS */ static inline int sigisemptyset(sigset_t *set) @@ -289,6 +284,22 @@ extern int get_signal_to_deliver(siginfo_t *info, struct k_sigaction *return_ka, extern void signal_setup_done(int failed, struct ksignal *ksig, int stepping); extern void signal_delivered(int sig, siginfo_t *info, struct k_sigaction *ka, struct pt_regs *regs, int stepping); extern void exit_signals(struct task_struct *tsk); +extern void kernel_sigaction(int, __sighandler_t); + +static inline void allow_signal(int sig) +{ + /* + * Kernel threads handle their own signals. Let the signal code + * know it'll be handled, so that they don't get converted to + * SIGKILL or just silently dropped. + */ + kernel_sigaction(sig, (__force __sighandler_t)2); +} + +static inline void disallow_signal(int sig) +{ + kernel_sigaction(sig, SIG_IGN); +} /* * Eventually that'll replace get_signal_to_deliver(); macro for now, diff --git a/include/linux/slab.h b/include/linux/slab.h index 307bfbe62387..a6aab2c0dfc5 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -369,16 +369,7 @@ kmem_cache_alloc_node_trace(struct kmem_cache *s, #include <linux/slub_def.h> #endif -static __always_inline void * -kmalloc_order(size_t size, gfp_t flags, unsigned int order) -{ - void *ret; - - flags |= (__GFP_COMP | __GFP_KMEMCG); - ret = (void *) __get_free_pages(flags, order); - kmemleak_alloc(ret, size, 1, flags); - return ret; -} +extern void *kmalloc_order(size_t size, gfp_t flags, unsigned int order); #ifdef CONFIG_TRACING extern void *kmalloc_order_trace(size_t size, gfp_t flags, unsigned int order); diff --git a/include/linux/string.h b/include/linux/string.h index ac889c5ea11b..f29f9a0b7265 100644 --- a/include/linux/string.h +++ b/include/linux/string.h @@ -114,6 +114,7 @@ void *memchr_inv(const void *s, int c, size_t n); extern char *kstrdup(const char *s, gfp_t gfp); extern char *kstrndup(const char *s, size_t len, gfp_t gfp); +extern char *kstrimdup(const char *s, gfp_t gfp); extern void *kmemdup(const void *src, size_t len, gfp_t gfp); extern char **argv_split(gfp_t gfp, const char *str, int *argcp); diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h index fddbe2023a5d..1807bb194816 100644 --- a/include/linux/thread_info.h +++ b/include/linux/thread_info.h @@ -61,8 +61,6 @@ extern long do_no_restart_syscall(struct restart_block *parm); # define THREADINFO_GFP (GFP_KERNEL | __GFP_NOTRACK) #endif -#define THREADINFO_GFP_ACCOUNTED (THREADINFO_GFP | __GFP_KMEMCG) - /* * flag set/clear/test wrappers * - pass TIF_xxxx constants to these functions diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h index 486c3972c0be..ced92345c963 100644 --- a/include/linux/vm_event_item.h +++ b/include/linux/vm_event_item.h @@ -80,6 +80,10 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT, NR_TLB_LOCAL_FLUSH_ALL, NR_TLB_LOCAL_FLUSH_ONE, #endif /* CONFIG_DEBUG_TLBFLUSH */ +#ifdef CONFIG_DEBUG_VM_VMACACHE + VMACACHE_FIND_CALLS, + VMACACHE_FIND_HITS, +#endif NR_VM_EVENT_ITEMS }; diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h index 45c9cd1daf7a..82e7db7f7100 100644 --- a/include/linux/vmstat.h +++ b/include/linux/vmstat.h @@ -95,6 +95,12 @@ static inline void vm_events_fold_cpu(int cpu) #define count_vm_tlb_events(x, y) do { (void)(y); } while (0) #endif +#ifdef CONFIG_DEBUG_VM_VMACACHE +#define count_vm_vmacache_event(x) count_vm_event(x) +#else +#define count_vm_vmacache_event(x) do {} while (0) +#endif + #define __count_zone_vm_events(item, zone, delta) \ __count_vm_events(item##_NORMAL - ZONE_NORMAL + \ zone_idx(zone), delta) diff --git a/include/linux/wait.h b/include/linux/wait.h index e7d9d9ed14f5..bd68819f0815 100644 --- a/include/linux/wait.h +++ b/include/linux/wait.h @@ -191,11 +191,23 @@ wait_queue_head_t *bit_waitqueue(void *, int); (!__builtin_constant_p(state) || \ state == TASK_INTERRUPTIBLE || state == TASK_KILLABLE) \ +/* + * The below macro ___wait_event() has an explicit shadow of the __ret + * variable when used from the wait_event_*() macros. + * + * This is so that both can use the ___wait_cond_timeout() construct + * to wrap the condition. + * + * The type inconsistency of the wait_event_*() __ret variable is also + * on purpose; we use long where we can return timeout values and int + * otherwise. + */ + #define ___wait_event(wq, condition, state, exclusive, ret, cmd) \ ({ \ __label__ __out; \ wait_queue_t __wait; \ - long __ret = ret; \ + long __ret = ret; /* explicit shadow */ \ \ INIT_LIST_HEAD(&__wait.task_list); \ if (exclusive) \ diff --git a/include/scsi/scsi.h b/include/scsi/scsi.h index 0a4edfe8af51..d34cf2df093b 100644 --- a/include/scsi/scsi.h +++ b/include/scsi/scsi.h @@ -31,7 +31,7 @@ enum scsi_timeouts { * Like SCSI_MAX_SG_SEGMENTS, but for archs that have sg chaining. This limit * is totally arbitrary, a setting of 2048 will get you at least 8mb ios. */ -#ifdef ARCH_HAS_SG_CHAIN +#ifdef CONFIG_ARCH_HAS_SG_CHAIN #define SCSI_MAX_SG_CHAIN_SEGMENTS 2048 #else #define SCSI_MAX_SG_CHAIN_SEGMENTS SCSI_MAX_SG_SEGMENTS diff --git a/include/trace/events/gfpflags.h b/include/trace/events/gfpflags.h index 1eddbf1557f2..d6fd8e5b14b7 100644 --- a/include/trace/events/gfpflags.h +++ b/include/trace/events/gfpflags.h @@ -34,7 +34,6 @@ {(unsigned long)__GFP_HARDWALL, "GFP_HARDWALL"}, \ {(unsigned long)__GFP_THISNODE, "GFP_THISNODE"}, \ {(unsigned long)__GFP_RECLAIMABLE, "GFP_RECLAIMABLE"}, \ - {(unsigned long)__GFP_KMEMCG, "GFP_KMEMCG"}, \ {(unsigned long)__GFP_MOVABLE, "GFP_MOVABLE"}, \ {(unsigned long)__GFP_NOTRACK, "GFP_NOTRACK"}, \ {(unsigned long)__GFP_NO_KSWAPD, "GFP_NO_KSWAPD"}, \ diff --git a/init/Kconfig b/init/Kconfig index 03260b759197..0f0f351029eb 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -1717,6 +1717,18 @@ config MMAP_ALLOW_UNINITIALIZED See Documentation/nommu-mmap.txt for more information. +config SYSTEM_TRUSTED_KEYRING + bool "Provide system-wide ring of trusted keys" + depends on KEYS + help + Provide a system keyring to which trusted keys can be added. Keys in + the keyring are considered to be trusted. Keys may be added at will + by the kernel from compiled-in data and from hardware key stores, but + userspace may only add extra keys if those keys can be verified by + keys already in the keyring. + + Keys in this keyring are used by module signature checking. + config PROFILING bool "Profiling support" help @@ -1752,18 +1764,6 @@ config BASE_SMALL default 0 if BASE_FULL default 1 if !BASE_FULL -config SYSTEM_TRUSTED_KEYRING - bool "Provide system-wide ring of trusted keys" - depends on KEYS - help - Provide a system keyring to which trusted keys can be added. Keys in - the keyring are considered to be trusted. Keys may be added at will - by the kernel from compiled-in data and from hardware key stores, but - userspace may only add extra keys if those keys can be verified by - keys already in the keyring. - - Keys in this keyring are used by module signature checking. - menuconfig MODULES bool "Enable loadable module support" option modules diff --git a/init/main.c b/init/main.c index 9c7fd4c9249f..ba8798058748 100644 --- a/init/main.c +++ b/init/main.c @@ -77,6 +77,7 @@ #include <linux/sched_clock.h> #include <linux/context_tracking.h> #include <linux/random.h> +#include <linux/list.h> #include <asm/io.h> #include <asm/bugs.h> @@ -666,19 +667,83 @@ static void __init do_ctors(void) bool initcall_debug; core_param(initcall_debug, initcall_debug, bool, 0644); +#ifdef CONFIG_KALLSYMS +struct blacklist_entry { + struct list_head next; + char *buf; +}; + +static __initdata_or_module LIST_HEAD(blacklisted_initcalls); + +static int __init initcall_blacklist(char *str) +{ + char *str_entry; + struct blacklist_entry *entry; + + /* str argument is a comma-separated list of functions */ + do { + str_entry = strsep(&str, ","); + if (str_entry) { + pr_debug("blacklisting initcall %s\n", str_entry); + entry = alloc_bootmem(sizeof(*entry)); + entry->buf = alloc_bootmem(strlen(str_entry) + 1); + strcpy(entry->buf, str_entry); + list_add(&entry->next, &blacklisted_initcalls); + } + } while (str_entry); + + return 0; +} + +static bool __init_or_module initcall_blacklisted(initcall_t fn) +{ + struct list_head *tmp; + struct blacklist_entry *entry; + char *fn_name; + + fn_name = kasprintf(GFP_KERNEL, "%pf", fn); + if (!fn_name) + return false; + + list_for_each(tmp, &blacklisted_initcalls) { + entry = list_entry(tmp, struct blacklist_entry, next); + if (!strcmp(fn_name, entry->buf)) { + pr_debug("initcall %s blacklisted\n", fn_name); + kfree(fn_name); + return true; + } + } + + kfree(fn_name); + return false; +} +#else +static int __init initcall_blacklist(char *str) +{ + pr_warn("initcall_blacklist requires CONFIG_KALLSYMS\n"); + return 0; +} + +static bool __init_or_module initcall_blacklisted(initcall_t fn) +{ + return false; +} +#endif +__setup("initcall_blacklist=", initcall_blacklist); + static int __init_or_module do_one_initcall_debug(initcall_t fn) { ktime_t calltime, delta, rettime; unsigned long long duration; int ret; - pr_debug("calling %pF @ %i\n", fn, task_pid_nr(current)); + printk(KERN_DEBUG "calling %pF @ %i\n", fn, task_pid_nr(current)); calltime = ktime_get(); ret = fn(); rettime = ktime_get(); delta = ktime_sub(rettime, calltime); duration = (unsigned long long) ktime_to_ns(delta) >> 10; - pr_debug("initcall %pF returned %d after %lld usecs\n", + printk(KERN_DEBUG "initcall %pF returned %d after %lld usecs\n", fn, ret, duration); return ret; @@ -690,6 +755,9 @@ int __init_or_module do_one_initcall(initcall_t fn) int ret; char msgbuf[64]; + if (initcall_blacklisted(fn)) + return -EPERM; + if (initcall_debug) ret = do_one_initcall_debug(fn); else diff --git a/ipc/msg.c b/ipc/msg.c index 649853105a5d..35e4018de53c 100644 --- a/ipc/msg.c +++ b/ipc/msg.c @@ -306,15 +306,14 @@ static inline int msg_security(struct kern_ipc_perm *ipcp, int msgflg) SYSCALL_DEFINE2(msgget, key_t, key, int, msgflg) { struct ipc_namespace *ns; - struct ipc_ops msg_ops; + static const struct ipc_ops msg_ops = { + .getnew = newque, + .associate = msg_security, + }; struct ipc_params msg_params; ns = current->nsproxy->ipc_ns; - msg_ops.getnew = newque; - msg_ops.associate = msg_security; - msg_ops.more_checks = NULL; - msg_params.key = key; msg_params.flg = msgflg; diff --git a/ipc/sem.c b/ipc/sem.c index bee555417312..3fcbc96abee9 100644 --- a/ipc/sem.c +++ b/ipc/sem.c @@ -564,7 +564,11 @@ static inline int sem_more_checks(struct kern_ipc_perm *ipcp, SYSCALL_DEFINE3(semget, key_t, key, int, nsems, int, semflg) { struct ipc_namespace *ns; - struct ipc_ops sem_ops; + static const struct ipc_ops sem_ops = { + .getnew = newary, + .associate = sem_security, + .more_checks = sem_more_checks, + }; struct ipc_params sem_params; ns = current->nsproxy->ipc_ns; @@ -572,10 +576,6 @@ SYSCALL_DEFINE3(semget, key_t, key, int, nsems, int, semflg) if (nsems < 0 || nsems > ns->sc_semmsl) return -EINVAL; - sem_ops.getnew = newary; - sem_ops.associate = sem_security; - sem_ops.more_checks = sem_more_checks; - sem_params.key = key; sem_params.flg = semflg; sem_params.u.nsems = nsems; diff --git a/ipc/shm.c b/ipc/shm.c index 76459616a7fa..b54c93f6d117 100644 --- a/ipc/shm.c +++ b/ipc/shm.c @@ -609,15 +609,15 @@ static inline int shm_more_checks(struct kern_ipc_perm *ipcp, SYSCALL_DEFINE3(shmget, key_t, key, size_t, size, int, shmflg) { struct ipc_namespace *ns; - struct ipc_ops shm_ops; + static const struct ipc_ops shm_ops = { + .getnew = newseg, + .associate = shm_security, + .more_checks = shm_more_checks, + }; struct ipc_params shm_params; ns = current->nsproxy->ipc_ns; - shm_ops.getnew = newseg; - shm_ops.associate = shm_security; - shm_ops.more_checks = shm_more_checks; - shm_params.key = key; shm_params.flg = shmflg; shm_params.u.size = size; diff --git a/ipc/util.c b/ipc/util.c index 2eb0d1eaa312..9b3fa38afe2c 100644 --- a/ipc/util.c +++ b/ipc/util.c @@ -317,7 +317,7 @@ int ipc_addid(struct ipc_ids *ids, struct kern_ipc_perm *new, int size) * when the key is IPC_PRIVATE. */ static int ipcget_new(struct ipc_namespace *ns, struct ipc_ids *ids, - struct ipc_ops *ops, struct ipc_params *params) + const struct ipc_ops *ops, struct ipc_params *params) { int err; @@ -344,7 +344,7 @@ static int ipcget_new(struct ipc_namespace *ns, struct ipc_ids *ids, */ static int ipc_check_perms(struct ipc_namespace *ns, struct kern_ipc_perm *ipcp, - struct ipc_ops *ops, + const struct ipc_ops *ops, struct ipc_params *params) { int err; @@ -375,7 +375,7 @@ static int ipc_check_perms(struct ipc_namespace *ns, * On success, the ipc id is returned. */ static int ipcget_public(struct ipc_namespace *ns, struct ipc_ids *ids, - struct ipc_ops *ops, struct ipc_params *params) + const struct ipc_ops *ops, struct ipc_params *params) { struct kern_ipc_perm *ipcp; int flg = params->flg; @@ -678,7 +678,7 @@ out: * Common routine called by sys_msgget(), sys_semget() and sys_shmget(). */ int ipcget(struct ipc_namespace *ns, struct ipc_ids *ids, - struct ipc_ops *ops, struct ipc_params *params) + const struct ipc_ops *ops, struct ipc_params *params) { if (params->key == IPC_PRIVATE) return ipcget_new(ns, ids, ops, params); diff --git a/ipc/util.h b/ipc/util.h index 9c47d6f6c7b4..e1153ad574b7 100644 --- a/ipc/util.h +++ b/ipc/util.h @@ -201,7 +201,7 @@ static inline bool ipc_valid_object(struct kern_ipc_perm *perm) struct kern_ipc_perm *ipc_obtain_object_check(struct ipc_ids *ids, int id); int ipcget(struct ipc_namespace *ns, struct ipc_ids *ids, - struct ipc_ops *ops, struct ipc_params *params); + const struct ipc_ops *ops, struct ipc_params *params); void free_ipcs(struct ipc_namespace *ns, struct ipc_ids *ids, void (*free)(struct ipc_namespace *, struct kern_ipc_perm *)); #endif diff --git a/kernel/exit.c b/kernel/exit.c index 6ed6a1d552b5..ad7183a8dbbd 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -313,45 +313,6 @@ kill_orphaned_pgrp(struct task_struct *tsk, struct task_struct *parent) } } -/* - * Let kernel threads use this to say that they allow a certain signal. - * Must not be used if kthread was cloned with CLONE_SIGHAND. - */ -int allow_signal(int sig) -{ - if (!valid_signal(sig) || sig < 1) - return -EINVAL; - - spin_lock_irq(¤t->sighand->siglock); - /* This is only needed for daemonize()'ed kthreads */ - sigdelset(¤t->blocked, sig); - /* - * Kernel threads handle their own signals. Let the signal code - * know it'll be handled, so that they don't get converted to - * SIGKILL or just silently dropped. - */ - current->sighand->action[(sig)-1].sa.sa_handler = (void __user *)2; - recalc_sigpending(); - spin_unlock_irq(¤t->sighand->siglock); - return 0; -} - -EXPORT_SYMBOL(allow_signal); - -int disallow_signal(int sig) -{ - if (!valid_signal(sig) || sig < 1) - return -EINVAL; - - spin_lock_irq(¤t->sighand->siglock); - current->sighand->action[(sig)-1].sa.sa_handler = SIG_IGN; - recalc_sigpending(); - spin_unlock_irq(¤t->sighand->siglock); - return 0; -} - -EXPORT_SYMBOL(disallow_signal); - #ifdef CONFIG_MM_OWNER /* * A task is exiting. If it owned this mm, find a new owner for the mm. diff --git a/kernel/fork.c b/kernel/fork.c index 54a8d26f612f..59e3dcc5b8f2 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -150,15 +150,15 @@ void __weak arch_release_thread_info(struct thread_info *ti) static struct thread_info *alloc_thread_info_node(struct task_struct *tsk, int node) { - struct page *page = alloc_pages_node(node, THREADINFO_GFP_ACCOUNTED, - THREAD_SIZE_ORDER); + struct page *page = alloc_kmem_pages_node(node, THREADINFO_GFP, + THREAD_SIZE_ORDER); return page ? page_address(page) : NULL; } static inline void free_thread_info(struct thread_info *ti) { - free_memcg_kmem_pages((unsigned long)ti, THREAD_SIZE_ORDER); + free_kmem_pages((unsigned long)ti, THREAD_SIZE_ORDER); } # else static struct kmem_cache *thread_info_cache; diff --git a/kernel/kmod.c b/kernel/kmod.c index 6b375af4958d..c18c891766b5 100644 --- a/kernel/kmod.c +++ b/kernel/kmod.c @@ -285,10 +285,7 @@ static int wait_for_helper(void *data) pid_t pid; /* If SIGCLD is ignored sys_wait4 won't populate the status. */ - spin_lock_irq(¤t->sighand->siglock); - current->sighand->action[SIGCHLD-1].sa.sa_handler = SIG_DFL; - spin_unlock_irq(¤t->sighand->siglock); - + kernel_sigaction(SIGCHLD, SIG_DFL); pid = kernel_thread(____call_usermodehelper, sub_info, SIGCHLD); if (pid < 0) { sub_info->retval = pid; diff --git a/kernel/signal.c b/kernel/signal.c index 6ea13c09ae56..45dba6165ace 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -705,11 +705,8 @@ void signal_wake_up_state(struct task_struct *t, unsigned int state) * Returns 1 if any signals were found. * * All callers must be holding the siglock. - * - * This version takes a sigset mask and looks at all signals, - * not just those in the first mask word. */ -static int rm_from_queue_full(sigset_t *mask, struct sigpending *s) +static int flush_sigqueue_mask(sigset_t *mask, struct sigpending *s) { struct sigqueue *q, *n; sigset_t m; @@ -727,29 +724,6 @@ static int rm_from_queue_full(sigset_t *mask, struct sigpending *s) } return 1; } -/* - * Remove signals in mask from the pending set and queue. - * Returns 1 if any signals were found. - * - * All callers must be holding the siglock. - */ -static int rm_from_queue(unsigned long mask, struct sigpending *s) -{ - struct sigqueue *q, *n; - - if (!sigtestsetmask(&s->signal, mask)) - return 0; - - sigdelsetmask(&s->signal, mask); - list_for_each_entry_safe(q, n, &s->list, list) { - if (q->info.si_signo < SIGRTMIN && - (mask & sigmask(q->info.si_signo))) { - list_del_init(&q->list); - __sigqueue_free(q); - } - } - return 1; -} static inline int is_si_special(const struct siginfo *info) { @@ -861,6 +835,7 @@ static bool prepare_signal(int sig, struct task_struct *p, bool force) { struct signal_struct *signal = p->signal; struct task_struct *t; + sigset_t flush; if (signal->flags & (SIGNAL_GROUP_EXIT | SIGNAL_GROUP_COREDUMP)) { if (signal->flags & SIGNAL_GROUP_COREDUMP) @@ -872,26 +847,25 @@ static bool prepare_signal(int sig, struct task_struct *p, bool force) /* * This is a stop signal. Remove SIGCONT from all queues. */ - rm_from_queue(sigmask(SIGCONT), &signal->shared_pending); - t = p; - do { - rm_from_queue(sigmask(SIGCONT), &t->pending); - } while_each_thread(p, t); + siginitset(&flush, sigmask(SIGCONT)); + flush_sigqueue_mask(&flush, &signal->shared_pending); + for_each_thread(p, t) + flush_sigqueue_mask(&flush, &t->pending); } else if (sig == SIGCONT) { unsigned int why; /* * Remove all stop signals from all queues, wake all threads. */ - rm_from_queue(SIG_KERNEL_STOP_MASK, &signal->shared_pending); - t = p; - do { + siginitset(&flush, SIG_KERNEL_STOP_MASK); + flush_sigqueue_mask(&flush, &signal->shared_pending); + for_each_thread(p, t) { + flush_sigqueue_mask(&flush, &t->pending); task_clear_jobctl_pending(t, JOBCTL_STOP_PENDING); - rm_from_queue(SIG_KERNEL_STOP_MASK, &t->pending); if (likely(!(t->ptrace & PT_SEIZED))) wake_up_state(t, __TASK_STOPPED); else ptrace_trap_notify(t); - } while_each_thread(p, t); + } /* * Notify the parent with CLD_CONTINUED if we were stopped. @@ -2854,7 +2828,7 @@ int do_sigtimedwait(const sigset_t *which, siginfo_t *info, spin_lock_irq(&tsk->sighand->siglock); __set_task_blocked(tsk, &tsk->real_blocked); - siginitset(&tsk->real_blocked, 0); + sigemptyset(&tsk->real_blocked); sig = dequeue_signal(tsk, &mask, info); } spin_unlock_irq(&tsk->sighand->siglock); @@ -3091,18 +3065,39 @@ COMPAT_SYSCALL_DEFINE4(rt_tgsigqueueinfo, } #endif +/* + * For kthreads only, must not be used if cloned with CLONE_SIGHAND + */ +void kernel_sigaction(int sig, __sighandler_t action) +{ + spin_lock_irq(¤t->sighand->siglock); + current->sighand->action[sig - 1].sa.sa_handler = action; + if (action == SIG_IGN) { + sigset_t mask; + + sigemptyset(&mask); + sigaddset(&mask, sig); + + flush_sigqueue_mask(&mask, ¤t->signal->shared_pending); + flush_sigqueue_mask(&mask, ¤t->pending); + recalc_sigpending(); + } + spin_unlock_irq(¤t->sighand->siglock); +} +EXPORT_SYMBOL(kernel_sigaction); + int do_sigaction(int sig, struct k_sigaction *act, struct k_sigaction *oact) { - struct task_struct *t = current; + struct task_struct *p = current, *t; struct k_sigaction *k; sigset_t mask; if (!valid_signal(sig) || sig < 1 || (act && sig_kernel_only(sig))) return -EINVAL; - k = &t->sighand->action[sig-1]; + k = &p->sighand->action[sig-1]; - spin_lock_irq(¤t->sighand->siglock); + spin_lock_irq(&p->sighand->siglock); if (oact) *oact = *k; @@ -3121,21 +3116,20 @@ int do_sigaction(int sig, struct k_sigaction *act, struct k_sigaction *oact) * (for example, SIGCHLD), shall cause the pending signal to * be discarded, whether or not it is blocked" */ - if (sig_handler_ignored(sig_handler(t, sig), sig)) { + if (sig_handler_ignored(sig_handler(p, sig), sig)) { sigemptyset(&mask); sigaddset(&mask, sig); - rm_from_queue_full(&mask, &t->signal->shared_pending); - do { - rm_from_queue_full(&mask, &t->pending); - } while_each_thread(current, t); + flush_sigqueue_mask(&mask, &p->signal->shared_pending); + for_each_thread(p, t) + flush_sigqueue_mask(&mask, &t->pending); } } - spin_unlock_irq(¤t->sighand->siglock); + spin_unlock_irq(&p->sighand->siglock); return 0; } -static int +static int do_sigaltstack (const stack_t __user *uss, stack_t __user *uoss, unsigned long sp) { stack_t oss; diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c index 4d23dc4d8139..5038b4d3b76d 100644 --- a/kernel/time/sched_clock.c +++ b/kernel/time/sched_clock.c @@ -154,6 +154,10 @@ void __init sched_clock_register(u64 (*read)(void), int bits, raw_write_seqcount_end(&cd.seq); r = rate; + /* + * Use 4MHz instead of 1MHz so that things like 1.832Mhz show as + * 1832Khz + */ if (r >= 4000000) { r /= 1000000; r_unit = 'M'; diff --git a/kernel/watchdog.c b/kernel/watchdog.c index e90089fd78e0..18bc5c2a26df 100644 --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -239,10 +239,12 @@ static void watchdog_overflow_callback(struct perf_event *event, if (__this_cpu_read(hard_watchdog_warn) == true) return; - if (hardlockup_panic) + if (hardlockup_panic) { + trigger_all_cpu_backtrace(); panic("Watchdog detected hard LOCKUP on cpu %d", this_cpu); - else + } else { WARN(1, "Watchdog detected hard LOCKUP on cpu %d", this_cpu); + } __this_cpu_write(hard_watchdog_warn, true); return; @@ -323,8 +325,10 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer) else dump_stack(); - if (softlockup_panic) + if (softlockup_panic) { + trigger_all_cpu_backtrace(); panic("softlockup: hung tasks"); + } __this_cpu_write(soft_watchdog_warn, true); } else __this_cpu_write(soft_watchdog_warn, false); diff --git a/lib/Kconfig b/lib/Kconfig index 4771fb3f4da4..e5ae36dcbf01 100644 --- a/lib/Kconfig +++ b/lib/Kconfig @@ -177,6 +177,13 @@ config CRC8 when they need to do cyclic redundancy check according CRC8 algorithm. Module will be called crc8. +config CRC64_ECMA + tristate "CRC64 ECMA function" + help + This option provides CRC64 ECMA function. Drivers may select this + when they need to do cyclic redundancy check according to the CRC64 + ECMA algorithm. + config AUDIT_GENERIC bool depends on AUDIT && !AUDIT_ARCH @@ -460,4 +467,11 @@ config UCS2_STRING source "lib/fonts/Kconfig" +# +# sg chaining option +# + +config ARCH_HAS_SG_CHAIN + def_bool n + endmenu diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index c653d645feea..cd5c82988d68 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -501,12 +501,21 @@ config DEBUG_VM If unsure, say N. +config DEBUG_VM_VMACACHE + bool "Debug VMA caching" + depends on DEBUG_VM + help + Enable this to turn on VMA caching debug information. Doing so + can cause significant overhead, so only enable it in non-production + environments. + + If unsure, say N. + config DEBUG_VM_RB bool "Debug VM red-black trees" depends on DEBUG_VM help - Enable this to turn on more extended checks in the virtual-memory - system that may impact performance. + Enable VM red-black tree debugging information and extra validations. If unsure, say N. diff --git a/lib/Makefile b/lib/Makefile index 0cd7b68e1382..02da5b614a80 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -69,6 +69,7 @@ obj-$(CONFIG_CRC32) += crc32.o obj-$(CONFIG_CRC7) += crc7.o obj-$(CONFIG_LIBCRC32C) += libcrc32c.o obj-$(CONFIG_CRC8) += crc8.o +obj-$(CONFIG_CRC64_ECMA) += crc64_ecma.o obj-$(CONFIG_GENERIC_ALLOCATOR) += genalloc.o obj-$(CONFIG_ZLIB_INFLATE) += zlib_inflate/ diff --git a/lib/crc64_ecma.c b/lib/crc64_ecma.c new file mode 100644 index 000000000000..41629ea5a60c --- /dev/null +++ b/lib/crc64_ecma.c @@ -0,0 +1,341 @@ +/* + * Copyright 2013 Freescale Semiconductor Inc. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * * Neither the name of Freescale Semiconductor nor the + * names of its contributors may be used to endorse or promote products + * derived from this software without specific prior written permission. + * + * + * ALTERNATIVELY, this software may be distributed under the terms of the + * GNU General Public License ("GPL") as published by the Free Software + * Foundation, either version 2 of that License or (at your option) any + * later version. + * + * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND ANY + * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED + * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE + * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY + * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES + * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; + * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND + * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS + * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#include <linux/module.h> +#include <linux/crc64_ecma.h> + + +#define CRC64_BYTE_MASK 0xFF +#define CRC64_TABLE_SIZE 256 + + +struct crc64_table { + u64 seed; + u64 table[CRC64_TABLE_SIZE]; +}; + + +static struct crc64_table CRC64_ECMA_182 = { + CRC64_DEFAULT_INITVAL, + { + 0x0000000000000000ULL, + 0xb32e4cbe03a75f6fULL, + 0xf4843657a840a05bULL, + 0x47aa7ae9abe7ff34ULL, + 0x7bd0c384ff8f5e33ULL, + 0xc8fe8f3afc28015cULL, + 0x8f54f5d357cffe68ULL, + 0x3c7ab96d5468a107ULL, + 0xf7a18709ff1ebc66ULL, + 0x448fcbb7fcb9e309ULL, + 0x0325b15e575e1c3dULL, + 0xb00bfde054f94352ULL, + 0x8c71448d0091e255ULL, + 0x3f5f08330336bd3aULL, + 0x78f572daa8d1420eULL, + 0xcbdb3e64ab761d61ULL, + 0x7d9ba13851336649ULL, + 0xceb5ed8652943926ULL, + 0x891f976ff973c612ULL, + 0x3a31dbd1fad4997dULL, + 0x064b62bcaebc387aULL, + 0xb5652e02ad1b6715ULL, + 0xf2cf54eb06fc9821ULL, + 0x41e11855055bc74eULL, + 0x8a3a2631ae2dda2fULL, + 0x39146a8fad8a8540ULL, + 0x7ebe1066066d7a74ULL, + 0xcd905cd805ca251bULL, + 0xf1eae5b551a2841cULL, + 0x42c4a90b5205db73ULL, + 0x056ed3e2f9e22447ULL, + 0xb6409f5cfa457b28ULL, + 0xfb374270a266cc92ULL, + 0x48190ecea1c193fdULL, + 0x0fb374270a266cc9ULL, + 0xbc9d3899098133a6ULL, + 0x80e781f45de992a1ULL, + 0x33c9cd4a5e4ecdceULL, + 0x7463b7a3f5a932faULL, + 0xc74dfb1df60e6d95ULL, + 0x0c96c5795d7870f4ULL, + 0xbfb889c75edf2f9bULL, + 0xf812f32ef538d0afULL, + 0x4b3cbf90f69f8fc0ULL, + 0x774606fda2f72ec7ULL, + 0xc4684a43a15071a8ULL, + 0x83c230aa0ab78e9cULL, + 0x30ec7c140910d1f3ULL, + 0x86ace348f355aadbULL, + 0x3582aff6f0f2f5b4ULL, + 0x7228d51f5b150a80ULL, + 0xc10699a158b255efULL, + 0xfd7c20cc0cdaf4e8ULL, + 0x4e526c720f7dab87ULL, + 0x09f8169ba49a54b3ULL, + 0xbad65a25a73d0bdcULL, + 0x710d64410c4b16bdULL, + 0xc22328ff0fec49d2ULL, + 0x85895216a40bb6e6ULL, + 0x36a71ea8a7ace989ULL, + 0x0adda7c5f3c4488eULL, + 0xb9f3eb7bf06317e1ULL, + 0xfe5991925b84e8d5ULL, + 0x4d77dd2c5823b7baULL, + 0x64b62bcaebc387a1ULL, + 0xd7986774e864d8ceULL, + 0x90321d9d438327faULL, + 0x231c512340247895ULL, + 0x1f66e84e144cd992ULL, + 0xac48a4f017eb86fdULL, + 0xebe2de19bc0c79c9ULL, + 0x58cc92a7bfab26a6ULL, + 0x9317acc314dd3bc7ULL, + 0x2039e07d177a64a8ULL, + 0x67939a94bc9d9b9cULL, + 0xd4bdd62abf3ac4f3ULL, + 0xe8c76f47eb5265f4ULL, + 0x5be923f9e8f53a9bULL, + 0x1c4359104312c5afULL, + 0xaf6d15ae40b59ac0ULL, + 0x192d8af2baf0e1e8ULL, + 0xaa03c64cb957be87ULL, + 0xeda9bca512b041b3ULL, + 0x5e87f01b11171edcULL, + 0x62fd4976457fbfdbULL, + 0xd1d305c846d8e0b4ULL, + 0x96797f21ed3f1f80ULL, + 0x2557339fee9840efULL, + 0xee8c0dfb45ee5d8eULL, + 0x5da24145464902e1ULL, + 0x1a083bacedaefdd5ULL, + 0xa9267712ee09a2baULL, + 0x955cce7fba6103bdULL, + 0x267282c1b9c65cd2ULL, + 0x61d8f8281221a3e6ULL, + 0xd2f6b4961186fc89ULL, + 0x9f8169ba49a54b33ULL, + 0x2caf25044a02145cULL, + 0x6b055fede1e5eb68ULL, + 0xd82b1353e242b407ULL, + 0xe451aa3eb62a1500ULL, + 0x577fe680b58d4a6fULL, + 0x10d59c691e6ab55bULL, + 0xa3fbd0d71dcdea34ULL, + 0x6820eeb3b6bbf755ULL, + 0xdb0ea20db51ca83aULL, + 0x9ca4d8e41efb570eULL, + 0x2f8a945a1d5c0861ULL, + 0x13f02d374934a966ULL, + 0xa0de61894a93f609ULL, + 0xe7741b60e174093dULL, + 0x545a57dee2d35652ULL, + 0xe21ac88218962d7aULL, + 0x5134843c1b317215ULL, + 0x169efed5b0d68d21ULL, + 0xa5b0b26bb371d24eULL, + 0x99ca0b06e7197349ULL, + 0x2ae447b8e4be2c26ULL, + 0x6d4e3d514f59d312ULL, + 0xde6071ef4cfe8c7dULL, + 0x15bb4f8be788911cULL, + 0xa6950335e42fce73ULL, + 0xe13f79dc4fc83147ULL, + 0x521135624c6f6e28ULL, + 0x6e6b8c0f1807cf2fULL, + 0xdd45c0b11ba09040ULL, + 0x9aefba58b0476f74ULL, + 0x29c1f6e6b3e0301bULL, + 0xc96c5795d7870f42ULL, + 0x7a421b2bd420502dULL, + 0x3de861c27fc7af19ULL, + 0x8ec62d7c7c60f076ULL, + 0xb2bc941128085171ULL, + 0x0192d8af2baf0e1eULL, + 0x4638a2468048f12aULL, + 0xf516eef883efae45ULL, + 0x3ecdd09c2899b324ULL, + 0x8de39c222b3eec4bULL, + 0xca49e6cb80d9137fULL, + 0x7967aa75837e4c10ULL, + 0x451d1318d716ed17ULL, + 0xf6335fa6d4b1b278ULL, + 0xb199254f7f564d4cULL, + 0x02b769f17cf11223ULL, + 0xb4f7f6ad86b4690bULL, + 0x07d9ba1385133664ULL, + 0x4073c0fa2ef4c950ULL, + 0xf35d8c442d53963fULL, + 0xcf273529793b3738ULL, + 0x7c0979977a9c6857ULL, + 0x3ba3037ed17b9763ULL, + 0x888d4fc0d2dcc80cULL, + 0x435671a479aad56dULL, + 0xf0783d1a7a0d8a02ULL, + 0xb7d247f3d1ea7536ULL, + 0x04fc0b4dd24d2a59ULL, + 0x3886b22086258b5eULL, + 0x8ba8fe9e8582d431ULL, + 0xcc0284772e652b05ULL, + 0x7f2cc8c92dc2746aULL, + 0x325b15e575e1c3d0ULL, + 0x8175595b76469cbfULL, + 0xc6df23b2dda1638bULL, + 0x75f16f0cde063ce4ULL, + 0x498bd6618a6e9de3ULL, + 0xfaa59adf89c9c28cULL, + 0xbd0fe036222e3db8ULL, + 0x0e21ac88218962d7ULL, + 0xc5fa92ec8aff7fb6ULL, + 0x76d4de52895820d9ULL, + 0x317ea4bb22bfdfedULL, + 0x8250e80521188082ULL, + 0xbe2a516875702185ULL, + 0x0d041dd676d77eeaULL, + 0x4aae673fdd3081deULL, + 0xf9802b81de97deb1ULL, + 0x4fc0b4dd24d2a599ULL, + 0xfceef8632775faf6ULL, + 0xbb44828a8c9205c2ULL, + 0x086ace348f355aadULL, + 0x34107759db5dfbaaULL, + 0x873e3be7d8faa4c5ULL, + 0xc094410e731d5bf1ULL, + 0x73ba0db070ba049eULL, + 0xb86133d4dbcc19ffULL, + 0x0b4f7f6ad86b4690ULL, + 0x4ce50583738cb9a4ULL, + 0xffcb493d702be6cbULL, + 0xc3b1f050244347ccULL, + 0x709fbcee27e418a3ULL, + 0x3735c6078c03e797ULL, + 0x841b8ab98fa4b8f8ULL, + 0xadda7c5f3c4488e3ULL, + 0x1ef430e13fe3d78cULL, + 0x595e4a08940428b8ULL, + 0xea7006b697a377d7ULL, + 0xd60abfdbc3cbd6d0ULL, + 0x6524f365c06c89bfULL, + 0x228e898c6b8b768bULL, + 0x91a0c532682c29e4ULL, + 0x5a7bfb56c35a3485ULL, + 0xe955b7e8c0fd6beaULL, + 0xaeffcd016b1a94deULL, + 0x1dd181bf68bdcbb1ULL, + 0x21ab38d23cd56ab6ULL, + 0x9285746c3f7235d9ULL, + 0xd52f0e859495caedULL, + 0x6601423b97329582ULL, + 0xd041dd676d77eeaaULL, + 0x636f91d96ed0b1c5ULL, + 0x24c5eb30c5374ef1ULL, + 0x97eba78ec690119eULL, + 0xab911ee392f8b099ULL, + 0x18bf525d915feff6ULL, + 0x5f1528b43ab810c2ULL, + 0xec3b640a391f4fadULL, + 0x27e05a6e926952ccULL, + 0x94ce16d091ce0da3ULL, + 0xd3646c393a29f297ULL, + 0x604a2087398eadf8ULL, + 0x5c3099ea6de60cffULL, + 0xef1ed5546e415390ULL, + 0xa8b4afbdc5a6aca4ULL, + 0x1b9ae303c601f3cbULL, + 0x56ed3e2f9e224471ULL, + 0xe5c372919d851b1eULL, + 0xa26908783662e42aULL, + 0x114744c635c5bb45ULL, + 0x2d3dfdab61ad1a42ULL, + 0x9e13b115620a452dULL, + 0xd9b9cbfcc9edba19ULL, + 0x6a978742ca4ae576ULL, + 0xa14cb926613cf817ULL, + 0x1262f598629ba778ULL, + 0x55c88f71c97c584cULL, + 0xe6e6c3cfcadb0723ULL, + 0xda9c7aa29eb3a624ULL, + 0x69b2361c9d14f94bULL, + 0x2e184cf536f3067fULL, + 0x9d36004b35545910ULL, + 0x2b769f17cf112238ULL, + 0x9858d3a9ccb67d57ULL, + 0xdff2a94067518263ULL, + 0x6cdce5fe64f6dd0cULL, + 0x50a65c93309e7c0bULL, + 0xe388102d33392364ULL, + 0xa4226ac498dedc50ULL, + 0x170c267a9b79833fULL, + 0xdcd7181e300f9e5eULL, + 0x6ff954a033a8c131ULL, + 0x28532e49984f3e05ULL, + 0x9b7d62f79be8616aULL, + 0xa707db9acf80c06dULL, + 0x14299724cc279f02ULL, + 0x5383edcd67c06036ULL, + 0xe0ada17364673f59ULL + } +}; + + +/* + * crc64_ecma_seed - Initializes the CRC64 ECMA seed. + */ +u64 crc64_ecma_seed(void) +{ + return CRC64_ECMA_182.seed; +} +EXPORT_SYMBOL(crc64_ecma_seed); + +/* + * crc64_ecma - Computes the 64 bit ECMA CRC. + * + * pdata: pointer to the data to compute checksum for. + * nbytes: number of bytes in data buffer. + * seed: CRC seed. + */ +u64 crc64_ecma(u8 const *pdata, u32 nbytes, u64 seed) +{ + unsigned int i; + u64 crc = seed; + + for (i = 0; i < nbytes; i++) + crc = CRC64_ECMA_182.table[(crc ^ pdata[i]) & CRC64_BYTE_MASK] ^ + (crc >> 8); + + return crc; +} +EXPORT_SYMBOL(crc64_ecma); + +MODULE_DESCRIPTION("CRC64 ECMA function"); +MODULE_AUTHOR("Freescale Semiconductor Inc."); +MODULE_LICENSE("GPL"); diff --git a/lib/scatterlist.c b/lib/scatterlist.c index 3a8e8e8fb2a5..4251cbd5becb 100644 --- a/lib/scatterlist.c +++ b/lib/scatterlist.c @@ -73,7 +73,7 @@ EXPORT_SYMBOL(sg_nents); **/ struct scatterlist *sg_last(struct scatterlist *sgl, unsigned int nents) { -#ifndef ARCH_HAS_SG_CHAIN +#ifndef CONFIG_ARCH_HAS_SG_CHAIN struct scatterlist *ret = &sgl[nents - 1]; #else struct scatterlist *sg, *ret = NULL; @@ -251,7 +251,7 @@ int __sg_alloc_table(struct sg_table *table, unsigned int nents, if (nents == 0) return -EINVAL; -#ifndef ARCH_HAS_SG_CHAIN +#ifndef CONFIG_ARCH_HAS_SG_CHAIN if (WARN_ON_ONCE(nents > max_ents)) return -EINVAL; #endif diff --git a/lib/string.c b/lib/string.c index 9b1f9062a202..89ad0f035f48 100644 --- a/lib/string.c +++ b/lib/string.c @@ -107,7 +107,7 @@ EXPORT_SYMBOL(strcpy); #ifndef __HAVE_ARCH_STRNCPY /** - * strncpy - Copy a length-limited, %NUL-terminated string + * strncpy - Copy a length-limited, C-string * @dest: Where to copy the string to * @src: Where to copy the string from * @count: The maximum number of bytes to copy @@ -136,7 +136,7 @@ EXPORT_SYMBOL(strncpy); #ifndef __HAVE_ARCH_STRLCPY /** - * strlcpy - Copy a %NUL terminated string into a sized buffer + * strlcpy - Copy a C-string into a sized buffer * @dest: Where to copy the string to * @src: Where to copy the string from * @size: size of destination buffer @@ -182,7 +182,7 @@ EXPORT_SYMBOL(strcat); #ifndef __HAVE_ARCH_STRNCAT /** - * strncat - Append a length-limited, %NUL-terminated string to another + * strncat - Append a length-limited, C-string to another * @dest: The string to be appended to * @src: The string to append to it * @count: The maximum numbers of bytes to copy @@ -211,7 +211,7 @@ EXPORT_SYMBOL(strncat); #ifndef __HAVE_ARCH_STRLCAT /** - * strlcat - Append a length-limited, %NUL-terminated string to another + * strlcat - Append a length-limited, C-string to another * @dest: The string to be appended to * @src: The string to append to it * @count: The size of the destination buffer. diff --git a/lib/vsprintf.c b/lib/vsprintf.c index 0648291cdafe..e30d885d9631 100644 --- a/lib/vsprintf.c +++ b/lib/vsprintf.c @@ -1183,6 +1183,21 @@ char *address_val(char *buf, char *end, const void *addr, return number(buf, end, num, spec); } +static noinline_for_stack +char *comm_name(char *buf, char *end, struct task_struct *tsk, + struct printf_spec spec, const char *fmt) +{ + char name[TASK_COMM_LEN]; + + /* Caller can pass NULL instead of current. */ + if (!tsk) + tsk = current; + /* Not using get_task_comm() in case I'm in IRQ context. */ + memcpy(name, tsk->comm, TASK_COMM_LEN); + name[sizeof(name) - 1] = '\0'; + return string(buf, end, name, spec); +} + int kptr_restrict __read_mostly; /* @@ -1250,6 +1265,7 @@ int kptr_restrict __read_mostly; * (default assumed to be phys_addr_t, passed by reference) * - 'd[234]' For a dentry name (optionally 2-4 last components) * - 'D[234]' Same as 'd' but for a struct file + * - 'T' task_struct->comm * * Note: The difference between 'S' and 'F' is that on ia64 and ppc64 * function pointers are really function descriptors, which contain a @@ -1261,7 +1277,7 @@ char *pointer(const char *fmt, char *buf, char *end, void *ptr, { int default_width = 2 * sizeof(void *) + (spec.flags & SPECIAL ? 2 : 0); - if (!ptr && *fmt != 'K') { + if (!ptr && *fmt != 'K' && *fmt != 'T') { /* * Print (null) with the same width as a pointer so it makes * tabular output look nice. @@ -1389,6 +1405,8 @@ char *pointer(const char *fmt, char *buf, char *end, void *ptr, return dentry_name(buf, end, ((const struct file *)ptr)->f_path.dentry, spec, fmt); + case 'T': + return comm_name(buf, end, ptr, spec, fmt); } spec.flags |= SMALL; if (spec.field_width == -1) { diff --git a/lib/xz/Kconfig b/lib/xz/Kconfig index 08837db52d94..12d2d777f36b 100644 --- a/lib/xz/Kconfig +++ b/lib/xz/Kconfig @@ -9,33 +9,33 @@ config XZ_DEC if XZ_DEC config XZ_DEC_X86 - bool "x86 BCJ filter decoder" - default y if X86 + bool "x86 BCJ filter decoder" if EXPERT + default y select XZ_DEC_BCJ config XZ_DEC_POWERPC - bool "PowerPC BCJ filter decoder" - default y if PPC + bool "PowerPC BCJ filter decoder" if EXPERT + default y select XZ_DEC_BCJ config XZ_DEC_IA64 - bool "IA-64 BCJ filter decoder" - default y if IA64 + bool "IA-64 BCJ filter decoder" if EXPERT + default y select XZ_DEC_BCJ config XZ_DEC_ARM - bool "ARM BCJ filter decoder" - default y if ARM + bool "ARM BCJ filter decoder" if EXPERT + default y select XZ_DEC_BCJ config XZ_DEC_ARMTHUMB - bool "ARM-Thumb BCJ filter decoder" - default y if (ARM && ARM_THUMB) + bool "ARM-Thumb BCJ filter decoder" if EXPERT + default y select XZ_DEC_BCJ config XZ_DEC_SPARC - bool "SPARC BCJ filter decoder" - default y if SPARC + bool "SPARC BCJ filter decoder" if EXPERT + default y select XZ_DEC_BCJ endif diff --git a/mm/compaction.c b/mm/compaction.c index 37f976287068..9635083cd8ec 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -293,14 +293,14 @@ static unsigned long isolate_freepages_block(struct compact_control *cc, /* Found a free page, break it into order-0 pages */ isolated = split_free_page(page); - total_isolated += isolated; - for (i = 0; i < isolated; i++) { - list_add(&page->lru, freelist); - page++; - } - - /* If a page was split, advance to the end of it */ if (isolated) { + total_isolated += isolated; + for (i = 0; i < isolated; i++) { + list_add(&page->lru, freelist); + page++; + } + + /* If a page was split, advance to the end of it */ blockpfn += isolated - 1; cursor += isolated - 1; continue; @@ -309,9 +309,6 @@ static unsigned long isolate_freepages_block(struct compact_control *cc, isolate_fail: if (strict) break; - else - continue; - } trace_mm_compaction_isolate_freepages(nr_scanned, total_isolated); diff --git a/mm/fremap.c b/mm/fremap.c index 34feba60a17e..2c5646f11f41 100644 --- a/mm/fremap.c +++ b/mm/fremap.c @@ -82,13 +82,10 @@ static int install_file_pte(struct mm_struct *mm, struct vm_area_struct *vma, ptfile = pgoff_to_pte(pgoff); - if (!pte_none(*pte)) { - if (pte_present(*pte) && pte_soft_dirty(*pte)) - pte_file_mksoft_dirty(ptfile); + if (!pte_none(*pte)) zap_pte(mm, vma, addr, pte); - } - set_pte_at(mm, addr, pte, ptfile); + set_pte_at(mm, addr, pte, pte_file_mksoft_dirty(ptfile)); /* * We don't need to run update_mmu_cache() here because the "file pte" * being installed by install_file_pte() is not a real pte - it's a diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 64635f5278ff..7577c40f2ad7 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1800,7 +1800,7 @@ static void __split_huge_page(struct page *page, struct list_head *list) { int mapcount, mapcount2; - pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT); + pgoff_t pgoff = page_pgoff(page); struct anon_vma_chain *avc; BUG_ON(!PageHead(page)); diff --git a/mm/hugetlb.c b/mm/hugetlb.c index dd30f22b35e0..86c7c5bbc2f7 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -31,6 +31,7 @@ #include <linux/io.h> #include <linux/hugetlb.h> +#include <linux/hugetlb_inline.h> #include <linux/hugetlb_cgroup.h> #include <linux/node.h> #include "internal.h" @@ -1172,6 +1173,7 @@ static void return_unused_surplus_pages(struct hstate *h, while (nr_pages--) { if (!free_pool_huge_page(h, &node_states[N_MEMORY], 1)) break; + cond_resched_lock(&hugetlb_lock); } } diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 29501f040568..e59f5729e5e6 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2944,7 +2944,7 @@ static int mem_cgroup_slabinfo_read(struct seq_file *m, void *v) } #endif -static int memcg_charge_kmem(struct mem_cgroup *memcg, gfp_t gfp, u64 size) +int memcg_charge_kmem(struct mem_cgroup *memcg, gfp_t gfp, u64 size) { struct res_counter *fail_res; int ret = 0; @@ -2982,7 +2982,7 @@ static int memcg_charge_kmem(struct mem_cgroup *memcg, gfp_t gfp, u64 size) return ret; } -static void memcg_uncharge_kmem(struct mem_cgroup *memcg, u64 size) +void memcg_uncharge_kmem(struct mem_cgroup *memcg, u64 size) { res_counter_uncharge(&memcg->res, size); if (do_swap_account) @@ -3531,11 +3531,12 @@ __memcg_kmem_newpage_charge(gfp_t gfp, struct mem_cgroup **_memcg, int order) /* * Disabling accounting is only relevant for some specific memcg * internal allocations. Therefore we would initially not have such - * check here, since direct calls to the page allocator that are marked - * with GFP_KMEMCG only happen outside memcg core. We are mostly - * concerned with cache allocations, and by having this test at - * memcg_kmem_get_cache, we are already able to relay the allocation to - * the root cache and bypass the memcg cache altogether. + * check here, since direct calls to the page allocator that are + * accounted to kmemcg (alloc_kmem_pages and friends) only happen + * outside memcg core. We are mostly concerned with cache allocations, + * and by having this test at memcg_kmem_get_cache, we are already able + * to relay the allocation to the root cache and bypass the memcg cache + * altogether. * * There is one exception, though: the SLUB allocator does not create * large order caches, but rather service large kmallocs directly from @@ -6777,30 +6778,29 @@ static inline enum mc_target_type get_mctgt_type_thp(struct vm_area_struct *vma, } #endif -static int mem_cgroup_count_precharge_pte_range(pmd_t *pmd, +static int mem_cgroup_count_precharge_pte(pte_t *pte, unsigned long addr, unsigned long end, struct mm_walk *walk) { - struct vm_area_struct *vma = walk->private; - pte_t *pte; + if (get_mctgt_type(walk->vma, addr, *pte, NULL)) + mc.precharge++; /* increment precharge temporarily */ + return 0; +} + +static int mem_cgroup_count_precharge_pmd(pmd_t *pmd, + unsigned long addr, unsigned long end, + struct mm_walk *walk) +{ + struct vm_area_struct *vma = walk->vma; spinlock_t *ptl; if (pmd_trans_huge_lock(pmd, vma, &ptl) == 1) { if (get_mctgt_type_thp(vma, addr, *pmd, NULL) == MC_TARGET_PAGE) mc.precharge += HPAGE_PMD_NR; spin_unlock(ptl); - return 0; + /* don't call mem_cgroup_count_precharge_pte() */ + walk->skip = 1; } - - if (pmd_trans_unstable(pmd)) - return 0; - pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); - for (; addr != end; pte++, addr += PAGE_SIZE) - if (get_mctgt_type(vma, addr, *pte, NULL)) - mc.precharge++; /* increment precharge temporarily */ - pte_unmap_unlock(pte - 1, ptl); - cond_resched(); - return 0; } @@ -6809,18 +6809,14 @@ static unsigned long mem_cgroup_count_precharge(struct mm_struct *mm) unsigned long precharge; struct vm_area_struct *vma; + struct mm_walk mem_cgroup_count_precharge_walk = { + .pmd_entry = mem_cgroup_count_precharge_pmd, + .pte_entry = mem_cgroup_count_precharge_pte, + .mm = mm, + }; down_read(&mm->mmap_sem); - for (vma = mm->mmap; vma; vma = vma->vm_next) { - struct mm_walk mem_cgroup_count_precharge_walk = { - .pmd_entry = mem_cgroup_count_precharge_pte_range, - .mm = mm, - .private = vma, - }; - if (is_vm_hugetlb_page(vma)) - continue; - walk_page_range(vma->vm_start, vma->vm_end, - &mem_cgroup_count_precharge_walk); - } + for (vma = mm->mmap; vma; vma = vma->vm_next) + walk_page_vma(vma, &mem_cgroup_count_precharge_walk); up_read(&mm->mmap_sem); precharge = mc.precharge; @@ -6959,7 +6955,7 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd, struct mm_walk *walk) { int ret = 0; - struct vm_area_struct *vma = walk->private; + struct vm_area_struct *vma = walk->vma; pte_t *pte; spinlock_t *ptl; enum mc_target_type target_type; @@ -7060,6 +7056,10 @@ put: /* get_mctgt_type() gets the page */ static void mem_cgroup_move_charge(struct mm_struct *mm) { struct vm_area_struct *vma; + struct mm_walk mem_cgroup_move_charge_walk = { + .pmd_entry = mem_cgroup_move_charge_pte_range, + .mm = mm, + }; lru_add_drain_all(); retry: @@ -7075,24 +7075,8 @@ retry: cond_resched(); goto retry; } - for (vma = mm->mmap; vma; vma = vma->vm_next) { - int ret; - struct mm_walk mem_cgroup_move_charge_walk = { - .pmd_entry = mem_cgroup_move_charge_pte_range, - .mm = mm, - .private = vma, - }; - if (is_vm_hugetlb_page(vma)) - continue; - ret = walk_page_range(vma->vm_start, vma->vm_end, - &mem_cgroup_move_charge_walk); - if (ret) - /* - * means we have consumed all precharges and failed in - * doing additional charge. Just abandon here. - */ - break; - } + for (vma = mm->mmap; vma; vma = vma->vm_next) + walk_page_vma(vma, &mem_cgroup_move_charge_walk); up_read(&mm->mmap_sem); } diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 35ef28acf137..12ac5df4d49a 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -202,7 +202,7 @@ static int kill_proc(struct task_struct *t, unsigned long addr, int trapno, #ifdef __ARCH_SI_TRAPNO si.si_trapno = trapno; #endif - si.si_addr_lsb = compound_order(compound_head(page)) + PAGE_SHIFT; + si.si_addr_lsb = page_size_order(page) + PAGE_SHIFT; if ((flags & MF_ACTION_REQUIRED) && t == current) { si.si_code = BUS_MCEERR_AR; @@ -404,7 +404,7 @@ static void collect_procs_anon(struct page *page, struct list_head *to_kill, if (av == NULL) /* Not actually mapped anymore */ return; - pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT); + pgoff = page_pgoff(page); read_lock(&tasklist_lock); for_each_process (tsk) { struct anon_vma_chain *vmac; @@ -437,7 +437,7 @@ static void collect_procs_file(struct page *page, struct list_head *to_kill, mutex_lock(&mapping->i_mmap_mutex); read_lock(&tasklist_lock); for_each_process(tsk) { - pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT); + pgoff_t pgoff = page_pgoff(page); if (!task_early_kill(tsk)) continue; diff --git a/mm/memory.c b/mm/memory.c index d0f0bef3be48..9f562946f5e7 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3578,6 +3578,8 @@ static int do_shared_fault(struct mm_struct *mm, struct vm_area_struct *vma, int dirtied = 0; int ret, tmp; + WARN_ON_ONCE(!rwsem_is_locked(&mm->mmap_sem)); + ret = __do_fault(vma, address, pgoff, flags, &fault_page); if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY))) return ret; @@ -3608,6 +3610,12 @@ static int do_shared_fault(struct mm_struct *mm, struct vm_area_struct *vma, if (set_page_dirty(fault_page)) dirtied = 1; + /* + * Take a local copy of the address_space - page.mapping may be zeroed + * by truncate after unlock_page(). The address_space itself remains + * pinned by vma->vm_file's reference. We rely on unlock_page()'s + * release semantics to prevent the compiler from undoing this copying. + */ mapping = fault_page->mapping; unlock_page(fault_page); if ((dirtied || vma->vm_ops->page_mkwrite) && mapping) { diff --git a/mm/mempolicy.c b/mm/mempolicy.c index 78e1472933ea..9d2ef4111a4c 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -476,140 +476,70 @@ static const struct mempolicy_operations mpol_ops[MPOL_MAX] = { static void migrate_page_add(struct page *page, struct list_head *pagelist, unsigned long flags); +struct queue_pages { + struct list_head *pagelist; + unsigned long flags; + nodemask_t *nmask; + struct vm_area_struct *prev; +}; + /* * Scan through pages checking if pages follow certain conditions, * and move them to the pagelist if they do. */ -static int queue_pages_pte_range(struct vm_area_struct *vma, pmd_t *pmd, - unsigned long addr, unsigned long end, - const nodemask_t *nodes, unsigned long flags, - void *private) +static int queue_pages_pte(pte_t *pte, unsigned long addr, + unsigned long next, struct mm_walk *walk) { - pte_t *orig_pte; - pte_t *pte; - spinlock_t *ptl; - - orig_pte = pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); - do { - struct page *page; - int nid; + struct vm_area_struct *vma = walk->vma; + struct page *page; + struct queue_pages *qp = walk->private; + unsigned long flags = qp->flags; + int nid; - if (!pte_present(*pte)) - continue; - page = vm_normal_page(vma, addr, *pte); - if (!page) - continue; - /* - * vm_normal_page() filters out zero pages, but there might - * still be PageReserved pages to skip, perhaps in a VDSO. - */ - if (PageReserved(page)) - continue; - nid = page_to_nid(page); - if (node_isset(nid, *nodes) == !!(flags & MPOL_MF_INVERT)) - continue; + if (!pte_present(*pte)) + return 0; + page = vm_normal_page(vma, addr, *pte); + if (!page) + return 0; + /* + * vm_normal_page() filters out zero pages, but there might + * still be PageReserved pages to skip, perhaps in a VDSO. + */ + if (PageReserved(page)) + return 0; + nid = page_to_nid(page); + if (node_isset(nid, *qp->nmask) == !!(flags & MPOL_MF_INVERT)) + return 0; - if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) - migrate_page_add(page, private, flags); - else - break; - } while (pte++, addr += PAGE_SIZE, addr != end); - pte_unmap_unlock(orig_pte, ptl); - return addr != end; + if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) + migrate_page_add(page, qp->pagelist, flags); + return 0; } -static void queue_pages_hugetlb_pmd_range(struct vm_area_struct *vma, - pmd_t *pmd, const nodemask_t *nodes, unsigned long flags, - void *private) +static int queue_pages_hugetlb(pte_t *pte, unsigned long addr, + unsigned long next, struct mm_walk *walk) { #ifdef CONFIG_HUGETLB_PAGE + struct queue_pages *qp = walk->private; + unsigned long flags = qp->flags; int nid; struct page *page; - spinlock_t *ptl; + pte_t entry; - ptl = huge_pte_lock(hstate_vma(vma), vma->vm_mm, (pte_t *)pmd); - page = pte_page(huge_ptep_get((pte_t *)pmd)); + entry = huge_ptep_get(pte); + if (!pte_present(entry)) + return 0; + page = pte_page(entry); nid = page_to_nid(page); - if (node_isset(nid, *nodes) == !!(flags & MPOL_MF_INVERT)) - goto unlock; + if (node_isset(nid, *qp->nmask) == !!(flags & MPOL_MF_INVERT)) + return 0; /* With MPOL_MF_MOVE, we migrate only unshared hugepage. */ if (flags & (MPOL_MF_MOVE_ALL) || (flags & MPOL_MF_MOVE && page_mapcount(page) == 1)) - isolate_huge_page(page, private); -unlock: - spin_unlock(ptl); + isolate_huge_page(page, qp->pagelist); #else BUG(); #endif -} - -static inline int queue_pages_pmd_range(struct vm_area_struct *vma, pud_t *pud, - unsigned long addr, unsigned long end, - const nodemask_t *nodes, unsigned long flags, - void *private) -{ - pmd_t *pmd; - unsigned long next; - - pmd = pmd_offset(pud, addr); - do { - next = pmd_addr_end(addr, end); - if (!pmd_present(*pmd)) - continue; - if (pmd_huge(*pmd) && is_vm_hugetlb_page(vma)) { - queue_pages_hugetlb_pmd_range(vma, pmd, nodes, - flags, private); - continue; - } - split_huge_page_pmd(vma, addr, pmd); - if (pmd_none_or_trans_huge_or_clear_bad(pmd)) - continue; - if (queue_pages_pte_range(vma, pmd, addr, next, nodes, - flags, private)) - return -EIO; - } while (pmd++, addr = next, addr != end); - return 0; -} - -static inline int queue_pages_pud_range(struct vm_area_struct *vma, pgd_t *pgd, - unsigned long addr, unsigned long end, - const nodemask_t *nodes, unsigned long flags, - void *private) -{ - pud_t *pud; - unsigned long next; - - pud = pud_offset(pgd, addr); - do { - next = pud_addr_end(addr, end); - if (pud_huge(*pud) && is_vm_hugetlb_page(vma)) - continue; - if (pud_none_or_clear_bad(pud)) - continue; - if (queue_pages_pmd_range(vma, pud, addr, next, nodes, - flags, private)) - return -EIO; - } while (pud++, addr = next, addr != end); - return 0; -} - -static inline int queue_pages_pgd_range(struct vm_area_struct *vma, - unsigned long addr, unsigned long end, - const nodemask_t *nodes, unsigned long flags, - void *private) -{ - pgd_t *pgd; - unsigned long next; - - pgd = pgd_offset(vma->vm_mm, addr); - do { - next = pgd_addr_end(addr, end); - if (pgd_none_or_clear_bad(pgd)) - continue; - if (queue_pages_pud_range(vma, pgd, addr, next, nodes, - flags, private)) - return -EIO; - } while (pgd++, addr = next, addr != end); return 0; } @@ -642,6 +572,45 @@ static unsigned long change_prot_numa(struct vm_area_struct *vma, } #endif /* CONFIG_NUMA_BALANCING */ +static int queue_pages_test_walk(unsigned long start, unsigned long end, + struct mm_walk *walk) +{ + struct vm_area_struct *vma = walk->vma; + struct queue_pages *qp = walk->private; + unsigned long endvma = vma->vm_end; + unsigned long flags = qp->flags; + + if (endvma > end) + endvma = end; + if (vma->vm_start > start) + start = vma->vm_start; + + if (!(flags & MPOL_MF_DISCONTIG_OK)) { + if (!vma->vm_next && vma->vm_end < end) + return -EFAULT; + if (qp->prev && qp->prev->vm_end < vma->vm_start) + return -EFAULT; + } + + qp->prev = vma; + walk->skip = 1; + + if (vma->vm_flags & VM_PFNMAP) + return 0; + + if (flags & MPOL_MF_LAZY) { + change_prot_numa(vma, start, endvma); + return 0; + } + + if ((flags & MPOL_MF_STRICT) || + ((flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) && + vma_migratable(vma))) + /* queue pages from current vma */ + walk->skip = 0; + return 0; +} + /* * Walk through page tables and collect pages to be migrated. * @@ -651,51 +620,29 @@ static unsigned long change_prot_numa(struct vm_area_struct *vma, */ static struct vm_area_struct * queue_pages_range(struct mm_struct *mm, unsigned long start, unsigned long end, - const nodemask_t *nodes, unsigned long flags, void *private) + nodemask_t *nodes, unsigned long flags, + struct list_head *pagelist) { int err; - struct vm_area_struct *first, *vma, *prev; - - - first = find_vma(mm, start); - if (!first) - return ERR_PTR(-EFAULT); - prev = NULL; - for (vma = first; vma && vma->vm_start < end; vma = vma->vm_next) { - unsigned long endvma = vma->vm_end; - - if (endvma > end) - endvma = end; - if (vma->vm_start > start) - start = vma->vm_start; - - if (!(flags & MPOL_MF_DISCONTIG_OK)) { - if (!vma->vm_next && vma->vm_end < end) - return ERR_PTR(-EFAULT); - if (prev && prev->vm_end < vma->vm_start) - return ERR_PTR(-EFAULT); - } - - if (flags & MPOL_MF_LAZY) { - change_prot_numa(vma, start, endvma); - goto next; - } - - if ((flags & MPOL_MF_STRICT) || - ((flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) && - vma_migratable(vma))) { - - err = queue_pages_pgd_range(vma, start, endvma, nodes, - flags, private); - if (err) { - first = ERR_PTR(err); - break; - } - } -next: - prev = vma; - } - return first; + struct queue_pages qp = { + .pagelist = pagelist, + .flags = flags, + .nmask = nodes, + .prev = NULL, + }; + struct mm_walk queue_pages_walk = { + .hugetlb_entry = queue_pages_hugetlb, + .pte_entry = queue_pages_pte, + .test_walk = queue_pages_test_walk, + .mm = mm, + .private = &qp, + }; + + err = walk_page_range(start, end, &queue_pages_walk); + if (err < 0) + return ERR_PTR(err); + else + return find_vma(mm, start); } /* diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 5dba2933c9c0..7cfdcd808f52 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2697,7 +2697,6 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int migratetype = allocflags_to_migratetype(gfp_mask); unsigned int cpuset_mems_cookie; int alloc_flags = ALLOC_WMARK_LOW|ALLOC_CPUSET|ALLOC_FAIR; - struct mem_cgroup *memcg = NULL; gfp_mask &= gfp_allowed_mask; @@ -2716,13 +2715,6 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, if (unlikely(!zonelist->_zonerefs->zone)) return NULL; - /* - * Will only have any effect when __GFP_KMEMCG is set. This is - * verified in the (always inline) callee - */ - if (!memcg_kmem_newpage_charge(gfp_mask, &memcg, order)) - return NULL; - retry_cpuset: cpuset_mems_cookie = read_mems_allowed_begin(); @@ -2782,8 +2774,6 @@ out: if (unlikely(!page && read_mems_allowed_retry(cpuset_mems_cookie))) goto retry_cpuset; - memcg_kmem_commit_charge(page, memcg, order); - return page; } EXPORT_SYMBOL(__alloc_pages_nodemask); @@ -2837,27 +2827,51 @@ void free_pages(unsigned long addr, unsigned int order) EXPORT_SYMBOL(free_pages); /* - * __free_memcg_kmem_pages and free_memcg_kmem_pages will free - * pages allocated with __GFP_KMEMCG. + * alloc_kmem_pages charges newly allocated pages to the kmem resource counter + * of the current memory cgroup. * - * Those pages are accounted to a particular memcg, embedded in the - * corresponding page_cgroup. To avoid adding a hit in the allocator to search - * for that information only to find out that it is NULL for users who have no - * interest in that whatsoever, we provide these functions. - * - * The caller knows better which flags it relies on. + * It should be used when the caller would like to use kmalloc, but since the + * allocation is large, it has to fall back to the page allocator. + */ +struct page *alloc_kmem_pages(gfp_t gfp_mask, unsigned int order) +{ + struct page *page; + struct mem_cgroup *memcg = NULL; + + if (!memcg_kmem_newpage_charge(gfp_mask, &memcg, order)) + return NULL; + page = alloc_pages(gfp_mask, order); + memcg_kmem_commit_charge(page, memcg, order); + return page; +} + +struct page *alloc_kmem_pages_node(int nid, gfp_t gfp_mask, unsigned int order) +{ + struct page *page; + struct mem_cgroup *memcg = NULL; + + if (!memcg_kmem_newpage_charge(gfp_mask, &memcg, order)) + return NULL; + page = alloc_pages_node(nid, gfp_mask, order); + memcg_kmem_commit_charge(page, memcg, order); + return page; +} + +/* + * __free_kmem_pages and free_kmem_pages will free pages allocated with + * alloc_kmem_pages. */ -void __free_memcg_kmem_pages(struct page *page, unsigned int order) +void __free_kmem_pages(struct page *page, unsigned int order) { memcg_kmem_uncharge_pages(page, order); __free_pages(page, order); } -void free_memcg_kmem_pages(unsigned long addr, unsigned int order) +void free_kmem_pages(unsigned long addr, unsigned int order) { if (addr != 0) { VM_BUG_ON(!virt_addr_valid((void *)addr)); - __free_memcg_kmem_pages(virt_to_page((void *)addr), order); + __free_kmem_pages(virt_to_page((void *)addr), order); } } diff --git a/mm/pagewalk.c b/mm/pagewalk.c index 2beeabf502c5..b2a075ffb96e 100644 --- a/mm/pagewalk.c +++ b/mm/pagewalk.c @@ -3,29 +3,58 @@ #include <linux/sched.h> #include <linux/hugetlb.h> -static int walk_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end, - struct mm_walk *walk) +/* + * Check the current skip status of page table walker. + * + * Here what I mean by skip is to skip lower level walking, and that was + * determined for each entry independently. For example, when walk_pmd_range + * handles a pmd_trans_huge we don't have to walk over ptes under that pmd, + * and the skipping does not affect the walking over ptes under other pmds. + * That's why we reset @walk->skip after tested. + */ +static bool skip_lower_level_walking(struct mm_walk *walk) { + if (walk->skip) { + walk->skip = 0; + return true; + } + return false; +} + +static int walk_pte_range(pmd_t *pmd, unsigned long addr, + unsigned long end, struct mm_walk *walk) +{ + struct mm_struct *mm = walk->mm; pte_t *pte; + pte_t *orig_pte; + spinlock_t *ptl; int err = 0; - pte = pte_offset_map(pmd, addr); - for (;;) { + orig_pte = pte = pte_offset_map_lock(mm, pmd, addr, &ptl); + do { + if (pte_none(*pte)) { + if (walk->pte_hole) + err = walk->pte_hole(addr, addr + PAGE_SIZE, + walk); + if (err) + break; + continue; + } + /* + * Callers should have their own way to handle swap entries + * in walk->pte_entry(). + */ err = walk->pte_entry(pte, addr, addr + PAGE_SIZE, walk); if (err) break; - addr += PAGE_SIZE; - if (addr == end) - break; - pte++; - } - - pte_unmap(pte); - return err; + } while (pte++, addr += PAGE_SIZE, addr < end); + pte_unmap_unlock(orig_pte, ptl); + cond_resched(); + return addr == end ? 0 : err; } -static int walk_pmd_range(pud_t *pud, unsigned long addr, unsigned long end, - struct mm_walk *walk) +static int walk_pmd_range(pud_t *pud, unsigned long addr, + unsigned long end, struct mm_walk *walk) { pmd_t *pmd; unsigned long next; @@ -35,6 +64,7 @@ static int walk_pmd_range(pud_t *pud, unsigned long addr, unsigned long end, do { again: next = pmd_addr_end(addr, end); + if (pmd_none(*pmd)) { if (walk->pte_hole) err = walk->pte_hole(addr, next, walk); @@ -42,35 +72,32 @@ again: break; continue; } - /* - * This implies that each ->pmd_entry() handler - * needs to know about pmd_trans_huge() pmds - */ - if (walk->pmd_entry) - err = walk->pmd_entry(pmd, addr, next, walk); - if (err) - break; - /* - * Check this here so we only break down trans_huge - * pages when we _need_ to - */ - if (!walk->pte_entry) - continue; + if (walk->pmd_entry) { + err = walk->pmd_entry(pmd, addr, next, walk); + if (skip_lower_level_walking(walk)) + continue; + if (err) + break; + } - split_huge_page_pmd_mm(walk->mm, addr, pmd); - if (pmd_none_or_trans_huge_or_clear_bad(pmd)) - goto again; - err = walk_pte_range(pmd, addr, next, walk); - if (err) - break; - } while (pmd++, addr = next, addr != end); + if (walk->pte_entry) { + if (walk->vma) { + split_huge_page_pmd(walk->vma, addr, pmd); + if (pmd_trans_unstable(pmd)) + goto again; + } + err = walk_pte_range(pmd, addr, next, walk); + if (err) + break; + } + } while (pmd++, addr = next, addr < end); return err; } -static int walk_pud_range(pgd_t *pgd, unsigned long addr, unsigned long end, - struct mm_walk *walk) +static int walk_pud_range(pgd_t *pgd, unsigned long addr, + unsigned long end, struct mm_walk *walk) { pud_t *pud; unsigned long next; @@ -79,6 +106,7 @@ static int walk_pud_range(pgd_t *pgd, unsigned long addr, unsigned long end, pud = pud_offset(pgd, addr); do { next = pud_addr_end(addr, end); + if (pud_none_or_clear_bad(pud)) { if (walk->pte_hole) err = walk->pte_hole(addr, next, walk); @@ -86,13 +114,58 @@ static int walk_pud_range(pgd_t *pgd, unsigned long addr, unsigned long end, break; continue; } - if (walk->pud_entry) + + if (walk->pud_entry) { err = walk->pud_entry(pud, addr, next, walk); - if (!err && (walk->pmd_entry || walk->pte_entry)) + if (skip_lower_level_walking(walk)) + continue; + if (err) + break; + } + + if (walk->pmd_entry || walk->pte_entry) { err = walk_pmd_range(pud, addr, next, walk); - if (err) - break; - } while (pud++, addr = next, addr != end); + if (err) + break; + } + } while (pud++, addr = next, addr < end); + + return err; +} + +static int walk_pgd_range(unsigned long addr, unsigned long end, + struct mm_walk *walk) +{ + pgd_t *pgd; + unsigned long next; + int err = 0; + + pgd = pgd_offset(walk->mm, addr); + do { + next = pgd_addr_end(addr, end); + + if (pgd_none_or_clear_bad(pgd)) { + if (walk->pte_hole) + err = walk->pte_hole(addr, next, walk); + if (err) + break; + continue; + } + + if (walk->pgd_entry) { + err = walk->pgd_entry(pgd, addr, next, walk); + if (skip_lower_level_walking(walk)) + continue; + if (err) + break; + } + + if (walk->pud_entry || walk->pmd_entry || walk->pte_entry) { + err = walk_pud_range(pgd, addr, next, walk); + if (err) + break; + } + } while (pgd++, addr = next, addr < end); return err; } @@ -105,144 +178,180 @@ static unsigned long hugetlb_entry_end(struct hstate *h, unsigned long addr, return boundary < end ? boundary : end; } -static int walk_hugetlb_range(struct vm_area_struct *vma, - unsigned long addr, unsigned long end, - struct mm_walk *walk) +static int walk_hugetlb_range(unsigned long addr, unsigned long end, + struct mm_walk *walk) { + struct mm_struct *mm = walk->mm; + struct vm_area_struct *vma = walk->vma; struct hstate *h = hstate_vma(vma); unsigned long next; unsigned long hmask = huge_page_mask(h); pte_t *pte; int err = 0; + spinlock_t *ptl; do { next = hugetlb_entry_end(h, addr, end); pte = huge_pte_offset(walk->mm, addr & hmask); - if (pte && walk->hugetlb_entry) - err = walk->hugetlb_entry(pte, hmask, addr, next, walk); + if (!pte) + continue; + ptl = huge_pte_lock(h, mm, pte); + /* + * Callers should have their own way to handle swap entries + * in walk->hugetlb_entry(). + */ + if (walk->hugetlb_entry) + err = walk->hugetlb_entry(pte, addr, next, walk); + spin_unlock(ptl); if (err) - return err; + break; } while (addr = next, addr != end); - - return 0; + cond_resched(); + return err; } #else /* CONFIG_HUGETLB_PAGE */ -static int walk_hugetlb_range(struct vm_area_struct *vma, - unsigned long addr, unsigned long end, - struct mm_walk *walk) +static inline int walk_hugetlb_range(unsigned long addr, unsigned long end, + struct mm_walk *walk) { return 0; } #endif /* CONFIG_HUGETLB_PAGE */ +/* + * Decide whether we really walk over the current vma on [@start, @end) + * or skip it. When we skip it, we set @walk->skip to 1. + * The return value is used to control the page table walking to + * continue (for zero) or not (for non-zero). + * + * Default check (only VM_PFNMAP check for now) is used when the caller + * doesn't define test_walk() callback. + */ +static int walk_page_test(unsigned long start, unsigned long end, + struct mm_walk *walk) +{ + struct vm_area_struct *vma = walk->vma; + + if (walk->test_walk) + return walk->test_walk(start, end, walk); + /* + * Do not walk over vma(VM_PFNMAP), because we have no valid struct + * page backing a VM_PFNMAP range. See also commit a9ff785e4437. + */ + if (vma->vm_flags & VM_PFNMAP) + walk->skip = 1; + return 0; +} + +static int __walk_page_range(unsigned long start, unsigned long end, + struct mm_walk *walk) +{ + int err = 0; + struct vm_area_struct *vma = walk->vma; + + if (vma && is_vm_hugetlb_page(vma)) { + if (walk->hugetlb_entry) + err = walk_hugetlb_range(start, end, walk); + } else + err = walk_pgd_range(start, end, walk); + + return err; +} /** - * walk_page_range - walk a memory map's page tables with a callback - * @addr: starting address - * @end: ending address - * @walk: set of callbacks to invoke for each level of the tree + * walk_page_range - walk page table with caller specific callbacks + * + * Recursively walk the page table tree of the process represented by + * @walk->mm within the virtual address range [@start, @end). In walking, + * we can call caller-specific callback functions against each entry. * - * Recursively walk the page table for the memory area in a VMA, - * calling supplied callbacks. Callbacks are called in-order (first - * PGD, first PUD, first PMD, first PTE, second PTE... second PMD, - * etc.). If lower-level callbacks are omitted, walking depth is reduced. + * Before starting to walk page table, some callers want to check whether + * they really want to walk over the vma (for example by checking vm_flags.) + * walk_page_test() and @walk->test_walk() do that check. * - * Each callback receives an entry pointer and the start and end of the - * associated range, and a copy of the original mm_walk for access to - * the ->private or ->mm fields. + * If any callback returns a non-zero value, the page table walk is aborted + * immediately and the return value is propagated back to the caller. + * Note that the meaning of the positive returned value can be defined + * by the caller for its own purpose. * - * Usually no locks are taken, but splitting transparent huge page may - * take page table lock. And the bottom level iterator will map PTE - * directories from highmem if necessary. + * If the caller defines multiple callbacks in different levels, the + * callbacks are called in depth-first manner. It could happen that + * multiple callbacks are called on a address. For example if some caller + * defines test_walk(), pmd_entry(), and pte_entry(), then callbacks are + * called in the order of test_walk(), pmd_entry(), and pte_entry(). + * If you don't want to go down to lower level at some point and move to + * the next entry in the same level, you set @walk->skip to 1. + * For example if you succeed to handle some pmd entry as trans_huge entry, + * you need not call walk_pte_range() any more, so set it to avoid that. + * We can't determine whether to go down to lower level with the return + * value of the callback, because the whole range of return values (0, >0, + * and <0) are used up for other meanings. * - * If any callback returns a non-zero value, the walk is aborted and - * the return value is propagated back to the caller. Otherwise 0 is returned. + * Each callback can access to the vma over which it is doing page table + * walk right now via @walk->vma. @walk->vma is set to NULL in walking + * outside a vma. If you want to access to some caller-specific data from + * callbacks, @walk->private should be helpful. * - * walk->mm->mmap_sem must be held for at least read if walk->hugetlb_entry - * is !NULL. + * The callers should hold @walk->mm->mmap_sem. Note that the lower level + * iterators can take page table lock in lowest level iteration and/or + * in split_huge_page_pmd(). */ -int walk_page_range(unsigned long addr, unsigned long end, +int walk_page_range(unsigned long start, unsigned long end, struct mm_walk *walk) { - pgd_t *pgd; - unsigned long next; int err = 0; + struct vm_area_struct *vma; + unsigned long next; - if (addr >= end) - return err; + if (start >= end) + return -EINVAL; if (!walk->mm) return -EINVAL; VM_BUG_ON(!rwsem_is_locked(&walk->mm->mmap_sem)); - pgd = pgd_offset(walk->mm, addr); do { - struct vm_area_struct *vma = NULL; + vma = find_vma(walk->mm, start); + if (!vma) { /* after the last vma */ + walk->vma = NULL; + next = end; + } else if (start < vma->vm_start) { /* outside the found vma */ + walk->vma = NULL; + next = vma->vm_start; + } else { /* inside the found vma */ + walk->vma = vma; + next = min(end, vma->vm_end); - next = pgd_addr_end(addr, end); - - /* - * This function was not intended to be vma based. - * But there are vma special cases to be handled: - * - hugetlb vma's - * - VM_PFNMAP vma's - */ - vma = find_vma(walk->mm, addr); - if (vma) { - /* - * There are no page structures backing a VM_PFNMAP - * range, so do not allow split_huge_page_pmd(). - */ - if ((vma->vm_start <= addr) && - (vma->vm_flags & VM_PFNMAP)) { - next = vma->vm_end; - pgd = pgd_offset(walk->mm, next); - continue; - } - /* - * Handle hugetlb vma individually because pagetable - * walk for the hugetlb page is dependent on the - * architecture and we can't handled it in the same - * manner as non-huge pages. - */ - if (walk->hugetlb_entry && (vma->vm_start <= addr) && - is_vm_hugetlb_page(vma)) { - if (vma->vm_end < next) - next = vma->vm_end; - /* - * Hugepage is very tightly coupled with vma, - * so walk through hugetlb entries within a - * given vma. - */ - err = walk_hugetlb_range(vma, addr, next, walk); - if (err) - break; - pgd = pgd_offset(walk->mm, next); + err = walk_page_test(start, next, walk); + if (skip_lower_level_walking(walk)) continue; - } - } - - if (pgd_none_or_clear_bad(pgd)) { - if (walk->pte_hole) - err = walk->pte_hole(addr, next, walk); if (err) break; - pgd++; - continue; } - if (walk->pgd_entry) - err = walk->pgd_entry(pgd, addr, next, walk); - if (!err && - (walk->pud_entry || walk->pmd_entry || walk->pte_entry)) - err = walk_pud_range(pgd, addr, next, walk); + err = __walk_page_range(start, next, walk); if (err) break; - pgd++; - } while (addr = next, addr < end); - + } while (start = next, start < end); return err; } + +int walk_page_vma(struct vm_area_struct *vma, struct mm_walk *walk) +{ + int err; + + if (!walk->mm) + return -EINVAL; + + VM_BUG_ON(!rwsem_is_locked(&walk->mm->mmap_sem)); + VM_BUG_ON(!vma); + walk->vma = vma; + err = walk_page_test(vma->vm_start, vma->vm_end, walk); + if (skip_lower_level_walking(walk)) + return 0; + if (err) + return err; + return __walk_page_range(vma->vm_start, vma->vm_end, walk); +} diff --git a/mm/rmap.c b/mm/rmap.c index 9c3e77396d1a..e065ba798fde 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -515,11 +515,7 @@ void page_unlock_anon_vma_read(struct anon_vma *anon_vma) static inline unsigned long __vma_address(struct page *page, struct vm_area_struct *vma) { - pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT); - - if (unlikely(is_vm_hugetlb_page(vma))) - pgoff = page->index << huge_page_order(page_hstate(page)); - + pgoff_t pgoff = page_pgoff(page); return vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT); } @@ -1024,7 +1020,7 @@ void page_add_new_anon_rmap(struct page *page, __mod_zone_page_state(page_zone(page), NR_ANON_PAGES, hpage_nr_pages(page)); __page_set_anon_rmap(page, vma, address, 1); - if (!mlocked_vma_newpage(vma, page)) { + if (!mlocked_vma_newpage(vma, page) && !PageUnevictable(page)) { SetPageActive(page); lru_cache_add(page); } else @@ -1359,7 +1355,7 @@ static int try_to_unmap_cluster(unsigned long cursor, unsigned int *mapcount, if (page->index != linear_page_index(vma, address)) { pte_t ptfile = pgoff_to_pte(page->index); if (pte_soft_dirty(pteval)) - pte_file_mksoft_dirty(ptfile); + ptfile = pte_file_mksoft_dirty(ptfile); set_pte_at(mm, address, pte, ptfile); } @@ -1609,7 +1605,7 @@ static struct anon_vma *rmap_walk_anon_lock(struct page *page, static int rmap_walk_anon(struct page *page, struct rmap_walk_control *rwc) { struct anon_vma *anon_vma; - pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT); + pgoff_t pgoff = page_pgoff(page); struct anon_vma_chain *avc; int ret = SWAP_AGAIN; @@ -1650,7 +1646,7 @@ static int rmap_walk_anon(struct page *page, struct rmap_walk_control *rwc) static int rmap_walk_file(struct page *page, struct rmap_walk_control *rwc) { struct address_space *mapping = page->mapping; - pgoff_t pgoff = page->index << compound_order(page); + pgoff_t pgoff = page_pgoff(page); struct vm_area_struct *vma; int ret = SWAP_AGAIN; diff --git a/mm/slab.c b/mm/slab.c index 388cb1ae6fbc..cbcd2fa7af2f 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -1681,8 +1681,12 @@ static struct page *kmem_getpages(struct kmem_cache *cachep, gfp_t flags, if (cachep->flags & SLAB_RECLAIM_ACCOUNT) flags |= __GFP_RECLAIMABLE; + if (memcg_charge_slab(cachep, flags, cachep->gfporder)) + return NULL; + page = alloc_pages_exact_node(nodeid, flags | __GFP_NOTRACK, cachep->gfporder); if (!page) { + memcg_uncharge_slab(cachep, cachep->gfporder); if (!(flags & __GFP_NOWARN) && printk_ratelimit()) slab_out_of_memory(cachep, flags, nodeid); return NULL; @@ -1741,7 +1745,8 @@ static void kmem_freepages(struct kmem_cache *cachep, struct page *page) memcg_release_pages(cachep, cachep->gfporder); if (current->reclaim_state) current->reclaim_state->reclaimed_slab += nr_freed; - __free_memcg_kmem_pages(page, cachep->gfporder); + __free_pages(page, cachep->gfporder); + memcg_uncharge_slab(cachep, cachep->gfporder); } static void kmem_rcu_free(struct rcu_head *head) diff --git a/mm/slab.h b/mm/slab.h index 3045316b7c9d..3db3c52f80a2 100644 --- a/mm/slab.h +++ b/mm/slab.h @@ -191,6 +191,26 @@ static inline struct kmem_cache *memcg_root_cache(struct kmem_cache *s) return s; return s->memcg_params->root_cache; } + +static __always_inline int memcg_charge_slab(struct kmem_cache *s, + gfp_t gfp, int order) +{ + if (!memcg_kmem_enabled()) + return 0; + if (is_root_cache(s)) + return 0; + return memcg_charge_kmem(s->memcg_params->memcg, gfp, + PAGE_SIZE << order); +} + +static __always_inline void memcg_uncharge_slab(struct kmem_cache *s, int order) +{ + if (!memcg_kmem_enabled()) + return; + if (is_root_cache(s)) + return; + memcg_uncharge_kmem(s->memcg_params->memcg, PAGE_SIZE << order); +} #else static inline bool is_root_cache(struct kmem_cache *s) { @@ -226,6 +246,15 @@ static inline struct kmem_cache *memcg_root_cache(struct kmem_cache *s) { return s; } + +static inline int memcg_charge_slab(struct kmem_cache *s, gfp_t gfp, int order) +{ + return 0; +} + +static inline void memcg_uncharge_slab(struct kmem_cache *s, int order) +{ +} #endif static inline struct kmem_cache *cache_from_obj(struct kmem_cache *s, void *x) diff --git a/mm/slab_common.c b/mm/slab_common.c index f3cfccf76dda..963f0376e4c1 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -290,12 +290,8 @@ void kmem_cache_create_memcg(struct mem_cgroup *memcg, struct kmem_cache *root_c root_cache->size, root_cache->align, root_cache->flags, root_cache->ctor, memcg, root_cache); - if (IS_ERR(s)) { + if (IS_ERR(s)) kfree(cache_name); - goto out_unlock; - } - - s->allocflags |= __GFP_KMEMCG; out_unlock: mutex_unlock(&slab_mutex); @@ -577,6 +573,19 @@ void __init create_kmalloc_caches(unsigned long flags) } #endif /* !CONFIG_SLOB */ +void *kmalloc_order(size_t size, gfp_t flags, unsigned int order) +{ + void *ret; + struct page *page; + + flags |= __GFP_COMP; + page = alloc_kmem_pages(flags, order); + ret = page ? page_address(page) : NULL; + kmemleak_alloc(ret, size, 1, flags); + return ret; +} +EXPORT_SYMBOL(kmalloc_order); + #ifdef CONFIG_TRACING void *kmalloc_order_trace(size_t size, gfp_t flags, unsigned int order) { diff --git a/mm/slub.c b/mm/slub.c index 5e234f1f8853..fa7a1817835e 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -1317,17 +1317,26 @@ static inline void slab_free_hook(struct kmem_cache *s, void *x) /* * Slab allocation and freeing */ -static inline struct page *alloc_slab_page(gfp_t flags, int node, - struct kmem_cache_order_objects oo) +static inline struct page *alloc_slab_page(struct kmem_cache *s, + gfp_t flags, int node, struct kmem_cache_order_objects oo) { + struct page *page; int order = oo_order(oo); flags |= __GFP_NOTRACK; + if (memcg_charge_slab(s, flags, order)) + return NULL; + if (node == NUMA_NO_NODE) - return alloc_pages(flags, order); + page = alloc_pages(flags, order); else - return alloc_pages_exact_node(node, flags, order); + page = alloc_pages_exact_node(node, flags, order); + + if (!page) + memcg_uncharge_slab(s, order); + + return page; } static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node) @@ -1349,7 +1358,7 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node) */ alloc_gfp = (flags | __GFP_NOWARN | __GFP_NORETRY) & ~__GFP_NOFAIL; - page = alloc_slab_page(alloc_gfp, node, oo); + page = alloc_slab_page(s, alloc_gfp, node, oo); if (unlikely(!page)) { oo = s->min; alloc_gfp = flags; @@ -1357,7 +1366,7 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node) * Allocation may have failed due to fragmentation. * Try a lower order alloc if possible */ - page = alloc_slab_page(alloc_gfp, node, oo); + page = alloc_slab_page(s, alloc_gfp, node, oo); if (page) stat(s, ORDER_FALLBACK); @@ -1473,7 +1482,8 @@ static void __free_slab(struct kmem_cache *s, struct page *page) page_mapcount_reset(page); if (current->reclaim_state) current->reclaim_state->reclaimed_slab += pages; - __free_memcg_kmem_pages(page, order); + __free_pages(page, order); + memcg_uncharge_slab(s, order); } #define need_reserve_slab_rcu \ @@ -3325,8 +3335,8 @@ static void *kmalloc_large_node(size_t size, gfp_t flags, int node) struct page *page; void *ptr = NULL; - flags |= __GFP_COMP | __GFP_NOTRACK | __GFP_KMEMCG; - page = alloc_pages_node(node, flags, get_order(size)); + flags |= __GFP_COMP | __GFP_NOTRACK; + page = alloc_kmem_pages_node(node, flags, get_order(size)); if (page) ptr = page_address(page); @@ -3395,7 +3405,7 @@ void kfree(const void *x) if (unlikely(!PageSlab(page))) { BUG_ON(!PageCompound(page)); kfree_hook(x); - __free_memcg_kmem_pages(page, compound_order(page)); + __free_kmem_pages(page, compound_order(page)); return; } slab_free(page->slab_cache, page, object, _RET_IP_); diff --git a/mm/util.c b/mm/util.c index f380af7ea779..efadeaaef81e 100644 --- a/mm/util.c +++ b/mm/util.c @@ -3,6 +3,7 @@ #include <linux/string.h> #include <linux/compiler.h> #include <linux/export.h> +#include <linux/ctype.h> #include <linux/err.h> #include <linux/sched.h> #include <linux/security.h> @@ -64,6 +65,35 @@ char *kstrndup(const char *s, size_t max, gfp_t gfp) EXPORT_SYMBOL(kstrndup); /** + * kstrimdup - Trim and copy a %NUL terminated string. + * @s: the string to trim and duplicate + * @gfp: the GFP mask used in the kmalloc() call when allocating memory + * + * Returns an address, which the caller must kfree, containing + * a duplicate of the passed string with leading and/or trailing + * whitespace (as defined by isspace) removed. + */ +char *kstrimdup(const char *s, gfp_t gfp) +{ + char *buf; + char *begin = skip_spaces(s); + size_t len = strlen(begin); + + while (len && isspace(begin[len - 1])) + len--; + + buf = kmalloc_track_caller(len + 1, gfp); + if (!buf) + return NULL; + + memcpy(buf, begin, len); + buf[len] = '\0'; + + return buf; +} +EXPORT_SYMBOL(kstrimdup); + +/** * kmemdup - duplicate region of memory * * @src: memory region to duplicate diff --git a/mm/vmacache.c b/mm/vmacache.c index d4224b397c0e..61c38ae9f54b 100644 --- a/mm/vmacache.c +++ b/mm/vmacache.c @@ -17,6 +17,16 @@ void vmacache_flush_all(struct mm_struct *mm) { struct task_struct *g, *p; + /* + * Single threaded tasks need not iterate the entire + * list of process. We can avoid the flushing as well + * since the mm's seqnum was increased and don't have + * to worry about other threads' seqnum. Current's + * flush will occur upon the next lookup. + */ + if (atomic_read(&mm->mm_users) == 1) + return; + rcu_read_lock(); for_each_process_thread(g, p) { /* @@ -78,11 +88,14 @@ struct vm_area_struct *vmacache_find(struct mm_struct *mm, unsigned long addr) if (!vmacache_valid(mm)) return NULL; + count_vm_vmacache_event(VMACACHE_FIND_CALLS); + for (i = 0; i < VMACACHE_SIZE; i++) { struct vm_area_struct *vma = current->vmacache[i]; if (vma && vma->vm_start <= addr && vma->vm_end > addr) { BUG_ON(vma->vm_mm != mm); + count_vm_vmacache_event(VMACACHE_FIND_HITS); return vma; } } @@ -100,11 +113,15 @@ struct vm_area_struct *vmacache_find_exact(struct mm_struct *mm, if (!vmacache_valid(mm)) return NULL; + count_vm_vmacache_event(VMACACHE_FIND_CALLS); + for (i = 0; i < VMACACHE_SIZE; i++) { struct vm_area_struct *vma = current->vmacache[i]; - if (vma && vma->vm_start == start && vma->vm_end == end) + if (vma && vma->vm_start == start && vma->vm_end == end) { + count_vm_vmacache_event(VMACACHE_FIND_HITS); return vma; + } } return NULL; diff --git a/mm/vmscan.c b/mm/vmscan.c index 9b6497eda806..6cdda104f629 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1158,7 +1158,7 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone, TTU_UNMAP|TTU_IGNORE_ACCESS, &dummy1, &dummy2, &dummy3, &dummy4, &dummy5, true); list_splice(&clean_pages, page_list); - __mod_zone_page_state(zone, NR_ISOLATED_FILE, -ret); + mod_zone_page_state(zone, NR_ISOLATED_FILE, -ret); return ret; } @@ -1866,6 +1866,8 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc, bool force_scan = false; unsigned long ap, fp; enum lru_list lru; + bool some_scanned; + int pass; /* * If the zone or memcg is small, nr[l] can be 0. This @@ -1971,39 +1973,49 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc, fraction[1] = fp; denominator = ap + fp + 1; out: - for_each_evictable_lru(lru) { - int file = is_file_lru(lru); - unsigned long size; - unsigned long scan; + some_scanned = false; + /* Only use force_scan on second pass. */ + for (pass = 0; !some_scanned && pass < 2; pass++) { + for_each_evictable_lru(lru) { + int file = is_file_lru(lru); + unsigned long size; + unsigned long scan; - size = get_lru_size(lruvec, lru); - scan = size >> sc->priority; + size = get_lru_size(lruvec, lru); + scan = size >> sc->priority; - if (!scan && force_scan) - scan = min(size, SWAP_CLUSTER_MAX); + if (!scan && pass && force_scan) + scan = min(size, SWAP_CLUSTER_MAX); - switch (scan_balance) { - case SCAN_EQUAL: - /* Scan lists relative to size */ - break; - case SCAN_FRACT: + switch (scan_balance) { + case SCAN_EQUAL: + /* Scan lists relative to size */ + break; + case SCAN_FRACT: + /* + * Scan types proportional to swappiness and + * their relative recent reclaim efficiency. + */ + scan = div64_u64(scan * fraction[file], + denominator); + break; + case SCAN_FILE: + case SCAN_ANON: + /* Scan one type exclusively */ + if ((scan_balance == SCAN_FILE) != file) + scan = 0; + break; + default: + /* Look ma, no brain */ + BUG(); + } + nr[lru] = scan; /* - * Scan types proportional to swappiness and - * their relative recent reclaim efficiency. + * Skip the second pass and don't force_scan, + * if we found something to scan. */ - scan = div64_u64(scan * fraction[file], denominator); - break; - case SCAN_FILE: - case SCAN_ANON: - /* Scan one type exclusively */ - if ((scan_balance == SCAN_FILE) != file) - scan = 0; - break; - default: - /* Look ma, no brain */ - BUG(); + some_scanned |= !!scan; } - nr[lru] = scan; } } diff --git a/mm/vmstat.c b/mm/vmstat.c index 302dd076b8bf..82ce17ce58c4 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -866,6 +866,10 @@ const char * const vmstat_text[] = { "nr_tlb_local_flush_one", #endif /* CONFIG_DEBUG_TLBFLUSH */ +#ifdef CONFIG_DEBUG_VM_VMACACHE + "vmacache_find_calls", + "vmacache_find_hits", +#endif #endif /* CONFIG_VM_EVENTS_COUNTERS */ }; #endif /* CONFIG_PROC_FS || CONFIG_SYSFS || CONFIG_NUMA */ diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl index 34eb2160489d..cf02ff038df5 100755 --- a/scripts/checkpatch.pl +++ b/scripts/checkpatch.pl @@ -397,6 +397,11 @@ foreach my $entry (@mode_permission_funcs) { $mode_perms_search .= $entry->[0]; } +our $declaration_macros = qr{(?x: + (?:$Storage\s+)?(?:DECLARE|DEFINE)_[A-Z]+\s*\(| + (?:$Storage\s+)?LIST_HEAD\s*\( +)}; + our $allowed_asm_includes = qr{(?x: irq| memory @@ -2266,18 +2271,35 @@ sub process { } # check for missing blank lines after declarations - if ($realfile =~ m@^(drivers/net/|net/)@ && - $prevline =~ /^\+\s+$Declare\s+$Ident/ && - !($prevline =~ /(?:$Compare|$Assignment|$Operators)\s*$/ || - $prevline =~ /(?:\{\s*|\\)$/) && #extended lines - $sline =~ /^\+\s+/ && #Not at char 1 - !($sline =~ /^\+\s+$Declare/ || - $sline =~ /^\+\s+$Ident\s+$Ident/ || #eg: typedef foo + if ($sline =~ /^\+\s+\S/ && #Not at char 1 + # actual declarations + ($prevline =~ /^\+\s+$Declare\s*$Ident\s*[=,;\[]/ || + # foo bar; where foo is some local typedef or #define + $prevline =~ /^\+\s+$Ident(?:\s+|\s*\*\s*)$Ident\s*[=,;\[]/ || + # known declaration macros + $prevline =~ /^\+\s+$declaration_macros/) && + # for "else if" which can look like "$Ident $Ident" + !($prevline =~ /^\+\s+$c90_Keywords\b/ || + # other possible extensions of declaration lines + $prevline =~ /(?:$Compare|$Assignment|$Operators)\s*$/ || + # not starting a section or a macro "\" extended line + $prevline =~ /(?:\{\s*|\\)$/) && + # looks like a declaration + !($sline =~ /^\+\s+$Declare\s*$Ident\s*[=,;\[]/ || + # foo bar; where foo is some local typedef or #define + $sline =~ /^\+\s+$Ident(?:\s+|\s*\*\s*)$Ident\s*[=,;\[]/ || + # known declaration macros + $sline =~ /^\+\s+$declaration_macros/ || + # start of struct or union or enum $sline =~ /^\+\s+(?:union|struct|enum|typedef)\b/ || - $sline =~ /^\+\s+(?:$|[\{\}\.\#\"\?\:\(])/ || + # start or end of block or continuation of declaration + $sline =~ /^\+\s+(?:$|[\{\}\.\#\"\?\:\(\[])/ || + # bitfield continuation + $sline =~ /^\+\s+$Ident\s*:\s*\d+\s*[,;]/ || + # other possible extensions of declaration lines $sline =~ /^\+\s+\(?\s*(?:$Compare|$Assignment|$Operators)/)) { WARN("SPACING", - "networking uses a blank line after declarations\n" . $hereprev); + "Missing a blank line after declarations\n" . $hereprev); } # check for spaces at the beginning of a line. diff --git a/usr/Kconfig b/usr/Kconfig index 642f503d3e9f..2d4c77eecf2e 100644 --- a/usr/Kconfig +++ b/usr/Kconfig @@ -98,80 +98,3 @@ config RD_LZ4 help Support loading of a LZ4 encoded initial ramdisk or cpio buffer If unsure, say N. - -choice - prompt "Built-in initramfs compression mode" if INITRAMFS_SOURCE!="" - help - This option decides by which algorithm the builtin initramfs - will be compressed. Several compression algorithms are - available, which differ in efficiency, compression and - decompression speed. Compression speed is only relevant - when building a kernel. Decompression speed is relevant at - each boot. - - If you have any problems with bzip2 or LZMA compressed - initramfs, mail me (Alain Knaff) <alain@knaff.lu>. - - High compression options are mostly useful for users who are - low on RAM, since it reduces the memory consumption during - boot. - - If in doubt, select 'gzip' - -config INITRAMFS_COMPRESSION_NONE - bool "None" - help - Do not compress the built-in initramfs at all. This may - sound wasteful in space, but, you should be aware that the - built-in initramfs will be compressed at a later stage - anyways along with the rest of the kernel, on those - architectures that support this. - However, not compressing the initramfs may lead to slightly - higher memory consumption during a short time at boot, while - both the cpio image and the unpacked filesystem image will - be present in memory simultaneously - -config INITRAMFS_COMPRESSION_GZIP - bool "Gzip" - depends on RD_GZIP - help - The old and tried gzip compression. It provides a good balance - between compression ratio and decompression speed. - -config INITRAMFS_COMPRESSION_BZIP2 - bool "Bzip2" - depends on RD_BZIP2 - help - Its compression ratio and speed is intermediate. - Decompression speed is slowest among the choices. The initramfs - size is about 10% smaller with bzip2, in comparison to gzip. - Bzip2 uses a large amount of memory. For modern kernels you - will need at least 8MB RAM or more for booting. - -config INITRAMFS_COMPRESSION_LZMA - bool "LZMA" - depends on RD_LZMA - help - This algorithm's compression ratio is best. - Decompression speed is between the other choices. - Compression is slowest. The initramfs size is about 33% - smaller with LZMA in comparison to gzip. - -config INITRAMFS_COMPRESSION_XZ - bool "XZ" - depends on RD_XZ - help - XZ uses the LZMA2 algorithm. The initramfs size is about 30% - smaller with XZ in comparison to gzip. Decompression speed - is better than that of bzip2 but worse than gzip and LZO. - Compression is slow. - -config INITRAMFS_COMPRESSION_LZO - bool "LZO" - depends on RD_LZO - help - Its compression ratio is the poorest among the choices. The kernel - size is about 10% bigger than gzip; however its speed - (both compression and decompression) is the fastest. - -endchoice |