Age | Commit message (Collapse) | Author |
|
remap_file_pages(2) was invented to be able efficiently map parts of huge
file into limited 32-bit virtual address space such as in database
workloads.
Nonlinear mappings are pain to support and it seems there's no legitimate
use-cases nowadays since 64-bit systems are widely available.
Let's drop it and get rid of all these special-cased code.
The patch replaces the syscall with emulation which creates new VMA on
each remap_file_pages(), unless they it can be merged with an adjacent
one.
I didn't find *any* real code that uses remap_file_pages(2) to test
emulation impact on. I've checked Debian code search and source of all
packages in ALT Linux. No real users: libc wrappers, mentions in strace,
gdb, valgrind and this kind of stuff.
There are few basic tests in LTP for the syscall. They work just fine
with emulation.
To test performance impact, I've written small test case which demonstrate
pretty much worst case scenario: map 4G shmfs file, write to begin of
every page pgoff of the page, remap pages in reverse order, read every
page.
The test creates 1 million of VMAs if emulation is in use, so I had to set
vm.max_map_count to 1100000 to avoid -ENOMEM.
Before: 23.3 ( +- 4.31% ) seconds
After: 43.9 ( +- 0.85% ) seconds
Slowdown: 1.88x
I believe we can live with that.
Test case:
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <stdio.h>
#include <sys/mman.h>
#define MB (1024UL * 1024)
#define SIZE (4096 * MB)
int main(int argc, char **argv)
{
unsigned long *p;
long i, pass;
for (pass = 0; pass < 10; pass++) {
p = mmap(NULL, SIZE, PROT_READ|PROT_WRITE,
MAP_SHARED | MAP_ANONYMOUS, -1, 0);
if (p == MAP_FAILED) {
perror("mmap");
return -1;
}
for (i = 0; i < SIZE / 4096; i++)
p[i * 4096 / sizeof(*p)] = i;
for (i = 0; i < SIZE / 4096; i++) {
if (remap_file_pages(p + i * 4096 / sizeof(*p), 4096,
0, (SIZE - 4096 * (i + 1)) >> 12, 0)) {
perror("remap_file_pages");
return -1;
}
}
for (i = SIZE / 4096 - 1; i >= 0; i--)
assert(p[i * 4096 / sizeof(*p)] == SIZE / 4096 - i - 1);
munmap(p, SIZE);
}
return 0;
}
[akpm@linux-foundation.org: fix spello]
[sasha.levin@oracle.com: initialize populate before usage]
[sasha.levin@oracle.com: grab file ref to prevent race while mmaping]
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Dave Jones <davej@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Armin Rigo <arigo@tunes.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
For completeness of the arch-independent pgprot-modify,
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jamie Liu <jamieliu@google.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
For VMAs that don't want write notifications, PTEs created for read faults
have their write bit set. If the read fault happens after VM_SOFTDIRTY is
cleared, then the PTE's softdirty bit will remain clear after subsequent
writes.
Here's a simple code snippet to demonstrate the bug:
char* m = mmap(NULL, getpagesize(), PROT_READ | PROT_WRITE,
MAP_ANONYMOUS | MAP_SHARED, -1, 0);
system("echo 4 > /proc/$PPID/clear_refs"); /* clear VM_SOFTDIRTY */
assert(*m == '\0'); /* new PTE allows write access */
assert(!soft_dirty(x));
*m = 'x'; /* should dirty the page */
assert(soft_dirty(x)); /* fails */
With this patch, write notifications are enabled when VM_SOFTDIRTY is
cleared. Furthermore, to avoid unnecessary faults, write notifications
are disabled when VM_SOFTDIRTY is set.
As a side effect of enabling and disabling write notifications with care,
this patch fixes a bug in mprotect where vm_page_prot bits set by drivers
were zapped on mprotect. An analogous bug was fixed in mmap by
c9d0bf241451a3ab7d ("mm: uncached vma support with writenotify").
Signed-off-by: Peter Feiner <pfeiner@google.com>
Reported-by: Peter Feiner <pfeiner@google.com>
Suggested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Jamie Liu <jamieliu@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Consolidate the various external const and non-const declarations of
__start___param[] and __stop___param in <linux/moduleparam.h>. This
requires making a few struct kernel_param pointers in kernel/params.c
const.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Sparc doesn't use HARDLOCKUP_DETECTOR the same way x86 does. As a result break
out the new functions
watchdog_hardlockup_detector_is_enabled
watchdog_enable_hardlockup_detector
into their own #if defined area. This resolves the compile issue where
CONFIG_NMI_WATCHDOG=y and CONFIG_HARDLOCKUP_DETECTOR is not set on sparc.
Signed-off-by: Don Zickus <dzickus@redhat.com>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
fix build
In file included from arch/x86/mm/init_64.c:33:
include/linux/nmi.h: In function 'watchdog_enable_hardlockup_detector':
include/linux/nmi.h:27: error: parameter name omitted
Cc: Andrew Jones <drjones@redhat.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
In some cases we don't want hard lockup detection enabled by default. An
example is when running as a guest. Introduce
watchdog_enable_hardlockup_detector(bool)
allowing those cases to disable hard lockup detection. This must be
executed early by the boot processor from e.g. smp_prepare_boot_cpu, in
order to allow kernel command line arguments to override it, as well as to
avoid hard lockup detection being enabled before we've had a chance to
indicate that it's unwanted. In summary,
initial boot: default=enabled
smp_prepare_boot_cpu
watchdog_enable_hardlockup_detector(false): default=disabled
cmdline has 'nmi_watchdog=1': default=enabled
The running kernel still has the ability to enable/disable at any time
with /proc/sys/kernel/nmi_watchdog us usual. However even when the
default has been overridden /proc/sys/kernel/nmi_watchdog will initially
show '1'. To truly turn it on one must disable/enable it, i.e.
echo 0 > /proc/sys/kernel/nmi_watchdog
echo 1 > /proc/sys/kernel/nmi_watchdog
This patch will be immediately useful for KVM with the next patch of this
series. Other hypervisor guest types may find it useful as well.
Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com>
Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
In kernel we have %*pE specifier to print an escaped buffer. All users
now switched to that approach.
This fixes a bug as well. The current implementation wrongly prints octal
numbers: only two first digits are used in case when 3 are required and
the rest of the string ends up cut off.
Additionally by default the \f, \v, \a, and \e are escaped to their
alphabetic representation. It's safe to do since it is currently used for
messaging only.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: "John W . Linville" <linville@tuxdriver.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
This is almost the opposite function to string_unescape(). Nevertheless
it handles \0 and could be used for any byte buffer.
The documentation is supplied together with the function prototype.
The test cases covers most of the scenarios and would be expanded later
on.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: "John W . Linville" <linville@tuxdriver.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The introduced function string_escape_mem() is a kind of opposite to
string_unescape. We have several users of such functionality each of them
created custom implementation. The series contains clean up of test
suite, adding new call, and switching few users to use it via %*pE
specifier.
Test suite covers all of existing and most of potential use cases.
This patch (of 11):
The documentation of API belongs to c-file. This patch moves it
accordingly.
There is no functional change.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: "John W . Linville" <linville@tuxdriver.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Remove obsolete and unused strict_strto* functions
Signed-off-by: Daniel Walter <dwalter@google.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The previous patch made strnicmp into a wrapper for strncasecmp. This
patch makes all in-tree users of strnicmp call strncasecmp directly, while
still making sure that the strnicmp symbol can be used by out-of-tree
modules. It should be considered a temporary hack until all in-tree
callers have been converted.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Conflicts:
drivers/clk/Kconfig
mm/page_alloc.c
|
|
|
|
|
|
|
|
|
|
Conflicts:
drivers/gpio/gpiolib.c
drivers/pinctrl/qcom/pinctrl-msm.c
|
|
Conflicts:
drivers/scsi/qla2xxx/qla_target.c
|
|
|
|
|
|
Conflicts:
arch/s390/include/asm/cputime.h
arch/s390/kernel/irq.c
arch/s390/kernel/processor.c
arch/s390/kernel/vtime.c
fs/ext4/super.c
kernel/irq_work.c
|
|
|
|
|
|
|
|
Conflicts:
arch/arm/mach-omap2/irq.c
arch/arm64/include/asm/Kbuild
arch/x86/kernel/entry_64.S
arch/x86/kernel/ptrace.c
fs/nfs/blocklayout/blocklayoutdev.c
fs/nfs/blocklayout/blocklayoutdm.c
|
|
|
|
|
|
Conflicts:
arch/arm64/kernel/ptrace.c
|
|
|
|
|
|
|
|
Conflicts:
fs/nfsd/vfs.c
security/smack/smack_lsm.c
|
|
Conflicts:
drivers/power/reset/Kconfig
drivers/power/reset/Makefile
|
|
|
|
|
|
|
|
Conflicts:
block/blk-core.c
|
|
|
|
|
|
|
|
Conflicts:
Documentation/networking/timestamping/Makefile
arch/arm/boot/dts/berlin2q-marvell-dmp.dts
arch/arm/boot/dts/rk3188-radxarock.dts
arch/arm/boot/dts/rk3188.dtsi
arch/arm/boot/dts/rk3xxx.dtsi
include/linux/skbuff.h
net/ipv4/tcp.c
|
|
Conflicts:
drivers/tty/serial/8250/8250_pci.c
|
|
|
|
Conflicts:
arch/arm/configs/mvebu_v7_defconfig
arch/arm/mach-shmobile/board-ape6evm-reference.c
arch/arm/mach-shmobile/setup-sh73a0.c
|
|
|
|
|
|
|
|
Conflicts:
arch/arm64/boot/dts/apm-storm.dtsi
|
|
Conflicts:
drivers/usb/gadget/function/f_fs.c
|