summaryrefslogtreecommitdiff
path: root/Documentation
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2020-06-08 11:11:38 -0700
committerLinus Torvalds <torvalds@linux-foundation.org>2020-06-08 11:11:38 -0700
commit20b0d06722169e6e66049c8fe6f1a48adffb79c6 (patch)
tree1b88278ca547c07f58297325aea1ab3c447e844d /Documentation
parent63d72b93f2262900c8de74ad0f5a58e0d452c9d3 (diff)
parentdb33ec371be8e45956e8cebb5b0fe641f008430b (diff)
Merge branch 'akpm' (patches from Andrew)
Merge still more updates from Andrew Morton: "Various trees. Mainly those parts of MM whose linux-next dependents are now merged. I'm still sitting on ~160 patches which await merges from -next. Subsystems affected by this patch series: mm/proc, ipc, dynamic-debug, panic, lib, sysctl, mm/gup, mm/pagemap" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (52 commits) doc: cgroup: update note about conditions when oom killer is invoked module: move the set_fs hack for flush_icache_range to m68k nommu: use flush_icache_user_range in brk and mmap binfmt_flat: use flush_icache_user_range exec: use flush_icache_user_range in read_code exec: only build read_code when needed m68k: implement flush_icache_user_range arm: rename flush_cache_user_range to flush_icache_user_range xtensa: implement flush_icache_user_range sh: implement flush_icache_user_range asm-generic: add a flush_icache_user_range stub mm: rename flush_icache_user_range to flush_icache_user_page arm,sparc,unicore32: remove flush_icache_user_range riscv: use asm-generic/cacheflush.h powerpc: use asm-generic/cacheflush.h openrisc: use asm-generic/cacheflush.h m68knommu: use asm-generic/cacheflush.h microblaze: use asm-generic/cacheflush.h ia64: use asm-generic/cacheflush.h hexagon: use asm-generic/cacheflush.h ...
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/admin-guide/cgroup-v2.rst17
-rw-r--r--Documentation/admin-guide/dynamic-debug-howto.rst5
-rw-r--r--Documentation/admin-guide/kdump/kdump.rst8
-rw-r--r--Documentation/admin-guide/kernel-parameters.txt34
-rw-r--r--Documentation/admin-guide/sysctl/kernel.rst37
-rw-r--r--Documentation/core-api/pin_user_pages.rst51
6 files changed, 123 insertions, 29 deletions
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index c2a4b652bd1a..ce3e05e41724 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1170,6 +1170,13 @@ PAGE_SIZE multiple when read back.
Under certain circumstances, the usage may go over the limit
temporarily.
+ In default configuration regular 0-order allocations always
+ succeed unless OOM killer chooses current task as a victim.
+
+ Some kinds of allocations don't invoke the OOM killer.
+ Caller could retry them differently, return into userspace
+ as -ENOMEM or silently ignore in cases like disk readahead.
+
This is the ultimate protection mechanism. As long as the
high limit is used and monitored properly, this limit's
utility is limited to providing the final safety net.
@@ -1226,17 +1233,9 @@ PAGE_SIZE multiple when read back.
The number of time the cgroup's memory usage was
reached the limit and allocation was about to fail.
- Depending on context result could be invocation of OOM
- killer and retrying allocation or failing allocation.
-
- Failed allocation in its turn could be returned into
- userspace as -ENOMEM or silently ignored in cases like
- disk readahead. For now OOM in memory cgroup kills
- tasks iff shortage has happened inside page fault.
-
This event is not raised if the OOM killer is not
considered as an option, e.g. for failed high-order
- allocations.
+ allocations or if caller asked to not retry attempts.
oom_kill
The number of processes belonging to this cgroup
diff --git a/Documentation/admin-guide/dynamic-debug-howto.rst b/Documentation/admin-guide/dynamic-debug-howto.rst
index 0dc2eb8e44e5..1012bd9305e9 100644
--- a/Documentation/admin-guide/dynamic-debug-howto.rst
+++ b/Documentation/admin-guide/dynamic-debug-howto.rst
@@ -13,6 +13,11 @@ kernel code to obtain additional kernel information. Currently, if
``print_hex_dump_debug()``/``print_hex_dump_bytes()`` calls can be dynamically
enabled per-callsite.
+If you do not want to enable dynamic debug globally (i.e. in some embedded
+system), you may set ``CONFIG_DYNAMIC_DEBUG_CORE`` as basic support of dynamic
+debug and add ``ccflags := -DDYNAMIC_DEBUG_MODULE`` into the Makefile of any
+modules which you'd like to dynamically debug later.
+
If ``CONFIG_DYNAMIC_DEBUG`` is not set, ``print_hex_dump_debug()`` is just
shortcut for ``print_hex_dump(KERN_DEBUG)``.
diff --git a/Documentation/admin-guide/kdump/kdump.rst b/Documentation/admin-guide/kdump/kdump.rst
index ac7e131d2935..2da65fef2a1c 100644
--- a/Documentation/admin-guide/kdump/kdump.rst
+++ b/Documentation/admin-guide/kdump/kdump.rst
@@ -521,6 +521,14 @@ will cause a kdump to occur at the panic() call. In cases where a user wants
to specify this during runtime, /proc/sys/kernel/panic_on_warn can be set to 1
to achieve the same behaviour.
+Trigger Kdump on add_taint()
+============================
+
+The kernel parameter panic_on_taint facilitates a conditional call to panic()
+from within add_taint() whenever the value set in this bitmask matches with the
+bit flag being set by add_taint().
+This will cause a kdump to occur at the add_taint()->panic() call.
+
Contact
=======
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index f3eeecbb3f63..ed9597e37415 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1445,7 +1445,7 @@
hardlockup_all_cpu_backtrace=
[KNL] Should the hard-lockup detector generate
backtraces on all cpus.
- Format: <integer>
+ Format: 0 | 1
hashdist= [KNL,NUMA] Large hashes allocated during boot
are distributed across NUMA nodes. Defaults on
@@ -1513,9 +1513,9 @@
hung_task_panic=
[KNL] Should the hung task detector generate panics.
- Format: <integer>
+ Format: 0 | 1
- A nonzero value instructs the kernel to panic when a
+ A value of 1 instructs the kernel to panic when a
hung task is detected. The default value is controlled
by the CONFIG_BOOTPARAM_HUNG_TASK_PANIC build-time
option. The value selected by this boot parameter can
@@ -3447,6 +3447,19 @@
bit 4: print ftrace buffer
bit 5: print all printk messages in buffer
+ panic_on_taint= Bitmask for conditionally calling panic() in add_taint()
+ Format: <hex>[,nousertaint]
+ Hexadecimal bitmask representing the set of TAINT flags
+ that will cause the kernel to panic when add_taint() is
+ called with any of the flags in this set.
+ The optional switch "nousertaint" can be utilized to
+ prevent userspace forced crashes by writing to sysctl
+ /proc/sys/kernel/tainted any flagset matching with the
+ bitmask set on panic_on_taint.
+ See Documentation/admin-guide/tainted-kernels.rst for
+ extra details on the taint flags that users can pick
+ to compose the bitmask to assign to panic_on_taint.
+
panic_on_warn panic() instead of WARN(). Useful to cause kdump
on a WARN().
@@ -4652,9 +4665,9 @@
softlockup_panic=
[KNL] Should the soft-lockup detector generate panics.
- Format: <integer>
+ Format: 0 | 1
- A nonzero value instructs the soft-lockup detector
+ A value of 1 instructs the soft-lockup detector
to panic the machine when a soft-lockup occurs. It is
also controlled by the kernel.softlockup_panic sysctl
and CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC, which is the
@@ -4663,7 +4676,7 @@
softlockup_all_cpu_backtrace=
[KNL] Should the soft-lockup detector generate
backtraces on all cpus.
- Format: <integer>
+ Format: 0 | 1
sonypi.*= [HW] Sony Programmable I/O Control Device driver
See Documentation/admin-guide/laptops/sonypi.rst
@@ -4956,6 +4969,15 @@
switches= [HW,M68k]
+ sysctl.*= [KNL]
+ Set a sysctl parameter, right before loading the init
+ process, as if the value was written to the respective
+ /proc/sys/... file. Both '.' and '/' are recognized as
+ separators. Unrecognized parameters and invalid values
+ are reported in the kernel log. Sysctls registered
+ later by a loaded module cannot be set this way.
+ Example: sysctl.vm.swappiness=40
+
sysfs.deprecated=0|1 [KNL]
Enable/disable old style sysfs layout for old udev
on older distributions. When this option is enabled
diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index 1ebf68d01141..83acf5025488 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -335,6 +335,20 @@ Path for the hotplug policy agent.
Default value is "``/sbin/hotplug``".
+hung_task_all_cpu_backtrace:
+================
+
+If this option is set, the kernel will send an NMI to all CPUs to dump
+their backtraces when a hung task is detected. This file shows up if
+CONFIG_DETECT_HUNG_TASK and CONFIG_SMP are enabled.
+
+0: Won't show all CPUs backtraces when a hung task is detected.
+This is the default behavior.
+
+1: Will non-maskably interrupt all CPUs and dump their backtraces when
+a hung task is detected.
+
+
hung_task_panic
===============
@@ -632,6 +646,22 @@ rate for each task.
scanned for a given scan.
+oops_all_cpu_backtrace:
+================
+
+If this option is set, the kernel will send an NMI to all CPUs to dump
+their backtraces when an oops event occurs. It should be used as a last
+resort in case a panic cannot be triggered (to protect VMs running, for
+example) or kdump can't be collected. This file shows up if CONFIG_SMP
+is enabled.
+
+0: Won't show all CPUs backtraces when an oops is detected.
+This is the default behavior.
+
+1: Will non-maskably interrupt all CPUs and dump their backtraces when
+an oops event is detected.
+
+
osrelease, ostype & version
===========================
@@ -1239,6 +1269,13 @@ ORed together. The letters are seen in "Tainted" line of Oops reports.
See :doc:`/admin-guide/tainted-kernels` for more information.
+Note:
+ writes to this sysctl interface will fail with ``EINVAL`` if the kernel is
+ booted with the command line option ``panic_on_taint=<bitmask>,nousertaint``
+ and any of the ORed together values being written to ``tainted`` match with
+ the bitmask declared on panic_on_taint.
+ See :doc:`/admin-guide/kernel-parameters` for more details on that particular
+ kernel command line option and its optional ``nousertaint`` switch.
threads-max
===========
diff --git a/Documentation/core-api/pin_user_pages.rst b/Documentation/core-api/pin_user_pages.rst
index 2e939ff10b86..6068266dd303 100644
--- a/Documentation/core-api/pin_user_pages.rst
+++ b/Documentation/core-api/pin_user_pages.rst
@@ -148,23 +148,46 @@ NOTE: Some pages, such as DAX pages, cannot be pinned with longterm pins. That's
because DAX pages do not have a separate page cache, and so "pinning" implies
locking down file system blocks, which is not (yet) supported in that way.
-CASE 3: Hardware with page faulting support
--------------------------------------------
-Here, a well-written driver doesn't normally need to pin pages at all. However,
-if the driver does choose to do so, it can register MMU notifiers for the range,
-and will be called back upon invalidation. Either way (avoiding page pinning, or
-using MMU notifiers to unpin upon request), there is proper synchronization with
-both filesystem and mm (page_mkclean(), munmap(), etc).
-
-Therefore, neither flag needs to be set.
-
-In this case, ideally, neither get_user_pages() nor pin_user_pages() should be
-called. Instead, the software should be written so that it does not pin pages.
-This allows mm and filesystems to operate more efficiently and reliably.
+CASE 3: MMU notifier registration, with or without page faulting hardware
+-------------------------------------------------------------------------
+Device drivers can pin pages via get_user_pages*(), and register for mmu
+notifier callbacks for the memory range. Then, upon receiving a notifier
+"invalidate range" callback , stop the device from using the range, and unpin
+the pages. There may be other possible schemes, such as for example explicitly
+synchronizing against pending IO, that accomplish approximately the same thing.
+
+Or, if the hardware supports replayable page faults, then the device driver can
+avoid pinning entirely (this is ideal), as follows: register for mmu notifier
+callbacks as above, but instead of stopping the device and unpinning in the
+callback, simply remove the range from the device's page tables.
+
+Either way, as long as the driver unpins the pages upon mmu notifier callback,
+then there is proper synchronization with both filesystem and mm
+(page_mkclean(), munmap(), etc). Therefore, neither flag needs to be set.
CASE 4: Pinning for struct page manipulation only
-------------------------------------------------
-Here, normal GUP calls are sufficient, so neither flag needs to be set.
+If only struct page data (as opposed to the actual memory contents that a page
+is tracking) is affected, then normal GUP calls are sufficient, and neither flag
+needs to be set.
+
+CASE 5: Pinning in order to write to the data within the page
+-------------------------------------------------------------
+Even though neither DMA nor Direct IO is involved, just a simple case of "pin,
+write to a page's data, unpin" can cause a problem. Case 5 may be considered a
+superset of Case 1, plus Case 2, plus anything that invokes that pattern. In
+other words, if the code is neither Case 1 nor Case 2, it may still require
+FOLL_PIN, for patterns like this:
+
+Correct (uses FOLL_PIN calls):
+ pin_user_pages()
+ write to the data within the pages
+ unpin_user_pages()
+
+INCORRECT (uses FOLL_GET calls):
+ get_user_pages()
+ write to the data within the pages
+ put_page()
page_maybe_dma_pinned(): the whole point of pinning
===================================================