Diffstat (limited to 'Documentation/core-api')
-rw-r--r-- | Documentation/core-api/cachetlb.rst | 2 |
-rw-r--r-- | Documentation/core-api/padata.rst | 41 |
-rw-r--r-- | Documentation/core-api/pin_user_pages.rst | 51 |
-rw-r--r-- | Documentation/core-api/printk-formats.rst | 36 |
-rw-r--r-- | Documentation/core-api/protection-keys.rst | 5 |
5 files changed, 98 insertions, 37 deletions
diff --git a/Documentation/core-api/cachetlb.rst b/Documentation/core-api/cachetlb.rst
index 93cb65d52720..a1582cc79f0f 100644
--- a/Documentation/core-api/cachetlb.rst
+++ b/Documentation/core-api/cachetlb.rst
@@ -213,7 +213,7 @@ Here are the routines, one by one:
 	there will be no entries in the cache for the kernel address
 	space for virtual addresses in the range 'start' to 'end-1'.
 
-	The first of these two routines is invoked after map_vm_area()
+	The first of these two routines is invoked after map_kernel_range()
 	has installed the page table entries. The second is invoked
 	before unmap_kernel_range() deletes the page table entries.
 
diff --git a/Documentation/core-api/padata.rst b/Documentation/core-api/padata.rst
index 9a24c111781d..0830e5b0e821 100644
--- a/Documentation/core-api/padata.rst
+++ b/Documentation/core-api/padata.rst
@@ -4,23 +4,26 @@
 The padata parallel execution mechanism
 =======================================
 
-:Date: December 2019
+:Date: May 2020
 
 Padata is a mechanism by which the kernel can farm jobs out to be done in
-parallel on multiple CPUs while retaining their ordering. It was developed for
-use with the IPsec code, which needs to be able to perform encryption and
-decryption on large numbers of packets without reordering those packets. The
-crypto developers made a point of writing padata in a sufficiently general
-fashion that it could be put to other uses as well.
+parallel on multiple CPUs while optionally retaining their ordering.
 
-Usage
-=====
+It was originally developed for IPsec, which needs to perform encryption and
+decryption on large numbers of packets without reordering those packets. This
+is currently the sole consumer of padata's serialized job support.
+
+Padata also supports multithreaded jobs, splitting up the job evenly while load
+balancing and coordinating between threads.
+
+Running Serialized Jobs
+=======================
 
 Initializing
 ------------
 
-The first step in using padata is to set up a padata_instance structure for
-overall control of how jobs are to be run::
+The first step in using padata to run serialized jobs is to set up a
+padata_instance structure for overall control of how jobs are to be run::
 
     #include <linux/padata.h>
 
@@ -162,6 +165,24 @@ functions that correspond to the allocation in reverse::
 It is the user's responsibility to ensure all outstanding jobs are complete
 before any of the above are called.
 
+Running Multithreaded Jobs
+==========================
+
+A multithreaded job has a main thread and zero or more helper threads, with the
+main thread participating in the job and then waiting until all helpers have
+finished. padata splits the job into units called chunks, where a chunk is a
+piece of the job that one thread completes in one call to the thread function.
+
+A user has to do three things to run a multithreaded job. First, describe the
+job by defining a padata_mt_job structure, which is explained in the Interface
+section. This includes a pointer to the thread function, which padata will
+call each time it assigns a job chunk to a thread. Then, define the thread
+function, which accepts three arguments, ``start``, ``end``, and ``arg``, where
+the first two delimit the range that the thread operates on and the last is a
+pointer to the job's shared state, if any. Prepare the shared state, which is
+typically allocated on the main thread's stack. Last, call
+padata_do_multithreaded(), which will return once the job is finished.
+
 Interface
 =========
 
diff --git a/Documentation/core-api/pin_user_pages.rst b/Documentation/core-api/pin_user_pages.rst
index 2e939ff10b86..6068266dd303 100644
--- a/Documentation/core-api/pin_user_pages.rst
+++ b/Documentation/core-api/pin_user_pages.rst
@@ -148,23 +148,46 @@ NOTE: Some pages, such as DAX pages, cannot be pinned with longterm pins. That's
 because DAX pages do not have a separate page cache, and so "pinning" implies
 locking down file system blocks, which is not (yet) supported in that way.
 
-CASE 3: Hardware with page faulting support
--------------------------------------------
-Here, a well-written driver doesn't normally need to pin pages at all. However,
-if the driver does choose to do so, it can register MMU notifiers for the range,
-and will be called back upon invalidation. Either way (avoiding page pinning, or
-using MMU notifiers to unpin upon request), there is proper synchronization with
-both filesystem and mm (page_mkclean(), munmap(), etc).
-
-Therefore, neither flag needs to be set.
-
-In this case, ideally, neither get_user_pages() nor pin_user_pages() should be
-called. Instead, the software should be written so that it does not pin pages.
-This allows mm and filesystems to operate more efficiently and reliably.
+CASE 3: MMU notifier registration, with or without page faulting hardware
+-------------------------------------------------------------------------
+Device drivers can pin pages via get_user_pages*(), and register for mmu
+notifier callbacks for the memory range. Then, upon receiving a notifier
+"invalidate range" callback , stop the device from using the range, and unpin
+the pages. There may be other possible schemes, such as for example explicitly
+synchronizing against pending IO, that accomplish approximately the same thing.
+
+Or, if the hardware supports replayable page faults, then the device driver can
+avoid pinning entirely (this is ideal), as follows: register for mmu notifier
+callbacks as above, but instead of stopping the device and unpinning in the
+callback, simply remove the range from the device's page tables.
+
+Either way, as long as the driver unpins the pages upon mmu notifier callback,
+then there is proper synchronization with both filesystem and mm
+(page_mkclean(), munmap(), etc). Therefore, neither flag needs to be set.
 
 CASE 4: Pinning for struct page manipulation only
 -------------------------------------------------
-Here, normal GUP calls are sufficient, so neither flag needs to be set.
+If only struct page data (as opposed to the actual memory contents that a page
+is tracking) is affected, then normal GUP calls are sufficient, and neither flag
+needs to be set.
+
+CASE 5: Pinning in order to write to the data within the page
+-------------------------------------------------------------
+Even though neither DMA nor Direct IO is involved, just a simple case of "pin,
+write to a page's data, unpin" can cause a problem. Case 5 may be considered a
+superset of Case 1, plus Case 2, plus anything that invokes that pattern. In
+other words, if the code is neither Case 1 nor Case 2, it may still require
+FOLL_PIN, for patterns like this:
+
+Correct (uses FOLL_PIN calls):
+    pin_user_pages()
+    write to the data within the pages
+    unpin_user_pages()
+
+INCORRECT (uses FOLL_GET calls):
+    get_user_pages()
+    write to the data within the pages
+    put_page()
 
 page_maybe_dma_pinned(): the whole point of pinning
 ===================================================
diff --git a/Documentation/core-api/printk-formats.rst b/Documentation/core-api/printk-formats.rst
index 1e3838652348..8c9aba262b1e 100644
--- a/Documentation/core-api/printk-formats.rst
+++ b/Documentation/core-api/printk-formats.rst
@@ -114,6 +114,20 @@ used when printing stack backtraces. The specifier takes into
 consideration the effect of compiler optimisations which may occur
 when tail-calls are used and marked with the noreturn GCC attribute.
 
+Probed Pointers from BPF / tracing
+----------------------------------
+
+::
+
+	%pks	kernel string
+	%pus	user string
+
+The ``k`` and ``u`` specifiers are used for printing prior probed memory from
+either kernel memory (k) or user memory (u). The subsequent ``s`` specifier
+results in printing a string. For direct use in regular vsnprintf() the (k)
+and (u) annotation is ignored, however, when used out of BPF's bpf_trace_printk(),
+for example, it reads the memory it is pointing to without faulting.
+
 Kernel Pointers
 ---------------
 
@@ -470,21 +484,23 @@ Examples (OF)::
 	%pfwf	/ocp@68000000/i2c@48072000/camera@10/port/endpoint - Full name
 	%pfwP	endpoint - Node name
 
-Time and date (struct rtc_time)
--------------------------------
+Time and date
+-------------
 
 ::
 
-	%ptR		YYYY-mm-ddTHH:MM:SS
-	%ptRd		YYYY-mm-dd
-	%ptRt		HH:MM:SS
-	%ptR[dt][r]
+	%pt[RT]		YYYY-mm-ddTHH:MM:SS
+	%pt[RT]d	YYYY-mm-dd
+	%pt[RT]t	HH:MM:SS
+	%pt[RT][dt][r]
 
-For printing date and time as represented by struct rtc_time structure in
-human readable format.
+For printing date and time as represented by
+	R  struct rtc_time structure
+	T  time64_t type
+in human readable format.
 
-By default year will be incremented by 1900 and month by 1. Use %ptRr (raw)
-to suppress this behaviour.
+By default year will be incremented by 1900 and month by 1.
+Use %pt[RT]r (raw) to suppress this behaviour.
 
 Passed by reference.
 
diff --git a/Documentation/core-api/protection-keys.rst b/Documentation/core-api/protection-keys.rst
index 49d9833af871..ec575e72d0b2 100644
--- a/Documentation/core-api/protection-keys.rst
+++ b/Documentation/core-api/protection-keys.rst
@@ -5,8 +5,9 @@ Memory Protection Keys
 ======================
 
 Memory Protection Keys for Userspace (PKU aka PKEYs) is a feature
-which is found on Intel's Skylake "Scalable Processor" Server CPUs.
-It will be avalable in future non-server parts.
+which is found on Intel's Skylake (and later) "Scalable Processor"
+Server CPUs. It will be available in future non-server Intel parts
+and future AMD processors.
 
 For anyone wishing to test or use this feature, it is available in
 Amazon's EC2 C5 instances and is known to work there using an Ubuntu