summaryrefslogtreecommitdiff
path: root/mm
AgeCommit message (Collapse)Author
2020-05-04mm: fix s390 compat build errorMinchan Kim
Nathan reported build error with sys_compat_process_madvise. This patch should fix it. Link: http://lkml.kernel.org/r/20200429012421.GA132200@google.com Signed-off-by: Minchan Kim <minchan@kernel.org> Reported-by: Nathan Chancellor <natechancellor@gmail.com> Tested-by: Nathan Chancellor <natechancellor@gmail.com> [build] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2020-05-04mm/madvise: make function 'do_process_madvise' staticZheng Bin
Fix sparse warnings: mm/madvise.c:1233:9: warning: symbol 'do_process_madvise' was not declared. Should it be static? Link: http://lkml.kernel.org/r/20200429014030.41147-1-zhengbin13@huawei.com Signed-off-by: Zheng Bin <zhengbin13@huawei.com> Reported-by: Hulk Robot <hulkci@huawei.com> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2020-05-04mm: support compat_sys_process_madviseMinchan Kim
This patch supports compat syscall for process_madvise Link: http://lkml.kernel.org/r/20200423195835.GA46847@google.com Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Arjun Roy <arjunroy@google.com> Cc: Brian Geffon <bgeffon@google.com> Cc: Daniel Colascione <dancol@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Dias <joaodias@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Oleksandr Natalenko <oleksandr@redhat.com> Cc: Sandeep Patil <sspatil@google.com> Cc: SeongJae Park <sj38.park@gmail.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Sonny Rao <sonnyrao@google.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tim Murray <timmurray@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2020-05-04mm: support vector address ranges for process_madviseMinchan Kim
This patch extends a) process_madvise(2) support vector address ranges in a system call and then b) support the vector address ranges to local process as well as external process. Android app has thousands of vmas due to zygote so it's totally waste of CPU and power if we should call the syscall one by one for each vma. (With testing 2000-vma syscall vs 1-vector syscall, it showed 15% performance improvement. I think it would be bigger in real practice because the testing ran very cache friendly environment). Another potential use case for the vector range is to amortize the cost of TLB shootdowns for multiple ranges when using MADV_DONTNEED; this could benefit users like TCP receive zerocopy and malloc implementations. In future, we could find more usecases for other advises so let's make it happens as API since we introduce a new syscall at this moment. With that, existing madvise(2) user could replace it with process_madvise(2) with their own pid if they want to have batch address ranges support feature. So finally, the API is as follows, ssize_t process_madvise(idtype_t idtype, id_t id, const struct iovec *iovec, unsigned long vlen, int advice, unsigned long flags); DESCRIPTION The process_madvise() system call is used to give advice or directions to the kernel about the address ranges from external process as well as local process. It provides the advice to address ranges of process described by iovec and vlen. The goal of such advice is to improve system or application performance. The idtype and id arguments select the target process to be advised as follows: idtype == P_PID select the process whose process ID matches id idtype == P_PIDFD select the process referred to by the PID file descriptor specified in id. (See pidofd_open(2) for further information) The pointer iovec points to an array of iovec structures, defined in <sys/uio.h> as: struct iovec { void *iov_base; /* starting address */ size_t iov_len; /* number of bytes to be advised */ }; The iovec describes address ranges beginning at address(iov_base) and with size length of bytes(iov_len). The vlen represents the number of elements in iovec. The advice is indicated in the advice argument, which is one of the following at this moment if the target process specified by idtype and id is external. MADV_COLD MADV_PAGEOUT MADV_MERGEABLE MADV_UNMERGEABLE Permission to provide a hint to external process is governed by a ptrace access mode PTRACE_MODE_ATTACH_FSCREDS check; see ptrace(2). The process_madvise supports every advice madvise(2) has if target process is in same thread group with calling process so user could use process_madvise(2) to extend existing madvise(2) to support vector address ranges. RETURN VALUE On success, process_madvise() returns the number of bytes advised. This return value may be less than the total number of requested bytes, if an error occurred. The caller should check return value to determine whether a partial advice occurred. Link: http://lkml.kernel.org/r/20200423145215.72666-2-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Arjun Roy <arjunroy@google.com> Cc: Tim Murray <timmurray@google.com> Cc: Daniel Colascione <dancol@google.com> Cc: Sonny Rao <sonnyrao@google.com> Cc: Brian Geffon <bgeffon@google.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: John Dias <joaodias@google.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: SeongJae Park <sj38.park@gmail.com> Cc: Oleksandr Natalenko <oleksandr@redhat.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Sandeep Patil <sspatil@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2020-05-04mm/madvise: allow KSM hints for remote APIOleksandr Natalenko
It all began with the fact that KSM works only on memory that is marked by madvise(). And the only way to get around that is to either: * use LD_PRELOAD; or * patch the kernel with something like UKSM or PKSM. (i skip ptrace can of worms here intentionally) To overcome this restriction, lets employ a new remote madvise API. This can be used by some small userspace helper daemon that will do auto-KSM job for us. I think of two major consumers of remote KSM hints: * hosts, that run containers, especially similar ones and especially in a trusted environment, sharing the same runtime like Node.js; * heavy applications, that can be run in multiple instances, not limited to opensource ones like Firefox, but also those that cannot be modified since they are binary-only and, maybe, statically linked. Speaking of statistics, more numbers can be found in the very first submission, that is related to this one [1]. For my current setup with two Firefox instances I get 100 to 200 MiB saved for the second instance depending on the amount of tabs. 1 FF instance with 15 tabs: $ echo "$(cat /sys/kernel/mm/ksm/pages_sharing) * 4 / 1024" | bc 410 2 FF instances, second one has 12 tabs (all the tabs are different): $ echo "$(cat /sys/kernel/mm/ksm/pages_sharing) * 4 / 1024" | bc 592 At the very moment I do not have specific numbers for containerised workload, but those should be comparable in case the containers share similar/same runtime. [1] https://lore.kernel.org/patchwork/patch/1012142/ Link: http://lkml.kernel.org/r/20200302193630.68771-8-minchan@kernel.org Signed-off-by: Oleksandr Natalenko <oleksandr@redhat.com> Signed-off-by: Minchan Kim <minchan@kernel.org> Reviewed-by: SeongJae Park <sjpark@amazon.de> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Brian Geffon <bgeffon@google.com> Cc: Christian Brauner <christian@brauner.io> Cc: Daniel Colascione <dancol@google.com> Cc: Jann Horn <jannh@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Dias <joaodias@google.com> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Sandeep Patil <sspatil@google.com> Cc: SeongJae Park <sj38.park@gmail.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Sonny Rao <sonnyrao@google.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tim Murray <timmurray@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <linux-man@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2020-05-04mm/madvise: support both pid and pidfd for process_madviseMinchan Kim
There is a demand[1] to support pid as well pidfd for process_madvise to reduce unnecessary syscall to get pidfd if the user has control of the target process(ie, they could guarantee the process is not gone or pid is not reused). This patch aims for supporting both options like waitid(2). So, the syscall is currently, int process_madvise(int which, pid_t pid, void *addr, size_t length, int advise, unsigned long flag); @which is actually idtype_t for userspace libray and currently, it supports P_PID and P_PIDFD. [1] https://lore.kernel.org/linux-mm/9d849087-3359-c4ab-fbec-859e8186c509@virtuozzo.com/ Link: http://lkml.kernel.org/r/20200302193630.68771-6-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Suggested-by: Kirill Tkhai <ktkhai@virtuozzo.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Christian Brauner <christian@brauner.io> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Brian Geffon <bgeffon@google.com> Cc: Daniel Colascione <dancol@google.com> Cc: Jann Horn <jannh@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Dias <joaodias@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Oleksandr Natalenko <oleksandr@redhat.com> Cc: Sandeep Patil <sspatil@google.com> Cc: SeongJae Park <sj38.park@gmail.com> Cc: SeongJae Park <sjpark@amazon.de> Cc: Shakeel Butt <shakeelb@google.com> Cc: Sonny Rao <sonnyrao@google.com> Cc: Tim Murray <timmurray@google.com> Cc: <linux-man@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2020-05-04mm/madvise: check fatal signal pending of target processMinchan Kim
Bail out to prevent unnecessary CPU overhead if target process has pending fatal signal during (MADV_COLD|MADV_PAGEOUT) operation. Link: http://lkml.kernel.org/r/20200302193630.68771-4-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Brian Geffon <bgeffon@google.com> Cc: Christian Brauner <christian@brauner.io> Cc: Daniel Colascione <dancol@google.com> Cc: Jann Horn <jannh@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Dias <joaodias@google.com> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Oleksandr Natalenko <oleksandr@redhat.com> Cc: Sandeep Patil <sspatil@google.com> Cc: SeongJae Park <sj38.park@gmail.com> Cc: SeongJae Park <sjpark@amazon.de> Cc: Shakeel Butt <shakeelb@google.com> Cc: Sonny Rao <sonnyrao@google.com> Cc: Tim Murray <timmurray@google.com> Cc: <linux-man@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2020-05-04mm/madvise: introduce process_madvise() syscall: an external memory hinting APIMinchan Kim
There is usecase that System Management Software(SMS) want to give a memory hint like MADV_[COLD|PAGEEOUT] to other processes and in the case of Android, it is the ActivityManagerService. It's similar in spirit to madvise(MADV_WONTNEED), but the information required to make the reclaim decision is not known to the app. Instead, it is known to the centralized userspace daemon(ActivityManagerService), and that daemon must be able to initiate reclaim on its own without any app involvement. To solve the issue, this patch introduces a new syscall process_madvise(2). It uses pidfd of an external process to give the hint. int process_madvise(int pidfd, void *addr, size_t length, int advise, unsigned long flag); Since it could affect other process's address range, only privileged process(CAP_SYS_PTRACE) or something else(e.g., being the same UID) gives it the right to ptrace the process could use it successfully. The flag argument is reserved for future use if we need to extend the API. I think supporting all hints madvise has/will supported/support to process_madvise is rather risky. Because we are not sure all hints make sense from external process and implementation for the hint may rely on the caller being in the current context so it could be error-prone. Thus, I just limited hints as MADV_[COLD|PAGEOUT] in this patch. If someone want to add other hints, we could hear hear the usecase and review it for each hint. It's safer for maintenance rather than introducing a buggy syscall but hard to fix it later. Q.1 - Why does any external entity have better knowledge? Quote from Sandeep "For Android, every application (including the special SystemServer) are forked from Zygote. The reason of course is to share as many libraries and classes between the two as possible to benefit from the preloading during boot. After applications start, (almost) all of the APIs end up calling into this SystemServer process over IPC (binder) and back to the application. In a fully running system, the SystemServer monitors every single process periodically to calculate their PSS / RSS and also decides which process is "important" to the user for interactivity. So, because of how these processes start _and_ the fact that the SystemServer is looping to monitor each process, it does tend to *know* which address range of the application is not used / useful. Besides, we can never rely on applications to clean things up themselves. We've had the "hey app1, the system is low on memory, please trim your memory usage down" notifications for a long time[1]. They rely on applications honoring the broadcasts and very few do. So, if we want to avoid the inevitable killing of the application and restarting it, some way to be able to tell the OS about unimportant memory in these applications will be useful. - ssp Q.2 - How to guarantee the race(i.e., object validation) between when giving a hint from an external process and get the hint from the target process? process_madvise operates on the target process's address space as it exists at the instant that process_madvise is called. If the space target process can run between the time the process_madvise process inspects the target process address space and the time that process_madvise is actually called, process_madvise may operate on memory regions that the calling process does not expect. It's the responsibility of the process calling process_madvise to close this race condition. For example, the calling process can suspend the target process with ptrace, SIGSTOP, or the freezer cgroup so that it doesn't have an opportunity to change its own address space before process_madvise is called. Another option is to operate on memory regions that the caller knows a priori will be unchanged in the target process. Yet another option is to accept the race for certain process_madvise calls after reasoning that mistargeting will do no harm. The suggested API itself does not provide synchronization. It also apply other APIs like move_pages, process_vm_write. The race isn't really a problem though. Why is it so wrong to require that callers do their own synchronization in some manner? Nobody objects to write(2) merely because it's possible for two processes to open the same file and clobber each other's writes --- instead, we tell people to use flock or something. Think about mmap. It never guarantees newly allocated address space is still valid when the user tries to access it because other threads could unmap the memory right before. That's where we need synchronization by using other API or design from userside. It shouldn't be part of API itself. If someone needs more fine-grained synchronization rather than process level, there were two ideas suggested - cookie[2] and anon-fd[3]. Both are applicable via using last reserved argument of the API but I don't think it's necessary right now since we have already ways to prevent the race so don't want to add additional complexity with more fine-grained optimization model. To make the API extend, it reserved an unsigned long as last argument so we could support it in future if someone really needs it. Q.3 - Why doesn't ptrace work? Injecting an madvise in the target process using ptrace would not work for us because such injected madvise would have to be executed by the target process, which means that process would have to be runnable and that creates the risk of the abovementioned race and hinting a wrong VMA. Furthermore, we want to act the hint in caller's context, not the callee's, because the callee is usually limited in cpuset/cgroups or even freezed state so they can't act by themselves quick enough, which causes more thrashing/kill. It doesn't work if the target process are ptraced(e.g., strace, debugger, minidump) because a process can have at most one ptracer. [1] https://developer.android.com/topic/performance/memory" [2] process_getinfo for getting the cookie which is updated whenever vma of process address layout are changed - Daniel Colascione - https://lore.kernel.org/lkml/20190520035254.57579-1-minchan@kernel.org/T/#m7694416fd179b2066a2c62b5b139b14e3894e224 [3] anonymous fd which is used for the object(i.e., address range) validation - Michal Hocko - https://lore.kernel.org/lkml/20200120112722.GY18451@dhcp22.suse.cz/ Link: http://lkml.kernel.org/r/20200302193630.68771-3-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Brian Geffon <bgeffon@google.com> Cc: Christian Brauner <christian@brauner.io> Cc: Daniel Colascione <dancol@google.com> Cc: Jann Horn <jannh@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Dias <joaodias@google.com> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Oleksandr Natalenko <oleksandr@redhat.com> Cc: Sandeep Patil <sspatil@google.com> Cc: SeongJae Park <sj38.park@gmail.com> Cc: SeongJae Park <sjpark@amazon.de> Cc: Shakeel Butt <shakeelb@google.com> Cc: Sonny Rao <sonnyrao@google.com> Cc: Tim Murray <timmurray@google.com> Cc: <linux-man@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2020-05-04mm/madvise: pass task and mm to do_madviseMinchan Kim
Patch series "introduce memory hinting API for external process", v7. Now, we have MADV_PAGEOUT and MADV_COLD as madvise hinting API. With that, application could give hints to kernel what memory range are preferred to be reclaimed. However, in some platform(e.g., Android), the information required to make the hinting decision is not known to the app. Instead, it is known to a centralized userspace daemon(e.g., ActivityManagerService), and that daemon must be able to initiate reclaim on its own without any app involvement. To solve the concern, this patch introduces new syscall - process_madvise(2). Bascially, it's same with madvise(2) syscall but it has some differences. 1. It needs pidfd of target process to provide the hint 2. It supports only MADV_{COLD|PAGEOUT|MERGEABLE|UNMEREABLE} at this moment. Other hints in madvise will be opened when there are explicit requests from community to prevent unexpected bugs we couldn't support. 3. Only privileged processes can do something for other process's address space. For more detail of the new API, please see "mm: introduce external memory hinting API" description in this patchset. This patch (of 7): In upcoming patches, do_madvise will be called from external process context so we shouldn't asssume "current" is always hinted process's task_struct. Furthermore, we must not access mm_struct via task->mm, but obtain it via access_mm() once (in the following patch) and only use that pointer [1], so pass it to do_madvise() as well. Note the vma->vm_mm pointers are safe, so we can use them further down the call stack. And let's pass *current* and current->mm as arguments of do_madvise so it shouldn't change existing behavior but prepare next patch to make review easy. Note: io_madvise pass NULL as target_task argument of do_madvise because it couldn't know who is target. [1] http://lore.kernel.org/r/CAG48ez27=pwm5m_N_988xT1huO7g7h6arTQL44zev6TD-h-7Tg@mail.gmail.com [vbabka@suse.cz: changelog tweak] [minchan@kernel.org: use current->mm for io_uring] Link: http://lkml.kernel.org/r/20200423145215.72666-1-minchan@kernel.org [akpm@linux-foundation.org: fix it for upstream changes] [akpm@linux-foundation.org: whoops] [rdunlap@infradead.org: add missing includes] Link: http://lkml.kernel.org/r/20200302193630.68771-2-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jann Horn <jannh@google.com> Cc: Tim Murray <timmurray@google.com> Cc: Daniel Colascione <dancol@google.com> Cc: Sandeep Patil <sspatil@google.com> Cc: Sonny Rao <sonnyrao@google.com> Cc: Brian Geffon <bgeffon@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: John Dias <joaodias@google.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: SeongJae Park <sj38.park@gmail.com> Cc: Christian Brauner <christian@brauner.io> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Oleksandr Natalenko <oleksandr@redhat.com> Cc: SeongJae Park <sjpark@amazon.de> Cc: <linux-man@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2020-05-04mm/mmap.c: do not allow mappings outside of allowed limitsAlexander Gordeev
One can set a lowest possible address in /proc/sys/vm/mmap_min_addr and mmap below that bound nevertheless. It is possible to request a fixed mapping address below mmap_min_addr and succeed. This update adds early checks of mmap_min_addr and mmap_end boundaries and fixes the above issue. Apart from it is wrong I am not aware of any existing issue. Link: http://lkml.kernel.org/r/d6da1319114a331095052638f0ffa3ccb0be58f1.1584958099.git.agordeev@linux.ibm.com Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2020-05-04mm/mmap.c: add more sanity checks to get_unmapped_area()Alexander Gordeev
Generic get_unmapped_area() function does sanity checks of address and length of the area to be mapped. Yet, it lacks checking against mmap_min_addr and mmap_end limits. At the same time the default implementation of functions arch_get_unmapped_area[_topdown]() and some architecture callbacks do mmap_min_addr and mmap_end checks on their own. Put additional checks into the generic code and do not let architecture callbacks to get away with a possible area outside of the allowed limits. That could also relieve arch_get_unmapped_area[_topdown]() callbacks of own address and length sanity checks. Link: http://lkml.kernel.org/r/d14f2cff3c891ef2c4b0337d737c6f04beacb124.1584958099.git.agordeev@linux.ibm.com Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2020-05-04mm/swap.c: annotate data races for lru_rotate_pvecsQian Cai
Read to lru_add_pvec->nr could be interrupted and then write to the same variable. The write has local interrupt disabled, but the plain reads result in data races. However, it is unlikely the compilers could do much damage here given that lru_add_pvec->nr is a "unsigned char" and there is an existing compiler barrier. Thus, annotate the reads using the data_race() macro. The data races were reported by KCSAN, BUG: KCSAN: data-race in lru_add_drain_cpu / rotate_reclaimable_page write to 0xffff9291ebcb8a40 of 1 bytes by interrupt on cpu 23: rotate_reclaimable_page+0x2df/0x490 pagevec_add at include/linux/pagevec.h:81 (inlined by) rotate_reclaimable_page at mm/swap.c:259 end_page_writeback+0x1b5/0x2b0 end_swap_bio_write+0x1d0/0x280 bio_endio+0x297/0x560 dec_pending+0x218/0x430 [dm_mod] clone_endio+0xe4/0x2c0 [dm_mod] bio_endio+0x297/0x560 blk_update_request+0x201/0x920 scsi_end_request+0x6b/0x4a0 scsi_io_completion+0xb7/0x7e0 scsi_finish_command+0x1ed/0x2a0 scsi_softirq_done+0x1c9/0x1d0 blk_done_softirq+0x181/0x1d0 __do_softirq+0xd9/0x57c irq_exit+0xa2/0xc0 do_IRQ+0x8b/0x190 ret_from_intr+0x0/0x42 delay_tsc+0x46/0x80 __const_udelay+0x3c/0x40 __udelay+0x10/0x20 kcsan_setup_watchpoint+0x202/0x3a0 __tsan_read1+0xc2/0x100 lru_add_drain_cpu+0xb8/0x3f0 lru_add_drain+0x25/0x40 shrink_active_list+0xe1/0xc80 shrink_lruvec+0x766/0xb70 shrink_node+0x2d6/0xca0 do_try_to_free_pages+0x1f7/0x9a0 try_to_free_pages+0x252/0x5b0 __alloc_pages_slowpath+0x458/0x1290 __alloc_pages_nodemask+0x3bb/0x450 alloc_pages_vma+0x8a/0x2c0 do_anonymous_page+0x16e/0x6f0 __handle_mm_fault+0xcd5/0xd40 handle_mm_fault+0xfc/0x2f0 do_page_fault+0x263/0x6f9 page_fault+0x34/0x40 read to 0xffff9291ebcb8a40 of 1 bytes by task 37761 on cpu 23: lru_add_drain_cpu+0xb8/0x3f0 lru_add_drain_cpu at mm/swap.c:602 lru_add_drain+0x25/0x40 shrink_active_list+0xe1/0xc80 shrink_lruvec+0x766/0xb70 shrink_node+0x2d6/0xca0 do_try_to_free_pages+0x1f7/0x9a0 try_to_free_pages+0x252/0x5b0 __alloc_pages_slowpath+0x458/0x1290 __alloc_pages_nodemask+0x3bb/0x450 alloc_pages_vma+0x8a/0x2c0 do_anonymous_page+0x16e/0x6f0 __handle_mm_fault+0xcd5/0xd40 handle_mm_fault+0xfc/0x2f0 do_page_fault+0x263/0x6f9 page_fault+0x34/0x40 2 locks held by oom02/37761: #0: ffff9281e5928808 (&mm->mmap_sem#2){++++}, at: do_page_fault #1: ffffffffb3ade380 (fs_reclaim){+.+.}, at: fs_reclaim_acquire.part irq event stamp: 1949217 trace_hardirqs_on_thunk+0x1a/0x1c __do_softirq+0x2e7/0x57c __do_softirq+0x34c/0x57c irq_exit+0xa2/0xc0 Reported by Kernel Concurrency Sanitizer on: CPU: 23 PID: 37761 Comm: oom02 Not tainted 5.6.0-rc3-next-20200226+ #6 Hardware name: HP ProLiant BL660c Gen9, BIOS I38 10/17/2018 Link: http://lkml.kernel.org/r/20200228044018.1263-1-cai@lca.pw Signed-off-by: Qian Cai <cai@lca.pw> Acked-by: Marco Elver <elver@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2020-05-04mm/rmap: annotate a data race at tlb_flush_batchedQian Cai
mm->tlb_flush_batched could be accessed concurrently as noticed by KCSAN, BUG: KCSAN: data-race in flush_tlb_batched_pending / try_to_unmap_one write to 0xffff93f754880bd0 of 1 bytes by task 822 on cpu 6: try_to_unmap_one+0x59a/0x1ab0 set_tlb_ubc_flush_pending at mm/rmap.c:635 (inlined by) try_to_unmap_one at mm/rmap.c:1538 rmap_walk_anon+0x296/0x650 rmap_walk+0xdf/0x100 try_to_unmap+0x18a/0x2f0 shrink_page_list+0xef6/0x2870 shrink_inactive_list+0x316/0x880 shrink_lruvec+0x8dc/0x1380 shrink_node+0x317/0xd80 balance_pgdat+0x652/0xd90 kswapd+0x396/0x8d0 kthread+0x1e0/0x200 ret_from_fork+0x27/0x50 read to 0xffff93f754880bd0 of 1 bytes by task 6364 on cpu 4: flush_tlb_batched_pending+0x29/0x90 flush_tlb_batched_pending at mm/rmap.c:682 change_p4d_range+0x5dd/0x1030 change_pte_range at mm/mprotect.c:44 (inlined by) change_pmd_range at mm/mprotect.c:212 (inlined by) change_pud_range at mm/mprotect.c:240 (inlined by) change_p4d_range at mm/mprotect.c:260 change_protection+0x222/0x310 change_prot_numa+0x3e/0x60 task_numa_work+0x219/0x350 task_work_run+0xed/0x140 prepare_exit_to_usermode+0x2cc/0x2e0 ret_from_intr+0x32/0x42 Reported by Kernel Concurrency Sanitizer on: CPU: 4 PID: 6364 Comm: mtest01 Tainted: G W L 5.5.0-next-20200210+ #5 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019 flush_tlb_batched_pending() is under PTL but the write is not, but mm->tlb_flush_batched is only a bool type, so the value is unlikely to be shattered. Thus, mark it as an intentional data race by using the data race macro. Link: http://lkml.kernel.org/r/1581450783-8262-1-git-send-email-cai@lca.pw Signed-off-by: Qian Cai <cai@lca.pw> Cc: Marco Elver <elver@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2020-05-04mm/util.c: annotate an data race at vm_committed_asQian Cai
"vm_committed_as.count" could be accessed concurrently as reported by KCSAN, read to 0xffffffff923164f8 of 8 bytes by task 1268 on cpu 38: __vm_enough_memory+0x43/0x280 mm/util.c:801 mmap_region+0x1b2/0xb90 mm/mmap.c:1726 do_mmap+0x45c/0x700 vm_mmap_pgoff+0xc0/0x130 vm_mmap+0x71/0x90 elf_map+0xa1/0x1b0 load_elf_binary+0x9de/0x2180 search_binary_handler+0xd8/0x2b0 __do_execve_file+0xb61/0x1080 __x64_sys_execve+0x5f/0x70 do_syscall_64+0x91/0xb47 entry_SYSCALL_64_after_hwframe+0x49/0xbe write to 0xffffffff923164f8 of 8 bytes by task 1265 on cpu 41: percpu_counter_add_batch+0x83/0xd0 lib/percpu_counter.c:91 exit_mmap+0x178/0x220 include/linux/mman.h:68 mmput+0x10e/0x270 flush_old_exec+0x572/0xfe0 load_elf_binary+0x467/0x2180 search_binary_handler+0xd8/0x2b0 __do_execve_file+0xb61/0x1080 __x64_sys_execve+0x5f/0x70 do_syscall_64+0x91/0xb47 entry_SYSCALL_64_after_hwframe+0x49/0xbe The warning is almost impossible to trigger according to the commit 82f71ae4a2b8 ("mm: catch memory commitment underflow") but leave it for now to catch any possible unbalanced vm_unacct_memory() in the future. Since only the read is operating as lockless, mark it as an intentional data race using the data_race() macro to avoid modifying percpu_counter_read() and still catch unintended races elsewhere. Link: http://lkml.kernel.org/r/1581518109-21180-1-git-send-email-cai@lca.pw Signed-off-by: Qian Cai <cai@lca.pw> Acked-by: Christoph Lameter <cl@linux.com> Acked-by: Dennis Zhou <dennis@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Marco Elver <elver@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2020-05-04mm/mempool: fix a data race in mempool_free()Qian Cai
mempool_t pool.curr_nr could be accessed concurrently as noticed by KCSAN, BUG: KCSAN: data-race in mempool_free / remove_element write to 0xffffffffa937638c of 4 bytes by task 6359 on cpu 113: remove_element+0x4a/0x1c0 remove_element at mm/mempool.c:132 mempool_alloc+0x102/0x210 (inlined by) mempool_alloc at mm/mempool.c:399 bio_alloc_bioset+0x106/0x2c0 get_swap_bio+0x49/0x230 __swap_writepage+0x680/0xc30 swap_writepage+0x9c/0xf0 pageout+0x33e/0xae0 shrink_page_list+0x1f57/0x2870 shrink_inactive_list+0x316/0x880 shrink_lruvec+0x8dc/0x1380 shrink_node+0x317/0xd80 do_try_to_free_pages+0x1f7/0xa10 try_to_free_pages+0x26c/0x5e0 __alloc_pages_slowpath+0x458/0x1290 <snip> read to 0xffffffffa937638c of 4 bytes by interrupt on cpu 64: mempool_free+0x3e/0x150 mempool_free at mm/mempool.c:492 bio_free+0x192/0x280 bio_put+0x91/0xd0 end_swap_bio_write+0x1d8/0x280 bio_endio+0x2c2/0x5b0 dec_pending+0x22b/0x440 [dm_mod] clone_endio+0xe4/0x2c0 [dm_mod] bio_endio+0x2c2/0x5b0 blk_update_request+0x217/0x940 scsi_end_request+0x6b/0x4d0 scsi_io_completion+0xb7/0x7e0 scsi_finish_command+0x223/0x310 scsi_softirq_done+0x1d5/0x210 blk_mq_complete_request+0x224/0x250 scsi_mq_done+0xc2/0x250 pqi_raid_io_complete+0x5a/0x70 [smartpqi] pqi_irq_handler+0x150/0x1410 [smartpqi] __handle_irq_event_percpu+0x90/0x540 handle_irq_event_percpu+0x49/0xd0 handle_irq_event+0x85/0xca handle_edge_irq+0x13f/0x3e0 do_IRQ+0x86/0x190 <snip> Since the write is under pool->lock but the read is done as lockless. Even though the commit 5b990546e334 ("mempool: fix and document synchronization and memory barrier usage") introduced the smp_wmb() and smp_rmb() pair to improve the situation, it is adequate to protect it from data races which could lead to a logic bug, so fix it by adding READ_ONCE() for the read. Link: http://lkml.kernel.org/r/1581446384-2131-1-git-send-email-cai@lca.pw Signed-off-by: Qian Cai <cai@lca.pw> Cc: Marco Elver <elver@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2020-05-04mm/list_lru: fix a data race in list_lru_count_oneQian Cai
struct list_lru_one l.nr_items could be accessed concurrently as noticed by KCSAN, BUG: KCSAN: data-race in list_lru_count_one / list_lru_isolate_move write to 0xffffa102789c4510 of 8 bytes by task 823 on cpu 39: list_lru_isolate_move+0xf9/0x130 list_lru_isolate_move at mm/list_lru.c:180 inode_lru_isolate+0x12b/0x2a0 __list_lru_walk_one+0x122/0x3d0 list_lru_walk_one+0x75/0xa0 prune_icache_sb+0x8b/0xc0 super_cache_scan+0x1b8/0x250 do_shrink_slab+0x256/0x6d0 shrink_slab+0x41b/0x4a0 shrink_node+0x35c/0xd80 balance_pgdat+0x652/0xd90 kswapd+0x396/0x8d0 kthread+0x1e0/0x200 ret_from_fork+0x27/0x50 read to 0xffffa102789c4510 of 8 bytes by task 6345 on cpu 56: list_lru_count_one+0x116/0x2f0 list_lru_count_one at mm/list_lru.c:193 super_cache_count+0xe8/0x170 do_shrink_slab+0x95/0x6d0 shrink_slab+0x41b/0x4a0 shrink_node+0x35c/0xd80 do_try_to_free_pages+0x1f7/0xa10 try_to_free_pages+0x26c/0x5e0 __alloc_pages_slowpath+0x458/0x1290 __alloc_pages_nodemask+0x3bb/0x450 alloc_pages_vma+0x8a/0x2c0 do_anonymous_page+0x170/0x700 __handle_mm_fault+0xc9f/0xd00 handle_mm_fault+0xfc/0x2f0 do_page_fault+0x263/0x6f9 page_fault+0x34/0x40 Reported by Kernel Concurrency Sanitizer on: CPU: 56 PID: 6345 Comm: oom01 Tainted: G W L 5.5.0-next-20200205+ #4 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019 A shattered l.nr_items could affect the shrinker behaviour due to a data race. Fix it by adding READ_ONCE() for the read. Since the writes are aligned and up to word-size, assume those are safe from data races to avoid readability issues of writing WRITE_ONCE(var, var + val). Link: http://lkml.kernel.org/r/1581114679-5488-1-git-send-email-cai@lca.pw Signed-off-by: Qian Cai <cai@lca.pw> Cc: Marco Elver <elver@google.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2020-05-04mm/page_counter: fix various data races at memswQian Cai
Commit 3e32cb2e0a12 ("mm: memcontrol: lockless page counters") could had memcg->memsw->watermark and memcg->memsw->failcnt been accessed concurrently as reported by KCSAN, BUG: KCSAN: data-race in page_counter_try_charge / page_counter_try_charge read to 0xffff8fb18c4cd190 of 8 bytes by task 1081 on cpu 59: page_counter_try_charge+0x4d/0x150 mm/page_counter.c:138 try_charge+0x131/0xd50 mm/memcontrol.c:2405 __memcg_kmem_charge_memcg+0x58/0x140 __memcg_kmem_charge+0xcc/0x280 __alloc_pages_nodemask+0x1e1/0x450 alloc_pages_current+0xa6/0x120 pte_alloc_one+0x17/0xd0 __pte_alloc+0x3a/0x1f0 copy_p4d_range+0xc36/0x1990 copy_page_range+0x21d/0x360 dup_mmap+0x5f5/0x7a0 dup_mm+0xa2/0x240 copy_process+0x1b3f/0x3460 _do_fork+0xaa/0xa20 __x64_sys_clone+0x13b/0x170 do_syscall_64+0x91/0xb47 entry_SYSCALL_64_after_hwframe+0x49/0xbe write to 0xffff8fb18c4cd190 of 8 bytes by task 1153 on cpu 120: page_counter_try_charge+0x5b/0x150 mm/page_counter.c:139 try_charge+0x131/0xd50 mm/memcontrol.c:2405 mem_cgroup_try_charge+0x159/0x460 mem_cgroup_try_charge_delay+0x3d/0xa0 wp_page_copy+0x14d/0x930 do_wp_page+0x107/0x7b0 __handle_mm_fault+0xce6/0xd40 handle_mm_fault+0xfc/0x2f0 do_page_fault+0x263/0x6f9 page_fault+0x34/0x40 BUG: KCSAN: data-race in page_counter_try_charge / page_counter_try_charge write to 0xffff88809bbf2158 of 8 bytes by task 11782 on cpu 0: page_counter_try_charge+0x100/0x170 mm/page_counter.c:129 try_charge+0x185/0xbf0 mm/memcontrol.c:2405 __memcg_kmem_charge_memcg+0x4a/0xe0 mm/memcontrol.c:2837 __memcg_kmem_charge+0xcf/0x1b0 mm/memcontrol.c:2877 __alloc_pages_nodemask+0x26c/0x310 mm/page_alloc.c:4780 read to 0xffff88809bbf2158 of 8 bytes by task 11814 on cpu 1: page_counter_try_charge+0xef/0x170 mm/page_counter.c:129 try_charge+0x185/0xbf0 mm/memcontrol.c:2405 __memcg_kmem_charge_memcg+0x4a/0xe0 mm/memcontrol.c:2837 __memcg_kmem_charge+0xcf/0x1b0 mm/memcontrol.c:2877 __alloc_pages_nodemask+0x26c/0x310 mm/page_alloc.c:4780 Since watermark could be compared or set to garbage due to a data race which would change the code logic, fix it by adding a pair of READ_ONCE() and WRITE_ONCE() in those places. The "failcnt" counter is tolerant of some degree of inaccuracy and is only used to report stats, a data race will not be harmful, thus mark it as an intentional data race using the data_race() macro. Link: http://lkml.kernel.org/r/1581519682-23594-1-git-send-email-cai@lca.pw Fixes: 3e32cb2e0a12 ("mm: memcontrol: lockless page counters") Signed-off-by: Qian Cai <cai@lca.pw> Reported-by: syzbot+f36cfe60b1006a94f9dc@syzkaller.appspotmail.com Acked-by: Michal Hocko <mhocko@suse.com> Cc: David Hildenbrand <david@redhat.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Marco Elver <elver@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2020-05-04mm-swapfile-fix-and-annotate-various-data-races-v2Qian Cai
add a missing annotation for si->flags in memory.c Link: http://lkml.kernel.org/r/1581612647-5958-1-git-send-email-cai@lca.pw Signed-off-by: Qian Cai <cai@lca.pw> Cc: Marco Elver <elver@google.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2020-05-04mm/swapfile: fix and annotate various data racesQian Cai
swap_info_struct si.highest_bit, si.swap_map[offset] and si.flags could be accessed concurrently separately as noticed by KCSAN, === si.highest_bit === write to 0xffff8d5abccdc4d4 of 4 bytes by task 5353 on cpu 24: swap_range_alloc+0x81/0x130 swap_range_alloc at mm/swapfile.c:681 scan_swap_map_slots+0x371/0xb90 get_swap_pages+0x39d/0x5c0 get_swap_page+0xf2/0x524 add_to_swap+0xe4/0x1c0 shrink_page_list+0x1795/0x2870 shrink_inactive_list+0x316/0x880 shrink_lruvec+0x8dc/0x1380 shrink_node+0x317/0xd80 do_try_to_free_pages+0x1f7/0xa10 try_to_free_pages+0x26c/0x5e0 __alloc_pages_slowpath+0x458/0x1290 read to 0xffff8d5abccdc4d4 of 4 bytes by task 6672 on cpu 70: scan_swap_map_slots+0x4a6/0xb90 scan_swap_map_slots at mm/swapfile.c:892 get_swap_pages+0x39d/0x5c0 get_swap_page+0xf2/0x524 add_to_swap+0xe4/0x1c0 shrink_page_list+0x1795/0x2870 shrink_inactive_list+0x316/0x880 shrink_lruvec+0x8dc/0x1380 shrink_node+0x317/0xd80 do_try_to_free_pages+0x1f7/0xa10 try_to_free_pages+0x26c/0x5e0 __alloc_pages_slowpath+0x458/0x1290 Reported by Kernel Concurrency Sanitizer on: CPU: 70 PID: 6672 Comm: oom01 Tainted: G W L 5.5.0-next-20200205+ #3 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019 === si.swap_map[offset] === write to 0xffffbc370c29a64c of 1 bytes by task 6856 on cpu 86: __swap_entry_free_locked+0x8c/0x100 __swap_entry_free_locked at mm/swapfile.c:1209 (discriminator 4) __swap_entry_free.constprop.20+0x69/0xb0 free_swap_and_cache+0x53/0xa0 unmap_page_range+0x7f8/0x1d70 unmap_single_vma+0xcd/0x170 unmap_vmas+0x18b/0x220 exit_mmap+0xee/0x220 mmput+0x10e/0x270 do_exit+0x59b/0xf40 do_group_exit+0x8b/0x180 read to 0xffffbc370c29a64c of 1 bytes by task 6855 on cpu 20: _swap_info_get+0x81/0xa0 _swap_info_get at mm/swapfile.c:1140 free_swap_and_cache+0x40/0xa0 unmap_page_range+0x7f8/0x1d70 unmap_single_vma+0xcd/0x170 unmap_vmas+0x18b/0x220 exit_mmap+0xee/0x220 mmput+0x10e/0x270 do_exit+0x59b/0xf40 do_group_exit+0x8b/0x180 === si.flags === write to 0xffff956c8fc6c400 of 8 bytes by task 6087 on cpu 23: scan_swap_map_slots+0x6fe/0xb50 scan_swap_map_slots at mm/swapfile.c:887 get_swap_pages+0x39d/0x5c0 get_swap_page+0x377/0x524 add_to_swap+0xe4/0x1c0 shrink_page_list+0x1795/0x2870 shrink_inactive_list+0x316/0x880 shrink_lruvec+0x8dc/0x1380 shrink_node+0x317/0xd80 do_try_to_free_pages+0x1f7/0xa10 try_to_free_pages+0x26c/0x5e0 __alloc_pages_slowpath+0x458/0x1290 read to 0xffff956c8fc6c400 of 8 bytes by task 6207 on cpu 63: _swap_info_get+0x41/0xa0 __swap_info_get at mm/swapfile.c:1114 put_swap_page+0x84/0x490 __remove_mapping+0x384/0x5f0 shrink_page_list+0xff1/0x2870 shrink_inactive_list+0x316/0x880 shrink_lruvec+0x8dc/0x1380 shrink_node+0x317/0xd80 do_try_to_free_pages+0x1f7/0xa10 try_to_free_pages+0x26c/0x5e0 __alloc_pages_slowpath+0x458/0x1290 The writes are under si->lock but the reads are not. For si.highest_bit and si.swap_map[offset], data race could trigger logic bugs, so fix them by having WRITE_ONCE() for the writes and READ_ONCE() for the reads except those isolated reads where they compare against zero which a data race would cause no harm. Thus, annotate them as intentional data races using the data_race() macro. For si.flags, the readers are only interested in a single bit where a data race there would cause no issue there. Link: http://lkml.kernel.org/r/1581095163-12198-1-git-send-email-cai@lca.pw Signed-off-by: Qian Cai <cai@lca.pw> Cc: Marco Elver <elver@google.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2020-05-04mm/filemap.c: fix a data race in filemap_fault()Kirill A. Shutemov
struct file_ra_state ra.mmap_miss could be accessed concurrently during page faults as noticed by KCSAN, BUG: KCSAN: data-race in filemap_fault / filemap_map_pages write to 0xffff9b1700a2c1b4 of 4 bytes by task 3292 on cpu 30: filemap_fault+0x920/0xfc0 do_sync_mmap_readahead at mm/filemap.c:2384 (inlined by) filemap_fault at mm/filemap.c:2486 __xfs_filemap_fault+0x112/0x3e0 [xfs] xfs_filemap_fault+0x74/0x90 [xfs] __do_fault+0x9e/0x220 do_fault+0x4a0/0x920 __handle_mm_fault+0xc69/0xd00 handle_mm_fault+0xfc/0x2f0 do_page_fault+0x263/0x6f9 page_fault+0x34/0x40 read to 0xffff9b1700a2c1b4 of 4 bytes by task 3313 on cpu 32: filemap_map_pages+0xc2e/0xd80 filemap_map_pages at mm/filemap.c:2625 do_fault+0x3da/0x920 __handle_mm_fault+0xc69/0xd00 handle_mm_fault+0xfc/0x2f0 do_page_fault+0x263/0x6f9 page_fault+0x34/0x40 Reported by Kernel Concurrency Sanitizer on: CPU: 32 PID: 3313 Comm: systemd-udevd Tainted: G W L 5.5.0-next-20200210+ #1 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019 ra.mmap_miss is used to contribute the readahead decisions, a data race could be undesirable. Both the read and write is only under non-exclusive mmap_sem, two concurrent writers could even underflow the counter. Fix the underflow by writing to a local variable before committing a final store to ra.mmap_miss given a small inaccuracy of the counter should be acceptable. Link: http://lkml.kernel.org/r/20200211030134.1847-1-cai@lca.pw Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name> Signed-off-by: Qian Cai <cai@lca.pw> Tested-by: Qian Cai <cai@lca.pw> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Marco Elver <elver@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2020-05-04mm/swap_state: mark various intentional data racesQian Cai
swap_cache_info.* could be accessed concurrently as noticed by KCSAN, BUG: KCSAN: data-race in lookup_swap_cache / lookup_swap_cache write to 0xffffffff85517318 of 8 bytes by task 94138 on cpu 101: lookup_swap_cache+0x12e/0x460 lookup_swap_cache at mm/swap_state.c:322 do_swap_page+0x112/0xeb0 __handle_mm_fault+0xc7a/0xd00 handle_mm_fault+0xfc/0x2f0 do_page_fault+0x263/0x6f9 page_fault+0x34/0x40 read to 0xffffffff85517318 of 8 bytes by task 91655 on cpu 100: lookup_swap_cache+0x117/0x460 lookup_swap_cache at mm/swap_state.c:322 shmem_swapin_page+0xc7/0x9e0 shmem_getpage_gfp+0x2ca/0x16c0 shmem_fault+0xef/0x3c0 __do_fault+0x9e/0x220 do_fault+0x4a0/0x920 __handle_mm_fault+0xc69/0xd00 handle_mm_fault+0xfc/0x2f0 do_page_fault+0x263/0x6f9 page_fault+0x34/0x40 Reported by Kernel Concurrency Sanitizer on: CPU: 100 PID: 91655 Comm: systemd-journal Tainted: G W O L 5.5.0-next-20200204+ #6 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019 write to 0xffffffff8d717308 of 8 bytes by task 11365 on cpu 87: __delete_from_swap_cache+0x681/0x8b0 __delete_from_swap_cache at mm/swap_state.c:178 read to 0xffffffff8d717308 of 8 bytes by task 11275 on cpu 53: __delete_from_swap_cache+0x66e/0x8b0 __delete_from_swap_cache at mm/swap_state.c:178 Both the read and write are done as lockless. Since swap_cache_info.* are only used to print out counter information, even if any of them missed a few incremental due to data races, it will be harmless, so just mark it as an intentional data race using the data_race() macro. While at it, fix a checkpatch.pl warning, WARNING: Single statement macros should not use a do {} while (0) loop Link: http://lkml.kernel.org/r/20200207003715.1578-1-cai@lca.pw Signed-off-by: Qian Cai <cai@lca.pw> Cc: Marco Elver <elver@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2020-05-04mm-page_io-mark-various-intentional-data-races-v2Qian Cai
add a missing annotation Link: http://lkml.kernel.org/r/1581612585-5812-1-git-send-email-cai@lca.pw Signed-off-by: Qian Cai <cai@lca.pw> Cc: Marco Elver <elver@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2020-05-04mm/page_io: mark various intentional data racesQian Cai
struct swap_info_struct si.flags could be accessed concurrently as noticed by KCSAN, BUG: KCSAN: data-race in scan_swap_map_slots / swap_readpage write to 0xffff9c77b80ac400 of 8 bytes by task 91325 on cpu 16: scan_swap_map_slots+0x6fe/0xb50 scan_swap_map_slots at mm/swapfile.c:887 get_swap_pages+0x39d/0x5c0 get_swap_page+0x377/0x524 add_to_swap+0xe4/0x1c0 shrink_page_list+0x1740/0x2820 shrink_inactive_list+0x316/0x8b0 shrink_lruvec+0x8dc/0x1380 shrink_node+0x317/0xd80 do_try_to_free_pages+0x1f7/0xa10 try_to_free_pages+0x26c/0x5e0 __alloc_pages_slowpath+0x458/0x1290 __alloc_pages_nodemask+0x3bb/0x450 alloc_pages_vma+0x8a/0x2c0 do_anonymous_page+0x170/0x700 __handle_mm_fault+0xc9f/0xd00 handle_mm_fault+0xfc/0x2f0 do_page_fault+0x263/0x6f9 page_fault+0x34/0x40 read to 0xffff9c77b80ac400 of 8 bytes by task 5422 on cpu 7: swap_readpage+0x204/0x6a0 swap_readpage at mm/page_io.c:380 read_swap_cache_async+0xa2/0xb0 swapin_readahead+0x6a0/0x890 do_swap_page+0x465/0xeb0 __handle_mm_fault+0xc7a/0xd00 handle_mm_fault+0xfc/0x2f0 do_page_fault+0x263/0x6f9 page_fault+0x34/0x40 Reported by Kernel Concurrency Sanitizer on: CPU: 7 PID: 5422 Comm: gmain Tainted: G W O L 5.5.0-next-20200204+ #6 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019 Other reads, read to 0xffff91ea33eac400 of 8 bytes by task 11276 on cpu 120: __swap_writepage+0x140/0xc20 __swap_writepage at mm/page_io.c:289 read to 0xffff91ea33eac400 of 8 bytes by task 11264 on cpu 16: swap_set_page_dirty+0x44/0x1f4 swap_set_page_dirty at mm/page_io.c:442 The write is under &si->lock, but the reads are done as lockless. Since the reads only check for a specific bit in the flag, it is harmless even if load tearing happens. Thus, just mark them as intentional data races using the data_race() macro. Link: http://lkml.kernel.org/r/20200207003601.1526-1-cai@lca.pw Signed-off-by: Qian Cai <cai@lca.pw> Cc: Marco Elver <elver@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2020-05-04mm/frontswap: mark various intentional data racesQian Cai
There are a few information counters that are intentionally not protected against increment races, so just annotate them using the data_race() macro. BUG: KCSAN: data-race in __frontswap_store / __frontswap_store write to 0xffffffff8b7174d8 of 8 bytes by task 6396 on cpu 103: __frontswap_store+0x2d0/0x344 inc_frontswap_failed_stores at mm/frontswap.c:70 (inlined by) __frontswap_store at mm/frontswap.c:280 swap_writepage+0x83/0xf0 pageout+0x33e/0xae0 shrink_page_list+0x1f57/0x2870 shrink_inactive_list+0x316/0x880 shrink_lruvec+0x8dc/0x1380 shrink_node+0x317/0xd80 do_try_to_free_pages+0x1f7/0xa10 try_to_free_pages+0x26c/0x5e0 __alloc_pages_slowpath+0x458/0x1290 __alloc_pages_nodemask+0x3bb/0x450 alloc_pages_vma+0x8a/0x2c0 do_anonymous_page+0x170/0x700 __handle_mm_fault+0xc9f/0xd00 handle_mm_fault+0xfc/0x2f0 do_page_fault+0x263/0x6f9 page_fault+0x34/0x40 read to 0xffffffff8b7174d8 of 8 bytes by task 6405 on cpu 47: __frontswap_store+0x2b9/0x344 inc_frontswap_failed_stores at mm/frontswap.c:70 (inlined by) __frontswap_store at mm/frontswap.c:280 swap_writepage+0x83/0xf0 pageout+0x33e/0xae0 shrink_page_list+0x1f57/0x2870 shrink_inactive_list+0x316/0x880 shrink_lruvec+0x8dc/0x1380 shrink_node+0x317/0xd80 do_try_to_free_pages+0x1f7/0xa10 try_to_free_pages+0x26c/0x5e0 __alloc_pages_slowpath+0x458/0x1290 __alloc_pages_nodemask+0x3bb/0x450 alloc_pages_vma+0x8a/0x2c0 do_anonymous_page+0x170/0x700 __handle_mm_fault+0xc9f/0xd00 handle_mm_fault+0xfc/0x2f0 do_page_fault+0x263/0x6f9 page_fault+0x34/0x40 Link: http://lkml.kernel.org/r/1581114499-5042-1-git-send-email-cai@lca.pw Signed-off-by: Qian Cai <cai@lca.pw> Cc: Marco Elver <elver@google.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2020-05-04mm/kmemleak: silence KCSAN splats in checksumQian Cai
Even if KCSAN is disabled for kmemleak, update_checksum() could still call crc32() (which is outside of kmemleak.c) to dereference object->pointer. Thus, the value of object->pointer could be accessed concurrently as noticed by KCSAN, BUG: KCSAN: data-race in crc32_le_base / do_raw_spin_lock write to 0xffffb0ea683a7d50 of 4 bytes by task 23575 on cpu 12: do_raw_spin_lock+0x114/0x200 debug_spin_lock_after at kernel/locking/spinlock_debug.c:91 (inlined by) do_raw_spin_lock at kernel/locking/spinlock_debug.c:115 _raw_spin_lock+0x40/0x50 __handle_mm_fault+0xa9e/0xd00 handle_mm_fault+0xfc/0x2f0 do_page_fault+0x263/0x6f9 page_fault+0x34/0x40 read to 0xffffb0ea683a7d50 of 4 bytes by task 839 on cpu 60: crc32_le_base+0x67/0x350 crc32_le_base+0x67/0x350: crc32_body at lib/crc32.c:106 (inlined by) crc32_le_generic at lib/crc32.c:179 (inlined by) crc32_le at lib/crc32.c:197 kmemleak_scan+0x528/0xd90 update_checksum at mm/kmemleak.c:1172 (inlined by) kmemleak_scan at mm/kmemleak.c:1497 kmemleak_scan_thread+0xcc/0xfa kthread+0x1e0/0x200 ret_from_fork+0x27/0x50 If a shattered value was returned due to a data race, it will be corrected in the next scan. Thus, let KCSAN ignore all reads in the region to silence KCSAN in case the write side is non-atomic. Link: http://lkml.kernel.org/r/20200317182754.2180-1-cai@lca.pw Signed-off-by: Qian Cai <cai@lca.pw> Suggested-by: Marco Elver <elver@google.com> Acked-by: Marco Elver <elver@google.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2020-05-04kernel: better document the use_mm/unuse_mm API contractChristoph Hellwig
Switch the function documentation to kerneldoc comments, and add WARN_ON_ONCE asserts that the calling thread is a kernel thread and does not have ->mm set (or has ->mm set in the case of unuse_mm). Also give the functions a kthread_ prefix to better document the use case. Link: http://lkml.kernel.org/r/20200404094101.672954-6-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Tested-by: Jens Axboe <axboe@kernel.dk> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [usb] Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Felipe Balbi <balbi@kernel.org> Cc: Jason Wang <jasowang@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Zhi Wang <zhi.a.wang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2020-05-04kernel: move use_mm/unuse_mm to kthread.cChristoph Hellwig
Patch series "improve use_mm / unuse_mm", v2. This series improves the use_mm / unuse_mm interface by better documenting the assumptions, and my taking the set_fs manipulations spread over the callers into the core API. This patch (of 3): Use the proper API instead. Link: http://lkml.kernel.org/r/20200404094101.672954-1-hch@lst.de These helpers are only for use with kernel threads, and I will tie them more into the kthread infrastructure going forward. Also move the prototypes to kthread.h - mmu_context.h was a little weird to start with as it otherwise contains very low-level MM bits. Link: http://lkml.kernel.org/r/20200404094101.672954-1-hch@lst.de Link: http://lkml.kernel.org/r/20200416053158.586887-1-hch@lst.de Link: http://lkml.kernel.org/r/20200404094101.672954-5-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Tested-by: Jens Axboe <axboe@kernel.dk> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Felipe Balbi <balbi@kernel.org> Cc: Jason Wang <jasowang@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Zhi Wang <zhi.a.wang@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2020-05-04kernel/sysctl: support handling command line aliasesVlastimil Babka
We can now handle sysctl parameters on kernel command line, but historically some parameters introduced their own command line equivalent, which we don't want to remove for compatibility reasons. We can however convert them to the generic infrastructure with a table translating the legacy command line parameters to their sysctl names, and removing the one-off param handlers. This patch adds the support and makes the first conversion to demonstrate it, on the (deprecated) numa_zonelist_order parameter. Link: http://lkml.kernel.org/r/20200427180433.7029-3-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: David Rientjes <rientjes@google.com> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Guilherme G . Piccoli" <gpiccoli@canonical.com> Cc: Iurii Zaikin <yzaikin@google.com> Cc: Ivan Teterevkov <ivan.teterevkov@nutanix.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2020-05-04Merge branch 'akpm-current/current'Stephen Rothwell
2020-05-04Merge remote-tracking branch 'rcu/rcu/next'Stephen Rothwell
2020-05-04Merge remote-tracking branch 'tip/auto-latest'Stephen Rothwell
2020-05-04Merge remote-tracking branch 'drm/drm-next'Stephen Rothwell
# Conflicts: # drivers/gpu/drm/tidss/tidss_encoder.c # include/linux/dma-buf.h
2020-05-04Merge remote-tracking branch 'jc_docs/docs-next'Stephen Rothwell
# Conflicts: # Documentation/arm64/amu.rst # Documentation/arm64/booting.rst
2020-05-04Merge remote-tracking branch 'vfs/for-next'Stephen Rothwell
2020-05-04Merge remote-tracking branch 'kspp-gustavo/for-next/kspp'Stephen Rothwell
2020-05-01treewide: Replace zero-length array with flexible-array memberGustavo A. R. Silva
The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] sizeof(flexible-array-member) triggers a warning because flexible array members have incomplete type[1]. There are some instances of code in which the sizeof operator is being incorrectly/erroneously applied to zero-length arrays and the result is zero. Such instances may be hiding some bugs. So, this work (flexible-array member conversions) will also help to get completely rid of those sorts of issues. Notice that, currently, more than 250 of these patches have already been merged into mainline during the last merge window, including 5.7-rc2. So, in order to make better use of everyone's time, I'm planning to add this treewide patch to my -next tree and then send a pull request to Linus for 5.7-rc3 or -rc4, after getting some acks and/or reviews. This treewide patch has been successfully built (on top of v5.7-rc1) for multiple architectures (arm, arm64, sparc, powerpc, ia64, s390, i386, nios2, c6x, xtensa, openrisc, mips, parisc, x86_64, riscv, sh, sparc64) and 82 different configurations with the help of the 0-day CI guys[5]. This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") [4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/?qt=grep&q=replace+zero-length+array+with+flexible-array+member [5] https://github.com/GustavoARSilva/linux-hardening/blob/master/cii/kernel-ci/kspp-fam0-20200420.md Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
2020-04-30mm/memory_hotplug: allow arch override of non boot memory resource namesJames Morse
Memory added to the system by hotplug has a 'System RAM' resource created for it. This is exposed to user-space via /proc/iomem. This poses problems for kexec on arm64. If kexec decides to place the kernel in one of these newly onlined regions, the new kernel will find itself booting from a region not described as memory in the firmware tables. Arm64 doesn't have a structure like the e820 memory map that can be re-written when memory is brought online. Instead arm64 uses the UEFI memory map, or the memory node from the DT, sometimes both. We never rewrite these. Allow an architecture to specify a different name for these hotplug regions. Link: http://lkml.kernel.org/r/20200326180730.4754-3-james.morse@arm.com Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Bhupesh Sharma <bhsharma@redhat.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Will Deacon <will@kernel.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: piliu <piliu@redhat.com> Cc: Dave Young <dyoung@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2020-04-30mm-debug-add-tests-validating-architecture-page-table-helpers-v17Anshuman Khandual
- debug_vm_pgtable() is now called from late_initcall() per Linus - Explicitly enable DEBUG_VM_PGTABLE when ARCH_HAS_DEBUG_VM_PGTABLE and DEBUG_VM - Added #ifdef documentation per Gerald - Dropped page table helper semantics documentation (will be added via later patches) - Split the X86 changes defining mm_p4d_folded() into a new prerequisite patch Link: http://lkml.kernel.org/r/1587436495-22033-3-git-send-email-anshuman.khandual@arm.com Suggested-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Ingo Molnar <mingo@kernel.org> Tested-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> # s390 Tested-by: Christophe Leroy <christophe.leroy@c-s.fr> # ppc32 Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2020-04-30mm/debug: add tests validating architecture page table helpersAnshuman Khandual
This adds tests which will validate architecture page table helpers and other accessors in their compliance with expected generic MM semantics. This will help various architectures in validating changes to existing page table helpers or addition of new ones. This test covers basic page table entry transformations including but not limited to old, young, dirty, clean, write, write protect etc at various level along with populating intermediate entries with next page table page and validating them. Test page table pages are allocated from system memory with required size and alignments. The mapped pfns at page table levels are derived from a real pfn representing a valid kernel text symbol. This test gets called via late_initcall(). This test gets built and run when CONFIG_DEBUG_VM_PGTABLE is selected. Any architecture, which is willing to subscribe this test will need to select ARCH_HAS_DEBUG_VM_PGTABLE. For now this is limited to arc, arm64, x86, s390 and powerpc platforms where the test is known to build and run successfully Going forward, other architectures too can subscribe the test after fixing any build or runtime problems with their page table helpers. Meanwhile for better platform coverage, the test can also be enabled with CONFIG_EXPERT even without ARCH_HAS_DEBUG_VM_PGTABLE. Folks interested in making sure that a given platform's page table helpers conform to expected generic MM semantics should enable the above config which will just trigger this test during boot. Any non conformity here will be reported as an warning which would need to be fixed. This test will help catch any changes to the agreed upon semantics expected from generic MM and enable platforms to accommodate it thereafter. Link: http://lkml.kernel.org/r/1583919272-24178-1-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Qian Cai <cai@lca.pw> Suggested-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Ingo Molnar <mingo@kernel.org> Tested-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> # s390 Tested-by: Christophe Leroy <christophe.leroy@c-s.fr> # ppc32 Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2020-04-30mm: use false for bool variableZou Wei
Fixes coccicheck warning: mm/zbud.c:246:1-20: WARNING: Assignment of 0/1 to bool variable mm/mremap.c:777:2-8: WARNING: Assignment of 0/1 to bool variable mm/huge_memory.c:525:9-10: WARNING: return of 0/1 in function 'is_transparent_hugepage' with return type bool Link: http://lkml.kernel.org/r/1586835930-47076-1-git-send-email-zou_wei@huawei.com Signed-off-by: Zou Wei <zou_wei@huawei.com> Reported-by: Hulk Robot <hulkci@huawei.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2020-04-30mm/memory: fix a typo in comment "attampt"->"attempt"Ethon Paul
There is a comment in typo, fix it. Link: http://lkml.kernel.org/r/20200411004043.14686-1-ethp@qq.com Signed-off-by: Ethon Paul <ethp@qq.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2020-04-30mm/page-writeback: fix a typo in comment "effictive"->"effective"Ethon Paul
There is a typo in comment, fix it. Link: http://lkml.kernel.org/r/20200411003513.14613-1-ethp@qq.com Signed-off-by: Ethon Paul <ethp@qq.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2020-04-30mm/sparse: fix a typo in comment "convienence"->"convenience"Ethon Paul
There is a typo in comment, fix it. Link: http://lkml.kernel.org/r/20200411002955.14545-1-ethp@qq.com Signed-off-by: Ethon Paul <ethp@qq.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2020-04-30mm/slub: fix a typo in comment "disambiguiation"->"disambiguation"Ethon Paul
There is a typo in comment, fix it. Link: http://lkml.kernel.org/r/20200411002247.14468-1-ethp@qq.com Signed-off-by: Ethon Paul <ethp@qq.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2020-04-30mm: fix a typo in comment "strucure"->"structure"Ethon Paul
There is a typo in comment, fix it. Link: http://lkml.kernel.org/r/20200411064723.15855-1-ethp@qq.com Signed-off-by: Ethon Paul <ethp@qq.com> Reviewed-by: Ralph Campbell <rcampbell@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2020-04-30mm, memcg: fix some typos in memcontrol.cEthon Paul
There are some typos in comment, fix them. s/responsiblity/responsibility s/oflline/offline Link: http://lkml.kernel.org/r/20200411064246.15781-1-ethp@qq.com Signed-off-by: Ethon Paul <ethp@qq.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2020-04-30mm/frontswap: fix some typos in frontswap.cEthon Paul
There are some typos in comment, fix them. s/Fortunatly/Fortunately s/taked/taken s/necessory/necessary s/shink/shrink Link: http://lkml.kernel.org/r/20200411064009.15727-1-ethp@qq.com Signed-off-by: Ethon Paul <ethp@qq.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2020-04-30mm/filemap: fix a typo in comment "unneccssary"->"unnecessary"Ethon Paul
There is a typo in comment, fix it. Link: http://lkml.kernel.org/r/20200411065141.15936-1-ethp@qq.com Signed-off-by: Ethon Paul <ethp@qq.com> Reviewed-by: Ralph Campbell <rcampbell@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2020-04-30mm/list_lru: fix a typo in comment "numbesr"->"numbers"Ethon Paul
There is a typo in comment, fix it. Link: http://lkml.kernel.org/r/20200411071041.16161-1-ethp@qq.com Signed-off-by: Ethon Paul <ethp@qq.com> Reviewed-by: Ralph Campbell <rcampbell@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2020-04-30mm/memblock: fix a typo in comment "implict"->"implicit"Ethon Paul
There is a typo in commet, fix it. Link: http://lkml.kernel.org/r/20200411070701.16097-1-ethp@qq.com Signed-off-by: Ethon Paul <ethp@qq.com> Reviewed-by: Ralph Campbell <rcampbell@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>