summaryrefslogtreecommitdiff
path: root/arch/riscv/mm
AgeCommit message (Collapse)Author
2020-06-18RISC-V: Acquire mmap lock before invoking walk_page_rangeAtish Patra
As per walk_page_range documentation, mmap lock should be acquired by the caller before invoking walk_page_range. mmap_assert_locked gets triggered without that. The details can be found here. http://lists.infradead.org/pipermail/linux-riscv/2020-June/010335.html Fixes: 395a21ff859c(riscv: add ARCH_HAS_SET_DIRECT_MAP support) Signed-off-by: Atish Patra <atish.patra@wdc.com> Reviewed-by: Michel Lespinasse <walken@google.com> Reviewed-by: Zong Li <zong.li@sifive.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-06-11Merge tag 'riscv-for-linus-5.8-mw1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull more RISC-V updates from Palmer Dabbelt: - Kconfig select statements are now sorted alphanumerically - first-level interrupts are now handled via a full irqchip driver - CPU hotplug is fixed - vDSO calls now use the common vDSO infrastructure * tag 'riscv-for-linus-5.8-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: set the permission of vdso_data to read-only riscv: use vDSO common flow to reduce the latency of the time-related functions riscv: fix build warning of missing prototypes RISC-V: Don't mark init section as non-executable RISC-V: Force select RISCV_INTC for CONFIG_RISCV RISC-V: Remove do_IRQ() function clocksource/drivers/timer-riscv: Use per-CPU timer interrupt irqchip: RISC-V per-HART local interrupt controller driver RISC-V: Rename and move plic_find_hart_id() to arch directory RISC-V: self-contained IPI handling routine RISC-V: Sort select statements alphanumerically
2020-06-09RISC-V: Don't mark init section as non-executableAnup Patel
The head text section (i.e. _start, secondary_start_sbi, etc) and the init section fall under same page table level-1 mapping. Currently, the runtime CPU hotplug is broken because we are marking init section as non-executable which in-turn marks head text section as non-executable. Further investigating other architectures, it seems marking the init section as non-executable is redundant because the init section pages are anyway poisoned and freed. To fix broken runtime CPU hotplug, we simply remove the code marking the init section as non-executable. Fixes: d27c3c90817e ("riscv: add STRICT_KERNEL_RWX support") Cc: stable@vger.kernel.org Signed-off-by: Anup Patel <anup.patel@wdc.com> Reviewed-by: Zong Li <zong.li@sifive.com> Reviewed-by: Atish Patra <atish.patra@wdc.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-06-09mmap locking API: convert mmap_sem commentsMichel Lespinasse
Convert comments that reference mmap_sem to reference mmap_lock instead. [akpm@linux-foundation.org: fix up linux-next leftovers] [akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil] [akpm@linux-foundation.org: more linux-next fixups, per Michel] Signed-off-by: Michel Lespinasse <walken@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09mmap locking API: convert mmap_sem API commentsMichel Lespinasse
Convert comments that reference old mmap_sem APIs to reference corresponding new mmap locking APIs instead. Signed-off-by: Michel Lespinasse <walken@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Davidlohr Bueso <dbueso@suse.de> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-12-walken@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09mmap locking API: convert mmap_sem call sites missed by coccinelleMichel Lespinasse
Convert the last few remaining mmap_sem rwsem calls to use the new mmap locking API. These were missed by coccinelle for some reason (I think coccinelle does not support some of the preprocessor constructs in these files ?) [akpm@linux-foundation.org: convert linux-next leftovers] [akpm@linux-foundation.org: more linux-next leftovers] [akpm@linux-foundation.org: more linux-next leftovers] Signed-off-by: Michel Lespinasse <walken@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-6-walken@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09mmap locking API: use coccinelle to convert mmap_sem rwsem call sitesMichel Lespinasse
This change converts the existing mmap_sem rwsem calls to use the new mmap locking API instead. The change is generated using coccinelle with the following rule: // spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir . @@ expression mm; @@ ( -init_rwsem +mmap_init_lock | -down_write +mmap_write_lock | -down_write_killable +mmap_write_lock_killable | -down_write_trylock +mmap_write_trylock | -up_write +mmap_write_unlock | -downgrade_write +mmap_write_downgrade | -down_read +mmap_read_lock | -down_read_killable +mmap_read_lock_killable | -down_read_trylock +mmap_read_trylock | -up_read +mmap_read_unlock ) -(&mm->mmap_sem) +(mm) Signed-off-by: Michel Lespinasse <walken@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09mm: consolidate pte_index() and pte_offset_*() definitionsMike Rapoport
All architectures define pte_index() as (address >> PAGE_SHIFT) & (PTRS_PER_PTE - 1) and all architectures define pte_offset_kernel() as an entry in the array of PTEs indexed by the pte_index(). For the most architectures the pte_offset_kernel() implementation relies on the availability of pmd_page_vaddr() that converts a PMD entry value to the virtual address of the page containing PTEs array. Let's move x86 definitions of the PTE accessors to the generic place in <linux/pgtable.h> and then simply drop the respective definitions from the other architectures. The architectures that didn't provide pmd_page_vaddr() are updated to have that defined. The generic implementation of pte_offset_kernel() can be overridden by an architecture and alpha makes use of this because it has special ordering requirements for its version of pte_offset_kernel(). [rppt@linux.ibm.com: v2] Link: http://lkml.kernel.org/r/20200514170327.31389-11-rppt@kernel.org [rppt@linux.ibm.com: update] Link: http://lkml.kernel.org/r/20200514170327.31389-12-rppt@kernel.org [rppt@linux.ibm.com: update] Link: http://lkml.kernel.org/r/20200514170327.31389-13-rppt@kernel.org [akpm@linux-foundation.org: fix x86 warning] [sfr@canb.auug.org.au: fix powerpc build] Link: http://lkml.kernel.org/r/20200607153443.GB738695@linux.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Cain <bcain@codeaurora.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Zankel <chris@zankel.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ley Foon Tan <ley.foon.tan@intel.com> Cc: Mark Salter <msalter@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Nick Hu <nickhu@andestech.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vincent Chen <deanbo422@gmail.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: http://lkml.kernel.org/r/20200514170327.31389-10-rppt@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09mm: reorder includes after introduction of linux/pgtable.hMike Rapoport
The replacement of <asm/pgrable.h> with <linux/pgtable.h> made the include of the latter in the middle of asm includes. Fix this up with the aid of the below script and manual adjustments here and there. import sys import re if len(sys.argv) is not 3: print "USAGE: %s <file> <header>" % (sys.argv[0]) sys.exit(1) hdr_to_move="#include <linux/%s>" % sys.argv[2] moved = False in_hdrs = False with open(sys.argv[1], "r") as f: lines = f.readlines() for _line in lines: line = _line.rstrip(' ') if line == hdr_to_move: continue if line.startswith("#include <linux/"): in_hdrs = True elif not moved and in_hdrs: moved = True print hdr_to_move print line Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Cain <bcain@codeaurora.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Zankel <chris@zankel.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ley Foon Tan <ley.foon.tan@intel.com> Cc: Mark Salter <msalter@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Nick Hu <nickhu@andestech.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vincent Chen <deanbo422@gmail.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: http://lkml.kernel.org/r/20200514170327.31389-4-rppt@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09mm: introduce include/linux/pgtable.hMike Rapoport
The include/linux/pgtable.h is going to be the home of generic page table manipulation functions. Start with moving asm-generic/pgtable.h to include/linux/pgtable.h and make the latter include asm/pgtable.h. Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Cain <bcain@codeaurora.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Zankel <chris@zankel.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ley Foon Tan <ley.foon.tan@intel.com> Cc: Mark Salter <msalter@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Nick Hu <nickhu@andestech.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vincent Chen <deanbo422@gmail.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: http://lkml.kernel.org/r/20200514170327.31389-3-rppt@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09mm: don't include asm/pgtable.h if linux/mm.h is already includedMike Rapoport
Patch series "mm: consolidate definitions of page table accessors", v2. The low level page table accessors (pXY_index(), pXY_offset()) are duplicated across all architectures and sometimes more than once. For instance, we have 31 definition of pgd_offset() for 25 supported architectures. Most of these definitions are actually identical and typically it boils down to, e.g. static inline unsigned long pmd_index(unsigned long address) { return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1); } static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address) { return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address); } These definitions can be shared among 90% of the arches provided XYZ_SHIFT, PTRS_PER_XYZ and xyz_page_vaddr() are defined. For architectures that really need a custom version there is always possibility to override the generic version with the usual ifdefs magic. These patches introduce include/linux/pgtable.h that replaces include/asm-generic/pgtable.h and add the definitions of the page table accessors to the new header. This patch (of 12): The linux/mm.h header includes <asm/pgtable.h> to allow inlining of the functions involving page table manipulations, e.g. pte_alloc() and pmd_alloc(). So, there is no point to explicitly include <asm/pgtable.h> in the files that include <linux/mm.h>. The include statements in such cases are remove with a simple loop: for f in $(git grep -l "include <linux/mm.h>") ; do sed -i -e '/include <asm\/pgtable.h>/ d' $f done Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Cain <bcain@codeaurora.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Zankel <chris@zankel.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ley Foon Tan <ley.foon.tan@intel.com> Cc: Mark Salter <msalter@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nick Hu <nickhu@andestech.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vincent Chen <deanbo422@gmail.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: http://lkml.kernel.org/r/20200514170327.31389-1-rppt@kernel.org Link: http://lkml.kernel.org/r/20200514170327.31389-2-rppt@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04Merge tag 'riscv-for-linus-5.8-mw0' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Palmer Dabbelt: - The remainder of the code necessary to support the Kendryte K210: * Support for building device trees into the kernel, as the K210 doesn't have a bootloader that provides one * A K210 device tree and the associated defconfig update * Support for skipping PMP initialization on systems that trap on PMP accesses rather than treating them as WARL - Support for KGDB - Improvements to text patching - Some cleanups to the SiFive L2 cache driver * tag 'riscv-for-linus-5.8-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: soc: sifive: l2 cache: Mark l2_get_priv_group as static soc: sifive: l2 cache: Eliminate an unsigned zero compare warning riscv: Add support to determine no. of L2 cache way enabled riscv: cacheinfo: Implement cache_get_priv_group with a generic ops structure riscv: Use text_mutex instead of patch_lock riscv: Use NOKPROBE_SYMBOL() instead of __krpobes annotation riscv: Remove the 'riscv_' prefix of function name riscv: Add SW single-step support for KDB riscv: Use the XML target descriptions to report 3 system registers riscv: Add KGDB support kgdb: Add kgdb_has_hit_break function RISC-V: Skip setting up PMPs on traps riscv: K210: Update defconfig riscv: K210: Add a built-in device tree riscv: Allow device trees to be built into the kernel
2020-06-03riscv: support DEBUG_WXZong Li
Support DEBUG_WX to check whether there are mapping with write and execute permission at the same time. [akpm@linux-foundation.org: replace macros with C] Signed-off-by: Zong Li <zong.li@sifive.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/282e266311bced080bc6f7c255b92f87c1eb65d6.1587455584.git.zong.li@sifive.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-03hugetlbfs: remove hugetlb_add_hstate() warning for existing hstateMike Kravetz
hugetlb_add_hstate() prints a warning if the hstate already exists. This was originally done as part of kernel command line parsing. If 'hugepagesz=' was specified more than once, the warning pr_warn("hugepagesz= specified twice, ignoring\n"); would be printed. Some architectures want to enable all huge page sizes. They would call hugetlb_add_hstate for all supported sizes. However, this was done after command line processing and as a result hstates could have already been created for some sizes. To make sure no warning were printed, there would often be code like: if (!size_to_hstate(size) hugetlb_add_hstate(ilog2(size) - PAGE_SHIFT) The only time we want to print the warning is as the result of command line processing. So, remove the warning from hugetlb_add_hstate and add it to the single arch independent routine processing "hugepagesz=". After this, calls to size_to_hstate() in arch specific code can be removed and hugetlb_add_hstate can be called without worrying about warning messages. [mike.kravetz@oracle.com: fix hugetlb initialization] Link: http://lkml.kernel.org/r/4c36c6ce-3774-78fa-abc4-b7346bf24348@oracle.com Link: http://lkml.kernel.org/r/20200428205614.246260-5-mike.kravetz@oracle.com Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Anders Roxell <anders.roxell@linaro.org> Acked-by: Mina Almasry <almasrymina@google.com> Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [s390] Acked-by: Will Deacon <will@kernel.org> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Longpeng <longpeng2@huawei.com> Cc: Nitesh Narayan Lal <nitesh@redhat.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Xu <peterx@redhat.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Qian Cai <cai@lca.pw> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Link: http://lkml.kernel.org/r/20200417185049.275845-4-mike.kravetz@oracle.com Link: http://lkml.kernel.org/r/20200428205614.246260-4-mike.kravetz@oracle.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-03hugetlbfs: move hugepagesz= parsing to arch independent codeMike Kravetz
Now that architectures provide arch_hugetlb_valid_size(), parsing of "hugepagesz=" can be done in architecture independent code. Create a single routine to handle hugepagesz= parsing and remove all arch specific routines. We can also remove the interface hugetlb_bad_size() as this is no longer used outside arch independent code. This also provides consistent behavior of hugetlbfs command line options. The hugepagesz= option should only be specified once for a specific size, but some architectures allow multiple instances. This appears to be more of an oversight when code was added by some architectures to set up ALL huge pages sizes. Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Sandipan Das <sandipan@linux.ibm.com> Reviewed-by: Peter Xu <peterx@redhat.com> Acked-by: Mina Almasry <almasrymina@google.com> Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [s390] Acked-by: Will Deacon <will@kernel.org> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Longpeng <longpeng2@huawei.com> Cc: Nitesh Narayan Lal <nitesh@redhat.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Anders Roxell <anders.roxell@linaro.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Qian Cai <cai@lca.pw> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Link: http://lkml.kernel.org/r/20200417185049.275845-3-mike.kravetz@oracle.com Link: http://lkml.kernel.org/r/20200428205614.246260-3-mike.kravetz@oracle.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-03hugetlbfs: add arch_hugetlb_valid_sizeMike Kravetz
Patch series "Clean up hugetlb boot command line processing", v4. Longpeng(Mike) reported a weird message from hugetlb command line processing and proposed a solution [1]. While the proposed patch does address the specific issue, there are other related issues in command line processing. As hugetlbfs evolved, updates to command line processing have been made to meet immediate needs and not necessarily in a coordinated manner. The result is that some processing is done in arch specific code, some is done in arch independent code and coordination is problematic. Semantics can vary between architectures. The patch series does the following: - Define arch specific arch_hugetlb_valid_size routine used to validate passed huge page sizes. - Move hugepagesz= command line parsing out of arch specific code and into an arch independent routine. - Clean up command line processing to follow desired semantics and document those semantics. [1] https://lore.kernel.org/linux-mm/20200305033014.1152-1-longpeng2@huawei.com This patch (of 3): The architecture independent routine hugetlb_default_setup sets up the default huge pages size. It has no way to verify if the passed value is valid, so it accepts it and attempts to validate at a later time. This requires undocumented cooperation between the arch specific and arch independent code. For architectures that support more than one huge page size, provide a routine arch_hugetlb_valid_size to validate a huge page size. hugetlb_default_setup can use this to validate passed values. arch_hugetlb_valid_size will also be used in a subsequent patch to move processing of the "hugepagesz=" in arch specific code to a common routine in arch independent code. Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [s390] Acked-by: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: David S. Miller <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Longpeng <longpeng2@huawei.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Mina Almasry <almasrymina@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Nitesh Narayan Lal <nitesh@redhat.com> Cc: Anders Roxell <anders.roxell@linaro.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Qian Cai <cai@lca.pw> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Link: http://lkml.kernel.org/r/20200428205614.246260-1-mike.kravetz@oracle.com Link: http://lkml.kernel.org/r/20200428205614.246260-2-mike.kravetz@oracle.com Link: http://lkml.kernel.org/r/20200417185049.275845-1-mike.kravetz@oracle.com Link: http://lkml.kernel.org/r/20200417185049.275845-2-mike.kravetz@oracle.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-03mm: use free_area_init() instead of free_area_init_nodes()Mike Rapoport
free_area_init() has effectively became a wrapper for free_area_init_nodes() and there is no point of keeping it. Still free_area_init() name is shorter and more general as it does not imply necessity to initialize multiple nodes. Rename free_area_init_nodes() to free_area_init(), update the callers and drop old version of free_area_init(). Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Hoan Tran <hoan@os.amperecomputing.com> [arm64] Reviewed-by: Baoquan He <bhe@redhat.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Brian Cain <bcain@codeaurora.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Ley Foon Tan <ley.foon.tan@intel.com> Cc: Mark Salter <msalter@redhat.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Simek <monstr@monstr.eu> Cc: Nick Hu <nickhu@andestech.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: http://lkml.kernel.org/r/20200412194859.12663-6-rppt@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02mm: ptdump: expand type of 'val' in note_page()Steven Price
The page table entry is passed in the 'val' argument to note_page(), however this was previously an "unsigned long" which is fine on 64-bit platforms. But for 32 bit x86 it is not always big enough to contain a page table entry which may be 64 bits. Change the type to u64 to ensure that it is always big enough. [akpm@linux-foundation.org: fix riscv] Reported-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Steven Price <steven.price@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200521152308.33096-3-steven.price@arm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-20riscv: Fix print_vm_layout build error if NOMMUKefeng Wang
arch/riscv/mm/init.c: In function ‘print_vm_layout’: arch/riscv/mm/init.c:68:37: error: ‘FIXADDR_START’ undeclared (first use in this function); arch/riscv/mm/init.c:69:20: error: ‘FIXADDR_TOP’ undeclared arch/riscv/mm/init.c:70:37: error: ‘PCI_IO_START’ undeclared arch/riscv/mm/init.c:71:20: error: ‘PCI_IO_END’ undeclared arch/riscv/mm/init.c:72:38: error: ‘VMEMMAP_START’ undeclared arch/riscv/mm/init.c:73:20: error: ‘VMEMMAP_END’ undeclared (first use in this function); Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-05-18riscv: Allow device trees to be built into the kernelPalmer Dabbelt
Some systems don't provide a useful device tree to the kernel on boot. Chasing around bootloaders for these systems is a headache, so instead le't's just keep a device tree table in the kernel, keyed by the SOC's unique identifier, that contains the relevant DTB. This is only implemented for M mode right now. While we could implement this via the SBI calls that allow access to these identifiers, we don't have any systems that need this right now. Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-05-05RISC-V: Remove unused code from STRICT_KERNEL_RWXAtish Patra
This patch removes the unused functions set_kernel_text_rw/ro. Currently, it is not being invoked from anywhere and no other architecture (except arm) uses this code. Even in ARM, these functions are not invoked from anywhere currently. Fixes: d27c3c90817e ("riscv: add STRICT_KERNEL_RWX support") Signed-off-by: Atish Patra <atish.patra@wdc.com> Reviewed-by: Zong Li <zong.li@sifive.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-05-04riscv: set max_pfn to the PFN of the last pageVincent Chen
The current max_pfn equals to zero. In this case, I found it caused users cannot get some page information through /proc such as kpagecount in v5.6 kernel because of new sanity checks. The following message is displayed by stress-ng test suite with the command "stress-ng --verbose --physpage 1 -t 1" on HiFive unleashed board. # stress-ng --verbose --physpage 1 -t 1 stress-ng: debug: [109] 4 processors online, 4 processors configured stress-ng: info: [109] dispatching hogs: 1 physpage stress-ng: debug: [109] cache allocate: reducing cache level from L3 (too high) to L0 stress-ng: debug: [109] get_cpu_cache: invalid cache_level: 0 stress-ng: info: [109] cache allocate: using built-in defaults as no suitable cache found stress-ng: debug: [109] cache allocate: default cache size: 2048K stress-ng: debug: [109] starting stressors stress-ng: debug: [109] 1 stressor spawned stress-ng: debug: [110] stress-ng-physpage: started [110] (instance 0) stress-ng: error: [110] stress-ng-physpage: cannot read page count for address 0x3fd34de000 in /proc/kpagecount, errno=0 (Success) stress-ng: error: [110] stress-ng-physpage: cannot read page count for address 0x3fd32db078 in /proc/kpagecount, errno=0 (Success) ... stress-ng: error: [110] stress-ng-physpage: cannot read page count for address 0x3fd32db078 in /proc/kpagecount, errno=0 (Success) stress-ng: debug: [110] stress-ng-physpage: exited [110] (instance 0) stress-ng: debug: [109] process [110] terminated stress-ng: info: [109] successful run completed in 1.00s # After applying this patch, the kernel can pass the test. # stress-ng --verbose --physpage 1 -t 1 stress-ng: debug: [104] 4 processors online, 4 processors configured stress-ng: info: [104] dispatching hogs: 1 physpage stress-ng: info: [104] cache allocate: using defaults, can't determine cache details from sysfs stress-ng: debug: [104] cache allocate: default cache size: 2048K stress-ng: debug: [104] starting stressors stress-ng: debug: [104] 1 stressor spawned stress-ng: debug: [105] stress-ng-physpage: started [105] (instance 0) stress-ng: debug: [105] stress-ng-physpage: exited [105] (instance 0) stress-ng: debug: [104] process [105] terminated stress-ng: info: [104] successful run completed in 1.01s # Cc: stable@vger.kernel.org Signed-off-by: Vincent Chen <vincent.chen@sifive.com> Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Yash Shah <yash.shah@sifive.com> Tested-by: Yash Shah <yash.shah@sifive.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-04-09Merge tag 'riscv-for-linus-5.7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Palmer Dabbelt: "This contains a handful of new features: - Partial support for the Kendryte K210. There are still a few outstanding issues that I have patches for, but I don't actually have a board to test them so they're not included yet. - SBI v0.2 support. - Fixes to support for building with LLVM-based toolchains. The resulting images are known not to boot yet. I don't anticipate a part two, but I'll probably have something early in the RCs to finish up the K210 support" * tag 'riscv-for-linus-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (38 commits) riscv: create a loader.bin boot image for Kendryte SoC riscv: Kendryte K210 default config riscv: Add Kendryte K210 device tree riscv: Select required drivers for Kendryte SOC riscv: Add Kendryte K210 SoC support riscv: Add SOC early init support riscv: Unaligned load/store handling for M_MODE RISC-V: Support cpu hotplug RISC-V: Add supported for ordered booting method using HSM RISC-V: Add SBI HSM extension definitions RISC-V: Export SBI error to linux error mapping function RISC-V: Add cpu_ops and modify default booting method RISC-V: Move relocate and few other functions out of __init RISC-V: Implement new SBI v0.2 extensions RISC-V: Introduce a new config for SBI v0.1 RISC-V: Add SBI v0.2 extension definitions RISC-V: Add basic support for SBI v0.2 RISC-V: Mark existing SBI as 0.1 SBI. riscv: Use macro definition instead of magic number riscv: Add support to dump the kernel page tables ...
2020-04-02mm: allow VM_FAULT_RETRY for multiple timesPeter Xu
The idea comes from a discussion between Linus and Andrea [1]. Before this patch we only allow a page fault to retry once. We achieved this by clearing the FAULT_FLAG_ALLOW_RETRY flag when doing handle_mm_fault() the second time. This was majorly used to avoid unexpected starvation of the system by looping over forever to handle the page fault on a single page. However that should hardly happen, and after all for each code path to return a VM_FAULT_RETRY we'll first wait for a condition (during which time we should possibly yield the cpu) to happen before VM_FAULT_RETRY is really returned. This patch removes the restriction by keeping the FAULT_FLAG_ALLOW_RETRY flag when we receive VM_FAULT_RETRY. It means that the page fault handler now can retry the page fault for multiple times if necessary without the need to generate another page fault event. Meanwhile we still keep the FAULT_FLAG_TRIED flag so page fault handler can still identify whether a page fault is the first attempt or not. Then we'll have these combinations of fault flags (only considering ALLOW_RETRY flag and TRIED flag): - ALLOW_RETRY and !TRIED: this means the page fault allows to retry, and this is the first try - ALLOW_RETRY and TRIED: this means the page fault allows to retry, and this is not the first try - !ALLOW_RETRY and !TRIED: this means the page fault does not allow to retry at all - !ALLOW_RETRY and TRIED: this is forbidden and should never be used In existing code we have multiple places that has taken special care of the first condition above by checking against (fault_flags & FAULT_FLAG_ALLOW_RETRY). This patch introduces a simple helper to detect the first retry of a page fault by checking against both (fault_flags & FAULT_FLAG_ALLOW_RETRY) and !(fault_flag & FAULT_FLAG_TRIED) because now even the 2nd try will have the ALLOW_RETRY set, then use that helper in all existing special paths. One example is in __lock_page_or_retry(), now we'll drop the mmap_sem only in the first attempt of page fault and we'll keep it in follow up retries, so old locking behavior will be retained. This will be a nice enhancement for current code [2] at the same time a supporting material for the future userfaultfd-writeprotect work, since in that work there will always be an explicit userfault writeprotect retry for protected pages, and if that cannot resolve the page fault (e.g., when userfaultfd-writeprotect is used in conjunction with swapped pages) then we'll possibly need a 3rd retry of the page fault. It might also benefit other potential users who will have similar requirement like userfault write-protection. GUP code is not touched yet and will be covered in follow up patch. Please read the thread below for more information. [1] https://lore.kernel.org/lkml/20171102193644.GB22686@redhat.com/ [2] https://lore.kernel.org/lkml/20181230154648.GB9832@redhat.com/ Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Suggested-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Brian Geffon <bgeffon@google.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Martin Cracauer <cracauer@cons.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Matthew Wilcox <willy@infradead.org> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Link: http://lkml.kernel.org/r/20200220160246.9790-1-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-02mm: introduce FAULT_FLAG_DEFAULTPeter Xu
Although there're tons of arch-specific page fault handlers, most of them are still sharing the same initial value of the page fault flags. Say, merely all of the page fault handlers would allow the fault to be retried, and they also allow the fault to respond to SIGKILL. Let's define a default value for the fault flags to replace those initial page fault flags that were copied over. With this, it'll be far easier to introduce new fault flag that can be used by all the architectures instead of touching all the archs. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Brian Geffon <bgeffon@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Martin Cracauer <cracauer@cons.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Matthew Wilcox <willy@infradead.org> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Link: http://lkml.kernel.org/r/20200220160238.9694-1-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-02mm: introduce fault_signal_pending()Peter Xu
For most architectures, we've got a quick path to detect fatal signal after a handle_mm_fault(). Introduce a helper for that quick path. It cleans the current codes a bit so we don't need to duplicate the same check across archs. More importantly, this will be an unified place that we handle the signal immediately right after an interrupted page fault, so it'll be much easier for us if we want to change the behavior of handling signals later on for all the archs. Note that currently only part of the archs are using this new helper, because some archs have their own way to handle signals. In the follow up patches, we'll try to apply this helper to all the rest of archs. Another note is that the "regs" parameter in the new helper is not used yet. It'll be used very soon. Now we kept it in this patch only to avoid touching all the archs again in the follow up patches. [peterx@redhat.com: fix sparse warnings] Link: http://lkml.kernel.org/r/20200311145921.GD479302@xz-x1 Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Brian Geffon <bgeffon@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Martin Cracauer <cracauer@cons.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Matthew Wilcox <willy@infradead.org> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Link: http://lkml.kernel.org/r/20200220155353.8676-4-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-26riscv: Add support to dump the kernel page tablesZong Li
In a similar manner to arm64, x86, powerpc, etc., it can traverse all page tables, and dump the page table layout with the memory types and permissions. Add a debugfs file at /sys/kernel/debug/kernel_page_tables to export the page table layout to userspace. Signed-off-by: Zong Li <zong.li@sifive.com> Tested-by: Alexandre Ghiti <alex@ghiti.fr> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-03-26riscv: add STRICT_KERNEL_RWX supportZong Li
The commit contains that make text section as non-writable, rodata section as read-only, and data section as non-executable. The init section should be changed to non-executable. Signed-off-by: Zong Li <zong.li@sifive.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-03-26riscv: add ARCH_SUPPORTS_DEBUG_PAGEALLOC supportZong Li
ARCH_SUPPORTS_DEBUG_PAGEALLOC provides a hook to map and unmap pages for debugging purposes. Implement the __kernel_map_pages functions to fill the poison pattern. Signed-off-by: Zong Li <zong.li@sifive.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-03-26riscv: add ARCH_HAS_SET_DIRECT_MAP supportZong Li
Add set_direct_map_*() functions for setting the direct map alias for the page to its default permissions and to an invalid state that cannot be cached in a TLB. (See d253ca0c ("x86/mm/cpa: Add set_direct_map_*() functions")) Add a similar implementation for RISC-V. Signed-off-by: Zong Li <zong.li@sifive.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-03-26riscv: add ARCH_HAS_SET_MEMORY supportZong Li
Add set_memory_ro/rw/x/nx architecture hooks to change the page attribution. Use own set_memory.h rather than generic set_memory.h (i.e. include/asm-generic/set_memory.h), because we want to add other function prototypes here. Signed-off-by: Zong Li <zong.li@sifive.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-03-05riscv: Use p*d_leaf macros to define p*d_hugeAlexandre Ghiti
The newly introduced p*d_leaf macros allow to check if an entry of the page table map to a physical page instead of the next level. To avoid duplication of code, use those macros to determine if a page table entry points to a hugepage. Suggested-by: Paul Walmsley <paul.walmsley@sifive.com> Signed-off-by: Alexandre Ghiti <alex@ghiti.fr> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-03-04riscv: Fix range looking for kernel image memblockAlexandre Ghiti
When looking for the memblock where the kernel lives, we should check that the memory range associated to the memblock entirely comprises the kernel image and not only intersects with it. Signed-off-by: Alexandre Ghiti <alex@ghiti.fr> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-02-24riscv: adjust the indentZong Li
Adjust the indent to match Linux coding style. Signed-off-by: Zong Li <zong.li@sifive.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-02-24riscv: allocate a complete page size for each page tableZong Li
Each page table should be created by allocating a complete page size for it. Otherwise, the content of the page table would be corrupted somewhere through memory allocation which allocates the memory at the middle of the page table for other use. Signed-off-by: Zong Li <zong.li@sifive.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-01-23riscv: mm: add support for CONFIG_DEBUG_VIRTUALZong Li
This patch implements CONFIG_DEBUG_VIRTUAL to do additional checks on virt_to_phys and __pa_symbol calls. virt_to_phys used for linear mapping check, and __pa_symbol used for kernel symbol check. In current RISC-V, kernel image maps to linear mapping area. If CONFIG_DEBUG_VIRTUAL is disable, these two functions calculate the offset on the address feded directly without any checks. The result of test_debug_virtual as follows: [ 0.358456] ------------[ cut here ]------------ [ 0.358738] virt_to_phys used for non-linear address: (____ptrval____) (0xffffffd000000000) [ 0.359174] WARNING: CPU: 0 PID: 1 at arch/riscv/mm/physaddr.c:16 __virt_to_phys+0x3c/0x50 [ 0.359409] Modules linked in: [ 0.359630] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.5.0-rc3-00002-g5133c5c0ca13 #57 [ 0.359861] epc: ffffffe000253d1a ra : ffffffe000253d1a sp : ffffffe03aa87da0 [ 0.360019] gp : ffffffe000ae03b0 tp : ffffffe03aa88000 t0 : ffffffe000af2660 [ 0.360175] t1 : 0000000000000064 t2 : 00000000000000b7 s0 : ffffffe03aa87dc0 [ 0.360330] s1 : ffffffd000000000 a0 : 000000000000004f a1 : 0000000000000000 [ 0.360492] a2 : 0000000000000000 a3 : 0000000000000000 a4 : ffffffe000a84358 [ 0.360672] a5 : 0000000000000000 a6 : 0000000000000000 a7 : 0000000000000000 [ 0.360876] s2 : ffffffe000ae0600 s3 : ffffffe00000fc7c s4 : ffffffe0000224b0 [ 0.361067] s5 : ffffffe000030890 s6 : ffffffe000022470 s7 : 0000000000000008 [ 0.361267] s8 : ffffffe0000002c4 s9 : ffffffe000ae0640 s10: ffffffe000ae0630 [ 0.361453] s11: 0000000000000000 t3 : 0000000000000000 t4 : 000000000001e6d0 [ 0.361636] t5 : ffffffe000ae0a18 t6 : ffffffe000aee54e [ 0.361806] status: 0000000000000120 badaddr: 0000000000000000 cause: 0000000000000003 [ 0.362056] ---[ end trace aec0bf78d4978122 ]--- [ 0.362404] PA: 0xfffffff080200000 for VA: 0xffffffd000000000 [ 0.362607] PA: 0x00000000baddd2d0 for VA: 0xffffffe03abdd2d0 Signed-off-by: Zong Li <zong.li@sifive.com> Reviewed-by: Paul Walmsley <paul.walmsley@sifive.com> Tested-by: Paul Walmsley <paul.walmsley@sifive.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-01-22riscv: Add KASAN supportNick Hu
This patch ports the feature Kernel Address SANitizer (KASAN). Note: The start address of shadow memory is at the beginning of kernel space, which is 2^64 - (2^39 / 2) in SV39. The size of the kernel space is 2^38 bytes so the size of shadow memory should be 2^38 / 8. Thus, the shadow memory would not overlap with the fixmap area. There are currently two limitations in this port, 1. RV64 only: KASAN need large address space for extra shadow memory region. 2. KASAN can't debug the modules since the modules are allocated in VMALLOC area. We mapped the shadow memory, which corresponding to VMALLOC area, to the kasan_early_shadow_page because we don't have enough physical space for all the shadow memory corresponding to VMALLOC area. Signed-off-by: Nick Hu <nickhu@andestech.com> Reported-by: Greentime Hu <green.hu@gmail.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-01-03riscv: mm: use __pa_symbol for kernel symbolsZong Li
__pa_symbol is the marcro that should be used for kernel symbols. It is also a pre-requisite for DEBUG_VIRTUAL which will do bounds checking. Signed-off-by: Zong Li <zong.li@sifive.com> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
2019-12-27riscv: export flush_icache_all to modulesOlof Johansson
This is needed by LKDTM (crash dump test module), it calls flush_icache_range(), which on RISC-V turns into flush_icache_all(). On other architectures, the actual implementation is exported, so follow that precedence and export it here too. Fixes build of CONFIG_LKDTM that fails with: ERROR: "flush_icache_all" [drivers/misc/lkdtm/lkdtm.ko] undefined! Signed-off-by: Olof Johansson <olof@lixom.net> Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
2019-12-20riscv: move sifive_l2_cache.c to drivers/socChristoph Hellwig
The sifive_l2_cache.c is in no way related to RISC-V architecture memory management. It is a little stub driver working around the fact that the EDAC maintainers prefer their drivers to be structured in a certain way that doesn't fit the SiFive SOCs. Move the file to drivers/soc and add a Kconfig option for it, as well as the whole drivers/soc boilerplate for CONFIG_SOC_SIFIVE. Fixes: a967a289f169 ("RISC-V: sifive_l2_cache: Add L2 cache controller driver for SiFive SoCs") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Borislav Petkov <bp@suse.de> [paul.walmsley@sifive.com: keep the MAINTAINERS change specific to the L2$ controller code] Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
2019-12-04Merge tag 'riscv/for-v5.5-rc1-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull more RISC-V updates from Paul Walmsley: "A few minor RISC-V updates for v5.5-rc1 that arrived late. New features: - Dump some kernel virtual memory map details to the console if CONFIG_DEBUG_VM is enabled Other improvements: - Enable more debugging options in the primary defconfigs Cleanups: - Clean up Kconfig indentation" * tag 'riscv/for-v5.5-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: RISC-V: Add address map dumper riscv: defconfigs: enable more debugging options riscv: defconfigs: enable debugfs riscv: Fix Kconfig indentation
2019-11-28Merge tag 'ioremap-5.5' of git://git.infradead.org/users/hch/ioremapLinus Torvalds
Pull generic ioremap support from Christoph Hellwig: "This adds the remaining bits for an entirely generic ioremap and iounmap to lib/ioremap.c. To facilitate that, it cleans up the giant mess of weird ioremap variants we had with no users outside the arch code. For now just the three newest ports use the code, but there is more than a handful others that can be converted without too much work. Summary: - clean up various obsolete ioremap and iounmap variants - add a new generic ioremap implementation and switch csky, nds32 and riscv over to it" * tag 'ioremap-5.5' of git://git.infradead.org/users/hch/ioremap: (21 commits) nds32: use generic ioremap csky: use generic ioremap csky: remove ioremap_cache riscv: use the generic ioremap code lib: provide a simple generic ioremap implementation sh: remove __iounmap nios2: remove __iounmap hexagon: remove __iounmap m68k: rename __iounmap and mark it static arch: rely on asm-generic/io.h for default ioremap_* definitions asm-generic: don't provide ioremap for CONFIG_MMU asm-generic: ioremap_uc should behave the same with and without MMU xtensa: clean up ioremap x86: Clean up ioremap() parisc: remove __ioremap nios2: remove __ioremap alpha: remove the unused __ioremap wrapper hexagon: clean up ioremap ia64: rename ioremap_nocache to ioremap_uc unicore32: remove ioremap_cached ...
2019-11-22Merge branch 'next/misc2' into for-nextPaul Walmsley
2019-11-22Merge branch 'next/nommu' into for-nextPaul Walmsley
Conflicts: arch/riscv/boot/Makefile arch/riscv/include/asm/sbi.h
2019-11-22Merge branch 'next/misc' into for-nextPaul Walmsley
2019-11-22RISC-V: Add address map dumperYash Shah
Add support for dumping the kernel address space layout to the console. User can enable CONFIG_DEBUG_VM to dump the virtual memory region into dmesg buffer during boot-up. Signed-off-by: Yash Shah <yash.shah@sifive.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Anup Patel <anup@brainfault.org> [paul.walmsley@sifive.com: dropped .init/.text/.data/.bss prints; added PCI legacy I/O region display] Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
2019-11-17riscv: add nommu supportChristoph Hellwig
The kernel runs in M-mode without using page tables, and thus can't run bare metal without help from additional firmware. Most of the patch is just stubbing out code not needed without page tables, but there is an interesting detail in the signals implementation: - The normal RISC-V syscall ABI only implements rt_sigreturn as VDSO entry point, but the ELF VDSO is not supported for nommu Linux. We instead copy the code to call the syscall onto the stack. In addition to enabling the nommu code a new defconfig for a small kernel image that can run in nommu mode on qemu is also provided, to run a kernel in qemu you can use the following command line: qemu-system-riscv64 -smp 2 -m 64 -machine virt -nographic \ -kernel arch/riscv/boot/loader \ -drive file=rootfs.ext2,format=raw,id=hd0 \ -device virtio-blk-device,drive=hd0 Contains contributions from Damien Le Moal <Damien.LeMoal@wdc.com>. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Anup Patel <anup@brainfault.org> [paul.walmsley@sifive.com: updated to apply; add CONFIG_MMU guards around PCI_IOBASE definition to fix build issues; fixed checkpatch issues; move the PCI_IO_* and VMEMMAP address space macros along with the others; resolve sparse warning] Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
2019-11-13riscv: implement remote sfence.i using IPIsChristoph Hellwig
The RISC-V ISA only supports flushing the instruction cache for the local CPU core. Currently we always offload the remote TLB flushing to the SBI, which then issues an IPI under the hoods. But with M-mode we do not have an SBI so we have to do it ourselves. IPI to the other nodes using the existing kernel helpers instead if we have native clint support and thus can IPI directly from the kernel. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Anup Patel <anup@brainfault.org> [paul.walmsley@sifive.com: cleaned up code comment] Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
2019-11-12riscv: Use PMD_SIZE to replace PTE_PARENT_SIZEZong Li
The PMD_SIZE is equal to PGDIR_SIZE when __PAGETABLE_PMD_FOLDED is defined. Signed-off-by: Zong Li <zong.li@sifive.com> Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Christoph Hellwig <hch@lst.de> [paul.walmsley@sifive.com: fixed spelling in commit summary] Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
2019-11-11riscv: use the generic ioremap codeChristoph Hellwig
Use the generic ioremap code instead of providing a local version. Note that this relies on the asm-generic no-op definition of pgprot_noncached. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Paul Walmsley <paul.walmsley@sifive.com> Tested-by: Paul Walmsley <paul.walmsley@sifive.com> # rv32, rv64 boot Acked-by: Paul Walmsley <paul.walmsley@sifive.com> # arch/riscv