path: root/mm
Age | Commit message | Author
2022-01-08 | truncate,shmem: Handle truncates that split large folios | Matthew Wilcox (Oracle)
Handle folio splitting in the parts of the truncation functions which already handle partial pages. Factor all that code out into a new function called truncate_inode_partial_folio(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: William Kucharski <william.kucharski@oracle.com>
2022-01-08 | truncate: Convert invalidate_inode_pages2_range to folios | Matthew Wilcox (Oracle)
If we're going to unmap a folio, we have to be sure to unmap the entire folio, not just the part of it which lies after the search index. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@oracle.com>
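The difference between unmapping from the search index and unmapping the whole folio can be sketched in user space. This is a toy model, not the kernel code: `struct toy_folio_span`, `struct toy_range` and both helper functions are hypothetical names introduced only for illustration.

```c
#include <assert.h>

/* Hypothetical stand-in for a folio: first page-cache index and page count. */
struct toy_folio_span {
	unsigned long index;     /* index of the first page of the folio */
	unsigned long nr_pages;  /* number of pages the folio covers */
};

/* A range of page-cache indices to unmap. */
struct toy_range {
	unsigned long first;
	unsigned long nr;
};

/*
 * Starting at the search index only covers the tail end of the folio.
 * Assumes index lies inside the folio.
 */
static struct toy_range toy_unmap_from_index(struct toy_folio_span f,
					     unsigned long index)
{
	struct toy_range r = { index, f.index + f.nr_pages - index };
	return r;
}

/* Covering the whole folio ignores where the search happened to start. */
static struct toy_range toy_unmap_whole_folio(struct toy_folio_span f)
{
	struct toy_range r = { f.index, f.nr_pages };
	return r;
}
```

For a four-page folio at index 8 and a search index of 10, the first helper would cover only indices 10 and 11, while the second covers 8 through 11.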
2022-01-08 | mm: Remove pagevec_remove_exceptionals() | Matthew Wilcox (Oracle)
All of its callers now call folio_batch_remove_exceptionals(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@oracle.com>
2022-01-08 | mm: Convert find_lock_entries() to use a folio_batch | Matthew Wilcox (Oracle)
find_lock_entries() already only returned the head page of folios, so convert it to return a folio_batch instead of a pagevec. That cascades through converting truncate_inode_pages_range() to delete_from_page_cache_batch() and page_cache_delete_batch(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@oracle.com>
2022-01-08 | filemap: Return only folios from find_get_entries() | Matthew Wilcox (Oracle)
The callers have all been converted to work on folios, so convert find_get_entries() to return a batch of folios instead of pages. We also now return multiple large folios in a single call. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2022-01-08 | filemap: Convert filemap_get_read_batch() to use a folio_batch | Matthew Wilcox (Oracle)
This change ripples all the way through the filemap_read() call chain and removes a lot of messing about converting folios to pages and back again. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@oracle.com>
2022-01-08 | filemap: Convert filemap_read() to use a folio | Matthew Wilcox (Oracle)
We know the pagevec always contains folios, but use page_folio() anyway instead of casting. Removes a few calls to legacy functions. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@oracle.com>
2022-01-08 | truncate: Add invalidate_complete_folio2() | Matthew Wilcox (Oracle)
Convert invalidate_complete_page2() to invalidate_complete_folio2(). Use filemap_free_folio() to free the page instead of calling ->freepage manually. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@oracle.com>
2022-01-08 | truncate: Convert invalidate_inode_pages2_range() to use a folio | Matthew Wilcox (Oracle)
If we're going to unmap a folio, we have to be sure to unmap the entire folio, not just the part of it which lies after the search index. We cannot yet remove the struct page from invalidate_inode_pages2_range() because the page pointer in the pvec might be a shadow/dax/swap entry instead of actually a page. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@oracle.com>
2022-01-08 | truncate: Skip known-truncated indices | Matthew Wilcox (Oracle)
If we've truncated an entire folio, we can skip over all the indices covered by this folio. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@oracle.com>
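The index arithmetic behind this can be illustrated with a small user-space sketch. Everything here is an assumption made for illustration: `struct toy_folio`, `toy_folio_next_index()` and `toy_count_lookups()` are hypothetical names, and the loop is a simplified model rather than the kernel's truncation loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a folio: first page-cache index and page count. */
struct toy_folio {
	unsigned long index;     /* index of the first page of the folio */
	unsigned long nr_pages;  /* number of pages the folio covers */
};

/* First index past the folio: where the search resumes after truncating it. */
static unsigned long toy_folio_next_index(const struct toy_folio *folio)
{
	return folio->index + folio->nr_pages;
}

/*
 * Count the lookups a toy truncation loop performs over [start, end]
 * when it skips every index covered by a folio it has just handled.
 * folios[] must be sorted by index and must not overlap.
 */
static unsigned long toy_count_lookups(const struct toy_folio *folios, size_t n,
				       unsigned long start, unsigned long end)
{
	unsigned long lookups = 0;
	unsigned long index = start;
	size_t i = 0;

	while (index <= end) {
		lookups++;
		/* Drop folios that end at or before the current index. */
		while (i < n && toy_folio_next_index(&folios[i]) <= index)
			i++;
		if (i < n && folios[i].index <= index)
			index = toy_folio_next_index(&folios[i]); /* skip the whole folio */
		else
			index++; /* nothing here: advance by a single index */
	}
	return lookups;
}
```

With a four-page folio at index 0 and a single-page folio at index 8, walking indices 0 through 9 takes 7 lookups in this model instead of 10, because indices 1 to 3 are never searched.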
2022-01-08 | truncate,shmem: Add truncate_inode_folio() | Matthew Wilcox (Oracle)
Convert all callers of truncate_inode_page() to call truncate_inode_folio() instead, and move the declaration to mm/internal.h. Move the assertion that the caller is not passing in a tail page to generic_error_remove_page(). We can't entirely remove the struct page from the callers yet because the page pointer in the pvec might be a shadow/dax/swap entry instead of actually a page. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@oracle.com>
2022-01-08 | shmem: Convert part of shmem_undo_range() to use a folio | Matthew Wilcox (Oracle)
find_lock_entries() never returns tail pages. We cannot use page_folio() here as the pagevec may also contain swap entries, so simply cast for now. This is an intermediate step which will be fully removed by the end of this series. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@oracle.com>
2022-01-08 | mm: Add unmap_mapping_folio() | Matthew Wilcox (Oracle)
Convert both callers of unmap_mapping_page() to call unmap_mapping_folio() instead. Also move zap_details from linux/mm.h to mm/memory.c Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: William Kucharski <william.kucharski@oracle.com>
2022-01-04 | truncate: Add truncate_cleanup_folio() | Matthew Wilcox (Oracle)
Convert both callers of truncate_cleanup_page() to use truncate_cleanup_folio() instead. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@oracle.com>
2022-01-04 | filemap: Add filemap_release_folio() | Matthew Wilcox (Oracle)
Reimplement try_to_release_page() as a wrapper around filemap_release_folio(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@oracle.com>
2022-01-04 | filemap: Use a folio in filemap_page_mkwrite | Matthew Wilcox (Oracle)
This fixes a bug for tail pages. They always have a NULL mapping, so the check would fail and we would never mark the folio as dirty. Ends up growing the kernel by 19 bytes although there will be fewer calls to compound_head() dynamically. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@oracle.com>
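Why a mapping check on a tail page fails can be shown with a toy model. The structures and function names below (`struct toy_page`, `toy_check_page()`, `toy_check_folio()`) are hypothetical and only mimic the property described in the commit message: tail pages carry a NULL mapping, while the head page holds the real one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_mapping {
	int id;
};

/* Hypothetical page: only the head page carries the mapping pointer. */
struct toy_page {
	struct toy_page *head;        /* points to itself for a head page */
	struct toy_mapping *mapping;  /* NULL on tail pages */
};

/* Page-style check: compares the mapping of whichever page was passed in. */
static bool toy_check_page(const struct toy_page *page,
			   const struct toy_mapping *mapping)
{
	return page->mapping == mapping;
}

/* Folio-style check: resolve to the head page first, then compare. */
static bool toy_check_folio(const struct toy_page *page,
			    const struct toy_mapping *mapping)
{
	return page->head->mapping == mapping;
}
```

In this model the page-style check rejects a tail page even though it belongs to the right mapping, which corresponds to the case where the folio would never be marked dirty.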
2022-01-04 | filemap: Use a folio in filemap_map_pages | Matthew Wilcox (Oracle)
Saves 61 bytes due to fewer calls to compound_head(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@oracle.com>
2022-01-04 | filemap: Use folios in next_uptodate_page | Matthew Wilcox (Oracle)
This saves 105 bytes of text. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@oracle.com>
2022-01-04 | filemap: Convert page_cache_delete_batch to folios | Matthew Wilcox (Oracle)
Saves one call to compound_head() and reduces text size by 15 bytes. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@oracle.com>
2022-01-04 | filemap: Convert filemap_get_pages to use folios | Matthew Wilcox (Oracle)
This saves a few calls to compound_head(), including one in filemap_update_page(). Shrinks the kernel by 78 bytes. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@oracle.com>
2022-01-04 | filemap: Drop the refcount while waiting for page lock | Matthew Wilcox (Oracle)
Commit bd8a1f3655a7 ("mm/filemap: support readpage splitting a page") changed the read_iter path to drop the refcount while waiting for the page lock. However, it missed the same pattern in read_mapping_page() and friends. Use the same pattern in do_read_cache_folio() that is used in filemap_update_page(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: William Kucharski <william.kucharski@oracle.com>
2022-01-04 | filemap: Add read_cache_folio and read_mapping_folio | Matthew Wilcox (Oracle)
Reimplement read_cache_page() as a wrapper around read_cache_folio(). Saves over 400 bytes of text from do_read_cache_folio() which more than makes up for the extra 100 bytes of text added to the various wrapper functions. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: William Kucharski <william.kucharski@oracle.com>
2022-01-04 | filemap: Convert filemap_fault to folio | Matthew Wilcox (Oracle)
Instead of converting back-and-forth between the actual page and the head page, just convert once at the end of the function where we set the vmf->page. Saves 241 bytes of text, or 15% of the size of filemap_fault(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@oracle.com>
2022-01-04 | filemap: Convert do_async_mmap_readahead to take a folio | Matthew Wilcox (Oracle)
Call page_cache_async_ra() directly instead of indirecting through page_cache_async_readahead(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@oracle.com>
2022-01-04 | readahead: Convert page_cache_ra_unbounded to folios | Matthew Wilcox (Oracle)
This saves 99 bytes of kernel text. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@oracle.com>
2022-01-04 | readahead: Convert page_cache_async_ra() to take a folio | Matthew Wilcox (Oracle)
Using the folio here avoids checking whether it's a tail page. This patch mostly just enables some of the following patches. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@oracle.com>
2022-01-04 | filemap: Convert filemap_range_uptodate to folios | Matthew Wilcox (Oracle)
The only caller was already passing a head page, so this simply avoids a call to compound_head(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@oracle.com>
2022-01-04 | filemap: Convert filemap_create_page to folio | Matthew Wilcox (Oracle)
This is all internal to filemap and saves 100 bytes of text. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@oracle.com>
2022-01-04 | filemap: Convert filemap_read_page to take a folio | Matthew Wilcox (Oracle)
One of the callers already had a folio; the other two grow by a few bytes, but filemap_read_page() shrinks by 50 bytes for a net reduction of 27 bytes. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@oracle.com>
2022-01-04 | filemap: Convert find_get_pages_contig to folios | Matthew Wilcox (Oracle)
None of the callers of find_get_pages_contig() want tail pages. They all use order-0 pages today, but if they were converted, they'd want folios. So just remove the call to find_subpage() instead of replacing it with folio_page(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@oracle.com>
2022-01-04 | filemap: Convert filemap_get_read_batch to use folios | Matthew Wilcox (Oracle)
The page cache only stores folios, never tail pages. Saves 29 bytes due to removing calls to compound_head(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@oracle.com>
2022-01-04 | filemap: Convert find_get_entry to return a folio | Matthew Wilcox (Oracle)
Convert callers to cope. Saves 580 bytes of kernel text; all five callers are reduced in size. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@oracle.com>
2022-01-04 | filemap: Add filemap_remove_folio and __filemap_remove_folio | Matthew Wilcox (Oracle)
Reimplement __delete_from_page_cache() as a wrapper around __filemap_remove_folio() and delete_from_page_cache() as a wrapper around filemap_remove_folio(). Remove the EXPORT_SYMBOL as delete_from_page_cache() was not used by any in-tree modules. Convert page_cache_free_page() into filemap_free_folio(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@oracle.com>
2022-01-04 | filemap: Convert tracing of page cache operations to folio | Matthew Wilcox (Oracle)
Pass the folio instead of a page. The page was already implicitly a folio as it accessed page->mapping directly. Add the order of the folio to the tracepoint, as this is important information. Also drop printing the address of the struct page as the pfn provides better information than the struct page address. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@oracle.com>
2022-01-04 | filemap: Add filemap_unaccount_folio() | Matthew Wilcox (Oracle)
Replace unaccount_page_cache_page() with filemap_unaccount_folio(). The bug handling path could be a bit more robust (eg taking into account the mapcounts of tail pages), but it's really never supposed to happen. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@oracle.com>
2022-01-04 | filemap: Convert page_cache_delete to take a folio | Matthew Wilcox (Oracle)
It was already assuming a head page, so this is a straightforward conversion. Convert the one caller to call page_folio(), even though it must currently be passing in a head page. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@oracle.com>
2022-01-04 | filemap: Add folio_put_wait_locked() | Matthew Wilcox (Oracle)
Convert all three callers of put_and_wait_on_page_locked() to folio_put_wait_locked(). This shrinks the kernel overall by 19 bytes. filemap_update_page() shrinks by 19 bytes while __migration_entry_wait() is unchanged. folio_put_wait_locked() is 14 bytes smaller than put_and_wait_on_page_locked(), but pmd_migration_entry_wait() grows by 14 bytes. It removes the assumption from pmd_migration_entry_wait() that pages cannot be larger than a PMD (which is true today, but may be interesting to explore in the future). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@oracle.com>
2022-01-02 | mm/writeback: Improve __folio_mark_dirty() comment | Matthew Wilcox (Oracle)
Add some notes about how this function needs to be called. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@oracle.com>
2022-01-02 | filemap: Remove PageHWPoison check from next_uptodate_page() | Matthew Wilcox (Oracle)
Pages are individually marked as suffering from hardware poisoning. Checking that the head page is not hardware poisoned doesn't make sense; we might be after a subpage. We check each page individually before we use it, so this was an optimisation gone wrong. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@oracle.com>
2021-11-25 | Merge tag 'folio-5.16b' of git://git.infradead.org/users/willy/pagecache | Linus Torvalds
Pull folio fixes from Matthew Wilcox:
 "In the course of preparing the folio changes for iomap for next merge window, we discovered some problems that would be nice to address now:

   - Renaming multi-page folios to large folios. mapping_multi_page_folio_support() is just a little too long, so we settled on mapping_large_folio_support(). That meant renaming, eg folio_test_multi() to folio_test_large(). Rename AS_THP_SUPPORT to match

   - I hadn't included folio wrappers for zero_user_segments(), etc. Also, multi-page^W^W large folio support is now independent of CONFIG_TRANSPARENT_HUGEPAGE, so machines with HIGHMEM always need to fall back to the out-of-line zero_user_segments(). Remove FS_THP_SUPPORT to match

   - The build bots finally got round to telling me that I missed a couple of architectures when adding flush_dcache_folio(). Christoph suggested that we just add linux/cacheflush.h and not rely on asm-generic/cacheflush.h"

* tag 'folio-5.16b' of git://git.infradead.org/users/willy/pagecache:
  mm: Add functions to zero portions of a folio
  fs: Rename AS_THP_SUPPORT and mapping_thp_support
  fs: Remove FS_THP_SUPPORT
  mm: Remove folio_test_single
  mm: Rename folio_test_multi to folio_test_large
  Add linux/cacheflush.h
2021-11-22 | hugetlbfs: flush before unlock on move_hugetlb_page_tables() | Nadav Amit
We must flush the TLB before releasing i_mmap_rwsem to avoid the potential reuse of an unshared PMDs page. This is not true in the case of move_hugetlb_page_tables(). The last reference on the page table can therefore be dropped before the TLB flush took place. Prevent it by reordering the operations and flushing the TLB before releasing i_mmap_rwsem. Fixes: 550a7d60bd5e ("mm, hugepages: add mremap() support for hugepage backed vma") Signed-off-by: Nadav Amit <namit@vmware.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mina Almasry <almasrymina@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-22 | hugetlbfs: flush TLBs correctly after huge_pmd_unshare | Nadav Amit
When __unmap_hugepage_range() calls to huge_pmd_unshare() succeed, a TLB flush is missing. This TLB flush must be performed before releasing the i_mmap_rwsem, in order to prevent an unshared PMDs page from being released and reused before the TLB flush took place. Arguably, a comprehensive solution would use mmu_gather interface to batch the TLB flushes and the PMDs page release, however it is not an easy solution: (1) try_to_unmap_one() and try_to_migrate_one() also call huge_pmd_unshare() and they cannot use the mmu_gather interface; and (2) deferring the release of the page reference for the PMDs page until after i_mmap_rwsem is dropped can confuse huge_pmd_unshare() into thinking PMDs are shared when they are not. Fix __unmap_hugepage_range() by adding the missing TLB flush, and forcing a flush when unshare is successful. Fixes: 24669e58477e ("hugetlb: use mmu_gather instead of a temporary linked list for accumulating pages)" # 3.6 Signed-off-by: Nadav Amit <namit@vmware.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-20 | kmap_local: don't assume kmap PTEs are linear arrays in memory | Ard Biesheuvel
The kmap_local conversion broke the ARM architecture, because the new code assumes that all PTEs used for creating kmaps form a linear array in memory, and uses array indexing to look up the kmap PTE belonging to a certain kmap index. On ARM, this cannot work, not only because the PTE pages may be non-adjacent in memory, but also because ARM/!LPAE interleaves hardware entries and extended entries (carrying software-only bits) in a way that is not compatible with array indexing. Fortunately, this only seems to affect configurations with more than 8 CPUs, due to the way the per-CPU kmap slots are organized in memory. Work around this by permitting an architecture to set a Kconfig symbol that signifies that the kmap PTEs do not form a linear array in memory, and so the only way to locate the appropriate one is to walk the page tables. Link: https://lore.kernel.org/linux-arm-kernel/20211026131249.3731275-1-ardb@kernel.org/ Link: https://lkml.kernel.org/r/20211116094737.7391-1-ardb@kernel.org Fixes: 2a15ba82fa6c ("ARM: highmem: Switch to generic kmap atomic") Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reported-by: Quanyang Wang <quanyang.wang@windriver.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-20 | mm/damon/dbgfs: fix missed use of damon_dbgfs_lock | SeongJae Park
DAMON debugfs is supposed to protect dbgfs_ctxs, dbgfs_nr_ctxs, and dbgfs_dirs using damon_dbgfs_lock. However, some of the code is accessing the variables without the protection. This fixes it by protecting all such accesses. Link: https://lkml.kernel.org/r/20211110145758.16558-3-sj@kernel.org Fixes: 75c1c2b53c78 ("mm/damon/dbgfs: support multiple contexts") Signed-off-by: SeongJae Park <sj@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-20 | mm/damon/dbgfs: use '__GFP_NOWARN' for user-specified size buffer allocation | SeongJae Park
Patch series "DAMON fixes".

This patch (of 2):

DAMON users can trigger below warning in '__alloc_pages()' by invoking write() to some DAMON debugfs files with arbitrarily high count argument, because DAMON debugfs interface allocates some buffers based on the user-specified 'count'.

        if (unlikely(order >= MAX_ORDER)) {
                WARN_ON_ONCE(!(gfp & __GFP_NOWARN));
                return NULL;
        }

Because the DAMON debugfs interface code checks failure of the 'kmalloc()', this commit simply suppresses the warnings by adding '__GFP_NOWARN' flag.

Link: https://lkml.kernel.org/r/20211110145758.16558-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20211110145758.16558-2-sj@kernel.org
Fixes: 4bc05954d007 ("mm/damon: implement a debugfs-based user space interface")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-20 | hugetlb, userfaultfd: fix reservation restore on userfaultfd error | Mina Almasry
Currently in the is_continue case in hugetlb_mcopy_atomic_pte(), if we bail out using "goto out_release_unlock;" in the cases where idx >= size, or !huge_pte_none(), the code will detect that new_pagecache_page == false, and so call restore_reserve_on_error(). In this case I see restore_reserve_on_error() delete the reservation, and the following call to remove_inode_hugepages() will increment h->resv_hugepages causing a 100% reproducible leak. We should treat the is_continue case similar to adding a page into the pagecache and set new_pagecache_page to true, to indicate that there is no reservation to restore on the error path, and we need not call restore_reserve_on_error(). Rename new_pagecache_page to page_in_pagecache to make that clear. Link: https://lkml.kernel.org/r/20211117193825.378528-1-almasrymina@google.com Fixes: c7b1850dfb41 ("hugetlb: don't pass page cache pages to restore_reserve_on_error") Signed-off-by: Mina Almasry <almasrymina@google.com> Reported-by: James Houghton <jthoughton@google.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Wei Xu <weixugc@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-20 | hugetlb: fix hugetlb cgroup refcounting during mremap | Bui Quang Minh
When hugetlb_vm_op_open() is called during copy_vma(), we may take the reference to resv_map->css. Later, when clearing the reservation pointer of old_vma after transferring it to new_vma, we forget to drop the reference to resv_map->css. This leads to a reference leak of css. Fix this by adding a check to drop reservation css reference in clear_vma_resv_huge_pages(). Link: https://lkml.kernel.org/r/20211113154412.91134-1-minhquangbui99@gmail.com Fixes: 550a7d60bd5e35 ("mm, hugepages: add mremap() support for hugepage backed vma") Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Mina Almasry <almasrymina@google.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-20 | mm: kmemleak: slob: respect SLAB_NOLEAKTRACE flag | Rustam Kovhaev
When kmemleak is enabled for SLOB, system does not boot and does not print anything to the console. At the very early stage in the boot process we hit infinite recursion from kmemleak_init() and eventually kernel crashes. kmemleak_init() specifies SLAB_NOLEAKTRACE for KMEM_CACHE(), but kmem_cache_create_usercopy() removes it because CACHE_CREATE_MASK is not valid for SLOB. Let's fix CACHE_CREATE_MASK and make kmemleak work with SLOB Link: https://lkml.kernel.org/r/20211115020850.3154366-1-rkovhaev@gmail.com Fixes: d8843922fba4 ("slab: Ignore internal flags in cache creation") Signed-off-by: Rustam Kovhaev <rkovhaev@gmail.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Glauber Costa <glommer@parallels.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-20 | mm: emit the "free" trace report before freeing memory in kmem_cache_free() | Yunfeng Ye
After the memory is freed, it can be immediately allocated by other CPUs, before the "free" trace report has been emitted. This causes inaccurate traces.

For example, if the following sequence of events occurs:

    CPU 0                 CPU 1

  (1) alloc xxxxxx
  (2) free  xxxxxx
                         (3) alloc xxxxxx
                         (4) free  xxxxxx

Then they will be inaccurately reported via tracing, so that they appear to have happened in this order:

    CPU 0                 CPU 1

  (1) alloc xxxxxx
                         (2) alloc xxxxxx
  (3) free  xxxxxx
                         (4) free  xxxxxx

This makes it look like CPU 1 somehow managed to allocate memory that CPU 0 still had allocated for itself.

In order to avoid this, emit the "free xxxxxx" tracing report just before the actual call to free the memory, instead of just after it.

Link: https://lkml.kernel.org/r/374eb75d-7404-8721-4e1e-65b0e5b17279@huawei.com
Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
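The ordering problem can be modelled deterministically in user space. This is only a sketch under stated assumptions: the trace buffer, `toy_trace()` and `toy_release_and_reuse()` are hypothetical, and the "other CPU" is simulated by emitting its alloc event at the exact moment the object is released.

```c
#include <assert.h>
#include <string.h>

#define TOY_LOG_MAX 8

/* Hypothetical trace buffer: event names in the order they were emitted. */
static const char *toy_log[TOY_LOG_MAX];
static int toy_log_len;

static void toy_log_reset(void)
{
	toy_log_len = 0;
}

static void toy_trace(const char *event)
{
	if (toy_log_len < TOY_LOG_MAX)
		toy_log[toy_log_len++] = event;
}

/*
 * Stand-in for the real free: as soon as the object is released, another
 * CPU is assumed to grab it and emit its own "alloc" trace event.
 */
static void toy_release_and_reuse(void)
{
	toy_trace("alloc (CPU 1)");
}

/* Trace emitted after the release: the reuse can be logged first. */
static void toy_free_trace_after(void)
{
	toy_release_and_reuse();
	toy_trace("free (CPU 0)");
}

/* Trace emitted before the release, as the commit message describes. */
static void toy_free_trace_before(void)
{
	toy_trace("free (CPU 0)");
	toy_release_and_reuse();
}
```

In the first variant the log shows CPU 1's alloc ahead of CPU 0's free; in the second, the free is always recorded first, matching the real order of events.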
2021-11-20 | mm/swap.c:put_pages_list(): reinitialise the page list | Matthew Wilcox
While free_unref_page_list() puts pages onto the CPU local LRU list, it does not remove them from the list they were passed in on. That makes the list_head appear to be non-empty, and would lead to various corruption problems if we didn't have an assertion that the list was empty. Reinitialise the list after calling free_unref_page_list() to avoid this problem. Link: https://lkml.kernel.org/r/YYp40A2lNrxaZji8@casper.infradead.org Fixes: 988c69f1bc23 ("mm: optimise put_pages_list()") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Steve French <stfrench@microsoft.com> Reported-by: Namjae Jeon <linkinjeon@kernel.org> Tested-by: Steve French <stfrench@microsoft.com> Tested-by: Namjae Jeon <linkinjeon@kernel.org> Cc: Steve French <smfrench@gmail.com> Cc: Hyeoncheol Lee <hyc.lee@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
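The stale list head can be reproduced with a minimal user-space doubly linked list. The types and helpers below are hypothetical look-alikes of the kernel list primitives, written only for this illustration; `toy_consume_without_unlinking()` stands in for a consumer that takes the entries but leaves the caller's head untouched.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct toy_list_head {
	struct toy_list_head *next, *prev;
};

static void toy_init_list_head(struct toy_list_head *head)
{
	head->next = head;
	head->prev = head;
}

static bool toy_list_empty(const struct toy_list_head *head)
{
	return head->next == head;
}

static void toy_list_add_tail(struct toy_list_head *entry,
			      struct toy_list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/*
 * Hypothetical consumer: takes every entry, but never unlinks it from the
 * list it was passed in on. Returns the number of entries consumed.
 */
static int toy_consume_without_unlinking(struct toy_list_head *head)
{
	int n = 0;

	for (struct toy_list_head *p = head->next; p != head; p = p->next)
		n++;
	return n;
}

/* Consume the entries, then reinitialise the head so it reads as empty. */
static int toy_put_list(struct toy_list_head *head)
{
	int n = toy_consume_without_unlinking(head);

	toy_init_list_head(head);
	return n;
}
```

After the bare consumer runs, the head still points at entries it no longer owns, so an emptiness check fails; reinitialising the head afterwards restores the expected empty state.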