From 48c935ad88f5be20eb5445a77c171351b1eb5111 Mon Sep 17 00:00:00 2001 From: "Kirill A. Shutemov" Date: Fri, 15 Jan 2016 16:51:24 -0800 Subject: page-flags: define PG_locked behavior on compound pages MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit lock_page() must operate on the whole compound page. It doesn't make much sense to lock part of compound page. Change code to use head page's PG_locked, if tail page is passed. This patch also gets rid of custom helper functions -- __set_page_locked() and __clear_page_locked(). They are replaced with helpers generated by __SETPAGEFLAG/__CLEARPAGEFLAG. Tail pages to these helper would trigger VM_BUG_ON(). SLUB uses PG_locked as a bit spin locked. IIUC, tail pages should never appear there. VM_BUG_ON() is added to make sure that this assumption is correct. [akpm@linux-foundation.org: fix fs/cifs/file.c] Signed-off-by: Kirill A. Shutemov Cc: Andrea Arcangeli Cc: Hugh Dickins Cc: Dave Hansen Cc: Mel Gorman Cc: Rik van Riel Cc: Vlastimil Babka Cc: Christoph Lameter Cc: Naoya Horiguchi Cc: Steve Capper Cc: "Aneesh Kumar K.V" Cc: Johannes Weiner Cc: Michal Hocko Cc: Jerome Marchand Cc: Jérôme Glisse Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/memory-failure.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'mm/memory-failure.c') diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 8424b64711ac..5b965e27aaae 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1166,7 +1166,7 @@ int memory_failure(unsigned long pfn, int trapno, int flags) /* * We ignore non-LRU pages for good reasons. * - PG_locked is only well defined for LRU pages and a few others - * - to avoid races with __set_page_locked() + * - to avoid races with __SetPageLocked() * - to avoid races with __SetPageSlab*() (and more non-atomic ops) * The check (unnecessarily) ignores LRU pages being isolated and * walked by the page reclaim code, however that's not a big loss. -- cgit v1.2.3 From 4d2fa965483f4c39bd097ff9bbf3efe62d4cf367 Mon Sep 17 00:00:00 2001 From: "Kirill A. Shutemov" Date: Fri, 15 Jan 2016 16:54:00 -0800 Subject: thp, mm: split_huge_page(): caller need to lock page We're going to use migration entries instead of compound_lock() to stabilize page refcounts. Setup and remove migration entries require page to be locked. Some of split_huge_page() callers already have the page locked. Let's require everybody to lock the page before calling split_huge_page(). Signed-off-by: Kirill A. Shutemov Tested-by: Sasha Levin Tested-by: Aneesh Kumar K.V Acked-by: Vlastimil Babka Acked-by: Jerome Marchand Cc: Andrea Arcangeli Cc: Hugh Dickins Cc: Dave Hansen Cc: Mel Gorman Cc: Rik van Riel Cc: Naoya Horiguchi Cc: Steve Capper Cc: Johannes Weiner Cc: Michal Hocko Cc: Christoph Lameter Cc: David Rientjes Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/memory-failure.c | 8 +++++++- mm/migrate.c | 8 ++++++-- 2 files changed, 13 insertions(+), 3 deletions(-) (limited to 'mm/memory-failure.c') diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 5b965e27aaae..a2c987df80eb 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1149,7 +1149,9 @@ int memory_failure(unsigned long pfn, int trapno, int flags) } if (!PageHuge(p) && PageTransHuge(hpage)) { + lock_page(hpage); if (!PageAnon(hpage) || unlikely(split_huge_page(hpage))) { + unlock_page(hpage); if (!PageAnon(hpage)) pr_err("MCE: %#lx: non anonymous thp\n", pfn); else @@ -1159,6 +1161,7 @@ int memory_failure(unsigned long pfn, int trapno, int flags) put_hwpoison_page(p); return -EBUSY; } + unlock_page(hpage); VM_BUG_ON_PAGE(!page_count(p), p); hpage = compound_head(p); } @@ -1751,7 +1754,10 @@ int soft_offline_page(struct page *page, int flags) return -EBUSY; } if (!PageHuge(page) && PageTransHuge(hpage)) { - if (PageAnon(hpage) && unlikely(split_huge_page(hpage))) { + lock_page(page); + ret = split_huge_page(hpage); + unlock_page(page); + if (unlikely(ret)) { pr_info("soft offline: %#lx: failed to split THP\n", pfn); if (flags & MF_COUNT_INCREASED) diff --git a/mm/migrate.c b/mm/migrate.c index 91545da23fd1..dec81a9e2fd6 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -943,9 +943,13 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page, goto out; } - if (unlikely(PageTransHuge(page))) - if (unlikely(split_huge_page(page))) + if (unlikely(PageTransHuge(page))) { + lock_page(page); + rc = split_huge_page(page); + unlock_page(page); + if (rc) goto out; + } rc = __unmap_and_move(page, newpage, force, mode); if (rc == MIGRATEPAGE_SUCCESS) -- cgit v1.2.3 From d96b339f453997f2f08c52da3f41423be48c978f Mon Sep 17 00:00:00 2001 From: Naoya Horiguchi Date: Fri, 15 Jan 2016 16:54:03 -0800 Subject: mm: soft-offline: check return value in second __get_any_page() call I saw the following BUG_ON triggered in a testcase where a process calls madvise(MADV_SOFT_OFFLINE) on thps, along with a background process that calls migratepages command repeatedly (doing ping-pong among different NUMA nodes) for the first process: Soft offlining page 0x60000 at 0x700000600000 __get_any_page: 0x60000 free buddy page page:ffffea0001800000 count:0 mapcount:-127 mapping: (null) index:0x1 flags: 0x1fffc0000000000() page dumped because: VM_BUG_ON_PAGE(atomic_read(&page->_count) == 0) ------------[ cut here ]------------ kernel BUG at /src/linux-dev/include/linux/mm.h:342! invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC Modules linked in: cfg80211 rfkill crc32c_intel serio_raw virtio_balloon i2c_piix4 virtio_blk virtio_net ata_generic pata_acpi CPU: 3 PID: 3035 Comm: test_alloc_gene Tainted: G O 4.4.0-rc8-v4.4-rc8-160107-1501-00000-rc8+ #74 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 task: ffff88007c63d5c0 ti: ffff88007c210000 task.ti: ffff88007c210000 RIP: 0010:[] [] put_page+0x5c/0x60 RSP: 0018:ffff88007c213e00 EFLAGS: 00010246 Call Trace: put_hwpoison_page+0x4e/0x80 soft_offline_page+0x501/0x520 SyS_madvise+0x6bc/0x6f0 entry_SYSCALL_64_fastpath+0x12/0x6a Code: 8b fc ff ff 5b 5d c3 48 89 df e8 b0 fa ff ff 48 89 df 31 f6 e8 c6 7d ff ff 5b 5d c3 48 c7 c6 08 54 a2 81 48 89 df e8 a4 c5 01 00 <0f> 0b 66 90 66 66 66 66 90 55 48 89 e5 41 55 41 54 53 48 8b 47 RIP [] put_page+0x5c/0x60 RSP The root cause resides in get_any_page() which retries to get a refcount of the page to be soft-offlined. This function calls put_hwpoison_page(), expecting that the target page is putback to LRU list. But it can be also freed to buddy. So the second check need to care about such case. Fixes: af8fae7c0886 ("mm/memory-failure.c: clean up soft_offline_page()") Signed-off-by: Naoya Horiguchi Cc: Sasha Levin Cc: Aneesh Kumar K.V Cc: Vlastimil Babka Cc: Jerome Marchand Cc: Andrea Arcangeli Cc: Hugh Dickins Cc: Dave Hansen Cc: Mel Gorman Cc: Rik van Riel Cc: Steve Capper Cc: Johannes Weiner Cc: Michal Hocko Cc: Christoph Lameter Cc: David Rientjes Cc: [3.9+] Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/memory-failure.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'mm/memory-failure.c') diff --git a/mm/memory-failure.c b/mm/memory-failure.c index a2c987df80eb..05e079bf9425 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1575,7 +1575,7 @@ static int get_any_page(struct page *page, unsigned long pfn, int flags) * Did it turn free? */ ret = __get_any_page(page, pfn, 0); - if (!PageLRU(page)) { + if (ret == 1 && !PageLRU(page)) { /* Drop page reference which is from __get_any_page() */ put_hwpoison_page(page); pr_info("soft_offline: %#lx: unknown non LRU page type %lx\n", -- cgit v1.2.3 From 4e41a30c6d506c884d3da9aeb316352e70679d4b Mon Sep 17 00:00:00 2001 From: Naoya Horiguchi Date: Fri, 15 Jan 2016 16:54:07 -0800 Subject: mm: hwpoison: adjust for new thp refcounting Some mm-related BUG_ON()s could trigger from hwpoison code due to recent changes in thp refcounting rule. This patch fixes them up. In the new refcounting, we no longer use tail->_mapcount to keep tail's refcount, and thereby we can simplify get/put_hwpoison_page(). And another change is that tail's refcount is not transferred to the raw page during thp split (more precisely, in new rule we don't take refcount on tail page any more.) So when we need thp split, we have to transfer the refcount properly to the 4kB soft-offlined page before migration. thp split code goes into core code only when precheck (total_mapcount(head) == page_count(head) - 1) passes to avoid useless split, where we assume that one refcount is held by the caller of thp split and the others are taken via mapping. To meet this assumption, this patch moves thp split part in soft_offline_page() after get_any_page(). [akpm@linux-foundation.org: remove unneeded #define, per Kirill] Signed-off-by: Naoya Horiguchi Acked-by: Kirill A. Shutemov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/mm.h | 2 +- mm/memory-failure.c | 73 +++++++++++++++-------------------------------------- 2 files changed, 22 insertions(+), 53 deletions(-) (limited to 'mm/memory-failure.c') diff --git a/include/linux/mm.h b/include/linux/mm.h index 6b56cfd9fd09..e4397f640e86 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2217,7 +2217,7 @@ extern int memory_failure(unsigned long pfn, int trapno, int flags); extern void memory_failure_queue(unsigned long pfn, int trapno, int flags); extern int unpoison_memory(unsigned long pfn); extern int get_hwpoison_page(struct page *page); -extern void put_hwpoison_page(struct page *page); +#define put_hwpoison_page(page) put_page(page) extern int sysctl_memory_failure_early_kill; extern int sysctl_memory_failure_recovery; extern void shake_page(struct page *p, int access); diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 05e079bf9425..1b99403d76f2 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -882,15 +882,7 @@ int get_hwpoison_page(struct page *page) { struct page *head = compound_head(page); - if (PageHuge(head)) - return get_page_unless_zero(head); - - /* - * Thp tail page has special refcounting rule (refcount of tail pages - * is stored in ->_mapcount,) so we can't call get_page_unless_zero() - * directly for tail pages. - */ - if (PageTransHuge(head)) { + if (!PageHuge(head) && PageTransHuge(head)) { /* * Non anonymous thp exists only in allocation/free time. We * can't handle such a case correctly, so let's give it up. @@ -902,41 +894,12 @@ int get_hwpoison_page(struct page *page) page_to_pfn(page)); return 0; } - - if (get_page_unless_zero(head)) { - if (PageTail(page)) - get_page(page); - return 1; - } else { - return 0; - } } - return get_page_unless_zero(page); + return get_page_unless_zero(head); } EXPORT_SYMBOL_GPL(get_hwpoison_page); -/** - * put_hwpoison_page() - Put refcount for memory error handling: - * @page: raw error page (hit by memory error) - */ -void put_hwpoison_page(struct page *page) -{ - struct page *head = compound_head(page); - - if (PageHuge(head)) { - put_page(head); - return; - } - - if (PageTransHuge(head)) - if (page != head) - put_page(head); - - put_page(page); -} -EXPORT_SYMBOL_GPL(put_hwpoison_page); - /* * Do all that is necessary to remove user space mappings. Unmap * the pages and send SIGBUS to the processes if the data was dirty. @@ -1162,6 +1125,8 @@ int memory_failure(unsigned long pfn, int trapno, int flags) return -EBUSY; } unlock_page(hpage); + get_hwpoison_page(p); + put_hwpoison_page(hpage); VM_BUG_ON_PAGE(!page_count(p), p); hpage = compound_head(p); } @@ -1753,24 +1718,28 @@ int soft_offline_page(struct page *page, int flags) put_hwpoison_page(page); return -EBUSY; } - if (!PageHuge(page) && PageTransHuge(hpage)) { - lock_page(page); - ret = split_huge_page(hpage); - unlock_page(page); - if (unlikely(ret)) { - pr_info("soft offline: %#lx: failed to split THP\n", - pfn); - if (flags & MF_COUNT_INCREASED) - put_hwpoison_page(page); - return -EBUSY; - } - } get_online_mems(); - ret = get_any_page(page, pfn, flags); put_online_mems(); + if (ret > 0) { /* for in-use pages */ + if (!PageHuge(page) && PageTransHuge(hpage)) { + lock_page(hpage); + ret = split_huge_page(hpage); + unlock_page(hpage); + if (unlikely(ret || PageTransCompound(page) || + !PageAnon(page))) { + pr_info("soft offline: %#lx: failed to split THP\n", + pfn); + if (flags & MF_COUNT_INCREASED) + put_hwpoison_page(hpage); + return -EBUSY; + } + get_hwpoison_page(page); + put_hwpoison_page(hpage); + } + if (PageHuge(page)) ret = soft_offline_huge_page(page, flags); else -- cgit v1.2.3 From acc14dc4bd484f1fc8c227dd9fc2a1e592312d2b Mon Sep 17 00:00:00 2001 From: Naoya Horiguchi Date: Fri, 15 Jan 2016 16:57:43 -0800 Subject: mm: soft-offline: clean up soft_offline_page() soft_offline_page() has some deeply indented code, that's the sign of demand for cleanup. So let's do this. No functionality change. Signed-off-by: Naoya Horiguchi Cc: Andi Kleen Cc: "Kirill A. Shutemov" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/memory-failure.c | 78 ++++++++++++++++++++++++++++++++--------------------- 1 file changed, 47 insertions(+), 31 deletions(-) (limited to 'mm/memory-failure.c') diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 1b99403d76f2..2015c9a06cae 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1684,6 +1684,49 @@ static int __soft_offline_page(struct page *page, int flags) return ret; } +static int soft_offline_in_use_page(struct page *page, int flags) +{ + int ret; + struct page *hpage = compound_head(page); + + if (!PageHuge(page) && PageTransHuge(hpage)) { + lock_page(hpage); + ret = split_huge_page(hpage); + unlock_page(hpage); + if (unlikely(ret || PageTransCompound(page) || + !PageAnon(page))) { + pr_info("soft offline: %#lx: failed to split THP\n", + page_to_pfn(page)); + if (flags & MF_COUNT_INCREASED) + put_hwpoison_page(hpage); + return -EBUSY; + } + get_hwpoison_page(page); + put_hwpoison_page(hpage); + } + + if (PageHuge(page)) + ret = soft_offline_huge_page(page, flags); + else + ret = __soft_offline_page(page, flags); + + return ret; +} + +static void soft_offline_free_page(struct page *page) +{ + if (PageHuge(page)) { + struct page *hpage = compound_head(page); + + set_page_hwpoison_huge_page(hpage); + if (!dequeue_hwpoisoned_huge_page(hpage)) + num_poisoned_pages_add(1 << compound_order(hpage)); + } else { + if (!TestSetPageHWPoison(page)) + num_poisoned_pages_inc(); + } +} + /** * soft_offline_page - Soft offline a page. * @page: page to offline @@ -1710,7 +1753,6 @@ int soft_offline_page(struct page *page, int flags) { int ret; unsigned long pfn = page_to_pfn(page); - struct page *hpage = compound_head(page); if (PageHWPoison(page)) { pr_info("soft offline: %#lx page already poisoned\n", pfn); @@ -1723,36 +1765,10 @@ int soft_offline_page(struct page *page, int flags) ret = get_any_page(page, pfn, flags); put_online_mems(); - if (ret > 0) { /* for in-use pages */ - if (!PageHuge(page) && PageTransHuge(hpage)) { - lock_page(hpage); - ret = split_huge_page(hpage); - unlock_page(hpage); - if (unlikely(ret || PageTransCompound(page) || - !PageAnon(page))) { - pr_info("soft offline: %#lx: failed to split THP\n", - pfn); - if (flags & MF_COUNT_INCREASED) - put_hwpoison_page(hpage); - return -EBUSY; - } - get_hwpoison_page(page); - put_hwpoison_page(hpage); - } + if (ret > 0) + ret = soft_offline_in_use_page(page, flags); + else if (ret == 0) + soft_offline_free_page(page); - if (PageHuge(page)) - ret = soft_offline_huge_page(page, flags); - else - ret = __soft_offline_page(page, flags); - } else if (ret == 0) { /* for free pages */ - if (PageHuge(page)) { - set_page_hwpoison_huge_page(hpage); - if (!dequeue_hwpoisoned_huge_page(hpage)) - num_poisoned_pages_add(1 << compound_order(hpage)); - } else { - if (!TestSetPageHWPoison(page)) - num_poisoned_pages_inc(); - } - } return ret; } -- cgit v1.2.3 From 98fd1ef4241ce0c700a174dd3e33c643b4774690 Mon Sep 17 00:00:00 2001 From: Naoya Horiguchi Date: Fri, 15 Jan 2016 16:57:46 -0800 Subject: mm: soft-offline: exit with failure for non anonymous thp Currently memory_failure() doesn't handle non anonymous thp case, because we can hardly expect the error handling to be successful, and it can just hit some corner case which results in BUG_ON or something severe like that. This is also the case for soft offline code, so let's make it in the same way. Orignal code has a MF_COUNT_INCREASED check before put_hwpoison_page(), but it's unnecessary because get_any_page() is already called when running on this code, which takes a refcount of the target page regardress of the flag. So this patch also removes it. [akpm@linux-foundation.org: fix build] Signed-off-by: Naoya Horiguchi Cc: Andi Kleen Cc: "Kirill A. Shutemov" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/memory-failure.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) (limited to 'mm/memory-failure.c') diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 2015c9a06cae..ac595e7a3a95 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1691,16 +1691,16 @@ static int soft_offline_in_use_page(struct page *page, int flags) if (!PageHuge(page) && PageTransHuge(hpage)) { lock_page(hpage); - ret = split_huge_page(hpage); - unlock_page(hpage); - if (unlikely(ret || PageTransCompound(page) || - !PageAnon(page))) { - pr_info("soft offline: %#lx: failed to split THP\n", - page_to_pfn(page)); - if (flags & MF_COUNT_INCREASED) - put_hwpoison_page(hpage); + if (!PageAnon(hpage) || unlikely(split_huge_page(hpage))) { + unlock_page(hpage); + if (!PageAnon(hpage)) + pr_info("soft offline: %#lx: non anonymous thp\n", page_to_pfn(page)); + else + pr_info("soft offline: %#lx: thp split failed\n", page_to_pfn(page)); + put_hwpoison_page(hpage); return -EBUSY; } + unlock_page(hpage); get_hwpoison_page(page); put_hwpoison_page(hpage); } -- cgit v1.2.3