From a09233f3e1b77dbf50851660533e008056553a2a Mon Sep 17 00:00:00 2001 From: Naoya Horiguchi Date: Thu, 6 Aug 2015 15:46:58 -0700 Subject: mm/memory-failure: unlock_page before put_page Recently I addressed a few of hwpoison race problems and the patches are merged on v4.2-rc1. It made progress, but unfortunately some problems still remain due to less coverage of my testing. So I'm trying to fix or avoid them in this series. One point I'm expecting to discuss is that patch 4/5 changes the page flag set to be checked on free time. In current behavior, __PG_HWPOISON is not supposed to be set when the page is freed. I think that there is no strong reason for this behavior, and it causes a problem hard to fix only in error handler side (because __PG_HWPOISON could be set at arbitrary timing.) So I suggest to change it. With this patchset, hwpoison stress testing in official mce-test testsuite (which previously failed) passes. This patch (of 5): In "just unpoisoned" path, we do put_page and then unlock_page, which is a wrong order and causes "freeing locked page" bug. So let's fix it. Signed-off-by: Naoya Horiguchi Cc: Andi Kleen Cc: Dean Nelson Cc: Tony Luck Cc: "Kirill A. Shutemov" Cc: Hugh Dickins Acked-by: David Rientjes Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/memory-failure.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'mm/memory-failure.c') diff --git a/mm/memory-failure.c b/mm/memory-failure.c index c53543d89282..04d677048af7 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1209,9 +1209,9 @@ int memory_failure(unsigned long pfn, int trapno, int flags) if (!PageHWPoison(p)) { printk(KERN_ERR "MCE %#lx: just unpoisoned\n", pfn); atomic_long_sub(nr_pages, &num_poisoned_pages); + unlock_page(hpage); put_page(hpage); - res = 0; - goto out; + return 0; } if (hwpoison_filter(p)) { if (TestClearPageHWPoison(p)) -- cgit v1.2.3 From a209ef09af0dc921311d0cc4a1d4f926321d91b8 Mon Sep 17 00:00:00 2001 From: Naoya Horiguchi Date: Thu, 6 Aug 2015 15:47:01 -0700 Subject: mm/memory-failure: fix race in counting num_poisoned_pages When memory_failure() is called on a page which are just freed after page migration from soft offlining, the counter num_poisoned_pages is raised twi= ce. So let's fix it with using TestSetPageHWPoison. Signed-off-by: Naoya Horiguchi Cc: Andi Kleen Cc: Dean Nelson Cc: Tony Luck Cc: "Kirill A. Shutemov" Cc: Hugh Dickins Cc: David Rientjes Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/memory-failure.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'mm/memory-failure.c') diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 04d677048af7..f72d2fad0b90 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1671,8 +1671,8 @@ static int __soft_offline_page(struct page *page, int flags) if (ret > 0) ret = -EIO; } else { - SetPageHWPoison(page); - atomic_long_inc(&num_poisoned_pages); + if (!TestSetPageHWPoison(page)) + atomic_long_inc(&num_poisoned_pages); } } else { pr_info("soft offline: %#lx: isolation failed: %d, page count %d, type %lx\n", -- cgit v1.2.3 From 98ed2b0052e68420f1bad6c81e3f2600d25023e7 Mon Sep 17 00:00:00 2001 From: Naoya Horiguchi Date: Thu, 6 Aug 2015 15:47:04 -0700 Subject: mm/memory-failure: give up error handling for non-tail-refcounted thp "non anonymous thp" case is still racy with freeing thp, which causes panic due to put_page() for refcount-0 page. It seems that closing up this race might be hard (and/or not worth doing,) so let's give up the error handling for this case. Signed-off-by: Naoya Horiguchi Cc: Andi Kleen Cc: Dean Nelson Cc: Tony Luck Cc: "Kirill A. Shutemov" Cc: Hugh Dickins Cc: David Rientjes Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/memory-failure.c | 21 ++++++++++++--------- 1 file changed, 12 insertions(+), 9 deletions(-) (limited to 'mm/memory-failure.c') diff --git a/mm/memory-failure.c b/mm/memory-failure.c index f72d2fad0b90..cd985530f102 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -909,6 +909,18 @@ int get_hwpoison_page(struct page *page) * directly for tail pages. */ if (PageTransHuge(head)) { + /* + * Non anonymous thp exists only in allocation/free time. We + * can't handle such a case correctly, so let's give it up. + * This should be better than triggering BUG_ON when kernel + * tries to touch the "partially handled" page. + */ + if (!PageAnon(head)) { + pr_err("MCE: %#lx: non anonymous thp\n", + page_to_pfn(page)); + return 0; + } + if (get_page_unless_zero(head)) { if (PageTail(page)) get_page(page); @@ -1134,15 +1146,6 @@ int memory_failure(unsigned long pfn, int trapno, int flags) } if (!PageHuge(p) && PageTransHuge(hpage)) { - if (!PageAnon(hpage)) { - pr_err("MCE: %#lx: non anonymous thp\n", pfn); - if (TestClearPageHWPoison(p)) - atomic_long_sub(nr_pages, &num_poisoned_pages); - put_page(p); - if (p != hpage) - put_page(hpage); - return -EBUSY; - } if (unlikely(split_huge_page(hpage))) { pr_err("MCE: %#lx: thp split failed\n", pfn); if (TestClearPageHWPoison(p)) -- cgit v1.2.3 From 4491f7126063ef51081f5662bd4fcae31621a333 Mon Sep 17 00:00:00 2001 From: Naoya Horiguchi Date: Thu, 6 Aug 2015 15:47:11 -0700 Subject: mm/memory-failure: set PageHWPoison before migrate_pages() Now page freeing code doesn't consider PageHWPoison as a bad page, so by setting it before completing the page containment, we can prevent the error page from being reused just after successful page migration. I added TTU_IGNORE_HWPOISON for try_to_unmap() to make sure that the page table entry is transformed into migration entry, not to hwpoison entry. Signed-off-by: Naoya Horiguchi Cc: Andi Kleen Cc: Dean Nelson Cc: Tony Luck Cc: "Kirill A. Shutemov" Cc: Hugh Dickins Cc: David Rientjes Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/memory-failure.c | 7 ++++--- mm/migrate.c | 3 ++- 2 files changed, 6 insertions(+), 4 deletions(-) (limited to 'mm/memory-failure.c') diff --git a/mm/memory-failure.c b/mm/memory-failure.c index cd985530f102..ea5a93659488 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1659,6 +1659,8 @@ static int __soft_offline_page(struct page *page, int flags) inc_zone_page_state(page, NR_ISOLATED_ANON + page_is_file_cache(page)); list_add(&page->lru, &pagelist); + if (!TestSetPageHWPoison(page)) + atomic_long_inc(&num_poisoned_pages); ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL, MIGRATE_SYNC, MR_MEMORY_FAILURE); if (ret) { @@ -1673,9 +1675,8 @@ static int __soft_offline_page(struct page *page, int flags) pfn, ret, page->flags); if (ret > 0) ret = -EIO; - } else { - if (!TestSetPageHWPoison(page)) - atomic_long_inc(&num_poisoned_pages); + if (TestClearPageHWPoison(page)) + atomic_long_dec(&num_poisoned_pages); } } else { pr_info("soft offline: %#lx: isolation failed: %d, page count %d, type %lx\n", diff --git a/mm/migrate.c b/mm/migrate.c index f2415be7d93b..eb4267107d1f 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -880,7 +880,8 @@ static int __unmap_and_move(struct page *page, struct page *newpage, /* Establish migration ptes or remove ptes */ if (page_mapped(page)) { try_to_unmap(page, - TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS); + TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS| + TTU_IGNORE_HWPOISON); page_was_mapped = 1; } -- cgit v1.2.3 From 4f32be677b124a49459e2603321c7a5605ceb9f8 Mon Sep 17 00:00:00 2001 From: Wanpeng Li Date: Fri, 14 Aug 2015 15:34:56 -0700 Subject: mm/hwpoison: fix page refcount of unknown non LRU page After trying to drain pages from pagevec/pageset, we try to get reference count of the page again, however, the reference count of the page is not reduced if the page is still not on LRU list. Fix it by adding the put_page() to drop the page reference which is from __get_any_page(). Signed-off-by: Wanpeng Li Acked-by: Naoya Horiguchi Cc: [3.9+] Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/memory-failure.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'mm/memory-failure.c') diff --git a/mm/memory-failure.c b/mm/memory-failure.c index ea5a93659488..81c20a7c9fa7 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1538,6 +1538,8 @@ static int get_any_page(struct page *page, unsigned long pfn, int flags) */ ret = __get_any_page(page, pfn, 0); if (!PageLRU(page)) { + /* Drop page reference which is from __get_any_page() */ + put_page(page); pr_info("soft_offline: %#lx: unknown non LRU page type %lx\n", pfn, page->flags); return -EIO; -- cgit v1.2.3 From 036138080a4376e5f3e5d0cca8ac99084c5cf06e Mon Sep 17 00:00:00 2001 From: Wanpeng Li Date: Fri, 14 Aug 2015 15:34:59 -0700 Subject: mm/hwpoison: fix fail isolate hugetlbfs page w/ refcount held Hugetlbfs pages will get a refcount in get_any_page() or madvise_hwpoison() if soft offlining through madvise. The refcount which is held by the soft offline path should be released if we fail to isolate hugetlbfs pages. Fix it by reducing the refcount for both isolation success and failure. Signed-off-by: Wanpeng Li Acked-by: Naoya Horiguchi Cc: [3.9+] Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/memory-failure.c | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) (limited to 'mm/memory-failure.c') diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 81c20a7c9fa7..dba52ee31bd4 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1569,13 +1569,12 @@ static int soft_offline_huge_page(struct page *page, int flags) unlock_page(hpage); ret = isolate_huge_page(hpage, &pagelist); - if (ret) { - /* - * get_any_page() and isolate_huge_page() takes a refcount each, - * so need to drop one here. - */ - put_page(hpage); - } else { + /* + * get_any_page() and isolate_huge_page() takes a refcount each, + * so need to drop one here. + */ + put_page(hpage); + if (!ret) { pr_info("soft offline: %#lx hugepage failed to isolate\n", pfn); return -EBUSY; } -- cgit v1.2.3 From 7f6bf39bbdd1dcccd103ba7dce8496a8e72e7df4 Mon Sep 17 00:00:00 2001 From: Wanpeng Li Date: Fri, 14 Aug 2015 15:35:08 -0700 Subject: mm/hwpoison: fix panic due to split huge zero page Bug: ------------[ cut here ]------------ kernel BUG at mm/huge_memory.c:1957! invalid opcode: 0000 [#1] SMP Modules linked in: snd_hda_codec_hdmi i915 rpcsec_gss_krb5 snd_hda_codec_realtek snd_hda_codec_generic nfsv4 dns_re CPU: 2 PID: 2576 Comm: test_huge Not tainted 4.2.0-rc5-mm1+ #27 Hardware name: Dell Inc. OptiPlex 7020/0F5C5X, BIOS A03 01/08/2015 task: ffff880204e3d600 ti: ffff8800db16c000 task.ti: ffff8800db16c000 RIP: split_huge_page_to_list+0xdb/0x120 Call Trace: memory_failure+0x32e/0x7c0 madvise_hwpoison+0x8b/0x160 SyS_madvise+0x40/0x240 ? do_page_fault+0x37/0x90 entry_SYSCALL_64_fastpath+0x12/0x71 Code: ff f0 41 ff 4c 24 30 74 0d 31 c0 48 83 c4 08 5b 41 5c 41 5d c9 c3 4c 89 e7 e8 e2 58 fd ff 48 83 c4 08 31 c0 RIP split_huge_page_to_list+0xdb/0x120 RSP ---[ end trace aee7ce0df8e44076 ]--- Testcase: #define _GNU_SOURCE #include #include #include #include #include #include #include #include #define MB 1024*1024 int main(void) { char *mem; posix_memalign((void **)&mem, 2 * MB, 200 * MB); madvise(mem, 200 * MB, MADV_HWPOISON); free(mem); return 0; } Huge zero page is allocated if page fault w/o FAULT_FLAG_WRITE flag. The get_user_pages_fast() which called in madvise_hwpoison() will get huge zero page if the page is not allocated before. Huge zero page is a tranparent huge page, however, it is not an anonymous page. memory_failure will split the huge zero page and trigger BUG_ON(is_huge_zero_page(page)); After commit 98ed2b0052e6 ("mm/memory-failure: give up error handling for non-tail-refcounted thp"), memory_failure will not catch non anon thp from madvise_hwpoison path and this bug occur. Fix it by catching non anon thp in memory_failure in order to not split huge zero page in madvise_hwpoison path. After this patch: Injecting memory failure for page 0x202800 at 0x7fd8ae800000 MCE: 0x202800: non anonymous thp [...] [akpm@linux-foundation.org: remove second split, per Wanpeng] Signed-off-by: Wanpeng Li Acked-by: Naoya Horiguchi Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/memory-failure.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) (limited to 'mm/memory-failure.c') diff --git a/mm/memory-failure.c b/mm/memory-failure.c index dba52ee31bd4..1f4446a90cef 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1146,8 +1146,11 @@ int memory_failure(unsigned long pfn, int trapno, int flags) } if (!PageHuge(p) && PageTransHuge(hpage)) { - if (unlikely(split_huge_page(hpage))) { - pr_err("MCE: %#lx: thp split failed\n", pfn); + if (!PageAnon(hpage) || unlikely(split_huge_page(hpage))) { + if (!PageAnon(hpage)) + pr_err("MCE: %#lx: non anonymous thp\n", pfn); + else + pr_err("MCE: %#lx: thp split failed\n", pfn); if (TestClearPageHWPoison(p)) atomic_long_sub(nr_pages, &num_poisoned_pages); put_page(p); -- cgit v1.2.3