From b191d6491be67cef2b3fa83015561caca1394ab9 Mon Sep 17 00:00:00 2001 From: Suren Baghdasaryan Date: Wed, 17 Jul 2019 13:21:00 -0400 Subject: pidfd: fix a poll race when setting exit_state There is a race between reading task->exit_state in pidfd_poll and writing it after do_notify_parent calls do_notify_pidfd. Expected sequence of events is: CPU 0 CPU 1 ------------------------------------------------ exit_notify do_notify_parent do_notify_pidfd tsk->exit_state = EXIT_DEAD pidfd_poll if (tsk->exit_state) However nothing prevents the following sequence: CPU 0 CPU 1 ------------------------------------------------ exit_notify do_notify_parent do_notify_pidfd pidfd_poll if (tsk->exit_state) tsk->exit_state = EXIT_DEAD This causes a polling task to wait forever, since poll blocks because exit_state is 0 and the waiting task is not notified again. A stress test continuously doing pidfd poll and process exits uncovered this bug. To fix it, we make sure that the task's exit_state is always set before calling do_notify_pidfd. Fixes: b53b0b9d9a6 ("pidfd: add polling support") Cc: kernel-team@android.com Cc: Oleg Nesterov Signed-off-by: Suren Baghdasaryan Signed-off-by: Joel Fernandes (Google) Link: https://lore.kernel.org/r/20190717172100.261204-1-joel@joelfernandes.org [christian@brauner.io: adapt commit message and drop unneeded changes from wait_task_zombie] Signed-off-by: Christian Brauner --- kernel/exit.c | 1 + 1 file changed, 1 insertion(+) (limited to 'kernel/exit.c') diff --git a/kernel/exit.c b/kernel/exit.c index a75b6a7f458a..4436158a6d30 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -720,6 +720,7 @@ static void exit_notify(struct task_struct *tsk, int group_dead) if (group_dead) kill_orphaned_pgrp(tsk->group_leader, NULL); + tsk->exit_state = EXIT_ZOMBIE; if (unlikely(tsk->ptrace)) { int sig = thread_group_leader(tsk) && thread_group_empty(tsk) && -- cgit v1.2.3 From 30b692d3b390c6fe78a5064be0c4bbd44a41be59 Mon Sep 17 00:00:00 2001 From: Christian Brauner Date: Mon, 29 Jul 2019 17:48:24 +0200 Subject: exit: make setting exit_state consistent Since commit b191d6491be6 ("pidfd: fix a poll race when setting exit_state") we unconditionally set exit_state to EXIT_ZOMBIE before calling into do_notify_parent(). This was done to eliminate a race when querying exit_state in do_notify_pidfd(). Back then we decided to do the absolute minimal thing to fix this and not touch the rest of the exit_notify() function where exit_state is set. Since this fix has not caused any issues change the setting of exit_state to EXIT_DEAD in the autoreap case to account for the fact hat exit_state is set to EXIT_ZOMBIE unconditionally. This fix was planned but also explicitly requested in [1] and makes the whole code more consistent. /* References */ [1]: https://lore.kernel.org/lkml/CAHk-=wigcxGFR2szue4wavJtH5cYTTeNES=toUBVGsmX0rzX+g@mail.gmail.com Signed-off-by: Christian Brauner Acked-by: Oleg Nesterov Cc: Linus Torvalds --- kernel/exit.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) (limited to 'kernel/exit.c') diff --git a/kernel/exit.c b/kernel/exit.c index 4436158a6d30..5b4a5dcce8f8 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -734,9 +734,10 @@ static void exit_notify(struct task_struct *tsk, int group_dead) autoreap = true; } - tsk->exit_state = autoreap ? EXIT_DEAD : EXIT_ZOMBIE; - if (tsk->exit_state == EXIT_DEAD) + if (autoreap) { + tsk->exit_state = EXIT_DEAD; list_add(&tsk->ptrace_entry, &dead); + } /* mt-exec, de_thread() is waiting for group leader */ if (unlikely(tsk->signal->notify_count < 0)) -- cgit v1.2.3