thalamus/supervisor: reap channel daemons via SIGCHLD instead of SIG_IGN

SIGCHLD=SIG_IGN at main() was auto-reaping all children in the kernel,
which broke tokio::process::Command::wait() — every tool that spawned a
subprocess (bash, mcp clients) was getting ECHILD because tokio couldn't
waitpid() on a child the kernel had already reaped.

Replace it with a SIGCHLD handler task that reaps only the PIDs recorded
in PID files under channels_dir(), using waitpid(pid, WNOHANG); ECHILD
for a PID that is not our child is a harmless no-op. Tokio-spawned
children aren't in PID files, so tokio's own per-child wait paths are
untouched.

Co-Authored-By: Proof of Concept <poc@bcachefs.org>
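
The reaping approach described in the message can be sketched roughly as follows. This is a minimal illustration, not the code added by this commit: `Reap`, `reap_pid`, `read_pids`, and `reap_listed` are hypothetical names, the assumption that each daemon has a `*.pid` file containing a single decimal PID is a guess at the layout under channels_dir(), and `waitpid` is declared by hand with Linux constant values only to keep the sketch free of external crates (the `libc` crate provides the same symbol and constants).

```rust
use std::fs;
use std::io;
use std::path::Path;

// Hand-written declaration so the sketch needs no external crates.
unsafe extern "C" {
    fn waitpid(pid: i32, status: *mut i32, options: i32) -> i32;
}

const WNOHANG: i32 = 1; // Linux value (assumption: Linux target)
const ECHILD: i32 = 10; // Linux value (assumption: Linux target)

/// Outcome of one non-blocking reap attempt (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Reap {
    /// A zombie was collected; carries the raw wait status.
    Reaped(i32),
    /// The PID is our child but has not exited yet.
    StillRunning,
    /// ECHILD: not a child of this process, or already reaped.
    NotOurChild,
}

/// Try to reap exactly one PID without blocking.
pub fn reap_pid(pid: i32) -> io::Result<Reap> {
    // pid <= 0 would mean "any child" or a process group, which could
    // steal exit statuses from other waiters (e.g. tokio), so refuse it.
    if pid <= 0 {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "pid must be > 0"));
    }
    let mut status = 0i32;
    let ret = unsafe { waitpid(pid, &mut status, WNOHANG) };
    if ret == 0 {
        Ok(Reap::StillRunning)
    } else if ret == pid {
        Ok(Reap::Reaped(status))
    } else {
        let err = io::Error::last_os_error();
        if err.raw_os_error() == Some(ECHILD) {
            Ok(Reap::NotOurChild) // the harmless no-op case
        } else {
            Err(err)
        }
    }
}

/// Collect PIDs from `*.pid` files in `dir` (assumed file layout).
pub fn read_pids(dir: &Path) -> Vec<i32> {
    let Ok(entries) = fs::read_dir(dir) else {
        return Vec::new();
    };
    entries
        .flatten()
        .map(|e| e.path())
        .filter(|p| p.extension().and_then(|e| e.to_str()) == Some("pid"))
        .filter_map(|p| fs::read_to_string(p).ok())
        .filter_map(|s| s.trim().parse::<i32>().ok())
        .filter(|&pid| pid > 0)
        .collect()
}

/// One reaper pass: attempt to reap every listed PID, skipping errors.
pub fn reap_listed(dir: &Path) -> Vec<(i32, Reap)> {
    read_pids(dir)
        .into_iter()
        .filter_map(|pid| reap_pid(pid).ok().map(|r| (pid, r)))
        .collect()
}
```

In a tokio program, a pass like `reap_listed` would presumably run each time a SIGCHLD stream (for example one created with `tokio::signal::unix::signal(SignalKind::child())`) yields. Because every call names a specific positive PID, a child spawned through tokio::process is never waited on here and its exit status stays available to tokio.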
Kent Overstreet 2026-04-23 23:53:11 -04:00
parent d95f3e9445
commit 0e459aae92
2 changed files with 49 additions and 2 deletions

@@ -756,8 +756,10 @@ fn restore_stderr(original_fd: std::os::fd::RawFd) {
 #[tokio::main]
 pub async fn main() {
-    // Auto-reap child processes (channel daemons outlive the supervisor)
-    unsafe { libc::signal(libc::SIGCHLD, libc::SIG_IGN); }
+    // Reap channel-daemon zombies via a SIGCHLD handler that only touches
+    // PIDs listed in channels_dir(). Avoids SIGCHLD=SIG_IGN, which would
+    // break tokio::process::Command::wait() (kernel auto-reap → ECHILD).
+    let _reaper = crate::thalamus::supervisor::start_zombie_reaper();
     // Redirect stderr to pipe — logs to file and sends to channel for UI display
     let stderr_capture = redirect_stderr_to_pipe();