The daemon was getting stuck when a claude subprocess hung — no
completion logged, job blocked forever, pending queue growing.
Use spawn() + watchdog thread instead of blocking output(). The
watchdog sleeps in 1s increments checking a cancel flag, sends
SIGTERM at 5 minutes, SIGKILL after 5s grace. Cancel flag ensures
the watchdog exits promptly when the child finishes normally.
- Refactor split from serial batch to independent per-node tasks
(run-agent split N spawns N parallel tasks, gated by llm_concurrency)
- Replace cosine similarity edge inheritance with agent-assigned
neighbors in the plan JSON — the LLM already understands the
semantic relationships, no need to approximate with bag-of-words
- Add --strict-mcp-config to claude CLI calls to skip MCP server
startup (saves ~5s per call)
- Remove hardcoded 2000-char split threshold — let the agent decide
what's worth splitting
- Reload store before mutations to handle concurrent split races