bail-no-competing.sh used to bail if any other live agent existed in
the state dir, period. That was too coarse: surface-observe agents run
a multi-step pipeline (surface → organize-search → organize-new →
observe), and the intent is to let a new surface-phase agent start
while an older one finishes its post-surface tail. With the old check
the newer agent always bailed, so surface-observe was effectively
serialized at the slowest cycle time.

Make the script phase-aware:
- oneshot.rs now passes the current phase as argv[2] alongside the pid
  file name. The script writes that phase into its own pid file on
  every step transition, so concurrent agents can read each other's
  phase just by cat'ing the pid files.
- Bail only when another live agent is in the same phase-group as us.
  Groups: "surface" vs. "everything else" (post-surface). At most one
  agent per group is alive at a time, since surface runs at a higher
  cadence than the organize/observe tail.
- Still clean up stale pid files for dead processes.
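For illustration, the check described above could look roughly like the
sketch below. This is not the actual script: the function names
(phase_group, should_bail) and the pid file layout (files named
pid-<PID> whose content is the phase name) are assumptions made for the
example.

```shell
#!/bin/sh
# Hypothetical sketch of a phase-aware bail check.
# Assumed layout: $state_dir/pid-<PID>, each file holding a phase name.

# Map a phase to its group: "surface", or "post-surface" for anything
# else (an empty or unreadable phase also counts as post-surface).
phase_group() {
    case "$1" in
        surface) echo surface ;;
        *)       echo post-surface ;;
    esac
}

# should_bail STATE_DIR MY_PID_FILE MY_PHASE
# Returns 0 (bail) if another live agent is in our phase group,
# 1 otherwise. Removes pid files whose process is gone.
should_bail() {
    state_dir=$1
    my_file=$2
    my_group=$(phase_group "$3")

    for f in "$state_dir"/pid-*; do
        [ -e "$f" ] || continue          # glob matched nothing
        name=${f##*/}
        [ "$name" = "$my_file" ] && continue

        pid=${name#pid-}
        if ! kill -0 "$pid" 2>/dev/null; then
            rm -f "$f"                   # stale: process is dead
            continue
        fi

        other_phase=$(cat "$f" 2>/dev/null)
        [ "$(phase_group "$other_phase")" = "$my_group" ] && return 0
    done
    return 1
}
```

Under these assumptions an agent would run something like
`should_bail "$state_dir" "pid-$$" "$phase" && exit 0` at each step
transition, after writing its new phase into its own pid file.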

Co-Authored-By: Proof of Concept <poc@bcachefs.org>