feat(spawn): readiness handshake by default (status=ready / --no-wait / --ready-timeout) #113
Merged
fujibee added a commit that referenced this pull request on Jun 14, 2026
Addresses aggie-co's review of #113:

- Ready sentinel now records the owner session_id, and watch.sh cleanup only removes it when the content is still ours. This keeps "sentinel present iff a live watcher is receiving" honest across a quick actas restart, where the old watcher's EXIT could otherwise delete a successor's freshly written sentinel.
- session-start.sh now garbage-collects stale stream watermarks (#107) and readiness sentinels (#108) whose owner session_id is no longer alive. This is the SIGKILL / terminal-crash case that bypasses the EXIT trap. Both files are advisory (a live watcher rewrites them; spawn clears the sentinel before use), so this is hygiene.

Tests: ready sentinel records the owner; cleanup leaves a successor-owned sentinel; session-start GCs stale watermark/ready but keeps live ones. Full suite green (217).
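The owner check in the first bullet can be illustrated with a small sketch. This is not the actual watch.sh code: the helper names `ready_write` and `ready_cleanup`, the one-line file format (just the session id), and the demo session ids `s1`/`s2` are all assumptions made for illustration.

```shell
# Hypothetical helpers; names and the one-line file format are assumptions.

# Write the sentinel with the owner's session id as its content.
ready_write() {
  rw_path="$1"
  rw_session="$2"
  printf '%s\n' "$rw_session" > "$rw_path"
}

# Remove the sentinel only if it still names this session as owner.
ready_cleanup() {
  rc_path="$1"
  rc_session="$2"
  [ -f "$rc_path" ] || return 0
  rc_owner=$(cat "$rc_path" 2>/dev/null) || return 0
  if [ "$rc_owner" = "$rc_session" ]; then
    rm -f "$rc_path"
  fi
  return 0
}

# Demo: old watcher "s1" exits after successor "s2" has written its sentinel.
demo_dir=$(mktemp -d)
sentinel="$demo_dir/ready.team__worker"

ready_write "$sentinel" "s1"
ready_write "$sentinel" "s2"        # successor overwrites during a quick restart
ready_cleanup "$sentinel" "s1"      # old watcher's exit path: must be a no-op
after_old_exit=$(cat "$sentinel")

ready_cleanup "$sentinel" "s2"      # the real owner exits and removes it
if [ -e "$sentinel" ]; then
  after_owner_exit=present
else
  after_owner_exit=absent
fi

rm -rf "$demo_dir"
echo "after old exit: $after_old_exit; after owner exit: $after_owner_exit"
```

Run as written, the demo prints `after old exit: s2; after owner exit: absent`. The compare-then-remove step is not atomic, so a real implementation may need to think about the window between the read and the `rm`; the sketch ignores that.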
spawn returned as soon as the terminal launched, but the new agent wasn't receiving yet: it still had to boot the CLI and run actas before its watcher attached. A job sent in that window was lost. The only readiness signal was scraping the pane.

spawn now BLOCKS by default until the new agent is actually listening:

- watch.sh, when it attaches in exclusive (actas) mode, touches a readiness sentinel (run/ready.<team>__<name>) once its subscription and watermark are set, i.e. once it will deliver anything that arrives from here on, and removes it on exit. So the sentinel is present iff a live watcher is receiving for that role. (agmsg_ready_path added to lib/actas-lock.sh; same encoding as the lock path, so spawn and watch agree with no env plumbing.)
- spawn clears the sentinel, launches, then polls for it and returns `status=ready`. `--ready-timeout <secs>` (default 90) bounds the wait; on timeout it prints `status=timeout` and exits 3 so the caller can re-spawn. `--no-wait` opts out. Codex skips the wait (no Monitor).

A spawned agent always starts its watcher via actas, so no boot-prompt or cmd-template change is needed: readiness is just the watcher attaching.

Complements #107: that stops in-gap loss for restarts generally; this makes spawn readiness a positive signal instead of a guess.

Tests: handshake ready/timeout/--no-wait/codex-skip; watch.sh creates and removes the sentinel in actas mode and not in broad mode. Existing launch tests pass --no-wait (no real watcher in the stub env). Full suite green.
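The clear / launch / poll sequence described above might look roughly like the following. This is a sketch under stated assumptions, not the real spawn implementation: the function name `wait_ready`, the one-second polling interval, and the background subshell standing in for the launched agent are all hypothetical. Only the `status=ready` / `status=timeout` strings and exit code 3 come from the description.

```shell
# Hypothetical sketch of the spawn-side wait; not the project's actual code.
# wait_ready PATH TIMEOUT_SECS
#   prints status=ready and returns 0 once PATH exists,
#   prints status=timeout and returns 3 if it never appears in time.
wait_ready() {
  wr_path="$1"
  wr_timeout="$2"
  wr_waited=0
  while [ "$wr_waited" -lt "$wr_timeout" ]; do
    if [ -e "$wr_path" ]; then
      echo "status=ready"
      return 0
    fi
    sleep 1                       # polling interval is an assumption
    wr_waited=$((wr_waited + 1))
  done
  # One last check so a sentinel written during the final sleep still counts.
  if [ -e "$wr_path" ]; then
    echo "status=ready"
    return 0
  fi
  echo "status=timeout"
  return 3
}

demo_dir=$(mktemp -d)
sentinel="$demo_dir/ready.team__worker"

# Ready case: clear, "launch" (a stand-in that touches the sentinel), poll.
rm -f "$sentinel"
( sleep 1; : > "$sentinel" ) &
ready_out=$(wait_ready "$sentinel" 5) && ready_rc=0 || ready_rc=$?
wait

# Timeout case: nothing ever attaches.
rm -f "$sentinel"
timeout_out=$(wait_ready "$sentinel" 1) && timeout_rc=0 || timeout_rc=$?

rm -rf "$demo_dir"
echo "$ready_out rc=$ready_rc"
echo "$timeout_out rc=$timeout_rc"
```

Clearing the sentinel before launching matters: without it, a leftover file from a previous watcher would make the poll succeed immediately, before the new watcher has attached.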
86b2f29 to 03b1a18
Closes #108.
Problem
`spawn` returned as soon as the terminal/pane launched, but the new agent wasn't receiving yet: it still had to boot the CLI and run `actas` before its watcher (`watch.sh`) attached. A job the leader sent in that window was lost, and the only way to tell an agent was ready was to scrape its pane. (Reported: 6 of 9 jobs dropped in a spawned-worker crew.)

Fix: readiness handshake, on by default

- `watch.sh` in exclusive (actas) mode touches a readiness sentinel (`run/ready.<team>__<name>`) once its subscription and watermark are set, i.e. the moment it will deliver anything that arrives from here on, and removes it on exit. The sentinel is present iff a live watcher is receiving for that role. (`agmsg_ready_path` added to `lib/actas-lock.sh`, same encoding as the lock path, so `spawn` and `watch` agree with no env plumbing.)
- `spawn` clears the sentinel, launches, then polls and returns `status=ready`. `--ready-timeout <secs>` (default 90) bounds the wait; on timeout it prints `status=timeout` and exits 3 so the caller can re-spawn. `--no-wait` opts out. Codex skips the wait (no Monitor).

A spawned agent always starts its watcher via `actas`, so there is no boot-prompt or cmd-template change: readiness is simply "the watcher attached". This complements #107: #107 stops in-gap loss for restarts generally; this makes spawn readiness a positive signal instead of a race.

Tests

- `status=ready` when the watcher attaches; `status=timeout` / exit 3 when nothing attaches; `--no-wait` returns immediately; codex skips the wait.
- `watch.sh`: creates and removes the sentinel in actas mode; does not create one in broad mode.
- Existing launch tests pass `--no-wait` (the stub env has no real watcher).

Full suite green (214).
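Since a timeout is reported as exit code 3 "so the caller can re-spawn", a caller-side retry could be sketched as below. Everything here is hypothetical: `retry_on_timeout` is an illustrative wrapper, and `fake_spawn` is a stub that imitates the documented `status=timeout` / exit 3 and `status=ready` outcomes; the project's real spawn invocation is not shown in this PR and is not reproduced here.

```shell
# Hypothetical caller-side wrapper: re-run a command while it exits 3 (timeout).
# retry_on_timeout MAX_ATTEMPTS CMD [ARGS...]
retry_on_timeout() {
  rt_max="$1"
  shift
  rt_attempt=1
  while :; do
    rt_rc=0
    "$@" || rt_rc=$?
    if [ "$rt_rc" -eq 0 ]; then
      return 0
    fi
    # Only exit code 3 is retried; any other failure is passed through.
    if [ "$rt_rc" -ne 3 ] || [ "$rt_attempt" -ge "$rt_max" ]; then
      return "$rt_rc"
    fi
    rt_attempt=$((rt_attempt + 1))
  done
}

# Stub standing in for a spawn call: times out once, then reports ready.
calls=0
fake_spawn() {
  calls=$((calls + 1))
  if [ "$calls" -lt 2 ]; then
    echo "status=timeout"
    return 3
  fi
  echo "status=ready"
  return 0
}

retry_rc=0
retry_on_timeout 3 fake_spawn || retry_rc=$?
echo "calls=$calls rc=$retry_rc"
```

Retrying only on exit 3 keeps other failures visible to the caller instead of masking them behind repeated spawns.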