feat(despawn): tear down spawned crew members (#109)#129
Merged
Conversation
Add despawn.sh, the inverse of spawn.sh: - graceful: send ctrl:despawn to the member; its watcher drops its own role (releasing the actas lock) and closes its own tmux pane. Block until the lock is released, up to --timeout. - --force: tear the member down from the placement recorded at spawn time — kill its tmux pane/window and drop its registration — for when the member's watcher can't respond. spawn.sh records the member's tmux target id + project + type via the new agmsg_spawn_path helper so --force has a placement to act on. watch.sh: only an EXCLUSIVE (actas) watcher dedicated to exactly the despawned role tears itself down on ctrl:despawn. A broad-subscription watcher must skip it — its $TMUX_PANE is the leader's own pane, so acting on a despawn for any subscribed role would kill the leader's session. tests: unset TMUX_PANE when launching the watcher under test so the ctrl:despawn handler never targets the developer's real pane; add a regression test asserting a broad watcher ignores ctrl:despawn.
) despawn.sh existed but was only reachable by calling the script directly. Expose it as a first-class command: - templates/cmd.claude-code.md, cmd.codex.md: add a 'despawn' handler that maps '/<cmd> despawn <name> [--force] [--timeout N]' to despawn.sh, plus the help line in the join summary. - README.md: add a 'Tear down a spawned agent (despawn)' section next to spawn, and a cheatsheet line. - SKILL.md: document the despawn invocation alongside spawn.
This was referenced Jun 17, 2026
Merged
fujibee
added a commit
that referenced
this pull request
Jun 18, 2026
Build the full Codex pseudo-monitor on top of the re-arm fix, ported clean onto current main (despawn + instance-id already landed). New scripts: - codex-shim.sh: PATH-front entrypoint that routes interactive codex / codex resume launches in monitor-mode projects through codex-monitor.sh; everything else (--version, exec, non-monitor projects) passes through to real Codex. - codex-monitor.sh: starts/reuses the shared app-server, launches the out-of-sandbox bridge launcher, then execs the TUI on the same unix socket. - codex-bridge-launcher.sh: polls the request file and starts codex-bridge.js outside the Codex sandbox (a hook-launched bridge cannot connect to the socket from inside the sandbox). - codex-shim-install.sh: installs/removes the ~/.agents/bin/codex shim. Wiring grafted onto main's versions (preserving #129/#132/#128): - session-start.sh: codex branch writes a request file (launcher mode) or nohup-launches the bridge. Resolves the thread id via CODEX_THREAD_ID, then falls back to the newest rollout whose session_meta cwd matches the project — fresh / "codex exec" sessions never export CODEX_THREAD_ID, so without this the bridge only auto-launched on the interactive resume path (launch-gap #41). - delivery.sh: "set monitor" for codex installs the shim + prints guidance; rejects "both". - install.sh: configure_codex_sandbox adds db/teams/run to Codex sandbox_workspace_write.writable_roots. Tests: codex shim, session-start codex (incl. rollout fallback), delivery monitor/both codex, install writable_roots. Full suite 314 green.
fujibee
added a commit
that referenced
this pull request
Jun 18, 2026
… + fresh-session launch (#41) (#148) * feat(codex-bridge): re-arm watch-once without depending on turn/completed (#41) Vertical slice of the Codex app-server bridge, rebuilt clean on current main (despawn + instance-id already landed). Carries the working WebSocket-over-UDS transport and watch-once oracle, and fixes the delivery bug. Root cause: the bridge re-armed watch-once only from onTurnCompleted (turn/completed | turn/failed). The real Codex app-server does not reliably deliver those, so after the first wake the bridge started a turn and never re-armed — it handled one message then slept. The fake app-server in tests sends turn/completed explicitly, hiding it. Fix: route turn teardown through a single onTurnEnded() reachable from turn/completed, thread/status idle, OR a turn watchdog (--turn-timeout, default 60s). Detection re-arms when the turn ends, so a missing completion event no longer parks the bridge. The stale max_id guard is retained. Tests: two regressions where the fake never sends turn/completed — one driven by the watchdog, one by an idle status — asserting a second wakeup occurs. Full codex-bridge + watch-once suites green (12 + 6). * feat(codex-bridge): wire up the Codex monitor stack on main (#41) Build the full Codex pseudo-monitor on top of the re-arm fix, ported clean onto current main (despawn + instance-id already landed). New scripts: - codex-shim.sh: PATH-front entrypoint that routes interactive codex / codex resume launches in monitor-mode projects through codex-monitor.sh; everything else (--version, exec, non-monitor projects) passes through to real Codex. - codex-monitor.sh: starts/reuses the shared app-server, launches the out-of-sandbox bridge launcher, then execs the TUI on the same unix socket. - codex-bridge-launcher.sh: polls the request file and starts codex-bridge.js outside the Codex sandbox (a hook-launched bridge cannot connect to the socket from inside the sandbox). - codex-shim-install.sh: installs/removes the ~/.agents/bin/codex shim. Wiring grafted onto main's versions (preserving #129/#132/#128): - session-start.sh: codex branch writes a request file (launcher mode) or nohup-launches the bridge. Resolves the thread id via CODEX_THREAD_ID, then falls back to the newest rollout whose session_meta cwd matches the project — fresh / "codex exec" sessions never export CODEX_THREAD_ID, so without this the bridge only auto-launched on the interactive resume path (launch-gap #41). - delivery.sh: "set monitor" for codex installs the shim + prints guidance; rejects "both". - install.sh: configure_codex_sandbox adds db/teams/run to Codex sandbox_workspace_write.writable_roots. Tests: codex shim, session-start codex (incl. rollout fallback), delivery monitor/both codex, install writable_roots. Full suite 314 green. * docs(codex-bridge): document the Codex monitor beta (#41) - templates/cmd.codex.md: offer monitor as a delivery mode for codex (the app-server bridge beta), default to it on first join, and accept "mode monitor"; keep rejecting "both". - README.md: describe the Codex monitor beta in the Codex section and the intro, pointing at docs/codex-monitor-beta.md for setup and internals. * test(codex): sandbox HOME in setup_test_env so the suite never touches real state (#41) Running the bats suite was clobbering the developer's real shim: a couple of tests install the codex shim (delivery set monitor codex -> codex-shim-install writes $HOME/.agents/bin/codex) and configure $HOME/.codex/config.toml, but setup_test_env did not sandbox HOME. The shim ended up pointing at the test's temp scripts dir, which dangled once the temp dir was torn down. CI never caught it (clean containers have no real ~/.agents to break). Sandbox HOME under the per-test temp skill dir in setup_test_env. bats runs each test in its own subshell, so the export is scoped per test and needs no restore. test_install.bats keeps its own FAKE_HOME and is unaffected. * feat(codex): flag missing PATH / Node when enabling monitor mode (#41) The two silent first-run failures for the Codex monitor beta were a shim that isn't on PATH and a missing Node — in both cases the bridge just never starts. `delivery set monitor codex` now surfaces them at enable time: - If ~/.agents/bin is not on PATH, print a loud WARNING that the shim won't intercept `codex` and the bridge will not engage (this is the #1 cause of "I turned monitor on and nothing happens"). - If Node is missing, print a WARNING that the bridge needs Node. Presence only, no version gate: the bridge uses old/stable APIs (one optional-chaining call, Node 14+) and any Node new enough to run Codex runs it. AGMSG_CODEX_NODE overrides the checked binary (and makes the path testable). Tests for both warnings. * docs(codex-bridge): fix the flow diagram — hook writes a request file, launcher starts the bridge (#41) The diagram and prose claimed the SessionStart hook starts codex-bridge.js directly. That is the broken in-sandbox path (socket connect EPERM). Correct it: the hook resolves the thread (CODEX_THREAD_ID or newest matching rollout) and writes a request file; codex-monitor.sh's out-of-sandbox launcher reads it and starts the bridge, which connects over WebSocket-over-UDS. Add the re-arm loop and turn serialization. * feat(codex): emphasize the monitor beta and clean up on mode off (#41) Make the experimental nature and the PATH cost of the Codex monitor beta hard to miss, and make turning it off actually tear things down: - cmd.codex.md: move monitor to the LAST delivery option (3, "BETA, advanced"), default to turn, and spell out that it installs a PATH shim that intercepts the `codex` command — opt in only if you understand PATH precedence. - README + docs/codex-monitor-beta.md: prominent warning blocks (changes how Codex starts, ~/.agents/bin must be first on PATH, known rough edges #149/#150). - delivery.sh: `set off codex` now stops the project's bridge process(es) and removes their run files (the manual counterpart to the not-yet-wired auto teardown, #149), and tells the user the shim is global — how to remove it and restore PATH if no other project uses monitor. Leaves the shared app-server and shim in place. Test included. * feat(codex): tell users to restart Codex for monitor to take effect (#151) Enabling mode monitor only launches the bridge on the next Codex SessionStart, so an already-running session (or an off->on toggle) stays unmonitored until restart. delivery.sh, the cmd.codex.md prompt guidance, and docs now say so explicitly. The actual in-session (re)launch is tracked in #151. * docs(codex): clarify bridge starts on first turn, not at launch Codex fires the SessionStart hook on the session's first turn (first message), not the moment the TUI opens, so the monitor bridge is absent until the user interacts once after a restart. Update the delivery.sh enable message, the cmd.codex.md prompts, and the beta doc to say "restart and send your first message". Note the launcher/request-file redundancy review (#153) in the bridge mechanics. * feat(codex): lower monitor poll interval default from 5s to 2s The watch-once poll interval governs how fast an idle, armed bridge detects a new unread message. Drop the default from 5s to 2s for snappier delivery. Safe against double-injection: watch-once is one-shot (exits on first detection), re-arm only happens after the turn ends, and the stale-max_id guard backstops any re-detection — none of which the interval affects. Update both the bridge default and the watch-once.sh standalone fallback to keep them in sync. * fix(codex): deliver deferred wake on turn end; de-dup Codex writable_roots Two blocking issues from review: 1) codex-bridge.js: a wake observed while the resumed thread was still active (SessionStart fires on the first user turn) was deferred by tryStartTurn(), but onTurnEnded() re-armed instead of delivering it. The next watch-once re-observed the same unread max_id and the stale-wake guard stopped the bridge with exit 1 before the message was delivered. onTurnEnded() now delivers a pending wake via tryStartTurn() rather than re-arming. 2) install.sh: Codex writable_roots was configured by TWO copies — a legacy inline block (db/teams only, old closing-bracket awk) and configure_codex_ sandbox() (db/teams/run). On a fresh install both ran, and the inline copy's empty-array handling emitted a leading comma, which combined with the function's insert to produce invalid TOML (`["run", , "db", "teams"]`). Removed the legacy inline duplicate and rewrote the function to insert after the opening '[' — valid for empty/single-line/multiline arrays. Adds regression tests for both (the bridge test fails without the fix).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Add
despawn— the inverse ofspawn— to tear down a spawned crew member.ctrl:despawncontrol message to the member. Its watcher (watch.sh) sees it, drops its own role (releasing the actas lock + registration) and closes its own tmux pane, which ends the member CLI. The leader blocks until the lock is released, up to--timeout(default 30s).--force: tear the member down from the placement recorded at spawn time — kill its tmux pane/window and drop its registration — for when the member's own watcher can't respond (dead watcher, or a codex member with no Monitor).How
scripts/despawn.sh:despawn.sh <team> <from> <name> [--force] [--timeout <secs>].scripts/spawn.sh: records the member's tmux target id + project + type via a newagmsg_spawn_pathhelper at launch, so--forcehas a placement to act on.scripts/lib/actas-lock.sh:agmsg_spawn_pathplacement-record path helper.scripts/watch.sh: handlectrl:despawn. Only an exclusive (actas) watcher dedicated to exactly the despawned role tears itself down. A broad-subscription watcher must skip the control message — its$TMUX_PANEis the leader's own pane, so acting on a despawn for any subscribed role would kill the leader's session. This is the core safety fix.Command surface
despawnis wired as a first-class command, not just a script:templates/cmd.claude-code.md+cmd.codex.md: adespawnhandler mapping/<cmd> despawn <name> [--force] [--timeout N]todespawn.sh, plus the help line shown at join.README.md: a "Tear down a spawned agent (despawn)" section next tospawn, and a cheatsheet line.SKILL.md: the despawn invocation documented alongside spawn.Tests
tests/test_despawn.bats: graceful teardown,--forceplacement teardown,--forceerror with no record, graceful timeout (exit 3), graceful no-op when no live lock, and a regression test asserting a broad watcher ignoresctrl:despawnand does not self-destruct.TMUX_PANEwhen launching the watcher under test, so thectrl:despawnhandler never targets a real pane.Validation
HOME, clean install):despawn.shis packaged and runs end-to-end (graceful no-op +--forceno-record error paths).--force(pane kill + registration drop) — the leader session survives in both cases (the bug this fixes).despawnhandler anddespawn.shis shipped.Closes #109.