feat(despawn): tear down spawned crew members (#109) by fujibee · Pull Request #129 · fujibee/agmsg

fujibee · 2026-06-15T22:17:47Z

What

Add despawn — the inverse of spawn — to tear down a spawned crew member.

graceful (default): send a ctrl:despawn control message to the member. Its watcher (watch.sh) sees it, drops its own role (releasing the actas lock + registration) and closes its own tmux pane, which ends the member CLI. The leader blocks until the lock is released, up to --timeout (default 30s).
--force: tear the member down from the placement recorded at spawn time — kill its tmux pane/window and drop its registration — for when the member's own watcher can't respond (dead watcher, or a codex member with no Monitor).

How

scripts/despawn.sh: despawn.sh <team> <from> <name> [--force] [--timeout <secs>].
scripts/spawn.sh: records the member's tmux target id + project + type via a new agmsg_spawn_path helper at launch, so --force has a placement to act on.
scripts/lib/actas-lock.sh: agmsg_spawn_path placement-record path helper.
scripts/watch.sh: handle ctrl:despawn. Only an exclusive (actas) watcher dedicated to exactly the despawned role tears itself down. A broad-subscription watcher must skip the control message — its $TMUX_PANE is the leader's own pane, so acting on a despawn for any subscribed role would kill the leader's session. This is the core safety fix.

Command surface

despawn is wired as a first-class command, not just a script:

templates/cmd.claude-code.md + cmd.codex.md: a despawn handler mapping /<cmd> despawn <name> [--force] [--timeout N] to despawn.sh, plus the help line shown at join.
README.md: a "Tear down a spawned agent (despawn)" section next to spawn, and a cheatsheet line.
SKILL.md: the despawn invocation documented alongside spawn.

Tests

tests/test_despawn.bats: graceful teardown, --force placement teardown, --force error with no record, graceful timeout (exit 3), graceful no-op when no live lock, and a regression test asserting a broad watcher ignores ctrl:despawn and does not self-destruct.
The test harness unsets TMUX_PANE when launching the watcher under test, so the ctrl:despawn handler never targets a real pane.

Validation

Full suite: 249/249 green.
Packaging smoke (isolated HOME, clean install): despawn.sh is packaged and runs end-to-end (graceful no-op + --force no-record error paths).
Local end-to-end run confirmed both paths on real tmux members: graceful self-teardown (~1s, window closes, role/registration dropped) and --force (pane kill + registration drop) — the leader session survives in both cases (the bug this fixes).
A nested run (leader → spawn A → A spawns B → A despawns B → leader despawns A) confirmed self-teardown at every level with no orphaned panes/watchers/records and the despawning session always surviving.
Packaging smoke also asserts the installed command template carries the despawn handler and despawn.sh is shipped.

Closes #109.

Add despawn.sh, the inverse of spawn.sh: - graceful: send ctrl:despawn to the member; its watcher drops its own role (releasing the actas lock) and closes its own tmux pane. Block until the lock is released, up to --timeout. - --force: tear the member down from the placement recorded at spawn time — kill its tmux pane/window and drop its registration — for when the member's watcher can't respond. spawn.sh records the member's tmux target id + project + type via the new agmsg_spawn_path helper so --force has a placement to act on. watch.sh: only an EXCLUSIVE (actas) watcher dedicated to exactly the despawned role tears itself down on ctrl:despawn. A broad-subscription watcher must skip it — its $TMUX_PANE is the leader's own pane, so acting on a despawn for any subscribed role would kill the leader's session. tests: unset TMUX_PANE when launching the watcher under test so the ctrl:despawn handler never targets the developer's real pane; add a regression test asserting a broad watcher ignores ctrl:despawn.

) despawn.sh existed but was only reachable by calling the script directly. Expose it as a first-class command: - templates/cmd.claude-code.md, cmd.codex.md: add a 'despawn' handler that maps '/<cmd> despawn <name> [--force] [--timeout N]' to despawn.sh, plus the help line in the join summary. - README.md: add a 'Tear down a spawned agent (despawn)' section next to spawn, and a cheatsheet line. - SKILL.md: document the despawn invocation alongside spawn.

Build the full Codex pseudo-monitor on top of the re-arm fix, ported clean onto current main (despawn + instance-id already landed). New scripts: - codex-shim.sh: PATH-front entrypoint that routes interactive codex / codex resume launches in monitor-mode projects through codex-monitor.sh; everything else (--version, exec, non-monitor projects) passes through to real Codex. - codex-monitor.sh: starts/reuses the shared app-server, launches the out-of-sandbox bridge launcher, then execs the TUI on the same unix socket. - codex-bridge-launcher.sh: polls the request file and starts codex-bridge.js outside the Codex sandbox (a hook-launched bridge cannot connect to the socket from inside the sandbox). - codex-shim-install.sh: installs/removes the ~/.agents/bin/codex shim. Wiring grafted onto main's versions (preserving #129/#132/#128): - session-start.sh: codex branch writes a request file (launcher mode) or nohup-launches the bridge. Resolves the thread id via CODEX_THREAD_ID, then falls back to the newest rollout whose session_meta cwd matches the project — fresh / "codex exec" sessions never export CODEX_THREAD_ID, so without this the bridge only auto-launched on the interactive resume path (launch-gap #41). - delivery.sh: "set monitor" for codex installs the shim + prints guidance; rejects "both". - install.sh: configure_codex_sandbox adds db/teams/run to Codex sandbox_workspace_write.writable_roots. Tests: codex shim, session-start codex (incl. rollout fallback), delivery monitor/both codex, install writable_roots. Full suite 314 green.

… + fresh-session launch (#41) (#148) * feat(codex-bridge): re-arm watch-once without depending on turn/completed (#41) Vertical slice of the Codex app-server bridge, rebuilt clean on current main (despawn + instance-id already landed). Carries the working WebSocket-over-UDS transport and watch-once oracle, and fixes the delivery bug. Root cause: the bridge re-armed watch-once only from onTurnCompleted (turn/completed | turn/failed). The real Codex app-server does not reliably deliver those, so after the first wake the bridge started a turn and never re-armed — it handled one message then slept. The fake app-server in tests sends turn/completed explicitly, hiding it. Fix: route turn teardown through a single onTurnEnded() reachable from turn/completed, thread/status idle, OR a turn watchdog (--turn-timeout, default 60s). Detection re-arms when the turn ends, so a missing completion event no longer parks the bridge. The stale max_id guard is retained. Tests: two regressions where the fake never sends turn/completed — one driven by the watchdog, one by an idle status — asserting a second wakeup occurs. Full codex-bridge + watch-once suites green (12 + 6). * feat(codex-bridge): wire up the Codex monitor stack on main (#41) Build the full Codex pseudo-monitor on top of the re-arm fix, ported clean onto current main (despawn + instance-id already landed). New scripts: - codex-shim.sh: PATH-front entrypoint that routes interactive codex / codex resume launches in monitor-mode projects through codex-monitor.sh; everything else (--version, exec, non-monitor projects) passes through to real Codex. - codex-monitor.sh: starts/reuses the shared app-server, launches the out-of-sandbox bridge launcher, then execs the TUI on the same unix socket. - codex-bridge-launcher.sh: polls the request file and starts codex-bridge.js outside the Codex sandbox (a hook-launched bridge cannot connect to the socket from inside the sandbox). - codex-shim-install.sh: installs/removes the ~/.agents/bin/codex shim. Wiring grafted onto main's versions (preserving #129/#132/#128): - session-start.sh: codex branch writes a request file (launcher mode) or nohup-launches the bridge. Resolves the thread id via CODEX_THREAD_ID, then falls back to the newest rollout whose session_meta cwd matches the project — fresh / "codex exec" sessions never export CODEX_THREAD_ID, so without this the bridge only auto-launched on the interactive resume path (launch-gap #41). - delivery.sh: "set monitor" for codex installs the shim + prints guidance; rejects "both". - install.sh: configure_codex_sandbox adds db/teams/run to Codex sandbox_workspace_write.writable_roots. Tests: codex shim, session-start codex (incl. rollout fallback), delivery monitor/both codex, install writable_roots. Full suite 314 green. * docs(codex-bridge): document the Codex monitor beta (#41) - templates/cmd.codex.md: offer monitor as a delivery mode for codex (the app-server bridge beta), default to it on first join, and accept "mode monitor"; keep rejecting "both". - README.md: describe the Codex monitor beta in the Codex section and the intro, pointing at docs/codex-monitor-beta.md for setup and internals. * test(codex): sandbox HOME in setup_test_env so the suite never touches real state (#41) Running the bats suite was clobbering the developer's real shim: a couple of tests install the codex shim (delivery set monitor codex -> codex-shim-install writes $HOME/.agents/bin/codex) and configure $HOME/.codex/config.toml, but setup_test_env did not sandbox HOME. The shim ended up pointing at the test's temp scripts dir, which dangled once the temp dir was torn down. CI never caught it (clean containers have no real ~/.agents to break). Sandbox HOME under the per-test temp skill dir in setup_test_env. bats runs each test in its own subshell, so the export is scoped per test and needs no restore. test_install.bats keeps its own FAKE_HOME and is unaffected. * feat(codex): flag missing PATH / Node when enabling monitor mode (#41) The two silent first-run failures for the Codex monitor beta were a shim that isn't on PATH and a missing Node — in both cases the bridge just never starts. `delivery set monitor codex` now surfaces them at enable time: - If ~/.agents/bin is not on PATH, print a loud WARNING that the shim won't intercept `codex` and the bridge will not engage (this is the #1 cause of "I turned monitor on and nothing happens"). - If Node is missing, print a WARNING that the bridge needs Node. Presence only, no version gate: the bridge uses old/stable APIs (one optional-chaining call, Node 14+) and any Node new enough to run Codex runs it. AGMSG_CODEX_NODE overrides the checked binary (and makes the path testable). Tests for both warnings. * docs(codex-bridge): fix the flow diagram — hook writes a request file, launcher starts the bridge (#41) The diagram and prose claimed the SessionStart hook starts codex-bridge.js directly. That is the broken in-sandbox path (socket connect EPERM). Correct it: the hook resolves the thread (CODEX_THREAD_ID or newest matching rollout) and writes a request file; codex-monitor.sh's out-of-sandbox launcher reads it and starts the bridge, which connects over WebSocket-over-UDS. Add the re-arm loop and turn serialization. * feat(codex): emphasize the monitor beta and clean up on mode off (#41) Make the experimental nature and the PATH cost of the Codex monitor beta hard to miss, and make turning it off actually tear things down: - cmd.codex.md: move monitor to the LAST delivery option (3, "BETA, advanced"), default to turn, and spell out that it installs a PATH shim that intercepts the `codex` command — opt in only if you understand PATH precedence. - README + docs/codex-monitor-beta.md: prominent warning blocks (changes how Codex starts, ~/.agents/bin must be first on PATH, known rough edges #149/#150). - delivery.sh: `set off codex` now stops the project's bridge process(es) and removes their run files (the manual counterpart to the not-yet-wired auto teardown, #149), and tells the user the shim is global — how to remove it and restore PATH if no other project uses monitor. Leaves the shared app-server and shim in place. Test included. * feat(codex): tell users to restart Codex for monitor to take effect (#151) Enabling mode monitor only launches the bridge on the next Codex SessionStart, so an already-running session (or an off->on toggle) stays unmonitored until restart. delivery.sh, the cmd.codex.md prompt guidance, and docs now say so explicitly. The actual in-session (re)launch is tracked in #151. * docs(codex): clarify bridge starts on first turn, not at launch Codex fires the SessionStart hook on the session's first turn (first message), not the moment the TUI opens, so the monitor bridge is absent until the user interacts once after a restart. Update the delivery.sh enable message, the cmd.codex.md prompts, and the beta doc to say "restart and send your first message". Note the launcher/request-file redundancy review (#153) in the bridge mechanics. * feat(codex): lower monitor poll interval default from 5s to 2s The watch-once poll interval governs how fast an idle, armed bridge detects a new unread message. Drop the default from 5s to 2s for snappier delivery. Safe against double-injection: watch-once is one-shot (exits on first detection), re-arm only happens after the turn ends, and the stale-max_id guard backstops any re-detection — none of which the interval affects. Update both the bridge default and the watch-once.sh standalone fallback to keep them in sync. * fix(codex): deliver deferred wake on turn end; de-dup Codex writable_roots Two blocking issues from review: 1) codex-bridge.js: a wake observed while the resumed thread was still active (SessionStart fires on the first user turn) was deferred by tryStartTurn(), but onTurnEnded() re-armed instead of delivering it. The next watch-once re-observed the same unread max_id and the stale-wake guard stopped the bridge with exit 1 before the message was delivered. onTurnEnded() now delivers a pending wake via tryStartTurn() rather than re-arming. 2) install.sh: Codex writable_roots was configured by TWO copies — a legacy inline block (db/teams only, old closing-bracket awk) and configure_codex_ sandbox() (db/teams/run). On a fresh install both ran, and the inline copy's empty-array handling emitted a leading comma, which combined with the function's insert to produce invalid TOML (`["run", , "db", "teams"]`). Removed the legacy inline duplicate and rewrote the function to insert after the opening '[' — valid for empty/single-line/multiline arrays. Adds regression tests for both (the bridge test fails without the fix).

fujibee added 2 commits June 15, 2026 11:01

fujibee merged commit 647688a into main Jun 16, 2026
3 checks passed

fujibee deleted the feat/despawn branch June 16, 2026 00:28

This was referenced Jun 17, 2026

feat: add Hermes Agent support #118

Closed

release: 1.0.5 #137

Merged

Fewmanism mentioned this pull request Jun 17, 2026

Follow-up to #118: support Hermes profiles in spawn #119

Closed

fujibee mentioned this pull request Jun 17, 2026

feat(codex): Codex monitor bridge (beta) — app-server bridge + re-arm + fresh-session launch (#41) #148

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(despawn): tear down spawned crew members (#109)#129

feat(despawn): tear down spawned crew members (#109)#129
fujibee merged 2 commits into
mainfrom
feat/despawn

fujibee commented Jun 15, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

fujibee commented Jun 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

How

Command surface

Tests

Validation

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

fujibee commented Jun 15, 2026 •

edited

Loading