fix(codex): engage the monitor bridge on codex 0.141 (ws transport + loaded-thread discovery)#174

Merged
fujibee merged 5 commits into main from fix/codex-monitor-ws-170 on Jun 21, 2026
Conversation

@fujibee fujibee commented Jun 20, 2026

Summary

On codex 0.141 the Codex monitor bridge never engaged, so a registered codex agent received no pushed messages while idle (#170). Two breakages:

  1. codex --remote <addr> now accepts only ws://host:port (rejects unix://), but the bridge used a unix:// socket — the TUI exited before the bridge could attach.
  2. With --remote, codex no longer exposes the thread id to hooks (CODEX_THREAD_ID unset, no rollout under ~/.codex/sessions/), so the launcher never learned which thread to attach to.

Changes

  • codex-bridge.js — the WebSocket app-server client now connects over either a unix socket ({ path }) or a TCP host/port ({ host, port }), so --app-server accepts ws://host:port as well as unix://PATH (handshake/framing unchanged). New --thread loaded discovers the live TUI thread via thread/loaded/list instead of a hook-resolved id.
  • codex-monitor.sh — runs the shared app-server on ws://127.0.0.1:<port> (port recorded per project for reuse) and connects the TUI with --remote ws://127.0.0.1:<port>.
  • codex-bridge-launcher.sh — resolves the project's single codex identity itself and starts the bridge with --thread loaded. A request file carrying a real thread id (older codex, hook path) still takes precedence, preserving backward compatibility.
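
As a rough illustration of the endpoint rule described above, the shell sketch below classifies an `--app-server` value as either a unix socket path or a TCP host/port and rejects anything else. It is not the bridge's JavaScript: the function name `classify_endpoint` and its output format are hypothetical, chosen only to make the two accepted forms concrete.

```sh
# Hypothetical helper (not taken from codex-bridge.js): classify an endpoint.
# Prints "path=<socket>" for unix://PATH and "host=<h> port=<p>" for
# ws://host:port; returns non-zero for anything else (for example http://).
classify_endpoint() {
  case "$1" in
    unix://?*)
      printf 'path=%s\n' "${1#unix://}"
      ;;
    ws://?*:?*)
      _hostport=${1#ws://}
      _port=${_hostport##*:}
      _host=${_hostport%:*}
      # The port must be all digits; this also rejects a trailing /path.
      case "$_port" in
        ''|*[!0-9]*) return 1 ;;
      esac
      printf 'host=%s port=%s\n' "$_host" "$_port"
      ;;
    *)
      return 1
      ;;
  esac
}

classify_endpoint "ws://127.0.0.1:4500"
classify_endpoint "unix:///tmp/app-server.sock"
classify_endpoint "http://127.0.0.1:4500" || echo "unsupported endpoint"
```

The `http://` case mirrors the updated "rejects unsupported endpoints" test, which can no longer use `ws://` as its negative example.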

Verification

  • tests/test_codex_bridge.bats (16 cases, all green locally): added ws://host:port connect, thread/loaded/list discovery, and the no-loaded-thread timeout. The prior "rejects unsupported endpoints" case now uses http:// since ws:// is supported.
  • Protocol confirmed against the real codex 0.141 binary: codex app-server --listen documents ws://IP:PORT, and generate-json-schema exposes thread/loaded/list{ data: string[] }.
  • Full cross-platform validation deferred to CI; a live end-to-end run on codex 0.141 is recommended before release as the final gate.

Closes #170

fujibee added 5 commits June 20, 2026 15:47
…loaded-thread discovery)

codex 0.141 rejects `--remote unix://` (ws-only) so the TUI exited before the
bridge could attach, and it no longer exposes the thread id to hooks (no
CODEX_THREAD_ID, no rollout for --remote), so the launcher never started the
bridge. Three changes make the monitor bridge engage again:

- codex-bridge.js: the WebSocket app-server client connects over either a unix
  socket or a TCP host/port, so --app-server accepts ws://host:port. Add
  --thread loaded, which discovers the live TUI thread via thread/loaded/list
  instead of relying on a hook-resolved id.
- codex-monitor.sh: run the shared app-server on ws://127.0.0.1:<port> (recorded
  per project for reuse) and connect the TUI with --remote ws://...
- codex-bridge-launcher.sh: resolve the project's single codex identity itself
  and start the bridge with --thread loaded; a request file with a real thread
  id (older codex) still takes precedence.

Tests: ws://host:port connect, thread/loaded/list discovery, and the no-loaded
timeout; the prior 'rejects unsupported endpoints' test now uses http:// since
ws:// is supported.

Closes #170

The ws migration picked the loopback port with a node -e one-liner, which made
codex-monitor.sh itself fail (and take down the Codex TUI) when Node was not on
PATH — e.g. an nvm-only Node in a non-interactive spawn shell. Let the app-server
pick the port (--listen ws://127.0.0.1:0) and parse the reported 'listening on'
line instead. codex-monitor.sh now needs no Node; only the bridge does, and it
degrades on its own if Node is missing rather than aborting the launch.
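
A minimal sketch of the port discovery described above, assuming the app-server prints a line that contains `listening on` followed by its `ws://127.0.0.1:<port>` address. The exact log wording is an assumption, and `parse_listen_port` is a hypothetical name rather than the function used in codex-monitor.sh.

```sh
# Hypothetical helper: read app-server output on stdin and print the first
# port found on a "listening on ... ws://127.0.0.1:<port>" line.
# Uses only sed and head, so no Node is required.
parse_listen_port() {
  sed -n 's#.*listening on.*ws://127\.0\.0\.1:\([0-9][0-9]*\).*#\1#p' | head -n 1
}

# Example with canned output instead of a real app-server process.
printf '%s\n' 'starting' 'listening on ws://127.0.0.1:45123' | parse_listen_port
# prints 45123
```

With the port known, the TUI can be pointed at the same address through `--remote ws://127.0.0.1:<port>`, as the PR description states.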

The bridge (codex-bridge.js) is launched via its env-node shebang, but an
nvm/fnm/volta Node is on PATH only in interactive shells. A bridge started from
a non-interactive context (a spawn boot script) therefore cannot find Node, and
the Codex monitor silently never starts even though Node is installed.

Add lib/node.sh::agmsg_resolve_node — PATH, then AGMSG_NODE, then the newest
nvm/fnm node, then volta/homebrew/usr-local — and launch the bridge with the
resolved binary from codex-bridge-launcher.sh and session-start.sh. Falls back to
bare "node" so behaviour is never worse than relying on PATH (the existing
delivery.sh preflight still warns when no Node is found).
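
The lookup order above could be sketched roughly as follows. This is not the contents of lib/node.sh: `resolve_node_sketch` is a hypothetical name, every candidate path is an assumed default install location, the executable check on `AGMSG_NODE` is an assumption, and fnm is left out because its install directory varies between platforms.

```sh
# Hypothetical sketch of a resolver in the spirit of agmsg_resolve_node.
# All candidate paths below are assumed default install locations.
resolve_node_sketch() {
  # 1. A node already on PATH wins.
  if command -v node >/dev/null 2>&1; then
    command -v node
    return 0
  fi
  # 2. Explicit override via AGMSG_NODE.
  if [ -n "${AGMSG_NODE:-}" ] && [ -x "$AGMSG_NODE" ]; then
    printf '%s\n' "$AGMSG_NODE"
    return 0
  fi
  # 3. Newest nvm-managed node (needs a sort that supports -V).
  _nvm_root=${NVM_DIR:-${HOME:-}/.nvm}
  _newest=$(ls -d "$_nvm_root"/versions/node/v*/bin/node 2>/dev/null | sort -V | tail -n 1)
  if [ -n "$_newest" ] && [ -x "$_newest" ]; then
    printf '%s\n' "$_newest"
    return 0
  fi
  # 4. Common fixed locations: volta, homebrew, /usr/local.
  for _c in "${HOME:-}/.volta/bin/node" /opt/homebrew/bin/node /usr/local/bin/node; do
    if [ -x "$_c" ]; then
      printf '%s\n' "$_c"
      return 0
    fi
  done
  # 5. Fall back to bare "node" so behaviour is never worse than PATH lookup.
  printf 'node\n'
}

resolve_node_sketch
```

Because the function always prints something, a caller can launch the bridge with its output unconditionally and leave the "no Node found" warning to a separate preflight.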
…as-is

The Node-resolution change launched the bridge as `$node $bridge_cmd`, which
broke AGMSG_CODEX_BRIDGE_CMD overrides that point at a non-Node runnable (the
delivery/session-start tests use a bash stub) — Node tried to parse the stub and
no bridge started. Run an explicit AGMSG_CODEX_BRIDGE_CMD as-is; only the default
codex-bridge.js goes through the resolved Node. Fixes the 3 session-start codex
tests that regressed on CI.
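
The branch described above might look like the sketch below. It assumes `AGMSG_CODEX_BRIDGE_CMD` names a single runnable rather than a command line with arguments; `launch_bridge_sketch` and its parameter order are hypothetical.

```sh
# Hypothetical launcher step: run an explicit AGMSG_CODEX_BRIDGE_CMD as-is,
# otherwise run the default bridge script through the resolved Node binary.
# Usage: launch_bridge_sketch <node_bin> <bridge_js> [bridge args...]
launch_bridge_sketch() {
  _node_bin=$1
  _bridge_js=$2
  shift 2
  if [ -n "${AGMSG_CODEX_BRIDGE_CMD:-}" ]; then
    # The override may be any runnable (for example a bash stub), so it is
    # executed directly and never handed to Node.
    "$AGMSG_CODEX_BRIDGE_CMD" "$@"
  else
    "$_node_bin" "$_bridge_js" "$@"
  fi
}

# Demo: "echo" stands in for the Node binary, so with no override set this
# just prints the arguments that Node would have received.
launch_bridge_sketch echo codex-bridge.js --thread loaded
```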
…p-server reuse

Addresses review notes on #174:
- delivery.sh Node preflight now resolves via lib/node.sh (the same path the
  launcher uses), so it no longer false-warns when only a version-manager Node is
  present, and it accepts AGMSG_NODE (canonical) as well as AGMSG_CODEX_NODE. An
  explicit override is returned verbatim so a bogus value still warns.
- codex-monitor.sh reuses an existing app-server only when its recorded SERVER_PID
  is alive AND the port answers, so a foreign process that grabbed the same port
  after ours died is not mistaken for the bridge app-server.
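
The two-part reuse check can be sketched as below. `SERVER_PID` comes from the commit message; `SERVER_PORT`, `port_answers`, and `app_server_reusable` are hypothetical names, and the probe assumes an `nc` that supports `-z`.

```sh
# Hypothetical probe: succeeds if something accepts TCP connections on the
# loopback port. Assumes an nc that supports -z.
port_answers() {
  nc -z 127.0.0.1 "$1" >/dev/null 2>&1
}

# Reuse only when the recorded PID is alive AND the port answers.
# Usage: app_server_reusable <recorded_pid> <recorded_port>
app_server_reusable() {
  _pid=$1
  _port=$2
  # kill -0 sends no signal; it only checks that the process can be signalled.
  if [ -z "$_pid" ] || ! kill -0 "$_pid" 2>/dev/null; then
    return 1
  fi
  port_answers "$_port"
}

if app_server_reusable "${SERVER_PID:-}" "${SERVER_PORT:-0}"; then
  echo "reuse existing app-server"
else
  echo "start a new app-server"
fi
```

Checking the PID first means a port that answers is trusted only when the process that recorded it is still running, which is the case the review note was concerned about.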
@fujibee merged commit 4eff740 into main on Jun 21, 2026
5 of 6 checks passed
Development

Successfully merging this pull request may close these issues.

Codex monitor bridge never engages on recent codex CLI (--remote is ws-only; SessionStart hook can't resolve the thread id)
