Harden brainstorm stop-server stale pid handling by VeraPyuyi · Pull Request #1703 · obra/superpowers

VeraPyuyi · 2026-06-08T04:33:07Z

Summary

- Harden brainstorm `stop-server.sh` so stale or reused `state/server.pid` values do not kill unrelated processes.
- Verify process ownership before SIGTERM and again before SIGKILL, using the Node `server.cjs` entrypoint plus matching session ownership.
- Fail closed as `{status:stale_pid}` when ownership cannot be proven, while preserving `{status:stopped}` for real brainstorm servers.
- Add regression coverage for unrelated PIDs, fake `server.cjs` argv mentions, different-session brainstorm servers, paths with spaces, and real server shutdown.
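
The verify-then-signal flow described in the summary could look roughly like the sketch below. It is an illustration only, not the PR's code: `pid_cmdline`, `owns_pid`, and `stop_owned` are hypothetical names, and a plain substring match on the command line stands in for the real ownership test (entrypoint plus session ownership).

```shell
# Illustrative sketch only; every function name here is hypothetical.

# Print the command line of PID (Linux /proc first, then ps as a fallback).
pid_cmdline() {
  if [ -r "/proc/$1/cmdline" ]; then
    tr '\0' ' ' < "/proc/$1/cmdline"
  else
    ps -o args= -p "$1" 2>/dev/null
  fi
}

# Succeed only if PID is alive and its command line contains MARKER.
# Assumption: a substring match stands in for the real ownership test.
owns_pid() {
  [ -n "$2" ] || return 1
  kill -0 "$1" 2>/dev/null || return 1
  case "$(pid_cmdline "$1")" in
    *"$2"*) return 0 ;;
    *) return 1 ;;
  esac
}

# SIGTERM, wait up to about 2 seconds, then SIGKILL. Ownership is checked
# before each signal because the PID may be reused while we wait.
stop_owned() {
  local attempt
  owns_pid "$1" "$2" || { echo stale_pid; return 0; }
  kill -TERM "$1" 2>/dev/null || true
  for attempt in 1 2 3 4 5 6 7 8 9 10; do
    kill -0 "$1" 2>/dev/null || { echo stopped; return 0; }
    sleep 0.2
  done
  if owns_pid "$1" "$2"; then
    kill -KILL "$1" 2>/dev/null || true
  fi
  echo stopped
}
```

Called as `stop_owned "$pid" "server.cjs"`, the sketch prints `stale_pid` without signalling when the marker is absent from the target's command line. The second `owns_pid` call is the point of the exercise: a PID that was ours at SIGTERM time is not assumed to still be ours at SIGKILL time.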

Validation

- `"C:\Program Files\Git\bin\bash.exe" tests/brainstorm-server/stop-server.test.sh` -> 9 passed, 0 failed
- `"C:\Program Files\Git\bin\bash.exe" tests/brainstorm-server/windows-lifecycle.test.sh` -> 12 passed, 0 failed, 0 skipped
- `"C:\Program Files\Git\bin\bash.exe" -n skills/brainstorming/scripts/stop-server.sh`
- `"C:\Program Files\Git\bin\bash.exe" -n tests/brainstorm-server/stop-server.test.sh`
- `git diff --check`
- `git diff --cached --check`

Note: scripts/lint-shell.sh skills/brainstorming/scripts/stop-server.sh tests/brainstorm-server/stop-server.test.sh could not run locally because shellcheck is not installed on PATH.

Records scope, branching, architecture, deletion gate, verification protocol, path/config edits, migration ordering, and post-implementation verification. Frames CI integration, scenario co-location, and Python package rename as deferred work. Per-file deletion of bash tests under superpowers/tests/ is gated by a subagent that compares each bash assertion to its drill scenario's verify block. Default keeps the bash test if any assertion is unmatched. Branching: independent off dev (f/evals-lift), not stacked on f/cross-platform.

Two parallel reviewers raised legitimate issues against the lift-drill-into-evals spec. Updates:

- Coverage map for tests/explicit-skill-requests/ corrected: 6 run-*.sh scripts + prompts, not "2 scenarios cover all". Several scripts (Haiku, multi-turn, please-use-brainstorming, use-systematic-debugging) have no drill counterpart and stay.
- tests/claude-code/test-subagent-driven-development.sh marked as meta/documentation test (asks agent to describe SDD); no drill scenario covers description tests; defaults to keep.
- Path-defaults section now shows verified evidence: PROJECT_ROOT resolves to evals/ post-move; only claude*.yaml substitute ${SUPERPOWERS_ROOT} in args (codex/gemini use it via os.environ in pre-run hooks); helper invocation order specified (after load_dotenv, before click definitions).
- Step 2 copy uses explicit rsync excludes (.git, .venv, results, .env, __pycache__, *.egg-info, .private-journal); checksum-level verification rather than file-count.
- Drill SHA recorded at copy time in commit message and evals/.drill-source-sha for divergence detection.
- evals/tests/ pytest suite added to verification protocol.
- Reference scrub list expanded: RELEASE-NOTES.md, docs/superpowers/plans/, .codex-plugin/ (corrected from .codex/), lefthook.yml. Excluded dirs called out (node_modules/, .venv/, evals/).
- Historical plan docs / RELEASE-NOTES handling: annotate, don't rewrite.
- evals/lefthook.yml move documented (drill ships its own; contributors run cd evals && lefthook run pre-commit manually).
- PR description checklist includes archival action item for obra/drill post-merge.

False finding rejected: svelte-todo fixture is complete on disk (design.md + plan.md + scaffold.sh present); reviewer obra#1 obra#3 dropped.

15-task implementation plan derived from the design spec at docs/superpowers/specs/2026-05-06-lift-drill-into-evals-design.md. Each task is bite-sized (2-5 min steps) with exact commands, exact file paths, and exact code where required. Subagent verification gates per the spec are written out as concrete prompt templates.

Self-review:

- Spec coverage: every spec section maps to a task
- Placeholder scan: no TBD/TODO/placeholder/fill-in-later language
- Type consistency: helper named _set_superpowers_root_default consistently; drill SHA recorded in evals/.drill-source-sha consistently

rsync of obra/drill@013fcb8b7dbefd6d3fa4653493e5d2ec8e7f985b into superpowers/evals/, excluding .git/, .venv/, results/, .env/, __pycache__/, *.egg-info/, .private-journal/. The drill repo is unaffected by this commit; archival is a separate manual step after this PR merges. Source SHA recorded at evals/.drill-source-sha for divergence detection.

Adds _set_superpowers_root_default() to drill/cli.py, called at module import after load_dotenv(). PROJECT_ROOT resolves to evals/ post-lift; its parent is the superpowers repo root, which is the correct value for SUPERPOWERS_ROOT. Existing env values are respected as overrides via os.environ.setdefault.

Tests:

- helper sets default when var is unset
- helper does not override when var is already set

These backends only read SUPERPOWERS_ROOT via engine.py/setup.py's os.environ access, which the new cli.py default helper supplies automatically. claude*.yaml keep SUPERPOWERS_ROOT in required_env because they interpolate ${SUPERPOWERS_ROOT} into --plugin-dir args.

…ing-* scenarios) Subagent verification confirmed each prompt's intent matches its corresponding drill scenario's turns[].intent verbatim, and each scenario has both a deterministic skill-called assertion and a semantic LLM criterion confirming the matching skill was loaded (actually a stronger check than the bash test, which only confirms the skill fires anywhere in the stream). All 6 prompts deleted. The runner had no remaining prompts to drive, so run-test.sh and run-all.sh deleted as well.

…rsation-skill-invocation) Subagent verification: every bash assertion (Skill tool invoked + specific skill name 'subagent-driven-development' loaded after the agent describes it conversationally in turn 1) maps to the drill scenario's skill-called assertion + criteria paragraph requiring the skill to fire in direct response to the second user message. Drill additionally asserts tool-called Agent (subagent dispatch) which is stricter than the bash test. Other runners in tests/explicit-skill-requests/ (haiku, multiturn, extended-multiturn) and their prompt files are preserved — they have no drill coverage and exercise different behaviors.

…ractals + sdd-svelte-todo) The bash test had ZERO output assertions: it just ran claude -p and printed token usage. Drill's scenarios are strictly more rigorous:

- go-fractals: skill-called SDD + tool-called Agent + go test ./... passes + cmd/fractals/main.go exists + >=4 commits + LLM criteria verifying real SDD workflow.
- svelte-todo: skill-called SDD + tool-called Agent + npm test passes + playwright e2e passes + package.json + svelte.config.js or vite.config.ts + >=4 commits + LLM criteria.

design.md and plan.md are byte-identical between bash fixtures and drill fixtures (evals/fixtures/sdd-{go-fractals,svelte-todo}/). Drill's setup helper (scaffold_sdd_*) forces git init -b main (stricter than bash's reliance on init.defaultBranch). The .claude/settings.local.json from bash scaffold.sh is unnecessary for drill since permissions are managed via backend YAML. Subagent verification: SAFE TO DELETE for both.

…eviewer-catches-planted-flaws) Subagent verification: every bash assertion (TODO in Requirements section flagged, "specified later" deferral flagged, Issues section present, did-not-approve verdict) maps to drill verify.criteria entries. Setup parity covered by setup.assertions (test-feature-design.md exists with TODO + 'specified later' content). Drill is stricter: asserts tool-called Agent (subagent dispatch) which the bash test did not check.

…eview-catches-planted-bugs) Subagent verification: every bash assertion (skill invocation, subagent dispatch, SQL injection flagged, credential handling flagged, no merge approval) maps to drill verify checks. Drill is stricter: bundles severity (Critical/Important) into the same criteria as the finding itself (bash split severity into a separate test). Setup parity covered (src/db.js with string concat + identity hash, two commits). The drill scenario header explicitly says it is the "cross-harness, semantically-judged replacement for the bash test."

- test-worktree-native-preference.sh: drill covers PRESSURE phase only; RED + GREEN baselines have no drill counterpart and are kept so the RED-GREEN-REFACTOR validation remains rerunnable end-to-end.
- test-subagent-driven-development-integration.sh: drill covers the YAGNI subset (forbidden exports + reviewer-as-gate). Bash adds >=3 commits, >=2 subagent dispatches, TodoWrite usage, test file existence check, and token-budget telemetry. Kept until drill scenario covers those or they are retired.
- test-subagent-driven-development.sh: tests agent's ability to *describe* SDD (string matches against expected keywords). Drill scenarios test behavior, not description-recall. Kept by design.

Subagent verification recorded in commit messages of subsequent deletions; gap analyses driving these annotations are also in the verification subagent reports for the gating sweep.

- RELEASE-NOTES.md: note that test-requesting-code-review.sh and test-document-review-system.sh were lifted into drill scenarios on 2026-05-06; references are preserved as dated artifacts.
- docs/superpowers/plans/2026-03-23-codex-app-compatibility.md: note that tests/skill-triggering/ was lifted into drill scenarios on 2026-05-06; the run-all.sh reference is a dated artifact.

Subagent second-pass scrub confirmed no other active references in the tree (excluding evals/ and the spec/plan for this work itself).

- docs/testing.md split into Plugin tests + Skill behavior evals. Plugin tests section enumerates the bash tests that survive (kept by drill-coverage analysis or as describe-skill tests).
- CLAUDE.md adds Eval harness section pointing at evals/.
- README.md Contributing section mentions evals/ alongside tests/.
- .gitignore adds evals/{results,.venv,.env} as belt-and-suspenders (evals/.gitignore covers these locally; root-level entries help tooling that does not recurse into nested ignore files).

- evals/README.md, evals/CLAUDE.md: fix uv install command from 'uv sync --dev' to 'uv sync --extra dev'. Drill's pyproject.toml uses [project.optional-dependencies], so --dev is a no-op for pytest/ruff/ty; --extra dev is the correct invocation.
- tests/claude-code/run-skill-tests.sh: drop test-requesting-code-review.sh from integration_tests array (file deleted earlier in this branch).
- tests/claude-code/README.md: replace test-requesting-code-review.sh section with test-worktree-native-preference.sh (the worktree test is kept; the code-review test was lifted into drill).
- docs/testing.md, CLAUDE.md: remove "Copilot CLI" from the harness list. evals/backends/ has claude*, codex, gemini configs but no copilot.yaml, so the claim was unsupported.

Adversarial review credit: reviewer obra#2 found four legitimate issues (uv-sync, run-skill-tests stale ref, README stale ref via obra#1, and Copilot CLI fabrication); reviewer obra#1 found two distinct issues (run-skill-tests + tests/claude-code/README.md). Reviewer obra#2 wins this round.

Replace generic third-person "Claude" with "agents" / "your agent" forms across active skill prose, the README intro, and the vendored anthropic-best-practices.md reference.

Carve-outs preserved: historical attribution paths, the "Variant C: Claude.AI Emphatic Style" example label, model identifiers (Haiku/Sonnet/Opus), and the "In Claude Code:" per-platform skill-dispatch list.

Coined-term rename: "Claude Search Optimization (CSO)" → "Skill Discovery Optimization (SDO)" in writing-skills/SKILL.md.

Files in this commit also pick up later-phase changes that accumulated on the same files (dispatching-parallel-agents code-example transformation, writing-skills numbering and path fixes). The bundled spec at docs/superpowers/specs/ records the original scope and the carve-outs. README.md gets only its prose change here; the alphabetization lands in Phase C's commit.

Two structural changes:

1. Generalize CLAUDE.md-specific guidance:
   - "Project-specific conventions (put in CLAUDE.md)" → "(put in your instructions file)" in writing-skills/SKILL.md
   - "(explicit CLAUDE.md violation)" → "(explicit instruction-file violation)" in receiving-code-review/SKILL.md
   - The instruction-priority list in using-superpowers/SKILL.md stays inclusive (CLAUDE.md, GEMINI.md, AGENTS.md): that's load-bearing, not a substitution opportunity.
2. Per-platform tool reference files at skills/using-superpowers/references/{claude-code,codex,copilot,gemini}-tools.md. Each ref documents:
   - The runtime's preferred instructions file (CLAUDE.md, AGENTS.md, GEMINI.md, etc.) and how it loads
   - The runtime's personal-skills directory + cross-runtime ~/.agents/skills/ path where applicable
   - Action-language → tool-name mapping table

Tool names and table content reflect the source-verified state from direct inspection of openai/codex, google-gemini/gemini-cli, sst/opencode, and the installed @github/copilot package. Filenames and behaviors are sourced from each runtime's official docs.

Files in this commit also pick up later-phase changes that accumulated on the same files (using-superpowers/SKILL.md "How to Access Skills" overhaul, action-language flowchart, refs' final table content). The bundled spec records original scope.

Add a conditional TDD Evidence field to the implementer report format so controllers can verify RED and GREEN output when TDD was required. The field asks for the command run, relevant RED/GREEN output, and the expected RED failure reason rather than raw full logs. Fixes obra#994.

Serialize antigravity against the Gemini Code Assist rate limit (max_concurrency=1), diagnose 429/RESOURCE_EXHAUSTED honestly instead of as auth, fail-fast on a latched window, and tolerant preflight OK match. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

stop-server.sh read server.pid and SIGKILL'd that PID with no checks. After a reboot or PID wraparound the pid file can point at an unrelated, live process — which we would then kill. Verify the PID is actually our server (a running 'node ... server.cjs') before signalling it. If ownership can't be proven, fail closed: remove the stale pid file and report {status: stale_pid} without killing anything. Real servers still stop ({status: stopped}); a missing pid file still reports not_running. Adds stop-server.test.sh covering: an unrelated reused PID is left alone, a real server is stopped, and a missing pid file. Refs #1703
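
The three outcomes named in this commit message (`not_running`, `stale_pid`, `stopped`) can be sketched as a small decision function. This is a sketch under assumptions, not the script itself: `stop_from_pid_file` is a hypothetical name, the ownership test is passed in as a command so the sketch stays independent of how ownership is proven, and the quoted JSON shape of the status is assumed.

```shell
# Illustrative sketch; stop_from_pid_file and its arguments are hypothetical.

# Usage: stop_from_pid_file PID_FILE CHECK_CMD...
# CHECK_CMD is run with the PID appended and must succeed only when that
# PID is provably our server (for example, a match on 'node ... server.cjs').
stop_from_pid_file() {
  local pid_file="$1" pid
  shift

  # No pid file: nothing was started, or it was already cleaned up.
  [ -f "$pid_file" ] || { echo '{"status":"not_running"}'; return 0; }

  pid="$(tr -d '[:space:]' < "$pid_file")"
  case "$pid" in
    ''|*[!0-9]*) pid="" ;;   # malformed contents are never signalled
  esac

  # Fail closed: a malformed PID or an unproven owner counts as stale.
  if [ -z "$pid" ] || ! "$@" "$pid"; then
    rm -f "$pid_file"
    echo '{"status":"stale_pid"}'
    return 0
  fi

  kill -TERM "$pid" 2>/dev/null || true
  rm -f "$pid_file"
  echo '{"status":"stopped"}'
}
```

The design choice worth noting is that every uncertain branch ends in `stale_pid` with no signal sent; only a positive answer from the ownership check reaches `kill`.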

The node+server.cjs command match (from the adversarial review) still matched any unrelated node process running a file named server.cjs. When we recorded the bound port (state/server-info) and lsof is available, additionally require the PID to be the process actually LISTENING on this session's port — which rules out a different project's server.cjs / editor task runner that recycled the stale PID. Falls back to the command match when the port or lsof isn't available. Test: a 'node server.cjs' process not listening on the recorded port is spared. Refs #1703
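
A rough sketch of the extra listener check follows. `pid_listens_on_port` is a hypothetical helper, and the three-way return code is an assumption made here so the caller can tell "not the listener" apart from "cannot tell"; the commit message only says that the script falls back to the command match when the port or lsof is unavailable.

```shell
# Illustrative sketch; pid_listens_on_port is a hypothetical helper.

# Usage: pid_listens_on_port PID PORT
# Returns 0 if PID is listening on TCP PORT, 1 if it is not, and 2 when the
# question cannot be answered (no recorded port, or lsof not installed), in
# which case the caller would fall back to the command-line match alone.
pid_listens_on_port() {
  local listener
  case "$2" in
    ''|*[!0-9]*) return 2 ;;
  esac
  command -v lsof >/dev/null 2>&1 || return 2
  # -t prints bare PIDs; -sTCP:LISTEN keeps only listening sockets.
  for listener in $(lsof -nP -t -iTCP:"$2" -sTCP:LISTEN 2>/dev/null); do
    if [ "$listener" = "$1" ]; then
      return 0
    fi
  done
  return 1
}
```

A caller would treat 0 as confirmation, 1 as a stale PID, and 2 as "no extra evidence, rely on the command match".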

obra · 2026-06-17T22:37:52Z

Thanks @VeraPyuyi — verifying ownership before signaling a PID is exactly the right hardening for stop-server.sh.

After the 6.0 release: 6.0.0 ships ownership verification in stop-server.sh. Verified against v6.0.2 (and confirmed identical on origin/dev):

- Before sending any signal, it checks that the target PID's argv carries a `--brainstorm-server-id=<id>` that matches the per-start id written to `state/server-instance-id` (`is_brainstorm_server` / `command_has_server_id`).
- It fails closed, reporting `{"status":"stale_pid"}` rather than risking SIGTERM/SIGKILL on an unrelated node after a reboot.
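
The id comparison described above might be sketched as below. This is a guess at the shape of the check, not the released code: `argv_has_server_id` is a hypothetical stand-in, and it assumes the caller has already split the target's argv into separate arguments.

```shell
# Illustrative sketch; argv_has_server_id is a hypothetical stand-in and is
# not the command_has_server_id function from the released script.

# Usage: argv_has_server_id EXPECTED_ID ARG...
# Succeeds only if some argument is exactly --brainstorm-server-id=EXPECTED_ID.
argv_has_server_id() {
  local expected="$1" arg
  shift
  # An empty id proves nothing, so refuse to match on it.
  [ -n "$expected" ] || return 1
  for arg in "$@"; do
    if [ "$arg" = "--brainstorm-server-id=$expected" ]; then
      return 0
    fi
  done
  return 1
}
```

Comparing whole arguments rather than searching for a substring means an id of `abc123` does not match a process started with `--brainstorm-server-id=abc1234`. How the argv is collected is platform-specific and left out of the sketch.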

That's a different implementation from this PR's (path/cwd/environ matching) but the same — arguably stronger — guarantee, and it's the work referenced in the 6.0 release notes. So I'm closing this as resolved in 6.0.

If stop-server.sh ever signals the wrong process on 6.0.2, please reopen or file a fresh issue with the scenario.

Triaged by Claude Opus 4.8 (claude-opus-4-8, 1M context) running in Claude Code 2.1.181, with the superpowers and github-triage plugins. Post-6.0 obsolescence pass — verified against the released v6.0.2 source tree, not from memory. (No session ID available to cite in this environment.)

robotsnh and others added 30 commits May 6, 2026 11:22

docs: turned the dash in "- Jesse" into an escape sequence (obra#1474)

b4363df

Replaced the bullet point next to "Jesse" in the sponsorship section of the `README` with a dash. This is needed so the `README` renders properly in markdown viewers.

evals: drop SUPERPOWERS_ROOT setup step from README/CLAUDE

6f0adeb

The cli.py helper now defaults the env var. Mention as override only.

evals: remove unreleased wave scenarios

3dc0ea6

evals: drop drill source marker

58082d0

evals: add Gemini 2.5 Flash backend

35e42a1

evals: use pre-commit hooks

7f02ccd

fix(writing-skills): use markdown link for testing methodology reference

d4cf61b

fix: remove stale Cursor plugin refs

9088f56

fix(using-git-worktrees): repair skipped Step 2 numbering (obra#1522)

491df73

fix: remove global worktree path fallback (obra#1476)

3dfb376

[codex] replace Circle K signal with generic review guidance (obra#1531)

a152bb3

* Remove Circle K signal from review skill
* Add generic review hesitation guidance
* Use Jesse wording for review hesitation guidance

fix(tdd): link testing anti-patterns reference (obra#1532)

3d6dc90

Fixes obra#1529.

Move eval harness to submodule (obra#1541)

d25618d

arimu1 and others added 14 commits June 1, 2026 15:57

docs(windows): update polyglot hook docs

9d3e68a

Rewrite the Windows polyglot hook documentation to match the current run-hook.cmd dispatcher and update the porting guide cross-reference.

Fixes obra#1653.

docs(windows): trim polyglot hook implementation copy

7301c81

feat: add Kimi Code plugin manifest

2a8e547

fix: align Kimi manifest with supported fields

7fec40b

fix: wire Kimi plugin into release metadata

6b76158

docs: simplify Kimi README install steps

773bbf6

docs: restore Kimi direct install command

c74c22d

Tighten Kimi plugin porting coverage

16a1719

Add shell lint script

f3f0789

fix(brainstorming): cap websocket frame payloads

d7c260a

chore(evals): bump submodule to --scenarios filter (ff3ee83)

ae1eefb

Adds `run-all --scenarios` for resuming a scenario subset across the Code Assist rate-limit windows. Follows the agy rate-limit fix (79f9963). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Harden brainstorm stop-server stale pid handling

88a042f

anhtnt90dev mentioned this pull request Jun 8, 2026

docs: clarify local plugin test commands #1706

Open

5 tasks

obra mentioned this pull request Jun 10, 2026

Harden & modernize the brainstorming visual companion (auth, lifecycle, reconnect, just-in-time offer) #1720

Merged

5 tasks

obra mentioned this pull request Jun 16, 2026

Release v6.0.0 #1767

Closed

8 tasks

arittr force-pushed the dev branch from 75f6628 to 284be59 on June 16, 2026 17:10

obra mentioned this pull request Jun 16, 2026

Release v6.0.0 #1769

Merged

8 tasks

obra force-pushed the dev branch 3 times, most recently from 210b867 to b62616f, on June 17, 2026 05:46

obra closed this Jun 17, 2026

VeraPyuyi wants to merge 71 commits into obra:dev from VeraPyuyi:codex/stop-server-stale-pid

13 participants
