Fresh-eyes architecture audit of the Godot AI plugin stack: Python server, FastMCP tools/handlers, and the GDScript editor plugin. The issue is now both diagnosis and execution roadmap so the work can be split into PRs cleanly.
Audit branch: claude/audit-plugin-architecture-XYtQY (no code changes, clean tree).
Severity legend: P0 = correctness bug or data loss, P1 = latent reliability hazard or silent failure mode, P2 = structural / maintainability.
PR 7: UpdateManager extraction (#1 update seam). Landed in Extract McpUpdateManager from mcp_dock.gd (#297, PR 7) #310 (targets beta). McpUpdateManager (utils/update_manager.gd) owns the Releases poll, ZIP download, install orchestration, and the install-in-flight gate; the dock keeps banner UI only and consults the gate via _is_self_update_in_progress(). PR 6 carry-overs addressed: AWAITING_VERSION deleted (slot 2 reserved for wire-compat), and STOPPING → INCOMPATIBLE and STOPPING → UNINITIALIZED added to can_transition() so the recovery rollback paths stop bypassing transition_state. Status comment: PR 7 status.
PR 10: Narrow meta-tool JSON coercion (#8). Landed in PR 10: Narrow meta-tool JSON coercion #314 (targets beta). dispatch_manage_op's nested coercion is now annotation-aware: JSON-shaped strings decode only when the target handler param is annotated list/dict-like, string-typed params keep the literal value (a group named "[a,b]" is no longer mangled), and malformed JSON or wrong-container shapes raise structured INVALID_PARAMS with {tool, op, param, expected, actual|json_error} data. Handler signature + type-hints are cached per-handler via functools.cache. Finding #8 / P1 closed.
PR 11: Middleware order doc + test (#14). Landed in PR 11: Document and pin FastMCP middleware order (#14) + closing docs sweep #317 (targets beta). The four mcp.add_middleware(...) calls in src/godot_ai/server.py now sit under a consolidated rationale block explaining FastMCP's reversed(self.middleware) chain composition (first-added is outermost on response) and per-layer position reasoning. tests/unit/test_server_middleware_order.py locks the four-class order via runtime introspection of mcp.middleware, using identity-set membership against the imported godot_ai.middleware classes (a rename hard-fails at import time, no string-prefix dependency). The closing docs sweep also reconciled middleware-list drift across CLAUDE.md, .claude/skills/godot-ai/skill.md, docs/plugin-architecture.md, and docs/implementation-plan.md (three docs were listing only three of four middlewares; PreserveGodotCommandErrorData was missing). Finding #14 / P2 closed.
Audit-cleanup batch closed. PR 11 (PR 11: Document and pin FastMCP middleware order (#14) + closing docs sweep #317) was the closing PR. The following items are deferred outside the audit umbrella and tracked as ordinary follow-ups, not as PR 12+ in this stack: #12 (error-code parity contract test), the handler-pattern dedup work (resolve-or-error helper, property-list validation, value-coercion interface), the audit blind spots (debugger timer leak, CLI finder cache invalidation, structured dispatcher diagnostics, setup-dev.ps1 uv-presence parity, structured ci-check-gdscript exit), and the broader self-update preload-alias work (handler bodies + the depth-2+ const-preload graph deferred from PR 9). beta → main promotion is the next deliberate step the maintainer owns.
Land remaining focused cleanups as bandwidth allows.
Compatibility / migration notes
State-machine changes are the highest regression risk because they touch server spawn/adoption, version verification, reconnect behavior, and dock refresh timing. Keep these behind characterization tests and preserve current user-visible states unless a PR explicitly changes them.
Runtime Protocol cleanup in PR 8: Delete unused Runtime Protocol #313 deliberately chose deletion over injection because the seam had no production consumer. Future runtime alternatives should add a real registration-time injection seam instead of restoring the Protocol alone.
Update extraction must preserve installed-plugin reload behavior, including the rescue/update runner path. Treat self-update smoke coverage as mandatory before moving ownership into UpdateManager.
The beta branch is the integration target for this stack; do not retarget individual audit PRs to main unless they are explicitly promoted as hotfixes.
Top structural issues
1. [P2] plugin.gd (2250) and mcp_dock.gd (2450) are god classes with no clear seams ✅ Update slice fixed in #310; plugin.gd extractions complete
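plugin.gd mixed EditorPlugin lifecycle, server spawn/adopt/recovery, port resolution, OS-specific PID scraping, version verification, handler registration, and game-helper autoload (103 methods, roughly 5 responsibilities). Natural extractions:
✅ ServerLifecycleManager: landed in Extract McpPortResolver + McpServerLifecycleManager from plugin.gd (#297, PR 5) #307 as McpServerLifecycleManager. Owns start_server, stop_server, recover_strong_port_occupant, respawn_with_refresh, adopt_compatible_server, prepare_for_update_reload. PR 6 (PR 6: state-model cleanup + ServerVersionCheck extraction (#297) #308) moved 14 lifecycle fields into the manager (resolution (a)) so _host._field plumbing is gone.
✅ PortResolver: landed in Extract McpPortResolver + McpServerLifecycleManager from plugin.gd (#297, PR 5) #307 as McpPortResolver. Owns resolve_ws_port, is_port_in_use, find_pid_on_port, netstat/powershell scraping, pid-file I/O, pid_alive.
✅ ServerVersionCheck: landed in PR 6: state-model cleanup + ServerVersionCheck extraction (#297) #308 as McpServerVersionCheck. The _arm_*, _on_*_verified, and _on_*_unverified state machine moved out of plugin.gd and runs alongside McpServerState.
✅ UpdateManager: landed in Extract McpUpdateManager from mcp_dock.gd (#297, PR 7) #310 as McpUpdateManager (utils/update_manager.gd). Owns Releases poll, ZIP download, install orchestration, and the install-in-flight gate; dock keeps banner UI only.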
mcp_dock.gd is still ~2300 LOC after the update-slice extraction — client-row refactor and other dock-internal carve-outs remain candidates for PR 11+ but are not load-bearing for the audit findings any more.
2. [P2] Boolean-flag thicket with no state enum ✅ Fixed in #308 (+ #310 cleanup)
Across plugin.gd and mcp_dock.gd there were roughly 29 overlapping booleans/deadlines governing server, connection, and refresh state. PR 6 (#308) replaced them with McpServerState (12 states + transition table + first-writer-wins for terminal diagnoses), McpClientRefreshState (5 states + per-state UI/spawn semantics), McpStartupPath (typed startup-trace tags), and McpAdoptionLabel (managed/external). McpSpawnState (terminal-only string union) was deleted.
✅ McpServerState.AWAITING_VERSION deleted. The state was reachable in the transition table but unused; integer slot 2 left empty so live editor_state.state wire values for READY/INCOMPATIBLE/etc. don't shift.
✅ STOPPING → INCOMPATIBLE and STOPPING → UNINITIALIZED added to can_transition(). recover_incompatible_server, force_restart_server, and reset_for_force_restart now route through transition_state(...) instead of writing _server_state directly. test_stopping_recovery_rollback_transitions covers the new transitions.
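The state-model change above can be illustrated with a small transition-table sketch. It is written in Python for brevity, not in the plugin's GDScript, and it is not the McpServerState implementation: the ServerState subset, the integer values, the edges other than the two STOPPING rollbacks, and the Lifecycle class are all assumptions made for this example. It only shows the shape of the idea, which is one table of legal edges, one write path, and a reserved integer slot.

```python
from enum import IntEnum


class ServerState(IntEnum):
    """Illustrative subset of states; the integer values are assumptions."""
    UNINITIALIZED = 0
    READY = 1
    # Slot 2 is deliberately left unused so later values keep their wire numbers.
    INCOMPATIBLE = 3
    STOPPING = 4


# Allowed (from -> to) edges; any edge not listed here is rejected.
ALLOWED = {
    ServerState.UNINITIALIZED: {ServerState.READY, ServerState.INCOMPATIBLE},
    ServerState.READY: {ServerState.STOPPING},
    ServerState.INCOMPATIBLE: {ServerState.STOPPING},
    # The two recovery-rollback edges described above.
    ServerState.STOPPING: {ServerState.INCOMPATIBLE, ServerState.UNINITIALIZED},
}


def can_transition(current, target):
    return target in ALLOWED.get(current, set())


class Lifecycle:
    """Hypothetical owner of the state; every write goes through transition_state."""

    def __init__(self):
        self.state = ServerState.UNINITIALIZED

    def transition_state(self, target):
        if not can_transition(self.state, target):
            return False  # illegal edge: state is left untouched
        self.state = target
        return True
```

Routing every write through one function is what lets a single test enumerate the legal edges, instead of auditing each call site that used to assign the field directly.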
3. [P1] Dock thread management is three different strategies ✅ Fixed in #308 (partial)
PR 6 (#308) collapsed the seven refresh booleans into McpClientRefreshState (IDLE, DEFERRED_FOR_FILESYSTEM, RUNNING, RUNNING_TIMED_OUT, SHUTTING_DOWN) plus pending-request flags. Per-client action threads stay separate (different lifecycle from the bulk refresh sweep, with McpCliExec wall-clock budgets bounding them). The dock-side refresh thicket is now resolved; the broader unification of all dock worker lifecycles under one pool/watchdog is no longer load-bearing and rolls into general cleanup.
4. [P0] McpScenePath.from_node silently produces wrong paths for foreign nodes ✅ Fixed in #298
utils/scene_path.gd:10-16 calls scene_root.get_path_to(node) without first checking scene_root.is_ancestor_of(node). If a handler passes a node from an instanced sub-scene or foreign tree, the function can return a plausible-looking path like /Main// instead of an empty string. Every consumer trusts this path.
User-visible impact: Agents can receive paths that resolve to the wrong node, or no node, and the failure looks like ordinary downstream tool behavior instead of an invalid-node bug.
Repro:
# In a test scene with /Main as scene_root
var foreign := Node.new()
EditorInterface.get_edited_scene_root().get_parent().add_child(foreign)
var path := McpScenePath.from_node(foreign, EditorInterface.get_edited_scene_root())
# Expected: "" (or explicit error sentinel)
# Actual: "/Main/"
Fix is small (is_ancestor_of guard plus root special case); the important part is regression coverage.
5. [P2] Runtime Protocol is vestigial ✅ Fixed in #313
PR 8 (#313) chose Path B. runtime/interface.py was deleted, shared handlers and _meta_tool.py::dispatch_manage_op now type against DirectRuntime, and tests/unit/test_tool_domains.py::test_runtime_protocol_is_not_reintroduced_without_injection_seam guards against reintroducing the pretend seam or stale runtime-boundary docs. The production MCP surface is unchanged: tools/resources still construct DirectRuntime.from_context(...), per-call session_id pinning remains at that boundary, and resources still use the active session.
Medium-impact issues
6. [P1] connection.gd lifecycle gaps ✅ Fixed in #300
pause_processing was a bare bool, not a depth counter. Nested pauses could prematurely resume processing during save/play re-entrancy windows.
Outbound sends did not check backpressure before send_text() even though screenshots can be multi-MB.
State-change tracking could mark scene/play/readiness events as sent even when outbound backpressure prevented delivery.
Socket lifecycle reset now clears pending deferred responses.
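The difference between a bare bool and a depth counter is easiest to see in a few lines. The sketch below is in Python, not GDScript, and PauseGate is a hypothetical name; it is not the connection.gd code, only an illustration of why nested pauses need counting.

```python
class PauseGate:
    """Hypothetical pause tracker: processing resumes only when every pause is released."""

    def __init__(self):
        self._depth = 0

    def pause(self):
        self._depth += 1

    def resume(self):
        # Clamp at zero so an unbalanced resume cannot drive the counter negative.
        self._depth = max(0, self._depth - 1)

    @property
    def paused(self):
        return self._depth > 0
```

With a bool, the inner resume of a nested save/play sequence would flip processing back on while the outer caller still expects it to be paused. With the counter, the inner resume only drops the depth from 2 to 1.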
7. [P1] Deferred-response sentinel has no timeout ✅ Fixed in #300
dispatcher.gd now tracks pending deferred responses, emits structured DEFERRED_TIMEOUT errors when a handler never replies, and exposes deferred cleanup for connection lifecycle reset.
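A minimal sketch of deadline tracking for deferred replies follows, in Python and under stated assumptions: DeferredTracker, its method names, and the error payload shape are hypothetical, and only the DEFERRED_TIMEOUT code comes from the audit text. The injectable clock exists so the behaviour can be tested without sleeping.

```python
import time


class DeferredTracker:
    """Hypothetical tracker for requests whose handler promised a later reply."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self._timeout_s = timeout_s
        self._clock = clock
        self._deadlines = {}  # request_id -> absolute deadline

    def defer(self, request_id):
        self._deadlines[request_id] = self._clock() + self._timeout_s

    def resolve(self, request_id):
        """Return True if the reply was still expected; late replies return False."""
        return self._deadlines.pop(request_id, None) is not None

    def expire(self):
        """Drop overdue entries and return one structured error per entry."""
        now = self._clock()
        overdue = [rid for rid, deadline in self._deadlines.items() if deadline <= now]
        for rid in overdue:
            del self._deadlines[rid]
        return [{"request_id": rid, "error": {"code": "DEFERRED_TIMEOUT"}} for rid in overdue]

    def clear(self):
        """Called on connection lifecycle reset so stale entries cannot fire later."""
        self._deadlines.clear()
```

Because expire() removes the entry before reporting it, a reply that arrives after the timeout finds nothing to resolve and can be ignored instead of being delivered twice.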
8. [P1] _meta_tool.py silently coerces JSON-shaped strings ✅ Fixed in #314
PR 10 (#314) made the nested coercion in dispatch_manage_op annotation-aware. _coerce_stringified_json_values now walks each handler's resolved type hints (cached per-handler via functools.cache) and only JSON-decodes a string-shaped value when the target param is annotated list/dict-like. String-typed params keep the literal value; malformed JSON or wrong-container shapes raise structured INVALID_PARAMS errors carrying {tool, op, param, expected, actual|json_error}. Handler TypeError for missing/extra args is still wrapped into INVALID_PARAMS at the dispatch boundary, so schema failures surface as clean validation errors instead of opaque internals.
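The annotation-aware rule can be sketched as follows. This is not the code in _meta_tool.py: coerce_params, InvalidParams, _container_params, and the add_to_groups handler are hypothetical names for this example, and the sketch only handles plain list/dict annotations (it does not unwrap Optional or unions). The error data keys follow the {tool, op, param, expected, actual|json_error} shape described above.

```python
import functools
import json
import typing


class InvalidParams(ValueError):
    """Hypothetical structured error carrying INVALID_PARAMS data."""

    def __init__(self, data):
        super().__init__("INVALID_PARAMS")
        self.data = data


@functools.cache
def _container_params(handler):
    """Map param name -> list or dict for container-annotated params (cached per handler)."""
    expected = {}
    for name, hint in typing.get_type_hints(handler).items():
        if name == "return":
            continue
        origin = typing.get_origin(hint) or hint  # list[str] -> list
        if origin in (list, dict):
            expected[name] = origin
    return expected


def coerce_params(tool, op, handler, params):
    expected = _container_params(handler)
    result = dict(params)
    for name, value in params.items():
        kind = expected.get(name)
        if kind is None or not isinstance(value, str):
            continue  # string-typed or already-structured values stay literal
        base = {"tool": tool, "op": op, "param": name, "expected": kind.__name__}
        try:
            decoded = json.loads(value)
        except json.JSONDecodeError as exc:
            raise InvalidParams({**base, "json_error": str(exc)}) from exc
        if not isinstance(decoded, kind):
            raise InvalidParams({**base, "actual": type(decoded).__name__})
        result[name] = decoded
    return result


def add_to_groups(node: str, group: str, tags: list[str]) -> None:
    """Hypothetical handler used only to illustrate the annotations."""
```

Calling coerce_params with group="[a,b]" leaves the group name untouched because the parameter is annotated str, while a JSON-shaped tags string is decoded into a list. The decision is driven by the handler signature, not by what the string happens to look like.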
9. [P0] Update-reload runner has no rollback for partial extracts ✅ Fixed in #299
update_reload_runner.gd:181-196: if file 3 of 5 fails to write, _install_zip_paths returns false but already-written files remain. The runner can re-enable the old plugin against a half-new addon directory.
User-visible impact: A failed self-update can leave mixed plugin files from vN and vN+1, producing sporadic runtime errors and misleading version reporting.
Repro:
1. Stage a v(N+1) ZIP with 5 files.
2. Trigger update via dock.
3. Mid-install, force a write failure on file 3 (chmod -w on target, or fill disk
to fail a later write). _install_zip_paths returns false.
4. Inspect addons/godot_ai/: some files are vN+1 and some are vN.
5. Editor re-enables the plugin against this mixed directory.
Fix shape: pre-flight writeability before starting, or collect paths_written and unwind on failure. If rollback fails, do not re-enable the plugin as if the install were safe.
10. [P0] Atomic write fallback can drop the user's MCP config on Windows ✅ Fixed in #299
clients/_atomic_write.gd:29-34: when the first rename fails, the code removes the destination and retries. If the second rename fails, the original is gone and the temp is cleaned up.
User-visible impact: A Windows user can lose the entire MCP config file for Claude Desktop, Cursor, Cline, etc. while the dock is adding/removing only the Godot AI entry.
Repro:
1. On Windows, open the target MCP config in another process to create lock/AV timing pressure.
2. Click Configure for any client in the dock.
3. First rename fails.
4. _atomic_write calls DirAccess.remove_absolute(path).
5. Second rename fails because the lock or antivirus timing changes.
6. Original config is gone; backup is not restored.
Fix shape: avoid remove-then-rename as the recovery path. Use copy-then-verify or a three-file rotation where .backup is retained until the new file is verified readable.
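A sketch of the copy-then-verify shape follows, again in Python and not the GDScript in clients/_atomic_write.gd. safe_replace and the .tmp / .backup suffixes are assumptions for this example; the injectable replace parameter exists only so a failed swap can be simulated. The key property is that the destination is never removed as a recovery step.

```python
import os
import shutil


def safe_replace(path, new_text, replace=os.replace):
    """Replace path with new_text; the original survives any failed swap."""
    tmp = path + ".tmp"
    backup = path + ".backup"
    with open(tmp, "w", encoding="utf-8") as fh:
        fh.write(new_text)
        fh.flush()
        os.fsync(fh.fileno())
    had_original = os.path.exists(path)
    if had_original:
        shutil.copy2(path, backup)  # copy, never move: the original stays in place
    try:
        replace(tmp, path)
    except OSError:
        # The swap failed, but the destination was never deleted, so it is intact.
        if os.path.exists(tmp):
            os.remove(tmp)
        return False
    # Verify the new file is readable before discarding the backup.
    with open(path, encoding="utf-8") as fh:
        ok = fh.read() == new_text
    if not ok and had_original:
        shutil.copy2(backup, path)
        return False
    if ok and had_original:
        os.remove(backup)
    return ok
```

If the rename fails because another process holds a lock, the function returns False and the user's existing config is untouched, which is the behaviour the repro above shows was missing.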
11. [P1] SessionRegistry is not concurrency-safe ✅ Fixed in #300
sessions/registry.py now serializes registry mutations and waiter coordination behind an asyncio.Lock, and wait_for_session() performs a same-lock re-check to avoid missing a just-registered session.
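The same-lock re-check can be illustrated with a small sketch. It is not the code in sessions/registry.py: this SessionRegistry, its fields, and the choice to return the first registered session id are assumptions made for the example. The point is that the "is anything registered?" check and the "queue a waiter" step happen under the lock that register() also takes.

```python
import asyncio


class SessionRegistry:
    """Hypothetical registry; create it inside a running event loop."""

    def __init__(self):
        self._lock = asyncio.Lock()
        self._sessions = {}
        self._waiters = []

    async def register(self, session_id, session):
        async with self._lock:
            self._sessions[session_id] = session
            waiters, self._waiters = self._waiters, []
        for fut in waiters:
            if not fut.done():  # skip waiters that already timed out
                fut.set_result(session_id)

    async def wait_for_session(self, timeout):
        async with self._lock:
            # Re-check under the same lock register() holds, so a session that
            # registered just before we queue a waiter cannot be missed.
            if self._sessions:
                return next(iter(self._sessions))
            fut = asyncio.get_running_loop().create_future()
            self._waiters.append(fut)
        return await asyncio.wait_for(fut, timeout)
```

Without the lock around both steps, a registration could land between the emptiness check and the waiter being appended, and the waiter would sit until its timeout even though a session exists.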
12. [P2] Plugin error codes pass through as opaque strings
godot_client/client.py:86 forwards error.code as-is. The GDScript side has numeric error_codes.gd, and Python has protocol/errors.py, but no contract test enforces parity. Add a translation/contract test so drift fails in CI.
13. [P2] animation_handler.gd mixes four domains ✅ Fixed in #367 on beta (tracked under #342)
The 1674-LOC handler is now split along the four-domain seam following the camera/material/particle *_handler.gd + *_presets.gd + *_values.gd pattern: animation_handler.gd (869 LOC, write ops + undo helpers + resolvers + thin proxies into the submodules), animation_presets.gd (480 LOC, preset_fade / preset_slide / preset_shake / preset_pulse + the target classifier and direction-offset helper), and animation_values.gd (454 LOC, list_animations / get_animation / validate_animation + shared keyframe value coercion + transition parsing + serialization, plus a new player_root_node helper that DRYs up the root-node fallback that was open-coded in three places). Submodules use a WeakRef back-pointer so the handler can own them strongly via _presets/_values without forming a RefCounted cycle. preset_* and read methods are thin proxies on the handler so plugin.gd dispatcher entries and test_animation.gd's _handler.method(...) call sites required no edits.
The first attempt landed on main in #344 against this umbrella's branching policy and was reverted via #368; the split now lives only on beta and reaches main via the eventual promotion. Polish for validate_animation's broken_tracks[].node_path shape on subpath tracks landed in #371 (the original rfind(":") would surface Target:modulate instead of Target in the diagnostic for missing-target subpath tracks like Target:modulate:a; first-colon split fixes that).
14. [P2] Middleware order in server.py is implicit ✅ Fixed in #317
PR 11 (#317) added a consolidated rationale block above the four mcp.add_middleware(...) calls in server.py documenting FastMCP's reversed(self.middleware) chain composition and per-layer position reasoning, plus tests/unit/test_server_middleware_order.py locking the four-class order via runtime introspection of mcp.middleware (identity-set membership against the imported godot_ai.middleware classes — a rename hard-fails at import time). The closing docs sweep also reconciled middleware-list drift across three docs that were listing only three of four middlewares.
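Why a reversed() composition makes the first-added layer outermost is easy to lose track of, so here is a library-free sketch. It does not import FastMCP and is not its implementation; build_chain and make_layer are hypothetical helpers that mirror the reversed(self.middleware) composition the audit describes.

```python
import functools


def make_layer(name, trace):
    """Build a toy middleware that records when it is entered and left."""
    def layer(request, call_next):
        trace.append(f"{name}:in")
        response = call_next(request)
        trace.append(f"{name}:out")
        return response
    return layer


def build_chain(middleware, handler):
    """Wrap handler so the first-added layer ends up outermost."""
    chain = handler
    for layer in reversed(middleware):
        # The last-added layer wraps the handler first, so it sits innermost.
        chain = functools.partial(layer, call_next=chain)
    return chain
```

Building a chain from layers added in the order A, B yields the call order A:in, B:in, handler, B:out, A:out. The first-added layer sees the request first and the response last, which is why reordering the add_middleware calls changes behaviour and is worth pinning in a test.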
Handler-side patterns worth absorbing
These are not all worth standalone issues, but they are good cleanup candidates once the reliability stack is stable:
Repeated "resolve node or return error" boilerplate across 6+ handlers should become one helper.
Repeated get_property_list() validation loops should move into shared property validation.
Per-handler value coercion should expose one consistent interface instead of scattered *_values.gd helpers.
Inline sub-resource mutation outside undo actions is a P1 risk in particle_handler.gd and likely material_handler.gd; convert mutations to undoable properties or document the constraint explicitly.
Read paths need frame-budget awareness where they recursively scan res:// or do many uncached get_node_or_null calls.
_param_validators.gd is too thin; enum validation, non-empty string checks, and common coercion should live there.
Blind spots worth flagging
No multi-session concurrent-access test. Existing websocket coverage is mostly sequential; add asyncio.gather across two sessions with read and write tools. ✅ Closed in Add PR4 characterization tests #304.
Debugger timer leak on screenshot success. Pending request timers are erased from the dict but still run to natural deadline. [P2]
CLI finder cache never invalidates. Installing a CLI mid-session can leave the dock reporting it missing until restart. [P2]
Legacy/simple log buffers coexist with structured log ring utilities. Clarify ownership and which logger new plugin code should use. [P2]
ci-check-gdscript greps import logs for parse errors. Prefer explicit Godot script exit status so wording changes cannot hide parse failures. [P1]
setup-dev.ps1 skips the uv presence check that the bash setup has. [P2]
Dispatcher exceptions do not reach the agent-readable log buffer. Malformed-result handling improved in Harden lifecycle reliability (#297, PR 3) #300, but real GDScript backtraces still stay in the console. Improve structured diagnostics. [P2]
What's healthy (do not refactor casually)
Tool/handler split is earning its keep: tools are thin adapters and handlers own editor/plugin logic.
Client configurator descriptors are data-only across all supported clients; keep that constraint.
Test runner guardrails (zero-assertion detection, suite isolation, resilient discovery) match the repo guidance.
Tool-catalog drift detection is useful and should stay.
Runtime/game-side boundary integrity is healthy; shipped game helper code avoids plugin imports.
pyproject.toml hygiene is good.
Pre-refactor test checklist
The reliability fixes (#4, #9, #10, plus the lifecycle hardening in #6/#7/#11) should ship independently. The structural refactors (#1, #2, #5, #13) need characterization tests first:
Multi-session concurrent tool calls — asyncio.gather across two registered sessions exercising both read and write tools. Landed in Add PR4 characterization tests #304 at both the WebSocket layer (test_websocket.py) and the MCP-tool layer (test_mcp_tools.py).
Connection lifecycle round-trip — disconnect → reconnect → handshake → first command, with pause/resume interleaved. Wire-level sequence in test_websocket.py::test_disconnect_reconnect_handshake_then_first_command; pause-depth interleaving (handler-held pauses must survive _clear_on_disconnect) in test_connection.gd::test_clear_on_disconnect_preserves_pause_depth + test_pause_resume_balances_across_repeated_reconnect_cycles. Landed in Add PR4 characterization tests #304.
Deferred-response paths — happy path and missing-reply timeout path. Plugin-side request-id threading + auto-send suppression in test_dispatcher.gd; server-side timeout drops the pending entry and ignores late replies in test_websocket.py. Landed in Add PR4 characterization tests #304.
Self-update success path in CI — Lower-level non-interactive success path (manifest acceptance + new-file install) in test_update_reload_runner.gd, landed in Add PR4 characterization tests #304. Full interactive smoke remains at script/local-self-update-smoke and is the gate for any PR that touches self-update / plugin reload handoff / install-extract logic.
Config write failure path — failed replacement must preserve the original user MCP config. Covered in Add update/config data-loss safeguards (#297, PR 2) #299 by test_clients.gd::test_atomic_write_preserves_destination_when_swap_fails (real failed swap, destination contents survive) plus the structural pins in tests/unit/test_audit_data_loss_safeguards.py that prevent the dangerous remove-then-rename pattern from coming back.
Progress log
beta branch created from main on origin.
[...] beta). Finding #4 / P0 closed.
[...] beta). Findings #9, #10 / P0 closed. Includes the inner-restore-from-backup follow-up that surfaces FAILED_MIXED when _install_zip_file's local restore fails.
[...] beta). Findings #6, #7, #11 / P1 closed; also added dispatcher malformed-handler diagnostics and stabilized the Camera2D current-state headless flake tracked in flake: Camera2D current undo test still fails after #296 #301.
[...] beta). Multi-session concurrent read+write (asyncio.gather), disconnect → reconnect → handshake → first command, deferred-response request-id threading + missing-reply timeout, ProjectHandler pause restoration on validation error, McpConnection pause-depth survives _clear_on_disconnect, lower-level update_reload_runner success path. Config-write-failure preservation left intact in Add update/config data-loss safeguards (#297, PR 2) #299's coverage (test_atomic_write_preserves_destination_when_swap_fails plus the structural pins in test_audit_data_loss_safeguards.py); full interactive self-update smoke remains at script/local-self-update-smoke.
[...] plugin.gd extraction (#1, partial). Landed in Extract McpPortResolver + McpServerLifecycleManager from plugin.gd (#297, PR 5) #307 (targets beta). McpPortResolver + McpServerLifecycleManager extracted; plugin.gd shrinks 666 LOC. ServerVersionCheck extraction deferred to PR 6 because it's tangled with the boolean-flag thicket PR 6 rewrites. Carry-over concerns documented in the PR 5 status comment; they all roll into PR 6.
[...] McpServerLifecycleManager._host._field plumbing via resolution (a). Landed in PR 6: state-model cleanup + ServerVersionCheck extraction (#297) #308 (targets beta). McpServerState + McpClientRefreshState + McpStartupPath + McpAdoptionLabel introduced; McpSpawnState deleted; lifecycle state moved into McpServerLifecycleManager; McpServerVersionCheck extracted with _connection released on disarm. Carry-over concerns documented in the PR 6 status comment; they roll into PR 7.
[...] beta). Chose Path B: deleted the unused Runtime Protocol, retyped handlers/meta-tool dispatch to DirectRuntime, updated contributor docs, and added a structural guard so the Protocol/doc references cannot creep back without a real injection seam. Status comment: PR 8 status.
[...] beta). Extends the parse-hazard recipe from [codex] Adapt #309 self-update parse fix for beta #312 to four more depth-1 infrastructure scripts in plugin.gd's const-preload graph: connection.gd, dispatcher.gd, mcp_dock.gd, client_configurator.gd. Each gains script-local const X := preload("res://...") aliases for constructor / constant / static-method / enum references; targeted top-level fields against plugin Mcp* classes are untyped (relying on _init/setup parameter type fences). tests/unit/test_plugin_self_update_safety.py::TARGETED_LOAD_SURFACE_FILES is extended to lock the contract for the four new files. Out of scope per the PR 9 roadmap comment: handler bodies, debugger plugin, dispatcher log helpers, and the depth-2+ const-preload graph; these are deferred to PR 11+.
animation_handler.gd split. Landed on beta in Forward animation_handler split onto beta (#344 → beta) #367 (squashed as aac5483). 1674-LOC handler split into animation_handler.gd (869) + animation_presets.gd (480) + animation_values.gd (454) following the camera/material/particle pattern, with Accept scene-absolute target_path in animation presets (closes #328) #337's scene-absolute target_path support adapted onto the new files. Tracked separately under Split animation_handler.gd along the four-domain seam (#297 finding #13) #342. (The first attempt landed on main in Split animation_handler.gd along the four-domain seam (#342) #344 against the umbrella's branching policy; it was reverted via Revert "Split animation_handler.gd along the four-domain seam (#344)" #368 once the mistake was caught, so the split now lives only on beta and reaches main via the eventual promotion. Polish for validate_animation's broken_tracks[].node_path shape on subpath tracks landed in Fix validate_animation subpath track parsing (rfind → find) #371: Copilot review surfaced that rfind(":") could leak the property suffix into the diagnostic; the fix splits on the first colon and adds a regression test asserting the bare node name surfaces for missing-target subpath tracks like Target:modulate:a.)
Branching policy
[...] beta branch from current main before the first implementation PR in this stack.
[...] beta by default, not main.
main receives one final promotion PR from beta after the stack is validated.
[...] main only when explicitly called out as hotfixes.
Base: beta until the final promotion PR.
Execution matrix
beta from main | main | beta | beta | beta | beta | plugin.gd extraction | beta | ServerVersionCheck + resolve _host._field plumbing | beta | UpdateManager extraction + PR 6 carry-overs (AWAITING_VERSION, recovery-rollback transitions) | beta | beta | beta | INVALID_PARAMS unit tests | beta | test_plugin_self_update_safety.py + local-self-update-smoke | beta | animation_handler.gd split | beta | test_animation suite + smoke | main in #344, reverted in #368, re-landed on beta correctly. | validate_animation subpath-track diagnostic shape | beta | broken_tracks[0].node_path for missing-target subpath tracks
Recommended PR split
[...] beta from current main and use it as the base for this stack.
[...] McpScenePath.from_node ancestry validation. [#4]
[...] pause_processing depth counter, deferred-response timeout, SessionRegistry lock, and dispatcher exception logging if it stays small. [#6, #7, #11, blind spot]
[...] PortResolver and ServerLifecycleManager from plugin.gd. [#1]
[...] ServerState and ClientRefreshState models; fold dock worker lifecycle cleanup into this area if practical. [#2, #3]
[...] UpdateManager so dock UI stops knowing plugin lifecycle internals. [#1]
Decide the Runtime Protocol's fate: inject a runtime seam at tool registration or delete the unused Protocol. [#5] ✅ Deleted the unused Protocol in PR 8: Delete unused Runtime Protocol #313.
UpdateManager.betabranch is the integration target for this stack; do not retarget individual audit PRs tomainunless they are explicitly promoted as hotfixes.Top structural issues
1. [P2]
plugin.gd (2250) and mcp_dock.gd (2450) are god classes with no clear seams ✅ Update slice fixed in #310; plugin.gd extractions complete

plugin.gd mixed EditorPlugin lifecycle, server spawn/adopt/recovery, port resolution, OS-specific PID scraping, version verification, handler registration, and game-helper autoload: 103 methods, roughly 5 responsibilities. Natural extractions:

- ServerLifecycleManager: landed in Extract McpPortResolver + McpServerLifecycleManager from plugin.gd (#297, PR 5) #307 as McpServerLifecycleManager. Owns start_server, stop_server, recover_strong_port_occupant, respawn_with_refresh, adopt_compatible_server, prepare_for_update_reload. PR 6 (PR 6: state-model cleanup + ServerVersionCheck extraction (#297) #308) moved 14 lifecycle fields into the manager (resolution (a)) so _host._field plumbing is gone.
- PortResolver: landed in Extract McpPortResolver + McpServerLifecycleManager from plugin.gd (#297, PR 5) #307 as McpPortResolver. Owns resolve_ws_port, is_port_in_use, find_pid_on_port, netstat/powershell scraping, pid-file I/O, pid_alive.
- ServerVersionCheck: landed in PR 6: state-model cleanup + ServerVersionCheck extraction (#297) #308 as McpServerVersionCheck. The _arm_*, _on_*_verified, and _on_*_unverified state machine moved out of plugin.gd and runs alongside McpServerState.
- UpdateManager: landed in Extract McpUpdateManager from mcp_dock.gd (#297, PR 7) #310 as McpUpdateManager (utils/update_manager.gd). Owns Releases poll, ZIP download, install orchestration, and the install-in-flight gate; dock keeps banner UI only.

mcp_dock.gd is still ~2300 LOC after the update-slice extraction. The client-row refactor and other dock-internal carve-outs remain candidates for PR 11+ but are no longer load-bearing for the audit findings.

2. [P2] Boolean-flag thicket with no state enum ✅ Fixed in #308 (+ #310 cleanup)

Across plugin.gd and mcp_dock.gd there were roughly 29 overlapping booleans/deadlines governing server, connection, and refresh state. PR 6 (#308) replaced them with McpServerState (12 states + transition table + first-writer-wins for terminal diagnoses), McpClientRefreshState (5 states + per-state UI/spawn semantics), McpStartupPath (typed startup-trace tags), and McpAdoptionLabel (managed/external). McpSpawnState (terminal-only string union) was deleted.

PR 7 carry-overs were resolved in #310:

- McpServerState.AWAITING_VERSION deleted. The state was reachable in the transition table but unused; integer slot 2 is left empty so live editor_state.state wire values for READY/INCOMPATIBLE/etc. don't shift.
- STOPPING → INCOMPATIBLE and STOPPING → UNINITIALIZED added to can_transition(). recover_incompatible_server, force_restart_server, and reset_for_force_restart now route through transition_state(...) instead of writing _server_state directly. test_stopping_recovery_rollback_transitions covers the new transitions.

3. [P1] Dock thread management is three different strategies ✅ Fixed in #308 (partial)

PR 6 (#308) collapsed the seven refresh booleans into McpClientRefreshState (IDLE, DEFERRED_FOR_FILESYSTEM, RUNNING, RUNNING_TIMED_OUT, SHUTTING_DOWN) plus pending-request flags. Per-client action threads stay separate (different lifecycle from the bulk refresh sweep, with McpCliExec wall-clock budgets bounding them). The dock-side refresh thicket is now resolved; the broader unification of all dock worker lifecycles under one pool/watchdog is no longer load-bearing and rolls into general cleanup.

4. [P0]
McpScenePath.from_node silently produces wrong paths for foreign nodes ✅ Fixed in #298

utils/scene_path.gd:10-16 calls scene_root.get_path_to(node) without first checking scene_root.is_ancestor_of(node). If a handler passes a node from an instanced sub-scene or foreign tree, the function can return a plausible-looking path like /Main// instead of an empty string. Every consumer trusts this path.

User-visible impact: Agents can receive paths that resolve to the wrong node, or no node, and the failure looks like ordinary downstream tool behavior instead of an invalid-node bug.
Repro:
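As an illustration only, here is a minimal Python model of the failure mode and of the guard, not the project's GDScript. `Node`, `unguarded_scene_path`, and `guarded_scene_path` are hypothetical stand-ins for the scene tree and for McpScenePath.from_node; the wrong path produced here differs in shape from what Godot's get_path_to returns, but it is wrong in the same way: it looks like a valid path under the scene root.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False, repr=False)
class Node:
    """Tiny stand-in for a scene-tree node (hypothetical, not Godot's API)."""

    name: str
    parent: Node | None = None
    children: list[Node] = field(default_factory=list)

    def add_child(self, child: Node) -> Node:
        child.parent = self
        self.children.append(child)
        return child

    def is_ancestor_of(self, node: Node) -> bool:
        # Walk up from the node's parent; True only if we meet `self`.
        current = node.parent
        while current is not None:
            if current is self:
                return True
            current = current.parent
        return False


def unguarded_scene_path(scene_root: Node, node: Node) -> str:
    """Buggy shape: assumes `node` lives under `scene_root` without checking."""
    names = []
    current = node
    while current is not None and current is not scene_root:
        names.append(current.name)
        current = current.parent
    return "/" + "/".join([scene_root.name] + names[::-1])


def guarded_scene_path(scene_root: Node, node: Node) -> str:
    """Fix shape: root special case, then an ancestry guard, else empty string."""
    if node is scene_root:
        return "/" + scene_root.name
    if not scene_root.is_ancestor_of(node):
        return ""
    return unguarded_scene_path(scene_root, node)


main = Node("Main")
player = main.add_child(Node("Player"))
foreign_root = Node("Other")
stray = foreign_root.add_child(Node("Enemy"))

print(unguarded_scene_path(main, stray))  # plausible-looking but wrong
print(repr(guarded_scene_path(main, stray)))  # empty string for foreign nodes
print(guarded_scene_path(main, player))
```

A regression test in this style would assert the empty-string result for foreign nodes, so a consumer can treat "" as an explicit invalid-node signal.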
Fix is small (is_ancestor_of guard plus root special case); the important part is regression coverage.

5. [P2] RuntimeProtocol is vestigial ✅ Fixed in #313

PR 8 (#313) chose Path B. runtime/interface.py was deleted, shared handlers and _meta_tool.py::dispatch_manage_op now type against DirectRuntime, and tests/unit/test_tool_domains.py::test_runtime_protocol_is_not_reintroduced_without_injection_seam guards against reintroducing the pretend seam or stale runtime-boundary docs. The production MCP surface is unchanged: tools/resources still construct DirectRuntime.from_context(...), per-call session_id pinning remains at that boundary, and resources still use the active session.

Medium-impact issues
6. [P1] connection.gd lifecycle gaps ✅ Fixed in #300

- pause_processing was a bare bool, not a depth counter. Nested pauses could prematurely resume processing during save/play re-entrancy windows.
- send_text() even though screenshots can be multi-MB.

7. [P1] Deferred-response sentinel has no timeout ✅ Fixed in #300

dispatcher.gd now tracks pending deferred responses, emits structured DEFERRED_TIMEOUT errors when a handler never replies, and exposes deferred cleanup for connection lifecycle reset.

8. [P1] _meta_tool.py silently coerces JSON-shaped strings ✅ Fixed in #314

PR 10 (#314) made the nested coercion in dispatch_manage_op annotation-aware. _coerce_stringified_json_values now walks each handler's resolved type hints (cached per-handler via functools.cache) and only JSON-decodes a string-shaped value when the target param is annotated list/dict-like. String-typed params keep the literal value; malformed JSON or wrong-container shapes raise structured INVALID_PARAMS errors carrying {tool, op, param, expected, actual|json_error}. Handler TypeError for missing/extra args is still wrapped into INVALID_PARAMS at the dispatch boundary, so schema failures surface as clean validation errors instead of opaque internals.

9. [P0] Update-reload runner has no rollback for partial extracts ✅ Fixed in #299

update_reload_runner.gd:181-196: if file 3 of 5 fails to write, _install_zip_paths returns false but already-written files remain. The runner can re-enable the old plugin against a half-new addon directory.

User-visible impact: A failed self-update can leave mixed plugin files from vN and vN+1, producing sporadic runtime errors and misleading version reporting.
Repro:
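As an illustration only, the partial-extract scenario and the "collect paths_written and unwind" idea can be modeled in a few lines of Python. This is a sketch under assumptions, not the runner's GDScript: `install_with_rollback`, `flaky_write`, and the file names are hypothetical, and a simulated write failure on the third file stands in for whatever fails on a real machine.

```python
from __future__ import annotations

import tempfile
from pathlib import Path
from typing import Callable


def install_with_rollback(
    files: dict[str, bytes],
    dest: Path,
    write: Callable[[Path, bytes], None] = Path.write_bytes,
) -> bool:
    """Write every file, or restore the directory to its prior contents."""
    # (target, previous bytes or None if the file did not exist before)
    previous: list[tuple[Path, bytes | None]] = []
    try:
        for rel, data in files.items():
            target = dest / rel
            old = target.read_bytes() if target.exists() else None
            previous.append((target, old))
            write(target, data)
    except OSError:
        # Unwind in reverse order. An error raised here propagates on purpose,
        # so the caller can tell "rolled back" apart from "rollback failed".
        for target, old in reversed(previous):
            if old is None:
                target.unlink(missing_ok=True)
            else:
                target.write_bytes(old)
        return False
    return True


def flaky_write(target: Path, data: bytes) -> None:
    """Hypothetical writer that fails on the third file of the update."""
    if target.name == "c.gd":
        raise OSError("simulated write failure")
    target.write_bytes(data)


with tempfile.TemporaryDirectory() as tmp:
    addon = Path(tmp)
    (addon / "a.gd").write_bytes(b"old a")
    new_files = {"a.gd": b"new a", "b.gd": b"new b", "c.gd": b"new c"}
    ok = install_with_rollback(new_files, addon, write=flaky_write)
    remaining = {p.name: p.read_bytes() for p in sorted(addon.iterdir())}

print(ok, remaining)
```

Without the unwind step, the directory would be left holding the new a.gd and b.gd next to whatever else the old version shipped, which is the mixed vN / vN+1 state described above.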
Fix shape: pre-flight writeability before starting, or collect paths_written and unwind on failure. If rollback fails, do not re-enable the plugin as if the install were safe.

10. [P0] Atomic write fallback can drop the user's MCP config on Windows ✅ Fixed in #299

clients/_atomic_write.gd:29-34: when the first rename fails, the code removes the destination and retries. If the second rename fails, the original is gone and the temp is cleaned up.

User-visible impact: A Windows user can lose the entire MCP config file for Claude Desktop, Cursor, Cline, etc. while the dock is adding/removing only the Godot AI entry.
Repro:
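As an illustration only, the following Python sketch models a failed swap and a backup-retaining write that survives it. It is not the project's GDScript and not necessarily the approach taken in #299: `safe_write`, `failing_swap`, and the `.tmp` / `.backup` suffixes are assumptions made for the example. The point it shows is that the destination is never removed as a recovery step.

```python
from __future__ import annotations

import os
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def safe_write(
    dest: Path,
    data: bytes,
    swap: Callable[[Path, Path], None] = os.replace,
) -> bool:
    """Replace `dest` with `data`; keep a backup until the result is verified."""
    tmp = dest.with_name(dest.name + ".tmp")
    backup = dest.with_name(dest.name + ".backup")
    tmp.write_bytes(data)
    if dest.exists():
        shutil.copy2(dest, backup)  # copy, never move: dest stays in place
    try:
        swap(tmp, dest)
        if dest.read_bytes() != data:
            raise OSError("verification failed")
    except OSError:
        # Recovery never deletes dest; it only restores from the backup.
        if backup.exists():
            shutil.copy2(backup, dest)
        tmp.unlink(missing_ok=True)
        return False
    backup.unlink(missing_ok=True)  # only dropped after verification
    return True


def failing_swap(src: Path, dst: Path) -> None:
    """Hypothetical stand-in for a rename that fails, e.g. a locked file."""
    raise PermissionError("simulated: destination is locked")


with tempfile.TemporaryDirectory() as tmp_dir:
    config = Path(tmp_dir) / "mcp.json"
    original = b'{"other-server": 1}'
    updated = b'{"other-server": 1, "godot-ai": 2}'
    config.write_bytes(original)

    failed = safe_write(config, updated, swap=failing_swap)
    after_failure = config.read_bytes()

    succeeded = safe_write(config, updated)
    after_success = config.read_bytes()

print(failed, after_failure, succeeded, after_success)
```

In the remove-then-rename pattern, the same simulated failure on the retry would leave no mcp.json at all; here the other client entries survive the failed attempt.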
Fix shape: avoid remove-then-rename as the recovery path. Use copy-then-verify or a three-file rotation where .backup is retained until the new file is verified readable.

11. [P1] SessionRegistry is not concurrency-safe ✅ Fixed in #300

sessions/registry.py now serializes registry mutations and waiter coordination behind an asyncio.Lock, and wait_for_session() performs a same-lock re-check to avoid missing a just-registered session.

12. [P2] Plugin error codes pass through as opaque strings

godot_client/client.py:86 forwards error.code as-is. The GDScript side has numeric error_codes.gd, and Python has protocol/errors.py, but no contract test enforces parity. Add a translation/contract test so drift fails in CI.

13. [P2] animation_handler.gd mixes four domains ✅ Fixed in #367 on beta (tracked under #342)

The 1674-LOC handler is now split along the four-domain seam following the camera/material/particle *_handler.gd + *_presets.gd + *_values.gd pattern:

- animation_handler.gd (869 LOC): write ops + undo helpers + resolvers + thin proxies into the submodules.
- animation_presets.gd (480 LOC): preset_fade / preset_slide / preset_shake / preset_pulse + the target classifier and direction-offset helper.
- animation_values.gd (454 LOC): list_animations / get_animation / validate_animation + shared keyframe value coercion + transition parsing + serialization, plus a new player_root_node helper that DRYs up the root-node fallback that was open-coded in three places.

Submodules use a WeakRef back-pointer so the handler can own them strongly via _presets / _values without forming a RefCounted cycle. preset_* and read methods are thin proxies on the handler, so plugin.gd dispatcher entries and test_animation.gd's _handler.method(...) call sites required no edits.

The first attempt landed on main in #344 against this umbrella's branching policy and was reverted via #368; the split now lives only on beta and reaches main via the eventual promotion. Polish for validate_animation's broken_tracks[].node_path shape on subpath tracks landed in #371 (the original rfind(":") would surface Target:modulate instead of Target in the diagnostic for missing-target subpath tracks like Target:modulate:a; first-colon split fixes that).

14. [P2] Middleware order in server.py is implicit ✅ Fixed in #317

PR 11 (#317) added a consolidated rationale block above the four mcp.add_middleware(...) calls in server.py documenting FastMCP's reversed(self.middleware) chain composition and per-layer position reasoning, plus tests/unit/test_server_middleware_order.py locking the four-class order via runtime introspection of mcp.middleware (identity-set membership against the imported godot_ai.middleware classes; a rename hard-fails at import time). The closing docs sweep also reconciled middleware-list drift across three docs that were listing only three of four middlewares.

Handler-side patterns worth absorbing
These are not all worth standalone issues, but they are good cleanup candidates once the reliability stack is stable:
- get_property_list() validation loops should move into shared property validation.
- *_values.gd helpers.
- particle_handler.gd and likely material_handler.gd; convert mutations to undoable properties or document the constraint explicitly.
- res:// or do many uncached get_node_or_null calls.
- _param_validators.gd is too thin; enum validation, non-empty string checks, and common coercion should live there.

Blind spots worth flagging
- Existing websocket coverage is mostly sequential; add asyncio.gather across two sessions with read and write tools. ✅ Closed in Add PR4 characterization tests #304.
- ci-check-gdscript greps import logs for parse errors. Prefer explicit Godot script exit status so wording changes cannot hide parse failures. [P1]
- setup-dev.ps1 skips the uv presence check that the bash setup has. [P2]

What's healthy (do not refactor casually)

- pyproject.toml hygiene is good.

Pre-refactor test checklist
The reliability fixes (#4, #9, #10, plus the lifecycle hardening in #6/#7/#11) should ship independently. The structural refactors (#1, #2, #5, #13) need characterization tests first:
- asyncio.gather across two registered sessions exercising both read and write tools. Landed in Add PR4 characterization tests #304 at both the WebSocket layer (test_websocket.py) and the MCP-tool layer (test_mcp_tools.py).
- test_websocket.py::test_disconnect_reconnect_handshake_then_first_command; pause-depth interleaving (handler-held pauses must survive _clear_on_disconnect) in test_connection.gd::test_clear_on_disconnect_preserves_pause_depth + test_pause_resume_balances_across_repeated_reconnect_cycles. Landed in Add PR4 characterization tests #304.
- test_dispatcher.gd; server-side timeout drops the pending entry and ignores late replies in test_websocket.py. Landed in Add PR4 characterization tests #304.
- test_update_reload_runner.gd, landed in Add PR4 characterization tests #304. Full interactive smoke remains at script/local-self-update-smoke and is the gate for any PR that touches self-update / plugin reload handoff / install-extract logic.
- test_clients.gd::test_atomic_write_preserves_destination_when_swap_fails (real failed swap, destination contents survive) plus the structural pins in tests/unit/test_audit_data_loss_safeguards.py that prevent the dangerous remove-then-rename pattern from coming back.

https://claude.ai/code/session_01MuWm51niNYBFK5TDdyPUVr