Skip to content

Architecture audit: god classes, vestigial Runtime protocol, silent path bug, and other structural debt #297

Description

@dsarno

Summary

Fresh-eyes architecture audit of the Godot AI plugin stack: Python server, FastMCP tools/handlers, and the GDScript editor plugin. The issue is now both diagnosis and execution roadmap so the work can be split into PRs cleanly.

Audit branch: claude/audit-plugin-architecture-XYtQY (no code changes, clean tree).

Severity legend: P0 = correctness bug or data loss, P1 = latent reliability hazard or silent failure mode, P2 = structural / maintainability.

Progress log

Branching policy

  • Create a new beta branch from current main before the first implementation PR in this stack.
  • All audit follow-up PRs should target beta by default, not main.
  • main receives one final promotion PR from beta after the stack is validated.
  • Urgent hotfixes may target main only when explicitly called out as hotfixes.
  • PR descriptions should state Base: beta until the final promotion PR.

Execution matrix

PR Findings Priority Blast radius Effort Base Test gate Status
0 Create beta from main Setup Repo workflow S main Branch exists on origin ✅ done
1 Scene path ancestry guard #4 / P0 High correctness S beta GDScript scene path regression ✅ done (#298)
2 Update/config data-loss safeguards #9, #10 / P0 High data loss M beta Partial extract rollback + atomic write fallback tests ✅ done (#299)
3 Lifecycle reliability #6, #7, #11 plus dispatcher exception logging P1 Medium M beta Pause depth, deferred timeout, registry concurrency tests
4 Characterization tests Pre-refactor checklist P1 Medium M beta Multi-session, reconnect, deferred, self-update success paths
5 plugin.gd extraction #1 (partial) P2 High refactor L beta Existing plugin tests + self-update smoke
6 State model cleanup + ServerVersionCheck + resolve _host._field plumbing #2, feeds #3, finishes #1 P2/P1 High behavior risk L beta Server/dock refresh lifecycle tests + state-machine transition tests
7 UpdateManager extraction + PR 6 carry-overs (AWAITING_VERSION, recovery-rollback transitions) #1 update seam P2 Medium/high M beta Self-update success + rollback tests, state-machine transition tests
8 Runtime Protocol decision #5 P2 Medium API/test seam M beta Runtime/tool boundary unit tests + live Godot CI
10 Narrow meta-tool JSON coercion #8 / P1 Low (Python middleware) S beta Annotation-aware coercion + structured INVALID_PARAMS unit tests ✅ done (#314)
9 Self-update preload-alias extension (next ring) #242 / #244 hardening P1 Low (4 plugin scripts) S beta Extended test_plugin_self_update_safety.py + local-self-update-smoke
11 Middleware order doc + test (closing PR) #14 / P2 Low (1 server.py rationale block + 1 lock test) S beta Runtime-introspection middleware-order lock ✅ done (#317) — audit batch closed
Follow-up: animation_handler.gd split #13 / P2 Plugin-only file split M beta test_animation suite + smoke ✅ done (#367, tracked under #342). Originally merged to main in #344, reverted in #368, re-landed on beta correctly.
Follow-up: validate_animation subpath-track diagnostic shape Copilot review nit on #367 P2 One file + one regression test XS beta New regression test asserts broken_tracks[0].node_path for missing-target subpath tracks
Deferred outside audit #12, handler patterns, blind spots, broader preload-alias graph P1/P2 Variable Variable follow-up tickets Per-ticket

Recommended PR split

  1. Create beta from current main and use it as the base for this stack.
  2. Fix McpScenePath.from_node ancestry validation. [Add Phase 2 Batch 7: script, resource, filesystem tools + editor_quit #4]
  3. Fix update-reload partial-extract rollback and atomic-write Windows fallback. [Add node_rename, complex node_set_property values, and script_patch #9, scene_open can crash editor via re-entrant _set_main_scene_state when selection targets a freed Object #10]
  4. Harden lifecycle paths: pause_processing depth counter, deferred-response timeout, SessionRegistry lock, and dispatcher exception logging if it stays small. [Add multi-angle coverage and camera control for editor_screenshot #6, Simplify batch_handler #7, List parameters arrive as JSON strings when called from Claude Code MCP client #11, blind spot]
  5. Add characterization tests before major refactors: multi-session concurrent tool calls, reconnect/pause round-trip, deferred-response missing-reply path, and self-update success path in CI.
  6. Extract PortResolver and ServerLifecycleManager from plugin.gd. [Add MCP resources and pagination #1]
  7. Replace boolean thicket with explicit ServerState and ClientRefreshState models; fold dock worker lifecycle cleanup into this area if practical. [Support end-to-end reload workflow + codex configurator #2, Add Phase 2 scene + node write tools (Batches 5-6) #3]
  8. Extract UpdateManager so dock UI stops knowing plugin lifecycle internals. [Add MCP resources and pagination #1]
  9. Decide the Runtime Protocol's fate: inject a runtime seam at tool registration or delete the unused Protocol. [Add Phase 3 Batch 1: readiness gating + signal/autoload/input_map tools #5] ✅ Deleted the unused Protocol in PR 8: Delete unused Runtime Protocol #313.
  10. Land remaining focused cleanups as bandwidth allows.

Compatibility / migration notes

  • State-machine changes are the highest regression risk because they touch server spawn/adoption, version verification, reconnect behavior, and dock refresh timing. Keep these behind characterization tests and preserve current user-visible states unless a PR explicitly changes them.
  • Runtime Protocol cleanup in PR 8: Delete unused Runtime Protocol #313 deliberately chose deletion over injection because the seam had no production consumer. Future runtime alternatives should add a real registration-time injection seam instead of restoring the Protocol alone.
  • Update extraction must preserve installed-plugin reload behavior, including the rescue/update runner path. Treat self-update smoke coverage as mandatory before moving ownership into UpdateManager.
  • The beta branch is the integration target for this stack; do not retarget individual audit PRs to main unless they are explicitly promoted as hotfixes.

Top structural issues

1. [P2] plugin.gd (2250) and mcp_dock.gd (2450) are god classes with no clear seams ✅ Update slice fixed in #310; plugin.gd extractions complete

plugin.gd mixed EditorPlugin lifecycle, server spawn/adopt/recovery, port resolution, OS-specific PID scraping, version verification, handler registration, and game-helper autoload — 103 methods, roughly 5 responsibilities. Natural extractions:

mcp_dock.gd is still ~2300 LOC after the update-slice extraction — client-row refactor and other dock-internal carve-outs remain candidates for PR 11+ but are not load-bearing for the audit findings any more.

2. [P2] Boolean-flag thicket with no state enum ✅ Fixed in #308 (+ #310 cleanup)

Across plugin.gd and mcp_dock.gd there were roughly 29 overlapping booleans/deadlines governing server, connection, and refresh state. PR 6 (#308) replaced them with McpServerState (12 states + transition table + first-writer-wins for terminal diagnoses), McpClientRefreshState (5 states + per-state UI/spawn semantics), McpStartupPath (typed startup-trace tags), and McpAdoptionLabel (managed/external). McpSpawnState (terminal-only string union) was deleted.

PR 7 carry-overs were resolved in #310:

  • McpServerState.AWAITING_VERSION deleted. The state was reachable in the transition table but unused; integer slot 2 left empty so live editor_state.state wire values for READY/INCOMPATIBLE/etc. don't shift.
  • STOPPING → INCOMPATIBLE and STOPPING → UNINITIALIZED added to can_transition(). recover_incompatible_server, force_restart_server, and reset_for_force_restart now route through transition_state(...) instead of writing _server_state directly. test_stopping_recovery_rollback_transitions covers the new transitions.

3. [P1] Dock thread management is three different strategies ✅ Fixed in #308 (partial)

PR 6 (#308) collapsed the seven refresh booleans into McpClientRefreshState (IDLE, DEFERRED_FOR_FILESYSTEM, RUNNING, RUNNING_TIMED_OUT, SHUTTING_DOWN) plus pending-request flags. Per-client action threads stay separate (different lifecycle from the bulk refresh sweep, with McpCliExec wall-clock budgets bounding them). The dock-side refresh thicket is now resolved; the broader unification of all dock worker lifecycles under one pool/watchdog is no longer load-bearing and rolls into general cleanup.

4. [P0] McpScenePath.from_node silently produces wrong paths for foreign nodes ✅ Fixed in #298

utils/scene_path.gd:10-16 calls scene_root.get_path_to(node) without first checking scene_root.is_ancestor_of(node). If a handler passes a node from an instanced sub-scene or foreign tree, the function can return a plausible-looking path like /Main// instead of an empty string. Every consumer trusts this path.

User-visible impact: Agents can receive paths that resolve to the wrong node, or no node, and the failure looks like ordinary downstream tool behavior instead of an invalid-node bug.

Repro:

# In a test scene with /Main as scene_root
var foreign := Node.new()
EditorInterface.get_edited_scene_root().get_parent().add_child(foreign)

var path := McpScenePath.from_node(foreign, EditorInterface.get_edited_scene_root())

# Expected: ""  (or explicit error sentinel)
# Actual:   "/Main/"

Fix is small (is_ancestor_of guard plus root special case); the important part is regression coverage.

5. [P2] Runtime Protocol is vestigial ✅ Fixed in #313

PR 8 (#313) chose Path B. runtime/interface.py was deleted, shared handlers and _meta_tool.py::dispatch_manage_op now type against DirectRuntime, and tests/unit/test_tool_domains.py::test_runtime_protocol_is_not_reintroduced_without_injection_seam guards against reintroducing the pretend seam or stale runtime-boundary docs. The production MCP surface is unchanged: tools/resources still construct DirectRuntime.from_context(...), per-call session_id pinning remains at that boundary, and resources still use the active session.


Medium-impact issues

6. [P1] connection.gd lifecycle gaps ✅ Fixed in #300

  • pause_processing was a bare bool, not a depth counter. Nested pauses could prematurely resume processing during save/play re-entrancy windows.
  • Outbound sends did not check backpressure before send_text() even though screenshots can be multi-MB.
  • State-change tracking could mark scene/play/readiness events as sent even when outbound backpressure prevented delivery.
  • Socket lifecycle reset now clears pending deferred responses.

7. [P1] Deferred-response sentinel has no timeout ✅ Fixed in #300

dispatcher.gd now tracks pending deferred responses, emits structured DEFERRED_TIMEOUT errors when a handler never replies, and exposes deferred cleanup for connection lifecycle reset.

8. [P1] _meta_tool.py silently coerces JSON-shaped strings ✅ Fixed in #314

PR 10 (#314) made the nested coercion in dispatch_manage_op annotation-aware. _coerce_stringified_json_values now walks each handler's resolved type hints (cached per-handler via functools.cache) and only JSON-decodes a string-shaped value when the target param is annotated list/dict-like. String-typed params keep the literal value; malformed JSON or wrong-container shapes raise structured INVALID_PARAMS errors carrying {tool, op, param, expected, actual|json_error}. Handler TypeError for missing/extra args is still wrapped into INVALID_PARAMS at the dispatch boundary, so schema failures surface as clean validation errors instead of opaque internals.

9. [P0] Update-reload runner has no rollback for partial extracts ✅ Fixed in #299

update_reload_runner.gd:181-196: if file 3 of 5 fails to write, _install_zip_paths returns false but already-written files remain. The runner can re-enable the old plugin against a half-new addon directory.

User-visible impact: A failed self-update can leave mixed plugin files from vN and vN+1, producing sporadic runtime errors and misleading version reporting.

Repro:

1. Stage a v(N+1) ZIP with 5 files.
2. Trigger update via dock.
3. Mid-install, force a write failure on file 3 (chmod -w on target, or fill disk
   to fail a later write). _install_zip_paths returns false.
4. Inspect addons/godot_ai/: some files are vN+1 and some are vN.
5. Editor re-enables the plugin against this mixed directory.

Fix shape: pre-flight writeability before starting, or collect paths_written and unwind on failure. If rollback fails, do not re-enable the plugin as if the install were safe.

10. [P0] Atomic write fallback can drop the user's MCP config on Windows ✅ Fixed in #299

clients/_atomic_write.gd:29-34: when the first rename fails, the code removes the destination and retries. If the second rename fails, the original is gone and the temp is cleaned up.

User-visible impact: A Windows user can lose the entire MCP config file for Claude Desktop, Cursor, Cline, etc. while the dock is adding/removing only the Godot AI entry.

Repro:

1. On Windows, open the target MCP config in another process to create lock/AV timing pressure.
2. Click Configure for any client in the dock.
3. First rename fails.
4. _atomic_write calls DirAccess.remove_absolute(path).
5. Second rename fails because the lock or antivirus timing changes.
6. Original config is gone; backup is not restored.

Fix shape: avoid remove-then-rename as the recovery path. Use copy-then-verify or a three-file rotation where .backup is retained until the new file is verified readable.

11. [P1] SessionRegistry is not concurrency-safe ✅ Fixed in #300

sessions/registry.py now serializes registry mutations and waiter coordination behind an asyncio.Lock, and wait_for_session() performs a same-lock re-check to avoid missing a just-registered session.

12. [P2] Plugin error codes pass through as opaque strings

godot_client/client.py:86 forwards error.code as-is. The GDScript side has numeric error_codes.gd, and Python has protocol/errors.py, but no contract test enforces parity. Add a translation/contract test so drift fails in CI.

13. [P2] animation_handler.gd mixes four domains ✅ Fixed in #367 on beta (tracked under #342)

The 1674-LOC handler is now split along the four-domain seam following the camera/material/particle *_handler.gd + *_presets.gd + *_values.gd pattern: animation_handler.gd (869 LOC, write ops + undo helpers + resolvers + thin proxies into the submodules), animation_presets.gd (480 LOC, preset_fade / preset_slide / preset_shake / preset_pulse + the target classifier and direction-offset helper), and animation_values.gd (454 LOC, list_animations / get_animation / validate_animation + shared keyframe value coercion + transition parsing + serialization, plus a new player_root_node helper that DRYs up the root-node fallback that was open-coded in three places). Submodules use a WeakRef back-pointer so the handler can own them strongly via _presets/_values without forming a RefCounted cycle. preset_* and read methods are thin proxies on the handler so plugin.gd dispatcher entries and test_animation.gd's _handler.method(...) call sites required no edits.

The first attempt landed on main in #344 against this umbrella's branching policy and was reverted via #368; the split now lives only on beta and reaches main via the eventual promotion. Polish for validate_animation's broken_tracks[].node_path shape on subpath tracks landed in #371 (the original rfind(":") would surface Target:modulate instead of Target in the diagnostic for missing-target subpath tracks like Target:modulate:a; first-colon split fixes that).

14. [P2] Middleware order in server.py is implicit ✅ Fixed in #317

PR 11 (#317) added a consolidated rationale block above the four mcp.add_middleware(...) calls in server.py documenting FastMCP's reversed(self.middleware) chain composition and per-layer position reasoning, plus tests/unit/test_server_middleware_order.py locking the four-class order via runtime introspection of mcp.middleware (identity-set membership against the imported godot_ai.middleware classes — a rename hard-fails at import time). The closing docs sweep also reconciled middleware-list drift across three docs that were listing only three of four middlewares.


Handler-side patterns worth absorbing

These are not all worth standalone issues, but they are good cleanup candidates once the reliability stack is stable:

  • Repeated "resolve node or return error" boilerplate across 6+ handlers should become one helper.
  • Repeated get_property_list() validation loops should move into shared property validation.
  • Per-handler value coercion should expose one consistent interface instead of scattered *_values.gd helpers.
  • Inline sub-resource mutation outside undo actions is a P1 risk in particle_handler.gd and likely material_handler.gd; convert mutations to undoable properties or document the constraint explicitly.
  • Read paths need frame-budget awareness where they recursively scan res:// or do many uncached get_node_or_null calls.
  • _param_validators.gd is too thin; enum validation, non-empty string checks, and common coercion should live there.

Blind spots worth flagging

  • No multi-session concurrent-access test. Existing websocket coverage is mostly sequential; add asyncio.gather across two sessions with read and write tools. ✅ Closed in Add PR4 characterization tests #304.
  • Debugger timer leak on screenshot success. Pending request timers are erased from the dict but still run to natural deadline. [P2]
  • CLI finder cache never invalidates. Installing a CLI mid-session can leave the dock reporting it missing until restart. [P2]
  • Legacy/simple log buffers coexist with structured log ring utilities. Clarify ownership and which logger new plugin code should use. [P2]
  • ci-check-gdscript greps import logs for parse errors. Prefer explicit Godot script exit status so wording changes cannot hide parse failures. [P1]
  • setup-dev.ps1 skips the uv presence check that the bash setup has. [P2]
  • Dispatcher exceptions do not reach the agent-readable log buffer. Malformed-result handling improved in Harden lifecycle reliability (#297, PR 3) #300, but real GDScript backtraces still stay in the console. Improve structured diagnostics. [P2]

What's healthy (do not refactor casually)

  • Tool/handler split is earning its keep: tools are thin adapters and handlers own editor/plugin logic.
  • Client configurator descriptors are data-only across all supported clients; keep that constraint.
  • Test runner guardrails (zero-assertion detection, suite isolation, resilient discovery) match the repo guidance.
  • Tool-catalog drift detection is useful and should stay.
  • Runtime/game-side boundary integrity is healthy; shipped game helper code avoids plugin imports.
  • pyproject.toml hygiene is good.

Pre-refactor test checklist

The reliability fixes (#4, #9, #10, plus the lifecycle hardening in #6/#7/#11) should ship independently. The structural refactors (#1, #2, #5, #13) need characterization tests first:

  • Multi-session concurrent tool callsasyncio.gather across two registered sessions exercising both read and write tools. Landed in Add PR4 characterization tests #304 at both the WebSocket layer (test_websocket.py) and the MCP-tool layer (test_mcp_tools.py).
  • Connection lifecycle round-trip — disconnect → reconnect → handshake → first command, with pause/resume interleaved. Wire-level sequence in test_websocket.py::test_disconnect_reconnect_handshake_then_first_command; pause-depth interleaving (handler-held pauses must survive _clear_on_disconnect) in test_connection.gd::test_clear_on_disconnect_preserves_pause_depth + test_pause_resume_balances_across_repeated_reconnect_cycles. Landed in Add PR4 characterization tests #304.
  • Deferred-response paths — happy path and missing-reply timeout path. Plugin-side request-id threading + auto-send suppression in test_dispatcher.gd; server-side timeout drops the pending entry and ignores late replies in test_websocket.py. Landed in Add PR4 characterization tests #304.
  • Self-update success path in CI — Lower-level non-interactive success path (manifest acceptance + new-file install) in test_update_reload_runner.gd, landed in Add PR4 characterization tests #304. Full interactive smoke remains at script/local-self-update-smoke and is the gate for any PR that touches self-update / plugin reload handoff / install-extract logic.
  • Config write failure path — failed replacement must preserve the original user MCP config. Covered in Add update/config data-loss safeguards (#297, PR 2) #299 by test_clients.gd::test_atomic_write_preserves_destination_when_swap_fails (real failed swap, destination contents survive) plus the structural pins in tests/unit/test_audit_data_loss_safeguards.py that prevent the dangerous remove-then-rename pattern from coming back.

https://claude.ai/code/session_01MuWm51niNYBFK5TDdyPUVr

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions