159 changes: 159 additions & 0 deletions docs/ai/design/2026-06-02-feature-cli-startup-performance.md
@@ -0,0 +1,159 @@
---
phase: design
title: System Design & Architecture
description: Define the technical architecture, components, and data models
feature: cli-startup-performance
---

# System Design & Architecture

## Architecture Overview

```mermaid
graph TD
    User[User or CI] --> Bin[dist/cli.js bin entry]
    Bin --> Bootstrap[Lightweight CLI Bootstrap]
    Bootstrap --> Manifest[Command Metadata Manifest]
    Manifest --> Commander[Commander Parser]
    Commander --> Help[Help and Version Output]
    Commander --> Dispatch[Lazy Action Dispatcher]
    Dispatch --> Init[init/phase/setup/lint/install handlers]
    Dispatch --> Memory[memory handlers]
    Dispatch --> Skill[skill handlers]
    Dispatch --> Agent[agent handlers and TUI]
    Dispatch --> Channel[channel handlers and bridge]
    Dispatch --> Docs[docs handlers]
    Bench[Benchmark Script] --> Bin
    CI[CI Gate] --> Bench
```

The optimized CLI should separate cheap command metadata from expensive command execution code.

The approved architecture is a two-step optimization:

1. Implement a lightweight static command metadata layer plus lazy action dispatcher first. This keeps TypeScript source maintainable and removes eager command-handler imports from startup/help paths.
2. Run the benchmark after the lazy metadata/dispatcher refactor. If p50 remains above `50 ms`, add generated or bundled `dist` optimization using only existing repo tooling and without changing package manifests.

Key components:

- **Lightweight CLI bootstrap**: The published entrypoint that loads only Commander, version metadata, command metadata, and dispatch glue.
- **Command metadata manifest**: Static data for command names, descriptions, arguments, and options. This enables help/version output without importing handlers.
- **Lazy action dispatcher**: Imports the actual command module only when the selected action executes.
- **Command handler modules**: Existing command implementations, refactored only as needed to avoid top-level imports that are not needed by the selected subcommand.
- **Benchmark script**: Local and CI entrypoint for startup/help timing and representative command smoke checks.

## Data Models

### Command Metadata

```typescript
interface CliCommandDefinition {
  name: string;
  description: string;
  arguments?: CliArgumentDefinition[];
  options?: CliOptionDefinition[];
  subcommands?: CliCommandDefinition[];
  action?: LazyActionDefinition;
}

interface LazyActionDefinition {
  module: string;
  exportName: string;
}
```

The exact shape can be simpler if hand-written registration helpers are clearer. The design requirement is that help-visible command metadata is available without importing heavy handler modules.
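
As a rough illustration of that requirement, the sketch below renders help text from static data alone. The `CliOptionDefinition` fields and the `renderHelp` and `formatRows` helpers are hypothetical (the design leaves the option shape unspecified), and the layout only approximates Commander-style help rather than reproducing it byte for byte.

```typescript
// Hypothetical minimal shapes; the design does not pin these down.
interface CliOptionDefinition {
  flags: string; // e.g. "-f, --force"
  description: string;
}

interface CliCommandDefinition {
  name: string;
  description: string;
  options?: CliOptionDefinition[];
  subcommands?: CliCommandDefinition[];
}

// Pad the left column so that descriptions line up.
function formatRows(rows: Array<[string, string]>): string[] {
  const width = Math.max(0, ...rows.map(([left]) => left.length));
  return rows.map(([left, right]) => `  ${left.padEnd(width)}  ${right}`);
}

// Build help output from metadata only; no handler module is imported here.
function renderHelp(programName: string, command: CliCommandDefinition): string {
  const lines = [
    `Usage: ${programName} ${command.name} [options]`,
    "",
    command.description,
  ];
  if (command.options?.length) {
    const rows = command.options.map((o): [string, string] => [o.flags, o.description]);
    lines.push("", "Options:", ...formatRows(rows));
  }
  if (command.subcommands?.length) {
    const rows = command.subcommands.map((c): [string, string] => [c.name, c.description]);
    lines.push("", "Commands:", ...formatRows(rows));
  }
  return lines.join("\n");
}
```

Because the renderer depends only on plain objects, a parity test can compare its output with the Commander-generated help for each command and flag drift early.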

### Benchmark Result

```typescript
interface BenchmarkCaseResult {
  label: string;
  command: string[];
  iterations: number;
  minMs: number;
  p50Ms: number;
  p95Ms: number;
  maxMs: number;
  avgMs: number;
  failures: number;
}
```
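
One way the timing fields could be derived from raw samples is sketched below. The nearest-rank percentile definition and the `summarize` and `percentile` names are assumptions for illustration; the actual benchmark utility may compute its statistics differently.

```typescript
// Timing subset of BenchmarkCaseResult.
interface TimingStats {
  minMs: number;
  p50Ms: number;
  p95Ms: number;
  maxMs: number;
  avgMs: number;
}

// Nearest-rank percentile over an ascending-sorted, non-empty sample.
function percentile(sorted: number[], p: number): number {
  const rank = Math.ceil((p / 100) * sorted.length);
  const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  return sorted[index];
}

// Summarize the durations of successful runs only.
function summarize(durationsMs: number[]): TimingStats {
  if (durationsMs.length === 0) {
    throw new Error("cannot summarize an empty sample");
  }
  const sorted = [...durationsMs].sort((a, b) => a - b);
  const total = sorted.reduce((sum, value) => sum + value, 0);
  return {
    minMs: sorted[0],
    p50Ms: percentile(sorted, 50),
    p95Ms: percentile(sorted, 95),
    maxMs: sorted[sorted.length - 1],
    avgMs: total / sorted.length,
  };
}
```

With nearest-rank, p95 over 20 iterations is the 19th-smallest sample, so a single outlier affects `maxMs` but not the p50 gate.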

## API Design

No public CLI API changes are allowed. Internal APIs may be introduced:

- `registerCommandMetadata(program, definitions)` to build Commander commands from metadata.
- `lazyAction(modulePath, exportName)` to wrap `.action(...)` with dynamic import and error handling.
- `runCliBenchmark(cases, options)` to execute benchmark cases with repeated child processes.

Existing command modules should continue exposing testable handler functions where practical.
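
A minimal sketch of what `lazyAction` might look like follows. The injectable `load` parameter and the `ModuleLoader` type are assumptions added here so the wrapper can be unit-tested without touching the real command graph; the error message wording is also illustrative.

```typescript
// Hypothetical loader type; the default defers to a dynamic import.
type ModuleLoader = (modulePath: string) => Promise<Record<string, unknown>>;

const defaultLoader: ModuleLoader = (modulePath) => import(modulePath);

// Returns an action callback that imports the handler module only when invoked.
function lazyAction(
  modulePath: string,
  exportName: string,
  load: ModuleLoader = defaultLoader,
): (...args: unknown[]) => Promise<void> {
  return async (...args: unknown[]): Promise<void> => {
    const loaded = await load(modulePath);
    const handler = loaded[exportName];
    if (typeof handler !== "function") {
      // Surface a clear failure instead of a confusing "undefined is not a function".
      throw new Error(
        `Command module "${modulePath}" has no function export "${exportName}"`,
      );
    }
    await handler(...args);
  };
}
```

The returned callback can be passed to `.action(...)`. Errors thrown by the import or by the handler propagate to the caller, so any existing error-handling wrapper around command actions can stay in place.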

## Component Breakdown

### CLI Bootstrap

- Owns `program.name`, package version loading, root command metadata, and `program.parse`.
- Must not import heavy command modules at top level.
- Must not import `ink`, `react`, `inquirer`, `telegraf`, `@ai-devkit/agent-manager`, `@ai-devkit/memory`, or channel bridge code unless the chosen command requires them.

### Command Registration

- Keeps help text equivalent to current help output.
- Registers command actions through lazy dispatch wrappers.
- May be hand-written first to reduce risk; generated metadata is allowed if it improves maintainability.

### Command Handlers

- Existing command behavior remains the source of truth.
- Heavy subcommand-specific dependencies should move into the action path when feasible. Example: `agent console` should be the path that loads Ink/React, not `agent --help`.
- Shared utility imports are acceptable only when they are lightweight enough for the target.

### Build Output

- The build may produce generated or bundled `dist` artifacts.
- Source maps or clear generated-file provenance must exist if output becomes hard to inspect.
- `packages/cli/package.json` `bin` behavior must remain install-compatible.

### Benchmarking

- Benchmark direct built CLI execution after `npm run build`.
- Use at least 20 iterations per startup/help command.
- Record p50 and p95; p50 is the metric enforced against the `<50 ms` target.
- Use temporary directories/config for memory benchmark cases.
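
The sampling loop implied by these rules can be sketched as follows. `sampleCase`, `nearestRankP50`, and `passesGate` are hypothetical names, and the real `cli-benchmark.ts` may be structured differently; the point is only that each iteration spawns a fresh Node process and that the gate is applied to p50.

```typescript
import { spawnSync } from "node:child_process";

interface CaseSample {
  durationsMs: number[]; // wall-clock time of each successful run
  failures: number; // runs that exited non-zero or failed to spawn
}

// Run `node <args>` in a new process per iteration to measure cold startup.
function sampleCase(args: string[], iterations: number): CaseSample {
  const durationsMs: number[] = [];
  let failures = 0;
  for (let i = 0; i < iterations; i++) {
    const start = performance.now();
    const result = spawnSync(process.execPath, args, { stdio: "ignore" });
    const elapsedMs = performance.now() - start;
    if (result.status === 0) {
      durationsMs.push(elapsedMs);
    } else {
      failures++;
    }
  }
  return { durationsMs, failures };
}

// Nearest-rank median of a non-empty sample.
function nearestRankP50(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  return sorted[Math.ceil(sorted.length / 2) - 1];
}

// A case passes only if every run succeeded and p50 is under the threshold.
function passesGate(sample: CaseSample, thresholdMs: number): boolean {
  return (
    sample.failures === 0 &&
    sample.durationsMs.length > 0 &&
    nearestRankP50(sample.durationsMs) < thresholdMs
  );
}
```

A call such as `sampleCase(["dist/cli.js", "--help"], 20)` followed by `passesGate(sample, 50)` would correspond to one startup/help case. Note that the measured time includes process spawn overhead on the host, which is why repeated sampling matters on shared CI runners.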

## Design Decisions

### 1. Optimize Current Node CLI First

Rust is intentionally out of scope. Measurements show most overhead comes from eager imports and CLI bootstrap shape, not local CPU-heavy work. The fastest low-risk path is to remove unnecessary Node module loading.

### 2. Preserve CLI Semantics

Performance work must be behavior-preserving. Any command output, option parsing, or exit-code change is a regression unless explicitly approved later.

### 3. Allow Bootstrap/Build Restructuring

The `<50 ms` target is aggressive. Dynamic imports alone may not be enough with native ESM file fanout, so the design allows a lightweight bootstrap, generated metadata, or bundled artifacts without adding dependencies.

Chosen path: do not start with bundling. Start with static metadata plus lazy dispatch because it is easier to review and preserves source/debug clarity. Treat bundling or generated `dist` output as a measured second step only if the first step does not meet the target.

### 4. No New Dependencies

The implementation must use existing repo tooling or plain Node scripts. If bundling is required, use tooling already available through the current lockfile without changing manifests, or implement a non-bundled fallback.

## Alternatives Considered

- **Action-only dynamic imports**: Simple and likely helpful, but may not hit `<50 ms` if command metadata still imports large modules.
- **Static command metadata plus lazy handlers**: Chosen first step. Better startup characteristics while keeping source maintainable; requires keeping metadata and handler behavior aligned.
- **Bundled bootstrap or CLI**: Conditional second step. Can reduce ESM file-load overhead; requires careful handling of dynamic imports, source maps, shebang, templates, and daemon entrypoints.
- **Rust rewrite**: Best native startup potential, but too much scope for this feature and does not directly address command compatibility risk.

## Non-Functional Requirements

- Startup/help benchmark p50 `<50 ms` for required commands.
- RSS for lightweight commands should drop materially from the current `~100 MB+` seen when the full import graph is loaded; the exact memory threshold is secondary to the startup target.
- CI benchmark must avoid single-run flakiness through repeated sampling.
- The implementation must remain portable on supported Node versions and existing npm workspace tooling.
- No additional secrets, credentials, or network services are needed for tests.
123 changes: 123 additions & 0 deletions docs/ai/implementation/2026-06-02-feature-cli-startup-performance.md
@@ -0,0 +1,123 @@
---
phase: implementation
title: Implementation Guide
description: Technical implementation notes, patterns, and code guidelines
feature: cli-startup-performance
---

# Implementation Guide

## Development Setup

- Work in branch/worktree `feature-cli-startup-performance`.
- Run `npm ci` in the feature worktree before phase work.
- Run `npm run build` before benchmark commands so measurements use `packages/cli/dist/cli.js`.

## Code Structure

Expected touch points:

- `packages/cli/src/cli.ts`: root bootstrap and command registration.
- `packages/cli/src/commands/*.ts`: command metadata/action split and lazy handler imports.
- `packages/cli/src/__tests__/commands/*.test.ts`: command behavior regression tests.
- `e2e/`: built CLI smoke coverage if bootstrap/build changes affect published behavior.
- CI workflow files if benchmark gate is added there.

Current implementation deltas:

- `packages/cli/src/util/cli-benchmark.ts`: local startup benchmark utility and executable built script entrypoint.
- `packages/cli/src/__tests__/util/cli-benchmark.test.ts`: TDD coverage for timing stats, failure accounting, required benchmark case list, and built-script root resolution.
- `packages/cli/src/cli-command-manifest.ts`: shared lightweight top-level command manifest used by static help and lazy dispatch.
- `packages/cli/src/__tests__/util/cli-command-manifest.test.ts`: coverage proving the manifest drives root help, command help, and dispatch paths.
- `packages/cli/src/cli-runtime.ts`: lightweight static help rendering and lazy top-level command execution.
- `packages/cli/src/__tests__/util/cli-runtime.test.ts`: TDD coverage for lightweight help/version, dispatch mapping, and lazy command registration.
- `packages/cli/src/cli.ts`: thin bootstrap that handles static help/version and delegates real commands to the lazy dispatcher.
- `packages/cli/package.json`: `benchmark:startup` script running `node dist/util/cli-benchmark.js`.
- `.github/workflows/ci.yml`: CI benchmark step after build.

## Phase 6 Implementation Check

Alignment with the design:

- The root entrypoint now imports only package metadata plus lightweight bootstrap/dispatch helpers before command selection.
- Root `--version`, root `--help`, and top-level command `--help` paths are served from static metadata and do not load the previous heavy command graph.
- Real command execution imports only the selected top-level command module before Commander parsing.
- Unknown command routing uses a lightweight Commander program populated from the shared manifest, preserving the existing unknown-command error without eager command-module imports.
- The startup benchmark runs locally and in CI after build, with the `<50 ms` p50 gate enforced for version/help paths.

Deviations and follow-ups:

- Static help metadata duplicates command names/descriptions and selected option metadata. This is the main drift risk versus Commander-generated help and should be reviewed when commands change.
- Lazy loading is currently at the top-level command group boundary. Heavy subcommand-specific dependencies inside groups such as `agent` and `channel` can be split further later, but this was not required to meet the startup/help target.
- Representative real commands are smoke-measured in the benchmark table, but CI does not enforce a `10%` real-command regression threshold because there is no stored baseline in this implementation.

## Phase 8 Code Review Notes

- Reviewed the lightweight help metadata against real command registration. Fixed two public help parity gaps found during review: option-bearing command help now includes command-specific flags for `init`, `setup`, `lint`, and `install`; `channel --help` now includes `stop [name]`.
- Refactored the runtime to expose `registerSelectedCommand` for direct branch coverage, while keeping `runSelectedCommand` as the CLI entrypoint dispatch API.
- Verified exported helper usage with `rg`: new APIs are referenced only by `cli.ts`, tests, benchmark script entrypoint, and feature docs.
- No new runtime dependencies, config keys, migrations, or irreversible state changes were introduced.

## Simplification Pass

- Consolidated top-level command metadata into `cli-command-manifest.ts`, so adding or changing a top-level command has one lightweight metadata entry used by both help rendering and dispatch resolution.
- Removed the source `cli-full.ts` eager fallback. Unknown command handling now builds a lightweight Commander shell from the manifest, avoiding a second full command graph.
- Consolidated `cli-bootstrap.ts` and `cli-dispatch.ts` into `cli-runtime.ts`, keeping one runtime module for help rendering and lazy command execution.
- Updated the CLI entrypoint to fast-path `--version` before importing runtime code.
- Added manifest tests to guard against future drift between root help, command help, and dispatch.
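
The routing that results from these changes could look roughly like the sketch below. `ManifestEntry`, `Route`, and `routeInvocation` are hypothetical names, and treating a bare invocation as root help is an assumption; the real `cli.ts` and `cli-runtime.ts` may split this logic differently.

```typescript
// Hypothetical manifest entry: just enough data for help and dispatch.
interface ManifestEntry {
  name: string;
  description: string;
  module: string; // path handed to the lazy importer
}

type Route =
  | { kind: "version" }
  | { kind: "root-help" }
  | { kind: "command-help"; entry: ManifestEntry }
  | { kind: "command"; entry: ManifestEntry }
  | { kind: "unknown"; name: string };

// Decide what to do from argv alone, before any command module is imported.
function routeInvocation(args: string[], manifest: ManifestEntry[]): Route {
  const [first, second] = args;
  if (first === "--version" || first === "-V") {
    return { kind: "version" };
  }
  // Assumption: no arguments behaves like root help.
  if (first === undefined || first === "--help" || first === "-h") {
    return { kind: "root-help" };
  }
  const entry = manifest.find((candidate) => candidate.name === first);
  if (!entry) {
    return { kind: "unknown", name: first };
  }
  // Only `<command> --help` is static; deeper help goes through the real command.
  if (second === "--help" || second === "-h") {
    return { kind: "command-help", entry };
  }
  return { kind: "command", entry };
}
```

Only the `command` route needs a dynamic import of `entry.module`; the `version`, `root-help`, and `command-help` routes can be answered from static data, and `unknown` can be handed to a lightweight Commander shell built from the manifest.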

## Implementation Notes

### Core Features

- Keep root bootstrap lightweight. Avoid top-level imports of heavy command modules.
- Keep help-visible command metadata available without importing handler dependencies.
- Import command handlers dynamically only when a command action actually runs.
- Implement static command metadata plus lazy dispatch as the first optimization step.
- Introduce generated or bundled `dist` output only after benchmarking proves the first step misses the `<50 ms` target.
- If generated or bundled output is introduced, keep the source architecture explicit and testable.

### Patterns & Best Practices

- Prefer small registration helpers over broad abstractions unless generated metadata becomes necessary.
- Preserve existing `withErrorHandler` behavior around async command actions.
- Keep command handler functions exported for direct unit testing.
- Do not introduce new dependencies.

## Integration Points

- `packages/cli/package.json` `bin.ai-devkit` must remain compatible.
- `packages/cli` build must continue copying `templates` into `dist/templates`.
- `channel-daemon` launch logic must still resolve dev and built paths correctly.
- Existing package imports from `@ai-devkit/agent-manager`, `@ai-devkit/memory`, and `@ai-devkit/channel-connector` should move behind lazy boundaries where possible.

## Error Handling

- Lazy import failures should surface as command failures with the same error handling conventions as existing commands.
- Benchmark failures should print the failing command, p50, threshold, iterations, and failed process count.
- Generated build failures should fail `npm run build` clearly.
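
For the benchmark failure report, a formatter along these lines would cover the listed fields. The `GateFailure` shape, the `formatGateFailure` name, and the message layout are illustrative assumptions, not the script's actual output.

```typescript
// Hypothetical record describing one case that missed the gate.
interface GateFailure {
  label: string;
  command: string[];
  p50Ms: number;
  thresholdMs: number;
  iterations: number;
  failures: number; // count of child processes that exited non-zero
}

// Produce a multi-line message naming everything needed to reproduce the miss.
function formatGateFailure(failure: GateFailure): string {
  return [
    `FAIL ${failure.label}: ${failure.command.join(" ")}`,
    `  p50 ${failure.p50Ms.toFixed(3)} ms (threshold ${failure.thresholdMs} ms)`,
    `  iterations ${failure.iterations}, failed processes ${failure.failures}`,
  ].join("\n");
}
```

Keeping the formatter separate from the process runner lets the message be unit-tested without spawning anything.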

## Performance Considerations

- Optimize for fresh process startup, not long-lived process warm paths.
- Avoid unnecessary JSON/config/file reads before command selection.
- Avoid loading TUI/React/Ink unless `agent console` runs.
- Avoid loading memory database code unless a memory action runs.
- Avoid loading Telegram/channel bridge code unless channel actions requiring them run.

Benchmark foundation evidence:

- `npm test -w packages/cli -- src/__tests__/util/cli-benchmark.test.ts` passed with 4 tests.
- `npm test -w packages/cli -- src/__tests__/util/cli-runtime.test.ts src/__tests__/util/cli-command-manifest.test.ts src/__tests__/util/cli-benchmark.test.ts` passed with 18 tests after the final simplification pass.
- `npm run build` passed for all 4 projects.
- `AI_DEVKIT_CLI_BENCHMARK_ITERATIONS=1 npm run benchmark:startup -w packages/cli` executed all 15 configured cases with `0` failures. This single-iteration smoke run recorded unoptimized startup timings of roughly `325-680 ms`, which confirms the benchmark captures the baseline that the optimization is measured against.
- After lightweight bootstrap, `npm run benchmark:startup -w packages/cli` with 20 iterations produced `0` failures. Startup/help p50 values were `24.070-25.226 ms`; `--version` p50 was `25.080 ms`.
- After top-level lazy dispatch and CI gate wiring, `npm run benchmark:startup -w packages/cli` with 20 iterations exited `0`. Startup/help p50 values were `29.391-33.132 ms`; real command smoke p50 values were `75.028 ms` for `lint`, `239.437 ms` for `agent-list-json`, and `153.793 ms` for `memory-search`.
- After the simplification pass, `npm run benchmark:startup -w packages/cli` with 20 iterations exited `0`. Startup/help p50 values were `24.085-25.149 ms`; real command smoke p50 values were `70.889 ms` for `lint`, `227.256 ms` for `agent-list-json`, and `149.253 ms` for `memory-search`.
- After moving runtime modules next to `cli.ts`, `npm run benchmark:startup -w packages/cli` with 20 iterations exited `0`. Startup/help p50 values were `24.290-26.318 ms`; real command smoke p50 values were `71.420 ms` for `lint`, `255.848 ms` for `agent-list-json`, and `149.797 ms` for `memory-search`.

## Security Notes

- Benchmark scripts must not read or print secrets.
- Channel and memory smoke tests should use temporary or project-isolated config paths.
- No Telegram tokens, tmux sessions, or external network calls should be required in CI.