Skip to content

axsaucedo/agent_data_duckdb

Repository files navigation

agent_data — DuckDB Extension for AI Agent Session Data

A DuckDB extension written in Rust for querying, analysing and inspecting AI coding agents history. Read conversations, plans, todos, history, and usage stats directly from your local agent data directories.

Supported agents: Claude Code (~/.claude), Claude Desktop ("Cowork", ~/Library/Application Support/Claude), GitHub Copilot CLI (~/.copilot) and Gemini CLI (~/.gemini).

OpenAI Codex Coming Soon™.

Written in 🦀 Rust.

Quickstart

Load Extension

INSTALL agent_data FROM community;
LOAD agent_data;

Run Query

SELECT c.*, h.display
FROM read_conversations(path='~/.claude') c
JOIN read_history(path='~/.claude') h
ON c.session_id = h.session_id;

Overview

All functions read from the default agent directory (~/.claude for Claude Code, ~/.copilot for Copilot) when no path is provided.

The provider is auto-detected from the directory structure.

-- How many conversations have I had with Claude?
SELECT COUNT(DISTINCT session_id) AS sessions,
       COUNT(*) AS total_messages
FROM read_conversations();

-- What did I work on this week?
SELECT date, message_count, tool_call_count
FROM read_stats()
ORDER BY date DESC
LIMIT 7;

-- Which tools does github copilot use most?
SELECT tool_name, COUNT(*) AS uses
FROM read_conversations('~/.copilot')
WHERE tool_name IS NOT NULL
GROUP BY tool_name
ORDER BY uses DESC
LIMIT 10;

-- What are my active todos in my custom claude path?
SELECT content, status
FROM read_todos('~/work_folder/.claude')
WHERE status != 'completed'
ORDER BY item_index;

-- Which tools does Gemini CLI use most?
SELECT tool_name, COUNT(*) AS uses
FROM read_conversations('~/.gemini')
WHERE message_type = 'tool_call'
GROUP BY tool_name
ORDER BY uses DESC
LIMIT 10;

-- Compare activity across Claude, Copilot, and Gemini
SELECT source, COUNT(DISTINCT session_id) AS sessions, COUNT(*) AS messages
FROM (
    SELECT * FROM read_conversations(path='~/.claude')
    UNION ALL
    SELECT * FROM read_conversations(path='~/.copilot')
    UNION ALL
    SELECT * FROM read_conversations(path='~/.gemini')
)
GROUP BY source;

Default Behavior

When called without arguments, each function reads from its provider's default path:

Function Default path Detected as
read_conversations() ~/.claude Claude Code
read_plans() ~/.claude Claude Code
read_todos() ~/.claude Claude Code
read_history() ~/.claude Claude Code
read_stats() ~/.claude Claude Code

To read Claude Desktop, Copilot, or Gemini data, pass the path explicitly:

FROM read_conversations(path='~/Library/Application Support/Claude');  -- detected as Claude Desktop
FROM read_conversations(path='~/.copilot');  -- detected as Copilot
FROM read_conversations(path='~/.gemini');  -- detected as Gemini CLI

Available Functions

All functions accept two optional parameters:

  • path — data directory path (default: ~/.claude). Auto-detected from folder structure (local-agent-mode-sessions/ → Claude Desktop, projects/ → Claude, session-state/ → Copilot, tmp/ + installation_id → Gemini CLI).
  • source — explicit provider override: 'claude', 'claude-desktop', 'copilot', or 'gemini'. Use when auto-detection fails or for non-standard directory layouts.

Every table includes a source column ('claude', 'claude-desktop', 'copilot', or 'gemini') as the first column.

read_conversations([path (opt)], [source (opt)])

Reads conversation/event data.

  • Claude: JSONL files from projects/<project>/<session>.jsonl (including nested sub-agent transcripts at projects/<project>/<session>/subagents/agent-*.jsonl)
  • Claude Desktop: JSONL files from local-agent-mode-sessions/**/.claude/projects/<project>/<session>.jsonl (same schema as Claude Code)
  • Copilot: JSONL events from session-state/<uuid>/events.jsonl
  • Gemini: JSON chat checkpoints from tmp/<project-hash>/chats/session-<ts>-<id>.json (one file = one session; each tool call is also emitted as a tool_call row)
Column Type Description
source VARCHAR 'claude', 'claude-desktop', 'copilot', or 'gemini'
session_id VARCHAR Session UUID
project_path VARCHAR Project/working directory path
project_dir VARCHAR Raw encoded directory name (Claude only)
file_name VARCHAR Source filename
is_agent BOOLEAN Sub-agent conversation (Claude only)
line_number BIGINT Line number within file (1-based)
message_type VARCHAR See message type mappings below
uuid VARCHAR Message/event UUID
parent_uuid VARCHAR Parent message/event UUID
timestamp VARCHAR ISO 8601 timestamp
message_role VARCHAR user, assistant, tool, or NULL
message_content VARCHAR Text content
model VARCHAR AI model used
tool_name VARCHAR Tool called
tool_use_id VARCHAR Tool use/call identifier
tool_input VARCHAR Tool input as JSON string
input_tokens BIGINT Input token count (Claude/Gemini per-message, Copilot truncation)
output_tokens BIGINT Output token count
cache_creation_tokens BIGINT Cache creation tokens (Claude only)
cache_read_tokens BIGINT Cache read tokens (Claude; Gemini cached)
slug VARCHAR Session slug (Claude only)
git_branch VARCHAR Git branch
cwd VARCHAR Working directory
version VARCHAR Agent CLI version
stop_reason VARCHAR Stop reason (Claude only)
repository VARCHAR GitHub repository (Copilot only)

Message type mappings:

Claude Copilot Gemini Description
user user user User message
assistant assistant assistant (from gemini) Assistant response
system System prompt
summary Conversation summary
reasoning Assistant reasoning
turn_start / turn_end Assistant turn boundaries
tool_start / tool_result tool_call Tool execution events
session_start / session_resume Session lifecycle
session_info / session_error info / error Session info/errors
truncation / model_change Context management
compaction_start / compaction_complete Context compaction
abort User cancellation

Gemini tool calls: each assistant (gemini) turn may embed multiple toolCalls. The first is surfaced inline on the assistant row (tool_name / tool_use_id / tool_input), and every call additionally gets its own tool_call row whose parent_uuid links back to the assistant message and whose message_content holds the call status (success / error / cancelled).

read_plans([path], [source])

Reads plan files.

  • Claude: plans/*.md
  • Copilot: session-state/<uuid>/plan.md
Column Type Description
source VARCHAR 'claude', 'claude-desktop', or 'copilot'
session_id VARCHAR Parent session UUID (Copilot only, NULL for Claude)
plan_name VARCHAR Plan name (filename stem or workspace summary)
file_name VARCHAR Full filename
file_path VARCHAR Absolute file path
content VARCHAR Full markdown content
file_size BIGINT File size in bytes

read_todos([path], [source])

Reads todo/checklist items.

  • Claude: todos/<session>-agent-<agent>.json
  • Copilot: Checkpoint markdown checklists from session-state/<uuid>/checkpoints/*.md
Column Type Description
source VARCHAR 'claude', 'claude-desktop', or 'copilot'
session_id VARCHAR Parent session UUID
agent_id VARCHAR Agent UUID (Claude only, NULL for Copilot)
file_name VARCHAR Source filename
item_index BIGINT 0-based index (-1 for parse errors)
content VARCHAR Todo item text
status VARCHAR pending, in_progress, completed, or _parse_error
active_form VARCHAR Active form description (Claude only)

read_history([path], [source])

Reads command history.

  • Claude: history.jsonl (structured JSONL)
  • Copilot: command-history-state.json (simple string array)
Column Type Description
source VARCHAR 'claude', 'claude-desktop', or 'copilot'
line_number BIGINT Line/entry number (1-based)
timestamp_ms BIGINT Unix timestamp in ms (Claude only)
project VARCHAR Project path (Claude only)
session_id VARCHAR Session UUID (Claude only)
display VARCHAR Command/prompt text
pasted_contents VARCHAR Pasted content as JSON (Claude only)

read_stats([path], [source])

Reads daily activity stats. Currently Claude only — returns empty for Copilot, Claude Desktop, and Gemini.

Column Type Description
source VARCHAR 'claude'
date VARCHAR Date (YYYY-MM-DD)
message_count BIGINT Messages sent that day
session_count BIGINT Sessions started that day
tool_call_count BIGINT Tool calls made that day

Provider Detection

The extension auto-detects the data source by examining the directory structure:

  • Claude Desktop: contains local-agent-mode-sessions/ directory
  • Claude: contains projects/ directory
  • Copilot: contains session-state/ directory
  • Gemini CLI: contains a tmp/ directory plus an installation_id file
  • Unknown: returns empty results (or use source parameter to force)
-- Auto-detect
FROM read_conversations(path='~/.claude');   -- detected as Claude
FROM read_conversations(path='~/Library/Application Support/Claude');  -- detected as Claude Desktop
FROM read_conversations(path='~/.copilot');  -- detected as Copilot
FROM read_conversations(path='~/.gemini');   -- detected as Gemini CLI

-- Override detection
FROM read_conversations(path='custom/dir', source='gemini');

Join Keys

Tables can be joined within the same source:

-- Conversations ↔ History (via session_id)
SELECT c.*, h.display
FROM read_conversations(path='~/.claude') c
JOIN read_history(path='~/.claude') h ON c.session_id = h.session_id;

-- Cross-source: always filter by source
SELECT * FROM (
    SELECT * FROM read_conversations(path='~/.claude')
    UNION ALL
    SELECT * FROM read_conversations(path='~/.copilot')
) WHERE source = 'copilot';
Join Left Key Right Key Notes
conversations ↔ history session_id session_id Same source only
conversations ↔ todos session_id session_id Same source only
conversations ↔ plans slug plan_name Claude only
conversations ↔ history project_path project Claude only

Parse Error Policy

When a JSONL line or JSON file cannot be parsed, the extension emits a row with:

  • message_type = '_parse_error' (conversations)
  • status = '_parse_error' (todos)
  • display = 'Parse error: ...' (history)

Filter them with WHERE message_type != '_parse_error'.

Examples

Terminal TUI (Agent Chronicle)

Keyboard-driven terminal explorer with session browser, overview dashboard, and SQL editor:

cd examples/tui
uv sync
uv run python -m agent_chronicle

See examples/tui/README.md for details.

Marimo Notebook

Interactive notebook for exploring agent data:

cd examples/marimo
uv sync
marimo edit explore.py

Streamlit Explorer

Multi-page web application with session browser and SQL query interface:

cd examples/explorer
uv sync
streamlit run app.py

See examples/explorer/README.md for details.

Testing

# Build and run all SQLLogicTest assertions
make test

231 pinned assertions across 15 test files covering row counts, column validation, cross-source queries, join invariants, edge cases, and parse error handling.

Building from Source

# First time: configure build environment
make configure

# Build debug extension
make debug

# Run tests
make test

The compiled extension is at build/debug/agent_data.duckdb_extension (or build/release/ for make release).

# Load directly from a local build
duckdb -unsigned -c "LOAD 'build/debug/agent_data.duckdb_extension'; FROM read_conversations();"

Community Extension Release Checklist

When DuckDB publishes a new stable release, keep the community package aligned with the latest stable target:

  1. Check whether the repo is still aligned with DuckDB latest stable:

    python3 scripts/update_duckdb_release.py --check
  2. If a new stable release exists, update all local release surfaces:

    python3 scripts/update_duckdb_release.py --apply

    This updates duckdb-release.toml, Cargo.toml, Cargo.lock, Makefile, .github/workflows/MainDistributionPipeline.yml, and example DuckDB Python constraints.

  3. Run the full local validation entrypoint:

    scripts/validate_duckdb_release.sh

    This runs Rust/build/test checks and an exact-version DuckDB Python smoke test that loads build/debug/agent_data.duckdb_extension.

  4. Update duckdb/community-extensions after the source commit is validated:

    scripts/prepare_community_extension_pr.py --open-pr

    Keep upstream repo.ref immutable. A commit SHA is preferred; an immutable tag is also acceptable when maintainers intentionally release by tag. Do not use main as the stable ref because community builds should be reproducible.

  5. After the upstream community-extension workflow deploys from a trusted context, verify publication:

    uv run --with duckdb==$(python3 - <<'PY'

import tomllib print(tomllib.load(open("duckdb-release.toml", "rb"))["duckdb"]["python_version"]) PY ) python scripts/verify_community_publication.py


Pull-request deploys may build artifacts without publishing public binaries
when upstream secrets are unavailable.

The scheduled `DuckDB Release Monitor` workflow checks for stable-release drift
and opens a PR when the updater and validation pass. The `DuckDB Next
Compatibility` workflow can be run on schedule or manually to test DuckDB
`main`/next without publishing binaries. For DuckDB feature-freeze branches,
use upstream `repo.ref_next` for pre-release validation, then update `repo.ref`
after the stable source commit is merged.

## License

MIT — see [LICENSE](LICENSE).

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors