agent_data — DuckDB Extension for AI Agent Session Data

A DuckDB extension written in Rust for querying, analysing and inspecting AI coding agents history. Read conversations, plans, todos, history, and usage stats directly from your local agent data directories.

Supported agents: Claude Code (~/.claude), Claude Desktop ("Cowork", ~/Library/Application Support/Claude), GitHub Copilot CLI (~/.copilot) and Gemini CLI (~/.gemini).

OpenAI Codex Coming Soon™.

Written in 🦀 Rust.

Quickstart

Load Extension

INSTALL agent_data FROM community;
LOAD agent_data;

Run Query

SELECT c.*, h.display
FROM read_conversations(path='~/.claude') c
JOIN read_history(path='~/.claude') h
ON c.session_id = h.session_id;

Or try the Streamlit Example

Or the Terminal TUI

Overview

All functions read from the default agent directory (~/.claude for Claude Code, ~/.copilot for Copilot) when no path is provided.

The provider is auto-detected from the directory structure.

-- How many conversations have I had with Claude?
SELECT COUNT(DISTINCT session_id) AS sessions,
       COUNT(*) AS total_messages
FROM read_conversations();

-- What did I work on this week?
SELECT date, message_count, tool_call_count
FROM read_stats()
ORDER BY date DESC
LIMIT 7;

-- Which tools does github copilot use most?
SELECT tool_name, COUNT(*) AS uses
FROM read_conversations('~/.copilot')
WHERE tool_name IS NOT NULL
GROUP BY tool_name
ORDER BY uses DESC
LIMIT 10;

-- What are my active todos in my custom claude path?
SELECT content, status
FROM read_todos('~/work_folder/.claude')
WHERE status != 'completed'
ORDER BY item_index;

-- Which tools does Gemini CLI use most?
SELECT tool_name, COUNT(*) AS uses
FROM read_conversations('~/.gemini')
WHERE message_type = 'tool_call'
GROUP BY tool_name
ORDER BY uses DESC
LIMIT 10;

-- Compare activity across Claude, Copilot, and Gemini
SELECT source, COUNT(DISTINCT session_id) AS sessions, COUNT(*) AS messages
FROM (
    SELECT * FROM read_conversations(path='~/.claude')
    UNION ALL
    SELECT * FROM read_conversations(path='~/.copilot')
    UNION ALL
    SELECT * FROM read_conversations(path='~/.gemini')
)
GROUP BY source;

Default Behavior

When called without arguments, each function reads from its provider's default path:

Function	Default path	Detected as
`read_conversations()`	`~/.claude`	Claude Code
`read_plans()`	`~/.claude`	Claude Code
`read_todos()`	`~/.claude`	Claude Code
`read_history()`	`~/.claude`	Claude Code
`read_stats()`	`~/.claude`	Claude Code

To read Claude Desktop, Copilot, or Gemini data, pass the path explicitly:

FROM read_conversations(path='~/Library/Application Support/Claude');  -- detected as Claude Desktop
FROM read_conversations(path='~/.copilot');  -- detected as Copilot
FROM read_conversations(path='~/.gemini');  -- detected as Gemini CLI

Available Functions

All functions accept two optional parameters:

path — data directory path (default: ~/.claude). Auto-detected from folder structure (local-agent-mode-sessions/ → Claude Desktop, projects/ → Claude, session-state/ → Copilot, tmp/ + installation_id → Gemini CLI).
source — explicit provider override: 'claude', 'claude-desktop', 'copilot', or 'gemini'. Use when auto-detection fails or for non-standard directory layouts.

Every table includes a source column ('claude', 'claude-desktop', 'copilot', or 'gemini') as the first column.

`read_conversations([path (opt)], [source (opt)])`

Reads conversation/event data.

Claude: JSONL files from projects/<project>/<session>.jsonl (including nested sub-agent transcripts at projects/<project>/<session>/subagents/agent-*.jsonl)
Claude Desktop: JSONL files from local-agent-mode-sessions/**/.claude/projects/<project>/<session>.jsonl (same schema as Claude Code)
Copilot: JSONL events from session-state/<uuid>/events.jsonl
Gemini: JSON chat checkpoints from tmp/<project-hash>/chats/session-<ts>-<id>.json (one file = one session; each tool call is also emitted as a tool_call row)

Column	Type	Description
`source`	VARCHAR	`'claude'`, `'claude-desktop'`, `'copilot'`, or `'gemini'`
`session_id`	VARCHAR	Session UUID
`project_path`	VARCHAR	Project/working directory path
`project_dir`	VARCHAR	Raw encoded directory name (Claude only)
`file_name`	VARCHAR	Source filename
`is_agent`	BOOLEAN	Sub-agent conversation (Claude only)
`line_number`	BIGINT	Line number within file (1-based)
`message_type`	VARCHAR	See message type mappings below
`uuid`	VARCHAR	Message/event UUID
`parent_uuid`	VARCHAR	Parent message/event UUID
`timestamp`	VARCHAR	ISO 8601 timestamp
`message_role`	VARCHAR	`user`, `assistant`, `tool`, or NULL
`message_content`	VARCHAR	Text content
`model`	VARCHAR	AI model used
`tool_name`	VARCHAR	Tool called
`tool_use_id`	VARCHAR	Tool use/call identifier
`tool_input`	VARCHAR	Tool input as JSON string
`input_tokens`	BIGINT	Input token count (Claude/Gemini per-message, Copilot truncation)
`output_tokens`	BIGINT	Output token count
`cache_creation_tokens`	BIGINT	Cache creation tokens (Claude only)
`cache_read_tokens`	BIGINT	Cache read tokens (Claude; Gemini `cached`)
`slug`	VARCHAR	Session slug (Claude only)
`git_branch`	VARCHAR	Git branch
`cwd`	VARCHAR	Working directory
`version`	VARCHAR	Agent CLI version
`stop_reason`	VARCHAR	Stop reason (Claude only)
`repository`	VARCHAR	GitHub repository (Copilot only)

Message type mappings:

Claude	Copilot	Gemini	Description
`user`	`user`	`user`	User message
`assistant`	`assistant`	`assistant` (from `gemini`)	Assistant response
`system`	—	—	System prompt
`summary`	—	—	Conversation summary
—	`reasoning`	—	Assistant reasoning
—	`turn_start` / `turn_end`	—	Assistant turn boundaries
—	`tool_start` / `tool_result`	`tool_call`	Tool execution events
—	`session_start` / `session_resume`	—	Session lifecycle
—	`session_info` / `session_error`	`info` / `error`	Session info/errors
—	`truncation` / `model_change`	—	Context management
—	`compaction_start` / `compaction_complete`	—	Context compaction
—	`abort`	—	User cancellation

Gemini tool calls: each assistant (gemini) turn may embed multiple toolCalls. The first is surfaced inline on the assistant row (tool_name / tool_use_id / tool_input), and every call additionally gets its own tool_call row whose parent_uuid links back to the assistant message and whose message_content holds the call status (success / error / cancelled).

`read_plans([path], [source])`

Reads plan files.

Claude: plans/*.md
Copilot: session-state/<uuid>/plan.md

Column	Type	Description
`source`	VARCHAR	`'claude'`, `'claude-desktop'`, or `'copilot'`
`session_id`	VARCHAR	Parent session UUID (Copilot only, NULL for Claude)
`plan_name`	VARCHAR	Plan name (filename stem or workspace summary)
`file_name`	VARCHAR	Full filename
`file_path`	VARCHAR	Absolute file path
`content`	VARCHAR	Full markdown content
`file_size`	BIGINT	File size in bytes

`read_todos([path], [source])`

Reads todo/checklist items.

Claude: todos/<session>-agent-<agent>.json
Copilot: Checkpoint markdown checklists from session-state/<uuid>/checkpoints/*.md

Column	Type	Description
`source`	VARCHAR	`'claude'`, `'claude-desktop'`, or `'copilot'`
`session_id`	VARCHAR	Parent session UUID
`agent_id`	VARCHAR	Agent UUID (Claude only, NULL for Copilot)
`file_name`	VARCHAR	Source filename
`item_index`	BIGINT	0-based index (-1 for parse errors)
`content`	VARCHAR	Todo item text
`status`	VARCHAR	`pending`, `in_progress`, `completed`, or `_parse_error`
`active_form`	VARCHAR	Active form description (Claude only)

`read_history([path], [source])`

Reads command history.

Claude: history.jsonl (structured JSONL)
Copilot: command-history-state.json (simple string array)

Column	Type	Description
`source`	VARCHAR	`'claude'`, `'claude-desktop'`, or `'copilot'`
`line_number`	BIGINT	Line/entry number (1-based)
`timestamp_ms`	BIGINT	Unix timestamp in ms (Claude only)
`project`	VARCHAR	Project path (Claude only)
`session_id`	VARCHAR	Session UUID (Claude only)
`display`	VARCHAR	Command/prompt text
`pasted_contents`	VARCHAR	Pasted content as JSON (Claude only)

`read_stats([path], [source])`

Reads daily activity stats. Currently Claude only — returns empty for Copilot, Claude Desktop, and Gemini.

Column	Type	Description
`source`	VARCHAR	`'claude'`
`date`	VARCHAR	Date (YYYY-MM-DD)
`message_count`	BIGINT	Messages sent that day
`session_count`	BIGINT	Sessions started that day
`tool_call_count`	BIGINT	Tool calls made that day

Provider Detection

The extension auto-detects the data source by examining the directory structure:

Claude Desktop: contains local-agent-mode-sessions/ directory
Claude: contains projects/ directory
Copilot: contains session-state/ directory
Gemini CLI: contains a tmp/ directory plus an installation_id file
Unknown: returns empty results (or use source parameter to force)

-- Auto-detect
FROM read_conversations(path='~/.claude');   -- detected as Claude
FROM read_conversations(path='~/Library/Application Support/Claude');  -- detected as Claude Desktop
FROM read_conversations(path='~/.copilot');  -- detected as Copilot
FROM read_conversations(path='~/.gemini');   -- detected as Gemini CLI

-- Override detection
FROM read_conversations(path='custom/dir', source='gemini');

Join Keys

Tables can be joined within the same source:

-- Conversations ↔ History (via session_id)
SELECT c.*, h.display
FROM read_conversations(path='~/.claude') c
JOIN read_history(path='~/.claude') h ON c.session_id = h.session_id;

-- Cross-source: always filter by source
SELECT * FROM (
    SELECT * FROM read_conversations(path='~/.claude')
    UNION ALL
    SELECT * FROM read_conversations(path='~/.copilot')
) WHERE source = 'copilot';

Join	Left Key	Right Key	Notes
conversations ↔ history	`session_id`	`session_id`	Same source only
conversations ↔ todos	`session_id`	`session_id`	Same source only
conversations ↔ plans	`slug`	`plan_name`	Claude only
conversations ↔ history	`project_path`	`project`	Claude only

Parse Error Policy

When a JSONL line or JSON file cannot be parsed, the extension emits a row with:

message_type = '_parse_error' (conversations)
status = '_parse_error' (todos)
display = 'Parse error: ...' (history)

Filter them with WHERE message_type != '_parse_error'.

Examples

Terminal TUI (Agent Chronicle)

Keyboard-driven terminal explorer with session browser, overview dashboard, and SQL editor:

cd examples/tui
uv sync
uv run python -m agent_chronicle

See examples/tui/README.md for details.

Marimo Notebook

Interactive notebook for exploring agent data:

cd examples/marimo
uv sync
marimo edit explore.py

Streamlit Explorer

Multi-page web application with session browser and SQL query interface:

cd examples/explorer
uv sync
streamlit run app.py

See examples/explorer/README.md for details.

Testing

# Build and run all SQLLogicTest assertions
make test

231 pinned assertions across 15 test files covering row counts, column validation, cross-source queries, join invariants, edge cases, and parse error handling.

Building from Source

# First time: configure build environment
make configure

# Build debug extension
make debug

# Run tests
make test

The compiled extension is at build/debug/agent_data.duckdb_extension (or build/release/ for make release).

# Load directly from a local build
duckdb -unsigned -c "LOAD 'build/debug/agent_data.duckdb_extension'; FROM read_conversations();"

Community Extension Release Checklist

When DuckDB publishes a new stable release, keep the community package aligned with the latest stable target:

Check whether the repo is still aligned with DuckDB latest stable:
```
python3 scripts/update_duckdb_release.py --check
```
If a new stable release exists, update all local release surfaces:
```
python3 scripts/update_duckdb_release.py --apply
```
This updates duckdb-release.toml, Cargo.toml, Cargo.lock, Makefile, .github/workflows/MainDistributionPipeline.yml, and example DuckDB Python constraints.
Run the full local validation entrypoint:
```
scripts/validate_duckdb_release.sh
```
This runs Rust/build/test checks and an exact-version DuckDB Python smoke test that loads build/debug/agent_data.duckdb_extension.
Update duckdb/community-extensions after the source commit is validated:
```
scripts/prepare_community_extension_pr.py --open-pr
```
Keep upstream repo.ref immutable. A commit SHA is preferred; an immutable tag is also acceptable when maintainers intentionally release by tag. Do not use main as the stable ref because community builds should be reproducible.
After the upstream community-extension workflow deploys from a trusted context, verify publication:
```
uv run --with duckdb==$(python3 - <<'PY'
```

import tomllib print(tomllib.load(open("duckdb-release.toml", "rb"))["duckdb"]["python_version"]) PY ) python scripts/verify_community_publication.py


Pull-request deploys may build artifacts without publishing public binaries
when upstream secrets are unavailable.

The scheduled `DuckDB Release Monitor` workflow checks for stable-release drift
and opens a PR when the updater and validation pass. The `DuckDB Next
Compatibility` workflow can be run on schedule or manually to test DuckDB
`main`/next without publishing binaries. For DuckDB feature-freeze branches,
use upstream `repo.ref_next` for pre-release validation, then update `repo.ref`
after the stable source commit is merged.

## License

MIT — see [LICENSE](LICENSE).

Name		Name	Last commit message	Last commit date
Latest commit History 125 Commits
.cargo		.cargo
.github		.github
docs		docs
examples		examples
extension-ci-tools @ fccd748		extension-ci-tools @ fccd748
scripts		scripts
src		src
test		test
.envrc		.envrc
.gitignore		.gitignore
.gitmodules		.gitmodules
.python-version		.python-version
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
duckdb-release.toml		duckdb-release.toml
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

agent_data — DuckDB Extension for AI Agent Session Data

Quickstart

Load Extension

Run Query

Or try the Streamlit Example

Or the Terminal TUI

Overview

Default Behavior

Available Functions

`read_conversations([path (opt)], [source (opt)])`

`read_plans([path], [source])`

`read_todos([path], [source])`

`read_history([path], [source])`

`read_stats([path], [source])`

Provider Detection

Join Keys

Parse Error Policy

Examples

Terminal TUI (Agent Chronicle)

Marimo Notebook

Streamlit Explorer

Testing

Building from Source

Community Extension Release Checklist

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

agent_data — DuckDB Extension for AI Agent Session Data

Quickstart

Load Extension

Run Query

Or try the Streamlit Example

Or the Terminal TUI

Overview

Default Behavior

Available Functions

read_conversations([path (opt)], [source (opt)])

read_plans([path], [source])

read_todos([path], [source])

read_history([path], [source])

read_stats([path], [source])

Provider Detection

Join Keys

Parse Error Policy

Examples

Terminal TUI (Agent Chronicle)

Marimo Notebook

Streamlit Explorer

Testing

Building from Source

Community Extension Release Checklist

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`read_conversations([path (opt)], [source (opt)])`

`read_plans([path], [source])`

`read_todos([path], [source])`

`read_history([path], [source])`

`read_stats([path], [source])`

Packages