lore
The history of hermes-cashew can be divided into several distinct phases, spanning from April 19, 2026 to the present day.
The repository was created on April 19, 2026 with an initial commit containing LICENSE and README.md. The first week focused on project scaffolding:
- April 19: `CLAUDE.md` created, establishing the plugin architecture, dual-load-path layout, and testing conventions.
- April 19: `pyproject.toml` added with hatchling backend and cashew-brain git+SHA dependency. Dual `plugin.yaml` manifests created (root + nested).
- April 19: `CashewMemoryProvider` stub added with root re-export shim. Test infrastructure established with `conftest.py`, offline guards, and `agent.memory_provider` ABC stub.
- April 19: CI pipeline (`tests.yml`) added.
- April 20: Config schema helper module (`config.py`) added. Provider lifecycle methods (`initialize`, `shutdown`, `get_config_schema`, `save_config`) implemented.
- April 20: Sync worker lifecycle implemented with a bounded `queue.Queue(maxsize=16)` and a single non-daemon worker thread using a sentinel shutdown pattern.
- April 20: Tool schemas registered (`cashew_query`), `handle_tool_call` routing implemented, `prefetch()` method added with eager `ContextRetriever` lifecycle. `MemoryManager` stub and E2E tests created.
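The sync worker pattern from April 20 can be sketched roughly as follows. This is a minimal illustration rather than the plugin's actual code: the `SyncWorker` class, its method names, and the drop-when-full policy are assumptions; only the bounded `queue.Queue(maxsize=16)`, the single non-daemon thread, and the sentinel shutdown come from the history above.

```python
import queue
import threading

_SENTINEL = object()  # unique marker telling the worker to exit


class SyncWorker:
    """Hypothetical stand-in for the provider's sync worker."""

    def __init__(self, handler, maxsize=16):
        self._queue = queue.Queue(maxsize=maxsize)
        self._handler = handler
        self._thread = threading.Thread(target=self._run, daemon=False)
        self._thread.start()

    def submit(self, item):
        """Enqueue without blocking; return False if the queue is full.

        Dropping on overflow is an assumption made for this sketch.
        """
        try:
            self._queue.put_nowait(item)
            return True
        except queue.Full:
            return False

    def _run(self):
        while True:
            item = self._queue.get()
            if item is _SENTINEL:
                break
            try:
                self._handler(item)
            except Exception:
                # A failed turn must not kill the worker thread.
                pass

    def shutdown(self, timeout=5.0):
        """Queue the sentinel behind pending work, then wait for the worker."""
        self._queue.put(_SENTINEL)
        self._thread.join(timeout)
        return not self._thread.is_alive()
```

Because the sentinel is queued behind any pending items, work submitted before `shutdown()` is still processed in order before the thread exits.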
- April 21: Phase 8 schema management. `_ensure_db_schema` restructured with per-table transactions, `_create_vec_embeddings` helper with guarded extension loading, `_add_missing_columns` with idempotent ALTER TABLE.
- April 21: Versions v0.1.0 through v0.1.5. Auto-create Cashew DB schema, fix `derivation_edges.timestamp` schema migration, accept `session_id` in `sync_turn()`, handle missing `confidence` column in edge migration.
- April 21: v0.2.0 released with a 31-key config with env overrides and domain helpers, sqlite-vec support, and a PyPI dependency.
- April 21: PyPI trusted-publishing release workflow added.
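An idempotent `ALTER TABLE` migration of the kind described for `_add_missing_columns` might look like the sketch below. The function name, signature, and the `parent_id`/`child_id` columns in the usage example are hypothetical; the real helper may be structured differently. Table and column names are interpolated into SQL, so they must come from trusted code and never from user input.

```python
import sqlite3


def add_missing_columns(conn, table, columns):
    """Add each (name, declaration) pair the table lacks; return names added."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    added = []
    for name, decl in columns:
        if name not in existing:
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {decl}")
            added.append(name)
    conn.commit()
    return added
```

Running it twice is safe: the second call finds every column already present and does nothing.

```python
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE derivation_edges (parent_id TEXT, child_id TEXT)")
cols = [("timestamp", "TEXT"), ("confidence", "REAL")]
add_missing_columns(conn, "derivation_edges", cols)  # adds both columns
add_missing_columns(conn, "derivation_edges", cols)  # no-op
```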
The defining moment in the project's history. The team caught themselves reimplementing large swaths of cashew-brain instead of wrapping it. The result was a 150+ line gutting of custom retrieval code in favor of delegating to upstream's retrieve_recursive_bfs(). The commit that changed everything:
- Removed `_retrieve_with_vec`, `_retrieve_bfs`, `_retrieve_keyword`, `_score_nodes`, `_apply_filters`, `_vec_available`, `_get_query_embedding`.
- Fixed a critical bug where `vec_embeddings` used SQLite `rowid` to look up `thought_nodes.id` (SHA hashes), so semantic search always returned 0 results.
- Switched cashew-brain from git+SHA pin to `>=1.1.0,<2.0.0` on PyPI.
- Added `_keyword_search` SQL LIKE fallback and `_migrate_vec_embeddings` auto-migration.
- Phase 11 E2E tests: full lifecycle and 4-thread concurrent DB stress test.
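A SQL `LIKE` keyword fallback in the spirit of `_keyword_search` could be sketched as below. The `content` column, the all-words-must-match rule, and the function signature are assumptions made for illustration; only the `thought_nodes` table and its `id` column appear in the history above.

```python
import sqlite3


def _like_pattern(word):
    # Escape LIKE wildcards so user text is matched literally.
    escaped = word.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return f"%{escaped}%"


def keyword_search(conn, query, limit=5):
    """Return (id, content) rows whose content contains every query word."""
    words = query.split()
    if not words:
        return []
    # One parameterized LIKE clause per word, joined with AND.
    clause = " AND ".join(["content LIKE ? ESCAPE '\\'"] * len(words))
    sql = f"SELECT id, content FROM thought_nodes WHERE {clause} LIMIT ?"
    params = [_like_pattern(w) for w in words] + [limit]
    return conn.execute(sql, params).fetchall()
```

SQLite's `LIKE` is case-insensitive for ASCII by default, so no explicit lowercasing is needed in this sketch.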
The plugin learned to talk to LLMs. The llm_aux_role config key (set to "memory") wired upstream LLM features through Hermes' own auxiliary.<role> config convention. All 9 open issues were reconciled against the thin-adapter architecture. Open source conventions established: CONTRIBUTING.md, issue templates, PR template, DCO sign-off.
exclude_tags filtering added to all retrieval paths, allowing users to tag sensitive nodes and filter them out of query results. The final open issue from the original milestone plan was resolved.
- v0.6.0 (published as v0.7.0): Think cycles. `think_interval` config key (default 10) controlling how many sync turns pass between upstream `think_cycle()` calls.
- v0.7.0: Sleep cycles on shutdown. Upstream `run_sleep_cycle()` called when provider shuts down, if LLM is wired and `sleep_cycles` is True.
- v0.7.1 (May 12): Sleep cycle moved from `shutdown()` to `on_session_end()`, the correct lifecycle hook per Hermes memory provider docs.
- v0.7.2 (May 12): Spec compliance. Worker thread changed from `daemon=False` to `daemon=True`. README added to plugin directory.
- v0.7.3 (May 12): sqlite-vec macOS fix. Switched from `conn.load_extension("vec0")` to `sqlite_vec.load(conn)`.
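The `think_interval` behaviour can be pictured as a simple turn counter. The `ThinkScheduler` class below is a hypothetical helper, not the plugin's code, and treating a non-positive interval as "disabled" is an assumption.

```python
class ThinkScheduler:
    """Hypothetical helper: fire a callback every `think_interval` sync turns."""

    def __init__(self, think_cycle, think_interval=10):
        self._think_cycle = think_cycle
        self._interval = think_interval
        self._turns = 0

    def on_sync_turn(self):
        """Count one turn; return True if a think cycle ran."""
        if self._interval <= 0:
            return False  # assumption: a non-positive interval disables thinking
        self._turns += 1
        if self._turns >= self._interval:
            self._turns = 0
            self._think_cycle()
            return True
        return False
```

With `think_interval=3`, the callback fires on the third and sixth turns and the counter resets after each run.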
The upstream run_sleep_cycle() proved too heavyweight for synchronous lifecycle hooks. With 6K+ nodes and 59K+ edges, it computed a full N×N embedding similarity matrix (36M+ comparisons) with Bron–Kerbosch clique detection — taking hours and accumulating overlapping instances. The sleep cycle was removed from both on_session_end() and shutdown(), and sleep_cycles became a no-op.
A ground-up refactored sleep cycle replaced the upstream O(N²) implementation. The new sleep_refactor.py used a nine-phase pipeline with vectorized cross-linking (numpy + batched DB writes), processing 7,100 nodes in ~4 seconds vs hours upstream. The sleep cycle was re-enabled in on_session_end(), work-capped at 2,000 nodes per cycle.
- v0.8.1: Embedding gap closure. `_embed_orphans()` was inserting into the wrong column (`model_name` instead of `model`), so orphaned nodes were never embedded and the gap never closed.
- v0.8.2: Hermes `hermes_plugins` synthetic namespace fix. Root `__init__.py` flat-entry detection missed Hermes 0.8.8+'s new namespace convention.
cashew.json auto-generated with defaults on first load. llm_aux_role defaulted to "memory" so LLM extraction works out of the box. is_available() returns True when deps are present, even without a config file. The on_pre_compress hook added for forest-level insight extraction (topic shifts, framing changes, implicit decisions).
- Cross-source linking: pairs sharing the same `source_file` are skipped during cross-linking, reducing BFS graph noise.
- Edge cap (`MAX_EDGES_PER_CYCLE = 100K`): prevents runaway cross-linking on dense node batches.
- Out-degree selection: sleep cycle prioritizes nodes with fewest existing edges, rebalancing the graph over time.
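The three cross-linking rules above can be illustrated in plain Python. This is a sketch under assumptions: the function names and data shapes are hypothetical, keeping the strongest pairs first when the cap is hit is a guess at the policy, and the real `sleep_refactor.py` is described as using vectorized numpy rather than loops.

```python
def select_cross_links(pairs, source_of, threshold=0.78, max_edges=100_000):
    """Filter (a, b, similarity) candidates into edges worth writing."""
    edges = []
    # Strongest candidates first, so the cap keeps the best links (assumption).
    for a, b, sim in sorted(pairs, key=lambda p: p[2], reverse=True):
        if len(edges) >= max_edges:
            break  # edge cap reached for this cycle
        if sim < threshold:
            break  # sorted descending: everything after this is weaker
        src_a, src_b = source_of.get(a), source_of.get(b)
        if src_a is not None and src_a == src_b:
            continue  # same source_file: skip to reduce graph noise
        edges.append((a, b, sim))
    return edges


def pick_least_connected(out_degree, cap):
    """Return up to `cap` node ids, fewest existing edges first."""
    return sorted(out_degree, key=lambda n: (out_degree[n], n))[:cap]
```

For example, with nodes `n1` and `n2` from the same file, the pair `(n1, n2)` is dropped even at similarity 0.95, while `(n1, n3)` across files is kept.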
- `queue_prefetch`: background daemon thread ingests assistant responses and pre-warms Cashew context for the next turn.
- Background dream dispatch: Phase 8 (dream generation) and Phase 9 (orphan embedding) run in a daemon thread instead of blocking the caller. `/new` session latency dropped from ~118s to ~20s.
- Sync queue drain removed from `on_session_end()`: the sync worker is non-daemon and keeps running across session boundaries, with WAL mode handling concurrent DB writes.
- GC grace period (`GC_GRACE_DAYS = 7`): garbage collection now skips nodes created within the last 7 days, preventing systematic memory loss for single-reference knowledge (reported by @boriken72).
- Decayed-node filter in `_keyword_search()`: keyword fallback now excludes decayed nodes.
- Default `CROSS_LINK_THRESHOLD` raised from 0.70 to 0.78.
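The grace-period rule amounts to a cutoff on creation time. In the sketch below, the node dictionaries, the `eligible` flag (standing in for whatever the existing GC criteria decide), and the `gc_candidates` name are all hypothetical; only `GC_GRACE_DAYS = 7` comes from the changelog.

```python
from datetime import datetime, timedelta, timezone

GC_GRACE_DAYS = 7


def gc_candidates(nodes, now=None, grace_days=GC_GRACE_DAYS):
    """Return ids of GC-eligible nodes that are older than the grace period."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=grace_days)
    return [
        n["id"]
        for n in nodes
        # Nodes created after the cutoff are protected, even if otherwise eligible.
        if n["eligible"] and n["created_at"] < cutoff
    ]
```

A node created two days ago is therefore never collected, regardless of how few references it has.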
- Sleep cycle migrated from `on_session_end()` to Hermes cron job: the plugin registers a `no_agent` cron job at `initialize()` running on a configurable schedule (default: every 12 hours). `/new` returns instantly.
- `is_available()` contract restored: no longer probes `hermes_constants.get_hermes_home()`; returns `ContextRetriever is not None` when `_hermes_home` is unknown.
- `PRAGMA busy_timeout=5000` added to both SQLite connections in `sleep_refactor.py` to prevent `database is locked` failures.
- `fcntl.flock(LOCK_EX | LOCK_NB)` advisory lock prevents concurrent sleep cycles across Hermes processes.
- `vec_embeddings` dimension resolved dynamically: replaced hardcoded `float[384]` with dynamic detection from the configured embedding model, enabling `thenlper/gte-large` (1024-dim) support.
- Embedding model propagated to upstream: configured `embedding_model` passed to `end_session()` and `embed_nodes()` calls.
- Repo-root cruft removed (`.planning/` directory and `None` file).
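The two concurrency guards from this release, the non-blocking advisory lock and the SQLite busy timeout, can be sketched as follows. The helper names and the lock-file path are hypothetical, and `fcntl` is only available on Unix-like systems.

```python
import fcntl
import os
import sqlite3
from contextlib import contextmanager


@contextmanager
def sleep_cycle_lock(lock_path):
    """Yield True if this caller holds the lock, False if someone else does."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        try:
            # LOCK_NB makes flock fail immediately instead of waiting.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            acquired = True
        except BlockingIOError:
            acquired = False
        try:
            yield acquired
        finally:
            if acquired:
                fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)


def open_db(path):
    """Open a connection that waits up to 5 s on a locked database."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA busy_timeout=5000")
    return conn
```

A caller would wrap the sleep cycle in `with sleep_cycle_lock(path) as held:` and return early when `held` is False, so a second Hermes process skips the cycle rather than queuing behind the first.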
- June 1: Config fix. Resolve provider `base_url` from `_PROVIDER_BASE_URLS` map.
- June 5: Sleep cycle cron job fix. Ensure it runs despite lifecycle churn.
- June 7: sqlite-vec made a standard (not optional) dependency. Root `__init__.py` made a proper re-export shim. Stale state self-healing on startup with interpreter-exit race handling.
- June 8: Stale test for sqlite-vec unavailability removed.
These features have been in the codebase since the earliest versions:
| Feature | Since | Description |
|---|---|---|
| `CashewMemoryProvider` class | v0.1.0 (April 19) | Core provider implementation; survived multiple rewrites |
| Dual layout (`__init__.py` root re-export + nested module) | v0.1.0 (April 19) | Flat-entry and nested loader paths |
| `plugin.yaml` manifests (root + nested) | v0.1.0 (April 19) | Hermes plugin registration |
| Config schema with env overrides | v0.2.0 (April 21) | 31-key config originally, now 37 keys |
| `ContextRetriever`-based retrieval | v0.1.0 (April 20) | Cashew's two-step retrieval API |
| Sync worker with bounded queue | v0.1.0 (April 20) | Non-blocking `sync_turn()` pattern |
| Profile isolation via `hermes_home` | v0.1.0 (April 20) | All paths scoped under `hermes_home` |

| Feature | Removed in | Replacement |
|---|---|---|
| Custom `_ensure_db_schema` | v0.3.0 (May 12) | Upstream `core.db.ensure_schema()` |
| Custom retrieval pipeline (`_retrieve_with_vec`, `_retrieve_bfs`, `_retrieve_keyword`, `_score_nodes`, etc.) | v0.3.0 (May 12) | Upstream `retrieve_recursive_bfs()` |
| `confidence_threshold` config key | v0.3.0 (May 12) | Removed (upstream dropped the column) |
| Upstream `run_sleep_cycle()` (O(N²)) | v0.8.0 (May 14) | Refactored `sleep_refactor.py` |
| Sync queue drain in `on_session_end()` | v0.10.0 (May 17) | WAL mode handles concurrent writes |
| Sleep cycle in `on_session_end()` | v0.10.2 (May 27) | Hermes cron job (`no_agent` cron) |
| Hardcoded `float[384]` embedding dimension | v0.10.2 (May 27) | Dynamic dimension detection |
| Hardcoded `all-MiniLM-L6-v2` embedding model | v0.10.2 (May 27) | Configurable `embedding_model` |
The most consequential rewrite. 150+ lines of custom retrieval code gutted and replaced with thin delegation to upstream cashew-brain. The project pivoted from building its own retrieval pipeline to being a thin Hermes-to-Cashew adapter.
A ground-up rewrite of the sleep cycle replacing upstream's O(N²) implementation (hours at 7K nodes) with a vectorized numpy-based nine-phase pipeline (4 seconds at 7K nodes). Created sleep_refactor.py as the new core module.
Dream generation and orphan embedding moved to a daemon thread, cutting /new session latency from ~118s to ~20s.
The sleep cycle was moved out of session lifecycle hooks entirely and into a Hermes cron job running every 12 hours. /new became effectively instant.
| Date | Version | Milestone | Est. codebase size |
|---|---|---|---|
| April 19 | v0.1.0 | Initial scaffolding, stub provider, tests | ~500 lines |
| April 20 | v0.1.0 | Sync worker, tool schemas, prefetch, config | ~1,500 lines |
| April 21 | v0.2.0 | Schema management, sqlite-vec, 31-key config | ~3,000 lines |
| May 12 | v0.3.0 | Upstream integration (gutting custom retrieval) | ~2,800 lines (down) |
| May 12 | v0.4.0–v0.7.4 | LLM integration, privacy, think/sleep cycles | ~4,000 lines |
| May 14 | v0.8.0 | Refactored sleep cycle (sleep_refactor.py) | ~5,500 lines |
| May 15 | v0.9.0 | First-load bootstrap, on_pre_compress | ~6,500 lines |
| May 17 | v0.10.0 | Cross-source linking, queue_prefetch, background dreams | ~8,000 lines |
| May 27 | v0.10.2 | Cron-based scheduling, dynamic embeddings | ~9,000 lines |
| June 10 | current | Ongoing refinements | 9,507 lines |
The codebase grew from ~500 lines to over 9,500 lines in just 52 days, with one significant contraction at v0.3.0 when 150+ lines of custom retrieval were removed in favor of upstream delegation. The test suite grew alongside, now accounting for ~5,700 lines (60% of the Python codebase).