ADR-043 — Three-Tier Agent Memory Model
- Status: Accepted
- Date: 2026-05-07
- Decider: Mishaal Murawala (engineering sequencing delegated to Claude Code per ADR-040)
- Supersedes: Implicit KV-only memory assumption in invariant #2 (for agent context, not gateway hot path)
- Related: ADR-040 Tracks A + D, ADR-042, ASCEND_OPERATOR_OS_VISION.md §4 (Memory Architecture), src/lib/agent-memory.ts
- Invariant changed: #2 (clarification, not relaxation; see below)
Context
Invariant #2 states “KV-only hot path on the gateway.” This was designed to prevent D1 reads from being added to the gateway request path for tool proxying. It was not designed to govern agent memory — agents did not exist when invariant #2 was written.
The Operator OS vision (§4) defines a four-tier memory model: Working, Episodic, Semantic, Procedural. The V5 gateway currently implements only parts of the first two tiers:
- src/lib/agent-memory.ts exists and implements partial Tier 1 (working memory via KV TTL’d keys) and partial Tier 2 (episodic write to D1 memory_episodes).
- Migration 0013_agent_runs.sql (Track A, 2026-05-07) adds the agent_runs and memory_episodes D1 tables.
- There is no Tier 3 (semantic recall via Vectorize) or Tier 4 (procedural / bandit workflows) implementation.
Without a formal ADR, the line “KV-only” in invariant #2 creates ambiguity: does it prohibit agent memory from using D1 and Vectorize? This ADR resolves the ambiguity and formalizes the three-tier model that ships in Q1 (Tier 4 procedural is Q1-partial, completing in Track D weeks 6–12).
Decision
Adopt a three-tier agent memory model for Q1:
| Tier | Name | Storage | Scope | TTL / Retention | When accessed |
|---|---|---|---|---|---|
| 1 | Working memory | DO SQLite (turn_state, run_log tables in AgentRuntime) + KV (agent_working:{tenant}:{instance_id}:{key}, 1h TTL) | Per agent instance, ephemeral | KV: 1h. DO SQLite: cleared on DO eviction (~24h idle) | In-turn: every agent step reads/writes working state |
| 2 | Episodic memory | D1 agent_runs + memory_episodes tables (cold path) | Per tenant, persistent | 90 days, then pruned by weekly Cron | Cross-turn recall: agent loads last N episodes at turn start. Admin/eval queries. NOT in the gateway tool-proxy hot path. |
| 3 | Semantic memory | Vectorize namespace agent_memory_semantic:{tenant_id} (per-tenant) | Per tenant, persistent | Indefinite (explicit delete only) | Context assembly: called before the agent turn begins, outside the tool-proxy hot path |
Tier 4 (procedural / bandit workflows) is implemented in Q1 as a stub table (procedural_workflows in DO SQLite, reserved in Track A Phase A.1) with Track D wiring the full bandit selector in weeks 6–12.
Why this does NOT violate invariant #2
Invariant #2 governs the gateway tool-proxy hot path: the sequence that handles a POST /api/v1/tools/:tool_name or MCP tool call from Cursor/Codex/claude.ai. That path is:
KV auth lookup → KV token read → KV config lookup → outbound fetch → response
None of the memory tiers are accessed in this path. Specifically:
- Tier 1 (DO SQLite + KV): Accessed by the AgentRuntime DO in the agent run path (/v1/agents/:tenant/:agent_type/run). This is an admin-only surface (CF-Access gated), not the tool-proxy hot path.
- Tier 2 (D1 cold path): Written via ctx.waitUntil() (async, non-blocking) after the agent turn. Read from admin/eval endpoints and, for cross-run recall, inside the agent run path. Never in the tool-proxy hot path.
- Tier 3 (Vectorize): Called in agent context assembly before the agent turn begins. The context assembly happens inside AgentRuntime.fetch(), which is again the agent run path, not the tool-proxy hot path.
The KV-only guarantee of invariant #2 applies specifically to src/handlers/, src/core/, and src/tools/ — the gateway tool-proxy path files. Agent memory lives in src/lib/agent-memory.ts, src/lib/memory-semantic.ts, and DO internals — none of which are in the tool-proxy hot path.
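The file boundary described above lends itself to a mechanical check. The following is a minimal sketch of such a guard, assuming import edges have already been extracted by some other tool. The function name findHotPathViolations and the ImportEdge shape are hypothetical; only the directory and module paths come from this ADR.

```typescript
// Hypothetical guard: not part of the repo. Only the paths below come from this ADR.
const HOT_PATH_DIRS = ["src/handlers/", "src/core/", "src/tools/"];
const MEMORY_MODULES = [
  "src/lib/agent-memory.ts",
  "src/lib/memory-semantic.ts",
  "src/lib/memory-procedural.ts",
];

interface ImportEdge {
  importer: string; // repo-relative path of the importing file
  imported: string; // repo-relative path of the imported module
}

// Returns every edge where a tool-proxy path file imports a memory module.
function findHotPathViolations(edges: ImportEdge[]): ImportEdge[] {
  return edges.filter(
    (edge) =>
      HOT_PATH_DIRS.some((dir) => edge.importer.startsWith(dir)) &&
      MEMORY_MODULES.includes(edge.imported),
  );
}
```

Run in CI against the import graph, an empty result would indicate that the tool-proxy path files still do not reach any memory tier.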
This ADR adds a clarifying sentence to invariant #2 (see below) to make this explicit, so future contributors do not misread “KV-only hot path” as a prohibition on memory tiers.
Invariant #2 change (clarification)
Old text:
KV-only hot path on the gateway. Request latency budgeted at ≤10 ms overhead. D1 only in cold paths (error_ledger, kv_audit, decision_log). Context-worker D1 is cold path by definition; it’s the context worker’s own cold path.
New text (effective when this ADR merges):
KV-only hot path on the gateway tool-proxy path (src/handlers/, src/core/, src/tools/). Request latency budgeted at ≤15 ms (non-agent) or ≤30 ms (agent assembly, per ADR-041). D1 only in cold paths: error_ledger, kv_audit, decision_log, agent_runs, memory_episodes, eval_datasets. Vectorize only in agent context assembly (outside the tool-proxy hot path). Context-worker D1 is cold path by definition.
The change: adds the agent-specific D1 tables to the allowed cold-path list, adds Vectorize with the explicit constraint that it only appears outside the tool-proxy hot path, and aligns the latency numbers with ADR-041.
Tier 1: Working memory
Storage: AgentRuntime DO SQLite (turn_state and run_log tables) + KV with TTL.
Schema (DO SQLite): Defined in src/do/agent-runtime.ts blockConcurrencyWhile initializer. Tables: turn_state, run_log, procedural_workflows (stub, Track D fills).
KV working memory: agent_working:{tenant_id}:{instance_id}:{key} → arbitrary JSON, 1h TTL. Used for lightweight cross-step state that doesn’t need SQL (e.g., “current_contact_id”, “email_draft_v2”).
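A minimal sketch of the key layout and TTL above, assuming a KV binding that exposes put with an expirationTtl option and get. The KvLike interface is a hand-written slice of that binding, and the helper names workingKey, setWorking, and getWorking are hypothetical.

```typescript
// Minimal slice of a Workers KV binding (assumed shape: put with expirationTtl, get).
interface KvLike {
  put(key: string, value: string, options?: { expirationTtl?: number }): Promise<void>;
  get(key: string): Promise<string | null>;
}

const WORKING_TTL_SECONDS = 60 * 60; // 1h TTL, from this ADR

// Key layout from this ADR: agent_working:{tenant_id}:{instance_id}:{key}
function workingKey(tenantId: string, instanceId: string, key: string): string {
  return `agent_working:${tenantId}:${instanceId}:${key}`;
}

// Stores a small JSON value that expires after one hour.
async function setWorking(
  kv: KvLike,
  tenantId: string,
  instanceId: string,
  key: string,
  value: unknown,
): Promise<void> {
  await kv.put(workingKey(tenantId, instanceId, key), JSON.stringify(value), {
    expirationTtl: WORKING_TTL_SECONDS,
  });
}

// Returns the parsed value, or null when the key is missing or has expired.
async function getWorking<T>(
  kv: KvLike,
  tenantId: string,
  instanceId: string,
  key: string,
): Promise<T | null> {
  const raw = await kv.get(workingKey(tenantId, instanceId, key));
  return raw === null ? null : (JSON.parse(raw) as T);
}
```

For example, setWorking(kv, "acme", "inst-1", "current_contact_id", "123456") would write under agent_working:acme:inst-1:current_contact_id.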
Size constraint: KV values ≤25 MiB (CF limit). Working memory is not intended for large payloads: ephemeral identifiers and small JSON objects only. Per-instance budget: <128 KB total across all KV keys (small relative to the 128 MB DO memory limit).
Implementation status: Partial — agent-runtime.ts Track A Phase A.1 implements turn_state and run_log. agent_working:* KV writes are Track A Phase A.2 (SDR Agent loop).
Tier 2: Episodic memory
Storage: D1 agent_runs + memory_episodes tables (migration 0013_agent_runs.sql).
Retention: 90 days. A weekly CF Cron job (slot: 0 3 * * 0, Sunday 03:00 UTC) prunes memory_episodes rows older than 90 days per tenant.
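The pruning job could be sketched as follows. This is an illustration under stated assumptions, not the shipped job: it assumes memory_episodes.created_at holds Unix epoch seconds (this ADR does not say), it deletes across all tenants in one statement rather than looping per tenant, and the names pruneCutoffSeconds, pruneEpisodes, and onCron are hypothetical. D1Like is a hand-written slice of the D1 binding (prepare, bind, run).

```typescript
// Minimal slice of a D1 binding (assumed shape: prepare → bind → run).
interface D1Like {
  prepare(sql: string): { bind(...values: unknown[]): { run(): Promise<unknown> } };
}

const PRUNE_CRON = "0 3 * * 0"; // Sunday 03:00 UTC, from this ADR
const RETENTION_DAYS = 90;

// Assumption: created_at is stored as Unix epoch seconds.
function pruneCutoffSeconds(nowMs: number, retentionDays: number = RETENTION_DAYS): number {
  return Math.floor(nowMs / 1000) - retentionDays * 24 * 60 * 60;
}

// Deletes every episode older than the retention window.
async function pruneEpisodes(db: D1Like, nowMs: number): Promise<void> {
  await db
    .prepare("DELETE FROM memory_episodes WHERE created_at < ?")
    .bind(pruneCutoffSeconds(nowMs))
    .run();
}

// Intended to be called from the Worker's scheduled handler with the cron
// expression that fired; other cron slots are ignored here.
function onCron(cron: string, db: D1Like, nowMs: number): Promise<void> {
  return cron === PRUNE_CRON ? pruneEpisodes(db, nowMs) : Promise.resolve();
}
```

Passing the clock in as nowMs keeps the cutoff testable with synthetic old data, which is what the acceptance criteria ask for.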
Writes: Via ctx.waitUntil() from AgentRuntime — always async, never blocks the agent turn response. Implemented in src/lib/agent-memory.ts (startRun, appendEpisode, completeRun, failRun).
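A minimal sketch of the non-blocking write pattern, assuming an execution context that exposes waitUntil. The EpisodeWriter shape and the finishTurn helper are hypothetical; the real signatures in src/lib/agent-memory.ts may differ.

```typescript
// Minimal slice of the Workers execution context used here.
interface CtxLike {
  waitUntil(promise: Promise<unknown>): void;
}

// Hypothetical writer shape; not the actual agent-memory.ts signature.
interface EpisodeWriter {
  appendEpisode(runId: string, content: string): Promise<void>;
}

function finishTurn(
  ctx: CtxLike,
  writer: EpisodeWriter,
  runId: string,
  output: string,
): string {
  // Register the D1 write with the runtime instead of awaiting it,
  // so the caller gets the turn output without waiting on D1.
  const write = writer.appendEpisode(runId, output).catch(() => {
    // Assumption: a failed episodic write must not fail the turn.
    // Real code would record the error (for example in error_ledger).
  });
  ctx.waitUntil(write);
  return output;
}
```

The design choice illustrated here is that the response value never depends on the write promise, which is what keeps episodic persistence off the agent turn's critical path.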
Reads: From admin/eval surfaces (GET /admin/agents/:tenant/runs, eval pipeline queries) and from the agent's own cross-run recall at turn start. All reads are parameterized D1 queries outside the tool-proxy hot path (invariant #2 D1-only-cold-path preserved).
Cross-run recall by the agent: At turn start, the agent receives the last N episodes as context. This is assembled in AgentRuntime.handleRun() via a D1 query before dispatching to runSdrAgent. The query is: SELECT * FROM memory_episodes WHERE tenant_id = ? AND agent_type = ? ORDER BY created_at DESC LIMIT 10. This is cold-path by definition (it happens inside the DO/agent run path, not the gateway tool-proxy path).
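The recall query above could be issued as a parameterized statement along these lines. This is a sketch under assumptions: D1Like is a hand-written slice of the D1 binding (prepare, bind, all), the helper name loadRecentEpisodes is hypothetical, and the row type is left open because this ADR does not list the memory_episodes columns.

```typescript
// Minimal slice of a D1 binding (assumed shape: prepare → bind → all).
interface D1Like {
  prepare(sql: string): {
    bind(...values: unknown[]): { all<T>(): Promise<{ results: T[] }> };
  };
}

// The column set is not specified in this ADR, so rows stay loosely typed.
type EpisodeRow = Record<string, unknown>;

const RECALL_SQL =
  "SELECT * FROM memory_episodes WHERE tenant_id = ? AND agent_type = ? " +
  "ORDER BY created_at DESC LIMIT ?";

// Loads the most recent episodes for one tenant and agent type, newest first.
async function loadRecentEpisodes(
  db: D1Like,
  tenantId: string,
  agentType: string,
  limit: number = 10,
): Promise<EpisodeRow[]> {
  const { results } = await db
    .prepare(RECALL_SQL)
    .bind(tenantId, agentType, limit)
    .all<EpisodeRow>();
  return results;
}
```

Binding the limit as a parameter (an addition over the literal LIMIT 10 in the text) keeps N adjustable without string concatenation.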
Implementation status: Tables exist (migration 0013). agent-memory.ts implements write path (startRun, completeRun, failRun). Read-back for context recall is Track A Phase A.2.
Tier 3: Semantic memory
Storage: Vectorize namespace agent_memory_semantic:{tenant_id} — one namespace per tenant.
Schema: Each vector entry represents a fact or observation the agent has learned. Metadata:
{
  "fact_id": "<uuid>",
  "tenant_id": "<tenant>",
  "agent_type": "sdr",
  "subject": "contact:hubspot_id:123456",
  "predicate": "responded_positively_to",
  "object": "direct_roi_framing",
  "confidence": 0.82,
  "learned_from_run_id": "<run_uuid>",
  "created_at": 1746700000
}
Interface (Track D implementation): src/lib/memory-semantic.ts
- recall(tenant_id, query, opts): Fact[] (Vectorize similarity query, returns top-k facts)
- learn(tenant_id, fact) (embeds and upserts a fact via ctx.waitUntil)
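A possible shape for this interface, offered as a sketch only since Track D has not implemented it. It assumes a Vectorize-style index with query (topK, namespace, returnMetadata) and upsert, written here as the hand-rolled VectorIndexLike interface. The embed function is a hypothetical injected embedder, the index and embedder are passed explicitly (unlike the two-argument signatures above), and embedding the "subject predicate object" string is an assumption.

```typescript
// Fact metadata, mirroring the schema in this ADR.
interface Fact {
  fact_id: string;
  tenant_id: string;
  agent_type: string;
  subject: string;
  predicate: string;
  object: string;
  confidence: number;
  learned_from_run_id: string;
  created_at: number;
}

// Minimal slice of a Vectorize-style index (assumed shape).
interface VectorIndexLike {
  query(
    vector: number[],
    options: { topK: number; namespace: string; returnMetadata: "all" },
  ): Promise<{ matches: { id: string; score: number; metadata?: Record<string, unknown> }[] }>;
  upsert(
    vectors: { id: string; values: number[]; namespace: string; metadata: Record<string, unknown> }[],
  ): Promise<unknown>;
}

// Hypothetical embedder; the embedding model is not specified in this ADR.
type Embed = (text: string) => Promise<number[]>;

// Namespace pattern from this ADR: agent_memory_semantic:{tenant_id}
const namespaceFor = (tenantId: string): string => `agent_memory_semantic:${tenantId}`;

// Assumption: a fact is embedded as its "subject predicate object" text.
const factText = (fact: Fact): string => `${fact.subject} ${fact.predicate} ${fact.object}`;

async function learn(index: VectorIndexLike, embed: Embed, fact: Fact): Promise<void> {
  const values = await embed(factText(fact));
  await index.upsert([
    { id: fact.fact_id, values, namespace: namespaceFor(fact.tenant_id), metadata: { ...fact } },
  ]);
}

async function recall(
  index: VectorIndexLike,
  embed: Embed,
  tenantId: string,
  query: string,
  topK: number = 5,
): Promise<Fact[]> {
  const vector = await embed(query);
  const { matches } = await index.query(vector, {
    topK,
    namespace: namespaceFor(tenantId),
    returnMetadata: "all",
  });
  // Matches without metadata cannot be turned back into facts, so drop them.
  return matches
    .filter((match) => match.metadata !== undefined)
    .map((match) => match.metadata as unknown as Fact);
}
```

In the agent runtime, learn would be wrapped in ctx.waitUntil as described above, while recall would be awaited during context assembly.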
When accessed: Context assembly before the agent turn begins, NOT in the tool-proxy hot path.
Implementation status: NOT YET IMPLEMENTED. Track D (weeks 6–12) ships this. The Vectorize namespace pattern tenant_{tenant_id} already exists in the platform (used by the Context Worker). Track D operationalizes the same pattern for agent memory. This ADR formalizes the intent and schema so Track A + Track D don’t diverge on the data model.
Migration trigger for Track D: Track A live in production + agent_runtime DO deployed + memory_episodes accumulating data for ≥1 week.
Tier 4: Procedural memory (stub in Q1, full in Track D)
Storage: procedural_workflows table in AgentRuntime DO SQLite (reserved in Track A Phase A.1).
Schema: workflow_id, agent_type, task_type, spec TEXT, alpha REAL, beta REAL — Thompson-sampling bandit over workflow specs.
Interface (Track D): src/lib/memory-procedural.ts
- selectWorkflow(tenant_id, agent_type, task_type): sample from Beta(alpha, beta) per workflow
- recordOutcome(workflow_id, outcome_score): update alpha/beta based on outcome
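The Thompson-sampling selector could look roughly like the sketch below. Several things are assumptions rather than Track D design: the functions take already-loaded candidate rows instead of (tenant_id, agent_type, task_type), outcome_score is taken to lie in [0, 1] and is applied as a fractional success, and the Beta draw uses a Marsaglia-Tsang gamma sampler with an injectable random source (mulberry32 is included only for reproducibility).

```typescript
type Rng = () => number; // uniform in [0, 1)

interface Workflow {
  workflow_id: string;
  agent_type: string;
  task_type: string;
  spec: string;
  alpha: number;
  beta: number;
}

// Small seedable PRNG (mulberry32) so runs are reproducible.
function mulberry32(seed: number): Rng {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Standard normal via Box-Muller; 1 - rng() keeps the log argument in (0, 1].
function sampleNormal(rng: Rng): number {
  return Math.sqrt(-2 * Math.log(1 - rng())) * Math.cos(2 * Math.PI * rng());
}

// Gamma(shape, 1) via Marsaglia-Tsang, with the usual boost for shape < 1.
function sampleGamma(shape: number, rng: Rng): number {
  if (shape < 1) {
    return sampleGamma(shape + 1, rng) * Math.pow(1 - rng(), 1 / shape);
  }
  const d = shape - 1 / 3;
  const c = 1 / Math.sqrt(9 * d);
  for (;;) {
    const x = sampleNormal(rng);
    const base = 1 + c * x;
    if (base <= 0) continue;
    const v = base * base * base;
    if (Math.log(1 - rng()) < 0.5 * x * x + d * (1 - v + Math.log(v))) return d * v;
  }
}

// Beta(alpha, beta) as a ratio of two gamma draws.
function sampleBeta(alpha: number, beta: number, rng: Rng): number {
  const x = sampleGamma(alpha, rng);
  const y = sampleGamma(beta, rng);
  return x / (x + y);
}

// Draws one sample per workflow and returns the workflow with the highest draw.
function selectWorkflow(candidates: Workflow[], rng: Rng = Math.random): Workflow | undefined {
  let best: Workflow | undefined;
  let bestDraw = -Infinity;
  for (const workflow of candidates) {
    const draw = sampleBeta(workflow.alpha, workflow.beta, rng);
    if (draw > bestDraw) {
      bestDraw = draw;
      best = workflow;
    }
  }
  return best;
}

// Assumption: outcomeScore in [0, 1]; success mass goes to alpha, the rest to beta.
function recordOutcome(workflow: Workflow, outcomeScore: number): Workflow {
  const score = Math.min(1, Math.max(0, outcomeScore));
  return { ...workflow, alpha: workflow.alpha + score, beta: workflow.beta + (1 - score) };
}
```

Starting every workflow at alpha = 1, beta = 1 (a uniform prior) would let untested specs be explored early, while workflows with accumulated successes win most draws.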
Implementation status: Table schema reserved. No logic implemented. Track D scope.
Invariants preserved
| Invariant | Status |
|---|---|
| #2 KV-only hot path | Preserved. Memory tiers are NOT in the tool-proxy hot path. Clarified by this ADR. |
| #6 Request path never touches a DO | Preserved. AgentRuntime DO is only reachable from the admin agent-run path, CF-Access gated. |
| #9 CF Cron for scheduled work | Preserved. Episodic memory pruning and semantic memory consolidation use CF Cron. |
| #15 Sources of truth | Preserved. Episodic → D1. Semantic → Vectorize. Procedural → DO SQLite (agent-local, not a global source of truth). |
Acceptance criteria
Tier 1 (Track A Phase A.2):
- turn_state row created on first /run call; updated on completion.
- run_log row written for every agent turn.
- agent_working:* KV keys set/read correctly in the SDR Agent loop.
Tier 2 (Track A Phase A.2):
- agent_runs row created via ctx.waitUntil(startRun(...)), verified by admin query after first run.
- memory_episodes rows written for each turn, verified by count query.
- Context recall: agent receives last-10 episodes at turn start (logged to agent_runs.input).
- Pruning cron: Sunday 03:00 UTC job removes rows older than 90 days (tested with synthetic old data).
Tier 3 (Track D ship gate):
- recall("responded positively to direct ROI framing") returns relevant facts in top-5 after 10+ runs.
- learn(tenant_id, fact) completes without error and fact appears in subsequent recall results.
Consequences
Positive:
- Agent memory architecture is formally documented and sequenced. Track A + Track D can be built in parallel without design drift.
- Invariant #2 clarification prevents future contributors from misreading “KV-only” as a prohibition on agent memory.
- The episodic memory schema (migration 0013) is already on main, so Track A can build against it immediately.
- Tier 3 and Tier 4 are deferred to Track D without creating tech debt: the schema is reserved, the interface is documented.
Negative / accepted risk:
- Three different storage systems (KV, D1, Vectorize) for agent memory increase operational surface. Mitigation: each tier has a single owning module (agent-memory.ts, memory-semantic.ts, memory-procedural.ts), with no cross-tier reads in a single call path.
- 90-day D1 episodic retention may exhaust D1 storage on a high-volume tenant. Mitigation: the pruning cron. Hard limit: 100K memory_episodes rows per tenant; warn at 80K via the daily brief cron.
- Track D semantic memory requires a Vectorize namespace per tenant. At 100 tenants this is 100 namespaces. CF Vectorize currently has a limit of 100 indexes per account (as of 2025-Q4 docs). Mitigation: check current limit via Cloudflare dashboard before Track D ships; if limit is binding, use sub-namespacing within a single index keyed by a tenant_id metadata filter.
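The fallback in the last mitigation can be kept behind one switch so the recall code does not change if tenant isolation moves from namespaces to a metadata filter. The sketch below is an assumption about how that could be arranged: the TenantIsolation type and buildQueryOptions helper are hypothetical, the option names (topK, returnMetadata, namespace, filter) follow the Vectorize query options as understood here, and the filter mode presumes tenant_id is filterable metadata on the shared index.

```typescript
// Hypothetical switch between the two isolation strategies named in this ADR.
type TenantIsolation = "namespace" | "metadata-filter";

interface SemanticQueryOptions {
  topK: number;
  returnMetadata: "all";
  namespace?: string;
  filter?: { tenant_id: string };
}

// Builds query options that scope a similarity query to a single tenant.
function buildQueryOptions(
  mode: TenantIsolation,
  tenantId: string,
  topK: number = 5,
): SemanticQueryOptions {
  const base = { topK, returnMetadata: "all" as const };
  return mode === "namespace"
    ? { ...base, namespace: `agent_memory_semantic:${tenantId}` }
    : { ...base, filter: { tenant_id: tenantId } };
}
```

Because the Tier 3 fact metadata already carries tenant_id, the filter mode would need no schema change, only a different query scope.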