Three-Tier Agent Memory Model

ADR-043 — Three-Tier Agent Memory Model

Status: Accepted
Date: 2026-05-07
Decider: Mishaal Murawala (engineering sequencing delegated to Claude Code per ADR-040)
Supersedes: Implicit KV-only memory assumption in invariant #2 (for agent context, not gateway hot path)
Related: ADR-040 Tracks A + D, ADR-042, ASCEND_OPERATOR_OS_VISION.md §4 (Memory Architecture), src/lib/agent-memory.ts
Invariant changed: #2 (clarification, not relaxation — see below)

Context

Invariant #2 states “KV-only hot path on the gateway.” This was designed to prevent D1 reads from being added to the gateway request path for tool proxying. It was not designed to govern agent memory — agents did not exist when invariant #2 was written.

The Operator OS vision (§4) defines a four-tier memory model: Working, Episodic, Semantic, Procedural. The V5 gateway currently implements only the first two, and both only partially:

src/lib/agent-memory.ts exists and implements partial Tier 1 (working memory via KV TTL’d keys) and partial Tier 2 (episodic write to D1 memory_episodes).
Migration 0013_agent_runs.sql (Track A, 2026-05-07) adds the agent_runs and memory_episodes D1 tables.
There is no Tier 3 (semantic recall via Vectorize) or Tier 4 (procedural / bandit workflows) implementation.

Without a formal ADR, the line “KV-only” in invariant #2 creates ambiguity: does it prohibit agent memory from using D1 and Vectorize? This ADR resolves the ambiguity and formalizes the three-tier model that ships in Q1 (Tier 4 procedural is Q1-partial, completing in Track D weeks 6–12).

Decision

Adopt a three-tier agent memory model for Q1:

Tier	Name	Storage	Scope	TTL / Retention	When accessed
1	Working memory	DO SQLite (`turn_state`, `run_log` tables in `AgentRuntime`) + KV (`agent_working:{tenant}:{instance_id}:{key}`, 1h TTL)	Per agent instance, ephemeral	KV: 1h. DO SQLite: cleared on DO eviction (~24h idle)	In-turn: every agent step reads/writes working state
2	Episodic memory	D1 `agent_runs` + `memory_episodes` tables (cold path)	Per tenant, persistent	90 days, then pruned by weekly Cron	Cross-turn recall: agent loads last N episodes at turn start. Admin/eval queries. NOT in the gateway tool-proxy hot path.
3	Semantic memory	Vectorize namespace `agent_memory_semantic:{tenant_id}` (per-tenant)	Per tenant, persistent	Indefinite (explicit delete only)	Context assembly: called before the agent turn begins, outside the tool-proxy hot path

Tier 4 (procedural / bandit workflows) is implemented in Q1 as a stub table (procedural_workflows in DO SQLite, reserved in Track A Phase A.1) with Track D wiring the full bandit selector in weeks 6–12.

Why this does NOT violate invariant #2

Invariant #2 governs the gateway tool-proxy hot path: the sequence that handles a POST /api/v1/tools/:tool_name or MCP tool call from Cursor/Codex/claude.ai. That path is:

KV auth lookup → KV token read → KV config lookup → outbound fetch → response

None of the memory tiers are accessed in this path. Specifically:

Tier 1 (DO SQLite + KV): Accessed by AgentRuntime DO in the agent run path (/v1/agents/:tenant/:agent_type/run). This is an admin-only surface (CF-Access gated), not the tool-proxy hot path.
Tier 2 (D1 cold path): Written via ctx.waitUntil() (async, non-blocking) after the agent turn. Read from admin/eval endpoints and, for cross-run recall, by AgentRuntime at turn start. Never in the tool-proxy hot path.
Tier 3 (Vectorize): Called in agent context assembly before the agent turn begins. The context assembly happens inside AgentRuntime.fetch() — again, the agent run path, not the tool-proxy hot path.

The KV-only guarantee of invariant #2 applies specifically to src/handlers/, src/core/, and src/tools/ — the gateway tool-proxy path files. Agent memory lives in src/lib/agent-memory.ts, src/lib/memory-semantic.ts, and DO internals — none of which are in the tool-proxy hot path.

This ADR adds a clarifying sentence to invariant #2 (see below) to make this explicit, so future contributors do not misread “KV-only hot path” as a prohibition on memory tiers.

Invariant #2 change (clarification)

Old text:

KV-only hot path on the gateway. Request latency budgeted at ≤10 ms overhead. D1 only in cold paths (error_ledger, kv_audit, decision_log). Context-worker D1 is cold path by definition — it’s the context worker’s own cold path.

New text (effective when this ADR merges):

KV-only hot path on the gateway tool-proxy path (src/handlers/, src/core/, src/tools/). Request latency budgeted at ≤15 ms (non-agent) or ≤30 ms (agent assembly, per ADR-041). D1 only in cold paths: error_ledger, kv_audit, decision_log, agent_runs, memory_episodes, eval_datasets. Vectorize only in agent context assembly (outside the tool-proxy hot path). Context-worker D1 is cold path by definition.

The change: adds the agent-specific D1 tables to the allowed cold-path list, adds Vectorize with the explicit constraint that it only appears outside the tool-proxy hot path, and aligns the latency numbers with ADR-041.

Tier 1: Working memory

Storage: AgentRuntime DO SQLite (turn_state and run_log tables) + KV with TTL.

Schema (DO SQLite): Defined in the blockConcurrencyWhile initializer of src/do/agent-runtime.ts. Tables: turn_state, run_log, procedural_workflows (stub; Track D fills it in).

KV working memory: agent_working:{tenant_id}:{instance_id}:{key} → arbitrary JSON, 1h TTL. Used for lightweight cross-step state that doesn’t need SQL (e.g., “current_contact_id”, “email_draft_v2”).

Size constraint: KV values ≤25 MB (CF limit). Working memory is not intended for large payloads: ephemeral identifiers and small JSON objects only. Per-instance budget: <128 KB total across all KV keys, which keeps working state far below the 128 MB DO memory limit.
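The key format, TTL, and per-instance budget above can be sketched as follows. This is a minimal illustration, not the contents of agent-memory.ts: KVLike, workingKey, utf8Bytes, and putWorking are hypothetical names, and tracking the byte tally in the caller is an assumption, since KV does not report per-prefix totals.

```typescript
// Hypothetical minimal subset of a Workers KV binding, declared so the sketch is self-contained.
interface KVLike {
  put(key: string, value: string, options?: { expirationTtl?: number }): Promise<void>;
  get(key: string): Promise<string | null>;
}

const WORKING_TTL_SECONDS = 60 * 60; // 1h TTL from the Tier 1 definition
const INSTANCE_BUDGET_BYTES = 128 * 1024; // per-instance budget of <128 KB

// Builds agent_working:{tenant_id}:{instance_id}:{key}.
// Rejecting ":" in the id segments is an assumption that keeps keys unambiguous.
function workingKey(tenantId: string, instanceId: string, key: string): string {
  for (const segment of [tenantId, instanceId]) {
    if (segment.length === 0 || segment.includes(":")) {
      throw new Error(`invalid key segment: "${segment}"`);
    }
  }
  return `agent_working:${tenantId}:${instanceId}:${key}`;
}

// UTF-8 byte length of a string.
function utf8Bytes(text: string): number {
  return new TextEncoder().encode(text).length;
}

// Writes one value with the 1h TTL and returns the caller's updated byte tally.
async function putWorking(
  kv: KVLike,
  tenantId: string,
  instanceId: string,
  key: string,
  value: unknown,
  usedBytes: number,
): Promise<number> {
  const json = JSON.stringify(value);
  const total = usedBytes + utf8Bytes(json);
  if (total >= INSTANCE_BUDGET_BYTES) {
    throw new Error(`working memory budget exceeded: ${total} bytes`);
  }
  await kv.put(workingKey(tenantId, instanceId, key), json, {
    expirationTtl: WORKING_TTL_SECONDS,
  });
  return total;
}
```

Failing loudly when the budget is exceeded is one possible policy; logging and dropping the write would be an equally reasonable choice for ephemeral state.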

Implementation status: Partial — agent-runtime.ts Track A Phase A.1 implements turn_state and run_log. agent_working:* KV writes are Track A Phase A.2 (SDR Agent loop).

Tier 2: Episodic memory

Storage: D1 agent_runs + memory_episodes tables (migration 0013_agent_runs.sql).

Retention: 90 days. A weekly CF Cron job (slot: 0 3 * * 0, Sunday 03:00 UTC) prunes memory_episodes rows older than 90 days per tenant.
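A minimal sketch of the retention arithmetic the pruning job would need. It assumes created_at is stored as unix seconds, which this ADR does not state for memory_episodes; pruneCutoffSeconds and buildPruneStatement are hypothetical names.

```typescript
const RETENTION_DAYS = 90;
const SECONDS_PER_DAY = 24 * 60 * 60;

// Oldest created_at value (unix seconds, an assumed unit) that survives pruning.
function pruneCutoffSeconds(nowMs: number): number {
  return Math.floor(nowMs / 1000) - RETENTION_DAYS * SECONDS_PER_DAY;
}

interface PruneStatement {
  sql: string;
  params: [number];
}

// Parameterized DELETE for the weekly job. A single cutoff covers every tenant,
// because the 90-day window is the same for all of them.
function buildPruneStatement(nowMs: number): PruneStatement {
  return {
    sql: "DELETE FROM memory_episodes WHERE created_at < ?",
    params: [pruneCutoffSeconds(nowMs)],
  };
}
```

Taking nowMs as a parameter rather than calling Date.now() inside makes the cutoff easy to exercise with synthetic old data, as the acceptance criteria require.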

Writes: Via ctx.waitUntil() from AgentRuntime — always async, never blocks the agent turn response. Implemented in src/lib/agent-memory.ts (startRun, appendEpisode, completeRun, failRun).
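The non-blocking write pattern can be illustrated as below. This is a sketch under assumptions: WaitUntilContext is a minimal stand-in for the runtime context, and EpisodeWriter and finishTurn are hypothetical names, not the signatures in agent-memory.ts.

```typescript
// Minimal stand-in for the runtime context; only waitUntil is used here.
interface WaitUntilContext {
  waitUntil(promise: Promise<unknown>): void;
}

interface Episode {
  runId: string;
  content: string;
}

// Hypothetical writer signature, standing in for an episodic write helper.
type EpisodeWriter = (episode: Episode) => Promise<void>;

// Returns the turn result immediately and hands the episodic write to waitUntil.
// The write error is caught so a failed write never surfaces as an unhandled rejection.
function finishTurn<T>(
  ctx: WaitUntilContext,
  write: EpisodeWriter,
  episode: Episode,
  result: T,
  onError: (err: unknown) => void = (err) => console.error("episode write failed", err),
): T {
  ctx.waitUntil(write(episode).catch(onError));
  return result;
}
```

Swallowing the error after reporting it is a deliberate trade-off in this sketch: the agent turn has already responded, so there is nothing left to fail.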

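Reads: From admin/eval surfaces (GET /admin/agents/:tenant/runs, eval pipeline queries) and from the agent’s own cross-run recall at turn start, described next. All reads are parameterized D1 queries outside the tool-proxy path, so the D1-only-in-cold-paths rule of invariant #2 is preserved.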

Cross-run recall by the agent: At turn start, the agent receives the last N episodes as context. This is assembled in AgentRuntime.handleRun() via a D1 query before dispatching to runSdrAgent. The query is: SELECT * FROM memory_episodes WHERE tenant_id = ? AND agent_type = ? ORDER BY created_at DESC LIMIT 10. This is cold-path by definition (it happens inside the DO/agent run path, not the gateway tool-proxy path).
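The recall query can be issued as a parameterized statement along these lines. D1Like and D1StatementLike are assumed structural subsets of the D1 binding (prepare, bind, all), declared locally so the sketch stands alone; loadRecentEpisodes is a hypothetical name, and returning rows oldest-first is a design choice of this sketch, not something the ADR specifies.

```typescript
// Assumed structural subset of a D1 binding: prepare(sql).bind(...).all().
interface D1StatementLike {
  bind(...values: unknown[]): D1StatementLike;
  all<T>(): Promise<{ results: T[] }>;
}
interface D1Like {
  prepare(sql: string): D1StatementLike;
}

// Only the columns named in the ADR's query are typed; others pass through untyped.
interface EpisodeRow {
  tenant_id: string;
  agent_type: string;
  created_at: number;
  [column: string]: unknown;
}

const RECALL_SQL =
  "SELECT * FROM memory_episodes WHERE tenant_id = ? AND agent_type = ? " +
  "ORDER BY created_at DESC LIMIT ?";

// Loads the last `limit` episodes (10 in the ADR) for one tenant and agent type.
// The rows are reversed so the context reads oldest-first.
async function loadRecentEpisodes(
  db: D1Like,
  tenantId: string,
  agentType: string,
  limit: number = 10,
): Promise<EpisodeRow[]> {
  const { results } = await db
    .prepare(RECALL_SQL)
    .bind(tenantId, agentType, limit)
    .all<EpisodeRow>();
  return results.slice().reverse();
}
```

Binding the limit as a parameter keeps N adjustable without string-building the SQL.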

Implementation status: Tables exist (migration 0013). agent-memory.ts implements write path (startRun, completeRun, failRun). Read-back for context recall is Track A Phase A.2.

Tier 3: Semantic memory

Storage: Vectorize namespace agent_memory_semantic:{tenant_id} — one namespace per tenant.

Schema: Each vector entry represents a fact or observation the agent has learned. Metadata:

{
  "fact_id": "<uuid>",
  "tenant_id": "<tenant>",
  "agent_type": "sdr",
  "subject": "contact:hubspot_id:123456",
  "predicate": "responded_positively_to",
  "object": "direct_roi_framing",
  "confidence": 0.82,
  "learned_from_run_id": "<run_uuid>",
  "created_at": 1746700000
}
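The metadata shape above maps to a TypeScript type with a runtime guard, sketched below. The field names come from the example; the value ranges (confidence within 0..1, created_at as positive unix seconds) are assumptions inferred from the sample values, and isFact is a hypothetical helper.

```typescript
// Mirrors the metadata example; types are inferred from the sample values.
type Fact = {
  fact_id: string;
  tenant_id: string;
  agent_type: string;
  subject: string;
  predicate: string;
  object: string;
  confidence: number;
  learned_from_run_id: string;
  created_at: number;
};

const STRING_FIELDS = [
  "fact_id",
  "tenant_id",
  "agent_type",
  "subject",
  "predicate",
  "object",
  "learned_from_run_id",
] as const;

// Runtime guard for untrusted metadata, e.g. values read back from a vector store.
function isFact(value: unknown): value is Fact {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  const stringsOk = STRING_FIELDS.every((field) => {
    const v = record[field];
    return typeof v === "string" && v.length > 0;
  });
  const confidence = record["confidence"];
  const createdAt = record["created_at"];
  return (
    stringsOk &&
    typeof confidence === "number" &&
    confidence >= 0 &&
    confidence <= 1 && // assumed range
    typeof createdAt === "number" &&
    Number.isInteger(createdAt) &&
    createdAt > 0 // assumed unix seconds
  );
}
```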

Interface (Track D implementation): src/lib/memory-semantic.ts

recall(tenant_id, query, opts): Fact[] — Vectorize similarity query, returns top-k facts
learn(tenant_id, fact) — embeds and upserts a fact via ctx.waitUntil
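One possible shape for these two functions is sketched below; it is not the Track D implementation. VectorIndexLike is an assumed minimal subset of a vector index binding (query and upsert with a namespace option), Embed is a hypothetical embedding function, and passing both explicitly is a choice of this sketch; the documented signatures take only tenant_id plus the query or fact. The Fact type here is trimmed to the fields the sketch uses.

```typescript
// Assumed minimal shapes; a real binding exposes more fields than this sketch uses.
type Metadata = Record<string, string | number>;
interface VectorRecord {
  id: string;
  values: number[];
  namespace?: string;
  metadata?: Metadata;
}
interface VectorMatch {
  id: string;
  score: number;
  metadata?: Metadata;
}
interface VectorIndexLike {
  query(
    vector: number[],
    options: { topK: number; namespace: string; returnMetadata: boolean },
  ): Promise<{ matches: VectorMatch[] }>;
  upsert(vectors: VectorRecord[]): Promise<unknown>;
}

// Hypothetical embedding function; the ADR does not name an embedding model.
type Embed = (text: string) => Promise<number[]>;

type Fact = {
  fact_id: string;
  tenant_id: string;
  subject: string;
  predicate: string;
  object: string;
  confidence: number;
};

// Per-tenant namespace name from the Tier 3 storage definition.
const namespaceFor = (tenantId: string): string => `agent_memory_semantic:${tenantId}`;

// Embeds the fact as "subject predicate object" (an assumed text form) and upserts it.
async function learn(index: VectorIndexLike, embed: Embed, fact: Fact): Promise<void> {
  const values = await embed(`${fact.subject} ${fact.predicate} ${fact.object}`);
  await index.upsert([
    { id: fact.fact_id, values, namespace: namespaceFor(fact.tenant_id), metadata: { ...fact } },
  ]);
}

// Similarity query scoped to one tenant's namespace; returns the top-k matches.
async function recall(
  index: VectorIndexLike,
  embed: Embed,
  tenantId: string,
  query: string,
  topK: number = 5,
): Promise<VectorMatch[]> {
  const vector = await embed(query);
  const { matches } = await index.query(vector, {
    topK,
    namespace: namespaceFor(tenantId),
    returnMetadata: true,
  });
  return matches;
}
```

A caller in the agent run path would wrap learn in ctx.waitUntil, as the interface note says, so the upsert never delays the turn response.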

When accessed: Context assembly before the agent turn begins, NOT in the tool-proxy hot path.

Implementation status: NOT YET IMPLEMENTED. Track D (weeks 6–12) ships this. A per-tenant Vectorize namespace pattern (tenant_{tenant_id}) already exists in the platform, where it is used by the Context Worker. Track D applies the same per-tenant approach to agent memory under the agent_memory_semantic:{tenant_id} name. This ADR formalizes the intent and schema so Track A + Track D don’t diverge on the data model.

Migration trigger for Track D: Track A live in production + agent_runtime DO deployed + memory_episodes accumulating data for ≥1 week.

Tier 4: Procedural memory (stub in Q1, full in Track D)

Storage: procedural_workflows table in AgentRuntime DO SQLite (reserved in Track A Phase A.1).

Schema: workflow_id, agent_type, task_type, spec TEXT, alpha REAL, beta REAL — Thompson-sampling bandit over workflow specs.

Interface (Track D): src/lib/memory-procedural.ts

selectWorkflow(tenant_id, agent_type, task_type) — sample from Beta(alpha, beta) per workflow
recordOutcome(workflow_id, outcome_score) — update alpha/beta based on outcome
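A minimal sketch of Thompson sampling over the alpha/beta columns, to make the intended selector concrete. It is illustrative only: the Beta draw is built from two Gamma draws (Marsaglia-Tsang), the update rule assumes outcome_score lies in [0, 1], and the function signatures take workflow rows directly rather than the ids in the documented interface.

```typescript
type Rng = () => number; // uniform on [0, 1)

interface WorkflowArm {
  workflow_id: string;
  alpha: number;
  beta: number;
}

// Standard normal draw via Box-Muller.
function sampleNormal(rng: Rng): number {
  const u1 = 1 - rng(); // shift to (0, 1] so log(u1) is finite
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * rng());
}

// Gamma(shape, 1) draw via Marsaglia-Tsang, with the usual boost for shape < 1.
function sampleGamma(shape: number, rng: Rng): number {
  if (shape <= 0) throw new Error("shape must be positive");
  if (shape < 1) {
    return sampleGamma(shape + 1, rng) * Math.pow(1 - rng(), 1 / shape);
  }
  const d = shape - 1 / 3;
  const c = 1 / Math.sqrt(9 * d);
  for (;;) {
    const x = sampleNormal(rng);
    const v = Math.pow(1 + c * x, 3);
    if (v <= 0) continue;
    const u = 1 - rng();
    if (Math.log(u) < 0.5 * x * x + d - d * v + d * Math.log(v)) return d * v;
  }
}

// Beta(alpha, beta) draw as X / (X + Y) with X ~ Gamma(alpha), Y ~ Gamma(beta).
function sampleBeta(alpha: number, beta: number, rng: Rng): number {
  const x = sampleGamma(alpha, rng);
  const y = sampleGamma(beta, rng);
  return x / (x + y);
}

// Thompson sampling: draw once per workflow and pick the highest draw.
function selectWorkflow(arms: WorkflowArm[], rng: Rng = Math.random): WorkflowArm {
  if (arms.length === 0) throw new Error("no workflows to select from");
  let best = arms[0];
  let bestDraw = -1;
  for (const arm of arms) {
    const draw = sampleBeta(arm.alpha, arm.beta, rng);
    if (draw > bestDraw) {
      bestDraw = draw;
      best = arm;
    }
  }
  return best;
}

// Assumed update rule: treat outcome_score in [0, 1] as a fractional success.
function recordOutcome(arm: WorkflowArm, outcomeScore: number): WorkflowArm {
  const score = Math.min(1, Math.max(0, outcomeScore));
  return { ...arm, alpha: arm.alpha + score, beta: arm.beta + (1 - score) };
}
```

Passing the random source as a parameter lets tests seed the selector; production code would fall back to Math.random.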

Implementation status: Table schema reserved. No logic implemented. Track D scope.

Invariants preserved

Invariant	Status
#2 KV-only hot path	Preserved. Memory tiers are NOT in the tool-proxy hot path. Clarified by this ADR.
#6 Request path never touches a DO	Preserved. `AgentRuntime` DO is only reachable from the admin agent-run path, CF-Access gated.
#9 CF Cron for scheduled work	Preserved. Episodic memory pruning and semantic memory consolidation use CF Cron.
#15 Sources of truth	Preserved. Episodic → D1. Semantic → Vectorize. Procedural → DO SQLite (agent-local, not a global source of truth).

Acceptance criteria

Tier 1 (Track A Phase A.2):

turn_state row created on first /run call; updated on completion.
run_log row written for every agent turn.
agent_working:* KV keys set/read correctly in the SDR Agent loop.

Tier 2 (Track A Phase A.2):

agent_runs row created via ctx.waitUntil(startRun(...)) — verified by admin query after first run.
memory_episodes rows written for each turn — verified by count query.
Context recall: agent receives last-10 episodes at turn start (logged to agent_runs.input).
Pruning cron: Sunday 03:00 UTC job removes rows older than 90 days (tested with synthetic old data).

Tier 3 (Track D ship gate):

recall(tenant_id, "responded positively to direct ROI framing") returns relevant facts in top-5 after 10+ runs.
learn(tenant_id, fact) completes without error and fact appears in subsequent recall results.

Consequences

Positive:

Agent memory architecture is formally documented and sequenced. Track A + Track D can be built in parallel without design drift.
Invariant #2 clarification prevents future contributors from misreading “KV-only” as a prohibition on agent memory.
The episodic memory schema (migration 0013) is already on main — Track A can build against it immediately.
Tier 3 and Tier 4 are deferred to Track D without creating tech debt — the schema is reserved, the interface is documented.

Negative / accepted risk:

Agent memory spans several storage systems (DO SQLite, KV, D1, Vectorize), which increases operational surface. Mitigation: each tier has a single owning module (agent-memory.ts, memory-semantic.ts, memory-procedural.ts), with no cross-tier reads in a single call path.
90-day D1 episodic retention may push a high-volume tenant toward D1 storage limits. Mitigation: the pruning cron. Hard limit: 100K memory_episodes rows per tenant; warn at 80K via the daily brief cron.
Track D semantic memory requires a Vectorize namespace per tenant, so 100 tenants means 100 namespaces. If each namespace is provisioned as its own index, that reaches the CF Vectorize limit of 100 indexes per account (as of 2025-Q4 docs). Mitigation: check the current limit via the Cloudflare dashboard before Track D ships; if the limit is binding, use sub-namespacing within a single index, keyed by a tenant_id metadata filter.
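The episodic row thresholds in the retention risk above (warn at 80K, hard limit at 100K rows per tenant) reduce to a small classification that the daily brief cron could run against a per-tenant count. The sketch below is illustrative; episodeBudgetStatus and the status names are hypothetical.

```typescript
const WARN_ROWS = 80000; // warn threshold per tenant
const HARD_LIMIT_ROWS = 100000; // hard limit per tenant

type EpisodeBudgetStatus = "ok" | "warn" | "over_limit";

// Classifies a tenant's memory_episodes row count against the two thresholds.
function episodeBudgetStatus(rowCount: number): EpisodeBudgetStatus {
  if (rowCount >= HARD_LIMIT_ROWS) return "over_limit";
  if (rowCount >= WARN_ROWS) return "warn";
  return "ok";
}
```

Whether the thresholds are inclusive is not stated in this ADR; the sketch treats reaching a threshold as crossing it.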