ADR-040 — Operator OS Q1 = Five Parallel Tracks

Status: Accepted
Date: 2026-05-07
Decider: Mishaal Murawala (delegated engineering sequencing to Claude Code as engineering lead)
Supersedes: none
Related: docs/architecture/ASCEND_OPERATOR_OS_VISION.md, docs/plans/OPERATOR-OS-Q1-FOUNDATION.md
Pre-listed follow-up ADRs: ADR-041 (hot-path budget 10ms→30ms), ADR-042 (capability-index replaces static MCP registration), ADR-043 (memory tiers expand beyond KV-only), ADR-044 (composites deferred)

Context

The Ascend Operator OS Vision (ADOPTED 2026-05-07, see vision doc Appendix B) commits us to a multi-tenant agent platform with a capability index, four memory tiers, an inference router, a per-tenant eval system, and a 19-agent topology. The vision retires several V5 invariants (10ms hot-path budget, 35-tool ceiling, KV-only memory, static MCP registration, composite-tools roadmap), each of which needs its own follow-up ADR.

Mishaal’s directive (2026-05-07): “Sequence is up to you. You are the engineering lead. You figure out the best way to do it. Ideally, parallel process as many things as you can that are non-overlapping, but ultimately how you implement is up to you.”

Q1 needs to ship the foundation in 12 weeks while:

not breaking the existing V5 substrate (production traffic continues),
not violating any of the 15 existing invariants until the ADR that retires it ships,
maximizing parallelism across non-overlapping subsystems,
giving Mishaal at least one demoable per-tenant agent within 6 weeks.

Decision

Five parallel tracks, with explicit dependency edges. Each track has an independent owner, an independent acceptance test, and an independent merge cadence.

Track A (weeks 1–6):  per-tenant agent DO + 1 SDR Agent end-to-end
                          │
                          ├── depends on (none — uses today's static MCP tools)
                          └── unblocks fund agents in Track C

Track B (weeks 1–8):  capability index (Vectorize + retrieval helper)
                          │
                          ├── depends on (none — pure additive layer)
                          └── unblocks Track A migration to dynamic retrieval (week 7+)

Track C (weeks 4–12): per-fund agent DO + cross-portco D1 view + Op Partner dashboard
                          │
                          ├── depends on Track A (DO base class) by week 4
                          └── unblocks fund-level land motion (Q2)

Track D (weeks 6–12): semantic + procedural memory tiers
                          │
                          ├── depends on Track A (agent runtime to call into) by week 6
                          └── unblocks compounding loops 3+5 (procedural / pattern bank)

Track E (weeks 8–12): per-tenant evals + A/B routing + Cerebras triage tier
                          │
                          ├── depends on Track A live traffic (need real runs to grade) by week 8
                          └── unblocks outcome-pricing commit (Q2)

Tracks A and B are fully parallel from week 1. Tracks C/D/E stagger in as their dependency edges land.
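
The diagram above can also be expressed as data, which makes the stagger easy to check mechanically. The following is a minimal sketch, not part of the plan itself: the `Track` shape and the `activeTracks` and `validateEdges` helpers are hypothetical names introduced for illustration, and the week ranges are copied from the diagram.

```typescript
// Hypothetical encoding of the track diagram; shapes and names are illustrative.
type TrackId = "A" | "B" | "C" | "D" | "E";

interface Track {
  id: TrackId;
  startWeek: number;
  endWeek: number;
  dependsOn: TrackId[];
}

const TRACKS: Track[] = [
  { id: "A", startWeek: 1, endWeek: 6, dependsOn: [] },
  { id: "B", startWeek: 1, endWeek: 8, dependsOn: [] },
  { id: "C", startWeek: 4, endWeek: 12, dependsOn: ["A"] },
  { id: "D", startWeek: 6, endWeek: 12, dependsOn: ["A"] },
  { id: "E", startWeek: 8, endWeek: 12, dependsOn: ["A"] },
];

// Tracks whose week range includes the given week.
function activeTracks(week: number): TrackId[] {
  return TRACKS.filter((t) => week >= t.startWeek && week <= t.endWeek).map((t) => t.id);
}

// Reports edges where a track would start before the track it depends on.
function validateEdges(tracks: Track[]): string[] {
  const byId = new Map<TrackId, Track>(tracks.map((t): [TrackId, Track] => [t.id, t]));
  const problems: string[] = [];
  for (const t of tracks) {
    for (const dep of t.dependsOn) {
      const d = byId.get(dep);
      if (!d) {
        problems.push(`${t.id} depends on unknown track ${dep}`);
      } else if (d.startWeek > t.startWeek) {
        problems.push(`${t.id} starts before its dependency ${dep}`);
      }
    }
  }
  return problems;
}
```

Under this encoding, `activeTracks(1)` yields only A and B, matching the statement that those two tracks are fully parallel from week 1.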

Track scopes (acceptance tests)

Track A — Per-tenant agent DO + 1 SDR Agent (weeks 1–6)

Goal: ship one production agent end-to-end. SDR Agent is the wedge: it has the clearest outcome metric (qualified meeting) and Mishaal can dogfood it against Ascend’s own outbound.

Scope:

New SQLite-backed DO class AgentRuntime keyed by (tenant_id, agent_type, agent_instance_id).
wrangler.toml migration adds AgentRuntime to new_sqlite_classes.
src/do/agent-runtime.ts — turn loop, working memory (KV TTL’d), episodic memory write to D1 memory_episodes (new table, migration 0008).
src/agents/sdr/ — system prompt, ICP scorer call, draft email + send (gated), reply triage, meeting-booked detection.
Langfuse trace export via existing AI Gateway.
Eval harness shadow mode: every SDR Agent run produces an (input, output) tuple in D1 agent_runs (new table, migration 0008).
Surface: POST /v1/agents/:tenant/sdr/run admin route (CF-Access gated).
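
One way the (tenant_id, agent_type, agent_instance_id) key could be flattened into a single Durable Object name is sketched below. This is an illustration under assumptions: the `agentRuntimeName` helper, the `:` delimiter and the allowed-character rule are all hypothetical and are not specified in this ADR.

```typescript
// Hypothetical helper: the ADR only states the key components, not how they are joined.
interface AgentRuntimeKey {
  tenant_id: string;
  agent_type: string;
  agent_instance_id: string;
}

// Assumed rule: components may not contain the delimiter, so two different
// keys can never flatten to the same name.
const KEY_PART = /^[A-Za-z0-9_-]+$/;

function agentRuntimeName(key: AgentRuntimeKey): string {
  const parts = [key.tenant_id, key.agent_type, key.agent_instance_id];
  for (const part of parts) {
    if (!KEY_PART.test(part)) {
      throw new Error(`invalid key component: ${JSON.stringify(part)}`);
    }
  }
  return parts.join(":");
}
```

In a Worker, a name like this would typically be passed to the Durable Object namespace's `idFromName` to address the instance. The binding name for AgentRuntime is not given in this ADR.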

Acceptance:

One SDR Agent instance for tenant ascend (dogfood) live in production.
50+ runs in agent_runs table within 7 days of go-live.
A/B vs. generic-baseline reply rate logged to D1; report after 4 weeks.
Hot-path budget for non-agent gateway requests stays ≤10ms p99 (no regression to existing invariant 10).

Track B — Capability index (weeks 1–8, parallel to A)

Goal: every tool retrievable by embedding similarity, with cost/latency/success priors. No agent code uses it yet — Track A still calls today’s static tools — but the index is queryable and ready.

Scope:

New Vectorize namespace capability_index (separate from existing tenant_* namespaces).
scripts/embed-tool-catalog.ts — reads src/config/providers.ts + src/handlers/mcp.ts, embeds (description + scope + provider) per tool, writes vectors + metadata to Vectorize.
KV mirror capability_index:{tool_name} with the priors block (schema in vision doc §3.2).
src/lib/capability-retrieval.ts — retrieveCapabilities(intent: string, opts: {top_k, max_cost_usd, max_latency_ms_p99}): ToolCandidate[].
Priors writer: nightly CF Cron job reads last 24h of agent_runs + gateway audit logs, updates per-tool success_rate / latency / cost.
Admin endpoint POST /admin/capabilities/reindex (CF-Access gated).
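
The ranking step inside a retrieval helper of this kind might look like the sketch below. It is a simplification under stated assumptions: it works on an in-memory list and an already-embedded intent vector rather than querying Vectorize, and the `IndexedTool` shape, the `rankCapabilities` name and the priors field names are hypothetical (the real priors schema lives in vision doc §3.2).

```typescript
// Illustrative shapes; field names are assumptions, not the vision-doc schema.
interface ToolPriors {
  cost_usd: number;
  latency_ms_p99: number;
  success_rate: number;
}

interface IndexedTool {
  tool_name: string;
  vector: number[];
  priors: ToolPriors;
}

interface ToolCandidate extends ToolPriors {
  tool_name: string;
  score: number;
}

interface RetrieveOpts {
  top_k: number;
  max_cost_usd: number;
  max_latency_ms_p99: number;
}

// Cosine similarity between two equal-length vectors; 0 if either is all zeros.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Drop tools that exceed the cost or latency budget, then rank the rest by similarity.
function rankCapabilities(
  intentVector: number[],
  index: IndexedTool[],
  opts: RetrieveOpts,
): ToolCandidate[] {
  return index
    .filter(
      (t) =>
        t.priors.cost_usd <= opts.max_cost_usd &&
        t.priors.latency_ms_p99 <= opts.max_latency_ms_p99,
    )
    .map((t) => ({ tool_name: t.tool_name, score: cosine(intentVector, t.vector), ...t.priors }))
    .sort((x, y) => y.score - x.score)
    .slice(0, opts.top_k);
}
```

Filtering on priors before ranking keeps over-budget tools out of the top-k entirely, rather than letting them take a slot and be discarded later.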

Acceptance:

All current 33 tools indexed.
retrieveCapabilities("score this account against ICP") returns icp_scorer-class tools in top-3 with non-trivial similarity score.
Priors update visibly within 24h of an agent run.
Non-goal: migrating Track A to dynamic retrieval is a follow-up after week 7; flipping the switch is not in Q1 scope.
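
The nightly priors update described in the scope could reduce a day of run records to per-tool statistics roughly as follows. This is a sketch only: the `RunRecord` and `ToolPriorsUpdate` shapes and the `computePriors` name are hypothetical, and the nearest-rank percentile is one reasonable choice rather than the method this ADR prescribes.

```typescript
// Hypothetical record shape; the real agent_runs and audit-log columns are not listed in this ADR.
interface RunRecord {
  tool_name: string;
  ok: boolean;
  latency_ms: number;
  cost_usd: number;
}

interface ToolPriorsUpdate {
  success_rate: number;
  latency_ms_p99: number;
  avg_cost_usd: number;
  sample_count: number;
}

// Nearest-rank percentile over an ascending, non-empty array.
function percentile(sortedAsc: number[], p: number): number {
  const rank = Math.ceil((p / 100) * sortedAsc.length);
  return sortedAsc[Math.max(0, rank - 1)];
}

// Groups records by tool and computes success rate, p99 latency and mean cost.
function computePriors(runs: RunRecord[]): Map<string, ToolPriorsUpdate> {
  const byTool = new Map<string, RunRecord[]>();
  for (const run of runs) {
    const list = byTool.get(run.tool_name) ?? [];
    list.push(run);
    byTool.set(run.tool_name, list);
  }
  const priors = new Map<string, ToolPriorsUpdate>();
  byTool.forEach((list, tool) => {
    const latencies = list.map((r) => r.latency_ms).sort((a, b) => a - b);
    priors.set(tool, {
      success_rate: list.filter((r) => r.ok).length / list.length,
      latency_ms_p99: percentile(latencies, 99),
      avg_cost_usd: list.reduce((sum, r) => sum + r.cost_usd, 0) / list.length,
      sample_count: list.length,
    });
  });
  return priors;
}
```

Carrying `sample_count` alongside the rates lets a consumer discount priors built from very few runs, which bears on the "too noisy" reversal trigger listed later in this ADR.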

Track C — Per-fund agent DO + cross-portco view + Op Partner dashboard (weeks 4–12)

Goal: prove fund-level land motion is technically real. Operating Partner can ask “how is Portco X tracking against the peer set” and get a useful answer.

Scope:

New DO class FundRuntime keyed by (fund_id, agent_type) — same SQLite-backed pattern as AgentRuntime.
D1 view (cold path) cross_portco_metrics_v — aggregates anonymized last-90d metrics from each portco’s agent_runs table by tenant_id × metric type.
Tenant-isolation contract: portco tenants never see other portcos. Fund tenant sees aggregated metrics only.
src/agents/fund/operating-partner-brief/ — agent that produces a 1-page Op Partner brief on demand for a given fund × portco.
Surface: minimal Cloudflare Pages dashboard (fund-dashboard/) — single-page, list-of-portcos + drill-into-brief. CF-Access gated.

Acceptance:

Fund tenant pointfield-demo configured with 2 mock portcos.
Op Partner brief renders for each.
Cross-portco D1 view query returns in <1s for ≤10 portcos.
Tenant-isolation test: portco tenant kahuna cannot read fund pointfield-demo’s view (403 verified by integration test).
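
The isolation contract that the integration test exercises can be reduced to a small decision function. The sketch below is an assumption about shape, not the actual authorization code: the `Caller` type and `authorizeFundViewRead` name are hypothetical, and only the tenant names come from the acceptance criteria above.

```typescript
// Hypothetical caller model: a request is made either as a portco tenant or as a fund tenant.
type Caller =
  | { kind: "portco"; tenant_id: string }
  | { kind: "fund"; fund_id: string };

// Returns the HTTP status a read of the given fund's cross-portco view should produce.
// Only the owning fund tenant may read the view; every portco caller is refused.
function authorizeFundViewRead(caller: Caller, fundId: string): 200 | 403 {
  if (caller.kind === "fund" && caller.fund_id === fundId) {
    return 200;
  }
  return 403;
}
```

With this rule, a call made as portco tenant `kahuna` against fund `pointfield-demo` resolves to 403, which is the case the integration test is meant to verify.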

Track D — Semantic + procedural memory tiers (weeks 6–12)

Goal: every agent has access to per-tenant facts (semantic) and learned-workflow priors (procedural). Compounding loops 3+5 begin accruing.

Scope:

Per-tenant Vectorize namespace pattern tenant_{tenant_id} (already exists — Track D operationalizes it for agents).
src/lib/memory-semantic.ts — recall(tenant_id, query, opts): Fact[] and learn(tenant_id, fact).
DO SQLite procedural store inside AgentRuntime — table procedural_workflows with bandit weights.
src/lib/memory-procedural.ts — selectWorkflow(tenant_id, agent_type, task_type) (Thompson-sampling over stored workflows) and recordOutcome(workflow_id, outcome_score).
Migration to wire SDR Agent (Track A) to use both: semantic recall before drafting; procedural workflow selection before sending.
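
Thompson sampling over stored workflows, as named in the scope above, can be sketched with a Beta posterior per workflow. Everything below is illustrative: it operates on an in-memory array instead of the DO SQLite table, the `pickWorkflow` and `applyOutcome` names are hypothetical stand-ins for selectWorkflow and recordOutcome, and treating outcome_score as a value in [0, 1] is an assumption.

```typescript
type Rng = () => number; // uniform in [0, 1)

// Small seedable PRNG (mulberry32) so runs of the sketch are reproducible.
function mulberry32(seed: number): Rng {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Standard normal draw via Box-Muller.
function sampleNormal(rng: Rng): number {
  return Math.sqrt(-2 * Math.log(1 - rng())) * Math.cos(2 * Math.PI * rng());
}

// Gamma(shape, 1) draw via Marsaglia-Tsang; valid for shape >= 1.
function sampleGamma(shape: number, rng: Rng): number {
  const d = shape - 1 / 3;
  const c = 1 / Math.sqrt(9 * d);
  for (;;) {
    const x = sampleNormal(rng);
    const t = 1 + c * x;
    const v = t * t * t;
    if (v <= 0) continue;
    const u = 1 - rng();
    if (Math.log(u) < 0.5 * x * x + d - d * v + d * Math.log(v)) return d * v;
  }
}

// Beta(alpha, beta) draw as a ratio of two gamma draws.
function sampleBeta(alpha: number, beta: number, rng: Rng): number {
  const x = sampleGamma(alpha, rng);
  const y = sampleGamma(beta, rng);
  return x / (x + y);
}

// Bandit weights start at alpha = beta = 1 (uniform prior), so both stay >= 1.
interface Workflow {
  workflow_id: string;
  alpha: number;
  beta: number;
}

// Draw once from each workflow's posterior and pick the largest draw.
function pickWorkflow(workflows: Workflow[], rng: Rng): Workflow {
  if (workflows.length === 0) throw new Error("no workflows stored");
  let best = workflows[0];
  let bestDraw = -1;
  for (const w of workflows) {
    const draw = sampleBeta(w.alpha, w.beta, rng);
    if (draw > bestDraw) {
      best = w;
      bestDraw = draw;
    }
  }
  return best;
}

// Fold an outcome score (clamped to [0, 1]) into the workflow's posterior.
function applyOutcome(w: Workflow, outcomeScore: number): void {
  const s = Math.min(1, Math.max(0, outcomeScore));
  w.alpha += s;
  w.beta += 1 - s;
}
```

Because each selection is a random draw from the posterior, low-evidence workflows still get tried occasionally while the selection share drifts toward workflows with higher recorded outcomes, which is the behavior the acceptance chart is meant to show.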

Acceptance:

SDR Agent for ascend writes ≥10 semantic facts within its first week of going live.
Procedural workflow store has ≥3 distinct workflows recorded after 4 weeks.
Bandit selection demonstrably weights toward higher-outcome workflows over time (chart in weekly summary).

Track E — Per-tenant evals + A/B routing + Cerebras triage (weeks 8–12)

Goal: the eval moat starts compounding, and the inference router can run sub-200ms triage when reasoning isn’t needed.

Scope:

D1 table eval_datasets per-tenant (migration 0009): graded (input, expected_output, actual_output, score, grader_id, graded_at).
Grading pipeline: human + tri-judge (carryover from V5 Quality Harness Phase 4) writes scores back into eval_datasets.
A/B router at orchestration: per (tenant, agent_type, task_type) bandit choosing between (model, prompt_variant) tuples.
Cerebras provider added to api_config:{provider}. inference-router.ts wraps existing llm_invoke with a triage tier that hits Cerebras for low-reasoning tasks (intent classification, format extraction) and falls through to Anthropic/Gemini for reasoning.
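
The triage tier described above amounts to a routing decision in front of the existing invoke path. The sketch below illustrates that decision under assumptions: the `TaskType` values beyond the two examples named in the scope, the `pickTier` and `routeInference` names, the injected provider functions, and the fallback to the reasoning tier when a triage call throws are all hypothetical.

```typescript
// Task types are illustrative; the ADR names intent classification and format
// extraction as examples of low-reasoning work.
type TaskType = "intent_classification" | "format_extraction" | "reasoning";
type Tier = "triage" | "reasoning";

// Provider calls are injected so the sketch does not depend on any real client library.
type Invoke = (prompt: string) => Promise<string>;

const TRIAGE_TASKS: ReadonlySet<TaskType> = new Set<TaskType>([
  "intent_classification",
  "format_extraction",
]);

// Pure routing decision: low-reasoning tasks go to the triage tier.
function pickTier(task: TaskType): Tier {
  return TRIAGE_TASKS.has(task) ? "triage" : "reasoning";
}

// Tries the triage provider for triage-tier tasks and, as an assumed safety
// net, falls back to the reasoning provider if that call throws.
async function routeInference(
  task: TaskType,
  prompt: string,
  providers: { triage: Invoke; reasoning: Invoke },
): Promise<{ tier: Tier; output: string }> {
  if (pickTier(task) === "triage") {
    try {
      return { tier: "triage", output: await providers.triage(prompt) };
    } catch {
      // Triage provider failed; continue to the reasoning tier below.
    }
  }
  return { tier: "reasoning", output: await providers.reasoning(prompt) };
}
```

Returning the tier alongside the output makes it straightforward to log routing share, which the acceptance criteria measure over a 7-day window.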

Acceptance:

Tenant ascend has ≥100 graded examples in eval_datasets for SDR Agent within 30 days of Track E start.
A/B router visibly switches between two SDR Agent prompt variants based on bandit weights (logged).
Cerebras tier handles ≥30% of agent calls measured over a 7-day window.
p50 inference latency on triage-tier calls <200ms.

Consequences

Positive:

Five tracks running in parallel fit the Q1 foundation into 12 weeks instead of roughly 30 weeks run serially.
Track A delivers a demoable agent within 6 weeks (Mishaal’s “first useful output” SLA from §7).
Track B is pure additive — no risk of regressing the existing gateway.
Track C unblocks fund-level commercial motion in Q2 without waiting for portco motion to mature.

Negative / accepted risk:

Tracks A and D both touch the AgentRuntime DO, so merge conflicts are a risk. Mitigation: Track D does not open any code until the Track A skeleton (week 1) has landed.
Track B’s priors writer depends on agent_runs D1 schema landing in Track A first. Mitigation: Track A merges migration 0008 by week 2; Track B’s priors job is a no-op until then.
Tracks A, C, D, and E all add D1 migrations, so migration numbers can collide. Mitigation: numbers are pre-assigned (0008 Track A, 0009 Track E, 0010 Track C, 0011 Track D).
Five tracks may exceed single-engineer (Mishaal+Claude) bandwidth. Mitigation: each track ships independently; if a track slips, the vision is not invalidated.
Hot-path budget regression risk when capability-index retrieval lands in agent runtime (post-Q1). Mitigation: ADR-041 lands before flip.
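
The migration-ordering mitigation above lends itself to a mechanical check, for example in CI. The following is a minimal sketch under assumptions: the `PlannedMigration` shape and the `findMigrationConflicts` and `formatMigrationNumber` helpers are hypothetical, and only the number-to-track assignments come from this ADR.

```typescript
// Assignments copied from the mitigation above; the shape itself is illustrative.
interface PlannedMigration {
  number: number;
  track: "A" | "C" | "D" | "E";
}

const PLANNED_MIGRATIONS: PlannedMigration[] = [
  { number: 8, track: "A" },
  { number: 9, track: "E" },
  { number: 10, track: "C" },
  { number: 11, track: "D" },
];

// Formats a migration number the way it appears in this ADR, e.g. 8 -> "0008".
function formatMigrationNumber(n: number): string {
  return String(n).padStart(4, "0");
}

// Reports any migration number claimed by more than one track.
function findMigrationConflicts(plan: PlannedMigration[]): string[] {
  const owner = new Map<number, string>();
  const conflicts: string[] = [];
  for (const m of plan) {
    const existing = owner.get(m.number);
    if (existing !== undefined && existing !== m.track) {
      conflicts.push(
        `${formatMigrationNumber(m.number)} claimed by Track ${existing} and Track ${m.track}`,
      );
    } else {
      owner.set(m.number, m.track);
    }
  }
  return conflicts;
}
```

A check like this fails fast if a track opens a branch with a migration number that another track has already been assigned.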

Reversal triggers:

If Track A’s SDR Agent fails A/B vs. generic baseline at week 6 → pause Tracks C/D/E, debug agent quality before scaling topology.
If capability-index priors are too noisy at week 8 to be useful → defer Track B integration to Q2; agents stay on static MCP for Q1 demo.
If Cerebras tier shows <10% routing share at week 12 → defer Track E inference-router work; eval system still ships.

Out of scope for Q1 (deferred to Q2+)

Agents 11–19 (only SDR Agent + Operating Partner Brief Agent in Q1)
Outcome-billing infrastructure (Stripe metered usage)
Public capability-index API (closed for Q1, open in Q3 if Bet 5 plays out)
SOC 2 Type 2 (Q3 question per vision §14)
Composites (vision §9 — deferred indefinitely; ADR-044)

Implementation kickoff

After this ADR + the Q1 plan doc + the LEDGER row land on main:

Track A: branch claude/operator-os-track-a-sdr-agent. First commit: wrangler.toml migration + AgentRuntime skeleton.
Track B: branch claude/operator-os-track-b-capability-index. First commit: Vectorize namespace creation script.

Tracks C/D/E branches open at their dependency-unblock weeks per the diagram above.