Ascend Cloud-Native Platform v2 — Engineering Plan

Author: Engineering Leadership (Claude + Mishaal) Version: 1.0 (2026-04-24) Status: Proposed — awaiting sign-off Scope: The entire Ascend GTM stack — gateway, context plane, agent workflow, integrations, observability, security — brought to 2026 cutting-edge and made fully cloud-native. Non-goal: Incremental patching. This plan is a structural upgrade.

Executive summary

Ascend’s V5 stack is architecturally sound — the hot path is edge-native, integrations are unified through a single gateway, Phase 2 Context Plane just landed. But we are carrying stale invariants, hand-rolled patterns that 2026 primitives replace, and a laptop-dependent agent workflow. This plan closes those gaps in three waves over ~4 weeks, ending with:

Agent workflow 100% cloud-native. Any surface (phone, 10-year-old laptop, library PC) reaches full engineering capability via claude.ai/code + Routines. No local process. No SSH. No Tailscale. No laptop dependency.
V5 Gateway registered as a first-class remote MCP server. Streamable HTTP + OAuth 2.1 (per MCP spec 2025-11-25). Discoverable via the Anthropic registry at net.ascendgtm/gateway. Per-tenant isolation via Cloudflare MCP Server Portals.
Observability, security, and reliability upgraded to 2026 edge primitives. AI Gateway in front of every LLM call. Workers Logs + Logpush for audit. Access + WebAuthn on admin endpoints. Workflows replacing DIY cron-for-multi-step. SQLite-backed Durable Objects for every new class.

Research receipts: CF platform audit · MCP ecosystem audit · Claude Code cloud architecture · initial cloud-only audit.

Part I — Current state (where we are)

What’s already cutting-edge (keep)

System	Status	Why it’s right
V5 Gateway on Workers + Hono 4	✅	Single binary, edge-native, <10 ms overhead invariant
KV-only hot path	✅	Spec-correct — request latency bounded by KV reads
Durable Objects for OAuth	✅	Alarm-based proactive refresh; no vendor tax
D1 cold path for audit	✅	Correct hot/cold separation
Phase 2 Context Worker	✅	Two-plane architecture per ADR-016
CF Cron for scheduled work	✅	Free-plan limit (5) respected; replaces n8n
R2 weekly KV backups	✅	Disaster recovery on the correct storage
GitHub Actions CI (typecheck + test + drift check)	✅	Two jobs: `gateway-worker` + `context-worker-typecheck`

What’s drifted (repo docs vs reality)

Caught during this plan’s research pass:

File	Stale claim	Reality	Fix
`.claude/CLAUDE.md` §Architecture Invariants	”ONE Worker only … No Service Bindings … No multi-Worker”	Phase 2 shipped Service Binding to `ascend-context-worker`	Rewrite invariant #1 to reflect ADR-016’s scoped exception
`.claude/CLAUDE.md` §Architecture Invariants	”18 MCP tools”	28 tools post-Phase-2	Rewrite invariant #7 with the current count + category breakdown
`.claude/rules/v5-invariants.md`	Same as above (duplicated)	Same	Rewrite both files from one canonical source
Global `~/.claude/CLAUDE.md`	References legacy n8n/DataTable/VPS architecture	V5 is CF-native	Strip the legacy block; the V5 project-level config already notes “IGNORE” but cleaner to remove the source

This is table stakes — not cutting-edge. It’s doc-hygiene that ships in Wave 1 alongside real architecture work.

What’s not cutting-edge (real gaps)

Gateway speaks MCP over SSE (the 2024-11-05 spec). Current spec is 2025-11-25 → Streamable HTTP is the only supported transport going forward. SSE is formally deprecated.
No OAuth 2.1 on the MCP surface. We use bearer tokens derived from ASCEND_TENANT_BEARER. The MCP spec mandates OAuth 2.1 + RFC 9728 Protected Resource Metadata + PKCE.
Every LLM call goes direct to DeepSeek / Anthropic / Gemini / OpenRouter / Groq / Cerebras. No observability, no fallback chains, no cost caps, no semantic cache.
DIY multi-step pipelines in Queues + DO alarms. Workers Workflows (GA 2025-04) is purpose-built for this. Gong/SFDC ingestion = textbook Workflows use case.
Admin endpoints (/admin/*) behind a static API-key hash. Cloudflare Access + WebAuthn is the 2026 pattern.
Workers Logs not configured. We persist everything to error_ledger D1 manually. CF now offers structured logs + Logpush to R2/S3/Datadog with 7+ day retention for free.
KV-backed Durable Object for TokenManager. SQLite-backed DOs are GA and explicitly recommended for all new namespaces; the old KV-backed pattern is labeled “(Legacy)” in the CF sidebar.
wrangler secret put for every secret. Cloudflare Secrets Store (open beta, 2026-04-16) gives per-secret rotation, version history, and account-scoped access. Not GA yet → land ADR now, migrate at GA.
Agent workflow runs on a MacBook. Everything I just reviewed means we run on the user’s MacBook. This is the biggest gap and the main driver of this plan.
Stdio MCPs for Cloudflare + GitHub + n8n. Every one of those has a registered hosted remote-MCP (or equivalent) as of 2026. Stdio = legacy.
Playwright tunnel on localhost:8931. 2026 pattern is Workers Browser Run (Quick Actions + Stagehand + Playwright MCP) + Chrome MCP for agent-driven browsing.

Part II — Target architecture (where we’re going)

Two-plane, fully-registered, OAuth-guarded, observable from anywhere

┌───────────────────────────────────────────────────────────────────────┐
│  AGENT WORKFLOW (zero laptop dependency)                             │
│                                                                       │
│  Any browser / iPhone / tablet / 10-yo laptop                         │
│            │                                                          │
│            ▼                                                          │
│  claude.ai/code  ── Anthropic-managed VMs ──┐                        │
│  (Web + Routines + Dispatch + Channels)     │                        │
│                                              │ MCP (Streamable HTTP   │
│                                              │      + OAuth 2.1)      │
│                                              ▼                        │
│  ┌───────────────────────────────────────────────────────────────┐   │
│  │  Cloudflare MCP Server Portal — one URL per tenant             │   │
│  │  portal.ascendgtm.workers.dev/mcp/{tenant}                     │   │
│  │  (auth: CF Access SSO + WebAuthn)                              │   │
│  └───────────┬───────────────────────────────────────────────────┘   │
│              │                                                        │
└──────────────┼────────────────────────────────────────────────────────┘
               │
        ┌──────┴──────┐
        ▼             ▼
┌──────────────┐   ┌──────────────────────────────────────────────────┐
│ V5 Gateway   │   │ Hosted third-party MCPs (registered, OAuth'd)     │
│ (EXECUTION)  │   │ • Cloudflare (bindings.mcp.cloudflare.com)        │
│              │   │ • Linear, Notion, Atlassian, Stripe, Supabase     │
│ All 34 tools │   │ • GitHub, Figma, Monday, HuggingFace              │
│ Streamable   │   │ • Slack (Anthropic official)                       │
│ HTTP /mcp    │   │ • ... 216 registered commercial MCPs total         │
│ OAuth 2.1    │   └──────────────────────────────────────────────────┘
└──┬───────────┘
   │ Service Binding (RPC, Streamable HTTP)
   ▼
┌──────────────────────────────────────────────────────────────────────┐
│ V5 Context Worker (CONTEXT PLANE)                                    │
│ • D1: entities + facts + signal_evaluations                          │
│ • Vectorize: ctx_v5_facts (bge-small 384-dim cosine)                 │
│ • Workers Workflows: Gong/SFDC extraction pipelines                  │
│ • CF Queues: Gong + SFDC ingestion (producer only)                   │
│ • Tools: context_query, context_explain                              │
└──────────────────────────────────────────────────────────────────────┘

┌─── Observability & Security plane ─────────────────────────────────┐
│                                                                      │
│  Every outbound LLM call  →  Cloudflare AI Gateway                  │
│                                 (observability, caching, fallback)  │
│  Every admin request      →  Cloudflare Access + WebAuthn           │
│  Every Worker log         →  Workers Logs → Logpush → R2 (30d)      │
│  Every tool invocation    →  Analytics Engine (weekly rollup)       │
│  Every secret             →  Secrets Store (GA) with rotation       │
└──────────────────────────────────────────────────────────────────────┘

Core invariants (new canonical set — supersede both .claude/CLAUDE.md and v5-invariants.md)

Two-plane architecture. Execution (gateway) + Context (context-worker). A third plane is forbidden without an ADR. Service Bindings between these two planes only — no external service bindings.
KV-only hot path on the gateway. Request latency budgeted at ≤10 ms overhead. D1 only in cold paths (error_ledger, kv_audit, decision_log). Context-worker D1 is cold path by definition — it’s the context worker’s own cold path.
Fail-fast, no retries in the request path. Callers retry. Proxy does not.
OAuth 2.1 + Streamable HTTP on every MCP surface — per spec 2025-11-25. No SSE transport.
Composio owns OAuth end-to-end (revised by ADR-057, 2026-05-19). The V5 gateway no longer holds OAuth tokens for any SaaS provider covered by Composio. tokens:{tenant}:{provider}:{account_id} KV is retained only for providers Composio does not cover (AWS via aws4fetch signing keys, Anthropic API key, etc.). No DO-based token refresh. No external auth brokers beyond Composio.
Request path never touches a DO. DO writes KV 10 min before expiry; request reads KV.
Capability index, not static registration. (ADR-042, 2026-05-07) 3 always-on platform tools (call_api, discover_apis, batch_execute) registered statically. All other tools indexed in Vectorize capability_index and retrieved semantically (≤20 per LLM context). Catalog unbounded. Adding a tool: docs/tools/<slug>.md + TOOLS.md row + embed script re-run. No hard ceiling.
Multi-account support is load-bearing. KV key: tokens:{tenant}:{provider}:{account_id}.
CF Cron + CF Workflows for scheduled and multi-step work. No external cron services. No n8n for orchestration.
Gateway overhead ≤10 ms. auth + token + route + AI Gateway callback included.
30 s AbortController timeout on every outbound fetch.
Every LLM call goes through AI Gateway — observability + fallback chain + budget cap.
Every admin endpoint gated by Cloudflare Access. Static API key never sufficient alone.
Secrets live in Secrets Store when GA; wrangler secrets acceptable during open beta with documented migration trigger.
Sources of truth: KV (config), D1 (audit), Vectorize (facts-embedded), R2 (backups), GitHub (code). No other source-of-truth systems without an ADR.

Part III — Adoption roadmap

Wave 1 — Agent workflow cloud-native + drift cleanup (Week 1)

Goal: any agent work can happen from any device. Ascend’s MacBook becomes a thin client.

#	Task	Owner	Size	Cloud destination
1.1	Move Non-Stop Protocol + global rules into repo `.claude/`	agent	45 min	Repo
1.2	Copy hooks + relevant skills + rules into repo	agent	30 min	Repo
1.3	Rewrite invariants to canonical set above. Update `.claude/CLAUDE.md` + `.claude/rules/v5-invariants.md` + `~/CLAUDE.md` (strip legacy n8n block)	agent	45 min	Repo
1.4	Author `.mcp.json` at repo root declaring remote MCPs — gateway (to be OAuth’d in Wave 2), hindsight, Cloudflare hosted, GitHub hosted	agent	30 min	Repo
1.5	Write cloud environment setup script — installs wrangler + gh + npm global deps — commit as `scripts/setup-claude-cloud-env.sh`	agent	20 min	Repo
1.6	Write `docs/cloud-env-seed.md` — list of exact env-var pairs to paste once into claude.ai/code	agent	15 min	Repo
1.7	User: install Claude GitHub App, paste env vars into cloud environment, enable Code on the Web	Mishaal	15 min	claude.ai/code
1.8	Convert 4 local scheduled tasks → Anthropic Routines (token-health, error-pattern, docs-freshness, backup-verify)	agent	1 hr	Anthropic
1.9	Archive 9 local worktrees (tag then delete)	agent	10 min	—
1.10	First cloud-native validation — close laptop, fire a `claude --remote` task, verify PR opens on GitHub without laptop	both	30 min	—

Wave 1 exit criterion: Mishaal powers the MacBook off for 24 hours. Upon reopening, at least one Routine has run, at least one --remote task has completed, all work visible on GitHub + claude.ai/code.

Wave 2 — MCP + Gateway cutting-edge (Week 2)

Goal: the V5 gateway speaks the 2026 MCP spec, authenticates with OAuth 2.1, is discoverable via the Anthropic registry, and lives behind a per-tenant MCP Server Portal.

#	Task	Size	Primitive
2.1	Upgrade gateway `/mcp` from SSE → Streamable HTTP transport. Use `McpAgent` + `OAuthProvider` from `agents` SDK (already a dep — `agents@0.9.0`)	1 day	Streamable HTTP MCP
2.2	Implement OAuth 2.1 authorization server (Streamable HTTP + PKCE + RFC 8707 Resource Indicators + Dynamic Client Registration) using `@cloudflare/workers-oauth-provider`	2 days	CF OAuth Provider
2.3	D1 table for OAuth client registrations + KV for session state	0.5 day	D1 + KV
2.4	Implement the MCP `elicitation/create` spec — URL-mode to off-ramp tenant 3rd-party OAuth onboarding (HubSpot/Salesforce/Google). Agent says “I need Gmail access” → elicitation URL → tenant consent → token stored via existing DO	1 day	MCP Elicitation
2.5	Migrate in-stash stdio MCPs → hosted registered MCPs. `.mcp.json` points to `bindings.mcp.cloudflare.com`, GitHub official, etc. Remove stdio entries from all configs.	2 hrs	Anthropic MCP Registry
2.6	Cloudflare MCP Server Portal — one URL per tenant (`portal.ascendgtm.workers.dev/mcp/{tenant}`). Gates on CF Access SSO + WebAuthn. Forwards to gateway `/mcp` with tenant context pre-derived.	4 hrs	CF MCP Portals
2.7	Register V5 gateway as `net.ascendgtm/gateway` with `visibility: private` in Anthropic MCP Registry	30 min	Anthropic MCP Registry
2.8	Add `worksWith: [claude-code, claude-api, claude-desktop]` metadata; ship server card at `.well-known/mcp-server-card`	1 hr	MCP Server Cards (roadmap 2026-Q3)
2.9	Audit every MCP tool for OAuth scope correctness. Tool-level scopes → DCR → never request more than needed	0.5 day	OAuth 2.1 scope discipline
2.10	Ship telemetry: every MCP call writes `{tenant, tool, auth_method, duration_ms, success, error_code}` to Analytics Engine	2 hrs	Analytics Engine

Wave 2 exit criterion: Claude Code web session connects to portal.ascendgtm.workers.dev/mcp/ascend via OAuth 2.1 (browser consent flow, no copy-pasted tokens), successfully invokes all 34 tools. Registry lookup api.anthropic.com/mcp-registry/.../net.ascendgtm/gateway returns the registered entry. Stdio MCPs removed from every config file — grep returns zero hits.

Wave 3 — Edge primitives + observability (Weeks 3–4)

Goal: replace hand-rolled patterns with CF 2026 primitives. Observability, security, reliability all go up a class.

#	Task	Size	CF primitive
3.1	Workers Logs enabled on gateway + context-worker. Structured `console.log` with trace IDs. Logpush job → R2 bucket `ascend-logs` with 30-day retention	2 hrs	Workers Logs + Logpush
3.2	Cloudflare AI Gateway in front of every LLM call. Replace direct calls in `llm_invoke`, `claude`, `perplexity`, `aws_bedrock_invoke`. One gateway per provider (DeepSeek/Anthropic/Gemini/OpenRouter/Groq/Cerebras). Enable semantic cache + fallback + cost caps.	1–2 days	AI Gateway
3.3	Cloudflare Access in front of `/admin/*`. WebAuthn hardware-key enrollment for Mishaal. Maintain static API key fallback for Routine access via service token.	3 hrs	CF Access
3.4	Workers Workflows for Gong + SFDC ingestion (Phase 2 Tasks 2.5/2.6). Replaces Queue + DO-alarm-orchestrated multi-step with declarative `workflow.step()`.	1 day (then 0.5 per additional pipeline)	Workers Workflows
3.5	Migrate `TokenManager` DO + any new DO class to SQLite-backed storage. SQLite DOs are recommended for all new namespaces; KV-backed is legacy.	1 day	SQLite DO
3.6	Workers Browser Run integration: replace any scraping/screenshot need with the Quick Actions endpoints (`/screenshot`, `/pdf`, `/json`, `/crawl`). Deprecate local Playwright tunnel.	0.5 day	Browser Run
3.7	Analytics Engine dashboards for 5 key metrics: gateway P95 latency, tool invocation rate, OAuth refresh rate, AI Gateway spend, error rate. Shareable URLs.	0.5 day	Analytics Engine
3.8	ADR for Secrets Store adoption — open beta today. Document migration trigger: “within 30 days of GA announcement, run migration script.” Until then: wrangler secrets with doc’d rotation procedure.	1 hr	Secrets Store
3.9	Runbook for `/incident-response` — Access → Workers Logs → AI Gateway dashboard → `/admin/errors`	1 hr	Process

Wave 3 exit criterion: AI Gateway shows traffic from every LLM tool. /admin/errors reachable only via CF Access login. Workers Logs query retrieves a full request trace across gateway + context-worker in <5 s. Gong ingestion Workflow has 3 successful runs in production without manual intervention.

Wave 4 — Polish + hardening (Week 4, optional)

Not blocking. Fire these once Waves 1–3 are stable.

#	Task	Why
4.1	GitHub OIDC → Cloudflare for tokenless deploys from CI	Eliminates the `CLOUDFLARE_API_TOKEN` secret entirely
4.2	Environment separation (resolves tech-debt row #16) — prod vs dev KV namespaces, wrangler `--env` consistently applied	Prevents a dev write from hitting prod KV
4.3	Property-based tests on auth layer (resolves tech-debt row #17)	Fuzz coverage on token validation
4.4	Retire Tailscale + decommissioned-VPS references from all docs	Doc hygiene
4.5	Telegram + Slack → Channels (Claude Code Channels feature) as alternative to API triggers	Additional surface for mobile command
4.6	Ultraplan + Ultrareview workflow adoption for multi-session features	Higher-quality planning layer

Part IV — Trade-offs + risks

Trade-offs

Decision	Gain	Cost
Streamable HTTP + OAuth 2.1 on MCP	Spec-correct; clients discover + connect without copy-pasted tokens	3–4 days of gateway work; breaks any existing caller relying on bearer token auth until they migrate
MCP Server Portal per tenant	Per-tenant isolation at the network layer; SSO gate	One more Worker to deploy + maintain
AI Gateway in front of every LLM	Observability, caching, cost caps, fallback	Small per-request hop through another Worker (<2 ms); new cost line-item (AI Gateway is free-tier friendly but adds a line)
SQLite DOs	Cheaper, faster, better transaction semantics than KV-backed	One-time migration: export → import per class
Workflows for ingestion	Durable multi-step with retries + replay	New primitive to learn; monitoring dashboard to build
Cloud-only agent workflow	Zero laptop dependency, persistent sessions, iOS app	Env var pasting is manual; no dedicated secrets store yet (visible to env editors)

Risks + mitigations

Risk	Likelihood	Impact	Mitigation
OAuth 2.1 rollout breaks existing n8n workflows calling gateway	Medium	n8n automations go dark until migrated	Phase: keep bearer-token auth active in parallel for 30 days; cut-over after all 83 n8n workflows updated
Routine daily cap hit	Low initially, medium at scale	Scheduled jobs skip	Enable Extra Usage billing; monitor cap via `claude.ai/settings/usage`
Cloud env secrets leaked to someone with env-edit access	Low	Credential exposure	Use scoped short-lived tokens; rotate quarterly; migrate to Secrets Store at GA
AI Gateway fallback misconfigured → wrong model answers	Medium	Quality drop	Test fallback chains with synthetic bad responses before enabling
Workflows + Queues dual-ownership for Phase 2 ingestion	Low	Confusion about which is source of truth	Queues become producer-only to Workflows; all business logic in Workflows
Claude Code web VM cap hit during overnight (currently ~unlimited for Pro)	Low	Overnight run fails	Monitor; fall back to Agent SDK self-hosted on a CF Worker for truly constrained cases
Stale invariants doc leaks old assumptions into new PRs	High if not fixed	Architectural drift compounds	Wave 1 includes invariant rewrite + `.claude/rules/` canonicalization

Part V — Success criteria

Must all be true at end of Wave 3:

Part VI — Operating model after migration

How work happens day-to-day

Morning: iOS app shows any Routine-generated PRs from overnight. Mishaal reviews on phone, merges or comments.
Deep work block: Mishaal opens claude.ai/code from any browser. Fires claude --remote "do X". Closes browser. Goes to meeting.
On-call: Sentry alert → Routine API trigger → Claude session investigates + drafts PR → Slack ping → on-call reviews via CF Access SSO.
Weekly: Routine-driven weekly digest summarizes tech-debt, open PRs, LEDGER drift. No manual report.
New integration request: Mishaal types “add Klaviyo” → cloud session reads spec, scaffolds provider, writes tests, opens PR. Elicitation URL triggers if OAuth consent needed.

How we measure health

AI Gateway dashboard — weekly review: fallback rate, cache hit rate, cost per tool per tenant.
Analytics Engine — weekly rollup: tool usage histogram (the ADR-023 decision was “keep all 25 tools”; this telemetry is what eventually retires anything genuinely zero-use).
Logpush + R2 — monthly random-sample audit: 10 requests traced end-to-end, confirm every step has logs.
Routine dashboard — daily: every scheduled Routine has a successful run in the last 24 h.
LEDGER.md — weekly: zero rows with “last touched >7 days” and no PR.

Operating invariants

Plan-first PR rule stays. Multi-session project = first commit is plan-doc + LEDGER row on main.
Non-Stop Execution Protocol stays. Now lives in repo .claude/CLAUDE.md; cloud sessions load it automatically.
Research-first mandate stays. Every new API / version / limit → live docs fetch before writing a value into code.
Parallelization rule stays. Independent tool calls batch in one message. No sequential ladders.

Part VII — What gets deleted

Migration generates noise. To keep the system elegant, delete these after Waves 1–3 ship:

~/.claude/CLAUDE.md legacy n8n section (strip, keep identity + rules)
Global references to Tailscale, VPS, Mac bridge (all decommissioned)
~/.claude/scheduled-tasks/ (replaced by Routines)
9 local worktrees under .claude/worktrees/ (replaced by cloud sessions)
.claude/mcp.json + stdio claude mcp add history
mcp-server/ directory in the old monorepo (no longer used — CF Worker replaces)
decommission-plan-vps.md ceremony files if present

Each deletion gets a commit in Wave 4 cleanup.

Part VIII — Wave 4 — Hosted-OSS-first inference (cost discipline, zero hardware)

Goal: route default internal workloads through open-source models hosted on managed cloud (Cloudflare Workers AI, DeepSeek direct API, OpenRouter) instead of premium frontier APIs. Frontier tier stays for novel / customer-facing work. Zero new hardware. Added 2026-04-24 after the DeepSeek V4 launch (4/23) + Qwen3.6 / Workers AI catalog review showed ~90% cost reduction opportunity on bulk workloads.

Invariant update

The repo .claude/CLAUDE.md Non-Stop Protocol already encodes “No hardware dependencies — everything cloud.” Wave 4 operationalizes that for the inference layer specifically: every LLM and embedding call must land on either Cloudflare’s managed inference (Workers AI), a serverless OSS endpoint (DeepSeek API, OpenRouter), or a frontier provider when quality demands it — never on a laptop or self-hosted box.

Cost comparison (live-doc-verified 2026-04-24)

Model	Hosted at	$/1M input	$/1M output	Tier
Qwen3-30B-A3B-FP8	Workers AI (`@cf/qwen/qwen3-30b-a3b-fp8`)	$0.051	$0.34	bulk (default)
Llama-3.3-70B-FP8-Fast	Workers AI	$0.29	$0.56	bulk (high-quality)
Qwen2.5-Coder-32B	Workers AI	$0.16	$0.48	bulk (code)
Kimi-K2.6 / GLM-4.7-Flash	Workers AI	varies	varies	bulk (Chinese OSS tier)
DeepSeek V4-Flash	DeepSeek direct API	$0.14	$0.28	standard (1M ctx, MIT)
DeepSeek V4-Pro	DeepSeek direct API	$1.74	$3.48	standard (heavy reasoning)
GPT-5.5	OpenAI via AI Gateway	$5.00	$30.00	frontier
Opus 4.7	Anthropic via AI Gateway	$15.00	$75.00	frontier

Phase A (shipped in Wave 4 PR) — tier-aware llm_invoke

New workers_ai provider uses env.AI.run() binding (zero egress, <10 ms).
New deepseek-v4-flash + deepseek-v4-pro model IDs; V3 deepseek-chat/deepseek-reasoner auto-aliased to V4 (deprecate 2026-07-24 per DeepSeek docs).
tier param: bulk | standard | frontier. Default = bulk.
Routes through CF AI Gateway ascend-workers-ai when CF_AI_GATEWAY_WORKERS_AI_SLUG is set.
ADR-027 documents decision + cost projections.

Phase B (shipped in Wave 4 PR) — context-plane upgrade + reranking

Vectorize index migrated from bge-small-en-v1.5 (384 dim) → bge-m3 (1024 dim, multilingual).
@cf/baai/bge-reranker-base reranks Vectorize top-50 → top-10 before D1 hydration in context_query.
All Workers AI calls route through AI Gateway for unified observability + caching.
ADR-028 (bge-m3 migration) + ADR-029 (LoRA adapter roadmap for future tenant-specific tuning).

Invariant #12 update

Invariant #12 “Every LLM call goes through AI Gateway” now includes Workers AI calls — not just external provider calls. The AI Gateway wraps both via the gateway: option on env.AI.run().

Cost projection

At Phase 2 Gong-extraction volumes (~500 transcripts/day once Kahuna is in production):

Before Wave 4 (DeepSeek V3): ~$6,300/month
After Wave 4 (Workers AI Qwen3-30B bulk tier): ~$450/month

Monthly savings: ~$5,850. Hardware-free. Scales with tenant count, not headcount.

Success criteria (added to Part V)

llm_invoke default tier = bulk → workers_ai qwen3-30b
DeepSeek V4 replaces V3 as standard tier default
Vectorize index at 1024 dim; bge-m3 embeddings live
Reranker reduces “wrong-semantic-match” false-positive rate measurably (A/B on 100 Kahuna queries)
AI Gateway ascend-workers-ai dashboard shows traffic from both gateway + context-worker
TOOL_METRICS dataset aggregates show >80% of llm_invoke calls land on workers_ai tier
Monthly LLM spend drops to <20% of pre-Wave-4 baseline within 30 days of Phase 2 GA

Phase A shipped receipt (2026-04-24)

Status: Shipped 2026-04-24 via PR “Wave 4 Phase A — Workers AI + DeepSeek V4 + tier-aware routing (hosted OSS first)” ADR: ADR-027 — hosted-OSS-first routing

Tier routing table (config-driven, zod-schema’d)

Tier	Provider	Model	$/1M (in/out)	Call path
`bulk` (default)	`workers_ai`	`qwen3-30b` (`@cf/qwen/qwen3-30b-a3b-fp8`)	$0.051 / $0.34	`env.AI.run()` binding — zero egress
`standard`	`deepseek`	`deepseek-v4-flash`	$0.14 / $0.28 (cache-miss) · $0.028 cached input	HTTPS
`frontier`	caller-set	caller-set	frontier-priced	HTTPS

Tasks shipped

#	Task	Files touched
A.1	Add `[ai] binding = "AI"` to wrangler.toml (prod + staging)	`wrangler.toml`
A.2	Extend `Env` type with `AI` binding, `CF_AI_GATEWAY_WORKERS_AI_SLUG`, explicit `DEEPSEEK_API_KEY`	`src/lib/types.ts`
A.3	New helper in `src/lib/ai-gateway.ts` — builds the `{gateway:{id,metadata}}` third-arg for `env.AI.run()`	`src/lib/ai-gateway.ts`
A.4	Extend `src/tools/llm-invoke.ts` — `workers_ai` provider, tier router, `WORKERS_AI_MODELS` catalog, DeepSeek V4 aliases, binding call path, response translator	`src/tools/llm-invoke.ts`
A.5	Update `TOOLS.md` `llm-invoke` row to reflect tier routing	`docs/requirements/TOOLS.md`
A.6	ADR-027 authored with cost math, live URLs, alternatives	`docs/decisions/ADR-027-*.md`
A.7	21 new tests (router, aliases, binding call, AI Gateway option, cost math, model catalog)	`test/tools/llm-invoke-tier-routing.test.ts`

Phase A exit criteria (all met)

npm run typecheck — clean
npm test — 588→609 green, zero regressions
npm run check:pre-commit — 11/11
wrangler deploy --dry-run — bundle within soft target (1102.72 KiB, +10.72 KiB)
ADR-027 merged with live-docs citations for every pricing claim
TOOLS.md row rewritten

Research receipts (all verified 2026-04-24)

Qwen3-30B pricing, ctx, tool calling: developers.cloudflare.com/workers-ai/models/qwen3-30b-a3b-fp8/
env.AI.run() signature + [ai] binding: developers.cloudflare.com/workers-ai/configuration/bindings/
AI Gateway third-arg gateway option: developers.cloudflare.com/ai-gateway/integrations/aig-workers-ai-binding/
DeepSeek V4 pricing + alias deprecation: api-docs.deepseek.com/quick_start/pricing

What’s NOT in Phase A (and why)

Cost-cap enforcement inside the gateway. Deferred to AI Gateway dashboard config (operator sets daily spend cap per gateway). Phase B of Wave 4 will pull the cap into KV for per-tenant differentiation.
tool-scopes.ts integration. The scope-map file doesn’t exist yet (slated for Wave 2 of Cloud-Native v2). llm_invoke stays mcp:read by convention; when scopes land, this tool gets one row, no behavior change.
Frontier-tier alias table. frontier is explicitly caller-driven to avoid accidental Opus spend. A future ADR may add explicit frontier-sonnet / frontier-opus aliases once usage is clearly understood.

Sign-off

This plan reflects 2026 edge-native engineering best practice for a multi-tenant GTM automation platform. Every recommendation cites live docs; every trade-off is named; every risk has a mitigation.

Wave 1 starts with your approval. Waves 2 + 3 proceed under the Non-Stop Protocol once Wave 1 is green. Wave 4 Phase A shipped 2026-04-24 ahead of Waves 2+3 because its blast radius is contained to llm_invoke and the cost win is immediate.

Ascend Cloud-Native Platform v2 — Engineering Plan

Ascend Cloud-Native Platform v2 — Engineering Plan

Executive summary

Part I — Current state (where we are)

What’s already cutting-edge (keep)

What’s drifted (repo docs vs reality)

What’s not cutting-edge (real gaps)

Part II — Target architecture (where we’re going)

Two-plane, fully-registered, OAuth-guarded, observable from anywhere

Core invariants (new canonical set — supersede both .claude/CLAUDE.md and v5-invariants.md)

Part III — Adoption roadmap

Wave 1 — Agent workflow cloud-native + drift cleanup (Week 1)

Wave 2 — MCP + Gateway cutting-edge (Week 2)

Wave 3 — Edge primitives + observability (Weeks 3–4)

Wave 4 — Polish + hardening (Week 4, optional)

Part IV — Trade-offs + risks

Trade-offs

Risks + mitigations

Part V — Success criteria

Part VI — Operating model after migration

How work happens day-to-day

How we measure health

Operating invariants

Part VII — What gets deleted

Part VIII — Wave 4 — Hosted-OSS-first inference (cost discipline, zero hardware)

Invariant update

Cost comparison (live-doc-verified 2026-04-24)

Phase A (shipped in Wave 4 PR) — tier-aware llm_invoke

Phase B (shipped in Wave 4 PR) — context-plane upgrade + reranking

Invariant #12 update

Cost projection

Success criteria (added to Part V)

Phase A shipped receipt (2026-04-24)

Tier routing table (config-driven, zod-schema’d)

Tasks shipped

Phase A exit criteria (all met)

Research receipts (all verified 2026-04-24)

What’s NOT in Phase A (and why)

Sign-off

References