Cloudflare Platform Audit — State of the Art for V5 Gateway
Date: 2026-04-24
Target system: Ascend GTM V5 Gateway (Hono + Cloudflare Workers, D1, KV, R2, Vectorize, Workers AI, Durable Objects, CF Queues, CF Cron)
Method: Live fetch of developers.cloudflare.com and blog.cloudflare.com. No training-data facts. Every claim has a source URL. Docs last-updated dates embedded where shown on the page.
1. Workers Secrets Store
(a) Current state
- Status: Open beta as of 2026-04-16 (the Secrets Store overview page carries an “Available in open beta” banner).
- Scope: account-level secret, referenced from one or more Workers via a binding; not Worker-scoped like `wrangler secret put`.
- Integration surfaces today: Workers and AI Gateway (BYOK). More surfaces planned.
- Worker access pattern is async: you get the plaintext value via `await env.BINDING.get()` on every request (no sync `env.SECRET_NAME` usage).
- `wrangler.toml` / `wrangler.jsonc` shape:

      [[secrets_store_secrets]]
      binding = "MY_BINDING"
      store_id = "<STORE_ID>"
      secret_name = "MY_SECRET"

- Local dev requires `wrangler secrets-store secret …` without `--remote` (prod secrets are not readable from `wrangler dev --local`).
How Secrets Store differs from wrangler secret put
| Attribute | wrangler secret put | Secrets Store |
|---|---|---|
| Scope | Per-Worker | Per-account (one store, many bindings, many Workers) |
| Rotation | Manual per Worker, redeploy required per script | Rotate once in the store; all bound Workers pick up the new value without a redeploy (propagation is eventual, see (b)) |
| Access in code | env.SECRET_NAME synchronous | await env.BINDING.get() asynchronous |
| Visibility in dashboard | Secret is attached to the Worker | Central audit + listing across the account |
| Reuse across workers | Duplicated per Worker | Single source of truth |
| Used by AI Gateway BYOK | No | Yes (dedicated integration) |
(b) Application to V5
V5 today uses per-Worker wrangler secret put for 25+ credentials (HubSpot, Salesforce, Google, Slack, Gong, OpenAI, Anthropic, etc.). Adding the scheduler Worker, a dev Worker, preview Workers, etc. multiplies rotation pain linearly. Secrets Store solves that exactly.
Async .get() matters for hot paths: on a 100k req/day gateway, an extra await per request per secret needs to be scoped carefully. Best practice: call .get() once inside the provider adapter and cache in module scope only if the secret isn’t rotated mid-request (Secrets Store rotation is eventual, not instantaneous).
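The caching advice above can be sketched as a small memoizer. This is a minimal sketch, assuming the binding exposes only the async `get()` described in this section. `SecretBinding`, `cachedSecret`, and the TTL parameter are hypothetical names introduced for illustration, and the injectable clock exists only so the behavior can be checked without waiting.

```typescript
// Minimal shape of a Secrets Store binding as described above: an async get().
// The interface name is an assumption for this sketch, not a Cloudflare type.
interface SecretBinding {
  get(): Promise<string>;
}

// Returns a reader that calls binding.get() at most once per TTL window.
// A short TTL bounds how long a rotated secret can stay stale in memory.
function cachedSecret(
  binding: SecretBinding,
  ttlMs: number,
  now: () => number = Date.now,
): () => Promise<string> {
  let value: string | undefined;
  let fetchedAt = 0;
  return async () => {
    if (value === undefined || now() - fetchedAt >= ttlMs) {
      value = await binding.get();
      fetchedAt = now();
    }
    return value;
  };
}
```

An adapter would create one reader per secret at module scope and `await` it inside the request path. Because rotation is eventual, the TTL is the knob that trades extra `get()` calls against staleness.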
(c) Recommendation: ADOPT in Phase 3, not now
Reasons to wait:
- Still open beta (not GA) as of 2026-04-16. Production-critical credential plane on a beta is a real risk. No SLA.
- The async-on-read pattern is a non-trivial refactor across 53+ provider adapters (`resolveTokenData()` style). The migration has to be boring and mechanical: worth doing in one sweep, not drive-by.
Do now: start an ADR (docs/architecture/decisions/) capturing the migration plan so the refactor lands the day after GA.
(d) Migration cost if adopted today
- 1 store creation + ~25 secrets import (1 hour).
- Adapter refactor: replace `env.OPENAI_API_KEY` with `await env.OPENAI_API_KEY.get()` in every provider adapter + memoize per request context. Estimate 0.5–1 day at current adapter count.
- CI/deploy pipeline change: `wrangler.jsonc` uses `secrets_store_secrets` blocks; remove `wrangler secret put` automation scripts.
2. Cloudflare AI Gateway
(a) Current state
- Status: GA, available on all plans. Overview page last updated 2026-04-20.
- Unified API (OpenAI-compat) supports: Workers AI, Bedrock, Anthropic, Azure OpenAI, Baseten, Cartesia, Cerebras, Cohere, Deepgram, DeepSeek, ElevenLabs, Fal AI, Google AI Studio, Google Vertex AI, Groq, HuggingFace, Ideogram, Mistral, OpenAI, OpenRouter, Parallel, Perplexity, Replicate, xAI. Source: same overview page.
- Feature matrix (per overview + features sub-pages):
- Caching: exact-request cache, per-request control via `cf-aig-cache-ttl` / `cf-aig-skip-cache` / `cf-aig-cache-key`. Responses report `cf-aig-cache-status: HIT|MISS`. Semantic caching is on the roadmap but not shipped. Source: https://developers.cloudflare.com/ai-gateway/features/caching/
- Rate limiting: per-gateway and per-user.
- Dynamic routing (Beta) — visual or JSON-configured route graph: Start → Conditional / Percentage / RateLimit / BudgetLimit → Model nodes → End. Supports per-user-plan branching, A/B, gradual rollouts, budget caps with fallback. Source: https://developers.cloudflare.com/ai-gateway/features/dynamic-routing/
- Guardrails (Beta) — input/output policy enforcement.
- DLP (Beta) — data-loss prevention on prompts/responses.
- BYOK / Store Keys (Beta) — upstream provider keys live in the gateway (optionally Secrets-Store-backed), your Worker carries only the gateway key.
- Custom providers (Beta) — bring-your-own OpenAI-compatible endpoint.
- Custom costs — override per-model cost values for accurate spend tracking when providers aren’t priced yet by Cloudflare.
- OpenTelemetry export — traces/metrics to any OTel collector.
- Logpush — request/response logs to R2/S3/Datadog/etc.
- WebSockets API (Beta) — realtime + non-realtime.
(b) Application to V5
V5 calls OpenAI, Anthropic, Google Gemini, OpenRouter, DeepSeek, Groq, plus Workers AI directly. That means:
- No unified observability (each provider’s dashboard is a silo).
- No cross-provider cost visibility (custom Analytics-Engine plumbing covers part of it but is incomplete).
- No cache layer → we pay for duplicate extraction prompts on Gong transcripts, ICP scoring, etc.
- No fallback chain → one provider 429 = user-visible failure.
- No prompt/response archive for training-data + incident forensics.
Integration pattern for a Worker already doing fetch('https://api.openai.com/v1/chat/completions', ...):
- Change base URL to `https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_slug}/openai`.
- Same request body; the gateway is transparent.
- For multi-provider: use the Unified API path `/v1/{account_id}/{gateway_slug}/compat/chat/completions` with `model: "openai/gpt-4o"`, `model: "anthropic/claude-sonnet-…"` etc. Code becomes one client, many providers.
- Source: unified API page reachable via https://developers.cloudflare.com/ai-gateway/ (Unified API section in sidebar).
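The base-URL change above can be sketched as a pair of URL builders plus a header helper. This is a minimal sketch, assuming the URL layout and the `cf-aig-cache-ttl` header exactly as quoted in this section. The function names and the example account and gateway values are hypothetical, and treating the TTL as seconds is an assumption to confirm against the caching docs.

```typescript
// Host and path layout are taken from the integration steps above.
const GATEWAY_HOST = "https://gateway.ai.cloudflare.com/v1";

// Provider-native endpoint: same request body as before, different base URL.
function providerBaseUrl(accountId: string, gatewaySlug: string, provider: string): string {
  return `${GATEWAY_HOST}/${accountId}/${gatewaySlug}/${provider}`;
}

// Unified (OpenAI-compatible) endpoint: one client, provider chosen via `model`.
function compatChatUrl(accountId: string, gatewaySlug: string): string {
  return `${GATEWAY_HOST}/${accountId}/${gatewaySlug}/compat/chat/completions`;
}

// Optional per-request cache control via the cf-aig-cache-ttl header
// (assumed here to be a number of seconds).
function gatewayHeaders(apiKey: string, cacheTtlSeconds?: number): Record<string, string> {
  const headers: Record<string, string> = {
    "Content-Type": "application/json",
    Authorization: `Bearer ${apiKey}`,
  };
  if (cacheTtlSeconds !== undefined) {
    headers["cf-aig-cache-ttl"] = String(cacheTtlSeconds);
  }
  return headers;
}
```

Keeping the URL construction in one place is what makes the per-provider rollout a feature-flag flip: each adapter switches between its direct base URL and `providerBaseUrl(...)` without touching request bodies.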
(c) Recommendation: ADOPT NOW (top priority)
This is the single highest-leverage change available. Caching alone likely recovers 10–25% of LLM spend; dynamic routing replaces hand-rolled fallback logic; Logpush replaces homebrew prompt logging; OpenTelemetry integrates cleanly with Workers observability.
(d) Migration cost
- ~2 hours for the core gateway flip (one feature flag + base URL change across the 5–7 LLM adapters).
- ~1 day for dynamic-routing migration (move the current model-router logic into a dynamic route, so it’s declarative + versioned instead of code).
- ~0.5 day for Logpush → R2 (archive) + optional Datadog/Axiom.
- ~0.5 day for BYOK — requires keys to be uploaded to gateway; store them in Secrets Store (gated on §1 adoption) or inline in the gateway.
No functional risk if we flip one provider at a time. Start with OpenRouter (highest call volume, easiest rollback) → DeepSeek → Groq → Anthropic → OpenAI → Gemini.
3. Workers Pipelines
(a) Current state
- Status: Open beta, Workers Paid only. Docs last updated 2026-04-21.
- Architecture: Streams → Pipelines (SQL transform) → Sinks.
- Streams: durable buffered queues, ingested via HTTP endpoint or Worker binding.
- Pipelines: SQL (filter / project / transform / enrich) applied as events flow through.
- Sinks: R2 as Apache Iceberg tables (via R2 Data Catalog) or Parquet/JSON files.
- Guarantees: durable ingestion + exactly-once delivery to R2.
- Pricing: outside R2 storage/ops costs, not currently billed.
- Setup:
- Setup: `npx wrangler pipelines setup`.
- Source: overview + https://developers.cloudflare.com/pipelines/pipelines/ + `pipelines setup` command listed in wrangler reference.
Pipelines vs Queues — decision matrix
| Dimension | Queues | Pipelines |
|---|---|---|
| Primary purpose | Async work dispatch | Analytical ingest to lakehouse |
| Delivery target | Worker consumer (JS code) | R2 (Iceberg/Parquet/JSON) |
| Ordering | Per-queue FIFO (best effort) | Stream order preserved |
| Delivery semantics | At-least-once | Exactly-once to R2 |
| Transformations | In your Worker consumer | SQL at ingest time |
| Backpressure | Queue depth + retries | Buffered streams |
| Best for | "do something when X happens" | "store every X event for analytics" |
| Max payload | 128 KB message | TBD per stream |
(b) Application to V5
Phase 2 ingestion (Gong, Salesforce, HubSpot event streams) has two distinct patterns:
- “Act on this event now” — new call recorded → run transcript extraction → upsert signal. This is a Queues job, clearly.
- “Archive every event for replay + analytics” — raw webhook payload, every field, forever. Currently written to D1 or KV by the ingest Worker. Pipelines + Iceberg in R2 is strictly better: columnar, queryable by external tools, no D1 row-count bloat.
(c) Recommendation: ADOPT for archival ingest in Phase 2; KEEP Queues for job dispatch
They are complementary, not a replacement. Use Pipelines to write an append-only “raw events” lake in R2 (one Iceberg table per source), Queues to drive business-logic reactions.
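The split between archive and dispatch can be sketched as a single ingest function that writes every event to the stream and forwards only actionable ones to the queue. This is a minimal sketch: `StreamLike`, `QueueLike`, `WebhookEvent`, and `ingest` are hypothetical names, and the two interfaces are local stand-ins for the real bindings rather than Cloudflare types.

```typescript
// Local stand-ins for the two bindings; only the send() shape matters here.
interface StreamLike<T> {
  send(records: T[]): Promise<void>;
}
interface QueueLike<T> {
  send(body: T): Promise<void>;
}

interface WebhookEvent {
  source: string;
  type: string;
  payload: unknown;
}

// Archive every event, dispatch only the ones that need a reaction.
async function ingest(
  event: WebhookEvent,
  archive: StreamLike<WebhookEvent>,
  jobs: QueueLike<WebhookEvent>,
  actionable: (e: WebhookEvent) => boolean,
): Promise<void> {
  // Archive first so the raw event is durable even if dispatch fails.
  await archive.send([event]);
  if (actionable(event)) {
    await jobs.send(event);
  }
}
```

Archiving before dispatch is a design choice in this sketch: a failed queue send can then be replayed from the lake, whereas the reverse order could lose the raw payload.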
(d) Migration cost
- Pilot with Gong webhook stream (~0.5 day to set up stream + pipeline + R2 Iceberg sink + SQL validator).
- Second pilot with Salesforce CDC (~0.5 day, same pattern).
- Caveat: still beta, so do not deprecate existing D1 event logs until Pipelines goes GA.
4. Workers Workflows
(a) Current state
- Status: GA since 2025-04-07. Available on Free and Paid plans.
- Source (GA blog): https://blog.cloudflare.com/workflows-ga-production-ready-durable-execution/
- Docs overview last updated 2026-04-22: https://developers.cloudflare.com/workflows/
- Primitives:
- Primitives: `step.do()` (durable retry), `step.sleep()`, `step.sleepUntil()`, `step.waitForEvent()` (pause for webhook/user input with timeout), plus programmatic trigger/pause/resume/terminate.
- Python SDK is beta; TypeScript is GA.
- Visualize Workflows is in beta.
- Source: sidebar + Workers API subpages.
Workflows vs Queues + DO-alarms for multi-step pipelines
| Concern | Queues + DO alarms | Workflows |
|---|---|---|
| Durable step retry | Hand-rolled | Built-in, per-step |
| State across steps | DO storage writes | Implicit (step return values) |
| Pause for human/webhook | DO alarm + polling or webhook callback that writes DO state | step.waitForEvent() native |
| Long sleep (hours/days) | DO alarm | step.sleepUntil() native |
| Debugging / observability | Logs + manual DO state dump | Built-in step-by-step UI |
| Code complexity | High (state machine in app code) | Low (linear code with await step.do()) |
| Failure blast radius | You own retry correctness | CF owns retry correctness |
(b) Application to V5
Concrete example from the brief: “Gong transcript → keyword pre-filter → LLM extraction → D1 insert → Vectorize upsert → signal evaluation”. Today this is (presumably) a queue consumer that runs all steps in one invocation and retries the whole thing on failure. That has two problems:
- An LLM rate-limit blip re-runs the keyword pre-filter (wasted compute, duplicate D1 writes if not perfectly idempotent).
- A CPU-time hit on one step kills the whole invocation.
Workflow form:

    import { WorkflowEntrypoint } from "cloudflare:workers";

    // gong, preFilter, llm, d1, vectorize, and signals are V5 helpers (not shown).
    export class GongIngestWorkflow extends WorkflowEntrypoint {
      async run(event, step) {
        const text = await step.do("fetch-transcript", async () => gong.fetch(event.payload.callId));
        const keep = await step.do("keyword-prefilter", async () => preFilter(text));
        if (!keep) return { skipped: true };
        // The retry config takes a delay alongside the limit; values here are illustrative.
        const extract = await step.do(
          "llm-extract",
          { retries: { limit: 5, delay: "10 seconds", backoff: "exponential" } },
          async () => llm.extract(text),
        );
        await step.do("d1-insert", async () => d1.insertSignal(extract));
        await step.do("vectorize-upsert", async () => vectorize.upsert(extract.embedding));
        await step.do("signal-eval", async () => signals.evaluate(extract));
      }
    }

Each step retries independently with its own backoff. A D1 transient failure only re-runs the D1 step, not the LLM call.
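The claim that a D1 failure does not re-run the LLM call can be illustrated with a toy model of step-level persistence. This is a minimal sketch and not how Workflows is implemented: `StepLog` and `runPipeline` are hypothetical, everything is synchronous and in memory, and the only point is the observable behavior of caching completed steps by name.

```typescript
// Toy stand-in for durable step state: completed step results are kept by name,
// so a re-run of the same instance skips steps that already succeeded.
class StepLog {
  private done = new Map<string, unknown>();

  do<T>(name: string, fn: () => T): T {
    if (this.done.has(name)) return this.done.get(name) as T;
    const result = fn(); // a throw here leaves earlier results intact
    this.done.set(name, result);
    return result;
  }
}

let llmCalls = 0;
let d1Attempts = 0;

function runPipeline(step: StepLog): string {
  const extract = step.do("llm-extract", () => {
    llmCalls++;
    return "signal";
  });
  return step.do("d1-insert", () => {
    d1Attempts++;
    if (d1Attempts === 1) throw new Error("transient D1 failure");
    return `stored:${extract}`;
  });
}

const stepLog = new StepLog();
let pipelineResult = "";
try {
  pipelineResult = runPipeline(stepLog); // first attempt fails at d1-insert
} catch (err) {
  pipelineResult = runPipeline(stepLog); // retry: llm-extract is served from the log
}
```

After the retry, the LLM step has run once and the D1 step twice, which is the cost profile the queue-consumer version cannot offer when it retries the whole invocation.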
(c) Recommendation: ADOPT NOW for the Gong + Salesforce multi-step pipelines; keep Queues for single-step dispatch
Human-in-the-Loop ad-spend approval (in current roadmap) is the textbook step.waitForEvent() use case — replace whatever cron/alarm polling exists today.
(d) Migration cost
- 1 day to port the Gong pipeline. Lock in the pattern.
- 0.5 day per additional multi-step job (SFDC enrichment, ad-spend approval, client-report generation).
- Zero runtime risk — Workflows run alongside existing Workers, no platform change needed.
5. Workers Browser Rendering / Browser Run
(a) Current state
- Renamed to “Browser Run” (formerly Browser Rendering). Still the same product with expanded surface.
- Source: https://developers.cloudflare.com/browser-rendering/ (renamed product page).
- Two integration tiers:
- Quick Actions (stateless, no code): HTTP endpoints `/content`, `/screenshot`, `/pdf`, `/markdown`, `/snapshot`, `/scrape`, `/json` (AI-structured extraction), `/links`, `/crawl` (beta).
- Browser Sessions (stateful): Puppeteer, Playwright, Chrome DevTools Protocol, Stagehand (AI-driven, finds elements by intent), Playwright MCP (for LLM agents).
- New features relative to earlier versions: Live View (beta), Human-in-the-Loop (beta), Session recording, WebMCP (beta), Custom fonts, Session reuse.
- Pricing (Workers Paid): 10 browser-hours included/mo, then $0.09/hr. 10 concurrent browsers averaged/mo included, then $2.00/extra. 120 concurrent max on paid plan.
- Source: https://developers.cloudflare.com/browser-rendering/platform/pricing/ and /limits/.
- Browser timeout 60s default.
Does it replace Playwright for scrape/screenshot/PDF?
- For scraping / screenshots / PDFs without behavioral complexity: yes, Quick Actions are strictly better (zero code, pay-per-use, edge-local, no infra).
- For full Playwright scripts (form flows, auth, multi-page agentic work): no replacement needed — Browser Run runs Playwright via CDP. You write the same Playwright code and connect to CF’s browsers instead of local Chromium.
- AI agent browsing: Stagehand + Playwright MCP is state-of-the-art here (as of docs).
- Current rule `~/.claude/rules/print-pdf.md` still applies if you are generating PDFs with image fidelity: the 2× viewport trick is a Chromium print-renderer fact, not a CF fact.
(b) Application to V5
Use cases that fit:
- Competitor site snapshots for the GTM researcher subagent (`/screenshot`, `/markdown`).
- Client-report PDF generation (replace any local wkhtmltopdf/Playwright infra).
- AI-structured extraction from customer websites for ICP scoring (`/json` endpoint with a schema).
- Crawl endpoint for competitor full-site ingestion at Phase 3.
(c) Recommendation: ADOPT NOW for Quick Actions (screenshot, PDF, markdown, json); ADOPT Stagehand when AI agents go autonomous in Phase 3
Explicit reject: do not move the existing Playwright-based print-PDF pipeline unless the output matches the 2× viewport + html{zoom:2} + scale:0.5 trick (see ~/.claude/rules/print-pdf.md). Quick Actions /pdf uses the same Chromium print renderer, so the same fidelity rules apply — test before switching.
(d) Migration cost
- Quick Actions: zero code, just HTTP calls. Hours, not days.
- Stagehand integration: ~1 day to wrap in an ascend-gateway tool.
6. Workers RPC + Service Bindings (WorkerEntrypoint)
(a) Current state
- Service bindings support two modes:
- Service bindings support two modes: HTTP (legacy fetch-over-binding) and RPC (`WorkerEntrypoint`), which is the recommended pattern.
- Pattern:

      // worker-b
      import { WorkerEntrypoint } from "cloudflare:workers";

      export class Upstream extends WorkerEntrypoint {
        async ping(x: number) { return x + 1; }
      }
      export default Upstream;

      // worker-a calling worker-b
      const out = await env.UPSTREAM.ping(42); // direct method call: no JSON, no HTTP

- Supports promise pipelining: `await env.UPSTREAM.getUser(id).profile.email` chains across the binding with a single round trip.
- RPC works for Service bindings and Durable Object stubs, with the same semantics.
- TypeScript: fully typed across bindings using `@cloudflare/workers-types` + wrangler's types gen.
- Local dev: `wrangler dev` supports multi-service dev (separate terminal per service) with live RPC.
(b) Application to V5
Current V5 is a single Worker (Hono). As we split (scheduler Worker, long-running pipeline Worker, admin Worker, webhook dispatcher), RPC is the right boundary:
- No HTTP overhead between Workers.
- Type-safe contracts (your IDE catches cross-worker breakage).
- Works as the back-end for the ADR-013 Headless 360 split without re-plumbing.
Explicit best practice as of 2026: export class X extends WorkerEntrypoint {} not export default { fetch } when the Worker is meant to be called from another Worker.
(c) Recommendation: ADOPT NOW for any new multi-Worker surface
Active items that benefit:
- Scheduler V5 (separate Worker per the spec in memory) → expose as `SchedulerEntrypoint` RPC, main gateway calls it via binding.
- Admin Worker (see §12): behind Access, RPCs into the main gateway for read-only data.
- Any future "heavy compute" worker (bulk embedding, batch PDF rendering) that should not share the gateway's CPU budget per request.
(d) Migration cost
- Zero for net-new workers: start them as `WorkerEntrypoint`.
- If an existing internal HTTP Worker call gets converted, it's ~1 hour per call site (change `fetch(url)` to `await env.BINDING.method()` + remove JSON boilerplate).
7. Durable Objects — SQLite vs KV storage, current patterns
(a) Current state
- SQLite-backed DOs are GA and recommended for all new namespaces. KV-backed storage is marked “(Legacy)” in the sidebar and “for backwards compatibility” in best-practices docs.
- Source: https://developers.cloudflare.com/durable-objects/ (“SQLite storage and corresponding Storage API methods like sql.exec have moved from beta to general availability. New Durable Object classes should use wrangler configuration for SQLite storage.”)
- Source: https://developers.cloudflare.com/durable-objects/best-practices/access-durable-objects-storage/ (“Cloudflare recommends all new Durable Object namespaces use the SQLite storage backend.”)
- SQLite DOs offer:
- SQL (`sql.exec`) + KV API on the same storage.
- Structured tables (multi-column), indexes.
- Point-in-time recovery up to 30 days.
- Free tier eligibility.
- Storage billing enabled Jan 2026 (per overview).
- WebSocket hibernation — keep stateful WebSocket connections without holding DO in memory. Still best practice.
- RPC methods on DOs: define methods on the DO class, call via stub: `await stub.myMethod(args)`. Preferred over `fetch` for internal calls.
- Alarms: schedule future DO execution at a per-object timestamp. Good for per-tenant scheduled work.
- Key-value-only DOs can migrate to SQLite in future (migration path announced, not shipped as of 2026-04).
(b) Application to V5
V5 uses DOs for OAuth (token cache + refresh lock). That’s a small-schema, high-read, low-write use case. Current backend (whichever) is cheap.
However:
- If the current V5 OAuth DO is KV-backed → migrate to SQLite when the migration path ships, or recreate the DO class for new tenants immediately.
- Rate-limiting (if we add it per-tenant) is a textbook SQLite-DO use case (time-window counters in a tiny indexed table).
- Scheduler V5 per-tenant state → SQLite DO with alarms is the correct primitive.
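The per-tenant rate-limiting idea above (time-window counters in a small indexed table) can be sketched as a fixed-window counter. This is a minimal sketch under stated assumptions: `WindowCounter` is a hypothetical class, and an in-memory `Map` keyed by tenant and window start stands in for the SQLite table a Durable Object would use.

```typescript
// In a SQLite-backed DO the counters would live in a small indexed table;
// a Map keyed by "tenant:windowStart" stands in for that table here.
class WindowCounter {
  private counts = new Map<string, number>();
  private limit: number;
  private windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed in the current fixed window.
  allow(tenantId: string, nowMs: number): boolean {
    const windowStart = Math.floor(nowMs / this.windowMs) * this.windowMs;
    const key = `${tenantId}:${windowStart}`;
    const used = this.counts.get(key) ?? 0;
    if (used >= this.limit) return false;
    this.counts.set(key, used + 1);
    return true;
  }
}
```

Because each Durable Object instance processes calls one at a time, the read-then-write in `allow` would not race inside a DO. Old windows are never deleted in this sketch; a real table would prune them, for example from an alarm.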
(c) Recommendation: ADOPT SQLite DO for all new DO classes. Plan migration for OAuth DO.
Explicit: use RPC methods on the DO class, not fetch over the stub. Hibernate WebSockets. Use alarms for scheduled per-object work.
(d) Migration cost
- New DO classes: zero extra cost, just use SQLite config (`new_sqlite_classes` in `wrangler.toml`).
- Existing KV-backed OAuth DO: blocked on CF-provided migration path. Track with an open tech-debt row; do not do manual migration.
8. Cloudflare Containers on Workers
(a) Current state
- Status: GA, Workers Paid only. Docs last updated 2026-04-21.
- Model: a Container class extends Durable Object; each instance runs your OCI image with `defaultPort`, `sleepAfter`, `max_instances`. Routed via a DO namespace.
- Pattern: `getContainer(env.MY_CONTAINER, sessionId).fetch(request)` from a Worker.
- Use cases (per docs):
- Resource-intensive workloads (multiple CPU cores, lots of memory/disk).
- Full filesystem / specific runtime / Linux-only libs.
- Pre-existing images (e.g. headless apps shipped as containers).
(b) Application to V5
V5 is 100% Workers today. Evaluated containers for:
- LLM extraction: stays on Workers (bounded CPU per request).
- Large file processing (Gong transcript > 10MB, bulk embedding): Workers 30s CPU limit + 128MB memory can be a ceiling. Containers would help, but Workflows + chunking is the first-line fix.
- Pandoc / custom report rendering with heavy native deps: possible fit.
- Third-party binaries we don’t control: possible fit.
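The "Workflows + chunking" fix mentioned for large transcripts can be sketched as a splitter whose output pieces would each be processed in their own step. This is a minimal sketch: `chunkText` and its character-based limit are hypothetical choices for illustration, and a token-based limit may suit LLM calls better.

```typescript
// Splits text into pieces of at most maxChars, preferring to break just after
// a newline so transcript lines are not cut in half when avoidable.
function chunkText(text: string, maxChars: number): string[] {
  if (maxChars <= 0) throw new Error("maxChars must be positive");
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    let end = Math.min(start + maxChars, text.length);
    if (end < text.length) {
      const lastNewline = text.lastIndexOf("\n", end - 1);
      if (lastNewline >= start) end = lastNewline + 1;
    }
    chunks.push(text.slice(start, end));
    start = end;
  }
  return chunks;
}
```

Concatenating the chunks reproduces the input exactly, so downstream steps can be retried per chunk without losing or duplicating text.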
(c) Recommendation: NOT NOW. Re-evaluate at Phase 3 when specific CPU/memory ceilings are breached.
Moving to containers means giving up a big part of the Workers edge (cold-start advantage, zero infra, request-level isolation) and adding new ops surface (image building, size discipline, SSH). Today V5 has no CPU/memory pressure that Workflows-chunking can’t solve.
Explicit reject (at current phase): no V5 workload meets the “resource-intensive, needs full filesystem, requires specific runtime” bar.
(d) Migration cost if adopted
- Per container: Dockerfile + wrangler `[[containers]]` block + DO class binding (~0.5 day for first one).
- Ongoing: image size management, patching, scanning. Non-trivial ops overhead.
9. Hyperdrive
(a) Current state
- Status: GA, Free + Paid. Supports Postgres + MySQL (MySQL is labeled Beta).
- Features:
- Global connection pool — a hot pool maintained close to the Worker, removes cold TCP handshake + auth handshake per request.
- Query caching — idempotent queries cached at the edge.
- Private database support via Tunnel (beta) — reach DBs inside a VPC without public exposure.
- Credential rotation — supported.
- Works with standard drivers:
pg,postgres.js,mysql2,drizzle,prisma.
(b) Application to V5
V5 has no Postgres or MySQL today (D1 is the source of truth; KV/R2/Vectorize cover the rest). Hyperdrive is only relevant if we add:
- A client-owned Postgres (some PE clients will have their own warehouse we need to read).
- A shared Postgres for analytics we don’t want in D1 (D1 has row/size limits that Pipelines+R2 or external Postgres can beat for wide analytics).
(c) Recommendation: NOT NOW. ADOPT immediately if we ever connect a Worker to an external Postgres/MySQL.
Direct connection pools from Workers do not work well at scale — each invocation is isolated, TCP handshakes are sequential, and credential rotation is painful. Hyperdrive is the only sane pattern.
(d) Migration cost
- Near-zero: `[[hyperdrive]]` binding + connection-string swap in the driver. ~30 minutes the day we actually need it.
10. Workers Logs + Logpush
(a) Current state
- Workers Logs — built-in structured log store with query UI in the dashboard.
- Source: https://developers.cloudflare.com/workers/observability/logs/workers-logs/
- Retention: Free 3 days, Paid 7 days.
- Limits: 5B logs/account/day (overflow auto-sampled at 1% head). 256 KB max single log.
- Pricing (Paid): 20M log events/month included, then $0.60 per additional million.
- Head-based sampling: set `head_sampling_rate` in the `wrangler.toml` `observability` block (e.g. `0.1` for 10%).
- Workers Logpush — streams logs out of CF to external destinations (no 7-day retention cap).
- Source: https://developers.cloudflare.com/logs/get-started/enable-destinations/
- Supported destinations: R2, Cloudflare Pipelines (to R2/Iceberg/Parquet), generic HTTP, Amazon S3, S3-compatible endpoints, Datadog, Elastic, Google Cloud Storage, BigQuery, Azure, New Relic, SentinelOne, Splunk, Sumo Logic, Amazon Kinesis, IBM QRadar, IBM Cloud Logs. Plus third-party integrations: Axiom, Taegis, Exabeam, Sekoia.
- Dedicated egress IPs available.
- Tail Workers — a Worker that receives log events from other Workers in real time (for custom processing before export).
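The Workers Logs pricing quoted above lends itself to a quick estimate. This is a minimal sketch using only the two figures from the pricing bullet (20M events included per month on Paid, then $0.60 per additional million). The function name, the logs-per-request parameter, and the 30-day month are assumptions for illustration.

```typescript
// Figures from the pricing bullet above.
const INCLUDED_EVENTS_PER_MONTH = 20_000_000;
const USD_PER_EXTRA_MILLION = 0.6;

// Estimated monthly overage in USD for a given traffic profile.
function monthlyLogCostUsd(
  requestsPerDay: number,
  logsPerRequest: number,
  samplingRate: number = 1.0,
  daysInMonth: number = 30,
): number {
  const events = requestsPerDay * logsPerRequest * samplingRate * daysInMonth;
  const billable = Math.max(0, events - INCLUDED_EVENTS_PER_MONTH);
  return (billable / 1_000_000) * USD_PER_EXTRA_MILLION;
}
```

For example, 100k requests/day at a hypothetical 5 log lines each is 15M events a month, which stays inside the included allowance even at a sampling rate of 1.0.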
(b) Application to V5
Today V5 relies on dashboard logs + Analytics Engine telemetry. Two concrete problems this solves:
- Retention > 7 days: every incident post-mortem that reaches back more than a week is blind. Logpush to R2 (cheapest path) solves this with Parquet-friendly format forever.
- Structured query: Workers Logs UI supports filtering on JSON fields. If we log `{event: "tool_call", tenant: "...", tool: "...", duration_ms: ...}` we get a free tool-telemetry view without standing up Grafana/Datadog.
(c) Recommendation: ADOPT NOW
Specifically:
- Enable Workers Logs on every Worker (it's a one-line `wrangler.toml` change); set `head_sampling_rate` to 1.0 (100%) for now since traffic is below the 20M free/mo threshold.
- Enable Logpush → R2 with Parquet format for long-term forensics. Cost is R2 storage + ops, trivial at current volume.
- Convert all `console.log(...)` to `console.log(JSON.stringify({...}))`: a structured logging sweep across adapters. ~0.5 day.
- Defer Datadog/Grafana integration until traffic or incident frequency warrants paying for it.
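The structured-logging sweep can be sketched as one small helper that every adapter calls instead of free-form `console.log`. This is a minimal sketch: `formatLog` and `LogFields` are hypothetical names, and the field names in the usage line mirror the example event shape given in (b).

```typescript
type LogFields = Record<string, string | number | boolean | null>;

// Builds one JSON log line. `event` comes first so lines are easy to scan;
// a field named `event` in `fields` would override it.
function formatLog(event: string, fields: LogFields = {}): string {
  return JSON.stringify({ event, ...fields });
}

// Usage inside an adapter: one JSON object per line instead of free-form text.
console.log(formatLog("tool_call", { tenant: "acme", tool: "hubspot.search", duration_ms: 42 }));
```

Routing every log line through one function also gives a single place to add a request ID or redact fields later, without another sweep across adapters.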
(d) Migration cost
- ~2 hours to enable Workers Logs and Logpush, plus the ~0.5 day structured-logging sweep listed in (c). This is the lowest-effort high-value item in the audit.
11. Workers Analytics Engine
(a) Current state
- Status: GA. Unlimited-cardinality custom analytics with SQL API.
- Write path: `env.MY_AE.writeDataPoint({ blobs, doubles, indexes })` inside a Worker.
- Query path:
- SQL API (REST): `SELECT blob1, SUM(double1) FROM ... GROUP BY ...`.
- Grafana: native Cloudflare plugin for AE.
- Worker-based query: read AE via SQL API from a Worker and expose a JSON endpoint.
- Sampling: AE samples when cardinality/volume is high; aggregation queries are sample-adjusted via `_sample_interval`.
(b) Application to V5
Already used for tool telemetry (per project CLAUDE.md). 2026 best practice from docs:
- One AE dataset per logical concern (`tool_telemetry`, `llm_spend`, `tenant_usage`), not one giant dataset.
- Always set `indexes: [tenant_id]` on every write so billing aggregation is cheap.
- Query from a `/admin/analytics/*` route (behind Access, see §12) in a Worker; don't point Grafana at it directly from the public internet.
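The tenant-index rule above can be sketched as a builder that always fills `indexes`, paired with a sample-adjusted query. This is a minimal sketch: `DataPoint` mirrors the `{ blobs, doubles, indexes }` shape quoted in (a), while `toolCallPoint`, the blob and double layout, and the exact query text are assumptions for illustration.

```typescript
// Mirrors the writeDataPoint argument shape quoted in (a).
interface DataPoint {
  blobs: string[];
  doubles: number[];
  indexes: string[];
}

// Always sets the tenant as the index so per-tenant aggregation stays cheap.
function toolCallPoint(tenantId: string, tool: string, durationMs: number, ok: boolean): DataPoint {
  return {
    indexes: [tenantId],
    blobs: [tool, ok ? "ok" : "error"],
    doubles: [durationMs],
  };
}

// Sample-adjusted aggregation: each stored row is weighted by _sample_interval.
const perTenantQuery = `
SELECT index1 AS tenant,
       SUM(_sample_interval) AS calls,
       SUM(_sample_interval * double1) AS total_ms
FROM tool_telemetry
GROUP BY tenant`;
```

In a Worker the point would be passed to `env.MY_AE.writeDataPoint(...)`. Funnelling every write through one builder is also what makes the index audit in (c) a single-file review instead of a grep across adapters.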
(c) Recommendation: AUGMENT in-place
- Audit existing `writeDataPoint` calls for missing `indexes` (tenant).
- Stand up a Grafana Cloud org + AE plugin for non-engineering dashboards (spend/usage per client). The AE plugin is free; Grafana Cloud free tier is enough for now.
- Build `/admin/analytics/summary.json` (Worker endpoint behind Access) as the canonical source, rather than having every client build their own AE query.
(d) Migration cost
- ~0.5 day for the index audit + Grafana setup.
12. Cloudflare Access / Zero Trust — admin endpoint protection
(a) Current state
- Cloudflare Access is the right primitive for `/admin/*` on a Worker. Self-hosted apps docs describe putting a policy in front of any domain/path.
- Modern auth primitives supported: WebAuthn (passkeys), SAML/OIDC SSO, TOTP, Google/GitHub/Okta/Azure/OneLogin IdPs, Service Tokens for machine-to-machine.
- Managed OAuth (Beta): lets Access broker OAuth flows for third-party apps.
- JWT is issued after auth and presented as the `Cf-Access-Jwt-Assertion` header; Workers can validate it without a round trip.
- Source (JWT validation): https://developers.cloudflare.com/cloudflare-one/applications/configure-apps/validate-jwts/
- Require MFA / Independent MFA per app is supported.
(b) Application to V5
V5’s admin path today is an API-key header. That’s:
- Shared across users (no per-admin audit trail).
- No session management.
- Rotation is painful.
- Not revokable without deploying a new key.
Access + passkey:
- Per-admin identity.
- WebAuthn = phishing-resistant, no password.
- Instant revocation from dashboard.
- Workers validate the `Cf-Access-Jwt-Assertion` and derive admin identity.
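Deriving admin identity from the Access token can be sketched as a claims check that runs after signature verification. This is a minimal sketch and deliberately omits the cryptographic step: it assumes the token from `Cf-Access-Jwt-Assertion` has already been verified against the team's public keys, for example with a JWT library. `AccessClaims`, `checkAccessClaims`, and the exact claim shape are assumptions for illustration.

```typescript
// Subset of the claims in a decoded Access JWT payload. Field names follow
// standard JWT claims plus `email`; treat the exact shape as an assumption.
interface AccessClaims {
  aud: string[];
  iss: string;
  exp: number; // seconds since epoch
  email?: string;
}

type AdminCheck = { ok: true; admin: string } | { ok: false; reason: string };

// Run only AFTER the token signature has been verified.
function checkAccessClaims(
  claims: AccessClaims,
  expectedAud: string,
  expectedIss: string,
  nowSeconds: number,
): AdminCheck {
  if (claims.iss !== expectedIss) return { ok: false, reason: "wrong issuer" };
  if (claims.aud.indexOf(expectedAud) === -1) return { ok: false, reason: "wrong audience" };
  if (claims.exp <= nowSeconds) return { ok: false, reason: "expired" };
  if (!claims.email) return { ok: false, reason: "no admin identity" };
  return { ok: true, admin: claims.email };
}
```

The returned `admin` value is what gives each admin action a per-person audit trail, which the shared API-key header cannot provide. Service-token callers carry no email, so they would need their own branch.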
(c) Recommendation: ADOPT NOW for /admin/* paths on V5
- Create a Self-hosted Access Application for `ascend-gateway-v5.ascendgtm.workers.dev/admin/*` (and any custom domain).
- Policy: `Require → Email → mishaal@ascendgtm.net` + `Require → Authentication method: WebAuthn`.
- Add a service token for programmatic admin (Claude Code / automation) with its own policy row.
- Keep the API-key check as defense-in-depth for the near term (belt + suspenders).
(d) Migration cost
- ~1 hour for the Access config, ~2 hours to add JWT validation middleware in the Worker, ~30 min per automation client to switch to service tokens.
13. Email Workers
(a) Current state
- Email Routing + Email Workers is GA. A Worker receives inbound email via an `async email(message, env, ctx)` handler, with `message.from`, `message.to`, `message.raw`, `message.setReject()`, `message.forward()`, and reply/send APIs.
- Source: https://developers.cloudflare.com/email-routing/email-workers/
- Reply/send from Workers: https://developers.cloudflare.com/email-routing/email-workers/reply-email-workers/ and /send-email-workers/
(b) Application to V5
Real use cases:
- Inbound HubSpot/SFDC webhook-via-email — some legacy systems only alert via email. An Email Worker can parse the payload and trigger a Queue/Workflow.
- Customer BCC to a dedicated address for Gong-like transcript ingestion without external SaaS.
- Agent “drop a transcript into inbox → pipeline runs” UX — Mishaal-only ingest path.
(c) Recommendation: NOT NOW (no current triggering use case). Keep as a tool in the toolbox for Phase 3.
Explicit reject reason: V5 has no inbound-email dependency today. Building this speculatively violates the Karpathy simplicity-first rule.
(d) Migration cost
- ~1 hour per Email Worker the day we actually need one.
14. Cloudflare Tunnel / Zero Trust Network
(a) Current state
- `cloudflared` + Cloudflare Tunnel exposes private services to CF's edge without opening inbound firewall ports.
- Supported use cases (per docs): SSH, RDP, SMB, gRPC, VNC, private hostnames, private IP/CIDR ranges, browser-rendered SSH/RDP terminals, Access for Infrastructure.
(b) Application to V5
Historical context: the old VPS (Hetzner, Bridge API 8888) is decommissioned per project CLAUDE.md. V5 is 100% on Cloudflare Workers edge. There is no private origin to front.
Possible future uses:
- Client-owned Postgres reachable only on their VPC — Tunnel is the sanctioned way for Hyperdrive to reach it.
- A self-hosted Cal.com or SES relay if Scheduler V5 goes partly off-CF.
(c) Recommendation: NOT NOW. ADOPT when we first need to reach a client’s private network for Hyperdrive or a self-hosted service.
Explicit reject for current architecture: there is no on-prem or VPC-only service to tunnel to. Adding Tunnel to a 100% edge stack is pure overhead.
(d) Migration cost
- ~0.5 day per private origin when the need arises (install cloudflared in their VPC, define hostname + policy).
Prioritized adoption roadmap
Top 5 — adopt immediately (this sprint)
- Workers Logs + Logpush to R2 (§10) — lowest-effort, highest-value. 2 hours. Unblocks >7-day forensics and structured incident review.
- AI Gateway — unified API, caching, dynamic routing, Logpush (§2) — single biggest LLM cost + resilience lever. 1–2 days total, per-provider rollout.
- Cloudflare Access on `/admin/*` (§12): phishing-resistant admin auth. 3 hours.
- Workflows for the Gong/SFDC multi-step pipelines (§4): retries, `waitForEvent` for HITL ad-spend. 1 day for first pipeline, 0.5 day each after.
- SQLite DOs for all new DO classes (§7): free upgrade for any new stateful surface (scheduler, per-tenant rate limiting, session objects). Zero cost if done net-new.
Next 5 — adopt in Phase 3 / when trigger hits
- Secrets Store (§1) — wait for GA. ADR now, rollout the day it GAs.
- Workers Pipelines for analytics/archive ingest (§3) — pilot with Gong webhook stream; beta-acceptable for archive-only path.
- Browser Run Quick Actions + Stagehand (§5) — fold in as the GTM researcher subagent + client report PDF renderer.
- Workers RPC + multi-worker split (§6): triggered by ADR-013 Headless 360 / Scheduler V5 split. Use `WorkerEntrypoint` from day one on the new Workers.
- Analytics Engine augmentation + Grafana (§11): tenant-indexed AE + Grafana Cloud dashboards for client usage/spend visibility. 0.5 day.
Explicit rejects (do NOT adopt at this phase)
- Containers on Workers (§8) — no workload today meets the bar; retry at Phase 3 if CPU/memory ceilings bite.
- Hyperdrive (§9) — no Postgres/MySQL in scope. Ready to adopt the day we add one.
- Email Workers (§13) — no inbound-email dependency. Speculative build violates simplicity-first.
- Cloudflare Tunnel (§14) — no private origin to front. V5 is 100% edge. Adopt when client VPCs enter scope.
Cross-cutting rules that should land alongside the roadmap
- Every new Worker: `observability.enabled = true`, `head_sampling_rate = 1.0`, Logpush ruleset attached.
- Every new DO namespace: SQLite backend.
- Every new inter-Worker call: `WorkerEntrypoint` RPC, not internal `fetch`.
- Every new admin endpoint: behind Access.
- Every new LLM call: through AI Gateway (so caching, fallback, and observability are free).