ADR-026 — Workers Workflows for Gong + SFDC Ingestion; SQLite DOs Mandatory Going Forward

Status: Accepted (2026-04-24)
Supersedes: Phase 2 Tasks 2.5 / 2.6 ack-only queue scaffold
Related: ADR-016 (Context Plane), V5 Cloud-Native v2 Engineering Plan §Wave 3
Research receipts:

https://developers.cloudflare.com/workflows/ (overview, last updated 2026-04-22)
https://developers.cloudflare.com/workflows/build/workers-api/ (API: step.do, retries, step.sleep, step.waitForEvent, WorkflowEntrypoint)
https://developers.cloudflare.com/workflows/get-started/guide/ (wrangler binding syntax, .create() pattern)
https://blog.cloudflare.com/workflows-ga-production-ready-durable-execution/ (GA announcement 2025-04-07)
https://developers.cloudflare.com/durable-objects/ (SQLite DO GA, recommended for new classes)
https://developers.cloudflare.com/durable-objects/best-practices/storage-api/ (storage API — ctx.storage.sql vs legacy ctx.storage.put)

Context

Phase 2 (ADR-016) landed the context worker with queue consumer scaffolds (handleGongQueue, handleSalesforceQueue) that ACK every message as a no-op. The Phase 2 plan deferred the real ingestion logic to Tasks 2.5 (Gong, 12h) and 2.6 (SFDC, 8h).

Rather than implement those as in-handler pipelines (keyword-prefilter → LLM-extract → D1-insert → Vectorize-upsert → signal-eval all in one queue consumer invocation), the Wave 3 plan mandates Workers Workflows. Reasons:

Durable step retry. If the LLM rate-limits on step 3, the keyword prefilter and D1 reads in steps 1-2 do not re-run; re-running them would waste compute and risk non-idempotent writes. Workflows checkpoint each step, so only the failing step replays.
Replay-safe on failure. A transient 5xx from D1 re-runs only the D1 write, not the LLM call. Over a year of traffic, this compounds to materially lower LLM spend and faster recovery.
CF-managed retry semantics. step.do(name, {retries: {limit, delay, backoff}}, callback) is configured per step, so there is no hand-rolled exponential-backoff library (invariant #3: no retry libraries on the gateway; Workflows is CF-native, not a library).
Observability. CF dashboard shows per-step success/fail/retry count without us standing up Grafana.
Spec-match. The CF platform audit (§4, “Workflows vs Queues + DO alarms”) names Gong-style multi-step ingestion as the textbook use case.
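The difference between step-level replay and whole-pipeline retry can be illustrated with a toy in-memory model. This is a sketch only, not the Workflows runtime: MiniStepRunner, its checkpoint map, and the simulated 429 are hypothetical stand-ins for the durable checkpointing that step.do provides.

```typescript
// Toy model of step checkpointing. NOT the Cloudflare runtime: the Map below
// stands in for the durable storage Workflows uses for step results.
class MiniStepRunner {
  private checkpoints = new Map<string, unknown>();

  // Run `fn` once per step name; later calls return the saved result.
  async doStep<T>(stepName: string, fn: () => Promise<T>): Promise<T> {
    if (this.checkpoints.has(stepName)) {
      return this.checkpoints.get(stepName) as T;
    }
    const result = await fn(); // a throw here leaves no checkpoint behind
    this.checkpoints.set(stepName, result);
    return result;
  }
}

// Counters let us observe which steps actually executed.
const stepCalls = { prefilter: 0, llm: 0 };

async function runToyPipeline(
  runner: MiniStepRunner,
  failure: { pending: boolean },
): Promise<string> {
  const matched = await runner.doStep("keyword-prefilter", async () => {
    stepCalls.prefilter += 1;
    return true;
  });
  return runner.doStep("llm-extract", async () => {
    stepCalls.llm += 1;
    if (failure.pending) {
      failure.pending = false;
      throw new Error("429 rate limited"); // simulated transient failure
    }
    return matched ? "facts" : "none";
  });
}

// Replay the pipeline against the same runner until it succeeds.
async function runWithReplay(): Promise<string> {
  const runner = new MiniStepRunner();
  const failure = { pending: true };
  for (;;) {
    try {
      return await runToyPipeline(runner, failure);
    } catch {
      // Replay: completed steps are served from checkpoints.
    }
  }
}
```

After runWithReplay() resolves, the prefilter counter shows a single execution while the LLM counter shows two, which is the property the reasons above rely on.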

A secondary concern landed in the same wave: SQLite Durable Objects are GA and are the recommended backend for ALL new DO classes. The CF docs sidebar labels KV-backed DOs “(Legacy)”. Since we’re touching ingestion + DO-adjacent plumbing, codify the rule now rather than wait.

Decision

1. Workers Workflows for all multi-step ingestion pipelines

Gong and SFDC ingestion become Workers Workflows:

context-worker/src/workflows/gong-ingest.ts (GongIngestWorkflow)
context-worker/src/workflows/salesforce-ingest.ts (SalesforceIngestWorkflow)

Queue consumers (handleGongQueue, handleSalesforceQueue) become producer-only: read the queue message, call env.GONG_WORKFLOW.create({id, params}), ack. Business logic lives in the Workflow’s run() method as named step.do(...) calls.

Step sequence for Gong (matches the plan doc):

keyword-prefilter — load ingest_keywords:{tenant} from KV.
insert-engagement-only (conditional branch) — low-confidence fact if no keyword match; workflow terminates.
llm-extract — call llm_invoke on the gateway via the GATEWAY_SERVICE service binding. DeepSeek V3.5 pinned + JSON mode + temperature 0. Retries: 3, exponential backoff starting at 5s.
persist-facts-{i} (per fact) — insert-or-supersede via resolveConflict() in a D1 batch transaction.
schedule-embeddings — call scheduleFactEmbed() for each committed fact (fail-open via ctx.waitUntil).
emit-signal-eval — one signal_evaluations row per batch with the committed fact IDs for replay/audit.
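The Gong step sequence above can be sketched as an orchestration function. This is an illustrative outline under stated assumptions, not the actual GongIngestWorkflow: GongStepLike models only the two-argument form of step.do, and every member of GongSketchDeps is a hypothetical stand-in for the real helpers. Per-step retry configuration and the fail-open behaviour of schedule-embeddings are omitted.

```typescript
// Minimal stand-in for the Workflows step object (two-argument `do` only).
interface GongStepLike {
  do<T>(stepName: string, fn: () => Promise<T>): Promise<T>;
}

interface SketchFact { subject: string; predicate: string; object: string; }

// Hypothetical dependencies; the real ones live in the context worker.
interface GongSketchDeps {
  loadKeywords(tenant: string): Promise<string[]>;
  insertEngagementOnly(callId: string): Promise<void>;
  extractFacts(transcript: string): Promise<SketchFact[]>;
  persistFact(fact: SketchFact): Promise<string>; // returns the committed fact ID
  scheduleEmbeds(factIds: string[]): Promise<void>;
  emitSignalEval(factIds: string[]): Promise<void>;
}

async function runGongSketch(
  step: GongStepLike,
  deps: GongSketchDeps,
  input: { tenant: string; callId: string; transcript: string },
): Promise<string[]> {
  const matched = await step.do("keyword-prefilter", async () => {
    const keywords = await deps.loadKeywords(input.tenant);
    const text = input.transcript.toLowerCase();
    return keywords.some((k) => text.includes(k.toLowerCase()));
  });

  if (!matched) {
    await step.do("insert-engagement-only", () => deps.insertEngagementOnly(input.callId));
    return []; // the workflow terminates on this branch
  }

  const facts = await step.do("llm-extract", () => deps.extractFacts(input.transcript));

  const committed: string[] = [];
  let index = 0;
  for (const fact of facts) {
    // One named step per fact, so a failed write retries only that fact.
    committed.push(await step.do(`persist-facts-${index}`, () => deps.persistFact(fact)));
    index += 1;
  }

  await step.do("schedule-embeddings", () => deps.scheduleEmbeds(committed));
  await step.do("emit-signal-eval", () => deps.emitSignalEval(committed));
  return committed;
}
```

Because the function depends only on the GongStepLike shape, a test can pass a fake step that records step names and invokes each callback directly.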

Step sequence for SFDC is analogous but the LLM step only runs for free-text fields (Description, Notes__c). Structured fields (StageName, Amount, CloseDate, Industry, etc.) are translated deterministically via src/lib/sfdc-translator.ts and land as source_authority = SfdcOpportunityStage | SfdcCustomField.
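The structured versus free-text routing described above might look roughly like the following. The two free-text field names come from this ADR; the splitSfdcFields function and its return shape are assumptions made for illustration and are not the contents of src/lib/sfdc-translator.ts.

```typescript
// Free-text field names are taken from the ADR text; the split logic is assumed.
const SFDC_FREE_TEXT_FIELDS = new Set<string>(["Description", "Notes__c"]);

interface SfdcFieldSplit {
  structured: Record<string, unknown>; // translated deterministically, no LLM
  freeText: Record<string, string>;    // candidates for the LLM step
}

function splitSfdcFields(record: Record<string, unknown>): SfdcFieldSplit {
  const split: SfdcFieldSplit = { structured: {}, freeText: {} };
  for (const key of Object.keys(record)) {
    const value = record[key];
    if (SFDC_FREE_TEXT_FIELDS.has(key)) {
      // Drop empty or non-string free text so the LLM step is skipped for it.
      if (typeof value === "string" && value.trim() !== "") {
        split.freeText[key] = value;
      }
    } else {
      split.structured[key] = value;
    }
  }
  return split;
}
```

With this shape, the workflow would only invoke the LLM step when freeText has at least one entry.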

Workflow instance IDs are deterministic (gong-{tenant}-{call_id}, sfdc-{tenant}-{object_type}-{object_id}), so a queue re-delivery of the same message fails with a duplicate-ID error, which we treat as “already processed” and ack.
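A producer-only consumer with deterministic IDs could be sketched as follows. The interfaces are minimal stand-ins for the Workflow binding and queue message, and looksLikeDuplicateId is an assumption: this ADR does not specify what the duplicate-ID error looks like, so the message check would need adjusting to the real error.

```typescript
// Minimal stand-ins; the real types come from the Workers runtime.
interface WorkflowBindingLike<P> {
  create(options: { id: string; params: P }): Promise<unknown>;
}
interface QueueMessageLike<B> {
  body: B;
  ack(): void;
  retry(): void;
}
interface GongQueueBody { tenant: string; callId: string; }

// ID formats follow the ADR text.
function gongInstanceId(body: GongQueueBody): string {
  return `gong-${body.tenant}-${body.callId}`;
}
function sfdcInstanceId(tenant: string, objectType: string, objectId: string): string {
  return `sfdc-${tenant}-${objectType}-${objectId}`;
}

// ASSUMPTION: the duplicate-ID failure is recognizable from the error message.
function looksLikeDuplicateId(err: unknown): boolean {
  return err instanceof Error && /already exists|duplicate/i.test(err.message);
}

async function produceGongWorkflow(
  msg: QueueMessageLike<GongQueueBody>,
  workflow: WorkflowBindingLike<GongQueueBody>,
): Promise<"created" | "duplicate" | "retried"> {
  try {
    await workflow.create({ id: gongInstanceId(msg.body), params: msg.body });
    msg.ack();
    return "created";
  } catch (err) {
    if (looksLikeDuplicateId(err)) {
      msg.ack(); // re-delivery of a message that already started a run
      return "duplicate";
    }
    msg.retry(); // any other failure: let the queue redeliver
    return "retried";
  }
}
```

The returned status string is only there to make the three outcomes easy to assert in tests; a real consumer could return nothing.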

2. SQLite Durable Objects mandatory for new classes

All new Durable Object classes MUST use SQLite storage:

wrangler.toml: list the class under new_sqlite_classes = [...] in a [[migrations]] entry.
code: use this.ctx.storage.sql.exec(...) rather than the legacy KV-style ctx.storage.put.
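As a rough illustration of the code-side rule, the sketch below shows a class whose reads and writes all go through sql.exec. CursorStore and its table are hypothetical and not a class in this repo; SqlExecLike models only the exec(...).toArray() part of ctx.storage.sql so the example stays self-contained. In a real Durable Object the constructor would receive ctx.storage.sql instead.

```typescript
// Minimal stand-in for the part of ctx.storage.sql used here.
interface SqlExecLike {
  exec(query: string, ...bindings: unknown[]): { toArray(): Record<string, unknown>[] };
}

// Hypothetical example class, not one from this repo.
class CursorStore {
  private readonly sql: SqlExecLike;

  constructor(sql: SqlExecLike) {
    this.sql = sql;
    // Schema is created idempotently on construction.
    this.sql.exec(
      "CREATE TABLE IF NOT EXISTS cursors (source TEXT PRIMARY KEY, position TEXT NOT NULL)",
    );
  }

  // Upsert the cursor position for one source.
  setCursor(source: string, position: string): void {
    this.sql.exec(
      "INSERT INTO cursors (source, position) VALUES (?, ?) " +
        "ON CONFLICT(source) DO UPDATE SET position = excluded.position",
      source,
      position,
    );
  }

  // Return the stored position, or null when the source has no row yet.
  getCursor(source: string): string | null {
    const rows = this.sql.exec("SELECT position FROM cursors WHERE source = ?", source).toArray();
    const first = rows[0];
    return first ? String(first["position"]) : null;
  }
}
```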

Codified in .claude/rules/v5-invariants.md.

3. TokenManager already SQLite-backed — no migration needed

Verified 2026-04-24 by reading src/do/token-manager.ts:

Line 27: this.sql = ctx.storage.sql;
Constructor uses this.sql.exec(CREATE TABLE IF NOT EXISTS ...)
All reads/writes go through this.sql.exec().

Plus src/wrangler.toml line 36: new_sqlite_classes = ["TokenManager"].

No migration script is needed. The task brief instructed “If it’s still KV API: migrate to SQLite … Write scripts/migrate-do-to-sqlite.ts” — the precondition is false, so the migration script is explicitly NOT written.

Consequences

Positive

Gong/SFDC pipelines gain free durable retry + replay.
CF dashboard provides per-step observability for free.
LLM spend reduced on transient-failure paths (only the failing step re-runs; keyword prefilter + D1 write don’t pay twice).
New primitive (Workflows) codified with a canonical pattern + reusable step helpers (stepJson wrapper for Serializable constraint on unknown-typed payload fields).
Future multi-step pipelines (HITL ad-spend approval, client-report generation) have a ready template.

Negative

Two new primitives (Workflows + service bindings) to learn + monitor.
Step replay semantics require every step to be idempotent — harder than a “just run it” inline pipeline. Mitigated by deterministic fact IDs (hashString(source|subject|predicate|object)) + INSERT OR IGNORE on engagement-only rows.
cloudflare:workers isn’t resolvable in Node-env vitest; we added a test shim at context-worker/test/_shims/cloudflare-workers.ts aliased via vitest.config.ts. Tests exercise step orchestration by injecting a fakeStep that invokes callbacks synchronously.
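The idempotency mitigation above rests on fact IDs being a pure function of the fact's content. A minimal sketch of that idea follows. The repo's hashString algorithm is not described in this ADR, so a 32-bit FNV-1a variant is swapped in purely for illustration, and deterministicFactId is a hypothetical name.

```typescript
// ASSUMPTION: FNV-1a (32-bit) stands in for the repo's hashString helper.
// This variant mixes UTF-16 code units rather than bytes.
function fnv1a32Hex(input: string): string {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit multiply
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

interface FactKeyParts {
  source: string;
  subject: string;
  predicate: string;
  object: string;
}

// Same key order and "|" separator as named in the text above.
function deterministicFactId(parts: FactKeyParts): string {
  return fnv1a32Hex([parts.source, parts.subject, parts.predicate, parts.object].join("|"));
}
```

A replayed step that recomputes the ID for the same fact gets the same value, so an INSERT OR IGNORE or insert-or-supersede write lands on the same row instead of creating a duplicate. A 32-bit hash is used here only to keep the sketch short; a wider hash lowers collision risk.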

Trade-offs named

Per-fact step vs batched step. We chose per-fact (persist-facts-0, persist-facts-1, …) so a D1 contention error retries one fact, not the whole batch. Cost: more step records per run. Budget: Workflows caps at 25,000 steps per run; an extraction that yields 50 facts uses ~60 steps (well within budget).
LLM call via service binding vs direct. Going through the gateway’s llm_invoke keeps provider routing + prompt-caching discount logic in one place. Cost: one extra Worker hop (~1ms). When the gateway exports a WorkerEntrypoint class, this flips to native RPC with zero code changes in the extractor (documented in src/lib/llm-extractor.ts top-of-file).
JSON marshaling at step boundaries. ExtractedFact.object and CommittedFact.object are unknown — not Serializable per CF types. The stepJson helper stringifies inside the step and parses at the caller side. This matches what the Workflows runtime does internally anyway (serializes step returns to durable storage), so the cost is negligible and the boundary is explicit.
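The JSON marshaling trade-off above can be sketched with a small wrapper. This is a hypothetical reconstruction of a stepJson-style helper, not the repo's implementation: JsonStepLike models only the two-argument step.do form, and the cast after JSON.parse is unchecked.

```typescript
// Minimal stand-in for the Workflows step object (two-argument `do` only).
interface JsonStepLike {
  do<T>(stepName: string, fn: () => Promise<T>): Promise<T>;
}

// Hypothetical stepJson-style helper; the real one may differ.
async function stepJsonSketch<T>(
  step: JsonStepLike,
  stepName: string,
  fn: () => Promise<T>,
): Promise<T> {
  // Inside the step: return a string, which satisfies the serializable constraint.
  // Note: a callback that resolves to undefined would not round-trip.
  const encoded = await step.do(stepName, async () => JSON.stringify(await fn()));
  // At the caller: parse back into the declared type (unchecked cast).
  return JSON.parse(encoded) as T;
}
```

The step itself only ever returns a string, so unknown-typed fields never cross the step boundary as typed values; the caller takes responsibility for the shape after parsing.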

Alternatives considered

Queue + DO-alarm hand-rolled state machine. Rejected. Requires writing + testing retry + replay logic ourselves, all of it new surface area for bugs.
Inline pipeline in queue consumer. Rejected. No step-level retry; an LLM 429 re-runs the keyword prefilter + any already-committed D1 writes (non-idempotent without the deterministic-fact-ID trick we adopted anyway).
Containers on Workers for the LLM-heavy step. Rejected. Workflows solves the real problem (durable multi-step) without giving up the edge. Containers are the right primitive when CPU/memory actually bites, which it doesn’t here (each LLM call is bounded by the model’s context window and runs <30s).
Migrate TokenManager DO to a fresh SQLite class with a migration script. Rejected after live verification: TokenManager is ALREADY SQLite-backed. Writing a migration script would be a band-aid for a non-existent problem.

Verification (at merge time)

cd context-worker && npx vitest run: 102/102 tests pass (76 pre-existing + 16 Gong workflow + 10 SFDC workflow).
npx tsc --noEmit in both repo root AND context-worker/ — clean.
grep -n "ctx.storage.sql" src/do/token-manager.ts — returns at least one match (verification of SQLite backend).
grep -n "new_sqlite_classes" src/wrangler.toml — returns new_sqlite_classes = ["TokenManager"].
.claude/rules/v5-invariants.md includes the two new invariants.

Open follow-ups (not blocking)

After gateway adds WorkerEntrypoint for RPC, switch llm-extractor.ts from fetch-over-binding to env.GATEWAY_SERVICE.llmInvoke(...) (native RPC, no JSON serialize/deserialize on the hop).
Add Analytics Engine writes from the Workflow (per-step duration + tenant counters) — fits the Wave 3 §3.7 dashboards task.
HITL ad-spend approval workflow (step.waitForEvent) — textbook next use of the primitive, folded into a later wave.