Workers Workflows for Gong + SFDC Ingestion; SQLite DOs Mandatory Going Forward
ADR-026 — Workers Workflows for Gong + SFDC Ingestion; SQLite DOs Mandatory Going Forward
Status: Accepted (2026-04-24)
Supersedes: Phase 2 Tasks 2.5 / 2.6 ack-only queue scaffold
Related: ADR-016 (Context Plane), V5 Cloud-Native v2 Engineering Plan §Wave 3
Research receipts:
- https://developers.cloudflare.com/workflows/ (overview, last updated 2026-04-22)
- https://developers.cloudflare.com/workflows/build/workers-api/ (API: step.do, retries, step.sleep, step.waitForEvent, WorkflowEntrypoint)
- https://developers.cloudflare.com/workflows/get-started/guide/ (wrangler binding syntax, .create() pattern)
- https://blog.cloudflare.com/workflows-ga-production-ready-durable-execution/ (GA announcement 2025-04-07)
- https://developers.cloudflare.com/durable-objects/ (SQLite DO GA, recommended for new classes)
- https://developers.cloudflare.com/durable-objects/best-practices/storage-api/ (storage API: ctx.storage.sql vs legacy ctx.storage.put)
Context
Phase 2 (ADR-016) landed the context worker with queue consumer scaffolds
(handleGongQueue, handleSalesforceQueue) that ACK every message as a no-op.
The Phase 2 plan deferred the real ingestion logic to Tasks 2.5 (Gong, 12h)
and 2.6 (SFDC, 8h).
Rather than implement those as in-handler pipelines (keyword-prefilter → LLM-extract → D1-insert → Vectorize-upsert → signal-eval all in one queue consumer invocation), the Wave 3 plan mandates Workers Workflows. Reasons:
- Durable step retry. If the LLM rate-limits on step 3, the keyword prefilter + D1 reads on steps 1-2 do not re-run; re-running them would waste compute and risk non-idempotent writes. Workflows checkpoint each step; only the failing step replays.
- Replay-safe on failure. A transient 5xx from D1 re-runs only the D1 write, not the LLM call. Over a year of traffic, this compounds to materially lower LLM spend and faster recovery.
- CF-managed retry semantics. step.do(name, {retries: {limit, delay, backoff}}) is configured per step, with no hand-rolled exponential-backoff library (invariant #3: no retry libraries on the gateway; Workflows is CF-native, not a library).
- Observability. CF dashboard shows per-step success/fail/retry count without us standing up Grafana.
- Spec-match. The CF platform audit (§4, “Workflows vs Queues + DO alarms”) names Gong-style multi-step ingestion as the textbook use case.
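The checkpoint-and-retry argument in the list above can be illustrated with a toy in-memory model. This is a sketch only, not the Workflows runtime: ToyStep, toyPipeline, and demoRateLimitedLlm are hypothetical names, checkpoints live in a Map rather than durable storage, there is no delay or backoff, and the toy treats limit as the number of retries after the first attempt.

```typescript
// Toy model of durable steps: NOT the Cloudflare runtime, just the idea.
type ToyRetries = { retries: { limit: number } };

class ToyStep {
  // step name -> checkpointed result (a real engine persists this durably)
  private checkpoints = new Map<string, unknown>();
  // step name -> how many times the callback actually executed
  readonly executions = new Map<string, number>();

  async do<T>(name: string, config: ToyRetries, fn: () => Promise<T>): Promise<T> {
    if (this.checkpoints.has(name)) {
      return this.checkpoints.get(name) as T; // replay: completed work is skipped
    }
    let lastError: unknown;
    // Assumption of this toy: one initial attempt plus up to `limit` retries.
    for (let attempt = 0; attempt <= config.retries.limit; attempt++) {
      this.executions.set(name, (this.executions.get(name) ?? 0) + 1);
      try {
        const result = await fn();
        this.checkpoints.set(name, result);
        return result;
      } catch (err) {
        lastError = err;
      }
    }
    throw lastError;
  }
}

async function toyPipeline(step: ToyStep, llm: () => Promise<string>): Promise<string> {
  const cfg: ToyRetries = { retries: { limit: 3 } };
  const keywords = await step.do("keyword-prefilter", cfg, async () => ["renewal"]);
  const context = await step.do("d1-read", cfg, async () => `ctx:${keywords.length}`);
  return step.do("llm-extract", cfg, async () => `${context}:${await llm()}`);
}

async function demoRateLimitedLlm(): Promise<{ result: string; executions: Map<string, number> }> {
  const step = new ToyStep();
  let llmCalls = 0;
  const flakyLlm = async (): Promise<string> => {
    llmCalls += 1;
    if (llmCalls < 3) throw new Error("429 rate limited"); // fails twice, then succeeds
    return "facts";
  };
  const result = await toyPipeline(step, flakyLlm);
  return { result, executions: step.executions };
}
```

In this model keyword-prefilter and d1-read each execute once while llm-extract executes three times, which is the property the Workflows checkpointing is being adopted for.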
A secondary concern landed in the same wave: SQLite Durable Objects are GA and are the recommended backend for ALL new DO classes. The CF docs sidebar labels KV-backed DOs “(Legacy)”. Since we’re touching ingestion + DO-adjacent plumbing, codify the rule now rather than wait.
Decision
1. Workers Workflows for all multi-step ingestion pipelines
Gong and SFDC ingestion become Workers Workflows:
- context-worker/src/workflows/gong-ingest.ts (GongIngestWorkflow)
- context-worker/src/workflows/salesforce-ingest.ts (SalesforceIngestWorkflow)
Queue consumers (handleGongQueue, handleSalesforceQueue) become
producer-only: read the queue message, call
env.GONG_WORKFLOW.create({id, params}), ack. Business logic lives in the
Workflow’s run() method as named step.do(...) calls.
Step sequence for Gong (matches the plan doc):
1. keyword-prefilter: load ingest_keywords:{tenant} from KV.
2. insert-engagement-only (conditional branch): low-confidence fact if no keyword match; workflow terminates.
3. llm-extract: call llm_invoke on the gateway via the GATEWAY_SERVICE service binding. DeepSeek V3.5 pinned + JSON mode + temperature 0. Retries: 3, exponential backoff starting at 5s.
4. persist-facts-{i} (per fact): insert-or-supersede via resolveConflict() in a D1 batch transaction.
5. schedule-embeddings: call scheduleFactEmbed() for each committed fact (fail-open via ctx.waitUntil).
6. emit-signal-eval: one signal_evaluations row per batch with the committed fact IDs for replay/audit.
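The Gong step sequence can be sketched as a plain function over a structural step object, which also shows how a fake step can drive it in tests. Everything here is illustrative: StepLike, GongDeps, runGongIngest, and makeFakeStep are hypothetical names, the keyword match is a simple substring check, and the real workflow would use the step type from cloudflare:workers and the project's own helpers.

```typescript
// Structural stand-in for the workflow step object (assumption, not the CF type).
type StepConfig = {
  retries?: { limit: number; delay: string | number; backoff?: "constant" | "linear" | "exponential" };
};
interface StepLike {
  do<T>(name: string, config: StepConfig, fn: () => Promise<T>): Promise<T>;
}

// Hypothetical dependency seams standing in for KV, the gateway, and D1.
interface GongDeps {
  loadKeywords(tenant: string): Promise<string[]>;
  insertEngagementOnly(callId: string): Promise<void>;
  llmExtract(transcript: string): Promise<string[]>; // facts as JSON strings
  persistFact(factJson: string): Promise<string>;    // returns the committed fact ID
  scheduleEmbeddings(factIds: string[]): Promise<void>;
  emitSignalEval(factIds: string[]): Promise<void>;
}

const NO_RETRY: StepConfig = {};
// "Retries: 3, exponential backoff starting at 5s" expressed as a step config.
const LLM_RETRY: StepConfig = { retries: { limit: 3, delay: "5 seconds", backoff: "exponential" } };

async function runGongIngest(
  step: StepLike,
  deps: GongDeps,
  params: { tenant: string; callId: string; transcript: string },
): Promise<string[]> {
  const keywords = await step.do("keyword-prefilter", NO_RETRY, () => deps.loadKeywords(params.tenant));
  const text = params.transcript.toLowerCase();
  const matched = keywords.some((k) => text.indexOf(k.toLowerCase()) !== -1);
  if (!matched) {
    await step.do("insert-engagement-only", NO_RETRY, () => deps.insertEngagementOnly(params.callId));
    return []; // terminate without paying for an LLM call
  }
  const facts = await step.do("llm-extract", LLM_RETRY, () => deps.llmExtract(params.transcript));
  const committed: string[] = [];
  let index = 0;
  for (const fact of facts) {
    const name = `persist-facts-${index++}`; // one step per fact, so one fact retries alone
    committed.push(await step.do(name, NO_RETRY, () => deps.persistFact(fact)));
  }
  await step.do("schedule-embeddings", NO_RETRY, () => deps.scheduleEmbeddings(committed));
  await step.do("emit-signal-eval", NO_RETRY, () => deps.emitSignalEval(committed));
  return committed;
}

// Test double: runs each callback directly and records the step names in order.
function makeFakeStep(): { step: StepLike; names: string[] } {
  const names: string[] = [];
  const step: StepLike = {
    do<T>(name: string, _config: StepConfig, fn: () => Promise<T>): Promise<T> {
      names.push(name);
      return fn();
    },
  };
  return { step, names };
}
```

Keeping the orchestration free of runtime imports is one way to make the step order assertable in a Node test environment; whether the actual workflow is factored this way is not stated in this ADR.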
Step sequence for SFDC is analogous but the LLM step only runs for free-text
fields (Description, Notes__c). Structured fields (StageName, Amount,
CloseDate, Industry, etc.) are translated deterministically via
src/lib/sfdc-translator.ts and land as source_authority = SfdcOpportunityStage | SfdcCustomField.
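A minimal sketch of the structured vs free-text split follows. It is not the contents of src/lib/sfdc-translator.ts: splitSfdcRecord and StructuredFact are hypothetical, and the rule that only StageName maps to SfdcOpportunityStage while every other structured field maps to SfdcCustomField is an assumption made for illustration.

```typescript
type SourceAuthority = "SfdcOpportunityStage" | "SfdcCustomField";

// Hypothetical output shape for deterministically translated fields.
interface StructuredFact {
  predicate: string;
  object: string | number;
  source_authority: SourceAuthority;
}

// Free-text fields named in the ADR; only these are sent to the LLM step.
const FREE_TEXT_FIELDS = ["Description", "Notes__c"];

function splitSfdcRecord(record: Record<string, string | number | null>): {
  structured: StructuredFact[];
  freeText: string[];
} {
  const structured: StructuredFact[] = [];
  const freeText: string[] = [];
  for (const field of Object.keys(record)) {
    const value = record[field];
    if (value === null || value === "") continue; // nothing to ingest
    if (FREE_TEXT_FIELDS.indexOf(field) !== -1) {
      freeText.push(String(value));
    } else {
      structured.push({
        predicate: field,
        object: value,
        // Assumption: only StageName carries the SfdcOpportunityStage authority.
        source_authority: field === "StageName" ? "SfdcOpportunityStage" : "SfdcCustomField",
      });
    }
  }
  return { structured, freeText };
}
```

With this split, a record with no populated free-text fields would skip the LLM step entirely.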
Workflow instance IDs are deterministic (gong-{tenant}-{call_id},
sfdc-{tenant}-{object_type}-{object_id}) so queue re-delivery results in a
duplicate-ID error which we treat as “already processed” and ack.
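The producer-only consumer and the deterministic IDs can be sketched as below. The interfaces are structural stand-ins rather than the real Queue and Workflow binding types, handleGongMessage is a hypothetical per-message helper, and the way a duplicate-ID failure is recognized (matching "already exists" in the error message) is an assumption, since this ADR does not give the exact error.

```typescript
// Structural stand-ins for the queue message and the workflow binding.
interface QueueMessageLike<B> {
  body: B;
  ack(): void;
  retry(): void;
}
interface WorkflowBindingLike<P> {
  create(options: { id: string; params: P }): Promise<unknown>;
}

interface GongQueueBody {
  tenant: string;
  call_id: string;
}

// Deterministic instance IDs: the same message always maps to the same ID.
function gongInstanceId(body: GongQueueBody): string {
  return `gong-${body.tenant}-${body.call_id}`;
}
function sfdcInstanceId(tenant: string, objectType: string, objectId: string): string {
  return `sfdc-${tenant}-${objectType}-${objectId}`;
}

// Assumption: the duplicate-ID failure is recognizable from the error message.
// Adjust this predicate to whatever the runtime actually throws.
function isDuplicateInstanceError(err: unknown): boolean {
  return err instanceof Error && /already exists/i.test(err.message);
}

async function handleGongMessage(
  msg: QueueMessageLike<GongQueueBody>,
  workflow: WorkflowBindingLike<GongQueueBody>,
): Promise<"created" | "duplicate" | "retry"> {
  try {
    await workflow.create({ id: gongInstanceId(msg.body), params: msg.body });
    msg.ack();
    return "created";
  } catch (err) {
    if (isDuplicateInstanceError(err)) {
      msg.ack(); // re-delivery of an already-started call: treat as processed
      return "duplicate";
    }
    msg.retry(); // anything else is left to the queue's redelivery
    return "retry";
  }
}
```

The consumer holds no business logic; its only decisions are ack versus retry.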
2. SQLite Durable Objects mandatory for new classes
All new Durable Object classes MUST use SQLite storage:
- wrangler.toml: add to [[migrations]] new_sqlite_classes = [...].
- code: use this.ctx.storage.sql.exec(...).
Codified in .claude/rules/v5-invariants.md.
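As an illustration of the code half of the rule, here is a sketch of a class that goes through ctx.storage.sql.exec(...) only. IngestCursorStore, its cursors table, and the structural interfaces are hypothetical; a real class would extend DurableObject from cloudflare:workers and would also need to be listed under new_sqlite_classes in wrangler.toml.

```typescript
// Structural stand-ins for the slice of the SQLite storage API used here.
interface SqlCursorLike<R> {
  toArray(): R[];
}
interface SqlLike {
  exec<R = Record<string, unknown>>(query: string, ...bindings: unknown[]): SqlCursorLike<R>;
}
interface DurableStateLike {
  storage: { sql: SqlLike };
}

// Hypothetical example class, not one that exists in the repo.
class IngestCursorStore {
  private sql: SqlLike;

  constructor(ctx: DurableStateLike) {
    this.sql = ctx.storage.sql;
    // Schema is created idempotently every time the object is constructed.
    this.sql.exec(
      "CREATE TABLE IF NOT EXISTS cursors (source TEXT PRIMARY KEY, position TEXT NOT NULL)",
    );
  }

  setCursor(source: string, position: string): void {
    this.sql.exec(
      "INSERT INTO cursors (source, position) VALUES (?, ?) " +
        "ON CONFLICT(source) DO UPDATE SET position = excluded.position",
      source,
      position,
    );
  }

  getCursor(source: string): string | null {
    const rows = this.sql
      .exec<{ position: string }>("SELECT position FROM cursors WHERE source = ?", source)
      .toArray();
    const first = rows[0];
    return first ? first.position : null;
  }
}
```

Nothing in the class touches the legacy key-value calls, which is the property the invariant is meant to enforce.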
3. TokenManager already SQLite-backed — no migration needed
Verified 2026-04-24 by reading src/do/token-manager.ts:
- Line 27: this.sql = ctx.storage.sql;
- Constructor uses this.sql.exec(CREATE TABLE IF NOT EXISTS ...)
- All reads/writes go through this.sql.exec().
Plus src/wrangler.toml line 36: new_sqlite_classes = ["TokenManager"].
No migration script is needed. The task brief instructed “If it’s still KV API: migrate to SQLite … Write scripts/migrate-do-to-sqlite.ts” — the precondition is false, so the migration script is explicitly NOT written.
Consequences
Positive
- Gong/SFDC pipelines gain free durable retry + replay.
- CF dashboard provides per-step observability for free.
- LLM spend reduced on transient-failure paths (only the failing step re-runs; keyword prefilter + D1 write don’t pay twice).
- New primitive (Workflows) codified with a canonical pattern + reusable step helpers (stepJson wrapper for Serializable constraint on unknown-typed payload fields).
- Future multi-step pipelines (HITL ad-spend approval, client-report generation) have a ready template.
Negative
- Two new primitives (Workflows + service bindings) to learn + monitor.
- Step replay semantics require every step to be idempotent, which is harder than a “just run it” inline pipeline. Mitigated by deterministic fact IDs (hashString(source|subject|predicate|object)) + INSERT OR IGNORE on engagement-only rows.
- cloudflare:workers isn’t resolvable in Node-env vitest; we added a test shim at context-worker/test/_shims/cloudflare-workers.ts aliased via vitest.config.ts. Tests exercise step orchestration by injecting a fakeStep that invokes callbacks synchronously.
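The idempotency mitigation can be sketched as follows. The project's hashString is not shown in this ADR, so hashStringSketch substitutes a 32-bit FNV-1a over UTF-16 code units purely as a stand-in; factIdSketch and insertOrIgnore are likewise hypothetical, with a Map playing the role of a table that has a primary key on the fact ID.

```typescript
// Stand-in hash (assumption): 32-bit FNV-1a over UTF-16 code units.
// The real hashString may use a different algorithm and output format.
function hashStringSketch(input: string): string {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit multiply
  }
  const hex = (hash >>> 0).toString(16);
  return ("0000000" + hex).slice(-8); // left-pad to 8 hex digits
}

// Same (source, subject, predicate, object) always yields the same ID.
// Note: JSON.stringify is sensitive to object key order.
function factIdSketch(source: string, subject: string, predicate: string, object: unknown): string {
  return hashStringSketch([source, subject, predicate, JSON.stringify(object)].join("|"));
}

// In-memory analogue of INSERT OR IGNORE: a replayed step cannot add a second row.
function insertOrIgnore<T>(table: Map<string, T>, id: string, row: T): boolean {
  if (table.has(id)) return false;
  table.set(id, row);
  return true;
}
```

Because the ID is derived from the fact's content, a step that replays after a crash writes to the same key instead of creating a duplicate row.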
Trade-offs named
- Per-fact step vs batched step. We chose per-fact (persist-facts-0, persist-facts-1, …) so a D1 contention error retries one fact, not the whole batch. Cost: more step records per run. Budget: Workflows caps at 25,000 steps per run; an extraction that yields 50 facts uses ~60 steps (well within budget).
- LLM call via service binding vs direct. Going through the gateway’s llm_invoke keeps provider routing + prompt-caching discount logic in one place. Cost: one extra Worker hop (~1ms). When the gateway exports a WorkerEntrypoint class, this flips to native RPC with zero code changes in the extractor (documented in src/lib/llm-extractor.ts top-of-file).
- JSON marshaling at step boundaries. ExtractedFact.object and CommittedFact.object are unknown, which is not Serializable per CF types. The stepJson helper stringifies inside the step and parses at the caller side. This matches what the Workflows runtime does internally anyway (serializes step returns to durable storage), so the cost is negligible and the boundary is explicit.
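A minimal sketch of the JSON-marshaling boundary is shown below. It is not the project's stepJson: stepJsonSketch and JsonStepLike are hypothetical, the step is reduced to a two-argument form that only ever returns strings, and the guard against non-representable values is an addition for the sketch.

```typescript
// Structural stand-in: a step whose callback result is always a string,
// which trivially satisfies a Serializable constraint.
interface JsonStepLike {
  do(name: string, fn: () => Promise<string>): Promise<string>;
}

async function stepJsonSketch<T>(
  step: JsonStepLike,
  name: string,
  fn: () => Promise<T>,
): Promise<T> {
  const raw = await step.do(name, async () => {
    const encoded = JSON.stringify(await fn()); // stringify inside the step
    if (typeof encoded !== "string") {
      // JSON.stringify yields undefined for values such as a bare undefined.
      throw new Error(`step ${name} returned a value that is not JSON-representable`);
    }
    return encoded;
  });
  return JSON.parse(raw) as T; // parse on the caller side of the boundary
}
```

The cast on the way out restores the static type only; values that do not survive a JSON round trip (Dates, Maps, undefined fields) come back changed, so payload types crossing this boundary should stay plain data.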
Alternatives considered
- Queue + DO-alarm hand-rolled state machine. Rejected. Requires writing + testing retry + replay logic ourselves; all new bugs.
- Inline pipeline in queue consumer. Rejected. No step-level retry; an LLM 429 re-runs the keyword prefilter + any already-committed D1 writes (non-idempotent without the deterministic-fact-ID trick we adopted anyway).
- Containers on Workers for the LLM-heavy step. Rejected. Workflows solves the real problem (durable multi-step) without giving up the edge. Containers are the right primitive when CPU/memory actually bites, which it doesn’t here (each LLM call is bounded by the model’s context window and runs <30s).
- Migrate TokenManager DO to a fresh SQLite class with a migration script. Rejected after live verification: TokenManager is ALREADY SQLite-backed. Writing a migration script would be a band-aid for a non-existent problem.
Verification (at merge time)
- cd context-worker && npx vitest run passes 102/102 tests (76 pre-existing + 16 Gong workflow + 10 SFDC workflow).
- npx tsc --noEmit in both repo root AND context-worker/ is clean.
- grep -n "ctx.storage.sql" src/do/token-manager.ts returns at least one match (verification of SQLite backend).
- grep -n "new_sqlite_classes" src/wrangler.toml returns new_sqlite_classes = ["TokenManager"].
- .claude/rules/v5-invariants.md includes the two new invariants.
Open follow-ups (not blocking)
- After gateway adds WorkerEntrypoint for RPC, switch llm-extractor.ts from fetch-over-binding to env.GATEWAY_SERVICE.llmInvoke(...) (native RPC, no JSON serialize/deserialize on the hop).
- Add Analytics Engine writes from the Workflow (per-step duration + tenant counters); fits the Wave 3 §3.7 dashboards task.
- HITL ad-spend approval workflow (step.waitForEvent): textbook next use of the primitive, folded into a later wave.