ADR-033: Async AWS Job Orchestration Pattern (BDA + Transcribe)
- Status: Accepted
- Date: 2026-05-01
- Author: Claude Code (engineering)
- Business owner: Mishaal Murawala
- Related: docs/plans/AWS-SERVICES-ENGINEERING-PLAN.md
Context
Two pending AWS tools — aws_bda_analyze (Bedrock Data Automation) and aws_transcribe (Amazon Transcribe) — are inherently async:
| Service | Start call | Completion | Typical latency |
|---|---|---|---|
| BDA | InvokeDataAutomationAsync → returns invocationArn | Poll GetDataAutomationStatus | 30s – 5 min per document |
| Transcribe | StartTranscriptionJob → returns jobName | Poll GetTranscriptionJob | 30s – 10 min per audio file |
Neither fits the V5 gateway’s synchronous request model:
- The gateway has a 30s `AbortController` on every outbound fetch (invariant #11)
- The gateway does not retry (invariant #3)
- MCP tools return a single response per call (stateless HTTP, no streaming)
Three patterns exist for async jobs in a stateless gateway:
| Pattern | Description | Verdict |
|---|---|---|
| Blocking poll within the tool | Loop GetStatus until done or 30s timeout | ❌ Fails for BDA/Transcribe — typical latency exceeds 30s |
| CF Workflow | Gateway starts job, hands invocationArn to a CF Workflow that polls and writes result to KV | ⚠️ Valid but high complexity — requires new Workflow class per service, CF Workflows quota, and a KV polling surface for the caller |
| Two-call MCP pattern | Tool exposes operation: "start" | "status" — caller stores the job handle and polls on their own schedule | ✅ Simple, composable, maps naturally to agentic workflows |
Decision
Use the two-call MCP pattern. Each async tool exposes a discriminated-union operation field:
```typescript
operation: z.enum(['start', 'status', 'list_blueprints']) // BDA
operation: z.enum(['start', 'status', 'list_jobs'])       // Transcribe
```

- `start` submits the job and returns a job handle (`invocationArn` / `jobName`) immediately. No polling.
- `status` polls once for the current job state and returns the result (or partial progress) for a given handle.
- `list_*` enumerates recent jobs / available blueprints, a utility for agent planning.
The caller (Claude, n8n, any MCP client) stores the handle in agent_state or its own context, waits a reasonable interval (service-dependent; BDA: ~60s, Transcribe: ~90s), then calls status. This is the same pattern humans use with long-running jobs and matches how agentic loops naturally work.
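The caller-side flow described above can be sketched as follows. This is a minimal illustration, not the gateway implementation: `makeFakeTool` is a hypothetical in-memory stand-in for an async tool, and the clock is simulated so the example runs without AWS access. Only the field names `job_handle`, `status`, and `poll_hint_seconds` come from this ADR.

```typescript
type StartResult = { status: "submitted"; job_handle: string; poll_hint_seconds: number };
type StatusResult =
  | { status: "InProgress"; job_handle: string; poll_hint_seconds: number }
  | { status: "Success"; job_handle: string; result: Record<string, unknown> };

interface AsyncTool {
  start(nowSeconds: number): StartResult;
  status(jobHandle: string, nowSeconds: number): StatusResult;
}

// Hypothetical fake: a job "finishes" once jobDurationSeconds have passed
// on the simulated clock. It never loops or waits internally.
function makeFakeTool(jobDurationSeconds: number, pollHintSeconds: number): AsyncTool {
  const submittedAt = new Map<string, number>();
  let counter = 0;
  return {
    start(nowSeconds: number): StartResult {
      counter += 1;
      const handle = `fake-job-${counter}`;
      submittedAt.set(handle, nowSeconds);
      return { status: "submitted", job_handle: handle, poll_hint_seconds: pollHintSeconds };
    },
    status(jobHandle: string, nowSeconds: number): StatusResult {
      const startedAt = submittedAt.get(jobHandle);
      if (startedAt === undefined) throw new Error(`unknown handle: ${jobHandle}`);
      if (nowSeconds - startedAt < jobDurationSeconds) {
        return { status: "InProgress", job_handle: jobHandle, poll_hint_seconds: pollHintSeconds };
      }
      return { status: "Success", job_handle: jobHandle, result: { simulated: true } };
    },
  };
}

// Caller loop: keep the handle, wait the hinted interval, check once, repeat.
function runJob(tool: AsyncTool, maxStatusCalls: number) {
  let now = 0; // simulated clock, in seconds
  const started = tool.start(now);
  const handle = started.job_handle; // a real agent would persist this, e.g. in agent_state
  let wait = started.poll_hint_seconds;
  for (let calls = 1; calls <= maxStatusCalls; calls++) {
    now += wait; // stands in for the agent waiting between turns
    const res = tool.status(handle, now);
    if (res.status !== "InProgress") {
      return { statusCalls: calls, elapsedSeconds: now, final: res };
    }
    wait = res.poll_hint_seconds;
  }
  throw new Error("job did not finish within the allowed number of status calls");
}

// A simulated 150s job with a 60s hint completes on the third status call.
const outcome = runJob(makeFakeTool(150, 60), 10);
console.log(outcome.statusCalls, outcome.elapsedSeconds, outcome.final.status);
```

The loop lives entirely in the caller; each `status` call is a single check, which keeps the tool itself stateless.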
Why not CF Workflows?
CF Workflows (ADR-026) are the right choice for server-side multi-step ingestion pipelines (e.g., the Gong and Salesforce ingest pipelines in the Context Worker). For these AWS tools, the caller is an AI agent that already has state management — adding CF Workflow plumbing would:
- Require a new KV polling surface for results (complicates the tool contract)
- Burn a CF Workflows quota slot per job submission
- Create a hidden asynchrony that confuses agents expecting deterministic tool returns
- Add operational surface (Workflow errors are separate from gateway errors)
The two-call pattern is transparent to the caller — the agent knows exactly what it submitted and can decide when to check.
Why not blocking poll?
A blocking implementation that keeps a connection open waiting for BDA/Transcribe would:
- Violate the 30s AbortController (invariant #11)
- Hold a CF Worker isolate for potentially 5–10 minutes per call
- Have no retry on the status side: a transient `InProgress` check would fail permanently
Implementation contract
Tool shape
```typescript
// Both tools follow the same discriminated-union shape:
const Input = z.discriminatedUnion('operation', [
  z.object({
    operation: z.literal('start'),
    // ... job-specific input params (document bytes, audio URL, blueprint ARN, etc.)
    account_id: z.string().optional(),
  }).strict(),
  z.object({
    operation: z.literal('status'),
    job_handle: z.string()
      .describe('invocationArn (BDA) or jobName (Transcribe) from a prior start call'),
    account_id: z.string().optional(),
  }).strict(),
  // ... optional list_* variant
]);
```

Start response
```json
{
  "operation": "start",
  "status": "submitted",
  "job_handle": "<invocationArn or jobName>",
  "poll_hint_seconds": 60,
  "message": "Job submitted. Call with operation:'status', job_handle:'<handle>' to check progress."
}
```

`poll_hint_seconds` is a non-binding hint for the calling agent. BDA default: 60. Transcribe default: 90. Agents may ignore it, but it surfaces in tool descriptions.
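One way a start handler might assemble this response is sketched below. The names `buildStartResponse` and `DEFAULT_POLL_HINT_SECONDS` are hypothetical; only the field names and the 60s / 90s defaults are taken from this ADR.

```typescript
type Service = "bda" | "transcribe";

// Defaults stated in this ADR: BDA 60s, Transcribe 90s.
const DEFAULT_POLL_HINT_SECONDS: Record<Service, number> = { bda: 60, transcribe: 90 };

interface StartResponse {
  operation: "start";
  status: "submitted";
  job_handle: string;
  poll_hint_seconds: number;
  message: string;
}

// Hypothetical helper: wraps the handle returned by the upstream start call.
// It performs no polling; the response goes back to the caller immediately.
function buildStartResponse(service: Service, jobHandle: string): StartResponse {
  return {
    operation: "start",
    status: "submitted",
    job_handle: jobHandle,
    poll_hint_seconds: DEFAULT_POLL_HINT_SECONDS[service],
    message: `Job submitted. Call with operation:'status', job_handle:'${jobHandle}' to check progress.`,
  };
}

const startResponse = buildStartResponse("transcribe", "example-job-name");
console.log(JSON.stringify(startResponse, null, 2));
```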
Status response
```jsonc
{
  "operation": "status",
  "job_handle": "<handle>",
  "status": "InProgress" | "Success" | "Failed" | "ClientError" | "ServiceError",
  "result": { /* present only when status == "Success" */ },
  "error": { "type": "...", "message": "..." }, /* present when failed */
  "poll_hint_seconds": 60 /* present when still InProgress */
}
```

For BDA Success: result includes the parsed output from S3 (result.json). The tool fetches S3 output as part of the status call (requires s3:GetObject permission on the output bucket).
For Transcribe Success: result includes the transcript text and confidence, fetched from the S3 output URI in the job response.
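The conditional fields in the status response can be sketched as a small builder. This is an illustration under assumptions: `UpstreamState` and `buildStatusResponse` are hypothetical names, and the S3 output is assumed to have been fetched and parsed before the helper is called.

```typescript
type JobStatus = "InProgress" | "Success" | "Failed" | "ClientError" | "ServiceError";

interface StatusResponse {
  operation: "status";
  job_handle: string;
  status: JobStatus;
  result?: unknown;
  error?: { type: string; message: string };
  poll_hint_seconds?: number;
}

// Hypothetical normalized view of what the single upstream status check returned.
interface UpstreamState {
  status: JobStatus;
  output?: unknown;      // parsed S3 output, present only on success
  errorMessage?: string; // upstream failure reason, if any
}

function buildStatusResponse(
  jobHandle: string,
  state: UpstreamState,
  pollHintSeconds: number,
): StatusResponse {
  const base = { operation: "status" as const, job_handle: jobHandle, status: state.status };
  switch (state.status) {
    case "InProgress":
      // Still running: tell the caller when to check again.
      return { ...base, poll_hint_seconds: pollHintSeconds };
    case "Success":
      // Finished: attach the parsed output.
      return { ...base, result: state.output };
    default:
      // Failed, ClientError, ServiceError: attach an error, no result.
      return {
        ...base,
        error: { type: state.status, message: state.errorMessage ?? "no message provided" },
      };
  }
}

console.log(buildStatusResponse("example-handle", { status: "InProgress" }, 60));
```

Keeping `result`, `error`, and `poll_hint_seconds` mutually exclusive lets a caller branch on `status` alone.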
Agent usage example
```
// Turn 1: start the job
aws_bda_analyze({
  operation: "start",
  document_bytes: "<base64>",
  blueprint: "invoice"
})
→ { job_handle: "arn:aws:bedrock:...:invocation/abc123", poll_hint_seconds: 60 }

// Agent stores handle in agent_state, waits ~60s

// Turn 2: check result
aws_bda_analyze({
  operation: "status",
  job_handle: "arn:aws:bedrock:...:invocation/abc123"
})
→ { status: "Success", result: { vendor: "Acme Corp", amount: 1234.56, ... } }
```

Alternatives considered
Alternative: Single-call with KV result caching
The tool could poll aggressively (every 2s for up to 30s), and if the job isn’t done, write the invocationArn to KV with a TTL. Subsequent calls with the same invocationArn would return the cached result.
Rejected because:
- Polling at 2s × 15 iterations is 15 upstream calls per tool invocation: wasteful, and it races against the 30s abort
- The “subsequent calls” pattern is identical to `operation: "status"` but with hidden state in KV; two-call is simpler and explicit
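The call-count argument can be checked with a little arithmetic. The sketch below uses the 2s interval and 30s budget from the text; the 150s job duration is an illustrative assumption that sits inside BDA's stated 30s to 5 min range, and both function names are hypothetical.

```typescript
// Upstream status calls made by ONE tool invocation that polls every
// intervalSeconds until the abort budget is used up.
function blockingPollCalls(intervalSeconds: number, budgetSeconds: number): number {
  return Math.floor(budgetSeconds / intervalSeconds);
}

// Status calls a caller makes in the two-call pattern when it waits
// pollHintSeconds between checks and the job takes jobSeconds to finish.
function twoCallStatusCalls(jobSeconds: number, pollHintSeconds: number): number {
  return Math.max(1, Math.ceil(jobSeconds / pollHintSeconds));
}

const perInvocation = blockingPollCalls(2, 30); // 15 upstream calls, job still unfinished at 30s
const twoCall = twoCallStatusCalls(150, 60);    // 3 status calls to see the finished job
console.log({ perInvocation, twoCall });
```

Under these assumptions, a single aggressive-poll invocation spends 15 upstream calls without reaching completion, while the two-call pattern reaches it in 3.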
Alternative: CF Queue + consumer
Submit job in the tool, enqueue (invocationArn, tenantId, callbackKvKey) to a CF Queue. A consumer Worker polls BDA/Transcribe and writes to KV. Caller polls KV for result.
Rejected because:
- Adds a Queue binding, consumer Worker, and KV result schema — 3× the surface
- CF Queues have at-least-once delivery + visibility timeout complexity
- Solves the same problem as two-call but with more moving parts
Consequences
Positive:
- Zero new infrastructure (no CF Workflows, no Queues, no consumer Workers)
- Agent controls polling frequency — good for bursty workloads
- Tool contract is transparent and testable
- Consistent with the gateway’s stateless design
Negative:
- Agents must implement polling logic (2 tool calls per job instead of 1)
- No server-side completion notification (no webhook / EventBridge integration)
- If the agent is interrupted between start and status, the job runs to completion but the handle may be lost (mitigated by the `list_jobs` operation and `agent_state` persistence)
Mitigations:
- `poll_hint_seconds` in the start response coaches agents on when to call back
- `list_jobs` / `list_blueprints` operations let agents recover lost handles
- `agent_state` tool provides durable handle storage across context window resets
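Handle recovery might look like the sketch below. Everything here is hypothetical: this ADR does not define the `list_jobs` response shape or the `agent_state` interface, so `ListedJob` and the `Map`-based store are stand-ins chosen only to make the fallback order concrete.

```typescript
// Hypothetical shape of one entry returned by a list_jobs call.
interface ListedJob {
  job_handle: string;
  submitted_at: string; // assumed to be a UTC ISO 8601 timestamp
}

// Stand-in for the agent_state tool: a plain key-value map.
type AgentState = Map<string, string>;

function recoverHandle(state: AgentState, key: string, listed: ListedJob[]): string | undefined {
  // Normal path: the handle was persisted before the interruption.
  const stored = state.get(key);
  if (stored !== undefined) return stored;
  // Fallback: take the most recently submitted job. ISO 8601 timestamps in
  // the same format and timezone sort correctly as plain strings.
  const newest = [...listed].sort((a, b) => b.submitted_at.localeCompare(a.submitted_at))[0];
  return newest?.job_handle;
}

const listedJobs: ListedJob[] = [
  { job_handle: "job-a", submitted_at: "2026-05-01T10:00:00Z" },
  { job_handle: "job-b", submitted_at: "2026-05-01T10:05:00Z" },
];
console.log(recoverHandle(new Map(), "transcribe:last", listedJobs)); // prints "job-b"
```

Picking the newest job is only a heuristic; with several jobs in flight, a caller would need a stronger match, such as a caller-chosen job name.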
Invariant check
| Invariant | Impact |
|---|---|
| #2 KV-only hot path | ✅ Both start and status read KV config only. S3 fetch in status is an outbound call, not a KV read. |
| #3 Fail-fast, no retries | ✅ Both calls are single-fetch. No internal retry loops. |
| #9 CF Cron + Workflows | ✅ This pattern is explicitly NOT CF Workflows — it’s a two-call MCP pattern. CF Workflows are for server-initiated pipelines (ADR-026). |
| #10 ≤10ms overhead | ✅ Overhead is the same KV + token + route path. AWS latency is separate from gateway SLA. |
| #11 30s AbortController | ✅ Every outbound fetch (start call, status call, S3 result fetch) has its own 30s AbortController. |
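Invariants #3 and #11 together imply a helper along these lines: one attempt per outbound call, each with its own timer. This is a minimal sketch, not the gateway's code. `fetchOnce` and `FetchLike` are hypothetical names, and the fetch function is injected so the example runs without a network.

```typescript
// Minimal fetch-like signature; a real gateway would pass the platform fetch.
type FetchLike = (url: string, init: { signal: AbortSignal }) => Promise<{ status: number }>;

// One attempt, one timer, no retry: errors and aborts propagate to the caller.
async function fetchOnce(
  fetchImpl: FetchLike,
  url: string,
  timeoutMs = 30_000,
): Promise<{ status: number }> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await fetchImpl(url, { signal: controller.signal });
  } finally {
    clearTimeout(timer); // always release the timer, on success or failure
  }
}

// Fake upstream that never answers and only settles when aborted.
const neverResponds: FetchLike = (_url, init) =>
  new Promise((_resolve, reject) => {
    init.signal.addEventListener("abort", () => reject(new Error("aborted")));
  });

// With a 10ms budget the call is aborted instead of hanging.
fetchOnce(neverResponds, "https://example.invalid/status", 10).catch((e: Error) =>
  console.log(e.message),
);
```

Because each call creates its own controller, a slow S3 result fetch cannot consume the budget of the status call that preceded it.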