
Async AWS Job Orchestration Pattern (BDA + Transcribe)

ADR-033: Async AWS Job Orchestration Pattern (BDA + Transcribe)

Status: Accepted
Date: 2026-05-01
Author: Claude Code (engineering)
Business owner: Mishaal Murawala
Related: docs/plans/AWS-SERVICES-ENGINEERING-PLAN.md

Context

Two pending AWS tools — aws_bda_analyze (Bedrock Data Automation) and aws_transcribe (Amazon Transcribe) — are inherently async:

Service | Start call | Completion | Typical latency
BDA | InvokeDataAutomationAsync → returns invocationArn | Poll GetDataAutomationStatus | 30s – 5 min per document
Transcribe | StartTranscriptionJob → returns jobName | Poll GetTranscriptionJob | 30s – 10 min per audio file

Neither fits the V5 gateway’s synchronous request model:

  • The gateway has a 30s AbortController on every outbound fetch (invariant #11)
  • The gateway does not retry (invariant #3)
  • MCP tools return a single response per call (stateless HTTP, no streaming)

Three patterns exist for async jobs in a stateless gateway:

Pattern | Description | Verdict
Blocking poll within the tool | Loop GetStatus until done or 30s timeout | ❌ Fails for BDA/Transcribe: typical latency exceeds 30s
CF Workflow | Gateway starts job, hands invocationArn to a CF Workflow that polls and writes result to KV | ⚠️ Valid but high complexity: requires a new Workflow class per service, CF Workflows quota, and a KV polling surface for the caller
Two-call MCP pattern | Tool exposes an operation field ("start" or "status"); caller stores the job handle and polls on their own schedule | ✅ Simple, composable, maps naturally to agentic workflows

Decision

Use the two-call MCP pattern. Each async tool exposes a discriminated-union operation field:

  operation: z.enum(['start', 'status', 'list_blueprints']) // BDA
  operation: z.enum(['start', 'status', 'list_jobs'])       // Transcribe

  • start submits the job and returns a job handle (invocationArn / jobName) immediately. No polling.
  • status polls once for the current job state and returns the result (or partial progress) for a given handle.
  • list_* enumerates recent jobs / available blueprints — utility for agent planning.

The caller (Claude, n8n, any MCP client) stores the handle in agent_state or its own context, waits a reasonable interval (service-dependent: roughly 60s for BDA, 90s for Transcribe), then calls status. This is the same pattern humans use with long-running jobs, and it matches how agentic loops naturally work.

Why not CF Workflows?

CF Workflows (ADR-026) are the right choice for server-side multi-step ingestion pipelines (e.g., the Gong and Salesforce ingest pipelines in the Context Worker). For these AWS tools, the caller is an AI agent that already has state management — adding CF Workflow plumbing would:

  1. Require a new KV polling surface for results (complicates the tool contract)
  2. Burn a CF Workflows quota slot per job submission
  3. Create a hidden asynchrony that confuses agents expecting deterministic tool returns
  4. Add operational surface (Workflow errors are separate from gateway errors)

The two-call pattern is transparent to the caller — the agent knows exactly what it submitted and can decide when to check.

Why not blocking poll?

A blocking implementation that keeps a connection open waiting for BDA/Transcribe would:

  1. Violate the 30s AbortController (invariant #11)
  2. Hold a CF Worker isolate for potentially 5–10 minutes per call
  3. Have no retry on the status side (invariant #3): a single transient error on any status check inside the loop would fail the whole call permanently
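The first point can be checked with simple arithmetic. The sketch below is illustrative only: `blockingPollOutcome` and its parameters are hypothetical names, and it simulates elapsed time rather than calling AWS. The 30s budget and the latency figures come from the Context section.

```typescript
// Hypothetical simulation: does a blocking poll finish before the abort budget?
// No AWS calls are made; time is simulated in milliseconds.
type PollOutcome = { outcome: "completed" | "aborted"; polls: number };

function blockingPollOutcome(
  jobLatencyMs: number,   // how long the upstream job takes to finish
  pollIntervalMs: number, // delay between status checks
  budgetMs = 30_000,      // the gateway's abort budget (invariant #11)
): PollOutcome {
  let polls = 0;
  // Check at t = interval, 2 * interval, ... while still inside the budget.
  for (let t = pollIntervalMs; t < budgetMs; t += pollIntervalMs) {
    polls += 1;
    if (t >= jobLatencyMs) return { outcome: "completed", polls };
  }
  return { outcome: "aborted", polls };
}

// The fastest typical job (30s) does not finish inside a 30s budget.
const fastestTypicalJob = blockingPollOutcome(30_000, 2_000);
// A 5-minute BDA job is aborted after the same number of wasted polls.
const slowBdaJob = blockingPollOutcome(300_000, 2_000);
console.log(fastestTypicalJob, slowBdaJob);
```

Under these assumptions, only jobs that finish well under 30s would ever return a result, and every other call spends its whole budget on status checks that are then discarded.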

Implementation contract

Tool shape

// Both tools follow the same discriminated-union shape:
const Input = z.discriminatedUnion('operation', [
  z.object({
    operation: z.literal('start'),
    // ... job-specific input params (document bytes, audio URL, blueprint ARN, etc.)
    account_id: z.string().optional(),
  }).strict(),
  z.object({
    operation: z.literal('status'),
    job_handle: z.string()
      .describe('invocationArn (BDA) or jobName (Transcribe) from a prior start call'),
    account_id: z.string().optional(),
  }).strict(),
  // ... optional list_* variant
]);
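Once the input is validated, a handler can narrow on `operation` and make exactly one upstream call per invocation. The following is a minimal sketch using plain TypeScript types instead of zod so that it runs without dependencies; `handleTool`, `startJob`, and `getJobStatus` are hypothetical stand-ins, not gateway functions, and the stubs return placeholder values.

```typescript
// Plain-TypeScript mirror of the union above; all names here are hypothetical.
type StartInput = { operation: "start"; account_id?: string };
type StatusInput = { operation: "status"; job_handle: string; account_id?: string };
type ToolInput = StartInput | StatusInput;

// Stubs standing in for the single upstream call each operation makes.
function startJob(_input: StartInput): string {
  return "handle-123"; // placeholder handle, not a real ARN or job name
}
function getJobStatus(_handle: string): string {
  return "InProgress"; // placeholder state
}

function handleTool(input: ToolInput): Record<string, unknown> {
  switch (input.operation) {
    case "start":
      // Submit and return immediately; no polling happens here.
      return { operation: "start", status: "submitted", job_handle: startJob(input) };
    case "status":
      // One status check per call; the caller decides when to ask again.
      return {
        operation: "status",
        job_handle: input.job_handle,
        status: getJobStatus(input.job_handle),
      };
    default: {
      // Compile-time exhaustiveness check: a new variant must be handled above.
      const unreachable: never = input;
      throw new Error(`Unknown operation: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

The `never` assignment in the default branch is a design choice: adding a `list_*` variant to `ToolInput` without a matching case becomes a compile error rather than a silent fall-through.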

Start response

{
  "operation": "start",
  "status": "submitted",
  "job_handle": "<invocationArn or jobName>",
  "poll_hint_seconds": 60,
  "message": "Job submitted. Call with operation:'status', job_handle:'<handle>' to check progress."
}

poll_hint_seconds is a non-binding hint for the calling agent. BDA default: 60. Transcribe default: 90. Agents may ignore it but it surfaces in tool descriptions.
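As an illustration, the start response could be assembled by a small helper that looks up the per-service hint. This is a sketch, not the gateway's code: `buildStartResponse`, `AsyncService`, and `POLL_HINT_SECONDS` are hypothetical names, and only the field names and the 60/90 defaults come from this ADR.

```typescript
// Hypothetical helper: builds the start response with a per-service poll hint.
type AsyncService = "bda" | "transcribe";

// Defaults taken from the ADR text: BDA 60s, Transcribe 90s.
const POLL_HINT_SECONDS: Record<AsyncService, number> = { bda: 60, transcribe: 90 };

interface StartResponse {
  operation: "start";
  status: "submitted";
  job_handle: string;
  poll_hint_seconds: number;
  message: string;
}

function buildStartResponse(service: AsyncService, jobHandle: string): StartResponse {
  return {
    operation: "start",
    status: "submitted",
    job_handle: jobHandle,
    poll_hint_seconds: POLL_HINT_SECONDS[service],
    // The message repeats the handle so an agent can copy it into the next call.
    message: `Job submitted. Call with operation:'status', job_handle:'${jobHandle}' to check progress.`,
  };
}
```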

Status response

{
  "operation": "status",
  "job_handle": "<handle>",
  "status": "InProgress" | "Success" | "Failed" | "ClientError" | "ServiceError",
  "result": { /* present only when status == "Success" */ },
  "error": { "type": "...", "message": "..." }, /* present when failed */
  "poll_hint_seconds": 60 /* present when still InProgress */
}

For BDA Success: result includes the parsed output from S3 (result.json). The tool fetches S3 output as part of the status call (requires s3:GetObject permission on the output bucket).

For Transcribe Success: result includes the transcript text and confidence, fetched from the S3 output URI in the job response.
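Because the two services report job state with different vocabularies, the status operation needs some mapping onto the shared `status` values above. The sketch below shows one possible mapping. It is an assumption, not the documented contract: the upstream strings reflect the AWS API references as best understood here and should be verified, treating `Created` and `QUEUED` as `InProgress` is a choice made for this example, and `normalizeStatus` and `isTerminal` are hypothetical names.

```typescript
// Normalized states from the status response contract.
type NormalizedStatus = "InProgress" | "Success" | "Failed" | "ClientError" | "ServiceError";

// Assumed upstream values for BDA GetDataAutomationStatus (verify against AWS docs).
const BDA_STATUS = new Map<string, NormalizedStatus>([
  ["Created", "InProgress"], // assumption: not yet running counts as in progress
  ["InProgress", "InProgress"],
  ["Success", "Success"],
  ["ClientError", "ClientError"],
  ["ServiceError", "ServiceError"],
]);

// Assumed upstream values for Transcribe TranscriptionJobStatus (verify against AWS docs).
const TRANSCRIBE_STATUS = new Map<string, NormalizedStatus>([
  ["QUEUED", "InProgress"], // assumption: queued counts as in progress
  ["IN_PROGRESS", "InProgress"],
  ["COMPLETED", "Success"],
  ["FAILED", "Failed"],
]);

function normalizeStatus(service: "bda" | "transcribe", upstream: string): NormalizedStatus {
  const table = service === "bda" ? BDA_STATUS : TRANSCRIBE_STATUS;
  const mapped = table.get(upstream);
  // Fail fast on anything unexpected instead of guessing a state.
  if (mapped === undefined) throw new Error(`Unrecognized ${service} status: ${upstream}`);
  return mapped;
}

// Only terminal states should trigger the S3 result fetch or an error payload.
function isTerminal(status: NormalizedStatus): boolean {
  return status !== "InProgress";
}
```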

Agent usage example

// Turn 1: start the job
aws_bda_analyze({
  operation: "start",
  document_bytes: "<base64>",
  blueprint: "invoice"
})
→ { job_handle: "arn:aws:bedrock:...:invocation/abc123", poll_hint_seconds: 60 }

// Agent stores handle in agent_state, waits ~60s

// Turn 2: check result
aws_bda_analyze({
  operation: "status",
  job_handle: "arn:aws:bedrock:...:invocation/abc123"
})
→ { status: "Success", result: { vendor: "Acme Corp", amount: 1234.56, ... } }
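For a non-LLM client such as a script or workflow node, the same two turns can be wrapped in a loop. This is a minimal sketch under stated assumptions: `pollUntilDone`, `CallTool`, and `Sleep` are hypothetical names, the tool call and the delay are injected so the code does not depend on any particular MCP client library, and the `maxChecks` cap is an addition for this example that the ADR does not specify.

```typescript
// Hypothetical client-side loop; callTool and sleep are injected by the caller.
interface StatusResponse {
  status: string;
  poll_hint_seconds?: number;
  result?: unknown;
  error?: { type: string; message: string };
}
type CallTool = (args: Record<string, unknown>) => Promise<StatusResponse>;
type Sleep = (ms: number) => Promise<void>;

async function pollUntilDone(
  callTool: CallTool,
  sleep: Sleep,
  jobHandle: string,
  initialHintSeconds: number, // poll_hint_seconds from the start response
  maxChecks = 20,             // assumed cap so a stuck job cannot loop forever
): Promise<StatusResponse> {
  let hintSeconds = initialHintSeconds;
  for (let check = 0; check < maxChecks; check += 1) {
    await sleep(hintSeconds * 1000); // wait before each status call
    const res = await callTool({ operation: "status", job_handle: jobHandle });
    if (res.status !== "InProgress") return res; // Success or a failure state
    // Follow the server's latest hint when one is present.
    hintSeconds = res.poll_hint_seconds ?? hintSeconds;
  }
  throw new Error(`Job ${jobHandle} still InProgress after ${maxChecks} checks`);
}
```

Each iteration issues a single status call, so the gateway's fail-fast and 30s abort invariants apply to every call individually while the waiting happens entirely on the client.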

Alternatives considered

Alternative: Single-call with KV result caching

The tool could poll aggressively (every 2s for up to 30s), and if the job isn’t done, write the invocationArn to KV with a TTL. Subsequent calls with the same invocationArn would return the cached result.

Rejected because:

  • Polling upstream every 2s for 15 iterations is 15 status calls per tool invocation, which is wasteful and races against the 30s abort
  • The “subsequent calls” pattern is identical to operation: "status" but with hidden state in KV — two-call is simpler and explicit

Alternative: CF Queue + consumer

Submit job in the tool, enqueue (invocationArn, tenantId, callbackKvKey) to a CF Queue. A consumer Worker polls BDA/Transcribe and writes to KV. Caller polls KV for result.

Rejected because:

  • Adds a Queue binding, consumer Worker, and KV result schema — 3× the surface
  • CF Queues have at-least-once delivery + visibility timeout complexity
  • Solves the same problem as two-call but with more moving parts

Consequences

Positive:

  • Zero new infrastructure (no CF Workflows, no Queues, no consumer Workers)
  • Agent controls polling frequency — good for bursty workloads
  • Tool contract is transparent and testable
  • Consistent with the gateway’s stateless design

Negative:

  • Agents must implement polling logic (2 tool calls per job instead of 1)
  • No server-side completion notification (no webhook / EventBridge integration)
  • If the agent is interrupted between start and status, the job runs to completion but the handle may be lost (mitigated by list_jobs operation and agent_state persistence)

Mitigations:

  • poll_hint_seconds in the start response coaches agents on when to call back
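  • The list_jobs operation lets agents recover lost Transcribe handles; list_blueprints enumerates blueprints rather than invocations, so BDA handles rely on agent_state persistence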
  • agent_state tool provides durable handle storage across context window resets

Invariant check

Invariant | Impact
#2 KV-only hot path | ✅ Both start and status read KV config only. S3 fetch in status is an outbound call, not a KV read.
#3 Fail-fast, no retries | ✅ Both calls are single-fetch. No internal retry loops.
#9 CF Cron + Workflows | ✅ This pattern is explicitly NOT CF Workflows; it is a two-call MCP pattern. CF Workflows are for server-initiated pipelines (ADR-026).
#10 ≤10ms overhead | ✅ Overhead is the same KV + token + route path. AWS latency is separate from gateway SLA.
#11 30s AbortController | ✅ Every outbound fetch (start call, status call, S3 result fetch) has its own 30s AbortController.