Async AWS Job Orchestration Pattern (BDA + Transcribe)

ADR-033: Async AWS Job Orchestration Pattern (BDA + Transcribe)

Status: Accepted
Date: 2026-05-01
Author: Claude Code (engineering)
Business owner: Mishaal Murawala
Related: docs/plans/AWS-SERVICES-ENGINEERING-PLAN.md

Context

Two pending AWS tools — aws_bda_analyze (Bedrock Data Automation) and aws_transcribe (Amazon Transcribe) — are inherently async:

Service	Start call	Completion	Typical latency
BDA	`InvokeDataAutomationAsync` → returns `invocationArn`	Poll `GetDataAutomationStatus`	30s – 5 min per document
Transcribe	`StartTranscriptionJob` → returns `jobName`	Poll `GetTranscriptionJob`	30s – 10 min per audio file

Neither fits the V5 gateway’s synchronous request model:

The gateway has a 30s AbortController on every outbound fetch (invariant #11)
The gateway does not retry (invariant #3)
MCP tools return a single response per call (stateless HTTP, no streaming)

Three patterns exist for async jobs in a stateless gateway:

Pattern	Description	Verdict
Blocking poll within the tool	Loop `GetStatus` until done or 30s timeout	❌ Fails for BDA/Transcribe — typical latency exceeds 30s
CF Workflow	Gateway starts job, hands `invocationArn` to a CF Workflow that polls and writes result to KV	⚠️ Valid but high complexity — requires new Workflow class per service, CF Workflows quota, and a KV polling surface for the caller
Two-call MCP pattern	Tool exposes `operation: "start" | "status"`; the caller stores the job handle and polls on its own schedule	✅ Simple, composable, maps naturally to agentic workflows

Decision

Use the two-call MCP pattern. Each async tool exposes a discriminated-union operation field:

operation: z.enum(['start', 'status', 'list_blueprints'])  // BDA
operation: z.enum(['start', 'status', 'list_jobs'])         // Transcribe

start submits the job and returns a job handle (invocationArn / jobName) immediately. No polling.
status polls once for the current job state and returns the result (or partial progress) for a given handle.
list_* enumerates recent jobs / available blueprints — utility for agent planning.

The caller (Claude, n8n, any MCP client) stores the handle in agent_state or its own context, waits a reasonable interval (service-dependent; BDA: ~60s, Transcribe: ~90s), then calls status. This is the same pattern humans use with long-running jobs, and it matches how agentic loops naturally work.

Why not CF Workflows?

CF Workflows (ADR-026) are the right choice for server-side multi-step ingestion pipelines (e.g., the Gong and Salesforce ingest pipelines in the Context Worker). For these AWS tools, the caller is an AI agent that already has state management — adding CF Workflow plumbing would:

Require a new KV polling surface for results (complicates the tool contract)
Burn a CF Workflows quota slot per job submission
Create a hidden asynchrony that confuses agents expecting deterministic tool returns
Add operational surface (Workflow errors are separate from gateway errors)

The two-call pattern is transparent to the caller — the agent knows exactly what it submitted and can decide when to check.

Why not blocking poll?

A blocking implementation that keeps a connection open waiting for BDA/Transcribe would:

Violate the 30s AbortController (invariant #11)
Hold a CF Worker isolate for potentially 5–10 minutes per call
Have no retry on the status side: under invariant #3, a single transient failure of a status check made while the job is still InProgress would fail the whole call permanently

Implementation contract

Tool shape

import { z } from 'zod';

// Both tools follow the same discriminated-union shape:
const Input = z.discriminatedUnion('operation', [
  z.object({
    operation: z.literal('start'),
    // ... job-specific input params (document bytes, audio URL, blueprint ARN, etc.)
    account_id: z.string().optional(),
  }).strict(),
  z.object({
    operation: z.literal('status'),
    job_handle: z.string()
      .describe('invocationArn (BDA) or jobName (Transcribe) from a prior start call'),
    account_id: z.string().optional(),
  }).strict(),
  // ... optional list_* variant
]);
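To make the shape concrete, here is a minimal dispatch sketch over the same `operation` discriminant. It uses plain TypeScript types instead of zod so that it stands alone. `ToolInput`, `JobBackend`, `handleTool`, and the `params` field are hypothetical names chosen for illustration; they are not taken from the gateway code.

```typescript
// Hypothetical mirror of the input shape as plain TypeScript types.
// `params` stands in for the job-specific start parameters elided above.
type ToolInput =
  | { operation: 'start'; params: Record<string, unknown>; account_id?: string }
  | { operation: 'status'; job_handle: string; account_id?: string };

// Hypothetical abstraction over the AWS service (BDA or Transcribe).
interface JobBackend {
  startJob(params: Record<string, unknown>): Promise<string>; // resolves to the job handle
  getStatus(jobHandle: string): Promise<{ status: string; result?: unknown }>;
}

// Each branch makes one upstream call: no loops, no retries.
async function handleTool(
  input: ToolInput,
  backend: JobBackend,
): Promise<Record<string, unknown>> {
  switch (input.operation) {
    case 'start': {
      const jobHandle = await backend.startJob(input.params);
      return { operation: 'start', status: 'submitted', job_handle: jobHandle };
    }
    case 'status': {
      const state = await backend.getStatus(input.job_handle);
      return { operation: 'status', job_handle: input.job_handle, ...state };
    }
    default: {
      // Exhaustiveness check: this stops type-checking if a variant is left unhandled.
      const unreachable: never = input;
      throw new Error(`Unknown operation: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

Injecting the backend keeps the handler testable without AWS credentials, which fits the "transparent and testable" goal of the pattern.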

Start response

{
  "operation": "start",
  "status": "submitted",
  "job_handle": "<invocationArn or jobName>",
  "poll_hint_seconds": 60,
  "message": "Job submitted. Call with operation:'status', job_handle:'<handle>' to check progress."
}

poll_hint_seconds is a non-binding hint for the calling agent. BDA default: 60. Transcribe default: 90. Agents may ignore it but it surfaces in tool descriptions.
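As an illustration, the start response could be assembled by a small helper that looks up the per-service default. The `Service` type, `POLL_HINT_SECONDS` table, and `buildStartResponse` function are hypothetical names; only the field names and the 60/90 defaults come from this ADR.

```typescript
type Service = 'bda' | 'transcribe';

// Defaults stated in this ADR: BDA 60s, Transcribe 90s.
const POLL_HINT_SECONDS: Record<Service, number> = { bda: 60, transcribe: 90 };

interface StartResponse {
  operation: 'start';
  status: 'submitted';
  job_handle: string;
  poll_hint_seconds: number;
  message: string;
}

// Builds the start response for a freshly submitted job.
function buildStartResponse(service: Service, jobHandle: string): StartResponse {
  return {
    operation: 'start',
    status: 'submitted',
    job_handle: jobHandle,
    poll_hint_seconds: POLL_HINT_SECONDS[service],
    message: `Job submitted. Call with operation:'status', job_handle:'${jobHandle}' to check progress.`,
  };
}
```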

Status response

{
  "operation": "status",
  "job_handle": "<handle>",
  "status": "InProgress" | "Success" | "Failed" | "ClientError" | "ServiceError",
  "result": { /* present only when status == "Success" */ },
  "error": { "type": "...", "message": "..." },  /* present when failed */
  "poll_hint_seconds": 60  /* present when still InProgress */
}

For BDA Success: result includes the parsed output from S3 (result.json). The tool fetches S3 output as part of the status call (requires s3:GetObject permission on the output bucket).

For Transcribe Success: result includes the transcript text and confidence, fetched from the S3 output URI in the job response.
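Because the two services report job state with different vocabularies, the status call needs some translation into the shared status values above. The sketch below is one possible mapping, not the confirmed implementation. The raw values (`Created`, `QUEUED`, and so on) reflect the status enums of the two AWS APIs as best understood here and should be verified against the AWS API references. Collapsing "not started yet" states into `InProgress` is an assumption.

```typescript
type NormalizedStatus = 'InProgress' | 'Success' | 'Failed' | 'ClientError' | 'ServiceError';

// Assumed mapping for BDA GetDataAutomationStatus values.
const BDA_STATUS = new Map<string, NormalizedStatus>([
  ['Created', 'InProgress'], // assumption: queued work is reported as InProgress
  ['InProgress', 'InProgress'],
  ['Success', 'Success'],
  ['ClientError', 'ClientError'],
  ['ServiceError', 'ServiceError'],
]);

// Assumed mapping for Transcribe TranscriptionJobStatus values.
const TRANSCRIBE_STATUS = new Map<string, NormalizedStatus>([
  ['QUEUED', 'InProgress'], // assumption: queued work is reported as InProgress
  ['IN_PROGRESS', 'InProgress'],
  ['COMPLETED', 'Success'],
  ['FAILED', 'Failed'],
]);

// Translates a raw service status into the shared vocabulary.
// Unknown values throw rather than being guessed at (fail-fast).
function normalizeStatus(service: 'bda' | 'transcribe', raw: string): NormalizedStatus {
  const table = service === 'bda' ? BDA_STATUS : TRANSCRIBE_STATUS;
  const mapped = table.get(raw);
  if (mapped === undefined) {
    throw new Error(`Unrecognized ${service} status: ${raw}`);
  }
  return mapped;
}
```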

Agent usage example

// Turn 1: start the job
aws_bda_analyze({
  operation: "start",
  document_bytes: "<base64>",
  blueprint: "invoice"
})
→ { job_handle: "arn:aws:bedrock:...:invocation/abc123", poll_hint_seconds: 60 }

// Agent stores handle in agent_state, waits ~60s

// Turn 2: check result
aws_bda_analyze({
  operation: "status",
  job_handle: "arn:aws:bedrock:...:invocation/abc123"
})
→ { status: "Success", result: { vendor: "Acme Corp", amount: 1234.56, ... } }
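For non-LLM callers such as scripts or workflow engines, the two turns above generalize to a client-side loop. The following is a minimal sketch under stated assumptions: `CallTool` is a hypothetical stand-in for whatever function the MCP client uses to invoke a tool, and `waitForJob`, `maxChecks`, and the injectable `wait` option are illustrative names, not part of the tool contract.

```typescript
interface StatusResponse {
  status: 'InProgress' | 'Success' | 'Failed' | 'ClientError' | 'ServiceError';
  result?: unknown;
  error?: { type: string; message: string };
  poll_hint_seconds?: number;
}

// Hypothetical signature for "invoke the MCP tool"; a real client supplies this.
type CallTool = (args: Record<string, unknown>) => Promise<StatusResponse>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls `status` until the job reaches a terminal state or the check budget runs out.
async function waitForJob(
  callTool: CallTool,
  jobHandle: string,
  opts: {
    defaultWaitSeconds?: number;
    maxChecks?: number;
    wait?: (ms: number) => Promise<void>;
  } = {},
): Promise<StatusResponse> {
  const { defaultWaitSeconds = 60, maxChecks = 10, wait = sleep } = opts;
  let waitSeconds = defaultWaitSeconds;
  for (let check = 0; check < maxChecks; check++) {
    await wait(waitSeconds * 1000);
    const response = await callTool({ operation: 'status', job_handle: jobHandle });
    if (response.status !== 'InProgress') return response; // terminal: success or failure
    // Follow the server's hint when present; otherwise keep the previous interval.
    waitSeconds = response.poll_hint_seconds ?? waitSeconds;
  }
  throw new Error(`Job ${jobHandle} still InProgress after ${maxChecks} checks`);
}
```

The loop lives entirely in the caller, so the gateway itself still makes one upstream call per tool invocation.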

Alternatives considered

Alternative: Single-call with KV result caching

The tool could poll aggressively (every 2s for up to 30s), and if the job isn’t done, write the invocationArn to KV with a TTL. Subsequent calls with the same invocationArn would return the cached result.

Rejected because:

Polling the AWS status API every 2s for 15 iterations is 15 upstream calls per tool invocation, which is wasteful and races against the 30s abort
The “subsequent calls” pattern is identical to operation: "status" but with hidden state in KV — two-call is simpler and explicit

Alternative: CF Queue + consumer

Submit job in the tool, enqueue (invocationArn, tenantId, callbackKvKey) to a CF Queue. A consumer Worker polls BDA/Transcribe and writes to KV. Caller polls KV for result.

Rejected because:

Adds a Queue binding, consumer Worker, and KV result schema — 3× the surface
CF Queues have at-least-once delivery + visibility timeout complexity
Solves the same problem as two-call but with more moving parts

Consequences

Positive:

Zero new infrastructure (no CF Workflows, no Queues, no consumer Workers)
Agent controls polling frequency — good for bursty workloads
Tool contract is transparent and testable
Consistent with the gateway’s stateless design

Negative:

Agents must implement polling logic (at least 2 tool calls per job instead of 1)
No server-side completion notification (no webhook / EventBridge integration)
If the agent is interrupted between start and status, the job runs to completion but the handle may be lost (mitigated by list_jobs operation and agent_state persistence)

Mitigations:

poll_hint_seconds in the start response coaches agents on when to call back
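The list_jobs operation (Transcribe) lets agents recover lost handles; BDA exposes only list_blueprints, which does not enumerate jobs, so BDA handles rely on agent_state persistence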
agent_state tool provides durable handle storage across context window resets

Invariant check

Invariant	Impact
#2 KV-only hot path	✅ Both `start` and `status` read KV config only. S3 fetch in `status` is an outbound call, not a KV read.
#3 Fail-fast, no retries	✅ Each outbound request (the AWS call, plus the S3 result fetch in `status`) is attempted exactly once. No internal retry loops.
#9 CF Cron + Workflows	✅ This pattern is explicitly NOT CF Workflows — it’s a two-call MCP pattern. CF Workflows are for server-initiated pipelines (ADR-026).
#10 ≤10ms overhead	✅ Overhead is the same KV + token + route path. AWS latency is separate from gateway SLA.
#11 30s AbortController	✅ Every outbound fetch (start call, status call, S3 result fetch) has its own 30s AbortController.
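One way the per-fetch timeout in invariant #11 could be applied is a small wrapper that gives each outbound operation its own AbortController. This is a sketch only: `withTimeout` is a hypothetical helper name, and the gateway's actual mechanism may differ.

```typescript
// Runs `run` with a fresh AbortSignal that fires after `timeoutMs` (30s by default).
// The callee is expected to honour the signal, as `fetch` does when given `{ signal }`.
async function withTimeout<T>(
  run: (signal: AbortSignal) => Promise<T>,
  timeoutMs = 30_000,
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await run(controller.signal);
  } finally {
    clearTimeout(timer); // avoid a dangling timer once the call settles
  }
}
```

A status call that also fetches the S3 result would wrap each request separately, for example `withTimeout((signal) => fetch(url, { signal }))`, so each gets its own 30s budget rather than sharing one.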