ADR-038: Nango for OAuth Token Lifecycle Management
Date: 2026-05-06
Status: Superseded by ADR-057 (2026-05-19 cutover) — Nango retired; Composio owns OAuth end-to-end. See ADR-057.
Superseded by: ADR-057
Overrides: Invariant #5 (“No external vendors in the token path”) — itself revised in ADR-057.
Context: V5 Master Optimization Plan, Phase 1
Decision
Adopt Nango Cloud (Starter plan, $50/mo) as the OAuth lifecycle manager for all multi-account OAuth providers. Nango takes over token refresh from the TokenManager Durable Object (DO) alarm chain for every provider it covers. The two run in parallel for a 30-day soak period, after which the DO alarm code for Nango-managed providers is deleted.
Context
What we had
The TokenManager Durable Object uses CF alarm-based token refresh:
- Each `{tenant}:{provider}:{account}` triplet has its own DO instance
- The DO alarm fires 10 minutes before expiry
- The DO calls the provider’s token endpoint directly with the stored `refresh_token`
- On success: writes fresh token to KV
- On `invalid_grant`: writes reauth marker to KV, fires Slack alert (after Phase 0 fix), logs to D1
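The alarm timing described above can be sketched as a small pure function. This is a minimal illustration, not the TokenManager code: the names `nextAlarmAt` and `REFRESH_BUFFER_MS` are hypothetical, and clamping to the current time when a token is already inside the buffer window is an assumption.

```typescript
// Hypothetical helper; names and the clamping behaviour are illustrative assumptions.
const REFRESH_BUFFER_MS = 10 * 60 * 1000; // the 10-minute buffer described above

/** Returns the epoch-millisecond time at which the refresh alarm should fire. */
function nextAlarmAt(
  expiresAtMs: number,
  nowMs: number,
  bufferMs: number = REFRESH_BUFFER_MS
): number {
  const target = expiresAtMs - bufferMs;
  // If the token is already inside the buffer window, fire now rather than in the past.
  return Math.max(target, nowMs);
}

// Example: a token that expires in one hour gets an alarm 50 minutes from now.
const exampleNow = Date.UTC(2026, 4, 6, 12, 0, 0);
const exampleAlarm = nextAlarmAt(exampleNow + 60 * 60 * 1000, exampleNow);
console.log(new Date(exampleAlarm).toISOString()); // 2026-05-06T12:50:00.000Z
```

In a Durable Object, a value like this would typically be handed to the storage alarm API; that wiring is omitted here.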
Problems with the DO approach:
- `refresh_token` stored in SQLite at rest: encrypted by CF, but still an expanded attack surface
- DO alarm scheduling is unreliable at scale (alarms can miss during CF edge restarts)
- Each new provider integration requires implementing a new token endpoint + grant flow in DO code
- Rate-limit and backoff logic is hand-rolled per provider
- `invalid_grant` detection is heuristic string matching
What Nango provides
- Hosted OAuth 2.0 + PKCE flows for 300+ providers
- Automatic token refresh with provider-specific retry and backoff
- Webhook delivery when tokens refresh (`auth.credential_refreshed` event)
- 10 free connections on Starter plan; scales with usage
- Connection IDs scoped per environment (prod/staging isolated)
Invariant #5 Override Justification
Invariant #5 states: “No external vendors in the token path. DOs alarm-based refresh. No Nango, Composio, or third-party auth brokers.”
This was written when the main concern was introducing a new dependency that hadn’t been evaluated. The concerns it was protecting against:
- Vendor lock-in: mitigated. Nango sits only on the write path; our KV schema is unchanged. If Nango disappears, the existing DO alarm path takes over within 10 minutes (alarm buffer).
- Increased latency: not applicable. Nango is async (webhook-delivered). The hot request path still reads KV; Nango only writes.
- Reduced security: Nango improves this, because they manage `refresh_token` storage rather than us. DO SQLite stores credentials encrypted at rest, but it is still code we maintain.
- Complexity: Nango simplifies. 300+ providers are handled without per-provider DO alarm code.
The override is scoped: Nango enters the token write path (refresh → KV write). The hot read path (KV lookup) is unchanged. Invariant #2 (KV-only hot path) and #6 (request path never touches a DO) are preserved.
Architecture
Connection ID Convention
`{tenant}__{provider}__{account_id}`
Double-underscore delimiter (colons are invalid in Nango connection IDs).
Examples:
- `kahuna__google__default`: Kahuna’s Google OAuth token
- `kahuna__salesforce__prod`: Kahuna’s Salesforce instance
- `pointfield__hubspot__default`: Point Field’s HubSpot
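A minimal sketch of building and parsing IDs in this convention follows. The function and type names are hypothetical, and the allowed character set (alphanumerics joined by single `_` or `-`) is an assumption chosen so that IDs split unambiguously on the delimiter; the only constraint stated in this ADR is that colons are invalid.

```typescript
// Illustrative helpers; names and the part-validation rule are assumptions, not V5 code.
interface ConnectionParts {
  tenant: string;
  provider: string;
  account: string;
}

const CONNECTION_DELIMITER = "__";
// Alphanumeric runs joined by a single "_" or "-": no colons, no "__", no leading or trailing "_".
const PART_PATTERN = /^[A-Za-z0-9]+(?:[_-][A-Za-z0-9]+)*$/;

function assertValidPart(part: string): void {
  if (!PART_PATTERN.test(part)) {
    throw new Error(`invalid connection ID part: "${part}"`);
  }
}

function buildConnectionId(parts: ConnectionParts): string {
  assertValidPart(parts.tenant);
  assertValidPart(parts.provider);
  assertValidPart(parts.account);
  return [parts.tenant, parts.provider, parts.account].join(CONNECTION_DELIMITER);
}

function parseConnectionId(id: string): ConnectionParts {
  const parts = id.split(CONNECTION_DELIMITER);
  if (parts.length !== 3) {
    throw new Error(`malformed connection ID: "${id}"`);
  }
  const [tenant, provider, account] = parts as [string, string, string];
  assertValidPart(tenant);
  assertValidPart(provider);
  assertValidPart(account);
  return { tenant, provider, account };
}
```

Rejecting parts that contain the delimiter keeps `parseConnectionId(buildConnectionId(x))` a faithful round trip.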
Token Write Path (new)
Nango refresh cycle
  → POST /admin/sync/nango (HMAC-verified webhook)
  → fetch fresh token from Nango API (GET /connection/{id})
  → write tokens:{tenant}:{provider}:{account} to KV
  → log kv_audit row to D1
  → return 200
Token Read Path (unchanged)
Request → auth gate → getToken() → KV lookup → upstream API
Fallback (DO alarm, unchanged during soak)
The TokenManager DO alarm continues to run. During the 30-day soak period, both systems write to the same KV key. Whichever write lands last overwrites the other (last-writer-wins); this is safe because both sources write a valid current token. After the soak exit criteria are met, the DO alarm is disabled for Nango-managed providers.
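The key mapping in the write path, and the way two writers share one KV key during the soak, can be sketched as follows. Everything here is illustrative: `TokenRecord`, `KvStore`, and `MemoryKv` are hypothetical stand-ins, the record fields are assumptions, and the in-memory store is synchronous only to keep the example short (real Workers KV calls are asynchronous).

```typescript
// Sketch only: TokenRecord, KvStore, and MemoryKv are hypothetical stand-ins, not V5 types.
interface TokenRecord {
  access_token: string;
  expires_at: number; // epoch milliseconds
  written_by: "nango" | "do_alarm";
}

interface KvStore {
  put(key: string, value: string): void;
  get(key: string): string | undefined;
}

// Synchronous in-memory stand-in for KV.
class MemoryKv implements KvStore {
  private data: { [key: string]: string } = {};
  put(key: string, value: string): void {
    this.data[key] = value;
  }
  get(key: string): string | undefined {
    return this.data[key];
  }
}

/** Maps {tenant}__{provider}__{account} to tokens:{tenant}:{provider}:{account}. */
function kvKeyForConnection(connectionId: string): string {
  const parts = connectionId.split("__");
  if (parts.length !== 3) {
    throw new Error(`malformed connection ID: "${connectionId}"`);
  }
  return "tokens:" + parts.join(":");
}

/** Write step shared by both sources: the most recent write replaces the stored value. */
function writeToken(kv: KvStore, connectionId: string, record: TokenRecord): string {
  const key = kvKeyForConnection(connectionId);
  kv.put(key, JSON.stringify(record));
  return key;
}

// Both writers target the same key; the later write is what readers see.
const demoKv = new MemoryKv();
writeToken(demoKv, "kahuna__google__default", { access_token: "a1", expires_at: 1, written_by: "do_alarm" });
writeToken(demoKv, "kahuna__google__default", { access_token: "a2", expires_at: 2, written_by: "nango" });
```

The audit-row write to D1 and the webhook signature check are left out of this sketch.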
Soak Period
Status: Soak exited 2026-05-08 (product decision — parallel soak impractical on the same day as launch).
Early exit criteria were never triggered. The DO alarm remains active as fallback indefinitely until explicit removal per docs/plans/nango-do-alarm-removal.md (not yet written — removal requires separate ADR).
Prior soak plan (archived for reference):
- Duration: 30 days from first Nango-managed token refresh
- Early exit criteria: zero `invalid_grant` rows, zero missed refreshes, ≥99% webhook delivery rate
- Soak failure criteria: Nango outage >15 min, token staleness on smoke-test, cost >$200/mo
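The archived criteria can be read as a simple decision function. The sketch below is hypothetical: the metric names are invented for illustration, and giving failure criteria precedence over early-exit criteria is an assumption the plan does not state.

```typescript
// Hypothetical metric names; thresholds are the ones listed in the archived soak plan.
interface SoakMetrics {
  invalidGrantRows: number;
  missedRefreshes: number;
  webhookDeliveryRate: number; // fraction between 0 and 1
  longestOutageMinutes: number;
  smokeTestStale: boolean;
  monthlyCostUsd: number;
}

type SoakVerdict = "fail" | "early_exit_eligible" | "continue";

function evaluateSoak(m: SoakMetrics): SoakVerdict {
  // Assumption: any failure criterion overrides the early-exit check.
  if (m.longestOutageMinutes > 15 || m.smokeTestStale || m.monthlyCostUsd > 200) {
    return "fail";
  }
  if (m.invalidGrantRows === 0 && m.missedRefreshes === 0 && m.webhookDeliveryRate >= 0.99) {
    return "early_exit_eligible";
  }
  return "continue";
}
```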
Nango Connect Onboarding
Endpoint: POST /admin/nango/connect (admin-key gated)
List connections: GET /admin/nango/connections?tenant=X (admin-key gated)
Flow:
- Admin calls `POST /admin/nango/connect` with `{ tenant, provider, account?, display_name? }`
- V5 calls Nango Connect Sessions API (`POST https://api.nango.dev/connect/sessions`) with `end_user.id = connectionId` (`{tenant}__{provider}__{account}`) and `allowed_integrations = [nangoKey]`
- V5 returns `{ connect_link, expires_at }`, a hosted OAuth URL with a 30-minute TTL
- Admin opens the URL in a browser to complete the OAuth consent flow
- On completion Nango fires `operation: "creation"` webhook to `POST /admin/sync/nango` (existing handler)
- Existing webhook handler resolves `end_user.endUserId` → V5 connection ID → fetches token from Nango API → writes `tokens:{tenant}:{provider}:{account}` to KV
Connection ID convention: {tenant}__{provider}__{account} passed as end_user.id to Nango Connect Sessions API.
Implementation: src/handlers/admin/nango-connect.ts
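As a rough illustration of the second step of the flow, the request body for the Connect Sessions call could be assembled like this. This is a sketch, not the contents of `nango-connect.ts`: the function and type names are hypothetical, defaulting `account` to `"default"` is an assumption, and forwarding `display_name` as `end_user.display_name` is an assumption about how the optional field is used.

```typescript
// Hypothetical shapes; only end_user.id and allowed_integrations are taken from the flow above.
interface ConnectRequest {
  tenant: string;
  provider: string;
  account?: string;
  display_name?: string;
}

interface ConnectSessionBody {
  end_user: { id: string; display_name?: string };
  allowed_integrations: string[];
}

function buildConnectSessionBody(req: ConnectRequest, nangoKey: string): ConnectSessionBody {
  // Assumption: an omitted account falls back to "default".
  const account = req.account ?? "default";
  const body: ConnectSessionBody = {
    end_user: { id: `${req.tenant}__${req.provider}__${account}` },
    allowed_integrations: [nangoKey],
  };
  // Assumption: the optional display name is forwarded on the end user.
  if (req.display_name !== undefined) {
    body.end_user.display_name = req.display_name;
  }
  return body;
}
```

Sending the body (authentication headers, error handling, and mapping the response to `{ connect_link, expires_at }`) is not shown.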
Nango Integration Inventory
| Nango Integration Key | V5 Provider(s) | Account |
|---|---|---|
| `google` | google_ads, ga4, gsc, gmail, google_calendar | default |
| `salesforce` | salesforce | prod |
| `hubspot` | hubspot | default |
| `linkedin` | linkedin_ads | default |
| `microsoft` | microsoft_calendar | default |
| `microsoft-ads` | microsoft_ads | default |
| `gong-oauth` | gong | default |
| `slack` | slack | default |
| `aws` | aws_bedrock | default |
| `github` | github | default |
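The inventory above amounts to a many-to-one lookup from V5 provider to Nango integration key. A minimal sketch, with hypothetical names (`NANGO_KEY_BY_PROVIDER`, `nangoKeyFor`) and the entries copied from the table:

```typescript
// Entries mirror the inventory table; the constant and function names are illustrative.
const NANGO_KEY_BY_PROVIDER: { [provider: string]: string } = {
  google_ads: "google",
  ga4: "google",
  gsc: "google",
  gmail: "google",
  google_calendar: "google",
  salesforce: "salesforce",
  hubspot: "hubspot",
  linkedin_ads: "linkedin",
  microsoft_calendar: "microsoft",
  microsoft_ads: "microsoft-ads",
  gong: "gong-oauth",
  slack: "slack",
  aws_bedrock: "aws",
  github: "github",
};

function nangoKeyFor(provider: string): string {
  // Own-property check avoids matching inherited keys such as "toString".
  const known = Object.prototype.hasOwnProperty.call(NANGO_KEY_BY_PROVIDER, provider);
  const key = known ? NANGO_KEY_BY_PROVIDER[provider] : undefined;
  if (key === undefined) {
    throw new Error(`no Nango integration for provider "${provider}"`);
  }
  return key;
}
```

Several V5 providers share the single `google` integration, so the lookup is intentionally not invertible.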
Consequences
Positive:
- Zero token refresh code to maintain per provider
- Automatic handling of provider-specific quirks (PKCE, token rotation, offline access)
- Refresh token never touches V5 source code
- Built-in monitoring via Nango dashboard
Negative:
- $50/mo Starter plan (acceptable; < 1 hour engineer time)
- Nango outage = tokens stale until DO alarm fires (max 10-min gap — the alarm buffer)
- New external dependency to monitor
Neutral:
- KV token schema unchanged — no migration required
- `tokens:` KV key format identical to what the DO alarm writes
Review Trigger
Re-evaluate if Nango pricing exceeds $200/mo (signals plan limit), if Nango has an outage > 1h, or if a superior OAuth lifecycle management option emerges.