Nango for OAuth Token Lifecycle Management

ADR-038: Nango for OAuth Token Lifecycle Management

Date: 2026-05-06
Status: Superseded by ADR-057 (2026-05-19 cutover). Nango retired; Composio owns OAuth end-to-end. See ADR-057.
Superseded by: ADR-057
Overrides: Invariant #5 (“No external vendors in the token path”), itself revised in ADR-057
Context: V5 Master Optimization Plan, Phase 1


Decision

Adopt Nango Cloud (Starter plan, $50/mo) as the OAuth lifecycle manager for all multi-account OAuth providers. Nango takes over token refresh from the TokenManager Durable Object alarm chain for every provider it covers. The DO alarm keeps running as a fallback during a 30-day soak period, after which the DO alarm code for Nango-managed providers is deleted.


Context

What we had

The TokenManager Durable Object uses CF alarm-based token refresh:

  • Each {tenant}:{provider}:{account} triplet has its own DO instance
  • The DO alarm fires 10 minutes before expiry
  • The DO calls the provider’s token endpoint directly with stored refresh_token
  • On success: writes fresh token to KV
  • On invalid_grant: writes reauth marker to KV, fires Slack alert (after Phase 0 fix), logs to D1

Problems with the DO approach:

  • refresh_token stored in SQLite at rest — encrypted by CF, but still an expanded attack surface
  • DO alarm scheduling is unreliable at scale (alarms can miss during CF edge restarts)
  • Each new provider integration requires implementing a new token endpoint + grant flow in DO code
  • Rate-limit and backoff logic is hand-rolled per provider
  • invalid_grant detection is heuristic string matching
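To illustrate the last point: RFC 6749 (section 5.2) specifies that token-endpoint errors are returned as a JSON body with an `error` field, so a structured check is possible for compliant providers. The sketch below is illustrative only; `isInvalidGrant` is a hypothetical name and is not the TokenManager code. Providers that deviate from the RFC would still need special handling.

```typescript
// Hypothetical helper: structured invalid_grant detection per RFC 6749 section 5.2.
// Returns true only when the JSON body's `error` field is exactly "invalid_grant".
function isInvalidGrant(bodyText: string): boolean {
  try {
    const parsed: unknown = JSON.parse(bodyText);
    return (
      typeof parsed === "object" &&
      parsed !== null &&
      (parsed as { error?: unknown }).error === "invalid_grant"
    );
  } catch {
    // Non-JSON body (e.g. an HTML error page): cannot be classified structurally.
    return false;
  }
}
```

Unlike substring matching, this does not misfire when another error merely mentions `invalid_grant` in its description.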

What Nango provides

  • Hosted OAuth 2.0 + PKCE flows for 300+ providers
  • Automatic token refresh with provider-specific retry and backoff
  • Webhook delivery when tokens refresh (auth.credential_refreshed event)
  • 10 free connections on Starter plan; scales with usage
  • Connection IDs scoped per environment (prod/staging isolated)

Invariant #5 Override Justification

Invariant #5 states: “No external vendors in the token path. DOs alarm-based refresh. No Nango, Composio, or third-party auth brokers.”

This was written when the main concern was introducing a new dependency that hadn’t been evaluated. The concerns it was protecting against:

  1. Vendor lock-in: mitigated. Nango sits only in the write path; our KV schema is unchanged. If Nango disappears, the existing DO alarm path takes over within 10 minutes (alarm buffer).
  2. Increased latency: not applicable. Nango is async (webhook-delivered). The hot request path still reads KV; Nango only writes.
  3. Larger attack surface: Nango improves this. Nango manages refresh_token storage instead of us; DO SQLite stores credentials encrypted at rest, but it is still storage code we maintain.
  4. Complexity: Nango simplifies. 300+ providers are handled without per-provider DO alarm code.

The override is scoped: Nango enters the token write path (refresh → KV write). The hot read path (KV lookup) is unchanged. Invariant #2 (KV-only hot path) and #6 (request path never touches a DO) are preserved.


Architecture

Connection ID Convention

{tenant}__{provider}__{account_id}

Double-underscore delimiter (colons are invalid in Nango connection IDs).

Examples:

  • kahuna__google__default — Kahuna’s Google OAuth token
  • kahuna__salesforce__prod — Kahuna’s Salesforce instance
  • pointfield__hubspot__default — Point Field’s HubSpot
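The convention above can be sketched as a pair of helpers that convert between the Nango connection ID and the KV key format used elsewhere in this ADR (`tokens:{tenant}:{provider}:{account}`). All names here (`TokenRef`, `toConnectionId`, `parseConnectionId`, `toKvKey`) are hypothetical. The rule rejecting segments with leading or trailing underscores is an assumption, added so the double-underscore split stays unambiguous.

```typescript
// Hypothetical helpers for the {tenant}__{provider}__{account_id} convention.
interface TokenRef {
  tenant: string;
  provider: string;
  account: string;
}

const DELIM = "__";

function toConnectionId(ref: TokenRef): string {
  const parts = [ref.tenant, ref.provider, ref.account];
  for (const part of parts) {
    // Assumption: segments may contain single underscores (e.g. "google_ads"),
    // but not the delimiter, and may not start or end with "_".
    if (part === "" || part.includes(DELIM) || part.startsWith("_") || part.endsWith("_")) {
      throw new Error(`invalid connection ID segment: "${part}"`);
    }
  }
  return parts.join(DELIM);
}

function parseConnectionId(id: string): TokenRef {
  const parts = id.split(DELIM);
  if (parts.length !== 3 || parts.some((p) => p === "")) {
    throw new Error(`not a {tenant}__{provider}__{account} ID: "${id}"`);
  }
  const [tenant, provider, account] = parts as [string, string, string];
  return { tenant, provider, account };
}

// Maps a parsed reference to the colon-delimited KV key.
function toKvKey(ref: TokenRef): string {
  return `tokens:${ref.tenant}:${ref.provider}:${ref.account}`;
}
```

For example, `toKvKey(parseConnectionId("kahuna__salesforce__prod"))` yields `tokens:kahuna:salesforce:prod`.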

Token Write Path (new)

Nango refresh cycle
→ POST /admin/sync/nango (HMAC-verified webhook)
→ fetch fresh token from Nango API (GET /connection/{id})
→ write tokens:{tenant}:{provider}:{account} to KV
→ log kv_audit row to D1
→ return 200
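A minimal sketch of the write path above, under several assumptions: the webhook is signed with HMAC-SHA256 over the raw body and delivered as a hex string (the actual Nango signature scheme and header name are not specified in this ADR), the payload carries a `connectionId` field, and storage is injected through the hypothetical `TokenStore` and `AuditLog` interfaces rather than real KV and D1 bindings. It uses `node:crypto`; in a Worker this would require Node compatibility or a Web Crypto equivalent.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical storage interfaces standing in for the KV and D1 bindings.
interface TokenStore { put(key: string, value: string): Promise<void>; }
interface AuditLog { record(row: { key: string; source: string; at: string }): Promise<void>; }
interface Deps {
  secret: string; // shared webhook secret (assumption)
  fetchToken(connectionId: string): Promise<{ access_token: string; expires_at: string }>;
  kv: TokenStore;
  audit: AuditLog;
}

// Constant-time comparison of the supplied signature against HMAC-SHA256(rawBody).
function verifySignature(rawBody: string, signatureHex: string, secret: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  const given = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return given.length === expected.length && timingSafeEqual(given, expected);
}

// {tenant}__{provider}__{account} -> tokens:{tenant}:{provider}:{account}
function kvKeyFor(connectionId: string): string {
  const parts = connectionId.split("__");
  if (parts.length !== 3) throw new Error(`unexpected connection ID: ${connectionId}`);
  return `tokens:${parts.join(":")}`;
}

// Returns the HTTP status the handler would respond with.
async function handleNangoWebhook(rawBody: string, signatureHex: string, deps: Deps): Promise<number> {
  if (!verifySignature(rawBody, signatureHex, deps.secret)) return 401;
  const { connectionId } = JSON.parse(rawBody) as { connectionId: string };
  const token = await deps.fetchToken(connectionId); // GET /connection/{id}
  const key = kvKeyFor(connectionId);
  await deps.kv.put(key, JSON.stringify(token));
  await deps.audit.record({ key, source: "nango", at: new Date().toISOString() });
  return 200;
}
```

Verifying the signature before parsing the body keeps unauthenticated input away from the token fetch and the KV write.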

Token Read Path (unchanged)

Request → auth gate → getToken() → KV lookup → upstream API

Fallback (DO alarm, unchanged during soak)

The TokenManager DO alarm continues to run. During the 30-day soak period, both systems write to the same KV key, and whichever writes last wins. Last-writer-wins is safe here because both sources hold the correct current token. After soak exit criteria are met, the DO alarm is disabled for Nango-managed providers.


Soak Period

Status: Soak exited 2026-05-08 (product decision — parallel soak impractical on the same day as launch).

Early exit criteria were never triggered. The DO alarm remains active as fallback indefinitely until explicit removal per docs/plans/nango-do-alarm-removal.md (not yet written — removal requires separate ADR).

Prior soak plan (archived for reference):

  • Duration: 30 days from first Nango-managed token refresh
  • Early exit criteria: zero invalid_grant rows, zero missed refreshes, ≥99% webhook delivery rate
  • Soak failure criteria: Nango outage >15 min, token staleness on smoke-test, cost >$200/mo
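The archived criteria can be read as a small decision function. The sketch below assumes that any single failure criterion fails the soak and that all three early-exit criteria must hold together; the ADR does not state this explicitly. The `SoakMetrics` shape and the `evaluateSoak` name are hypothetical.

```typescript
// Hypothetical metrics gathered over the soak window.
interface SoakMetrics {
  invalidGrantRows: number;
  missedRefreshes: number;
  webhooksSent: number;
  webhooksDelivered: number;
  longestOutageMinutes: number;
  smokeTestSawStaleToken: boolean;
  monthlyCostUsd: number;
}

type SoakVerdict = "fail" | "early_exit" | "continue";

function evaluateSoak(m: SoakMetrics): SoakVerdict {
  // Failure: Nango outage > 15 min, stale token on smoke test, or cost > $200/mo.
  if (m.longestOutageMinutes > 15 || m.smokeTestSawStaleToken || m.monthlyCostUsd > 200) {
    return "fail";
  }
  // With no webhooks sent there is no evidence of delivery, so treat the rate as 0.
  const deliveryRate = m.webhooksSent === 0 ? 0 : m.webhooksDelivered / m.webhooksSent;
  const clean =
    m.invalidGrantRows === 0 && m.missedRefreshes === 0 && deliveryRate >= 0.99;
  return clean ? "early_exit" : "continue";
}
```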

Nango Connect Onboarding

Endpoint: POST /admin/nango/connect (admin-key gated)
List connections: GET /admin/nango/connections?tenant=X (admin-key gated)

Flow:

  1. Admin calls POST /admin/nango/connect with { tenant, provider, account?, display_name? }
  2. V5 calls Nango Connect Sessions API (POST https://api.nango.dev/connect/sessions) with end_user.id = connectionId ({tenant}__{provider}__{account}) and allowed_integrations = [nangoKey]
  3. V5 returns { connect_link, expires_at } — a 30-minute TTL hosted OAuth URL
  4. Admin opens the URL in a browser to complete the OAuth consent flow
  5. On completion Nango fires operation: "creation" webhook to POST /admin/sync/nango (existing handler)
  6. Existing webhook handler resolves end_user.endUserId → V5 connection ID → fetches token from Nango API → writes tokens:{tenant}:{provider}:{account} to KV

Connection ID convention: {tenant}__{provider}__{account} passed as end_user.id to Nango Connect Sessions API.
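Step 2 of the flow can be sketched as a pure function that builds the Connect Sessions request. Only `end_user.id` and `allowed_integrations` come from this ADR. The bearer-token `Authorization` header, forwarding `display_name` into `end_user`, and defaulting `account` to `"default"` are assumptions, and `buildConnectSessionRequest` is a hypothetical name rather than the code in the implementation file.

```typescript
// Hypothetical request-builder for POST https://api.nango.dev/connect/sessions.
interface ConnectParams {
  tenant: string;
  provider: string;
  account?: string;
  display_name?: string;
}

interface HttpRequestSpec {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildConnectSessionRequest(
  p: ConnectParams,
  nangoKey: string, // Nango integration key, e.g. "google"
  secretKey: string, // Nango secret key (assumed to be sent as a bearer token)
): HttpRequestSpec {
  // Assumption: a missing account falls back to "default".
  const connectionId = [p.tenant, p.provider, p.account ?? "default"].join("__");
  const endUser: { id: string; display_name?: string } = { id: connectionId };
  if (p.display_name !== undefined) endUser.display_name = p.display_name;
  return {
    url: "https://api.nango.dev/connect/sessions",
    method: "POST",
    headers: {
      Authorization: `Bearer ${secretKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ end_user: endUser, allowed_integrations: [nangoKey] }),
  };
}
```

The result can be passed to `fetch(spec.url, spec)`. Keeping the builder free of network calls makes the ID convention easy to unit-test.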

Implementation: src/handlers/admin/nango-connect.ts


Nango Integration Inventory

Nango Integration Key | V5 Provider(s) | Account
--- | --- | ---
google | google_ads, ga4, gsc, gmail, google_calendar | default
salesforce | salesforce | prod
hubspot | hubspot | default
linkedin | linkedin_ads | default
microsoft | microsoft_calendar | default
microsoft-ads | microsoft_ads | default
gong-oauth | gong | default
slack | slack | default
aws | aws_bedrock | default
github | github | default
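The inventory above can be expressed as a lookup table. The sketch below copies the rows verbatim; the `NANGO_INTEGRATIONS` constant and the `nangoKeyFor` reverse lookup are hypothetical names, not identifiers from the V5 codebase.

```typescript
// Inventory rows: Nango integration key -> V5 providers it serves and the account label.
const NANGO_INTEGRATIONS: Record<string, { providers: readonly string[]; account: string }> = {
  google: { providers: ["google_ads", "ga4", "gsc", "gmail", "google_calendar"], account: "default" },
  salesforce: { providers: ["salesforce"], account: "prod" },
  hubspot: { providers: ["hubspot"], account: "default" },
  linkedin: { providers: ["linkedin_ads"], account: "default" },
  microsoft: { providers: ["microsoft_calendar"], account: "default" },
  "microsoft-ads": { providers: ["microsoft_ads"], account: "default" },
  "gong-oauth": { providers: ["gong"], account: "default" },
  slack: { providers: ["slack"], account: "default" },
  aws: { providers: ["aws_bedrock"], account: "default" },
  github: { providers: ["github"], account: "default" },
};

// Reverse lookup: which Nango integration key serves a given V5 provider?
function nangoKeyFor(v5Provider: string): string | undefined {
  for (const [key, entry] of Object.entries(NANGO_INTEGRATIONS)) {
    if (entry.providers.includes(v5Provider)) return key;
  }
  return undefined;
}
```

Note that one Nango integration can back several V5 providers: `nangoKeyFor("ga4")` and `nangoKeyFor("gmail")` both resolve to `google`.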

Consequences

Positive:

  • Zero token refresh code to maintain per provider
  • Automatic handling of provider-specific quirks (PKCE, token rotation, offline access)
  • Refresh tokens are stored and handled by Nango, not by V5 code
  • Built-in monitoring via Nango dashboard

Negative:

  • $50/mo Starter plan (acceptable; less than the cost of one hour of engineer time)
  • A Nango outage leaves tokens stale until the DO alarm fires (at most a 10-minute gap, the alarm buffer)
  • New external dependency to monitor

Neutral:

  • KV token schema unchanged — no migration required
  • tokens: KV key format identical to what DO alarm writes

Review Trigger

Re-evaluate if Nango pricing exceeds $200/mo (signals plan limit), if Nango has an outage > 1h, or if a superior OAuth lifecycle management option emerges.