`llm_invoke` default provider: DeepSeek Direct, not OpenRouter

ADR-017 — llm_invoke default provider: DeepSeek Direct, not OpenRouter

Status: Accepted
Date: 2026-04-22
Deciders: Mishaal Murawala
Relates to: ADR-016 (context plane), docs/audits/cost-optimization-tiers.md, src/tools/llm-invoke.ts

Context

ADR-016’s two-plane architecture relies on bulk LLM calls for ingestion (Gong transcripts → facts, SFDC records → entities, email threads → signals). These calls share two properties that make them cost-sensitive:

  1. Repetitive system prompts — the same ICP template, extraction schema, or classification rubric is attached to every call in a batch.
  2. Template-heavy context — each call includes the same few-shot examples + structured output instructions.

The initial llm_invoke tool (landed in the 2026-04-17–22 session) defaulted to OpenRouter because of its breadth — 300+ models via one key. That default was picked on breadth alone without evaluating the shape of our actual workload.

A 2026-04-22 research pass across the gateway landscape (OpenRouter, Groq, Cerebras, Fireworks, Together, Vercel AI Gateway, Cloudflare Workers AI, direct provider APIs) found that OpenRouter is not optimal for our workload shape.

Decision

Switch llm_invoke’s default provider from openrouter to deepseek.

Keep OpenRouter, Groq, Cerebras as opt-in alternatives via the provider parameter.

Add DEEPSEEK_API_KEY as a required wrangler secret for the default path (already set 2026-04-22).

Rationale

DeepSeek Direct wins on the dimensions that matter most to us

| Dimension | DeepSeek Direct | OpenRouter | Winner |
|---|---|---|---|
| Raw price, uncached input | $0.28 / 1M tokens (V3.2) | $0.14–0.30 / 1M tokens (DeepSeek V3 on OR + aggregation markup) | Roughly tied on base |
| Prompt-cached input | $0.028 / 1M tokens (10× discount when system prompt is constant across calls) | Not available; OpenRouter has no prompt-cache passthrough | DeepSeek 10× |
| Latency (TTFT) | ~400–600ms | Slowest in benchmarks (~25s P95 reported) | DeepSeek |
| Model breadth | Two models (chat, reasoner) | 300+ models | OpenRouter |
| Single-key coverage of all 5 target open models | No (DeepSeek only) | No (missing GLM-4.7, Kimi K2) | Neither; Fireworks would be the only single-vendor full-coverage option |

The math on cached prompts

Ascend’s real bulk workloads (ICP scoring against a constant rubric, CIM summarization with a fixed extraction schema, signal classification against a stable ruleset) all share a constant system prompt. At 1B input tokens/month of template-repetitive work:

  • OpenRouter / Fireworks / Together: $150–$300/month input cost
  • DeepSeek Direct uncached: $280/month
  • DeepSeek Direct cached: $28/month

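That is a saving of roughly $120–270/month against the aggregator range once this tool has real volume, to the extent that input tokens are served from the prompt cache.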
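The arithmetic behind these figures can be sketched as follows. The dollar amounts in the list correspond to 1B input tokens per month at the per-1M prices from the comparison table. The function name and the `cacheHitRatio` parameter are hypothetical, introduced only to show how the saving shrinks when part of each prompt is not served from cache.

```typescript
// USD per 1M input tokens, as quoted in the comparison table.
const DEEPSEEK_UNCACHED_PER_M = 0.28;
const DEEPSEEK_CACHED_PER_M = 0.028;

/**
 * Monthly input cost when a fraction of input tokens is billed at the cached rate.
 * cacheHitRatio is a hypothetical knob: 0 means nothing is cached, 1 means everything is.
 */
function monthlyInputCostUsd(
  tokensPerMonth: number,
  uncachedPerMillion: number,
  cachedPerMillion: number = uncachedPerMillion,
  cacheHitRatio: number = 0,
): number {
  if (cacheHitRatio < 0 || cacheHitRatio > 1) {
    throw new RangeError("cacheHitRatio must be between 0 and 1");
  }
  const millions = tokensPerMonth / 1_000_000;
  const blendedPerMillion =
    cacheHitRatio * cachedPerMillion + (1 - cacheHitRatio) * uncachedPerMillion;
  return millions * blendedPerMillion;
}

const TOKENS_PER_MONTH = 1_000_000_000; // 1B input tokens

const uncachedCost = monthlyInputCostUsd(TOKENS_PER_MONTH, DEEPSEEK_UNCACHED_PER_M);
const fullyCachedCost = monthlyInputCostUsd(
  TOKENS_PER_MONTH, DEEPSEEK_UNCACHED_PER_M, DEEPSEEK_CACHED_PER_M, 1,
);
// Hypothetical case: 80% of input tokens hit the cache.
const mostlyCachedCost = monthlyInputCostUsd(
  TOKENS_PER_MONTH, DEEPSEEK_UNCACHED_PER_M, DEEPSEEK_CACHED_PER_M, 0.8,
);

console.log(uncachedCost.toFixed(2));     // 280.00
console.log(fullyCachedCost.toFixed(2));  // 28.00
console.log(mostlyCachedCost.toFixed(2)); // 78.40
```

The $28 figure is the best case where every input token is a cache hit; the per-call payload (transcript, record, thread) is always new, so real batches land somewhere between the two DeepSeek lines.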

Breadth tradeoff is small in practice

We advertised the OpenRouter default because it gave access to GLM-4.7, Kimi K2, Qwen-3, Llama-3.3. In practice:

  • DeepSeek-V3 and DeepSeek-Reasoner benchmark at parity with or above GLM-4.7, Qwen-3, Llama-3.3 on the tasks we care about (classification, extraction, summarization).
  • When breadth is needed (e.g. running an A/B across 3 models for a one-off eval), pass provider: "openrouter" explicitly — feature remains available.
  • For latency-critical flows, pass provider: "groq" with Llama-3.3-70B.
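As a rough illustration of the three call styles above, the sketch below builds the JSON-RPC payloads a caller might send. The `buildToolCall` helper and the `LlmInvokeArgs` type are hypothetical; only the `model`, `messages`, and `provider` argument names come from this ADR. The Groq model ID is an assumption and should be checked against Groq's current model list.

```typescript
type Provider = "deepseek" | "openrouter" | "groq" | "cerebras";

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Hypothetical argument shape; the real tool schema lives in src/tools/llm-invoke.ts.
interface LlmInvokeArgs {
  model: string;
  messages: ChatMessage[];
  provider?: Provider; // omitted means the default (DeepSeek Direct)
}

// Wraps llm_invoke arguments in an MCP tools/call request body.
function buildToolCall(id: number, args: LlmInvokeArgs) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "llm_invoke", arguments: args },
  };
}

const messages: ChatMessage[] = [{ role: "user", content: "return the word OK" }];

// Common case: no provider, so the DeepSeek default applies.
const defaultCall = buildToolCall(1, { model: "deepseek-chat", messages });

// Breadth case: explicit opt-in with an OpenRouter-format model ID.
const breadthCall = buildToolCall(2, {
  provider: "openrouter",
  model: "zai-org/glm-4.7",
  messages,
});

// Latency case: explicit opt-in to Groq (model ID is an assumption).
const latencyCall = buildToolCall(3, {
  provider: "groq",
  model: "llama-3.3-70b-versatile",
  messages,
});

console.log(JSON.stringify([defaultCall, breadthCall, latencyCall], null, 2));
```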

The “wrong default” risk

Keeping OpenRouter as the default would mean every reflex call to llm_invoke skips the prompt-cache discount silently. The cost only shows up months later as an over-spent invoice line. Making DeepSeek the default means the common case wins automatically and the power-user case (explicit provider: "openrouter") is still one param away.

Consequences

Positive

  • Default path is the cost-optimal path for the workload shape we actually have.
  • DEEPSEEK_API_KEY is already set (see 2026-04-22 session log).
  • Prompt caching works out-of-the-box on DeepSeek Direct — no code changes required to benefit; just ensure system prompts are stable across calls in a batch.
  • OpenRouter/Groq/Cerebras remain one param away.
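To make "stable across calls in a batch" concrete, here is a minimal sketch of a batch builder that keeps the shared prefix byte-identical and puts the per-item content last. DeepSeek's caching matches on a common request prefix, so anything that varies should come after the constant part. The prompt text, helper names, and sample inputs are all illustrative, not taken from the codebase.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Constant prefix: system prompt plus few-shot examples (placeholder text).
const SHARED_PREFIX: ChatMessage[] = [
  { role: "system", content: "Classify the signal using the rubric. Reply with JSON." },
  { role: "user", content: "Example: customer asked for pricing twice this week." },
  { role: "assistant", content: '{"signal":"buying_intent"}' },
];

// Builds one message list per input; only the final user message varies.
function buildBatch(prefix: ChatMessage[], inputs: string[]): ChatMessage[][] {
  return inputs.map((input): ChatMessage[] => [
    ...prefix,
    { role: "user", content: input },
  ]);
}

// Sanity check: every request starts with the same serialized prefix.
function sharesPrefix(batch: ChatMessage[][], prefixLength: number): boolean {
  if (batch.length === 0) return true;
  const reference = JSON.stringify(batch[0].slice(0, prefixLength));
  return batch.every(
    (msgs) => JSON.stringify(msgs.slice(0, prefixLength)) === reference,
  );
}

const batch = buildBatch(SHARED_PREFIX, [
  "Thread 1: champion left the company.",
  "Thread 2: asked about SSO support.",
]);

console.log(sharesPrefix(batch, SHARED_PREFIX.length)); // true
```

A common way to break the prefix by accident is to interpolate a timestamp, tenant name, or request ID into the system prompt; keeping such values in the final user message avoids that.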

Negative

  • Adds a fourth provider config to maintain (one extra keyName union entry + PROVIDER_CONFIG row).
  • DeepSeek’s API is hosted in China; occasional rate-limit surges and geo-latency spikes are possible. For any workload where this is a hard blocker, explicit opt-in to Groq or Cerebras via the provider parameter is available.
  • Users who memorized the old OpenRouter model IDs (e.g. zai-org/glm-4.7) need to adjust — either pass provider: "openrouter" explicitly, or switch to DeepSeek model IDs (deepseek-chat, deepseek-reasoner).

Neutral

  • Tool signature is backwards-compatible — existing callers that passed provider: "openrouter" continue to work. Callers that relied on the implicit default will switch to DeepSeek on next call; their existing model IDs will fail (since they were OpenRouter-format). This is a documented breaking change for implicit callers, acceptable because llm_invoke has no production traffic yet.
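One way to soften this break for implicit callers would be a fail-fast check that recognizes OpenRouter-format model IDs on the default path and says what to do. This is only a sketch of that idea, not something the PR is stated to include; `checkModelForProvider` is a hypothetical helper, and the list of DeepSeek IDs is limited to the two named in this ADR.

```typescript
// The two DeepSeek model IDs named in this ADR.
const DEEPSEEK_MODEL_IDS: readonly string[] = ["deepseek-chat", "deepseek-reasoner"];

/**
 * Returns null when the model looks usable, or a human-readable error message.
 * Only the default (deepseek) path is validated in this sketch.
 */
function checkModelForProvider(model: string, provider: string = "deepseek"): string | null {
  if (provider !== "deepseek") return null;
  if (DEEPSEEK_MODEL_IDS.indexOf(model) !== -1) return null;

  // OpenRouter IDs are namespaced as "vendor/model", e.g. zai-org/glm-4.7.
  const hint =
    model.indexOf("/") !== -1
      ? ' It looks like an OpenRouter-format ID; pass provider: "openrouter" to keep using it.'
      : "";
  return (
    `Model "${model}" is not a DeepSeek model ID ` +
    `(expected one of: ${DEEPSEEK_MODEL_IDS.join(", ")}).${hint}`
  );
}

console.log(checkModelForProvider("deepseek-chat"));                 // null
console.log(checkModelForProvider("zai-org/glm-4.7"));               // error message with hint
console.log(checkModelForProvider("zai-org/glm-4.7", "openrouter")); // null
```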

Implementation

Changes in this PR:

  1. src/tools/llm-invoke.ts:
    • provider enum: ['deepseek', 'openrouter', 'groq', 'cerebras'], default 'deepseek'
    • PROVIDER_CONFIG: new deepseek entry, base URL https://api.deepseek.com/v1, key DEEPSEEK_API_KEY
    • JSDoc + tool description updated to reflect new default + DeepSeek model IDs
  2. Wrangler secret DEEPSEEK_API_KEY already set in prod (via wrangler secret put 2026-04-22).
  3. docs/audits/cost-optimization-tiers.md — no update needed; the three-tier routing guidance remains valid. Economy-tier provider list just widens.
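The shape of the change in src/tools/llm-invoke.ts might look roughly like the sketch below. Only the provider list, the default, the DeepSeek base URL, and the DEEPSEEK_API_KEY name come from this ADR. The other base URLs and key names are assumptions based on each provider's public OpenAI-compatible endpoint, and `resolveProvider` is a hypothetical helper.

```typescript
const PROVIDERS = ["deepseek", "openrouter", "groq", "cerebras"] as const;
type Provider = (typeof PROVIDERS)[number];

const DEFAULT_PROVIDER: Provider = "deepseek";

interface ProviderConfig {
  baseUrl: string;
  keyName: string; // name of the wrangler secret holding the API key
}

const PROVIDER_CONFIG: Record<Provider, ProviderConfig> = {
  // From this ADR.
  deepseek: { baseUrl: "https://api.deepseek.com/v1", keyName: "DEEPSEEK_API_KEY" },
  // Assumed values; verify against the actual file.
  openrouter: { baseUrl: "https://openrouter.ai/api/v1", keyName: "OPENROUTER_API_KEY" },
  groq: { baseUrl: "https://api.groq.com/openai/v1", keyName: "GROQ_API_KEY" },
  cerebras: { baseUrl: "https://api.cerebras.ai/v1", keyName: "CEREBRAS_API_KEY" },
};

function isProvider(value: string): value is Provider {
  return PROVIDERS.some((p) => p === value);
}

// Falls back to the default when no provider is passed; rejects unknown names.
function resolveProvider(requested?: string): { provider: Provider; config: ProviderConfig } {
  const provider = requested ?? DEFAULT_PROVIDER;
  if (!isProvider(provider)) {
    throw new Error(`Unknown provider: ${provider}`);
  }
  return { provider, config: PROVIDER_CONFIG[provider] };
}

console.log(resolveProvider().config.baseUrl);       // https://api.deepseek.com/v1
console.log(resolveProvider("groq").config.keyName); // GROQ_API_KEY
```

Typing the table as `Record<Provider, ProviderConfig>` means adding a provider to the enum without a matching config row fails at compile time.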

Verification

After merge + deploy:

# Test DeepSeek Direct path end-to-end
curl -sS https://ascend-gateway-v5.ascendgtm.workers.dev/mcp \
-H "authorization: Bearer <tenant-token>" \
-H "content-type: application/json" \
-d '{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"llm_invoke","arguments":{"model":"deepseek-chat","messages":[{"role":"user","content":"return the word OK"}]}}}'

Expected: success: true, data.choices[0].message.content contains “OK”.
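For a scripted smoke test, the expectation above could be checked along these lines. The `LlmInvokeResult` type models only the fields named in the expectation; where that object sits inside the JSON-RPC response is not specified here, so extracting it is left to the caller. The function name is hypothetical.

```typescript
// Only the fields named in the expectation; everything else is left optional.
interface LlmInvokeResult {
  success: boolean;
  data?: {
    choices?: Array<{ message?: { content?: string } }>;
  };
}

// True when the call succeeded and the first choice contains "OK".
function isExpectedOk(result: LlmInvokeResult): boolean {
  const content = result.data?.choices?.[0]?.message?.content ?? "";
  return result.success === true && content.includes("OK");
}

// Illustrative payloads, not captured from a real run.
const passing: LlmInvokeResult = {
  success: true,
  data: { choices: [{ message: { content: "OK" } }] },
};
const failing: LlmInvokeResult = { success: false };

console.log(isExpectedOk(passing)); // true
console.log(isExpectedOk(failing)); // false
```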

Alternatives considered

  • Fireworks AI as default: the only single vendor serving all 5 original target models, but at a +100–300% markup vs DeepSeek Direct. Not worth the breadth when our workload rarely needs model switching mid-batch.
  • Keep OpenRouter as default, add DeepSeek as option — leaves the wrong-default trap in place. Every reflex call pays the aggregation markup + misses the prompt-cache discount. Rejected.
  • Multi-provider routing layer (Vercel AI Gateway, Portkey, LangDB) — valuable at 10+ tenants with observability/fallback needs; overkill at our current scale. Revisit at Phase 5.

References