`llm_invoke` default provider: DeepSeek Direct, not OpenRouter
ADR-017 — llm_invoke default provider: DeepSeek Direct, not OpenRouter
Status: Accepted
Date: 2026-04-22
Deciders: Mishaal Murawala
Relates to: ADR-016 (context plane), docs/audits/cost-optimization-tiers.md, src/tools/llm-invoke.ts
Context
ADR-016’s two-plane architecture relies on bulk LLM calls for ingestion (Gong transcripts → facts, SFDC records → entities, email threads → signals). These calls share two properties that make them cost-sensitive:
- Repetitive system prompts — the same ICP template, extraction schema, or classification rubric is attached to every call in a batch.
- Template-heavy context — each call includes the same few-shot examples + structured output instructions.
The initial llm_invoke tool (landed in the 2026-04-17–22 session) defaulted to OpenRouter because of its breadth — 300+ models via one key. That default was picked on breadth alone without evaluating the shape of our actual workload.
A 2026-04-22 research pass across the gateway landscape (OpenRouter, Groq, Cerebras, Fireworks, Together, Vercel AI Gateway, Cloudflare Workers AI, direct provider APIs) found that OpenRouter is not optimal for our workload shape.
Decision
Switch llm_invoke’s default provider from openrouter → deepseek.
Keep OpenRouter, Groq, Cerebras as opt-in alternatives via the provider parameter.
Add DEEPSEEK_API_KEY as a required wrangler secret for the default path (already set 2026-04-22).
Rationale
DeepSeek Direct wins on three dimensions that matter to us
| Dimension | DeepSeek Direct | OpenRouter | Winner |
|---|---|---|---|
| Raw price, uncached input | $0.28 / 1M tokens (V3.2) | $0.14–0.30 / 1M tokens (DeepSeek V3 on OR + aggregation markup) | Roughly tied on base |
| Prompt-cached input | $0.028 / 1M tokens (10× discount when system prompt is constant across calls) | Not available — OpenRouter has no prompt-cache passthrough | DeepSeek 10× |
| Latency (TTFT) | ~400–600ms | Slowest in benchmarks (~25s P95 reported) | DeepSeek |
| Model breadth | Two models (chat, reasoner) | 300+ models | OpenRouter |
| Single-key coverage of all 5 target open models | No — DeepSeek only | No — missing GLM-4.7, Kimi K2 | Neither; Fireworks would be the only single-vendor full-coverage option |
The math on cached prompts
Ascend’s real bulk workloads (ICP scoring against a constant rubric, CIM summarization with a fixed extraction schema, signal classification against a stable ruleset) all share a constant system prompt. At 1B input tokens/month of template-repetitive work:
- OpenRouter / Fireworks / Together: $150–$300/month input cost
- DeepSeek Direct uncached: $280/month
- DeepSeek Direct cached: $28/month
That’s a ~$125-270/month saving the moment this tool has real volume.
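The arithmetic can be checked with a short sketch. The two prices are the per-1M-token figures from the table; the 1B-token monthly volume is the volume implied by the $280 and $28 totals. The `cacheHitRatio` parameter and all names here are illustrative assumptions, not part of `llm_invoke`.

```typescript
// Per-1M-token input prices (USD) quoted in the comparison table.
const UNCACHED_PRICE = 0.28;
const CACHED_PRICE = 0.028;

// Monthly input cost in USD. cacheHitRatio is the assumed fraction of input
// tokens served from the prompt cache (0 = nothing cached, 1 = everything cached).
function monthlyInputCost(tokensPerMonth: number, cacheHitRatio: number): number {
  const millions = tokensPerMonth / 1e6;
  const blendedPrice =
    cacheHitRatio * CACHED_PRICE + (1 - cacheHitRatio) * UNCACHED_PRICE;
  return millions * blendedPrice;
}

const TOKENS_PER_MONTH = 1e9; // assumed volume: 1B input tokens

for (const ratio of [0, 0.9, 1]) {
  const cost = monthlyInputCost(TOKENS_PER_MONTH, ratio).toFixed(2);
  console.log(`cache hit ratio ${ratio}: $${cost}/month`);
}
```

The $28 figure corresponds to a hit ratio of 1. Because the per-item part of each prompt is presumably billed at the uncached rate, a realistic ratio sits somewhat below that; at 0.9 the same volume comes to roughly $53/month.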
Breadth tradeoff is small in practice
We advertised the OpenRouter default because it gave access to GLM-4.7, Kimi K2, Qwen-3, Llama-3.3. In practice:
- DeepSeek-V3 and DeepSeek-Reasoner benchmark at parity with or above GLM-4.7, Qwen-3, Llama-3.3 on the tasks we care about (classification, extraction, summarization).
- When breadth is needed (e.g. running an A/B across 3 models for a one-off eval), pass `provider: "openrouter"` explicitly; the feature remains available.
- For latency-critical flows, use `provider: "groq"` with Llama-3.3-70B.
The “wrong default” risk
Keeping OpenRouter as the default would mean every reflex call to llm_invoke skips the prompt-cache discount silently. The cost only shows up months later as an over-spent invoice line. Making DeepSeek the default means the common case wins automatically and the power-user case (explicit provider: "openrouter") is still one param away.
Consequences
Positive
- Default path is the cost-optimal path for the workload shape we actually have.
- `DEEPSEEK_API_KEY` is already set (see 2026-04-22 session log).
- Prompt caching works out-of-the-box on DeepSeek Direct. No code changes are required to benefit; just ensure system prompts are stable across calls in a batch.
- OpenRouter/Groq/Cerebras remain one param away.
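A minimal sketch of what "stable across calls in a batch" could look like, assuming an OpenAI-style `messages` array (as in the verification call) and assuming the cache matches on an identical leading prefix, as the DeepSeek KV-cache guide describes. `ChatMessage`, `buildBatch`, and the sample rubric text are hypothetical and are not taken from `llm-invoke.ts`.

```typescript
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Hypothetical helper: every request shares the exact same leading messages
// (system prompt + few-shot examples); only the final user message varies.
function buildBatch(
  systemPrompt: string,
  fewShot: ChatMessage[],
  items: string[],
): ChatMessage[][] {
  const prefix: ChatMessage[] = [
    { role: "system", content: systemPrompt },
    ...fewShot,
  ];
  return items.map((item) => [...prefix, { role: "user", content: item }]);
}

// Illustrative data only.
const batch = buildBatch(
  "Score the account against the ICP rubric. Reply with JSON.",
  [
    { role: "user", content: "Acme Corp, 500 employees, SaaS" },
    { role: "assistant", content: '{"score": 8}' },
  ],
  ["Globex, 40 employees, retail", "Initech, 1200 employees, fintech"],
);
```

The point of the design is that anything varying per call (record IDs, timestamps, the item itself) goes at the end of the prompt, so the shared prefix stays byte-identical from one request to the next.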
Negative
- Adds a fourth provider config to maintain (one extra `keyName` union entry + `PROVIDER_CONFIG` row).
- DeepSeek’s API is hosted in China; occasional rate-limit surges and geo-latency spikes are possible. For any workload where this is a hard blocker, the explicit opt-in to Groq or Cerebras is available.
- Users who memorized the old OpenRouter model IDs (e.g. `zai-org/glm-4.7`) need to adjust: either pass `provider: "openrouter"` explicitly, or switch to DeepSeek model IDs (`deepseek-chat`, `deepseek-reasoner`).
Neutral
- Tool signature is backwards-compatible: existing callers that passed `provider: "openrouter"` continue to work. Callers that relied on the implicit default will switch to DeepSeek on their next call, and their existing model IDs will fail (since they were OpenRouter-format). This is a documented breaking change for implicit callers, acceptable because `llm_invoke` has no production traffic yet.
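One way to make this failure mode louder is a pre-flight check on the model ID. The sketch below assumes that OpenRouter-format IDs always contain a vendor prefix and a slash (as in `zai-org/glm-4.7`) while DeepSeek IDs do not. `checkModelForProvider` is a hypothetical helper, not something the current tool is known to include.

```typescript
type Provider = "deepseek" | "openrouter" | "groq" | "cerebras";

// Hypothetical pre-flight check: returns an error message, or null when the
// model ID looks plausible for the resolved provider.
function checkModelForProvider(
  model: string,
  provider: Provider = "deepseek",
): string | null {
  if (provider === "deepseek" && model.includes("/")) {
    return (
      `Model "${model}" looks like an OpenRouter ID; pass provider: "openrouter" ` +
      `or use a DeepSeek ID such as "deepseek-chat".`
    );
  }
  return null;
}

// An implicit-default caller with an old model ID gets a readable message.
console.log(checkModelForProvider("zai-org/glm-4.7"));
```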
Implementation
Changes in this PR:
- `src/tools/llm-invoke.ts`:
  - `provider` enum: `['deepseek', 'openrouter', 'groq', 'cerebras']`, default `'deepseek'`
  - `PROVIDER_CONFIG`: new `deepseek` entry, base URL `https://api.deepseek.com/v1`, key `DEEPSEEK_API_KEY`
  - JSDoc + tool description updated to reflect new default + DeepSeek model IDs
- Wrangler secret `DEEPSEEK_API_KEY` already set in prod (via `wrangler secret put`, 2026-04-22).
- `docs/audits/cost-optimization-tiers.md`: no update needed; the three-tier routing guidance remains valid. Economy-tier provider list just widens.
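For orientation, the provider change might look roughly like the sketch below. Only the `deepseek` row (base URL and key name) and the enum order come from this ADR. The other base URLs and key names, the `ProviderConfig` shape, and the `resolveProvider` helper are assumptions for illustration and may not match the real `llm-invoke.ts`.

```typescript
const PROVIDERS = ["deepseek", "openrouter", "groq", "cerebras"] as const;
type Provider = (typeof PROVIDERS)[number];

interface ProviderConfig {
  baseUrl: string;
  keyName: string; // name of the secret that holds the API key
}

// Only the deepseek row is taken from this ADR; the other rows are assumptions.
const PROVIDER_CONFIG: Record<Provider, ProviderConfig> = {
  deepseek: { baseUrl: "https://api.deepseek.com/v1", keyName: "DEEPSEEK_API_KEY" },
  openrouter: { baseUrl: "https://openrouter.ai/api/v1", keyName: "OPENROUTER_API_KEY" },
  groq: { baseUrl: "https://api.groq.com/openai/v1", keyName: "GROQ_API_KEY" },
  cerebras: { baseUrl: "https://api.cerebras.ai/v1", keyName: "CEREBRAS_API_KEY" },
};

const DEFAULT_PROVIDER: Provider = "deepseek";

// Hypothetical resolver: falls back to the default when no provider is passed
// and rejects values outside the enum.
function resolveProvider(requested?: string): ProviderConfig & { provider: Provider } {
  const provider = (requested ?? DEFAULT_PROVIDER) as Provider;
  if (!PROVIDERS.includes(provider)) {
    throw new Error(`Unknown provider: ${requested}`);
  }
  return { provider, ...PROVIDER_CONFIG[provider] };
}
```

With this shape, a call that omits `provider` resolves to the DeepSeek row, and `resolveProvider("openrouter")` keeps the old behavior one parameter away.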
Verification
After merge + deploy:
```bash
# Test DeepSeek Direct path end-to-end
curl -sS https://ascend-gateway-v5.ascendgtm.workers.dev/mcp \
  -H "authorization: Bearer <tenant-token>" \
  -H "content-type: application/json" \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"llm_invoke","arguments":{"model":"deepseek-chat","messages":[{"role":"user","content":"return the word OK"}]}}}'
```

Expected: `success: true`, `data.choices[0].message.content` contains “OK”.
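The expected-result check can also be written as a small predicate, for example inside a post-deploy smoke test. This is a sketch that assumes the tool result, once unwrapped from the JSON-RPC envelope, has the `success` / `data.choices[0].message.content` shape described here; `isExpectedResult` is a hypothetical name.

```typescript
// Hypothetical smoke-test predicate for an llm_invoke tool result.
function isExpectedResult(result: unknown): boolean {
  const r = result as {
    success?: boolean;
    data?: { choices?: Array<{ message?: { content?: unknown } }> };
  } | null;
  const content = r?.data?.choices?.[0]?.message?.content;
  return r?.success === true && typeof content === "string" && content.includes("OK");
}

// Illustrative payload only, not a captured response.
const sample = {
  success: true,
  data: { choices: [{ message: { role: "assistant", content: "OK" } }] },
};
console.log(isExpectedResult(sample));
```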
Alternatives considered
- Fireworks AI as default — only single vendor serving all 5 original target models, but +100–300% markup vs DeepSeek Direct. Not worth the breadth when our workload rarely needs model switching mid-batch.
- Keep OpenRouter as default, add DeepSeek as option — leaves the wrong-default trap in place. Every reflex call pays the aggregation markup + misses the prompt-cache discount. Rejected.
- Multi-provider routing layer (Vercel AI Gateway, Portkey, LangDB) — valuable at 10+ tenants with observability/fallback needs; overkill at our current scale. Revisit at Phase 5.
References
- Research pass 2026-04-22 (session log): OpenRouter vs Groq vs Cerebras vs Fireworks vs Together vs CF Workers AI vs direct.
- `docs/audits/cost-optimization-tiers.md`: three-tier routing guidance.
- DeepSeek API docs: https://api-docs.deepseek.com/
- DeepSeek prompt caching: https://api-docs.deepseek.com/guides/kv_cache