`llm_invoke` default provider: DeepSeek Direct, not OpenRouter

ADR-017 — `llm_invoke` default provider: DeepSeek Direct, not OpenRouter

Status: Accepted
Date: 2026-04-22
Deciders: Mishaal Murawala
Relates to: ADR-016 (context plane), docs/audits/cost-optimization-tiers.md, src/tools/llm-invoke.ts

Context

ADR-016’s two-plane architecture relies on bulk LLM calls for ingestion (Gong transcripts → facts, SFDC records → entities, email threads → signals). These calls share two properties that make them cost-sensitive:

Repetitive system prompts — the same ICP template, extraction schema, or classification rubric is attached to every call in a batch.
Template-heavy context — each call includes the same few-shot examples + structured output instructions.

The initial llm_invoke tool (landed in the 2026-04-17–22 session) defaulted to OpenRouter because of its breadth — 300+ models via one key. That default was picked on breadth alone without evaluating the shape of our actual workload.

A 2026-04-22 research pass across the gateway landscape (OpenRouter, Groq, Cerebras, Fireworks, Together, Vercel AI Gateway, Cloudflare Workers AI, direct provider APIs) found that OpenRouter is a poor fit for our workload shape: it has no prompt-cache passthrough and was the slowest option in the latency benchmarks reviewed (see Rationale).

Decision

Switch llm_invoke’s default provider from openrouter → deepseek.

Keep OpenRouter, Groq, Cerebras as opt-in alternatives via the provider parameter.

Add DEEPSEEK_API_KEY as a required wrangler secret for the default path (already set 2026-04-22).

Rationale

DeepSeek Direct wins or ties on the three dimensions that matter to us

| Dimension | DeepSeek Direct | OpenRouter | Winner |
| --- | --- | --- | --- |
| Raw price, uncached input | $0.28 / 1M tokens (V3.2) | $0.14–0.30 / 1M tokens (DeepSeek V3 on OR + aggregation markup) | Roughly tied on base |
| Prompt-cached input | $0.028 / 1M tokens (10× discount when system prompt is constant across calls) | Not available; OpenRouter has no prompt-cache passthrough | DeepSeek 10× |
| Latency (TTFT) | ~400–600ms | Slowest in benchmarks (~25s P95 reported) | DeepSeek |
| Model breadth | Two models (chat, reasoner) | 300+ models | OpenRouter |
| Single-key coverage of all 5 target open models | No (DeepSeek only) | No (missing GLM-4.7, Kimi K2) | Neither; Fireworks would be the only single-vendor full-coverage option |

The math on cached prompts

Ascend’s real bulk workloads (ICP scoring against a constant rubric, CIM summarization with a fixed extraction schema, signal classification against a stable ruleset) all share a constant system prompt. At 1B input tokens/month of template-repetitive work (1,000 × the per-1M prices in the table above):

OpenRouter / Fireworks / Together: $150–$300/month input cost
DeepSeek Direct uncached: $280/month
DeepSeek Direct cached: $28/month

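That’s a ~$125–270/month saving on input relative to the OpenRouter / Fireworks / Together range the moment this tool has real volume. The $28 figure assumes every input token is billed at the cached rate, so treat it as a lower bound on cost.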
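The arithmetic behind these figures can be checked with a short sketch. It assumes a volume of 1,000M (1B) input tokens per month, which is the volume at which the per-1M prices in the comparison table produce the dollar amounts listed. The `monthlyInputCost` helper and its `cacheHitRatio` parameter are hypothetical and not part of `llm_invoke`; they only illustrate how the saving shrinks when part of the input misses the cache.

```typescript
// Prices in USD per 1M input tokens, taken from the comparison table.
const DEEPSEEK_UNCACHED_PER_M = 0.28;
const DEEPSEEK_CACHED_PER_M = 0.028;

// Hypothetical helper: blended monthly input cost for a given cache-hit ratio.
// cacheHitRatio is the fraction of input tokens billed at the cached rate (0..1).
function monthlyInputCost(
  tokensInMillions: number,
  cacheHitRatio: number,
): number {
  if (cacheHitRatio < 0 || cacheHitRatio > 1) {
    throw new RangeError("cacheHitRatio must be between 0 and 1");
  }
  const blendedPricePerM =
    cacheHitRatio * DEEPSEEK_CACHED_PER_M +
    (1 - cacheHitRatio) * DEEPSEEK_UNCACHED_PER_M;
  // Round to cents to avoid floating-point noise in the result.
  return Math.round(tokensInMillions * blendedPricePerM * 100) / 100;
}

const uncached = monthlyInputCost(1000, 0); // about $280
const fullyCached = monthlyInputCost(1000, 1); // about $28
const mostlyCached = monthlyInputCost(1000, 0.9); // about $53
```

Only the repeated part of a prompt can be served from cache, so a realistic bill sits somewhere between the fully cached and uncached bounds.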

Breadth tradeoff is small in practice

We advertised the OpenRouter default because it gave access to GLM-4.7, Kimi K2, Qwen-3, Llama-3.3. In practice:

DeepSeek-V3 and DeepSeek-Reasoner benchmark at parity with or above GLM-4.7, Qwen-3, Llama-3.3 on the tasks we care about (classification, extraction, summarization).
When breadth is needed (e.g. running an A/B across 3 models for a one-off eval), pass provider: "openrouter" explicitly — feature remains available.
For latency-critical flows, pass provider: "groq" with Llama-3.3-70B.

The “wrong default” risk

Keeping OpenRouter as the default would mean every reflex call to llm_invoke skips the prompt-cache discount silently. The cost only shows up months later as an over-spent invoice line. Making DeepSeek the default means the common case wins automatically and the power-user case (explicit provider: "openrouter") is still one param away.

Consequences

Positive

Default path is the cost-optimal path for the workload shape we actually have.
DEEPSEEK_API_KEY is already set (see 2026-04-22 session log).
Prompt caching works out-of-the-box on DeepSeek Direct — no code changes required to benefit; just ensure system prompts are stable across calls in a batch.
OpenRouter/Groq/Cerebras remain one param away.
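Keeping system prompts stable across a batch can be sketched as follows. Per the DeepSeek prompt caching guide listed under References, cache hits apply to the repeated prefix of a request, so the sketch puts the constant material first and the per-item content last. The `buildBatchMessages` helper and the example prompt are hypothetical, not code from `src/tools/llm-invoke.ts`.

```typescript
// Minimal chat message shape (OpenAI-compatible), declared locally for the sketch.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Hypothetical helper: builds one message array per item in a batch.
// The system prompt and few-shot examples come first and are identical
// across calls, so only the final user message differs between requests.
function buildBatchMessages(
  systemPrompt: string,
  fewShots: ReadonlyArray<ChatMessage>,
  items: ReadonlyArray<string>,
): ChatMessage[][] {
  const sharedPrefix: ChatMessage[] = [
    { role: "system", content: systemPrompt },
    ...fewShots,
  ];
  return items.map((item) => [
    ...sharedPrefix,
    { role: "user", content: item },
  ]);
}

const batch = buildBatchMessages(
  "Classify the signal as HOT, WARM, or COLD. Reply with one word.",
  [
    { role: "user", content: "Prospect asked for pricing and a contract draft." },
    { role: "assistant", content: "HOT" },
  ],
  ["Prospect went silent after the demo.", "Prospect requested a security review."],
);
```

Anything that varies per call (timestamps, record IDs, tenant names) belongs in the final user message; placing it in the system prompt would change the prefix and likely forfeit the cached rate.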

Negative

Adds a fourth provider config to maintain (one extra keyName union entry + PROVIDER_CONFIG row).
DeepSeek’s API is hosted in China; occasional rate-limit surges and geo-latency spikes are possible. For any workload where this is a hard blocker, the explicit opt-in to Groq or Cerebras (the alternatives in the provider enum) is available.
Users who memorized the old OpenRouter model IDs (e.g. zai-org/glm-4.7) need to adjust — either pass provider: "openrouter" explicitly, or switch to DeepSeek model IDs (deepseek-chat, deepseek-reasoner).

Neutral

Tool signature is backwards-compatible — existing callers that passed provider: "openrouter" continue to work. Callers that relied on the implicit default will switch to DeepSeek on next call; their existing model IDs will fail (since they were OpenRouter-format). This is a documented breaking change for implicit callers, acceptable because llm_invoke has no production traffic yet.
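One way to make the implicit-caller failure easier to diagnose is a pre-flight check on the model ID. The sketch below is a hypothetical helper, not something this PR adds; it assumes OpenRouter-format IDs can be recognized by the `/` separator (as in `zai-org/glm-4.7`) and that the two DeepSeek IDs named in this ADR are the only valid ones on the default path.

```typescript
type Provider = "deepseek" | "openrouter" | "groq" | "cerebras";

// DeepSeek model IDs named in this ADR.
const DEEPSEEK_MODELS = ["deepseek-chat", "deepseek-reasoner"] as const;

// Hypothetical pre-flight check: returns an error message, or null when the
// provider/model pairing looks plausible.
function checkModelForProvider(
  model: string,
  provider: Provider = "deepseek",
): string | null {
  // Other providers are not validated in this sketch.
  if (provider !== "deepseek") return null;
  if (DEEPSEEK_MODELS.some((id) => id === model)) return null;

  // A slash suggests a vendor/model ID carried over from the old default.
  const hint =
    model.indexOf("/") !== -1
      ? ' It looks like an OpenRouter-format ID; pass provider: "openrouter" to keep using it.'
      : "";
  return (
    `Model "${model}" is not a DeepSeek model ID ` +
    `(expected one of: ${DEEPSEEK_MODELS.join(", ")}).${hint}`
  );
}
```

A caller that still passes `zai-org/glm-4.7` without a provider would then get a message pointing at the explicit opt-in rather than an opaque upstream error.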

Implementation

Changes in this PR:

src/tools/llm-invoke.ts:
- provider enum: ['deepseek', 'openrouter', 'groq', 'cerebras'], default 'deepseek'
- PROVIDER_CONFIG: new deepseek entry, base URL https://api.deepseek.com/v1, key DEEPSEEK_API_KEY
- JSDoc + tool description updated to reflect new default + DeepSeek model IDs
Wrangler secret DEEPSEEK_API_KEY already set in prod (via wrangler secret put 2026-04-22).
docs/audits/cost-optimization-tiers.md — no update needed; the three-tier routing guidance remains valid. Economy-tier provider list just widens.
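The provider enum and `PROVIDER_CONFIG` change can be pictured roughly as below. This is a sketch under assumptions, not the contents of `src/tools/llm-invoke.ts`: only the `deepseek` row (base URL and key name) comes from this ADR, while the field names, the other three rows, their secret names, and the `resolveProvider` helper are illustrative.

```typescript
// Provider order mirrors the enum described above; "deepseek" is the default.
const PROVIDERS = ["deepseek", "openrouter", "groq", "cerebras"] as const;
type Provider = (typeof PROVIDERS)[number];
const DEFAULT_PROVIDER: Provider = "deepseek";

interface ProviderEntry {
  baseUrl: string;
  keyName: string; // name of the Wrangler secret holding the API key
}

// Only the deepseek row is taken from this ADR. The other rows are
// illustrative assumptions, not copied from src/tools/llm-invoke.ts.
const PROVIDER_CONFIG: Record<Provider, ProviderEntry> = {
  deepseek: { baseUrl: "https://api.deepseek.com/v1", keyName: "DEEPSEEK_API_KEY" },
  openrouter: { baseUrl: "https://openrouter.ai/api/v1", keyName: "OPENROUTER_API_KEY" },
  groq: { baseUrl: "https://api.groq.com/openai/v1", keyName: "GROQ_API_KEY" },
  cerebras: { baseUrl: "https://api.cerebras.ai/v1", keyName: "CEREBRAS_API_KEY" },
};

// Hypothetical resolver: falls back to the default when no provider is passed.
function resolveProvider(
  provider?: Provider,
): ProviderEntry & { provider: Provider } {
  const chosen = provider ?? DEFAULT_PROVIDER;
  return { provider: chosen, ...PROVIDER_CONFIG[chosen] };
}
```

Typing the map as `Record<Provider, ProviderEntry>` means a provider added to the enum without a matching config row fails at compile time, which keeps the maintenance cost of the extra entry small.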

Verification

After merge + deploy:

```bash
# Test DeepSeek Direct path end-to-end
curl -sS https://ascend-gateway-v5.ascendgtm.workers.dev/mcp \
  -H "authorization: Bearer <tenant-token>" \
  -H "content-type: application/json" \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"llm_invoke","arguments":{"model":"deepseek-chat","messages":[{"role":"user","content":"return the word OK"}]}}}'
```

Expected: success: true, data.choices[0].message.content contains “OK”.
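If the smoke test is scripted rather than checked by eye, the expectation above could be encoded along these lines. The `LlmInvokeResult` shape is an assumption inferred from the expected fields; how the tool result is wrapped inside the JSON-RPC response is not specified here, so the sketch assumes the payload has already been unwrapped.

```typescript
// Assumed shape of the llm_invoke tool result, inferred from the expectation above.
interface LlmInvokeResult {
  success: boolean;
  data?: { choices?: Array<{ message?: { content?: string } }> };
}

// Hypothetical check for the smoke test: true only when the call succeeded
// and the first choice's content contains "OK".
function isSmokeTestPass(result: LlmInvokeResult): boolean {
  const content = result.data?.choices?.[0]?.message?.content ?? "";
  return result.success === true && content.indexOf("OK") !== -1;
}
```

The check uses a substring match because the model may wrap the word in punctuation or extra text.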

Alternatives considered

Fireworks AI as default — only single vendor serving all 5 original target models, but +100–300% markup vs DeepSeek Direct. Not worth the breadth when our workload rarely needs model switching mid-batch.
Keep OpenRouter as default, add DeepSeek as option — leaves the wrong-default trap in place. Every reflex call pays the aggregation markup + misses the prompt-cache discount. Rejected.
Multi-provider routing layer (Vercel AI Gateway, Portkey, LangDB) — valuable at 10+ tenants with observability/fallback needs; overkill at our current scale. Revisit at Phase 5.

References

Research pass 2026-04-22 (session log): OpenRouter vs Groq vs Cerebras vs Fireworks vs Together vs CF Workers AI vs direct.
docs/audits/cost-optimization-tiers.md — three-tier routing guidance.
DeepSeek API docs: https://api-docs.deepseek.com/
DeepSeek prompt caching: https://api-docs.deepseek.com/guides/kv_cache