Capability Index Replaces Static MCP Tool Registration

ADR-042 — Capability Index Replaces Static MCP Tool Registration

Status: Accepted
Date: 2026-05-07
Decider: Mishaal Murawala (engineering sequencing delegated to Claude Code per ADR-040)
Supersedes: Invariant #7 in .claude/rules/v5-invariants.md (“34 tools registered, ceiling 35”)
Related: ADR-040 Track B, ADR-041, ASCEND_OPERATOR_OS_VISION.md §3.2 (Capability Index), §9 (tool ceiling retirement)
Invariant changed: #7
Pre-condition for invariant update: Track B completion (weeks 1–8 per ADR-040)

Context

Invariant #7 set a hard ceiling of 35 registered MCP tools. The ceiling existed for two reasons:

Reason 1: LLM context window cost. server.registerTool fanout sends every registered tool’s JSON schema to the LLM in every request. At 35 tools with typical input schemas (~300 tokens each), that is ~10,500 tokens of tool schema per request. At Anthropic Claude 3.5 Sonnet pricing (2026-Q1), this adds ~$0.003 per request in input tokens — $300/month at 100K requests/month before accounting for output. Worse, it degrades reasoning quality because the LLM must attend to irrelevant tool schemas on every turn.

Reason 2: Registration complexity. Adding a tool to src/handlers/mcp.ts requires modifying a file that is in the gateway hot path, risks merge conflicts, and requires a manual code review gate. At 35+ tools, this becomes a coordination bottleneck.

The vision doc (§9, adopted 2026-05-07) explicitly retires the 35-tool ceiling:

“Static registration is a liability. The ceiling is not a feature — it is a proxy for the cost of the context window. Once we index capabilities in Vectorize and retrieve only what’s relevant, the ceiling has no engineering basis.”

The Capability Index (Track B, ADR-040) solves both problems:

Context cost: retrieve ≤20 semantically relevant tools per LLM context instead of every registered tool (up to 35 under the old ceiling). Most tasks need 3–5 tools. At ≤20 retrieved, the schema overhead is ≤6,000 tokens (20 × ~300).
Coordination cost: adding a tool is a Vectorize embed operation, not a code change to the gateway hot path.
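The token figures above follow from simple multiplication. A minimal sketch of that arithmetic, using only the numbers stated in this ADR (the helper names schemaTokens and monthlyCostUsd are illustrative and not part of the codebase):

```typescript
// Figures come from the Context section; helper names are illustrative only.
const TOKENS_PER_SCHEMA = 300; // "typical input schemas (~300 tokens each)"

/** Approximate schema tokens sent per request for `toolCount` tools. */
function schemaTokens(toolCount: number, tokensPerSchema: number = TOKENS_PER_SCHEMA): number {
  return toolCount * tokensPerSchema;
}

/** Monthly spend for a given per-request cost and monthly request volume. */
function monthlyCostUsd(costPerRequestUsd: number, requestsPerMonth: number): number {
  return costPerRequestUsd * requestsPerMonth;
}

const staticFanout = schemaTokens(35); // 10,500 tokens at the old ceiling
const retrievedMax = schemaTokens(20); // 6,000 tokens at the retrieval cap
const typicalTask = schemaTokens(5);   // 1,500 tokens when a task needs 5 tools

console.log({ staticFanout, retrievedMax, typicalTask });
console.log(monthlyCostUsd(0.003, 100_000)); // roughly 300 USD/month, per the stated per-request cost
```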

Decision

Retire static server.registerTool fanout and the 35-tool hard ceiling. Replace with the Capability Index: a Vectorize namespace (capability_index) that embeds every tool’s description, schema summary, scopes, and priors (cost, latency, success rate). The gateway surfaces ≤20 tools per LLM context via semantic retrieval. Total catalog is unbounded.

This decision takes effect when Track B ships. Until then, the existing static registration remains in place. This ADR formally accepts that the old invariant will be retired and names Track B as the implementation vehicle.

Capability Index architecture

Vectorize namespace

Namespace: capability_index (separate from tenant_{tenant_id} namespaces)

Each vector entry metadata schema:

{
  "tool_name": "hubspot_crm",
  "provider": "hubspot",
  "description": "Query and mutate HubSpot CRM objects...",
  "scopes": ["crm.objects.contacts.read", "crm.objects.deals.write"],
  "tags": ["crm", "contacts", "deals", "pipeline"],
  "estimated_cost_usd": 0.0002,
  "p95_latency_ms": 380,
  "success_rate": 0.97,
  "last_updated_at": 1746700000
}

The embedding text is: {tool_name} | {description} | scopes: {scopes} | tags: {tags}. This produces embeddings that respond well to intent queries like “score this account” or “pull pipeline data”.
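The template can be sketched as a small pure function. This is an illustration, not the contents of scripts/embed-tool-catalog.ts: the function name buildEmbeddingText and the ", " separator for list fields are assumptions, since the ADR does not say how scopes and tags are joined.

```typescript
interface EmbeddableTool {
  tool_name: string;
  description: string;
  scopes: string[];
  tags: string[];
}

// Assumption: list fields are joined with ", ". The ADR does not specify the separator.
function buildEmbeddingText(tool: EmbeddableTool): string {
  return [
    tool.tool_name,
    tool.description,
    `scopes: ${tool.scopes.join(", ")}`,
    `tags: ${tool.tags.join(", ")}`,
  ].join(" | ");
}

// Example using the metadata entry shown earlier in this section.
const embeddingText = buildEmbeddingText({
  tool_name: "hubspot_crm",
  description: "Query and mutate HubSpot CRM objects...",
  scopes: ["crm.objects.contacts.read", "crm.objects.deals.write"],
  tags: ["crm", "contacts", "deals", "pipeline"],
});
console.log(embeddingText);
```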

KV mirror

KV key: capability_index:{tool_name}

Stores the same metadata block as JSON. KV is the fast read path when the caller already knows the exact tool name, for example when an agent re-uses a tool within a turn or during a priors update.
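A minimal sketch of the mirror's key scheme and round trip. KVLike is a local stand-in that covers only the string get/put calls used here, and memoryKV is an in-memory substitute so the example runs outside a Worker; capabilityKey, writeMirror, and readMirror are hypothetical names, not existing helpers.

```typescript
// Local stand-in for the subset of a KV binding used below (string get/put only).
interface KVLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string): Promise<void>;
}

interface CapabilityMetadata {
  tool_name: string;
  provider: string;
  description: string;
  scopes: string[];
  tags: string[];
  estimated_cost_usd: number;
  p95_latency_ms: number;
  success_rate: number;
  last_updated_at: number;
}

// Key format from this ADR: capability_index:{tool_name}
const capabilityKey = (toolName: string): string => `capability_index:${toolName}`;

async function writeMirror(kv: KVLike, meta: CapabilityMetadata): Promise<void> {
  await kv.put(capabilityKey(meta.tool_name), JSON.stringify(meta));
}

async function readMirror(kv: KVLike, toolName: string): Promise<CapabilityMetadata | null> {
  const raw = await kv.get(capabilityKey(toolName));
  return raw === null ? null : (JSON.parse(raw) as CapabilityMetadata);
}

// In-memory substitute so the sketch runs without a Worker runtime.
function memoryKV(): KVLike {
  const store = new Map<string, string>();
  return {
    get: async (key) => store.get(key) ?? null,
    put: async (key, value) => { store.set(key, value); },
  };
}

const sampleMeta: CapabilityMetadata = {
  tool_name: "hubspot_crm",
  provider: "hubspot",
  description: "Query and mutate HubSpot CRM objects...",
  scopes: ["crm.objects.contacts.read", "crm.objects.deals.write"],
  tags: ["crm", "contacts", "deals", "pipeline"],
  estimated_cost_usd: 0.0002,
  p95_latency_ms: 380,
  success_rate: 0.97,
  last_updated_at: 1746700000,
};

const demoKv = memoryKV();
writeMirror(demoKv, sampleMeta)
  .then(() => readMirror(demoKv, "hubspot_crm"))
  .then((hit) => console.log(hit?.provider));
```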

Retrieval helper

src/lib/capability-retrieval.ts — retrieveCapabilities(intent, opts): ToolCandidate[]

interface RetrievalOpts {
  top_k: number;          // default 20
  max_cost_usd?: number;  // filter: exclude tools exceeding per-call cost
  max_latency_ms_p99?: number; // filter: exclude tools with high tail latency (note: the index metadata currently records p95_latency_ms, not p99)
  tenant_id?: string;     // future: per-tenant tool allowlists
}

interface ToolCandidate {
  tool_name: string;
  score: number;          // cosine similarity in [-1, 1]; higher is more similar
  metadata: CapabilityMetadata;
}

The retrieval call is made in agent context assembly (before the agent turn begins), NOT in the gateway request hot path for tool execution. This is consistent with invariant #2 — the KV-only hot path for tool proxying is not changed.
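The post-query half of the helper might look like the sketch below: apply the optional cost filter, rank by score, and cap at top_k. The vector query itself is omitted, the interfaces are trimmed to the fields used, and selectCandidates is a hypothetical name rather than the actual body of retrieveCapabilities.

```typescript
// Trimmed copies of the interfaces above, limited to the fields this sketch uses.
interface CapabilityMetadata {
  tool_name: string;
  estimated_cost_usd: number;
}

interface ToolCandidate {
  tool_name: string;
  score: number;
  metadata: CapabilityMetadata;
}

interface RetrievalOpts {
  top_k: number;
  max_cost_usd?: number;
}

// Hypothetical helper: filter by per-call cost, sort by descending score, keep top_k.
function selectCandidates(matches: ToolCandidate[], opts: RetrievalOpts): ToolCandidate[] {
  return matches
    .filter((c) => opts.max_cost_usd === undefined || c.metadata.estimated_cost_usd <= opts.max_cost_usd)
    .sort((a, b) => b.score - a.score) // filter() returned a copy, so the input is not mutated
    .slice(0, opts.top_k);
}

// Made-up matches for illustration only.
const rawMatches: ToolCandidate[] = [
  { tool_name: "hubspot_crm", score: 0.81, metadata: { tool_name: "hubspot_crm", estimated_cost_usd: 0.0002 } },
  { tool_name: "icp_scorer", score: 0.93, metadata: { tool_name: "icp_scorer", estimated_cost_usd: 0.004 } },
  { tool_name: "expensive_tool", score: 0.95, metadata: { tool_name: "expensive_tool", estimated_cost_usd: 0.05 } },
];

const selected = selectCandidates(rawMatches, { top_k: 20, max_cost_usd: 0.01 });
console.log(selected.map((c) => c.tool_name));
```

Filtering before truncation means a cost cap never shrinks the result below top_k while cheaper matches are still available.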

Priors writer

A nightly Cloudflare Cron job reads the last 24 hours of agent_runs D1 records and the gateway audit logs, computes per-tool success_rate, p95_latency_ms, and estimated_cost_usd, and writes the updated metadata to both Vectorize and KV. This is a cold-path operation: it runs on a schedule, never on the request path.
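The aggregation step can be sketched as a pure function over call records. The ToolCallRecord shape is hypothetical (the real inputs are agent_runs rows and audit log entries whose schemas are not given here), and the choice of a nearest-rank percentile and a mean for estimated_cost_usd are assumptions.

```typescript
// Hypothetical record shape; the actual D1 and audit-log schemas are not specified in this ADR.
interface ToolCallRecord {
  tool_name: string;
  ok: boolean;
  latency_ms: number;
  cost_usd: number;
}

interface Priors {
  success_rate: number;
  p95_latency_ms: number;
  estimated_cost_usd: number;
}

// Nearest-rank percentile over an ascending-sorted array (assumed method).
function percentile(sortedAsc: number[], p: number): number {
  const rank = Math.max(Math.ceil((p / 100) * sortedAsc.length), 1);
  return sortedAsc[rank - 1] ?? 0;
}

function computePriors(records: ToolCallRecord[]): Map<string, Priors> {
  // Group calls by tool.
  const byTool = new Map<string, ToolCallRecord[]>();
  for (const r of records) {
    const bucket = byTool.get(r.tool_name) ?? [];
    bucket.push(r);
    byTool.set(r.tool_name, bucket);
  }
  // Reduce each group to its three priors.
  const priors = new Map<string, Priors>();
  byTool.forEach((calls, tool) => {
    const latencies = calls.map((c) => c.latency_ms).sort((a, b) => a - b);
    priors.set(tool, {
      success_rate: calls.filter((c) => c.ok).length / calls.length,
      p95_latency_ms: percentile(latencies, 95),
      estimated_cost_usd: calls.reduce((sum, c) => sum + c.cost_usd, 0) / calls.length, // mean cost per call
    });
  });
  return priors;
}

// Made-up records for illustration.
const sampleRecords: ToolCallRecord[] = [
  { tool_name: "hubspot_crm", ok: true, latency_ms: 100, cost_usd: 0.0002 },
  { tool_name: "hubspot_crm", ok: true, latency_ms: 400, cost_usd: 0.0002 },
  { tool_name: "hubspot_crm", ok: false, latency_ms: 300, cost_usd: 0.0002 },
  { tool_name: "hubspot_crm", ok: true, latency_ms: 200, cost_usd: 0.0002 },
];
console.log(computePriors(sampleRecords).get("hubspot_crm"));
```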

What happens to `server.registerTool`

Platform tools — remain statically registered forever

Three tools bypass the Capability Index because they are always-on: they must be present in every LLM context regardless of intent.

Tool	Why always-on
`call_api`	Generic proxy. Every agent session may need it. Hiding it would require every retrieval query to include “generic API call” intent.
`discover_apis`	Capability exploration. Must be available for agents to discover what they can do.
`batch_execute`	Multi-call orchestration. Required for any agent that fans out.

These three retain server.registerTool calls permanently.

Curated tools (current 31, all non-platform) — migrate to Capability Index on Track B ship

All 31 curated tools are embedded in capability_index as part of Track B’s initial catalog seeding (scripts/embed-tool-catalog.ts). Their server.registerTool calls are removed from src/handlers/mcp.ts when Track B merges.

Migration checklist (to be executed in the Track B merge PR):

Run scripts/embed-tool-catalog.ts — all 31 tools embedded and verified in capability_index
retrieveCapabilities smoke test passes for all P0 tool intents
Remove server.registerTool calls for all 31 curated tools from src/handlers/mcp.ts
Update src/config/providers.ts tool count metadata
Update invariant #7 in both .claude/rules/v5-invariants.md and docs/architecture/ASCEND-CLOUD-NATIVE-V2-ENGINEERING-PLAN.md
Run npm run typecheck && npm test && npm run check:pre-commit — all pass

Invariant #7 change

Old text:

34 MCP tools registered (24 curated + 10 platform). Adding a tool requires an ADR or TOOLS.md row documenting scope + owner. Hard ceiling: 35. client_wiki (ADR-037) is Phase 8 — would bring count to 35, the hard ceiling.

New text (effective when Track B merges):

Platform tools always statically registered: call_api, discover_apis, batch_execute. All other tools live in the Capability Index (Vectorize namespace capability_index). Total catalog unbounded. ≤20 tools surfaced per LLM context via semantic retrieval. Adding a tool requires a docs/tools/<slug>.md entry + TOOLS.md row + scripts/embed-tool-catalog.ts re-run. No hard ceiling.

The invariant files MUST be updated in the same commit that merges the Track B implementation PR, not in this plan-first PR.

Rollout phases

Phase	Timing	Action
0 (this ADR)	2026-05-07	ADR accepted. Static registration still in place. No code change.
1 (Track B weeks 1–4)	Weeks 1–4	Vectorize namespace created. `scripts/embed-tool-catalog.ts` embeds all 31 curated tools. `capability-retrieval.ts` queryable. Smoke tests pass. No `registerTool` removals yet.
2 (Track B weeks 5–8)	Weeks 5–8	Track A agent wired to use `retrieveCapabilities` for context assembly. Priors writer cron active. `capability_index` priors update visibly within 24h.
3 (Track B completion)	Week 8	Track B merge PR: remove `registerTool` for 31 curated tools, update invariant #7 in both files, verify no regression on P0 tool calls.

Acceptance criteria

All 31 curated tools indexed with non-trivial similarity scores (verified by scripts/verify-capability-index.ts).
retrieveCapabilities("score this account against ICP") returns icp_scorer in top-3.
retrieveCapabilities("pull pipeline data from HubSpot") returns hubspot_crm in top-3.
Priors update visibly within 24h of an agent run that calls a tool.
P0 tool invocations via call_api work correctly after registerTool removal (integration test).
npm run typecheck && npm test pass after the removal commit.
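The top-3 criteria above could be checked with a helper along these lines. It is a sketch under stated assumptions: the retriever is a synchronous stub with canned results (the real retrieveCapabilities presumably queries Vectorize and may be asynchronous), results are assumed to arrive sorted by descending score, and inTopK, topKAccuracy, and the stub data are hypothetical.

```typescript
interface RankedTool {
  tool_name: string;
  score: number;
}

interface IntentCase {
  intent: string;
  expected_tool: string;
}

type Retriever = (intent: string) => RankedTool[];

// True when `toolName` appears among the first k results (assumed sorted by descending score).
function inTopK(results: RankedTool[], toolName: string, k: number): boolean {
  return results.slice(0, k).some((r) => r.tool_name === toolName);
}

// Fraction of intent cases whose expected tool lands in the top k.
function topKAccuracy(cases: IntentCase[], retrieve: Retriever, k: number = 3): number {
  if (cases.length === 0) return 0;
  const hits = cases.filter((c) => inTopK(retrieve(c.intent), c.expected_tool, k)).length;
  return hits / cases.length;
}

// The two intents named in the acceptance criteria.
const p0Cases: IntentCase[] = [
  { intent: "score this account against ICP", expected_tool: "icp_scorer" },
  { intent: "pull pipeline data from HubSpot", expected_tool: "hubspot_crm" },
];

// Stub retriever with canned, made-up results; a stand-in for the real retrieval call.
const stubRetriever: Retriever = (intent) =>
  intent.includes("ICP")
    ? [{ tool_name: "icp_scorer", score: 0.92 }, { tool_name: "hubspot_crm", score: 0.61 }]
    : [{ tool_name: "call_log", score: 0.7 }, { tool_name: "hubspot_crm", score: 0.68 }];

const accuracy = topKAccuracy(p0Cases, stubRetriever, 3);
const passesGate = accuracy >= 0.9; // 90% threshold taken from the revert gate in this ADR
console.log({ accuracy, passesGate });
```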

Consequences

Positive:

Tool catalog is unbounded. Ascend can add domain-specific tools per client without hitting a ceiling.
LLM context cost drops from ~10,500 tokens/request (35 schemas) to ~1,800–6,000 tokens/request (6–20 retrieved schemas).
Adding a tool no longer requires touching gateway hot-path code.
Priors (success_rate, latency, cost) feed the inference router (Track E) enabling cost-aware retrieval.

Negative / accepted risk:

Retrieval adds ~8–12 ms to agent context assembly (addressed by ADR-041).
Retrieval quality depends on embedding model quality. Low-quality embeddings surface wrong tools. Mitigation: acceptance test on all P0 intents before Track B merges; revert gate if top-3 accuracy drops below 90% on the P0 intent suite.
The priors writer is a cold-path cron job. If it fails silently, priors become stale. Mitigation: the cron job writes a capability_index:priors_updated_at KV key on every successful run; a health check alerts if this key is >26h old.
Track B and Track A both touch src/lib/ files. Risk of merge conflicts during weeks 5–8 when both are active. Mitigation: Track B owns capability-retrieval.ts exclusively; Track A owns agent-memory.ts exclusively. No shared file writes.
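The staleness check for the priors writer reduces to one comparison. A minimal sketch, assuming capability_index:priors_updated_at holds a Unix timestamp in seconds (the ADR does not state the value format) and treating a missing key as stale; isPriorsStale is a hypothetical name.

```typescript
const MAX_PRIORS_AGE_HOURS = 26; // alert threshold from the mitigation above

// Assumption: the KV value is a Unix timestamp in seconds; null means the key is absent.
function isPriorsStale(
  updatedAtSec: number | null,
  nowSec: number,
  maxAgeHours: number = MAX_PRIORS_AGE_HOURS,
): boolean {
  if (updatedAtSec === null) return true; // never written: treat as stale
  return nowSec - updatedAtSec > maxAgeHours * 3600;
}

const lastRun = 1746700000;
console.log(isPriorsStale(lastRun, lastRun + 25 * 3600)); // false: one nightly run plus slack
console.log(isPriorsStale(lastRun, lastRun + 27 * 3600)); // true: a run was missed
```

The 26-hour threshold leaves two hours of slack over the 24-hour schedule, so a slightly late run does not trigger an alert.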