ADR-042 — Capability Index Replaces Static MCP Tool Registration
- Status: Accepted
- Date: 2026-05-07
- Decider: Mishaal Murawala (engineering sequencing delegated to Claude Code per ADR-040)
- Supersedes: Invariant #7 in `.claude/rules/v5-invariants.md` (“34 tools registered, ceiling 35”)
- Related: ADR-040 Track B, ADR-041, `ASCEND_OPERATOR_OS_VISION.md` §3.2 (Capability Index), §9 (tool ceiling retirement)
- Invariant changed: #7
- Pre-condition for invariant update: Track B completion (weeks 1–8 per ADR-040)
Context
Invariant #7 set a hard ceiling of 35 registered MCP tools. The ceiling existed for two reasons:
Reason 1: LLM context window cost. server.registerTool fanout sends every registered tool’s JSON schema to the LLM in every request. At 35 tools with typical input schemas (~300 tokens each), that is ~10,500 tokens of tool schema per request. At Anthropic Claude 3.5 Sonnet pricing (2026-Q1), this adds ~$0.003 per request in input tokens — $300/month at 100K requests/month before accounting for output. Worse, it degrades reasoning quality because the LLM must attend to irrelevant tool schemas on every turn.
Reason 2: Registration complexity. Adding a tool to src/handlers/mcp.ts requires modifying a file that is in the gateway hot path, risks merge conflicts, and requires a manual code review gate. At 35+ tools, this becomes a coordination bottleneck.
The vision doc (§9, adopted 2026-05-07) explicitly retires the 35-tool ceiling:
“Static registration is a liability. The ceiling is not a feature — it is a proxy for the cost of the context window. Once we index capabilities in Vectorize and retrieve only what’s relevant, the ceiling has no engineering basis.”
The Capability Index (Track B, ADR-040) solves both problems:
- Context cost: retrieve ≤20 semantically relevant tools per LLM context instead of all 35. Most tasks need 3–5 tools. At ≤20 retrieved, the schema overhead is ≤6,000 tokens.
- Coordination cost: adding a tool is a Vectorize embed operation, not a code change to the gateway hot path.
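The token and cost figures above follow from simple arithmetic, sketched below. The 300-tokens-per-schema figure comes from this ADR. The per-million-token rate is an assumption back-derived from the ADR's own numbers (~$0.003 for ~10,500 tokens) and is not a published price, so substitute the rate that applies to your account.

```typescript
// From this ADR: a typical tool input schema is ~300 tokens.
const TOKENS_PER_SCHEMA = 300;

// Assumption: effective input rate implied by the ADR's figures
// (~$0.003 per ~10,500 tokens). Not a published price.
const USD_PER_MILLION_INPUT_TOKENS = 0.29;

/** Tokens of tool schema sent with each request for a given tool count. */
function schemaTokens(toolCount: number): number {
  return toolCount * TOKENS_PER_SCHEMA;
}

/** Monthly input-token cost of the schema overhead alone. */
function monthlySchemaCostUsd(toolCount: number, requestsPerMonth: number): number {
  const perRequestUsd = (schemaTokens(toolCount) / 1e6) * USD_PER_MILLION_INPUT_TOKENS;
  return perRequestUsd * requestsPerMonth;
}

const staticOverhead = schemaTokens(35);    // 10500 tokens with all 35 schemas
const retrievedCeiling = schemaTokens(20);  // 6000 tokens at the retrieval cap
const typicalTask = schemaTokens(5);        // 1500 tokens when 5 tools suffice
const staticMonthlyUsd = monthlySchemaCostUsd(35, 100_000); // roughly 300
```

Because the overhead is linear in the number of schemas sent, capping retrieval at 20 bounds the per-request schema cost regardless of how large the catalog grows.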
Decision
Retire static server.registerTool fanout and the 35-tool hard ceiling. Replace with the Capability Index: a Vectorize namespace (capability_index) that embeds every tool’s description, schema summary, scopes, and priors (cost, latency, success rate). The gateway surfaces ≤20 tools per LLM context via semantic retrieval. Total catalog is unbounded.
This decision takes effect when Track B ships. Until then, the existing static registration remains in place — this ADR is the formal acceptance that the old invariant is retired and Track B is the implementation vehicle.
Capability Index architecture
Vectorize namespace
Namespace: capability_index (separate from tenant_{tenant_id} namespaces)
Each vector entry metadata schema:
```json
{
  "tool_name": "hubspot_crm",
  "provider": "hubspot",
  "description": "Query and mutate HubSpot CRM objects...",
  "scopes": ["crm.objects.contacts.read", "crm.objects.deals.write"],
  "tags": ["crm", "contacts", "deals", "pipeline"],
  "estimated_cost_usd": 0.0002,
  "p95_latency_ms": 380,
  "success_rate": 0.97,
  "last_updated_at": 1746700000
}
```

The embedding text is: `{tool_name} | {description} | scopes: {scopes} | tags: {tags}`. This produces embeddings that respond well to intent queries like “score this account” or “pull pipeline data”.
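The embedding text template can be sketched as a small helper. The function name `buildEmbeddingText` and the comma-space separator used to join scopes and tags are assumptions; the ADR fixes only the overall `|`-delimited layout.

```typescript
// Mirrors the metadata schema shown in this section.
interface CapabilityMetadata {
  tool_name: string;
  provider: string;
  description: string;
  scopes: string[];
  tags: string[];
  estimated_cost_usd: number;
  p95_latency_ms: number;
  success_rate: number;
  last_updated_at: number;
}

/**
 * Hypothetical helper: renders the text that gets embedded for one tool.
 * Joining list fields with ", " is an assumption, not specified by the ADR.
 */
function buildEmbeddingText(meta: CapabilityMetadata): string {
  return [
    meta.tool_name,
    meta.description,
    `scopes: ${meta.scopes.join(", ")}`,
    `tags: ${meta.tags.join(", ")}`,
  ].join(" | ");
}

const hubspotMeta: CapabilityMetadata = {
  tool_name: "hubspot_crm",
  provider: "hubspot",
  description: "Query and mutate HubSpot CRM objects...",
  scopes: ["crm.objects.contacts.read", "crm.objects.deals.write"],
  tags: ["crm", "contacts", "deals", "pipeline"],
  estimated_cost_usd: 0.0002,
  p95_latency_ms: 380,
  success_rate: 0.97,
  last_updated_at: 1746700000,
};

const embeddingText = buildEmbeddingText(hubspotMeta);
```

Note that the priors (cost, latency, success rate) stay out of the embedded text in this sketch, so a priors update would not change a tool's position in embedding space.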
KV mirror
KV key: capability_index:{tool_name}
Stores the same metadata block as JSON. KV is the fast read path when the caller knows the exact tool name (tool re-use in an agent turn, priors update).
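A minimal sketch of the key scheme and exact-name read path follows. A plain object stands in for the KV namespace so the example runs anywhere; the helper names (`capabilityKey`, `writeMirror`, `readMirror`) are hypothetical, and a real Workers KV binding would be asynchronous.

```typescript
// In-memory stand-in for a KV namespace (assumption for illustration only).
type KvStub = Record<string, string>;

const KEY_PREFIX = "capability_index:";

/** Builds the KV key for a tool, following the scheme in this ADR. */
function capabilityKey(toolName: string): string {
  return `${KEY_PREFIX}${toolName}`;
}

/** Serializes the metadata block as JSON under the tool's key. */
function writeMirror(kv: KvStub, meta: { tool_name: string; [field: string]: unknown }): void {
  kv[capabilityKey(meta.tool_name)] = JSON.stringify(meta);
}

/** Exact-name lookup: returns the parsed metadata, or null when absent. */
function readMirror<T>(kv: KvStub, toolName: string): T | null {
  const raw = kv[capabilityKey(toolName)];
  return raw === undefined ? null : (JSON.parse(raw) as T);
}

const kv: KvStub = {};
writeMirror(kv, { tool_name: "hubspot_crm", p95_latency_ms: 380, success_rate: 0.97 });
const cached = readMirror<{ tool_name: string; p95_latency_ms: number }>(kv, "hubspot_crm");
```

The lookup needs no embedding call or vector query, which is why it suits cases where the tool name is already known.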
Retrieval helper
src/lib/capability-retrieval.ts — retrieveCapabilities(intent, opts): ToolCandidate[]
```typescript
interface RetrievalOpts {
  top_k: number;               // default 20
  max_cost_usd?: number;       // filter: exclude tools exceeding per-call cost
  max_latency_ms_p99?: number; // filter: exclude tools with high tail latency
  tenant_id?: string;          // future: per-tenant tool allowlists
}

interface ToolCandidate {
  tool_name: string;
  score: number;               // cosine similarity 0–1
  metadata: CapabilityMetadata;
}
```

The retrieval call is made in agent context assembly (before the agent turn begins), NOT in the gateway request hot path for tool execution. This is consistent with invariant #2: the KV-only hot path for tool proxying is not changed.
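The filtering and truncation implied by `RetrievalOpts` can be sketched as a pure function applied to candidates that have already come back from the vector query. Two assumptions are made here: `top_k` is treated as optional so the default of 20 can be applied, and the `max_latency_ms_p99` filter is compared against the stored `p95_latency_ms`, since that is the only latency prior in the metadata schema. The function name `applyRetrievalOpts` is hypothetical.

```typescript
interface CapabilityMetadata {
  tool_name: string;
  estimated_cost_usd: number;
  p95_latency_ms: number;
  success_rate: number;
}

interface ToolCandidate {
  tool_name: string;
  score: number; // cosine similarity 0–1
  metadata: CapabilityMetadata;
}

interface RetrievalOpts {
  top_k?: number; // optional here so the default can be applied
  max_cost_usd?: number;
  max_latency_ms_p99?: number;
}

/** Drops candidates that violate the cost/latency limits, then keeps the best top_k. */
function applyRetrievalOpts(candidates: ToolCandidate[], opts: RetrievalOpts = {}): ToolCandidate[] {
  const topK = opts.top_k === undefined ? 20 : opts.top_k;
  const maxCost = opts.max_cost_usd;
  const maxLatency = opts.max_latency_ms_p99;
  return candidates
    .filter((c) => maxCost === undefined || c.metadata.estimated_cost_usd <= maxCost)
    // Assumption: compare against the stored p95, the only latency prior available.
    .filter((c) => maxLatency === undefined || c.metadata.p95_latency_ms <= maxLatency)
    .sort((a, b) => b.score - a.score) // filter() returned a copy, so the input is not mutated
    .slice(0, topK);
}
```

Filtering before truncation matters: truncating first could return fewer than `top_k` tools even when enough eligible candidates exist further down the ranking.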
Priors writer
A nightly CF Cron job reads the last 24 hours of agent_runs D1 records and the gateway audit logs, computes per-tool success_rate, p95_latency_ms, and estimated_cost_usd, and writes updated metadata to both Vectorize and KV. This is a cold-path operation (Cron job, not request path).
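The aggregation step of the priors writer might look like the sketch below. The `ToolRun` record shape is hypothetical (this ADR does not specify the `agent_runs` or audit-log columns), the p95 uses the nearest-rank method, and `estimated_cost_usd` is taken as the mean cost per call; all three are assumptions.

```typescript
// Hypothetical shape of one tool invocation over the last 24 hours.
interface ToolRun {
  tool_name: string;
  ok: boolean;
  latency_ms: number;
  cost_usd: number;
}

interface Priors {
  success_rate: number;
  p95_latency_ms: number;
  estimated_cost_usd: number;
}

/** Nearest-rank percentile over an ascending, non-empty array. */
function percentile(sortedAsc: number[], p: number): number {
  const rank = Math.ceil((p * sortedAsc.length) / 100);
  return sortedAsc[Math.max(rank, 1) - 1];
}

/** Groups runs by tool and derives the three priors for each tool. */
function computePriors(runs: ToolRun[]): Record<string, Priors> {
  const byTool: Record<string, ToolRun[]> = {};
  for (const run of runs) {
    (byTool[run.tool_name] = byTool[run.tool_name] || []).push(run);
  }
  const priors: Record<string, Priors> = {};
  for (const tool of Object.keys(byTool)) {
    const toolRuns = byTool[tool];
    const latencies = toolRuns.map((r) => r.latency_ms).sort((a, b) => a - b);
    const okCount = toolRuns.filter((r) => r.ok).length;
    const totalCost = toolRuns.reduce((sum, r) => sum + r.cost_usd, 0);
    priors[tool] = {
      success_rate: okCount / toolRuns.length,
      p95_latency_ms: percentile(latencies, 95),
      estimated_cost_usd: totalCost / toolRuns.length, // assumption: mean cost per call
    };
  }
  return priors;
}
```

A tool with no runs in the window produces no entry here, so a writer built this way would leave that tool's previous priors untouched rather than overwrite them with zeros.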
What happens to server.registerTool
Platform tools — remain statically registered forever
Three tools bypass the capability index because they are always-on and must be present in every LLM context regardless of intent:
| Tool | Why always-on |
|---|---|
| call_api | Generic proxy. Every agent session may need it. Hiding it would require every retrieval query to include “generic API call” intent. |
| discover_apis | Capability exploration. Must be available for agents to discover what they can do. |
| batch_execute | Multi-call orchestration. Required for any agent that fans out. |
These three retain server.registerTool calls permanently.
Curated tools (current 31, all non-platform) — migrate to Capability Index on Track B ship
All 31 curated tools are embedded in capability_index as part of Track B’s initial catalog seeding (scripts/embed-tool-catalog.ts). Their server.registerTool calls are removed from src/handlers/mcp.ts when Track B merges.
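After migration, the tool list for one LLM context is the three platform tools plus whatever retrieval returns. The sketch below shows one way to merge them. It assumes the ≤20 cap applies to retrieved tools only, with platform tools added on top; the ADR does not say whether they count toward the cap. The function name `assembleToolNames` is hypothetical.

```typescript
// The always-on platform tools named in this ADR.
const PLATFORM_TOOLS: string[] = ["call_api", "discover_apis", "batch_execute"];

/**
 * Merges platform tools with ranked retrieval results.
 * Assumption: the cap limits retrieved tools only, not platform tools.
 */
function assembleToolNames(retrieved: string[], maxRetrieved: number = 20): string[] {
  const result: string[] = PLATFORM_TOOLS.slice();
  let added = 0;
  for (const name of retrieved) {
    if (added >= maxRetrieved) break;
    if (result.indexOf(name) !== -1) continue; // skip duplicates, including platform tools
    result.push(name);
    added += 1;
  }
  return result;
}

const contextTools = assembleToolNames(["hubspot_crm", "call_api", "icp_scorer"]);
// contextTools: call_api, discover_apis, batch_execute, hubspot_crm, icp_scorer
```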
Migration checklist (to be executed in the Track B merge PR):
- Run `scripts/embed-tool-catalog.ts`: all 31 tools embedded and verified in `capability_index`
- `retrieveCapabilities` smoke test passes for all P0 tool intents
- Remove `server.registerTool` calls for all 31 curated tools from `src/handlers/mcp.ts`
- Update `src/config/providers.ts` tool count metadata
- Update invariant #7 in both `.claude/rules/v5-invariants.md` and `docs/architecture/ASCEND-CLOUD-NATIVE-V2-ENGINEERING-PLAN.md`
- Run `npm run typecheck && npm test && npm run check:pre-commit`: all pass
Invariant #7 change
Old text:
34 MCP tools registered (24 curated + 10 platform). Adding a tool requires an ADR or `TOOLS.md` row documenting scope + owner. Hard ceiling: 35. `client_wiki` (ADR-037) is Phase 8 — would bring count to 35, the hard ceiling.
New text (effective when Track B merges):
Platform tools always statically registered: `call_api`, `discover_apis`, `batch_execute`. All other tools live in the Capability Index (Vectorize namespace `capability_index`). Total catalog unbounded. ≤20 tools surfaced per LLM context via semantic retrieval. Adding a tool requires a `docs/tools/<slug>.md` entry + TOOLS.md row + `scripts/embed-tool-catalog.ts` re-run. No hard ceiling.
The invariant files MUST be updated in the same commit that merges the Track B implementation PR, not in this plan-first PR.
Rollout phases
| Phase | Timing | Action |
|---|---|---|
| 0 (this ADR) | 2026-05-07 | ADR accepted. Static registration still in place. No code change. |
| 1 (Track B weeks 1–4) | Weeks 1–4 | Vectorize namespace created. scripts/embed-tool-catalog.ts embeds all 31 curated tools. capability-retrieval.ts queryable. Smoke tests pass. No registerTool removals yet. |
| 2 (Track B weeks 5–8) | Weeks 5–8 | Track A agent wired to use retrieveCapabilities for context assembly. Priors writer cron active. capability_index priors update visibly within 24h. |
| 3 (Track B completion) | Week 8 | Track B merge PR: remove registerTool for 31 curated tools, update invariant #7 in both files, verify no regression on P0 tool calls. |
Acceptance criteria
- All 31 curated tools indexed with non-trivial similarity scores (verified by `scripts/verify-capability-index.ts`).
- `retrieveCapabilities("score this account against ICP")` returns `icp_scorer` in top-3.
- `retrieveCapabilities("pull pipeline data from HubSpot")` returns `hubspot_crm` in top-3.
- Priors update visibly within 24h of an agent run that calls a tool.
- P0 tool invocations via `call_api` work correctly after `registerTool` removal (integration test).
- `npm run typecheck && npm test` pass after the removal commit.
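The top-3 criteria can be checked mechanically with a small accuracy helper, sketched below. The `Retriever` type is a synchronous stand-in that returns ranked tool names (the real `retrieveCapabilities` may well be asynchronous), and the canned rankings are illustrative data, not real retrieval output.

```typescript
interface IntentCase {
  intent: string;
  expectedTool: string;
}

// Stand-in for retrieval: returns tool names ranked best-first (assumption).
type Retriever = (intent: string) => string[];

/** Fraction of cases whose expected tool appears in the first k results. */
function topKAccuracy(cases: IntentCase[], retrieve: Retriever, k: number = 3): number {
  if (cases.length === 0) return 0;
  let hits = 0;
  for (const testCase of cases) {
    const topK = retrieve(testCase.intent).slice(0, k);
    if (topK.indexOf(testCase.expectedTool) !== -1) hits += 1;
  }
  return hits / cases.length;
}

// The two intents named in the acceptance criteria.
const p0Cases: IntentCase[] = [
  { intent: "score this account against ICP", expectedTool: "icp_scorer" },
  { intent: "pull pipeline data from HubSpot", expectedTool: "hubspot_crm" },
];

// Canned rankings, illustrative only; the second deliberately misses the top 3.
const cannedRankings: Record<string, string[]> = {
  "score this account against ICP": ["icp_scorer", "hubspot_crm", "call_api"],
  "pull pipeline data from HubSpot": ["call_api", "batch_execute", "discover_apis", "hubspot_crm"],
};
const stubRetrieve: Retriever = (intent) => cannedRankings[intent] || [];

const accuracy = topKAccuracy(p0Cases, stubRetrieve); // 0.5: hubspot_crm ranks 4th
const passesGate = accuracy >= 0.9;                   // false under a 90% threshold
```

The same helper, run over the full P0 intent suite, would yield the top-3 accuracy figure that the 90% revert gate in the Consequences section refers to.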
Consequences
Positive:
- Tool catalog is unbounded. Ascend can add domain-specific tools per client without hitting a ceiling.
- LLM context cost drops from ~10,500 tokens/request (35 schemas) to ~1,800–6,000 tokens/request (6–20 retrieved schemas).
- Adding a tool no longer requires touching gateway hot-path code.
- Priors (success_rate, latency, cost) feed the inference router (Track E) enabling cost-aware retrieval.
Negative / accepted risk:
- Retrieval adds ~8–12 ms to agent assembly (addressed by ADR-041).
- Retrieval quality depends on embedding model quality. Low-quality embeddings surface wrong tools. Mitigation: acceptance test on all P0 intents before Track B merges; revert gate if top-3 accuracy drops below 90% on the P0 intent suite.
- The priors writer is a cold-path cron job. If it fails silently, priors become stale. Mitigation: the cron job writes a `capability_index:priors_updated_at` KV key on every successful run; a health check alerts if this key is >26h old.
- Track B and Track A both touch `src/lib/` files. Risk of merge conflicts during weeks 5–8 when both are active. Mitigation: Track B owns `capability-retrieval.ts` exclusively; Track A owns `agent-memory.ts` exclusively. No shared file writes.
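The staleness check for the priors writer reduces to a timestamp comparison, sketched below. It assumes the `capability_index:priors_updated_at` value is stored as Unix seconds, matching the `last_updated_at` example in the metadata schema. It also treats a missing key as stale. The function name `isPriorsStale` is hypothetical.

```typescript
// 26 hours: one nightly run plus a 2-hour grace period, per the mitigation above.
const MAX_PRIORS_AGE_MS = 26 * 60 * 60 * 1000;

/**
 * Returns true when the priors writer has not succeeded recently enough.
 * Assumption: the stored timestamp is in Unix seconds; null means the key is absent.
 */
function isPriorsStale(priorsUpdatedAtSec: number | null, nowMs: number): boolean {
  if (priorsUpdatedAtSec === null) return true; // never written: treat as stale
  return nowMs - priorsUpdatedAtSec * 1000 > MAX_PRIORS_AGE_MS;
}

const lastRunSec = 1746700000;
const hourMs = 60 * 60 * 1000;
const freshCheck = isPriorsStale(lastRunSec, lastRunSec * 1000 + 25 * hourMs); // false
const staleCheck = isPriorsStale(lastRunSec, lastRunSec * 1000 + 27 * hourMs); // true
```

Passing the current time in as a parameter, rather than reading the clock inside the function, keeps the check deterministic under test.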