Tool Integration Depth System (TIDS)
ADR-049 — Tool Integration Depth System (TIDS)
- Status: Accepted
- Date: 2026-05-13
- Decider: Mishaal Murawala (engineering sequencing delegated to Claude Code per ADR-040)
- Related: ADR-042 (capability index — TIDS regenerates it), ADR-026 (Workflows for multi-step work), ADR-031, docs/plans/tids.md, .claude/plans/polished-wobbling-clock.md
- Invariant changed: none (operates within invariants #2, #6, #7, #9, #12)
- Supersedes prior draft: earlier draft framed this as “Capability Coverage System (CCS)” focused on upstream API endpoint breadth. After scope clarification 2026-05-13, the system is renamed to TIDS and refocused on tool integration depth — how many of each tool’s wiring slots we actually use, not how many of its endpoints we proxy.
Context
Two separate coverage problems exist; the prior CCS draft conflated them.
Problem A — Upstream API breadth. “Salesforce exposes 2,000+ endpoints; we proxy 5.” This is about gateway endpoint coverage and was solvable by OpenAPI parsing. Deferred.
Problem B — Tool integration depth. Each tool we adopt ships with a rich plugin / extension / binding surface, and we typically wire only the obvious shape:
| Tool | What we wired | What the tool exposes |
|---|---|---|
| Hermes | base agent + ~5 hooks | skills, hooks, platforms, providers, context engines, CLI commands, memory backends, directives, mental models, reflections, sync_retain bindings, multi-bank routing |
| Cloudflare | KV / D1 / R2 / Workers core | full binding surface × features-per-binding (Queues, DOs, Workflows, Service Bindings, AI Gateway, Browser Rendering, Email, Hyperdrive, Rate Limiting, Analytics Engine, Cron, Vectorize) |
| Salesforce | REST query | 8 distinct API shapes (REST / Bulk / Streaming / Apex / Flow / Composite / Tooling / Connect) |
| Slack | webhook + a few events | events × interactivity × Block Kit × workflows × CLI |
| Anthropic | basic chat completions | tool_use, files API, batches, agents, prompt caching, computer use, MCP connector |
| Hindsight | 3 of (memories, recall, sync_retain) | directives, mental_models, reflections, sync_retain, banks, document ops, operations |
The failure mode for Problem B is different from Problem A. It isn’t “we didn’t read the endpoint list.” It’s: agents (human or AI) build against the obvious integration shape — a base agent, an HTTP MCP, a CF Worker — and never come back to wire the secondary shapes. Each tool has its own taxonomy of wiring slots. The system has to discover each tool’s taxonomy from its source-of-truth artifacts, not assume one canonical shape.
There is no commercial product solving this. Backstage software catalog comes closest but covers internal services, not the integration depth of third-party tools we install.
The Capability Index (ADR-042) gives the gateway runtime tool discovery from Vectorize. TIDS is the upstream of that index — it maintains the registry, measures wiring depth, and generates the implementation backlog.
Decision
Build a Tool Integration Depth System (TIDS) that owns the slot-level inventory of every tool we install. TIDS extracts integration slots from source-of-truth artifacts (open-source repos via tree-sitter AST, SDK type definitions via ts-morph, our own wrangler.toml, live MCP introspection, and product docs as a last resort), normalizes them into one D1 table plus a Vectorize index, measures tiered coverage (Tier 1 wired_pct: any reference in our code; Tier 2 prod_used_pct: actual invocation in production), and auto-generates a planning backlog using cheap models (DeepSeek-V3 and Gemini 2.5 Flash, with Claude Haiku reserved for contradiction reconciliation and retries).
The plan ships in 7 phases. Phase 0 — foundations only — is this PR.
What TIDS owns vs. existing systems
| Concern | Owner |
|---|---|
| Per-tool integration slot inventory (skills, bindings, events, API shapes, config keys) | TIDS — new (D1 integration_slots) |
| Runtime tool discovery (semantic ranking, scope filtering) | ADR-042 capability index (Vectorize capability_index) — TIDS regenerates it |
| Tool-call logging, p95 latency, success rate | Existing tool_traces + AI Gateway analytics — TIDS reads, does not duplicate |
| Tool implementation in the gateway | Existing src/tools/* + src/config/providers.ts — TIDS produces the backlog, ADR-or-TOOLS.md row still required to merge |
| LLM judge/eval pipeline | Existing tri-judge harness — TIDS does not touch |
TIDS sits in the cold path only. The gateway request path is unchanged. Invariant #2 (KV-only hot path) and invariant #6 (no DO in request path) are preserved.
Architecture (summary — full design in docs/plans/tids.md)
    Source artifact
        │
        ▼
    Adapter (5 types)
        │
        ▼
    Normalizer/Reconciler
        │
        ▼
    D1 integration_slots
        │
        ▼
    Coverage tracker (Tier 1 grep + Tier 2 AI Gateway + Analytics)
        │
        ▼
    Planner (Gemini 2.5 Flash)
        │
        ▼
    docs/automation-backlog.md + GitHub draft issues
        │
        ▼
    Non-stop execution loop (DeepSeek headless `task` runs)

Five adapters:
- ast-adapter — Hermes (NousResearch/hermes-agent), Infisical, MCP servers, n8n nodes — tree-sitter on cloned source
- sdk-types-adapter — @slack/web-api, @anthropic-ai/sdk, @cloudflare/workers-types, stripe via ts-morph
- wrangler-config-adapter — our own wrangler.toml + context-worker/wrangler.toml — reveals declared-but-unused bindings
- mcp-introspection-adapter — Hindsight, any live MCP server — list_tools / list_resources / list_prompts + server instructions
- doc-llm-adapter (last resort) — Salesforce REST, Gong, Apollo (no spec); output is keyed on API shape, not endpoint count
Adapter precedence on merge: ast > sdk_types > wrangler_config > mcp_introspection > doc_llm. Contradictions land in slot_contradictions for review.
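The merge rule above can be sketched as a small pure function. This is a minimal illustration, not the implementation: the `SlotCandidate` shape, the `mergeCandidates` name, and the choice to treat differing descriptions as a contradiction are all assumptions; only the precedence order comes from this ADR.

```typescript
// Adapter sources, ordered from most to least trusted (order taken from the ADR).
const PRECEDENCE = ["ast", "sdk_types", "wrangler_config", "mcp_introspection", "doc_llm"] as const;
type Source = (typeof PRECEDENCE)[number];

// Hypothetical candidate row; the real integration_slots schema may differ.
interface SlotCandidate {
  slotId: string;
  source: Source;
  description: string;
}

// Hypothetical shape for a slot_contradictions entry.
interface Contradiction {
  slotId: string;
  kept: SlotCandidate;
  rejected: SlotCandidate;
}

function rank(source: Source): number {
  return PRECEDENCE.indexOf(source);
}

// Keep the highest-precedence candidate per slot and record disagreements for review.
function mergeCandidates(candidates: SlotCandidate[]): {
  merged: Map<string, SlotCandidate>;
  contradictions: Contradiction[];
} {
  const merged = new Map<string, SlotCandidate>();
  const contradictions: Contradiction[] = [];
  for (const candidate of candidates) {
    const current = merged.get(candidate.slotId);
    if (current === undefined) {
      merged.set(candidate.slotId, candidate);
      continue;
    }
    const candidateWins = rank(candidate.source) < rank(current.source);
    const kept = candidateWins ? candidate : current;
    const rejected = candidateWins ? current : candidate;
    merged.set(candidate.slotId, kept);
    // Assumption: two sources contradict when their descriptions differ.
    if (kept.description !== rejected.description) {
      contradictions.push({ slotId: candidate.slotId, kept, rejected });
    }
  }
  return { merged, contradictions };
}
```

Because the function is order-independent with respect to which source wins, adapters could run in any order and still converge on the same merged row.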
Tiered coverage (load-bearing design choice)
Two numbers per tool, both must move:
- wired_pct (floor) — fraction of slots with at least one reference in our repos (tree-sitter grep). Free, nightly. Tells us what we could invoke.
- prod_used_pct (ceiling) — fraction of slots with ≥1 invocation in the trailing 30 days (AI Gateway + Workers Analytics Engine + usage_metrics:* KV). Tells us what we actually invoke.
The category that matters most is “wired but never used in 30 days” — code exists, the slot is plumbed, nobody calls it. The Tier 2 instrumentation is a one-line decorator at registration sites emitting slot_id to Analytics, phased in as we touch each slot.
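One way the Tier 2 instrumentation could look is a higher-order wrapper applied at each registration site. This is a hedged sketch: `withSlotTracking` and `SlotSink` are hypothetical names, the sink is only shaped like the Workers Analytics Engine `writeDataPoint` call, and the slot ID shown is illustrative.

```typescript
// Minimal sink, shaped like the Workers Analytics Engine writeDataPoint call (assumption).
interface SlotSink {
  writeDataPoint(point: { indexes?: string[]; blobs?: string[]; doubles?: number[] }): void;
}

// Wrap a handler so every invocation records its slot_id before the handler runs.
function withSlotTracking<Args extends unknown[], Result>(
  slotId: string,
  sink: SlotSink,
  handler: (...args: Args) => Result,
): (...args: Args) => Result {
  return (...args: Args): Result => {
    sink.writeDataPoint({ indexes: [slotId], doubles: [1] });
    return handler(...args);
  };
}

// Usage at a hypothetical registration site, with an in-memory sink for illustration.
const recorded: string[] = [];
const memorySink: SlotSink = {
  writeDataPoint: (point) => {
    recorded.push(...(point.indexes ?? []));
  },
};
const recall = withSlotTracking("hindsight.recall", memorySink, (query: string) => `results for ${query}`);
```

The wrapper keeps the handler's signature intact, so adopting it at a registration site stays a one-line change.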
Defaults: wired_target = 0.70, prod_used_target = 0.40. The prod_used target is the lower bar by design, because some slots are legitimately wired but rarely used.
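The two percentages and the "wired but never used" category can be computed from per-slot counts. The following is a rough sketch under assumed names: `SlotUsage`, `CoverageReport`, and `coverage` are illustrative and not taken from the plan; only the definitions and default targets come from this section.

```typescript
// Hypothetical per-slot state; field names are illustrative.
interface SlotUsage {
  slotId: string;
  references: number; // Tier 1: references found in our repos
  invocations30d: number; // Tier 2: invocations in the trailing 30 days
}

interface CoverageReport {
  wiredPct: number;
  prodUsedPct: number;
  wiredButUnused: string[];
  meetsTargets: boolean;
}

// Compute both tiers for one tool; defaults mirror the targets stated above.
function coverage(slots: SlotUsage[], wiredTarget = 0.7, prodUsedTarget = 0.4): CoverageReport {
  if (slots.length === 0) {
    return { wiredPct: 0, prodUsedPct: 0, wiredButUnused: [], meetsTargets: false };
  }
  const wired = slots.filter((s) => s.references > 0);
  const used = slots.filter((s) => s.invocations30d > 0);
  const wiredPct = wired.length / slots.length;
  const prodUsedPct = used.length / slots.length;
  return {
    wiredPct,
    prodUsedPct,
    // Slots that are plumbed in code but had no production invocation in the window.
    wiredButUnused: wired.filter((s) => s.invocations30d === 0).map((s) => s.slotId),
    meetsTargets: wiredPct >= wiredTarget && prodUsedPct >= prodUsedTarget,
  };
}
```

For example, a tool with four slots where three are referenced and two are invoked would report 0.75 wired and 0.5 prod-used, with the one referenced-but-idle slot listed separately.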
Model policy
Bulk extraction never touches Claude. The split:
| Stage | Model | Volume |
|---|---|---|
| AST / SDK / wrangler / MCP introspection | None (deterministic) | High |
| Doc-only extraction (last-resort SaaS) | DeepSeek-V3 with prompt caching | Medium |
| Contradiction reconciliation | Claude Haiku 4.5 | Low |
| Plan generation (per slot → task) | Gemini 2.5 Flash (1M context) | Medium |
| Retry on JSON failure | Claude Haiku 4.5 | Tiny |
Aligned with the LLM Model Policy in ~/.claude/CLAUDE.md: background/cron LLM calls use DeepSeek by default; Claude reserved for user-facing MCP calls or explicit product features. Haiku at the retry/reconciliation layer is justified by low volume and the cost of a silent extraction error (poisons downstream coverage % and backlog).
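The "retry on JSON failure" row could be implemented roughly as follows. This sketch abstracts each model as an injected async function, so it makes no claim about the actual AI Gateway client; `ModelCall` and `extractWithRetry` are hypothetical names.

```typescript
// A model call is abstracted as an async function from prompt to raw text (assumption).
type ModelCall = (prompt: string) => Promise<string>;

interface ExtractionResult {
  value: unknown;
  usedRetry: boolean;
}

// Parse JSON without throwing, so the caller can decide whether to retry.
function tryParse(raw: string): { ok: true; value: unknown } | { ok: false } {
  try {
    return { ok: true, value: JSON.parse(raw) };
  } catch {
    return { ok: false };
  }
}

// Call the cheap primary model first; fall back to the retry model only on invalid JSON.
async function extractWithRetry(
  prompt: string,
  primary: ModelCall,
  retry: ModelCall,
): Promise<ExtractionResult> {
  const first = tryParse(await primary(prompt));
  if (first.ok) {
    return { value: first.value, usedRetry: false };
  }
  const second = tryParse(await retry(prompt));
  if (second.ok) {
    return { value: second.value, usedRetry: true };
  }
  // Fail loudly: a silent extraction error would poison coverage and the backlog.
  throw new Error("extraction failed: neither model returned valid JSON");
}
```

Keeping the retry model behind the parse check is what holds its volume to the "Tiny" level in the table: it is only invoked when the primary output is unusable.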
Model IDs are not hardcoded. They live in KV under tids_config:extraction_model, tids_config:planner_model, tids_config:reconciler_model, tids_config:wired_target, tids_config:prod_used_target — same shape as the existing judge_config:{provider}:current_model pattern.
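Reading this configuration might look like the sketch below. The key names come from this section; the `KVReader` interface is a minimal stand-in for the subset of the Workers KV API used here, and `loadTidsConfig`, `parseTarget`, and the choice to return `null` for unset model IDs are assumptions.

```typescript
// Minimal read interface, a stand-in for the subset of Workers KV used here.
interface KVReader {
  get(key: string): Promise<string | null>;
}

interface TidsConfig {
  extractionModel: string | null;
  plannerModel: string | null;
  reconcilerModel: string | null;
  wiredTarget: number;
  prodUsedTarget: number;
}

// Parse a stored target, falling back when it is missing, blank, or outside [0, 1].
function parseTarget(raw: string | null, fallback: number): number {
  const value = raw === null || raw.trim() === "" ? NaN : Number(raw);
  return Number.isFinite(value) && value >= 0 && value <= 1 ? value : fallback;
}

async function loadTidsConfig(kv: KVReader): Promise<TidsConfig> {
  const [extractionModel, plannerModel, reconcilerModel, wired, prodUsed] = await Promise.all([
    kv.get("tids_config:extraction_model"),
    kv.get("tids_config:planner_model"),
    kv.get("tids_config:reconciler_model"),
    kv.get("tids_config:wired_target"),
    kv.get("tids_config:prod_used_target"),
  ]);
  return {
    // Model IDs have no in-code default, since they are not hardcoded.
    extractionModel,
    plannerModel,
    reconcilerModel,
    wiredTarget: parseTarget(wired, 0.7),
    prodUsedTarget: parseTarget(prodUsed, 0.4),
  };
}
```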
Estimated steady-state monthly LLM budget: $40–80.
Invariants preserved
- #2 KV-only hot path — TIDS runs entirely on the cold path. Extraction is cron-scheduled on context-worker. Gateway request path untouched.
- #6 Request path never touches a DO — no DO involvement in TIDS.
- #7 Capability index, not static tool count — TIDS regenerates the existing capability_index Vectorize binding. No parallel index.
- #9 CF Cron + CF Workflows for scheduled/multi-step — TIDS extractor and planner run on Cloudflare Cron Triggers. Multi-step pipelines (clone repo → walk AST → write D1 → embed → notify) use Workflows per ADR-026.
- #12 LLM calls through AI Gateway — all TIDS LLM calls go through AI Gateway for observability + budget cap.
Trade-offs
Slot taxonomies differ per tool → coverage % is not directly comparable across tools. Accepted. Dashboard shows N independent per-tool numbers, not one global “%”.
Cost of a doc-LLM hallucination. Doc-only extraction can invent slots. Mitigated by: contradiction flag on every doc_llm-sourced row; promotion from unknown to unwired requires either a second source agreeing or human review.
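The promotion rule can be expressed as a single predicate. This is a sketch under assumptions: `SlotEvidence` and `canPromoteToUnwired` are hypothetical names, and the code assumes that slots reported by any non-doc adapter need no extra confirmation, which the ADR does not state explicitly.

```typescript
type Source = "ast" | "sdk_types" | "wrangler_config" | "mcp_introspection" | "doc_llm";

// Hypothetical evidence record for one slot.
interface SlotEvidence {
  sources: Source[]; // adapters that reported this slot
  humanReviewed: boolean; // set once a person has confirmed the slot exists
}

// A slot may move from "unknown" to "unwired" unless doc_llm is its only source
// and nobody has reviewed it.
function canPromoteToUnwired(evidence: SlotEvidence): boolean {
  const distinct = new Set(evidence.sources);
  if (distinct.size === 0) {
    return false;
  }
  const docOnly = distinct.size === 1 && distinct.has("doc_llm");
  return !docOnly || evidence.humanReviewed;
}
```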
Tier 2 instrumentation requires touching every registration site. Phased: Tier 1 first (no code change — tree-sitter grep is non-invasive); Tier 2 backfills as we touch each slot.
AST adapter misses runtime-registered slots. Mitigated by a doc-LLM enrichment pass, with the resulting contradictions reviewed by Haiku.
Reversal criteria
- LLM extraction cost exceeds $200/month for two consecutive months → drop doc-LLM adapter, restrict TIDS to AST/SDK/wrangler/MCP sources.
- wired_pct stays flat for 4+ weeks after Phase 5 ships → the bottleneck is implementation capacity, not visibility. Pause auto-backlog generation; re-evaluate.
- Vectorize re-embedding cost becomes material → cache embeddings keyed by (slot_id, content_hash).
This ADR can be reversed without unwinding code: drop the cron triggers, leave the D1 tables in place as a read-only artifact.
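The first two reversal criteria are mechanical enough to check in code. The sketch below is illustrative only: the function names, the weekly sampling cadence, and the `epsilon` tolerance for "flat" are assumptions, and the cost check looks only at the two most recent months.

```typescript
// True when spend exceeded the cap in both of the two most recent months.
function costReversalTriggered(monthlyCostsUsd: number[], capUsd = 200): boolean {
  if (monthlyCostsUsd.length < 2) {
    return false;
  }
  return monthlyCostsUsd.slice(-2).every((cost) => cost > capUsd);
}

// True when wired_pct moved by no more than epsilon across the last `weeks` weeks.
// Spanning N weeks needs N + 1 weekly samples.
function wiredPctFlat(weeklyWiredPct: number[], weeks = 4, epsilon = 0.001): boolean {
  if (weeklyWiredPct.length < weeks + 1) {
    return false;
  }
  const recent = weeklyWiredPct.slice(-(weeks + 1));
  return Math.max(...recent) - Math.min(...recent) <= epsilon;
}
```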
Acceptance criteria (Phase 0)
- ADR merged on main (this file).
- LEDGER row added.
- Plan permalinked at docs/plans/tids.md.
- D1 migration 0019_tids_tables.sql ready (apply on next deploy).
- TypeScript types in src/lib/tids/types.ts matching the migration schema.
- knip.json ignore updated src/lib/ccs/** → src/lib/tids/**.
- CHANGELOG entry under [Unreleased] → Added.
Phase 1 (Hermes adapter — flagged by user as the highest-value first target) begins on a new branch (feat/tids-phase1-hermes) once this PR merges.
References
- Plan (in-repo permalink): docs/plans/tids.md
- Plan (canonical authored version): .claude/plans/polished-wobbling-clock.md
- Existing capability index ADR: ADR-042
- Multi-step ingestion pattern: ADR-026
- LLM Model Policy: ~/.claude/CLAUDE.md → “LLM Model Policy”
- Plan-First PR Discipline: .claude/CLAUDE.md → “Plan-First PR Discipline (Anti-Orphan Rule)”