Tool Integration Depth System (TIDS)

ADR-049 — Tool Integration Depth System (TIDS)

Status: Accepted
Date: 2026-05-13
Decider: Mishaal Murawala (engineering sequencing delegated to Claude Code per ADR-040)
Related: ADR-042 (capability index — TIDS regenerates it), ADR-026 (Workflows for multi-step work), ADR-031, docs/plans/tids.md, .claude/plans/polished-wobbling-clock.md
Invariant changed: none (operates within invariants #2, #6, #7, #9, #12)
Supersedes prior draft: an earlier draft framed this as “Capability Coverage System (CCS)”, focused on upstream API endpoint breadth. After scope clarification on 2026-05-13, the system was renamed to TIDS and refocused on tool integration depth: how many of each tool’s wiring slots we actually use, not how many of its endpoints we proxy.

Context

Two separate coverage problems exist; the prior CCS draft conflated them.

Problem A — Upstream API breadth. “Salesforce exposes 2,000+ endpoints; we proxy 5.” This is about gateway endpoint coverage and was solvable by OpenAPI parsing. Deferred.

Problem B — Tool integration depth. Each tool we adopt ships with a rich plugin / extension / binding surface, and we typically wire only the obvious shape:

Tool	What we wired	What the tool exposes
Hermes	base agent + ~5 hooks	skills, hooks, platforms, providers, context engines, CLI commands, memory backends, directives, mental models, reflections, sync_retain bindings, multi-bank routing
Cloudflare	KV / D1 / R2 / Workers core	full binding surface × features-per-binding (Queues, DOs, Workflows, Service Bindings, AI Gateway, Browser Rendering, Email, Hyperdrive, Rate Limiting, Analytics Engine, Cron, Vectorize)
Salesforce	REST query	8 distinct API shapes (REST / Bulk / Streaming / Apex / Flow / Composite / Tooling / Connect)
Slack	webhook + a few events	events × interactivity × Block Kit × workflows × CLI
Anthropic	basic chat completions	tool_use, files API, batches, agents, prompt caching, computer use, MCP connector
Hindsight	3 slots (memories, recall, sync_retain)	directives, mental_models, reflections, sync_retain, banks, document ops, operations

The failure mode for Problem B is different from Problem A. It isn’t “we didn’t read the endpoint list.” It’s: agents (human or AI) build against the obvious integration shape — a base agent, an HTTP MCP, a CF Worker — and never come back to wire the secondary shapes. Each tool has its own taxonomy of wiring slots. The system has to discover each tool’s taxonomy from its source-of-truth artifacts, not assume one canonical shape.

No commercial product solves this. The Backstage software catalog comes closest, but it covers internal services, not the integration depth of the third-party tools we install.

The Capability Index (ADR-042) gives the gateway runtime tool discovery from Vectorize. TIDS sits upstream of that index: it maintains the registry, measures wiring depth, and generates the implementation backlog.

Decision

Build a Tool Integration Depth System (TIDS) that owns the slot-level inventory of every tool we install. TIDS extracts integration slots from source-of-truth artifacts: open-source repos via tree-sitter AST, SDK type definitions via ts-morph, our own wrangler.toml, live MCP introspection, and product docs as a last resort. It normalizes them into one D1 table plus a Vectorize index and measures tiered coverage: Tier 1 wired_pct (any reference in our code) and Tier 2 prod_used_pct (actual invocation in production). It then auto-generates a planning backlog using cheap models (DeepSeek-V3 and Gemini 2.5 Flash, with Haiku only for retry and reconciliation).

The plan ships in 7 phases. Phase 0 — foundations only — is this PR.

What TIDS owns vs. existing systems

Concern	Owner
Per-tool integration slot inventory (skills, bindings, events, API shapes, config keys)	TIDS — new (D1 `integration_slots`)
Runtime tool discovery (semantic ranking, scope filtering)	ADR-042 capability index (Vectorize `capability_index`) — TIDS regenerates it
Tool-call logging, p95 latency, success rate	Existing `tool_traces` + AI Gateway analytics — TIDS reads, does not duplicate
Tool implementation in the gateway	Existing `src/tools/` + `src/config/providers.ts`; TIDS produces the backlog, and an ADR or TOOLS.md row is still required to merge
LLM judge/eval pipeline	Existing tri-judge harness — TIDS does not touch

TIDS sits in the cold path only. The gateway request path is unchanged. Invariant #2 (KV-only hot path) and invariant #6 (no DO in request path) are preserved.

Architecture (summary — full design in `docs/plans/tids.md`)

Source artifact ──► Adapter (5 types) ──► Normalizer/Reconciler ──► D1 integration_slots
                                                                         │
                                                                         ▼
                                          Coverage tracker (Tier 1 grep + Tier 2 AI Gateway + Analytics)
                                                                         │
                                                                         ▼
                                                          Planner (Gemini 2.5 Flash)
                                                                         │
                                                                         ▼
                                               docs/automation-backlog.md  +  GitHub draft issues
                                                                         │
                                                                         ▼
                                                  Non-stop execution loop (DeepSeek headless `task` runs)

Five adapters:

ast-adapter — Hermes (NousResearch/hermes-agent), Infisical, MCP servers, n8n nodes — tree-sitter on cloned source
sdk-types-adapter — @slack/web-api, @anthropic-ai/sdk, @cloudflare/workers-types, stripe via ts-morph
wrangler-config-adapter — our own wrangler.toml + context-worker/wrangler.toml — reveals declared-but-unused bindings
mcp-introspection-adapter — Hindsight, any live MCP server — list_tools / list_resources / list_prompts + server instructions
doc-llm-adapter (last resort) — Salesforce REST, Gong, Apollo (no spec); output is keyed on API shape, not endpoint count

Adapter precedence on merge: ast > sdk_types > wrangler_config > mcp_introspection > doc_llm. Contradictions land in slot_contradictions for review.
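
The precedence rule above can be sketched as a small merge function. This is only an illustration under stated assumptions: the `SlotCandidate` and `Contradiction` shapes, the `mergeCandidates` name, and the choice to treat differing descriptions as the contradiction signal are all hypothetical; the real schema lives in the D1 migration.

```typescript
// Adapter sources in precedence order, highest first (taken from the ADR text).
const PRECEDENCE = ["ast", "sdk_types", "wrangler_config", "mcp_introspection", "doc_llm"] as const;
type Source = (typeof PRECEDENCE)[number];

// Hypothetical shape of one extracted slot.
interface SlotCandidate {
  tool: string;
  slotId: string;
  source: Source;
  description: string;
}

// Hypothetical row destined for slot_contradictions.
interface Contradiction {
  key: string;
  kept: Source;
  rejected: Source;
}

// Lower index means higher precedence.
function rank(source: Source): number {
  return PRECEDENCE.indexOf(source);
}

function mergeCandidates(candidates: SlotCandidate[]): {
  slots: SlotCandidate[];
  contradictions: Contradiction[];
} {
  const winners = new Map<string, SlotCandidate>();
  const contradictions: Contradiction[] = [];

  for (const candidate of candidates) {
    const key = `${candidate.tool}:${candidate.slotId}`;
    const current = winners.get(key);
    if (current === undefined) {
      winners.set(key, candidate);
      continue;
    }
    // The higher-precedence source wins; ties keep the existing row.
    const incomingWins = rank(candidate.source) < rank(current.source);
    const kept = incomingWins ? candidate : current;
    const rejected = incomingWins ? current : candidate;
    winners.set(key, kept);
    // Assumption: only disagreeing descriptions are recorded for review.
    if (kept.description !== rejected.description) {
      contradictions.push({ key, kept: kept.source, rejected: rejected.source });
    }
  }
  return { slots: Array.from(winners.values()), contradictions };
}
```

With this shape, a slot reported by both `doc_llm` and `ast` keeps the `ast` row, and the disagreement is queued for review rather than silently dropped.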

Tiered coverage (load-bearing design choice)

Two numbers per tool, both must move:

wired_pct (floor) — fraction of slots with at least one reference in our repos (tree-sitter grep). Free, nightly. Tells us what we could invoke.
prod_used_pct (ceiling) — fraction of slots with ≥1 invocation in the trailing 30 days (AI Gateway + Workers Analytics Engine + usage_metrics:* KV). Tells us what we actually invoke.

The category that matters most is “wired but never used in 30 days”: code exists, the slot is plumbed, nobody calls it. Tier 2 instrumentation is a one-line decorator at each registration site that emits slot_id to Analytics Engine, phased in as we touch each slot.
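
The registration-site instrumentation described above might look roughly like the wrapper below. It is a sketch, not the planned implementation: `withSlot` and `AnalyticsDataset` are hypothetical names, and the interface is a structural stand-in that only models the `writeDataPoint` call of a Workers Analytics Engine dataset binding. A plain higher-order function is used instead of decorator syntax so that it works on free functions.

```typescript
// Structural stand-in for an Analytics Engine dataset binding.
interface AnalyticsDataset {
  writeDataPoint(point: { indexes?: string[]; blobs?: string[]; doubles?: number[] }): void;
}

// Wraps a handler so that every invocation records the slot it belongs to.
function withSlot<A extends unknown[], R>(
  analytics: AnalyticsDataset,
  slotId: string,
  handler: (...args: A) => R,
): (...args: A) => R {
  return (...args: A): R => {
    // One data point per call: slot_id as the index and first blob, 1 as the count.
    analytics.writeDataPoint({ indexes: [slotId], blobs: [slotId], doubles: [1] });
    return handler(...args);
  };
}
```

At a registration site this stays a one-line change: the existing handler is passed through `withSlot` together with its slot_id, and the handler's signature is preserved.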

Defaults: wired_target = 0.70, prod_used_target = 0.40. The prod_used bar is lower by design; some slots are wired but rarely used.
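
As a rough illustration of how the two tiers and the default targets fit together, the sketch below computes both percentages for one tool and lists the “wired but never used” slots. The `SlotUsage` and `CoverageReport` shapes and the `computeCoverage` name are assumptions made for this example; only the metric definitions and the 0.70 / 0.40 defaults come from the text above.

```typescript
// Hypothetical per-slot facts gathered by the coverage tracker.
interface SlotUsage {
  slotId: string;
  references: number;     // Tier 1: references found in our repos
  invocations30d: number; // Tier 2: invocations in the trailing 30 days
}

interface CoverageReport {
  wiredPct: number;
  prodUsedPct: number;
  wiredButUnused: string[];
  meetsTargets: boolean;
}

function computeCoverage(
  slots: SlotUsage[],
  wiredTarget = 0.7,
  prodUsedTarget = 0.4,
): CoverageReport {
  // An empty inventory has no meaningful coverage; report zeros instead of NaN.
  if (slots.length === 0) {
    return { wiredPct: 0, prodUsedPct: 0, wiredButUnused: [], meetsTargets: false };
  }
  const wired = slots.filter((s) => s.references > 0);
  const used = slots.filter((s) => s.invocations30d > 0);
  const wiredPct = wired.length / slots.length;
  const prodUsedPct = used.length / slots.length;
  return {
    wiredPct,
    prodUsedPct,
    // Plumbed in code but never called in the window.
    wiredButUnused: wired.filter((s) => s.invocations30d === 0).map((s) => s.slotId),
    meetsTargets: wiredPct >= wiredTarget && prodUsedPct >= prodUsedTarget,
  };
}
```

Both numbers use the full slot count as the denominator, so they stay comparable within one tool even though they are not comparable across tools.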

Model policy

Bulk extraction never touches Claude. The split:

Stage	Model	Volume
AST / SDK / wrangler / MCP introspection	None (deterministic)	High
Doc-only extraction (last-resort SaaS)	DeepSeek-V3 with prompt caching	Medium
Contradiction reconciliation	Claude Haiku 4.5	Low
Plan generation (per slot → task)	Gemini 2.5 Flash (1M context)	Medium
Retry on JSON failure	Claude Haiku 4.5	Tiny

Aligned with the LLM Model Policy in ~/.claude/CLAUDE.md: background/cron LLM calls use DeepSeek by default; Claude reserved for user-facing MCP calls or explicit product features. Haiku at the retry/reconciliation layer is justified by low volume and the cost of a silent extraction error (poisons downstream coverage % and backlog).

Model IDs are not hardcoded. They live in KV under tids_config:extraction_model, tids_config:planner_model, tids_config:reconciler_model, tids_config:wired_target, tids_config:prod_used_target — same shape as the existing judge_config:{provider}:current_model pattern.
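
A minimal sketch of reading those keys, assuming a KV-like reader whose `get` resolves to a string or `null`. The `KVReader` interface, the `TidsConfig` shape, and the fallback behaviour are hypothetical; the key names are the ones listed above, and no concrete model IDs are assumed (the caller supplies fallbacks).

```typescript
// Structural stand-in for a Workers KV namespace; only get() is modeled.
interface KVReader {
  get(key: string): Promise<string | null>;
}

interface TidsConfig {
  extractionModel: string;
  plannerModel: string;
  reconcilerModel: string;
  wiredTarget: number;
  prodUsedTarget: number;
}

// Targets are fractions in [0, 1]; anything else falls back to the default.
function parseTarget(raw: string | null, fallback: number): number {
  if (raw === null || raw.trim() === "") return fallback;
  const n = Number(raw);
  return Number.isFinite(n) && n >= 0 && n <= 1 ? n : fallback;
}

async function loadTidsConfig(kv: KVReader, fallback: TidsConfig): Promise<TidsConfig> {
  const [extraction, planner, reconciler, wired, prodUsed] = await Promise.all([
    kv.get("tids_config:extraction_model"),
    kv.get("tids_config:planner_model"),
    kv.get("tids_config:reconciler_model"),
    kv.get("tids_config:wired_target"),
    kv.get("tids_config:prod_used_target"),
  ]);
  return {
    extractionModel: extraction ?? fallback.extractionModel,
    plannerModel: planner ?? fallback.plannerModel,
    reconcilerModel: reconciler ?? fallback.reconcilerModel,
    wiredTarget: parseTarget(wired, fallback.wiredTarget),
    prodUsedTarget: parseTarget(prodUsed, fallback.prodUsedTarget),
  };
}
```

Falling back per key rather than failing the whole load means a single missing or malformed KV entry does not stop a cron run.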

Estimated steady-state monthly LLM budget: $40–80.

Invariants preserved

#2 KV-only hot path — TIDS runs entirely on the cold path. Extraction is cron-scheduled on context-worker. Gateway request path untouched.
#6 Request path never touches a DO — no DO involvement in TIDS.
#7 Capability index, not static tool count — TIDS regenerates the existing capability_index Vectorize binding. No parallel index.
#9 CF Cron + CF Workflows for scheduled/multi-step — TIDS extractor and planner run on Cloudflare Cron Triggers. Multi-step pipelines (clone repo → walk AST → write D1 → embed → notify) use Workflows per ADR-026.
#12 LLM calls through AI Gateway — all TIDS LLM calls go through AI Gateway for observability + budget cap.
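
The multi-step pipeline named under invariant #9 (clone repo → walk AST → write D1 → embed → notify) could be laid out as named steps along these lines. This is a sketch under assumptions: `StepRunner` is a structural stand-in that only models the `do(name, callback)` form of a Workflows step object, and `ExtractionStages` with its five functions is hypothetical.

```typescript
// Structural stand-in for the step object a workflow run receives.
interface StepRunner {
  do<T>(name: string, callback: () => Promise<T>): Promise<T>;
}

// Hypothetical stage implementations injected by the caller.
interface ExtractionStages {
  cloneRepo(): Promise<string>;                // returns a checkout path
  walkAst(path: string): Promise<string[]>;    // returns extracted slot ids
  writeD1(slotIds: string[]): Promise<number>; // returns rows written
  embed(slotIds: string[]): Promise<void>;
  notify(rowsWritten: number): Promise<void>;
}

// Each stage runs as its own named step, so a retry can resume at the failed step.
async function runExtraction(step: StepRunner, stages: ExtractionStages): Promise<number> {
  const path = await step.do("clone repo", () => stages.cloneRepo());
  const slotIds = await step.do("walk AST", () => stages.walkAst(path));
  const rows = await step.do("write D1", () => stages.writeD1(slotIds));
  await step.do("embed", () => stages.embed(slotIds));
  await step.do("notify", () => stages.notify(rows));
  return rows;
}
```

Keeping the stages behind an interface also lets the sequence be exercised with an in-memory step runner, without a Workers runtime.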

Trade-offs

Slot taxonomies differ per tool → coverage % is not directly comparable across tools. Accepted. Dashboard shows N independent per-tool numbers, not one global “%”.

Cost of a doc-LLM hallucination. Doc-only extraction can invent slots. Mitigated by: contradiction flag on every doc_llm-sourced row; promotion from unknown to unwired requires either a second source agreeing or human review.
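
The promotion rule above is small enough to state as a single function. The sketch assumes a hypothetical `DocLlmSlot` shape that records which adapters reported the slot and whether a human has reviewed it; the status names `unknown` and `unwired` are taken from the text.

```typescript
type SlotStatus = "unknown" | "unwired";

// Hypothetical view of a slot that was first reported by the doc-LLM adapter.
interface DocLlmSlot {
  slotId: string;
  sources: string[]; // adapters that reported this slot, e.g. ["doc_llm", "sdk_types"]
  humanReviewed: boolean;
}

// A doc_llm-only slot stays "unknown" until a second source agrees or a human signs off.
function nextStatus(slot: DocLlmSlot): SlotStatus {
  const corroborated = slot.sources.some((source) => source !== "doc_llm");
  return corroborated || slot.humanReviewed ? "unwired" : "unknown";
}
```

Because `unknown` slots never reach `unwired`, a hallucinated slot cannot inflate the coverage denominator or the backlog without corroboration.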

Tier 2 instrumentation requires touching every registration site. Phased: Tier 1 first (no code change — tree-sitter grep is non-invasive); Tier 2 backfills as we touch each slot.

AST adapter misses runtime-registered slots. Mitigated by: a doc-LLM enrichment pass, with contradictions reviewed by Haiku.

Reversal criteria

LLM extraction cost exceeds $200/month for two consecutive months → drop doc-LLM adapter, restrict TIDS to AST/SDK/wrangler/MCP sources.
wired_pct stays flat for 4+ weeks after Phase 5 ships → the bottleneck is implementation capacity, not visibility. Pause auto-backlog generation; re-evaluate.
Vectorize re-embedding cost becomes material → cache embeddings keyed by (slot_id, content_hash).
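
The first two criteria can be checked mechanically. The sketch below is one possible reading, with assumptions labeled: “two consecutive months” is taken to mean the two most recent months, “flat” is taken to mean a spread below a hypothetical `epsilon` of 0.01 across the last four weekly samples, and the caller is expected to pass only samples taken after Phase 5 shipped. Function names are illustrative.

```typescript
// True when the two most recent monthly costs both exceed the limit (USD).
function costTriggerFired(monthlyCostsUsd: number[], limitUsd = 200): boolean {
  const lastTwo = monthlyCostsUsd.slice(-2);
  return lastTwo.length === 2 && lastTwo.every((cost) => cost > limitUsd);
}

// True when the last `weeks` weekly wired_pct samples move by less than epsilon.
// Assumption: epsilon = 0.01 (one percentage point) counts as "flat".
function wiredPctFlat(weeklyWiredPct: number[], weeks = 4, epsilon = 0.01): boolean {
  const recent = weeklyWiredPct.slice(-weeks);
  if (recent.length < weeks) return false; // not enough history yet
  return Math.max(...recent) - Math.min(...recent) < epsilon;
}
```

Either check returning true is a prompt for the human decision described in the criteria, not an automatic switch.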

This ADR can be reversed without unwinding code: drop the cron triggers, leave the D1 tables in place as a read-only artifact.

Acceptance criteria (Phase 0)

ADR merged on main (this file).
LEDGER row added.
Plan permalinked at docs/plans/tids.md.
D1 migration 0019_tids_tables.sql ready (apply on next deploy).
TypeScript types in src/lib/tids/types.ts matching the migration schema.
knip.json ignore entry updated: src/lib/ccs/** → src/lib/tids/**.
CHANGELOG entry under [Unreleased] → Added.

Phase 1 (Hermes adapter — flagged by user as the highest-value first target) begins on a new branch (feat/tids-phase1-hermes) once this PR merges.

References

Plan (in-repo permalink): docs/plans/tids.md
Plan (canonical authored version): .claude/plans/polished-wobbling-clock.md
Existing capability index ADR: ADR-042
Multi-step ingestion pattern: ADR-026
LLM Model Policy: ~/.claude/CLAUDE.md → “LLM Model Policy”
Plan-First PR Discipline: .claude/CLAUDE.md → “Plan-First PR Discipline (Anti-Orphan Rule)”