Tool Integration Depth System (TIDS)

ADR-049 — Tool Integration Depth System (TIDS)

  • Status: Accepted
  • Date: 2026-05-13
  • Decider: Mishaal Murawala (engineering sequencing delegated to Claude Code per ADR-040)
  • Related: ADR-042 (capability index — TIDS regenerates it), ADR-026 (Workflows for multi-step work), ADR-031, docs/plans/tids.md, .claude/plans/polished-wobbling-clock.md
  • Invariant changed: none (operates within invariants #2, #6, #7, #9, #12)
  • Supersedes prior draft: earlier draft framed this as “Capability Coverage System (CCS)” focused on upstream API endpoint breadth. After scope clarification 2026-05-13, the system is renamed to TIDS and refocused on tool integration depth — how many of each tool’s wiring slots we actually use, not how many of its endpoints we proxy.

Context

Two separate coverage problems exist; the prior CCS draft conflated them.

Problem A — Upstream API breadth. “Salesforce exposes 2,000+ endpoints; we proxy 5.” This is about gateway endpoint coverage and was solvable by OpenAPI parsing. Deferred.

Problem B — Tool integration depth. Each tool we adopt ships with a rich plugin / extension / binding surface, and we typically wire only the obvious shape:

| Tool | What we wired | What the tool exposes |
|---|---|---|
| Hermes | base agent + ~5 hooks | skills, hooks, platforms, providers, context engines, CLI commands, memory backends, directives, mental models, reflections, sync_retain bindings, multi-bank routing |
| Cloudflare | KV / D1 / R2 / Workers core | full binding surface × features-per-binding (Queues, DOs, Workflows, Service Bindings, AI Gateway, Browser Rendering, Email, Hyperdrive, Rate Limiting, Analytics Engine, Cron, Vectorize) |
| Salesforce | REST query | 8 distinct API shapes (REST / Bulk / Streaming / Apex / Flow / Composite / Tooling / Connect) |
| Slack | webhook + a few events | events × interactivity × Block Kit × workflows × CLI |
| Anthropic | basic chat completions | tool_use, files API, batches, agents, prompt caching, computer use, MCP connector |
| Hindsight | 3 of (memories, recall, sync_retain) | directives, mental_models, reflections, sync_retain, banks, document ops, operations |

The failure mode for Problem B is different from Problem A. It isn’t “we didn’t read the endpoint list.” It’s: agents (human or AI) build against the obvious integration shape — a base agent, an HTTP MCP, a CF Worker — and never come back to wire the secondary shapes. Each tool has its own taxonomy of wiring slots. The system has to discover each tool’s taxonomy from its source-of-truth artifacts, not assume one canonical shape.

No commercial product solves this. Backstage's software catalog comes closest, but it covers internal services, not the integration depth of third-party tools we install.

The Capability Index (ADR-042) gives the gateway runtime tool discovery from Vectorize. TIDS sits upstream of that index: it maintains the registry, measures wiring depth, and generates the implementation backlog.


Decision

Build a Tool Integration Depth System (TIDS) that owns the slot-level inventory of every tool we install. TIDS extracts integration slots from source-of-truth artifacts (open-source repos via tree-sitter AST, SDK type definitions via ts-morph, our own wrangler.toml, live MCP introspection, and product docs as a last resort) and normalizes them into one D1 table plus a Vectorize index. It measures tiered coverage (Tier 1 wired_pct: any reference in our code; Tier 2 prod_used_pct: actual invocation in production) and auto-generates a planning backlog using cheap models (DeepSeek-V3 and Gemini 2.5 Flash, with Haiku only for retry and contradiction reconciliation).

The plan ships in 7 phases. Phase 0 — foundations only — is this PR.


What TIDS owns vs. existing systems

| Concern | Owner |
|---|---|
| Per-tool integration slot inventory (skills, bindings, events, API shapes, config keys) | TIDS, new (D1 integration_slots) |
| Runtime tool discovery (semantic ranking, scope filtering) | ADR-042 capability index (Vectorize capability_index); TIDS regenerates it |
| Tool-call logging, p95 latency, success rate | Existing tool_traces + AI Gateway analytics; TIDS reads, does not duplicate |
| Tool implementation in the gateway | Existing src/tools/* + src/config/providers.ts; TIDS produces the backlog, ADR-or-TOOLS.md row still required to merge |
| LLM judge/eval pipeline | Existing tri-judge harness; TIDS does not touch |

TIDS sits in the cold path only. The gateway request path is unchanged. Invariant #2 (KV-only hot path) and invariant #6 (no DO in request path) are preserved.


Architecture (summary — full design in docs/plans/tids.md)

Source artifact ──► Adapter (5 types) ──► Normalizer/Reconciler ──► D1 integration_slots
  ──► Coverage tracker (Tier 1 grep + Tier 2 AI Gateway + Analytics)
  ──► Planner (Gemini 2.5 Flash)
  ──► docs/automation-backlog.md + GitHub draft issues
  ──► Non-stop execution loop (DeepSeek headless `task` runs)

Five adapters:

  • ast-adapter: Hermes (NousResearch/hermes-agent), Infisical, MCP servers, n8n nodes; tree-sitter on cloned source
  • sdk-types-adapter: @slack/web-api, @anthropic-ai/sdk, @cloudflare/workers-types, stripe; via ts-morph
  • wrangler-config-adapter: our own wrangler.toml + context-worker/wrangler.toml; reveals declared-but-unused bindings
  • mcp-introspection-adapter: Hindsight, any live MCP server; list_tools / list_resources / list_prompts + server instructions
  • doc-llm-adapter (last resort): Salesforce REST, Gong, Apollo (no spec); output is keyed on API shape, not endpoint count
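
As an illustration of what "declared-but-unused bindings" could mean in practice, here is a minimal sketch. It assumes the binding names have already been read out of wrangler.toml and that code references bindings as `env.<NAME>`; the function name `findUnusedBindings` and that reference convention are hypothetical, not part of the TIDS design.

```typescript
// Hypothetical helper: the function name and the `env.<BINDING>` reference
// convention are assumptions made for this sketch.
function findUnusedBindings(declared: string[], sources: string[]): string[] {
  const referenced = new Set<string>();
  for (const text of sources) {
    // A fresh regex per source keeps the global-flag `lastIndex` state isolated.
    const pattern = /\benv\.([A-Z][A-Z0-9_]*)/g;
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(text)) !== null) {
      const name = match[1];
      if (name) referenced.add(name);
    }
  }
  // Keep only bindings that are declared but never referenced in code.
  return declared.filter((name) => !referenced.has(name));
}

// Example with made-up binding names and source snippets.
const unused = findUnusedBindings(
  ["CACHE_KV", "JOBS_QUEUE", "ASSETS_R2"],
  ["await env.CACHE_KV.get(key);", "await env.ASSETS_R2.put(key, body);"],
);
console.log(unused); // JOBS_QUEUE is the only binding without a reference
```

A real adapter would likely resolve references through the AST rather than a regex; the regex only keeps the sketch short.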

Adapter precedence on merge: ast > sdk_types > wrangler_config > mcp_introspection > doc_llm. Contradictions land in slot_contradictions for review.
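
The precedence rule above can be sketched as a small merge function. This is a sketch under stated assumptions: the `SlotCandidate` shape, the `kind` field, and the rule that a differing `kind` counts as a contradiction are all hypothetical; only the source names and their order come from this ADR.

```typescript
type Source = "ast" | "sdk_types" | "wrangler_config" | "mcp_introspection" | "doc_llm";

// Earlier entries win on merge, per the precedence order stated in this ADR.
const PRECEDENCE: Source[] = ["ast", "sdk_types", "wrangler_config", "mcp_introspection", "doc_llm"];

// Hypothetical row shapes, for illustration only.
interface SlotCandidate { slotId: string; source: Source; kind: string; }
interface Contradiction { slotId: string; kept: SlotCandidate; rejected: SlotCandidate; }

function mergeCandidates(candidates: SlotCandidate[]): { slots: SlotCandidate[]; contradictions: Contradiction[] } {
  const winners = new Map<string, SlotCandidate>();
  const contradictions: Contradiction[] = [];
  // Sort a copy so the highest-precedence candidate for each slot is seen first.
  const ranked = [...candidates].sort(
    (a, b) => PRECEDENCE.indexOf(a.source) - PRECEDENCE.indexOf(b.source),
  );
  for (const candidate of ranked) {
    const current = winners.get(candidate.slotId);
    if (!current) {
      winners.set(candidate.slotId, candidate);
    } else if (current.kind !== candidate.kind) {
      // Assumed rule: a lower-precedence source that disagrees is queued for review.
      contradictions.push({ slotId: candidate.slotId, kept: current, rejected: candidate });
    }
  }
  return { slots: Array.from(winners.values()), contradictions };
}

const merged = mergeCandidates([
  { slotId: "hermes:memory_backends", source: "doc_llm", kind: "plugin" },
  { slotId: "hermes:memory_backends", source: "ast", kind: "backend" },
  { slotId: "slack:block_kit", source: "sdk_types", kind: "ui" },
]);
console.log(merged.slots.length, merged.contradictions.length);
```

In this sketch the rejected rows are what would be written to slot_contradictions.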


Tiered coverage (load-bearing design choice)

Two numbers per tool, both must move:

  • wired_pct (floor) — fraction of slots with at least one reference in our repos (tree-sitter grep). Free, nightly. Tells us what we could invoke.
  • prod_used_pct (ceiling) — fraction of slots with ≥1 invocation in the trailing 30 days (AI Gateway + Workers Analytics Engine + usage_metrics:* KV). Tells us what we actually invoke.

The category that matters most is “wired but never used in 30 days” — code exists, the slot is plumbed, nobody calls it. The Tier 2 instrumentation is a one-line decorator at registration sites emitting slot_id to Analytics, phased in as we touch each slot.
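
One way the Tier 2 instrumentation might look is a wrapper applied at the registration site. The sketch below assumes a Workers Analytics Engine dataset binding, of which only `writeDataPoint` is modelled; the wrapper name `withSlotUsage` and the slot ID format are hypothetical.

```typescript
// Minimal shape of an Analytics Engine dataset binding (only the method used here).
interface AnalyticsDataset {
  writeDataPoint(point: { indexes?: string[]; blobs?: string[]; doubles?: number[] }): void;
}

// Hypothetical wrapper: records one data point per invocation, then calls the handler.
function withSlotUsage<A extends unknown[], R>(
  dataset: AnalyticsDataset,
  slotId: string,
  handler: (...args: A) => R,
): (...args: A) => R {
  return (...args: A): R => {
    dataset.writeDataPoint({ indexes: [slotId], doubles: [1] });
    return handler(...args);
  };
}

// Usage with an in-memory stand-in for the dataset binding.
const recorded: string[] = [];
const fakeDataset: AnalyticsDataset = {
  writeDataPoint: (point) => {
    recorded.push((point.indexes ?? [])[0] ?? "");
  },
};
const recall = withSlotUsage(fakeDataset, "hindsight:recall", (query: string) => `results for ${query}`);
const recallResult = recall("billing");
console.log(recorded, recallResult);
```

Because the wrapper preserves the handler's signature, adding it at a registration site stays a one-line change.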

Defaults: wired_target = 0.70, prod_used_target = 0.40. The prod_used target is the lower bar by design, since some slots are wired but rarely used.
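
To make the two tiers concrete, here is a minimal sketch of the per-tool calculation. The `SlotUsage` and `CoverageReport` shapes are hypothetical; the 0.70 and 0.40 defaults are the ones stated above.

```typescript
// Hypothetical per-slot input: reference count from Tier 1, invocation count from Tier 2.
interface SlotUsage { slotId: string; referencesInRepo: number; invocationsLast30d: number; }

interface CoverageReport {
  wiredPct: number;
  prodUsedPct: number;
  wiredButUnused: string[];
  meetsTargets: boolean;
}

function computeCoverage(slots: SlotUsage[], wiredTarget = 0.7, prodUsedTarget = 0.4): CoverageReport {
  if (slots.length === 0) {
    return { wiredPct: 0, prodUsedPct: 0, wiredButUnused: [], meetsTargets: false };
  }
  const wired = slots.filter((s) => s.referencesInRepo > 0);
  const used = slots.filter((s) => s.invocationsLast30d > 0);
  const wiredPct = wired.length / slots.length;
  const prodUsedPct = used.length / slots.length;
  return {
    wiredPct,
    prodUsedPct,
    // Plumbed in code but with no invocation in the trailing 30 days.
    wiredButUnused: wired.filter((s) => s.invocationsLast30d === 0).map((s) => s.slotId),
    meetsTargets: wiredPct >= wiredTarget && prodUsedPct >= prodUsedTarget,
  };
}

// Made-up slot IDs and counts for one tool.
const report = computeCoverage([
  { slotId: "hindsight:memories", referencesInRepo: 4, invocationsLast30d: 120 },
  { slotId: "hindsight:recall", referencesInRepo: 2, invocationsLast30d: 0 },
  { slotId: "hindsight:sync_retain", referencesInRepo: 1, invocationsLast30d: 0 },
  { slotId: "hindsight:reflections", referencesInRepo: 0, invocationsLast30d: 0 },
]);
console.log(report);
```

In this made-up example three of four slots are wired (0.75) but only one is used (0.25), so the tool clears the wired target and misses the prod_used target.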


Model policy

Bulk extraction never touches Claude. The split:

| Stage | Model | Volume |
|---|---|---|
| AST / SDK / wrangler / MCP introspection | None (deterministic) | High |
| Doc-only extraction (last-resort SaaS) | DeepSeek-V3 with prompt caching | Medium |
| Contradiction reconciliation | Claude Haiku 4.5 | Low |
| Plan generation (per slot → task) | Gemini 2.5 Flash (1M context) | Medium |
| Retry on JSON failure | Claude Haiku 4.5 | Tiny |

Aligned with the LLM Model Policy in ~/.claude/CLAUDE.md: background/cron LLM calls use DeepSeek by default; Claude reserved for user-facing MCP calls or explicit product features. Haiku at the retry/reconciliation layer is justified by low volume and the cost of a silent extraction error (poisons downstream coverage % and backlog).
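
The "Retry on JSON failure" stage could be sketched as follows. The model calls are injected as plain functions, so nothing here depends on a particular provider SDK; `ModelCall` and `extractJson` are hypothetical names, and the sketch assumes a single retry with no backoff.

```typescript
// Hypothetical signature: takes a prompt, resolves to the raw model output text.
type ModelCall = (prompt: string) => Promise<string>;

async function extractJson<T>(
  prompt: string,
  primary: ModelCall,
  retry: ModelCall,
): Promise<{ value: T; usedRetry: boolean }> {
  const first = await primary(prompt);
  try {
    return { value: JSON.parse(first) as T, usedRetry: false };
  } catch {
    // The primary output was not valid JSON: ask the retry model once.
    // A second parse failure propagates to the caller instead of being swallowed.
    const second = await retry(prompt);
    return { value: JSON.parse(second) as T, usedRetry: true };
  }
}

// Usage with stub models standing in for the bulk model and the retry model.
const retryDemo = extractJson<{ slots: string[] }>(
  "List the integration slots as JSON.",
  async () => "Sure! Here are the slots:",
  async () => '{"slots":["events","block_kit"]}',
);
retryDemo.then((result) => console.log(result.usedRetry, result.value.slots));
```

Letting the second failure throw keeps a malformed extraction from silently entering the coverage numbers.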

Model IDs and coverage targets are not hardcoded. They live in KV under tids_config:extraction_model, tids_config:planner_model, tids_config:reconciler_model, tids_config:wired_target, tids_config:prod_used_target, following the same shape as the existing judge_config:{provider}:current_model pattern.
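
Reading that configuration might look like the sketch below. It assumes a KV binding whose `get` resolves to a string or null, and the `TidsConfig` shape and `loadTidsConfig` name are hypothetical. Missing model IDs are left as null instead of guessing a default; the numeric fallbacks are the targets stated in this ADR.

```typescript
// Minimal read-only view of a KV namespace binding.
interface KvReader { get(key: string): Promise<string | null>; }

// Hypothetical config shape for this sketch.
interface TidsConfig {
  extractionModel: string | null;
  plannerModel: string | null;
  reconcilerModel: string | null;
  wiredTarget: number;
  prodUsedTarget: number;
}

// Accept only finite fractions in [0, 1]; anything else falls back to the default.
function parseTarget(raw: string | null, fallback: number): number {
  if (raw === null || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  return Number.isFinite(parsed) && parsed >= 0 && parsed <= 1 ? parsed : fallback;
}

async function loadTidsConfig(kv: KvReader): Promise<TidsConfig> {
  const [extraction, planner, reconciler, wired, prodUsed] = await Promise.all([
    kv.get("tids_config:extraction_model"),
    kv.get("tids_config:planner_model"),
    kv.get("tids_config:reconciler_model"),
    kv.get("tids_config:wired_target"),
    kv.get("tids_config:prod_used_target"),
  ]);
  return {
    extractionModel: extraction,
    plannerModel: planner,
    reconcilerModel: reconciler,
    wiredTarget: parseTarget(wired, 0.7),
    prodUsedTarget: parseTarget(prodUsed, 0.4),
  };
}

// Usage with an in-memory stand-in for the KV binding.
const fakeKv: KvReader = {
  get: async (key) =>
    key === "tids_config:wired_target" ? "0.8"
    : key === "tids_config:prod_used_target" ? "not-a-number"
    : null,
};
const configDemo = loadTidsConfig(fakeKv);
configDemo.then((config) => console.log(config));
```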

Estimated steady-state monthly LLM budget: $40–80.


Invariants preserved

  • #2 KV-only hot path — TIDS runs entirely on the cold path. Extraction is cron-scheduled on context-worker. Gateway request path untouched.
  • #6 Request path never touches a DO — no DO involvement in TIDS.
  • #7 Capability index, not static tool count — TIDS regenerates the existing capability_index Vectorize binding. No parallel index.
  • #9 CF Cron + CF Workflows for scheduled/multi-step — TIDS extractor and planner run on Cloudflare Cron Triggers. Multi-step pipelines (clone repo → walk AST → write D1 → embed → notify) use Workflows per ADR-026.
  • #12 LLM calls through AI Gateway — all TIDS LLM calls go through AI Gateway for observability + budget cap.

Trade-offs

Slot taxonomies differ per tool → coverage % is not directly comparable across tools. Accepted. Dashboard shows N independent per-tool numbers, not one global “%”.

Cost of a doc-LLM hallucination. Doc-only extraction can invent slots. Mitigated by: contradiction flag on every doc_llm-sourced row; promotion from unknown to unwired requires either a second source agreeing or human review.
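
The promotion gate described above can be sketched as a pure function. The `SlotRow` shape is hypothetical, and "a second source agreeing" is interpreted here as any non-doc_llm source having reported the same slot.

```typescript
type Source = "ast" | "sdk_types" | "wrangler_config" | "mcp_introspection" | "doc_llm";
type SlotStatus = "unknown" | "unwired";

// Hypothetical row shape: every source that reported the slot, plus a review flag.
interface SlotRow { slotId: string; status: SlotStatus; sources: Source[]; humanReviewed: boolean; }

function nextStatus(row: SlotRow): SlotStatus {
  if (row.status !== "unknown") return row.status;
  // Assumed reading of "a second source agreeing": any non-doc_llm source reported it too.
  const corroborated = row.sources.some((source) => source !== "doc_llm");
  return corroborated || row.humanReviewed ? "unwired" : "unknown";
}

// Made-up slot IDs covering the three cases.
const docOnly = nextStatus({ slotId: "gong:calls", status: "unknown", sources: ["doc_llm"], humanReviewed: false });
const corroboratedSlot = nextStatus({ slotId: "slack:events", status: "unknown", sources: ["doc_llm", "sdk_types"], humanReviewed: false });
const reviewed = nextStatus({ slotId: "apollo:sequences", status: "unknown", sources: ["doc_llm"], humanReviewed: true });
console.log(docOnly, corroboratedSlot, reviewed);
```

A slot seen only by the doc-LLM adapter stays unknown, so it cannot enter the backlog or the coverage denominator until something else vouches for it.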

Tier 2 instrumentation requires touching every registration site. Phased: Tier 1 first (no code change — tree-sitter grep is non-invasive); Tier 2 backfills as we touch each slot.

AST adapter misses runtime-registered slots. Mitigated by a doc-LLM enrichment pass, with contradictions reviewed by Haiku.


Reversal criteria

  • LLM extraction cost exceeds $200/month for two consecutive months → drop doc-LLM adapter, restrict TIDS to AST/SDK/wrangler/MCP sources.
  • wired_pct stays flat for 4+ weeks after Phase 5 ships → the bottleneck is implementation capacity, not visibility. Pause auto-backlog generation; re-evaluate.
  • Vectorize re-embedding cost becomes material → cache embeddings keyed by (slot_id, content_hash).
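
The first two criteria above are mechanical enough to check in code. The sketch below is one possible reading: "flat" is taken to mean the weekly wired_pct samples vary by no more than a small tolerance, and that tolerance (0.005) and both function names are assumptions of this sketch.

```typescript
// True when the two most recent months both exceed the limit (USD).
function costLimitBreached(monthlyCostUsd: number[], limitUsd = 200): boolean {
  if (monthlyCostUsd.length < 2) return false;
  return monthlyCostUsd.slice(-2).every((cost) => cost > limitUsd);
}

// True when wired_pct has stayed within `tolerance` across the last `weeks` weeks.
// Spanning N full weeks takes N + 1 weekly samples.
function wiredPctIsFlat(weeklyWiredPct: number[], weeks = 4, tolerance = 0.005): boolean {
  if (weeklyWiredPct.length < weeks + 1) return false;
  const recent = weeklyWiredPct.slice(-(weeks + 1));
  return Math.max(...recent) - Math.min(...recent) <= tolerance;
}

// Made-up histories for illustration.
console.log(costLimitBreached([150, 210, 230]));
console.log(wiredPctIsFlat([0.4, 0.52, 0.52, 0.52, 0.52, 0.52]));
```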

This ADR can be reversed without unwinding code: drop the cron triggers, leave the D1 tables in place as a read-only artifact.


Acceptance criteria (Phase 0)

  • ADR merged on main (this file).
  • LEDGER row added.
  • Plan permalinked at docs/plans/tids.md.
  • D1 migration 0019_tids_tables.sql ready (apply on next deploy).
  • TypeScript types in src/lib/tids/types.ts matching the migration schema.
  • knip.json ignore updated: src/lib/ccs/** → src/lib/tids/**.
  • CHANGELOG entry under [Unreleased] → Added.

Phase 1 (Hermes adapter — flagged by user as the highest-value first target) begins on a new branch (feat/tids-phase1-hermes) once this PR merges.


References

  • Plan (in-repo permalink): docs/plans/tids.md
  • Plan (canonical authored version): .claude/plans/polished-wobbling-clock.md
  • Existing capability index ADR: ADR-042
  • Multi-step ingestion pattern: ADR-026
  • LLM Model Policy: ~/.claude/CLAUDE.md → “LLM Model Policy”
  • Plan-First PR Discipline: .claude/CLAUDE.md → “Plan-First PR Discipline (Anti-Orphan Rule)”