Tool Utilization Framework (TUF)
ADR-062 — Tool Utilization Framework (TUF)
- Status: Accepted
- Date: 2026-05-19
- Supersedes: none
- INVARIANTS-UNCHANGED: TUF invariants I1–I7 below are TUF-scoped (measurement framework), not amendments to the v5 gateway invariants in .claude/rules/v5-invariants.md.
- Related: ADR-042 (capability index), ADR-053 (lean stack), ADR-054 (V5 retirement), ADR-057 (cutover 2026-05-19), ~/.claude/rules/doctrine.md D3 (two-layer measurement separation)
Context
We keep installing tools and using a fraction of their surface area — Hermes V3 wired as a cron runner, SEMrush at 5 of 35 report types, HubSpot at ~20% API coverage. Each gap was caught by accident, not by a systematic process.
There is no measurement layer that answers two questions independently:
- What % of a tool’s feature universe have we wired into the codebase?
- What % of what we have wired do we actually invoke?
Conflating those two numbers (e.g. “we use HubSpot at 20%”) is the exact failure mode that hides whether the gap is implementation work or adoption work. Adoption work on top of incomplete implementation is premature optimization.
Decision
Build the Tool Utilization Framework (TUF) — a measurement system on the existing CF Workers stack that produces two separate scores per tool:
- Layer 1 (Implementation Coverage) = implemented_features / total_features_in_universe.
- Layer 2 (Utilization) = features_called_last_N_days / implemented_features. Computed only when Layer 1 = 100%. Otherwise Layer 2 = N/A (incomplete).
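The two ratios and the gating rule can be sketched as follows. This is a minimal illustration, not the framework's actual code: the `FeatureCounts` and `TufScore` types and the `scoreTool` function are hypothetical names, and the `calledLastNDays` value in the example is made up. Only the SEMrush figure of 5 of 35 report types comes from the Context section.

```typescript
// Hypothetical shapes: the ADR defines the two ratios, not these types.
interface FeatureCounts {
  totalInUniverse: number; // total_features_in_universe
  implemented: number;     // implemented_features
  calledLastNDays: number; // features_called_last_N_days
}

interface TufScore {
  layer1: number;                      // implementation coverage, 0..1
  layer2: number | "N/A (incomplete)"; // utilization, gated on Layer 1
}

function scoreTool(c: FeatureCounts): TufScore {
  if (c.totalInUniverse <= 0) {
    throw new Error("feature universe is empty; Layer 1 is undefined");
  }
  const layer1 = c.implemented / c.totalInUniverse;
  // Layer 2 is computed only when Layer 1 = 100%; otherwise it stays N/A.
  const complete = c.implemented === c.totalInUniverse;
  const layer2: TufScore["layer2"] = complete
    ? c.calledLastNDays / c.implemented
    : "N/A (incomplete)";
  return { layer1, layer2 };
}

// SEMrush from the Context section: 5 of 35 report types wired.
// calledLastNDays is an illustrative value, not a measured one.
const semrush = scoreTool({ totalInUniverse: 35, implemented: 5, calledLastNDays: 3 });
```

Returning a union type rather than a sentinel number such as 0 or -1 makes it harder for a caller to average or blend the two layers by accident.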
The framework is composed of:
- Adapter interface locked at M1: discover() / checkImplementation() / fetchFeed(). Every integration conforms. Adding integration N+1 never modifies the interface.
- D1 cold-path tables for measurement state: tools_register, feature_universe, feature_calls, change_log, triage_queue.
- 5 markdown artifacts per tool under docs/tools/<slug>/: profile.md, feature-universe.md, implementation-status.md, implementation-backlog.md, utilization-report.md.
- CF Workflow + Cron drives scheduled audit runs (M4).
- Vectorize semantic index over feature_universe (M6), reusing the existing capability_index binding pattern from ADR-042.
- Slack + GitHub sinks for drift digests + auto-filed issues (M7).
- tool-audit skill at ~/.claude/skills/tool-audit/SKILL.md with subcommands onboard / run / review / score.
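One possible shape for the adapter interface is sketched below. The ADR fixes only the three method names, so everything else here is an assumption: the `Feature` and `FeedEntry` types, the parameter and return types, the guess that `fetchFeed()` returns vendor change-feed entries, and the `ExampleAdapter` and `layer1Coverage` helpers are all hypothetical.

```typescript
// Hypothetical types: only the three method names come from the ADR.
interface Feature {
  id: string;          // stable identifier within the tool's feature universe
  description: string;
}

interface FeedEntry {
  publishedAt: string; // ISO 8601 timestamp
  summary: string;
}

interface ToolAdapter {
  readonly slug: string;
  // Enumerate the tool's full feature universe.
  discover(): Promise<Feature[]>;
  // Return the ids of universe features that are wired into the codebase.
  checkImplementation(universe: Feature[]): Promise<string[]>;
  // Assumption: returns vendor change-feed entries; the ADR does not say.
  fetchFeed(): Promise<FeedEntry[]>;
}

// Toy in-memory adapter, used only to show that an integration can conform.
class ExampleAdapter implements ToolAdapter {
  readonly slug = "example-tool";

  async discover(): Promise<Feature[]> {
    return [
      { id: "reports.list", description: "List reports" },
      { id: "reports.export", description: "Export a report" },
    ];
  }

  async checkImplementation(universe: Feature[]): Promise<string[]> {
    const wired = ["reports.list"]; // stand-in for a real codebase scan
    return universe.map((f) => f.id).filter((id) => wired.indexOf(id) !== -1);
  }

  async fetchFeed(): Promise<FeedEntry[]> {
    return [];
  }
}

// Layer 1 can be derived from any adapter without knowing the tool behind it.
async function layer1Coverage(adapter: ToolAdapter): Promise<number> {
  const universe = await adapter.discover();
  if (universe.length === 0) {
    throw new Error("empty feature universe for " + adapter.slug);
  }
  const implemented = await adapter.checkImplementation(universe);
  return implemented.length / universe.length;
}
```

Because `layer1Coverage` depends only on the interface, adding integration N+1 means writing one new class and leaving the audit code untouched.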
Out of scope
- Issue #557: lands as a single row in Mem0’s implementation-backlog.md, not as a top-level TUF requirement.
- New vendors. TUF is built entirely on CF Workers + Workflows + Cron + D1 + Vectorize + KV + Slack + GitHub + Claude (via AI Gateway). Any new vendor needs its own ADR (doctrine D5).
- Tool selection. TUF measures coverage and utilization; it does not decide which tools to install.
Invariants (I1–I7)
- Two-layer scores are separate. Layer 2 reports N/A (incomplete) whenever Layer 1 < 100%. No blended single-number score.
- Adapter interface frozen at M1. Changes require an amendment ADR.
- Existing-stack-only. No new vendors, runtimes, or hosted services.
- Feature universe is the source of truth for Layer 1 denominators. Hand-edited supplements go in implementation-backlog.md; the universe file regenerates from discover() and is overwritten on each run.
- feature_calls writes are cold path. TUF never reads or writes D1 on the gateway hot path.
- 5 artifacts per tool, no more, no less. Adding a sixth needs an amendment ADR.
- Drift is a CI gate, not a doc problem. Mismatch between feature-universe.md and the adapter’s discover() output fails pre-commit (doctrine D6).
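The drift gate in the last invariant amounts to a two-way set difference between the committed universe file and what `discover()` reports. The sketch below assumes the universe file lists one feature id per "- " bullet line, which the ADR does not specify; `parseUniverseFile`, `detectDrift`, and `hasDrift` are hypothetical names.

```typescript
// Assumption: the universe markdown file holds one feature id per "- " bullet.
function parseUniverseFile(markdown: string): string[] {
  return markdown
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.startsWith("- "))
    .map((line) => line.slice(2).trim());
}

interface Drift {
  missingFromFile: string[]; // discovered by the adapter, absent from the file
  staleInFile: string[];     // listed in the file, no longer discovered
}

function detectDrift(fileIds: string[], discoveredIds: string[]): Drift {
  const missingFromFile = discoveredIds.filter((id) => fileIds.indexOf(id) === -1);
  const staleInFile = fileIds.filter((id) => discoveredIds.indexOf(id) === -1);
  return { missingFromFile, staleInFile };
}

function hasDrift(d: Drift): boolean {
  return d.missingFromFile.length > 0 || d.staleInFile.length > 0;
}

// Illustrative inputs: the file is one feature behind and one feature stale.
const committedFile = "- reports.list\n- reports.export\n";
const discovered = ["reports.list", "reports.schedule"];
const drift = detectDrift(parseUniverseFile(committedFile), discovered);
```

A pre-commit hook built on this would exit non-zero whenever `hasDrift(drift)` is true, and could print both lists so the author can see which side changed.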
Consequences
- Positive: surfaces hidden gaps (Hermes-style underutilization, SEMrush-style implementation gaps) as separate, addressable backlogs. Forces every new integration through the same shape, so audits are cheap to repeat.
- Positive: reuses CF Workflow / Cron / Vectorize patterns we already operate — zero new ops surface.
- Negative: doubles the doc-maintenance burden per integration (5 artifacts). Mitigated by tool-audit run regenerating 3 of the 5 from telemetry + discover().
- Negative: requires every existing integration to backfill a discover() implementation before its Layer 1 score is meaningful. Sequenced across M2–M7.
Reversal trigger
If, after M7 plus 30 days of runtime, fewer than 3 integrations have backfilled discover(), or no decision has been made on a flagged drift, the framework has failed adoption. Re-evaluate scope or sunset it.
References
- docs/projects/tool-utilization-framework/GOAL.md: frozen contract (problem, DONE checklist, OOS, invariants, adapter contract, milestones, stop conditions).
- docs/projects/tool-utilization-framework/working-context.md: operational state.
- docs/projects/tool-utilization-framework/status.md: engineering decision log.
- ~/.claude/rules/doctrine.md D3: two-layer measurement separation rule that TUF instantiates.