Tool Risk Ratings, Rationale Field, and Agent Feedback Tool
ADR-036: Tool Risk Ratings, Rationale Field, and Agent Feedback Tool
Status: Proposed
Date: 2026-05-01
Author: Claude Code (claude/quality-harness-and-knowledge-layer)
Supersedes: none
Related: ADR-024 (OAuth 2.1), ADR-030 (Universal MCP Access), ADR-034 (Harness), ADR-035 (KL+BF)
Plan: V5-QUALITY-HARNESS-AND-KNOWLEDGE-LAYER.md
Context
Three findings from production agent platforms (OpenAI, Ramp, Notion) and multiple early-V5 user reports point to the same gap: today V5 does not give calling agents (Claude Code, Cursor, ChatGPT, Codex, Hermes) enough structure to succeed reliably.
- OpenAI’s Practical Guide to Building Agents (2025), §Tool safeguards. Every tool should carry a risk rating (low / med / high). High-risk tools should automatically pause for confirmation or escalate to a human. V5 today executes aws_ses_send (sends real customer emails) on the same code path as ga4_report (read-only).
- Ramp’s “Designing for Agents” (Apr 2026, Teddy Riker). Ramp shipped two near-zero-cost upgrades to their MCP that materially improved agent success: (a) every tool requires a rationale parameter, used to reconstruct intent from chat traffic Ramp can’t see; (b) a standalone submit_feedback tool that calling agents invoke when blocked. They learn which product features to build directly from agent feedback. V5 today has neither.
- Notion’s MCP tool description pattern. notion-create-pages opens with: “For the complete Markdown specification, always first fetch the MCP resource at notion://docs/enhanced-markdown-spec. Do NOT guess or hallucinate Markdown syntax.” Every Notion-MCP-via-Claude write succeeds first time. V5’s tool descriptions today are generic, so agents routinely hallucinate property names on hubspot_crm, salesforce_crm, and meta_ads writes.
These three are independent fixes that share the same shape: small upgrades to every existing tool that compound dramatically across the 31-tool surface.
Decision
Four coordinated upgrades to the MCP tool surface. All four are designed to be additive for existing callers. The one exception is the required rationale parameter, whose rollout risk and grace period are covered under Consequences.
Decision 1 — rationale parameter required on every MCP tool
- Add to every Zod input schema in src/tools/*.ts:

    rationale: z.string().min(10).max(500).describe(
      "One-sentence explanation of why you are making this call. " +
      "Used by V5 to reconstruct intent for product analytics — never shown to end users. " +
      "Required."
    )

- Persist rationale to kv_audit (D1) on every call.
- Validation is enforced by the Zod schema itself: the field is not optional, so a missing rationale → VALIDATION_ERROR at the gateway. The existing .strict() invariant continues to reject undeclared keys.
- Cost. ~10 input tokens per call. Negligible. Rationale is logged but does not modify upstream behavior.
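The validation rule above can be sketched without Zod so that it runs standalone. This is only an illustration of the intended behavior, not the gateway code: validateToolInput, its allowedKeys parameter, and the error messages are hypothetical names introduced here. Only the 10 to 500 character bounds and the VALIDATION_ERROR code come from the decision.

```typescript
// Hypothetical stand-in for z.string().min(10).max(500) on a .strict() schema.
type ValidationResult =
  | { ok: true; rationale: string }
  | { ok: false; code: "VALIDATION_ERROR"; message: string };

const RATIONALE_MIN = 10;
const RATIONALE_MAX = 500;

function fail(message: string): ValidationResult {
  return { ok: false, code: "VALIDATION_ERROR", message };
}

function validateToolInput(
  input: Record<string, unknown>,
  allowedKeys: readonly string[], // assumed: the other keys the tool's schema declares
): ValidationResult {
  // .strict() equivalent: reject keys the schema does not declare.
  const unknownKeys = Object.keys(input).filter(
    (k) => k !== "rationale" && !allowedKeys.includes(k),
  );
  if (unknownKeys.length > 0) {
    return fail(`unknown keys: ${unknownKeys.join(", ")}`);
  }
  // Required-ness comes from the field itself, not from .strict().
  const rationale = input["rationale"];
  if (typeof rationale !== "string") {
    return fail("rationale is required and must be a string");
  }
  if (rationale.length < RATIONALE_MIN || rationale.length > RATIONALE_MAX) {
    return fail(`rationale must be ${RATIONALE_MIN}-${RATIONALE_MAX} characters`);
  }
  return { ok: true, rationale };
}
```

A call such as validateToolInput({ property: "123" }, ["property"]) is rejected because rationale is absent, while the same call with a one-sentence rationale passes.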
Decision 2 — submit_feedback MCP tool
- New tool, registered in src/handlers/mcp.ts.
- Inputs:

    {
      attempted_action: string;   // "Tried to create a HubSpot deal with custom property X"
      expected_outcome: string;   // "Deal created with property X populated"
      actual_outcome: string;     // "API returned error: property X does not exist on deal object"
      tool_called?: string;       // "hubspot_crm"
      error_seen?: string;        // raw error message
      severity: 'low' | 'med' | 'high';
      rationale: string;          // (required by Decision 1)
    }

- Persists to D1 agent_feedback (migration 0009_agent_feedback.sql).
- Admin endpoint GET /admin/feedback?since=&severity= for weekly triage.
- Mishaal triages weekly. Patterns become product features.
Decision 3 — Tool risk ratings + auto-approve gate
- New field on ProviderMeta (and per-tool override): risk: 'low' | 'medium' | 'high'.
- Tagging rules:
  - low: read-only operations (ga4_report, gsc_performance, search_knowledge, context_query, *_query for read-only intents).
  - medium: write operations on internal records (hubspot_crm upsert, salesforce_crm update, agent_state write).
  - high: financial / destructive / external-messaging operations (aws_ses_send, gmail with send, any *_mutate on advertising platforms, salesforce_crm delete, hubspot_crm delete).
- Execution gate: a high-risk tool call made without confirm: true, while tenant config auto_approve_high_risk = false (default false), does not execute and instead returns:

    {
      "requires_human_approval": true,
      "approval_id": "uuid-here",
      "approval_url": "/admin/approvals/uuid-here",
      "summary": "<rendered action summary>",
      "ttl_seconds": 86400
    }

- Pending approval persisted to KV pending_approval:{id} with 24h TTL.
- Admin endpoints: POST /admin/approvals/{id}/approve → executes the original call; POST /admin/approvals/{id}/reject → marks rejected.
- Tenant override: tenant_config:{tenant}.auto_approve_high_risk: true → high-risk calls execute directly. Documented as opt-in for advanced tenants only.
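The execution gate can be sketched as a pure function. This is an illustrative reading of the rule above, not the gateway implementation: riskGate, ToolCall, TenantConfig, and the injected newId generator are hypothetical names, and KV persistence of the pending approval is left out. The response fields and the 86400-second TTL come from the decision.

```typescript
type Risk = "low" | "medium" | "high";

interface ToolCall {
  tool: string;
  risk: Risk;
  confirm?: boolean;
  summary: string; // rendered action summary; assumed to be prepared by the caller
}

interface TenantConfig {
  auto_approve_high_risk: boolean; // defaults to false per the decision
}

interface ApprovalRequired {
  requires_human_approval: true;
  approval_id: string;
  approval_url: string;
  summary: string;
  ttl_seconds: number;
}

type GateResult = { execute: true } | ApprovalRequired;

const APPROVAL_TTL_SECONDS = 86400; // 24h, matching the pending_approval TTL

// newId is injected (for example a UUID generator) so the sketch stays dependency-free.
function riskGate(call: ToolCall, tenant: TenantConfig, newId: () => string): GateResult {
  const needsApproval =
    call.risk === "high" && call.confirm !== true && !tenant.auto_approve_high_risk;
  if (!needsApproval) return { execute: true };
  const id = newId();
  return {
    requires_human_approval: true,
    approval_id: id,
    approval_url: `/admin/approvals/${id}`,
    summary: call.summary,
    ttl_seconds: APPROVAL_TTL_SECONDS,
  };
}
```

Under this reading, low- and medium-risk calls always execute, and a high-risk call executes directly only when it carries confirm: true or the tenant has opted in.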
Decision 4 — Spec-first tool descriptions (Notion pattern)
- For every provider with non-trivial schema (HubSpot, Salesforce, Google Ads, Meta Ads, LinkedIn Ads, GA4): create one MCP Resource at mcp://{provider}/cheatsheet containing property names, association types, common query patterns, gotchas.
- Every affected tool description opens with: “Before any write or non-trivial query, fetch mcp://{provider}/cheatsheet first. Do NOT guess property names or association types.”
- Cheatsheets are markdown files in src/resources/cheatsheets/{provider}.md, registered as MCP Resources alongside tools.
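One possible way to keep the preamble, the resource URI, and the file path consistent is to derive all three from the provider name. The helper names below (cheatsheetUri, cheatsheetPath, withCheatsheetPreamble) are hypothetical and the provider slug format is an assumption. The URI pattern, file location, and preamble wording come from the decision.

```typescript
// Resource URI that agents are told to fetch first.
function cheatsheetUri(provider: string): string {
  return `mcp://${provider}/cheatsheet`;
}

// Location of the markdown source backing that resource.
function cheatsheetPath(provider: string): string {
  return `src/resources/cheatsheets/${provider}.md`;
}

// Prepends the spec-first instruction to an existing tool description.
function withCheatsheetPreamble(provider: string, description: string): string {
  const preamble =
    `Before any write or non-trivial query, fetch ${cheatsheetUri(provider)} first. ` +
    "Do NOT guess property names or association types.";
  return `${preamble} ${description}`;
}
```

Generating the preamble rather than hand-writing it per tool would make it harder for a description to drift from the registered resource name.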
Consequences
Positive
- rationale reconstructs intent. Even though V5 can’t see ChatGPT/Cursor/Codex chat content, rationales aggregate into clear product-discovery signal. Cluster the rationales weekly → see what calling agents are actually trying to do.
- submit_feedback is a direct line from calling agents to our roadmap. Ramp’s experience: agents are more specific and more consistent in feedback than human users. We get to the third decimal place of “what to build next” without surveying.
- Risk ratings prevent expensive accidents. A miscalibrated agent can no longer auto-fire an SES batch send to 10K subscribers without explicit confirmation.
- Spec-first descriptions cut the most common failure mode. “Agent hallucinated a property name and the call 4xx’d” should go from frequent to near-zero on the providers we cheatsheet.
- All four decisions compose. The Grader (ADR-034) reads rationale to condition its rubrics by category. Risk ratings inform Bridge abort thresholds (high-risk regressions abort faster). Feedback feeds the engineering pipeline directly.
Negative
- Breaking change risk for existing API callers: rationale becomes required. Mitigated by:
  - Two-week grace period: rationale optional with deprecation warning logged + Slack alert per offending caller.
  - Then enforced strictly.
  - Only non-MCP REST callers are affected; MCP callers (/mcp route) get the new schema automatically because they read schemas from the server.
- Per-call latency from cheatsheet fetches. Each mcp://*/cheatsheet resource read is one extra round-trip. Mitigated by aggressive caching (ETag + 1h TTL on the resource).
- Approval flow adds friction for legitimate batch operations. Mitigated by auto_approve_high_risk: true opt-in on a per-tenant basis.
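The two-week grace period could be expressed as a small time-based check, sketched here under stated assumptions. checkRationale, the rolloutStart parameter, and the warning text are hypothetical; the logging and Slack alert are represented only by the returned warning string.

```typescript
type RationaleCheck =
  | { accepted: true; warning?: string }
  | { accepted: false; code: "VALIDATION_ERROR" };

const GRACE_PERIOD_MS = 14 * 24 * 60 * 60 * 1000; // two weeks

// During the grace period a missing rationale is accepted with a warning;
// once the period has elapsed it is rejected.
function checkRationale(
  rationale: string | undefined,
  rolloutStart: Date, // assumed: the moment the new schema shipped
  now: Date,
): RationaleCheck {
  if (rationale !== undefined) return { accepted: true };
  const inGrace = now.getTime() - rolloutStart.getTime() < GRACE_PERIOD_MS;
  return inGrace
    ? { accepted: true, warning: "rationale will become required; please add it" }
    : { accepted: false, code: "VALIDATION_ERROR" };
}
```

Passing now as an argument instead of reading the clock inside the function keeps the cutoff behavior easy to test.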
Neutral
- New D1 tables agent_feedback and pending_approvals (or KV-only for the latter; TBD in implementation).
- Tool count: +1 (submit_feedback). Combined with ADR-035’s +3, this brings the 31-tool surface to the 35-tool ceiling.
- kv_audit schema gains a rationale column. Migration 0011_kv_audit_rationale.sql.
Alternatives considered
- Skip rationale, infer intent from tool sequence. Rejected: too lossy. Ramp’s quoted experience (“rationales reconstruct intent we can’t see”) is the empirical case for requiring it.
- Skip risk ratings, rely on per-tool human-approval flags ad hoc. Rejected: already failing today (we have no flags). Categorical taxonomy is cleaner than per-tool flags.
- Make rationale optional. Rejected: optional fields are silently skipped. Required is the only enforcement that survives.
- Use OpenAI’s standard “tool risk” parameter (if it exists in MCP spec). MCP spec 2025-11-25 does not currently surface a standardized risk field. We use the V5-internal pattern; if MCP spec adds one in 2026, migrate.
- Build cheatsheets dynamically (LLM-generated from API docs). Rejected for v1: manual cheatsheets are higher quality and only need to be written once per provider. Revisit if cheatsheets get out of sync with provider versions.
- Approval flow via Slack interactive message instead of admin endpoint. Acceptable enhancement. Phase 2 ships the admin endpoint; Slack interactivity can be added in a follow-up if Mishaal wants approve-from-mobile.
Reversal criteria
- rationale regret signal: <5% of calls have meaningful rationales after 30 days, AND clustering rationales produces no actionable product insights. Then make rationale optional and downgrade to kv_audit-only logging.
- submit_feedback regret signal: <10 unique feedback submissions per month after 90 days. Then deprecate the tool (don’t delete the table).
- Risk-rating regret signal: >50% of high-risk calls trigger the approval flow but get unanimously approved within 5 min. Then either re-tier those tools or default auto_approve_high_risk: true and document.
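The three thresholds can be written down as a single check, which may help keep the reversal review mechanical. This is a sketch only: the Metrics shape and regretSignals are hypothetical, the metrics are assumed to be computed elsewhere over the stated 30- and 90-day windows, and fastApprovedShare is one interpretation of the risk-rating criterion (the share of high-risk calls that were gated and then approved within 5 minutes).

```typescript
interface Metrics {
  meaningfulRationaleShare: number; // 0..1, measured after 30 days
  rationaleInsightsFound: boolean;  // did clustering yield actionable insights?
  uniqueFeedbackPerMonth: number;   // measured after 90 days
  fastApprovedShare: number;        // 0..1, assumed interpretation (see lead-in)
}

// Returns the names of the decisions whose regret signal has fired.
function regretSignals(m: Metrics): string[] {
  const fired: string[] = [];
  if (m.meaningfulRationaleShare < 0.05 && !m.rationaleInsightsFound) {
    fired.push("rationale");
  }
  if (m.uniqueFeedbackPerMonth < 10) {
    fired.push("submit_feedback");
  }
  if (m.fastApprovedShare > 0.5) {
    fired.push("risk_rating");
  }
  return fired;
}
```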
Acceptance
Mishaal Murawala approves all four decisions and authorizes Phase 1 + Phase 2 implementation upon plan-first PR merge.