ADR-031: Hermes Agent Layer — Two-Track Architecture
Status: Superseded by ADR-057 (2026-05-19 cutover) — Nous Hermes archived 2026-05-15; Hermes V3 is a headless claude -p wrapper at ~/.hermes-v3/. See ADR-057.
Date: 2026-04-27
Author: Claude Code (claude/suspicious-brahmagupta-f07de4)
Supersedes: —
Superseded by: ADR-057
Related: ADR-016 (Context Plane), ADR-024 (OAuth 2.1), ADR-027 (LLM routing)
Context
Ascend GTM needs a natural-language agent interface that exposes V5’s 28 MCP tools to (a) Mishaal personally for GTM operations and (b) clients for a co-pilot product. Three options were evaluated:
- Hermes Agent (Nous Research, v0.11.0): open-source self-hosted agent runtime. Ships with multi-platform messaging (Telegram, Slack, WhatsApp), persistent memory, self-improving skills, and native Hindsight memory provider support.
- CF Agents SDK native: Cloudflare’s McpAgent / Agents SDK (Project Think, GA April 2026). Stateful via Durable Objects + SQLite, multi-tenant by design, WebSocket real-time, embeds natively into V5.
- Anthropic Routines: scheduled one-shot Claude reasoning tasks. Not conversation-stateful. Orthogonal to both options above.
A fourth option (custom UI built from scratch) was rejected as unnecessary given the above.
Decision
Two-track architecture. Neither track replaces the other.
Track A — Hermes for Mishaal (personal agent, single-tenant)
Hermes Agent deployed at ~/hermes-personal/ (Mac-local, isolated from repo). Connects to V5 /mcp as its sole tool source. LLM inference routes through V5’s new /v1/chat/completions passthrough (see below), preserving invariant #12. Hindsight memory provider configured with bank_id: mishaal (existing bank). Full-trust single-user deployment — intentionally single-tenant per Hermes SECURITY.md design.
Purpose: Mishaal’s personal GTM operations assistant. Also serves as the learning environment that validates skill designs and interaction patterns before they inform Track B.
Track B — CF Agents SDK for client-facing product (ascend-agent-worker)
New third plane: ascend-agent-worker. This Worker provides multi-tenant stateful agent sessions for clients via DO-backed conversation state, exposes /v1/agents/{tenant}/chat, and delivers responses via Slack/Telegram adapters. LLM calls route through V5’s AI Gateway (invariant #12 preserved). Hindsight memory via existing MCP binding, per-client bank_id.
Purpose: Client-facing GTM co-pilot product. Native multi-tenant with cryptographic isolation via V5’s existing auth model.
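To make the tenant-scoped route concrete, here is a minimal sketch of how /v1/agents/{tenant}/chat might be parsed before a session is looked up. The helper name parseAgentChatPath and the tenant slug pattern are assumptions made for this illustration; the ADR does not specify either.

```typescript
// Hypothetical helper: extracts the tenant segment from /v1/agents/{tenant}/chat.
// The slug pattern (lowercase letters, digits, hyphens) is an assumption,
// not something this ADR defines.
const AGENT_CHAT_PATH = /^\/v1\/agents\/([a-z0-9][a-z0-9-]*)\/chat\/?$/;

function parseAgentChatPath(pathname: string): string | null {
  const match = AGENT_CHAT_PATH.exec(pathname);
  // Any path that does not match exactly is rejected rather than guessed at.
  return match ? match[1] : null;
}
```

A caller would presumably also compare the returned tenant against the tenant resolved from the Bearer token before touching any session state, so that the path alone never selects a tenant.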
V5 LLM Passthrough (prerequisite for both tracks)
New endpoint: POST /v1/chat/completions in the gateway Worker.
- Accepts OpenAI-compatible request format (Hermes’ default API format)
- Authenticates via standard V5 Bearer token (same KV hash lookup as /mcp)
- Routes to AI Gateway → Anthropic (invariant #12 preserved)
- Includes tenant_id in cf-aig-metadata for per-tenant cost attribution
- Returns OpenAI-compatible response (Hermes consumes this natively)
This endpoint means ALL Hermes LLM calls flow through V5’s AI Gateway, giving full cost observability per tenant.
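The request-preparation steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the gateway's actual code: the function names are hypothetical, lookupTenant stands in for the KV hash lookup shared with /mcp (shown synchronous for brevity, whereas a real KV read is asynchronous), and provider credentials and the upstream call itself are left out.

```typescript
// Hypothetical sketch of the passthrough's request preparation.
// extractBearerToken, buildGatewayHeaders and prepareUpstream are
// illustrative names, not identifiers from the V5 codebase.

// Pulls the token out of an "Authorization: Bearer <token>" header value.
function extractBearerToken(authorization: string | null): string | null {
  if (authorization === null) return null;
  const match = /^Bearer\s+(\S+)$/i.exec(authorization.trim());
  return match ? match[1] : null;
}

// Builds headers for the upstream AI Gateway call. cf-aig-metadata carries
// a JSON object; tenant_id is the key named in this ADR.
function buildGatewayHeaders(tenantId: string): Record<string, string> {
  return {
    "content-type": "application/json",
    "cf-aig-metadata": JSON.stringify({ tenant_id: tenantId }),
  };
}

// Resolves a request to either a 401 or the headers to forward with.
// lookupTenant is a stand-in for the KV hash lookup (assumption).
function prepareUpstream(
  authorization: string | null,
  lookupTenant: (token: string) => string | null,
): { status: 401 } | { status: 200; headers: Record<string, string> } {
  const token = extractBearerToken(authorization);
  const tenantId = token === null ? null : lookupTenant(token);
  if (tenantId === null) return { status: 401 };
  return { status: 200, headers: buildGatewayHeaders(tenantId) };
}
```

Keeping the tenant resolution separate from header construction means the metadata header can only ever be built from a tenant that passed authentication, which is what makes the per-tenant cost attribution trustworthy.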
Invariant compliance
| Invariant | Track A | Track B | LLM Passthrough |
|---|---|---|---|
| #1 Two-plane | ⚠️ Hermes is external, not a plane | ✅ New plane, this ADR authorizes it | ✅ Adds to existing gateway |
| #2 KV-only hot path | ✅ V5 handles this | ✅ V5 handles this | ✅ Auth is KV lookup only |
| #3 Fail-fast | ✅ | ✅ | ✅ AbortController 30s |
| #4 OAuth 2.1 on MCP | ✅ Hermes uses Bearer | ✅ | N/A |
| #5 No external token vendors | ✅ | ✅ | ✅ |
| #6 Request path never touches DO | ✅ | ✅ Sessions use DO but not on hot path | ✅ |
| #7 ≤35 MCP tools | N/A | N/A | N/A |
| #10 ≤10ms overhead | ✅ V5 handles this | ✅ | ✅ |
| #11 30s timeout | ✅ | ✅ | ✅ AbortController |
| #12 All LLM via AI Gateway | ✅ Via /v1 passthrough | ✅ Native routing | ✅ This IS the fix |
| #13 Admin behind CF Access | ✅ | ✅ | ✅ /v1 is NOT admin |
| #15 KV/D1/Vectorize/R2/GitHub as sources | ✅ | ✅ | ✅ |
Invariant #1 note: ascend-agent-worker as a third plane requires this ADR explicitly (per invariant #1 text: “A third plane is forbidden without an ADR”). This ADR is that authorization.
Rejected alternatives
Hermes for clients (single Hermes for both tracks): Rejected. Hermes SECURITY.md explicitly states a single-tenant design. Profile-based isolation on a shared VPS is configuration-level, not cryptographic. V5’s client data isolation is a contractual obligation, so it requires cryptographic guarantees. Hermes’ AGENTS.md/CLAUDE.md auto-discovery would also read V5 infrastructure details into client-facing agent context if deployed near the repo.
CF Agents SDK for both (skip Hermes entirely): Rejected. Track A (Mishaal personal) would lose Hermes’ multi-platform messaging, skills runtime, and Hindsight provider — all of which are valuable for the single-user case. Track A also serves as the proving ground for Track B’s skill designs.
Custom UI from scratch: Rejected. Both tracks can be delivered with existing tools. A custom UI would be a distraction until interaction patterns have been validated.
Consequences
- Hermes is NOT a component of V5 — it’s a consumer of V5’s MCP and LLM passthrough endpoints.
- ascend-agent-worker is a new CF Worker in this repo (agent-worker/ directory), deployed separately from the gateway.
- The LLM passthrough creates a new route group /v1/ in the gateway. CORS: permissive (same as /api/*).
- Track A ships first (~4h). Track B ships after Track A validates skill patterns (~8-10h additional).
- Anthropic Routines remain the mechanism for scheduled one-shot Claude reasoning tasks (daily briefings, weekly summaries). They are orthogonal to both tracks and are not replaced.
- Hindsight bank governance: per-client banks ({client}-agent) are isolated; ascend-gtm-playbook is a shared read-only bank curated by Mishaal.
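The bank governance rule in the last bullet could be expressed as a small access check along these lines. This is a sketch only: the function and type names are hypothetical, and treating a client's own bank as read-write is an assumption, since the ADR states only that per-client banks are isolated and that the playbook bank is read-only.

```typescript
// Hypothetical access model for Hindsight banks (names are illustrative).
type BankAccess = "read-write" | "read-only" | "none";

// Shared bank named in this ADR.
const SHARED_PLAYBOOK_BANK = "ascend-gtm-playbook";

// Per-client bank naming from this ADR: {client}-agent.
function clientBankId(client: string): string {
  return `${client}-agent`;
}

// Decides what a given client may do with a given bank.
// Assumption: a client has read-write access to its own bank.
function bankAccessFor(client: string, bankId: string): BankAccess {
  if (bankId === clientBankId(client)) return "read-write";
  if (bankId === SHARED_PLAYBOOK_BANK) return "read-only";
  // Everything else, including other clients' banks, is denied by default.
  return "none";
}
```

Defaulting to "none" keeps the check deny-by-default, so a newly created bank is inaccessible to clients until a rule is added for it.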
Reversal criteria
This ADR is reversed and Track B is shut down if:
- CF Agents SDK’s multi-turn stateful sessions cannot maintain Hindsight context injection with <200ms overhead
- ascend-agent-worker’s DO-backed session cost exceeds $50/mo at 5 clients with 20 conversations/day
- Tenant isolation audit finds cross-tenant memory bleed in Hindsight banks
Track A (Hermes personal) is indefinitely retained regardless — it’s a personal productivity tool with no client data involved.
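The three Track B criteria above can be written as a simple check, together with the per-conversation budget that the cost threshold implies. This is a sketch under assumptions: the metric names are hypothetical, any single criterion is treated as sufficient to trigger reversal, "20 conversations/day" is read as per client, and a month is taken as 30 days. Under that reading the threshold is 5 × 20 × 30 = 3,000 conversations per month, or roughly $0.017 per conversation.

```typescript
// Hypothetical metrics shape; field names are not from the ADR.
interface TrackBMetrics {
  hindsightInjectionOverheadMs: number; // measured overhead of context injection
  monthlySessionCostUsd: number;        // DO-backed session cost at the reference load
  crossTenantBleedFound: boolean;       // result of the tenant isolation audit
}

// Returns the criteria that are currently tripped (empty array = keep Track B).
// Assumption: any one tripped criterion is enough to reverse.
function reversalReasons(m: TrackBMetrics): string[] {
  const reasons: string[] = [];
  if (m.hindsightInjectionOverheadMs >= 200) reasons.push("hindsight-overhead");
  if (m.monthlySessionCostUsd > 50) reasons.push("session-cost");
  if (m.crossTenantBleedFound) reasons.push("tenant-bleed");
  return reasons;
}

// Per-conversation budget implied by the cost criterion.
// Defaults assume 20 conversations/day per client and a 30-day month.
function costBudgetPerConversationUsd(
  clients = 5,
  conversationsPerClientPerDay = 20,
  daysPerMonth = 30,
  monthlyBudgetUsd = 50,
): number {
  return monthlyBudgetUsd / (clients * conversationsPerClientPerDay * daysPerMonth);
}
```

If the 20 conversations/day figure is instead a total across all clients, the monthly volume drops to 600 and the per-conversation budget rises accordingly; passing clients = 1 to the helper models that reading.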