ADR-031: Hermes Agent Layer — Two-Track Architecture

Status: Superseded by ADR-057 (2026-05-19 cutover). Nous Hermes archived 2026-05-15; Hermes V3 is a headless claude -p wrapper at ~/.hermes-v3/. See ADR-057.
Date: 2026-04-27
Author: Claude Code (claude/suspicious-brahmagupta-f07de4)
Supersedes:
Superseded by: ADR-057
Related: ADR-016 (Context Plane), ADR-024 (OAuth 2.1), ADR-027 (LLM routing)


Context

Ascend GTM needs a natural-language agent interface that exposes V5’s 28 MCP tools to (a) Mishaal personally for GTM operations and (b) clients for a co-pilot product. Three options were evaluated:

  1. Hermes Agent (Nous Research, v0.11.0) — open-source self-hosted agent runtime. Ships with multi-platform messaging (Telegram, Slack, WhatsApp), persistent memory, self-improving skills, and native Hindsight memory provider support.

  2. CF Agents SDK native — Cloudflare’s McpAgent / Agents SDK (Project Think, GA April 2026). Stateful via Durable Objects + SQLite, multi-tenant by design, WebSocket real-time, embeds natively into V5.

  3. Anthropic Routines — scheduled one-shot Claude reasoning tasks. Not conversation-stateful. Orthogonal to both options above.

A fourth option (custom UI built from scratch) was rejected as unnecessary given the above.


Decision

Two-track architecture. Neither track replaces the other.

Track A — Hermes for Mishaal (personal agent, single-tenant)

Hermes Agent deployed at ~/hermes-personal/ (Mac-local, isolated from repo). Connects to V5 /mcp as its sole tool source. LLM inference routes through V5’s new /v1/chat/completions passthrough (see below), preserving invariant #12. Hindsight memory provider configured with bank_id: mishaal (existing bank). Full-trust single-user deployment — intentionally single-tenant per Hermes SECURITY.md design.

Purpose: Mishaal’s personal GTM operations assistant. Also serves as the learning environment that validates skill designs and interaction patterns before they inform Track B.

Track B — CF Agents SDK for client-facing product (ascend-agent-worker)

New third plane: ascend-agent-worker. This Worker provides multi-tenant stateful agent sessions for clients via DO-backed conversation state, exposes /v1/agents/{tenant}/chat, and delivers responses via Slack/Telegram adapters. LLM calls route through V5’s AI Gateway (invariant #12 preserved). Hindsight memory via existing MCP binding, per-client bank_id.

Purpose: Client-facing GTM co-pilot product. Native multi-tenant with cryptographic isolation via V5’s existing auth model.
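The tenant-scoped route can be illustrated with a small TypeScript sketch. It parses /v1/agents/{tenant}/chat, derives a per-tenant session object name, and checks that the tenant in the path matches the tenant resolved from the caller's token. The function names, the tenant slug pattern, and the "agent-session:" naming scheme are assumptions made for this example, not details specified by this ADR.

```typescript
// Illustrative only: names, slug rules, and the DO naming scheme are assumptions.
interface ChatRoute {
  tenant: string;
  sessionObjectName: string; // hypothetical Durable Object name, one per tenant
}

// Assumed slug format: lowercase alphanumerics and hyphens, up to 63 characters.
const CHAT_ROUTE = /^\/v1\/agents\/([a-z0-9][a-z0-9-]{0,62})\/chat$/;

function parseChatRoute(pathname: string): ChatRoute | null {
  const match = CHAT_ROUTE.exec(pathname);
  if (match === null) return null;
  const tenant = match[1];
  return { tenant, sessionObjectName: `agent-session:${tenant}` };
}

// The path alone must never select a tenant: the tenant resolved from the
// caller's token (V5's existing auth model) has to agree with it.
function isTenantAuthorized(routeTenant: string, tokenTenant: string): boolean {
  return routeTenant === tokenTenant;
}
```

A request for /v1/agents/acme/chat made with a token that resolves to a different tenant would be rejected before any session state is touched.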

V5 LLM Passthrough (prerequisite for both tracks)

New endpoint: POST /v1/chat/completions in the gateway Worker.

  • Accepts OpenAI-compatible request format (Hermes’ default API format)
  • Authenticates via standard V5 Bearer token (same KV hash lookup as /mcp)
  • Routes to AI Gateway → Anthropic (invariant #12 preserved)
  • Includes tenant_id in cf-aig-metadata for per-tenant cost attribution
  • Returns OpenAI-compatible response (Hermes consumes this natively)

This endpoint means ALL Hermes LLM calls flow through V5’s AI Gateway, giving full cost observability per tenant.
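As a rough illustration of the passthrough's request handling, the TypeScript sketch below covers three pieces the bullets describe: extracting the Bearer token, attaching tenant_id as cf-aig-metadata, and bounding the upstream call with an AbortController. The function names and the callback-based timeout wrapper are assumptions made for the example. The KV hash lookup and the translation between OpenAI and Anthropic formats are left out.

```typescript
// Illustrative only: helper names are assumptions, not the gateway's actual code.
const UPSTREAM_TIMEOUT_MS = 30_000; // 30s limit, per invariants #3 and #11

// Returns the token from an "Authorization: Bearer <token>" header, or null.
function extractBearer(authorization: string | null): string | null {
  if (authorization === null) return null;
  const match = /^Bearer\s+(\S+)$/i.exec(authorization);
  return match === null ? null : match[1];
}

// Headers for the AI Gateway call. cf-aig-metadata carries a JSON object,
// which is where tenant_id is attached for per-tenant cost attribution.
function buildGatewayHeaders(tenantId: string): Record<string, string> {
  return {
    "content-type": "application/json",
    "cf-aig-metadata": JSON.stringify({ tenant_id: tenantId }),
  };
}

// Runs an upstream call with an abort signal that fires after `ms`.
// The callback must pass the signal to fetch for the abort to take effect.
async function withTimeout<T>(
  run: (signal: AbortSignal) => Promise<T>,
  ms: number = UPSTREAM_TIMEOUT_MS,
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  try {
    return await run(controller.signal);
  } finally {
    clearTimeout(timer);
  }
}
```

In a handler, the token from extractBearer would go through the same KV hash lookup as /mcp to obtain the tenant, and the upstream fetch would be wrapped as withTimeout((signal) => fetch(url, { method: "POST", headers, body, signal })), where url, headers, and body are placeholders.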


Invariant compliance

Invariant | Track A | Track B | LLM Passthrough
#1 Two-plane | ⚠️ Hermes is external, not a plane | ✅ New plane, this ADR authorizes it | ✅ Adds to existing gateway
#2 KV-only hot path | ✅ V5 handles this | ✅ V5 handles this | ✅ Auth is KV lookup only
#3 Fail-fast | ✅ AbortController 30s
#4 OAuth 2.1 on MCP | ✅ Hermes uses Bearer | N/A
#5 No external token vendors
#6 Request path never touches DO | ✅ Sessions use DO but not on hot path
#7 ≤35 MCP tools | N/A | N/A | N/A
#10 ≤10ms overhead | ✅ V5 handles this
#11 30s timeout | ✅ AbortController
#12 All LLM via AI Gateway | ✅ Via /v1 passthrough | ✅ Native routing | ✅ This IS the fix
#13 Admin behind CF Access | ✅ /v1 is NOT admin
#15 KV/D1/Vectorize/R2/GitHub as sources

Invariant #1 note: ascend-agent-worker as a third plane requires this ADR explicitly (per invariant #1 text: “A third plane is forbidden without an ADR”). This ADR is that authorization.


Rejected alternatives

Hermes for clients (single Hermes for both tracks): Rejected. Hermes SECURITY.md explicitly states a single-tenant design. Profile-based isolation on a shared VPS is configuration-level, not cryptographic. V5’s client data isolation is a contractual obligation and requires cryptographic guarantees. In addition, Hermes’ AGENTS.md/CLAUDE.md auto-discovery would read V5 infrastructure details into the client-facing agent context if Hermes were deployed near the repo.

CF Agents SDK for both (skip Hermes entirely): Rejected. Track A (Mishaal personal) would lose Hermes’ multi-platform messaging, skills runtime, and Hindsight provider — all of which are valuable for the single-user case. Track A also serves as the proving ground for Track B’s skill designs.

Custom UI from scratch: Rejected. Both tracks can be built with existing tools. A custom UI is a distraction until interaction patterns have been validated.


Consequences

  • Hermes is NOT a component of V5 — it’s a consumer of V5’s MCP and LLM passthrough endpoints.
  • ascend-agent-worker is a new CF Worker in this repo (agent-worker/ directory), deployed separately from the gateway.
  • The LLM passthrough creates a new route group /v1/ in the gateway. CORS: permissive (same as /api/*).
  • Track A ships first (~4h). Track B ships after Track A validates skill patterns (~8-10h additional).
  • Anthropic Routines remain the mechanism for scheduled one-shot Claude reasoning tasks (daily briefings, weekly summaries). They are orthogonal to both tracks and are not replaced.
  • Hindsight bank governance: per-client banks ({client}-agent) are isolated; ascend-gtm-playbook is a shared read-only bank curated by Mishaal.
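The bank governance rule in the last bullet can be sketched as a small access check in TypeScript. The function names and the read/write operation type are assumptions made for illustration. Only the {client}-agent naming pattern and the ascend-gtm-playbook bank come from this ADR, and the curator's own write path is not modeled here.

```typescript
// Illustrative only: function names and the BankOp type are assumptions.
const SHARED_PLAYBOOK_BANK = "ascend-gtm-playbook";

type BankOp = "read" | "write";

// Per-client bank naming, following the {client}-agent pattern.
function clientBankId(client: string): string {
  return `${client}-agent`;
}

// A client agent may read and write its own bank, may only read the shared
// playbook bank, and may not touch any other bank.
function isBankAccessAllowed(client: string, bankId: string, op: BankOp): boolean {
  if (bankId === clientBankId(client)) return true;
  if (bankId === SHARED_PLAYBOOK_BANK) return op === "read";
  return false;
}
```

Under this sketch, a client called acme can write to acme-agent and read ascend-gtm-playbook, but any access to another client's bank is denied.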

Reversal criteria

This ADR is reversed and Track B is shut down if:

  • CF Agents SDK’s multi-turn stateful sessions cannot maintain Hindsight context injection with <200ms overhead
  • ascend-agent-worker’s DO-backed session cost exceeds $50/mo at 5 clients with 20 conversations/day
  • Tenant isolation audit finds cross-tenant memory bleed in Hindsight banks

Track A (Hermes personal) is indefinitely retained regardless — it’s a personal productivity tool with no client data involved.
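The Track B reversal criteria could be checked mechanically along the following lines. This TypeScript sketch treats any single criterion as sufficient, which is one reading of the list above. The metric field names are hypothetical, and the cost figure is assumed to be measured at the reference load of 5 clients with 20 conversations/day.

```typescript
// Illustrative only: field names are assumptions; thresholds come from the ADR.
interface TrackBMetrics {
  hindsightInjectionOverheadMs: number; // added latency per turn
  monthlySessionCostUsd: number; // DO-backed session cost at the reference load
  crossTenantBleedFound: boolean; // outcome of the tenant isolation audit
}

// Returns the list of criteria that are currently violated (empty if none).
function reversalReasons(m: TrackBMetrics): string[] {
  const reasons: string[] = [];
  if (m.hindsightInjectionOverheadMs >= 200) {
    reasons.push("Hindsight context injection overhead is not under 200ms");
  }
  if (m.monthlySessionCostUsd > 50) {
    reasons.push("DO-backed session cost exceeds $50/mo");
  }
  if (m.crossTenantBleedFound) {
    reasons.push("cross-tenant memory bleed found in Hindsight banks");
  }
  return reasons;
}
```

For scale, the reference load is 100 conversations per day. Assuming a 30-day month, the $50/mo ceiling works out to roughly 1.7 cents per conversation.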