Tenant Knowledge Layer + Brand Foundation (KBL + BF)

ADR-035: Tenant Knowledge Layer + Brand Foundation (KBL + BF)

Status: Proposed
Date: 2026-05-01
Author: Claude Code (claude/quality-harness-and-knowledge-layer)
Supersedes: none
Related: ADR-016 (Context Plane), ADR-028 (BGE-M3 embeddings), ADR-031 (Hermes), ADR-034 (Harness)
Plan: V5-QUALITY-HARNESS-AND-KNOWLEDGE-LAYER.md

Context

Today V5 has three persistent knowledge stores:

KV — config, tokens, tenant metadata. Hot path.
Vectorize (client-knowledge, 768-dim cosine) — RAG-style embedding index over client docs, queried via search_knowledge.
D1 entity/fact tables (context-worker) — structured GTM knowledge (accounts, opps, signals).

Every customer who connects to V5 gets a clean-slate agent for content generation. Their voice rules, banned phrases, positioning, ICP, internal terminology — none of it is encoded anywhere V5 reads. Every conversation starts cold. Every output drifts toward generic AI prose.

Two empirical findings shift the math:

Karpathy (Apr 2026): at ~100 articles / ~400K words, a compiled markdown wiki + index.md outperforms RAG for Q&A. Graphify measured 71.5× fewer tokens per query vs raw-file search. Most client knowledge bases live exactly in this regime. RAG is overkill.
Shann Holmberg (Apr 2026): the highest-leverage layer is not the dynamic knowledge base (KBL) — it’s the static, human-edited Brand Foundation (BF). Voice rules. Banned words. Positioning. ICP. The agents read it before producing anything but never rewrite it. This is what keeps output sounding like the customer.

The combination of KBL (LLM-maintained) and BF (human-maintained) is the architecture validated by Shann’s retainer service ($1.5K-$3K setup plus $300-$500/mo). There is a productizable wedge here for V5 directly.

Decision

Add a per-tenant Knowledge Layer to V5, structured as the two layers Shann describes, stored in primitives V5 already operates.

Layer 1 — Brand Foundation (static, human-edited)

Storage. KV key brand_foundation:{tenant}. Schema:

interface BrandFoundation {
  voice_rules: string[];           // "Never use exclamation marks", "Use 'we' not 'I'"
  banned_phrases: string[];        // "best in class", "synergize"
  banned_words: string[];          // "innovative", "leverage"
  positioning: string;             // 1-3 paragraphs
  icp_summary: string;             // 1 paragraph
  banned_competitors_to_mention: string[]; // competitors that output must never name
  house_style_examples: string[];  // 2-5 short examples of "this is how we sound"
  updated_at: string;              // ISO
  updated_by: string;              // email or service principal
}

Mutation. Admin endpoints GET/PUT /admin/brand-foundation/{tenant}. CF Access gated. No tool can mutate BF — only humans (or the human-equivalent service principal).
Reads. Every content-generating tool (claude, gemini_invoke, llm_invoke, aws_bedrock_invoke, gamma_generate) reads BF at call time and injects it as a system-prompt suffix. Read budget ≤1ms (KV).
Default behavior. New tenants get an empty BF object ({voice_rules: [], ...}). Tools fall back to no BF suffix when fields are empty. Never blocks a call.
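The read path above can be illustrated with a minimal sketch. It assumes the caller has already fetched the raw KV string for the tenant; the helper names (bfKey, buildBfSuffix, suffixFromKvValue) and the suffix wording are hypothetical, not part of V5.

```typescript
// Mirrors the BrandFoundation schema above; every field is optional here
// because new tenants start with an empty object. (Hypothetical helper type.)
interface BrandFoundationLike {
  voice_rules?: string[];
  banned_phrases?: string[];
  banned_words?: string[];
  positioning?: string;
  icp_summary?: string;
  banned_competitors_to_mention?: string[];
  house_style_examples?: string[];
}

// KV key for a tenant's Brand Foundation, as specified under Storage.
function bfKey(tenant: string): string {
  return `brand_foundation:${tenant}`;
}

// Render one labelled bullet section, or nothing when the list is empty.
function bfSection(title: string, items: string[] | undefined): string[] {
  if (!items || items.length === 0) return [];
  return [`${title}:`].concat(items.map((item) => `- ${item}`));
}

// Build the system-prompt suffix; returns "" when every field is empty.
function buildBfSuffix(bf: BrandFoundationLike): string {
  let lines: string[] = [];
  lines = lines.concat(bfSection("Voice rules", bf.voice_rules));
  lines = lines.concat(bfSection("Banned phrases", bf.banned_phrases));
  lines = lines.concat(bfSection("Banned words", bf.banned_words));
  lines = lines.concat(bfSection("Do not mention these competitors", bf.banned_competitors_to_mention));
  lines = lines.concat(bfSection("House style examples", bf.house_style_examples));
  if (bf.positioning) lines.push("Positioning:", bf.positioning);
  if (bf.icp_summary) lines.push("ICP:", bf.icp_summary);
  return lines.length > 0
    ? ["", "Brand Foundation (follow strictly):"].concat(lines).join("\n")
    : "";
}

// `raw` is the string (or null) that the KV read returned for bfKey(tenant).
// Any parse failure degrades to "no suffix" so the tool call is never blocked.
function suffixFromKvValue(raw: string | null): string {
  if (!raw) return "";
  try {
    return buildBfSuffix(JSON.parse(raw) as BrandFoundationLike);
  } catch {
    return "";
  }
}
```

A content-generating tool would append the returned string to its system prompt; an empty string leaves the prompt unchanged, which matches the "never blocks a call" default.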

Layer 2 — Knowledge Base Layer (dynamic, LLM-maintained)

Storage. R2 prefix wiki/{tenant}/. Markdown files. Layout:

wiki/{tenant}/
  raw/                # immutable source dump (LLM never modifies)
    clippings/
    ideas/
    bookmarks/
    articles/
    papers/
    transcripts/
  wiki/
    index.md          # master index with TLDRs (LLM-maintained)
    log.md            # append-only changelog
    concepts/         # ideas, frameworks, topics
    entities/         # people, companies, tools
    sources/          # one summary per raw source
    outputs/          # filed answers to client_wiki_query calls
    syntheses/        # cross-cutting analyses

Why R2 not D1. Wikis are read-heavy and versioned git-style. R2 cost-per-GB is two orders of magnitude lower than D1. Wikis are intentionally NOT in the hot request path: the access pattern is “calling agent invokes client_wiki_query tool, tool reads R2, returns synthesized answer.” Cold path. Invariant 2 preserved.
Why not Vectorize. Vectorize remains for embedded fact retrieval (the existing search_knowledge tool). The wiki is a parallel, complementary artifact. Karpathy’s data shows compiled wikis beat RAG at sub-300-page scale; we add qmd (BM25 + vector + LLM rerank) as an opt-in tool for tenants whose wiki crosses 300 pages.
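As a sketch of how the R2 layout and the 300-page threshold might be expressed in code: the key shapes follow the directory tree above, while the helper names, the slug rule, and the strict "more than 300 pages" comparison are assumptions made for illustration.

```typescript
// Page categories under wiki/{tenant}/wiki/, taken from the layout above.
type WikiSection = "concepts" | "entities" | "sources" | "outputs" | "syntheses";
// Source categories under wiki/{tenant}/raw/.
type RawKind = "clippings" | "ideas" | "bookmarks" | "articles" | "papers" | "transcripts";

// Turn a page title into a filename-safe slug (hypothetical naming rule).
function slugify(title: string): string {
  return title.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}

// R2 key for an LLM-maintained wiki page.
function wikiPageKey(tenant: string, section: WikiSection, title: string): string {
  return `wiki/${tenant}/wiki/${section}/${slugify(title)}.md`;
}

// R2 key for an immutable raw source file.
function rawSourceKey(tenant: string, kind: RawKind, filename: string): string {
  return `wiki/${tenant}/raw/${kind}/${filename}`;
}

// Index-first retrieval by default; qmd only for opted-in tenants past 300 pages.
function retrievalMode(pageCount: number, qmdOptIn: boolean): "index" | "qmd" {
  return qmdOptIn && pageCount > 300 ? "qmd" : "index";
}
```

Keeping key construction in one place makes it harder for a tool to write outside its tenant prefix by accident.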

Tools (4 net new)

client_wiki_ingest — takes a source (URL / raw text / R2 file ID), classifies it, summarizes it, writes a sources/ page, updates index.md, updates relevant entities/ and concepts/ pages, appends log.md. One call typically touches 10-15 wiki pages. Risk: medium.
client_wiki_query — reads index.md, picks relevant pages, synthesizes a cited answer. Output formats: markdown, comparison table, summary with sources. Files the answer back to outputs/. Risk: low.
client_wiki_lint — health-check pass. Finds contradictions, orphans, stale claims, missing cross-refs, undocumented concepts. Outputs lint-report.md. Risk: low.
submit_feedback is not part of this ADR; it belongs to ADR-036. It is counted in this ADR’s tool budget anyway because the two ADRs ship together: 31 existing + 3 wiki tools + 1 feedback tool = 35, exactly at the invariant-7 ceiling.
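The write fan-out of client_wiki_ingest can be sketched as follows. This is a simplified illustration, not the tool itself: the in-memory WikiStore stands in for R2, SourceDigest stands in for the output of the LLM classify-and-summarize step, and the page bodies are placeholders that omit the hygiene frontmatter.

```typescript
// In-memory stand-in for the tenant's R2 prefix (hypothetical): key -> markdown.
type WikiStore = Record<string, string>;

// Hypothetical result of the LLM classify + summarize step.
interface SourceDigest {
  slug: string;
  tldr: string;
  entities: string[];
  concepts: string[];
}

// Append one line to a page, creating the page if it does not exist yet.
function appendLine(store: WikiStore, key: string, line: string): void {
  store[key] = (store[key] ?? "") + line + "\n";
}

// Apply one ingest and return every key it touched. raw/ is never written.
function applyIngest(
  store: WikiStore,
  tenant: string,
  rawKey: string,
  d: SourceDigest,
  now: string
): string[] {
  const root = `wiki/${tenant}/wiki`;
  const touched: string[] = [];
  const touch = (key: string, line: string): void => {
    appendLine(store, key, line);
    touched.push(key);
  };

  // 1. One summary page per raw source, with a backlink to raw/.
  touch(`${root}/sources/${d.slug}.md`, `# ${d.slug}\n\n${d.tldr}\n\n## Sources\n- ${rawKey}`);
  // 2. Master index entry with the TLDR.
  touch(`${root}/index.md`, `- [${d.slug}](sources/${d.slug}.md): ${d.tldr}`);
  // 3. Cross-references on the relevant entity and concept pages.
  for (const e of d.entities) touch(`${root}/entities/${e}.md`, `- See sources/${d.slug}.md`);
  for (const c of d.concepts) touch(`${root}/concepts/${c}.md`, `- See sources/${d.slug}.md`);
  // 4. Append-only changelog entry, counting the pages written before it.
  touch(`${root}/log.md`, `${now} ingest ${rawKey} (${touched.length} pages)`);
  return touched;
}
```

Returning the touched keys gives the caller something concrete to audit against the "10-15 pages per call" expectation.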

Wiki page hygiene (Shann + Karpathy combined patterns)

Every wiki page generated by client_wiki_ingest MUST have:

explored: false frontmatter (validation gate). Flipped to true only by an admin endpoint with CF Access.
confidence: high | medium | low | uncertain.
A ## Counter-arguments section.
A ## Data gaps section.
A ## Sources section with backlinks to raw/ files.
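The hygiene requirements above lend themselves to a mechanical check. The following is a minimal sketch that assumes pages open with a "---" delimited frontmatter block of simple key: value lines; the function names and error strings are hypothetical.

```typescript
const CONFIDENCE_LEVELS = ["high", "medium", "low", "uncertain"];
const REQUIRED_SECTIONS = ["## Counter-arguments", "## Data gaps", "## Sources"];

// Parse a leading "---" frontmatter block of simple "key: value" lines.
function parseFrontmatter(page: string): Record<string, string> {
  const out: Record<string, string> = {};
  const m = /^---\n([\s\S]*?)\n---\n/.exec(page);
  if (!m) return out;
  for (const line of (m[1] ?? "").split("\n")) {
    const i = line.indexOf(":");
    if (i > 0) out[line.slice(0, i).trim()] = line.slice(i + 1).trim();
  }
  return out;
}

// Return a list of hygiene violations; an empty list means the page passes.
function checkHygiene(page: string): string[] {
  const fm = parseFrontmatter(page);
  const problems: string[] = [];
  if (fm["explored"] !== "false") {
    problems.push("explored must be false on ingest");
  }
  if (CONFIDENCE_LEVELS.indexOf(fm["confidence"] ?? "") === -1) {
    problems.push("confidence missing or invalid");
  }
  const lines = page.split("\n").map((l) => l.trim());
  for (const heading of REQUIRED_SECTIONS) {
    if (lines.indexOf(heading) === -1) problems.push(`missing section: ${heading}`);
  }
  return problems;
}
```

client_wiki_ingest could run such a check before writing a page and refuse to persist anything that returns a non-empty list.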

Every page is kept under git-style version control: R2 objects are treated as immutable once written, and each page has its own prior_versions/ archive prefix holding earlier versions.
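One way to realize the prior_versions/ archive is to copy the current body aside before every overwrite. The sketch below uses an in-memory object in place of R2; the archive key shape (a prior_versions/ folder next to the page, keyed by timestamp) is an assumption, since the ADR does not fix it.

```typescript
// In-memory stand-in for the R2 bucket (hypothetical): key -> markdown body.
type PageBucket = Record<string, string>;

// Archive key for a page version, e.g.
// wiki/acme/wiki/concepts/icp.md -> wiki/acme/wiki/concepts/prior_versions/icp.md/<archivedAt>
function archiveKey(pageKey: string, archivedAt: string): string {
  const cut = pageKey.lastIndexOf("/") + 1;
  return `${pageKey.slice(0, cut)}prior_versions/${pageKey.slice(cut)}/${archivedAt}`;
}

// Write a page; if it already exists with different content, archive the old body first.
function writeVersioned(bucket: PageBucket, pageKey: string, body: string, now: string): void {
  const current = bucket[pageKey];
  if (current !== undefined && current !== body) {
    bucket[archiveKey(pageKey, now)] = current; // archived copies are never rewritten
  }
  bucket[pageKey] = body;
}
```

Skipping the archive step when the body is unchanged keeps no-op ingests from inflating storage.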

Lint cron

src/cron/wiki-lint.ts runs weekly per active tenant. Posts diff to Slack #v5-knowledge-layer if material findings (new contradictions, new orphans, >5 stale claims).
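The "material findings" gate could look roughly like this. It is a sketch under stated assumptions: findings are compared as plain strings against the previous week's report, and the stale-claim threshold applies to the current report's total. The LintFindings shape is hypothetical.

```typescript
// Hypothetical shape of one lint report.
interface LintFindings {
  contradictions: string[];
  orphans: string[];
  staleClaims: string[];
}

// Items present in `current` but not in `previous`.
function newlyAdded(previous: string[], current: string[]): string[] {
  return current.filter((item) => previous.indexOf(item) === -1);
}

// Post to Slack only on new contradictions, new orphans, or more than 5 stale claims.
function isMaterial(previous: LintFindings, current: LintFindings): boolean {
  return (
    newlyAdded(previous.contradictions, current.contradictions).length > 0 ||
    newlyAdded(previous.orphans, current.orphans).length > 0 ||
    current.staleClaims.length > 5
  );
}
```

Diffing against the previous report keeps a long-standing, already-reported orphan from triggering a Slack post every week.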

Productization (commercial path)

V5 Pro tier:

Setup: $1.5K-$3K (one-time). Includes: BF authored from a 1-hr discovery call, first 50-source ingest, first lint cycle, agent demo.
Monthly: $300-$500. Includes: ongoing ingest from connected sources (Slack, Gmail, Notion, Drive via existing tools), weekly lint, monthly synthesis report.
Mishaal owns pricing + sales motion. ADR-035 ships only the platform capability.

Consequences

Positive

Solves the cold-start problem at the customer level. Every client interaction inherits BF + wiki context.
Productizable wedge. Direct revenue line tied to a platform capability competitors can’t copy without building the same two layers.
Compounds. Every conversation can client_wiki_query and the answer gets filed back. Wiki gets richer with use.
R2 is dirt cheap. A 500-source wiki is ~10MB = $0.00015/mo per tenant. 100 tenants ≈ $0.015/mo total storage cost.
Karpathy / Shann pattern is empirically validated. We’re not inventing — we’re packaging a proven pattern at platform scale.
Composes with ADR-016 (context-worker). Vectorize/D1 stays for structured GTM facts. Wiki is for unstructured knowledge. They are complementary, not competing.
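The storage figures in the list above follow from simple arithmetic. The sketch below reproduces them under two assumptions: a storage price of $0.015 per GB-month (the rate implied by the figures quoted in this ADR) and decimal units (1 GB = 1000 MB).

```typescript
// Assumption: storage price per GB-month implied by the figures in this ADR.
const R2_USD_PER_GB_MONTH = 0.015;

// Monthly storage cost for `tenants` tenants holding `megabytes` MB each.
function storageUsdPerMonth(megabytes: number, tenants: number = 1): number {
  return (megabytes / 1000) * R2_USD_PER_GB_MONTH * tenants;
}

const perTenant = storageUsdPerMonth(10);        // one 500-source wiki at ~10 MB
const hundredTenants = storageUsdPerMonth(10, 100);
console.log(perTenant.toFixed(5), hundredTenants.toFixed(3));
```

Under the same assumed price, even 100 GB of wiki storage would cost about $1.50 per month, so inference cost rather than storage is the figure to watch.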

Negative

Tool count hits the ceiling. This ADR + ADR-036 add exactly 4 tools, taking us to 35/35. Future tools require ADR + a deprecation. Hard ceiling.
Ingest cost. Each client_wiki_ingest call costs ~$0.05-$0.15 in LLM inference. Capped per-call at $0.15. Per-tenant monthly budget capped at $20.
R2 read latency (~50-200ms) makes wiki query slower than KV/Vectorize lookups. Acceptable: wiki query is a deliberate, high-value action — not a hot-path read.
Manual BF authoring is a service overhead (Mishaal-time during the discovery call). Acceptable as it’s the paid-tier setup cost.
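The two ingest caps above can be enforced with a pre-flight check before each client_wiki_ingest call. This is a minimal sketch: amounts are held in integer cents to avoid floating-point drift, and the function and field names are hypothetical.

```typescript
const PER_CALL_CAP_CENTS = 15;    // $0.15 per ingest call
const MONTHLY_CAP_CENTS = 2000;   // $20 per tenant per month

interface BudgetDecision {
  allowed: boolean;
  reason?: "per-call cap" | "monthly cap";
}

// Decide whether an ingest with the given estimated cost may run.
function checkIngestBudget(spentThisMonthCents: number, estimatedCallCents: number): BudgetDecision {
  if (estimatedCallCents > PER_CALL_CAP_CENTS) {
    return { allowed: false, reason: "per-call cap" };
  }
  if (spentThisMonthCents + estimatedCallCents > MONTHLY_CAP_CENTS) {
    return { allowed: false, reason: "monthly cap" };
  }
  return { allowed: true };
}
```

At these caps a tenant gets between 133 ingests per month (every call at $0.15) and 400 (every call at $0.05).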

Neutral

New R2 prefix per tenant. No new bucket needed (uses existing ASCEND_BACKUP bucket).
New KV key per tenant: brand_foundation:{tenant}.
New D1 columns on kv_audit to track BF mutations: handled by existing audit pipeline.

Alternatives considered

Use Vectorize for everything (extend search_knowledge). Rejected — Karpathy’s data: at <300 pages, compiled wiki beats RAG by an order of magnitude in tokens-per-query. Vectorize stays for the use case it’s good at (semantic facts retrieval at scale).
Store wiki in D1 instead of R2. Rejected: D1 row and write costs are unfriendly to a many-small-markdown-files workload. R2 cost-per-GB is roughly 100× lower.
Use the context-worker D1 facts table for everything. Rejected — that table is for structured entity-fact-source triples (per ADR-016). Free-form wiki content doesn’t fit. The two are complementary.
Skip BF, only ship KBL. Rejected — Shann’s empirical claim is that BF (the static layer) does most of the work for content quality. KBL without BF still produces generic-feeling output.
Skip KBL, only ship BF. Acceptable as Phase 3 wedge. Plan does exactly this — Phase 3 ships BF only; Phase 7 ships full KBL. Ordering matches the leverage curve.
Build a UI for wiki management. Rejected for v1 — Shann’s pattern is that humans never edit the wiki, the LLM does. Mishaal/clients edit via tool calls or Obsidian against R2-mounted markdown. Add UI only if a paying client requests it.

Reversal criteria

Roll back if:

After 6 months of V5 Pro tier availability, fewer than 3 paying tenants subscribe. Then deprecate the KBL tools (keep BF, since it costs near-zero to maintain).
Per-tenant ingest cost exceeds $50/mo for >50% of active tenants. Then reduce default ingest model from Sonnet to Haiku.
R2 storage exceeds 100 GB across all tenants. Then introduce a per-tenant 1 GB cap with archive-to-cold-storage policy.
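The three criteria above are mechanical enough to evaluate from a periodic metrics snapshot. The following is a sketch only; the TierSnapshot shape and the action names are hypothetical, and the thresholds are copied from the list.

```typescript
// Hypothetical metrics snapshot for the V5 Pro tier.
interface TierSnapshot {
  monthsSinceLaunch: number;
  payingTenants: number;
  monthlyIngestUsdByTenant: number[]; // one entry per active tenant
  totalR2Gb: number;
}

type ReversalAction =
  | "deprecate_kbl_tools_keep_bf"
  | "downgrade_ingest_model"
  | "cap_tenant_storage_1gb";

// Return every reversal action whose criterion is currently met.
function reversalActions(s: TierSnapshot): ReversalAction[] {
  const actions: ReversalAction[] = [];
  if (s.monthsSinceLaunch >= 6 && s.payingTenants < 3) {
    actions.push("deprecate_kbl_tools_keep_bf");
  }
  const active = s.monthlyIngestUsdByTenant.length;
  const overBudget = s.monthlyIngestUsdByTenant.filter((usd) => usd > 50).length;
  if (active > 0 && overBudget / active > 0.5) {
    actions.push("downgrade_ingest_model");
  }
  if (s.totalR2Gb > 100) {
    actions.push("cap_tenant_storage_1gb");
  }
  return actions;
}
```

Each criterion is evaluated independently, so a single snapshot can trigger more than one action.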

Acceptance

Mishaal Murawala approves the architecture and authorizes Phase 3 (BF wedge) for immediate implementation upon plan-first PR merge. Phase 7 (full KBL) gated on Phase 3 + Phase 4 (Grader) shipping first.