Vectorize Namespace Registry

ENGINEERING_STANDARD §OO-EngStd-002 — Every Vectorize index must appear here with its binding name, dimensions, metric, isolation model, and owner.

Generated from codebase audit 2026-05-09. Canonical source: this file.

Index summary

Index name	Binding	Dimensions	Metric	Isolation	Create command	Owner (source file)
`capability_index`	`CAPABILITY_INDEX`	1024	cosine	global	`wrangler vectorize create capability_index --dimensions=1024 --metric=cosine`	`src/lib/capability-retrieval.ts`
`memory-index`	`MEMORY_INDEX`	1024	cosine	per-tenant (metadata filter)	`wrangler vectorize create memory-index --dimensions=1024 --metric=cosine`	`src/lib/memory-patterns.ts`
`pattern-bank`	`PATTERN_INDEX`	1024	cosine	global	`wrangler vectorize create pattern-bank --dimensions=1024 --metric=cosine`	`src/cron/seed-pattern-bank.ts`
`client-knowledge`	`VECTORIZE_INDEX`	1024	cosine	per-tenant (metadata filter)	`wrangler vectorize create client-knowledge --dimensions=1024 --metric=cosine`	`src/tools/search-knowledge.ts`
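
The summary table can also be mirrored as typed data, which makes it easy to reject a vector of the wrong length before it reaches Vectorize. The following is a minimal sketch, not code from the repository: `IndexSpec`, `INDEX_REGISTRY`, and `assertVectorShape` are hypothetical names, and the values are copied from the rows above.

```typescript
// Hypothetical mirror of the summary table; values copied from the rows above.
type Isolation = "global" | "per-tenant";

interface IndexSpec {
  indexName: string;
  dimensions: number;
  metric: "cosine";
  isolation: Isolation;
}

// Keyed by binding name (the Binding column).
const INDEX_REGISTRY = new Map<string, IndexSpec>([
  ["CAPABILITY_INDEX", { indexName: "capability_index", dimensions: 1024, metric: "cosine", isolation: "global" }],
  ["MEMORY_INDEX", { indexName: "memory-index", dimensions: 1024, metric: "cosine", isolation: "per-tenant" }],
  ["PATTERN_INDEX", { indexName: "pattern-bank", dimensions: 1024, metric: "cosine", isolation: "global" }],
  ["VECTORIZE_INDEX", { indexName: "client-knowledge", dimensions: 1024, metric: "cosine", isolation: "per-tenant" }],
]);

// Returns the spec for a binding, or throws if the binding is not registered
// or the vector length differs from the registered dimensions.
function assertVectorShape(binding: string, vector: number[]): IndexSpec {
  const spec = INDEX_REGISTRY.get(binding);
  if (spec === undefined) {
    throw new Error(`Unregistered Vectorize binding: ${binding}`);
  }
  if (vector.length !== spec.dimensions) {
    throw new Error(
      `${binding} expects ${spec.dimensions} dimensions, got ${vector.length}`
    );
  }
  return spec;
}
```

A caller could run `assertVectorShape("MEMORY_INDEX", embedding)` before an upsert or query, so that a model or dimension mismatch fails loudly at the call site.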

Index details

`capability_index` — Tool capability embeddings

Binding: CAPABILITY_INDEX
Dimensions: 1024 — model @cf/baai/bge-m3 (parity with memory-index)
Metric: cosine
Isolation: Global — single namespace, no per-tenant partitioning. All tenants query the same index.
Purpose: ADR-042 capability catalog. Stores semantic embeddings of every tool capability entry so the gateway can retrieve relevant tools at runtime via retrieveCapabilities(). Powers the unbounded tool catalog (no hard ceiling on registered capabilities).
Vector ID schema: {tool_slug}:{capability_slug} (e.g. hubspot_crm:search_contacts)

Metadata schema:

{
  "toolName": "string",
  "action": "string",
  "description": "string",
  "category": "string",
  "phase": "number"
}

Write path: scripts/embed-capabilities.ts (manual re-run after config/capabilities/registry.yaml changes). CI runs verify-capability-registry.mjs to detect drift.
Read path: src/lib/capability-retrieval.ts → retrieveCapabilities(query, topK). Called from discover_apis and batch_execute tools.
Staging: Same global index reused for staging (wrangler.toml [[env.staging.vectorize]] points to capability_index).
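
The vector ID schema and metadata schema for `capability_index` can be sketched in TypeScript as below. This is an illustration under assumptions, not the code in `scripts/embed-capabilities.ts`: the helper names `capabilityVectorId` and `parseCapabilityVectorId` are hypothetical, and rejecting `:` inside a slug is an assumed rule that keeps the ID splittable.

```typescript
// Metadata shape copied from the capability_index metadata schema.
interface CapabilityMetadata {
  toolName: string;
  action: string;
  description: string;
  category: string;
  phase: number;
}

// Builds "{tool_slug}:{capability_slug}".
// Assumption: slugs are non-empty and never contain ":".
function capabilityVectorId(toolSlug: string, capabilitySlug: string): string {
  for (const slug of [toolSlug, capabilitySlug]) {
    if (slug.length === 0 || slug.indexOf(":") !== -1) {
      throw new Error(`Invalid slug: "${slug}"`);
    }
  }
  return `${toolSlug}:${capabilitySlug}`;
}

// Splits an ID back into its two slugs; throws on any other shape.
function parseCapabilityVectorId(id: string): { toolSlug: string; capabilitySlug: string } {
  const parts = id.split(":");
  const [toolSlug = "", capabilitySlug = ""] = parts;
  if (parts.length !== 2 || toolSlug === "" || capabilitySlug === "") {
    throw new Error(`Not a capability vector ID: "${id}"`);
  }
  return { toolSlug, capabilitySlug };
}
```

For the example in the schema, `capabilityVectorId("hubspot_crm", "search_contacts")` yields `hubspot_crm:search_contacts`.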

`memory-index` — Per-tenant semantic memory

Binding: MEMORY_INDEX
Dimensions: 1024 — model @cf/baai/bge-m3
Metric: cosine
Isolation: Per-tenant via Vectorize metadata filter { tenant_id: "{tenantId}" } at query time. All tenants share one physical namespace; logical isolation is enforced in the application layer.
Purpose: Long-term semantic memory for tenant conversations and learned patterns. Written by learnSemanticMemory() via ctx.waitUntil (non-blocking, cold path). Read by memory retrieval operations.
Vector ID schema: {tenant_id}:{timestamp_ms}:{hash8}. The tenant prefix, millisecond timestamp, and short hash together make collisions across tenants or over time unlikely.

Metadata schema:

{
  "tenant_id": "string",
  "content_type": "fact | preference | pattern | entity",
  "source": "string",
  "created_at": "ISO8601"
}
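
The ID and metadata schemas above can be sketched as a record builder. This is a minimal illustration, not the body of `learnSemanticMemory()`. The registry does not say how `hash8` is derived, so the sketch assumes an FNV-1a hash of the content rendered as 8 hex characters. `fnv1aHex8` and `buildMemoryRecord` are hypothetical names.

```typescript
type MemoryContentType = "fact" | "preference" | "pattern" | "entity";

// Metadata shape copied from the memory-index metadata schema.
interface MemoryMetadata {
  tenant_id: string;
  content_type: MemoryContentType;
  source: string;
  created_at: string; // ISO8601
}

// 32-bit FNV-1a over UTF-16 code units, rendered as 8 lowercase hex characters.
// Assumption: stands in for the unspecified hash8 component.
function fnv1aHex8(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return ("00000000" + (hash >>> 0).toString(16)).slice(-8);
}

// Builds the vector ID "{tenant_id}:{timestamp_ms}:{hash8}" and matching metadata.
function buildMemoryRecord(
  tenantId: string,
  content: string,
  contentType: MemoryContentType,
  source: string,
  nowMs: number
): { id: string; metadata: MemoryMetadata } {
  if (tenantId === "") throw new Error("tenantId is required");
  return {
    id: `${tenantId}:${nowMs}:${fnv1aHex8(content)}`,
    metadata: {
      tenant_id: tenantId,
      content_type: contentType,
      source,
      created_at: new Date(nowMs).toISOString(),
    },
  };
}
```

Passing `nowMs` in explicitly, rather than calling `Date.now()` inside the builder, keeps the ID and `created_at` consistent with each other and makes the function deterministic.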

Write path: src/lib/memory-patterns.ts → learnSemanticMemory().
Read path: src/lib/memory-patterns.ts → recallSemanticMemory(tenantId, query, topK).
Tenant isolation invariant: Every query() call MUST include filter: { tenant_id: tenantId }. A missing filter exposes cross-tenant data. The recallSemanticMemory() wrapper enforces this; never call MEMORY_INDEX.query() directly.
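
The invariant can be illustrated with a wrapper that always attaches the tenant filter. This is a sketch under assumptions, not the actual `recallSemanticMemory()` implementation. `VectorIndexLike` is a local stand-in that mirrors only the part of the Vectorize binding used here, and `tenantQueryOptions` and `queryForTenant` are hypothetical names. The post-query check is an extra defence the registry does not describe.

```typescript
// Local stand-ins for the Vectorize binding types (assumed shapes).
interface QueryOptions {
  topK: number;
  filter?: Record<string, string>;
  returnMetadata?: boolean;
}
interface QueryMatch {
  id: string;
  score: number;
  metadata?: Record<string, unknown>;
}
interface VectorIndexLike {
  query(vector: number[], options: QueryOptions): Promise<{ matches: QueryMatch[] }>;
}

// Builds query options that always carry the tenant filter.
function tenantQueryOptions(tenantId: string, topK: number): QueryOptions {
  if (tenantId.trim() === "") {
    throw new Error("tenantId is required for per-tenant indexes");
  }
  return { topK, filter: { tenant_id: tenantId }, returnMetadata: true };
}

// The only path callers should use; the raw index is never handed out.
async function queryForTenant(
  index: VectorIndexLike,
  tenantId: string,
  vector: number[],
  topK: number
): Promise<QueryMatch[]> {
  const { matches } = await index.query(vector, tenantQueryOptions(tenantId, topK));
  // Defence in depth: drop any match whose metadata names a different tenant.
  return matches.filter(
    (m) => m.metadata !== undefined && m.metadata["tenant_id"] === tenantId
  );
}
```

Because the filter is built inside the wrapper, a caller cannot forget it, and an empty tenant ID fails before any query is issued.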

`pattern-bank` — Harness pattern embeddings

Binding: PATTERN_INDEX
Dimensions: 1024
Metric: cosine
Isolation: Global — single namespace. Patterns are tool-agnostic quality exemplars, not tenant data.
Purpose: Quality evaluation patterns for the harness. Stores embedded examples of good/bad tool responses that the harness uses for few-shot comparison during evals.
Vector ID schema: pattern:{run_id}:{seq} — run_id from seed job, seq for ordering within a run.

Metadata schema:

{
  "tool_name": "string",
  "quality_label": "good | bad",
  "category": "string",
  "seeded_at": "ISO8601"
}

Write path: src/cron/seed-pattern-bank.ts, run daily at 04:00 UTC (cron 0 4 * * *, multiplexed). Idempotent via the pattern_bank:seeded:{run_id} KV guard.
Read path: src/workflows/harness-investigate.ts, src/workflows/harness-autofix.ts.
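
The KV guard on the write path can be sketched as a check-then-mark wrapper. This is an assumption-laden illustration, not the contents of `src/cron/seed-pattern-bank.ts`. `KVLike` is a local stand-in for the KV binding, `seedOnce` is a hypothetical name, and storing a timestamp as the guard value is an assumed choice.

```typescript
// Local stand-in for the parts of a KV binding used here (assumed shape).
interface KVLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string): Promise<void>;
}

// Key and ID formats copied from the pattern-bank section.
const seedGuardKey = (runId: string): string => `pattern_bank:seeded:${runId}`;
const patternVectorId = (runId: string, seq: number): string => `pattern:${runId}:${seq}`;

// Runs `seed` at most once per runId, as far as the guard key can tell.
// Note: check-then-put is not atomic, so two overlapping runs could both seed.
async function seedOnce(
  kv: KVLike,
  runId: string,
  seed: () => Promise<void>
): Promise<"seeded" | "skipped"> {
  const key = seedGuardKey(runId);
  if ((await kv.get(key)) !== null) {
    return "skipped";
  }
  await seed();
  // Mark only after a successful seed so a failed run is retried next time.
  await kv.put(key, new Date().toISOString());
  return "seeded";
}
```

Writing the guard after the seed completes means a crash mid-seed leaves the run eligible for retry. The trade-off is that a retry may rewrite vectors under the same `pattern:{run_id}:{seq}` IDs.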

`client-knowledge` — Client-specific knowledge base

Binding: VECTORIZE_INDEX
Dimensions: 1024
Metric: cosine
Isolation: Per-tenant via metadata filter { tenant_id: "{tenantId}" }. Same pattern as memory-index.
Purpose: Client-uploaded / ingested knowledge documents (product specs, playbooks, ICP profiles, competitive intel). Used by the search_knowledge MCP tool.
Vector ID schema: {tenant_id}:{doc_id}:{chunk_seq} — doc_id from ingestion pipeline, chunk_seq for multi-chunk documents.

Metadata schema:

{
  "tenant_id": "string",
  "doc_id": "string",
  "doc_title": "string",
  "chunk_seq": "number",
  "source": "upload | api | webhook",
  "created_at": "ISO8601"
}

Write path: Ingestion pipeline (future) + POST /admin/knowledge endpoint.
Read path: src/tools/search-knowledge.ts → search_knowledge MCP tool. Requires VECTORIZE_INDEX binding to be set.
Tenant isolation invariant: Same as memory-index — every query() call MUST include filter: { tenant_id: tenantId }. The search_knowledge tool enforces this via ctx.tenantId (Invariant 3 — tenant from context, never from args).
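
Invariant 3 (tenant from context, never from args) can be sketched as follows. This is an illustration only, not the code in `src/tools/search-knowledge.ts`. `ToolContext`, `SearchArgs`, `knowledgeQueryOptions`, and `chunkVectorId` are hypothetical names, and the default `topK` of 5 is an assumption.

```typescript
// Assumed shapes for the tool context and caller-supplied arguments.
interface ToolContext {
  tenantId: string;
}
interface SearchArgs {
  query: string;
  topK?: number;
  [extra: string]: unknown; // callers may send unexpected fields
}

// Builds the query options. The tenant comes from ctx only; any tenant_id
// present in args is ignored on purpose.
function knowledgeQueryOptions(
  ctx: ToolContext,
  args: SearchArgs
): { topK: number; filter: { tenant_id: string } } {
  if (ctx.tenantId === "") throw new Error("ctx.tenantId is required");
  return {
    topK: args.topK ?? 5, // assumed default
    filter: { tenant_id: ctx.tenantId },
  };
}

// Builds "{tenant_id}:{doc_id}:{chunk_seq}" for one chunk of a document.
function chunkVectorId(tenantId: string, docId: string, chunkSeq: number): string {
  return `${tenantId}:${docId}:${chunkSeq}`;
}
```

Even if a caller passes `tenant_id: "tenant_b"` in the arguments, the filter is still built from `ctx.tenantId`.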

Adding a new index

1. Create the index: wrangler vectorize create {name} --dimensions=1024 --metric=cosine
2. Add a [[vectorize]] block to wrangler.toml with the binding name.
3. Add the binding to the Env interface in src/lib/types.ts.
4. Add a row and a detail section to this file.
5. Decide the isolation model:
   - Global: No filter at query time. Use for tool catalogs and config embeddings.
   - Per-tenant: Always filter { tenant_id: tenantId }. Use for any tenant data. Document the isolation invariant clearly and enforce it via a wrapper function; never expose raw .query() to callers.
6. If the index is per-tenant and the pattern is new, add an isolation invariant to .claude/rules/v5-invariants.md.
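
The Env step can be sketched as a typed binding list with a startup check. This is a hypothetical illustration, not the real `src/lib/types.ts`. `IndexLike` is a local stand-in for the Vectorize binding type, and `REQUIRED_VECTORIZE_BINDINGS` and `missingBindings` are assumed names.

```typescript
// Binding names from the summary table. A new index adds its binding here.
const REQUIRED_VECTORIZE_BINDINGS = [
  "CAPABILITY_INDEX",
  "MEMORY_INDEX",
  "PATTERN_INDEX",
  "VECTORIZE_INDEX",
] as const;

type VectorizeBinding = typeof REQUIRED_VECTORIZE_BINDINGS[number];

// Local stand-in for the Vectorize binding type (assumed shape).
interface IndexLike {
  query(vector: number[], options: { topK: number }): Promise<unknown>;
}

// Env derived from the list, so the type and the runtime check cannot drift.
type VectorizeEnv = { [K in VectorizeBinding]: IndexLike };

// Returns the bindings that are declared here but absent at runtime,
// e.g. because the matching [[vectorize]] block is missing from wrangler.toml.
function missingBindings(env: Partial<VectorizeEnv>): VectorizeBinding[] {
  return REQUIRED_VECTORIZE_BINDINGS.filter((name) => env[name] === undefined);
}
```

Calling `missingBindings(env)` early in a request or at deploy verification would surface a forgotten `wrangler.toml` block as a named binding rather than as a runtime `undefined` error.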
