# Vectorize Namespace Registry
ENGINEERING_STANDARD §OO-EngStd-002 — Every Vectorize index must appear here with its binding name, dimensions, metric, isolation model, and owner.
Generated from codebase audit 2026-05-09. Canonical source: this file.
## Index summary
| Index name | Binding | Dimensions | Metric | Isolation | Created by | Owner |
|---|---|---|---|---|---|---|
| `capability_index` | `CAPABILITY_INDEX` | 1024 | cosine | global | `wrangler vectorize create capability_index --dimensions=1024 --metric=cosine` | `src/lib/capability-retrieval.ts` |
| `memory-index` | `MEMORY_INDEX` | 1024 | cosine | per-tenant (metadata filter) | `wrangler vectorize create memory-index --dimensions=1024 --metric=cosine` | `src/lib/memory-patterns.ts` |
| `pattern-bank` | `PATTERN_INDEX` | 1024 | cosine | global | `wrangler vectorize create pattern-bank --dimensions=1024 --metric=cosine` | `src/cron/seed-pattern-bank.ts` |
| `client-knowledge` | `VECTORIZE_INDEX` | 1024 | cosine | per-tenant (metadata filter) | `wrangler vectorize create client-knowledge --dimensions=1024 --metric=cosine` | `src/tools/search-knowledge.ts` |
## Index details
### capability_index: Tool capability embeddings
- Binding: `CAPABILITY_INDEX`
- Dimensions: 1024 (model `@cf/baai/bge-m3`, parity with `memory-index`)
- Metric: cosine
- Isolation: Global. Single namespace with no per-tenant partitioning; all tenants query the same index.
- Purpose: ADR-042 capability catalog. Stores a semantic embedding of every tool capability entry so the gateway can retrieve relevant tools at runtime via `retrieveCapabilities()`. This is what lets the tool catalog grow without a hard ceiling on registered capabilities.
- Vector ID schema: `{tool_slug}:{capability_slug}` (e.g. `hubspot_crm:search_contacts`)
- Metadata schema: `{"toolName": "string", "action": "string", "description": "string", "category": "string", "phase": "number"}`
- Write path: `scripts/embed-capabilities.ts` (re-run manually after `config/capabilities/registry.yaml` changes). CI runs `verify-capability-registry.mjs` to detect drift.
- Read path: `src/lib/capability-retrieval.ts` → `retrieveCapabilities(query, topK)`. Called from the `discover_apis` and `batch_execute` tools.
- Staging: The same global index is reused for staging (`[[env.staging.vectorize]]` in `wrangler.toml` points to `capability_index`).
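A minimal sketch of the capability ID schema and a global-index lookup follows. It is not the code in `src/lib/capability-retrieval.ts`: `QueryableIndex`, `retrieveCapabilitiesSketch`, and the injected `embed` function (standing in for a `@cf/baai/bge-m3` call) are hypothetical names, and the index interface is a local structural stand-in for the Vectorize binding type. What it illustrates is that a global index is queried with no `filter` at all.

```typescript
// Local structural stand-ins for the Vectorize binding (assumption: the real
// types come from @cloudflare/workers-types; declared here so the sketch is self-contained).
interface VectorMatch {
  id: string;
  score: number;
  metadata?: Record<string, unknown>;
}

interface QueryableIndex {
  query(
    vector: number[],
    options: { topK: number; returnMetadata?: "all" | "none"; filter?: Record<string, unknown> },
  ): Promise<{ matches: VectorMatch[] }>;
}

interface CapabilityHit {
  toolSlug: string;
  capabilitySlug: string;
  score: number;
  description?: string;
}

// Build an ID in the {tool_slug}:{capability_slug} schema.
function capabilityVectorId(toolSlug: string, capabilitySlug: string): string {
  if (!toolSlug || !capabilitySlug || toolSlug.includes(":") || capabilitySlug.includes(":")) {
    throw new Error("slugs must be non-empty and must not contain ':'");
  }
  return `${toolSlug}:${capabilitySlug}`;
}

// Split an ID back into its two slugs, rejecting anything off-schema.
function parseCapabilityVectorId(id: string): { toolSlug: string; capabilitySlug: string } {
  const parts = id.split(":");
  if (parts.length !== 2 || !parts[0] || !parts[1]) {
    throw new Error(`malformed capability vector id: ${id}`);
  }
  return { toolSlug: parts[0], capabilitySlug: parts[1] };
}

// Hypothetical stand-in for retrieveCapabilities(): the index is global, so the
// query deliberately carries no tenant filter.
async function retrieveCapabilitiesSketch(
  index: QueryableIndex,
  embed: (text: string) => Promise<number[]>,
  query: string,
  topK: number,
): Promise<CapabilityHit[]> {
  const vector = await embed(query);
  const { matches } = await index.query(vector, { topK, returnMetadata: "all" });
  return matches.map((m) => {
    const description = m.metadata?.description;
    return {
      ...parseCapabilityVectorId(m.id),
      score: m.score,
      description: typeof description === "string" ? description : undefined,
    };
  });
}
```

Rejecting `:` inside slugs keeps the ID trivially parseable; whether the real registry enforces that rule is an assumption.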
### memory-index: Per-tenant semantic memory
- Binding: `MEMORY_INDEX`
- Dimensions: 1024 (model `@cf/baai/bge-m3`)
- Metric: cosine
- Isolation: Per-tenant via the Vectorize metadata filter `{ tenant_id: "{tenantId}" }`, applied at query time. The physical namespace is shared; logical isolation is enforced in the application layer.
- Purpose: Long-term semantic memory for tenant conversations and learned patterns. Written by `learnSemanticMemory()` via `ctx.waitUntil` (non-blocking, cold path). Read through `recallSemanticMemory()`.
- Vector ID schema: `{tenant_id}:{timestamp_ms}:{hash8}`. The tenant prefix rules out collisions across tenants; the millisecond timestamp plus the 8-character hash makes collisions within a tenant unlikely.
- Metadata schema: `{"tenant_id": "string", "content_type": "fact | preference | pattern | entity", "source": "string", "created_at": "ISO8601"}`
- Write path: `src/lib/memory-patterns.ts` → `learnSemanticMemory()`.
- Read path: `src/lib/memory-patterns.ts` → `recallSemanticMemory(tenantId, query, topK)`.
- Tenant isolation invariant: Every `query()` call MUST include `filter: { tenant_id: tenantId }`; a missing filter exposes cross-tenant data. This is enforced by the `recallSemanticMemory()` wrapper, so never call `MEMORY_INDEX.query()` directly.
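The sketch below shows one way the memory ID schema and the tenant-filtered wrapper could fit together, assuming an injected `embed` function and a local `MemoryIndexLike` stand-in for the binding. `hash8` here is 32-bit FNV-1a, chosen only because it yields exactly eight hex characters; the hash the codebase actually uses is not documented in this file. `recallSemanticMemorySketch` is a hypothetical name, not the real `recallSemanticMemory()`.

```typescript
interface VectorMatch {
  id: string;
  score: number;
  metadata?: Record<string, unknown>;
}

// Local stand-in for the MEMORY_INDEX binding; filter is required on purpose.
interface MemoryIndexLike {
  query(
    vector: number[],
    options: { topK: number; returnMetadata: "all"; filter: { tenant_id: string } },
  ): Promise<{ matches: VectorMatch[] }>;
}

// 32-bit FNV-1a over UTF-16 code units, rendered as exactly 8 hex characters.
// Assumption: a stand-in for whatever produces {hash8}; the real hash may differ.
function hash8(input: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return (h >>> 0).toString(16).padStart(8, "0");
}

// {tenant_id}:{timestamp_ms}:{hash8}
function memoryVectorId(tenantId: string, content: string, nowMs: number = Date.now()): string {
  if (!tenantId || tenantId.includes(":")) throw new Error("invalid tenantId");
  return `${tenantId}:${nowMs}:${hash8(content)}`;
}

// Hypothetical shape of the recall wrapper: the tenant filter is added here,
// so callers never build a MEMORY_INDEX query themselves.
async function recallSemanticMemorySketch(
  index: MemoryIndexLike,
  embed: (text: string) => Promise<number[]>,
  tenantId: string,
  query: string,
  topK: number,
): Promise<VectorMatch[]> {
  if (!tenantId) throw new Error("tenantId is required for MEMORY_INDEX queries");
  const vector = await embed(query);
  const { matches } = await index.query(vector, {
    topK,
    returnMetadata: "all",
    filter: { tenant_id: tenantId },
  });
  // Defense in depth: drop any match whose metadata names a different tenant.
  return matches.filter((m) => m.metadata?.tenant_id === tenantId);
}
```

Making `filter` a required field in the local interface means a call site that forgets it fails type-checking; the post-query metadata check is an extra safeguard on top of the filter, not a replacement for it.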
### pattern-bank: Harness pattern embeddings
- Binding: `PATTERN_INDEX`
- Dimensions: 1024
- Metric: cosine
- Isolation: Global (single namespace). Patterns are tool-agnostic quality exemplars, not tenant data.
- Purpose: Quality-evaluation patterns for the harness. Stores embedded examples of good and bad tool responses, which the harness uses for few-shot comparison during evals.
- Vector ID schema: `pattern:{run_id}:{seq}`, where `run_id` comes from the seed job and `seq` orders vectors within a run.
- Metadata schema: `{"tool_name": "string", "quality_label": "good | bad", "category": "string", "seeded_at": "ISO8601"}`
- Write path: `src/cron/seed-pattern-bank.ts`, run daily at `0 4 * * *` (04:00 UTC, multiplexed). Idempotent via the `pattern_bank:seeded:{run_id}` KV guard.
- Read path: `src/workflows/harness-investigate.ts`, `src/workflows/harness-autofix.ts`.
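The KV-guarded seed can be sketched as follows. `KVLike`, `UpsertableIndex`, `PatternExample`, and `seedPatternBankSketch` are hypothetical local types and names; the real job in `src/cron/seed-pattern-bank.ts` may batch, embed, or order its writes differently.

```typescript
// Local stand-ins for the KV and Vectorize bindings (real types: @cloudflare/workers-types).
interface KVLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string): Promise<void>;
}

interface VectorRecord {
  id: string;
  values: number[];
  metadata: Record<string, string | number>;
}

interface UpsertableIndex {
  upsert(vectors: VectorRecord[]): Promise<unknown>;
}

// Hypothetical input shape for one exemplar produced by the seed job.
interface PatternExample {
  toolName: string;
  qualityLabel: "good" | "bad";
  category: string;
  text: string;
}

async function seedPatternBankSketch(
  kv: KVLike,
  index: UpsertableIndex,
  embed: (text: string) => Promise<number[]>,
  runId: string,
  examples: PatternExample[],
  now: Date = new Date(),
): Promise<"seeded" | "skipped"> {
  const guardKey = `pattern_bank:seeded:${runId}`;
  // A repeat invocation of the same run (e.g. a retried cron tick) stops here.
  if ((await kv.get(guardKey)) !== null) return "skipped";

  const seededAt = now.toISOString();
  const vectors: VectorRecord[] = [];
  for (let seq = 0; seq < examples.length; seq++) {
    const ex = examples[seq];
    vectors.push({
      id: `pattern:${runId}:${seq}`, // deterministic, so a re-upsert overwrites instead of duplicating
      values: await embed(ex.text),
      metadata: {
        tool_name: ex.toolName,
        quality_label: ex.qualityLabel,
        category: ex.category,
        seeded_at: seededAt,
      },
    });
  }
  await index.upsert(vectors);
  // Write the guard only after the upsert succeeds, so a failed run is retried next time.
  await kv.put(guardKey, seededAt);
  return "seeded";
}
```

Workers KV is eventually consistent, so two overlapping invocations can both miss the guard. Because the vector IDs derive from `run_id` and `seq`, the second upsert overwrites the same vectors rather than adding duplicates.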
### client-knowledge: Client-specific knowledge base
- Binding: `VECTORIZE_INDEX`
- Dimensions: 1024
- Metric: cosine
- Isolation: Per-tenant via the metadata filter `{ tenant_id: "{tenantId}" }`, the same pattern as `memory-index`.
- Purpose: Client-uploaded or ingested knowledge documents (product specs, playbooks, ICP profiles, competitive intel). Used by the `search_knowledge` MCP tool.
- Vector ID schema: `{tenant_id}:{doc_id}:{chunk_seq}`, where `doc_id` comes from the ingestion pipeline and `chunk_seq` numbers the chunks of a multi-chunk document.
- Metadata schema: `{"tenant_id": "string", "doc_id": "string", "doc_title": "string", "chunk_seq": "number", "source": "upload | api | webhook", "created_at": "ISO8601"}`
- Write path: Ingestion pipeline (future) and the `POST /admin/knowledge` endpoint.
- Read path: `src/tools/search-knowledge.ts` → `search_knowledge` MCP tool. Requires the `VECTORIZE_INDEX` binding to be set.
- Tenant isolation invariant: Same as `memory-index`: every `query()` call MUST include `filter: { tenant_id: tenantId }`. The `search_knowledge` tool enforces this by taking the tenant from `ctx.tenantId` (Invariant 3: tenant from context, never from args).
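A sketch of how `search_knowledge` can honor Invariant 3: the tenant is read from `ctx`, and any `tenant_id` the caller slips into `args` is ignored. `Env`, `ToolContext`, `SearchKnowledgeArgs`, `searchKnowledgeSketch`, and the default `topK` of 5 are assumptions for illustration, not the types or defaults in `src/tools/search-knowledge.ts`.

```typescript
interface VectorMatch {
  id: string;
  score: number;
  metadata?: Record<string, unknown>;
}

interface KnowledgeIndexLike {
  query(
    vector: number[],
    options: { topK: number; returnMetadata: "all"; filter: { tenant_id: string } },
  ): Promise<{ matches: VectorMatch[] }>;
}

// Hypothetical shapes: the real Env, ctx and tool-args types live in the codebase.
interface Env {
  VECTORIZE_INDEX?: KnowledgeIndexLike;
}
interface ToolContext {
  tenantId: string;
}
interface SearchKnowledgeArgs {
  query: string;
  topK?: number;
  [extra: string]: unknown; // anything else the caller sends, including a stray tenant_id
}

interface KnowledgeHit {
  docId: string;
  docTitle: string;
  chunkSeq: number;
  score: number;
}

async function searchKnowledgeSketch(
  env: Env,
  ctx: ToolContext,
  args: SearchKnowledgeArgs,
  embed: (text: string) => Promise<number[]>,
): Promise<KnowledgeHit[]> {
  const index = env.VECTORIZE_INDEX;
  if (!index) throw new Error("VECTORIZE_INDEX binding is not configured");
  // Invariant 3: the tenant comes from ctx; args.tenant_id, if present, is never read.
  const tenantId = ctx.tenantId;
  if (!tenantId) throw new Error("missing tenant context");

  const vector = await embed(args.query);
  const { matches } = await index.query(vector, {
    topK: args.topK ?? 5, // default of 5 is an assumption
    returnMetadata: "all",
    filter: { tenant_id: tenantId },
  });
  return matches
    .filter((m) => m.metadata?.tenant_id === tenantId) // defense in depth
    .map((m) => ({
      docId: String(m.metadata?.doc_id ?? ""),
      docTitle: String(m.metadata?.doc_title ?? ""),
      chunkSeq: Number(m.metadata?.chunk_seq ?? 0),
      score: m.score,
    }));
}
```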
## Adding a new index
- Create the index: `wrangler vectorize create {name} --dimensions=1024 --metric=cosine`
- Add a `[[vectorize]]` block with the binding name to `wrangler.toml`.
- Add the binding to the `Env` interface in `src/lib/types.ts`.
- Add a row and a detail section to this file.
- Decide the isolation model:
  - Global: No filter at query time. Use for tool catalogs and config embeddings.
  - Per-tenant: Always filter on `{ tenant_id: tenantId }`. Use for any tenant data. Document the isolation invariant clearly and enforce it through a wrapper function; never expose raw `.query()` to callers.
- If the index is per-tenant and the pattern is new, add an isolation invariant to `.claude/rules/v5-invariants.md`.
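For the per-tenant case, the wrapper guidance above could take roughly this shape: a factory that closes over the raw binding and returns an object whose every method requires a `tenantId`. `createTenantScopedIndex` and the local interfaces are hypothetical. One caveat worth verifying against the Vectorize documentation: on V2 indexes, `filter` only applies to metadata properties that have a metadata index (created with `wrangler vectorize create-metadata-index`), and vectors upserted before that metadata index exists are not covered by it.

```typescript
interface VectorMatch {
  id: string;
  score: number;
  metadata?: Record<string, unknown>;
}

interface TenantVector {
  id: string;
  values: number[];
  metadata?: Record<string, unknown>;
}

// Structural subset of a raw Vectorize binding (assumption; see @cloudflare/workers-types).
interface RawVectorIndex {
  query(
    vector: number[],
    options: { topK: number; returnMetadata?: "all" | "none"; filter?: Record<string, unknown> },
  ): Promise<{ matches: VectorMatch[] }>;
  upsert(vectors: TenantVector[]): Promise<unknown>;
}

// The only surface handed to callers: every method takes tenantId, and the
// unfiltered query() is unreachable because `raw` stays inside the closure.
interface TenantScopedIndex {
  query(tenantId: string, vector: number[], topK: number): Promise<VectorMatch[]>;
  upsert(tenantId: string, vectors: TenantVector[]): Promise<unknown>;
}

function createTenantScopedIndex(raw: RawVectorIndex): TenantScopedIndex {
  const requireTenant = (tenantId: string): void => {
    if (!tenantId || tenantId.includes(":")) throw new Error("invalid tenantId");
  };
  return {
    async query(tenantId, vector, topK) {
      requireTenant(tenantId);
      const { matches } = await raw.query(vector, {
        topK,
        returnMetadata: "all",
        filter: { tenant_id: tenantId },
      });
      return matches.filter((m) => m.metadata?.tenant_id === tenantId);
    },
    async upsert(tenantId, vectors) {
      requireTenant(tenantId);
      // IDs must carry the {tenant_id}: prefix used by the per-tenant schemas above;
      // tenant_id metadata is stamped here, overriding anything the caller set.
      for (const v of vectors) {
        if (!v.id.startsWith(`${tenantId}:`)) throw new Error(`id ${v.id} lacks tenant prefix`);
      }
      return raw.upsert(vectors.map((v) => ({ ...v, metadata: { ...v.metadata, tenant_id: tenantId } })));
    },
  };
}
```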