Multi-Tenant Data Classification and Reuse Policy

ADR-045 — Multi-Tenant Data Classification and Reuse Policy

Status: Accepted
Date: 2026-05-07
Decider: Mishaal Murawala (delegated engineering judgment to Claude Code as engineering lead)
Supersedes: none
Related: ASCEND_OPERATOR_OS_VISION.md, ASCEND_OPERATOR_OS_ENGINEERING_STANDARD.md, ADR-035, ADR-039

Context

The Operator OS hosts data from three distinct buyer types with conflicting reuse expectations:

Portcos (B2B SaaS / services companies inside a fund) — paying for a tenant agent that operates on their CRM, calls, emails, prospects. Their data is competitively sensitive to other portcos in the same fund and to other tenants entirely.
PE funds (Operating Partners) — paying for cross-portco aggregates, anonymized benchmarks, pattern recognition. They expect to see their portfolio’s aggregates, not other funds’ portfolios. They expect to contribute to anonymized cross-platform benchmarks but never to leak portco identity.
Ascend (the platform itself) — operates the system, needs platform-level operating data (latency, cost, error rates, capability priors) to run the business and improve the product.

Without an enforced classification policy, four failure modes remain open:

Tenant data leaks across tenants via shared Vectorize namespaces or D1 views.
Pattern banks reverse-map to source portcos (a portco competitor identifies a portco’s customer list).
Fund tenants gain visibility into other funds.
Platform analytics surface tenant-private data verbatim in dashboards or capability priors.

The Operator OS Vision document points to these failure modes only implicitly. Multi-model review (GPT-5.5 / Gemini 3.1 Pro / Kimi K2.6) flagged the absence of a classification contract as the single highest-risk governance gap. This ADR closes it.

Decision

Every datum the platform stores or moves carries one of four classification labels. The label governs storage location, access path, and reuse rights.

The four classes

Class	Description	Storage	Reuse policy
`tenant_private`	Single portco’s CRM rows, calls, emails, prospects, drafts, conversations, OAuth tokens, agent run history, episodic memories, semantic memories.	Per-tenant D1 rows (filtered by `tenant_id`); per-tenant KV prefix; per-tenant Vectorize namespace.	Used to serve that tenant. Never copied to another tenant. Never used to train cross-tenant models without first passing through the anonymization Workflow (see Enforcement contract).
`fund_private`	A PE fund’s portfolio-level aggregates for their own portfolio: metric rollups, cross-portco views, fund-internal pattern recognition.	Fund-tenant D1 view + Vectorize namespace. Underlying portco data must already be aggregated/anonymized before the fund tenant has read access.	Fund-internal only. Never shared with other fund tenants. Never returned to portcos (a portco never sees fund-level aggregates).
`anonymized_benchmark`	Cross-tenant patterns derived through an explicit anonymization pipeline: industry benchmarks, playbook libraries, anonymized tactic banks, capability-index outcome priors. Must not contain PII, account names, or other reverse-mappable identifiers.	Cross-tenant Vectorize namespace (`pattern_bank`, `capability_index`); cross-tenant D1 cold-path tables prefixed `anonymized_*`.	Reusable across all tenants. Cannot be reverse-mapped to source tenant. Anonymization is a Workflow step, audited and idempotent.
`ascend_platform_metric`	Platform operating data: gateway latency, error counts per provider, agent run counts, cost-per-run aggregates, Workflow durations, eval scores per agent type.	Cross-tenant D1 (`error_ledger`, `decision_log`, `kv_audit`, `agent_runs` aggregates), KV (`capability_index:{tool_name}`).	Internal Ascend operating only. Surfaced only in aggregate dashboards. Never returned to a tenant verbatim with another tenant’s identity attached.
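
A minimal sketch of how the four classes above could be encoded as a type and checked at read time. The `Requester` and `Datum` shapes and the `mayRead` function are hypothetical names for illustration, not existing platform code. The rule for `ascend_platform_metric` is deliberately conservative here (Ascend-internal only), which is stricter than the table requires.

```typescript
// The four classification labels from the table above.
type DataClass =
  | "tenant_private"
  | "fund_private"
  | "anonymized_benchmark"
  | "ascend_platform_metric";

// Hypothetical caller identity: a portco tenant, a fund tenant, or Ascend itself.
interface Requester {
  kind: "portco" | "fund" | "ascend";
  tenantId: string;
}

// Hypothetical labelled datum. ownerTenantId is null for cross-tenant classes.
interface Datum {
  dataClass: DataClass;
  ownerTenantId: string | null;
}

// Returns true only when the class's reuse policy allows this requester to read.
function mayRead(requester: Requester, datum: Datum): boolean {
  switch (datum.dataClass) {
    case "tenant_private":
      // Used only to serve the owning portco.
      return requester.kind === "portco" && requester.tenantId === datum.ownerTenantId;
    case "fund_private":
      // Fund-internal only: never other funds, never portcos.
      return requester.kind === "fund" && requester.tenantId === datum.ownerTenantId;
    case "anonymized_benchmark":
      // Reusable across all tenants.
      return true;
    case "ascend_platform_metric":
      // Assumption: treated as Ascend-internal only in this sketch.
      return requester.kind === "ascend";
  }
}
```

Under this sketch a fund requester is denied `tenant_private` rows even for a portco in its own portfolio, because the check compares against the owning tenant only.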

Enforcement contract

Storage-layer label. Every D1 column carrying tenant data has its class documented in the migration file’s header comment. Every Vectorize namespace declared in wrangler.toml has its class in the comment beside the binding. Every KV prefix in the registry has a class.
Code-path label. Every function that reads or writes data takes a typed DataClass parameter or has the class encoded in its name (e.g., writeAnonymizedBenchmark() vs. writeTenantPrivateMemory()). Mixed-class reads are forbidden without an explicit anonymization Workflow.
Anonymization is a Workflow, not a query. Promoting tenant_private to anonymized_benchmark happens only via a named Workflow with an idempotency key, output-schema validation, and an audit log entry. The Workflow strips PII, removes account-level identifiers, k-anonymizes, and writes to the cross-tenant store. The Workflow’s source code is the auditable artifact.
Cross-tenant test required. Every new agent surface ships with a test proving tenant A cannot read tenant B’s tenant_private data even when the agent’s code attempts to. The test exercises the actual auth + storage path.
Fund-to-portco isolation. Fund tenants reading fund_private views see only aggregates pre-computed by the anonymization Workflow over their portfolio. Fund tenants cannot query tenant_private rows directly, even for portcos they own.
Default is tenant_private. When in doubt during development, default to tenant_private. Promoting later requires an ADR review.
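
The promotion step in the anonymization rule above can be sketched as a pure function. This is an illustration under assumptions: the record fields, the function name `promoteToBenchmark`, and the choice of suppression-based k-anonymity counted over distinct contributing tenants are all hypothetical. The ADR does not specify the k-anonymization method or a value for k. Idempotency keys, output-schema validation, and audit logging are omitted.

```typescript
// Hypothetical input shape: one tenant_private observation.
interface TenantPrivateRecord {
  dataClass: "tenant_private";
  tenantId: string;     // source tenant, must not survive promotion
  accountName: string;  // account-level identifier, must not survive promotion
  industry: string;
  tactic: string;
}

// Hypothetical output shape: carries no tenant or account identifiers.
interface AnonymizedPattern {
  dataClass: "anonymized_benchmark";
  industry: string;
  tactic: string;
  tenantCount: number;
}

// Groups records by (industry, tactic) and keeps only groups backed by at
// least k distinct tenants. Smaller groups are suppressed entirely, so a
// pattern cannot be traced to fewer than k sources.
function promoteToBenchmark(records: TenantPrivateRecord[], k: number): AnonymizedPattern[] {
  const groups = new Map<string, { industry: string; tactic: string; tenants: Set<string> }>();

  for (const r of records) {
    const key = JSON.stringify([r.industry, r.tactic]);
    let group = groups.get(key);
    if (group === undefined) {
      group = { industry: r.industry, tactic: r.tactic, tenants: new Set<string>() };
      groups.set(key, group);
    }
    group.tenants.add(r.tenantId);
  }

  const patterns: AnonymizedPattern[] = [];
  groups.forEach((group) => {
    if (group.tenants.size >= k) {
      // Only the generalized fields and a count are copied to the output.
      patterns.push({
        dataClass: "anonymized_benchmark",
        industry: group.industry,
        tactic: group.tactic,
        tenantCount: group.tenants.size,
      });
    }
  });
  return patterns;
}
```

Counting distinct tenants rather than rows is a design choice in this sketch: a single portco with many matching rows would otherwise clear the threshold on its own.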

Operational rules

The kv_audit D1 table records every admin write that crosses a class boundary (e.g., adding a new portco to a fund-tenant’s portfolio-membership table).
Quarterly classification audit. Run a script that samples every D1 table and Vectorize namespace, asserts the class label, and alerts on drift.
A Slack alert fires on any code path that reads two tenants’ tenant_private data in a single Worker invocation.
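
One way the two-tenant read alert above might be detected is a per-invocation tracker, sketched below. The class name `TenantReadTracker` and its methods are hypothetical, and Slack delivery is represented by an injected callback rather than a real webhook call. The sketch assumes one tracker instance is created per Worker invocation and that every tenant_private read path reports to it.

```typescript
// Hypothetical alert sink. In production this might post to Slack.
type AlertFn = (message: string) => void;

// Tracks which tenants' tenant_private data has been read in one invocation.
class TenantReadTracker {
  private readonly tenantsRead = new Set<string>();
  private readonly alert: AlertFn;

  constructor(alert: AlertFn) {
    this.alert = alert;
  }

  // Call from every code path that reads tenant_private data.
  recordTenantPrivateRead(tenantId: string): void {
    if (this.tenantsRead.has(tenantId)) {
      return; // repeated reads for the same tenant are expected
    }
    this.tenantsRead.add(tenantId);
    if (this.tenantsRead.size > 1) {
      // A second (or later) distinct tenant appeared in this invocation.
      const tenants = Array.from(this.tenantsRead).sort().join(", ");
      this.alert(`tenant_private read spans multiple tenants: ${tenants}`);
    }
  }

  // Number of distinct tenants read so far in this invocation.
  distinctTenantCount(): number {
    return this.tenantsRead.size;
  }
}
```

The tracker only observes reads. Blocking the second read outright would be a stricter variant, but the rule as written asks for an alert.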

Alternatives considered

Single tenant_id-scoped model with no class labels. Insufficient: doesn’t distinguish between fund-internal aggregates (which legitimately span portcos within a fund) and cross-platform pattern banks (which span funds). Would either over-restrict fund Operating Partner workflows or under-protect cross-tenant pattern reuse.
Per-tenant separate Cloudflare accounts. Operationally infeasible for the multi-portco-per-fund growth path: each new portco would require a fresh Cloudflare account, which breaks fund-level cross-portco views by construction.
Encrypt at rest with per-tenant keys. Doesn’t solve the access-path problem (decrypted data still flows through cross-tenant code paths during runtime). Adds operational overhead without closing the leak risk this ADR addresses. Reversal trigger: a paying client requires per-tenant key isolation in their security review.

Consequences

Wins

Tenants can be assured their CRM/calls data does not leak to a competitor portco.
Fund tenants can be assured their portfolio aggregates do not leak to other funds.
Ascend can build cross-tenant pattern banks that compound platform value without breaching tenant trust.
Every data movement between classes is auditable.

Costs

Engineering surface area: every new D1 table, KV prefix, Vectorize namespace requires a class label and a registry entry.
Anonymization Workflows must be authored and maintained per pattern type (e.g., one for SDR drafts, one for pipeline-reviewer summaries, one for Op Partner brief patterns).
Quarterly audit script is operational overhead.

Open items (tracked as tech debt)

docs/architecture/KV_KEY_REGISTRY.md — to be created when first Operator OS Q1 KV prefix lands.
docs/architecture/VECTORIZE_NAMESPACE_REGISTRY.md — same.
scripts/audit-data-classification.ts — quarterly audit script.
First anonymization Workflow (anticipated: SDR Agent draft → pattern bank) — Track D Phase D.3 in Q1 plan.
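
As a possible starting point for the audit script listed above, the drift check itself can be sketched as a comparison between registry entries and observed labels. Everything here is an assumption for illustration: the `RegistryEntry` and `Observation` shapes, the `findDrift` name, and the three drift reasons. How observations are collected (for example from migration header comments or wrangler.toml comments) is out of scope for this sketch.

```typescript
type DataClass =
  | "tenant_private"
  | "fund_private"
  | "anonymized_benchmark"
  | "ascend_platform_metric";

// Hypothetical registry row: the class a store is declared to hold.
interface RegistryEntry {
  store: string; // e.g. a D1 table, KV prefix, or Vectorize namespace name
  declared: DataClass;
}

// Hypothetical audit sample: the label actually found, or null if none.
interface Observation {
  store: string;
  observed: DataClass | null;
}

interface Drift {
  store: string;
  reason: "unregistered_store" | "missing_label" | "class_mismatch";
}

// Compares observed labels against the registry and lists every discrepancy.
function findDrift(registry: RegistryEntry[], observations: Observation[]): Drift[] {
  const declared = new Map<string, DataClass>();
  for (const entry of registry) {
    declared.set(entry.store, entry.declared);
  }

  const drift: Drift[] = [];
  for (const obs of observations) {
    const expected = declared.get(obs.store);
    if (expected === undefined) {
      drift.push({ store: obs.store, reason: "unregistered_store" });
    } else if (obs.observed === null) {
      drift.push({ store: obs.store, reason: "missing_label" });
    } else if (obs.observed !== expected) {
      drift.push({ store: obs.store, reason: "class_mismatch" });
    }
  }
  return drift;
}
```

An empty result means every sampled store carried the class its registry entry declares.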

Reversal criteria

This ADR is revisited only if:

A regulatory regime (e.g., EU AI Act) requires a stricter scheme — then we extend, not retreat.
An external audit finds a class boundary insufficient — then we tighten, not loosen.

The ADR is never reversed by reducing classes or weakening enforcement. Tenant trust is the platform’s load-bearing asset.