Stage: Live
Send your GxP data to AI without exposing it in plaintext, even during inference.
What it is
Hardware-encrypted AI inference engine for pharma manufacturing. Every 9robots solution runs on it. Mathematical data sovereignty — no plaintext for us to produce.
How encryption works
AMD SEV-SNP encrypts CPU memory in silicon; NVIDIA Confidential Computing extends it to the GPU; L4 passthrough routing keeps TLS termination inside the trust boundary. Zero exposed plaintext steps.
vs the alternatives
PortKey / Dataiku gateways terminate TLS and handle requests in plaintext. Bedrock-Mantle isolates by software but leaves GPU memory in plaintext. ColdVault is the encrypted gateway.
Per-request attestation
Customers verify the silicon themselves. Your nonce, our signed report, Google's OIDC JWT — proof on every request, not a promise once a quarter.
Inference modes & multi-model
Vault Mode (encrypted inference, recommended for GxP) and Partner Mode (proprietary models, your choice). Multi-model debate raises the floor of reliability.
Compliance & deployment
ALCOA+, Annex 11, 21 CFR Part 11. Designed against draft Annex 22. Cloud (recommended default) or Sovereign deployment (our GPU hardware, your racks, air-gapped).
What it is
Whether you send data directly to Claude, Gemini, or GPT, or through gateways like Dataiku or PortKey, delivery is encrypted in transit. Your data is then decrypted at the entry point and processed in plaintext on the provider's infrastructure, on GPUs where your prompts sit in memory as plaintext.
Security incidents at AI providers aren't hypothetical — exposed prompts, leaked code, misconfigured caches. Different failures, one pattern: things get exposed that weren't meant to be. If your GxP data is sitting in plaintext when that happens, it can be exposed too.
The fix is to build systems where a mistake can't expose your data. ColdVault is an encrypted AI inference engine built on confidential computing principles — the confidential AI platform underneath every 9robots product.
ColdVault keeps your data in plaintext only inside CPU and GPU chips. Even the load balancer operates on encrypted traffic only. In RAM, on buses, in GPU memory, across the links between them — your data is never in plaintext. Human error or malicious access yields ciphertext, not plaintext. Even 9robots employees can't access your data — enforced by architecture and hardware, not only by company policy.
Confidential computing will become standard for AI. ColdVault is built that way today, for pharma organizations whose compliance teams won't clear standard AI providers for production GxP data.
Mathematical data sovereignty: there is no plaintext for us to produce in response to a disclosure order.
| Architecture | What a disclosure order returns | Why |
|---|---|---|
| AI gateways & direct APIs | Full plaintext | Gateway and provider both process customer data in plaintext during routing and inference |
| AWS Bedrock-Mantle | Plaintext via GPU or software | GPU memory is plaintext at inference; CPU isolation is software-enforced and amendable under compulsion |
| ColdVault (SEV-SNP + CC) | Cryptographic noise | Zero exposed plaintext steps. Memory keys live inside AMD/NVIDIA silicon, not with 9robots or the cloud provider |
How encryption works
Three places standard architectures expose plaintext that ColdVault doesn't:
- TLS termination point. Most cloud APIs terminate TLS at the L7 load balancer — the prompt is plaintext on the load balancer before it reaches the workload. ColdVault uses L4 passthrough ingress; TLS terminates inside the SEV-SNP Confidential VM. The load balancer routes encrypted TCP without inspecting payload.
- Routing-layer memory. AI gateways inspect requests at L7 to do their job — routing, observability, billing — which structurally requires plaintext. ColdVault's routing happens inside the SEV-SNP CVM, in encrypted host memory.
- GPU memory at inference. Standard cloud GPUs hold model weights, KV cache, and activations in plaintext. AWS specifically cannot offer GPU encryption today (Nitro is not a CPU TEE; NVIDIA Confidential Computing requires a CPU TEE underneath). ColdVault runs on customer-dedicated GPUs in NVIDIA Confidential Computing mode — encrypted in HBM.
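The L4 passthrough idea in the first bullet can be illustrated with a minimal sketch. It is not ColdVault's ingress: the function names, the loopback bind address, and the buffer size are assumptions chosen for illustration. What it shows is that a layer-4 relay only copies TCP bytes in both directions, so a TLS session passing through it stays encrypted until it reaches the backend that holds the key.

```python
import asyncio


async def pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Copy bytes one way until EOF. The payload is never parsed or decrypted."""
    try:
        while chunk := await reader.read(65536):
            writer.write(chunk)
            await writer.drain()
    finally:
        writer.close()


async def handle(client_r, client_w, backend_host: str, backend_port: int) -> None:
    """Relay one client connection to the backend in both directions."""
    backend_r, backend_w = await asyncio.open_connection(backend_host, backend_port)
    await asyncio.gather(
        pipe(client_r, backend_w),   # client -> backend (still TLS ciphertext)
        pipe(backend_r, client_w),   # backend -> client
        return_exceptions=True,
    )


async def serve_passthrough(listen_port: int, backend_host: str, backend_port: int):
    """Start a TCP relay on loopback (assumption: a real ingress binds elsewhere)."""
    async def on_client(reader, writer):
        await handle(reader, writer, backend_host, backend_port)

    return await asyncio.start_server(on_client, "127.0.0.1", listen_port)
```

Because the relay holds no certificate or private key, there is nothing on it that could turn the traffic into plaintext. An L7 load balancer, by contrast, has to terminate TLS before it can read headers or bodies.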
vs the alternatives
The realistic pharma baseline today. Most pharma teams call AI through a workflow gateway like PortKey or Dataiku, which routes to a provider's direct API. That gives three plaintext tiers: the gateway terminates TLS for routing and audit; a fresh TLS session is terminated at the provider for content filtering; the provider's underlying cloud runs inference on standard GPUs with no hardware encryption. The chain is four trust anchors deep (customer, gateway vendor, provider, provider's cloud) with zero in-use encryption.
Bedrock-Mantle — the strongest comparable. Anthropic Claude on AWS Bedrock runs inside Nitro Enclaves with explicit zero-operator-access. Real protection, but: memory protection is software-enforced (Nitro hypervisor), not silicon (AWS has no CPU TEE on any current EC2 type). And there is no GPU memory encryption — model weights, KV cache, and activations live in plaintext during inference.
How ColdVault collapses the choice. ColdVault is the gateway, and it runs inside a Confidential VM. The protection chain starts at our API endpoint, not at one provider's endpoint:
- Any open-weight model under encryption: on our GPUs in CC mode.
- Claude under encryption: routed upstream to Bedrock-Mantle from inside the Confidential VM.
- A model no provider offers under TEE: Partner Mode, explicit user choice, transparent.
| Layer | PortKey / Dataiku → provider | Bedrock-Mantle | ColdVault |
|---|---|---|---|
| TLS termination | At the L7 gateway | At Bedrock’s API edge | Inside the Confidential VM (L4 passthrough) |
| Customer → gateway / API | Gateway terminates TLS, sees plaintext | Direct to Bedrock; Mantle starts here | TLS to Confidential VM; encrypted host memory |
| Routing layer | Plaintext at gateway, plaintext to provider | Single-model, no routing | Routing inside the Confidential VM |
| CPU memory at inference | Plaintext on standard cloud | Software-enforced isolation (Nitro) | Silicon-encrypted (AMD SEV-SNP) |
| GPU memory at inference | Plaintext on standard GPUs | Plaintext (AWS lacks CPU TEE) | Silicon-encrypted (NVIDIA CC mode) |
| CPU↔GPU PCIe | Plaintext | Plaintext | Encrypted (PPCIE / TEE-I/O) |
| Operator path | Multiple operators see plaintext | No path by Nitro + Mantle design | No path by silicon |
| Trust anchors | 4 deep | 3 deep | Customer + 9robots + GCP + AMD + NVIDIA silicon |
| Model choice | Whatever the gateway routes to | Claude only | Any open-weight model on our GPUs; Claude / GPT / Gemini upstream |
| Per-request attestation | None | Internal (not exposed) | Exposed to client |
Per-request attestation — proof, not promise
Customers can verify the silicon themselves. Per-request attestation flow:
- Your client generates a 16-byte nonce and sends it with the inference request.
- ColdVault returns a signed attestation report (vTPM PCR quote with the nonce embedded), verified by the GCP Confidential Computing API and returned as a Google-signed OIDC JWT.
- Your client verifies the JWT against Google's public OIDC keys — confirming the proof is fresh (nonce echo) and the workload runs on real SEV-SNP silicon.
Auditors are welcome to verify. The attestation flow above is the same proof your QP, CSV, or external auditor can run against any production request.
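The client side of this flow can be sketched as follows. This is a minimal illustration under stated assumptions, not the ColdVault client: the claim names `nonce` and `hw_model` and the value `AMD_SEV_SNP` are hypothetical placeholders, and the real token layout should be taken from the attestation documentation. The sketch covers nonce generation and the freshness check on decoded claims only. It does not verify the JWT signature, which in practice must be checked against Google's public OIDC keys with a JWT library before any claim is trusted.

```python
import base64
import json
import secrets


def new_nonce() -> str:
    """Return 16 random bytes from the OS CSPRNG, hex-encoded for transport."""
    return secrets.token_bytes(16).hex()


def decode_claims(jwt_token: str) -> dict:
    """Decode the JWT payload segment. This does NOT verify the signature."""
    payload_b64 = jwt_token.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)  # restore base64 padding
    return json.loads(base64.urlsafe_b64decode(padded))


def check_claims(
    claims: dict,
    sent_nonce: str,
    nonce_claim: str = "nonce",        # hypothetical claim name
    hw_claim: str = "hw_model",        # hypothetical claim name
    expected_hw: str = "AMD_SEV_SNP",  # hypothetical claim value
) -> bool:
    """True only if the token echoes our nonce and reports the expected hardware."""
    echoed = claims.get(nonce_claim)
    fresh = isinstance(echoed, str) and secrets.compare_digest(
        echoed.encode(), sent_nonce.encode()
    )
    return fresh and claims.get(hw_claim) == expected_hw
```

A token that echoes a different nonce is rejected even if everything else matches, which is what makes a replayed report from an earlier request useless.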
Inference modes & multi-model
Vault Mode — encrypted inference on open models. Prompts, reasoning, and response all stay within encrypted memory. Recommended for GxP workloads and sensitive production data.
Partner Mode — access to proprietary models (Claude, GPT, Gemini, Grok). Data is sent to the model provider for processing — the encryption guarantee does not apply. You choose per request, for use cases where your compliance policies permit.
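Since the mode is chosen per request, a customer-side guard can enforce the compliance policy before anything leaves the client. The sketch below is purely illustrative: `InferenceRequest`, `check_mode_policy`, and the policy rules themselves are assumptions, not part of any ColdVault API, and your own compliance rules would replace them.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    VAULT = "vault"      # encrypted inference on open models
    PARTNER = "partner"  # proprietary models; the encryption guarantee does not apply


@dataclass(frozen=True)
class InferenceRequest:
    mode: Mode
    contains_gxp_data: bool
    partner_use_approved: bool = False  # set by your own compliance process


def check_mode_policy(req: InferenceRequest) -> Mode:
    """Return the requested mode if the example policy allows it, else raise."""
    if req.mode is Mode.VAULT:
        return Mode.VAULT
    if req.contains_gxp_data:
        raise PermissionError("example policy: GxP data must use Vault Mode")
    if not req.partner_use_approved:
        raise PermissionError("example policy: Partner Mode needs explicit approval")
    return Mode.PARTNER
```

Failing closed with an exception, rather than silently downgrading to another mode, keeps the choice explicit and leaves a clear point to log for audit.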
Multi-model debate. Multiple AI models discuss, disagree, and converge on an answer. The goal is defensive — raising the floor of reliability. On our live benchmark, aggregation catches roughly four errors for every one it introduces (Safety Multiple 3.9× on GPQA Diamond); for 98% of questions, at least one model in the jury answers correctly — the architecture extracts that knowledge.
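The two figures quoted above can be made concrete with a small sketch. The source does not define how Safety Multiple is computed, so the definition here is an assumption: errors caught (a single baseline model is wrong, the aggregate is right) divided by errors introduced (the baseline is right, the aggregate is wrong). Plain majority voting stands in for the debate step, which is a simplification, and the answer data is a made-up toy set, not benchmark results.

```python
from collections import Counter
from typing import Sequence


def majority_vote(answers: Sequence[str]) -> str:
    """Most common answer; ties resolve to the answer seen first."""
    return Counter(answers).most_common(1)[0][0]


def evaluate(jury_answers: Sequence[Sequence[str]], truth: Sequence[str],
             baseline_index: int = 0) -> dict:
    """Compare the aggregate against one baseline model, question by question."""
    caught = introduced = covered = 0
    for answers, correct in zip(jury_answers, truth):
        baseline_ok = answers[baseline_index] == correct
        jury_ok = majority_vote(answers) == correct
        caught += (not baseline_ok) and jury_ok      # aggregate fixed a baseline error
        introduced += baseline_ok and (not jury_ok)  # aggregate broke a correct answer
        covered += correct in answers                # at least one model was right
    return {
        "caught": caught,
        "introduced": introduced,
        "safety_multiple": caught / introduced if introduced else float("inf"),
        "coverage": covered / len(truth),
    }


# Toy data: three models, five questions (illustrative only).
toy_answers = [
    ["A", "A", "B"],
    ["B", "A", "A"],
    ["C", "D", "D"],
    ["A", "B", "B"],
    ["B", "C", "D"],
]
toy_truth = ["A", "A", "D", "A", "A"]
result = evaluate(toy_answers, toy_truth)
```

On this toy set the aggregate catches two baseline errors and introduces one, and four of five questions have at least one correct juror. The gap between coverage and aggregate accuracy is the knowledge a better aggregation step could still extract.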
Purpose-trained models. We can fine-tune on your data for your specific tasks. First iteration of a custom pharma-domain model can be delivered within weeks once training data is available.
Compliance & deployment
ALCOA+, Annex 11, 21 CFR Part 11. Frozen model versions for validated workflows. Validated API endpoints for GxP. Designed against draft Annex 22 — full control over AI in validated workflows, from model weights to underlying infrastructure.
Cloud: hardware-encrypted inference via API. Elastic scale, confidential compute, immediate deployment. The recommended default.
Sovereign deployment: we bring our own GPU hardware into your server racks and manage everything end-to-end. Your data never leaves your network. Air-gapped. Purpose-trained models deployed on-site.
In both cases, we own the hardware and operate the stack; both options are designed against draft Annex 22.
See: Why this is durable (when competitors catch up)
The technical catch-up will happen. AWS will eventually add a CPU TEE; Bedrock will eventually expose customer attestation; PortKey or Dataiku will eventually announce a Confidential VM tier. But security is a side feature for them:
- Anthropic — Claude is the product; security is one feature.
- AWS — cloud is the product; Nitro is one isolation primitive.
- PortKey / Dataiku — observability and routing are the product; security is a tier they may add.
- ColdVault — security is the product. Multi-vendor inference is delivered through the security model.
When a new TEE primitive ships, ColdVault integrates it because it is on the critical path of the product roadmap, not a side feature. That position stays defensible even after competitors catch up on any single feature.
See: Swiss jurisdiction + the honest residual
Swiss jurisdiction. 9robots is Swiss-headquartered. The CLOUD Act doesn't reach us; Switzerland has no equivalent legal regime forcing data disclosure to foreign governments. The cryptography would suffice on its own — silicon does not produce plaintext under any legal regime — but the jurisdiction reinforces what the technology already enforces.
Operational vs cryptographic protections. Operational-security failures happen even at frontier labs — packaging errors, deployment-script bugs, misconfigured access controls. These are caught after the fact. Hardware data-in-use protections cannot be misconfigured by a packaging error or a script change. In-use cryptographic protections close failure modes that operational controls cannot.
The honest residual. The “no plaintext to produce” claim holds against any normal compulsion: subpoena, NSL, court order, CLOUD Act, foreign legal regime. One residual: a nation-state with AMD's master attestation keys plus physical access to the silicon could theoretically extract plaintext from a memory dump. A meaningfully higher bar than typical compelled disclosure; it applies equally to all silicon-encrypted infrastructure, not just ours. We acknowledge it because honesty matters more than a clean marketing claim.
See: Open models that rival proprietary
Encrypted inference runs on open models that rival the best proprietary models, as evaluated by independent benchmarks and our own. The performance gap is narrow — and multi-model debate narrows it further. For pharma workloads where data must stay encrypted end-to-end, open models in Vault Mode meet the bar without compromising on capability.
