Stage: Live

Send your GxP data to AI without exposing it as plaintext, even during inference.

What it is

Whether you send data directly to Claude, Gemini, or GPT, or through gateways like Dataiku or PortKey, it is encrypted only in transit. At the entry point it is decrypted, then processed in plaintext on the provider's infrastructure, including on GPUs where your prompts sit in memory as plaintext.

Security incidents at AI providers aren't hypothetical — exposed prompts, leaked code, misconfigured caches. Different failures, one pattern: things get exposed that weren't meant to be. If your GxP data is sitting in plaintext when that happens, it can be exposed too.

The fix is to build systems where a mistake can't expose your data. ColdVault is an encrypted AI inference engine built on confidential computing principles — the confidential AI platform underneath every 9robots product.

ColdVault keeps your data in plaintext only inside CPU and GPU chips. Even the load balancer operates on encrypted traffic only. In RAM, on buses, in GPU memory, across the links between them — your data is never in plaintext. Human error or malicious access yields ciphertext, not plaintext. Even 9robots employees can't access your data — enforced by architecture and hardware, not only by company policy.

Confidential computing will become standard for AI. ColdVault is built that way today, for pharma teams whose compliance function won't clear standard AI providers for production GxP data.

Mathematical data sovereignty: there is no plaintext for us to produce in response to a disclosure order.

| Architecture | What a disclosure order returns | Why |
| --- | --- | --- |
| AI gateways & direct APIs | Full plaintext | Gateway and provider both process customer data in plaintext during routing and inference |
| AWS Bedrock-Mantle | Plaintext via GPU or software | GPU memory is plaintext at inference; CPU isolation is software-enforced and amendable under compulsion |
| ColdVault (SEV-SNP + CC) | Cryptographic noise | Zero exposed plaintext steps. Memory keys live inside AMD/NVIDIA silicon, not with 9robots or the cloud provider |

How encryption works

Three places standard architectures expose plaintext that ColdVault doesn't:

  1. TLS termination point. Most cloud APIs terminate TLS at the L7 load balancer — the prompt is plaintext on the load balancer before it reaches the workload. ColdVault uses L4 passthrough ingress; TLS terminates inside the SEV-SNP Confidential VM. The load balancer routes encrypted TCP without inspecting payload.
  2. Routing-layer memory. AI gateways inspect requests at L7 to do their job — routing, observability, billing — which structurally requires plaintext. ColdVault's routing happens inside the SEV-SNP CVM, in encrypted host memory.
  3. GPU memory at inference. Standard cloud GPUs hold model weights, KV cache, and activations in plaintext. AWS specifically cannot offer GPU encryption today (Nitro is not a CPU TEE; NVIDIA Confidential Computing requires a CPU TEE underneath). ColdVault runs on customer-dedicated GPUs in NVIDIA Confidential Computing mode — encrypted in HBM.
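
The L4 passthrough idea in point 1 can be illustrated with a minimal sketch. This is not ColdVault's ingress code. `relay` is a hypothetical helper that copies bytes from one socket to another and never touches a TLS library, so whatever encrypted records the client sends arrive at the backend unchanged. A real load balancer would relay both directions concurrently and serve many connections; the sketch covers one direction only.

```python
import socket


def relay(src: socket.socket, dst: socket.socket, bufsize: int = 4096) -> int:
    """Forward bytes from src to dst until src signals end of stream.

    The payload is treated as opaque: no TLS handshake, no parsing, and no
    logging of content. Returns the number of bytes forwarded.
    """
    forwarded = 0
    while True:
        chunk = src.recv(bufsize)
        if not chunk:  # peer closed its side of the connection
            break
        dst.sendall(chunk)
        forwarded += len(chunk)
    return forwarded


if __name__ == "__main__":
    # Socket pairs stand in for the client->balancer and balancer->backend links.
    client, lb_in = socket.socketpair()
    lb_out, backend = socket.socketpair()

    # Stand-in bytes shaped like a TLS record; in practice this is ciphertext.
    client.sendall(b"\x17\x03\x03\x00\x05hello")
    client.close()

    print(relay(lb_in, lb_out), "bytes forwarded unchanged")
    print(backend.recv(64))
```

Because the relay never holds a session key, there is nothing on that hop to decrypt with. That is the property the passthrough design relies on.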

vs the alternatives

The realistic pharma baseline today. Most pharma teams call AI through a workflow gateway like PortKey or Dataiku, which routes to a provider's direct API. That creates three plaintext tiers: the gateway terminates TLS for routing and audit; the provider terminates a second TLS session for content filtering; and the provider's underlying cloud runs inference on standard GPUs with no hardware encryption. The result is four trust anchors deep (customer, gateway vendor, provider, provider's cloud) with zero in-use encryption.

Bedrock-Mantle — the strongest comparable. Anthropic Claude on AWS Bedrock runs inside Nitro Enclaves with explicit zero-operator-access. Real protection, but: memory protection is software-enforced (Nitro hypervisor), not silicon (AWS has no CPU TEE on any current EC2 type). And there is no GPU memory encryption — model weights, KV cache, and activations live in plaintext during inference.

How ColdVault collapses the choice. ColdVault is the gateway, and it runs inside a Confidential VM. The protection chain starts at our API endpoint, not at one provider's endpoint.

  • Any open-weight model under encryption: on our GPUs in CC mode.
  • Claude under encryption: routed upstream to Bedrock-Mantle from inside the Confidential VM.
  • A model no provider offers under TEE: Partner Mode, explicit user choice, transparent.

| Layer | PortKey / Dataiku → provider | Bedrock-Mantle | ColdVault |
| --- | --- | --- | --- |
| TLS termination | At the L7 gateway | At Bedrock’s API edge | Inside the Confidential VM (L4 passthrough) |
| Customer → gateway / API | Gateway terminates TLS, sees plaintext | Direct to Bedrock; Mantle starts here | TLS to Confidential VM; encrypted host memory |
| Routing layer | Plaintext at gateway, plaintext to provider | Single-model, no routing | Routing inside the Confidential VM |
| CPU memory at inference | Plaintext on standard cloud | Software-enforced isolation (Nitro) | Silicon-encrypted (AMD SEV-SNP) |
| GPU memory at inference | Plaintext on standard GPUs | Plaintext (AWS lacks CPU TEE) | Silicon-encrypted (NVIDIA CC mode) |
| CPU↔GPU PCIe | Plaintext | Plaintext | Encrypted (PPCIE / TEE-I/O) |
| Operator path | Multiple operators see plaintext | No path by Nitro + Mantle design | No path by silicon |
| Trust anchors | 4 deep | 3 deep | Customer + 9R + GCP + AMD + NVIDIA silicon |
| Model choice | Whatever the gateway routes to | Claude only | Any open-weight on our GPUs; Claude / GPT / Gemini upstream |
| Per-request attestation | None | Internal (not exposed) | Exposed to client |

Per-request attestation — proof, not promise

Customers can verify the silicon themselves. Per-request attestation flow:

  1. Your client generates a 16-byte nonce and sends it with the inference request.
  2. ColdVault returns a signed attestation report (vTPM PCR quote with the nonce embedded), verified by the GCP Confidential Computing API and returned as a Google-signed OIDC JWT.
  3. Your client verifies the JWT against Google's public OIDC keys — confirming the proof is fresh (nonce echo) and the workload runs on real SEV-SNP silicon.
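
The client side of this flow might look like the sketch below. It covers only steps 1 and 3: generating the nonce and checking the claims of a token whose signature has already been verified against Google's published keys with a JWT library (that verification is deliberately not shown). The claim names `eat_nonce` and `hwmodel`, and the value `EXAMPLE_SEV_SNP`, are assumptions for illustration; the real names and values should be taken from the attestation token documentation.

```python
import hmac
import secrets
import time


def new_nonce() -> str:
    """Return a fresh 16-byte nonce, hex-encoded for transport."""
    return secrets.token_bytes(16).hex()


def check_attestation_claims(claims, nonce, *, expected_hw, now=None,
                             nonce_claim="eat_nonce", hw_claim="hwmodel"):
    """Return a list of problems; an empty list means every check passed.

    `claims` must be the payload of a JWT whose signature was ALREADY
    verified. The claim names are assumptions, not a documented contract.
    """
    problems = []
    now = time.time() if now is None else now

    # Freshness: the nonce we sent must be echoed back (string or list form).
    echoed = claims.get(nonce_claim)
    candidates = echoed if isinstance(echoed, list) else [echoed]
    if not any(
        isinstance(c, str) and hmac.compare_digest(c.encode(), nonce.encode())
        for c in candidates
    ):
        problems.append("nonce not echoed: proof may be replayed")

    # Expiry: a token past its `exp` time proves nothing about the present.
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        problems.append("token expired or missing exp")

    # Hardware: the token must name the platform we expect.
    if claims.get(hw_claim) != expected_hw:
        problems.append("unexpected hardware model")

    return problems


if __name__ == "__main__":
    nonce = new_nonce()
    # Hypothetical verified claims, as a JWT library would return them.
    claims = {"eat_nonce": nonce, "exp": time.time() + 300,
              "hwmodel": "EXAMPLE_SEV_SNP"}
    print(check_attestation_claims(claims, nonce, expected_hw="EXAMPLE_SEV_SNP"))
```

Returning a list of problems rather than a boolean lets an auditor see which check failed, which is useful when the result goes into a validation record.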

Auditors are welcome to verify. The attestation flow above is the same proof your QP, CSV, or external auditor can run against any production request.

Inference modes & multi-model

Vault Mode — encrypted inference on open models. Prompts, reasoning, and response all stay within encrypted memory. Recommended for GxP workloads and sensitive production data.

Partner Mode — access to proprietary models (Claude, GPT, Gemini, Grok). Data is sent to the model provider for processing — the encryption guarantee does not apply. You choose per request, for use cases where your compliance policies permit.
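
A per-request mode choice could be enforced on the client side with a small policy check like the one below. This is a sketch of one possible policy, not the ColdVault API: the `Request` fields, the mode constants, and the model names are all hypothetical, and the rule that GxP data never goes to Partner Mode is an assumption about what a compliance policy might require.

```python
from dataclasses import dataclass

VAULT = "vault"      # encrypted inference on open models
PARTNER = "partner"  # data leaves the encryption guarantee


@dataclass(frozen=True)
class Request:
    model: str                     # hypothetical model identifier
    gxp_data: bool                 # True if the payload contains GxP data
    partner_allowed: bool = False  # explicit opt-in for this use case


def choose_mode(req: Request, open_models: frozenset) -> str:
    """Pick the inference mode for one request, or refuse it."""
    if req.model in open_models:
        return VAULT
    # From here on the model is proprietary, so only Partner Mode can serve it.
    if req.gxp_data:
        raise PermissionError("GxP data may not be sent in Partner Mode")
    if not req.partner_allowed:
        raise PermissionError("Partner Mode requires explicit opt-in")
    return PARTNER


if __name__ == "__main__":
    models = frozenset({"open-model-a", "open-model-b"})  # hypothetical names
    print(choose_mode(Request("open-model-a", gxp_data=True), models))
    print(choose_mode(Request("proprietary-x", gxp_data=False,
                              partner_allowed=True), models))
```

Raising an error instead of silently falling back to another mode keeps the choice explicit, in line with Partner Mode being something the user selects per request.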

Multi-model debate. Multiple AI models discuss, disagree, and converge on an answer. The goal is defensive — raising the floor of reliability. On our live benchmark, aggregation catches roughly four errors for every one it introduces (Safety Multiple 3.9× on GPQA Diamond); for 98% of questions, at least one model in the jury answers correctly — the architecture extracts that knowledge.
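
The two figures above can be made concrete with a small calculation. The sketch assumes that Safety Multiple means errors fixed divided by errors introduced, measured against a single baseline model, and it uses plain majority voting as a stand-in for the debate procedure, which the text does not specify. The function names and the sample data are illustrative only.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most common answer; ties go to the first one seen."""
    return Counter(answers).most_common(1)[0][0]


def jury_metrics(truth, baseline, jury):
    """Compare an aggregated jury against a single baseline model.

    truth:    correct answer per question
    baseline: the baseline model's answer per question
    jury:     list of jury answers per question
    """
    if not truth:
        raise ValueError("need at least one question")
    fixed = broken = covered = 0
    for correct, base, votes in zip(truth, baseline, jury):
        aggregated = majority_vote(votes)
        if base != correct and aggregated == correct:
            fixed += 1    # aggregation caught a baseline error
        if base == correct and aggregated != correct:
            broken += 1   # aggregation introduced an error
        if correct in votes:
            covered += 1  # at least one jury member was right
    return {
        "fixed": fixed,
        "broken": broken,
        "safety_multiple": fixed / broken if broken else float("inf"),
        "coverage": covered / len(truth),
    }


if __name__ == "__main__":
    truth = ["A", "B", "C", "D"]
    baseline = ["B", "B", "D", "D"]
    jury = [["A", "A", "B"], ["B", "B", "C"], ["C", "C", "D"], ["A", "A", "D"]]
    print(jury_metrics(truth, baseline, jury))
```

A Safety Multiple above 1 means aggregation removes more errors than it adds. The gap between coverage and aggregated accuracy shows how much correct knowledge in the jury is still left unused.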

Purpose-trained models. We can fine-tune on your data for your specific tasks. First iteration of a custom pharma-domain model can be delivered within weeks once training data is available.

Compliance & deployment

ALCOA+, Annex 11, 21 CFR Part 11. Frozen model versions for validated workflows. Validated API endpoints for GxP. Designed against draft Annex 22 — full control over AI in validated workflows, from model weights to underlying infrastructure.

Cloud. Hardware-encrypted inference via API. Elastic scale, confidential compute, immediate deployment. The recommended default.

Sovereign deployment. We bring our own GPU hardware into your server racks and manage everything end-to-end. Your data never leaves your network. Air-gapped. Purpose-trained models deployed on-site.

In both cases: we own the hardware, we operate the stack; both options are designed against draft Annex 22.

Why this is durable (when competitors catch up)

The technical catch-up will happen. AWS will eventually add a CPU TEE; Bedrock will eventually expose customer attestation; PortKey or Dataiku will eventually announce a Confidential VM tier. But security is a side feature for them:

  • Anthropic — Claude is the product; security is one feature.
  • AWS — cloud is the product; Nitro is one isolation primitive.
  • PortKey / Dataiku — observability and routing are the product; security is a tier they may add.
  • ColdVault — security is the product. Multi-vendor inference is delivered through the security model.

When a new TEE primitive ships, ColdVault integrates it because it is on the critical path of the product roadmap, not a side feature. That keeps the position defensible even after competitors catch up on any single feature.

Swiss jurisdiction + the honest residual

Swiss jurisdiction. 9robots is Swiss-headquartered. The CLOUD Act doesn't reach us; Switzerland has no equivalent legal regime forcing data disclosure to foreign governments. The cryptography would suffice on its own — silicon does not produce plaintext under any legal regime — but the jurisdiction reinforces what the technology already enforces.

Operational vs cryptographic protections. Operational-security failures happen even at frontier labs — packaging errors, deployment-script bugs, misconfigured access controls. These are caught after the fact. Hardware data-in-use protections cannot be misconfigured by a packaging error or a script change. In-use cryptographic protections close failure modes that operational controls cannot.

The honest residual. The “no plaintext to produce” claim holds against any normal compulsion: subpoena, NSL, court order, CLOUD Act, foreign legal regime. One residual: a nation-state with AMD's master attestation keys plus physical access to the silicon could theoretically extract plaintext from a memory dump. A meaningfully higher bar than typical compelled disclosure; it applies equally to all silicon-encrypted infrastructure, not just ours. We acknowledge it because honesty matters more than a clean marketing claim.

Open models that rival proprietary

Encrypted inference runs on open models that rival the best proprietary models, as evaluated by independent benchmarks and our own. The performance gap is narrow — and multi-model debate narrows it further. For pharma workloads where data must stay encrypted end-to-end, open models in Vault Mode meet the bar without compromising on capability.