9robots Heavy LLM
Stage: In development
Full-scale proprietary model for maximum reasoning depth, paired with Flash to recreate ensemble diversity at a fraction of the cost.
What it is
Full-scale proprietary model for maximum reasoning depth. Architecturally distinct from 9robots Flash LLM — two models, two architectures, both fine-tuned on multi-model ensemble outputs so they catch different failure modes.
Why it matters
In regulated and governance contexts, false alarms erode trust as quickly as missed findings. The target is the true-positive-to-false-positive ratio. Our ColdVault Benchmark shows that while models do identify real issues, the ratio of valid findings to false alarms varies significantly between models and tasks. Heavy is engineered to push that ratio as high as possible through specialised fine-tuning.
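To make the metric concrete, here is a minimal sketch of how a true-positive-to-false-positive ratio could be computed from reviewed findings. The function names and the example counts are illustrative assumptions and are not taken from the ColdVault Benchmark.

```python
def tp_fp_ratio(true_positives: int, false_positives: int) -> float:
    """Ratio of valid findings to false alarms (hypothetical helper)."""
    if true_positives < 0 or false_positives < 0:
        raise ValueError("counts must be non-negative")
    if false_positives == 0:
        # No false alarms: the ratio is unbounded if anything valid was found.
        return float("inf") if true_positives > 0 else 0.0
    return true_positives / false_positives


def precision(true_positives: int, false_positives: int) -> float:
    """Share of reported findings that were valid (same information, bounded to 0..1)."""
    total = true_positives + false_positives
    return true_positives / total if total else 0.0


# Illustrative counts only: 8 valid findings and 2 false alarms.
print(tp_fp_ratio(8, 2))  # 4.0
print(precision(8, 2))    # 0.8
```

Precision is shown alongside the ratio because it stays finite when a model raises no false alarms, which can make comparisons across models and tasks easier.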
How it works
Running Flash and Heavy together recreates the diversity of a multi-model panel at a fraction of the cost of a full ensemble. Different architectures catch different failure modes — the same principle that makes multi-model review valuable, distilled into two purpose-built models that can be deployed together or independently.
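The idea of pairing two reviewers can be sketched as a merge of their findings. This is an illustration under stated assumptions, not the ColdVault API: the `Finding` type, the `combine_findings` function, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """A single reported issue (hypothetical shape)."""
    location: str
    issue: str


def combine_findings(flash: list[Finding], heavy: list[Finding]) -> dict[str, set[Finding]]:
    """Split the two models' findings into agreed and model-specific sets."""
    flash_set, heavy_set = set(flash), set(heavy)
    return {
        "both": flash_set & heavy_set,        # flagged by both architectures
        "flash_only": flash_set - heavy_set,  # caught only by the first model
        "heavy_only": heavy_set - flash_set,  # caught only by the second model
    }


# Made-up example data for illustration.
flash_out = [Finding("section 3", "missing audit trail"), Finding("section 4", "unsigned record")]
heavy_out = [Finding("section 4", "unsigned record"), Finding("section 5", "undefined acceptance limit")]

merged = combine_findings(flash_out, heavy_out)
print(len(merged["both"]), len(merged["flash_only"]), len(merged["heavy_only"]))  # 1 1 1
```

In this sketch, findings in the model-specific sets are where architectural diversity would show up, while findings in `both` could be treated as higher-confidence.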
Where we are today
In development. Requires multi-GPU deployment (H200 cluster). Will be available via the ColdVault API for scenarios demanding the highest reasoning quality. Annex 22 ready.
