9robots Heavy LLM

Stage: In development

Full-scale proprietary model for maximum reasoning depth, paired with Flash to recreate ensemble diversity at a fraction of the cost.

What it is

Heavy is a full-scale proprietary model built for maximum reasoning depth. It is architecturally distinct from 9robots Flash LLM: two models, two architectures, both fine-tuned on multi-model ensemble outputs, so that they catch different failure modes.

Why it matters

In regulated and governance contexts, false alarms erode trust as quickly as missed findings. The metric that matters is therefore the ratio of true positives to false positives. Our ColdVault Benchmark shows that while models do identify real issues, the ratio of valid findings to false alarms varies significantly between models and tasks. Heavy is engineered to push that ratio as high as possible through specialised fine-tuning.
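To make the target metric concrete, here is a minimal Python sketch of how a true-positive-to-false-positive ratio could be computed from reviewed findings. It assumes each finding has already been labelled valid or invalid by a human reviewer; the `Finding` record and the function names are hypothetical illustrations, not part of the ColdVault Benchmark or API, and the counts are made up.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """A single issue raised by a model (hypothetical schema, for illustration only)."""
    model: str
    valid: bool  # reviewer label: True = real issue (TP), False = false alarm (FP)


def tp_fp_ratio(findings: list[Finding]) -> float:
    """True positives per false positive; inf when there are findings but no false alarms."""
    tp = sum(1 for f in findings if f.valid)
    fp = len(findings) - tp
    if fp == 0:
        return float("inf") if tp else 0.0
    return tp / fp


def ratios_by_model(findings: list[Finding]) -> dict[str, float]:
    """Group findings by model and compute each model's TP:FP ratio."""
    grouped: dict[str, list[Finding]] = defaultdict(list)
    for f in findings:
        grouped[f.model].append(f)
    return {model: tp_fp_ratio(fs) for model, fs in grouped.items()}


if __name__ == "__main__":
    # Illustrative counts only, not benchmark results.
    findings = (
        [Finding("model_a", True)] * 8
        + [Finding("model_a", False)] * 2
        + [Finding("model_b", True)] * 6
        + [Finding("model_b", False)] * 6
    )
    print(ratios_by_model(findings))  # {'model_a': 4.0, 'model_b': 1.0}
```

A ratio of 4.0 means four valid findings for every false alarm. Both example models surface real issues, but model A does so with far fewer false alarms, which is the property the ratio is meant to capture.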

How it works

Running Flash and Heavy together recreates the diversity of a multi-model panel at a fraction of the cost of a full ensemble. Different architectures catch different failure modes. This is the same principle that makes multi-model review valuable, distilled into two purpose-built models that can be deployed together or independently.
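One way to picture how the outputs of two architecturally distinct models can be combined is sketched below in Python. It assumes each model returns a set of finding identifiers (the clause labels are hypothetical) and splits the results into findings both models agree on, findings only one model raised, and the union of both. This is an illustrative combination rule, not a description of how Flash and Heavy are actually orchestrated.

```python
def combine_findings(flash: set[str], heavy: set[str]) -> dict[str, set[str]]:
    """Partition findings from two models (hypothetical output format: sets of IDs)."""
    return {
        # Raised by both models: cross-architecture agreement.
        "agreed": flash & heavy,
        # Raised by only one model: the extra coverage that diversity provides.
        "flash_only": flash - heavy,
        "heavy_only": heavy - flash,
        # Everything either model raised: maximum recall.
        "either": flash | heavy,
    }


if __name__ == "__main__":
    flash = {"clause-4.2", "clause-7.1"}
    heavy = {"clause-4.2", "clause-9.3"}
    result = combine_findings(flash, heavy)
    print(sorted(result["agreed"]))  # ['clause-4.2']
    print(sorted(result["either"]))  # ['clause-4.2', 'clause-7.1', 'clause-9.3']
```

Keeping only agreed findings trades recall for precision and so tends to raise the TP:FP ratio; routing single-model findings to a human reviewer keeps the coverage benefit of having two architectures.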

Where we are today

In development. Heavy requires multi-GPU deployment (an H200 cluster) and will be available through the ColdVault API for scenarios that demand the highest reasoning quality. Annex 22 ready.