9robots Flash LLM

Stage: In development

Lightweight proprietary model for multi-perspective reasoning — breadth of an ensemble at the cost of a single GPU.

What it is

Flash is a lightweight proprietary model fine-tuned on outputs from multiple open-weight architectures. It captures the union of their perspectives in a single, efficient model that runs on a single GPU. The point isn't to be the biggest model; it's to carry the breadth of an ensemble at the latency and cost of one.

Why it matters

Our ColdVault Benchmark shows that every model contributes some unique, valuable perspectives, but Solo scores stay low (1–7%), meaning most findings overlap across models. The per-token cost of running an ensemble is real, while the marginal new insight from each additional model is small. Flash compresses that breadth into one model, so the marginal cost of the breadth approaches zero.
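To make the overlap argument concrete, here is a minimal sketch of how a per-model uniqueness share could be computed. It assumes a Solo score is the fraction of a model's findings that no other model also reports; that is one reading of the paragraph above, not a confirmed definition from the ColdVault Benchmark. The function name `solo_scores`, the model names, and the finding IDs are all hypothetical toy values.

```python
from typing import Dict, Set


def solo_scores(findings: Dict[str, Set[str]]) -> Dict[str, float]:
    """Share of each model's findings that no other model reports.

    ASSUMPTION: this definition of a "Solo score" is illustrative only.
    """
    scores = {}
    for model, own in findings.items():
        # Everything reported by any model other than the current one.
        others = set().union(*(f for m, f in findings.items() if m != model))
        unique = own - others
        scores[model] = len(unique) / len(own) if own else 0.0
    return scores


# Toy data (hypothetical): three models with heavily overlapping findings.
toy_findings = {
    "model-a": {"f1", "f2", "f3", "f4"},
    "model-b": {"f1", "f2", "f3", "f5"},
    "model-c": {"f1", "f2", "f3", "f4"},
}

scores = solo_scores(toy_findings)
union_size = len(set().union(*toy_findings.values()))
print(scores)
print(union_size)
```

In this toy case the three models report 12 findings in total, but the union holds only 5 distinct ones. That gap is the kind of overlap the paragraph describes.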

How it works

Training approach, in three steps: (1) all open-weight models generate responses independently; (2) a multi-model evaluation framework selects the best combined output; (3) the base model is fine-tuned on those "best-of-ensemble" outputs, including the reasoning chains that produced them. The fine-tuned model learns not just the answers but the way the ensemble arrived at them.
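The data-construction part of this approach can be sketched as follows. This is a rough illustration under stated assumptions, not the actual 9robots pipeline. The models and judges are stand-in Python functions, and all names (`Candidate`, `select_best`, `to_training_record`, `model_a`, and so on) are hypothetical. The sketch also simplifies selection to picking the single highest-scoring candidate, whereas the real framework selects a "best combined output" by a method the text does not specify. The fine-tuning run itself is out of scope here.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Candidate:
    model: str       # which open-weight model produced this response
    reasoning: str   # the reasoning chain behind the answer
    answer: str      # the final answer


# Hypothetical type aliases: a generator maps a prompt to a candidate,
# a judge maps (prompt, candidate) to a numeric score.
Generator = Callable[[str], Candidate]
Judge = Callable[[str, Candidate], float]


def generate_candidates(prompt: str, generators: List[Generator]) -> List[Candidate]:
    """Step 1: every model answers the prompt independently."""
    return [generate(prompt) for generate in generators]


def select_best(prompt: str, candidates: List[Candidate], judges: List[Judge]) -> Candidate:
    """Step 2 (simplified): keep the candidate with the highest mean judge score."""
    def mean_score(candidate: Candidate) -> float:
        return sum(judge(prompt, candidate) for judge in judges) / len(judges)
    return max(candidates, key=mean_score)


def to_training_record(prompt: str, best: Candidate) -> Dict[str, str]:
    """Step 3: package the winner, reasoning chain included, as a fine-tuning example."""
    return {
        "prompt": prompt,
        "completion": f"{best.reasoning}\n\nAnswer: {best.answer}",
        "source_model": best.model,
    }


# Toy stand-ins for real models and evaluators (assumptions for illustration).
def model_a(prompt: str) -> Candidate:
    return Candidate("model-a", "Check storage temperature logs first.",
                     "Temperature excursion")


def model_b(prompt: str) -> Candidate:
    return Candidate("model-b", "Check temperature logs, then door sensor events.",
                     "Temperature excursion after door left open")


def reasoning_judge(prompt: str, candidate: Candidate) -> float:
    return float(len(candidate.reasoning.split()))  # crude proxy: longer reasoning


def answer_judge(prompt: str, candidate: Candidate) -> float:
    return float(len(candidate.answer.split()))     # crude proxy: more detailed answer


prompt = "Why did the batch fail its stability check?"
candidates = generate_candidates(prompt, [model_a, model_b])
best = select_best(prompt, candidates, [reasoning_judge, answer_judge])
record = to_training_record(prompt, best)
print(json.dumps(record, indent=2))
```

Keeping the reasoning chain in the `completion` field is what lets the fine-tuned model see how the answer was reached, not just the answer. In practice each record would be written out as one line of a JSONL training file.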

Where we are today

Flash is in development and will be available via the ColdVault API when released. It is designed for high-throughput scenarios where speed matters but multi-perspective reasoning is still required. Annex 22 ready by architecture.