The milestone · The Forge
Training completed
The Forge.
The first iCrisol Mini has been trained from scratch. No cluster of thousands of GPUs, no trillions of tokens, no army of researchers. One person, one machine, 24 hours and ten cents of electricity. This is the open, honest record of that milestone — and of what it means that, with so little, a cognitive organism has come to breathe.
The magnitude is in the disproportion.
The big models are born from budgets in the hundreds of millions. iCrisol is born from constraint and turns it into a thesis: if this holds with so little, the paradigm matters.
Design, architecture, corpus, training and product — one developer.
vs Hundreds of researchers at the big labs.
One NVIDIA DGX Spark GB10 (128 GB unified memory).
vs Tens of thousands of GPUs in dedicated clusters.
Just 0.87% of one epoch over an 18.83 B-token corpus.
vs Trillions of tokens and full epochs.
A single from-scratch run, 5,000 steps, on desktop-class hardware.
vs Months of massive parallel compute.
51.8 W average · 1.27 kWh · at €0.08/kWh.
vs Power bills the size of a city.
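The figures in the comparison above can be cross-checked with a few lines of arithmetic. The sketch below is only a sanity check, not part of the training code: the constant names are illustrative, and the 24.4 h duration and the 164 M tokens seen are taken from elsewhere on this page. The tokens-per-step value is derived here, not a published setting.

```python
# Sanity check of the headline figures (names are illustrative, values from the page).
AVG_POWER_W = 51.8          # average power draw during the run, in watts
DURATION_H = 24.4           # run duration in hours (from the run label)
PRICE_EUR_PER_KWH = 0.08    # electricity price quoted on the page
CORPUS_TOKENS = 18.83e9     # size of the corpus, in tokens
TOKENS_SEEN = 164e6         # tokens actually consumed by the run
STEPS = 5_000               # optimizer steps in the run


def energy_kwh(avg_power_w: float, hours: float) -> float:
    """Energy in kWh from average power (W) and duration (h)."""
    return avg_power_w * hours / 1000.0


kwh = energy_kwh(AVG_POWER_W, DURATION_H)
cost_eur = kwh * PRICE_EUR_PER_KWH
epoch_fraction = TOKENS_SEEN / CORPUS_TOKENS
tokens_per_step = TOKENS_SEEN / STEPS  # derived, not a published setting

print(f"energy:          {kwh:.3f} kWh")
print(f"cost:            EUR {cost_eur:.3f}")
print(f"epoch fraction:  {epoch_fraction:.2%}")
print(f"tokens per step: {tokens_per_step:,.0f}")
```

With the rounded inputs shown here, the energy comes out within about 0.01 kWh of the quoted 1.27 kWh, and the cost lands at roughly ten cents.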
Training, in data.
The real loss curve and the perplexity reached — from chance to the model. Ups and downs included: this is how an organism learns from scratch.
✓ Real data · forja_mini_5000 run · 5,000 steps · 24.4 h · 2026
[Chart: Loss curve (cross-entropy). CE throughout training, with its real ups and downs, unretouched.]
[Chart: Perplexity, from chance to the model. Lower is better; logarithmic scale.]
[Chart: Reasoning wakes up on its own. NAR geometry (lower = aligning) and causal confidence CAG (rising).]
[Chart: The data chasm. Training tokens, logarithmic scale.]
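The loss and perplexity charts describe the same quantity on two scales: perplexity is the exponential of the cross-entropy. A minimal sketch of that relation follows, assuming the loss is reported in nats (natural logarithm), which this page does not state. The vocabulary size is a hypothetical placeholder, since the real tokenizer size is not given here. Under those assumptions, "chance" is a model that guesses uniformly over the vocabulary, so its perplexity equals the vocabulary size.

```python
import math


def perplexity(cross_entropy_nats: float) -> float:
    """Perplexity corresponding to a mean cross-entropy given in nats."""
    return math.exp(cross_entropy_nats)


def chance_cross_entropy(vocab_size: int) -> float:
    """Cross-entropy (nats) of a uniform guess over the vocabulary."""
    return math.log(vocab_size)


# Hypothetical vocabulary size, for illustration only (not taken from the run).
VOCAB_SIZE = 32_000

chance_ce = chance_cross_entropy(VOCAB_SIZE)
chance_ppl = perplexity(chance_ce)

print(f"chance cross-entropy: {chance_ce:.2f} nats")
print(f"chance perplexity:    {chance_ppl:,.0f}")
```

Because the mapping is exponential, equal drops in cross-entropy show up as equal steps on a logarithmic perplexity axis, which is why the perplexity chart uses a log scale.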
The astonishing part isn't what it knows. It's how little it learned from.
Each Crisol expert weighs ~105 million parameters, roughly the size of a GPT-2. For the full 2.13 B-parameter model, theory (the Chinchilla law) asks for some 42.6 billion tokens; the industry trains even expert-sized models with hundreds of billions, and models of the full size with trillions. The first Crisol saw 164 million. Less than 1% of a single pass through its library. And still, it breathes.
Each expert is about the size of a GPT-2 (124 M). Crisol has 12 experts, one per layer, and the knowledge is distilled into each, not diluted across a colossus.
The Chinchilla law recommends ~42.6 B tokens for 2.13 B parameters. The model saw 164 M: 0.38%.
A model the size of an expert (≈125 M) is trained today with hundreds of billions of tokens. Ours, with 164 M — between 600 and 1,800 times less.
* Open models of comparable size (2-3 B parameters) are trained today with between 2 and 18 trillion tokens. The "Industry" bar in the chart uses the conservative end of that range (2 trillion).
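The ratios quoted above can be reproduced with simple arithmetic. The sketch below assumes the common rule-of-thumb reading of the Chinchilla result, about 20 training tokens per parameter. That ratio is not stated on this page, but it matches the 42.6 B figure given for 2.13 B parameters. Function and constant names are illustrative.

```python
# Rule-of-thumb reading of Chinchilla: ~20 tokens per parameter (assumption).
TOKENS_PER_PARAM = 20

TOTAL_PARAMS = 2.13e9       # full model size quoted on the page
TOKENS_SEEN = 164e6         # tokens the first run actually saw
INDUSTRY_LOW_TOKENS = 2e12  # lower end of the 2-18 trillion range in the footnote


def chinchilla_tokens(params: float, tokens_per_param: float = TOKENS_PER_PARAM) -> float:
    """Approximate compute-optimal token count for a given parameter count."""
    return params * tokens_per_param


optimal = chinchilla_tokens(TOTAL_PARAMS)
fraction_of_optimal = TOKENS_SEEN / optimal

# "Between 600 and 1,800 times less" implies this token range for expert-sized models.
implied_low = 600 * TOKENS_SEEN
implied_high = 1_800 * TOKENS_SEEN

# Gap against the conservative industry figure for 2-3 B-parameter models.
industry_ratio = INDUSTRY_LOW_TOKENS / TOKENS_SEEN

print(f"Chinchilla-optimal tokens: {optimal / 1e9:.1f} B")
print(f"share actually seen:       {fraction_of_optimal:.2%}")
print(f"implied industry range:    {implied_low / 1e9:.0f} B to {implied_high / 1e9:.0f} B tokens")
print(f"gap vs 2 T tokens:         {industry_ratio:,.0f}x")
```

Read this way, the 600x to 1,800x range corresponds to roughly 100 to 300 billion tokens for an expert-sized model, consistent with the "hundreds of billions" wording.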
Run spec sheet
The Forge configuration.
Why this milestone matters
If a sovereign cognitive organism can be born with this, it stops being a lab promise and becomes a real possibility.
The first iCrisol doesn't compete on scale. It proves the paradigm — living memory, causality, sovereignty, modularity — works from the very first brick. The rest is growth.