iCrisol

The milestone · The Forge

Training completed

The Forge.

The first iCrisol Mini has been trained from scratch. No cluster of thousands of GPUs, no trillions of tokens, no army of researchers. One person, one machine, 24 hours and ten cents of electricity. This is the open, honest record of that milestone — and of what it means that, with so little, a cognitive organism has come to breathe.

The magnitude is in the disproportion.

The big models are born from budgets in the hundreds of millions. iCrisol is born from constraint and turns it into a thesis: if this holds with so little, the paradigm matters.

1
single person

Design, architecture, corpus, training and product — one developer.

vs Hundreds of researchers at the big labs.

1
single machine

One NVIDIA DGX Spark GB10 (128 GB unified memory).

vs Tens of thousands of GPUs in dedicated clusters.

164 M
tokens seen

Just 0.87% of one epoch over an 18.83 B-token corpus.

vs Trillions of tokens and full epochs.

24 h
of training

A single from-scratch run, 5,000 steps, on desktop-class hardware.

vs Months of massive parallel compute.

€0.10
of electricity

51.8 W average · 1.27 kWh · at €0.08/kWh.

vs Power bills the size of a city.
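
The figures in the cards above can be cross-checked against each other. The following is a minimal sketch that uses only numbers quoted on this page; the variable names are illustrative and not taken from the project.

```python
# Cross-check of the headline figures, using only values quoted in the cards.
TOKENS_SEEN = 164e6          # tokens seen during the run
CORPUS_TOKENS = 18.83e9      # size of the training corpus
STEPS = 5_000                # optimizer steps
AVG_POWER_W = 51.8           # average power draw, in watts
ENERGY_KWH = 1.27            # reported energy use
PRICE_EUR_PER_KWH = 0.08     # electricity price

# Share of one full pass over the corpus that the model actually saw.
epoch_fraction = TOKENS_SEEN / CORPUS_TOKENS

# Average number of tokens consumed per optimizer step.
tokens_per_step = TOKENS_SEEN / STEPS

# Electricity bill for the whole run.
cost_eur = ENERGY_KWH * PRICE_EUR_PER_KWH

# Run duration implied by the reported energy and average power.
implied_hours = ENERGY_KWH * 1000 / AVG_POWER_W

print(f"epoch fraction : {epoch_fraction:.2%}")
print(f"tokens per step: {tokens_per_step:,.0f}")
print(f"cost           : EUR {cost_eur:.2f}")
print(f"implied hours  : {implied_hours:.1f}")
```

The implied duration lands close to the quoted 24 h, and the tokens per step are close to what a batch of 32 sequences of 1,024 tokens would give.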

Training, in data.

The real loss curve and the perplexity reached, from chance level down to the trained model. Ups and downs included: this is how an organism learns from scratch.

✓ Real data · forja_mini_5000 run · 5,000 steps · 24.4 h · 2026
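
Perplexity and cross-entropy are two views of the same quantity. The sketch below assumes the cross-entropy is measured in nats (natural logarithm), which is consistent with the spec sheet pairing CE 3.528 with PPL ~34. It also takes "chance" to mean a uniform guess over the 64,000-token vocabulary; that reading is an assumption. The function names are illustrative.

```python
import math

VOCAB_SIZE = 64_000   # vocabulary size from the run spec sheet
BEST_CE = 3.528       # best cross-entropy reported for the run
FINAL_CE = 7.43       # final cross-entropy reported for the run


def perplexity(cross_entropy_nats: float) -> float:
    """Perplexity is the exponential of the cross-entropy (in nats)."""
    return math.exp(cross_entropy_nats)


def chance_cross_entropy(vocab_size: int) -> float:
    """Cross-entropy of a uniform guess over the vocabulary: ln(V)."""
    return math.log(vocab_size)


chance_ce = chance_cross_entropy(VOCAB_SIZE)

print(f"chance: CE {chance_ce:.2f}, PPL {perplexity(chance_ce):,.0f}")
print(f"best  : CE {BEST_CE:.3f}, PPL {perplexity(BEST_CE):.1f}")
print(f"final : CE {FINAL_CE:.2f}, PPL {perplexity(FINAL_CE):,.0f}")
```

Under these assumptions a uniform guess has a perplexity equal to the vocabulary size, which is why the perplexity chart is easier to read on a logarithmic scale.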

Figure: Loss curve (cross-entropy). CE throughout training, with its real ups and downs, unretouched.

Figure: Perplexity, from chance to the model. Lower is better; logarithmic scale.

Figure: Reasoning wakes up on its own. NAR geometry (lower = aligning) and causal confidence CAG (rising).

Figure: The data chasm. Training tokens, logarithmic scale.

The astonishing part isn't what it knows. It's how little it learned from.

Each Crisol expert weighs ~105 million parameters, roughly the size of a GPT-2. For the full 2.13 B-parameter model, theory (the Chinchilla law) asks for some 42.6 billion tokens; the industry trains even expert-sized models on hundreds of billions of tokens, and 2-3 B models on trillions. The first Crisol saw 164 million. Less than 1% of a single pass through its library. And still, it breathes.

105 M
parameters per expert

Roughly the size of GPT-2 small (124 M). Crisol has 12 experts, one per layer, and the knowledge is distilled into each, not diluted across a colossus.

1 / 260
of what theory asked for

The Chinchilla law recommends ~42.6 B tokens for 2.13 B parameters. The model saw 164 M: 0.38%.

100–300 B
tokens the industry uses

A model the size of one expert (≈125 M) is trained today on hundreds of billions of tokens. Ours saw 164 M: between 600 and 1,800 times fewer.

* Open models comparable in size to the full model (2-3 B parameters) are trained today on between 2 and 18 trillion tokens. The "Industry" bar in the chart uses a conservative figure (2 trillion).
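
The ratios in this section follow from simple arithmetic. The sketch below assumes the commonly cited Chinchilla rule of thumb of about 20 training tokens per parameter. The page does not state that ratio, but it is the one that reproduces 42.6 B tokens for 2.13 B parameters. Variable and function names are illustrative.

```python
# Assumption: the Chinchilla heuristic of ~20 training tokens per parameter.
TOKENS_PER_PARAM = 20

TOTAL_PARAMS = 2.13e9     # full model
EXPERT_PARAMS = 105e6     # one expert
TOKENS_SEEN = 164e6       # tokens seen by the first run
INDUSTRY_LOW = 100e9      # lower bound quoted for expert-sized models
INDUSTRY_HIGH = 300e9     # upper bound quoted for expert-sized models


def chinchilla_tokens(params: float) -> float:
    """Compute-optimal token budget under the 20-tokens-per-parameter heuristic."""
    return params * TOKENS_PER_PARAM


optimal_full = chinchilla_tokens(TOTAL_PARAMS)      # budget for the full model
optimal_expert = chinchilla_tokens(EXPERT_PARAMS)   # budget for a single expert

shortfall = optimal_full / TOKENS_SEEN              # "1 / 260"
fraction_of_optimal = TOKENS_SEEN / optimal_full    # share of the budget seen
gap_low = INDUSTRY_LOW / TOKENS_SEEN                # "600 times"
gap_high = INDUSTRY_HIGH / TOKENS_SEEN              # "1,800 times"

print(f"Chinchilla budget, full model: {optimal_full / 1e9:.1f} B tokens")
print(f"Chinchilla budget, one expert: {optimal_expert / 1e9:.1f} B tokens")
print(f"Seen: 1/{shortfall:.0f} of the budget ({fraction_of_optimal:.2%})")
print(f"Industry gap: {gap_low:.0f}x to {gap_high:.0f}x")
```

Under the same heuristic, a single 105 M expert would call for roughly 2.1 B tokens, so the 42.6 B figure applies to the full model rather than to one expert.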

Run spec sheet

The Forge configuration.

Parameters: ~2.13 B (one expert per layer)
Architecture: 12 layers · 5 universal slots · 1 active
Holographic space: holo 4096 · NAR 2048 · NOE 2048
Expert (SwiGLU): dim 8192 · 1 per layer
Steps: 5,000 · from scratch
Context: seq_len 1024 · effective batch 32
Learning rate: 1e-4 → 5e-6 · warmup 100 · cosine
Vocabulary: 64,000 (multilingual BPE)
Precision: bfloat16
Hardware: DGX Spark GB10 (ARM64, 128 GB)
Tokens seen: 163.84 M · 0.0087 epochs
Forge time: 24.4 h · 51.8 W average
Electricity cost: ~€0.10 (1.27 kWh at €0.08/kWh)
Best / final CE: 3.528 (PPL ~34) / 7.43, with ups and downs, unretouched
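
Two rows of the spec sheet can be reproduced from the others. The sketch below derives the token count from steps, batch and context length, and shows one plausible reading of "1e-4 → 5e-6 · warmup 100 · cosine". The exact schedule is an assumption: linear warmup over the first 100 steps, then cosine decay that reaches the final rate on the last step. The page gives only the endpoints, so the real run may differ in detail. All names are illustrative.

```python
import math

STEPS = 5_000
WARMUP_STEPS = 100
PEAK_LR = 1e-4
FINAL_LR = 5e-6
SEQ_LEN = 1024
EFFECTIVE_BATCH = 32
CORPUS_TOKENS = 18.83e9


def learning_rate(step: int) -> float:
    """Learning rate at a 0-indexed step (assumed schedule shape)."""
    if step < WARMUP_STEPS:
        # Linear warmup: reaches PEAK_LR on the last warmup step.
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    # Progress through the decay phase, from 0.0 to 1.0 at the final step.
    progress = (step - WARMUP_STEPS) / max(1, STEPS - 1 - WARMUP_STEPS)
    progress = min(1.0, progress)
    # Cosine factor goes from 1.0 down to 0.0 as progress goes from 0 to 1.
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return FINAL_LR + (PEAK_LR - FINAL_LR) * cosine


# Tokens seen = steps x sequences per step x tokens per sequence.
tokens_seen = STEPS * EFFECTIVE_BATCH * SEQ_LEN
epochs = tokens_seen / CORPUS_TOKENS

print(f"tokens seen: {tokens_seen / 1e6:.2f} M ({epochs:.4f} epochs)")
for step in (0, 99, 2_500, 4_999):
    print(f"step {step:>5}: lr = {learning_rate(step):.2e}")
```

The token count matches the 163.84 M in the sheet exactly, which suggests every step used the full effective batch at full context length.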

Why this milestone matters

If a sovereign cognitive organism can be born with this, it stops being a lab promise and becomes a real possibility.

The first iCrisol doesn't compete on scale. It proves the paradigm — living memory, causality, sovereignty, modularity — works from the very first brick. The rest is growth.
