Technical architecture · v1.0
The engine that runs without self-attention.
Crisol drops the transformer's attention tower and replaces it with high-dimensional holographic algebra combined with a geometric Mixture-of-Experts. Context is held in a fixed-size vector, not a quadratic matrix. The result: constant-cost memory, axiomatic reasoning, and a model that is an organism, not a static function.
The path of a token, end to end.
There is no encoder-attention-decoder stack. There is a holographic encoder, a stack of NebulaForge layers, and a final projection. Four stages, zero quadratic attention.
Token
Input text tokenized against a 64,000-entry vocabulary.
HoloMemZEncoder
Each discrete token is projected to a holographic vector of holo_dim = 4096 on the unit sphere, structured into subspaces.
N × NebulaForgeLayer
12 layers (Mini). Each layer holds HoloBinder + QSE + GestorExpertos. This is where reasoning happens — without a single self-attention operation.
RMSNorm → Logits
Final normalization and projection back to the vocabulary. The next token emerges from a cognitive state, not from an attention matrix.
Piece 1 · HoloMemZEncoder
A structured holographic space, not a flat embedding.
The encoder turns each token into a vector on the unit sphere of dimension holo_dim = 4096. But that sphere is not undifferentiated: it is split into three functional regions, each with a distinct cognitive purpose.
NAR (2048 dimensions)
Axiomatic logical-causal reasoning. The subspace where the QSE routes and where expert signatures live.
NOE (2048 dimensions)
Physical-causal invariants of the world (256 invariants in v1.0). The anchor against which the causal engine validates what it claims.
Free (size not specified)
Trained domain knowledge. The flexible expressive capacity each expert populates in its own way.
This separation between reasoning (NAR) and world knowledge (NOE) is what lets Crisol route by axioms and validate by causality — without confusing what it believes with what it knows.
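The encoding step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dimensions are shrunk to toy size, a random table stands in for trained embeddings, and the contiguous layout (NAR first, NOE second) is a guess, since the text does not say where each region sits or how large the Free region is. The names make_table and encode are hypothetical.

```python
import math
import random

HOLO_DIM = 16          # toy size; the text uses holo_dim = 4096
NAR = slice(0, 8)      # assumed position of the NAR region (2048 dims in the text)
NOE = slice(8, 16)     # assumed position of the NOE region (2048 dims in the text)

def unit(v):
    """Scale v to unit length so it lies on the unit sphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def make_table(vocab_size, dim=HOLO_DIM, seed=0):
    """Random stand-in for a trained embedding table (hypothetical)."""
    rng = random.Random(seed)
    return [unit([rng.gauss(0.0, 1.0) for _ in range(dim)])
            for _ in range(vocab_size)]

def encode(token_id, table):
    """Look up the token's unit vector and expose its assumed subspaces."""
    vec = table[token_id]
    return {"full": vec, "nar": vec[NAR], "noe": vec[NOE]}

table = make_table(vocab_size=100)
encoded = encode(42, table)
```

The point of the slicing is that a router or validator can read only its own region of the vector without touching the rest.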
Piece 2 · HoloBinder: the end of quadratic cost.
The HoloBinder is the piece that removes attention. Instead of a matrix that grows with the square of sequence length, it keeps a single h_ctx vector of 4096 dimensions, updated token by token through a Hadamard product with a decay of 0.95.
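The text does not spell out the exact update rule, so the following Python sketch picks one plausible reading and should not be taken as the actual HoloBinder. It assumes each token vector is bound to a bipolar (+1/-1) position key by Hadamard product, added to the decayed state, and renormalized. The position keys, the renormalization, and every function name here are assumptions.

```python
import math
import random

DECAY = 0.95  # decay factor quoted in the text

def unit(v):
    """Scale v to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def random_unit(dim, rng):
    """Random unit vector standing in for an encoded token."""
    return unit([rng.gauss(0.0, 1.0) for _ in range(dim)])

def random_key(dim, rng):
    """Hypothetical bipolar position key; binding with it preserves the norm."""
    return [rng.choice((-1.0, 1.0)) for _ in range(dim)]

def hadamard(a, b):
    """Element-wise (Hadamard) product."""
    return [x * y for x, y in zip(a, b)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def update(h_ctx, key, token_vec, decay=DECAY):
    """Assumed rule: fade the old state, add the key-bound token, renormalize."""
    bound = hadamard(key, token_vec)
    return unit([decay * h + b for h, b in zip(h_ctx, bound)])

def run(num_tokens, dim=1024, seed=0):
    """Feed num_tokens random tokens; return the state and the last (key, token)."""
    rng = random.Random(seed)
    h_ctx = [0.0] * dim
    last = None
    for _ in range(num_tokens):
        key, tok = random_key(dim, rng), random_unit(dim, rng)
        h_ctx = update(h_ctx, key, tok)
        last = (key, tok)
    return h_ctx, last

h_ctx, (key, tok) = run(num_tokens=200)
recall = cosine(hadamard(key, h_ctx), tok)  # unbind with the same key
```

Whatever the real rule is, the property the text relies on is visible here: the state has the same length after 200 tokens as after one, and multiplying by a key again partially recovers the token bound under it.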
Context cost: O(1) vs O(n²)
Transformer attention grows with the square of sequence length; iCrisol's HoloBinder keeps the cost of holding context constant. At 2,048 tokens: Transformer 16×, iCrisol 1×.
At 32,768 tokens a transformer needs ~7.5 GB for the KV-cache alone, and that cache is discarded when the session closes. iCrisol holds the context state in a fixed-size vector of 4096 dimensions: constant cost, persistent across sessions.
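The figures above can be checked with back-of-the-envelope arithmetic. The Python sketch below rests on assumptions the text does not state: the 16× ratio is read as quadratic scaling from a 512-token baseline, and the KV-cache size uses the standard formula for full multi-head attention with a hypothetical configuration (12 layers, model width 5120, 2 bytes per value) chosen only because it reproduces roughly 7.5 GB. The storage precision of h_ctx (4 bytes per value) is also assumed.

```python
def relative_attention_cost(n_tokens, baseline_tokens=512):
    """Size of an n x n score matrix relative to an assumed baseline length."""
    return (n_tokens / baseline_tokens) ** 2

def kv_cache_bytes(n_tokens, n_layers, d_model, bytes_per_value=2):
    """Keys plus values, per layer and per token (full multi-head attention)."""
    return 2 * n_layers * d_model * n_tokens * bytes_per_value

def context_vector_bytes(dim=4096, bytes_per_value=4):
    """Storage for one fixed-size context vector (precision is an assumption)."""
    return dim * bytes_per_value

ratio = relative_attention_cost(2048)                       # 16.0
cache_gib = kv_cache_bytes(32768, n_layers=12, d_model=5120) / 2**30  # 7.5
vector_kib = context_vector_bytes() / 1024                  # 16.0
```

Note that in this formula the cache itself grows linearly with the number of tokens; the quadratic term is the attention score matrix. The fixed-size vector does not depend on the token count at all.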
Because h_ctx is organism state and not ephemeral inference, it can be saved, restored, and inherited. The conversation does not die when you close the tab.
Inside a NebulaForgeLayer
Four pieces that replace attention.
Each of the 12 layers in the Crisol Mini combines holographic binding, axiomatic routing, slot governance, and specialized experts. This is the anatomy of a layer.
HoloBinder
Context memory in O(1)
Instead of a quadratic attention matrix, it keeps a single h_ctx vector of 4096 dimensions, updated token by token via Hadamard product with 0.95 decay. It is persistent organism state, not an ephemeral cache: it survives across sessions and is rebuilt from the 64 KB HolographicCore.
QSE
Quantum Specialization Engine
The Mixture-of-Experts router. It learns no arbitrary gating matrix: it routes geometrically by cosine similarity against expert signatures in the 2048-dimensional NAR subspace. It activates exactly n_active = 2 experts per step. Signatures are interoperable: an imported package fits into any Crisol in the ecosystem.
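Routing by cosine similarity with n_active = 2 can be sketched as follows. The expert names and the 4-dimensional toy signatures are hypothetical (the text uses a 2048-dimensional NAR subspace), and the sketch returns only which experts are selected, since the text does not describe how their outputs are weighted.

```python
import math

N_ACTIVE = 2  # experts activated per step, as stated in the text

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def route(nar_vec, signatures, n_active=N_ACTIVE):
    """Return the n_active expert names whose signatures best match nar_vec."""
    ranked = sorted(signatures.items(),
                    key=lambda item: cosine(nar_vec, item[1]),
                    reverse=True)
    return [name for name, _ in ranked[:n_active]]

# Hypothetical expert signatures in a toy 4-dimensional NAR subspace.
signatures = {
    "logic":   [1.0, 0.0, 0.0, 0.0],
    "physics": [0.0, 1.0, 0.0, 0.0],
    "syntax":  [0.0, 0.0, 1.0, 0.0],
}
chosen = route([0.9, 0.4, 0.1, 0.0], signatures)  # ["logic", "physics"]
```

Because the router only compares vectors, adding an expert means adding one signature to the table; no gating weights need retraining in this sketch.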
GestorExpertos
Typed slots + GobernadorSlots
Experts do not live in anonymous structures: they live in 5 universal typed slots (cognitive, memory, procedural, imported). The GobernadorSlots is the authority that decides which slot activates, with what permissions, and whether an imported expert is compatible. Explicit, auditable, reproducible technical governance.
ExpertNetwork
Specialized SwiGLU FFN
Each active expert is a feed-forward network with SwiGLU activation and dim_expert = 8192. Three projection matrices (gate, value, output) plus normalization sum to ~100 M parameters per physical sub-expert. The output is projected back onto the unit sphere.
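The ~100 M figure follows from the three projection matrices alone: 3 × 4096 × 8192 = 100,663,296 weights. The Python sketch below reproduces that count and shows a SwiGLU forward pass at toy size. It assumes bias-free projections, leaves out the normalization layer (its placement is not described in the text), and uses random weights; the function names are hypothetical.

```python
import math
import random

def swiglu_params(holo_dim=4096, dim_expert=8192):
    """Weights in the gate, value and output projections (no biases, no norm)."""
    return 3 * holo_dim * dim_expert

def silu(x):
    """SiLU (swish) activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))

def matvec(matrix, vec):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def unit(v):
    """Scale v to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def random_matrix(rows, cols, rng):
    """Random weights scaled by 1/sqrt(cols); stands in for trained weights."""
    return [[rng.gauss(0.0, 1.0 / math.sqrt(cols)) for _ in range(cols)]
            for _ in range(rows)]

def expert_forward(x, w_gate, w_value, w_out):
    """SwiGLU FFN: out = W_out (silu(W_gate x) * (W_value x)), then unit norm."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    value = matvec(w_value, x)
    hidden = [g * v for g, v in zip(gate, value)]
    return unit(matvec(w_out, hidden))

rng = random.Random(0)
dim, hidden_dim = 8, 16   # toy sizes; the text uses 4096 and 8192
w_gate = random_matrix(hidden_dim, dim, rng)
w_value = random_matrix(hidden_dim, dim, rng)
w_out = random_matrix(dim, hidden_dim, rng)
x = unit([rng.gauss(0.0, 1.0) for _ in range(dim)])
y = expert_forward(x, w_gate, w_value, w_out)
```

The final call to unit is what puts the expert's output back on the unit sphere, as the text describes.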
Piece 3 · Training
DHTP: every layer with its own local brain.
Crisol is not trained with a single global optimizer like a transformer. It uses the Distributed Holographic Training Protocol: N+1 independent AdamW optimizers, one global plus one per layer. In the Crisol Mini, with its 12 layers, that is 13 AdamW optimizers.
The effect is local learning: lower layers converge fast on syntax, higher ones slowly on abstract reasoning. And when a new slot activates, its optimizer is born without disturbing the rest.
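The N+1 arrangement can be illustrated with a small, dependency-free Python sketch. It is not the DHTP implementation: the AdamW class is a minimal textbook version over flat lists of floats, the split between global and per-layer parameters is an assumption, and the learning rates are placeholders.

```python
import math

class AdamW:
    """Minimal AdamW (decoupled weight decay) over a flat list of floats."""

    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8,
                 weight_decay=0.01):
        self.params = params
        self.lr, self.betas, self.eps, self.wd = lr, betas, eps, weight_decay
        self.m = [0.0] * len(params)   # first-moment estimates
        self.v = [0.0] * len(params)   # second-moment estimates
        self.t = 0                     # step counter for bias correction

    def step(self, grads):
        self.t += 1
        b1, b2 = self.betas
        for i, g in enumerate(grads):
            self.params[i] *= 1.0 - self.lr * self.wd       # decoupled decay
            self.m[i] = b1 * self.m[i] + (1 - b1) * g
            self.v[i] = b2 * self.v[i] + (1 - b2) * g * g
            m_hat = self.m[i] / (1 - b1 ** self.t)
            v_hat = self.v[i] / (1 - b2 ** self.t)
            self.params[i] -= self.lr * m_hat / (math.sqrt(v_hat) + self.eps)

def build_optimizers(global_params, layer_params, layer_lrs=None):
    """One optimizer for the global parameters plus one per layer (N + 1)."""
    layer_lrs = layer_lrs or [1e-3] * len(layer_params)
    optimizers = {"global": AdamW(global_params)}
    for i, (params, lr) in enumerate(zip(layer_params, layer_lrs)):
        optimizers[f"layer_{i}"] = AdamW(params, lr=lr)
    return optimizers

# 12 layers as in the Crisol Mini; each "layer" is one toy parameter here.
layers = [[1.0] for _ in range(12)]
optimizers = build_optimizers(global_params=[1.0], layer_params=layers)
optimizers["layer_0"].step([1.0])  # only layer 0 moves; the others are untouched
```

Keeping the optimizer state per layer is what makes it possible to create a fresh optimizer for a new slot without resetting the moment estimates of existing layers.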
Piece 4 · AKF-Z
The six-phase cognitive loop.
Where a transformer runs its forward and backward passes blindly, Crisol runs a cognitive cycle. Four of its six phases (perception, prediction, evaluation, and metacognition) have no counterpart in a transformer training loop: they are the layer that turns a model into an organism.
Perception
NomotheticZ generates a curiosity vector that guides the sampling of the batch.
Prediction
HistorianZ predicts the expected loss before the forward pass — the model's surprise signal.
Processing
Forward pass: logits, final h_ctx, and DHTP predictions.
Evaluation
Loss computation and surprise: how far reality diverged from expectation.
Learning
Backward pass and step of the N+1 AdamW optimizers. Each layer learns at its own pace.
Metacognition
The ecosystem of Z agents logs the step; ArchitectZ decides whether to reorganize, InquisitorZ whether to run a causal simulation.
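The six phases above can be read as the skeleton of one training step. The Python sketch below only fixes the order of the phases; every callable passed in is a hypothetical stand-in for the corresponding agent or model component, and measuring surprise as the absolute gap between predicted and actual loss is an assumption.

```python
def akf_z_step(curiosity_fn, sample_batch, predict_loss, forward, loss_fn,
               learn, log):
    """Run one six-phase cycle and return the surprise signal."""
    # 1. Perception: a curiosity vector guides which batch is sampled.
    curiosity = curiosity_fn()
    batch = sample_batch(curiosity)
    # 2. Prediction: estimate the loss before running the model.
    expected_loss = predict_loss(batch)
    # 3. Processing: forward pass.
    outputs = forward(batch)
    # 4. Evaluation: actual loss, and how far it is from the prediction.
    loss = loss_fn(outputs, batch)
    surprise = abs(loss - expected_loss)   # assumed definition of surprise
    # 5. Learning: backward pass and optimizer steps.
    learn(loss)
    # 6. Metacognition: record the step for the agents that decide what next.
    log({"loss": loss, "expected": expected_loss, "surprise": surprise})
    return surprise

# Toy stand-ins so the skeleton can be exercised end to end.
history = []
surprise = akf_z_step(
    curiosity_fn=lambda: [0.0],
    sample_batch=lambda curiosity: [1, 2, 3],
    predict_loss=lambda batch: 1.0,
    forward=lambda batch: [x * 2 for x in batch],
    loss_fn=lambda outputs, batch: 1.5,
    learn=lambda loss: None,
    log=history.append,
)
```

With these stand-ins the predicted loss is 1.0 and the actual loss is 1.5, so the step reports a surprise of 0.5 and appends one record to history.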
Follow the thread
An engine without attention needs a different memory — and solves problems the transformer cannot.
The HoloBinder is only the beginning. Persistent living memory and the catalog of structural problems complete the picture.