Embedding
Input Token IDs
Token Embedding
BigramHash
SmearGate
Normalization
RMSNorm
Final RMSNorm
Attention
Causal Self-Attention
MLP / FFN
MLP (relu²)
Structural
Encoder Blocks
Residual Mix
U-Net Skip Connections
Decoder Blocks
Output
Output Projection
Logit Softcap
Cross-Entropy Loss
Training
Muon Optimizer
Quantization
Stochastic Weight Averaging
Sliding Window Eval
LoRA TTT