Parameter Golf Architecture Explorer



Embedding
Normalization
Attention
MLP / FFN
Structural
Output
Training
Input Token IDs
Token Embedding: 1024 × 512, tied
BigramHash: advanced
SmearGate: advanced
RMSNorm: post-embedding

Encoder Blocks (layers 0–3, baseline)
  Residual Mix: per block
  Causal Self-Attention: GQA, RoPE, softcap
  MLP (relu²): 2x expand (baseline)

U-Net Skip Connections: encoder → decoder

Decoder Blocks (layers 4–8, baseline)
  Residual Mix: per block
  Causal Self-Attention: GQA, RoPE, softcap
  MLP (relu²): 2x expand (baseline)

Final RMSNorm
Output Projection: tied embedding
Logit Softcap: tanh(·30)
Cross-Entropy Loss
Muon Optimizer: + Adam for scalars
Quantization: int8 + zlib (baseline)
Stochastic Weight Averaging: advanced
Sliding Window Eval: advanced
LoRA TTT: test-time training
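
Three of the nodes in the diagram are small enough to sketch directly: the relu² activation in the MLP, the logit softcap, and the encoder → decoder skip wiring. The following is a minimal illustration in plain Python, not the project's actual code. It rests on several assumptions: "tanh(·30)" is read as cap * tanh(logit / cap) with cap = 30, skips are paired last-in, first-out (the last encoder layer feeds the first decoder layer), and the function names relu_squared, logit_softcap, and unet_skip_pairs are hypothetical.

```python
import math


def relu_squared(x: float) -> float:
    """relu² activation: 0 for negative inputs, x * x otherwise."""
    return max(x, 0.0) ** 2


def logit_softcap(logit: float, cap: float = 30.0) -> float:
    """Smoothly bound a logit to [-cap, cap].

    Assumption: the diagram's "tanh(·30)" means cap * tanh(logit / cap).
    """
    return cap * math.tanh(logit / cap)


def unet_skip_pairs(num_encoder: int, num_decoder: int) -> list[tuple[int, int]]:
    """Return (encoder_layer, decoder_layer) pairs for the skip connections.

    Assumption: encoder outputs are pushed onto a stack and each decoder
    layer pops one, so the pairing is last-in, first-out. Decoder layers
    left over once the stack is empty receive no skip input.
    """
    stack = list(range(num_encoder))  # encoder layer indices, in order
    pairs = []
    for decoder_layer in range(num_encoder, num_encoder + num_decoder):
        if stack:
            pairs.append((stack.pop(), decoder_layer))
    return pairs


if __name__ == "__main__":
    # Baseline layout from the diagram: layers 0-3 encoder, layers 4-8 decoder.
    print(unet_skip_pairs(4, 5))
    print(relu_squared(-2.0), relu_squared(3.0))
    print(logit_softcap(1.0), logit_softcap(1000.0))
```

Under these assumptions the baseline layout has four skip connections, and the final decoder layer (layer 8) has no encoder partner. Small logits pass through the softcap nearly unchanged, while very large ones saturate near ±30.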